This article describes the author's experience learning an algorithm. I am not very familiar with algorithms, so please feel free to correct any errors.

The random sentence problem

I came across an algorithm problem recently: generating random sentences. I suspect it may come from the day-to-day work of a search engine company.

The problem: given an input sentence (i.e. a sequence of words) and an expected length, randomly select words from the input sentence to form a new sentence (just a sequence of words; grammar does not matter) whose number of words equals the expected length. The selection rules: randomly pick a word as the first word and append it to the new sentence. Whenever a word is appended, find all occurrences of that word in the input sentence, randomly choose one occurrence, and append the word that follows it (its next word) to the new sentence; the appended word then drives the next selection, and so on, until the new sentence reaches the expected length and is output. In other words, implement a function String generateSentence(String sentence, int length).

For example, take sentence="this is a sentence it is not a good one and it is also bad" and length=5. If "sentence" is selected as the first word, it has only one next word, "it". The next word of "it" can come from two positions, but both happen to be "is". The next word of "is" can be "not" or "also"; if "not" is chosen, there is only one choice for the next word, "a". At this point 5 words have been collected, yielding the new sentence "sentence it is not a".

The above is the basic version of the problem. There is also an enhanced version: modify the existing rules and add an input m. First, randomly take m consecutive words of the input sentence and append them to the new sentence. After that, each time a word is appended, instead of looking up the positions of a single word, find all positions in the input sentence where "the phrase formed by the last m words" of the new sentence occurs, randomly pick one of those positions, and append the word that follows it. The basic version can be seen as the special case m=1.

Basic version of the solution

The solution basically follows this structure:

  1. Convert the input sentence into a sequence of words (String -> String[], namely tokenize), and prepare a memory buffer for the new sentence.
  2. Randomly select the first word and put it into the memory buffer.
  3. In a loop, find all positions of the last selected word in the input sentence, randomly pick one position, and put the word after it into the memory buffer; stop when the new sentence reaches the expected length, then output it.

Note that the input sentence is treated as circular: if the search hits the last word of the sentence, its next word is the first word, i.e. the end of the sentence wraps back to the beginning. To handle this wrap-around when computing the next position, I originally wrote it like this (subtracting the sentence length once when crossing the boundary):

int nextPos(int pos, String[] words) {
  int next = pos + 1;
  if (next >= words.length) {
    next -= words.length;
  }
  return next;
}

Later I learned that taking the modulus is enough:

int nextPos(int pos, String[] words) {
  return (pos + 1) % words.length;
}

For the basic version, subtraction is fine, but what about the enhanced version? If m >= the length of the input sentence (admittedly an unreasonable input), the next position may wrap around more than once, and the subtraction would have to be applied twice to be correct. The modulus is therefore not only more concise and elegant, but also more robust.
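To see the difference concretely, here is a minimal sketch (the class and method names are mine, not from the solutions in this article) comparing the two wrap-around strategies when an index can cross the boundary more than once:

```java
// Hypothetical demo comparing single-subtraction wrap with modulus wrap.
class WrapDemo {
  // Original approach: subtract the length once when crossing the boundary.
  static int wrapBySubtraction(int index, int length) {
    if (index >= length) {
      index -= length;
    }
    return index;
  }

  // Modulus approach: correct no matter how many times the index wraps.
  static int wrapByModulus(int index, int length) {
    return index % length;
  }

  public static void main(String[] args) {
    int length = 5;
    // One wrap: both agree (6 -> 1).
    System.out.println(wrapBySubtraction(6, length) + " " + wrapByModulus(6, length));
    // Two wraps (possible when m >= sentence length): subtraction leaves 6,
    // which is still out of bounds, while modulus correctly yields 1.
    System.out.println(wrapBySubtraction(11, length) + " " + wrapByModulus(11, length));
  }
}
```

With length 5, an index of 11 has wrapped twice: a single subtraction leaves 6, still out of bounds, while the modulus yields 1.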

Now for three implementations. The first is the brute-force search method, which traverses the input sentence on every step and finds all positions of the given word. The second is the optimized search method, which improves on brute force: instead of finding all positions of the given word, it starts from a random position and finds one position of the word. The third is the hash index method, which pre-builds a hash table over the input sentence so that the next words of any word can be looked up quickly.

One caveat up front: the random selection of the optimized search method can sometimes be very unfair. For example, with two consecutive identical words such as "go go", a left-to-right search will almost always select the left one. That can be fixed by randomly choosing the search direction each time (left-to-right or right-to-left). But what about three consecutive identical words, as in "go go go"? The middle word has a very small chance of being selected. The key to solving this is to traverse the array in a non-linear manner; that algorithm will be provided later. (Update on August 15, 2021: it is provided in the "Appendix: Non-linear search method" at the end of the article.)
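The unfairness can be shown deterministically by enumerating every possible starting point of a circular left-to-right scan. This small sketch (class and method names are mine, for illustration only) counts how often each occurrence of a duplicated word is found first:

```java
// Hypothetical demo: enumerate every starting point of a circular left-to-right
// scan and count which occurrence of a duplicated word is found first.
class FairnessDemo {
  // Returns the index of the first occurrence of target found when scanning
  // circularly from start.
  static int firstFoundFrom(String[] words, String target, int start) {
    for (int step = 0; step < words.length; step++) {
      int idx = (start + step) % words.length;
      if (words[idx].equals(target)) {
        return idx;
      }
    }
    return -1;
  }

  // Counts, over all possible starting points, how often each position is found first.
  static int[] selectionCounts(String[] words, String target) {
    int[] counts = new int[words.length];
    for (int start = 0; start < words.length; start++) {
      counts[firstFoundFrom(words, target, start)]++;
    }
    return counts;
  }

  public static void main(String[] args) {
    String[] words = {"a", "go", "go", "b"};
    int[] counts = selectionCounts(words, "go");
    // The left "go" (index 1) is found from 3 of the 4 starts; the right one
    // (index 2) only when the scan starts exactly on it.
    System.out.println(counts[1] + " vs " + counts[2]);
  }
}
```

Over the 4 possible starts, the left "go" is found 3 times and the right one only once, which matches the bias described above.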

Implementation code of the brute-force search method

class Solution1 {
  public static void main(String[] args) {
    System.out.println(generateSentence(
      "this is a sentence it is not a good one and it is also bad", 5));
  }

  public static String generateSentence(String sentence, int length) {
    String[] words = sentence.split(" ");
    if (words.length == 0) {
      return sentence;
    }

    List<String> newWords = new ArrayList<>();

    int randIdx = new Random().nextInt(words.length);
    String prev = words[randIdx];
    newWords.add(prev);

    for (int i = 1; i < length; i++) {
      List<Integer> nexts = searchNexts(words, prev);
      int chosen = new Random().nextInt(nexts.size());
      String chosenWord = words[nexts.get(chosen)];
      newWords.add(chosenWord);
      prev = chosenWord;
    }

    return String.join(" ", newWords);
  }

  // Find all occurrences of prev in words and collect the positions after them (nexts)
  private static List<Integer> searchNexts(String[] words, String prev) {
    List<Integer> nexts = new ArrayList<>();
    for (int i = 0; i < words.length; i++) {
      if (words[i].equals(prev)) {
        nexts.add(nextPos(i, words));
      }
    }

    return nexts;
  }

  private static int nextPos(int pos, String[] words) {
    return (pos + 1) % words.length;
  }
}

Implementation code of the optimized search method

class Solution1 {
  public static void main(String[] args) {
    System.out.println(generateSentence(
      "this is a sentence it is not a good one and it is also bad", 5));
  }

  public static String generateSentence(String sentence, int length) {
    String[] words = sentence.split(" ");
    if (words.length == 0) {
      return sentence;
    }

    List<String> newWords = new ArrayList<>();

    int randIdx = new Random().nextInt(words.length);
    String prev = words[randIdx];
    newWords.add(prev);

    for (int i = 1; i < length; i++) {
      String chosenWord = randomNextWord(words, prev);
      newWords.add(chosenWord);
      prev = chosenWord;
    }

    return String.join(" ", newWords);
  }

  private static String randomNextWord(String[] words, String prev) {
    int randomBeginIndex = new Random().nextInt(words.length);

    for (int _i = 0; _i < words.length; _i++) {
      int idx = (randomBeginIndex + _i) % words.length;
      if (words[idx].equals(prev)) {
        return words[nextPos(idx, words)];
      }
    }

    return null;
  }

  private static int nextPos(int pos, String[] words) {
    return (pos + 1) % words.length;
  }
}

Implementation code of the hash index method

class Solution1 {
  public static void main(String[] args) {
    System.out.println(generateSentence(
      "this is a sentence it is not a good one and it is also bad", 5));
  }

  public static String generateSentence(String sentence, int length) {
    String[] words = sentence.split(" ");
    if (words.length == 0) {
      return sentence;
    }

    Table table = new Table(words);

    List<String> newWords = new ArrayList<>();

    int randIdx = new Random().nextInt(words.length);
    String prev = words[randIdx];
    newWords.add(prev);

    for (int i = 1; i < length; i++) {
      String chosenWord = table.randomNextWord(prev);
      newWords.add(chosenWord);
      prev = chosenWord;
    }

    return String.join(" ", newWords);
  }

  private static int nextPos(int pos, String[] words) {
    return (pos + 1) % words.length;
  }

  static class Table {
    private Map<String, List<String>> map = new HashMap<>();

    Table(String[] words) {
      for (int i = 0; i < words.length; i++) {
        List<String> nexts = map.computeIfAbsent(words[i], key -> new ArrayList<>());
        nexts.add(words[nextPos(i, words)]);
      }
    }

    String randomNextWord(String word) {
      List<String> nexts = map.get(word);
      int chosen = new Random().nextInt(nexts.size());
      return nexts.get(chosen);
    }
  }
}

The hash index method introduces a Table structure to encapsulate the table-building and lookup logic, which is not only more efficient in theory but also easier to read.
How should we analyze the complexity? Let n = words.length and l = length. Then:

  1. The brute-force search method spends its time scanning all n words in each of the l iterations: time complexity O(nl), space complexity O(l).
  2. The optimized search method scans on average n/(k+1) words per iteration, where k is the number of occurrences of the searched word (when it appears only once, k=1 and the average scan is n/2 words, i.e. half the sentence): time complexity O(nl/k), space complexity O(l). Since k is usually small, this is only a constant-factor optimization, and the actual effect remains to be measured.
  3. The hash index method spends its time building a hash table of size n: time complexity O(n + l), space complexity O(n + l). Moreover, when it is called repeatedly with the same input, the hash table can be reused, so the build cost amortizes away, leaving O(l).
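The amortization argument in point 3 assumes the table is kept around between calls. A minimal sketch of such a per-sentence cache (the CachedIndex name and structure are mine, not part of the solutions in this article) might look like this:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: cache the word -> next-words table per input sentence,
// so repeated calls with the same sentence skip the O(n) build step.
class CachedIndex {
  private static final Map<String, Map<String, List<String>>> cache = new HashMap<>();

  static Map<String, List<String>> tableFor(String sentence) {
    return cache.computeIfAbsent(sentence, CachedIndex::buildTable);
  }

  private static Map<String, List<String>> buildTable(String sentence) {
    String[] words = sentence.split(" ");
    Map<String, List<String>> table = new HashMap<>();
    for (int i = 0; i < words.length; i++) {
      // The sentence is circular: the word after the last one is the first one.
      String next = words[(i + 1) % words.length];
      table.computeIfAbsent(words[i], k -> new ArrayList<>()).add(next);
    }
    return table;
  }

  public static void main(String[] args) {
    Map<String, List<String>> t1 = tableFor("a b a c");
    Map<String, List<String>> t2 = tableFor("a b a c");
    // Same instance on the second call: the build cost is amortized away.
    System.out.println(t1 == t2);
    System.out.println(t1.get("a")); // the words following each "a"
  }
}
```

A real cache would need an eviction policy; this sketch only illustrates the amortization.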

Solution for the enhanced version

The solution basically follows this structure:

  1. Convert the input sentence into a sequence of words (String -> String[], or tokenize), and prepare a memory buffer for the new sentence.
  2. Randomly select m consecutive words and put them into the memory buffer.
  3. In a loop, find all positions in the input sentence where "the phrase formed by the last m words" of the new sentence occurs, randomly pick one position, and put the word after it into the memory buffer; stop when the new sentence reaches the expected length, then output it.

Using m as the name of the input parameter would be meaningless. Given its semantics, the variable can be named lookBack, i.e. "the number of items looked at backwards" (incidentally, compiler parsing has a concept called lookAhead, "the number of items looked at forwards").

We can also refactor the code to make it safer and more elegant. When handling wrap-around, the worry is an out-of-bounds array index; nextPos guards only one place with the modulus. So why not add a line of defense to array access itself and apply the modulus to every array offset?
Design this reusable array-access helper to replace the nextPos function:

String safeGetWord(String[] words, int index) {
  return words[index % words.length];
}

Now implement the three algorithms again.

Implementation code of the brute-force search method

class Solution2 {
  public static void main(String[] args) {
    System.out.println(generateSentence(
      "this is a sentence it is not a good one and it is also bad", 5, 2));
  }

  public static String generateSentence(String sentence, int length, int lookBack) {
    String[] words = sentence.split(" ");
    if (words.length == 0) {
      return sentence;
    }
    if (lookBack > length) {
      throw new IllegalArgumentException("lookBack exceeds length");
    }

    List<String> newWords = new ArrayList<>();

    generateLeading(newWords, words, lookBack);

    for (int _i = lookBack; _i < length; _i++) {
      List<String> prevs = newWords.subList(newWords.size() - lookBack, newWords.size());
      List<Integer> nexts = searchNexts(words, prevs);

      int chosen = new Random().nextInt(nexts.size());
      String chosenWord = safeGetWord(words, nexts.get(chosen));
      newWords.add(chosenWord);
    }

    return String.join(" ", newWords);
  }

  // Generate the first few words
  private static void generateLeading(List<String> newWords, String[] words, int lookBack) {
    int randIdx = new Random().nextInt(words.length);
    for (int i = 0; i < lookBack; i++) {
      newWords.add(safeGetWord(words, randIdx + i));
    }
  }

  private static List<Integer> searchNexts(String[] words, List<String> prevs) {
    List<Integer> nexts = new ArrayList<>();

    for (int i = 0; i < words.length; i++) {
      // Try to match the phrase at this position
      int matchedCount = 0;

      for (int j = 0; j < prevs.size(); j++) {
        if (!safeGetWord(words, i + j).equals(prevs.get(j))) {
          matchedCount = -1;
          break;
        }

        matchedCount++;
      }

      if (matchedCount == prevs.size()) {
        nexts.add(i + prevs.size());
      }
    }

    return nexts;
  }

  private static String safeGetWord(String[] words, int index) {
    return words[index % words.length];
  }
}

Implementation code of the optimized search method

class Solution2 {
  public static void main(String[] args) {
    System.out.println(generateSentence(
      "this is a sentence it is not a good one and it is also bad", 5, 2));
  }

  public static String generateSentence(String sentence, int length, int lookBack) {
    String[] words = sentence.split(" ");
    if (words.length == 0) {
      return sentence;
    }
    if (lookBack > length) {
      throw new IllegalArgumentException("lookBack exceeds length");
    }

    List<String> newWords = new ArrayList<>();

    generateLeading(newWords, words, lookBack);

    for (int ig = lookBack; ig < length; ig++) {
      List<String> prevs = newWords.subList(newWords.size() - lookBack, newWords.size());
      String chosenWord = randomNextWord(words, prevs);
      newWords.add(chosenWord);
    }

    return String.join(" ", newWords);
  }

  // Generate the first few words
  private static void generateLeading(List<String> newWords, String[] words, int lookBack) {
    int randIdx = new Random().nextInt(words.length);
    for (int i = 0; i < lookBack; i++) {
      newWords.add(safeGetWord(words, randIdx + i));
    }
  }

  private static String randomNextWord(String[] words, List<String> prevs) {
    int randomBeginIndex = new Random().nextInt(words.length);

    for (int _i = 0; _i < words.length; _i++) {
      int idx = randomBeginIndex + _i;
      // Try to match the phrase at this position
      int matchedCount = 0;

      for (int j = 0; j < prevs.size(); j++) {
        if (!safeGetWord(words, idx + j).equals(prevs.get(j))) {
          matchedCount = -1;
          break;
        }

        matchedCount++;
      }

      if (matchedCount == prevs.size()) {
        return safeGetWord(words, idx + prevs.size());
      }
    }

    return null;
  }

  private static String safeGetWord(String[] words, int index) {
    return words[index % words.length];
  }
}

Implementation code of the hash index method

class Solution2 {
  public static void main(String[] args) {
    System.out.println(generateSentence(
      "this is a sentence it is not a good one and it is also bad", 5, 2));
  }

  public static String generateSentence(String sentence, int length, int lookBack) {
    String[] words = sentence.split(" ");
    if (words.length == 0) {
      return sentence;
    }
    if (lookBack > length) {
      throw new IllegalArgumentException("lookBack exceeds length");
    }

    Table table = new Table(words, lookBack);

    List<String> newWords = new ArrayList<>();

    generateLeading(newWords, words, lookBack);

    for (int ig = lookBack; ig < length; ig++) {
      Phrase phrase = new Phrase(newWords.subList(newWords.size() - lookBack, newWords.size()));
      String chosenWord = table.randomNextWord(phrase);
      newWords.add(chosenWord);
    }

    return String.join(" ", newWords);
  }

  private static void generateLeading(List<String> newWords, String[] words, int lookBack) {
    int randIdx = new Random().nextInt(words.length);
    for (int i = 0; i < lookBack; i++) {
      newWords.add(safeGetWord(words, randIdx + i));
    }
  }

  private static String safeGetWord(String[] words, int index) {
    return words[index % words.length];
  }

  static class Phrase {
    private List<String> elements;

    Phrase(List<String> elements) {
      Objects.requireNonNull(elements);
      // TODO should copy the list to guarantee immutability
      this.elements = elements;
    }

    Phrase(int lookBack, String[] words, int beginIndex) {
      elements = new ArrayList<>(lookBack);
      for (int j = 0; j < lookBack; j++) {
        elements.add(safeGetWord(words, beginIndex + j));
      }
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) return true;
      if (!(o instanceof Phrase)) return false;
      Phrase phrase = (Phrase) o;
      return elements.equals(phrase.elements);
    }

    @Override
    public int hashCode() {
      return elements.hashCode();
    }
  }

  static class Table {
    private Map<Phrase, List<String>> map = new HashMap<>();

    Table(String[] words, int lookBack) {
      for (int i = 0; i < words.length; i++) {
        Phrase phrase = new Phrase(lookBack, words, i);

        List<String> nexts = map.computeIfAbsent(phrase, key -> new ArrayList<>());
        nexts.add(safeGetWord(words, i + lookBack));
      }
    }

    String randomNextWord(Phrase phrase) {
      List<String> nexts = map.get(phrase);
      int chosen = new Random().nextInt(nexts.size());
      return nexts.get(chosen);
    }
  }
}

This time, the hash index method introduces a Phrase structure as the HashMap key. Why not use List<String> directly as the key? There are three reasons:

  1. Readability is better.
  2. The key should be immutable. If the key stored in the HashMap changes between the write and the lookup, the data can no longer be found. Phrase is intended as an immutable wrapper around List<String>.
  3. Phrase's hashCode method can be tuned for performance later. The current hashCode delegates to List<String>; across the table build this costs O(mn), where m is lookBack. Since String caches its hashCode, the main cost is combining the cached hashCodes of the member strings, so performance is actually decent.
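Since the key should be immutable anyway, one possible tuning (sketched below with hypothetical names; this is not the article's Solution2 code) is to cache the hashCode the way java.lang.String does, and to make the defensive copy the TODO comment asks for:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch: an immutable Phrase that caches its hashCode,
// mirroring how java.lang.String caches its own hash.
class CachedPhrase {
  private final List<String> elements;
  private int hash; // 0 means "not computed yet", like String's cache

  CachedPhrase(List<String> elements) {
    // Defensive copy: external mutation cannot change this key afterwards.
    this.elements = new ArrayList<>(Objects.requireNonNull(elements));
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof CachedPhrase)) return false;
    return elements.equals(((CachedPhrase) o).elements);
  }

  @Override
  public int hashCode() {
    if (hash == 0) {
      hash = elements.hashCode(); // computed at most once per distinct value
    }
    return hash;
  }

  public static void main(String[] args) {
    List<String> words = new ArrayList<>(List.of("it", "is"));
    CachedPhrase a = new CachedPhrase(words);
    CachedPhrase b = new CachedPhrase(words);
    System.out.println(a.equals(b) && a.hashCode() == b.hashCode());
    words.set(0, "mutated"); // the defensive copy keeps a and b unchanged
    System.out.println(a.equals(b));
  }
}
```

The cached value is recomputed only in the rare case where the list's hash is exactly 0, the same trade-off String makes.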

How should we analyze the complexity? Let n = words.length, l = length, m = lookBack. Then:

  1. The brute-force search method: each of the l iterations scans n positions with an m-word match, so time complexity O(mnl), space complexity O(l).
  2. The optimized search method: time complexity O(mnl/k), space complexity O(l).
  3. The hash index method spends its time building an index table of n entries keyed by m-word phrases: time complexity O(mn + ml) (each lookup hashes an m-word phrase), space complexity O(mn + l).

Towards production readiness

Are the algorithms above production-ready? Let's examine them.

Correctness

The optimized search method is unfair under certain inputs, so it falls short of the functional requirements. The hash index method is efficient, easy to understand, and easy to implement correctly, so it should be suitable for a production environment.

Is the implementation correct? That requires functional testing. The algorithm is well suited to unit tests, but it involves random numbers: how can tests be reproduced stably, and more specifically, how can we make sure the boundary conditions are covered?
I won't include the test code; let me just address the key question of stable reproduction and boundary coverage.
The answer is simple: mock the random numbers. Extract the random number generation into a function, and have the unit test mock it to return a fixed value. The random property does not need to be tested here; the other properties do.
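As a sketch of that idea (the class and method names are hypothetical, not from the solutions above), the random-index source can be injected as a function, so a test can pin it to a fixed value or to a boundary value:

```java
import java.util.Random;
import java.util.function.IntUnaryOperator;

// Hypothetical sketch: the random-index source is injected, so a unit test can
// replace it with a fixed function and make the output deterministic.
class TestableGenerator {
  private final IntUnaryOperator randomIndex; // bound -> index in [0, bound)

  TestableGenerator(IntUnaryOperator randomIndex) {
    this.randomIndex = randomIndex;
  }

  // Production wiring: a real pseudo-random source.
  static TestableGenerator production() {
    Random random = new Random();
    return new TestableGenerator(random::nextInt);
  }

  String pickFirstWord(String sentence) {
    String[] words = sentence.split(" ");
    return words[randomIndex.applyAsInt(words.length)];
  }

  public static void main(String[] args) {
    // In a test, "random" always returns 0, so the first word is deterministic.
    TestableGenerator fixed = new TestableGenerator(bound -> 0);
    System.out.println(fixed.pickFirstWord("this is a sentence"));
    // A boundary case: always pick the last valid index.
    TestableGenerator last = new TestableGenerator(bound -> bound - 1);
    System.out.println(last.pickFirstWord("this is a sentence"));
  }
}
```

The same injection point serves both the deterministic tests and the production code.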

Randomness

Note that random number generation needs an entropy source to provide sufficient randomness. Take Java's Random as an example: it is a pseudo-random generator, seeded by default from the current time. Calling new Random() repeatedly mixes in some timing jitter, but the output is still pseudo-random rather than truly random. If stronger randomness is required, share a single SecureRandom object across the process: it draws entropy from the operating system, at a slightly lower speed.
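Since SecureRandom extends java.util.Random, sharing one instance is a drop-in change; a minimal sketch (the class and method names are mine):

```java
import java.security.SecureRandom;
import java.util.Random;

// Minimal sketch: SecureRandom extends Random, so a single shared instance can
// replace new Random() wherever the generator needs stronger randomness.
class RandomSource {
  private static final Random SHARED = new SecureRandom();

  static int nextIndex(int bound) {
    return SHARED.nextInt(bound); // uniform in [0, bound)
  }

  public static void main(String[] args) {
    for (int i = 0; i < 5; i++) {
      int idx = nextIndex(10);
      System.out.println(idx >= 0 && idx < 10); // always within the bound
    }
  }
}
```

SecureRandom is also thread-safe, so sharing one instance across the process is fine; under heavy contention a per-thread instance may perform better.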

Performance

The hash index method is in theory up to n times faster; how fast is it in practice?
Actual performance must be measured: as the saying goes, run a benchmark.
We run a small performance test and then a large one, using the brute-force search method as the baseline.

Small performance test (test code omitted):
Repeat the example sentence 8 times to get a sentence 8 times longer; the other parameters stay unchanged (length=5, lookBack=2). Warm the algorithm up 50,000 times, then execute it 200,000 times, averaging over multiple runs. The time spent (for the 200,000 executions) is as follows:

  1. Basic version:

    1. Brute force search method: 1358ms
    2. Optimized search method: 850ms
    3. Hash index method: (Do not reuse the hash table) 1212ms, (Reuse the hash table) 699ms
  2. Difficulty enhanced version:

    1. Brute force search method: 1186ms
    2. Optimized search method: 480ms
    3. Hash index method: (do not reuse the hash table) 1032ms, (reuse the hash table) 382ms

The results are surprising: the hash index method (without reusing the hash table) is not significantly faster than the baseline (brute-force search)! The optimized search method performs very well, almost twice as fast!
Perhaps the data set is too small for the hash table to pay off, so let's try more data!

Big performance test:
Increase the data volume: randomly rearrange the example sentence and repeat it 100 times to get a sentence 100 times longer, with length=100, lookBack=2. Warm up 1,000 times, then execute 5,000 times, averaging over multiple runs.
The test code is more involved this time, so here it is:

public static void main(String[] args) {
  String sentence = "this is a sentence it is not a good one and it is also bad";
  sentence = times(sentence, 100);

  // Warm-up
  for (int i = 0; i < 1000; i++) {
    generateSentence(
      sentence, 100, 2);
  }

  // Measure
  long start = System.currentTimeMillis();
  for (int i = 0; i < 5000; i++) {
    generateSentence(
      sentence, 100, 2);
  }
  long cost = System.currentTimeMillis() - start;
  System.out.println(cost);
}

// Build a long sentence
static String times(String input, int n) {
  List<String> buffer = new ArrayList<>();
  List<String> words = Arrays.asList(input.split(" "));
  for (int i = 0; i < n; i++) {
    Collections.shuffle(words);
    buffer.addAll(words);
  }
  return String.join(" ", buffer);
}

The time spent (for the 5,000 executions) is as follows:

  1. Basic version:

    1. Brute force search method: 4600ms
    2. Optimized search method: 450ms
    3. Hash index method: (do not reuse the hash table) 500ms, (reuse the hash table) 405ms
  2. Difficulty enhanced version:

    1. Brute force search method: 10000ms
    2. Optimized search method: 1000ms
    3. Hash index method: (do not reuse the hash table) 1000ms (reuse the hash table) 420ms

The hash index method finally pulls ahead!
Even at this data volume, the optimized search method is still no worse than the hash index method, and it has better space complexity!
It would be even better if this could be explained theoretically. How? Because the input is the example sentence repeated 100 times, the k value of the optimized search method is high, so of course it performs well. I then generated a large number of random words to lower k: the optimized search method slowed down by half, while the hash index method was unaffected.

Incidentally, I tested the hashCode optimization: if Phrase hashes only its first word, there is no performance gain at lookBack=2, and an 8% gain at lookBack=20.

Conclusion

The hash index method's efficiency, stability, and readability make it the first choice for a production environment. Several issues deserve attention: space complexity, the fact that the hash table can only be reused for identical input, randomness, immutable keys, and the optimizable hashCode.
The optimized search method is interesting enough to be worth further study.

Appendix: Non-linear search method

As mentioned above:

The random selection of the optimized search method can sometimes be very unfair. For example, with two consecutive identical words such as "go go", a left-to-right search will almost always select the left one. That can be fixed by randomly choosing the search direction each time (left-to-right or right-to-left). But what about three consecutive identical words, as in "go go go"? The middle word has a very small chance of being selected.

The key to solving this problem is to traverse the array in a non-linear manner, so that consecutive identical words can be bypassed.
The algorithm must meet the following requirements:

  1. When identical words appear consecutively, the middle or trailing occurrences must not be shadowed by the ones before them.
  2. Each search must still cover the entire array (i.e. the search space), so that no word becomes unfindable and the algorithm always terminates.
    Absolute fairness is not pursued here. In fact, commonly used random numbers are not perfectly fair either. What we need is for every word to have a reasonable probability of being selected.

A simple non-linear traversal is the "remainder hop traversal". Take remainder 2 as an example: randomly pick a starting point and a traversal direction on the array (left-to-right "→" or right-to-left "←"); for each position, take its distance from the starting point modulo 2 (the result is 0 or 1), splitting the array into an even group (remainder 0) and an odd group (remainder 1); traverse the even group in order, then the odd group in order, and the whole array is covered. This can also be called the "odd-even skipping traversal"; it handles up to 3 consecutive identical words. Using remainder 3 to split the array into 3 groups handles up to 5.
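The odd-even skipping traversal described above can be sketched as follows (a hypothetical helper, not the appendix code below); it visits every index exactly once while separating adjacent positions:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the odd-even ("remainder 2") skipping traversal: from
// a starting point, visit the even offsets first, then the odd offsets, so two
// adjacent duplicates cannot shadow each other while the whole array is still
// covered exactly once.
class SkipTraversal {
  static List<Integer> order(int length, int start) {
    List<Integer> order = new ArrayList<>(length);
    for (int parity = 0; parity < 2; parity++) {       // even offsets, then odd
      for (int offset = parity; offset < length; offset += 2) {
        order.add((start + offset) % length);          // circular, like the sentence
      }
    }
    return order;
  }

  public static void main(String[] args) {
    // Length 6, starting at index 2: offsets 0, 2, 4 first, then 1, 3, 5.
    System.out.println(order(6, 2)); // [2, 4, 0, 3, 5, 1]
  }
}
```

Adjacent indices are never visited consecutively except where the two parity passes meet, which is why two-in-a-row duplicates no longer shadow each other.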

Is there a better way, one that supports any number of consecutive identical words? Yes: the "logarithmic step traversal". Take base 2 as an example. Do you know how a binary tree can be stored in an array? We imitate that principle, but with a different formula: take the base-2 logarithm of the array length and keep only the integer part; randomly pick a starting point on the array (the direction may or may not be randomized); for each position, subtract the largest power of 2 that can be subtracted from it, and assign it to a group according to the remainder (for example, from 9 we can only subtract 8, which is 2 to the 3rd power, and 9-8=1 puts it in group 1; from 15 we can only subtract 8, and 15-8=7 puts it in group 7); traverse each group's members in order, and traversing all groups covers the whole array. For example, with array length 15 and starting point 1, the traversal order can be: [1, 2, 4, 8], [3, 5, 9], [6, 10], [7, 11], 12, 13, 14, 0. The tail numbers degenerate into a linear traversal; this is in fact the algorithm's worst case, which occurs when the array length is a power of 2 minus 1. Don't worry too much: because the starting point is random, the probability that consecutive identical words fall inside the linear tail is small (1/3 for n=15, about 1/√n for larger n, i.e. small enough to ignore), so most of the time the traversal is non-linear.

The code is provided below. Note that when the array is too small to hop, linear traversal should be used instead.

Main implementation code of the non-linear search method

private static String randomNextWord(String[] words, List<String> prevPhrase) {
  int foundIndex = nonlinearSearch(words, prevPhrase);
  if (foundIndex >= 0) {
    // The next word comes after the whole matched phrase, not just its first word.
    return safeGet(words, foundIndex + prevPhrase.size());
  } else {
    return null;
  }
}

private static <T> T safeGet(T[] array, int index) {
  return array[index % array.length];
}

private static int random(int bound) {
  return new Random().nextInt(bound);
}

private static <T> int nonlinearSearch(T[] array, List<T> phrase) {
  int randomBeginIndex = random(array.length);

  // The number of groups is the largest power of 2 not exceeding the array
  // length (e.g. 8 groups, numbered 0..7, for length 15), so that the batches
  // together cover every offset of the array.
  int maxBatch = 1 << floorLog2(array.length);

  if (array.length < 4) { // too small to hop, fall back to linear traversal
    return linearSearch(array, phrase);
  }

  int foundIndex = -1;
  for (int batch = 0; batch < maxBatch && foundIndex < 0; batch++) {
    foundIndex = nonlinearSearchBatch(array, phrase, randomBeginIndex, batch);
  }
  return foundIndex;
}

private static <T> int nonlinearSearchBatch(T[] array, List<T> phrase, int beginIndex, int batch) {
  for (int i = 1; i <= array.length; i *= 2) {
    if (batch < i) {
      int index = i + beginIndex + batch;
      if (match(array, index, phrase)) {
        return index;
      }
    }
  }

  return -1;
}

private static int floorLog2(int n) {
  int r = 0;
  while ((n >>= 1) > 0) {
    r++;
  }
  return r;
}

private static <T> int linearSearch(T[] array, List<T> phrase) {
  int randomBeginIndex = random(array.length);

  for (int _i = 0; _i < array.length; _i++) {
    int index = randomBeginIndex + _i;
    if (match(array, index, phrase)) {
      return index;
    }
  }

  return -1;
}

private static <T> boolean match(T[] array, int beginIndex, List<T> phrase) {
  int matchedCount = 0;

  for (int j = 0; j < phrase.size(); j++) {
    if (!safeGet(array, beginIndex + j).equals(phrase.get(j))) {
      return false;
    }
    matchedCount++;
  }

  return matchedCount == phrase.size();
}

The Three-Body Problem mentions that from a higher-dimensional world, any place in a lower-dimensional world can be reached without obstruction. The non-linear search method could likewise be called a higher-dimensional search method: the one-dimensional sequence is accessed as a two-dimensional tree so that no word is shadowed, and the randomly chosen starting point then supplies the randomness we need.


sorra