Big talk search

Searching generally refers to enumerating within a limited state space and, by exhausting all possibilities, finding the solutions (or the number of solutions) that meet the conditions. Depending on the search method, search algorithms can be divided into DFS, BFS, the A* algorithm, and so on. Here we only introduce DFS and BFS, plus a technique that often accompanies DFS: backtracking.

Search problems cover a very wide range and account for a high proportion of algorithm questions. I even mentioned in a public talk that they make up a large share of front-end algorithm interviews, especially at domestic companies.

There are many subtopics under search, and the well-known BFS and DFS are only the most basic content. Beyond them there are state recording and maintenance, pruning, connected components, topological sorting, and so on. I will introduce these to you one by one.

In addition, even if we only consider the two basic algorithms, DFS and BFS, there are many tricks to play with, such as bidirectional BFS, the pre-order, in-order and post-order variants of DFS, iterative deepening, and so on.

We already touched on search in the binary tree section. The search here is a further generalization: the data structure is no longer limited to the arrays, linked lists, or trees mentioned earlier, but extends to two-dimensional arrays, multi-way trees, graphs, and so on. The core stays the same; only the data structure changes.

What is the core of search?

In fact, the essence of a search problem is to map the states in the problem to nodes in a graph and the connections between states to edges in the graph. We construct the state space according to the problem description and then traverse it. During the traversal we need to record and maintain state, and improve efficiency through pruning and appropriate data structures.

Different state-space data structures lead to different algorithms. For example, searching an array is different from searching a tree or a graph.

Again, the arrays, trees, and graphs I am talking about here refer to the state space, not the data structure given by the problem. For example, the problem may give you an array and ask you to find its subsets. Although the problem gives a linear data structure (an array), we are in fact searching a non-linear structure like a tree, because the state space itself is non-linear.

For search problems, what information do we care about most, and how do we compute it? That is also the core concern of this article, and many materials on the market do not cover it in much detail. There are many core indicators to pay attention to in search, such as the depth of a tree, the DFS order of a graph, the distance between two points in a graph, and so on. These indicators are indispensable for advanced algorithms, and they can all be obtained through some classic algorithms. This is why I keep emphasizing that you must first learn the basic data structures and algorithms.

However, covering all of these topics completely is not easy and would take a lot of time to write up, so I have not yet done so.

In addition, other data structures can be regarded as special cases of graphs. Therefore, once you thoroughly understand graphs, it is easy to extend the basic ideas to other data structures, such as trees. So I intend to explain things around graphs and gradually specialize them to other, more specific data structures such as trees.

State space

Conclusion first: the state space is in fact a graph structure. Nodes in the graph represent states, and edges represent the connections between states; these connections are the various relations given in the problem.

The state space of a search problem is usually non-linear. Take the example mentioned above: find the subsets of an array. The state space here is all the combinations of the array's elements.

For this problem, a feasible way to divide the state space is:

  • A subset of length 1
  • A subset of length 2
  • ...
  • A subset of length n (where n is the length of the array)

So how do we determine all of the subsets above?

A feasible approach is to determine them one element at a time, in a manner similar to divide and conquer.

For example, we can:

  • First determine what the first number of a certain subset is
  • Then determine what the second number is
  • ...

How do we determine what the first number, the second number, and so on, are?

Brute-force enumerate all possibilities.

This is the core of search problems; everything else is auxiliary. Please remember this sentence.

The so-called brute-force enumeration of all possibilities here means trying every possible number in the array.

  • For example, what can the first number be? Obviously it may be any item in the array, so we enumerate n cases.
  • What about the second number? Obviously it can be any number other than the one already chosen, so we enumerate n - 1 cases.

Based on this, you can draw the following decision tree.

(The figure below shows part of the decision process for an array of length 3. The numbers in the tree nodes are indices. That is, there are three choices for the first number, and the choices for the second number depend on the previous choice, leaving two remaining options.)

Animated picture demonstration of the decision-making process:

搜索-决策树.svg

Some search algorithms are based on this simple idea; their essence is to simulate this decision tree. There are actually many interesting details, which we will explain later. For now you only need to understand what the solution space is and how to traverse it. I will keep deepening this concept later on.
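
To make the decision tree concrete, here is a minimal sketch (in Python) of enumerating all subsets of an array by brute-force trying every possible next choice. It differs slightly from the figure: instead of branching over every unused index, it only tries indices larger than the last one chosen, a common convention that keeps each subset from being generated twice. The function name subsets is mine, not from any particular problem.

def subsets(nums):
    res = []

    def dfs(start, path):
        res.append(path[:])                # every node of the decision tree is one subset
        for i in range(start, len(nums)):  # brute-force enumerate the next choice
            path.append(nums[i])
            dfs(i + 1, path)
            path.pop()                     # undo the choice before trying the next one

    dfs(0, [])
    return res

print(subsets([1, 2, 3]))  # 8 subsets, including the empty one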

For now just remember: the state space is a graph, and constructing the state space means constructing a graph. How do we build it? According to the problem description.

DFS and BFS

DFS and BFS are the core of search and run through the whole search chapter, so it is necessary to explain them first.

DFS

The concept of DFS comes from graph theory, but DFS in search differs somewhat from DFS in graph theory. DFS in search generally refers to brute-force enumeration via recursive functions.

If you do not want to use recursion, you can also implement it with an explicit stack. The essence is the same.

First, abstract the state space of the problem into a graph: states are nodes, and connections between states are edges. DFS then performs a depth-first traversal on this graph. BFS is similar, except that the traversal strategy changes to breadth first, spreading out layer by layer. So BFS and DFS are just two ways to traverse this state graph; how to construct the state graph is the key.

Essentially, traversing the graph above generates a search tree. In order to avoid cycles, we need to record the visited nodes. This is common to all search algorithms, so I won't repeat it later.

If you are traversing a tree, there are no cycles, so naturally there is no need to guard against them, because a tree is essentially a simple acyclic graph.

Algorithm flow

  1. First push the root node onto the stack.
  2. Take the top node from the stack and check whether it is the target. If the target is found, end the search and return the result. Otherwise, push one of its unvisited direct child nodes onto the stack.
  3. Repeat step 2.
  4. If the node has no unvisited direct child nodes, push its parent node onto the stack and repeat step 2.
  5. Repeat step 4.
  6. If the stack is empty, the entire graph has been checked, which means the target does not exist in the graph. End the search and return "target not found".

The stack here can be understood as a hand-written stack, or as the call stack. (A stack-based sketch of this flow follows below.)
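
As a reference, here is a minimal stack-based sketch of the flow above, assuming the graph is given as an adjacency list graph (a dict mapping each node to a list of neighbors) and that we look for target starting from root; all three names are placeholders of mine. For brevity it pushes all unvisited neighbors at once instead of one at a time, which has the same depth-first effect.

def dfs_with_stack(graph, root, target):
    stack = [root]
    visited = set()
    while stack:
        node = stack.pop()          # take the most recently pushed node
        if node == target:
            return True             # target found, end the search
        if node in visited:
            continue
        visited.add(node)
        for nxt in graph[node]:     # push unvisited neighbors; the newest one
            if nxt not in visited:  # is explored first, hence depth first
                stack.append(nxt)
    return False                    # stack empty: the target is not in the graph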

Algorithm template

Below we use recursion to complete DFS.

const visited = {}
function dfs(i) {
    if (a certain condition is met) {
        // return the result or exit the search space
    }

    visited[i] = true // mark the current state as visited
    for (each next state j reachable from i) {
        if (!visited[j]) { // if state j has not been visited yet
            dfs(j)
        }
    }
}

Common skills

Pre-order traversal and post-order traversal

The common forms of DFS are pre-order and post-order traversal, and their usage scenarios are completely different.

As described above, the essence of search is to traverse the state space, and the states can be abstracted as nodes in a graph. If, during the search, the result at the current node depends on other nodes (and in most cases there are such dependencies), then the traversal order matters.

For example, if the current node needs the computed information of its child nodes, we use post-order traversal and recurse bottom-up. If the current node needs information from its parent node, it is not hard to think of using pre-order traversal and recursing top-down.

For example, consider computing the depth of a tree, which will be discussed below. The recurrence for depth is $f(x) = f(y) + 1$, where $f(x)$ is the depth of node x and x is a child of y. The base case is that the depth of the root node is one. With this base case we can recursively compute the depth of any node in the tree. Clearly, a top-down pre-order traversal makes this simple and direct.

Another example is counting the size of subtrees, also discussed below. The recurrence is $f(x) = 1 + \sum_{i}{f(a_i)}$, where x is a node in the tree and the $a_i$ are its child nodes. The base case is a node with no children (a leaf), for which $f(x) = 1$. Therefore we can use a bottom-up post-order traversal to count the nodes of each subtree.
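
As a minimal sketch of both directions, assume a simple binary tree node with left and right fields (the same idea applies to multi-way trees); Node, assign_depth and subtree_size are illustrative names of my own.

class Node:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

# Pre-order / top-down: a node's depth comes from its parent, f(x) = f(y) + 1.
def assign_depth(node, depth, depths):
    if not node:
        return
    depths[node.val] = depth            # use the parent's information before recursing
    assign_depth(node.left, depth + 1, depths)
    assign_depth(node.right, depth + 1, depths)

# Post-order / bottom-up: a node's answer is combined from its children's answers.
def subtree_size(node):
    if not node:
        return 0
    return 1 + subtree_size(node.left) + subtree_size(node.right)

root = Node(1, Node(2), Node(3))
depths = {}
assign_depth(root, 1, depths)  # the root has depth 1, matching the base case above
print(depths)                  # {1: 1, 2: 2, 3: 2}
print(subtree_size(root))      # 3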

I described in detail how to analyze a recurrence to decide which traversal to use in the "Simulation, Enumeration and Recursion" subtopic of the basics chapter of "91 Tianxue Algorithm"; 91 students can view it directly. As for the various tree traversal methods, I gave a detailed introduction in the tree topic.

Iterative deepening

Iterative deepening is essentially a form of feasibility pruning. I will say more about pruning in the later "Backtracking and Pruning" section.

So-called iterative deepening means that, when the recursion tree may be deep, we set a threshold on the recursion depth and exit once it is exceeded. If the problem tells us that the answer does not exceed some bound xxx, we can use xxx as the depth threshold; this way we will not miss the correct solution, and in extreme cases we effectively cut off unnecessary work.

Specifically, we can record the level of the recursion tree top-down, the same way we compute the depth of a tree as described above, and check whether the current level exceeds the threshold before running the main logic.

Main code:

MAX_LEVEL = 20
def dfs(root, level):
    if level > MAX_LEVEL: return
    # main logic
dfs(root, 0)

This technique is not common in practice, but it can play an unexpected role in some cases.

Two-way search

Sometimes the problem size is large and a direct search will time out. In that case, you can search from the starting point up to half of the problem size and save the states generated along the way. The goal then becomes finding, among the stored intermediate states, one that satisfies the condition, which reduces the time complexity.

The statement above may not be easy to understand, so an example follows to help.

Problem address

https://leetcode-cn.com/problems/closest-subsequence-sum/

Problem description

You are given an integer array nums and a target value goal.

You need to choose a subsequence of nums such that the sum of its elements is as close to goal as possible. That is, if the subsequence sum is sum, you need to minimize the absolute difference abs(sum - goal).

Return the minimum possible value of abs(sum - goal).

Note that a subsequence of an array is an array formed by removing some (possibly all or none) of the original array's elements.

Example 1:

Input: nums = [5,-7,3,5], goal = 6
Output: 0
Explanation: Choose the whole array as the subsequence; its sum is 6.
The sum equals the target, so the absolute difference is 0.

Example 2:

Input: nums = [7,-9,15,-2], goal = -5
Output: 1
Explanation: Choose the subsequence [7,-9,-2], whose sum is -4.
The absolute difference is abs(-4 - (-5)) = abs(1) = 1, the minimum possible value.

Example 3:

Input: nums = [1,2,3], goal = -7
Output: 7

Constraints:

1 <= nums.length <= 40
-10^7 <= nums[i] <= 10^7
-10^9 <= goal <= 10^9

Ideas

Judging from the data range, this problem very likely calls for an $O(2^m)$ solution, where m is half of nums.length.

Why? If the array length in a problem is limited to 20 or less, the intended solution is very likely $O(2^n)$.

If you are not familiar with this, I suggest reading this article: https://lucifer.ren/blog/2020/12/21/shuati-silu3/ In addition, my LeetCode plugin leetcode-cheatsheet provides a time-complexity cheat sheet for reference.

Cutting 40 in half lets the solution pass. In fact, the number 40 is a strong signal.

Back to the topic. We can use the bits of an integer to represent a subset of the original array nums, so an array of length $2^n$ can describe all subsets of nums. This is state compression. When a problem's data range is <= 20, you should think of it.

Here, splitting 40 in half gives 20.

If you are not familiar with state compression, you can read my article "What is DP? This problem solution takes you to get started".

Next, we use dynamic programming to compute the sums of all subsets.

It is not difficult to work out the transition equation: dp[(1 << i) + j] = dp[j] + A[i], where 0 <= j < (1 << i), so j represents a subset of the first i elements; the binary representation of an index describes which elements of nums are chosen.

The subset-sum code is as follows:

def combine_sum(A):
    n = len(A)
    dp = [0] * (1 << n)
    for i in range(n):
        for j in range(1 << i):
            dp[(1 << i) + j] = dp[j] + A[i]
    return dp

Next, we split nums into two halves and compute their subset sums:

n = len(nums)
c1 = combine_sum(nums[: n // 2])
c2 = combine_sum(nums[n // 2 :])

Here c1 holds the subset sums of the first half of the array, and c2 those of the second half.

The problem now becomes: pick one number from each of the two arrays c1 and c2 so that their sum is closest to goal. This is a very classic two-pointer problem, with logic similar to Two Sum.

The difference is that Two Sum picks two numbers from one array, while here we pick one number from each of two arrays.

In fact, we only need one pointer at the head of one array and another pointer at the tail of the other.

The code is not difficult to write:

def combine_closest(c1, c2):
    # sort first so we can use two pointers
    c1.sort()
    c2.sort()
    ans = float("inf")
    i, j = 0, len(c2) - 1
    while i < len(c1) and j >= 0:
        _sum = c1[i] + c2[j]
        ans = min(ans, abs(_sum - goal))
        if _sum > goal:
            j -= 1
        elif _sum < goal:
            i += 1
        else:
            return 0
    return ans

If the code above is unclear, review the Two Sum problem.

Code

Code support: Python3

Python3 Code:

class Solution:
    def minAbsDifference(self, nums: List[int], goal: int) -> int:
        def combine_sum(A):
            n = len(A)
            dp = [0] * (1 << n)
            for i in range(n):
                for j in range(1 << i):
                    dp[(1 << i) + j] = dp[j] + A[i]
            return dp

        def combine_closest(c1, c2):
            c1.sort()
            c2.sort()
            ans = float("inf")
            i, j = 0, len(c2) - 1
            while i < len(c1) and j >= 0:
                _sum = c1[i] + c2[j]
                ans = min(ans, abs(_sum - goal))
                if _sum > goal:
                    j -= 1
                elif _sum < goal:
                    i += 1
                else:
                    return 0
            return ans

        n = len(nums)
        return combine_closest(combine_sum(nums[: n // 2]), combine_sum(nums[n // 2 :]))

Complexity analysis

Let n be the length of the array and m be $\frac{n}{2}$.

  • Time complexity: $O(m*2^m)$
  • Space complexity: $O(2^m)$


What does this question have to do with two-way search?

Back to what I said at the beginning: sometimes the problem size is very large and a direct search will time out. In that case, search from the starting point up to half of the problem size and save the states generated along the way. The goal then becomes finding, among the stored intermediate states, one that satisfies the condition, which reduces the time complexity.

For this problem, a direct brute-force search would enumerate the sums of all subsets and then find the one closest to goal. The idea is simple and direct, but it times out. So we search halfway and save the states (for this problem, saved in the dp arrays); the problem then turns into an operation over two dp arrays. Essentially, this algorithm moves a constant factor from the exponent position to the coefficient position. This is a common form of two-way search; I call it DFS two-way search to distinguish it from the BFS bidirectional search that comes later.

BFS

BFS is also an algorithm from graph theory. Unlike DFS, BFS searches level by level from the initial state toward the target state, and it usually uses a queue as its data structure.

Specifically, we keep taking a state from the head of the queue, push all new states generated by the decisions available from that state to the tail of the queue, and repeat the process until the queue is empty.

Note that there are two key points here:

  1. "The decisions corresponding to this state." This phrase refers to the edges of the graph in the state space. The edges are fixed regardless of whether we use DFS or BFS; in other words, the decisions are the same for both. What differs is the direction in which the decisions are explored.
  2. "All new states are pushed to the tail of the queue." As said above, BFS and DFS differ in the direction of exploration, and this action reflects it: because all the neighbors of the current point in the state space are appended to the tail of the queue, and the queue is first-in-first-out, the search does not go deeper until all neighbors of the current point have been visited. You can compare this with DFS.

The simplest BFS adds one step each time it expands a new state, approaching the answer step by step; this is equivalent to performing BFS on a graph whose edge weights are all 1. Due to the monotonicity and two-segment nature of the queue, the first time the target state is taken out of the queue, the number of steps is minimal. Because of this property, BFS is well suited to "minimum number of operations" problems.

Regarding monotonicity and the two-segment property, I will explain more later when comparing BFS and DFS.

In the DFS section above we mentioned that every search needs to record and maintain state, one piece of which is node visit status, used to prevent cycles. In BFS we often want the shortest distance to a point, and it is worth noting that we sometimes use a hash table dist to record the distance from the source to every other point in the graph. This dist can also serve to prevent cycles: the distance recorded the first time we reach a point is never greater than the distance of any later visit, so dist also tells us whether a point has been visited.
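
Here is a minimal sketch of that idea, assuming an unweighted graph stored as an adjacency list; dist records the minimum number of steps from source to every reachable node and, at the same time, serves as the visited check (both names are placeholders of mine).

from collections import deque

def bfs_dist(graph, source):
    dist = {source: 0}
    q = deque([source])
    while q:
        cur = q.popleft()
        for nxt in graph[cur]:
            if nxt not in dist:      # first visit, so this is the shortest distance
                dist[nxt] = dist[cur] + 1
                q.append(nxt)
    return dist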

Algorithm flow

  1. First put the root node into the queue.
  2. Take the first node from the queue and check whether it is the target.

    • If the target is found, end the search and return the result.
    • Otherwise, add all of its unvisited direct child nodes to the queue.
  3. If the queue is empty, the entire graph has been checked, which means the target does not exist in the graph. End the search and return "target not found".
  4. Repeat step 2.

Algorithm template

const visited = {}
function bfs() {
    let q = new Queue()
    q.push(initial state)
    while(q.length) {
        let i = q.pop()
        if (visited[i]) continue
        visited[i] = true // mark the current state as visited
        for (each state j reachable from i) {
            if (j is valid) {
                q.push(j)
            }
        }
    }
    // all valid solutions have been found
}

Common skills

Two-way search
Problem address (126. Word Ladder II)

https://leetcode-cn.com/problems/word-ladder-ii/

Problem description

A transformation sequence from word beginWord to word endWord using a dictionary wordList is a sequence of words beginWord -> s1 -> s2 -> ... -> sk such that:

Every pair of adjacent words differs by a single letter.
Every si (1 <= i <= k) is a word in wordList. Note that beginWord does not need to be in wordList.
sk == endWord

Given two words, beginWord and endWord, and a dictionary wordList, find and return all the shortest transformation sequences from beginWord to endWord, or an empty list if no such sequence exists. Each sequence should be returned as a list of words [beginWord, s1, s2, ..., sk].

Example 1:

Input: beginWord = "hit", endWord = "cog", wordList = ["hot","dot","dog","lot","log","cog"]
Output: [["hit","hot","dot","dog","cog"],["hit","hot","lot","log","cog"]]
Explanation: There are 2 shortest transformation sequences:
"hit" -> "hot" -> "dot" -> "dog" -> "cog"
"hit" -> "hot" -> "lot" -> "log" -> "cog"

Example 2:

Input: beginWord = "hit", endWord = "cog", wordList = ["hot","dot","dog","lot","log"]
Output: []
Explanation: endWord "cog" is not in wordList, so there is no valid transformation sequence.

Constraints:

1 <= beginWord.length <= 7
endWord.length == beginWord.length
1 <= wordList.length <= 5000
wordList[i].length == beginWord.length
beginWord, endWord, and wordList[i] consist of lowercase English letters.
beginWord != endWord
All the words in wordList are distinct.
Ideas

This problem is like the word-chain (idiom solitaire) game we play in daily life: starting from beginWord, chain words until you reach endWord. You are asked to find the shortest chains, and to return all of them if there is more than one.

Unlike idiom solitaire, where the next idiom must begin with what the previous one ends with, the rule here is that the next word differs from the previous word by exactly one letter.

We can abstract the problem as constructing a graph in which each node represents a word, and finding a shortest path from node beginWord to node endWord.

This is a through-and-through BFS-on-a-graph problem, easily solved by applying the template above. The only thing to pay attention to is how to build the graph, or more precisely, how to construct the edges.

According to the transformation rule in the problem (each pair of adjacent words differs by a single letter), it is not hard to see that two words that differ by only a single letter are connected by an edge.

Knowing this, we can construct the adjacency list.

Core code:

neighbors = collections.defaultdict(list)
for word in wordList:
    for i in range(len(word)):
        neighbors[word[:i] + "*" + word[i + 1 :]].append(word)

With the graph constructed, all that remains for BFS is to identify the start and end points. For this problem, the start point is beginWord and the end point is endWord.

Then we put beginWord into the queue and keep doing BFS on the graph until we encounter endWord for the first time.

Applying the above BFS template, it is not difficult to write the following code:

Here I used cost instead of visited, so that you can see a variety of ways to write it. The optimized solution below uses visited instead.

class Solution:
    def findLadders(self, beginWord: str, endWord: str, wordList: List[str]) -> List[List[str]]:
        cost = collections.defaultdict(lambda: float("inf"))
        cost[beginWord] = 0
        neighbors = collections.defaultdict(list)
        ans = []

        for word in wordList:
            for i in range(len(word)):
                neighbors[word[:i] + "*" + word[i + 1 :]].append(word)
        q = collections.deque([[beginWord]])

        while q:
            path = q.popleft()
            cur = path[-1]
            if cur == endWord:
                ans.append(path.copy())
            else:
                for i in range(len(cur)):
                    for neighbor in neighbors[cur[:i] + "*" + cur[i + 1 :]]:
                        if cost[cur] + 1 <= cost[neighbor]:
                            q.append(path + [neighbor])
                            cost[neighbor] = cost[cur] + 1
        return ans

When we can also search backwards from the end point, we can try bidirectional BFS. The more essential condition is: if the edges of the state space you construct are bidirectional, then you can use bidirectional BFS.

It is similar to the bidirectional DFS idea above. We use two queues to store the nodes expanded from the start and from the end respectively (I call them the start set and the end set). When the two meet at some moment, a path from the start to the end has been found, and its length is the sum of the path lengths on the two sides.

That is the general idea of two-way search. Shown with a figure:

As shown in the figure above, we search from the start point and the end point (A and Z) respectively. If the states expanded from the start overlap with the states expanded from the end (essentially, the elements in the two queues overlap), then we have found a shortest path from the start point to the end point.

Animation demo:

双向搜索.svg

Seeing this, it is necessary to pause and insert a few words.


Why is two-way search fast? Is it faster in every case? If so, why not always use two-way search? What are the conditions for using it?

We answer one by one.

  • Why is bidirectional search faster? From the figure above, we see that at the beginning there are usually few edges and little data in the queue. As the search progresses, the search tree gets larger and larger and the number of nodes in the queue grows. In many cases this growth is exponential, and, similar to the bidirectional search above, bidirectional search can move a constant factor from the exponent to the coefficient. Without bidirectional search, the search tree looks like this:

You can see that the search tree is so large that I cannot draw all the nodes and have to use "..." to stand in for the rest.

  • When is it faster? Compared with one-way search, two-way search is usually faster. Of course there are exceptions. As an extreme example, if there is only one path from the start point to the end point, the result is the same whether you search one way or both ways.

As shown in the figure, the result is the same whether one-way or two-way search is used.

  • Why not always use two-way search? In practice, when doing problems I suggest you use one-way search as much as possible, because it is simpler to write and most of the time it passes all the test cases. Only when you anticipate a timeout, or find one after submitting, should you try two-way search.
  • What are the conditions for using it? As mentioned earlier: "When we can also search backwards from the end point, we can try bidirectional BFS. The more essential condition is: if the edges of the state space you construct are bidirectional, then you can use bidirectional BFS."

Back to this problem. To judge whether the two searches meet, we can use two hash sets to store the start set and the end set respectively. When a node appears in both the start set and the end set, the two searches have met.

To save code and space, I did not use an actual queue here; instead, I used a hash table in place of the queue. The key to why this works is still the monotonicity and two-segment nature of the queue mentioned above.

Because a new round of expansion only starts after the previous round has been completely dequeued, the weights of the states in the queue are all the same. So it does not matter whether you traverse them from left to right, from right to left, or in any other order (for many problems it does not matter), which is why a hash table can replace the queue. Please pay attention to this point; I hope it gives you a deeper understanding of the essence of BFS.

So can we always drop the queue and just use a hash table (a hash set)? No! I will reveal why in the deque section below.

The specific algorithm for this problem:

  • Define two queues, q1 and q2, to search from the start and from the end respectively.
  • Construct the adjacency list.
  • Each time, expand from the smaller of q1 and q2; this acts as pruning.
  • When q1 and q2 meet, join the paths from the two sides together.

Code
  • Language support: Python3

Python3 Code:

class Solution:
    def findLadders(self, beginWord: str, endWord: str, wordList: list) -> list:
        # pruning 1
        if endWord not in wordList: return []
        ans = []
        visited = set()
        q1, q2 = {beginWord: [[beginWord]]}, {endWord: [[endWord]]}
        steps = 0
        # preprocessing: trade space for time
        neighbors = collections.defaultdict(list)
        for word in wordList:
            for i in range(len(word)):
                neighbors[word[:i] + "*" + word[i + 1 :]].append(word)
        while q1:
            # pruning 2
            if len(q1) > len(q2):
                q1, q2 = q2, q1
            nxt = collections.defaultdict(list)
            for _ in range(len(q1)):
                word, paths = q1.popitem()
                visited.add(word)
                for i in range(len(word)):
                    for neighbor in neighbors[word[:i] + "*" + word[i + 1 :]]:
                        if neighbor in q2:
                            # expanded from beginWord
                            if paths[0][0] == beginWord:
                                ans += [path1 + path2[::-1] for path1 in paths for path2 in q2[neighbor]]
                            # expanded from endWord
                            else:
                                ans += [path2 + path1[::-1] for path1 in paths for path2 in q2[neighbor]]
                        if neighbor in wordList and neighbor not in visited:
                            nxt[neighbor] += [path + [neighbor] for path in paths]
            steps += 1
            # pruning 3
            if ans and steps + 2 > len(ans[0]):
                break
            q1 = nxt
        return ans

To sum up

There are several knowledge points I want to convey through this problem:

  • The queue does not have to be a regular queue; it can also be a hash table and so on. However, in some cases it must be a double-ended queue, which I will talk about next.
  • Bidirectional BFS only applies when the search can also run backwards from the end point, that is, when the edges are bidirectional.
  • In bidirectional BFS, expanding from the side with fewer states acts as pruning.
  • Both visited and dist/cost can record node visits and prevent cycles. However, dist carries more information and correspondingly uses more space.
Deque

As mentioned above, BFS can essentially be seen as traversing a graph whose edge weights are all 1. We can make a simple extension: what if the edge weights are not all 1, but 0 and 1? In that case, we can use a deque.

A deque allows insertion and deletion at both the head and the tail, while an ordinary queue only allows deletion at the head and insertion at the tail.

With a deque, each time we take out a state and generate a new state: if the transition costs nothing (weight 0), we put the new state directly at the head of the deque; if the transition has a cost (weight 1), we put it at the tail. The correctness of the algorithm follows from the monotonicity and two-segment nature of the queue. This is also one of the reasons why the built-in structure provided by many languages is a deque rather than a plain queue. (A code sketch follows the figure below.)

As shown below:

The queue above is an ordinary queue. In the double-ended queue below, we can see that a B is inserted at the head of the queue.

Animation demo:

双端队列.svg
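
Putting this rule into code, here is a minimal 0-1 BFS sketch, assuming graph[u] yields (v, w) pairs with w equal to 0 or 1; the names graph and zero_one_bfs are mine. Weight-0 transitions go to the head of the deque and weight-1 transitions to the tail, so states come out in non-decreasing distance order.

from collections import deque

def zero_one_bfs(graph, source):
    dist = {source: 0}
    dq = deque([(0, source)])
    while dq:
        d, u = dq.popleft()
        if d > dist[u]:                  # stale entry: a shorter distance was found already
            continue
        for v, w in graph[u]:
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                if w == 0:
                    dq.appendleft((nd, v))  # free transition: handle it first
                else:
                    dq.append((nd, v))      # costly transition: handle it later
    return dist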

Thinking: what if the edge weights in the graph are not 0 and 1, but arbitrary positive integers?

Earlier we asked whether we could always drop the queue and just use a hash table (hash set). Here is the answer: we cannot, because a hash table cannot handle the case where some weights are 0.

Comparison of DFS and BFS

What kind of problems do BFS and DFS handle respectively? What is the difference between the two? These are all worthy of our serious study.

Simply put, both DFS and BFS search the state space corresponding to the problem.

Specifically, the difference between the two is:

  • At a fork, DFS picks one branch and goes deep, returns when it reaches the end, and then tries the next option after coming back to the fork. Based on this we can record data along the path, and many interesting things can be derived from it.

As shown in the figure below, when we traverse to A there are three options. We can pick any one, say B; the program then continues with branches 2, 3, ...

The following animation demonstrates a typical DFS process. In the following chapters, we will bring you more complicated DFS on the graph.

binary-tree-traversal-dfs

  • At a fork, BFS tries every branch before going deeper. When a queue is used to store the elements to be processed, the queue holds elements from at most two levels, and monotonicity holds: elements of the same level sit together. Many interesting optimizations are based on this property.

As shown in the figure below, breadth-first traversal visits all the options at the current level before entering the next level. As above, I have marked one possible order of execution.

As claimed above, the queue on the right always holds nodes from at most two levels, and nodes of the same level are always adjacent. In other words, the levels of the elements in the queue are monotonic.

The following animation demonstrates a typical BFS process. In the following chapters, we will bring you more complicated BFS on the graph.

binary-tree-traversal-bfs

To sum up

That is all of "Search (Part 1)". To summarize the problem-solving approach of this search article:

  • Construct the state space (a graph) from the problem description.
  • Traverse the graph (BFS or DFS).
  • Record and maintain state. (For example, visited maintains visit status; queues and stacks maintain the direction of decisions, and so on.)

We spent a lot of space describing BFS and DFS in detail, including a comparison of the two.

The core points to pay attention to are:

  • DFS usually has a recurrence relation, and the recurrence corresponds to the edges of the graph. Based on the recurrence you can choose pre-order or post-order traversal.
  • BFS is suitable for shortest-distance problems thanks to its monotonicity.
  • ...

The essence of two-way search is to move a constant factor of the complexity from a position with greater influence (the exponent) to a position with less influence (the coefficient).

The knowledge points in the search chapter are relatively dense; I hope you will review and summarize them often.

In the next section, we introduce:

  • Backtracking and pruning.
  • Commonly used indicators and statistical methods. Specifically:

    1. Tree depth and subtree size
    2. DFS sequence of the graph
    3. Topological order of graph
    4. Connectivity components of the graph

The content of the next section will be published first in "91 Tianxue Algorithm". Those who want to participate can check here for details: https://lucifer.ren/blog/2021/05/02/91algo-4/
