Do you really understand binary trees? (Basic Tree Structure)


Contact us: Youdao technical team assistant: ydtech01 / Email: ydtech@rd.netease.com

Foreword

Tree structures, especially binary trees, come up frequently in everyday development, but I had never built a systematic and comprehensive understanding of them, so I am taking this opportunity to sort things out.

This article is part of the series "Do you really understand binary trees?". It mainly introduces the basics of tree structures. After reading it, if you want to become more proficient with binary trees, you can read the companion article "Do you really understand binary trees? (Hand-written algorithms)" (to be released next week).

1. Tree structure foundation

In a linked list (here, a singly linked list), each node can point to only one next node. In a tree, each node can have several child nodes, so a tree structure can be expressed as follows:

interface TreeNode {
  data: any;         // payload stored in this node
  nodes: TreeNode[]; // child nodes; empty for a leaf
}

1.1 Degree of tree

PS: Graphs also have the concept of degree, divided into out-degree and in-degree. If a tree is regarded as a kind of graph, then strictly speaking the degree of a tree node is its out-degree. In tree structures, however, we usually use degree simply to describe how many child nodes the current node has.
That is, the degree of a node is the number of its children. Therefore, the maximum degree of a binary tree is 2, and the maximum degree of a linked list (which can be regarded as a tree in which every node has at most one child) is 1.
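As a minimal sketch of this definition, the degree of a node is just the length of its child list. The names `MultiNode`, `degree` and `maxDegree` are illustrative and not part of the original article:

```typescript
// Same shape as the TreeNode interface above, renamed for this sketch.
interface MultiNode {
  data: unknown;
  nodes: MultiNode[];
}

// Degree of a single node = number of child nodes it has.
function degree(node: MultiNode): number {
  return node.nodes.length;
}

// Largest degree found anywhere in the tree rooted at `node`.
function maxDegree(node: MultiNode): number {
  let best = degree(node);
  for (const child of node.nodes) {
    best = Math.max(best, maxDegree(child));
  }
  return best;
}

// A linked list viewed as a tree: every node has at most one child.
const chain: MultiNode = {
  data: 1,
  nodes: [{ data: 2, nodes: [{ data: 3, nodes: [] }] }],
};

// A small binary tree: the root has two children.
const binary: MultiNode = {
  data: 1,
  nodes: [
    { data: 2, nodes: [] },
    { data: 3, nodes: [] },
  ],
};

console.log(maxDegree(chain));  // 1
console.log(maxDegree(binary)); // 2
```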

Theorem: In a binary tree, there is exactly one more node with degree 0 than there are nodes with degree 2.
Proof: If a tree has n nodes, it must have n-1 edges; that is, the number of nodes = the number of edges + 1 (this conclusion applies to all tree structures, not just binary trees), as shown below:

[Figure 1: a tree with 7 nodes and 6 edges]

This tree has 7 nodes, and only 6 edges connecting them.

Next, let n0 be the number of nodes with degree 0, n1 the number of nodes with degree 1, and n2 the number of nodes with degree 2. Nodes with degree 0 contribute N0 = 0 edges, nodes with degree 1 contribute N1 = n1 * 1 edges, and nodes with degree 2 contribute N2 = n2 * 2 edges. The total number of nodes then satisfies:

# From the above, the edge counts are:
N0 = 0;
N1 = 1 * n1;
N2 = 2 * n2;
# From the above, the number of nodes = the number of edges + 1:
n0 + n1 + n2 = N0 + N1 + N2 + 1;
# Substitute N0, N1, N2 to get:
n0 + n1 + n2 = 0 + n1 + 2 * n2 + 1;
# Simplify to:
n0 = n2 + 1;
# That is, the number of nodes with degree 0 is always one more than the number of nodes with degree 2

With this, the theorem is proved. It may be easier to understand if we restate it:

In a binary tree, if you know the number of leaf nodes, the number of nodes with degree 2 is the number of leaf nodes minus 1. Conversely, if you know the number of nodes with degree 2, the number of leaf nodes is that number plus 1.
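The relationship n0 = n2 + 1 can be checked on concrete trees. The following is only a sketch: the `BinNode` shape and the helpers `makeNode` and `countByDegree` are assumptions made for illustration.

```typescript
interface BinNode {
  val: number;
  left: BinNode | null;
  right: BinNode | null;
}

// Hypothetical helper that keeps the tree literals short.
function makeNode(val: number, left: BinNode | null = null, right: BinNode | null = null): BinNode {
  return { val, left, right };
}

interface DegreeCounts {
  n0: number; // nodes with no children (leaves)
  n1: number; // nodes with exactly one child
  n2: number; // nodes with two children
}

// Walk the tree once and tally nodes by degree.
function countByDegree(
  root: BinNode | null,
  counts: DegreeCounts = { n0: 0, n1: 0, n2: 0 }
): DegreeCounts {
  if (root === null) return counts;
  const d = (root.left ? 1 : 0) + (root.right ? 1 : 0);
  if (d === 0) counts.n0 += 1;
  else if (d === 1) counts.n1 += 1;
  else counts.n2 += 1;
  countByDegree(root.left, counts);
  countByDegree(root.right, counts);
  return counts;
}

// 1 has children 5 and 2; 2 has children 3 and 4.
const bushy = countByDegree(makeNode(1, makeNode(5), makeNode(2, makeNode(3), makeNode(4))));
// A chain 1 -> 2 -> 3 that uses only left children.
const skinny = countByDegree(makeNode(1, makeNode(2, makeNode(3))));

console.log(bushy.n0, bushy.n2);   // 3 2
console.log(skinny.n0, skinny.n2); // 1 0
```

In both cases the number of leaves is one more than the number of degree-2 nodes, regardless of how many degree-1 nodes there are.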

1.2 Tree traversal

[Image: table of tree traversal methods]

1.3 The idea of tree traversal

A tree is naturally suited to recursive traversal, because processing the left subtree and the right subtree is itself a recursive traversal.

Pre-order traversal: "root node" "output of recursively traversing the left subtree" "output of recursively traversing the right subtree"
In-order traversal: "output of recursively traversing the left subtree" "root node" "output of recursively traversing the right subtree"
Post-order traversal: "output of recursively traversing the left subtree" "output of recursively traversing the right subtree" "root node"
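The three orders differ only in where the root is emitted relative to the two recursive calls. A minimal sketch, assuming a `BinNode` shape with `val`, `left` and `right` fields (these names are chosen for illustration):

```typescript
interface BinNode {
  val: number;
  left: BinNode | null;
  right: BinNode | null;
}

// root, then left subtree, then right subtree
function preOrder(root: BinNode | null): number[] {
  if (root === null) return [];
  return [root.val, ...preOrder(root.left), ...preOrder(root.right)];
}

// left subtree, then root, then right subtree
function inOrder(root: BinNode | null): number[] {
  if (root === null) return [];
  return [...inOrder(root.left), root.val, ...inOrder(root.right)];
}

// left subtree, then right subtree, then root
function postOrder(root: BinNode | null): number[] {
  if (root === null) return [];
  return [...postOrder(root.left), ...postOrder(root.right), root.val];
}

// 1 has children 5 and 2; 2 has children 3 and 4.
const tree: BinNode = {
  val: 1,
  left: { val: 5, left: null, right: null },
  right: {
    val: 2,
    left: { val: 3, left: null, right: null },
    right: { val: 4, left: null, right: null },
  },
};

console.log(preOrder(tree).join(" "));  // 1 5 2 3 4
console.log(inOrder(tree).join(" "));   // 5 1 3 2 4
console.log(postOrder(tree).join(" ")); // 5 3 4 2 1
```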

1.4 Divergent thinking

At this point, some readers may feel a sense of familiarity, as if they have seen some of this material about trees before. In fact, we touched on this topic earlier when we studied the stack data structure.

We know that a stack is naturally suited to expression evaluation. So what is the logical structure behind evaluating an expression?

Take the expression 3*(4+5). Although we use a stack when evaluating it, at the logical level we are actually simulating operations on a tree. Don't believe it? Let's take a look:

        [ * ]
[ 3 ]             [ + ]

           [ 4 ]        [ 5 ]

Above, we turned the expression into a tree structure. When we encounter () in the expression, the sub-expression inside must be processed first, so we treat it as a subtree of the binary tree.

We know that a tree is traversed recursively, which solves the problem layer by layer from the bottom up. During the recursive calls here, the subproblem in the right subtree (4+5) is solved first, and its result is then combined with the result of the left subtree to produce the final result.
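To make the idea concrete, here is a small sketch that evaluates an expression tree recursively. The `Expr` type and the `evaluate` function are assumptions made for illustration, not code from the original article. Both subtrees are evaluated before the operator at the root is applied, which is the bottom-up order described above.

```typescript
// A leaf holds a number; an inner node holds an operator and two operands.
type Expr =
  | { kind: "num"; value: number }
  | { kind: "op"; op: "+" | "-" | "*" | "/"; left: Expr; right: Expr };

function evaluate(e: Expr): number {
  // Boundary condition: a leaf evaluates to its own value.
  if (e.kind === "num") return e.value;
  // Solve the two sub-expressions first, then combine them at this node.
  const l = evaluate(e.left);
  const r = evaluate(e.right);
  switch (e.op) {
    case "+": return l + r;
    case "-": return l - r;
    case "*": return l * r;
    default:  return l / r; // the only remaining operator is "/"
  }
}

// 3 * (4 + 5): the parenthesised part becomes the right subtree.
const expr: Expr = {
  kind: "op",
  op: "*",
  left: { kind: "num", value: 3 },
  right: {
    kind: "op",
    op: "+",
    left: { kind: "num", value: 4 },
    right: { kind: "num", value: 5 },
  },
};

console.log(evaluate(expr)); // 27
```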

1.5 Restore binary tree

If we know the in-order traversal result together with either the pre-order or the post-order traversal result (and all node values are distinct), we can restore the whole binary tree. E.g.:

# Pre-order traversal output
1 5 2 3 4
# In-order traversal output
5 1 3 2 4

The above are the outputs of two traversal methods. We know that the first node in the pre-order output must be the root node, so we already know that the root of the original binary tree is 1. Next, we take this 1 and find its position in the in-order output.

Because in-order traversal outputs left, root, right, everything to the left of 1 is the in-order output of the original tree's left subtree, and everything to the right of 1 is the in-order output of its right subtree. We can therefore split the in-order output into the following pieces:

# Split the in-order traversal result
        5            1            3 2 4
# left subtree      root       right subtree
# From the above, the left subtree is already determined: it has only one node, 5. The right subtree is still a sequence, so we keep going.

# We now know the left-subtree and right-subtree sequences of the original tree, so let's split the pre-order traversal result as well
        1            5            2 3 4
#      root     left subtree   right subtree

# After splitting the pre-order result, we look at the right-subtree sequence. Its first element is the root of the right subtree, which is 2. Once the root is found, the rest is simple: repeating the steps above on the right-subtree part of the in-order result shows that the left and right subtrees of 2 are 3 and 4. At this point we have restored the binary tree.
    1
   / \
  5   2
     / \
    3   4

The tree above has only 5 nodes, which is quite simple. Next, let's try a slightly harder exercise:

Given the pre-order and in-order traversal results of a 10-node binary tree, restore the binary tree.
Preorder traversal output sequence: 1 2 4 9 5 6 10 3 7 8
In-order traversal output sequence: 4 9 2 10 6 5 1 3 8 7

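# From 2., 1 is the root, so the left-subtree sequence is 4 9 2 10 6 5 and the right-subtree sequence is 3 8 7
1. In-order:        4  9  2  10  6  5      1      3  8  7
# Assertion: 1 is the root
2. Pre-order:       1      2  4  9  5  6  10      3  7  8
# From 1.2, 2 is the root, so the left-subtree sequence is 4 9 and the right-subtree sequence is 10 6 5
1.1 In-order:       4  9      2      10  6  5
# Assertion: 2 is the root
1.2 Pre-order:      2      4  9      5  6  10
# From 1.2.1, 4 is the root, so 9 is the right subtree
1.1.1 In-order:     4  9
# Assertion: 4 is the root
1.2.1 Pre-order:    4  9
# From 1.2.2, 5 is the root, so the left-subtree sequence is 10 6
1.1.2 In-order:     10  6  5
# Assertion: 5 is the root
1.2.2 Pre-order:    5  6  10
# From 1.2.2.2, 6 is the root, so 10 is the left subtree
1.1.2.1 In-order:   10  6
# Assertion: 6 is the root
1.2.2.2 Pre-order:  6  10
# From 2.2, 3 is the root, so the right-subtree sequence is 8 7
2.1 In-order:       3  8  7
# Assertion: 3 is the root
2.2 Pre-order:      3  7  8
# From 2.2.2, 7 is the root, so 8 is the left subtree
2.1.1 In-order:     8  7
# Assertion: 7 is the root
2.2.2 Pre-order:    7  8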

# The final binary tree looks like this

              1
          /       \
       2             3
     /   \            \
   4       5           7
    \     /           /
     9   6           8
        /
      10
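The splitting procedure used in this section can be sketched as a short recursive function. This is an illustrative sketch, not the author's code: `build` and `postOrder` are hypothetical names, and the function assumes that all node values are distinct and that both sequences describe the same tree.

```typescript
interface BinNode {
  val: number;
  left: BinNode | null;
  right: BinNode | null;
}

// Rebuild a binary tree from its pre-order and in-order sequences.
function build(preorder: number[], inorder: number[]): BinNode | null {
  if (preorder.length === 0) return null;
  // The first pre-order element is always the root.
  const rootVal = preorder[0];
  // Its position in the in-order sequence is also the size of the left subtree.
  const k = inorder.indexOf(rootVal);
  if (k < 0) throw new Error("sequences do not describe the same tree");
  return {
    val: rootVal,
    left: build(preorder.slice(1, k + 1), inorder.slice(0, k)),
    right: build(preorder.slice(k + 1), inorder.slice(k + 1)),
  };
}

// Post-order output, used here only to inspect the rebuilt tree.
function postOrder(root: BinNode | null): number[] {
  if (root === null) return [];
  return [...postOrder(root.left), ...postOrder(root.right), root.val];
}

const small = build([1, 5, 2, 3, 4], [5, 1, 3, 2, 4]);
const large = build([1, 2, 4, 9, 5, 6, 10, 3, 7, 8], [4, 9, 2, 10, 6, 5, 1, 3, 8, 7]);

console.log(postOrder(small).join(" ")); // 5 3 4 2 1
console.log(postOrder(large).join(" ")); // 9 4 10 6 5 2 8 7 3 1
```

Slicing the arrays keeps the sketch close to the manual procedure; passing index ranges instead of copies would avoid the extra allocations on large inputs.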

1.6 Common classifications of binary trees

1.6.1 Complete Binary Tree

A binary tree in which only the right side of the last level may be missing nodes is called a complete binary tree. In other words, the left side of a complete binary tree is full, and only the right side of the last level may have empty positions.

                  1
            /           \
           2             3
          /   \         /
         4     5       6

A complete binary tree is an excellent data structure. It has the following two main characteristics, which help with both performance and ease of implementation.

Node numbers can be calculated

From the complete binary tree above, we can see a rule:

For a node numbered n, the root of its left subtree must be numbered 2n and the root of its right subtree must be numbered 2n+1. For example, for node 2 in the figure above, the root of its left subtree is numbered 4, which is 2*2=4, and the root of its right subtree is numbered 5, which is 2*2+1=5.
So what can we do with this rule?
Besides the data field that stores data, an ordinary binary tree needs extra storage for the pointers to the left and right subtrees, that is, the pointer fields. If we can compute the numbers of the left and right subtree roots of the current node directly from the rule above, we no longer need extra storage for the addresses of the left and right subtrees. When a tree is large enough, this saves a considerable amount of storage space.

Replacing stored addresses with calculation, as described above, leads to an idea we use often in daily work: recording versus computing.

Recording (saves time, costs space; no computation is needed and the value is read directly, that is, trading space for time): store the information and retrieve it when needed.
Computing (saves space, costs time; nothing is stored and the value is computed, that is, trading time for space): obtain the value by calculation. For example, the 2 in 1+1=2 is the result of evaluating the expression 1+1.

Each approach has its own advantages and disadvantages, and comparing them without a concrete problem is meaningless. We should look at the specific problem to see which approach brings greater benefit.

Scenario 1: When memory is limited and computation time is not critical, for example when running a program on a machine with little memory, we choose computing and trade time for space.
Scenario 2: When memory is plentiful and there is a requirement on computation speed, for example when running real-time computations on an enterprise application server, we choose recording and trade space for time. For enterprise applications, memory is generally large enough and can be expanded dynamically, so the benefit of saving time far outweighs the cost in space.

Can be stored in contiguous storage space

Besides the fact that node numbers (that is, node addresses) can be calculated, the node numbers of a complete binary tree are consecutive, increasing from top to bottom and from left to right. We can therefore store a complete binary tree in a contiguous storage area such as an array: the element at index 0 stores node 1, the element at index 1 stores node 2, and so on.

Thanks to this feature, when implementing a complete binary tree we do not need to define a separate structure with data fields and pointer fields, as we would for an ordinary binary tree. We can store the data directly in an array, which is also the most common representation of a complete binary tree.
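A minimal sketch of this array representation follows. Node number n (starting from 1) is stored at index n - 1, and the numbers of its children are computed rather than stored. The helper names `leftNo`, `rightNo`, `parentNo`, `valueAt` and `preOrder` are assumptions made for illustration.

```typescript
// The complete binary tree with nodes 1..6, stored level by level.
const complete: number[] = [1, 2, 3, 4, 5, 6];

// Node numbers of the children and parent of node n, computed instead of stored.
function leftNo(n: number): number { return 2 * n; }
function rightNo(n: number): number { return 2 * n + 1; }
function parentNo(n: number): number { return Math.floor(n / 2); }

// Value stored in node number n, or undefined if that node does not exist.
function valueAt(tree: number[], n: number): number | undefined {
  return n >= 1 && n <= tree.length ? tree[n - 1] : undefined;
}

// Pre-order traversal driven purely by index arithmetic, with no pointer fields.
function preOrder(tree: number[], n: number = 1, out: number[] = []): number[] {
  const v = valueAt(tree, n);
  if (v === undefined) return out;
  out.push(v);
  preOrder(tree, leftNo(n), out);
  preOrder(tree, rightNo(n), out);
  return out;
}

console.log(valueAt(complete, leftNo(2)));  // 4
console.log(valueAt(complete, rightNo(2))); // 5
console.log(valueAt(complete, rightNo(3))); // undefined, node 7 does not exist
console.log(preOrder(complete).join(" "));  // 1 2 4 5 3 6
```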

Picture it: in your program you use a one-dimensional linear structure, an array, but in your mind you should transform it into a two-dimensional tree structure to reason about the problem. Being able to "compile" the data structure we see into what it really looks like at runtime is a relatively advanced programming skill.

Of course, this ability is not acquired overnight; it takes a lot of practice. At least at the moment of writing this paragraph, the author has not reached that level.

1.6.2 Full Binary Tree

A binary tree that has no nodes with degree 1 is called a full binary tree; that is, every node has either no child nodes or exactly two child nodes.
PS: Many online articles and blogs give the definition of a perfect binary tree when describing a full binary tree, which is actually wrong. See below for the definition of a perfect binary tree.

                  1
            /            \
           2              3
         /   \           /  \
        4     5         6    7
             / \
            8   9

1.6.3 Perfect Binary Tree

Every non-leaf node has degree 2, and all leaf nodes are on the same level. The definition of a perfect binary tree is therefore different from that of a full binary tree: a perfect binary tree is a special kind of full binary tree.

                  1
            /           \
           2             3
         /   \          /  \
        4     5        6    7
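The three classifications can be expressed as small predicates. The sketch below follows the definitions used in this article (full: no node with degree 1; perfect: every level completely filled; complete: only the right side of the last level may be missing). The names `BinNode`, `makeNode`, `isFull`, `isPerfect` and `isComplete` are illustrative assumptions.

```typescript
interface BinNode {
  val: number;
  left: BinNode | null;
  right: BinNode | null;
}

// Hypothetical helper that keeps the tree literals short.
function makeNode(val: number, left: BinNode | null = null, right: BinNode | null = null): BinNode {
  return { val, left, right };
}

// Full: no node has exactly one child.
function isFull(root: BinNode | null): boolean {
  if (root === null) return true;
  if ((root.left === null) !== (root.right === null)) return false;
  return isFull(root.left) && isFull(root.right);
}

function height(root: BinNode | null): number {
  return root === null ? 0 : 1 + Math.max(height(root.left), height(root.right));
}

function size(root: BinNode | null): number {
  return root === null ? 0 : 1 + size(root.left) + size(root.right);
}

// Perfect: every level is completely filled, so size = 2^height - 1.
function isPerfect(root: BinNode | null): boolean {
  return size(root) === Math.pow(2, height(root)) - 1;
}

// Complete: in level order, no node may appear after the first empty position.
function isComplete(root: BinNode | null): boolean {
  const queue: Array<BinNode | null> = [root];
  let seenGap = false;
  while (queue.length > 0) {
    const cur = queue.shift();
    if (cur === null || cur === undefined) {
      seenGap = true;
      continue;
    }
    if (seenGap) return false;
    queue.push(cur.left, cur.right);
  }
  return true;
}

// The three example trees from sections 1.6.1, 1.6.2 and 1.6.3.
const completeTree = makeNode(1, makeNode(2, makeNode(4), makeNode(5)), makeNode(3, makeNode(6)));
const fullTree = makeNode(
  1,
  makeNode(2, makeNode(4), makeNode(5, makeNode(8), makeNode(9))),
  makeNode(3, makeNode(6), makeNode(7))
);
const perfectTree = makeNode(1, makeNode(2, makeNode(4), makeNode(5)), makeNode(3, makeNode(6), makeNode(7)));

console.log(isComplete(completeTree), isFull(completeTree), isPerfect(completeTree)); // true false false
console.log(isComplete(fullTree), isFull(fullTree), isPerfect(fullTree));             // false true false
console.log(isComplete(perfectTree), isFull(perfectTree), isPerfect(perfectTree));    // true true true
```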

2. In-depth understanding of tree structure

2.1 Node

A node of the tree represents a set, and its child nodes represent disjoint subsets of that set. This may be hard to grasp, so let's look at the following diagram:

      5
  /       \
 2         3
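# In the binary tree above, node 5 can be regarded as a complete set, and its two child nodes 2 and 3 are two disjoint subsets of it; the two subsets added together should equal the complete set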

From the above figure, we can draw a conclusion:

A node of the tree represents a set, and its child nodes represent disjoint subsets of that set. Adding all the subsets together gives the complete set.

2.2 Edge

Each edge of the tree represents a relationship.

3. The role of learning binary trees

3.1 Search operations applied to various scenarios

Because the binary tree has a naturally recursive structure and fits the idea of binary search perfectly, binary trees and their variants are extremely well suited to efficient search in all kinds of scenarios. Many low-level designs in computers are also built on binary trees and their variants, because they provide efficient and stable search performance.

3.2 Helps in understanding the foundations of advanced data structures

- Complete binary tree (maintains the maximum or minimum)
  - Heap
  - Priority queue
- Multi-way tree / Forest
  - A powerful tool for strings and related conversion problems
    - Trie (dictionary tree)
    - Aho-Corasick automaton
  - A powerful tool for connectivity problems
    - Union-find (disjoint set)
- Binary sort tree (binary search tree)
  - The underlying implementation of important data-retrieval containers in language standard libraries
    - AVL tree (balanced binary tree)
    - 2-3 tree (balanced search tree)
    - Red-black tree (balanced binary tree)
- File systems and the important data structures underlying databases
  - B-tree / B+ tree (balanced multi-way tree)

3.3 The best choice for practicing recursive skills

Learn the top-level thinking mode of recursion:

Design/understand a recursive program:

Mathematical Induction => Structural Induction

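If k0 is correct, and assuming ki is correct implies that k(i+1) is also correct, then the conclusion holds in every case. Take computing the Fibonacci sequence as an example:

function fib(n) {
  // First make sure k0 is correct, that is, the precondition (boundary condition) is right.
  // In this problem, k0 is: the result is 1 when n = 1, and 2 when n = 2.
  if (n <= 2) return n;
  return fib(n - 1) + fib(n - 2);
}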

Give the recursive function a clear meaning

In the code above, fib(n) represents the value of the nth term of the Fibonacci sequence.

Think about boundary conditions

In the code above, the boundary is the known condition: the result is 1 when n=1 and 2 when n=2. This boundary needs special handling.

Implement the recursive process

Once the boundary is handled, we can recurse downward.

If you were asked to design a program for pre-order traversal of a binary tree, how would you do it?

Function meaning: pre-order traverse the binary tree rooted at root;
Boundary condition: when root is empty, there is nothing to traverse, so return root directly;
Recursive process: pre-order traverse the left subtree and the right subtree respectively.

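// Function meaning: pre-order traverse the binary tree rooted at root
function pre_order(root) {
  // Boundary condition: when root is empty there is nothing to traverse, so return root directly
  if (!root) return root;
  console.log(root.val);
  // Recursive process: pre-order traverse the left subtree, then the right subtree
  pre_order(root.left);
  pre_order(root.right);
}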

3.4 Use the left-child right-sibling method to save space

Convert any non-binary tree into a binary tree, for example a ternary tree into a binary tree:

# Note: always keep child nodes on the left of the binary tree and sibling nodes on the right

# Original ternary tree

          1
        / | \
       2  3  4
         / \
        5   6
                    
# Converted into a binary tree with the left-child right-sibling method

         1
        /
       2
        \
         3
        /  \
       5    4
        \
         6
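# 2 is a child of 1, so it goes in the left subtree of 1; 3 is a sibling of 2, so it goes in the right subtree of 2; 4 is a sibling of 3, so it goes in the right subtree of 3; 5 is a child of 3, so it goes in the left subtree of 3; 6 is a sibling of 5, so it goes in the right subtree of 5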
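The conversion shown above can be sketched as follows. The types `MultiNode` and `LcrsNode` and the functions `toLcrs` and `preOrder` are hypothetical names chosen for this sketch: `left` always points to the first child and `right` to the next sibling.

```typescript
// General tree node with any number of children.
interface MultiNode {
  val: number;
  children: MultiNode[];
}

// Binary node in left-child right-sibling form.
interface LcrsNode {
  val: number;
  left: LcrsNode | null;  // first child
  right: LcrsNode | null; // next sibling
}

function toLcrs(root: MultiNode): LcrsNode {
  const result: LcrsNode = { val: root.val, left: null, right: null };
  let prev: LcrsNode | null = null;
  for (const child of root.children) {
    const converted = toLcrs(child);
    if (prev === null) {
      result.left = converted; // the first child goes to the left
    } else {
      prev.right = converted;  // later children are chained to the right
    }
    prev = converted;
  }
  return result;
}

// Pre-order output of the converted binary tree, for inspection.
function preOrder(root: LcrsNode | null): number[] {
  if (root === null) return [];
  return [root.val, ...preOrder(root.left), ...preOrder(root.right)];
}

// The ternary tree above: 1 has children 2, 3, 4; 3 has children 5, 6.
const ternary: MultiNode = {
  val: 1,
  children: [
    { val: 2, children: [] },
    { val: 3, children: [{ val: 5, children: [] }, { val: 6, children: [] }] },
    { val: 4, children: [] },
  ],
};

const converted = toLcrs(ternary);
console.log(preOrder(converted).join(" ")); // 1 2 3 5 6 4
console.log(converted.right);               // null, the root has no sibling
```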

Notice that when a tree is converted into a binary tree with the left-child right-sibling method, the right subtree of the root is always empty. Can we make use of this right subtree to merge several trees into one binary tree? In the following example, two trees are merged into a single binary tree that represents a forest.

# What if we want to merge the two trees below into one binary tree?
          1                 7
        / | \             /    \
       2  3  4           8      9
         / \
        5   6


           1
     /           \
    2             7
     \           /
      3         8
    /   \        \
   5     4        9
    \
     6
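# In this way we have merged the two trees into one, which is a forest. It looks like a binary tree, but it actually represents a forest made up of two trees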
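One way to sketch the forest merge is to treat the roots of the trees as siblings of each other, so the second root hangs off the `right` pointer of the first. The names `MultiNode`, `LcrsNode`, `siblingsToLcrs` and `preOrder` are assumptions for illustration.

```typescript
interface MultiNode {
  val: number;
  children: MultiNode[];
}

interface LcrsNode {
  val: number;
  left: LcrsNode | null;  // first child
  right: LcrsNode | null; // next sibling (or next tree of the forest at the top level)
}

// Convert a list of sibling trees into one left-child right-sibling chain.
function siblingsToLcrs(trees: MultiNode[]): LcrsNode | null {
  let head: LcrsNode | null = null;
  // Build from the last sibling backwards so each node can point at the next one.
  for (let i = trees.length - 1; i >= 0; i--) {
    const t = trees[i];
    head = { val: t.val, left: siblingsToLcrs(t.children), right: head };
  }
  return head;
}

function preOrder(root: LcrsNode | null): number[] {
  if (root === null) return [];
  return [root.val, ...preOrder(root.left), ...preOrder(root.right)];
}

// The two trees from the example above.
const first: MultiNode = {
  val: 1,
  children: [
    { val: 2, children: [] },
    { val: 3, children: [{ val: 5, children: [] }, { val: 6, children: [] }] },
    { val: 4, children: [] },
  ],
};
const second: MultiNode = {
  val: 7,
  children: [{ val: 8, children: [] }, { val: 9, children: [] }],
};

// A forest is simply a list of top-level sibling trees.
const merged = siblingsToLcrs([first, second]);
console.log(preOrder(merged).join(" ")); // 1 2 3 5 6 4 7 8 9
```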

The Monte Carlo tree search framework used in the well-known AlphaGo, known as the upper confidence bound tree algorithm (UCT), uses a search tree implemented with the left-child right-sibling method to represent board positions. Normally, board positions would be stored in a tree structure, but because there are so many possible positions, the tree may have more than 100 branches per node. To avoid this, the tree with 100-plus branches is converted into a binary tree using the left-child right-sibling representation. Interested readers can take a look at pachi.

So why does this save space? Every node of a ternary tree has three pointer fields for its subtrees, and this space must be reserved whether or not the subtrees exist. The ternary tree above has 6 nodes and therefore 18 pointer fields in total, of which only 5 are valid (a valid pointer field is one that is not empty; their number equals the number of edges = the number of nodes - 1), so 18-5=13 pointer fields are empty. If the ternary tree is converted into a binary tree with the left-child right-sibling method, there are 12 pointer fields in total, still 5 valid ones, and only 12-5=7 empty ones. Compared with the previous 13, this saves a lot of space.

A k-ary tree with n nodes has k * n pointer fields, but only n-1 edges actually exist, so k * n - (n-1) = (k-1) * n + 1 pointer fields are wasted. This means the more branches we allow, the more space we waste. That is why we convert a k-ary tree into a binary tree: a binary tree wastes only n+1 pointer fields, which depends only on the number of nodes that actually store data.
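The arithmetic can be checked with a one-line function. `emptyPointerFields` is a hypothetical name used only for this sketch:

```typescript
// Empty pointer fields in a k-ary tree with n nodes:
// k * n fields are allocated, and n - 1 of them hold an edge.
function emptyPointerFields(k: number, n: number): number {
  return k * n - (n - 1);
}

console.log(emptyPointerFields(3, 6)); // 13, the ternary tree with 6 nodes
console.log(emptyPointerFields(2, 6)); // 7, the same nodes stored as a binary tree

// The closed form (k - 1) * n + 1 gives the same numbers.
console.log((3 - 1) * 6 + 1); // 13
console.log((2 - 1) * 6 + 1); // 7
```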

4. Conclusion

At this point we have covered most of the basics of binary trees. To keep the article at a reasonable length and accessible to readers with different backgrounds, we will not go into deeper discussion here. Originally I wanted to work through some binary tree algorithm problems with everyone to consolidate this knowledge, but that would make the article long and tedious, so I have split it into two articles.