Knowing the characteristics of different data structures lets you choose the right one for each problem and get twice the result with half the effort.

This article introduces the characteristics of various data structures in detail; I hope it helps you understand them well.

Intensive Reading

Array

<img width=200 src="https://img.alicdn.com/imgextra/i2/O1CN01noho9m1Vltg5ISaq2_!!6000000002694-2-tps-418-110.png">

Arrays are very commonly used. An array occupies a contiguous block of memory, so any element can be accessed directly by its subscript (index) in O(1) time.

However, insertion and deletion in an array are inefficient: O(n). To keep the array contiguous, the elements after the affected position must be shifted. Inserting the Kth element requires moving every following element back one slot, and deleting the Kth element requires moving every following element forward one slot.
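A minimal TypeScript sketch of why this costs O(n): the hypothetical helpers `insertAt` and `removeAt` below shift the trailing elements by hand to keep the array contiguous. In real code, `Array.prototype.splice` does the same work for you.

```typescript
// Insert `value` at index k by shifting elements k..end one slot to the right.
// Worst case (k = 0) moves all n elements: O(n).
function insertAt(arr: number[], k: number, value: number): void {
  for (let i = arr.length; i > k; i--) {
    arr[i] = arr[i - 1];
  }
  arr[k] = value;
}

// Remove the element at index k by shifting elements k+1..end one slot left.
// Worst case (k = 0) moves n - 1 elements: O(n).
function removeAt(arr: number[], k: number): number {
  const removed = arr[k];
  for (let i = k; i < arr.length - 1; i++) {
    arr[i] = arr[i + 1];
  }
  arr.length--; // drop the now-duplicated last slot
  return removed;
}

const nums = [10, 20, 30, 40];
console.log(nums[2]); // access by index is O(1): prints 30
insertAt(nums, 1, 15); // [10, 15, 20, 30, 40]
removeAt(nums, 3); // removes 30, leaving [10, 15, 20, 40]
```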

Linked list

<img width=280 src="https://img.alicdn.com/imgextra/i3/O1CN010JfUOo1b0A5muE4sE_!!6000000003402-2-tps-584-112.png">

The linked list was invented to address the array's weakness: it improves the efficiency of insertion and deletion at the cost of search efficiency.

Insertion and deletion in a linked list are O(1): once you hold the node at the target position, you only need to break the link there and reconnect it, without touching any other node.

The trade-off is slow search. Because the nodes are not stored contiguously, you cannot jump to an element by subscript as with an array; you have to follow the pointers one by one, so search is O(n).

By the way, a linked list can be extended into a doubly linked list (.prev, .next), a binary tree (.left, .right), or a multi-way tree (.next holding N child nodes).
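The sketch below, in TypeScript and assuming a singly linked list with hypothetical names `ListNode`, `insertAfter`, `removeAfter`, and `findNode`, shows the contrast: insertion and deletion only rewire one `next` pointer, while search has to walk the list node by node.

```typescript
// A minimal singly linked list node.
class ListNode<T> {
  constructor(public value: T, public next: ListNode<T> | null = null) {}
}

// O(1): splice a new node in after `node`; no other nodes are touched.
function insertAfter<T>(node: ListNode<T>, value: T): ListNode<T> {
  const created = new ListNode(value, node.next);
  node.next = created;
  return created;
}

// O(1): unlink the node that follows `node`, if there is one.
function removeAfter<T>(node: ListNode<T>): void {
  if (node.next !== null) {
    node.next = node.next.next;
  }
}

// O(n): there is no index arithmetic, so we must follow `next` pointers.
function findNode<T>(head: ListNode<T> | null, value: T): ListNode<T> | null {
  for (let cur = head; cur !== null; cur = cur.next) {
    if (cur.value === value) return cur;
  }
  return null;
}

// Helper to inspect the list contents in order.
function toArray<T>(head: ListNode<T> | null): T[] {
  const out: T[] = [];
  for (let cur = head; cur !== null; cur = cur.next) out.push(cur.value);
  return out;
}

const head = new ListNode(1);
const two = insertAfter(head, 2);
insertAfter(two, 4);
insertAfter(two, 3); // 1 -> 2 -> 3 -> 4
removeAfter(head); // 1 -> 3 -> 4
console.log(toArray(head), findNode(head, 4) !== null);
```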

<img width=160 src="https://img.alicdn.com/imgextra/i2/O1CN01IqNVQI1m5ABrZXV4i_!!6000000004902-2-tps-318-316.png">

Stack

<img width=240 src="https://img.alicdn.com/imgextra/i3/O1CN01xSS8e21xbe3LP1khU_!!6000000006462-2-tps-466-122.png">

A stack is a first-in, last-out structure, and it can be simulated with an array.

const stack: number[] = []

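// push: add an element to the top of the stack
stack.push(1)
// pop: remove and return the top element (the most recently pushed one)
stack.pop()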

Heap

<img width=500 src="https://img.alicdn.com/imgextra/i2/O1CN019O42yy1qE6NlA8w6V_!!6000000005463-2-tps-1154-952.png">

A heap is a special kind of complete binary tree, and comes in two kinds: the max-heap (big top heap) and the min-heap (small top heap).

In a max-heap the root node holds the largest number, and in a min-heap the root node holds the smallest. For convenience, the following uses the max-heap as the example; for a min-heap, simply reverse the comparisons.

In a max-heap, every node is greater than or equal to its child nodes, so the root node is the largest. The advantage of this structure is that the maximum (for a min-heap, the minimum) can be found in O(1), because the root node is simply the first node stored in the array.

A quick note on mapping a binary tree onto an array: manipulating the tree through an array is efficient in both time and space. The first item (index 0) stores the total number of nodes; for the node at subscript K, its parent is at floor(K / 2) and its children are at K * 2 and K * 2 + 1, so parent and child positions can be located immediately.

With this layout, insertion and deletion take O(log n): after an insertion or deletion, order is restored by moving a node up or down the tree, and a complete binary tree with n nodes has a depth of about log n.
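Under the 1-indexed layout described above (index 0 holds the node count, node K has parent floor(K / 2) and children 2K and 2K + 1), a max-heap might be sketched as follows. `MaxHeap` and its method names are illustrative, not from any library; `push` sifts the new node up and `pop` sifts the moved node down, each touching at most one node per level, hence O(log n).

```typescript
// Max-heap stored in an array: data[0] is the node count,
// nodes live at indices 1..count.
class MaxHeap {
  private data: number[] = [0];

  get size(): number {
    return this.data[0];
  }

  // O(1): the root (index 1) is the maximum.
  peek(): number | undefined {
    return this.size > 0 ? this.data[1] : undefined;
  }

  // O(log n): append at the end, then swap upward while the parent is smaller.
  push(value: number): void {
    this.data[0]++;
    let k = this.data[0];
    this.data[k] = value;
    while (k > 1 && this.data[Math.floor(k / 2)] < this.data[k]) {
      this.swap(k, Math.floor(k / 2));
      k = Math.floor(k / 2);
    }
  }

  // O(log n): move the last node to the root, then swap it downward
  // with its larger child until both children are smaller.
  pop(): number | undefined {
    const n = this.size;
    if (n === 0) return undefined;
    const top = this.data[1];
    this.data[1] = this.data[n];
    this.data.length = n; // drop index n
    this.data[0] = n - 1;
    const size = n - 1;
    let k = 1;
    while (2 * k <= size) {
      let child = 2 * k;
      if (child + 1 <= size && this.data[child + 1] > this.data[child]) child++;
      if (this.data[k] >= this.data[child]) break;
      this.swap(k, child);
      k = child;
    }
    return top;
  }

  private swap(i: number, j: number): void {
    [this.data[i], this.data[j]] = [this.data[j], this.data[i]];
  }
}

const heap = new MaxHeap();
for (const x of [3, 1, 4, 1, 5, 9, 2, 6]) heap.push(x);
console.log(heap.peek()); // 9
```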

Hash table

<img width=300 src="https://img.alicdn.com/imgextra/i4/O1CN01u3J1JF1Sl25HB6Q0I_!!6000000002286-2-tps-740-598.png">

A hash table is what usually backs a Map, and different Maps are implemented in different ways; common ones are HashMap, TreeMap, HashSet, and TreeSet.

Map and Set are implemented similarly, so Map is used as the example here.

First, compute a hash from the key to be stored (for example, from the ASCII codes of its characters), then map it to an array subscript, e.g. by taking the remainder. Different keys can map to the same subscript, so each subscript may hold a linked list that is searched further. This approach is called separate chaining (the "zipper method").

Once a chain grows beyond a certain length, searching its linked list becomes slow, so some implementations upgrade that bucket to a red-black tree. In short, insertion, deletion, and lookup in a hash table are O(1) on average, but the drawback is that its contents are unordered.

To keep the contents ordered, a tree structure can be used for storage instead. This data structure is called HashTree (Java's TreeMap works this way); its time complexity rises to O(log n), but in exchange the contents are kept in order.
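A toy sketch of separate chaining for string keys. The class name `ChainedHashMap`, the multiply-by-31 string hash, and the fixed bucket count are assumptions for illustration; each chain is a plain array rather than a linked list for brevity, and there is no resizing or red-black-tree upgrade.

```typescript
class ChainedHashMap<V> {
  private buckets: Array<Array<[string, V]>>;

  constructor(private capacity = 16) {
    this.buckets = Array.from({ length: capacity }, (): Array<[string, V]> => []);
  }

  // Hash the character codes, then take the remainder to pick a bucket.
  private indexOf(key: string): number {
    let h = 0;
    for (let i = 0; i < key.length; i++) {
      h = (h * 31 + key.charCodeAt(i)) >>> 0; // keep it a 32-bit unsigned int
    }
    return h % this.capacity;
  }

  set(key: string, value: V): void {
    const bucket = this.buckets[this.indexOf(key)];
    for (const entry of bucket) {
      if (entry[0] === key) {
        entry[1] = value; // key already present: overwrite in place
        return;
      }
    }
    bucket.push([key, value]); // new key (possibly colliding): append to the chain
  }

  get(key: string): V | undefined {
    const bucket = this.buckets[this.indexOf(key)];
    for (const [k, v] of bucket) {
      if (k === key) return v; // walk the chain to resolve collisions
    }
    return undefined;
  }

  delete(key: string): boolean {
    const bucket = this.buckets[this.indexOf(key)];
    const i = bucket.findIndex(([k]) => k === key);
    if (i === -1) return false;
    bucket.splice(i, 1);
    return true;
  }
}

const ages = new ChainedHashMap<number>(4); // tiny capacity to force collisions
ages.set("alice", 30);
ages.set("bob", 25);
ages.set("carol", 41);
ages.set("bob", 26);
console.log(ages.get("bob"), ages.get("dave")); // 26 undefined
```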

Tree & Binary Search Tree

<img width=380 src="https://img.alicdn.com/imgextra/i4/O1CN01vOCoG91w82pzSITaQ_!!6000000006262-2-tps-800-504.png">

A binary search tree is a special kind of binary tree. There are more complex variants such as red-black trees, but I won't go into them here; only the binary search tree is introduced.

A binary search tree satisfies, for every node: all nodes in the left subtree < the node < all nodes in the right subtree. Note that this applies to all nodes in each subtree, not just the direct children, so validating a tree must check every case recursively.

The advantage of a binary search tree is that access, search, insertion, and deletion take O(log n), because every operation can proceed in a binary-search manner, discarding one subtree at each step. In the worst case, however, they degrade to O(n): after many operations the tree may no longer be balanced, and it can eventually degenerate into a linked list with a linked list's time complexity.

Better alternatives include AVL trees and red-black trees. Java's TreeMap and the common implementations of C++'s std::map are red-black trees.
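Because the ordering rule covers whole subtrees, a checker has to pass bounds down the recursion rather than only compare a node with its children. A minimal sketch, assuming numeric keys with no duplicates and a hypothetical `BstNode` shape:

```typescript
interface BstNode {
  value: number;
  left: BstNode | null;
  right: BstNode | null;
}

// Every node must lie strictly between the bounds inherited from its ancestors.
function isValidBst(
  node: BstNode | null,
  low = -Infinity,
  high = Infinity,
): boolean {
  if (node === null) return true;
  if (node.value <= low || node.value >= high) return false;
  // Left subtree must stay below this node; right subtree must stay above it.
  return (
    isValidBst(node.left, low, node.value) &&
    isValidBst(node.right, node.value, high)
  );
}

const leaf = (value: number): BstNode => ({ value, left: null, right: null });

// Each parent/child pair in `bad` looks fine locally, but 12 sits in the
// left subtree of 10, which breaks the "all nodes" rule.
const good: BstNode = { value: 10, left: { value: 5, left: null, right: leaf(7) }, right: leaf(15) };
const bad: BstNode = { value: 10, left: { value: 5, left: null, right: leaf(12) }, right: leaf(15) };
console.log(isValidBst(good), isValidBst(bad)); // true false
```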
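To see the degeneration, this sketch (a hypothetical `BinarySearchTree` class with numeric keys, duplicates ignored, no rebalancing) inserts the same seven keys in two different orders and compares the tree heights. Each operation walks one root-to-leaf path, so its cost is proportional to the height.

```typescript
class TreeNode {
  left: TreeNode | null = null;
  right: TreeNode | null = null;
  constructor(public value: number) {}
}

class BinarySearchTree {
  root: TreeNode | null = null;

  // Walk down one path and attach the new node at the first empty slot.
  insert(value: number): void {
    const node = new TreeNode(value);
    if (this.root === null) {
      this.root = node;
      return;
    }
    let cur: TreeNode = this.root;
    while (true) {
      if (value === cur.value) return; // ignore duplicates
      if (value < cur.value) {
        if (cur.left === null) { cur.left = node; return; }
        cur = cur.left;
      } else {
        if (cur.right === null) { cur.right = node; return; }
        cur = cur.right;
      }
    }
  }

  // Binary search down the tree: O(height).
  has(value: number): boolean {
    let cur = this.root;
    while (cur !== null) {
      if (value === cur.value) return true;
      cur = value < cur.value ? cur.left : cur.right;
    }
    return false;
  }

  // Number of levels on the longest root-to-leaf path.
  height(node: TreeNode | null = this.root): number {
    return node === null ? 0 : 1 + Math.max(this.height(node.left), this.height(node.right));
  }
}

const balanced = new BinarySearchTree();
[4, 2, 6, 1, 3, 5, 7].forEach((v) => balanced.insert(v));
const skewed = new BinarySearchTree();
[1, 2, 3, 4, 5, 6, 7].forEach((v) => skewed.insert(v)); // sorted input
console.log(balanced.height(), skewed.height()); // 3 7
```

With sorted input every new key becomes a right child, so the "tree" is really a linked list of height n.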

Dictionary tree

<img width=400 src="https://img.alicdn.com/imgextra/i2/O1CN01TqDeaL1ll0lTX75y3_!!6000000004858-2-tps-872-510.png">

A dictionary tree (trie) is mostly used for word search: given just the beginning of a word, you can quickly find the recommended words that follow it.

For example, in the figure above, typing "o" quickly finds the two words "ok" and "ol". Note that each node needs an isEndOfWord attribute indicating whether the path so far forms a complete word. For example, "go" and "good" are both complete words but "goo" is not, so the o of "go" (the second letter) and the d of "good" (the fourth letter) carry the isEndOfWord flag, meaning a complete word can be read at that point. The flag can be omitted on leaf nodes, since a leaf always ends a word.
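A minimal trie sketch with the `isEndOfWord` flag described above. The class names and the `withPrefix` method are hypothetical; children are kept in a `Map` keyed by character, and matching words are collected with a depth-first walk from the node where the prefix ends.

```typescript
class TrieNode {
  children = new Map<string, TrieNode>();
  isEndOfWord = false;
}

class Trie {
  private root = new TrieNode();

  insert(word: string): void {
    let node = this.root;
    for (const ch of word) {
      let next = node.children.get(ch);
      if (next === undefined) {
        next = new TrieNode();
        node.children.set(ch, next);
      }
      node = next;
    }
    node.isEndOfWord = true; // a complete word ends at this node
  }

  // Return every complete word that starts with `prefix`.
  withPrefix(prefix: string): string[] {
    let node = this.root;
    for (const ch of prefix) {
      const next = node.children.get(ch);
      if (next === undefined) return []; // no word has this prefix
      node = next;
    }
    const words: string[] = [];
    const walk = (n: TrieNode, path: string): void => {
      if (n.isEndOfWord) words.push(path);
      n.children.forEach((child, ch) => walk(child, path + ch));
    };
    walk(node, prefix);
    return words;
  }
}

const trie = new Trie();
["go", "good", "ok", "ol"].forEach((w) => trie.insert(w));
console.log(trie.withPrefix("o")); // "ok" and "ol"
console.log(trie.withPrefix("go")); // "go" and "good"; "goo" is not a word
```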

Union-find (disjoint set)

<img width=300 src="https://img.alicdn.com/imgextra/i4/O1CN01B5xA5r21rSBj442z3_!!6000000007038-2-tps-622-172.png">

Union-find is used to solve "gang" problems or island problems, that is, to determine whether multiple elements belong to the same set. Its English name, Union-Find, means merge and search. So a union-find can be written as a class that provides two basic methods: union and find.

union merges the sets containing any two elements into one, and find returns the root of the set that an element belongs to.

The array backing a union-find has only one special meaning, for subscript k:

  • nums[k] indicates the set element k belongs to (it points toward that set's root); if nums[k] === k, then k is the root node of its set.

To count how many sets there are, you just count the subscripts that satisfy nums[k] === k, just as the number of gangs equals the number of bosses.

Implementations of union-find differ, so the stored data can differ subtly. An efficient union-find points each element directly at its root boss as much as possible while operating (path compression), which makes later finds and membership checks faster; but even when an element does not point at the root boss directly, the root can still be found by following parents recursively.
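A minimal union-find sketch following the array convention above: `nums[k] === k` marks a root, `find` applies path compression, and `count` counts roots. The class shape is an assumption; production versions usually also add union by rank or by size.

```typescript
class UnionFind {
  private nums: number[];

  constructor(n: number) {
    // Initially every element is its own boss: nums[k] === k.
    this.nums = Array.from({ length: n }, (_, k) => k);
  }

  // Follow parents up to the root, pointing every visited element
  // directly at the root on the way back (path compression).
  find(k: number): number {
    if (this.nums[k] !== k) {
      this.nums[k] = this.find(this.nums[k]);
    }
    return this.nums[k];
  }

  // Merge the two sets by hanging one root under the other.
  union(a: number, b: number): void {
    const rootA = this.find(a);
    const rootB = this.find(b);
    if (rootA !== rootB) this.nums[rootA] = rootB;
  }

  // Number of sets = number of bosses (roots).
  count(): number {
    return this.nums.filter((p, k) => p === k).length;
  }
}

const uf = new UnionFind(6);
uf.union(0, 1);
uf.union(1, 2);
uf.union(3, 4);
console.log(uf.count()); // 3: {0, 1, 2}, {3, 4}, {5}
console.log(uf.find(0) === uf.find(2)); // true
```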

Bloom filter

<img width=300 src="https://img.alicdn.com/imgextra/i1/O1CN01CWabYX26RPkR0T3zs_!!6000000007658-2-tps-650-334.png">

A Bloom filter is just a filter: it can exclude data that definitely does not exist much faster than other algorithms. However, data that is not excluded may still not actually exist, so a further query is required.

How does a Bloom filter do this? It checks bits in a binary (bit) array.

As shown in the figure above, we first store two items, a and b: each is mapped to several positions in the bit array, and those bits are set to 1. When we query a or b again, the mapping is the same, so all of their bits are 1 and the check always reports them as present.

However, when querying c, one of its bits is 0, which means c definitely does not exist. When querying d, both of its bits are 1 even though d was never stored; this false positive is where the error comes from.

Bloom filters are widely used in Bitcoin and in distributed systems. For example, when Bitcoin checks whether transactions are on a certain node, a Bloom filter is used to quickly skip unnecessary searches. Distributed computations such as MapReduce likewise use Bloom filters to quickly filter out computations that do not belong on a given node.
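A toy Bloom filter sketch. The bit-array size (64), the number of hash functions (3), and the use of seeded FNV-1a hashes are illustrative assumptions; real filters choose these from the expected number of elements and the acceptable false-positive rate.

```typescript
class BloomFilter {
  private bits: Uint8Array;

  constructor(private size = 64, private hashCount = 3) {
    this.bits = new Uint8Array(size); // one byte per bit, for readability
  }

  // Map an item to `hashCount` bit positions using FNV-1a with varied seeds.
  private positions(item: string): number[] {
    const result: number[] = [];
    for (let seed = 0; seed < this.hashCount; seed++) {
      let h = (2166136261 ^ seed) >>> 0; // FNV offset basis, varied by seed
      for (let i = 0; i < item.length; i++) {
        h ^= item.charCodeAt(i);
        h = Math.imul(h, 16777619) >>> 0; // multiply by the FNV prime
      }
      result.push(h % this.size);
    }
    return result;
  }

  add(item: string): void {
    for (const p of this.positions(item)) this.bits[p] = 1;
  }

  // false: definitely absent. true: possibly present, so verify elsewhere.
  mightContain(item: string): boolean {
    return this.positions(item).every((p) => this.bits[p] === 1);
  }
}

const filter = new BloomFilter();
filter.add("a");
filter.add("b");
console.log(filter.mightContain("a"), filter.mightContain("b")); // true true
// For "c": false means definitely absent; true would be a false positive.
console.log(filter.mightContain("c"));
```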

Summary

Finally, here is a chart of the average and worst-case time complexity of access, search, insertion, and deletion for each data structure:

<img width=600 src="https://img.alicdn.com/imgextra/i1/O1CN01LV4sSl20vkHWdZ7nr_!!6000000006912-2-tps-2398-1272.png">

This chart comes from bigocheatsheet; you can also visit that site directly.

After learning these basic data structures, I hope you can master them and become good at combining them to solve practical problems, while recognizing that no data structure is all-powerful; otherwise there would not be so many to learn, and we would simply use one that does everything.

Here are two examples of combining data structures:

The first example is how to query the maximum or minimum value of a stack in O(1) time. One stack is not enough; a second stack B is needed. Whenever a value at least as large (or as small) as the top of B is pushed, it is also pushed onto B, so the top of B is always the maximum (or minimum) of the current stack and can be read in O(1). B only needs extra work on push and pop: when the value popped from the main stack equals the top of B, pop B as well. So every operation stays O(1).

The second example is how to improve the search efficiency of a linked list. You can combine a hash table with the linked list, trading space for time: the hash table maps each value to its position in the list, so insertion, deletion, and lookup all become O(1) at the cost of extra space. A hash table alone achieves this time complexity but is unordered, and a HashTree is ordered but costs O(log n); only by combining a HashMap with a linked list do you get both ordering and O(1) time, at the expense of space complexity.

The Bloom filter mentioned earlier is likewise not used on its own. It is only a firewall that blocks some invalid data with extremely high efficiency; data that gets through is not necessarily valid, so a further query is still required.

So I hope you understand the characteristics, limitations, and combined uses of each data structure. Then you can apply them flexibly in real scenarios and find the best solution for the business problem at hand.
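A minimal TypeScript sketch of the max version (the min version just flips the comparison). `MaxStack` is a hypothetical name; the helper stack `maxes` plays the role of stack B, and `>=` is used so that repeated maxima are not lost on pop.

```typescript
class MaxStack {
  private items: number[] = [];
  private maxes: number[] = []; // stack B: its top is the current maximum

  push(x: number): void {
    this.items.push(x);
    // Record x in B if it is at least as large as the current maximum.
    if (this.maxes.length === 0 || x >= this.maxes[this.maxes.length - 1]) {
      this.maxes.push(x);
    }
  }

  pop(): number | undefined {
    const x = this.items.pop();
    // If the popped value was the current maximum, retire it from B too.
    if (x !== undefined && x === this.maxes[this.maxes.length - 1]) {
      this.maxes.pop();
    }
    return x;
  }

  // O(1): read the top of B.
  max(): number | undefined {
    return this.maxes[this.maxes.length - 1];
  }
}

const ms = new MaxStack();
[3, 5, 5, 2].forEach((x) => ms.push(x));
console.log(ms.max()); // 5
```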
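A sketch of the hash table plus linked list combination, similar in spirit to Java's LinkedHashMap. Here a JavaScript `Map` serves only as the key-to-node index (the built-in `Map` already preserves insertion order, so this is purely illustrative), while a doubly linked list keeps the order; `LinkedHashMap` and `DNode` are hypothetical names.

```typescript
class DNode<K, V> {
  prev: DNode<K, V> | null = null;
  next: DNode<K, V> | null = null;
  constructor(public key: K, public value: V) {}
}

class LinkedHashMap<K, V> {
  private index = new Map<K, DNode<K, V>>(); // hash table: key -> list node
  private head: DNode<K, V> | null = null; // oldest entry
  private tail: DNode<K, V> | null = null; // newest entry

  // O(1): append a new node at the tail, or update an existing one in place.
  set(key: K, value: V): void {
    const existing = this.index.get(key);
    if (existing !== undefined) {
      existing.value = value; // keep its original position
      return;
    }
    const node = new DNode(key, value);
    node.prev = this.tail;
    if (this.tail !== null) this.tail.next = node;
    else this.head = node;
    this.tail = node;
    this.index.set(key, node);
  }

  // O(1): jump straight to the node through the hash table.
  get(key: K): V | undefined {
    return this.index.get(key)?.value;
  }

  // O(1): unlink via prev/next, no traversal needed.
  delete(key: K): boolean {
    const node = this.index.get(key);
    if (node === undefined) return false;
    if (node.prev !== null) node.prev.next = node.next;
    else this.head = node.next;
    if (node.next !== null) node.next.prev = node.prev;
    else this.tail = node.prev;
    this.index.delete(key);
    return true;
  }

  // Keys in insertion order, by walking the list.
  keys(): K[] {
    const out: K[] = [];
    for (let cur = this.head; cur !== null; cur = cur.next) out.push(cur.key);
    return out;
  }
}

const ordered = new LinkedHashMap<string, number>();
ordered.set("b", 2);
ordered.set("a", 1);
ordered.set("c", 3);
ordered.delete("a");
ordered.set("b", 20);
console.log(ordered.keys(), ordered.get("b")); // keys b, c in insertion order; b is 20
```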

The discussion address is: "React Server Component" · Issue #312 · dt-fe/weekly

If you want to join the discussion, please visit the discussion address above; there is a new topic every week, published on the weekend or Monday. Front-end intensive reading: helping you filter reliable content.

Follow the Front-end Intensive Reading WeChat official account

<img width=200 src="https://img.alicdn.com/tfs/TB165W0MCzqK1RjSZFLXXcn2XXa-258-258.jpg">

Copyright notice: free to reprint, non-commercial, no derivatives, keep attribution (Creative Commons 3.0 License)

黄子毅