1. Introduction
As a search engine's data volume gradually grows, going distributed becomes the only way forward. Beyond data sharding, a distributed search engine must also manage the state of its data and the state transitions of each component. Here I share some experience and thinking on designing a distributed search engine based on ZK (ZooKeeper), covering the evolution from the single-machine version to the distributed version.
2. Distributed system
- A distributed system is one whose hardware or software components are spread across networked computers and communicate and coordinate only through message passing. When a single-machine system can no longer carry the request or data volume, the system must be reasonably redesigned and deployed in a distributed way.
- The CAP theorem (Consistency, Availability, Partition tolerance) is well known: the three properties cannot all be achieved at once, so in practice we always make trade-offs for the business at hand. In the core database field, for example, strong data consistency may be bought at the cost of some availability, while high-traffic services usually prioritize availability. In Search's retrieval and recommendation scenarios, we prioritize availability to guarantee performance first, compromise on strong consistency, and guarantee only eventual consistency.
3. Challenges faced by distributed systems
Building a complete distributed system needs to solve the following important problems:
- Reliable node status awareness. Failures in a distributed system come from many sources: crashes caused by unavailable server hardware, process exits caused by severe exceptions, link instability caused by network problems, and "zombie" states caused by excessive service load.
- Reliable data updates. As a stateful service, the search service must index a large amount of data. The index is not only written continuously in real time, it must also support full data updates at the daily or hourly level while the online service keeps serving queries stably. It is no exaggeration to compare this to changing a car's wheels at high speed.
4. Overall distributed architecture of Search
Search's distributed architecture comprises several major components:
- shard (core retrieval logic and index sharding)
- searcher (retrieve and request distribution)
- indexbuild (offline index build)
- search-client (service discovery client)
Search distributed framework:
5. shard module
The shard module is the core of the entire search engine: each shard is an independent retrieval unit. Its main framework consists of the following parts:
5.1 Index
Search contains several types of indexes, each with a different data structure. The existing internal indexes include the forward index, inverted index, term index, TF index, vector index, and other forms.
Forward index
The forward index of Search stores the mapping from each primary-key ID in the engine to the complete data of each doc. Structurally it is a HashMap: each key is the hash of a primary-key ID, and the value is a pointer to the complete doc. Internally the engine actually uses two HashMaps: the first maps the primary-key ID to a unique docid, and the second maps the docid to the pointer to the complete doc.
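The two-level mapping above can be sketched in a few lines. This is an illustrative simplification, not the engine's actual implementation; the class and method names are invented for the example:

```python
class ForwardIndex:
    """Sketch of the two-HashMap forward index: primary key -> docid -> doc."""

    def __init__(self):
        self.pk_to_docid = {}   # primary-key ID -> internal docid
        self.docid_to_doc = {}  # internal docid -> complete doc
        self.next_docid = 0

    def add(self, primary_key, doc):
        # Reuse the docid if the primary key was seen before (an update).
        docid = self.pk_to_docid.get(primary_key)
        if docid is None:
            docid = self.next_docid
            self.next_docid += 1
            self.pk_to_docid[primary_key] = docid
        self.docid_to_doc[docid] = doc
        return docid

    def get(self, primary_key):
        docid = self.pk_to_docid.get(primary_key)
        return None if docid is None else self.docid_to_doc[docid]
```

Keeping the docid indirection means an update to a doc does not change its internal docid, so posting lists that reference the docid stay valid.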
Inverted index
An inverted index is essentially a mapping from each key (term) to the docs that contain it. For retrieval, the posting list must support efficient reads, which enables fast evaluation of complex query syntax such as AND, OR, and NOT. At the same time, its data structure must also support efficient writes: real-time data has to be written into the engine while it is serving queries, which inevitably modifies the posting lists, so write performance is just as critical.
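The AND / OR / NOT operations mentioned above can be sketched as merges over sorted docid posting lists. Real engines use galloping search, skip pointers, or bitmaps for speed; the plain merges here are only for clarity:

```python
def intersect(a, b):
    """AND: docids present in both sorted posting lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i]); i += 1; j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

def union(a, b):
    """OR: docids present in either posting list."""
    return sorted(set(a) | set(b))

def difference(a, b):
    """a AND NOT b: docids in a that are absent from b."""
    bs = set(b)
    return [x for x in a if x not in bs]
```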
Array
Using an array as the posting-list structure makes reads fast, logical operations fast, and the layout cache friendly. Writes, however, are poor: an array can only hold fixed offline data and cannot accept incremental writes.
Skip List (SkipList)
A skip list is a compromise built on a linked list: read and write performance is acceptable, but CPU cache behavior is relatively poor, and the space used per docid is relatively large, requiring two pointers plus an integer.
Bitmap
A Bitmap represents binary information with individual bits, using the bit position as the key. It suits the inverted index of a search engine well: the structure is CPU-cache friendly, and both reads and writes are very fast. However, because a Bitmap records the state of every key, including the positions that are 0, it can waste a great deal of space.
Roaring Bitmap
RoaringBitmap is a Bitmap structure with built-in compression. It retains the random read/write performance of a Bitmap while handling dense and sparse regions of 1s and 0s sensibly, which reduces storage and gives better overall performance. Each posting-list data structure has its own suitable scenarios and data; in general, RoaringBitmap performs best overall. Elasticsearch has published detailed benchmarks of these posting-list structures; interested readers can look at the results.
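To make the bitmap idea concrete, the sketch below abuses Python's arbitrary-precision int as a bit set: bit i set means docid i matches the term, so AND/OR become single bitwise operations. (RoaringBitmap additionally picks per-chunk containers to avoid the space waste of sparse bitmaps; that compression is not shown here.)

```python
def make_bitmap(docids):
    """Build a bitmap where bit i is set iff docid i is in the posting list."""
    bm = 0
    for d in docids:
        bm |= 1 << d
    return bm

def bitmap_to_docids(bm):
    """Decode a bitmap back into a sorted docid list."""
    out, i = [], 0
    while bm:
        if bm & 1:
            out.append(i)
        bm >>= 1
        i += 1
    return out

a = make_bitmap([1, 4, 6])
b = make_bitmap([4, 6, 9])
and_result = bitmap_to_docids(a & b)  # AND is one bitwise op
or_result = bitmap_to_docids(a | b)   # OR likewise
```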
Term index
The term index stores every term produced by segmenting each field. Because the number of terms is huge, storing them naively wastes a lot of space; the search engine also needs prefix search, so term storage must support prefix queries. Search stores terms in an FST (Finite-State Transducer), which is more space-efficient than a prefix trie while offering essentially the same query performance.
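An FST itself is involved to implement, so the sketch below shows only the query semantics the term index must provide, using a sorted term list plus binary search; an FST answers the same prefix lookups in far less space:

```python
import bisect

def prefix_search(sorted_terms, prefix):
    """Return all terms starting with `prefix` from a sorted term list."""
    lo = bisect.bisect_left(sorted_terms, prefix)
    # "\uffff" sorts after any ordinary character, bounding the prefix range.
    hi = bisect.bisect_left(sorted_terms, prefix + "\uffff")
    return sorted_terms[lo:hi]

terms = sorted(["song", "songs", "sonar", "sort", "soul"])
matches = prefix_search(terms, "son")
```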
Vector index
Internally, the vector index is a special kind of inverted index. Different approximate-nearest-neighbor algorithms produce different index layouts. With a vector quantization algorithm, training clusters the vectors into a set number of groups; each cluster result forms a codeID, and the posting list for that codeID holds the vectors belonging to the cluster. In this sense, a vector index is a special inverted index.
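The codeID-as-term idea above can be sketched as a tiny IVF-style index. This is an assumed simplification (centroids are given rather than trained, and brute-force distances are used): each centroid's ID acts like a term, its posting list holds the vectors assigned to it, and a query probes only the nearest cluster(s):

```python
def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_ivf(centroids, vectors):
    """Assign each vector to its nearest centroid (its codeID)."""
    posting = {i: [] for i in range(len(centroids))}
    for vid, v in enumerate(vectors):
        code = min(range(len(centroids)), key=lambda i: dist2(centroids[i], v))
        posting[code].append((vid, v))
    return posting

def search_ivf(centroids, posting, query, nprobe=1, k=2):
    """Probe the nprobe nearest clusters, then rank candidates exactly."""
    probes = sorted(range(len(centroids)),
                    key=lambda i: dist2(centroids[i], query))[:nprobe]
    cands = [pair for c in probes for pair in posting[c]]
    return [vid for vid, v in sorted(cands, key=lambda p: dist2(p[1], query))[:k]]

centroids = [(0.0, 0.0), (10.0, 10.0)]
vectors = [(0.1, 0.2), (9.8, 10.1), (0.3, -0.1), (10.2, 9.9)]
posting = build_ivf(centroids, vectors)
result = search_ivf(centroids, posting, (0.0, 0.1), nprobe=1, k=2)
```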
5.2 Query sorting
The query module is the core functional module of Search and carries much of the retrieval business logic, including the self-developed tokenizer MusicWs, a part-of-speech analysis module, grammar parsing and logical retrieval, the Search ranking framework, and a cache module.
6. searcher module
The searcher module is a core part of Search and sits upstream of the shard module. Its main functions are sharding requests, merging results, and re-ranking the returned data. The overall structure of the searcher is as follows:
6.1 Query routing
Route module
The main function of the Route module is to split the original query horizontally. Route shards each request according to the sharding information saved in the ZK path. For example, for the maximum-recall truncation parameter fulllimit, Route divides the fulllimit value by the number of shards, then distributes a sub-request to each shard node.
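A minimal sketch of this split (field names are invented for illustration): the fulllimit is divided across shards, rounded up so the shards together can still return at least fulllimit docs:

```python
import math

def split_request(query, fulllimit, num_shards):
    """Fan a query out into one sub-request per shard with a per-shard limit."""
    per_shard = math.ceil(fulllimit / num_shards)
    return [{"shard_id": s, "query": query, "fulllimit": per_shard}
            for s in range(num_shards)]

subrequests = split_request("jazz piano", fulllimit=1000, num_shards=3)
```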
Merge module
The Merge module aggregates and processes the data returned by each shard module.
6.2 Sorting Framework
The sorting framework in the searcher re-ranks the global final results. For example, final song retrieval is scored uniformly across all songs: each shard sends the normalized score of each song up to the searcher module, which then sorts everything by that score in one pass. The framework also supports custom-developed scorers and sorting plugins.
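The merge-then-rank step can be sketched as follows. Because each shard has already normalized its scores, the searcher only needs one global top-k pass; custom scorer plugins are not modeled here:

```python
import heapq

def merge_and_rank(shard_results, topk):
    """Merge per-shard (doc_id, normalized_score) lists into a global top-k."""
    merged = [item for shard in shard_results for item in shard]
    return heapq.nlargest(topk, merged, key=lambda p: p[1])

shard0 = [("song:a", 0.91), ("song:b", 0.40)]
shard1 = [("song:c", 0.75), ("song:d", 0.88)]
top3 = merge_and_rank([shard0, shard1], topk=3)
```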
7. Search client and service discovery mechanism
Search's service discovery mechanism is the core module for communication between services. Besides supporting normal RPC data calls, it also ensures that traffic is rescheduled correctly when a service becomes abnormal. The Search service discovery module:
Service discovery in Search consists of two parts, server and client, which interact through ZK. ZK stores the machine IPs and ports of each cluster, and the client watches the path for changes. When any IP in the list is removed, ZK's callback notifies the client, which then cuts traffic away from that machine. In addition, a heartbeat runs between client and server so that traffic can be moved in abnormal situations such as a server process hanging.
8. Design of Search distributed nodes
The most complicated part of a stateful distributed system is handling exceptions, including data updates and node failures. For Search, data updates cause nodes to go offline and change state; scaling out or in causes drastic changes on every node and brings its own failure modes. When a node has a problem, the cluster must also handle and route around it intelligently. A reliable handling mechanism must therefore be designed up front.
8.1 Design of each node
The shard and searcher nodes are the top priority in the entire Search system. The first task is to design a reasonable hierarchical structure to compose the overall distributed system.
- The figure above shows the path layout of shard nodes in ZK, organized layer by layer under the cluster name and application name. The leaf nodes of the path carry the shard information: the first number is the total shard count and the second is the shard's ID, and the IP and port list of that shard's machines is registered under this path. The searcher service watches this path to learn the current number of shards and their IDs.
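Assuming each leaf node is named `<total_shards>-<shard_id>` (e.g. `3-0`, `3-1`, `3-2`), as the two-number description above suggests, the searcher's parsing of a child list can be sketched as:

```python
def parse_shard_nodes(children):
    """Recover the total shard count and per-shard node names from ZK children."""
    shards = {}
    total = None
    for name in children:
        t, s = name.split("-")
        total = int(t)        # same total on every leaf
        shards[int(s)] = name  # shard ID -> leaf node name
    return total, shards

total, shards = parse_shard_nodes(["3-0", "3-1", "3-2"])
```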
During scale-out, after a new node has finished updating its data, it registers its IP and port under the new shard node. As the old shard machines gradually update their data toward the new sharding, the IPs in the old shard clusters become fewer and fewer until everything has migrated to the new nodes, at which point the scale-out is complete. Scale-in is the same operation in reverse.
8.2 Request design of shard node and searcher node
In the shard node design there is no distinction between primary and replica: every replica receives request traffic. The rationale is to improve machine utilization; a replica that only stands by has little value, so all replicas carry equal weight and all receive traffic.
In a deployment, each row is a complete copy of the data set and the minimum unit that can serve a whole request, while each column holds the same data slice with no master/slave distinction, so every node carries traffic. When a node has a problem, for example the node crashes and the process exits, an internal mechanism on the shard side proactively deregisters it before the crash, and the searcher automatically redistributes the traffic to the remaining nodes in that shard column.
9. Search distributed data flow design
Search is a stateful retrieval service: real-time data is written continuously, and offline data is updated into the engine daily or hourly, so reliable data updates are critical. In the distributed setting, both the offline output update and the real-time data write are vital parts.
- The engine is split into real-time and offline parts. In the build system, the original data is sharded evenly according to the total shard count configured in the middle platform; the sharding logic hashes each record's primary-key ID and takes the result modulo the shard count. The build system then builds the index for each shard and places the final index in Search's HDFS path.
- Real-time data is aggregated through Kafka. Every shard consumes the same Kafka data uniformly, hashes the primary-key ID of each record modulo the shard count to decide whether the record belongs to it, and writes the record into its index only if it does.
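Both paths above route a record with the same rule: hash the primary-key ID, take it modulo the shard count. The sketch below uses md5 as a stable hash purely for illustration (the actual hash function Search uses is not specified in the text):

```python
import hashlib

def shard_of(primary_key, num_shards):
    """Stable hash-mod routing of a primary-key ID to a shard."""
    h = int(hashlib.md5(primary_key.encode("utf-8")).hexdigest(), 16)
    return h % num_shards

def belongs_to_me(primary_key, num_shards, my_shard_id):
    # Each shard consumes the whole Kafka stream and keeps only its records.
    return shard_of(primary_key, num_shards) == my_shard_id

keys = ["song:%d" % i for i in range(100)]
assignment = {k: shard_of(k, 3) for k in keys}
```

Using the same rule offline and online guarantees that a record's offline index shard and its real-time writes land on the same node.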
For consistency, because replicas within the same shard consume at different speeds, in theory only eventual consistency among them can be guaranteed: at some moment a record may arrive at one replica first and become retrievable there while the same query on another replica does not yet return it. In practice this is almost imperceptible, because each replica processes tens of thousands to a hundred thousand records per second, meaning Search's incremental write latency is under 1 ms. Only a network problem or disk abnormality on a node will cause write problems and eventually retrieval anomalies on that node, and such cases are caught promptly by alerting and handled per node.
10. Summary
This article summarized the distributed design and implementation of a search engine. The key questions are how to design a stateful distributed system, above all how to handle state changes on each node, and how to shard and process data reasonably. The ZK path and node design, automatic scale-out and scale-in, client-side service discovery, and status awareness are the core parts.
*Text: Zuri
@德物科技 WeChat public account