1. Background
1.1 Distributed database architecture
There are many current distributed database architectures, but the overall architecture is not much different. The main components include coordination nodes, data sharding, metadata nodes, and global clocks. A common distributed architecture is as follows:
- gtm : global transaction manager (global clock), one master and multiple backups;
- catalog : Metadata management, one master and multiple backups;
- group : horizontal sharding, each group consists of one primary and multiple backup data storage nodes;
- proxy : Coordination node, stateless, responsible for processing client requests, sending requests to data shards according to sharding rules, summarizing the data returned by data shards, and cooperating with other components to ensure the consistency of distributed transactions.
1.2 Sorting problem
Sorting is also an important feature in distributed databases. A query sort statement select *from t1 order by field1, the data to be queried may be distributed in different data shards. This requires the proxy to reorder the ordered data returned for different data shards, and then return the globally ordered data to the client.
When the amount of related data is not large, the proxy can store the data returned by different data shards in the memory, and then reorder the data in the memory and return it to the client. When the amount of related data is relatively large, if the data to be reordered is put into memory, it may cause OOM. If the data to be reordered is temporarily stored in the disk of the proxy, there is also the risk of exhausting the disk and there will be a large number of Disk IO. The following will introduce a distributed database sorting and optimization method.
2. Solutions
2.1 Introduction to sorting scheme
In order to improve the performance of distributed sorting, each data shard itself also participates in sorting. In this way, the data returned by the fragments obtained on the proxy is ordered, and the proxy can use merge sort or priority queue sorting method to reorder the ordered data, which greatly reduces the pressure on the proxy.
The sort buffer size can be configured according to the proxy memory size, usually the default is 10M. If a query statement is associated with N data shards, it needs to be divided into N parts in the sort buffer, and the size of the sorted sort buffer corresponding to each data shard is 10M/N.
Directly in the memory, the specific steps are as follows:
- The client sends the order query statement select *from t1 order by id to the proxy.
- The proxy issues the order query statement select *from t1 order by id to the related data shards group1 and group2 according to the sharding key and sharding rules.
- After data sharding queries and sorts the data locally, it sends the sorted data to the proxy.
- The proxy stores the ordered data returned by the data shard in the sort buffer corresponding to the data shard, and merges and sorts the ordered data.
- The proxy sends the merge-sorted data to the client.
2.2 Defects of sorting scheme
This method can only satisfy the sorting of small amounts of data. When the amount of sorted data is large, we can choose to increase the sort buffer on the proxy. However, increasing the size of the sort buffer will take up more memory resources, so you cannot increase the size of the sort buffer indefinitely.
2.3 Sorting optimization ideas
Save the ordered data returned by the data shard to disk, and then reorder the disk data. The following will introduce an optimization scheme, a method for distributed sorting for large data volumes.
3. Optimization plan
3.1 Introduction to Sorting Scheme
Due to the limitation of memory, it is not feasible to merge and sort large amounts of data in memory. In this case, it is necessary to temporarily store the data returned by the data shards on the disk. The specific optimization steps are as follows:
1) The client sends the order query statement select *from t1 order by id to the proxy.
2) The proxy sends the sort query statement select *from t1 order by id to the related data shards group1 and group2 according to the shard key.
3) After data sharding queries and sorts the data locally, it sends the ordered data to the proxy.
4) The proxy stores the ordered data returned by the data shard in the disk file corresponding to the data shard.
5) Reorder using the priority queue sorting method:
- Each data shard produces a data build heap, and the number of nodes contained in the heap is equal to the number of data shards.
- In order to avoid performance problems caused by reading data from disk one by one during the sorting process of the priority queue, the proxy reads data from the disk file and pre-fills it into the sort buffer corresponding to the data shard.
- The sort buffer of each shard outputs a piece of data into a heap.
- Pop data from the top of the heap and send it to the client.
- After the top data of the heap is popped, read another piece of data from the sort buffer corresponding to the popped node and push it to the heap.
- After the data in the sharded sort buffer is fetched, it is necessary to continue to pull data from the corresponding disk file to fill the sort buffer.
- Until all data is fetched and sent to the client.
3.2 Defects of sorting scheme
- The proxy needs to collect the ordered data of all relevant data shards and store them on the disk to solve the problem of insufficient memory, but the disk is also limited. When the amount of data is too large, the disk may not be able to accommodate the data that needs to be sorted on the proxy.
- The proxy stores data on disk, and there is a large amount of disk IO.
- Take select from t1 order by field1 limit 100w as an example: if the queried data is on 50 data shards, the proxy node needs to pull 100w of data from each data shard and save it to disk. This needs to save 5000W of data (100w 50), while the client only needs 100w of data, which wastes a lot of network bandwidth and disk IO.
3.3 Sorting optimization ideas
This method is that the proxy pulls all the ordered data of the related data shards to the proxy, and then sorts it. Do we pull data from data shards in batches, and then pull the next batch of data from data shards after batch data processing? A method of batch sorting will be introduced below.
Fourth, the final plan
4.1 Introduction to Sorting Scheme
The data of the data shards is not stored on the disk on the proxy. It pulls ordered data of a fixed size from the data shards at a time. The proxy fills the pulled data into the sort buffer corresponding to the shard. After the data in the sort buffer is used, it will be retrieved from the corresponding The data is pulled from the shard. The specific steps are as follows:
1) The client sends the order query statement select *from t1 order by id to the proxy.
2) The proxy sends the sort query statement select *from t1 order by id to the related data shards group1 and group2 according to the shard key.
3) After data sharding queries and sorts the data locally, it sends the ordered data of fixed size to the proxy.
4) The proxy stores the ordered data returned by the data shard in the sort buffer corresponding to the data shard.
5) Priority queue sorting.
- The sort buffer corresponding to each data shard outputs a piece of data to build a heap, and the number of heap nodes is equal to the number of data shards.
- Pop data from the top of the heap and send it to the client.
- After the top data of the heap is popped, read another piece of data from the sort buffer corresponding to the popped node and push it to the heap.
- After the data in the sharded sort buffer is fetched, it is necessary to continue to pull data from the corresponding data sharding node to fill the sort buffer.
- Until all data is fetched and sent to the client.
4.2 Analysis of sorting scheme
The solution to the three defects existing in the optimization scheme 3.2.
Defect 1 : The proxy needs to collect the ordered data of all relevant data fragments and store it on the disk to solve the problem of insufficient memory, but the disk is also limited. When the amount of data is too large, the disk may not be able to accommodate the ordered data on the proxy. data.
solves the situation : It can be seen from the figure that the proxy disk does not save the data of the data fragment.
Defect 2 : The proxy stores data on disk, and there is a lot of disk IO.
solves the situation : The proxy's disk does not save the data of the data fragment, so there is no problem of too much disk pressure.
defect 3 : select from t1 order by field1 limit 100w as an example: if the data queried this time is on 50 data shards, the proxy node needs to pull 100w of data from each data shard and save it to disk. It needs to save 5000W of data (100w 50), and the client only needs 100w of data, which wastes a lot of network bandwidth and disk IO.
solves the situation : Each time a fixed size of data is pulled from the data shard, the data is returned to the client while sorting, and when the data returned to the client reaches 100W, the query is completed, and the network bandwidth waste is greatly improved.
Assuming that the size of the sort buffer corresponding to the data shard on the proxy is 2M, the amount of data pulled from the data shard:
- Worst case : The amount of data pulled is 2M*50+100W, and there is no need to save the disk.
- Best case : The data distribution is very uniform. After returning 100W data to the client, the data corresponding to all sort buffer shards is basically empty (there is one left), and the amount of data pulled at this time is 100W+50.
4.3 Program usage restrictions
1) The data sharding node itself supports sorting, and most data shards support sorting.
2) Data sharding needs to support batch reading.
Taking MySQL as the data sharding example, you need to use streaming query or cursor query on the proxy. In addition, some distributed databases are designed with some distributed problems in mind. Their data sharding nodes keep the context until the end of the query, and their batch read performance is higher, so I will not give an example here.
5. References
1. JDBC operation MySQL (3) - query
2. MySQL JDBC StreamResult Communication Principle Analysis
Author: vivo Internet Database Team - Xia Qianyong
**粗体** _斜体_ [链接](http://example.com) `代码` - 列表 > 引用
。你还可以使用@
来通知其他用户。