Compared with open source MySQL and other MySQL-based products, PolarDB's core technological breakthroughs are twofold: an advanced architecture that separates compute from storage, and a parallel query engine that other MySQL products do not have. With the parallel query engine, PolarDB not only keeps its advantage in OLTP workloads, but also makes a qualitative leap in its support for OLAP, putting it far ahead of other MySQL products.

Users' growing demand for analysis and statistics

As is well known, MySQL's optimizer currently supports neither parallel optimization nor parallel execution. MySQL itself is gradually exploring parallelism, such as the parallel execution of count(*), but it has not yet formed a mature parallel optimization and execution mechanism; only a few special scenarios are optimized. With the rapid growth of MySQL-compatible products on the cloud, more and more traditional users are migrating to them, which poses new challenges for MySQL. Many of these users need OLTP, but they also expect the database to provide analysis, statistics, and reporting capabilities, an area where MySQL is clearly weaker than traditional commercial databases. To satisfy both users' growing OLTP requirements and their demand for OLAP analysis, PolarDB's parallel query engine came into being. With its parallel execution capability, queries that used to take hundreds of seconds now finish in a few seconds, saving users a great deal of time and money and earning praise from a large number of users.

The birth of the PolarDB parallel query engine

The optimizer is the core of a database; its quality can almost determine the success or failure of a database product. Developing a brand-new optimizer is a huge challenge for any team: beyond the technical complexity, making the product sufficiently stable is extremely difficult. Therefore, even traditional commercial databases have improved on their existing optimizers, gradually adding parallel support until a mature parallel optimizer emerged. PolarDB took the same path. When designing and developing the parallel query engine, we made full use of the existing optimizer's technical accumulation and implementation foundation, improving and polishing it continuously, and finally arrived at a continuously iterated technical solution that guarantees both the stable operation of the new optimizer and ongoing technological innovation.

An OLAP-style query typically touches a large amount of data, usually far more than the database's memory capacity. Most of that data cannot be cached in the database's buffer and must be loaded dynamically during query execution, causing a large number of IO operations; and IO is the most time-consuming part, so the first thing to consider is how to speed it up. Due to hardware limits, the latency of each individual IO is basically fixed. Although sequential IO and random IO differ, with SSDs now widespread the gap between the two keeps narrowing. So is there another way to speed up IO? Parallel IO is a simple and practical one: if multiple threads issue IO at the same time, each reading only part of the data, the data can be brought into the database buffer quickly. But if the data is only read into the buffer without being processed immediately, it will be swapped out once the buffer fills up, and the point of accelerating IO is lost.

[Figure 1: Schematic diagram of parallel IO]

Therefore, data read in parallel must also be processed in parallel; this is the basis of parallel query acceleration. Because the original optimizer can only generate serial execution plans, reading and processing data in parallel first requires modifying the existing optimizer so that it can generate the parallel plans we need: for example, deciding which tables can be read in parallel with enough benefit, or which operations can be executed in parallel profitably. Parallelization does not always pay off. For a table with very little data, perhaps only a few rows, the cost of constructing the multiple threads required for parallel execution may outweigh the benefit; in general, parallel reading then consumes more resources and more time, which is not worth it. Parallelization must therefore be cost-based, otherwise it may cause serious performance regressions.
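The core idea, splitting the scan into disjoint shards so that each worker both reads and filters its own shard, can be sketched in Python (a toy in-memory "table" of pages; this illustrates the principle only, not PolarDB's actual implementation):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data layout: a "table" stored as a list of pages.
# Each worker reads and filters its own slice of pages, so IO and
# processing happen in parallel rather than read-all-then-process.
PAGES = [list(range(i * 100, (i + 1) * 100)) for i in range(32)]

def scan_shard(page_ids, predicate):
    """Read the given pages and apply the filter immediately."""
    rows = []
    for pid in page_ids:
        for row in PAGES[pid]:          # simulated page read (the IO step)
            if predicate(row):          # process while the data is still "hot"
                rows.append(row)
    return rows

def parallel_scan(dop, predicate):
    # Round-robin page assignment: worker w takes pages w, w+dop, w+2*dop, ...
    shards = [list(range(w, len(PAGES), dop)) for w in range(dop)]
    with ThreadPoolExecutor(max_workers=dop) as pool:
        results = pool.map(scan_shard, shards, [predicate] * dop)
    return [r for part in results for r in part]

matches = parallel_scan(4, lambda row: row % 1000 == 0)
```

Each shard here is filtered as it is read, which is exactly the property the text argues for: the buffer never has to hold the whole table before processing begins.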

Parallel scan of the fact table

Selecting tables that can profitably be read in parallel, by computing and comparing parallel costs, is the first step in building a parallel execution plan. Based on the new parallel costs there may be a better JOIN order, but exploring it requires a much larger search space; to keep optimization time under control, keeping the JOIN order of the original plan is a good choice. In addition, each table participating in a JOIN may use a different access method, such as a full table scan, a ref index scan, or a range index scan, and these affect the final cost of the parallel scan.

Usually we choose the largest table as the one to scan in parallel, so that the benefit of parallel scanning is greatest. Of course, multiple tables can also be scanned in parallel at the same time; we will discuss that more complicated case later.
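The cost-based decision described above can be sketched with a toy model. All constants and names below are invented for illustration; a real optimizer's cost model is far more detailed:

```python
# Toy cost model: a parallel scan pays a fixed startup cost for
# spawning workers, so it only wins when the table is large enough.
WORKER_STARTUP_COST = 1000.0   # hypothetical cost units per query
COST_PER_ROW = 0.1             # hypothetical per-row scan cost

def scan_cost(rows, dop):
    if dop == 1:
        return rows * COST_PER_ROW                      # serial scan
    return WORKER_STARTUP_COST + rows * COST_PER_ROW / dop

def choose_dop(rows, max_dop=32):
    # Pick the degree of parallelism with the lowest estimated cost.
    return min(range(1, max_dop + 1), key=lambda d: scan_cost(rows, d))

small_table_dop = choose_dop(100)         # startup cost dominates -> serial
large_table_dop = choose_dop(10_000_000)  # parallel scan pays off
```

For the tiny table the startup cost exceeds any saving, so the model keeps the serial plan, which is the regression-avoidance behavior the text describes.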

The following example queries the TOP 10 customers by annual consumption:

SELECT c.c_name, sum(o.o_totalprice) AS s
FROM customer c, orders o
WHERE c.c_custkey = o.o_custkey
  AND o_orderdate >= '1996-01-01'
  AND o_orderdate <= '1996-12-31'
GROUP BY c.c_name
ORDER BY s DESC
LIMIT 10;

The orders table is an order table containing a large amount of data; such a table is called a fact table. The customer table holds customer data and is relatively small; such a table is called a dimension table. The parallel execution plan of this SQL is shown in the following figure:

[Figure: parallel execution plan of the query]

As the plan shows, the orders table is scanned in parallel by 32 worker threads. Each worker scans only some shards of the orders table, then joins with the customer table via an eq_ref lookup on o_custkey, and sends its result to a collector component in the user session; the collector then performs the subsequent GROUP BY, ORDER BY, and LIMIT operations.

Multi-table parallel JOIN

Having scanned one table in parallel, why stop at one? If a SQL statement contains two or more fact tables, can all of them be scanned in parallel? Of course it can. Take the following SQL as an example:

SELECT o.o_custkey, sum(l.l_extendedprice) AS s
FROM orders o, lineitem l
WHERE o.o_orderkey = l.l_orderkey
GROUP BY o.o_custkey
ORDER BY s
LIMIT 10;

Both the orders table and the lineitem table are fact tables with large amounts of data. The parallel execution plan of this SQL is shown in the following figure:

[Figure: parallel execution plan with two parallel-scanned tables]

From the plan we can see that both the orders table and the lineitem table are scanned in parallel, each by 32 worker threads. So how is multi-table parallelism achieved? Take two tables as an example. When two tables are joined, the usual JOIN methods include nested loop JOIN and hash JOIN, and for each method a table scan strategy must be chosen that preserves the correctness of the result.

Take hash JOIN as an example. In a serial hash JOIN, one table, called the build table, is first read to create a hash table; then the other table, the probe table, is read row by row, its hash is computed, and a lookup is done in the build table. On a match the result is output; otherwise reading continues. To parallelize this, the parallel optimizer transforms the serial hash JOIN into a parallel one, and there are two schemes. In the first, both tables are partitioned by the hash key, so that rows with the same hash value fall into the same partition and are joined by the same thread. In the second, a shared build table is created and shared by all threads executing the hash JOIN; each thread then reads its own fragment of the probe table in parallel and performs the join.

[Figure 2: Schematic diagram of parallel HASH JOIN]

  • In the first scheme, all the data of both tables must be read and partitioned by the chosen hash key, with each partition sent to a different processing thread. This requires an extra Repartition operator that routes rows to threads according to the partitioning rule; message queues are usually used here to improve efficiency.
  • In the second scheme, the shared hash (build) table is created in parallel. Once it is ready, each thread reads a fragment of the probe table and performs the hash JOIN. These fragments do not need to be partitioned by the hash key; each thread simply reads disjoint fragments.
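The second scheme, a shared build table probed in parallel, can be sketched as follows. Plain Python lists stand in for tables, and all names here are illustrative, not PolarDB internals:

```python
from concurrent.futures import ThreadPoolExecutor

# Invented sample data: tables as lists of (key, payload) tuples.
customers = [(k, f"cust-{k}") for k in range(100)]                      # build side
orders = [(k % 100, f"order-{i}") for i, k in enumerate(range(1000))]   # probe side

def hash_join_parallel(build_rows, probe_rows, dop=4):
    # Build phase: done once; the table is then shared read-only by all workers.
    build_table = {}
    for key, payload in build_rows:
        build_table.setdefault(key, []).append(payload)

    def probe_shard(shard):
        # Each worker probes only its own disjoint fragment of the probe table.
        out = []
        for key, payload in shard:
            for match in build_table.get(key, []):
                out.append((match, payload))
        return out

    shards = [probe_rows[w::dop] for w in range(dop)]   # disjoint fragments
    with ThreadPoolExecutor(max_workers=dop) as pool:
        parts = pool.map(probe_shard, shards)
    return [row for part in parts for row in part]

joined = hash_join_parallel(customers, orders)
```

Note that, as the text says, the probe fragments need no hash-aware partitioning: because every worker sees the complete build table, any disjoint split of the probe rows yields a correct join.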

Parallelizing analysis and statistics operators

For analysis and statistics workloads, GROUP BY is unavoidable; in particular, running GROUP BY over a large JOIN result is often the most time-consuming step of the entire SQL statement. Parallelizing GROUP BY is therefore a problem the parallel query engine must solve as a priority.

Taking the annual-consumption TOP 10 customers SQL again as an example, the parallel execution plan after parallelizing GROUP BY is shown in the following figure:

[Figure: parallel execution plan with parallelized GROUP BY]

Compared with the previous execution plan, the new plan has one more collector component, two in total. Look first at the collector in the second row: its extra information contains "Using temporary; Using filesort", which means it performs GROUP BY on the data received from the workers and then sorts it according to the ORDER BY. Because only the first collector component runs in the user session, this second collector is executed in parallel inside the workers; in other words, GROUP BY, ORDER BY, and LIMIT are all performed in parallel. Now look at the collector in the first row: its extra information contains only "Merge sort", meaning the session thread performs a merge sort on the data received from the workers and then returns the result to the user. Two questions may arise here: why can the session thread complete the GROUP BY with nothing but a merge sort? And where is the LIMIT?

Let's answer the second question first. Due to how the explain plan is displayed, the LIMIT operation does not appear in the normal format, but it does appear in the Tree format, as follows:

[Figure: Tree-format execution plan]

The Tree-format plan clearly shows two LIMIT operations. One is at the top of the plan, i.e. in the session, where the data is returned to the user after the limit is applied. The other is in the middle of the plan tree, inside the execution plan of the worker threads: each worker applies a limit after its sort completes, which greatly reduces the amount of data the workers return to the session thread and improves overall performance.
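The effect of pushing LIMIT down to the workers can be sketched like this (invented data; each worker returns at most k sorted rows, and the session merge-sorts those short lists and applies the final LIMIT):

```python
import heapq

def worker_topk(rows, k):
    # Worker-side ORDER BY ... DESC LIMIT k: sort locally, keep top k.
    return sorted(rows, reverse=True)[:k]

def session_merge(worker_results, k):
    # Session-side step: merge the already-sorted per-worker lists
    # (a merge sort, no re-sorting needed), then apply the final LIMIT.
    merged = heapq.merge(*worker_results, reverse=True)
    return list(merged)[:k]

# Hypothetical per-worker partial results before sorting.
partitions = [[5, 42, 7], [99, 1], [13, 8, 64]]
per_worker = [worker_topk(p, 2) for p in partitions]   # at most 2 rows each
top2 = session_merge(per_worker, 2)
```

With the pushdown, each worker ships at most k rows regardless of how many it scanned, which is exactly why the session thread's merge sort is cheap.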

Now the first question: why does GROUP BY only need to be executed once, in the worker threads, to guarantee a correct result? In general each worker holds only one shard of the data, and running GROUP BY on a single shard risks producing wrong results, because rows belonging to the same group may also live in the shards held by other workers. But if we can ensure that all rows of a group land in the same shard, and that shard is held by exactly one worker thread, then the correctness of the GROUP BY result is guaranteed. The Tree-format plan shows that after the parallel JOIN, the JOIN results go through a Repartition operation on the GROUP BY key c.c_name, which distributes rows of the same group to the same worker. The shards held by the workers therefore do not overlap, which guarantees the correctness of the GROUP BY result.

Because each worker's GROUP BY result is already final, the ORDER BY and LIMIT can also be pushed down to the workers, further improving the efficiency of parallel execution.

Linear speedup of TPC-H by the parallel query engine

The figure below shows the speedup the parallel query engine achieves on TPC-H: 100% of the TPC-H queries are accelerated, 70% of them by more than 8x, the total speedup is nearly 13x, and Q6 and Q12 are accelerated by more than 32x.

[Figure: TPC-H speedup results per query]

A continuously iterating and innovating parallel query engine

In short, with the support of the parallel query engine, PolarDB both preserves the stability of the query engine and greatly improves the performance of complex SQL, especially analytical and statistical queries. Through planned, continuous iteration, PolarDB keeps advancing on the road of parallel query and grows ever more capable. To meet customers' ever-growing performance needs and to support the digital upgrade of more business users, PolarDB provides a revolutionary data engine that helps you embrace the future of the Internet of Everything.

Author: 阿里云开发者 (Alibaba Cloud Developer), the official technical account of Alibaba Cloud, published in the Alibaba Cloud Developer Community.