Summary: GaussDB (for MySQL) parallel query faster? Huawei Cloud Database Kernel Expert Says This
This article is shared from the HUAWEI CLOUD community " HUAWEI CLOUD database kernel experts reveal for you: How fast is GaussDB (for MySQL) parallel query?" ", the original author: Rao Longhui, Huawei cloud database kernel expert.
Recently, the Huawei Cloud database team launched an expert technical interpretation series, which is based on GaussDB products and conducts in-depth analysis around 1 core technology point in each issue. Chief Scientist of the Database Lab of the Huawei Sweden Research Institute, interpreted 161370dfb4c4c2 "How powerful is the vertical integration of GaussDB (for MySQL) cloud stack?" ” . In this issue, Huawei Cloud Database Kernel Expert Rao Longhui will give a detailed interpretation of the "Parallel Query" (PQ: Parallel Query) of the key feature of GaussDB (for MySQL) introduced in Lu Manyi's article.
Generally speaking, databases have two very broad application scenarios, namely "transaction processing" and "query analysis". Open source MySQL, as one of the most used relational databases in China, began to support transaction processing scenarios as early as version 5.1. With the evolution of technology, open source MySQL not only continues to enhance transaction processing capabilities in the version update iteration process, and the latest MySQL 8.0 version has begun to support many query analysis features, such as Window function, CTE, Hash Join, column histogram, parallelism COUNT calculation and so on.
Huawei Cloud GaussDB (for MySQL) is Huawei’s latest generation of high-performance enterprise-level distributed relational database. It is rooted in the MySQL open source community. It not only inherits all the open source query and analysis features, but also develops many enhanced features for enterprise-level application query loads, such as Parallel query, query calculation pushdown (NDP: Near Data Processing), etc.
In traditional data solutions, "transaction processing" and "query analysis" are respectively completed on different types of libraries. This is because transaction processing focuses on data addition, deletion, modification, and small data query, and more attention is paid to real-time response , High throughput and transactional requirements. The "query analysis" focuses on large data volume and complex calculations, and pays more attention to data capacity scalability and complex computing capabilities. The following figure shows the traditional data solution: OLTP database meets the needs of online transaction processing, and OLAP data warehouse meets the needs of analytical query.
What kind of database does the customer need?
The current industry's general requirements for databases are: real-time response, high throughput, transactional satisfaction, good scalability, and support for complex queries. This has many advantages for users: firstly, it reduces deployment costs, users only need to deploy a set of database clusters; secondly, it solves the problem of data migration delay, and updates and queries run in the same cluster; and It is to support transactional queries, meet the requirements of consistency, principle, isolation and durability, and query results will be more accurate.
In response to user needs, Huawei Cloud native database GaussDB (for MySQL) through technological innovation has the ability to simultaneously satisfy transaction processing and query analysis. GaussDB (for MySQL) is an OLTP transactional relational database that is 100% compatible with MySQL syntax. The architecture adopts the design of separation of computing and storage, and computing resources and storage resources can be independently expanded online. The storage layer uses Huawei's self-developed distributed storage system DFV (Data Function Virtualisation), which can be expanded to a maximum capacity of 128TB. The maximum specification of a single computing node can support a 64-core CPU and a memory capacity of 512G, which can support up to 1 Two write nodes and 15 read-only nodes.
In addition to powerful resource expansion capabilities, GaussDB (for MySQL) is optimized for transaction processing and query analysis, which greatly enhances the capabilities of transaction read and write and query analysis. This article will focus on one of the enhanced query analysis capabilities: parallel query.
The way to improve query analysis: release CPU multi-core computing resources
As we all know, the improvement of software computing power benefits from the enhancement of CPU hardware capabilities on the one hand, and the ability to fully utilize the computing resources of the CPU at the software design level on the other hand. Current processors generally use multi-core designs, such as GaussDB (for MySQL), a single node can support up to 64-core CPUs. The single-threaded query method can use at most the CPU resources of one core, and the performance improvement is limited, which is far from meeting the requirements for reducing the delay in the enterprise large data query scenario. Therefore, the complex query analysis computing process must consider the full use of the multi-core computing resources of the CPU, and allow multiple cores to participate in parallel computing tasks in order to greatly improve the processing efficiency of query computing.
The following figure is an example of the process of using CPU multi-core resources to calculate the count ( ) of a table in parallel: the table data is diced and distributed to multiple cores for parallel calculation, and each core calculates partial data to obtain an intermediate count ( ) result. And in the final stage, all intermediate results are aggregated to get the final result.
GaussDB (for MySQL) query analysis and improvement tips: parallel query
GaussDB (for MySQL) supports parallel execution of queries, which is used to reduce the processing time of analytical query scenarios and meet the low latency requirements of enterprise-level applications. As mentioned earlier, the basic implementation principle of parallel query is to split the query task and distribute it to multiple CPU cores for calculation, and make full use of the multi-core computing resources of the CPU to shorten the query time. The performance improvement factor of parallel query is theoretically positively related to the number of CPU cores, that is, the higher the parallelism, the more CPU cores can be used, and the higher the performance improvement factor.
The following figure shows: COUNT(*) query time-consuming to query 100G data volume on 64U instance of GaussDB (for MySQL), different query concurrency corresponds to different time-consuming, the higher the concurrency, the more time-consuming the query short.
GaussDB (for MySQL) supports multiple types of parallel query operators to meet various complex query scenarios of customers. The current parallel query scenarios supported by the latest version (2021-9) include:
Primary key query, secondary index query
Primary key scan, index scan, range scan, index equivalent query, index reverse query
Parallel condition filtering (where/having), projection calculation
Parallel multi-table JOIN (including HashJoin, NestLoopJoin, SemiJoin, etc.) query
Parallel aggregate function operations, including SUM/AVG/COUNT/BIT_AND/BIT_OR/BIT_XOR, etc.
Parallel expression operations, including arithmetic operations, logical operations, general function operations and mixed operations, etc.
Parallel group by, sort order by, limit/offset, distinct operations
Parallel UNION, subquery, view query
Parallel partition table query
Data types supported by parallel query include: integer, character, time, floating point, etc.
Other inquiries
The following figure is the performance test result of GaussDB (for MySQL) parallel query for 22 query scenarios of TPC-H. The test data volume is 100G, and the concurrent thread data is 32. The following figure shows the performance improvement of parallel query compared to traditional MySQL single-threaded query: Under 32 parallel execution, single table complex query performance can be improved by up to 26 times, and the performance is generally improved by 20+ times. Multi-table JOIN complex queries can improve performance by up to nearly 27 times, and generally improve performance by 10+ times. Subquery performance has also been greatly improved.
All in all, GaussDB (for MySQL) parallel query fully uses the multi-core computing resources of the CPU, greatly reduces the processing time of analytical query scenarios, greatly improves database performance, and can well meet the needs of customers in a variety of complex query scenarios. Delay requirements. At present, the parallel query function of GaussDB (for MySQL) is available online, and more parallel query scenarios are constantly being unlocked. Welcome to the official website of Huawei Cloud to experience
Ps: After reading it, I don’t think it’s not enough? In the next issue, another expert of ours will bring you the interpretation of GaussDB (for MySQL) "Parallel DDL" technology, more exciting, so stay tuned!
Rao Longhui, Huawei Cloud Database Kernel Expert. More than ten years of database kernel research and development experience, proficient in MySQL kernel principles and source code. Responsible for the functional planning, design and development of the high-performance SQL query engine of Huawei Cloud Database GaussDB (for MySQL).
For more information, please visit the official website www.huawei.com/hc2021
Click to follow, and learn about the fresh technology of Huawei Cloud for the first time~
**粗体** _斜体_ [链接](http://example.com) `代码` - 列表 > 引用
。你还可以使用@
来通知其他用户。