Internal Verification Tutorial Issue 4_Take you into the database SQL engine

As one of the three core modules of the database system, the SQL engine plays a linking role. The SQL engine is responsible for accepting the SQL request input by the client, and according to the load scenario, the SQL statement is processed by its parsing, optimization, execution and other modules, and then the result is returned to the client.

This seemingly simple running process involves the functions and working principles of each module of the SQL engine. At 19:30 on September 1st, the fourth phase of the internal verification tutorial will take you into the database SQL engine and learn about the various knowledge points of the SQL engine. At the same time, it will take you to understand the practice and experience of OceanBase in query rewriting and query optimization. As we all know, the query rewriting module is the focus and difficulty of the optimizer, and it is also the basic knowledge that SQL performance tuning workers and DBAs need to master. Through this tutorial, you will learn a lot of rewriting rules, master the essence of rewriting and optimization, design more efficient SQL.

In addition, in this tutorial, OceanBase technical experts will take you online to analyze the implementation ideas of MiniOB select-meta, and teach you the actual database kernel.

What problems can this issue help you solve?
1. How does the " query rewrite " module rewrite SQL so that it is "good SQL" for the kernel?
2. The execution performance can be improved by writing SQL statements. Why do you need a query optimizer to optimize SQL?
9.1.png

Live broadcast content grabs "fresh" knowledge

query rewrite

Why rewrite SQL?
We know that the query rewriting module has always been the focus and difficulty of the query optimizer, because SQL is a descriptive language, so in order to obtain the same query result, different users will write different SQL. The purpose of query rewriting is to convert "user's good SQL" into more easily optimized SQL.

In the process of query rewriting, there are usually two challenges.
The first challenge is correctness. Correctness is actually a very important indicator. If the rewritten SQL is not equivalent, it means that the returned result may be incorrect. In OceanBase, every time the staff rewrites the rules, they must make sure that they are semantically equivalent. Sometimes they need to prove that the rewrite rules are correct from the perspective of relational algebra, or ensure the correctness through a large number of test scenarios.
The second challenge is completeness, that is to say, the rewriting rules of the design must be general enough to cover all possible writing methods of users.

The completeness of query rewriting is reflected in the pattern matching, and the correctness is reflected in the equivalent change. In addition, query rewriting should also pay attention to the judgment of validity.
In OceanBase database, all query rewriting can be divided into two categories. One is rule-based rewriting, which always rewrites SQL in a good direction. Its main feature is that triggering this type of rewriting can always generate a better execution plan, and this type of rewriting algorithm is always effective.

The second type is cost-based rewriting. Its main feature is that in some scenarios, better execution plans can be generated after rewriting, but in some scenarios, it cannot. Whether this type of rewrite can be triggered requires the optimizer to evaluate the cost of the SQL plan before and after the rewrite, and decide whether to trigger the rewrite based on whether the rewrite can reduce the plan cost.

In the actual execution process, OceanBase's query rewriting framework will continue to iterate rule-based rewriting and cost-based rewriting in a predefined order until convergence.

At present, the rewriting module of OceanBase implements a lot of rewriting rules, including common rewriting rules in academia and industry, as well as rewriting rules extracted by optimizer developers from actual business scenarios. For example, rule-based constant folding, SPJ and non-SPJ view merging, join elimination, etc., cost-based Or-expansion, Group by replacement, window function rewriting, etc.

Query optimization

The physical optimizer of OceanBase mainly includes two parts: plan enumeration and cost calculation. Plan enumeration can enumerate various plan shapes including Deep Tree and Bushy Tree, and includes two enumeration algorithms including DP and Linear. Cost calculation involves statistical information, selectivity calculation and intermediate result estimation, as well as the impact of various factors such as cost model.

Plan enumeration is a deterministic algorithm. If there are three connections, six execution plans appear. Each execution plan will calculate the cost, and finally choose the execution plan with the least cost.

Under the star query, if there are more than 25 join tables, the logical execution plan has reached the level of 200 million without considering physical operator implementation, cost-based rewriting, and distributed optimization. Therefore, how to efficiently enumerate execution plans is an important challenge for query optimization.

In OceanBase, if the number of base tables is less than ten, the dynamic programming algorithm will be used. First, enumerate the execution plans of all one table; then, enumerate the execution plans of two tables; then, enumerate the execution plans of three tables, and so on. If there are more than ten base tables, there will be a heuristic algorithm in the open source version to quickly converge the execution plan.

When OceanBase enumerates execution plans, in addition to retaining the execution plan with the least cost, it also retains the plan with interesting order. Because the ordering of plans with interesting order may be used by subsequent operators to eliminate the ordering.

The execution time of index (a) is ten seconds; the execution time of index (b) is also ten seconds; the execution plan time of the main table is five seconds. At this time, since the cost of the main table is the smallest and p1 is an ordered index, p1 and p3 are reserved.
When the connection sequence is generated, the optimizer will assign other operators in sequence, such as group by, window function, order by, etc. In the process of allocating other operators, the principle of retaining the plan with the least cost and the plan with interesting order is still followed, and the plan with the lowest cost is regarded as the final execution plan after all operators are allocated.

For more details, please pay attention to the official course "From 0 to 1 Database Internal Verification Tutorial" at 19:30 on September 1.

appendix:
Inner Verification Tutorial Phase 1 | The first step to becoming a kernel developer: building a research and development environment
Internal Verification Tutorial Phase 2｜Take you to uncover the mystery of the database storage structure
Internal Verification Tutorial Issue 3 | Why can indexes make queries faster?
Course playback

Quickly scan the QR code below to enter the "OceanBase entry to actual combat" group, follow the course dynamics, and learn and progress with more friends

In order to help you better learn database knowledge and make new friends, OceanBase Meetup will also go to more cities in the future. After joining the group, you can modify your group nickname [Format: Name-City-Position]

Internal Verification Tutorial Issue 4_Take you into the database SQL engine

Live broadcast content grabs "fresh" knowledge

query rewrite

Query optimization

OceanBase技术站

引用和评论

OceanBase v4.3.1 列存 FAQ

做到真正0丢失、0重复：Apache SeaTunnel 实现万亿级数据一致性全解密

在 Kubernetes 上用 KubeBlocks + Dify 快速构建生产级 AIGC 应用

数据库的下一场革命：S3 延迟已降至原先的 10%，云数据库架构该进化了

Ape-DTS：开源 DTS 工具，助力自建 MySQL、PostgreSQL 迁移上云

好用的开源埋点方案-ClkLog埋点用户分析系统

Devin 发布 DeepWiki，2 星的项目直接装出万星的气场