SQL Rewriting Series Seven: Predicate Movement

Guide to the series of articles
OceanBase is 100% independent research and development, has stably supported Double 11 for 9 consecutive years , and innovatively launched a new city-level disaster recovery standard of "three locations and five centers". The native distributed database will be officially open sourced in June 2021. The query optimizer is the core module of the relational database system, the focus and difficulty of the database kernel development, and the "touchstone" for measuring the maturity of the entire database system. In order to help you better understand the OceanBase query optimizer, we will write a series of articles on query rewriting, so that you can better grasp the essence of query rewriting, become familiar with the equivalence of complex SQL, and write efficient SQL. This article is the seventh article in the OceanBase rewrite series. It will focus on introducing the technology of predicate movement. Welcome to discuss~
Columnist Introduction
The OceanBase optimizer team, led by OceanBase senior technical expert Xi Feng and technical expert Shan Wen, is committed to building the world's leading distributed query optimizer.
Contents of the series
This query rewriting series not only includes four modules of subquery optimization, aggregate function optimization, window function optimization, and complex expression optimization, but also more modules, so stay tuned! This article will help readers quickly understand the concept and role of these two strategies by introducing some pushdown and pullup scenarios for predicate movement. Welcome to the OceanBase open source user group ( DingTalk number: 33254054 ), and join the group to communicate with the OceanBase query optimizer team.

I. Introduction

Optimizing predicate inference to improve query performance is critical. A predicate can appear in multiple places in a SQL, it can be the Where/Having clause of the main query; it can also be the Where/Having clause of a subquery. The previous article "SQL Rewriting Series VI: Predicate Deduction" introduced that predicate deduction can only optimize each clause independently; they cannot be optimized comprehensively. This article will focus on predicate movement techniques , which can improve where a predicate appears in SQL, and can also synthesize predicates at multiple locations for derivation optimization.

2. Predicate push-down

Predicates are used to filter data. In general, filtering data as early as possible can reduce the size of the intermediate result set and reduce the amount of data that needs to be processed in subsequent calculations. Therefore, an important optimization point for predicates is predicate pushdown. The main function of this optimization strategy is to "press down" the predicate as much as possible and filter out some data in advance.

In this section, we will take Q1 as an example to introduce several scenarios of predicate pushdown.

Q1: 
SELECT V1.C1, V2.C2, V1.M1, V2.M2 
FROM
    (SELECT C1, 0 AS C2,  MAX(C3) as M1 FROM T1 GROUP BY C1) V1,
    (SELECT C1,      C2,  AVG(C3) as M2 FROM T2 GROUP BY C1, C2 HAVING M2 > 0) V2
WHERE V1.C1 = V2.C1 AND V1.C2 = V2.C2 AND V1.M1 > V2.M2 AND V1.C1 > 10;

The FROM part of Q1 is two inline views V1 and V2 . The WHERE part of the Q1 main query has multiple predicates. Where V1.C1 > 10 is the single table filter condition on the view V1 . A predicate pushdown can push it inside the view.

1. The predicate is pushed into the view

Obviously, we can move the predicate V1.C1 > 10 inside V1. Here, since there is a grouping aggregation operation in V1, the predicate pushed down from the outer layer is put into the Having clause.

 V1.C1 > 10 推入视图

V1-1: SELECT C1, 0 AS C2,  MAX(C3) as M1 FROM T1 GROUP BY C1 HAVING C1 > 10

After the predicate pushdown, the output of V1 guarantees C1 > 10 is necessarily true. Therefore, the determination result of the original main query V1.C1 > 10 is always true, which is a redundant predicate and can be removed directly.

2. Predicate push into grouping

Predicates in having clauses can also attempt to push down further. As you can see, C1 > 10 is a predicate on the grouping column. The effect of executing it before or after grouping is exactly the same. We push it into the Where clause, which can filter out part of the data in advance and reduce the amount of data that needs to be processed by the GROUP BY operation.

 C1 > 10 推入 WHERE

V1-2: SELECT C1, 0 AS C2, MAX(C3) AS M1 FROM T1 WHERE C1 > 10 GROUP BY C1;

Of course, not all predicates in Having can be pushed down to Where. There is a predicate in V2 M2 > 0 which refers to an aggregate function AVG(C3) . The result of the aggregate function depends on the grouped aggregate calculation. Therefore, this predicate cannot be pushed down before the grouping operation.

3. Predicate pushdown and predicate derivation

There are only join predicates on V1 and V2, and the predicate on V1 in the original query of Q1. But with the predicate derivation technique introduced in the previous article, we can derive the filter predicate for V2: V2.C1 > 10 . This derivation predicate can also be pushed into V2, optimizing the computation of V2. As you can see, predicate derivation and predicate pushdown are interactive. Derivation can generate more opportunities for pushdown. Similarly, after pushing down, more derivation opportunities can also be generated. Using these two optimization strategies, we can optimize Q1 to the following query:

 Q1 初步优化结果：
SELECT V1.C1, V1.C2, V1.M1, V2.M2 
FROM
 (SELECT C1, 0 AS C2, MAX(C3) AS M1
  FROM T1
  WHERE C1 > 10
  GROUP BY C1) V1,
 (SELECT C1, C2, AVG(C3) AS M2
  FROM T2
  WHERE C1 > 10
  GROUP BY C1, C2 HAVING M2 > 0) V2
WHERE V1.C1 = V2.C1 AND V1.C2 = V2.C2 AND V1.M1 > V2.M2;

Through predicate derivation and pushdown, we optimize Q1 to some extent. Is the above result the final answer for optimization? the answer is negative. Next, we will introduce another important optimization technique: predicate pullup . It can further optimize the query.

Three, predicate pull-up

This section introduces another predicate optimization technique - predicate pull-up. From the name, readers may intuitively think that this is a reverse optimization strategy to delay the filtering of predicates. The author needs to emphasize that this understanding is wrong . Predicate pull-up is not a reverse optimization of predicate push-down. It mainly "pulls up" the predicates in the inner subquery to the outer query, participates in the predicate derivation process of the outer layer, and helps to generate more meaningful predicates. Next, we continue to introduce some predicate pull-up scenarios to readers by analyzing the optimization points of Q1.

1. Predicate pull-up

There is a Having predicate M2 > 0 in the V2 view. For the main query, the output of V2 must satisfy V2.M2 > 0 ; combined with the predicate V1.M1 > V2.M2 in the main query, we can derive V1.M1 > 0 . Obviously, this derivation predicate can be pushed down into the V1 view. The reader can recall the derivation technique introduced in the previous article. V1 According to MAX(C3) > 0 can further derive a Where filter predicate C3 > 0 .

 上拉 M2 > 0，推导 M1 > 0 并推入 V1，再次推导生成 C3 > 0 

V1-3: SELECT C1, 0 AS C2, MAX(C3) AS M1 
      FROM T1 WHERE C1 > 10 AND C3 > 0
      GROUP BY C1

2. Constant pull-up

There is a special projection expression 0 AS C2 in the V1 view. For the main query, the output of V1 must satisfy V1.C2 = 0 ; Combined with the existing predicate in the main query V1.C2 = V2.C2 , we can derive V2.C2 = 0 . Obviously, this predicate can also be pushed into V2 .

上拉 V1.C2 = 0，推导 V2.C2 = 0 并推入 V2

V2-1: SELECT C1, C2, AVG(C3) AS M2 
      FROM T2 WHERE C1 > 10 AND C2 = 0
      GROUP BY C1, C2 HAVING M2 > 0

It can be seen that the main function of the predicate pull-up is to "pull up" the predicate in an inline view to the main query, which can participate in the predicate derivation of the main query and help generate more new predicates. These newly generated predicates get new down-pressing opportunities. Combining the three optimization techniques, Q1 can be finally optimized as:

 Q1 最终优化形式：
SELECT V1.C1, V1.C2, V1.M1, V2.M2 
FROM
 (SELECT C1, 0 AS C2, MAX(C3) AS M1 
  FROM T1
  WHERE C1 > 10 AND C3 > 0 
  GROUP BY C1) V1, 
 (SELECT C1, C2, AVG(C3) AS M2
  FROM T2
  WHERE C1 > 10 AND C2 = 0
  GROUP BY C1, C2 HAVING M2 > 0) V2
WHERE V1.C1 = V2.C1 AND V1.M1 > V2.M2;

4. Summary

This paper mainly introduces the technique of predicate movement. It contains two aspects, predicate pushdown and predicate pullup. The former is responsible for executing filter predicates as early as possible to reduce the amount of data that needs to be processed in subsequent operations; the latter mainly constructs more predicates to participate in the derivation process and strengthen the derivation effect. Predicate derivation and predicate movement cooperate and reinforce each other, and they are two very important predicate optimization techniques. This article introduces some pushdown and pullup scenarios to help readers understand the concepts and roles of these two strategies. There is a lot more that can be said about optimization of predicates. In the future, we will continue to introduce some optimization capabilities accumulated by OceanBase in terms of predicate derivation or movement.

SQL Rewriting Series Seven: Predicate Movement

I. Introduction

2. Predicate push-down

1. The predicate is pushed into the view

2. Predicate push into grouping

3. Predicate pushdown and predicate derivation

Three, predicate pull-up

1. Predicate pull-up

2. Constant pull-up

4. Summary

OceanBase技术站

引用和评论

OceanBase v4.3.1 列存 FAQ

MySQL慢查询日志：性能优化的终极指南

做到真正0丢失、0重复：Apache SeaTunnel 实现万亿级数据一致性全解密

Devin 发布 DeepWiki，2 星的项目直接装出万星的气场

好用的开源埋点方案-ClkLog埋点用户分析系统

DNS服务器地址大全

实战分享：DolphinScheduler 中 Shell 任务环境变量最佳配置方式