introduction
As we all know, inner joins and outer joins are the two most common table joins in databases; outer joins include left outer join, right outer join and full outer join. Figure 1 shows a schematic diagram of different table connections in the database (the dark part represents the result after the final connection). It is not difficult to find from the figure that the connection order of the inner connection can be exchanged at will, but the outer connection cannot be Yes, arbitrary swapping of join order allows the database's query optimizer to generate more alternative plans.
Figure 1 Connection diagram
Another advantage of inner joins compared to outer joins is that inner joins can press the base table predicate conditions at will, but outer joins are limited. For example, in the case of movie theaters, the theater uses Q1 to query all unscheduled and pending movies. The price of movies is scheduled and priced. In Q1, the predicate condition PLAY.price is null cannot be pushed down to the PLAY table in advance for data filtering; on the contrary, if the theater uses Q2 to query all movies with pending prices for pricing, it can be set in advance. PLAY.price is null The predicate condition is pressed to the PLAY table for data filtering, which can reduce the amount of connected data and improve query efficiency. You can first understand the difference between the above scenarios, and the principle behind it will be introduced below.
-- 影片表
MOVIE(movie_id, movie_name, release_date)
-- 排片表
PLAY(play_id, movie_id, time, price, seats)
-- 查询所有未排片和排片待定价格的电影
Q1:
SELECT MOVIE.movie_name,
PLAY.time,
PLAY.price
FROM MOVIE
LEFT JOIN PLAY
ON MOVIE.movie_id = PLAY.movie_id
where PLAY.price is null;
-- 查询所有排片待定价格的电影
Q2:
SELECT MOVIE.movie_name,
PLAY.time,
PLAY.price
FROM MOVIE
INNER JOIN PLAY
ON MOVIE.movie_id = PLAY.movie_id
where PLAY.price is null;
To sum up, compared with outer joins, inner joins have greater optimization space in the database query optimizer (more join order selection, wider base table predicate conditions), so in OceanBase, according to different In the query scenario, a series of strategies for outer join to inner join are designed and implemented. The following will mainly introduce the strategies of these outer joins to inner joins.
Common scenario of outer join to inner join
In the outer join of the database, if the data that meets the connection conditions cannot be found in the probe table (the right table of the outer join), it is necessary to fill in the null output for the projection column of the probe table in the corresponding join result, as shown in the left side of Figure 2. The outer join diagram shows a simple outer join situation. Considering the behavior of outer join, it is not difficult to see that if there is a base table predicate condition for the probe table that can filter out the null situation in the join result after the join, then the result of the outer join and the result of the inner join are In the same way, the outer join can be rewritten as an inner join.
Figure 2 Schematic diagram of left outer join For the probe table in the above query, there is a base table predicate condition that can filter null. The first thing that comes to mind is whether there is a column is not null condition, so that it can quickly judge whether the current outer join can be rewritten as an inner join. Figure 3 depicts different scenarios of outer join to inner join based on the column is not null base table predicate condition:
Figure 3 Schematic diagram of outer join to inner join The above scenario describes a basic strategy of outer join to inner join, that is, one can find the ones that can filter null in the relevant base table predicate conditions of the probe table, of which column is not null is the simplest Intuitively filter the base table predicate condition for null, in fact, there are more scenarios that can convert outer join to inner join, such as the following two general scenarios.
Scenario 1: Outer join to inner join with WHERE condition
Before introducing the outer join to inner join based on the WHERE condition, we first introduce an important concept - the null value rejection condition, as the name suggests, is the predicate condition that rejects all nulls, which is described above in the relevant base table predicate condition Find a condition that filters nulls. Therefore, the key to converting an outer join to an inner join based on the WHERE condition is whether it can find the null rejection condition of the probe table.
As can be seen from the above, column is not null is actually the most simple and intuitive null value rejection condition; in addition, there are many null value rejection conditions, for example, Q3 is used to query the price of arranging slices higher than 30 yuan Movies, in which PLAY.price > 30 is essentially a null rejection condition, because when the value of PLAY.price is null, the comparison result of null > 30 is empty in terms of SQL semantics, that is, the query result will not There are rows where PLAY.price is null, so as to filter out all the rows where the projection column of the probe table is filled with null because the join condition is not satisfied. Therefore, Q3 is equivalent to Q4, that is to say, if the query Q3 input by the user is rewritten as Q4, the subsequent optimizer's plan generation selection will have more optimization space.
Q3:
SELECT MOVIE.movie_name,
PLAY.time,
PLAY.price
FROM MOVIE
LEFT JOIN PLAY
ON MOVIE.movie_id = PLAY.movie_id
where play.price > 30;
Q4:
SELECT MOVIE.movie_name,
PLAY.time,
PLAY.price
FROM MOVIE
INNER JOIN play
ON MOVIE.movie_id = PLAY.movie_id
where PLAY.price > 30;
Of course, not all base table filter conditions are null-rejected, such as Q1 and Q2, the two are not equivalent in semantics, because column is null is not a null-value rejection condition, so it is based on the WHERE condition. When rewriting the outer connection to the inner connection, you need to pay special attention to the coverage scenario of the null value rejection condition to avoid falling into the rewriting trap. Common null rejection condition scenarios are as follows:
- Some conditions directly related to null comparison, such as column is not null;
- Ordinary comparison operation expressions, such as "=" "!=" ">" "<" "in" "not in", etc.;
Match related expressions, such as "regexp" "like", etc.
Scenario 2: Outer join to inner join of primary and foreign keys
Before introducing the use of primary and foreign key constraints to convert outer joins to inner joins, let's briefly introduce primary and foreign key constraints. If you know the database, then you should know that the primary key is a set of non-null unique values to distinguish each row in the table, and the foreign key is used to establish the connection between the two tables, from the table (the table where the foreign key is located) Some columns depend on some columns in the main table, and foreign key columns can only refer to the value or null value of the column in the main table, so as to achieve the purpose of constraining the consistency and integrity of the data in the two tables.
So how do outer joins use primary foreign key constraints to convert to inner joins? Let's look at an example. Two tables, MOVIE and PLAY, are created as follows. The PLAY.movie_id (non-null column) foreign key depends on MOVIE.movie_id (the primary key column). The difference between Q5 and Q6 is: the position of the slave table in the outer join is different, the slave table is the right table (probe table) of the outer join in Q5, and the left table (build table) of the outer join in Q6; Think about it, can both Q5 and Q6 be able to rewrite outer joins to inner joins?-- 影片表 MOVIE(movie_id, movie_name, release_date, PRIMARY KEY(movie_id)) -- 排片表 PLAY(play_id, movie_id not null, time, price, seats, foreign key(movie_id) REFERENCES MOVIE(movie_id)) Q5: SELECT MOVIE.movie_name, PLAY.time, PLAY.price FROM MOVIE LEFT JOIN PLAY ON MOVIE.movie_id = PLAY.movie_id; Q6: SELECT MOVIE.movie_name, PLAY.time, PLAY.price FROM PLAY LEFT JOIN MOVIE ON MOVIE.movie_id = PLAY.movie_id;
OK, first announce the answer: Q5 is not allowed, Q6 is allowed!
First analyze why Q6 is possible, because the equi-join conditions of the outer join come from the primary foreign key column, and the secondary table PLAY.movie_id is a non-null column, indicating that the values of the secondary table PLAY.movie_id come from the primary table MOVIE.movie_id , so when the slave table PLAY is used as the left table of the outer join and the equi-value join conditions all come from the primary foreign key column, it means that the corresponding column can always be found in the master table for the slave table PLAY.movie_id, and there will be no scenario of supplementing null , this is the reason why Q6 allows rewriting; on the contrary, if the main table MOVIE is used as the left table of the outer join, there is no guarantee that the data corresponding to the main table row can be found in the slave table. Overwriting is not allowed.a rewrite trap
Now we understand the principle of rewriting from outer connection to inner connection based on primary and foreign keys. In fact, the restrictions on rewriting based on primary and foreign keys are very strict. If you are not careful, you will fall into the trap of rewriting. For example, the following Q7 is compared to Q6 has an additional non-equivalent join condition for primary and foreign key columns, but Q7 does not allow rewriting. I believe you have seen the reason, mainly because the newly added MOVIE.movie_id > 100 predicate condition will destroy the primary and foreign key constraints in the join process. When the equi-join condition is met but the range predicate condition is not met, the Complement the null output, so the rewriting of the outer connection to the inner connection cannot be performed. This reminds us to pay special attention to whether the constraint relationship between the primary and foreign keys is broken when rewriting the outer join based on the primary foreign key to the inner join.
Q7: SELECT MOVIE.movie_name, PLAY.time, PLAY.price FROM PLAY LEFT JOIN MOVIE ON MOVIE.movie_id = PLAY.movie_id AND MOVIE.movie_id > 100; --不可发生改写
summary
In addition, because the conditions for converting an outer join to an inner join based on primary and foreign keys are harsh, here is a brief summary of some basic conditions for using primary and foreign key constraints to convert an outer join to an inner join:
- The join condition is only the equi-join of the primary and foreign key columns;
- The columns used in the equi-join condition are in one-to-one correspondence with the primary and foreign key columns;
- The foreign key column in the join condition has a non-null property;
- The slave table (the table where the foreign key is located) is the left table of the outer join.
In addition to the above basic conditions, there are actually some restrictions for special scenarios, such as limiting the partition of the main table (using partition hint), which cannot be rewritten due to the possibility of supplementing null output. Of course, the above basic conditions can also be generalized. For example, for the condition that "the foreign key column in the join condition has a non-null property", if there is no non-null column defined in the table, can you also find the corresponding foreign key in the WHERE condition? The null value of the key column rejects the condition, so that the rewrite of the outer join to the inner join can be performed. This article is just a guide, you can think deeply and enrich more rewriting scenarios and restrictions.
Summarize
This article mainly introduces some strategies for converting outer joins to inner joins. Outer joins to inner joins are indispensable for the database query optimizer. Inner joins allow more join order selections and base table predicate conditions, which help the query optimizer to select a better query execution plan and improve query efficiency. Therefore, outer join to inner join is a very important optimization technique.
**粗体** _斜体_ [链接](http://example.com) `代码` - 列表 > 引用
。你还可以使用@
来通知其他用户。