Foreword:
DB2 paging query in the project is , the team leader assigned the optimization of this paging to me; then after a meal optimization (chaotic operation), the DB2 paging query time was reduced to a relatively satisfactory situation, [ Kaisen];
Then immediately reported the result to the team leader. After checking my demo, the team leader found that the paging query is indeed much faster, and it can reach the "acceptable level". Compared with the page before optimization, it is quite good to wait for
Note: The optimized demonstration environment is basically the same as the environment when the paging query is found to be slow, including the amount of data in the library, the configuration of DB2, and the configuration of the server.
First, by looking at the execution plan, it is found that the indexes in the SQL statement are all used, so for the time being, it is not index . Finally, it is found that there is a problem with the SQL statement . The SQL is optimized and the query is fast;
The following is a brief description of how DB2's paging SQL is optimized, and binggou starts;
Main line of this article:
- Prerequisite: the logical execution order of select query SQL
- Optimization process: DB2 paging SQL optimization process
- Knowledge expansion: share some special cases that cause slow paging queries
Prerequisites:
When optimizing query SQL statements, you need to know the logical execution order, which is very helpful for SQL optimization;
The SQL logical execution order refers to the SQL statement according to certain rules, how a whole statement should be executed, and when each keyword and clause part is executed;
1. The logical execution sequence of a simple select query SQL is as follows:
- First execute from table join table to obtain the table to be operated on and the associated table, calculate the Cartesian product of them, and get a virtual table v1;
- Then execute on condition , perform join query and filter on virtual table v1, and get a virtual table v2;
- Then execute where condition to filter the data in virtual table v2 to obtain virtual table v3;
- Then execute the group by statement to group the data in the virtual table v3 to obtain the virtual table v4;
- Then execute Having Conditional Filter , and conditionally filter the grouped virtual table v4 to get the virtual table v5;
- select projection column on the data of the virtual table v5, and only keep the displayed fields in the select to get the virtual table v6;
- Then execute the order by sorting statement to sort the data in the virtual table v6 to obtain the virtual table v7;
- Finally limit (limit the number) , and the virtual table v8 is obtained.
2. Simple analysis of slow query paging SQL:
The above briefly describes the logical execution order of query SQL, and the following will analyze the logical execution order of slow query paging statements;
SELECT * FROM
(
SELECT B.*, ROWNUMBER() OVER() AS RN FROM
(
select
ts.name,
ts.age,
tc.class_name,
ts.describe,
ts.birthday
from t_student ts
LEFT JOIN t_class tc on tc.class_id = ts.class_id
where
ts.age = 23
AND ts.birthday >= TO_DATE('2020-06-12','yyyy-MM-dd')
AND ts.birthday <= TO_DATE('2020-07-15' , 'yyyy-MM-dd')
order by ts.birthday desc
) AS B
)AS A WHERE A.RN BETWEEN 10 AND 20
There is a secondary index in the t_student table: index(age, birthday)
Primary key of t_class table: class_id
2.1, SQL logic execution sequence analysis:
- First, the Cartesian product will be calculated for the t_student ts and t_class tc tables. If the amount of data in the ts table is 10 and the amount of data in the tc table is 10, then the Cartesian product is 10*10 = 100, and the virtual table v1 is obtained.
- Then filter on condition to get virtual table v2
- Then filter the where condition to get the virtual table v3
- Then execute the select projection column, only keep the displayed fields in the select, and get the virtual table v4
- Then sort according to the birthday field to get the virtual table v5
- Finally, the data in the virtual table v5 is paged according to the row number, and the data required by the current page is obtained, and the final result set is obtained
2.2. According to the logical execution sequence obtained by analysis:
- In the first step, the Cartesian product is calculated. If the amount of data in the two tables is very large, the Cartesian product will be very large. The subsequent on condition filtering and where condition filtering will be much slower. Here you can find ways to optimize it.
- In the select projection column, you must return the fields on demand, especially the large fields that are not used on the front-end page, and must not be added to the projection column, like this SQL describe personal description field page does not need to display
- The last paging statement is to filter out the data needed for the current page according to the row number after filtering out all the data. I thought, yes or not, do not filter out all the data, but filter to the data needed by the current page.
Optimization process:
Through the analysis of the logical execution sequence of the above SQL, it is known that there are three places where you can try to optimize and see if the query becomes faster;
1. Reduce Cartesian product:
Because in the subsequent where condition filtering, the main table t_student table is filtered, so you can use the where condition to filter the t_student table in advance to get the virtual table,
Then use this virtual table and the connected table t_class to calculate the Cartesian product. At this time, the Cartesian product is much smaller;
The SQL statement is as follows:
SELECT * FROM
(
SELECT B.*, ROWNUMBER() OVER() AS RN FROM
(
select
ts.name,
ts.age,
tc.class_name,
ts.describe,
ts.birthday
from (select class_id, name, age, describe, birthday
from t_student
where
age = 23
AND birthday >= TO_DATE('2020-06-12','yyyy-MM-dd')
AND birthday <= TO_DATE('2020-07-15' , 'yyyy-MM-dd')
) ts
LEFT JOIN t_class tc on tc.class_id = ts.class_id
order by ts.birthday desc
) AS B
)AS A WHERE A.RN BETWEEN 10 AND 20
2. Unnecessary large fields are prohibited from returning:
In the select projection column, do describe not needed into it, because if it is placed in the projection column, it will occupy more buffer pool space when querying, if the buffer pool space is full , Disk IO is about to be performed, disk IO is very time-consuming;
Also, if there are large fields in the response, network IO will slow down the transmission speed and slow down when the page loads data;
The SQL statement is as follows:
SELECT * FROM
(
SELECT B.*, ROWNUMBER() OVER() AS RN FROM
(
select
ts.name,
ts.age,
tc.class_name,
ts.birthday
from (select class_id, name, age, birthday
from t_student
where
age = 23
AND birthday >= TO_DATE('2020-06-12','yyyy-MM-dd')
AND birthday <= TO_DATE('2020-07-15' , 'yyyy-MM-dd')
) ts
LEFT JOIN t_class tc on tc.class_id = ts.class_id
order by ts.birthday desc
) AS B
)AS A WHERE A.RN BETWEEN 10 AND 20
3. Restrict data filtering when paging:
Paging query is based on pageNo (page number), pageSize (number of pages) to filter the required page data;
For example, if the current page is the second page, each page displays 10 pieces of data, and the current page data filter is rownum between ((pageNo-1) pageSize) and (pageNo pageSize) , then the data before paging When filtering, you can use fetch first (page * rows) rows only limit the amount of data to be filtered, don't filter out all, but only filter to the largest row number in the current page at most;
Due to the use of fetch first (page * rows) rows only, a lot of filtering operations are reduced, and the speed will be much faster, especially when the first few pages are clicked, the speed is very fast, and the more you turn the page back, the response will be corresponding Slower, but when paging, basically the first few pages are viewed, and the latter are rarely viewed, so the effect looks very good at this time
The SQL statement is as follows:
select
ts.name,
ts.age,
tc.class_name,
ts.birthday
from (
select aa.class_id, aa.name, aa.age, aa.birthday
from(
select class_id, name, age, birthday, ROWNUMBER() OVER () AS RN
from t_student
where
age = 23
AND birthday >= TO_DATE('2020-06-12','yyyy-MM-dd')
AND birthday <= TO_DATE('2020-07-15' , 'yyyy-MM-dd')
order by birthday desc
fetch first 20 rows only
) aa where aa.RN BETWEEN 10 AND 20
) ts
LEFT JOIN t_class tc on tc.class_id = ts.class_id
So far, DB2 paging SQL has basically been optimized, and the query speed has increased by more than half. However, there is still room for optimization of this paging SQL. The query speed can be further improved by modifying the SQL statement and index (age, birthday);
Below is a brief description of how to optimize again in the knowledge expansion part;
Knowledge expansion:
1. Further optimization of paging SQL:
First look at the following SQL statement:
select class_id, name, age, birthday, ROWNUMBER() OVER () AS RN
from t_student
where
age = 23
AND birthday >= TO_DATE('2020-06-12','yyyy-MM-dd')
AND birthday <= TO_DATE('2020-07-15' , 'yyyy-MM-dd')
order by birthday desc
fetch first 20 rows only
This SQL statement will use the secondary index index (age, birthday) when executing this SQL statement. Since all the field values cannot be queried on this index, you need to go back to the table to query the class_id, name field values. The operation of returning to the table will cause the query to slow down, so you must find ways to avoid returning to the table at this time;
1.1. Avoid returning to the form method one:
Only modify the index, change the secondary index index (age, birthday) to index (age, birthday, name, class_id) , and the SQL statement does not need to be changed;
However, when the number of index fields increases, the operation of adding, updating, and deleting SQL will have a greater impact, which will cause its execution time to become longer, because the index needs to be maintained; so if there are more write operations on these two tables, It is not recommended to modify the index directly;
1.2. Method two to avoid returning to the form:
This method actually needs to return to the table, but the number of times to return to the table at this time will be much less than the number of times to return to the table before, and it is almost negligible;
This method does not need to modify the index, but modify the SQL statement. First, perform the secondary index index (age, birthday) query. It is no longer necessary to return class_id, name, age, birthday fields, but only returns The primary key stu_id t_student table, there is no need to return to the table at this time;
Then the birthday default sorting in the secondary index index (age, birthday) will be used. Finally, according to fetch first 20 rows only only the first 20 data will be returned. At this time, 20 primary key stu_id values will be returned, which can then be based on the primary key. stu_id left outer connection t_student to obtain other field values;
At this time, it is equivalent to only need to return to the table 20 times, and there is no need to return to the table;
The SQL statement is as follows:
select stu2.class_id, stu2.name, stu2.age, stu2.birthday, ROWNUMBER() OVER () AS RN
from (
select stu_id
from t_student
where
age = 23
AND birthday >= TO_DATE('2020-06-12','yyyy-MM-dd')
AND birthday <= TO_DATE('2020-07-15' , 'yyyy-MM-dd')
order by birthday desc
fetch first 20 rows only ) stu1
left join t_student stu2 on stu1.stu_id = stu2.stu_id
The final optimized paging SQL is as follows:
select
ts.name,
ts.age,
tc.class_name,
ts.birthday
from (
select aa.class_id, aa.name, aa.age, aa.birthday
from(
select stu2.class_id, stu2.name, stu2.age, stu2.birthday, ROWNUMBER() OVER () AS RN
from (
select stu_id
from t_student
where
age = 23
AND birthday >= TO_DATE('2020-06-12','yyyy-MM-dd')
AND birthday <= TO_DATE('2020-07-15' , 'yyyy-MM-dd')
order by birthday desc
fetch first 20 rows only ) stu1
left join t_student stu2 on stu1.stu_id = stu2.stu_id
) aa where aa.RN BETWEEN 10 AND 20
) ts
LEFT JOIN t_class tc on tc.class_id = ts.class_id
2. Special circumstances in paging query:
When performing a paging query, the total amount of data that meets the conditions is generally queried first. This data amount is used for paging to obtain the number of pages;
Then we may encounter some very special situations. The SQL for querying the total amount of data is as follows:
select
count(*)
from t_student ts
LEFT JOIN t_class tc on tc.class_id = ts.class_id
where
ts.age = 23
AND ts.birthday >= TO_DATE('2020-06-12','yyyy-MM-dd')
AND ts.birthday <= TO_DATE('2020-07-15' , 'yyyy-MM-dd')
The above SQL statement can actually be optimized. It does not need to be associated with the t_class table. The specific reasons are as follows:
- One is because the where conditions are all the filtering of the t_student of the main table
- The second is that the data relationship between the main table t_student and the external table t_class is one-to-one
You can directly query the data that meets the conditions in the main table, so the redundant table connections can be removed, which will greatly improve the SQL query speed;
Under the gossip, analyze why someone writes this way?
The guess is mainly for the convenience of illustration, and the directly copied paging SQL statement is modified [dumbly];
❤ Like + comment + forward
If this article is helpful to you, please wave your little hand that loves to make a fortune and like it. Your support is the motivation for my continuous creation, thank you!
You can search on [ public account, a large number of Java learning dry goods articles, you can come and have a look!
**粗体** _斜体_ [链接](http://example.com) `代码` - 列表 > 引用
。你还可以使用@
来通知其他用户。