Paging queries come up constantly in development, but once you turn too many pages you run into deep paging, and query efficiency drops sharply.
Is there a way to solve the deep paging problem?
This article summarizes three optimization schemes that improve query efficiency by up to 10x. Let's work through them.
1. Prepare the data
First create a user table, with an index only on the create_time field:
CREATE TABLE `user` (
  `id` int NOT NULL AUTO_INCREMENT COMMENT 'primary key',
  `name` varchar(255) DEFAULT NULL COMMENT 'name',
  `create_time` timestamp NULL DEFAULT NULL COMMENT 'creation time',
  PRIMARY KEY (`id`),
  KEY `idx_create_time` (`create_time`)
) ENGINE=InnoDB COMMENT='user table';
Then insert 1 million rows of test data into the user table; a stored procedure makes this easy:
DROP PROCEDURE IF EXISTS insertData;
DELIMITER $$
CREATE PROCEDURE insertData()
BEGIN
  DECLARE i INT DEFAULT 1;
  WHILE i <= 1000000 DO
    INSERT INTO user (name, create_time) VALUES (CONCAT('name', i), now());
    SET i = i + 1;
  END WHILE;
END $$
CALL insertData() $$
DELIMITER ;
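The procedure inserts rows one at a time, so the call may take a while. As a quick sanity check once it finishes, the row count should match the number of test rows you inserted:

```sql
select count(*) from user;
```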
2. Validate deep paging issues
With 10 records per page, querying the first page is very fast:
select * from user
where create_time>'2022-07-03'
limit 0,10;
It returns in well under 0.01 seconds, so no execution time is even displayed.
But when we jump deep, to around page 10,000 (an offset of 100,000 rows), query efficiency drops sharply:
select * from user
where create_time>'2022-07-03'
limit 100000,10;
The execution time has grown to 0.16 seconds, a performance drop of more than a factor of ten.
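To see why, check the execution plan. The row estimates and the chosen access path will vary with your data distribution and MySQL version:

```sql
explain select * from user
where create_time>'2022-07-03'
limit 100000,10;
-- Depending on statistics, MySQL either range-scans idx_create_time or
-- falls back to a full table scan (type=ALL). Either way, it must produce
-- and discard the first 100,000 matching rows before returning the 10
-- rows we actually want.
```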
Where does the time go?
- MySQL has to scan 100,010 matching rows and throw away the first 100,000; only the last 10 are returned, so almost all of that scanning is wasted work
- idx_create_time is a non-clustered (secondary) index, so each scanned row first yields only the primary key id, and a back-to-table lookup on the clustered index is needed to fetch all the other fields
The back-to-table lookup proceeds in two steps:
1. Find the primary key ids of matching rows via the idx_create_time index
2. Use each primary key id to fetch the full row from the clustered index
If you're wondering why the B+ tree structure makes it work this way, see the two earlier articles:
Why does MySQL use a B+ tree for its index implementation?
One article explaining MySQL's clustered/composite/covering indexes, back-to-table lookups, and index condition pushdown
Next, let's optimize away these two costs.
3. Optimize the query
3.1 Use a subquery
First use a subquery to find the primary key ids that meet the condition, then use those ids to fetch all the fields:
select * from user
where id in (
select id from user
where create_time>'2022-07-03'
limit 100000,10
);
However, this query reports an error: MySQL doesn't allow LIMIT inside an IN subquery ("This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'"). Wrapping it in one more layer of subquery works around the restriction:
select * from user
where id in (
select id from (
select id from user
where create_time>'2022-07-03'
limit 100000,10
) as t
);
The execution time shrinks to 0.05 seconds, a reduction of 0.11 seconds, roughly a threefold improvement in query performance.
Why does finding the qualifying primary key ids in a subquery first make the query faster?
Look at the execution plan with explain:
explain select * from user
where id in (
select id from (
select id from user
where create_time>'2022-07-03'
limit 100000,10
) as t
);
The Extra column shows Using index for the subquery, meaning a covering index is used: selecting only id with a condition on create_time can be answered entirely from idx_create_time (a secondary index stores the primary key alongside the indexed column), so the 100,010 scanned index entries need no back-to-table lookups, and only the final 10 rows are fetched from the table.
3.2 Use an inner join
Treat the subquery result as a derived table and join it back to the original table:
select * from user
inner join (
select id from user
where create_time>'2022-07-03'
limit 100000,10
) as t on user.id=t.id;
Query performance is the same as using subqueries.
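You can verify this with explain as well; the exact plan depends on the MySQL version, but typically the derived table t is built entirely from the index, and each joined row costs a single primary-key lookup:

```sql
explain select * from user
inner join (
    select id from user
    where create_time>'2022-07-03'
    limit 100000,10
) as t on user.id=t.id;
-- derived table t: Using index (covered by idx_create_time, no table access)
-- user: eq_ref lookup on PRIMARY, so only the final 10 rows are read
-- from the table
```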
3.3 Use a paging cursor (recommended)
The idea, often called cursor or keyset pagination, is this: when querying page two, feed the results of page one into page two's query condition.
For example, first query page one:
select * from user
where create_time>'2022-07-03'
limit 10;
Then query page two, using the last id returned by page one as an extra condition:
select * from user
where create_time>'2022-07-03' and id>10
limit 10;
Every page now starts scanning from where the previous page left off, so each page costs the same as page one and deep paging never occurs.
The execution time rounds down to 0 seconds, improving query performance by dozens of times. This is the recommended approach.
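One caveat: `id > 10` works in this demo only because the ids happen to be assigned sequentially. In general, the condition should use the id of the last row actually returned by the previous page, and an explicit ORDER BY keeps the page order deterministic. A sketch, where `:last_id` is a placeholder the application binds from the previous page's results:

```sql
select * from user
where create_time>'2022-07-03' and id > :last_id
order by id
limit 10;
-- after each page, set :last_id to the id of the last returned row
```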
As convenient as this method is, it has a limitation: you can't jump to an arbitrary page; you can only page through sequentially.
So it only suits certain scenarios, such as the feed on the home page of a news app.
Consumer apps generally use an infinite-scroll (waterfall) feed, like the Baidu and Toutiao home pages: users swipe down to load the next page and never need to jump to a specific page number.
You can see this in Toutiao's feed requests: the request parameters carry the results of the previous page, and the response returns the query condition for the next page.
So this query method actually fits a wide range of scenarios; it's well worth adopting.
4. Summary
- Deep paging is slow because MySQL scans and discards all the offset rows, and each row scanned via a secondary index costs a back-to-table lookup
- A subquery or inner join over the covering index avoids those lookups, cutting the time from 0.16s to 0.05s here
- Cursor-based paging makes every page as cheap as the first, but only supports sequential page-turning
This article series is continuously updated; search for "One Light Architecture" on WeChat to read more technical articles as soon as they're published.