1
头图

Development often encounters the need for paging queries, but when too many pages are turned, deep paging will occur, resulting in a sharp drop in query efficiency.

Is there any way to solve the problem of deep paging?

This article summarizes three optimization schemes, and the query efficiency is directly improved by 10 times. Let's learn together.

1. Prepare the data

First create a user table and only add an index on the create_time field:

 CREATE TABLE `user` (
  `id` int NOT NULL AUTO_INCREMENT COMMENT '主键',
  `name` varchar(255) DEFAULT NULL COMMENT '姓名',
  `create_time` timestamp NULL DEFAULT NULL COMMENT '创建时间',
  PRIMARY KEY (`id`),
  KEY `idx_create_time` (`create_time`)
) ENGINE=InnoDB COMMENT='用户表';

Then insert 1 million pieces of test data into the user table, here you can use the stored procedure:

 drop PROCEDURE IF EXISTS insertData;
DELIMITER $$
create procedure insertData()
begin
 declare i int default 1;
   while i <= 100000 do
         INSERT into user (name,create_time) VALUES (CONCAT("name",i), now());
         set i = i + 1; 
   end while; 
end $$

call insertData() $$

2. Validate deep paging issues

10 items per page, when we query the first page, the speed is very fast:

 select * from user 
where create_time>'2022-07-03' 
limit 0,10;

image-20220703181532231.png

It returned directly in less than 0.01 seconds, so the execution time is not shown.

When we turn to page 10,000, the query efficiency drops sharply:

 select * from user 
where create_time>'2022-07-03' 
limit 100000,10;

image-20220703181904656.png

The execution time has become 0.16 seconds, and the performance has dropped by at least dozens of times.

Where is the time spent mainly?

  1. The first 10 pieces of data need to be scanned, the amount of data is large and time-consuming
  2. create_time is a non-clustered index, you need to query the primary key ID first, then return the table query, and query all fields through the primary key ID

Draw the return form query process:

1. First query the primary key ID through create_time

image-20220703204919992.png

2. Then query all fields in the table through the primary key ID

image-20220703205108719.png

Don't ask why the structure of the B+ tree is like this? Asking is the rule.

Take a look at the first two articles.

Take a look at the first two articles.

Why use B+ tree for the underlying implementation of MySQL index?

An article explains MySQL's clustering/joining/covering index, return table, and index pushdown

Then we optimize for these two time-consuming reasons.

3. Optimize the query

3.1 Using subqueries

First use the subquery to find out the primary key that meets the conditions, and then use the primary key ID as the condition to find all the fields.

 select * from user 
where id in (
  select id from user 
  where create_time>'2022-07-03' 
  limit 100000,10
);

However, this query will report an error, saying that limit is not supported in subqueries.

image-20220703205602830.png

We add a layer of sub-query nesting, and that's it:

 select * from user 
where id in (
 select id from (
    select id from user 
    where create_time>'2022-07-03' 
    limit 100000,10
 ) as t
);

image-20220703205912970.png

The execution time is shortened to 0.05 seconds, a reduction of 0.12 seconds, which is equivalent to a three-fold improvement in query performance.

Why use subquery to find out the qualified primary key ID first to shorten the query time?

We use explain to see the execution plan to understand:

 explain select * from user 
where id in (
 select id from (
    select id from user 
    where create_time>'2022-07-03' 
    limit 100000,10
 ) as t
);

image-20220703215830336.png

You can see that the Extra column shows that the Using index is used in the subquery, which means that the covering index is used, so the subquery does not need to query the table, which speeds up the query efficiency.

3.2 Using inner join to associate query

Treat the result of the subquery as a temporary table, and then perform an associated query with the original table.

 select * from user 
inner join (
   select id from user 
    where create_time>'2022-07-03' 
    limit 100000,10
) as t on user.id=t.id;

image-20220703220449618.png

Query performance is the same as using subqueries.

3.3 Use paging cursors (recommended)

The implementation method is: when we query the second page, the query results of the first page are put into the query conditions of the second page.

For example: first query the first page

 select * from user 
where create_time>'2022-07-03' 
limit 10;

Then query the second page, and put the query results of the first page into the query conditions of the second page:

 select * from user 
where create_time>'2022-07-03' and id>10 
limit 10;

This is equivalent to querying the first page every time, and there is no problem of deep paging. It is recommended to use it.

image-20220703222259556.png

The execution time is 0 seconds, and the query performance is directly improved by dozens of times.

Although this query method is easy to use, it also brings a problem, that is, it cannot jump to the specified number of pages, and can only page down page by page.

Therefore, this kind of query is only suitable for specific scenarios, such as the homepage of information APP.

Internet APP generally adopts the form of waterfall flow, such as Baidu homepage and Toutiao homepage, which all slide down to turn pages, and do not jump to the requirement of setting the number of pages.

If you don't believe it, you can take a look. This is the waterfall of headlines:

image-20220703221836032.png

The query results on the previous page are included in the parameters.

image-20220703222026194.png

In the response data, the query conditions on the next page are returned.

Therefore, the application scenarios of this query method are quite wide, so hurry up and use it.

Summary of knowledge points:

image-20220703223109687.png

The article is continuously updated, and you can search for "One Light Architecture" on WeChat to read more technical dry goods as soon as possible.

一灯架构
44 声望11 粉丝