Three Highs MySQL — MySQL Index and Query Optimization (Practical Part)
In this practical section we will pick out some of the more common scenarios. One thing to stress up front: I am personally using MySQL 8.0.26, so don't be surprised if different versions give different test results; newer versions optimize some queries that older ones did not.
This practical part follows the previous article: Three Highs MySQL — MySQL Index and Query Optimization Explained (Theory Part) — Nuggets (juejin.cn)
Preparation
It bears repeating: all databases and tables come from the official sakila-db, which is excellent for learning and getting familiar with MySQL database operations.
sakila-db
What is sakila-db? It models the film-rental business once very popular abroad, where people watched movies by renting them. It is introduced here because the rest of this article builds on it, so you need to prepare the environment in advance; download it from the address below:
Download address: https://dev.mysql.com/doc/index-other.html
MySQL Workbench
MySQL Workbench is MySQL's officially developed visual tool, and it can render database relationship diagrams. The relationship diagram of the official sample is shown below; through these diagrams you can see the general relationships inside sakila-db:
MySQL Workbench is free software; the download address is as follows:
The processes of installing Workbench and downloading sakila-db are not recorded here. To set things up, create a database and run the schema file first, then execute the data file, and finally use Navicat (or another client) to inspect the data and table structures:
Main content
What should I do if the where query is too slow?
When a where query is too slow, our first step is to analyze the data types involved and whether the table definition is reasonable. Second, we can use explain to analyze the query statement; usage is very simple: just put explain in front of the statement being optimized. For any query, a covering-index lookup is the optimal case, because a covering index never has to go back to the table to fetch data.
Covering index: a covering index is a query access pattern, not a kind of index. It means the columns the query returns are all contained in the very index used for the lookup. In that case no table lookup is needed at all; the data can be read straight off the secondary index tree, which is why covering-index queries are comparatively efficient.
How do you view a table's creation statement with SQL? Answer: just run show create table <table_name>.
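For example, with the sakila schema loaded, you can inspect the inventory table this way (output abridged here):

show create table inventory;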
So under what circumstances will a covering index be used?
- The queried columns are columns of the secondary index or of the clustered index.
- The query conforms to the leftmost-prefix matching principle; if it does not, the index cannot be used.
We will experiment with the sakila-db mentioned above, using the inventory table, though it needs some adjustments first; see the SQL below:
CREATE TABLE `inventory_test` (
`inventory_id` mediumint unsigned NOT NULL AUTO_INCREMENT,
`film_id` smallint unsigned NOT NULL,
`store_id` tinyint unsigned NOT NULL,
`last_update` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`inventory_id`),
-- KEY `idx_fk_film_id` (`film_id`),
KEY `idx_store_id_film_id` (`store_id`,`film_id`)
-- CONSTRAINT `fk_inventory_film` FOREIGN KEY (`film_id`) REFERENCES `film` (`film_id`) ON DELETE RESTRICT ON UPDATE CASCADE,
-- CONSTRAINT `fk_inventory_store` FOREIGN KEY (`store_id`) REFERENCES `store` (`store_id`) ON DELETE RESTRICT ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=4582 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
We keep only one secondary index from the original table-creation statement; the idx_fk_film_id index has been removed (commented out) above. The experiments after removing that index are as follows:
explain select * from inventory_test where film_id = 1;
-- Case 1: violates the leftmost-prefix rule, so no index is used
-- 1 SIMPLE inventory_test ALL 1 100.00 Using where
explain select * from inventory_test where store_id = 1;
-- Case 2: uses the secondary (composite) index
-- 1 SIMPLE inventory_test ref idx_store_id_film_id idx_store_id_film_id 1 const 1 100.00
explain select inventory_id,film_id,store_id from inventory_test where store_id = 1;
-- Case 3: index used normally
-- 1 SIMPLE inventory_test ref idx_store_id_film_id idx_store_id_film_id 1 const 1 100.00 Using index
explain select film_id,store_id from inventory_test where store_id = 1;
-- Case 4: covering index
-- 1 SIMPLE inventory_test ref idx_store_id_film_id idx_store_id_film_id 1 const 1 100.00 Using index
explain select film_id,store_id from inventory_test where film_id = 1;
-- Case 5: index still used, but the type differs
-- 1 SIMPLE inventory_test index idx_store_id_film_id idx_store_id_film_id 3 1 100.00 Using where; Using index
explain select inventory_id,film_id,store_id from inventory_test where film_id = 1;
-- Case 6: the index returns the result, but the type differs
-- 1 SIMPLE inventory_test index idx_store_id_film_id idx_store_id_film_id 3 1 100.00 Using where; Using index
explain select inventory_id,film_id,store_id from inventory_test where store_id = 1;
-- Case 7: covering index
-- 1 SIMPLE inventory_test ref idx_store_id_film_id idx_store_id_film_id 1 const 1 100.00 Using index
Cases 1 and 2 form a typical contrast around the leftmost-prefix matching principle, one of the pitfalls that trips up many index novices. The leftmost-prefix principle means the where condition must start its search from the leftmost column of the index. Note that the asterisk expands to the columns in table-definition order, i.e. inventory_id, film_id, store_id, last_update; so even though case 2 uses select *, it still hits the index normally, because its where condition starts from store_id, the leftmost column of the composite index.
(Don't use select * in real work; it appears here purely out of demonstration laziness.)
What if we list the columns explicitly, in a different order, instead of using the asterisk? It makes no difference: you can reorder the select-list columns at will and the index is still used.
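As a quick hedged self-check (the output shown is what I would expect, mirroring case 4 above), swapping the select-list order should leave the plan unchanged:

explain select store_id, film_id from inventory_test where store_id = 1;
-- expected, matching case 4: ref idx_store_id_film_id ... Using index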
Next come cases 3 through 7, which spell out the details of covering-index usage. In case 4, both the returned columns and the where condition are served by the index, so MySQL ends up with a complete covering-index lookup that also satisfies the composite index's leftmost-prefix rule; the query therefore reaches the ref access level (for now, just read that as "very fast").
In case 5 the where condition changes again. The index is still used, but efficiency drops sharply because the condition no longer satisfies the leftmost-prefix rule. Case 5's access level means MySQL must scan the entire secondary index, i.e. the composite index's B+ tree, and then apply the where filter; naturally that is slower than seeking straight to a sorted index value, but the index level is still much faster than ALL.
Once that layer of meaning is clear, cases 6 and 7 become simple: the only difference is one extra primary-key column in the select list.
Readers may object: you said a covering index requires all returned columns to be index columns, so why does adding the primary-key column still work? Isn't the primary key stored in the clustered index, forcing a table lookup? Both questions have the same easy answer: a secondary index does store the index columns as its keys, but the value attached to each key is the primary-key id. When MySQL resolves the index columns and notices one extra column in the select list, it also notices that this column is the primary key, which already lives inside the composite index entry; so the result can be returned directly from the composite index without any table lookup, and the covering-index condition still holds.
If the meaning of the explain result columns is unclear, refer to the following reference:
- id: each select gets an id; complex queries usually involve multiple tables, e.g. join, in, and so on.
- select_type: the type of the query.
- table: the table name.
- partitions: relates to table spaces and partitioning.
- type: the optimization level of the access, e.g. const, index, ALL, which roughly correspond to a clustered-index lookup, a secondary-index scan, and a full table scan respectively.
- possible_keys: together with type describes the access method; lists which indexes are candidates.
- key: which index was actually chosen.
- key_len: the length of the index used.
- ref: matching information that appears on equality matches.
- rows: an estimate of how many rows are read via the index or other means.
- filtered: the percentage of rows remaining after the search conditions filter them.
- extra: supplementary information, mainly used to tell how the query used (or didn't use) an index, e.g. Using index, Using where.
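As a quick cross-reference, the case 4 row from the experiments above maps onto these columns like so:

-- explain select film_id,store_id from inventory_test where store_id = 1;
-- id=1, select_type=SIMPLE, table=inventory_test, type=ref,
-- possible_keys=idx_store_id_film_id, key=idx_store_id_film_id,
-- key_len=1, ref=const, rows=1, filtered=100.00, extra=Using index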
Summary
Through the cases above, we can think about improving index query speed from the following angles:
- Use covering-index queries to improve efficiency; once again, a covering index is not an index but a way of optimizing index queries.
- If the query touches anything beyond the index columns, it no longer constitutes a covering index.
- You can tune the SQL statement or the composite index to improve the covering-index hit rate.
How to confirm which index to use?
This involves the notion of index cardinality. What is index cardinality? Essentially, MySQL uses algorithms and probability statistics to decide the optimal index choice. The value can be obtained with show index from <table_name>; for example, the 200 and 121 below are index cardinalities. Because index choice rests on cardinality, if the chosen index does not meet our expectations we can try forcing a particular index.
> show index from actor;
actor 0 PRIMARY 1 actor_id A 200 BTREE YES
actor 1 idx_actor_last_name 1 last_name A 121 BTREE YES
The official documentation's definition of index cardinality is quoted below. In short: MySQL chooses indexes by an algorithm driven by the cardinality value, but when a query does not behave as expected, forcing an index may be necessary.
The number of different values in a table column. When queries refer to columns that have an
associated index, the cardinality of each column influences which access method is most
efficient. For example, for a column with a unique constraint, the number of different
values is equal to the number of rows in the table. If a table has a million rows but
only 10 different values for a particular column, each value occurs (on average) 100,000 times.
A query such as SELECT c1 FROM t1 WHERE c1 = 50; thus might return 1 row or a huge number of
rows, and the database server might process the query differently depending on the cardinality
of c1.
If the values in a column have a very uneven distribution, the cardinality might not be
a good way to determine the best query plan. For example, SELECT c1 FROM t1 WHERE c1 = x;
might return 1 row when x=50 and a million rows when x=30. In such a case, you might need
to use index hints to pass along advice about which lookup method is more efficient for a
particular query.
Cardinality can also apply to the number of distinct values present in multiple columns,
as in a composite index.
See Also column, composite index, index, index hint, persistent statistics, random dive,
selectivity, unique constraint.
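The 121 shown above for idx_actor_last_name is exactly this kind of estimate. As a sanity check (the two numbers may differ slightly, since cardinality is statistical):

select count(distinct last_name) from actor;
-- expected to land near the cardinality estimate of 121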
How to make sql force the use of indexes
After the table name in from, append: force index(<index_name>). Cases that warrant a forced index are rare — only when the optimizer truly picks a plan that defies expectations and seriously hurts query performance. Usage looks like this:
select * from actor force index(idx_actor_last_name);
What is the reason why count() is slow?
The count function needs no introduction: it returns the number of rows in a result. Note, though, that during processing the optimizer compares values and excludes NULLs, which means that with many rows, a poorly chosen count form degrades query efficiency because of those per-row comparisons.
So here we only need to remember one concrete rule: whenever the query is about row counts, use count(*). The reason is simply that MySQL has specially optimized count(*); there is no need to agonize over why the developers chose to optimize count(*) — call it convention over configuration. Below is a comparison of the common count variants:
- count(non-indexed column): theoretically the slowest, because every row's value must be tested for NULL.
- count(indexed column): an index is used, but every row's value must still be tested for NULL.
- count(1): no column is involved, but this form still evaluates the literal 1 per row.
- count(*): officially optimized by MySQL and the fastest; just remember this one.
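A hedged way to compare these yourself on sakila's payment table (on a table this small the timing differences are negligible; the point is the per-row work each form implies):

select count(*) from payment;           -- optimized path, preferred
select count(1) from payment;           -- the literal is still evaluated per row
select count(payment_id) from payment;  -- indexed column, per-row NULL test
select count(amount) from payment;      -- non-indexed column, slowest in theory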
Index pushdown
Index pushdown is implemented in MySQL 5.6 and later.
Purpose: it is essentially an optimization that reduces the number of table lookups (back-to-table operations) made from a secondary index.
Case: recall the table-creation statement above; the key point is the composite index on store_id and film_id.
Take the SQL below as an example. In versions before 5.6, even though the composite index locates the store_id entries, the index cannot finish the job: store_id is stored in order, but film_id is not globally ordered across the matched range, so during index retrieval there is no way to scan it sequentially (if the index organization is unclear, study the B+ tree index structure a few more times). Each row's inventory_id must therefore be taken back to the table; the query actually performs 4 table lookups just to test whether film_id is 3.
select * from inventory_3 where store_id in (1,2) and film_id = 3;
To human intuition this seems quite unreasonable, because an obvious strategy exists under normal logic: scan the index by "skipping". While walking the index entries, if one fails the condition, jump straight to the next — a bit like the hopscotch games of childhood, hopping square by square to find the sandbags (index entries) we need.
So how does index pushdown handle this situation? Although film_id cannot be scanned as one ordered run and does not follow the index's global ordering, it can still be checked while traversing the index, and only the matching entries are taken back to the table. Concretely, the server filters the scanned entries down to film_id = 3 inside the secondary index, then probes the clustered index only with the surviving primary-key values; instead of one table lookup per scanned entry, only the matches trigger lookups. That is why index pushdown is described as reducing the number of table lookups.
To enable and disable index pushdown, refer to the following commands:
-- Index pushdown switch value:
mysql> select @@optimizer_switch\G;
*************************** 1. row ***************************
@@optimizer_switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,engine_condition_pushdown=on,index_condition_pushdown=on,mrr=on,mrr_cost_based=on,block_nested_loop=on,batched_key_access=off,materialization=on,semijoin=on,loosescan=on,firstmatch=on,duplicateweedout=on,subquery_materialization_cost_based=on,use_index_extensions=on,condition_fanout_filter=on,derived_merge=on
1 row in set (0.00 sec)
-- Disable index pushdown
set optimizer_switch='index_condition_pushdown=off';
-- Enable index pushdown
set optimizer_switch='index_condition_pushdown=on';
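As a self-experiment sketch (plans vary by version; 8.0.26 is assumed here), toggle the switch and re-run the example query to watch the extra column change:

set optimizer_switch='index_condition_pushdown=off';
explain select * from inventory_3 where store_id in (1,2) and film_id = 3;
-- expected extra: Using where
set optimizer_switch='index_condition_pushdown=on';
explain select * from inventory_3 where store_id in (1,2) and film_id = 3;
-- expected extra: Using index condition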
Loose and compact index scans
For loose index and compact index, you can refer to the following two documents for comparison and reading:
MySql Chinese Documentation - 8.2.1.15 GROUP BY Optimization | Docs4dev
MySQL :: MySQL 8.0 Reference Manual :: 8.2.1.17 GROUP BY Optimization
The concepts of loose and compact index scans are not especially easy to grasp. In essence, when MySQL implements GROUP BY via an index scan, it sometimes does not need to read every index key that satisfies the conditions in order to produce the result; the two scan types differ only in how much of the index must be read.
In the past, MySQL handled group by by building temporary tables and operating on them. When an index is available, grouped queries can use it instead:
explain select last_name from actor GROUP BY last_name
-- 1 SIMPLE actor index idx_actor_last_name idx_actor_last_name 182 200 100.00 Using index
Since group by and order by may both generate a temporary table when no index is used, and group by implies a sorting step similar to order by, and we sometimes group by more than one column (hence composite, multi-column indexes), MySQL further optimizes grouped queries over composite indexes and introduces the concepts of loose and compact index scans.
Loose indexes are officially defined as follows:
- A group by can be implemented entirely by an index scan while using only part of the index's columns to complete the operation.
- Although a BTree secondary index is internally sorted and must be read in order, the key optimization for group by is that the where condition does not have to fit all index keys while scanning this ordered index.
There are two keywords in that definition: complete and partial — the where condition need not completely fit the index keys. For a better understanding, we use the official example: assume table t1(c1,c2,c3,c4) has an index idx(c1,c2,c3). The loose index scan access method can be used for the following queries:
-- Not all index columns need be used; the composite index still applies
SELECT c1, c2 FROM t1 GROUP BY c1, c2;
-- DISTINCT performs implicit grouping internally
SELECT DISTINCT c1, c2 FROM t1;
-- Per-group MIN/MAX can use a loose scan, since c2 stays ordered within each c1
SELECT c1, MIN(c2) FROM t1 GROUP BY c1;
-- A where condition applied before grouping
SELECT c1, c2 FROM t1 WHERE c1 < const GROUP BY c1, c2;
-- MIN/MAX on c3 still works together with the c1,c2 index prefix
SELECT MAX(c3), MIN(c3), c1, c2 FROM t1 WHERE c2 > const GROUP BY c1, c2;
-- A range condition can coexist with a loose index scan
SELECT c2 FROM t1 WHERE c1 < const GROUP BY c1, c2;
-- An equality on the last column still permits a loose scan
SELECT c1, c2 FROM t1 WHERE c3 = const GROUP BY c1, c2;
-- Loose index scans also apply to the following queries
SELECT COUNT(DISTINCT c1), SUM(DISTINCT c1) FROM t1;
SELECT COUNT(DISTINCT c1, c2), COUNT(DISTINCT c2, c1) FROM t1;
A loose index scan must meet the following conditions:
- The grouped query touches a single table.
- The group by columns must be consecutive leftmost columns of one index, in index order.
- The only aggregate functions used with group by are max or min (versions after 5.5 added support for more).
- Any other index column referenced by the query must appear as a constant equality.
- Full column values must be used, which means prefix indexes (like those for like) do not qualify.
To determine whether a query used a loose index scan, check whether the extra column of explain contains Using index for group-by.
Now for some more practical SQL. The experiments below run against the inventory_test table and its composite index idx_store_id_film_id(store_id, film_id):
-- My own experiment: loose index scan
EXPLAIN SELECT COUNT(DISTINCT film_id, store_id), COUNT(DISTINCT store_id, film_id) FROM inventory_test;
-- 1 SIMPLE inventory_test range idx_store_id_film_id idx_store_id_film_id 3 4 100.00 Using index for group-by (scanning)
-- My own experiment: loose index scan
EXPLAIN SELECT COUNT(DISTINCT store_id), SUM(DISTINCT store_id) FROM inventory_test;
-- 1 SIMPLE inventory_test range idx_store_id_film_id idx_store_id_film_id 1 4 100.00 Using index for group-by (scanning)
-- But if the queried column does not satisfy the leftmost-prefix rule of the index,
-- the loose scan is skipped in favor of a faster plain index scan:
EXPLAIN SELECT COUNT(DISTINCT store_id), SUM(DISTINCT store_id) FROM inventory_test;
EXPLAIN SELECT COUNT(DISTINCT film_id), SUM(DISTINCT film_id) FROM inventory_test;
-- 1 SIMPLE inventory_test range idx_store_id_film_id idx_store_id_film_id 1 4 100.00 Using index for group-by (scanning)
-- 1 SIMPLE inventory_test index idx_store_id_film_id idx_store_id_film_id 3 3 100.00 Using index
Compact index scan
The difference from the loose scan is that a compact index scan must be a full index scan or a range index scan. When a loose scan does not apply to the group by, it is still possible to avoid creating a temporary table: the compact scan reads all index keys that satisfy the conditions, then completes the group by over the data it has read.
For a compact scan to work, every gap in the index key must be covered by a constant equality condition — whether the gap is before the GROUP BY columns or between them.
In compact index scan mode, a range scan is performed on the index first, and then the resulting tuples are grouped. For a better understanding, take a look at the relevant case:
There is a gap in the GROUP BY columns, but it is covered by the condition c2 = 'a':
SELECT c1, c2, c3 FROM t1 WHERE c2 = 'a' GROUP BY c1, c3;
The GROUP BY does not start with the first index column, but a condition supplies a constant for that column:
SELECT c1, c2, c3 FROM t1 WHERE c1 = 'a' GROUP BY c2, c3;
Let's experiment following the official cases. First, the table structure; we build a composite index in the following table:
CREATE TABLE `inventory_test` (
`inventory_id` mediumint unsigned NOT NULL AUTO_INCREMENT,
`film_id` smallint unsigned NOT NULL,
`store_id` tinyint unsigned NOT NULL,
`last_update` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`inventory_id`),
KEY `idx_store_id_film_id` (`store_id`,`film_id`)
) ENGINE=InnoDB AUTO_INCREMENT=4582 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
Below are my own compact-index cases. When the where condition is a constant equality on an index column, group by can use the index; but when the where condition is on a non-indexed column, a full table scan is still required. Note that the grouping column here is not the leftmost prefix of the composite index yet the index is still used — one of MySQL's optimizations for grouping operations.
-- Compact index scan
EXPLAIN select count(*),max(film_id),sum(film_id), avg(film_id) from inventory_test where store_id = 1 GROUP BY film_id;
-- 1 SIMPLE inventory_test ref idx_store_id_film_id idx_store_id_film_id 1 const 1 100.00 Using index
EXPLAIN select count(*),max(film_id),sum(film_id), avg(film_id) from inventory_test where last_update > '2022-02-02 23:20:45' GROUP BY film_id;
-- 1 SIMPLE inventory_test ALL idx_store_id_film_id 3 33.33 Using where; Using temporary
EXPLAIN select count(*),max(film_id),sum(film_id), avg(film_id) from inventory_test where last_update = '2022-02-02 23:20:45' GROUP BY film_id;
-- 1 SIMPLE inventory_test ALL idx_store_id_film_id 3 33.33 Using where; Using temporary
Readers are advised to read more official documents to deepen their understanding of these two concepts.
How to optimize order by?
What is an intermediate result set?
For an ordinary sort statement, the matched rows must be ordered by some column, and to accomplish this MySQL carries out the sort in memory or on disk.
The basic steps and principles of sorting
For statements involving sorting, it roughly works as follows:
- Select the queried columns and filter rows according to the where condition.
- The query result set goes into sort_buffer; if memory is insufficient, an intermediate table is created on disk for the sort.
- Sort the intermediate table by the order by column.
- Go back to the table to build the complete result set, and assemble the rows to return.
Characteristics of the intermediate result set
If the intermediate table is small enough it is kept in memory. What decides whether it stays there? MySQL provides the parameter sort_buffer_size, which controls the size of the sort area. To optimize memory usage, lower this value; to optimize query time, raise it.
Going back to the table to build the complete result set
The back-to-table step that builds the complete result set is not always executed; it depends on the session parameter max_length_for_sort_data. If the current row is shorter than this value, an intermediate table containing all queried columns is generated and results come straight from it; if it is longer, only a (sort column + primary key) intermediate table is generated (similar in shape to a secondary index), so the remaining columns obviously cannot be found there, and a back-to-table pass is needed to finish the query.
Note that the (sort column + primary key) intermediate table merely looks like a secondary index; it has nothing to do with one — it is just a plain list that must repeatedly visit the primary table for data.
Summary: row shorter than max_length_for_sort_data → full-column intermediate table; longer → (sort column + primary key) intermediate table. The value is not "the bigger the better" — set it too big and query efficiency suffers.
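A minimal sketch for inspecting and adjusting both session variables (values below are illustrative, and defaults vary by version; recent 8.0 releases reportedly deprecate max_length_for_sort_data):

show variables like 'sort_buffer_size';
show variables like 'max_length_for_sort_data';
set session sort_buffer_size = 262144;        -- e.g. 256 KB, illustrative
set session max_length_for_sort_data = 2048;  -- illustrative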
Sort query optimization points
The fundamental problem is that sorting materializes an intermediate result set. Even when that set can be processed in memory, it has an essential flaw: the intermediate table carries no index, so indexes stop helping. To let the sort benefit from an index, we can use index covering.
Optimization method: index covering is also the most efficient treatment; it can skip generating the intermediate result set and emit query results directly.
- The order by column is indexed (or is the leftmost column of a composite index).
- The other columns (conditions, output) are contained in that index.
- Index covering then skips the intermediate result set and outputs query results directly.
What is index coverage?
Covering index: a query access pattern rather than an index. It means the conditions and the returned columns of a SQL statement together satisfy one index. And since MySQL 5.6 added index pushdown, queries whose pushed-down conditions are satisfied can also take the covering-index path.
For example, the following statement does not generate an intermediate result set and can make efficient use of indexes:
explain select film_id, title from film order by title;
-- 1 SIMPLE film index idx_title 514 1000 100.00 Using index
Summary: Improve sorting query speed
- Add an index to the order by column, or index the where columns, so the query can use a covering index.
- Adjust the size of sort_buffer_size, or adjust max_length_for_sort_data, so that sorting happens in memory as much as possible.
Index invalidation caused by function operations
As the cases below show, if we apply a function-like operation to an indexed column, MySQL abandons the index; another case is date functions such as month(), which also invalidate the index.
Tips: many people assume "function operations" means only sum(), count() and the like. In fact, arithmetic on a column — addition, subtraction, multiplication, division — also counts as a function operation, because the server must compute the expression per row at the instruction level before comparing. Also keep index pushdown distinct from the loose/compact scans: loose and compact scans optimize indexes for grouping operations, while index pushdown only landed officially in 5.6, so most older MySQL deployments cannot enjoy it.
-- sql1: a function/arithmetic operation applied to the indexed column
EXPLAIN SELECT title
FROM film
WHERE title + '22' = 'ACADEMY DINOSAUR'
  AND length + 11 = 86;
-- 1 SIMPLE film ALL 1000 100.00 Using where
-- sql2: functions on other columns are fine; as long as the indexed column itself
-- is left untouched, the index is still used
EXPLAIN SELECT title
FROM film
WHERE title = 'ACADEMY DINOSAUR'
  AND length + 11 = 86;
-- 1 SIMPLE film ref idx_title idx_title 514 const 1 100.00 Using where
How to optimize time functions:
How do we optimize time functions? A relatively clumsy way is to substitute between ... and: for example, to search a given month, use its first day through its last day. The concrete optimization case is as follows:
explain select last_update from payment where month(last_update) =2;
-- an index on last_update has to be created manually
-- 1 SIMPLE payment ALL 16086 100.00 Using where
If we want to improve on the result above, we can rewrite the query another way:
explain select * from payment where last_update between '2006-02-01' and '2006-02-28';
-- 1 SIMPLE payment ALL idx_payment_lastupdate 16086 50.00 Using where
This is odd — why does it differ from the previous result? It is because this last_update column's data type is timestamp, and on this search the optimizer's cost judgment makes it give up the index! The remedy is simple: use force index to push the SQL onto the index.
explain select * from payment force index(idx_payment_lastupdate) where last_update between '2006-02-01' and '2006-02-28' ;
-- 1 SIMPLE payment range idx_payment_lastupdate idx_payment_lastupdate 5 8043 100.00 Using index condition
After experimenting here, it turns out that if the column is datetime, between ... and can use the index directly (no further experiment was done for the timestamp type); with the existing table design, the results are as follows:
-- After optimization
explain select * from rental where rental_date between '2006-02-01' and '2006-02-28';
-- 1 SIMPLE rental range rental_date rental_date 5 182 100.00 Using index condition
-- Before optimization, for contrast
explain select * from rental where month(rental_date) =2;
-- 1 SIMPLE rental ALL 16008 100.00 Using where
Character and numeric comparison:
Comparing a character column with a number also triggers an implicit conversion (a function operation), which likewise invalidates the index. So for equality matches, make sure both sides of the comparison have the same type. If the query itself cannot be changed, the cast function can be used as a remedy, as in the handling below.
select * from city where cast(city_id as SIGNED) = 1;
Implicit character encoding conversion:
If two tables' columns use different character sets, index invalidation also occurs, because the server must convert encodings underneath. The solution is simple: when comparing strings, make sure both sides use the same encoding. When two tables are compared and one side must be converted — say table A is utf8 and table B is utf8mb4 — convert table A's column to match table B's.
I'm being lazy and skipping the experiment here; in most cases table character sets won't be inconsistent as long as they follow the schema-level defaults...
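Still, here is a minimal sketch, assuming hypothetical tables table_a (utf8) and table_b (utf8mb4) joined on a name column; converting the utf8 side up to utf8mb4 keeps any index on table_b.name usable:

select b.*
from table_b b
join table_a a
  on b.name = convert(a.name using utf8mb4);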
The principle of order by rand()
select title, description from film order by rand() limit 1;
-- EXPLAIN select title, description from film order by rand() limit 1;
-- 1 SIMPLE film ALL 1000 100.00 Using temporary; Using filesort
The rand() function is a heavy consumer of database performance. In daily use we may need to grab a random row on the fly and reach for rand(); here is how order by rand() actually executes:
- Create a temporary table whose columns are rand, title, description.
- Fetch rows one by one, call rand() for each, and put the result plus the row data into the temporary table, and so on.
- For that temporary table, put the rand column + row position (primary key) into sort_buffer.
As you can see, the biggest problem is that two intermediate result sets are produced.
For this problem the following stopgap can be used. It amounts to unpacking rand()'s internal work by hand — a relatively "dumb" fix that requires no business-logic change:
-- Grab the primary key's max and min into user variables
select max(film_id),min(film_id) into @M,@N from film;
-- Pick a random value in [@N, @M]
set @x=FLOOR((@M-@N+1) * rand() + @N);
-- Fetch the first row at or after that key (user variables are case-insensitive)
EXPLAIN select title,description from film where film_id >= @X limit 1;
Another approach is to replace SQL's internal work with business/application logic, e.g.:
- Query the table's total row count, total.
- Pick a random number r within the range of total.
- Execute the following SQL (r is substituted in by the application):
select title,description from film limit r,1;
Summary:
- order by rand() limit is extremely inefficient, because it has to build intermediate tables twice to produce the result; use this function with caution. There are two workarounds:
- Stopgap: pick a random value between the primary key's maximum and minimum.
  Advantage: fixed purely by tuning SQL, without touching the business logic.
  Disadvantage: the template code is hard to remember and not a cure-all, since you may not even be granted the needed permissions.
- Easier-to-understand treatment: business code plus a limit offset.
- Overall, handling randomness in business-logic code instead of using rand() is recommended.
What should I do if the paging query is slow?
Note again that the database version used in this experiment is 8.0.26 .
Let's first look at what High Performance MySQL, 3rd edition says on pages 241–242. The authors also use the sakila tables; the recommended approach is the deferred join (delayed association), e.g. optimizing the following SQL:
-- Before optimization
select film_id,description from film order by title limit 50,5;
-- After optimization
select film_id,description from film inner join (select film_id from film order by title limit 50, 5) as lim using(film_id)
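A hedged check of the rewritten query (8.0 may already plan the original reasonably; the point is that the derived table can be satisfied from idx_title alone, so only 5 rows ever go back to the table):

explain select film_id,description from film
inner join (select film_id from film order by title limit 50, 5) as lim using(film_id);
-- expected: the derived table shows Using index on idx_title; the outer join does eq_ref lookups on PRIMARY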
The second approach, when the id follows some ordering rule and the business happens to fit, is to use between ... and instead:
select * from film where film_id between 46 and 50 order by position;
Finally, there is one more way: exploit the sort order — sort the data in descending order and take the leading rows (i.e. read the tail pages from the other end):
select * from film order by position desc limit 5;
That covers the introduction from High Performance MySQL, 3rd edition. Now, do we have any other approaches?
The deep-pagination problem — common in interviews and in daily development alike — stems from limit's grammatical semantics. Consider:
select * from film limit x,y;
The limit statement executes in the following order:
- First locate rows per the select list; if there is a where clause, find rows according to it.
- Keep adding matching rows to the result set until (x + y) rows have been found.
- Discard the first x rows and keep the y rows.
- Return the remaining y rows.
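To make that cost concrete (row counts taken from the estimates shown earlier, so treat them as approximate):

select * from payment limit 16000, 5;
-- internally walks roughly 16005 rows, then throws away the first 16000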
We have the following optimization and processing solutions for limit:
1. Simple optimization:
If the primary key is an auto-increment int and logically tracks business ordering, we can optimize with a statement like:
select * from film where id >= 10000 limit y;
2. Subquery optimization:
Subquery optimization is a way to cut the number of table lookups. Since different businesses call for different treatments, here is a rough template:
select * from film where ID in (select id from film where title = 'BANG KWAI') limit 10000,10
There are two benefits to doing this:
- The query turns into a search over index columns, requiring no extra disk IO.
- Although a subquery is involved, efficiency stays relatively high because only index columns are searched.
3. Deferred join
Like the method from High Performance MySQL, this is really an upgraded version of the subquery approach: the idea is to do the filtering over the index first and only then fetch full rows. Since it was covered above, it is not repeated here.
Summary:
For deep pagination we generally have the following optimization ideas:
- If the primary key is auto-increment or matches business ordering, filter directly with id > xxx and then limit.
- If sorting can locate the data correctly, take the records straight off the sorted order.
- Deferred joins come in two flavors — an in subquery or an inner join; the essence of both is to avoid scanning big data by pushing the search onto index columns, i.e. switching to an index-driven query.
- If the range can be pinned down, use between ... and instead.
Summary
This section walked through the handling of problems often met in practice. The slightly harder parts are index pushdown and the loose/compact index scans; these features are worth revisiting several times alongside the official documentation.
References
MySql Chinese Documentation - 8.2.1.15 GROUP BY Optimization | Docs4dev
MySQL :: MySQL 8.0 Reference Manual :: 8.2.1.17 GROUP BY Optimization