Preface
In the early stages of application development the data volume is small, so developers focus mainly on implementing features. As production data grows, many SQL statements start to expose performance problems with an increasing impact on production, and these problematic SQL statements can become the bottleneck of overall system performance.
If you see things differently, please leave a comment to correct me so we can learn together!
SQL optimization general steps
- 1. Locate the SQL statements with poor execution efficiency, e.g. via the slow query log.
- 2. Use EXPLAIN to analyze the SQL execution plan.
Focus on the type, rows, filtered, and Extra columns.
The common type values, listed from lowest to highest efficiency, are:
- ALL: full table scan
- index: full index scan
- range: index range scan, common with <, <=, >=, BETWEEN, IN, and similar operators
- ref: non-unique index lookup, or leftmost-prefix lookup on a unique index, returning the rows matching a single value; often seen in joins
- eq_ref: like ref, but on a unique index or primary key in a join, returning at most one row
- const/system: at most one matching row; MySQL reads it once and treats its columns as constants, e.g. a primary-key or unique-index lookup
- null: MySQL returns the result without accessing any table or index
Although efficiency increases from top to bottom, the cost model still decides. Suppose there are two indexes, idx1(a, b, c) and idx2(a, c), and the SQL is select * from t where a = 1 and b in (1, 2) order by c; with idx1 the type is range, with idx2 it is ref. But when the number of rows idx2 would scan is roughly five times (or more) that of idx1, the optimizer picks idx1; otherwise it picks idx2.
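A minimal sketch of the scenario above (table and index names taken from the example; which index is chosen depends on the actual data distribution):

```sql
-- Hypothetical table matching the example above.
CREATE TABLE t (
  id INT PRIMARY KEY,
  a INT, b INT, c INT,
  KEY idx1 (a, b, c),
  KEY idx2 (a, c)
);

-- Inspect which index the optimizer picks and the resulting type:
EXPLAIN SELECT * FROM t WHERE a = 1 AND b IN (1, 2) ORDER BY c;
-- type = range if idx1 is chosen, ref if idx2 is chosen,
-- depending on the estimated row counts for each plan.
```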
Extra
- Using filesort: MySQL needs an extra pass to return rows in sorted order. It scans the rows according to the join type, saving the sort key and a row pointer for every row matching the WHERE clause, then sorts the keys and retrieves the rows in sorted order.
- Using temporary: a temporary table is used to hold intermediate results; performance is particularly poor and this needs to be optimized
- Using index: the SELECT is served by a covering index, avoiding access to the table's data rows, which is efficient. If Using where appears at the same time, it means not all conditions could be resolved by the index lookup alone.
- Using index condition: new in MySQL 5.6; Using index condition means ICP (Index Condition Pushdown) is in effect: filtering is done at the storage-engine layer using the index, rather than at the server layer, reducing the number of rows that have to be read back from the table.
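A sketch of how Using index shows up, using a hypothetical orders table:

```sql
-- Hypothetical table; the composite index covers the first query.
CREATE TABLE orders (
  id INT PRIMARY KEY,
  shop_id INT, order_no VARCHAR(32), amount DECIMAL(10, 2),
  KEY idx_shop_orderno (shop_id, order_no)
);

-- Only indexed columns are selected, so Extra shows "Using index"
-- (covering index, no table row access):
EXPLAIN SELECT shop_id, order_no FROM orders WHERE shop_id = 1;

-- Selecting a non-indexed column forces a lookup back to the row,
-- so the covering-index optimization no longer applies:
EXPLAIN SELECT amount FROM orders WHERE shop_id = 1;
```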
show profile analysis
Shows the states the executing thread passes through and the time spent in each.
It is disabled by default; enable it with set profiling = 1; (note that SHOW PROFILE has been deprecated since MySQL 5.7 in favor of the Performance Schema).
SHOW PROFILES ;
SHOW PROFILE FOR QUERY #{id};
trace
The trace shows how the optimizer chooses an execution plan. Through the trace output we can see why the optimizer chose execution plan A instead of execution plan B.
set optimizer_trace="enabled=on";
set optimizer_trace_max_mem_size=1000000;
select * from information_schema.optimizer_trace;
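A complete trace session, sketched (the traced query is a hypothetical placeholder; remember to turn the trace off afterwards, since it consumes memory per session):

```sql
SET optimizer_trace = "enabled=on";
SET optimizer_trace_max_mem_size = 1000000;

-- Run the statement you want to analyze, then read its trace:
SELECT * FROM t WHERE a = 1;  -- hypothetical query
SELECT * FROM information_schema.optimizer_trace;

-- Disable tracing when done:
SET optimizer_trace = "enabled=off";
```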
Identify the problem and adopt corresponding measures
- Optimize the indexes
- Optimize the SQL statement: rewrite the SQL, split IN queries into batches, split time-range queries into segments, paginate by filtering from the last row seen
- Use another implementation: ES, a data warehouse, etc.
- Handle data fragmentation
Scenario analysis
Case 1: leftmost-prefix match
index
KEY `idx_shopid_orderno` (`shop_id`,`order_no`)
SQL statement
select * from _t where order_no=''
Index matching works from left to right. For this query to use the index via order_no, either the query condition must also carry shop_id, or the index (shop_id, order_no) must be reordered.
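Two possible fixes, sketched (either satisfy the leftmost prefix, or give order_no its own leading index):

```sql
-- Option 1: include the leading index column in the condition.
SELECT * FROM _t WHERE shop_id = 1 AND order_no = '';

-- Option 2: add an index led by order_no.
ALTER TABLE _t ADD KEY idx_orderno (order_no);
```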
Case 2: implicit conversion
index
KEY `idx_mobile` (`mobile`)
SQL statement
select * from _user where mobile=12345678901
An implicit conversion is equivalent to applying an operation to the indexed column, which disables the index. mobile is a character column, but a numeric literal is passed; MySQL then performs an implicit conversion on the column, invalidating the index, so a string literal should be used for the match instead.
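The fix is simply to match the column's type with a string literal:

```sql
-- mobile is a character column: quote the value so no implicit
-- cast is applied to the indexed column.
SELECT * FROM _user WHERE mobile = '12345678901';
```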
Case 3: large-offset paging
index
KEY `idx_a_b_c` (`a`, `b`, `c`)
SQL statement
select * from _t where a = 1 and b = 2 order by c desc limit 10000, 10;
For large-offset paging scenarios, first try to remove the requirement at the product level. If it cannot be removed, there are two optimizations:
One is to pass in the last row of the previous page, i.e. the value of c above, and filter with "c < xxx" (keyset pagination); but this usually requires changing the interface contract, which is not always feasible.
The other is a deferred join, which reduces how often SQL goes back to the table; note that the index must fully cover the filter and sort columns for this to help. The SQL becomes:
SELECT
t1.*
FROM
_t t1,
( SELECT id FROM _t WHERE a = 1 AND b = 2 ORDER BY c DESC LIMIT 10000, 10 ) t2
WHERE
t1.id = t2.id;
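The first approach (keyset pagination) can be sketched as follows, where :last_c stands for the c value of the last row of the previous page (a hypothetical placeholder, not real client syntax):

```sql
-- Fetch the next page directly from the index, skipping the
-- large OFFSET entirely. :last_c comes from the previous page.
SELECT * FROM _t
WHERE a = 1 AND b = 2 AND c < :last_c
ORDER BY c DESC
LIMIT 10;
```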
Case 4: in + order by
index
KEY `idx_shopid_status_created` (`shop_id`, `order_status`, `created_at`)
SQL statement
SELECT
*
FROM
_order
WHERE
shop_id = 1
AND order_status IN ( 1, 2, 3 )
ORDER BY
created_at DESC
LIMIT 10
At the bottom layer, MySQL executes an IN query as an n*m search, similar to a UNION, but more efficient than UNION.
When estimating the cost of an IN query (cost = number of tuples * average IO), the number of tuples is obtained by probing the index for each value in the IN list one by one, which is relatively slow, so MySQL defines a threshold, eq_range_index_dive_limit. Since 5.6, once the IN list exceeds this threshold, the per-value probes no longer participate in the cost calculation.
As a result, plan selection can become inaccurate. The default is 200, i.e. once the IN condition has more than 200 values, the cost estimate for IN degrades and MySQL may choose the wrong index.
The fix: swap order_status and created_at in the index, and adjust the SQL to a deferred join.
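A sketch of that fix (index order swapped, plus a deferred join; table and index names follow the example above):

```sql
-- Swap created_at before order_status so the index delivers rows
-- already sorted by created_at; order_status is filtered via ICP.
ALTER TABLE _order ADD KEY idx_shopid_created_status
  (shop_id, created_at, order_status);

-- Deferred join: page over ids via the index, then fetch full rows.
SELECT o.*
FROM _order o,
     ( SELECT id FROM _order
       WHERE shop_id = 1 AND order_status IN (1, 2, 3)
       ORDER BY created_at DESC LIMIT 10 ) t
WHERE o.id = t.id;
```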
Case 5: a range query blocks index use for the columns after it
index
KEY `idx_shopid_created_status` (`shop_id`, `created_at`, `order_status`)
SQL statement
SELECT
*
FROM
_order
WHERE
shop_id = 1
AND created_at > '2021-01-01 00:00:00'
AND order_status = 10
Range predicates also include IN and BETWEEN. Here the range on created_at blocks index use for order_status, which comes after it in the index.
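A sketch of the fix: put the equality column before the range column so both can use the index:

```sql
-- Equality columns first, range column (created_at) last.
ALTER TABLE _order ADD KEY idx_shopid_status_created
  (shop_id, order_status, created_at);

SELECT * FROM _order
WHERE shop_id = 1
  AND order_status = 10
  AND created_at > '2021-01-01 00:00:00';
```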
Case 6: negative conditions (not equal, not in) cannot use the index for fast lookups (ICP can still apply)
select * from _order where shop_id=1 and order_status not in (1,2)
select * from _order where shop_id=1 and order_status != 1
On indexed columns, avoid NOT, !=, <>, !<, !>, NOT EXISTS, NOT IN, NOT LIKE, etc.
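When the full set of status values is known, a negative condition can be rewritten as a positive IN list, sketched here with assumed status values:

```sql
-- Instead of: order_status NOT IN (1, 2)
-- enumerate the statuses you do want (values 3..5 are assumed):
SELECT * FROM _order
WHERE shop_id = 1
  AND order_status IN (3, 4, 5);
```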
Case 7: the optimizer chooses not to use the index
If the amount of data to be accessed is small, the optimizer will still choose the secondary index; but when the accessed rows account for a large share of the whole table (typically around 20%), the optimizer switches to finding the data through the clustered index instead.
select * from _order where order_status = 1
Fetch all unpaid orders. Such orders are generally rare; even so, with an index built on order_status the optimizer may still decline to use it when the matching rows make up too large a share of the table.
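If you are certain the qualifying rows are few, the optimizer's choice can be overridden, though this should be a last resort (idx_order_status is a hypothetical index name):

```sql
-- FORCE INDEX overrides the optimizer's cost-based choice.
SELECT * FROM _order FORCE INDEX (idx_order_status)
WHERE order_status = 1;
```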
Case 8: complex queries
select sum(amt) from _t where a = 1 and b in (1, 2, 3) and c > '2020-01-01';
select * from _t where a = 1 and b in (1, 2, 3) and c > '2020-01-01' limit 10;
If the goal is to compute statistics over the data, consider solving it with a data warehouse;
if the business genuinely needs such complex queries, it may be better not to keep pushing them through SQL, but to solve them another way, for example with ES.
Case 9: mixing asc and desc
select * from _t where a=1 order by b desc, c asc
Mixing DESC and ASC in one ORDER BY prevents the index from being used for sorting (MySQL 8.0's descending indexes lift this limitation).
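From MySQL 8.0 onward, a descending index matching the mixed sort order can serve this query, sketched here with a hypothetical index name:

```sql
-- MySQL 8.0+: index key parts match the mixed ORDER BY directions.
ALTER TABLE _t ADD KEY idx_a_bdesc_casc (a, b DESC, c ASC);

SELECT * FROM _t WHERE a = 1 ORDER BY b DESC, c ASC;
```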
Case 10: big data volumes
For the push service's data storage, the volume can be very large. After weighing the options, MySQL was chosen in the end, retaining data with a 7-day expiry.
Note that frequently deleting expired data leaves the table fragmented, so contact the DBA to defragment it.
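Defragmentation after heavy deletes can be done with OPTIMIZE TABLE; for InnoDB this rebuilds the table, so run it in a maintenance window (_push_msg is a hypothetical table name):

```sql
-- Rebuild the table to reclaim space left by deleted rows.
OPTIMIZE TABLE _push_msg;
```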
Author: Wolf Lord
cnblogs.com/powercto/p/14410128.html