MySQL index problem, 99% of people will step on the pit!

Recently received a Sentry alarm, the following SQL query timed out.

 select * from order_info where uid = 5837661 order by id asc limit 1

Execute show create table order_info to find that this table is actually indexed:

 CREATE TABLE `order_info` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `uid` int(11) unsigned,
  `order_status` tinyint(3) DEFAULT NULL,
  ... 省略其它字段和索引
  PRIMARY KEY (`id`),
  KEY `idx_uid_stat` (`uid`,`order_status`),
) ENGINE=InnoDB DEFAULT CHARSET=utf8

In theory, executing the above SQL will hit the idx_uid_stat index, but actually execute explain to view

 explain select * from order_info where uid = 5837661 order by id asc limit 1

You can see that its possible_keys (indexes that this SQL may involve) is idx_uid_stat, but in fact (key) is a full table scan.

We know that MySQL chooses based on cost to perform the final execution plan based on a full table scan or an index, so it seems that the cost of a full table scan is less than that based on the idx_uid_stat index.

But my first feeling is very strange, although this SQL is a return table, its limit is 1. That is to say, only one of the statements that satisfies uid = 5837661 is selected. Even if the table is returned, only one record is returned. This cost is almost negligible. How could the optimizer choose a full table scan?

To see why the MySQL optimizer chose a full table scan, I turned on optimizer_trace to find out.

Voiceover: In MySQL 5.6 and later, we can use the optimizer trace function to view the entire process of the optimizer generating the execution plan.

The specific process of using optimizer_trace is as follows:

 SET optimizer_trace="enabled=on";        // 打开 optimizer_trace
SELECT * FROM order_info where uid = 5837661 order by id asc limit 1
SELECT * FROM information_schema.OPTIMIZER_TRACE;    // 查看执行计划表
SET optimizer_trace="enabled=off"; // 关闭 optimizer_trace

The MySQL optimizer first calculates the cost of a full table scan, then selects all indexes that may be involved in the SQL and calculates the cost of the index, and then selects the one with the lowest cost to execute.

Take a look at the key information given by the optimizer trace:

 {
  "rows_estimation": [
    {
      "table": "`rebate_order_info`",
      "range_analysis": {
        "table_scan": {
          "rows": 21155996,
          "cost": 4.45e6    // 全表扫描成本
        }
      },
      ...
      "analyzing_range_alternatives": {
          "range_scan_alternatives": [
          {
            "index": "idx_uid_stat",
            "ranges": [
            "5837661 <= uid <= 5837661"
            ],
            "index_dives_for_eq_ranges": true,
            "rowid_ordered": false,
            "using_mrr": false,
            "index_only": false,
            "rows": 255918,
            "cost": 307103,            // 使用idx_uid_stat索引的成本
            "chosen": true
            }
          ],
       "chosen_range_access_summary": {    // 经过上面的各个成本比较后选择的最终结果
         "range_access_plan": {
             "type": "range_scan",
             "index": "idx_uid_stat",  // 可以看到最终选择了idx_uid_stat这个索引来执行
             "rows": 255918,
             "ranges": [
             "58376617 <= uid <= 58376617"
             ]
         },
         "rows_for_plan": 255918,
         "cost_for_plan": 307103,
         "chosen": true
         }
         }  
    ...

It can be seen that the cost of full table scan is 4.45e6 , and the cost of selecting index idx_uid_stat is 307103 , which is much less than the cost of full table scan. And from the final selection result (chosen_range_access_summary), the index idx_uid_stat is indeed selected .

But why the choice seen from explain is to perform PRIMARY, that is, a full table scan? Is this execution plan wrong?

After taking a closer look at the execution plan, I found something tricky.

There is a considering_access_paths_for_index_ordering selection in the execution plan that caught my attention:

 {
    "reconsidering_access_paths_for_index_ordering": {
    "clause": "ORDER BY",
    "index_order_summary": {
      "table": "`rebate_order_info`",
      "index_provides_order": true,
      "order_direction": "asc",
      "index": "PRIMARY",    // 可以看到选择了主键索引
      "plan_changed": true,
      "access_type": "index_scan"
        }
    }
}

This selection means that one more index selection optimization is performed due to sorting.

Since our SQL uses id ordering (order by id asc limit 1), the optimizer finally chooses PRIMARY, which is a full table scan, to perform. That is to say, this choice will ignore the previous choice based on index cost.

Why is there such an option?

The main reasons are as follows:

The short explanation is that the optimizer thinks — or should I say hopes — that scanning the whole table (which is already sorted by the id field) will find the limited rows quick enough, and that this will avoid a sort operation. So by trying to avoid a sort, the optimizer ends-up losing time scanning the table.

As can be seen from this explanation, the main reason is that because we use order by id asc, the id-based sorting method is used. The optimizer thinks sorting is an expensive operation, so in order to avoid sorting, and it thinks that the n of limit n is very high. Small words can be executed quickly even if a full table scan is used, so it chooses a full table scan. It also avoids the sorting of id (full table scan is actually a scan of the clustered index based on the id primary key, which itself is sorted based on id).

That's fine if this choice is right, but in fact this optimization is buggy ! Actual selection of idx_uid_stat would execute much faster (only 28 ms)!

Many people have reported this problem on the Internet, and this problem is basically only related to the writing method of order by id asc limit n in SQL. If n is relatively small, there is a high probability that a full table scan will be performed, and if n is relatively large, the correct index will be selected.

This bug dates back to 2014, and many people have called on the official to fix this bug in time. It may be difficult to implement, until MySQL 5.7 and 8.0 have not been resolved, so we should try to avoid this writing method before the official repair, if we must use this writing method, what should we do?

There are mainly two options

1. Use force index to force the specified index to be used.

as follows:

 select * from order_info force index(idx_uid_stat) where uid = 5837661 order by id asc limit 1

Although this way of writing is possible, it is not elegant. What if the index is abandoned? So there is a second, more elegant solution.

2. Use the order by (id+0) scheme.

as follows:

 select * from order_info where uid = 5837661 order by (id+0) asc limit 1

This scheme also allows the optimizer to choose the correct index, which is more recommended!

Why does this trick work? Because this SQL is sorted by id, but a time-consuming operation such as addition is performed on the id (although it is only a useless 0, but it is enough to fool the optimizer), the optimizer thinks that based on a full table scan at this time, it will be more efficient. It consumes performance, so it will choose the index based on the cost size.

MySQL index problem, 99% of people will step on the pit!

skyarthur

引用和评论

面向单元测试的代码重构

MySQL慢查询日志：性能优化的终极指南

做到真正0丢失、0重复：Apache SeaTunnel 实现万亿级数据一致性全解密

Devin 发布 DeepWiki，2 星的项目直接装出万星的气场

好用的开源埋点方案-ClkLog埋点用户分析系统

DNS服务器地址大全

实战分享：DolphinScheduler 中 Shell 任务环境变量最佳配置方式