第31期:索引设计(索引数量探讨)

爱可生云数据库
English

一般提到索引,大家都只关注索引的优点,一个优秀的索引的确会让查询请求的效率大幅提升、亦或是大幅提升带有索引键进行过滤的写入请求(update、delete)效率。 不可否认,索引在这样的场景下带来巨大的性能提升,但是对于整张表的写请求来讲,则刚好相反,索引的存在会在一定程度上降低写入请求的效率。为了维护索引数据的准实时性而让优化器基于索引做出更优化的执行计划,数据库会自动更新索引数据分布,比如带来索引页的拆分与合并等。

那索引的存在对写入请求影响到底有多大? 也就是说,索引的数量多少与写请求效率的降低有没有一个对应的关系?这篇我用几个简单的例子看看能否得到一个明确的结论。(简单基于笔记本虚拟机测试,主要是基于 MySQL 单实例。)

我的例子主要针对四类语句:insert、update、delete、load data 。
用 mysqlsh 部署一个干净的实例:(带入端口,远程管理员,report_host 选项)
 MySQL  Py > dba.deploy_sandbox_instance(3500,{"password":"root","allowRootFrom":"%","mysqldOptions":["report_host=debian-ytt1"]})
A new MySQL sandbox instance will be created on this host in
/home/ytt/mysql-sandboxes/3500

...

Deploying new MySQL instance...

Instance localhost:3500 successfully deployed and started.
Use shell.connect('root@localhost:3500') to connect to the instance.

这里用来测试索引数量的表有 10 个,分别为 t1 到 t10 ,除了每张表索引数量不一样外,字段定义等都一样:t1 有 1 个索引,t2 有个 2 个,依次类推,t10 有 10 个索引,字段类型都是整形(索引数量不包含主键,只包含二级索引)。表 t0 表结构如下:

(debian-ytt1:3500)|(ytt)>show create table t0\G
*************************** 1. row ***************************
       Table: t0
Create Table: CREATE TABLE `t0` (
  `id` int unsigned NOT NULL AUTO_INCREMENT,
  `r0` int DEFAULT NULL,
  `r1` int DEFAULT NULL,
  `r2` int DEFAULT NULL,
  `r3` int DEFAULT NULL,
  `r4` int DEFAULT NULL,
  `r5` int DEFAULT NULL,
  `r6` int DEFAULT NULL,
  `r7` int DEFAULT NULL,
  `r8` int DEFAULT NULL,
  `r9` int DEFAULT NULL,
  `r10` int DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)

在 shell 命令行批量建这 10 张表,并且分别加上对应的索引:

root@debian-ytt1:~# for i in `seq 1 10`;do mysql --login-path=root_ytt -e "use ytt;create table t$i like t0";done;
root@debian-ytt1:~# for i in `seq 1 10`; do for j in `seq 1 "$i"`; do mysql --login-path=root_ytt -e"use ytt;alter table t$i add key idx_r$j (r$j)"; done; done;
这里我简单写一个存储过程,用来模拟 Insert 、update、delete三种写入操作:
DELIMITER $$

USE `ytt`$$

DROP PROCEDURE IF EXISTS `sp_batch_write`$$

CREATE DEFINER=`root`@`%` PROCEDURE `sp_batch_write`(
    IN f_write ENUM('insert','update','delete'),
    IN f_table_name VARCHAR(64),
    IN f_num INT UNSIGNED
    )
BEGIN
  DECLARE i INT UNSIGNED DEFAULT 0;
  
  IF f_write = 'insert' THEN
    SET @stmt = CONCAT('insert into ',f_table_name,'(r0,r1,r2,r3,r4,r5,r6,r7,r8,r9,r10)  
                        values (ceil(rand()*10000),ceil(rand()*10000),ceil(rand()*10000),
                        ceil(rand()*10000),ceil(rand()*10000),ceil(rand()*10000),ceil(rand()*10000),
                        ceil(rand()*10000),ceil(rand()*10000),ceil(rand()*10000),ceil(rand()*10000))');
   
    SET @@autocommit=0;
    WHILE i < f_num
    DO
      PREPARE s1 FROM @stmt;
      EXECUTE s1;
      IF MOD(i,50) = 0 THEN
        COMMIT;
      END IF;
      SET i = i + 1;
    END WHILE;  
    DROP PREPARE s1;
    COMMIT;
    SET @@autocommit=1;

  ELSEIF f_write = 'update' THEN
    SET @stmt = CONCAT('update ',f_table_name,' set r0=ceil(rand()*10000),r1 = ceil(rand()*10000),
    r2 = ceil(rand()*10000),r3 = ceil(rand()*10000),r4 = ceil(rand()*10000),
    r5 = ceil(rand()*10000),r6 = ceil(rand()*10000),r7 = ceil(rand()*10000),
    r8 = ceil(rand()*10000),r9 = ceil(rand()*10000),r10 = ceil(rand()*10000)');
    PREPARE s1 FROM @stmt;
    EXECUTE s1;
    DROP PREPARE s1;
  ELSEIF f_write = 'delete' THEN
    SET @stmt = CONCAT('delete from ',f_table_name);
    PREPARE s1 FROM @stmt;
    EXECUTE s1;
    DROP PREPARE s1;
  END IF;

END$$

DELIMITER ;
  1. INSERT 操作这 10 张表的时间对比:
(debian-ytt1:3500)|(ytt)>call sp_batch_write('insert','t1',100000);
Query OK, 0 rows affected (1 min 24.61 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('insert','t2',100000);
Query OK, 0 rows affected (1 min 25.40 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('insert','t3',100000);
Query OK, 0 rows affected (1 min 29.12 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('insert','t4',100000);
Query OK, 0 rows affected (1 min 36.65 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('insert','t5',100000);
Query OK, 0 rows affected (1 min 39.85 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('insert','t6',100000);
Query OK, 0 rows affected (1 min 41.97 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('insert','t7',100000);
Query OK, 0 rows affected (2 min 29.07 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('insert','t8',100000);
Query OK, 0 rows affected (2 min 17.62 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('insert','t9',100000);
Query OK, 0 rows affected (2 min 14.14 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('insert','t10',100000);
Query OK, 0 rows affected (2 min 32.32 sec)

基本上,越到后面,时间越长。 简单画了张图,见下图:

  1. UPDATE 操作这 10 张表时间对比:
(debian-ytt1:3500)|(ytt)>call sp_batch_write('update','t1',100000);
Query OK, 0 rows affected (22.81 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('update','t2',100000);
Query OK, 0 rows affected (47.42 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('update','t3',100000);
Query OK, 0 rows affected (1 min 8.96 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('update','t4',100000);
Query OK, 0 rows affected (1 min 55.50 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('update','t5',100000);
Query OK, 0 rows affected (2 min 13.13 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('update','t6',100000);
Query OK, 0 rows affected (2 min 6.79 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('update','t7',100000);
Query OK, 0 rows affected (2 min 35.01 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('update','t8',100000);
Query OK, 0 rows affected (2 min 44.50 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('update','t9',100000);
Query OK, 0 rows affected (3 min 1.27 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('update','t10',100000);
Query OK, 0 rows affected (4 min 43.74 sec)

对表更新的影响最大,随着索引数量的增多,表更新时间越来越慢。见下图:

  1. DELETE操作最这10张表的时间对比:

删除前,先看下每张表对应磁盘文件大小,也是按照索引数量多少依次由小到大排列,所以数量越少,占用磁盘空间越小。

root@debian-ytt1:/home/ytt/mysql-sandboxes/3500/sandboxdata/ytt# ls -Sihlr
总用量 521M
2229717 -rw-r----- 1 ytt ytt 112K 6月   1 14:03 t0.ibd
2229718 -rw-r----- 1 ytt ytt  28M 6月   1 16:14 t1.ibd
2229719 -rw-r----- 1 ytt ytt  36M 6月   1 16:16 t2.ibd
2229720 -rw-r----- 1 ytt ytt  40M 6月   1 16:16 t3.ibd
2229721 -rw-r----- 1 ytt ytt  44M 6月   1 16:16 t4.ibd
2229722 -rw-r----- 1 ytt ytt  52M 6月   1 16:03 t5.ibd
2229723 -rw-r----- 1 ytt ytt  56M 6月   1 16:06 t6.ibd
2229725 -rw-r----- 1 ytt ytt  60M 6月   1 15:25 t8.ibd
2229724 -rw-r----- 1 ytt ytt  64M 6月   1 16:08 t7.ibd
2229734 -rw-r----- 1 ytt ytt  68M 6月   1 15:30 t9.ibd
2229735 -rw-r----- 1 ytt ytt  72M 6月   1 15:32 t10.ibd

删除时间肯定随着磁盘文件大小递增:

(debian-ytt1:3500)|(ytt)>call sp_batch_write('delete','t1',100000);
Query OK, 0 rows affected (31.80 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('delete','t2',100000);
Query OK, 0 rows affected (59.89 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('delete','t3',100000);
Query OK, 0 rows affected (1 min 21.83 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('delete','t4',100000);
Query OK, 0 rows affected (2 min 0.33 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('delete','t5',100000);
Query OK, 0 rows affected (2 min 17.66 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('delete','t6',100000);
Query OK, 0 rows affected (2 min 9.37 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('delete','t7',100000);
Query OK, 0 rows affected (2 min 53.61 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('delete','t8',100000);
Query OK, 0 rows affected (2 min 39.29 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('delete','t9',100000);
Query OK, 0 rows affected (3 min 38.26 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('delete','t10',100000);
Query OK, 0 rows affected (4 min 22.88 sec)

结果和预期一样,见下图:

这也就是为什么对一张表记录清空时,不建议使用 delete 的原因! (delete 每行打标记,后期 purge 线程定期清理,空间也不会立即释放,用 truncate 代替 delete !)

  1. LOAD DATA 导入操作对这 10 张表时间对比:
(debian-ytt1:3500)|(ytt)>load data infile '/home/ytt/mysql-sandboxes/3500/mysql-files/t0.txt' into table t1;
Query OK, 100000 rows affected (11.01 sec)
Records: 100000  Deleted: 0  Skipped: 0  Warnings: 0

(debian-ytt1:3500)|(ytt)>load data infile '/home/ytt/mysql-sandboxes/3500/mysql-files/t0.txt' into table t2;
Query OK, 100000 rows affected (14.94 sec)
Records: 100000  Deleted: 0  Skipped: 0  Warnings: 0

(debian-ytt1:3500)|(ytt)>load data infile '/home/ytt/mysql-sandboxes/3500/mysql-files/t0.txt' into table t3;
Query OK, 100000 rows affected (19.33 sec)
Records: 100000  Deleted: 0  Skipped: 0  Warnings: 0

(debian-ytt1:3500)|(ytt)>load data infile '/home/ytt/mysql-sandboxes/3500/mysql-files/t0.txt' into table t4;
Query OK, 100000 rows affected (22.06 sec)
Records: 100000  Deleted: 0  Skipped: 0  Warnings: 0

(debian-ytt1:3500)|(ytt)>load data infile '/home/ytt/mysql-sandboxes/3500/mysql-files/t0.txt' into table t5;
Query OK, 100000 rows affected (28.86 sec)
Records: 100000  Deleted: 0  Skipped: 0  Warnings: 0

(debian-ytt1:3500)|(ytt)>load data infile '/home/ytt/mysql-sandboxes/3500/mysql-files/t0.txt' into table t6;
Query OK, 100000 rows affected (30.57 sec)
Records: 100000  Deleted: 0  Skipped: 0  Warnings: 0

(debian-ytt1:3500)|(ytt)>load data infile '/home/ytt/mysql-sandboxes/3500/mysql-files/t0.txt' into table t7;
Query OK, 100000 rows affected (30.83 sec)
Records: 100000  Deleted: 0  Skipped: 0  Warnings: 0

(debian-ytt1:3500)|(ytt)>load data infile '/home/ytt/mysql-sandboxes/3500/mysql-files/t0.txt' into table t8;
Query OK, 100000 rows affected (35.88 sec)
Records: 100000  Deleted: 0  Skipped: 0  Warnings: 0

(debian-ytt1:3500)|(ytt)>load data infile '/home/ytt/mysql-sandboxes/3500/mysql-files/t0.txt' into table t9;
Query OK, 100000 rows affected (36.76 sec)
Records: 100000  Deleted: 0  Skipped: 0  Warnings: 0

(debian-ytt1:3500)|(ytt)>load data infile '/home/ytt/mysql-sandboxes/3500/mysql-files/t0.txt' into table t10;
Query OK, 100000 rows affected (45.79 sec)
Records: 100000  Deleted: 0  Skipped: 0  Warnings: 0

导入时间也是随着索引数量增多而增加,见下图:

从以上简单测试结果可以得到, 新增一个索引,就会导致额外的写入时间消耗。 在写入优先的业务中,索引并不是越多越好,最好是保留必要的索引,删除掉不必要的索引。说到这里,可能在互联网中最常用的一种场景,比如主从架构,主机和从机表结构一样,但是索引数量不一样;主机只有主键,从机除了主键外,可能有多个二级索引。 其实这样对于写请求在主机,读请求在从机,并且从机可以接受一定程度的数据延迟时间的场景, 那其实非常合适;但是如果从机不能接受太长时间的数据延迟,此时就得舍弃掉一堆不是很常用的索引来提升从机的日志消费效率。


关于 MySQL 的技术内容,你们还有什么想知道的吗?赶紧留言告诉小编吧!

阅读 294

中国领先的企业数据处理技术整体解决方案提供商,开源数据库领域优秀企业。为大型行业用户的特定场景提...

242 声望
144 粉丝
0 条评论
你知道吗?

中国领先的企业数据处理技术整体解决方案提供商,开源数据库领域优秀企业。为大型行业用户的特定场景提...

242 声望
144 粉丝
文章目录
宣传栏