Generally speaking, when it comes to indexes, people only notice the advantages. A well-chosen index can dramatically speed up queries, and it can likewise speed up write requests (UPDATE, DELETE) that filter on the index key. The performance gains in those scenarios are undeniable, but for writes that touch the whole table the opposite holds: indexes reduce write efficiency to some degree. To keep index data near real time, so the optimizer can build better execution plans from it, the database must continually maintain the index structures, for example by splitting and merging index pages.

How much do indexes actually cost write requests? In other words, is there a measurable relationship between the number of indexes and the drop in write efficiency? In this article I run a few simple experiments to see whether a clear conclusion emerges. (These are quick tests on a laptop virtual machine, against a single MySQL instance.)

My examples cover four kinds of statements: INSERT, UPDATE, DELETE, and LOAD DATA.
First, deploy a clean sandbox instance with mysqlsh (passing the port, a remote administrator account, and the report_host option):
 MySQL  Py > dba.deploy_sandbox_instance(3500,{"password":"root","allowRootFrom":"%","mysqldOptions":["report_host=debian-ytt1"]})
A new MySQL sandbox instance will be created on this host in
/home/ytt/mysql-sandboxes/3500

...

Deploying new MySQL instance...

Instance localhost:3500 successfully deployed and started.
Use shell.connect('root@localhost:3500') to connect to the instance.

There are 10 tables used to test index counts, t1 through t10. Apart from the number of indexes, their column definitions are identical: t1 has one index, t2 has two, and so on, up to t10 with ten. All columns are integers, and the index counts exclude the primary key (only secondary indexes are counted). The tables are cloned from t0, whose structure is:

(debian-ytt1:3500)|(ytt)>show create table t0\G
*************************** 1. row ***************************
       Table: t0
Create Table: CREATE TABLE `t0` (
  `id` int unsigned NOT NULL AUTO_INCREMENT,
  `r0` int DEFAULT NULL,
  `r1` int DEFAULT NULL,
  `r2` int DEFAULT NULL,
  `r3` int DEFAULT NULL,
  `r4` int DEFAULT NULL,
  `r5` int DEFAULT NULL,
  `r6` int DEFAULT NULL,
  `r7` int DEFAULT NULL,
  `r8` int DEFAULT NULL,
  `r9` int DEFAULT NULL,
  `r10` int DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)

Create the 10 tables in a shell loop, then add the corresponding indexes:

root@debian-ytt1:~# for i in `seq 1 10`;do mysql --login-path=root_ytt -e "use ytt;create table t$i like t0";done;
root@debian-ytt1:~# for i in `seq 1 10`; do for j in `seq 1 "$i"`; do mysql --login-path=root_ytt -e"use ytt;alter table t$i add key idx_r$j (r$j)"; done; done;
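The two shell loops above can also be sketched in Python. This hypothetical helper just emits the same DDL strings (it does not connect to MySQL), which makes the pattern easy to verify: table t_i ends up with exactly i secondary indexes on columns r1 through r_i.

```python
# Emit the same DDL the shell loops above run: clone t0 into t1..t10,
# then give table t_i exactly i secondary indexes on columns r1..r_i.
def build_ddl(num_tables=10):
    stmts = []
    for i in range(1, num_tables + 1):
        stmts.append(f"CREATE TABLE t{i} LIKE t0;")
        for j in range(1, i + 1):
            stmts.append(f"ALTER TABLE t{i} ADD KEY idx_r{j} (r{j});")
    return stmts

ddl = build_ddl()
print(len(ddl), "statements")  # 10 CREATEs plus 1+2+...+10 = 55 ALTERs
```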
Next, a simple stored procedure to simulate the three write operations (INSERT, UPDATE, DELETE):
DELIMITER $$

USE `ytt`$$

DROP PROCEDURE IF EXISTS `sp_batch_write`$$

CREATE DEFINER=`root`@`%` PROCEDURE `sp_batch_write`(
    IN f_write ENUM('insert','update','delete'), -- operation to simulate
    IN f_table_name VARCHAR(64),                 -- target table
    IN f_num INT UNSIGNED  -- row count; 'insert' only ('update'/'delete' touch the whole table)
    )
BEGIN
  DECLARE i INT UNSIGNED DEFAULT 0;
  
  IF f_write = 'insert' THEN
    SET @stmt = CONCAT('insert into ',f_table_name,'(r0,r1,r2,r3,r4,r5,r6,r7,r8,r9,r10)  
                        values (ceil(rand()*10000),ceil(rand()*10000),ceil(rand()*10000),
                        ceil(rand()*10000),ceil(rand()*10000),ceil(rand()*10000),ceil(rand()*10000),
                        ceil(rand()*10000),ceil(rand()*10000),ceil(rand()*10000),ceil(rand()*10000))');
   
    SET @@autocommit=0;  -- batch manually: one COMMIT per 50 inserted rows
    WHILE i < f_num
    DO
      PREPARE s1 FROM @stmt;
      EXECUTE s1;
      IF MOD(i,50) = 0 THEN
        COMMIT;
      END IF;
      SET i = i + 1;
    END WHILE;  
    DROP PREPARE s1;
    COMMIT;
    SET @@autocommit=1;

  ELSEIF f_write = 'update' THEN
    SET @stmt = CONCAT('update ',f_table_name,' set r0=ceil(rand()*10000),r1 = ceil(rand()*10000),
    r2 = ceil(rand()*10000),r3 = ceil(rand()*10000),r4 = ceil(rand()*10000),
    r5 = ceil(rand()*10000),r6 = ceil(rand()*10000),r7 = ceil(rand()*10000),
    r8 = ceil(rand()*10000),r9 = ceil(rand()*10000),r10 = ceil(rand()*10000)');
    PREPARE s1 FROM @stmt;
    EXECUTE s1;
    DROP PREPARE s1;
  ELSEIF f_write = 'delete' THEN
    SET @stmt = CONCAT('delete from ',f_table_name);
    PREPARE s1 FROM @stmt;
    EXECUTE s1;
    DROP PREPARE s1;
  END IF;

END$$

DELIMITER ;
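For readers who want to reproduce the effect without a MySQL server, here is a miniature version of the same experiment in Python, using the standard library's SQLite as a stand-in for InnoDB (an assumption: absolute numbers will differ, but the mechanism — every secondary index is an extra B-tree to maintain on each insert — is the same):

```python
import random
import sqlite3
import time

# A miniature, SQLite-based version of the experiment: each table gets the
# same 11 integer columns but a different number of secondary indexes, and
# we time the same batch of inserts against each one.
COLS = [f"r{k}" for k in range(11)]  # r0..r10, as in t0

def make_table(conn, name, n_indexes):
    cols = ", ".join(f"{c} INTEGER" for c in COLS)
    conn.execute(f"CREATE TABLE {name} "
                 f"(id INTEGER PRIMARY KEY AUTOINCREMENT, {cols})")
    for j in range(1, n_indexes + 1):
        conn.execute(f"CREATE INDEX idx_{name}_r{j} ON {name} (r{j})")

def timed_insert(conn, name, n_rows):
    rows = [tuple(random.randint(1, 10000) for _ in COLS)
            for _ in range(n_rows)]
    placeholders = ",".join("?" * len(COLS))
    t0 = time.perf_counter()
    with conn:  # one transaction, like the batched COMMITs in the procedure
        conn.executemany(f"INSERT INTO {name} ({','.join(COLS)}) "
                         f"VALUES ({placeholders})", rows)
    return time.perf_counter() - t0

conn = sqlite3.connect(":memory:")
for i in (1, 5, 10):
    make_table(conn, f"t{i}", i)
    secs = timed_insert(conn, f"t{i}", 20000)
    print(f"t{i}: {i:2d} indexes -> {secs:.3f}s")
```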
  1. INSERT operation time comparison of these 10 tables:
(debian-ytt1:3500)|(ytt)>call sp_batch_write('insert','t1',100000);
Query OK, 0 rows affected (1 min 24.61 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('insert','t2',100000);
Query OK, 0 rows affected (1 min 25.40 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('insert','t3',100000);
Query OK, 0 rows affected (1 min 29.12 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('insert','t4',100000);
Query OK, 0 rows affected (1 min 36.65 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('insert','t5',100000);
Query OK, 0 rows affected (1 min 39.85 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('insert','t6',100000);
Query OK, 0 rows affected (1 min 41.97 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('insert','t7',100000);
Query OK, 0 rows affected (2 min 29.07 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('insert','t8',100000);
Query OK, 0 rows affected (2 min 17.62 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('insert','t9',100000);
Query OK, 0 rows affected (2 min 14.14 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('insert','t10',100000);
Query OK, 0 rows affected (2 min 32.32 sec)

Broadly, the more indexes a table has, the longer the inserts take. I plotted the results; see the figure below:
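In case the chart does not survive every copy of this article, the recorded insert times can be restated as data, with the slowdown relative to t1 computed (a quick Python summary of the numbers above):

```python
# Insert times recorded above for t1..t10, converted to seconds.
insert_secs = [84.61, 85.40, 89.12, 96.65, 99.85,
               101.97, 149.07, 137.62, 134.14, 152.32]
for i, s in enumerate(insert_secs, start=1):
    print(f"t{i:<3} {i:2d} index(es)  {s:7.2f}s  "
          f"{s / insert_secs[0]:.2f}x vs t1")
```

The trend is not perfectly monotonic (t7 through t9 jitter), but t10 takes roughly 1.8x as long as t1.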

  2. UPDATE operation time comparison of these 10 tables:
(debian-ytt1:3500)|(ytt)>call sp_batch_write('update','t1',100000);
Query OK, 0 rows affected (22.81 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('update','t2',100000);
Query OK, 0 rows affected (47.42 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('update','t3',100000);
Query OK, 0 rows affected (1 min 8.96 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('update','t4',100000);
Query OK, 0 rows affected (1 min 55.50 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('update','t5',100000);
Query OK, 0 rows affected (2 min 13.13 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('update','t6',100000);
Query OK, 0 rows affected (2 min 6.79 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('update','t7',100000);
Query OK, 0 rows affected (2 min 35.01 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('update','t8',100000);
Query OK, 0 rows affected (2 min 44.50 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('update','t9',100000);
Query OK, 0 rows affected (3 min 1.27 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('update','t10',100000);
Query OK, 0 rows affected (4 min 43.74 sec)

Indexes hit UPDATE hardest: as the index count rises, the full-table update gets slower and slower. See the figure below:

  3. DELETE operation time comparison of these 10 tables:

Before deleting, check each table's on-disk file size. The listing below is sorted by file size, ascending; broadly, the fewer indexes a table has, the less disk space it occupies (note that t7 and t8 swap places here).

root@debian-ytt1:/home/ytt/mysql-sandboxes/3500/sandboxdata/ytt# ls -Sihlr
total 521M
2229717 -rw-r----- 1 ytt ytt 112K Jun  1 14:03 t0.ibd
2229718 -rw-r----- 1 ytt ytt  28M Jun  1 16:14 t1.ibd
2229719 -rw-r----- 1 ytt ytt  36M Jun  1 16:16 t2.ibd
2229720 -rw-r----- 1 ytt ytt  40M Jun  1 16:16 t3.ibd
2229721 -rw-r----- 1 ytt ytt  44M Jun  1 16:16 t4.ibd
2229722 -rw-r----- 1 ytt ytt  52M Jun  1 16:03 t5.ibd
2229723 -rw-r----- 1 ytt ytt  56M Jun  1 16:06 t6.ibd
2229725 -rw-r----- 1 ytt ytt  60M Jun  1 15:25 t8.ibd
2229724 -rw-r----- 1 ytt ytt  64M Jun  1 16:08 t7.ibd
2229734 -rw-r----- 1 ytt ytt  68M Jun  1 15:30 t9.ibd
2229735 -rw-r----- 1 ytt ytt  72M Jun  1 15:32 t10.ibd

Deletion time should rise with the size of the on-disk file:

(debian-ytt1:3500)|(ytt)>call sp_batch_write('delete','t1',100000);
Query OK, 0 rows affected (31.80 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('delete','t2',100000);
Query OK, 0 rows affected (59.89 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('delete','t3',100000);
Query OK, 0 rows affected (1 min 21.83 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('delete','t4',100000);
Query OK, 0 rows affected (2 min 0.33 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('delete','t5',100000);
Query OK, 0 rows affected (2 min 17.66 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('delete','t6',100000);
Query OK, 0 rows affected (2 min 9.37 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('delete','t7',100000);
Query OK, 0 rows affected (2 min 53.61 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('delete','t8',100000);
Query OK, 0 rows affected (2 min 39.29 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('delete','t9',100000);
Query OK, 0 rows affected (3 min 38.26 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('delete','t10',100000);
Query OK, 0 rows affected (4 min 22.88 sec)

The result is as expected, as shown in the figure below:

This is why DELETE is not recommended for clearing out an entire table: DELETE only marks each row as deleted, and the purge thread reclaims the space later in the background, so disk space is not released immediately. To empty a table, use TRUNCATE instead of DELETE.
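The "deleted but not reclaimed" behavior is easy to demonstrate, here using SQLite as a stand-in for InnoDB (an assumption: in SQLite the explicit reclaim step is VACUUM, whereas in MySQL TRUNCATE recreates the tablespace instead of visiting every row):

```python
import os
import sqlite3
import tempfile

# DELETE removes rows logically, but the file on disk does not shrink
# until the space is actively reclaimed.
fd, path = tempfile.mkstemp(suffix=".db")
os.close(fd)
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload TEXT)")
with conn:
    conn.executemany("INSERT INTO t (payload) VALUES (?)",
                     [("x" * 500,)] * 10000)
size_full = os.path.getsize(path)
with conn:
    conn.execute("DELETE FROM t")          # all rows logically gone...
size_after_delete = os.path.getsize(path)  # ...but the file has not shrunk
conn.execute("VACUUM")                     # explicit space reclamation
size_after_vacuum = os.path.getsize(path)
conn.close()
print(size_full, size_after_delete, size_after_vacuum)
```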

  4. LOAD DATA import time comparison of these 10 tables:
(debian-ytt1:3500)|(ytt)>load data infile '/home/ytt/mysql-sandboxes/3500/mysql-files/t0.txt' into table t1;
Query OK, 100000 rows affected (11.01 sec)
Records: 100000  Deleted: 0  Skipped: 0  Warnings: 0

(debian-ytt1:3500)|(ytt)>load data infile '/home/ytt/mysql-sandboxes/3500/mysql-files/t0.txt' into table t2;
Query OK, 100000 rows affected (14.94 sec)
Records: 100000  Deleted: 0  Skipped: 0  Warnings: 0

(debian-ytt1:3500)|(ytt)>load data infile '/home/ytt/mysql-sandboxes/3500/mysql-files/t0.txt' into table t3;
Query OK, 100000 rows affected (19.33 sec)
Records: 100000  Deleted: 0  Skipped: 0  Warnings: 0

(debian-ytt1:3500)|(ytt)>load data infile '/home/ytt/mysql-sandboxes/3500/mysql-files/t0.txt' into table t4;
Query OK, 100000 rows affected (22.06 sec)
Records: 100000  Deleted: 0  Skipped: 0  Warnings: 0

(debian-ytt1:3500)|(ytt)>load data infile '/home/ytt/mysql-sandboxes/3500/mysql-files/t0.txt' into table t5;
Query OK, 100000 rows affected (28.86 sec)
Records: 100000  Deleted: 0  Skipped: 0  Warnings: 0

(debian-ytt1:3500)|(ytt)>load data infile '/home/ytt/mysql-sandboxes/3500/mysql-files/t0.txt' into table t6;
Query OK, 100000 rows affected (30.57 sec)
Records: 100000  Deleted: 0  Skipped: 0  Warnings: 0

(debian-ytt1:3500)|(ytt)>load data infile '/home/ytt/mysql-sandboxes/3500/mysql-files/t0.txt' into table t7;
Query OK, 100000 rows affected (30.83 sec)
Records: 100000  Deleted: 0  Skipped: 0  Warnings: 0

(debian-ytt1:3500)|(ytt)>load data infile '/home/ytt/mysql-sandboxes/3500/mysql-files/t0.txt' into table t8;
Query OK, 100000 rows affected (35.88 sec)
Records: 100000  Deleted: 0  Skipped: 0  Warnings: 0

(debian-ytt1:3500)|(ytt)>load data infile '/home/ytt/mysql-sandboxes/3500/mysql-files/t0.txt' into table t9;
Query OK, 100000 rows affected (36.76 sec)
Records: 100000  Deleted: 0  Skipped: 0  Warnings: 0

(debian-ytt1:3500)|(ytt)>load data infile '/home/ytt/mysql-sandboxes/3500/mysql-files/t0.txt' into table t10;
Query OK, 100000 rows affected (45.79 sec)
Records: 100000  Deleted: 0  Skipped: 0  Warnings: 0

The import time also increases as the number of indexes increases, as shown in the figure below:
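The t0.txt file consumed by LOAD DATA above is not shown being generated. This is one hypothetical way to build a compatible file, assuming LOAD DATA's defaults (tab-separated columns in table order, one row per line, id included):

```python
import random

# Write a tab-separated dump matching t0's column order:
# id, then r0..r10 as random integers in [1, 10000].
def write_dump(path, n_rows):
    with open(path, "w") as f:
        for row_id in range(1, n_rows + 1):
            vals = [row_id] + [random.randint(1, 10000) for _ in range(11)]
            f.write("\t".join(str(v) for v in vals) + "\n")

write_dump("t0.txt", 100000)
```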

These simple tests show that every additional index adds write overhead. For write-heavy workloads, more indexes is not better: keep the necessary ones and drop the rest. That said, one of the most common deployment patterns on the Internet exploits exactly this asymmetry: in a master-slave architecture, master and slave tables share the same structure but carry different index counts; the master keeps only the primary key, while the slave adds multiple secondary indexes. This fits the pattern of writes on the master and reads on the slave well, provided the slave can tolerate some replication delay. If the delay becomes unacceptable, you have to give up the slave's less-used indexes to speed up its log apply rate.




爱可生开源社区

Founded in 2017, the community is dedicated to open-sourcing high-quality operations tools, sharing technical content, and running nationwide community events. Its open-source products to date include the SQL audit tool SQLE, the distributed middleware DBLE, and the data transfer component DTLE.