Issue 42: Is Multicolumn Partitioning Necessary in MySQL

In the previous chapters, we discussed the single-column-based partition table, so is it necessary to build a multi-column-based partition table? Is this partitioned table data evenly distributed? Are there any special application scenarios? Are there any special optimization strategies? Based on these questions, this paper will focus on interpretation.

MySQL supports not only single-column-based partitioning, but also multi-column-based partitioning. For example, to create a partitioned table based on fields (f1, f2, f3), the usage and usage scenarios are somewhat similar to joint indexes. For example, the following query statement filters the columns (f1, f2, f3).

 select * from p1 where f1 = 2 and f2 = 2 and f3 = 2;

The premise of a multi-column partition table is that the columns participating in the partition have an equal frequency of retrieval. If they are not equal, there is no need to use multi-column partitioning.

We still use specific examples to verify the advantages, disadvantages and applicable scenarios of multi-column partitions, so that we can understand them more thoroughly.

Create a table p1, the fields r1, r2, r3 are 1-8, 1-5, 1-5 respectively.

 create table p1(r1 int,r2 int,r3 int,log_date datetime);

According to the distribution range of the fields (r1, r2, r3), I will write a stored procedure to process the following table p1 and become a partition table. The stored procedure code is as follows:

 DELIMITER $$

USE `ytt_new`$$

DROP PROCEDURE IF EXISTS `sp_add_partition_ytt_new_p1`$$

CREATE DEFINER=`root`@`%` PROCEDURE `sp_add_partition_ytt_new_p1`()
BEGIN
    DECLARE i,j,k INT UNSIGNED DEFAULT 1;
    SET @stmt = '';
    SET @stmt_begin = 'ALTER TABLE p1 PARTITION BY RANGE COLUMNS (r1,r2,r3)(';
        WHILE i <= 8 DO
       set j = 1;
       while j <= 5 do
         set k = 1;
         while k <= 5 do
               SET @stmt = CONCAT(@stmt,' PARTITION p',i,j,k,' VALUES LESS THAN (',i,',',j,',',k,'),');
               set k = k + 1;
         end while;
         set j = j + 1;
       end while;
       SET i = i + 1;        
        END WHILE;    
    SET @stmt_end = 'PARTITION p_max VALUES LESS THAN (maxvalue,maxvalue,maxvalue))';
        SET @stmt = CONCAT(@stmt_begin,@stmt,@stmt_end);
        PREPARE s1 FROM @stmt;
        EXECUTE s1;
        DROP PREPARE s1;
        SET @stmt = NULL;
        SET @stmt_begin = NULL;
        SET @stmt_end = NULL;    
    END$$

DELIMITER ;

Call the stored procedure and change the table p1 to a multi-column partition table. At this time, the table p1 has 201 partitions and the number of records is 500W.

 mysql> call sp_add_partition_ytt_new_p1;
Query OK, 0 rows affected (14.89 sec)

mysql> select count(partition_name) as partition_count  from information_schema.partitions where table_schema = 'ytt_new' and table_name ='p1';
+-----------------+
| partition_count |
+-----------------+
|             201 |
+-----------------+
1 row in set (0.00 sec)

mysql> select count(*) from p1;
+----------+
| count(*) |
+----------+
|  5000000 |
+----------+
1 row in set (12.01 sec)

Create a partition table p2 in the same way to compare the performance of a single-column partition table and a multi-column partition table in some scenarios:

The partition table p2 is partitioned according to the field r1, and only 9 are divided.

 mysql> CREATE TABLE `p2` (
  `r1` int DEFAULT NULL,
  `r2` int DEFAULT NULL,
  `r3` int DEFAULT NULL,
  `log_date` datetime DEFAULT NULL
) ENGINE=InnoDB
PARTITION BY RANGE  COLUMNS(r1)
(PARTITION p1 VALUES LESS THAN (1) ,
 PARTITION p2 VALUES LESS THAN (2) ,
 PARTITION p3 VALUES LESS THAN (3) ,
 PARTITION p4 VALUES LESS THAN (4) ,
 PARTITION p5 VALUES LESS THAN (5) ,
 PARTITION p6 VALUES LESS THAN (6) ,
 PARTITION p7 VALUES LESS THAN (7) ,
 PARTITION p8 VALUES LESS THAN (8) ,
 PARTITION p_max VALUES LESS THAN (MAXVALUE) 
)
1 row in set (0.00 sec)

mysql> insert into p2 select * from p1;
Query OK, 5000000 rows affected (1 min 37.92 sec)
Records: 5000000  Duplicates: 0  Warnings: 0

Performance comparison of equal value filtering for multiple fields: For the same query conditions, table p1 (execution time 0.02 seconds) is dozens of times faster than p2 (execution time 0.49 seconds).

 mysql> select count(*) from p1 where r1 = 2 and r2 = 2 and r3 = 2;
+----------+
| count(*) |
+----------+
|    24992 |
+----------+
1 row in set (0.02 sec)

mysql> select count(*) from p2 where r1 = 2 and r2 = 2 and r3 = 2;
+----------+
| count(*) |
+----------+
|    24992 |
+----------+
1 row in set (0.49 sec)

Check the comparison of the two execution plans: For the same query, the number of scanned rows in table p1 is only 2W, while the number of scanned rows in table p2 is 62W, which is a huge difference.

 mysql> explain select count(*) from p1 where r1 = 2 and r2 = 2 and r3 = 2\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: p1
   partitions: p223
         type: ALL
...
         rows: 24711
     filtered: 0.10
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

mysql> explain select count(*) from p2 where r1 = 2 and r2 = 2 and r3 = 2\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: p2
   partitions: p3
         type: ALL
...
         rows: 623239
     filtered: 0.10
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

What if the filter field is incomplete? For example, without retrieving the last column, do the comparison again: the execution time of the same table p1 (0.1 seconds) is several times less than that of the table p2 (0.52 seconds).

 mysql> select count(*) from p1 where r1 = 2 and r2 = 2;
+----------+
| count(*) |
+----------+
|   124649 |
+----------+
1 row in set (0.10 sec)

mysql> select count(*) from p2 where r1 = 2 and r2 = 2;
+----------+
| count(*) |
+----------+
|   124649 |
+----------+
1 row in set (0.52 sec)

What about retrieving only the first column: this time, the execution time of tables p1 and p2 is similar, and p2 has a slight advantage.

 mysql> select count(*) from p1 where r1 = 2 ;
+----------+
| count(*) |
+----------+
|   624599 |
+----------+
1 row in set (0.56 sec)

mysql> select count(*) from p2 where r1 = 2 ;
+----------+
| count(*) |
+----------+
|   624599 |
+----------+
1 row in set (0.45 sec)

Take a look at the execution plan comparison: table p1 scans 26 partitions, table p2 scans only one partition, and table p2 has a relatively small number of partitions.

 mysql> explain select count(*) from p1 where r1 = 2 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: p1
   partitions: p211,p212,p213,p214,p215,p221,p222,p223,p224,p225,p231,p232,p233,p234,p235,p241,p242,p243,p244,p245,p251,p252,p253,p254,p255,p311
         type: ALL
...
         rows: 648074
     filtered: 10.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

mysql> explain select count(*) from p2 where r1 = 2 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: p2
   partitions: p3
         type: ALL
...
         rows: 623239
     filtered: 10.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

What if the field r1 is removed? The execution time is also similar, both table p1 and table p2 will scan all partitions.

 mysql> select count(*) from p1 where  r2 = 2;
+----------+
| count(*) |
+----------+
|   998700 |
+----------+
1 row in set (3.87 sec)

mysql> select count(*) from p2 where  r2 = 2;
+----------+
| count(*) |
+----------+
|   998700 |
+----------+
1 row in set (3.75 sec)

In view of this, let's discuss another question: Does the order of fields matter for multi-column partitioning?

This order needs to be explained one by one with the filter conditions corresponding to our query statement. Similar to the following two types of SQL:

 SQL 1： select * from p1 where r1 = 2 and r2 = 2 and r3 = 2;

For SQL 1, the order does not matter because all three columns are included at query time;

 SQL 2: select * from p1 where r1 = 2 and r2 = 2;

For SQL 2, both (r1,r2,r3) and (r2,r1,r3) are satisfied.

 SQL 3： select * from p1 where r2 = 2 and r3 = 2;

For SQL 3, both (r2,r3,r1) and (r3,r2,r1) are also satisfied.

Use the same method to create the partition table p3, the order of the partition fields is (r2, r3, r1):

 mysql> show create table p3\G
*************************** 1. row ***************************
       Table: p3
Create Table: CREATE TABLE `p3` (
  `r1` int DEFAULT NULL,
  `r2` int DEFAULT NULL,
  `r3` int DEFAULT NULL,
  `log_date` datetime DEFAULT NULL
) ENGINE=InnoDB 
/*!50500 PARTITION BY RANGE  COLUMNS(r2,r3,r1)
(PARTITION p111 VALUES LESS THAN (1,1,1) ENGINE = InnoDB,
...

For table p3: the execution time of the following SQL is dozens of times faster than that of table p1. Due to the different order of partition fields, table p1 needs to scan all partitions to get the result.

 mysql> select count(*) from p3 where r2 = 1 and r3 = 4 ;
+----------+
| count(*) |
+----------+
|   199648 |
+----------+
1 row in set (0.22 sec)

mysql> select count(*) from p1 where r2 = 1 and r3 = 4 ;
+----------+
| count(*) |
+----------+
|   199648 |
+----------+
1 row in set (5.05 sec)

So for a multi-column partitioned table, as mentioned at the beginning, the usage, precautions and usage scenarios of it and the joint index are also very similar. For some specific scenarios, using multi-column partitioning can significantly speed up query performance.

Issue 42: Is Multicolumn Partitioning Necessary in MySQL

爱可生开源社区

引用和评论

SQLShift 全新上线：Oracle→OceanBase 迁移利器

Open WebUI：开源AI交互平台的全面解析

被 Manus 带火的 MCP 是什么｜一文看懂

MySQL × 向量数据库：大模型时代的黄金组合实战指南

百万级群聊的设计实践

分布式数据库解析

MySQL Binlog 深度解析