Issue 37: Appropriate use of MySQL native table partitioning

The main engine used by the MySQL database is InnoDB. InnoDB does not have a native table splitting scheme similar to the MERGE engine, but there is a native partition table that splits the record set in a horizontal manner, which is transparent to the application side.

The existence of the partition table provides an additional choice for the retrieval request and daily management of the super-large table. Proper use of partition tables will greatly improve database performance.

The partition table has the following advantages:

Significantly improve the performance of certain queries.
Simplify daily data operation and maintenance workload and improve operation and maintenance efficiency.
Parallel query and balanced write IO.
Transparent to the application, no routing or middle layer is required to be deployed at the application layer.

Next, we use practical examples to make the first two advantages reflected and updated clearly.

For retrieval:

Optimize query performance (range query)

Split the appropriate partition table, for the same query, the number of scanned records is much less than that of the non-partitioned table, and the performance is far more efficient than that of the non-partitioned table.

The following example table t1 is a non-partitioned table, and the corresponding partition table is p1. The two tables have the same number of records, both of which are 1KW.

localhost:ytt> show create table t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `id` int NOT NULL,
  `r1` date DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)


localhost:ytt> show create table p1\G
*************************** 1. row ***************************
       Table: p1
Create Table: CREATE TABLE `p1` (
  `id` int NOT NULL,
  `r1` date DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
/*!50100 PARTITION BY RANGE (`id`)
(PARTITION p0 VALUES LESS THAN (1000000) ENGINE = InnoDB,
 PARTITION p1 VALUES LESS THAN (2000000) ENGINE = InnoDB,
 PARTITION p2 VALUES LESS THAN (3000000) ENGINE = InnoDB,
 PARTITION p3 VALUES LESS THAN (4000000) ENGINE = InnoDB,
 PARTITION p4 VALUES LESS THAN (5000000) ENGINE = InnoDB,
 PARTITION p5 VALUES LESS THAN (6000000) ENGINE = InnoDB,
 PARTITION p6 VALUES LESS THAN (7000000) ENGINE = InnoDB,
 PARTITION p7 VALUES LESS THAN (8000000) ENGINE = InnoDB,
 PARTITION p8 VALUES LESS THAN (9000000) ENGINE = InnoDB,
 PARTITION p9 VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */
1 row in set (0.00 sec)

localhost:ytt> select count(*) from t1;
+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (0.94 sec)

localhost:ytt> select count(*) from p1;
+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (0.92 sec)

Let's perform a range search on the two tables respectively. The following is the execution plan:

localhost:ytt> explain format=tree select count(*) from t1 where id < 1000000\G
*************************** 1. row ***************************
EXPLAIN: -> Aggregate: count(0)
    -> Filter: (t1.id < 1000000)  (cost=407495.19 rows=2030006)
        -> Index range scan on t1 using PRIMARY  (cost=407495.19 rows=2030006)

1 row in set (0.00 sec)

localhost:ytt> explain format=tree select count(*) from p1 where id < 1000000\G
*************************** 1. row ***************************
EXPLAIN: -> Aggregate: count(0)
    -> Filter: (p1.id < 1000000)  (cost=99980.09 rows=499369)
        -> Index range scan on p1 using PRIMARY  (cost=99980.09 rows=499369)

1 row in set (0.00 sec)

Table t1 compares the execution plan of table p1. In terms of cost and the number of scanned records, the former is several times more than the latter. Obviously, the performance of partitioned tables is more efficient than non-partitioned tables.

Let's take a look at the execution plan that does not equal to the retrieval of the two tables:

localhost:ytt> explain format=tree select count(*) from t1 where id != 2000001\G
*************************** 1. row ***************************
EXPLAIN: -> Aggregate: count(0)
    -> Filter: (t1.id <> 2000001)  (cost=1829866.58 rows=9117649)
        -> Index range scan on t1 using PRIMARY  (cost=1829866.58 rows=9117649)

1 row in set (0.00 sec)

localhost:ytt> explain format=tree select count(*) from p1 where id != 2000001\G
*************************** 1. row ***************************
EXPLAIN: -> Aggregate: count(0)
    -> Filter: (p1.id <> 2000001)  (cost=1002750.23 rows=4993691)
        -> Index range scan on p1 using PRIMARY  (cost=1002750.23 rows=4993691)

1 row in set (0.00 sec)

For such low-efficiency SQL, in terms of execution plan results, partitioned tables have advantages over non-partitioned tables in terms of cost and number of scanned records.

###### Optimize write performance (UPDATE with filter).

For this type of update request, partitioned tables are also more efficient than non-partitioned tables.

The following is a comparison of the execution plan of a non-partitioned table and a partitioned table in an update scenario of equivalent filtering: just look at the number of scan rows, and the number of scanned records of a partitioned table is less than that of a non-partitioned table.

localhost:ytt> explain update t1 set r1 = date_sub(current_date,interval ceil(rand()*5000) day) where id between 1000001 and 2990000\G
*************************** 1. row ***************************
           id: 1
  select_type: UPDATE
        table: t1
   partitions: NULL
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: const
         rows: 3938068
     filtered: 100.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

localhost:ytt> explain update p1 set r1 = date_sub(current_date,interval ceil(rand()*5000) day) where id between 1000001 and 2990000\G
*************************** 1. row ***************************
           id: 1
  select_type: UPDATE
        table: p1
   partitions: p1,p2
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: const
         rows: 998738
     filtered: 100.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

For operation and maintenance:

Partitioned table data and non-partitioned data are exchanged.

The specific partition data of the partition table can be easily exported and imported, and can be quickly exchanged with non-partition table data.

Create a table t_p1 to exchange data with partition p1 of table p1.

localhost:ytt> create table t_p1 like t1;
Query OK, 0 rows affected (0.06 sec)

The partition p1 itself contains 100W rows of records. It only took 0.07 seconds to exchange data using the native data exchange function of the partition table.

localhost:ytt> alter table p1 exchange partition p1 with table t_p1;
Query OK, 0 rows affected (0.07 sec)

Looking at the exchanged data, there are 100W rows missing in table p1, partition p1 is cleared, and table t_p1 has 100W rows missing.

localhost:ytt> select count(*) from p1;
+----------+
| count(*) |
+----------+
|  9000000 |
+----------+
1 row in set (0.79 sec)

localhost:ytt> select count(*) from t_p1;
+----------+
| count(*) |
+----------+
|  1000000 |
+----------+
1 row in set (0.13 sec)

The data can be exchanged back at any time, and the exchanged table is emptied.

localhost:ytt> alter table p1 exchange partition p1 with table t_p1;
Query OK, 0 rows affected (0.77 sec)

localhost:ytt> select count(*) from p1;
+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (0.91 sec)

localhost:ytt> select count(*) from t_p1;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

To compare the data exchange of non-partitioned tables, the steps are:

Select the interchange table to be exchanged.
Select data from the original table and import it into the interchange table.
Delete the data involved in the original table.

If you need to exchange the exchanged data back into the original table at this time, you need to repeat the above steps in reverse, which increases the difficulty of operation and maintenance and the operation is inefficient.

The biggest advantage of partition table replacement is that the amount of logs recorded by non-partitioned tables is much smaller. Let's do the above replacement operation again. Delete all binary logs.

localhost:ytt>reset master;

Query OK, 0 rows affected (0.02 sec)

Do a partition replacement

localhost:ytt>alter table p1 exchange partition p2 with table t_p1;
Query OK, 0 rows affected (2.42 sec)

Do replacement again to delete table t_p1 data

localhost:ytt>alter table p1 exchange partition p2 with table t_p1;
Query OK, 0 rows affected (0.45 sec)

At this time, the two replacement operations are recorded in the binary log ytt1.000001.

localhost:ytt>show master status;
...
 ytt1.000001 ： 47d6eda0-6468-11ea-a026-9cb6d0e27d15:1-2

Re-flushing the log, non-partitioned table replacement records.

localhost:ytt>flush logs;
Query OK, 0 rows affected (0.01 sec)


localhost:ytt>insert into t_p1 select * from p1 partition (p2) ;
Query OK, 934473 rows affected (5.25 sec)
Records: 934473  Duplicates: 0  Warnings: 0


localhost:ytt>show master status;
...
 ytt1.000002 ： 47d6eda0-6468-11ea-a026-9cb6d0e27d15:1-3

Let's take a look at the specific log file, ytt1.000001 only occupies 588 bytes, but ytt1.000002 takes up 7.2M.

root@ytt-pc:/var/lib/mysql/3306# ls -sihl ytt1.00000*
2109882 4.0K -rw-r----- 1 mysql mysql  588 7月  23 11:13 ytt1.000001
2109868 7.2M -rw-r----- 1 mysql mysql 7.2M 7月  23 11:14 ytt1.000002

###### Quickly clean up a single partition data.

The performance of deleting a single partition data is better than deleting a certain range of data from a non-partitioned table.

For example, to clear the partition table p1 and partition p0, directly truncate a single partition.

localhost:ytt> alter table p1 truncate partition p0;
Query OK, 0 rows affected (0.07 sec)

localhost:ytt> select count(*) from p1;
+----------+
| count(*) |
+----------+
|  9000001 |
+----------+
1 row in set (0.92 sec)

Non-partitioned tables only have the function of truncate the entire table, so some data cannot be quickly cleaned up. Data can only be deleted based on filter conditions, and the performance is much worse. The same operation is dozens of times slower than a non-partitioned table.

localhost:ytt> delete from t1 where id < 1000000;
Query OK, 999999 rows affected (26.80 sec)

Summarize:

MySQL partition table is very efficient to use in many scenarios. This article introduces the basic advantages of partition table in simple retrieval and operation and maintenance. We will discuss the application of partition table in more scenarios one by one.

What else do you want to know about the technical content of MySQL? Hurry up and leave a message to tell the editor!

Issue 37: Appropriate use of MySQL native table partitioning

The partition table has the following advantages:

Next, we use practical examples to make the first two advantages reflected and updated clearly.

For retrieval:

Optimize query performance (range query)

For operation and maintenance:

Partitioned table data and non-partitioned data are exchanged.

Summarize:

爱可生开源社区

引用和评论

如何巧妙解决 Too many connections 报错？

Open WebUI：开源AI交互平台的全面解析

被 Manus 带火的 MCP 是什么｜一文看懂

MySQL × 向量数据库：大模型时代的黄金组合实战指南

百万级群聊的设计实践

MySQL Binlog 深度解析

MySQL 备份 Shell 脚本：支持远程同步与阿里云 OSS 备份