Issue 43: How to Make Good Use of Partition Tables in Multi-Table Association Scenarios

How to properly use partitioned tables to improve query performance in multi-table association scenarios? Based on the previous introductions about partitioned tables, you must have a very comprehensive understanding of MySQL partitioned tables: the purpose of partitioned tables is to reduce the amount of data retrieved each time and improve overall performance.

The previous articles introduced single-table applications based on partitioned tables. Is there a certain performance improvement for partitioned tables when multiple tables are associated? People often ask such questions: I use a partition table, but the query is not faster at all, but slower, what is the reason? Is the partition table itself defective? Or am I not understanding the scenario where partition table fits? For these questions, I will use several types of typical query scenarios to illustrate today.

The first scenario: two tables are associated, the association key is the partition key, but there is no filter condition.

Something like this: select * from t1 inner join t2 using(id);

Using partitioned tables in such scenarios will only make query performance worse, and will not speed up query performance.

When the partition table is not used, the number of table associations is only two; when the partition table is used, the number of tables involved in the table association is not only two, but also many table partitions. The more the number of partitions, the worse the query performance.

For a simple example: table t1 is a hash partition table with 1000 partitions and 50W rows of records.

 localhost:ytt>show create table t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `id` int DEFAULT NULL,
  `r1` int DEFAULT NULL,
  `r2` int DEFAULT NULL,
  `log_date` date DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 
/*!50100 PARTITION BY HASH (`id`)
PARTITIONS 1000 */
1 row in set (0.00 sec)

Table t1_no_pt is an ordinary table, a clone of table t1, but the table partition is removed, and the number of records is also 50W.

 localhost:ytt>show create table t1_no_pt\G
*************************** 1. row ***************************
       Table: t1_no_pt
Create Table: CREATE TABLE `t1_no_pt` (
  `id` int DEFAULT NULL,
  `r1` int DEFAULT NULL,
  `r2` int DEFAULT NULL,
  `log_date` date DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 
1 row in set (0.00 sec)

The query performance comparison of the two tables in this scenario: For the associated query between the partition table and the ordinary table, the execution time is 6.76 seconds.

 localhost:ytt>select count(*) from t1_no_pt a inner join t1 b using(id);
+----------+
| count(*) |
+----------+
|  1014068 |
+----------+
1 row in set (6.76 sec)

The associated query between two partitioned tables takes 4.32 seconds to execute.

 localhost:ytt>select count(*) from t1 a inner join t1 b using(id);
+----------+
| count(*) |
+----------+
|  1014068 |
+----------+
1 row in set (4.32 sec)

The associated query between two ordinary tables only took 0.87 seconds to execute.

 localhost:ytt>select count(*) from t1_no_pt a inner join t1_no_pt b using(id);
+----------+
| count(*) |
+----------+
|  1014068 |
+----------+
1 row in set (0.87 sec)

The same query, partitioned table is even worse in such a scenario.

The second scenario: two tables are associated, the association key is the partition key, but there are filter conditions.

This is subdivided into two sub-scenarios:

1. The filter condition is the partition key

A query like this: select * from t1 inner join t2 using(id) where t1.id = xxx;

In this scenario, it is recommended to use a partition table! The filter condition is the partition key and it is an equal-value query. Finally, the optimizer will locate a fixed table partition to reduce the number of retrieved records, which is perfectly suitable for partitioned tables.

Again, take a simple example with table t1 and table t1_no_pt:

The two-partition table is associated and the filter condition is the partition key, and the execution time is 0.01 seconds.

 localhost:ytt>select count(*) from t1 a inner join t1 b using(id) where a.id = 19172;
+----------+
| count(*) |
+----------+
|       81 |
+----------+
1 row in set (0.01 sec)

The association of two ordinary tables, under the same conditions, takes 0.55 seconds to execute, which is many times slower than the association of two partitioned tables.

 localhost:ytt>select count(*) from t1_no_pt a inner join t1_no_pt b using(id) where a.id = 19172;
+----------+
| count(*) |
+----------+
|       81 |
+----------+
1 row in set (0.55 sec)

Using a partition table to associate with a normal table, the execution time is 0.32 seconds, which is between the first two.

 localhost:ytt>select count(*) from t1 a inner join t1_no_pt b using(id) where a.id = 19172;
+----------+
| count(*) |
+----------+
|       81 |
+----------+
1 row in set (0.32 sec)

Comparing the execution plan of a two-partition table association with that of two ordinary tables, it will be more obvious: the associated cost of the partition table is 381.9, and the number of scanned rows is 280; the associated cost of the ordinary table is 249264389.78, and the number of scanned rows is 249125777. At this time, the performance improvement of partition table association is very obvious!

 localhost:ytt>explain  format=tree select count(*) from t1 a inner join t1 b using(id) where a.id = 19172\G
*************************** 1. row ***************************
EXPLAIN: -> Aggregate: count(0)
    -> Inner hash join (no condition)  (cost=381.90 rows=280)
        -> Filter: (b.id = 19172)  (cost=1.02 rows=53)
            -> Table scan on b  (cost=1.02 rows=529)
        -> Hash
            -> Filter: (a.id = 19172)  (cost=53.65 rows=53)
                -> Table scan on a  (cost=53.65 rows=529)

1 row in set (0.00 sec)

localhost:ytt>explain  format=tree select count(*) from t1_no_pt a inner join t1_no_pt b using(id) where a.id = 19172\G
*************************** 1. row ***************************
EXPLAIN: -> Aggregate: count(0)
    -> Inner hash join (no condition)  (cost=249264389.78 rows=249125777)
        -> Filter: (b.id = 19172)  (cost=1.87 rows=49913)
            -> Table scan on b  (cost=1.87 rows=499125)
        -> Hash
            -> Filter: (a.id = 19172)  (cost=50257.25 rows=49913)
                -> Table scan on a  (cost=50257.25 rows=499125)

1 row in set (0.00 sec)

2. The filter condition is not a partition key

A query like this: select * from t1 inner join t2 using(id) where t1.r1 = xxx;

In this scenario, the partition table will not bring about a performance improvement, but will cause a sharp drop in performance.

Still using table t1 and table t1_no_pt as an example: the association between the two partition tables, the execution time is 6.16 seconds.

 localhost:ytt>select count(*) from t1 a inner join t1 b using(id) where a.r1 = 10;
+----------+
| count(*) |
+----------+
|    50552 |
+----------+
1 row in set (6.16 sec)

The two ordinary tables are associated, and the execution time is 0.7 seconds, which is much faster than the partition table.

 localhost:ytt>select count(*) from t1_no_pt a inner join t1_no_pt b using(id) where a.r1 = 10;
+----------+
| count(*) |
+----------+
|    50552 |
+----------+
1 row in set (0.70 sec)

The third scenario: two tables are associated, the association key is not the partition key, but the filter condition is the partition key.

For such a scenario, the partition table also cannot bring performance improvement!

The two-partition table association performance is poor, with an execution time of 6.05 seconds.

 localhost:ytt>select count(*) from t1 a inner join t1 b using(r1) where a.id = 19172;
+----------+
| count(*) |
+----------+
|   225868 |
+----------+
1 row in set (6.05 sec)

The association performance of the two ordinary tables is much better, and the execution time is 0.54 seconds.

 localhost:ytt>select count(*) from t1_no_pt a inner join t1_no_pt b using(r1) where a.id = 19172;
+----------+
| count(*) |
+----------+
|   225868 |
+----------+
1 row in set (0.54 sec)

Since the filter condition is the partition key, consider making the partition table associated with the normal table.

Change the previous SQL and use the filtered partition table data to associate with the ordinary table, so the performance is better than the association between two ordinary tables: the execution time is 0.39 seconds.

 localhost:ytt>select count(*) from (select  * from t1 a where a.id = 19172) t inner join t1_no_pt b using(r1);
+----------+
| count(*) |
+----------+
|   225868 |
+----------+
1 row in set (0.39 sec)

The fourth scenario: The partition table is associated, and the association key is also the partition key, but the partitioning algorithms or the number of partitions between the two partition tables are different.

Table t2 and table t1 have the same structure and the same number of records, but the number of partitions is different. Table t1 has 1000 partitions, and table t2 has only 50 partitions:

 localhost:ytt>show create table t2\G
*************************** 1. row ***************************
       Table: t2
Create Table: CREATE TABLE `t2` (
  `id` int DEFAULT NULL,
  `r1` int DEFAULT NULL,
  `r2` int DEFAULT NULL,
  `log_date` date DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 
/*!50100 PARTITION BY HASH (`id`)
PARTITIONS 50 */
1 row in set (0.01 sec)

Based on this, two partition tables are associated: the execution time is 6.43 seconds.

 localhost:ytt>select count(*) from t1 a inner join t2 b using(id);
+----------+
| count(*) |
+----------+
|  1014068 |
+----------+
1 row in set (6.43 sec)

Similarly, two ordinary tables are associated: the execution time is 1.98 seconds. Execution time is faster than partitioned table.

 localhost:ytt>select count(*) from t1_no_pt a inner join t2_no_pt b using(id);
+----------+
| count(*) |
+----------+
|  1014068 |
+----------+
1 row in set (1.98 sec)

The reasons for the above performance differences have been partially mentioned in previous articles and will not be described here.

That's a summary based on whether the table association should use a partitioned table:

Use partition table for association, it is best to meet the following conditions, otherwise it will be counterproductive:

The partition key is the association condition.
If the partition key is a non-associative condition, the filter condition must be the partition key.
The partition method of the two-partition table, the number of partitions must be the same.

Issue 43: How to Make Good Use of Partition Tables in Multi-Table Association Scenarios

The first scenario: two tables are associated, the association key is the partition key, but there is no filter condition.

The second scenario: two tables are associated, the association key is the partition key, but there are filter conditions.

This is subdivided into two sub-scenarios:

1. The filter condition is the partition key

2. The filter condition is not a partition key

The third scenario: two tables are associated, the association key is not the partition key, but the filter condition is the partition key.

The fourth scenario: The partition table is associated, and the association key is also the partition key, but the partitioning algorithms or the number of partitions between the two partition tables are different.

That's a summary based on whether the table association should use a partitioned table:

爱可生开源社区

引用和评论

如何巧妙解决 Too many connections 报错？

Open WebUI：开源AI交互平台的全面解析

被 Manus 带火的 MCP 是什么｜一文看懂

MySQL × 向量数据库：大模型时代的黄金组合实战指南

百万级群聊的设计实践

分布式数据库解析

MySQL Binlog 深度解析