1

The previous article "The Specific Implementation of MySQL Time Class Partitioning" introduced the implementation method of time class partitioning. This article is an extension of the previous article and introduces the relevant SQL preparation precautions based on this type of partition.

There are only two kinds of retrievals for partitioned tables, one is with a partition key, and the other is without a partition key. Generally speaking, retrieval conditions with a partition key will execute faster, and without a partition key, the execution speed will be slower. This conclusion is suitable for most scenarios, but it cannot be generalized, and the most suitable SQL statement should be written for different partition table definitions. The purpose of using a partitioned table is to reduce the number of records retrieved by the SQL statement. If the expected effect is not achieved, the partitioned table can only bring about side effects. Next I list a few classic SQL statements:

Careful readers may have some doubts after reading the last article. The SQL statement based on table ytt_p1 is as follows:

select count(*) from ytt_pt1 where log_date >='2018-01-01' and log_date <'2019-01-01';

The same is the partition table ytt_pt1_month1. The SQL statement based on this table is as follows:

select count(*) from ytt_pt1_month1 where log_date in ('2020-01-01','2020-01-02','2020-01-03','2020-01-04','2020-01-05','2020-01-06','2020-01-07','2020-01-08','2020-01-09','2020-01-10','2020-01-11','2020-01-12','2020-01-13','2020-01-14','2020-01-15');

The retrieval requirements of the two tables are similar, why is the writing method different? Why should the latter be written in list form and not continue to be written in simple range retrieval form? With this question in mind, we continue.

MySQL has an optimization technique called partition pruning for partitioned tables, which translates to partition pruning. The general meaning is that MySQL will calculate according to the partition function corresponding to the filter condition of the SQL statement, and penetrate the calculation result to the underlying partition table to reduce the number of scan records as an optimization strategy. For time types (DATE, TIMESTAMP, TIME, DATETIME), MySQL only supports partition pruning for some functions: to_days, to_seconds, year, unix_timestamp. Then let's look at the previous question: the partition function of the table ytt_pt1_month1 is month. Although the MySQL partition table supports the month function, the partition pruning technology does not include this function. Next, this article will be introduced in two parts.

First, to experience MySQL's partition pruning technology, create a new table pt_pruning: the partition function is to_days.
create table pt_pruning (
id int,
r1 int,
r2 int,
log_date date)
partition by range(to_days(log_date))
(
PARTITION p_01 VALUES LESS THAN (to_days('2020-02-01')) ENGINE = InnoDB,
 PARTITION p_02 VALUES LESS THAN (to_days('2020-03-01')) ENGINE = InnoDB,
 PARTITION p_03 VALUES LESS THAN (to_days('2020-04-01')) ENGINE = InnoDB,
 PARTITION p_04 VALUES LESS THAN (to_days('2020-05-01')) ENGINE = InnoDB,
 PARTITION p_05 VALUES LESS THAN (to_days('2020-06-01')) ENGINE = InnoDB,
 PARTITION p_06 VALUES LESS THAN (to_days('2020-07-01')) ENGINE = InnoDB,
 PARTITION p_07 VALUES LESS THAN (to_days('2020-08-01')) ENGINE = InnoDB,
 PARTITION p_08 VALUES LESS THAN (to_days('2020-09-01')) ENGINE = InnoDB,
 PARTITION p_09 VALUES LESS THAN (to_days('2020-10-01')) ENGINE = InnoDB,
 PARTITION p_10 VALUES LESS THAN (to_days('2020-11-01')) ENGINE = InnoDB,
 PARTITION p_11 VALUES LESS THAN (to_days('2020-12-01')) ENGINE = InnoDB,
 PARTITION p_12 VALUES LESS THAN (to_days('2021-01-01')) ENGINE = InnoDB,
 PARTITION p_max VALUES LESS THAN MAXVALUE ENGINE = InnoDB
)
This table contains data for the whole year of 2020, about 100W pieces, and the data creation process is omitted here.
(localhost:ytt)<mysql>select min(log_date),max(log_date),count(*) from pt_pruning;
+---------------+---------------+----------+
| min(log_date) | max(log_date) | count(*) |
+---------------+---------------+----------+
| 2020-01-02    | 2020-12-31    |  1000000 |
+---------------+---------------+----------+
1 row in set (0.72 sec)
Execute the following SQL respectively:

SQL 1: Find the number of records whose date contains '2020-01-02'.

SQL 1: select count(*) from pt_pruning where log_date <= '2020-01-02';

SQL 2 and SQL 3: Find the number of records in January 2020.

SQL 2: select count(*) from pt_pruning where log_date < '2020-02-01';

SQL 3:  select count(*) from pt_pruning where log_date between '2020-01-01' and '2020-01-31';

SQL 1 and SQL 2 execute in 0.04 seconds and SQL 3 executes in 0.06 seconds. The effect is still ideal under the condition that no index is used.

(localhost:ytt)<mysql> select count(*) from pt_pruning where log_date <= '2020-01-02';
+----------+
| count(*) |
+----------+
|     2621 |
+----------+
1 row in set (0.04 sec)

(localhost:ytt)<mysql>select count(*) from pt_pruning where log_date < '2020-02-01';
+----------+
| count(*) |
+----------+
|    82410 |
+----------+
1 row in set (0.04 sec)

(localhost:ytt)<mysql>select count(*) from pt_pruning where log_date between '2020-01-01' and '2020-01-31';
+----------+
| count(*) |
+----------+
|    82410 |
+----------+
1 row in set (0.06 sec)
So remember to use the partition function specified by the MySQL partition clipping technology to create the partition table, so that writing SQL will be relatively casual. If, for historical reasons, the partitioned table does not use the partitioning function specified above, there are two possible optimization strategies:
  1. Manually change the SQL statement to make it optimal.
  2. Add HINT to prompt MySQL to use a specific partition.
Second, if the partition function used by the partition table does not meet the rules of MySQL partition pruning technology, how to optimize such SQL statements?

To avoid confusion with the content in the previous article, create a new table pt_month and copy the table definition of table ytt_pt1_month1. The table pt_month is the same as the table pt_pruning, which stores records for the whole year of 2020, and the total number of records is also 100W.

(localhost:ytt)<mysql>select min(log_date),max(log_date),count(*) from pt_month;
+---------------+---------------+----------+
| min(log_date) | max(log_date) | count(*) |
+---------------+---------------+----------+
| 2020-01-02    | 2020-12-31    |  1000000 |
+---------------+---------------+----------+
1 row in set (0.72 sec)

Execute the previous three SQLs again and replace the table name with pt_month:

The execution time of SQL 1 is 1.26 seconds, which is much slower than before. Looking at the execution plan, it was found that the MySQL partition pruning technology was not used, and unnecessary table partitions were scanned. (here are all table partitions)

(localhost:ytt)<mysql>select count(*) from pt_month where log_date <= '2020-01-02';
+----------+
| count(*) |
+----------+
|     2621 |
+----------+
1 row in set (1.26 sec)

(localhost:ytt)<mysql>explain 
    -> select count(*) from pt_month where log_date <= '2020-01-02'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: pt_month
   partitions: p_01,p_02,p_03,p_04,p_05,p_06,p_07,p_08,p_09,p_10,p_11,p_max
...
         rows: 992805
     filtered: 33.33
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

Next, perform a simple optimization on SQL 1: since it is looking for records with a date of '2020-01-02', then do not use <= to filter, but directly use = to filter: the execution time is 0.03 seconds. Looking at the execution plan, the modified SQL is directly located in the table partition p_01, which achieves the effect of partition pruning.

(localhost:ytt)<mysql>select count(*) from pt_month where log_date = '2020-01-02';
+----------+
| count(*) |
+----------+
|     2621 |
+----------+
1 row in set (0.03 sec)

(localhost:ytt)<mysql>explain 
    -> select count(*) from pt_month where log_date = '2020-01-02'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: pt_month
   partitions: p_01
         type: ALL
...
         rows: 82522
     filtered: 10.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

Continue to execute SQL 2 and SQL 3: the execution time is between 1 second and 2 seconds, the efficiency is very poor, and the MySQL partition pruning technology is not used.

(localhost:ytt)<mysql>select count(*) from pt_month where log_date < '2020-02-01';
+----------+
| count(*) |
+----------+
|    82410 |
+----------+
1 row in set (1.35 sec)

(localhost:ytt)<mysql>select count(*) from pt_month where log_date between '2020-01-01' and '2020-01-31';
+----------+
| count(*) |
+----------+
|    82410 |
+----------+
1 row in set (1.93 sec)

To continue to optimize SQL 2 and SQL 3, since the two requirements are consistent, the range retrieval can be changed to the specified list retrieval: the execution time is only 0.04 seconds.

(localhost:ytt)<mysql>select count(*) from pt_month where log_date in ('2020-01-01','2020-01-02','2020-01-03','2020-01-04','2020-01-05','2020-01-06','2020-01-07','2020-01-08','2020-01-09','2020-01-10','2020-01-11','2020-01-12','2020-01-13','2020-01-14','2020-01-15','2020-01-16','2020-01-17','2020-01-18','2020-01-19','2020-01-20','2020-01-21','2020-01-22','2020-01-23','2020-01-24','2020-01-25','2020-01-26','2020-01-27','2020-01-28','2020-01-29','2020-01-30','2020-01-31');
+----------+
| count(*) |
+----------+
|    82410 |
+----------+
1 row in set (0.04 sec)

After changing the range query to an IN list, the efficiency is greatly improved, and the query plan shows that the MySQL optimizer is only retrieving records on partition p_01.

...
   partitions: p_01
...

In addition to transforming SQL statements, you can also add HINT to the statement to allow MySQL to use partition pruning technology: for example, after adding HINT to SQL 2, the execution time is 0.04 seconds, which is comparable to the execution efficiency of the previously transformed statement.

(localhost:ytt)<mysql>select count(*) from pt_month partition (p_01) where log_date < '2020-02-01';
+----------+
| count(*) |
+----------+
|    82410 |
+----------+
1 row in set (0.04 sec)
Summarize:

If the partition table does not use the MySQL partition pruning technology due to historical reasons, you can manually prune and optimize the partition table according to the following rules:

  1. select * from tbname where partition_key = value;
  2. select * from tbname where partition_key in (value1,value2,...,valueN);
  3. The above two rules still apply to multi-table JOIN.

爱可生开源社区
426 声望207 粉丝

成立于 2017 年,以开源高质量的运维工具、日常分享技术干货内容、持续的全国性的社区活动为社区己任;目前开源的产品有:SQL审核工具 SQLE,分布式中间件 DBLE、数据传输组件DTLE。


引用和评论

0 条评论