This chapter will introduce how to adjust and stabilize the query plan through the tuning methods provided by TiDB when the query execution plan generated by the optimizer is not in line with expectations. This article analyzes the adjustment and optimization principle of query execution plan. It mainly introduces how to use HINT to adjust the query execution plan, and how to use TiDB SPM to bind the query execution plan of the query statement. Finally, it will introduce some planning Features.

Related Reading:

TiDB Query Optimization and Tuning Series (1) Introduction to TiDB Optimizer

TiDB Query Optimization and Tuning Series (2) Introduction to TiDB Query Plan

TiDB Query Optimization and Tuning Series (3) Slow Query Diagnosis, Monitoring and Troubleshooting

Use HINT to adjust the query execution plan When the optimizer selects an unexpected or suboptimal execution plan, the user needs to use Hint to adjust the execution plan. TiDB is compatible with MySQL's USE INDEX, FORCE INDEX, and IGNORE INDEX syntax, and developed TiDB's own Optimizer Hints syntax, which is based on the comment-like syntax introduced in MySQL 5.7, such as /+ TIDB_XX(t1, t2) / . The following is a list of Hint syntax currently supported by TiDB:

Use USE INDEX, FORCE INDEX, IGNORE INDEX
Similar to MySQL, a query plan that does not use the expected index is a common cause of slow queries. In this case, USE INDEX should be used to specify the index used for the query. For example, in the following example, USE/FORCE INDEX makes the original full table scan SQL become through the index scanning.

mysql> explain select * from t;
id estRows task access object operator info
TableReader_5 8193.00 root data:TableFullScan_4
└─TableFullScan_4 8193.00 cop[tikv] table:t keep order: false

2 rows in set (0.00 sec)

mysql> explain select * from t use index(idx_1);
id estRows task access object operator info
IndexLookUp_6 8193.00 root
├─IndexFullScan_4(Build) 8193.00 cop[tikv] table:t, index:idx_1(a) keep order: false
└─TableRowIDScan_5(Probe) 8193.00 cop[tikv] table:t keep order: false

3 rows in set (0.00 sec)

mysql> explain select * from t force index(idx_1);
id estRows task access object operator info
IndexLookUp_6 8193.00 root
├─IndexFullScan_4(Build) 8193.00 cop[tikv] table:t, index:idx_1(a) keep order: false
└─TableRowIDScan_5(Probe) 8193.00 cop[tikv] table:t keep order: false

3 rows in set (0.00 sec)
The following example IGNORE INDEX makes the SQL that was originally indexed into a full table scan

mysql> explain select a from t where a=2;
id estRows task access object operator info
IndexReader_6 1.00 root index:IndexRangeScan_5
└─IndexRangeScan_5 1.00 cop[tikv] table:t, index:idx_1(a) range:[2,2], keep order:false

2 rows in set (0.00 sec)

mysql> explain select a from t ignore index(idx_1) where a=2 ;
id estRows task access object operator info
TableReader_7 1.00 root data:Selection_6
└─Selection_6 1.00 cop[tikv] eq(test.ta, 2)
└─TableFullScan_5 8193.00 cop[tikv] table:t keep order: false

3 rows in set (0.00 sec)
Unlike MySQL, currently TiDB does not distinguish between USE INDEX and FORCE INDEX. USE INDEX is recommended when there are multiple indexes on the table. TiDB tables are relatively large, and analyzing tables will have a great impact on cluster performance, so statistics cannot be updated frequently. At this time, use USE INDEX to ensure the correctness of the query plan

Using JOIN HINT
TiDB's current table join methods include Sort Merge Join, Index Nested Loop Join, and Hash Join. For the implementation details of each join method, please refer to the TiDB source code reading series syntax:

TIDB_SMJ(t1, t2)
SELECT / + TIDB_SMJ(t1, t2) /* from t1, t2 where t1.id = t2.id;
Prompt the optimizer to use the Sort Merge Join algorithm. In simple terms, it is to sort the two tables of the Join according to the join attribute, and then perform a scan and merge, and then the final result can be obtained. This algorithm usually takes less memory, but the execution time will be longer. When the amount of data is too large, or the system memory is insufficient, it is recommended to try it.

TIDB_INLJ(t1, t2)
SELECT / + TIDB_INLJ(t1, t2) /* from t1, t2 where t1.id = t2.id;
Prompt the optimizer to use the Index Nested Loop Join algorithm. Index Look Up Join will read the data of the outer table and query the primary key or index key of the inner table. This algorithm may be faster in some scenarios and consume less system resources. will be slower and consume more system resources. For scenarios where the outer table is filtered by the WHERE condition and the result set is small (less than 10,000 rows), you can try to use it. The parameter in TIDB_INLJ() is the candidate table for the inner table when building the query plan. That is, TIDB_INLJ(t1) will only consider building a query plan using t1 as the inner table

TIDB_HJ(t1, t2)
SELECT / + TIDB_HJ(t1, t2) /* from t1, t2 where t1.id = t2.id;
Prompt the optimizer to use the Hash Join algorithm. In simple terms, the Hash Join between the t1 table and the t2 table requires us to select an Inner table to construct a hash table, and then go to this hash table for each row of data in the Outer table to check whether there is any Matching data This algorithm is executed concurrently by multiple threads, and the execution speed is faster, but it will consume more memory.

In addition, other hint syntaxes are also under development such as /+ TIDB_STREMAGG() / , /+ TIDB_HASHAGG() / etc.

Using Hint usually adjusts the behavior of the execution plan by modifying the SQL statement when the execution plan changes, but sometimes it is necessary to intervene in the selection of the execution plan without modifying the SQL statement. Execution plan binding provides a set of functions that allow the selection of a specified execution plan without modifying the SQL statement.

Use MAX_EXECUTION_TIME(N)
MAX_EXECUTION_TIME(N) can be used in statements such as SELECT, which will limit the execution time of the statement to no more than N milliseconds, otherwise the server will terminate the execution of this statement. For example, the following example sets a timeout of 1 second

SELECT / + MAX_EXECUTION_TIME(1000) / * FROM t1
In addition, the environment variable MAX_EXECUTION_TIME will also limit the execution time of the statement. For high-availability and time-sensitive services, it is recommended to use MAX_EXECUTION_TIME to avoid incorrect query plans or bugs affecting the performance and even stability of the entire TiDB cluster. The OLTP service query timeout generally does not exceed 5 seconds. It should be noted that the query timeout setting of MySQL jdbc has no effect on TiDB. When the real client perceives the timeout, it sends a KILL command to the database, but since tidb-server is load balanced, tidb-server will not execute this KILL to prevent the connection from being terminated on the wrong tidb-server. At this time, MAX_EXECUTION_TIME should be used to ensure the effect of query timeout.

Using SPM to bind the query execution plan The execution plan is a very critical factor that affects the SQL execution performance, and the stability of the SQL execution plan also has a great impact on the efficiency of the entire cluster. However, the optimizer cannot guarantee a good execution plan when statistics are out of date, indexes are added or removed, etc. At this point, the execution plan may change unexpectedly, causing the execution time to be too long. Therefore, TiDB provides the SQL Plan Management function, which is used to bind execution plans (SQL Bind) for certain types of SQL, and the bound execution plans will continue to evolve according to changes in data (Note: the evolution function is not yet GA) .

SQL Bind is the first step in SQL Plan Management. Using it, users can bind execution plans for a certain type of SQL. When the execution plan is not optimal, you can use SQL Bind to quickly repair the execution plan without changing the business. To create a binding you can use the following SQL:

CREATE [GLOBAL | SESSION] BINDING FOR SelectStmt USING SelectStmt;
The statement can bind an execution plan for SQL within the GLOBAL or SESSION scope. When no scope is specified, the default scope is SESSION. The bound SQL is parameterized and stored in system tables. When processing SQL queries, the corresponding optimizer Hint can be used as long as the parameterized SQL matches a bound SQL in the system table.

"Parameterization" refers to replacing the constants in SQL with "?", unifying the upper and lower case of the statement, and cleaning up redundant spaces, line breaks and other operations. Create a binding example:

TiDB(root@127.0.0.1:test) > create binding for select from t where a = 1 using select from t use index(idx_a) where a = 1;
Query OK, 0 rows affected (0.00 sec)
Looking at the binding just created, Original_sql in the following output is the parameterized SQL:

TiDB(root@127.0.0.1:test) > show bindings;
Original_sql Bind_sql Default_db Status Create_time Update_time Charset Collation
select * from t where a = ? select * from t use index(idx_a) where a = 1 test using 2020-03-08 14:00:28.819 2020-03-08 14:00:28.819 utf8 utf8_general_ci

1 row in set (0.00 sec)
If you want to delete the created binding, you can use the following statement:

TiDB(root@127.0.0.1:test) > drop binding for select * from t where a = 1;
Query OK, 0 rows affected (0.00 sec)

TiDB(root@127.0.0.1:test) > show bindings;
Empty set (0.00 sec)
In order to solve the problem that Bindings can only be created manually, TiDB in version 4.0 provides the function of automatically creating Bindings. By setting the value of the tidb_capture_plan_baselines variable to on, you can automatically create bindings for SQL that appears multiple times in a certain period of time. TiDB will create bindings for those SQLs that appear at least twice, and counting the occurrences of SQLs depends on the Statements Summary function provided in TiDB 4.0. The switch for automatically creating bindings for SQL that appears more than twice can be turned on by:

set tidb_enable_stmt_summary = 1; -- enable statement summary
set tidb_capture_plan_baselines = 1; -- Enable the automatic binding function and then run the following query twice in a row to automatically create a binding for it:

TiDB(root@127.0.0.1:test) > select * from t;
Empty set (0.01 sec)

TiDB(root@127.0.0.1:test) > select * from t;
Empty set (0.00 sec)
Look at the global bindings again to find the automatically created bindings:

TiDB(root@127.0.0.1:test) > show global bindings;
Original_sql Bind_sql Default_db Status Create_time Update_time Charset Collation
select * from t SELECT / + USE_INDEX(@ sel_1 test . t ) /* FROM t test using 2020-03-08 14:09:30.129 2020-03-08 14:09:30.129

1 row in set (0.00 sec)
Diagnosis and tuning functions in other optimizer development or planning are aimed at monitoring, diagnosing, troubleshooting, and tuning query execution plans. In addition to the methods and functions described in the above chapters, TiDB optimizer has developed some internal functions. More related functions are being developed or planned to be developed and will be released in subsequent versions. These features include but are not limited to:

Plan Change Capture: used to verify whether the query execution plan regression/change will be caused during the upgrade;

Plan Replayer: It is used to collect relevant information of user's problem query with one click, and import it into TiDB for problem recurrence and query plan regression care with one click;

Optimizer Trace: It is used to collect and monitor the internal optimization logic flow of the optimizer, improve the problem diagnosis ability and efficiency of the user site, and provide data input for subsequent feedback optimization based on diagnostic monitoring;

Visual Explain: Graphical display of query plans, especially for complex query execution plans, can improve efficiency, and can integrate more diagnostic information in the follow-up;

Optimizer Diagnosis and Advisor: Optimizer self-diagnosis and optimization suggestion function; and integrated with TiDB Dashboard, Auto Pilot, etc.;

SPM extension: add multi-baseline plan version binding, improve the evolution of binding plan;

Plan Hint: Improve and provide a richer Plan Hint;

This article is the fourth article in the "TiDB Query Optimization and Tuning" series. It introduces in detail how to adjust and optimize the query execution plan through TiDB HINT and SPM, and briefly lists the diagnostic and tuning functions in other optimizer development or planning. Wait. The next article is the last of the series, and will introduce the practice of TiDB query optimization through several specific cases.

If you have any suggestions for TiDB products, welcome to communicate with us at internals.tidb.io.


PingCAP
1.9k 声望4.9k 粉丝

PingCAP 是国内开源的新型分布式数据库公司,秉承开源是基础软件的未来这一理念,PingCAP 持续扩大社区影响力,致力于前沿技术领域的创新实现。其研发的分布式关系型数据库 TiDB 项目,具备「分布式强一致性事务...