阿里云开发者

Introduction to AnalyticDB For MySQL provides users with efficient, real-time, feature-rich and intelligent "SQL Intelligent Diagnosis" and "SQL Intelligent Tuning" functions, providing users with ideas, directions and specific methods for tuning SQL performance. Reduce user costs and improve the efficiency of users using ADB

SQL is a simple and easy-to-use business logic expression language, but as the scan data volume and query complexity increase, query performance will become slower and slower. If you want to tune SQL, you often need to pay attention to the following parts:

Need to understand the engine architecture: Users often need to have a certain understanding of the SQL engine's architecture characteristics, in order to perfectly combine with the business data distribution characteristics and business scenario characteristics to perform data modeling, thereby designing a table structure that meets the characteristics of the SQL engine architecture .
SQL features are quite different: the SQL for ad hoc queries often varies greatly, including the number of tables participating in Join, Join conditions, number of fields for grouping and aggregation, and filter conditions.
Data characteristics are quite different: the user's data distribution characteristics will change with the changes in business characteristics. If you always follow the original modeling method and SQL statements, it is impossible to guarantee that you can play the biggest advantage of the SQL engine, data characteristics or Changes in the business model will cause SQL performance to fall back.

Based on this, AnalyticDB For MySQL (a new generation of cloud-native real-time data warehouse, syntax compatible with MySQL, hereinafter referred to as ADB) provides users with efficient, real-time, feature-rich and intelligent "SQL Intelligent Diagnosis" and "SQL Intelligent Tuning" Excellent" function, provides users with ideas, directions and specific methods for tuning SQL performance, reducing user costs and improving the efficiency of users using ADB.

Next, we use "Slow query discovery" + "Diagnosing slow query" two steps, combined with a scene Case, to introduce ADB's newly released "SQL intelligent diagnosis" function. (PS: "SQL Smart Tuning" will be released in subsequent versions)

One, find slow query

To locate the slow query, the user first needs to find the slow query. ADB's user console provides a variety of ways to help users, such as "Gantt Chart" and "Query List" etc., which can be searched in multiple dimensions to help users quickly locate slow queries and diagnose tools. Ensure that users can perform full query retrieval and analysis in the last two weeks.

(1) Gantt chart

Users can enter the SQL intelligent diagnosis function through "Cluster Console"-"Diagnosis and Optimization"-"SQL Diagnosis".

First, you will see a Gantt chart (also known as a swim lane chart, the query flows through different swim lanes, the swim lanes here are not the query queue of ADB, just to distinguish the queries executed at different times), the Gantt chart is graphical The method vividly shows the execution order of the query on the ADB instance. Each color block represents a query. The left side of the color block is the submission time of the query, the right side of the color block is the end time of the query, and the relative length of the color block indicates The execution time of a certain query, the color of the color block has no special meaning, just to distinguish different swim lanes.

Through the Gantt chart, users can intuitively see the time-consuming queries executed in the current time range, and can intuitively see which queries are executed in parallel and the time period of parallel execution, which can help users judge Find out which queries are affected by a certain Bad SQL. The density of color blocks can be used to intuitively determine whether the period of time when the ADB instance is under high pressure is related to the high concurrency of certain queries.

(2) Query list

The Gantt chart can reflect the time correlation between queries in an intuitive way, but when the user selects a large time range, the color blocks in the Gantt chart will be densely distributed and difficult to distinguish, and the indicators on the Gantt chart are relatively more Limited, users can use the query list function in the diagnostic tool at this time. The query list provides more than 10 important query level indicators, such as database name, user name, client segment IP, time consumption, memory consumption, and scan volume. These information and indicators can help users further determine the source and the source of slow queries. Resource consumption, etc.

In terms of advanced search capabilities, the diagnostic tool provides three types of search methods:

1. Fuzzy retrieval and precise retrieval: users can perform fuzzy matching based on keywords in SQL, and the precise retrieval function helps users to accurately retrieve this query after confirming the query ID;
2. The search conditions of the string type: The search tool will automatically identify the data name, user name, client IP and resource group used within the time range selected by the user, and provide a drop-down box for the user to choose and improve the user Retrieval efficiency;
3. Numerical search conditions: users can freely choose the index unit of the search, such as KB, MB, GB, etc., without manual conversion.

At the same time, when users use diagnostic tools, they often have download requirements for slow queries. After downloading slow queries, you can perform more customized slow query management and analysis in tools such as Excel, so we also provide a query list Download function.

2. Diagnose slow query

(1) Query execution process in ADB

Before introducing the ADB execution process, we need to briefly introduce three related basic concepts:

Stage

In the execution phase, the query in ADB will first be divided into multiple stages for execution according to whether shuffle is generated. A stage is a physical entity of a certain part of the execution plan. The data source of the stage can be the data in the underlying storage system or the data transmitted in the network. A stage is composed of tasks of the same type distributed on different computing nodes, and multiple tasks will process data in parallel.

Task

Task is the execution entity of a stage on an Executor node. Multiple tasks of the same type form a stage, and data is processed in parallel within the cluster.

Operator

Operator is the smallest data processing unit of ADB. ADB will decide whether to use parallel or serial execution to process data based on the semantics expressed by the operators or the dependencies between the operators.

Let's take a typical branch aggregation query as an example to understand the execution process of the query in ADB. The SQL statement is as follows:

SELECT COUNT(*), SUM(salary)

FROM emplayee

WHERE age>30 ADN age<32

GROUP BY sex

In ADB, the front-end Controller node first receives the SQL statement request, and performs statement analysis and syntax analysis (Parser), and finally uses the optimizer (Optimizer) to execute the final execution plan The plan will be divided into sub-plans according to the division principle of the Stage. As shown in the figure, Plan0, Plan1, and Plan2 are delivered to the corresponding nodes.

Among them, the sub-plan Plan2 will process data in the form of Task instances on 4 computing nodes in parallel. First, scan and filter the data, and then perform partial aggregation of the data. After processing, the data will be calculated downstream according to the sex field Repartition. The node, that is, the node of Stage1, performs the final aggregation of data according to the requirements of the sub-plan Plan1. Finally, the data is summarized by the node of Stage0 and returned to the client.

Like a typical data warehouse, ADB's execution plan is generally divided into "logical execution plan" and "physical execution plan":

Logical execution plan: understand the query processing flow at a macro level

The logical execution plan displays the processing logic of the query at a higher level. The rule-based execution plan (RBO) will determine whether the filter conditions can be pushed down, and the cost-based execution plan (CBO) will determine the order of multiple table associations, etc. . Therefore, the logical execution plan does not pay attention to the specific processing method during physical execution, such as whether multiple operators need to be fused to reduce function calls during execution, or the granularity of automatically generated code. These logical execution plans do not pay attention to this. This leads to logical execution plans that often only contain stage-level execution statistics. However, users often need statistical information accurate to the operator level when tuning.

Physical execution plan: understand the processing performance of each operator at a micro level

Compared with the logical execution plan, the physical execution plan contains the data processing flow diagram between the operators, and also contains the execution statistics of the operators. You can accurately see the memory occupied by a certain Join operator or aggregation operator, or See the amount of data before and after filtering by the filter operator. However, not all operator users need to be able to correctly understand its meaning, especially some physical operators and the user’s SQL statement can’t be correlated, which will also cause a greater problem for users to locate the physical execution plan alone. Doubts.

ADB’s "SQL Intelligent Diagnosis" function provides users with a fusion view of the logical execution plan and the physical execution plan. The user can use the fusion execution plan to understand the query processing flow from a macro level, or to understand each of them from a micro level. The processing performance of the operator can help users locate the performance bottleneck of the query more accurately and quickly.

(B) SQL self-diagnosis function

Although we provide converged and hierarchical execution plans to help users analyze query performance issues, we have found that users encounter difficulties in using converged execution plans in two types of scenarios:

Primary users of ADB

In order to reduce the cost of learning and migration for MySQL users, ADB has achieved most of the syntax and MySQL compatibility, but the backend of ADB is not the MySQL kernel, but a set of distributed data storage and distributed computing systems independently developed by itself, facing ADB The primary users of ADB often don’t know where the focus of optimization is and cannot start with the execution plan.

Complex SQL in ADB

For complex SQL, it often involves the join operation of hundreds of tables, the number of stages will reach more than a few hundred, the number of operators will reach thousands, and the execution plan graph is very large, even for advanced users of ADB, Faced with such a complicated execution plan, it is often impossible to start. The following figure is an execution plan diagram of 196 stages:

In response to the above problems, we have added SQL self-diagnosis capabilities in the execution plan diagram. The SQL self-diagnosis capabilities will reflect expert experience in the execution plan in the form of rules. For the first contact with ADB, the query execution can be determined based on the diagnosis results. For performance bottlenecks in the process, you can also learn key operators that need to be paid attention to in the ADB execution plan based on the diagnosis results. For complex execution plans, SQL self-diagnosis can help users quickly locate the positions in the execution plan that need to be tuned, and provides related tuning methods and documents, so that users can be more targeted in the tuning process.

The SQL self-diagnosis capability displays the diagnosis results and optimization suggestions through three levels: "Query level diagnosis results", "Stage level diagnosis results", and "Operator level diagnosis results".

We take an online complex SQL as an example to introduce an example of using execution plan and SQL self-diagnostic tools to locate performance problems. First, we use the slow query retrieval tool to search for a query that consumes a lot of memory. Click "Diagnose" to open the diagnosis page of the query, switch to the "Execution Plan" tab, and we will see that the query level diagnosis result has determined the current query memory, as shown in 1 in the following figure: 1613f050b7f332

At this time, we need to locate the reason for the larger memory effect. We click to sort by memory. You can see that on the right side, flashback sorting is performed according to the percentage of memory consumed by Stage. It is very obvious that Stage[1] occupies The current query of 87% of the memory ratio, we click on Stage[1], the diagnostic tool will automatically focus to the position of Stage[1] in the execution plan tree, click on Stage[1], we can see the execution of Stage[1] At the same time, we can see the position of 5, which reminds us that there is an operator inside Stage1 that takes up a lot of memory, but there is no detailed information, so next, we need to enter the inside of Stage[1], see See Stage[1] which operator occupies more memory.

Click "View Stage Execution Plan" to enter Stage[1]. First, we still sort by memory. You can see that the Join[317] operator occupies more than 99% of the memory of the entire Stage. Click The operator and the operator execution plan tree are automatically located to the current operator. At this time, we can see the detailed information of the operator's diagnosis result. The information reminds us that it takes up a large amount when constructing the Hash table user Left Join Memory, the diagnosis result also provides a link to the official tuning document. According to the tuning method given in the document, we can reduce the memory usage of the operator.

The above example uses "Query Level Diagnosis Results" and "Operator Level Diagnosis Results" to locate SQL performance problems. Let's look at an "Stage Level Diagnosis Results" .

As shown in the figure below, we can see that Stage[10] has the largest proportion of time-consuming after sorting by time-consuming. Click Stage[10] in the execution plan diagram, and you can see two types of diagnostic results in the diagnostic result column, one type It is "Stage Diagnosis", and the other is "Operator Diagnosis". The Stage diagnosis tells us that the output data of the current Stage is skewed, and tells us which fields are skewed (data skew is a problem that seriously affects performance in a distributed system , Stage output data skew will not only cause the current stage processing data to have a long tail in time, but also cause a long tail in downstream data processing). At the same time, you can see that there is an operator diagnosis result that prompts us that the table scan is skewed, then We can preliminarily determine that the current stage output data skew is caused by scanning a table with data skew. Next, we enter Stage[0] for positioning and analysis.

Enter the Stage memory, we sort according to the time-consuming, we can see that the TableScan operator is the most time-consuming, then we click on the TableScan operator, we can see that in the diagnosis result, there is detailed diagnosis result information about the data skew of the table. Due to the inappropriate selection of data distribution fields, the table has serious data skew problems. At the same time, you can see that there are related official tuning documents. According to the tuning documents, we can adjust to the appropriate distribution fields to reduce the data skew of the table. The impact of query performance.

Through the above two examples, we can see that the integration of execution plan and SQL self-diagnosis function can quickly help us locate query performance problems, and give certain tuning suggestions, reducing a lot of unnecessary time and effort The waste of ADB reduces the threshold for primary users to use ADB. For more diagnostic results of SQL self-diagnosis, please refer to the official website document: SQL self-diagnosis. There are currently 20+ diagnostic rules online, involving query-related memory consumption, time consumption, data skew, disk IO, and execution plan. More diagnostic rules will be launched in the future.

3. Follow-up planning

Through the above explanation and example analysis, we can see that the current diagnosis and tuning tools can help users to troubleshoot SQL performance problems in many aspects, but we still found that the actual online problem handling and on-duty Requirements when analyzing instance performance issues:

Which SQL should I tune?

When the user opens the diagnostic tuning page, facing tens of thousands or even tens of millions of SQL running on the instance, although the SQL that needs to be tuned can be sorted by time-consuming, memory consumption, or scanning volume, in fact, In fact, users lack the perspective of a specific diagnosis result, for example:

Which SQL is skewed by data scanning?
Which SQL is not efficient for index filtering?
Which SQL is the stage output skewed?
Which SQL is unreasonable for partition selection?

After the user completes the tuning of the specific diagnosis result of a certain SQL, he actually needs to know which similar SQL needs to be tuned, so we will provide the user with a tool to analyze the specific diagnosis result dimension in a one-time Solve a specific problem.

There is a problem with my SQL, is it related to the poor table creation method?

The ADB backend is a distributed data storage and distributed execution framework that relies on the data to be evenly distributed to each backend node. At the same time, ADB has designed different table types for different business scenarios, such as partitioned tables, replicated tables, and some tables. Aggregate storage of fields during storage will also improve query performance, but users often do not know which queries are affected by poor table creation methods. Later, we will "data modeling diagnosis results" and "query diagnosis results" . Users can quickly know which SQL is affected by the bad table structure through the diagnosis results of data modeling, and vice versa. The diagnosis result of a certain SQL knows which tables need to be optimized. The two types of diagnosis results are coordinated and optimized, which can solve the query performance problems of the instance from the root cause.

Fourth, summarize the outlook

"SQL Intelligent Diagnosis" The function has been launched recently. Users can quickly get started with actual business. If you have any questions, please feel free to send feedback in the ADB Developer Dingding Group (23128105).

In the future, ADB will combine users' actual business scenarios and usage scenarios, and continue to iteratively optimize, so that users can quickly "discover slow SQL", "diagnose slow SQL", "tune slow SQL", and focus more energy on business development. . ADB itself will also develop towards automation and intelligence. Students who are interested in solving the problems of the data warehouse system through massive data analysis are welcome to join us. If you are interested, please send your resume to: cubo.ly@alibaba-inc.com

Copyright Notice: content of this article is contributed spontaneously by Alibaba Cloud real-name registered users. The copyright belongs to the original author. The Alibaba Cloud Developer Community does not own its copyright, and does not assume corresponding legal responsibilities. For specific rules, please refer to the "Alibaba Cloud Developer Community User Service Agreement" and the "Alibaba Cloud Developer Community Intellectual Property Protection Guidelines". If you find suspected plagiarism in this community, fill in the infringement complaint form to report it. Once verified, the community will immediately delete the suspected infringing content.

Heavy release | A new generation of cloud-native data warehouse AnalyticDB "SQL intelligent diagnosis" function in detail