Function update｜DAS launches global workload optimization function to realize SQL automatic diagnosis
Introduction to DAS introduced a global Workload optimization function, which can detect database load changes in time, identify new SQL, SQL that executes changes, and SQL with poor performance, and comprehensively consider SQL execution frequency and related SQL information. Give optimization suggestions.
In daily database optimization, creating appropriate indexes on database tables is a very important and commonly used solution to solve the problem of slow SQL queries. During processing, DBAs or developers usually optimize based on the slow SQL information on the instance. DAS automatic SQL optimization function has realized automatic diagnosis based on slow SQL and created appropriate indexes. However, the program will face the following challenges:
- Data collection problem: Some business SQL does not reach the threshold of slow SQL collection (such as 1s), and these SQL queries themselves do not make good use of the index, the query efficiency is not high, and there is still a lot of room for optimization. In the case of increased concurrency or increased table data, these queries can easily cause a sudden deterioration in instance performance and cause failures.
- Write cost issue: When creating indexes, we usually pay more attention to improving the efficiency of data reading, while ignoring the impact of index maintenance on write performance and the cost of space occupation. For tables with more writes and less reads, creating too many indexes will affect it. System throughput.
- Workload change problem: Once an index is created, it usually rarely changes, but the business has been changing dynamically. As the business continues to iteratively change, some indexes may no longer have SQL use, or the frequency of use is very low. At this time, it is necessary to introduce a better index design to improve the processing performance of the database.
In order to solve the above problems, DAS introduced the global Workload optimization function, which can detect the load change of the database in time, identify the new SQL, the SQL that executes the change, and the SQL with poor performance, and comprehensively consider the execution frequency of the SQL and related SQL. Information, give optimization suggestions.
Global Workload optimization is mainly composed of three parts.
Workload detection: According to the performance indicators related to Workload on the database instance (such as RT, CPU, etc.) and full SQL-related indicators (execution times, execution time, scan rows, etc.), train the data model and detect the SQL execution of Workload in real time , So as to identify the new SQL, the SQL that executes the change, and the cycle of the entire load change.
As shown in the figure below, the full SQL execution status indicators are in periodic status in period1 and period2, until period3, the execution status changes. Global Workload optimization, according to the data training model, it is easy to realize the time interval of the load change.
Global diagnosis: Global diagnosis optimization is based on the execution of all SQL in the database within a certain time range, comprehensively considering SQL query and writing performance and space occupation, recommends the optimal index combination, so as to maximize the database performance from the SQL perspective Performance, reducing the probability of problems caused by the database.
Smart stress test: Smart stress test can replay all SQL in a certain period of time on the instance (this function will be explained in detail in related articles). After the global diagnosis and smart stress test are combined, the system can make recommendations based on the diagnosis on the test instance Automatically create indexes, replay historical traffic and compare SQL execution before and after the recommendations are adopted, and generate test reports.
Global workload diagnosis supports two modes: user-defined triggering and system automatic detection triggering: user triggering can set a time interval according to business needs, triggering global diagnosis to obtain optimization suggestions; automatic detection will monitor the load information of the instance in real time, and detect abnormal SQL in the database Appears, or finds that the overall trend of the Workload changes, and triggers the global workload diagnosis in time. Among them, abnormal SQL includes: (1) New SQL; (2) SQL with a fluctuation of more than 20% in the number of executions; (3) SQL with an average RT fluctuation of more than 20%, etc.
Through the automatic detection mechanism, it can help users find out the scenarios where the structural design lags behind business changes in time, reducing the probability of failure and waste of resources.
The data source for global workload diagnosis is SQL audit, including SQL type, SQL template, execution times, and SQL performance information. SQL audit records all SQL executed during the diagnosis period, so SQL problems that are not slow SQL but poor performance can be found.
Associated SQL analysis
By parsing the SQL template and metadata, the access relationships among SQL, tables, and columns can be analyzed, so that a set of SQLs that may affect each other can be obtained. Through correlation analysis, the complexity of subsequent problem solving can be effectively reduced, while providing basic data support for the performance tracking service after the index is online.
Candidate index generation and cost evaluation
This module and subsequent optimization solutions are the core modules of global workload optimization. In single-SQL index recommendation, the index can be recommended based on some rules or experience, and certain effects can be achieved, but the rule-based method of global workload optimization is almost ineffective, and the cost must be quantified. Our external optimizer based on DAS can quickly and accurately parse the syntax tree, sample and collect statistical information, generate candidate indexes, and calculate the cost of using an index.
In the case of determining the candidate index set and the index cost, the process of selecting the most index set can be equivalent to a variant of the knapsack problem. The benefit of choosing an index is equivalent to the value of the items put in the backpack. Since creating an index can bring positive benefits to queries and secondary benefits to writing and space costs, the value can be positive or negative. . The capacity of the backpack is the threshold for creating the most indexes on a table (user setting or system default, not a database storage constraint). Our goal is to maximize the value of the items in the backpack. In addition, it should be noted that when an index is selected, it will affect the value of other indexes. Therefore, when selecting items each iteration, it is necessary to update the value of the remaining indexes to be selected according to the existing index.
Index I cost = execution times* (a*read income-b*write cost-c*space occupation)
In order to ensure the effectiveness of the optimization suggestions, we integrate with the intelligent pressure test function to provide a quick and convenient verification solution. The intelligent stress testing system will automatically build test instances and synchronize real data, and then automatically adopt optimization suggestions in the test, play back the full amount of SQL during the diagnostic time period, collect the performance data of SQL execution, and finally generate a test report for comparison. The advantage of this solution is that it not only ensures the consistency of the test scenarios and online business, but also does not affect the online business, and at the same time can estimate the impact of adopting the suggestion.
For example, there are 6 SQLs in Table 1. If you look at each SQL independently, the resulting optimized index may be the 4 indexes in Table 2. From the perspective of the workload dimension, the indexes can be merged into two indexes in Table 3. Comparing the two results, the overall RT dropped by 14.45%, and the index space was saved by 50%.
SQL2 ： idx\_is\_deleted\_gmt\_modified (is\_deleted, gmt\_modified)
SQL4 : idx\_name(name)
SQL5: idx\_name\_id\_birth\_date (name, id, birth\_date)
SQL6: idx\_name\_nick\_name （name, nick\_name）
idx\_is\_deleted\_gmt\_modified (is\_deleted, gmt\_modified)
idx\_name\_id\_birth\_date (name, id, birth\_date)
Global workload optimization will create a closed loop of automatic optimization in the future, including workload anomaly detection, global workload diagnosis, intelligent pressure measurement effect evaluation, automatic adoption of recommendations, effect tracking, and exception handling. In addition, the current global workload optimization considers the impact of SQL execution frequency, SQL query and writing, but does not consider issues such as fixed parameters or parameter skew. These business attributes can be further taken into consideration later.
Copyright Statement: content of this article is contributed spontaneously by Alibaba Cloud real-name registered users. The copyright belongs to the original author. The Alibaba Cloud Developer Community does not own its copyright, and does not assume corresponding legal responsibilities. For specific rules, please refer to the "Alibaba Cloud Developer Community User Service Agreement" and the "Alibaba Cloud Developer Community Intellectual Property Protection Guidelines". If you find suspected plagiarism in this community, fill in the infringement complaint form to report it. Once verified, the community will immediately delete the suspected infringing content.