Take you to understand the performance monitoring report of WDR-GaussDB (DWS)

Abstract: Through this article, readers can know what WDR is, how to create performance data snapshots and generate WDR reports.

This article is shared from the Huawei Cloud Community " WDR-GaussDB (DWS) Performance Monitoring Report ", author: Zhang Jingyao.

GaussDB (DWS) 8.1.1 version introduced the load diagnosis report (Workload Diagnosis Report, hereinafter referred to as WDR) function, which can provide performance data within a specified time period and present it to users in the form of an html web page report. By analyzing the report, it can help users find anomalies, diagnose problems, optimize performance, etc. The content is rich and intuitive, and it is a powerful tool for database tuning.

This article briefly introduces the principles and usage of WDR. Through this article, readers can know what WDR is, how to create performance data snapshots and generate WDR reports.

1. Introduction to WDR

WDR is a derivative of GaussDB (DWS) database monitoring feature. The database kernel has accumulated a large amount of first-hand data during its operation, such as the number of executed various SQLs, the number and time of table and index access, and the running status of the underlying software and hardware such as CPU and memory. Users can query these data in real time through the system view provided by the database. For example, query the PGXC_WORKLOAD_SQL_COUNT view to get the number of executions of various types of SQL since the database was started:

postgres=# select * from PGXC_WORKLOAD_SQL_COUNT;
 node_name |   workload   | select_count | update_count | insert_count | delete_count | ddl_count | dml_count | dcl_count
-----------+--------------+--------------+--------------+--------------+--------------+-----------+-----------+-----------
 cn_5001   | default_pool |      1125616 |         1236 |      322303 |            0 |        13 |   1573048 |       102
 cn_5002   | default_pool |      1128231 |         1322 |       321048 |            0 |         2 |   1575517 |    173973
 cn_5003   | default_pool |      1128191 |         1130 |       309138 |            0 |         0 |   1575430 |    173980
(3 rows)

Save these first-hand performance monitoring data and perform automated statistical analysis to know the operating conditions of the database during a certain period of time, such as whether it is busy, whether there is unreasonable SQL, and what abnormal events are there, so as to diagnose the problem , Performance tuning provides a reference basis. This is the origin of WDR.

Figure 1 WDR principle

2. Create a performance snapshot

As mentioned in the previous chapter, the monitoring data must be saved before analysis. For performance reasons, the database kernel puts various monitoring data in the memory for quick update and read. Once the process restarts, these data will be lost. Therefore, the first thing WDR does is to save the detection data in the memory to the peripheral. These saved performance monitoring data are called "snapshots", and the process of saving is called "snapshot creation".

Be careful not to confuse the "snapshot" here with the snapshot of the database. The latter refers to the state of the data in the Database at a certain moment, and the "snapshot" created by WDR refers to saving the content of the system view queried at a certain moment in a special table.

The snapshot created by WDR is saved in a table under the dbms_om schema. Each system view corresponds to a table, the name of the table = "dbms_om.snap_" + view name. For example, the snapshot of the PGXC_WORKLOAD_SQL_COUNT view is saved in the dbms_om.snap_pgxc_workload_sql_count table.

WDR supports creating snapshots in two ways. One is created periodically by a background thread, the time interval is specified by the wdr_snapshot_interval parameter, and the default is created once an hour. The other is to execute the create_wdr_snapshot system function to create in real time. Note that this function requires administrator privileges.

postgres=# select create_wdr_snapshot();
           create_wdr_snapshot           
------------------------------------------
 WDR snapshot request has been submitted.
(1 row)

No matter how it is created, the enable_wdr_snapshot parameter needs to be set to on.

Every time a snapshot is taken, several rows will be added to the snapshot-related table under dbms_om. You can view the ID and the start and end time of the existing snapshot through the table dbms_om.snapshot. E.g:

postgres=# select * from dbms_om.snapshot order by snapshot_id desc limit 5;
 snapshot_id |           start_ts            |            end_ts            
-------------+-------------------------------+-------------------------------
         312 | 2021-05-29 12:36:32.509763+08 | 2021-05-29 12:36:39.938167+08
         311 | 2021-05-29 11:36:32.489812+08 | 2021-05-29 11:36:40.23865+08
         310 | 2021-05-29 10:36:32.214843+08 | 2021-05-29 10:36:37.436173+08
         309 | 2021-05-29 09:48:11.587959+08 | 2021-05-29 09:48:18.247602+08
         308 | 2021-05-29 09:36:31.687527+08 | 2021-05-29 09:36:36.897983+08
(5 rows)

Taking the first line as an example, it can be seen that the snapshot with ID=312 was created from 2021-05-29 12:36:32 and completed at 2021-05-29 12:36:39. The snapshot_id is the unique identifier of each snapshot.

WDR will take snapshots of multiple system views each time, and the start and end time of a single view snapshot can be queried through the dbms_om.tables_snap_timestamp table. Take the snapshot with ID=312 as an example:

postgres=# select * from dbms_om.tables_snap_timestamp where snapshot_id=312;
 snapshot_id | db_name  |            tablename             |           start_ts            |            end_ts            
-------------+----------+----------------------------------+-------------------------------+-------------------------------
         312 | postgres | snap_pgxc_os_run_info            | 2021-05-29 12:36:32.511503+08 | 2021-05-29 12:36:32.691519+08
         312 | postgres | snap_pgxc_wait_events            | 2021-05-29 12:36:32.692543+08 | 2021-05-29 12:36:33.982895+08
         312 | postgres | snap_pgxc_instr_unique_sql       | 2021-05-29 12:36:33.983801+08 | 2021-05-29 12:36:34.131792+08
.......
(19 rows)

Compared to dbms_om.snapshot, the dbms_om.tables_snap_timestamp table records a more accurate snapshot time.

In order to avoid taking up too much space, snapshots that are too old will be cleared regularly. The clearing time is set by wdr_snapshot_retention_days, ranging from 1 to 15 days, and the default is 8 days, which means that snapshots created 8 days ago will be deleted by default. If you want to keep the snapshot data for a longer time, you need to do the dump by yourself.

3. Generate WDR report

With the snapshot, the performance monitoring data can be calculated and analyzed, and reports can be generated for users to view. Different analysis angles and analysis methods can get different reports. At present, WDR supports comparing the data of two snapshots before and after, and generating a performance monitoring report for that time period, referred to as "WDR report".

As GaussDB (DWS) is a distributed database consisting of multiple nodes, the nodes are divided into CN (Coordinator) and DN (Datanode). Accordingly, WDR reports are also divided into clusters and single nodes. (Scope). The report content of the two scopes is different. The former consists of the overall performance data of the cluster, and the latter analyzes and calculates the performance of a single node. For a single-node scope report, the content of the report on CN and DN is also different. The report scope is specified by the user when the report is generated.

GaussDB (DWS) provides the system function generate_wdr_report() for creating WDR reports. Before generating the report, first determine the start and end snapshot_id. By querying the dbms_om.snapshot table, the snapshot_id corresponding to the two time points is obtained for the time period of interest. For example, want to view the performance status between 2021-02-21 03:00:00 and 2021-02-21 04:00:00:

postgres=# select * from dbms_om.snapshot where start_ts > '2021-02-21 03:00:00'::timestamptz and start_ts < '2021-02-21 04:00:00'::timestamptz order by snapshot_id;
 snapshot_id |           start_ts            |            end_ts            
-------------+-------------------------------+-------------------------------
      2147 | 2021-02-21 03:02:40.000716+08 | 2021-02-21 03:03:17.840595+08
        2148 | 2021-02-21 03:12:39.873876+08 | 2021-02-21 03:13:15.963517+08
        2149 | 2021-02-21 03:22:39.875301+08 | 2021-02-21 03:23:16.659778+08
        2150 | 2021-02-21 03:32:40.857761+08 | 2021-02-21 03:33:18.477795+08
        2151 | 2021-02-21 03:42:41.454982+08 | 2021-02-21 03:43:17.977323+08
        2152 | 2021-02-21 03:52:41.794683+08 | 2021-02-21 03:53:18.676577+08
(6 rows)

Through the above query, the corresponding start and end snapshot_id of this time period are 2147 and 2152 respectively.

Next, determine the type of report to be generated. The current version supports three types of WDRs: summary, detail and all. The summary type only includes brief analysis and calculation results, the detailed type only includes detailed index data, and the all type includes all the content of the summary type and the detailed type.

Finally, execute the generate_wdr_report() function to generate the WDR report. For example, to generate a summary cluster-wide WDR report between 2021-02-21 03:00:00 and 2021-02-21 04:00:00, you can execute:

postgres=# select generate_wdr_report(2147, 2152, 'summary', 'cluster', '');
                           generate_wdr_report                            
--------------------------------------------------------------------------
 Report summary-cluster-2147-2152-20210301125740.html has been generated.
(1 row)

To generate all WDR reports for node cn_5001 between 2021-02-21 03:00:00 and 2021-02-21 04:00:00, you can execute:

postgres=# select generate_wdr_report(2147, 2152, 'all', 'node', 'cn_5001');
                         generate_wdr_report                         
----------------------------------------------------------------------
 Report all-cn_5001-2147-2152-20210301125906.html has been generated.
(1 row)

After success, it will prompt "$ report file name has been generated". The report file is located in the pg_log directory of the current CN node by default, and can be opened with a web browser:

Figure 2 WDR report

Each part of the report is expanded by default. If you double-click the title, the relevant part of the content will be collapsed, and the "-" before the title will change to "+", as shown in Figure 3:

Figure 3 WDR report style after folding

For the meaning of each indicator in the report, please refer to the product documentation. How to use the WDR report to analyze and locate problems will be discussed in subsequent articles.

4. Matters needing attention

As mentioned earlier, the WDR feature relies on various database core monitoring functions to provide data. For performance reasons, some kernel monitoring functions are turned off by default and need to be turned on through GUC parameters. If it is not turned on, the result of the corresponding performance view query is empty, and the correct snapshot and WDR report cannot be generated. If you want to get a complete WDR report, you need to set the GUC parameters in the following table to the values in the last column:

If you want to know more about GuassDB (DWS), welcome to search "GaussDB DWS" on WeChat and follow the WeChat official account, and share with you the latest and most complete PB-level digital warehouse black technology. You can also get a lot of learning materials in the background~

Click to follow to learn about Huawei Cloud's fresh technology for the first time~

Take you to understand the performance monitoring report of WDR-GaussDB (DWS)

1. Introduction to WDR

2. Create a performance snapshot

3. Generate WDR report

4. Matters needing attention

华为云开发者联盟

引用和评论

华为云开发者联盟入选 2023 中国技术品牌影响力企业榜，深耕开发者生态

MySQL慢查询日志：性能优化的终极指南

做到真正0丢失、0重复：Apache SeaTunnel 实现万亿级数据一致性全解密

Devin 发布 DeepWiki，2 星的项目直接装出万星的气场

好用的开源埋点方案-ClkLog埋点用户分析系统

DNS服务器地址大全

实战分享：DolphinScheduler 中 Shell 任务环境变量最佳配置方式