Author: Zhao LimingA member of the MySQL DBA team of Eikesheng, familiar with Oracle, MySQL and other databases, good at database performance problem diagnosis, transaction and lock problem analysis, etc., responsible for handling the daily operation and maintenance of the customer's MySQL and our self-developed DMP platform. For open source databases Related technology is very interested.
Source of this article: original submission
* Produced by Aikesheng open source community, original content is not allowed to be used without authorization, please contact the editor and indicate the source for reprinting.
background
What is exec_time
Environmental preparation
- Test 1 (binlog_format=row)
- Conclusion 1
- Note 1
- Test 2 (binlog_format=statement)
- Conclusion 2
- Note 2
Principle analysis
Failure case
Failure simulation
- Test 1: A scenario that simulates a disk IO bottleneck
- Test 2: Simulate the scenario of the CPU bottleneck
- Test 3: Scenario that simulates network delay
to sum up
Reference documents
background
Recently, when dealing with a client's MySQL database master-slave delay problem, I found a strange phenomenon related to exec_time, so I took the time to do some tests and share some experience with this article.
What is exec_time
The exec_time here refers to the value recorded in the binlog. As the name suggests, it is an execution time. What is it about? A single statement, the entire transaction, or something else? Let's do a few tests first.
Environmental preparation
- Environmental information
Character | Host/IP | Database version |
---|---|---|
Main library | 10.186.60.62 | MySQL 5.7.32 |
Slave library | 10.186.60.68 | MySQL 5.7.32 |
- Test sheet
zlm@10.186.60.62 [zlm]> show create table t\G
*************************** 1. row ***************************
Table: t
Create Table: CREATE TABLE `t` (
`id` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)
zlm@10.186.60.62 [zlm]> select * from t;
Empty set (0.01 sec)
Test 1 (binlog_format=row)
Submit a transaction, which contains 2 insert statements (GTID=xxxx:1)
zlm@10.186.60.62 [zlm]> select @@binlog_format;
+-----------------+
| @@binlog_format |
+-----------------+
| ROW |
+-----------------+
1 row in set (0.00 sec)
zlm@10.186.60.62 [zlm]> begin;insert into t values(sleep(5));insert
into t values(sleep(10));commit;
Query OK, 0 rows affected (0.00 sec)
Query OK, 1 row affected (5.04 sec)
Query OK, 1 row affected (10.00 sec)
Query OK, 0 rows affected (0.00 sec)
zlm@10.186.60.62 [zlm]>
- Main library binlog
The main library exec_time recorded 6s, which was slightly larger than 5.04s of the first insert statement, and the entire transaction was executed for 16s (beginning at 15:57:00 and ending at 15:57:16). The exec_time on the main library obviously cannot represent the execution time of the entire transaction. What about the slave library?
- Binlog from library
The slave library exec_time records 16s, which is "exactly" the same as the entire transaction execution time of the main library. Can we use this value to judge how long a transaction executes?
- Change the wording, add 2 select statements to extend the transaction commit time (GTID=xxxx:2)
-- 这里模拟一个1064语法错误,让第2个select消耗的时间不被记录到事务中
zlm@10.186.60.62 [zlm]> begin;insert into t values(null);select
sleep(10);insert into t values(null);select sleep(5));commit;
Query OK, 0 rows affected (0.00 sec)
Query OK, 1 row affected (0.03 sec)
+-----------+
| sleep(10) |
+-----------+
| 0 |
+-----------+
1 row in set (10.01 sec)
Query OK, 1 row affected (0.00 sec)
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ')' at line 1
Query OK, 0 rows affected (0.00 sec)
-- 事务中的第2个select语句执行失败并不影响整个事务的提交
zlm@10.186.60.62 [zlm]> select * from t;
+----+
| id |
+----+
| 1 |
| 2 |
| 3 |
| 4 |
+----+
4 rows in set (0.00 sec)
- Main library binlog
Since there is no delay function in the first insert statement, 0.03s is executed, exec_time records 0s (approximately equal to the first SQL time), and select sleep(); it is not a transactional SQL and will not be recorded in the binlog ( The select is executed in front of the insert, and the result is the same, and interested friends can test it by themselves). The execution time of the entire transaction is 10s (start at 16:05:51 and end at 16:06:01), which includes the total execution time of 2 insert statements and the first select statement.
- Binlog from library
This time, the exec_time recorded in the binlog of the library is less than 5s. Why? Because the main library took 10s to execute the entire transaction, the 5s consumed by the second select was not executed. In this way, it seems that the exec_time recorded from the binlog of the library is the execution time of the transaction corresponding to the main library?
Conclusion 1
- In the case of binlog_format=row, exec_time is only recorded once for the same transaction.
- The exec_time in the binlog of the main library is approximately equal to the execution time of the first statement in the transaction (the first insert was executed in 5.04s, and the record was 6s; the second insert was executed in 0.03s, and the record was 0s, but not including select ).
- The exec_time in the binlog from the library is approximately equal to the execution time of the entire transaction (5s of the second select of transaction 2 is not recorded in the transaction).
Note 1
- If you want to record complete SQL in binlog, in row mode, you need to set the parameter binlog_rows_query_log_events=1 in the main library. It is recommended to enable this parameter to help you quickly find relevant business SQL during troubleshooting, so that you can locate specific functional modules , Do further fault analysis.
Test 2 (binlog_format=statement)
- 提交一个事务,其中包含2个insert语句(insert语句位置互换,GTID=xxxx:3)
zlm@10.186.60.62 [zlm]> set @@session.binlog_format=statement;
Query OK, 0 rows affected (0.00 sec)
zlm@10.186.60.62 [zlm]> select @@session.binlog_format;
+-------------------------+
| @@session.binlog_format |
+-------------------------+
| STATEMENT |
+-------------------------+
1 row in set (0.00 sec)
zlm@10.186.60.62 [zlm]> begin;insert into t values(sleep(10));insert into t values(sleep(5));commit;
Query OK, 0 rows affected (0.00 sec)
Query OK, 1 row affected, 1 warning (10.01 sec)
Query OK, 1 row affected, 1 warning (5.01 sec)
Query OK, 0 rows affected (0.00 sec)
zlm@10.186.60.62 [zlm]>
- Main library binlog
After swapping the insert order, the exec_time of the begin statement is consistent with the execution time of the first insert statement, which is 10s. The entire transaction has been executed for 15 seconds (start at 16:27:53 and end at 16:28:08). exect_time is recorded 3 times, namely 10s, 10s, and 5s.
- Binlog from library
Binlog_format has not been modified from the library, it is still row format, and the exec_time recorded 25s. The execution time of the entire transaction is 15s (start at 16:27:53 and end at 16:28:08). The value recorded by exec_time is 10s longer than the execution time of the transaction. It seems to count the 10s of begin again?
- Change the wording and add 2 select statements to extend the transaction commit time (select statement positions are interchanged, GTID=xxxx:4)
zlm@10.186.60.62 [zlm]> begin;insert into t values(null);select sleep(5);insert into t values(null);select sleep(10);commit;
Query OK, 0 rows affected (0.00 sec)
Query OK, 1 row affected (0.02 sec)
+----------+
| sleep(5) |
+----------+
| 0 |
+----------+
1 row in set (5.00 sec)
Query OK, 1 row affected (0.00 sec)
+-----------+
| sleep(10) |
+-----------+
| 0 |
+-----------+
1 row in set (10.00 sec)
Query OK, 0 rows affected (0.00 sec)
- Main library binlog
- Binlog from library
Conclusion 2
- When binlog_format=statement, each DML statement in the transaction will record an exec_time (except the select statement)
- The exec_time in the binlog of the main library is approximately equal to the execution time of the first statement in the transaction (consistent with the conclusion of Test 1)
- The exec_time in the binlog from the library is approximately equal to the execution time of the entire transaction (consistent with the conclusion of Test 1)
Note 2
- Statement format binlog is not allowed under RC isolation level, otherwise an error will be reported
Principle analysis
- The value of exec_time in binlog represents the difference (unixtime) between the time stamp at the beginning of the transaction execution and the time stamp at the end of the transaction execution (unixtime). If there is only one SQL statement in the transaction (in the case of batch running, a transaction usually contains multiple SQL ), can be approximately regarded as the execution time of a transaction on the main library, in seconds.
- Once the transaction is submitted on the main library, the transaction end timestamp will be written in the binlog, and then passed to the slave library to form a relay log. At this time, the transaction start timestamp recorded in the relay log is the same as that in the main library binlog The value is consistent.
- If the slave library wants to catch up with the transaction of the main library, it needs to continuously replay the received relay log and generate its own binlog, but it should be noted that the transaction start timestamp recorded in the binlog generated by the slave library itself It is still consistent with the main library, and the end timestamp is recorded after the transaction playback is completed, so this exec_time cannot be regarded as the time consumed by the slave library to execute SQL on the slave library, but must be greater than this value (except for the main Library execution
Time, also consider the time when the main library binlog is placed, the time when the binlog is transmitted over the network, the time when the relay log is read from the library, etc., and then the execution time of the slave library itself).
Failure case
After previous tests, we have got a better understanding of exec_time. Let’s go back to the strange phenomenon mentioned at the beginning of the article. Let’s take a look at the screenshot of the scene at the time of the fault. An instance appeared at 5 am The master-slave delay exceeds 1000 seconds.
Based on the previous test, we might think that each transaction on the slave database took more than 1,000 seconds. Could it be caused by a large transaction at the time of the failure or an operation on a table without a primary key/unique key?
After parsing multiple binlogs, a strange phenomenon was found. Starting from 5 o'clock, the value of exec_time increased linearly, until it increased to more than 1000 seconds and triggered the alarm of master-slave delay.
In fact, exec_time cannot be simply regarded as the time consumed by the execution of transactions from the library, and other factors (CPU, disk, network, etc.) must be considered, but this indicator can be used to determine the master-slave delay to a certain extent. It is basically the same as real-time The value of Seconds_Behind_Master matches.
Failure simulation
- The stress tool is used here to achieve this. It is a linux tool mainly used to simulate scenarios with high system load (CPU, disk, memory, etc.).
-- 查看帮助信息
03:35 PM dmp2 (master) ~# stress --help
`stress' imposes certain types of compute stress on your system
Usage: stress [OPTION [ARG]] ...
-?, --help show this help statement
--version show version statement
-v, --verbose be verbose
-q, --quiet be quiet
-n, --dry-run show what would have been done
-t, --timeout N timeout after N seconds
--backoff N wait factor of N microseconds before work starts
-c, --cpu N spawn N workers spinning on sqrt()
-i, --io N spawn N workers spinning on sync()
-m, --vm N spawn N workers spinning on malloc()/free()
--vm-bytes B malloc B bytes per vm worker (default is 256MB)
--vm-stride B touch a byte every B bytes (default is 4096)
--vm-hang N sleep N secs before free (default none, 0 is inf)
--vm-keep redirty memory instead of freeing and reallocating
-d, --hdd N spawn N workers spinning on write()/unlink()
--hdd-bytes B write B bytes per hdd worker (default is 1GB)
Example: stress --cpu 8 --io 4 --vm 2 --vm-bytes 128M --timeout 10s
Note: Numbers may be suffixed with s,m,h,d,y (time) or B,K,M,G (size).
Test 1: A scenario that simulates a disk IO bottleneck
Use sysbench to put some pressure on the main library (using the oltp_read_write.lua script, each transaction contains only a simple DML).
- Parallel replication based on LOGICAL_CLOCK (8 worker threads) is started from the library.
Observe for a while, the replication delay from the library is always 0. Under normal circumstances, there is basically no delay when parallel replication is turned on.
- Use iostat to observe the disk IO status, there is already a certain amount of pressure at this time, the utilization rate is about 50%, and the await reaches about 4-5ms.
- Run stress to start adding stress to the disk.
- The disk IO load reached the bottleneck instantly, the utilization rate was 100%, and the await reached 350ms+.
- Although parallel replication has been turned on, there is still an unavoidable delay (the slave library is 6 seconds behind the main library).
- Parsing the binlog file from the library, observe that exec_time increases from 0.
- Check the master-slave delay monitoring graph, the 8s delay was reached at the highest peak, and then it started to fall back because the stress filled the disk and its process hung up by itself.
- Disk monitoring information shows that at 15:36, the peak iops peak is 6.9k.
- Directly filter exec_time=8 in binog, you can find the corresponding value, which can be matched with Seconds_Behind_Master.
Test 2: Simulate the scenario of the CPU bottleneck
- stress adds 4 CPU load
- Disk usage is normal at this time
- The load of the CPU goes up, the us% reaches 90%+, and the load reaches about 11.
- After the CPU load increases, the delay from the library gradually begins
- It is more intuitive to see from the master-slave delay monitoring chart, the delay starts at 16:08.
- The timestamp recorded in the binlog also starts at 16:08, and piles up when the transaction is played back from the library.
- The CPU usage and CPU load monitoring graphs are more intuitive
- Observe Senconds_Behind_Master again before stopping the stress. It is 17 seconds behind the main library, and this value will get larger and larger.
Filtering exec_time=17 in the binlog from the library can also match the value of Senconds_Behind_Master.
Test 3: Scenario that simulates network delay
- Network related faults can be realized by using tc tool, which will delay 500ms from the library network card setting.
- The slave library quickly began to experience delays.
- Similarly, transaction replays are piled up.
- After waiting for a while, it is observed that the master-slave delay has exceeded 7 minutes.
- Parse the binlog from the library, filter exec_time=427, and match Seconds_Behind_Master.
- Stop the tc tool and restore the network delay.
- After stopping tc at 16:42, the network traffic on the slave library suddenly increased, and the transaction started to be played back normally until it was tied to the main library.
to sum up
- The exec_time of the binlog from the library can reflect the transaction execution speed (row/statement) to a certain extent, but it is not necessarily the same as the transaction execution time of the main library.
- If the main library exec_time=0 appears, and the slave library exec_time has been increasing, it is likely that there has been a master-slave delay at this time. You can check from the server resource load. It is most likely that the CPU, disk, network, etc. are caused by bottlenecks. .
- The above test scenarios are based on the situation that there is no artificial modification of timestamp, otherwise exec_time will have no reference significance.
Reference documents
https://www.percona.com/blog/2011/01/31/what-is-exec_time-in-binary-logs/
**粗体** _斜体_ [链接](http://example.com) `代码` - 列表 > 引用
。你还可以使用@
来通知其他用户。