Author: Yang Cailin
The DBA of the East China Delivery Department of Acsen is mainly responsible for MySQL daily problem handling and DMP product support. I love dancing and watching dramas.
Source of this article: original contribution
*The original content is produced by the open source community of Aikesheng, and the original content shall not be used without authorization. For reprinting, please contact the editor and indicate the source.
foreword
Everyone should have known or used some binlog parsing tools in the normal operation and maintenance process, such as my2sql and binlog2sql to be introduced today. My2sql is based on the secondary development of my2fback and binlog_rollback tools (go language), and binlog2sql is a mysql binlog parsing tool developed in python language. Both can generate original SQL by parsing the binlog file, roll back SQL, and remove the insert of the primary key. sql, etc., and then realize the functions of fast data flashback and data compensation after master-slave switching.
Since both tools can implement the function of parsing binlog, how should we choose between them? Please follow the editor to see "the battle between mysql2sql and binlog2sql".
Introduction to main parameters [only a few commonly used parameters are listed here, and parameters with other requirements can be viewed in the link]
mysql2sql
parameter | Parameter explanation |
---|---|
-mode | repl: pretend to be a slave library to get the binlog file from the main library; file: get the binlog file from the local file system, the default repl |
-local-binlog-file when specified | When specifying -mode=file parameter, you need to specify -local-binlog-file binlog file relative path or absolute path |
-sql | The sql type to be parsed, optional parameters insert, update, delete, all parsed by default |
-file-per-table | Generate a sql file for each table |
-output-dir | Store the generated results in the specified directory |
-threads | Number of threads, default 2, support concurrency |
-work-type | 2sql: generate original sql, rollback: generate rollback sql, stats: only count DML and transaction information |
binlog2sql
parameter | Parameter explanation |
---|---|
--stop-never | Continue parsing binlog, default false, sync to the latest binlog location where the command is executed |
-B, --flashback | Generate rollback sql, can parse large files, not limited by memory, the default is false, and stop-never or no-primary-key cannot be added at the same time |
--start-file | Start parsing the file, just the file name, not the full path |
--start-position/--start-pos | Start parsing position. Defaults to the start position of start-file. |
--stop-file/--end-file | Terminate parsing the file. Defaults to the same file as start-file. If the parsing mode is stop-never, this option has no effect. |
--stop-position/--end-pos | Terminate parsing position. Optional. The default is the last position of stop-file; if the parsing mode is stop-never, this option is invalid. |
--sql-type | Only parse the specified type, support INSERT, UPDATE, DELETE. Multiple types are separated by spaces, such as --sql-type INSERT DELETE . The default is to parse additions, deletions and changes. If this parameter is used but no type is filled, the three will not be parsed. |
Standard sql parsing of battle [parsing the same binlog file of 2G size]
Environment preparation:
## 查看mysql的相关参数设置
mysql> select @@server_id,@@binlog_format,@@binlog_row_image,@@max_binlog_size,@@log_bin_basename;
+-------------+-----------------+--------------------+-------------------+-----------------------------------+
| @@server_id | @@binlog_format | @@binlog_row_image | @@max_binlog_size | @@log_bin_basename |
+-------------+-----------------+--------------------+-------------------+-----------------------------------+
| 46733 | ROW | FULL | 1073741824 | /data/mysql/3309/binlog/mysql-bin |
+-------------+-----------------+--------------------+-------------------+-----------------------------------+
1 row in set (0.00 sec)
## 使用sysbench创建测试数据表
[root@10-186-61-119 ~]# sysbench /usr/share/sysbench/oltp_read_write.lua --mysql-host=10.186.61.119 \
--mysql-port=3309 --mysql-user=root --mysql-password=root --mysql-db=test --table-size=2000000 --tables=1 \
--threads=50 --db-ps-mode=disable --auto_inc=off --report-interval=3 --max-requests=0 --time=20 \
--percentile=95 --skip_trx=off --mysql-ignore-errors=6002,6004,4012,2013,4016,1062 --create_secondary=off prepare
sysbench 1.0.17 (using system LuaJIT 2.0.4)
Initializing worker threads...
Creating table 'sbtest1'...
Inserting 2000000 records into 'sbtest1'
## 使用sysbench对测试表进行压测,生成2G左右binlog数据
[root@10-186-61-119 ~]# sysbench /usr/share/sysbench/oltp_read_write.lua --mysql-host=10.186.61.119 \
--mysql-port=3309 --mysql-user=root --mysql-password=root --mysql-db=test --table-size=2000000 --tables=1 \
--threads=50 --db-ps-mode=disable --auto_inc=off --report-interval=3 --max-requests=0 --time=180 \
--percentile=95 --skip_trx=off --mysql-ignore-errors=6002,6004,4012,2013,4016,1062 --create_secondary=off run
sysbench 1.0.17 (using system LuaJIT 2.0.4)
Running the test with following options:
Number of threads: 50
Report intermediate results every 3 second(s)
Initializing random number generator from current time
Initializing worker threads...
Threads started!
## 使用my2sql与binlog2sql解析mysql-bin.000013,mysql-bin.000014
[root@10-186-61-119 binlog]# ll
total 4579116
-rw-r----- 1 mysql mysql 209 Aug 3 14:17 mysql-bin.000010
-rw-r----- 1 mysql mysql 1073760482 Aug 3 14:32 mysql-bin.000011
-rw-r----- 1 mysql mysql 1074119415 Aug 3 14:36 mysql-bin.000012
-rw-r----- 1 mysql mysql 1073822542 Aug 3 15:53 mysql-bin.000013
-rw-r----- 1 mysql mysql 1074588226 Aug 3 16:15 mysql-bin.000014
-rw-r----- 1 mysql mysql 392707488 Aug 3 16:16 mysql-bin.000015
-rw-r----- 1 mysql mysql 246 Aug 3 16:15 mysql-bin.index
mysql parsing process
## 1 thread:使用脚本实现连续解析两个binlog的功能
[root@10-186-61-119 my2sql]# cat /root/sql.sh
#! /bin/bash
date;
my2sql -user root -password root -host 10.186.61.119 -port 3309 -mode file
-local-binlog-file /data/mysql/3309/binlog/mysql-bin.000013 -work-type 2sql \
-start-file /data/mysql/3309/binlog/mysql-bin.000013 -threads 1 \
-output-dir /data/my2sql/1thread/13 > /data/my2sql/1thread/13/output;
my2sql -user root -password root -host 10.186.61.119 -port 3309 -mode file \
-local-binlog-file /data/mysql/3309/binlog/mysql-bin.000014 -work-type 2sql \
-start-file /data/mysql/3309/binlog/mysql-bin.000014 -threads 1 \
-output-dir /data/my2sql/1thread/14 > /data/my2sql/1thread/14/output
[root@10-186-61-119 ~]# ./sql.sh
Thu Aug 4 17:17:24 CST 2022
## 查看解析结果
[root@10-186-61-119 13]# pwd
/data/my2sql/1thread/13
[root@10-186-61-119 13]# ll
total 347088
-rw-r--r-- 1 root root 107 Aug 4 17:17 biglong_trx.txt
-rw-r--r-- 1 root root 7488 Aug 4 17:24 binlog_status.txt
-rw-r--r-- 1 root root 355389762 Aug 4 17:24 forward.13.sql ## 标准sql文件
-rw-r--r-- 1 root root 9965 Aug 4 17:24 output
[root@10-186-61-119 13]# cd ../14 && ll
total 577564
-rw-r--r-- 1 root root 107 Aug 4 17:24 biglong_trx.txt
-rw-r--r-- 1 root root 3312 Aug 4 17:28 binlog_status.txt
-rw-r--r-- 1 root root 591403854 Aug 4 17:28 forward.14.sql ## 标准sql文件
-rw-r--r-- 1 root root 9877 Aug 4 17:28 output
## 查看mysql-bin.000014的解析输出,完成时间为17:28。my2sql单线程解析完2G左右的binlog文件大概需要11分钟
[root@10-186-61-119 14]# less output
......
[2022/08/04 17:28:35] [info] file.go:71 finish parsing binlog from local files
[2022/08/04 17:28:35] [info] stats_process.go:266 exit thread to analyze statistics from binlog
[2022/08/04 17:28:35] [info] events.go:185 exit thread 1 to generate redo/rollback sql
[2022/08/04 17:28:35] [info] events.go:274 finish writing redo/forward sql into file
[2022/08/04 17:28:35] [info] events.go:277 exit thread to write redo/rollback sql into file
## 4 thread:使用脚本实现连续解析两个binlog的功能【脚本内容与1 thread执行的一致,只需更改线程数即可,后续不再冗余展示】
# 执行解析binlog语句
[root@10-186-61-119 ~]# ./sql.sh
Thu Aug 4 17:32:52 CST 2022
# 查看mysql-bin.000014的解析输出,完成时间为17:44。my2sql 4线程并发解析完2G左右的binlog文件大概需要12分钟
[root@10-186-61-119 14]# less output
。。。。。。。。
[2022/08/04 17:44:42] [info] file.go:71 finish parsing binlog from local files
[2022/08/04 17:44:42] [info] stats_process.go:266 exit thread to analyze statistics from binlog
[2022/08/04 17:44:42] [info] events.go:185 exit thread 4 to generate redo/rollback sql
[2022/08/04 17:44:42] [info] events.go:185 exit thread 2 to generate redo/rollback sql
[2022/08/04 17:44:42] [info] events.go:185 exit thread 3 to generate redo/rollback sql
[2022/08/04 17:44:42] [info] events.go:185 exit thread 1 to generate redo/rollback sql
[2022/08/04 17:44:42] [info] events.go:274 finish writing redo/forward sql into file
[2022/08/04 17:44:42] [info] events.go:277 exit thread to write redo/rollback sql into file
## 6thread:使用脚本实现连续解析两个binlog的功能
# 执行解析binlog语句
[root@10-186-61-119 ~]# ./sql.sh
Thu Aug 4 17:46:31 CST 2022
# 查看mysql-bin.000014的解析输出,完成时间为18:00。my2sql 6线程并发解析完2G左右的binlog文件大概需要14分钟
[root@10-186-61-119 14]# less output
。。。。。。。。
[2022/08/04 18:00:11] [info] file.go:71 finish parsing binlog from local files
[2022/08/04 18:00:11] [info] events.go:185 exit thread 6 to generate redo/rollback sql
[2022/08/04 18:00:11] [info] stats_process.go:266 exit thread to analyze statistics from binlog
[2022/08/04 18:00:11] [info] events.go:185 exit thread 1 to generate redo/rollback sql
[2022/08/04 18:00:11] [info] events.go:185 exit thread 2 to generate redo/rollback sql
[2022/08/04 18:00:11] [info] events.go:185 exit thread 5 to generate redo/rollback sql
[2022/08/04 18:00:11] [info] events.go:185 exit thread 4 to generate redo/rollback sql
[2022/08/04 18:00:11] [info] events.go:185 exit thread 3 to generate redo/rollback sql
[2022/08/04 18:00:11] [info] events.go:274 finish writing redo/forward sql into file
[2022/08/04 18:00:11] [info] events.go:277 exit thread to write redo/rollback sql into file
## 8thread:使用脚本实现连续解析两个binlog的功能
# 执行解析binlog语句
[root@10-186-61-119 ~]# ./sql.sh
Thu Aug 4 18:01:54 CST 2022
# 查看mysql-bin.000014的解析输出,完成时间为18:18。my2sql 8线程并发解析完2G左右的binlog文件大概需要17分钟
[root@10-186-61-119 14]# less output
。。。。。。。。
[2022/08/04 18:18:51] [info] file.go:71 finish parsing binlog from local files
[2022/08/04 18:18:51] [info] stats_process.go:266 exit thread to analyze statistics from binlog
[2022/08/04 18:18:51] [info] events.go:185 exit thread 2 to generate redo/rollback sql
[2022/08/04 18:18:51] [info] events.go:185 exit thread 6 to generate redo/rollback sql
[2022/08/04 18:18:51] [info] events.go:185 exit thread 4 to generate redo/rollback sql
[2022/08/04 18:18:51] [info] events.go:185 exit thread 1 to generate redo/rollback sql
[2022/08/04 18:18:51] [info] events.go:185 exit thread 8 to generate redo/rollback sql
[2022/08/04 18:18:51] [info] events.go:185 exit thread 5 to generate redo/rollback sql
[2022/08/04 18:18:51] [info] events.go:185 exit thread 3 to generate redo/rollback sql
[2022/08/04 18:18:51] [info] events.go:185 exit thread 7 to generate redo/rollback sql
[2022/08/04 18:18:51] [info] events.go:274 finish writing redo/forward sql into file
[2022/08/04 18:18:51] [info] events.go:277 exit thread to write redo/rollback sql into file
binlog2sql parsing process
## 使用脚本实现连续解析两个binlog的功能【--stop-never可以实现自动连续解析binlog文件,目前最新的binlog是15,
但是为了与my2sql解析效率进行对比,所以不用这个参数】
[root@10-186-61-119 ~]# cat binlog.sh
#! /bin/bash
date;
python /data/binlog2sql/binlog2sql/binlog2sql.py -h10.186.61.119 -P3309 -uroot -p'root' \
--start-file='mysql-bin.000013' > /tmp/binlog13.sql;
python /data/binlog2sql/binlog2sql/binlog2sql.py -h10.186.61.119 -P3309 -uroot -p'root' \
--start-file='mysql-bin.000014' > /tmp/binlog14.sql;
date;
# 执行解析binlog语句
[root@10-186-61-119 ~]# ./binlog.sh binlg2sql解析完2G左右的binlog文件大概需要54分钟
Wed Aug 3 17:53:59 CST 2022
Wed Aug 3 18:47:08 CST 2022
# 查看解析结果
[root@10-186-61-119 ~]# ll /tmp/
total 1688232
-rw-r--r-- 1 root root 859359140 Aug 3 18:27 binlog13.sql
-rw-r--r-- 1 root root 869387465 Aug 3 18:47 binlog14.sql
battle result
As shown in the table below, from the perspective of parsing efficiency, mysql is better
Comparison of other functions between mysql2sql and binlog2sql
Summarize
- From the perspective of parsing efficiency, my2sql is better than binlog2sql, and my2sql also has the function of parsing large or long transactions and generating DML statistics. It is a very practical binlog parsing tool, but through experiments, my2sql's The concurrency function does not seem to speed up its parsing efficiency. It is recommended to use the default number of threads for parsing.
Although my2sql has many advantages, you still need to pay attention when using it. It also has some limitations, such as
- My2sql pretends to be a slave library to obtain the main library binlog online, and then parses it, so the database user who performs the operation needs to have the permissions of SELECT, REPLICATION SALVE, and REPLICATION CLIENT.
- When using the rollback/flashback function, the binlog format must be row, and binlog_row_image=full, DML statistics and large transaction analysis are not affected
- Only DML can be rolled back, DDL cannot be rolled back
- When using the my2sql tool, you need to pay attention to the data type of the following fields. There are parsing bugs, which will cause the parsed data to be different from the original data. For details, please refer to: https://github.com/liuhr/my2sql/issues/31
You also need to pay attention when using binlog2sql, it also has some limitations, such as
- MySQL Server must set server_id, log_bin, max_binlog_size=1G, binlog_format=row, binlog_row_image=full these parameters,
- Like my2sql, it also pretends to pull binlog from the library. Users who need to connect to the database have SELECT, REPLICATION SLAVE, and REPLICATION CLIENT permissions.
**粗体** _斜体_ [链接](http://example.com) `代码` - 列表 > 引用
。你还可以使用@
来通知其他用户。