
Author: Mo Shan

Senior DBA of an Internet company.

Source of this article: original contribution

*This article is original content from the Aikesheng open source community and may not be used without authorization. To reprint, please contact the editor and credit the source.


The Requirement

The business runs two MySQL clusters that are kept in sync through MQ. Last night MQ had a failure and reported a large number of primary key conflicts. Could the DBA check whether the data in the two clusters is still consistent?

I. Introduction

When I received this request I didn't take it too seriously. I had a vague impression that pt-table-checksum could verify MySQL data through a DSN, so I accepted it on the spot. I never imagined it would blow up in my face; looking back, I was careless.

This article follows the verification logic of pt-table-checksum: traverse each table in data blocks and compare each block's checksum value to decide whether the block is consistent. It mainly discusses the problems I ran into while implementing my own data verification script, and how I solved them. I hope it helps.

II. Testing the DSN

A master-slave environment was built using the production configuration files.

  • Build steps
  • This example is tested with MySQL 5.7.26
  • This example is tested with percona-toolkit 3.2.1

1. Verify master-slave data consistency

This test case connects to the slave through the DSN.

# DSN configuration steps omitted
$ ./bin/pt-table-checksum h='192.168.1.1',u='mydba',p='test123456',P=6666 --nocheck-replication-filters --replicate=test.checksums --no-check-binlog-format -d dbatest1  --recursion-method dsn=t=percona_schema.dsns
Checking if all tables can be checksummed ...
Starting checksum ...
            TS ERRORS  DIFFS     ROWS  DIFF_ROWS  CHUNKS SKIPPED    TIME TABLE
01-13T17:48:20      0      0        0          0       1       0   0.377 dbatest1.dbatest
As you can see, the test passed and verification works normally.

2. Verify non-master-slave data consistency

This test case also connects to the slave through the DSN, but first stops the slave's replication and clears its replication information.

mysql> stop slave;
Query OK, 0 rows affected (0.01 sec)

mysql> reset slave all;
Query OK, 0 rows affected (0.01 sec)

mysql> 


$ ./bin/pt-table-checksum h='192.168.1.1',u='mydba',p='test123456',P=6666 --nocheck-replication-filters --replicate=test.checksums --no-check-binlog-format -d dbatest1  --recursion-method dsn=t=percona_schema.dsns
Checking if all tables can be checksummed ...
Starting checksum ...
Replica mysql2 is stopped.  Waiting.
Replica mysql2 is stopped.  Waiting.
Replica mysql2 is stopped.  Waiting.
Replica mysql2 is stopped.  Waiting.
It crashed right out of the gate: the check just keeps printing that the replica [is stopped. Waiting], which is rather embarrassing. pt-table-checksum does not support data verification in a non-replication topology, so I had to find another way.

III. Problems Encountered While Developing the Tool

1. Handling complex composite primary keys

(1) The query does not use the index, or the query reports an error

Readers familiar with pt-table-checksum will know that the tool walks through data rows based on the primary key (or a non-null unique key). That sounds straightforward, but it is not quite so simple; think it over if you are interested. Let me first explain the general logic:

Step 1: Check whether _min_rowid is empty; if it is, take the table's minimum primary key value and record it as _min_rowid.

if [ -z "${_min_rowid}" ]
then # take the table's minimum row id as the starting row for each fetch
    _min_rowid="$(${mysql_comm} -NBe "select min(pk) from table")"
fi

Step 2: Using _min_rowid as the condition, fetch the next data block, record the checksum value of the block, and record the primary key value of the block's last row as the new _min_rowid.

select * from table where pk > ${_min_rowid} limit checksize # compute this block's checksum value
_min_rowid="$(${mysql_comm} -NBe "select max(pk) from (select pk from table where pk > ${_min_rowid} order by pk limit checksize)a")" # record the next starting pk value

Step 3: Check whether _min_rowid is empty; if it is not, repeat step 2; if it is empty, the check of this table is done.

if [ -z "${_min_rowid}" ]
then
    break
else
    continue
fi
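
Putting the three steps together, here is a minimal runnable sketch of the chunk walk. It assumes a table t1 with a single-column integer primary key pk; the connection details, table name, and the use of md5sum as the checksum are placeholders for illustration, and the real script handles composite keys and quoting as described next:

#!/bin/bash
# Minimal sketch: walk table t1 in chunks by integer primary key pk.
mysql_comm="mysql -h192.168.1.1 -P6666 -umydba -ptest123456 db"
checksize=10000

# Start just below the smallest pk so the strict `>` covers the first row.
_min_rowid="$(${mysql_comm} -NBe "select min(pk) - 1 from t1")"

while :
do
    # Last pk of the next chunk; NULL means no rows are left (step 3).
    next_max="$(${mysql_comm} -NBe "select max(pk) from (select pk from t1 where pk > ${_min_rowid} order by pk limit ${checksize})a")"
    [ "${next_max}" = "NULL" ] && break
    # Step 2: checksum the chunk (_min_rowid, next_max].
    ${mysql_comm} -NBe "select * from t1 where pk > ${_min_rowid} and pk <= ${next_max} order by pk" | md5sum
    _min_rowid="${next_max}"
done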

From these three steps you can see that a single-column integer primary key is very easy to handle. The problem is that business tables have all kinds of primary keys: composite keys, character-typed composite keys, integer-plus-character composite keys. The implementation above clearly breaks down there, so two issues have to be considered:

  • Whether the primary key is a composite primary key.

    For a composite key, the chunk-fetch condition takes the form where pk1 > xxx and pk2 > yyy (refined further in section (2) below).
  • Whether the primary key columns are integer or character typed.

    For a character-typed column the chunk-fetch condition must quote the value, where pk > 'xxx', otherwise the query will not use the index.

In view of the above two problems, you can refer to the following implementation logic:

  • Get a list of primary key fields and put them in an array
pri_name=($(${mysql_comm} -NBe "select COLUMN_NAME from information_schema.columns where table_name = 'table' and table_schema = 'db' and COLUMN_KEY = 'PRI';"))
  • Get the data type of the field according to the primary key field name and put it in the associative array
for tmp in ${pri_name[@]}
do # record each primary key column's data type in a dictionary; later we check whether a column is character-typed, which needs special handling (quoting)
    __="select DATA_TYPE from information_schema.columns where table_schema = 'db' and table_name = 'table' and COLUMN_KEY = 'PRI' and COLUMN_NAME = '${tmp}'"
    pri_type["${tmp}"]="$(${mysql_comm} -NBe "${__}" 2>/dev/null)"
done
  • Based on each column's data type, apply special handling to character types
for tmp in ${pri_name[@]}
do # work out what the primary key really is: single-column or composite, integer or something else, and splice a string accordingly; it decides whether values get single quotes later
   # integer values need no special handling, but non-integer values must be wrapped in single quotes, otherwise the where comparison is definitely wrong
    if [ "$(grep -ic "int" <<< "${pri_type["${tmp}"]}")x" != "1x" ]
    then
        select_str_tmp="concat(\"'\",${tmp},\"'\")"
        pk_style="str"
    else
        select_str_tmp="${tmp}"
    fi

    if [ -z "${select_str}" ]
    then
        select_str="${select_str_tmp}"
    else
        select_str="${select_str},${select_str_tmp}"
    fi
done

The purpose of this step is to record _min_rowid in a different form depending on the primary key type when fetching each block, for example:

  • Integer is recorded as _min_rowid=1
  • The character type is recorded as _min_rowid='1'
  • The combined primary key of integer + character is recorded as _min_rowid=1,'1'
  • The combined primary key of character type is recorded as _min_rowid='1','2'

This way, every time a data block is fetched, the where condition can use the index correctly, and no error is raised because a non-integer value is missing its quotes.

(2) How to define the left boundary of each data block

Suppose there is a composite primary key primary key (a, b, c) with all integer columns. How should the traversal SQL be written? At first my idea was simple:

_min_rowid=(xxx,yyy,zzz)
select * from table where 1 = 1 and a >= xxx and b >= yyy and c > zzz order by a,b,c limit checksize
At first glance the logic looks fine, but when I actually ran the script I found that it could not scan the entire table. After many rounds of testing and verification I arrived at the following SQL:
_min_rowid=(xxx,yyy,zzz)
select * from table where 1 = 1 and ((a > xxx) or (a = xxx and b > yyy) or (a = xxx and b = yyy and c > zzz)) order by a,b,c limit checksize
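
For completeness, here is a small sketch that generates this boundary predicate for any number of key columns; pri_name and vals are illustrative placeholders, not the script's real variables:

#!/bin/bash
# Sketch: generate the left-boundary predicate for a composite primary
# key. pri_name holds the key columns in order; vals holds the last
# row's values from the previous chunk.
pri_name=(a b c)
vals=(10 20 30)

where=""
for ((i = 0; i < ${#pri_name[@]}; i++))
do
    # One OR branch per column: all earlier columns equal, this one greater.
    branch=""
    for ((j = 0; j < i; j++))
    do
        branch="${branch}${pri_name[$j]} = ${vals[$j]} and "
    done
    branch="${branch}${pri_name[$i]} > ${vals[$i]}"
    if [ -z "${where}" ]
    then
        where="(${branch})"
    else
        where="${where} or (${branch})"
    fi
done

echo "select * from t1 where 1 = 1 and (${where}) order by a,b,c limit 10000"
# -> ((a > 10) or (a = 10 and b > 20) or (a = 10 and b = 20 and c > 30))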

That concludes the two main problems I hit while writing the verification script. The rest is assorted logic and detail handling; interested readers can read the script itself.

IV. Changes Made in the Data Verification Tool

1. No FOR UPDATE locking

To minimize the impact on the business, the locking logic was removed. The consistency of a data block still has to be guaranteed, though: if a block is hot data that is being modified at that moment, verification is bound to find it inconsistent, so the only option is to verify several times. By default a block is checked up to 20 times, and if any one round is consistent the block is considered consistent. If the block's data does not change at all during the first five rounds yet still differs, it is also treated as inconsistent (perhaps replication delay, perhaps a real difference).
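
The retry loop might look roughly like the following sketch; get_chunk_checksum is a hypothetical helper stubbed in for illustration, and the hosts, table, and SQL are placeholders:

#!/bin/bash
# Sketch of the lock-free retry logic: a chunk counts as consistent if
# any one of up to 20 rounds matches on both nodes; if its checksum has
# not changed at all by round 5 and still differs, give up early.
get_chunk_checksum() {  # usage: get_chunk_checksum <host>
    mysql -h"$1" -P6666 -umydba -ptest123456 \
        -NBe "select * from db.t1 where pk > 0 order by pk limit 10000" | md5sum
}

max_retry=20
consistent="no"
for ((i = 1; i <= max_retry; i++))
do
    sum1="$(get_chunk_checksum 192.168.1.1)"
    sum2="$(get_chunk_checksum 192.168.1.2)"
    [ ${i} -eq 1 ] && first_sum1="${sum1}"
    if [ "${sum1}" = "${sum2}" ]
    then
        consistent="yes"
        break
    fi
    # Unchanged after 5 rounds: replication delay or a real difference.
    [ ${i} -eq 5 ] && [ "${sum1}" = "${first_sum1}" ] && break
done
[ "${consistent}" = "yes" ] || echo "chunk flagged, SQL recorded for re-check"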

In addition, the checksum results are written to temporary files rather than into the business database.

2. Support table structure verification

pt-table-checksum does not check the table structure; the rewrite adds a table structure check.
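
One simple way to implement such a structure check is to checksum the column definitions from information_schema on both nodes, as in this sketch (an illustration only, not necessarily how the script does it; hosts and table are placeholders):

#!/bin/bash
# Sketch: checksum the column definitions of db.t1 on both nodes and compare.
sql="select COLUMN_NAME, COLUMN_TYPE, IS_NULLABLE from information_schema.columns where table_schema = 'db' and table_name = 't1' order by ORDINAL_POSITION"

sum1=$(mysql -h192.168.1.1 -P6666 -umydba -ptest123456 -NBe "${sql}" | md5sum)
sum2=$(mysql -h192.168.1.2 -P6666 -umydba -ptest123456 -NBe "${sql}" | md5sum)

if [ "${sum1}" != "${sum2}" ]
then
    echo "table structure differs, skipping this table"
fi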

3. Support table-based parallel verification

Verification can run in parallel at table granularity, with the degree of parallelism specified by the user. The script has a safety mechanism: if the user-specified parallelism is greater than the current number of idle cores, (idle cores - 1) is used instead.
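
A sketch of that safety cap, approximating "idle cores" as total cores minus the 1-minute load average (an assumption for illustration; the script may measure idle CPU differently):

#!/bin/bash
# Sketch of the parallelism safety cap.
want=${1:-8}                                    # user-requested parallelism (-w)
total=$(nproc)
load=$(awk '{printf "%d", $1}' /proc/loadavg)   # 1-minute load, truncated
idle=$(( total - load ))

if [ ${want} -gt ${idle} ]
then
    want=$(( idle - 1 ))          # fall back to (idle cores - 1)
    [ ${want} -lt 1 ] && want=1   # keep at least one worker
fi
echo "running with ${want} parallel table checks"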

4. Support network monitoring

Network monitoring was added: the user specifies an upper limit as a percentage of NIC bandwidth. When NIC traffic exceeds that percentage the task is paused, and it resumes once traffic drops below the threshold. This mainly targets middleware (MyCAT) and distributed database (TiDB) scenarios.
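
The idea can be sketched by sampling /proc/net/dev twice and comparing throughput against a percentage of the link speed (simplified; the real manager.sh also pauses and resumes the check tasks, and this assumes a physical NIC that exposes /sys/class/net/*/speed):

#!/bin/bash
# Sketch: compute NIC throughput in Mb/s and compare to a % of link speed.
dev="${1:-eth0}"
limit="${2:-50}"                            # pause above this % of link speed
speed=$(cat /sys/class/net/${dev}/speed)    # link speed in Mb/s

rx1=$(grep "${dev}:" /proc/net/dev | tr : ' ' | awk '{print $2}')
tx1=$(grep "${dev}:" /proc/net/dev | tr : ' ' | awk '{print $10}')
sleep 1
rx2=$(grep "${dev}:" /proc/net/dev | tr : ' ' | awk '{print $2}')
tx2=$(grep "${dev}:" /proc/net/dev | tr : ' ' | awk '{print $10}')

# bytes/s -> Mb/s, then percent of link speed
mbps=$(( ((rx2 - rx1) + (tx2 - tx1)) * 8 / 1024 / 1024 ))
pct=$(( mbps * 100 / speed ))
[ ${pct} -ge ${limit} ] && echo "pause verification" || echo "continue"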

5. Support scheduled task function

Users can schedule verification to avoid business peaks and run it only during off-peak hours: the check runs only inside the specified time window, and if it does not finish that day, it continues in the same window the next day.
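
A sketch of that window check, using the same closed-interval syntax as the -l parameter shown in the help output later (e.g. '1-5,10-15'):

#!/bin/bash
# Sketch of the -l time-window check: run only when the current hour
# falls inside one of the given closed intervals.
limit_time="${1:-1-5,10-15}"
hour=$(date +%-H)
allowed="no"

for seg in ${limit_time//,/ }
do
    if [[ "${seg}" == *-* ]]
    then
        lo=${seg%-*}
        hi=${seg#*-}
        # Closed interval: both endpoints are allowed.
        [ ${hour} -ge ${lo} ] && [ ${hour} -le ${hi} ] && allowed="yes"
    else
        [ ${hour} -eq ${seg} ] && allowed="yes"
    fi
done

echo "hour ${hour}: allowed=${allowed}"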

6. Support the verification of any two nodes

Verification is not limited to a master and its slave: as long as the target nodes support MySQL's standard SQL syntax, their data can be verified.

7. Timeout and kill mechanisms

Verification works by running SQL against the target nodes' databases. If the target system is already in trouble, verification would only make things worse and might trigger unknown problems, so a timeout was added: each data block gets 5 seconds, and once it is exceeded the wait is abandoned and retried later. Testing showed that even after the timeout fires, the SQL can still show up in the target database's processlist, so a kill mechanism was added: after a timeout, a kill of the processlist id is triggered. To avoid killing the wrong statement, every SQL carries a 32-character md5 value, and that md5 is checked before every kill.
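
The combination of timeout and tagged kill might look like this sketch (host, table, and the tag source are placeholders; note the lookup excludes its own connection via connection_id(), since the lookup query's text also contains the tag):

#!/bin/bash
# Sketch: tag the checksum SQL with an md5, run it under a 5s timeout,
# and on failure kill only the statement carrying that exact tag.
mysql_comm="mysql -h192.168.1.1 -P6666 -umydba -ptest123456"
tag=$(echo "db.t1.chunk42" | md5sum | awk '{print $1}')   # placeholder tag source
sql="SELECT '${tag}', COUNT(*) FROM db.t1"                # placeholder checksum SQL

if ! timeout 5 ${mysql_comm} -NBe "${sql}"
then
    # The statement may still be running server-side; find it by its tag.
    pid=$(${mysql_comm} -NBe "select id from information_schema.processlist where info like '%${tag}%' and id != connection_id()")
    [ -n "${pid}" ] && ${mysql_comm} -NBe "kill ${pid}"
fi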

The script also keeps monitoring threads_running: if it is too high, verification is paused. This logic is combined with the network monitoring.
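
A minimal sketch of that guard (the threshold of 100 and the connection details are assumed examples):

#!/bin/bash
# Sketch: pause verification while the target instance is too busy.
threshold=100
running=$(mysql -h192.168.1.1 -P6666 -umydba -ptest123456 \
    -NBe "show global status like 'Threads_running'" | awk '{print $2}')
if [ ${running} -gt ${threshold} ]
then
    echo "Threads_running=${running} exceeds ${threshold}, pausing"
fi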

V. Using the Data Verification Tool

This tool was rewritten by borrowing ideas from pt-table-checksum, and can verify data consistency between any two MySQL nodes (any database that supports MySQL's SQL syntax).

<font color='red'>This tool is for learning only; if you need to verify production data, please test it thoroughly first</font>

1. Verification logic

Traverse the table in data blocks based on the primary key and compare checksum values; the block size can be set by a parameter. The per-table flow is as follows (a minimal sketch of steps (2) and (3) appears after the list):

(1) Obtain the query SQL for the table's first data block.
(2) Record the checksum values of this block on the two target nodes into temporary files, file1 and file2.
(3) Compare whether file1 and file2 are consistent.

  • Inconsistent: repeat step (2), up to 20 times in a row; if still inconsistent, record the SQL under the table directory
  • Consistent: go to step (4)
  • file1 is empty: the table has been fully traversed; go straight to step (5)

(4) Obtain the query SQL for the table's next data block and return to step (2).
(5) If the table passed, go to (7); if it failed, go to (6).
(6) Re-run the SQL recorded under the table directory and verify it again.

  • If this second pass is consistent, the data is treated as consistent
  • If it still fails, the inconsistent rows are recorded in the diff directory

(7) The table verification task ends.
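
As a rough illustration of steps (2) and (3), with placeholder hosts, table, and chunk SQL (none of these names come from the script itself):

#!/bin/bash
chunk_sql="select * from db.t1 where pk > 0 order by pk limit 10000"  # placeholder chunk SQL

# Step (2): record each node's chunk checksum into a temporary file.
mysql -h192.168.1.1 -P6666 -umydba -ptest123456 -NBe "${chunk_sql}" | md5sum > /tmp/file1
mysql -h192.168.1.2 -P6666 -umydba -ptest123456 -NBe "${chunk_sql}" | md5sum > /tmp/file2

# Step (3): compare the two files.
if diff -q /tmp/file1 /tmp/file2 >/dev/null
then
    echo "chunk consistent"
else
    echo "chunk differs: retry, then record the SQL under res/table/"
fi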

2. Function introduction

  • Verify data consistency between any two nodes
  • Support table structure verification
  • Support concurrent verification, at table granularity
  • Support a designated time window, to avoid business peak periods
  • Support network monitoring; verification pauses when NIC traffic exceeds the threshold
  • <font color='red'>Tables without a primary key (or non-null unique key) are not supported</font>
  • <font color='red'>Tables with a composite primary key of four or more columns are not supported</font>

3. Installation tutorial

(1) Download
git clone https://gitee.com/mo-shan/check_data_for_mysql.git
cd check_data_for_mysql
(2) Configuration
  • Edit configuration file
cd /path/check_data_for_mysql
vim conf/check.conf
Edit the settings according to the comments in the file and your actual environment.
  • Modify work path
sed -i 's#^work_dir=.*#work_dir=\"/check_data_for_mysql_path\"#g' start.sh # replace check_data_for_mysql_path with the absolute path of the check_data_for_mysql home directory

4. Instructions for use

(1) Catalog introduction
moshan /data/git/check_data_for_mysql > tree -L 2
.
├── conf
│   └── check.conf
├── func
│   ├── f_check_diff_for_mysql.sh
│   ├── f_check_diff_for_row.sh
│   ├── f_logging.sh
│   └── f_switch_time.sh
├── log
├── manager.sh
├── README.en.md
├── README.md
└── start.sh

3 directories, 9 files
moshan /data/git/check_data_for_mysql >
  • conf: configuration directory; check.conf is the configuration file
  • log: log directory
  • start.sh: main program
  • manager.sh: network monitoring and task status management script
  • func: directory holding the worker scripts

    • f_check_diff_for_mysql.sh: verifies data blocks
    • f_check_diff_for_row.sh: verifies data rows; it re-checks the blocks that f_check_diff_for_mysql.sh flagged as inconsistent
(2) Help Manual
  • main program
moshan /data/git/check_data_for_mysql > bash start.sh
Usage: start.sh

                 [ -t check_table ]             tables to check; defaults to the whole database
                 [ -T skip_check_table ]        tables to skip; no filter by default
                 [ -d check_db ]                databases to check; defaults to all except the system databases
                 [ -D skip_check_db ]           databases to skip; no filter by default
                 [ -w threads ]                 maximum parallelism
                 [ -l limit_time ]              hours during which the check may run; defaults to all hours; separate multiple windows with commas
                                                1-5,10-15   means it may run from 1:00 to 5:00 (inclusive) or from 10:00 to 15:00; note all intervals are closed
                                                1,5,10,15   means it may run at 1:00, 5:00, 10:00 and 15:00
                 [ -f true ]                    whether to actually run the check; default false, the check runs only when true
                 [ -h ]                         help

moshan /data/git/check_data_for_mysql >
Combine the parameters as needed. <font color='red'>If you need to avoid business peaks and verify only during off-peak hours, use the -l parameter to specify the execution window. For example, '-l 1-5' means the verification task runs from 1:00 to 5:00 a.m.; if it has not finished by 6:00 that day, it continues from 1:00 a.m. the next day</font>
  • task management script
moshan /data/git/check_data_for_mysql > bash manager.sh -h

Usage: manager.sh

                 [ -a start|stop|continue|pause ]     management action for the monitoring task and the data verification task
                                                      start : start network monitoring
                                                      stop|pause|continue : stop|pause|continue the verification script together with the monitoring

                 [ -t eth0 ]                          NIC device name, default eth0
                 [ -n 50 ]                            pause when NIC traffic exceeds this percentage and resume when it falls below; default 50
                 [ -h ]                               help


moshan /data/git/check_data_for_mysql > 

Point the monitoring at the actual NIC; data verification is paused when traffic reaches the specified threshold. This script is mainly for middleware scenarios such as MyCAT (mysql to mycat) or TiDB (tikv to tidb), where verification consumes more network bandwidth.

  • <font color='red'>The script must be executed in the home directory of the entire tool</font>
(3) Common command reference
  • Management script

    • bash manager.sh -a start -t eth0 -n 30  start traffic monitoring on the eth0 NIC and pause data verification when traffic reaches 30%
    • bash manager.sh -a pause  pause monitoring and the data verification task
    • bash manager.sh -a continue  resume monitoring and data verification
    • bash manager.sh -a stop  stop monitoring and data verification
  • Main program

    • bash start.sh -f true -d dbatest -t test1 -l 0-5  verify only the test1 table in the dbatest database, running the task from 0:00 to 5:00
(4) Test Case - Verification Passed Scenario
  • <font color='red'>The log directory must be emptied before each verification task, so back up previous verification results first</font>
  • <font color='red'>It is strongly recommended to run verification tasks inside screen</font>
  • <font color='red'>If network monitoring is needed, it is strongly recommended to run the monitoring script in its own screen</font>

Step 1: First open a screen to monitor the network

moshan /data/git/check_data_for_mysql > screen -S check_net_3306
moshan /data/git/check_data_for_mysql > bash manager.sh -a start
[ 2022-01-18 11:55:34 ] [ 1000 Mb/s ] [ RX : 2    MB/S ]  [ TX : 2    MB/S ]
[ 2022-01-18 11:55:35 ] [ 1000 Mb/s ] [ RX : 2    MB/S ]  [ TX : 4    MB/S ]
[ 2022-01-18 11:55:36 ] [ 1000 Mb/s ] [ RX : 2    MB/S ]  [ TX : 2    MB/S ]
[ 2022-01-18 11:55:37 ] [ 1000 Mb/s ] [ RX : 2    MB/S ]  [ TX : 3    MB/S ]
[ 2022-01-18 11:55:38 ] [ 1000 Mb/s ] [ RX : 1    MB/S ]  [ TX : 2    MB/S ]
[ 2022-01-18 11:55:39 ] [ 1000 Mb/s ] [ RX : 1    MB/S ]  [ TX : 2    MB/S ]
[ 2022-01-18 11:55:41 ] [ 1000 Mb/s ] [ RX : 1    MB/S ]  [ TX : 2    MB/S ]
[ 2022-01-18 11:55:42 ] [ 1000 Mb/s ] [ RX : 2    MB/S ]  [ TX : 8    MB/S ]

Step 2: Open a new screen to perform the verification task

moshan /data/git/check_data_for_mysql > screen -S check_data_3306
moshan /data/git/check_data_for_mysql > bash start.sh -d dba -t dbatest1 -f true 
[ 2022-01-17 20:32:19 ] [ 成功 ] [ 192.168.1.1 ] [ start.sh/start.sh ] [ f_prepare:130 ] [ 本次数据一致性检查开始 ]
[ 2022-01-17 20:32:19 ] [ 警告 ] [ 192.168.1.1 ] [ start.sh/start.sh ] [ f_main:185 ] [ 本次数据一致性检查将检查如下库 : [dba] ]
[ 2022-01-17 20:32:19 ] [ 成功 ] [ 192.168.1.1 ] [ start.sh/start.sh ] [ f_main:203 ] [ 正在检查dba库 ]

[ 2022-01-17 20:32:19 ] [ 成功 ] [ 192.168.1.1 ] [ func/f_check_diff_for_mysql.sh ] [ f_check_diff_for_mysql:249 ] [ dba.dbatest1 ] [ 表结构一致 ]

[ 2022-01-17 20:32:19 ] [ 成功 ] [ 192.168.1.1 ] [ func/f_check_diff_for_mysql.sh ] [ f_check_diff_for_mysql:491 ] [ dba.dbatest1 ] [ 1,1 ] [ 00 d 00 h 00 m 00 s ] [ 9.09%, (0:0)/1 ] [ 数据一致 ]
[ 2022-01-17 20:32:19 ] [ 成功 ] [ 192.168.1.1 ] [ func/f_check_diff_for_mysql.sh ] [ f_check_diff_for_mysql:491 ] [ dba.dbatest1 ] [ 2,11 ] [ 00 d 00 h 00 m 00 s ] [ 100.00%, (0:0)/1 ] [ 数据一致 ]
[ 2022-01-17 20:32:19 ] [ 成功 ] [ 192.168.1.1 ] [ func/f_check_diff_for_mysql.sh ] [ f_check_diff_for_mysql:504 ] [ dba.dbatest1 ] [ 检查完毕 ]

[ 2022-01-17 20:32:19 ] [ 成功 ] [ 192.168.1.1 ] [ start.sh/start.sh ] [ f_main:242 ] [ 本次数据一致性检查完成 ] [ 通过 ]

moshan /data/git/check_data_for_mysql > 
When the check completes it reports a pass; otherwise it reports a failure, as in the following use case.
(5) Test Case - Verification Failed Scenario
  • It is strongly recommended to turn on screen when performing verification tasks

    moshan /data/git/check_data_for_mysql > screen -S check_data_3306
    moshan /data/git/check_data_for_mysql > bash start.sh -d dbatest1 -f true 
    [ 2022-01-17 20:32:09 ] [ 成功 ] [ 192.168.1.1 ] [ start.sh/start.sh ] [ f_prepare:130 ] [ 本次数据一致性检查开始 ]
    [ 2022-01-17 20:32:09 ] [ 警告 ] [ 192.168.1.1 ] [ start.sh/start.sh ] [ f_main:185 ] [ 本次数据一致性检查将检查如下库 : [dbatest1] ]
    [ 2022-01-17 20:32:09 ] [ 成功 ] [ 192.168.1.1 ] [ start.sh/start.sh ] [ f_main:203 ] [ 正在检查dbatest1库 ]
    
    [ 2022-01-17 20:32:09 ] [ 警告 ] [ 192.168.1.1 ] [ func/f_check_diff_for_mysql.sh ] [ f_check_diff_for_mysql:242 ] [ dbatest1.dbatest ] [ 表结构不一致 ] [ a_time name ] [ 跳过该表的检查 ]
    
    
    [ 2022-01-17 20:32:09 ] [ 错误 ] [ 192.168.1.1 ] [ start.sh/start.sh ] [ f_main:232 ] [ 本次数据一致性检查完成 ] [ 不通过 ]
    
    [ 2022-01-17 20:32:09 ] [ 错误 ] [ 192.168.1.1 ] [ start.sh/start.sh ] [ f_main:237 ] [ dbatest1.dbatest:table structure err ]
    moshan /data/git/check_data_for_mysql >

5. Interpretation of test results

moshan /data/git/check_data_for_mysql > ls -l
total 444
-rw-r--r-- 1 root root 450389 Jan 18 11:56 info.log
drwxr-xr-x 2 root root    194 Jan 18 11:56 list
drwxr-xr-x 2 root root      6 Jan 18 11:56 md5
drwxr-xr-x 6 root root     72 Jan 18 11:56 pri
drwxr-xr-x 5 root root     42 Jan 18 11:52 res
-rw-r--r-- 1 root root     65 Jan 18 11:56 skip.log
moshan /data/git/check_data_for_mysql > 

(1) info.log file

The verification log records, entry by entry, whether the data is consistent. Below is one log record:
[ 2022-01-17 20:32:19 ] [ 成功 ] [ 192.168.1.1 ] [ func/f_check_diff_for_mysql.sh ] [ f_check_diff_for_mysql:491 ] [ dba.dbatest1 ] [ 2,11 ] [ 00 d 00 h 00 m 00 s ] [ 100.00%, (0:0)/1 ] [ 数据一致 ]
  • [ 2022-01-17 20:32:19 ] field 1: the time the log was written
  • [ 成功 ] field 2: the log status (success)
  • [ 192.168.1.1 ] field 3: the IP of the machine that produced the log
  • [ func/f_check_diff_for_mysql.sh ] field 4: the file that produced the log
  • [ f_check_diff_for_mysql:491 ] field 5: the function:line number that produced the log
  • [ dba.dbatest1 ] field 6: the db and table the log is about
  • [ 2,11 ] field 7: the closed interval [left, right] of the data block
  • [ 00 d 00 h 00 m 00 s ] field 8: the total execution time of the check on this table so far
  • [ 100.00%, (0:0)/1 ] field 9: the progress, where the parentheses mean (tables passed : tables failed) / total tables to verify
  • [ 数据一致 ] field 10: the data consistency status (consistent)

(2) list directory

-rw-r--r-- 1 root root  77 Jan 18 11:52 dba_ing.list
-rw-r--r-- 1 root root  77 Jan 18 11:56 dba.list
This directory records two files per db: one lists the tables already verified, the other the tables currently being verified.

(3) md5 directory

Temporary directory for the data-block checksums; it can be ignored.

(4) pri directory

A subdirectory is created for each db, recording, per table, the pk value (or pk list) of the last row of the data block currently being verified.

(5) res directory

This directory records the verification results and contains three subdirectories:
drwxr-xr-x 2 root root 6 Jan 18 11:52 diff
drwxr-xr-x 2 root root 6 Jan 18 11:52 row
drwxr-xr-x 2 root root 6 Jan 18 11:56 table
  • table: f_check_diff_for_mysql.sh records here the SQL of data blocks that failed verification. A subdirectory is created per db, and the SQL statements are recorded in the format "table/db/table.log"
  • row: f_check_diff_for_row.sh re-runs the SQL in the table directory; its temporary files live in the row directory and can be ignored
  • diff: after the re-check, anything that still fails is recorded here in the format "diff/db.table.num.diff"

Here is an example of the SQL recorded in the table directory for an inconsistent data block:

set tx_isolation='REPEATABLE-READ';set innodb_lock_wait_timeout=1;SELECT '127d04065afd91d587bbb19bc16037a6:mobile_bind', COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#',`id`,`uid`,`count`,`score`,`timestamp`,`info`,`mobile_info`,`del`,CONCAT(ISNULL(`id`),ISNULL(`uid`),ISNULL(`count`),ISNULL(`score`),ISNULL(`timestamp`),ISNULL(`info`),ISNULL(`mobile_info`),ISNULL(`del`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM (select * from tdb_spam_oltp.`mobile_bind` where 1 = 1 and id > 667930554 order by id limit 10000 )a

If a data block check finds the two nodes' data inconsistent, this SQL is recorded:

  • first, so that the f_check_diff_for_row.sh script can check it again
  • second, so that the user can reconfirm whether the data is really inconsistent, or whether this is hot data that was being modified frequently during verification

Here is an example of an inconsistent data row recorded in the diff directory:

7974c7974
< 667930554    2    1642491495866595584    100    948895134572797275    2022-01-10 16:01:30    2022-01-11 10:45:04    {"dlvBoid":7877725947093058957}    -667930554
---
> 667930554    1    1642491495866595584    100    948895134572797275    2022-01-10 16:01:30    2022-01-10 16:32:01    {"dlvBoid":7877725947093058957}    -667930554
When rows share the same primary key but their data differs, they are recorded in the diff directory in this format.

(6) skip.log file

If the check fails, a skip.log file is generated in the log directory, recording which tables were skipped and why. If the check passes, this file does not exist.
moshan /data/git/check_data_for_mysql > ls -l log/skip.log 
-rw-r--r-- 1 root root 37 Jan 17 20:35 log/skip.log
moshan /data/git/check_data_for_mysql > cat log/skip.log 
dbatest1.dbatest:table structure err
moshan /data/git/check_data_for_mysql >

VI. Closing Remarks

This tool borrows some ideas from pt-table-checksum and was rewritten based on my own experience. It still has many shortcomings and is intended for learning and exchange only. <font color='red'>If you need to use it in a production environment, please test it fully in a test environment first.</font>

