Failure analysis | mysql mgr multi-master data can not be updated case analysis - 个人文章

Author: Fu Xiang
Now living in Zhuhai, mainly responsible for the maintenance of Oracle, MySQL, mongoDB and Redis.
Source of this article: original submission
* Produced by the Aikesheng open source community, original content is not allowed to be used without authorization, please contact the editor and indicate the source for reprinting.

1. Failure phenomenon

A set of MGR three-node multi-master environment (5.7.25) that has been running for nearly two years. After a batch of data is successfully imported at node 1, a feedback program is developed to modify this batch of data and report errors. The error information is as follows:

update match_equip set name = ?, type = ?, equips = ?,score = ? where id = ? and
person_id = ?
Received #3101 error from MySQL server: "Plugin instructed the server to rollback
the current transaction."

1.1. Attempt failure recovery operation 1

After preliminary analysis, it was found that the imported data can be updated on import node 1, but failed to update on other nodes. It is suspected that there is a problem with node 1. Based on the principle of rapid recovery of failure, I asked the development and found that node 1 can be restarted. Restart. After restarting, you cannot join the group replication. It seems that restarting Dafa will not work well. The error message is as follows:

2021-05-27T07:37:53.290267Z 0 [ERROR] Plugin group_replication reported: 'This
member has more executed transactions than those present in the group. Local
transactions: 91f9d301-c234-11e9-

b15f-fa163e13423a:1-156817757:156843131-157503127:158192163-158412212,

a71d98a2-c234-11e9-b6db-fa163e3407f8:1-92,

eba21052-c250-11e9-b0d0-fa163e134234:1-3 > Group transactions: 91f9d301-c234-
11e9-b15f-fa163e13423a:1-156817825:156843131-157503172:158192163-158412212,

eba21052-c250-11e9-b0d0-fa163e134234:1-3'

2021-05-27T07:37:53.290348Z 0 [ERROR] Plugin group_replication reported: 'The
member contains transactions not present in the group. The member will now exit
the group.'

Local transactions：

Local transactions:

91f9d301-c234-11e9-b15f-fa163e13423a:1-156817757:156843131-157503127:158192163-
158412212,

a71d98a2-c234-11e9-b6db-fa163e3407f8:1-92,

eba21052-c250-11e9-b0d0-fa163e134234:1-3

Group transactions：

Group transactions:

91f9d301-c234-11e9-b15f-fa163e13423a:1-156817825:156843131-157503172:158192163-
158412212,

eba21052-c250-11e9-b0d0-fa163e134234:1-3

By comparison, gitd collection of the local node has executed a71d98a2-c234-11e9-b6db-fa163e3407f8:1-92 gtid not in the Group transactions, that is not a Group transactions Local transactions subset, can not be verified, added to the cluster fail.

1.2. Attempt to recover from failure 2

After joining the cluster through physical backup and restoration of normal nodes, an error is still reported:

2021-05-27T08:35:01.331704Z 13 [ERROR] Plugin group_replication reported: 'The
certification information could not be set in this server: 'Certification
information is too large for transmission.''

2021-05-27T08:35:01.331752Z 13 [ERROR] Plugin group_replication reported: 'Error
when processing Certification information in the Recovery process'

2021-05-27T08:35:01.331762Z 13 [ERROR] Plugin group_replication reported: 'Fatal
error during the Recovery process of Group Replication. The server will leave the
group.

Failed to join the cluster because the transmitted cluster conflict check information is too large:

Certification information is too large for transmission

View current cluster conflict detection information:

root@3306 performance_schema> select COUNT_TRANSACTIONS_ROWS_VALIDATING from
performance_schema.replication_group_member_stats;

+------------------------------------+
| COUNT_TRANSACTIONS_ROWS_VALIDATING |
+------------------------------------+
| 11239426 |
+------------------------------------+
1 row in set (0.00 sec)

There are currently more than 10 million records of conflict detection information. It is precisely because this information is too large that joining the cluster fails. At this point, it is estimated that some people will think that they have encountered a mysql bug, but it is not. To solve this problem, we have to start with the mgr transaction conflict detection mechanism.

2. Failure analysis and recurrence

2.1. Transaction authentication mechanism

Reference article: https://zhuanlan.zhihu.com/p/41175310

The MGR transaction authentication module is used to determine whether the transaction entering the MGR is to be submitted or needs to be rolled back. The basis for this function is to determine whether the gtid_set of each primary key updated by the transaction in the conflict detection database (certification_info) is the snapshot version of the transaction snapshot_version If it is a subset of, commit, otherwise rollback.

For transactions that can continue to be committed, GTID is assigned to them. Subsequently, each primary key contained in the transaction writeset will be inserted/updated to certification_info. It is foreseeable that as more and more transactions are certified, there will be more and more records in certification_info.

How to clean up the records in certification_info? Obviously, if a transaction A has been authenticated, it has been submitted on each node of the MGR cluster, that is to say, the gtid_executed of each node contains the GTID of the transaction. Due to the global order of the transaction in the cluster, it has not been The authenticated transaction must be executed after the transaction A of this node or at the same time (without interdependence), then it can be determined that the subsequent transactions that need to be authenticated in the MGR will not conflict with the transaction A. Therefore, all records in certification_info whose snapshot version (snapshot_version) is a subset of the snapshot version of transaction A can be cleaned up. Each node of MGR will broadcast its own gtid_executed every 60s. Each node collects a complete round of node gtid_executed and takes the intersection (stable_gtid_set), and then implements the cleanup operation based on the intersection.

When clearing certification_info, you need to traverse each record and compare the GTID_SET with stable_gtid_set. If the record GTID_SET is a subset of stable_gtid_set, it can be cleared.

2.2. Failure analysis

2.2.1. Currently there are 11239426 records of certification_info in mgr, and mgr cleans up every 60s. Why is it so large?

There is an independent local transaction a71d98a2-c234-11e9-b6db-fa163e3407f8:1-92 node 1. When dml is performed on node 1, as long as the conflict authentication is passed, the primary keys contained in the transaction a71d98a2-c234-11e9-b6db-fa163e3407f8:1-92 will be inserted/updated to certification_info, and the snapshot_version information will have the local transaction identifier 0612c9e1af2da0, when certification_info When cleaning up, the intersection of each node gtid_executed will never include the a71d98a2-c234-11e9-b6dbfa163e3407f8:1-92 set 0612c9e1af2da2, so the certification_info generated by the 1 node dml operation will never be cleaned up. Over time, certification_info has reached more than 10 million.

2.2.2. Why can't the data after node 1 dml operation be updated in other nodes?

Assumptions:

The server_uuid of nodes 1, 2, and 3 are respectively A, B, C, and the global transaction uuid is: G

There is a local independent transaction on node 1: A:1-XXX

Node 1 Executed_Gtid_Set : G:1-XXX,A:1-XXX

Node 2 Executed_Gtid_Set : G:1-XXX

Node 3 Executed_Gtid_Set : G:1-XXX

The certification_info information generated by node 1 dml G:1-XXX,A:1-XXX , when other nodes update the modified data of node 1, the snapshot_version in certification_info: G:1-XXX,A:1-XXX , the current transaction snapshot version G:1-XXX , G:1-XXX,A:1-XXX it is impossible to G:1-XXX subset of 0612c9e1af2e51, 0612c9e1af2e51, 0612c9e1e1af Transaction rollback:

Plugin instructed the server to rollback the current transaction

2.3. Failure simulation recurrence

2.3.1. Environment preparation, construction node 3 contains local affairs

Node 2 and node 3 deliberately do not close the parameter sql_log_in modify the initial password of root to generate a local gtid:

[root@mysql.sock][(none)]> show master status;

ERROR 1820 (HY000): You must reset your password using ALTER USER statement
before executing this statement.

[root@mysql.sock][(none)]>

[root@mysql.sock][(none)]>

[root@mysql.sock][(none)]> alter user root@'localhost' identified by '***';

Query OK, 0 rows affected (0.05 sec)

[root@mysql.sock][(none)]> show master status;

+------------------+----------+--------------+------------------+----------------
------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set
|
+------------------+----------+--------------+------------------+----------------
------------------------+
| MySql-bin.000002 | 391 | | | f1b251a9-bf54-11eb-a4fd-
000c299b4d6c:1 |
+------------------+----------+--------------+------------------+----------------
------------------------+

1 row in set (0.00 sec)

Next, each node performs some initialization operations in turn:

SET SQL_LOG_BIN=0;

CREATE USER rpl_user@'%' IDENTIFIED BY '*****';

GRANT REPLICATION SLAVE ON *.* TO rpl_user@'%';

FLUSH PRIVILEGES;

SET SQL_LOG_BIN=1;

CHANGE MASTER TO MASTER_USER='rpl_user', MASTER_PASSWORD=''*****' FOR CHANNEL
'group_replication_recovery';

INSTALL PLUGIN group_replication SONAME 'group_replication.so';

2.3.1.1. Node 1 as the boot node to start the mgr cluster

SET GLOBAL group_replication_bootstrap_group=ON;

START GROUP_REPLICATION;

SET GLOBAL group_replication_bootstrap_group=OFF;

2.3.1.2. Node 2 joins the cluster:

start group_replication failed on node 2:

[ERROR] Plugin group_replication reported: 'This member has more executed
transactions than those present in the group. Local transactions: 332ee50a-bf55-
11eb-b407-000c293fd895:1 > Group transactions: 91f9d3

01-c234-11e9-b15f-fa163e13423a:1,

f1b251a9-bf54-11eb-a4fd-000c299b4d6c:1-2'

reset master , reset the binlog and then join successfully.

2.3.1.3. Node 3 joins the cluster:

In the same way, 3 nodes joining the cluster also fail, by setting the parameters:

set global group_replication_allow_local_disjoint_gtids_join=1

Successfully join the cluster after not verifying local transactions.

2.3.2. Fault simulation and recovery

2.3.2.1. Current environment information: Node 3 contains local affairs

root@mysql.sock][fxtest]> select * from
performance_schema.replication_group_members;

+---------------------------+--------------------------------------+-------------
+-------------+--------------+

| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT |
MEMBER_STATE |

+---------------------------+--------------------------------------+-------------
+-------------+--------------+

| group_replication_applier | 332ee50a-bf55-11eb-b407-000c293fd895 | fxtest02 |
3307 | ONLINE |

| group_replication_applier | 34668704-bf55-11eb-b120-000c29ed3768 | fxtest03 |
3307 | ONLINE |

| group_replication_applier | f1b251a9-bf54-11eb-a4fd-000c299b4d6c | fxtest01 |
3307 | ONLINE |

+---------------------------+--------------------------------------+-------------
+-------------+--------------+

View Executed_Gtid_Set for each node:

#节点1：

root@mysql.sock][fxtest]> show master status;

+------------------+----------+--------------+------------------+----------------
---------------------------+

| File     | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set
       |

+------------------+----------+--------------+------------------+----------------
---------------------------+

| MySql-bin.000001 | 1152 | | | 91f9d301-c234-11e9-b15ffa163e13423a:1-29 |

+------------------+----------+--------------+------------------+----------------
---------------------------+

#节点2：
root@mysql.sock][fxtest]> show master status;

+------------------+----------+--------------+------------------+----------------
---------------------------+

| File     | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set
        |

+------------------+----------+--------------+------------------+----------------
---------------------------+

| MySql-bin.000001 | 1152 | | | 91f9d301-c234-11e9-b15ffa163e13423a:1-29 |

+------------------+----------+--------------+------------------+----------------
---------------------------+

1 row in set (0.02 sec)

#节点3：

root@mysql.sock][performance_schema]> show master status;

+------------------+----------+--------------+------------------+----------------
-------------------------------------------------------------------+

| File      | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set
        |

+------------------+----------+--------------+------------------+----------------
-------------------------------------------------------------------+

| MySql-bin.000001 | 1319 | | | 34668704-bf55-11eb-b120-000c29ed3768:1,

91f9d301-c234-11e9-b15f-fa163e13423a:1-29 |

+------------------+----------+--------------+------------------+----------------
-------------------------------------------------------------------+

1 row in set (0.00 sec)

You can see that node 3 has one more local transaction 34668704-bf55-11eb-b120-000c29ed3768:1

2.3.2.2. Scenario 1: After node 3 adds data, other nodes are updated

Execute the new data on node 3:

root@mysql.sock][fxtest]> create table mgr_test1(id int primary key,name
varchar(10));

Query OK, 0 rows affected (0.02 sec)

[root@mysql.sock][fxtest]> insert into mgr_tes
t1 values(1,'a');

Query OK, 1 row affected (0.00 sec)

Node 1 performs the modification:

root@mysql.sock][fxtest]> update mgr_test1 set name='aa' where id=1;

ERROR 3101 (HY000): Plugin instructed the server to rollback the current
transaction.

[root@mysql.sock][fxtest]>

Node 2 performs the modification:

root@mysql.sock][fxtest]> update mgr_test1 set name='aa' where id=1;

ERROR 3101 (HY000): Plugin instructed the server to rollback the current
transaction.

[root@mysql.sock][fxtest]>

Conclusion: Nodes 1 and 2 cannot modify the newly added data of node 3.

2.3.2.3. Scenario 2: Other nodes add new data, after node 3 is modified, other nodes are updated again

Node 1:

root@mysql.sock][fxtest]> insert into mgr_test1 values(11,'aa');

Query OK, 1 row affected (0.01 sec)

Node 2:

root@mysql.sock][fxtest]> insert into mgr_test1 values(22,'aa');

Query OK, 1 row affected (0.00 sec)

Node 3:

root@mysql.sock][fxtest]> update mgr_test1 set name='11' where id=11;

Query OK, 1 row affected (0.01 sec)

Rows matched: 1 Changed: 1 Warnings: 0

[root@mysql.sock][fxtest]> update mgr_test1 set name='22' where id=22;

Query OK, 1 row affected (0.01 sec)

Rows matched: 1 Changed: 1 Warnings: 0

After that, nodes 1 and 2 modify the data and report an error:

root@mysql.sock][fxtest]> update mgr_test1 set name='aaa' where id=11;

ERROR 3101 (HY000): Plugin instructed the server to rollback the current
transaction.

[root@mysql.sock][fxtest]> update mgr_test1 set name='aaa' where id=22;

ERROR 3101 (HY000): Plugin instructed the server to rollback the current
transaction.

[root@mysql.sock][fxtest]>

2.3.2.4. Fault repair

2.3.2.5. Scenario 1: The binlog corresponding to the local transaction of node 3 still exists, how to fix it

You only need to restart the node 1 and 2 replication to synchronize the local transactions of the node 3. The following demonstrates restarting node 1:

root@mysql.sock][fxtest]> show master status;

+------------------+----------+--------------+------------------+----------------
-------------------------------------------+

| File   | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set
       |

+------------------+----------+--------------+------------------+----------------
-------------------------------------------+

| MySql-bin.000001 | 3386 | | | 91f9d301-c234-11e9-b15ffa163e13423a:1-35:1000029:2000029 |

+------------------+----------+--------------+------------------+----------------
-------------------------------------------+

1 row in set (0.03 sec)

[root@mysql.sock][fxtest]> stop group_replication;

Query OK, 0 rows affected (9.73 sec)

[root@mysql.sock][fxtest]> start group_replication;

Query OK, 0 rows affected (3.24 sec)

[root@mysql.sock][fxtest]> show master status;

+------------------+----------+--------------+------------------+----------------
---------------------------------------------------------------------------------
--+

| File   | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set
      |

+------------------+----------+--------------+------------------+----------------
---------------------------------------------------------------------------------
--+

| MySql-bin.000001 | 4935 | | | 34668704-bf55-11eb-b120-
000c29ed3768:1,
91f9d301-c234-11e9-b15f-fa163e13423a:1-36:1000029:2000029 |

+------------------+----------+--------------+------------------+----------------
---------------------------------------------------------------------------------
--+

1 row in set (0.00 sec)

Restart the node group No. 1 to replicate successfully, and the local transaction 34668704-bf55-11eb-b120-000c29ed3768:1 node 3 is also synchronized.

2.3.2.6. Scenario 2: The binlog corresponding to the local transaction of node 3 is no longer there, how to fix it

Clear the binlog of node 3, simulate that the binlog corresponding to the local transaction is deleted, close the replication of the 1 node group, and restart the replication of the 2 node group so that it can only be synchronized from the 3 node.

#清空节点3本地事务对应binlog

root@mysql.sock][fxtest]> show binary logs;

+------------------+-----------+
| Log_name | File_size |
+------------------+-----------+
| MySql-bin.000001 | 4968 |
+------------------+-----------+

1 row in set (0.00 sec)

[root@mysql.sock][fxtest]> flush logs;

Query OK, 0 rows affected (0.00 sec)

[root@mysql.sock][fxtest]> show binary logs;

+------------------+-----------+
| Log_name | File_size |
+------------------+-----------+
| MySql-bin.000001 | 5011 |

| MySql-bin.000002 | 262 |
+------------------+-----------+

2 rows in set (0.00 sec)

[root@mysql.sock][fxtest]> purge binary logs to 'MySql-bin.000002';

Query OK, 0 rows affected (0.01 sec)

[root@mysql.sock][fxtest]> show binary logs;

+------------------+-----------+
| Log_name | File_size |
+------------------+-----------+
| MySql-bin.000002 | 262 |
+------------------+-----------+

1 row in set (0.00 sec)

Close the replication of node group 1:

root@mysql.sock][fxtest]> stop group_replication;

Query OK, 0 rows affected (9.57 sec)

Restarting the replication of the No. 2 node group failed. The No. 2 node synchronizes data from the No. 3 node. The binlog corresponding to the local transaction of the No. 3 node is deleted. Therefore, the No. 2 node fails to join the cluster and has been in the recovering state. The log of the No. 2 node is as follows:

2021-05-28T09:58:28.091303Z 276 [ERROR] Error reading packet from server for
channel 'group_replication_recovery': The slave is connecting using CHANGE MASTER
TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing
GTIDs that the slave requires. Replicate the missing transactions from elsewhere,
or provision a new slave from backup. Consider increasing the master's binary log
expiration period. The GTID set sent by the slave is '91f9d301-c234-11e9-b15ffa163e13423a:1-36:1000029:2000029', and the missing transactions are '34668704-
bf55-11eb-b120-000c29ed3768:1'. (server_errno=1236)

root@mysql.sock][fxtest]> select * from
performance_schema.replication_group_members;

+---------------------------+--------------------------------------+-------------
+-------------+--------------+

| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT |
MEMBER_STATE |

+---------------------------+--------------------------------------+-------------
+-------------+--------------+

| group_replication_applier | 332ee50a-bf55-11eb-b407-000c293fd895 | fxtest02 |
3307 | RECOVERING |

| group_replication_applier | 34668704-bf55-11eb-b120-000c29ed3768 | fxtest03 |
3307 | ONLINE |

Therefore, node 2 can only join the cluster through reset master and set global gtid_purged:

stop group_replication;

reset master;

set global gtid_purged='91f9d301-c234-11e9-b15f-fa163e13423a:1-
36:1000029:2000029,34668704-bf55-11eb-b120-000c29ed3768:1';

start group_replication;

3. Summary

When building MGR:

1. Make sure that mgr is clean, that is, it only contains global gtid transactions and does not contain local gtid transactions. In the initialization phase before starting mgr, turn off writing binlog before operating commands.

2. When a member joins the cluster, start group replication and report an error:

This member has more executed transactions than those present in the group

This shows that the environment is not clean and there are local affairs. Do not set the parameter group_replication_allow_local_disjoint_gtids_join=1 This will leave hidden dangers in the future. It is estimated that the official has realized the seriousness of this problem and was removed after the 8.0.4 version:

group_replication_allow_local_disjoint_gtids_join: Allow current server to join
group even if it has transactions not present in group. Removed in MySQL 8.0.4.

Failure analysis | mysql mgr multi-master data can not be updated case analysis