1. Getting to know MGR for the first time
I believe many people are relatively unfamiliar with the term MGR. In fact, MGR (full name MySQL Group Replication) is a brand-new highly available and highly scalable solution launched by Oracle MySQL in December 2016 when MySQL 5.7.17 was released. It has the following characteristics:
- high consistency , based on native replication and group replication technology of the Paxos protocol, and provided as a plug-in to provide consistent data security guarantee;
- High fault tolerance , as long as most of the nodes are not broken, you can continue to work, there is an automatic detection mechanism, when different nodes conflict in resource contention, there will be no errors, according to the first come first principle to deal with, and built-in Automatic split brain protection mechanism;
- High scalability , the addition and removal of nodes are automatic. After a new node is added, it will automatically synchronize the status from other nodes until the new node is consistent with other nodes. If a node is removed, the other The node automatically updates the group information and automatically maintains the new group information;
- High flexibility , there are single-master mode (Figure 1) and multi-master mode (Figure 2), in the single-master mode, the master will be automatically selected, all update operations are performed on the master; in the multi-master mode, all servers are Update operations can be processed at the same time.
Single master mode (Figure 1)
Multi-master mode (Figure 2)
The MGR architecture diagram is as follows: It mainly includes the APIs layer, the component layer, which is responsible for the composition of the protocol module and the API+Paxos engine layer.
2. MGR technology evolution
2.1 Master-slave replication
Traditional MySQL replication provides a simple master-slave replication method by default. This architecture has a master and one or more slaves. When the master node executes the commit transaction, it is sent to other slave nodes in an asynchronous manner, and the relay is executed again from the library. The log content achieves the goal of consistency between the master and the copy. By default, the data of all nodes in the cluster is consistent.
MySQL asynchronous replication
2.2 Semi-synchronous replication
Asynchronous replication has a certain risk of data loss. MySQL has introduced semi-synchronous replication in version 5.6. A synchronous operation has been added to the synchronous data protocol. This means that the master node needs to confirm that at least one slave node confirms the receipt and returns during the commit operation. ACK, only in this way can the master node submit the data correctly.
MySQL semi-synchronous replication
2.3 Group Copy
MySQL MGR cluster is a distributed cluster composed of at least 3 server nodes, a share-nothing replication scheme, and each server node has a complete copy.
MySQL Group Replication Agreement
Three, MGR technical characteristics
3.1 Fault detection
Group replication comes with a fault detection mechanism, which can report which group member is unresponsive, and how to determine whether the member is excluded from the cluster group. Failure detection in group replication is a distributed service. Assuming that server A does not receive a message from server B within a predetermined period of time, if other members of the group also do not receive a message from server B, then confirm that B is faulty, so that other members determine that they will lose contact with the group member Exclude from the cluster.
At this time, server B cannot contact other service nodes. Due to the inability to reach the minimum number of arbitration members, they are in an independent state and cannot provide external services.
3.2 Fault tolerance
MySQL group replication is implemented on the basis of Paxos distributed algorithm to provide distributed coordination between different servers. Therefore, it requires most servers to be active in order to reach the number of arbitration members in order to make a decision. This has a direct impact on the number of faults that the system can tolerate without affecting itself and its overall function. The number of servers required to tolerate f failures (n) n = 2 * f + 1.
In practice, this means that in order to tolerate a failure, the group must have three servers. If one server fails, there are still two servers forming a majority (two-thirds) to allow the system to continue running automatically. However, if the second server goes down unexpectedly, the group is locked (only one server), because the election can be reached without a majority (you cannot elect yourself). The following is a small table illustrating the above formula:
3.3 Member management
Group replication is based on the Group View (hereinafter referred to as view) for member management. The view is generally the member state of the Group within a period of time. If there is no member change during this period, that is to say, there is no member joining or exiting. Once a member joins or exits the group, the view changes, and the view ID (view id) is used to track the change and distinguish the time. Let’s look at a picture to demonstrate:
In the serial number part, when initializing, the serial number of the first view starts from 1, and the member has only one leader. In order to initialize the node, the serial number of any member that appears in the future needs to be increased by 1, which can be passed through the performance\_schema system library. Query the current view in the replication\_group\_member\_stats table under.
Four, MGR installation experience
To understand any new technology, starting from deployment, installation is relatively simple. We prepare the following test nodes:
10.10.1.214
10.10.1.217
10.10.6.91
The installed version is the latest version 8.0.24. After decompressing the installation package, initialize it:
su - mysql
wget http://mirrors.ustc.edu.cn/mysql-ftp/Downloads/MySQL-8.0/mysql-8.0.24-linux-glibc2.12-x86_64.tar
tar -xf mysql-8.0.24-linux-glibc2.12-x86_64.tar
cd mysql-8.0.24-linux-glibc2.12-x86_64
# 创建配置文件和数据目录
mkdir conf data
初始化数据库并且启动
./bin/mysqld --initialize --datadir=/home/mysql/mysql-8.0.24-linux-glibc2.12-x86_64/data --basedir=/home/mysql/mysql-8.0.24-linux-glibc2.12-x86_64
./bin/mysqld_safe --defaults-file=conf/my.cnf &
4.1 General configuration instructions
Configuration code
[mysqld]
bind-address=0.0.0.0
datadir=/home/mysql/mysql-8.0.24-linux-glibc2.12-x86_64/data
basedir=/home/mysql/mysql-8.0.24-linux-glibc2.12-x86_64
port=3306
socket=/home/mysql/mysql-8.0.24-linux-glibc2.12-x86_64/data/mysqld.sock
user=mysql
# 每个节点要求不一样
server_id=1
gtid_mode=ON
enforce_gtid_consistency=ON
master_info_repository=TABLE
relay_log_info_repository=TABLE
binlog_checksum=NONE
log_slave_updates=ON
log_bin=binlog
binlog_format=ROW
innodb_buffer_pool_size=1g
# 8.0 默认值XXHASH64,针对写事务进行哈希处理
transaction_write_set_extraction=XXHASH64
# 启动加载组复制插件
plugin_load_add='group_replication.so'
# 集群唯一ID
group_replication_group_name="8d3cebd8-b132-11eb-8529-0242ac130003"
# 是否启动MySQL服务时启动组复制,建议值:off
group_replication_start_on_boot=off
# 本地IP后面端口33061可自定义,集群通信端口,建议统一端口
group_replication_local_address= "10.10.1.214:33061"
# 初始化集群成员列表,可动态修改
group_replication_group_seeds= "10.10.1.214:33061,10.10.1.217:33061,10.10.6.91:33061"
# 判断是否为引导组
group_replication_bootstrap_group=off
# 设置白名单,这里特别注意,如果是同网段可以不用设置,如果是不同网段则需要修改否则通信端口不可访问
loose-group_replication_ip_whitelist='10.10.1.214,10.10.1.217,10.10.6.91'
4.2 Single master mode deployment
4.2.1 Boot node initialization
# 创建用户和安装插件
mysql> SET SQL_LOG_BIN=0;
Query OK, 0 rows affected (0.00 sec)
mysql> CREATE USER rpl_user@'%' IDENTIFIED BY 'password';
Query OK, 0 rows affected (0.01 sec)
mysql> GRANT REPLICATION SLAVE ON *.* TO rpl_user@'%';
Query OK, 0 rows affected (0.00 sec)
mysql> GRANT BACKUP_ADMIN ON *.* TO rpl_user@'%';
Query OK, 0 rows affected (0.00 sec)
mysql> FLUSH PRIVILEGES;
Query OK, 0 rows affected (0.00 sec)
mysql> SET SQL_LOG_BIN=1;
Query OK, 0 rows affected (0.00 sec)
mysql> CHANGE REPLICATION SOURCE TO SOURCE_USER='rpl_user', SOURCE_PASSWORD='password' FOR CHANNEL 'group_replication_recovery';
Query OK, 0 rows affected, 2 warnings (0.05 sec)
mysql> INSTALL PLUGIN group_replication SONAME 'group_replication.so';
mysql> SHOW PLUGINS;
+---------------------------------+----------+--------------------+----------------------+---------+
| Name | Status | Type | Library | License |
+---------------------------------+----------+--------------------+----------------------+---------+
| group_replication | ACTIVE | GROUP REPLICATION | group_replication.so | GPL |
+---------------------------------+----------+--------------------+----------------------+---------+
# 启动引导节点
mysql> SET GLOBAL group_replication_bootstrap_group=ON;
Query OK, 0 rows affected (0.00 sec)
mysql> START GROUP_REPLICATION;
Query OK, 0 rows affected, 1 warning (2.33 sec)
mysql> SET GLOBAL group_replication_bootstrap_group=OFF;
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------------------------------------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-------------------------------------------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 4cf69361-b22b-11eb-a2c9-fa163ebefc6a | 10-10-1-214 | 3306 | ONLINE | PRIMARY | 8.0.24 |
+---------------------------+--------------------------------------+-------------------------------------------------+-------------+--------------+-------------+----------------+
4.2.2 Join slave node
mysql> SET SQL_LOG_BIN=0;
Query OK, 0 rows affected (0.00 sec)
mysql> CREATE USER rpl_user@'%' IDENTIFIED BY 'password';
Query OK, 0 rows affected (0.01 sec)
mysql> GRANT REPLICATION SLAVE ON *.* TO rpl_user@'%';
Query OK, 0 rows affected (0.03 sec)
mysql> GRANT BACKUP_ADMIN ON *.* TO rpl_user@'%';
Query OK, 0 rows affected (0.00 sec)
mysql> FLUSH PRIVILEGES;
Query OK, 0 rows affected (0.00 sec)
mysql> SET SQL_LOG_BIN=1;
Query OK, 0 rows affected (0.00 sec)
mysql> CHANGE REPLICATION SOURCE TO SOURCE_USER='rpl_user', SOURCE_PASSWORD='password' FOR CHANNEL 'group_replication_recovery';
Query OK, 0 rows affected, 2 warnings (0.05 sec)
mysql> START GROUP_REPLICATION;
Query OK, 0 rows affected, 1 warning (3.33 sec)
# 检查状态
mysql> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------------------------------------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-------------------------------------------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 4cf69361-b22b-11eb-a2c9-fa163ebefc6a | 10-10-1-214 | 3306 | ONLINE | PRIMARY | 8.0.24 |
| group_replication_applier | 53f39dba-b22b-11eb-bfdb-fa163e42784d | 10-10-1-217 | 3306 | ONLINE | SECONDARY | 8.0.24 |
+---------------------------+--------------------------------------+-------------------------------------------------+-------------+--------------+-------------+----------------+
2 rows in set (0.00 sec)
其他一个节点执行上述即可,执行完成后检查
mysql> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------------------------------------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-------------------------------------------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 4cf69361-b22b-11eb-a2c9-fa163ebefc6a | 10-10-1-214 | 3306 | ONLINE | PRIMARY | 8.0.24 |
| group_replication_applier | 53f39dba-b22b-11eb-bfdb-fa163e42784d | 10-10-1-217 | 3306 | ONLINE | SECONDARY | 8.0.24 |
| group_replication_applier | 56779526-b22b-11eb-a28e-fa163e1f9809 | 10-10-6-91 | 3306 | ONLINE | SECONDARY | 8.0.24 |
+---------------------------+--------------------------------------+-------------------------------------------------+-------------+--------------+-------------+----------------+
3 rows in set (0.00 sec)
4.3 Multi-primary mode deployment
The multi-master mode is similar to the single-master deployment mode, which is only executed when joining the cluster:
set global group_replication_single_primary_mode=off;
The single master is ON.
4.3.1 Boot node initialization
mysql> set global group_replication_single_primary_mode=off;
Query OK, 0 rows affected (0.00 sec)
mysql> SET GLOBAL group_replication_bootstrap_group=ON;
Query OK, 0 rows affected (0.00 sec)
mysql> start group_replication;
Query OK, 0 rows affected, 1 warning (2.16 sec)
mysql> SET GLOBAL group_replication_bootstrap_group=OFF;
Query OK, 0 rows affected (0.00 sec)
mysql> select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------------------------------------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-------------------------------------------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 4cf69361-b22b-11eb-a2c9-fa163ebefc6a | 10-10-1-214 | 3306 | ONLINE | PRIMARY | 8.0.24 |
+---------------------------+--------------------------------------+-------------------------------------------------+-------------+--------------+-------------+----------------+
1 row in set (0.00 sec)
4.3.2 Join other nodes
mysql> set global group_replication_single_primary_mode=off;
Query OK, 0 rows affected (0.00 sec)
mysql> START GROUP_REPLICATION;
Query OK, 0 rows affected, 1 warning (3.26 sec)
mysql> select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------------------------------------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-------------------------------------------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 4cf69361-b22b-11eb-a2c9-fa163ebefc6a | 10-10-1-214 | 3306 | ONLINE | PRIMARY | 8.0.24 |
| group_replication_applier | 53f39dba-b22b-11eb-bfdb-fa163e42784d | 10-10-1-217 | 3306 | ONLINE | PRIMARY | 8.0.24 |
| group_replication_applier | 56779526-b22b-11eb-a28e-fa163e1f9809 | 10-10-6-91 | 3306 | ONLINE | PRIMARY | 8.0.24 |
+---------------------------+--------------------------------------+-------------------------------------------------+-------------+--------------+-------------+----------------+
3 rows in set (0.00 sec)
4.4 Test experience
# 在任意节点执行
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| mysql |
| performance_schema |
| sys |
+--------------------+
4 rows in set (0.00 sec)
mysql> create database test;
Query OK, 1 row affected (0.01 sec)
# 任意节点查询
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| mysql |
| performance_schema |
| sys |
| test |
+--------------------+
5 rows in set (0.00 sec)
Five, application scenarios
- MGR adopts a multi-copy mode. In 2N+1 clusters, as long as N+1 nodes are still alive, the database can provide services stably. It is suitable for financial scenarios, because these scenarios must have zero data loss, and the availability is within 4 9 Even five nines.
- It is suitable to replace the current master-slave high-availability version and solve the single-point writing problem.
- An infrastructure environment that requires flexible expansion of nodes for business, such as a private cloud.
Six, summary
Although MySQL launched the MGR function in 2016, we also know that there are many benefits, and there are bold companies that use it for testing and even deploying online environments. According to public information, Netease and Didi both use it, and some domestic commercial banks also have it. Used, but many people are still on the sidelines, mainly due to the following reasons:
Demand is not particularly strong
- In many business situations, the use of MySQL semi-synchronous and asynchronous replication is sufficient to meet business requirements, and the cooperation with MHA third-party components meets most of the scene requirements.
distributed new things
- The concept of distributed itself has existed for many years, but due to the short launch of MGR, and we searched the official bug library, there are still many unresolved bugs. It is more difficult for users to use and troubleshoot problems, and it is also a hindrance that it is difficult to reproduce the problems due to the distributed design.
Ecologically immature
- There is almost no official solution that is fully mature to build a complete set of high-availability architecture. If you want to use it on a large scale, you still need a more mature ecosystem.
Anything new has a process of being accepted by the public, but it takes time to screen and sharpen it.
Reference documents
https://dev.mysql.com/doc/refman/8.0/en/group-replication.html
Author: vivo Internet database team-Liu Shilin
**粗体** _斜体_ [链接](http://example.com) `代码` - 列表 > 引用
。你还可以使用@
来通知其他用户。