MySQL master-slave synchronization introduction

1 Introduction
Hello everyone. MySQL is one of the most commonly used databases. This article shares some key points about MySQL master-slave synchronization as a way to consolidate the basics. If you spot any mistakes, please point them out.

2 MySQL master-slave synchronization overview
MySQL master-slave synchronization, also called MySQL Replication, synchronizes data from one database server to one or more other database servers. The feature is built into MySQL. Once configured, it supports a variety of synchronization schemes at the database (library) and table level.

Redis is a high-performance in-memory database, but it is not today's protagonist. MySQL is a relational database backed by disk files. Its reads are slower than Redis's, but it is powerful and well suited to storing and persisting data. In practice, we often use Redis as a cache in front of MySQL: a data access request first looks in the cache and returns the value directly if it is there; only if it is missing does the request go on to the database. This improves read efficiency and reduces the access pressure on the back-end database. Using Redis as a cache is an important part of a high-concurrency architecture.

As business volume grows, the pressure on the database keeps increasing. The cache also changes frequently and depends heavily on database query results, so the database can end up with slow queries, high load, and too many connections. E-commerce workloads are typically read-heavy and write-light. For them we can deploy MySQL in a master-slave architecture with read-write separation, so that the master server (Master) handles write requests and the slave servers (Slave) handle read requests. This further improves the concurrent processing capability of the database. As shown below:

In the figure above, we have added 2 slave libraries, which together absorb a large number of read requests and share the load of the master library. The slaves continuously synchronize data from the master through master-slave replication, which keeps their data consistent with the master's. Next, let's look at the role of master-slave synchronization and how it is implemented.

3 The role of master-slave synchronization
Not every system needs a master-slave architecture for its database, because the architecture itself has a cost. If the goal is to handle highly concurrent database access more efficiently, we should first optimize SQL statements and indexes to get the most out of the database. Second, we should adopt a caching strategy, using tools such as Redis or MongoDB, whose high performance lets them serve reads faster. Only then should we introduce a master-slave architecture with read-write separation. Each step up in architecture raises the cost of using and maintaining the system.

Back to the topic: master-slave synchronization not only improves database throughput, it also helps in the following three areas:

3.1 Read-write separation
We can synchronize data through master-slave replication and then use read-write separation to improve the concurrent processing capability of the database. Put simply, the data lives in several databases: one is the master library (Master) and the rest are slave libraries (Slave). When data on the master changes, it is automatically synchronized to the slaves, and the application can read from the slaves. E-commerce applications tend to be read-heavy and write-light, so read-write separation lets them sustain higher concurrency. Where one server used to carry all of the read and write load, several servers now share the read requests, which relieves the master. The slaves can also be load balanced, so that read requests are spread across them according to a chosen strategy and reads run more smoothly. Reads are also smoother because lock contention is reduced: if the master is responsible for writes, a write lock taken on the master does not block queries running on the slaves.
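The routing idea can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the class name ReadWriteRouter and the server names are hypothetical, only plain SELECT statements are treated as reads, and everything else (writes, DDL, transaction control) goes to the master. A real application or middleware would also have to handle transactions and replication lag.

```python
import itertools


class ReadWriteRouter:
    """Hypothetical router: SELECTs go to slaves (round-robin), everything else to the master."""

    def __init__(self, master, slaves):
        self.master = master
        # Cycle over the slaves; if there are none, reads fall back to the master.
        self._slaves = itertools.cycle(slaves or [master])

    def route(self, sql):
        parts = sql.split(None, 1)
        verb = parts[0].upper() if parts else ""
        if verb == "SELECT":
            return next(self._slaves)
        return self.master


router = ReadWriteRouter("master", ["slave01", "slave02"])
targets = [
    router.route(q)
    for q in (
        "INSERT INTO orders VALUES (1)",
        "SELECT * FROM orders",
        "SELECT * FROM orders",
        "UPDATE orders SET id = 2",
        "SELECT * FROM orders",
    )
]
print(targets)
```

Round-robin is just one possible balancing strategy; weighted or least-connections selection would slot into the same place.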
3.2 Data Backup
Master-slave synchronization also works as a hot backup mechanism: the backup is taken while the master database is running normally and does not interrupt the data service.
3.3 High Availability
Data backup is a form of redundancy, and that redundancy is what buys high availability for the database. When a server fails or goes down, we can fail over quickly and promote a slave library to act as the master, keeping the service running. For background, you can look up the high-availability SLA targets used for e-commerce system databases.

4 The principle of master-slave synchronization
To understand how master-slave synchronization works, we first need to know about an important log file in the database: the binary log (Binlog), which records the events that update the database. Master-slave synchronization works by synchronizing data based on the Binlog.
Master-slave replication relies on three threads. One is the binlog dump thread, which runs on the master node. The other two are the I/O thread and the SQL thread, which both run on the slave node, as shown in the following figure:

With the figure above in mind, let's walk through the core process of master-slave replication:

When the master node receives a write request (an insert, delete, or update), it records the resulting update operation in the binlog.
The master node replicates data to the slave nodes, such as slave01 and slave02 in the figure. Each slave node must first connect to the master node. When it does, the master creates a binlog dump thread for that slave, which sends the binlog to it.
The binlog dump thread reads the binlog on the master node and sends it to the I/O thread on the slave node. While the master reads an event, it takes a lock on the Binlog and releases the lock once the read is complete.
When the I/O thread on the slave node receives the binlog, it first writes it to the local relay log, where it is stored.
The SQL thread on the slave node reads the binlog events from the relay log, parses them into concrete insert, delete, and update operations, and replays on the slave the operations that were executed on the master. This restores the data on the slave and keeps the master and slave nodes consistent.
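The flow above can be mimicked with a small in-memory model. This is a toy sketch, not how MySQL is implemented: the classes Master and Slave, the event tuples, and the method names are all hypothetical, and the three "threads" are ordinary method calls so that the order of steps stays visible.

```python
class Master:
    """Toy master: applies writes and records each one as a binlog event."""

    def __init__(self):
        self.data = {}
        self.binlog = []  # ordered list of (operation, key, value) events

    def write(self, key, value):
        self.data[key] = value
        self.binlog.append(("SET", key, value))

    def delete(self, key):
        self.data.pop(key, None)
        self.binlog.append(("DELETE", key, None))

    def dump(self, from_pos):
        """Stand-in for the binlog dump thread: events the slave has not received yet."""
        return self.binlog[from_pos:]


class Slave:
    """Toy slave with a relay log and the two replication steps."""

    def __init__(self):
        self.data = {}
        self.relay_log = []
        self.applied = 0  # number of relay-log events already replayed

    def io_thread(self, master):
        """Stand-in for the I/O thread: copy new binlog events into the relay log."""
        self.relay_log.extend(master.dump(len(self.relay_log)))

    def sql_thread(self):
        """Stand-in for the SQL thread: replay relay-log events that are not applied yet."""
        for op, key, value in self.relay_log[self.applied:]:
            if op == "SET":
                self.data[key] = value
            elif op == "DELETE":
                self.data.pop(key, None)
        self.applied = len(self.relay_log)


master = Master()
slave01 = Slave()
master.write("stock", 10)
master.write("price", 5)
master.delete("price")

slave01.io_thread(master)             # events are now in the relay log
data_before_replay = dict(slave01.data)  # but nothing has been applied yet
slave01.sql_thread()                  # replay brings the slave in line with the master
print(data_before_replay, slave01.data)
```

The gap between the I/O step and the replay step is where the slave holds events it has received but not yet applied.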

The content that master-slave synchronization transfers is the binary log (Binlog). Despite the name, what it stores is a sequence of events, each corresponding to a database update operation such as INSERT, UPDATE, or DELETE.

We should also note that not all versions of MySQL enable the server's binary log by default. Before setting up master-slave synchronization, check whether the binary log is enabled on the server.
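One way to run this check from a program is to query the log_bin system variable. The sketch below assumes a Python DB-API style cursor that returns rows as tuples; the function name binlog_enabled and the StubCursor class are hypothetical, and the stub only exists so the example runs without a server.

```python
def binlog_enabled(cursor):
    """Return True if the server reports log_bin = ON."""
    cursor.execute("SHOW VARIABLES LIKE 'log_bin'")
    row = cursor.fetchone()  # expected shape: ('log_bin', 'ON') or ('log_bin', 'OFF')
    return row is not None and str(row[1]).upper() == "ON"


class StubCursor:
    """Hypothetical stand-in for a real cursor; returns a fixed log_bin value."""

    def __init__(self, value):
        self._row = ("log_bin", value)
        self.last_sql = None

    def execute(self, sql):
        self.last_sql = sql

    def fetchone(self):
        return self._row


enabled = binlog_enabled(StubCursor("ON"))
disabled = binlog_enabled(StubCursor("OFF"))
print(enabled, disabled)
```

With a real connection, the same function would be called with the cursor obtained from the driver in use.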

Because the binary log is a file that has to be sent over the network, there is always some delay, say 200 ms. As a result, the data a user reads from a slave library may not be the latest, and the master and slave can be temporarily inconsistent. For example, suppose we update a record on the master and then read the same record again very shortly afterwards, say within 100 ms. The slave has not finished synchronizing yet, so the read from the slave returns the old data. What should we do in this situation?
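The stale read can be reproduced with a small timing model. This is a simplified sketch: the class LaggingReplica is hypothetical, time is passed in explicitly as milliseconds, and the lag is treated as a fixed delay between a commit on the master and its visibility on the slave.

```python
class LaggingReplica:
    """Toy slave on which a change becomes visible lag_ms after the master committed it."""

    def __init__(self, lag_ms):
        self.lag_ms = lag_ms
        self.events = []  # (commit_time_ms, key, value), in commit order

    def receive(self, commit_time_ms, key, value):
        self.events.append((commit_time_ms, key, value))

    def read(self, key, now_ms):
        """Return the latest value for key that is already visible at now_ms."""
        value = None
        for commit_time_ms, k, v in self.events:
            if k == key and commit_time_ms + self.lag_ms <= now_ms:
                value = v
        return value


replica = LaggingReplica(lag_ms=200)
replica.receive(0, "order_status", "created")
replica.receive(1000, "order_status", "paid")   # update committed on the master at t = 1000 ms

stale = replica.read("order_status", now_ms=1100)  # 100 ms after the update
fresh = replica.read("order_status", now_ms=1200)  # 200 ms after the update
print(stale, fresh)
```

The read at 1100 ms still returns the old value, which is exactly the situation described above.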

5 How to solve the data consistency problem of master-slave synchronization

If the data we operate on were all stored in a single database, we could take a write lock on the record while updating it, and reads would never see inconsistent data. But in that setup the slave library only serves as a backup: there is no read-write separation, and the slave does not share the master's load.

So we need a way to deal with data inconsistency under master-slave synchronization while still separating reads and writes, which comes down to how data is replicated between master and slave. There are three main replication methods, which differ in the strength of the consistency they provide.

5.1 Full synchronous replication
With full synchronous replication, after the master library executes a transaction, it waits for all slave libraries to complete the transaction as well before returning the result to the client. Data consistency is guaranteed, but because every transaction on the master has to wait for all slaves to finish, performance is relatively low.
As shown below:

5.2 Asynchronous replication
With asynchronous replication, when the master library commits a transaction, it notifies the binlog dump thread to send the binlog to the slave libraries. Once the binlog dump thread has sent the binlog, the master returns the result to the client without waiting for the slaves to finish applying the transaction.

Because the master can return the result to the client as soon as it has executed the transaction itself, regardless of whether the slaves have completed it, the master and slaves may be briefly inconsistent. For example, a row that was just inserted on the master may not be found if it is queried from a slave immediately afterwards.

Moreover, if the master goes down right after committing a transaction, the binlog may not have reached the slaves yet. If we then fail over and promote a slave to restore service, that data is lost. So although asynchronous replication has high performance, its data consistency is the weakest. Asynchronous replication is the default strategy for MySQL master-slave replication.
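The difference between the two strategies at failover time can be shown with a toy model. The class Cluster, its mode strings, and its methods are hypothetical; transactions are plain strings, and shipping the binlog is an explicit call so that the moment of a crash can be chosen.

```python
class Cluster:
    """Toy cluster: a master log plus slave logs, in 'async' or 'full_sync' mode."""

    def __init__(self, mode, n_slaves=2):
        self.mode = mode
        self.master = []
        self.slaves = [[] for _ in range(n_slaves)]

    def ship(self):
        """Send every slave the transactions it is still missing."""
        for log in self.slaves:
            log.extend(self.master[len(log):])

    def commit(self, txn):
        self.master.append(txn)
        if self.mode == "full_sync":
            self.ship()  # wait for all slaves before acknowledging the client
        return "ok"      # in async mode the client is acknowledged right away

    def failover(self):
        """The master is lost; promote the first slave and report missing transactions."""
        new_master = self.slaves[0]
        return self.master[len(new_master):]


async_cluster = Cluster("async")
async_cluster.commit("t1")
async_cluster.ship()          # t1 reaches the slaves in the background
async_cluster.commit("t2")    # the master crashes before t2 is shipped
lost_async = async_cluster.failover()

sync_cluster = Cluster("full_sync")
sync_cluster.commit("t1")
sync_cluster.commit("t2")
lost_sync = sync_cluster.failover()
print(lost_async, lost_sync)
```

In the asynchronous run the client was told that t2 succeeded, yet the promoted slave never saw it. In the fully synchronous run nothing is lost, at the price of waiting inside every commit.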

5.3 Semi-synchronous replication

MySQL has supported semi-synchronous replication since version 5.5. When a client issues a COMMIT, the master does not return the result immediately. It waits until at least one slave library has received the Binlog and written it to its relay log, and only then responds to the client. This improves data consistency. Compared with asynchronous replication, however, it adds at least one network round trip of latency, which reduces the write efficiency of the master.

MySQL 5.7 also added the rpl_semi_sync_master_wait_for_slave_count parameter, which sets the number of slave libraries that must respond. The default is 1, meaning the master can return to the client as soon as one slave responds. Increasing the value strengthens data consistency but also lengthens the time the master waits for slave responses.
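The effect of the wait count on commit latency can be estimated with a short function. This is a rough model, not MySQL's implementation: the function semi_sync_commit and its arguments are hypothetical, each slave is reduced to the time its acknowledgement takes, and the timeout argument models the configurable maximum wait after which the master stops waiting and falls back to asynchronous behaviour.

```python
def semi_sync_commit(ack_latencies_ms, wait_for_slave_count=1, timeout_ms=10_000):
    """Return (mode, waited_ms) for one commit.

    ack_latencies_ms: time each slave needs to acknowledge the binlog.
    wait_for_slave_count: number of acknowledgements the master waits for (at least 1).
    timeout_ms: maximum time the master is willing to wait.
    """
    acks = sorted(ack_latencies_ms)
    if len(acks) >= wait_for_slave_count:
        # The master waits for the N-th fastest acknowledgement.
        waited = acks[wait_for_slave_count - 1]
        if waited <= timeout_ms:
            return ("semi-sync", waited)
    # Not enough acknowledgements arrived in time: stop waiting.
    return ("async-fallback", timeout_ms)


latencies = [5, 40, 300]
one_ack = semi_sync_commit(latencies, wait_for_slave_count=1)
two_acks = semi_sync_commit(latencies, wait_for_slave_count=2)
too_slow = semi_sync_commit(latencies, wait_for_slave_count=3, timeout_ms=100)
print(one_ack, two_acks, too_slow)
```

With these made-up latencies, raising the count from 1 to 2 moves the wait from the fastest slave to the second fastest, which is the trade-off between consistency and write latency described above.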

However, semi-synchronous replication also has the following problems:

The performance of semi-synchronous replication is lower than that of asynchronous replication. Asynchronous replication does not wait for any slave library to acknowledge the data, whereas semi-synchronous replication must wait for at least one slave to confirm that it has received the binlog, which costs more in performance.
The maximum time the master waits for a slave's response is configurable. If that time is exceeded, semi-synchronous replication degrades to asynchronous replication, and the problems of asynchronous replication reappear.
In versions before MySQL 5.7.2, semi-synchronous replication had a phantom read problem.

When the master has committed the transaction and is waiting for the slave's confirmation, it has not yet returned the result to the client. But because the storage engine on the master has already committed the transaction, other clients can already read the data from the master.

If the master then suddenly goes down and the next request arrives, the request can only be switched to a slave. Because the slave has not finished synchronizing that data from the master, the data cannot be read from the slave. Compared with the result read a moment earlier, this looks like a phantom read.

5.4 Enhanced Semi-Synchronous Replication
Enhanced semi-synchronous replication is an improvement to semi-synchronous replication available in MySQL 5.7.2 and later. The principle is almost the same; its main purpose is to solve the phantom read problem.

With the parameter rpl_semi_sync_master_wait_point = AFTER_SYNC configured on the master, the master must receive a slave's confirmation that it has received the data before the storage engine commits the transaction. This solves the phantom read problem. Refer to the image below:
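The difference between the two wait points is purely one of ordering, which a short sketch can make explicit. The step names and both functions are hypothetical simplifications: a commit is reduced to four steps, and the crash is assumed to happen while the master is waiting, before any slave has acknowledged the event.

```python
def commit_steps(wait_point):
    """Order of steps on the master for one transaction (simplified)."""
    if wait_point == "AFTER_COMMIT":  # the earlier semi-synchronous behaviour
        return ["write_binlog", "engine_commit", "wait_for_ack", "reply_to_client"]
    if wait_point == "AFTER_SYNC":    # enhanced semi-synchronous behaviour
        return ["write_binlog", "wait_for_ack", "engine_commit", "reply_to_client"]
    raise ValueError(f"unknown wait point: {wait_point}")


def phantom_possible_on_crash(wait_point):
    """Master crashes while waiting for the ACK and the slave has not received the event.

    Returns True if other clients could already have read a row on the master
    that the slave does not have.
    """
    visible_on_master = False
    for step in commit_steps(wait_point):
        if step == "engine_commit":
            visible_on_master = True  # other sessions can now read the row
        elif step == "wait_for_ack":
            break                     # crash happens here
    slave_has_row = False             # assumption of this scenario
    return visible_on_master and not slave_has_row


old_behaviour = phantom_possible_on_crash("AFTER_COMMIT")
enhanced = phantom_possible_on_crash("AFTER_SYNC")
print(old_behaviour, enhanced)
```

Because AFTER_SYNC moves the engine commit behind the acknowledgement, no client can observe a row on the master that a slave has not yet received.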

6 Summary

We have now covered master-slave synchronization in MySQL. If your only goal is high database concurrency, first consider SQL optimization, indexing, and caching data in Redis, and only then consider whether to adopt master-slave synchronization at the architecture level.

When configuring a master-slave architecture with read-write separation, we can either write the routing logic ourselves or use third-party middleware.

The advantage of writing our own program is greater control. We can decide for ourselves which queries run on the slave libraries, and for queries with strict real-time requirements we can choose to run them on the master. The program also connects to the database directly, without a middleware layer, which avoids some performance loss.

Middleware has clear advantages: it is powerful and easy to use. However, adding a middleware layer between the client and the database costs some performance, and commercial middleware is expensive and has a learning curve. We can also consider good open-source tools such as MaxScale, a MySQL data middleware developed by MariaDB. For example, in the figure below, MaxScale acts as the database proxy and implements read-write separation through routing and forwarding. We can also use the MHA tool as a strongly consistent master-slave switching tool to complete a highly available MySQL architecture.

JD Cloud Developers
