What problem does MySQL master-slave replication solve? How to solve the synchronization delay?

Problems solved by master-slave replication

Data distribution: distribute data to different geographical locations through replication
Load balancing: read and write separation and read load to multiple slave libraries
Backup: can be used as real-time backup
High availability: Use the master replicate achieve high availability

Copy principle

The principle of copying is actually very simple, only divided into the following three steps:

Record the data changes in the binary log on the main library. Specifically, the main library will record the data update event in the binary log every time before the transaction is ready to submit the transaction to complete the data update. Mysql will follow the order of transaction submission. Record the binary log. After the log is recorded, the main library notifies the storage engine to commit the transaction.
The slave library will start an IO thread, which will connect to the main library. The binlog dump thread on the main library will read the update events in the local binlog log file of the main library. Sent to the slave library, and after receiving the log from the library, it will be recorded to the local relay log relay-log.
SQL thread in the library reads the events in the relay log relay-log and replays them to the slave library. (Before version 5.6, the SQL thread was single-threaded, making the delay between master and slave greater)

Two ways to copy

What exactly is recorded in the log file? mysql supports two log formats, these two log formats also reflect their respective replication methods

Statement-based replication

Statement-based replication is equivalent to logical replication, that is, the binary log records the statements of the operation, and the replication is realized by replaying these statements in the slave library.

This method is simple, the binary log occupies less space, and the bandwidth is small and the transmission efficiency is higher. However, statement-based updates depend on other factors. For example, using the timestamp function to call the current time as the time value when inserting data can also cause problems, because the time value is inconsistent due to the delay between the master and the slave. There may also be problems with stored procedures and triggers.

So in the development, we should put the logic in the code layer as much as possible, not in mysql , it is not easy to expand.

Line-based copy

Row-based replication is equivalent to physical replication, that is, the binary log records each row of actual updated data. This leads to greater pressure for row replication, because the log occupies a large space and the transmission occupies a high bandwidth. But it is more accurate than statement-based replication, and can shield some inconsistencies caused by the differences between the master library and the slave library. Such as the timestamp function just mentioned.

Comparison of the two:

Statement replication
The transmission efficiency is high and the delay is reduced.
When updating a record that does not exist from the library, the statement assignment will not fail. The line replication will lead to failure, so that the inconsistency between the master and the slave can be found earlier.
Suppose there are one million pieces of data in the table, one SQL update all tables, statement-based replication only needs to send one sql, and row-based replication needs to send one million update records
Line copy
No query plan needs to be executed.
I don't know what statement is being executed.

For example, a statement to update the user's total points needs to count all the points of the user and then write it into the user table. If it is based on statement replication, the slave database needs to count the user's points again, and based on the line copy, the record is directly updated, without the need to count user points.

Because the two methods have their own advantages and disadvantages, mysql dynamically switches between these two replication modes. The default is a statement.

Configuration points

# 如果在双主复制结构中没有设置ID的话就会导致循环同步问题
server_id=1

# 即日志中记录的是语句还是行更新或者是混合
binlog_format=mixed

# 在进行n次事务提交以后，Mysql将执行一次fsync的磁盘同步指令。将缓冲区数据刷新到磁盘。
# 为0的话由Mysql自己控制频率。
sync_binlog=n

# 为0的话，log buffer将每秒一次地写入log file中并且刷新到磁盘。
# mysqld进程崩溃会丢失一秒内的所有事务。
# 为1的话，每次事务log buffer会写入log file并刷新到磁盘。（较为安全）
# 在崩溃的时候,仅会丢失一个事务。
# 为2的话，每次事务log buffer会写入log file，但一秒一次刷新到磁盘
innodb_flush_logs_at_trx_commit=0


# 阻止从库崩溃后自动启动复制，给一些时间来修复可能的问题，
# 崩溃后再自动复制可能会导致更多的问题。并且本身就是不一致的
skip_slave_start=1 


# 是否将从库同步的事件也记录到从库自身的bin-log中
# 允许备库将重放的事件也记录到自身的二进制日志中去，可以将备库当做另外一台主库的从库
log_slave_update 

# 日志过期删除时间，延迟严重的话会导致日志文件占用磁盘
expire_logs_days=7

The three parameters of innodb_flush_logs_at_trx_commit are easy to confuse. The following is a detailed analysis:

mysql先将日志写到log buffer缓冲区当中，再将log buffer缓冲区的数据写到log file日志文件中，此时写入的是内存中的log file，最终仍需操作系统将内存中的数据刷写到磁盘上。

参数0：mysql每秒都会将log buffer的数据写入到log file中并且刷新到磁盘。意味着mysql崩溃的时候将会丢失一秒内的所有事务。

参数1：每次事务提交都会将log buffer写入到log file并刷新到磁盘。意味着在mysql崩溃的时候,仅会丢失一个事务。

参数2：每次事务提交都会将log buffer写入到log file但不同时写入到磁盘，由mysql自行控制每秒将log file刷写到磁盘上，当mysql崩溃的时候操作系统没崩溃的时候，log_file中仅会丢失一个事务，操作系统仍会将log file刷写到磁盘，而如果操作系统也崩溃或断电的话，则会丢失一秒内的事务。

Recommended Use:

innodb_flush_logs_at_trx_commit=2
sync_binlog=500

Performance will be faster

innodb_flush_logs_at_trx_commit=1
sync_binlog=1

Relatively safe

Delay problem

Delayed generation

When the TPS concurrency of the main library is high, because the main library is written by multiple threads, and the SQL thread of the slave library is single-threaded, the SQL from the slave library may not be able to keep up with the processing speed of the main library (producers are more expensive than consumers). Faster, leading to accumulation of goods).

Delayed resolution

Network aspect: The slave libraries are distributed in the same local area network or in an environment with a small network delay.
In terms of hardware: configure better hardware from the library to improve the performance of random writes.
Configuration aspects:

Configure from library

sync_binlog=0
innodb_flush_log_at_trx_commit=2
logs-slave-updates=0
增大 innodb_buffer_pool_size

Let more operations Mysql memory, reduce disk operations. Or upgrade Mysql5.7 version to use parallel replication.

Architecture: For example, try to read and write to the main library in the transaction, and read in other non-transactions in the slave library. Eliminate some database inconsistencies caused by delays. Increase the cache to reduce the load of some slave libraries.

The author's personal experience, if there is any mistake, please comment on it.

Source: juejin.cn/post/6844903968259178504

What problem does MySQL master-slave replication solve? How to solve the synchronization delay?

Problems solved by master-slave replication

Copy principle

Two ways to copy

Statement-based replication

Line-based copy

Comparison of the two:

Configuration points

Delay problem

Delayed generation

Delayed resolution

民工哥

引用和评论

早知道有这么个吊炸天的 CI&CD 工具，我就不用 Jenkins 了！

70k star，取代Postman！这款轻量级API工具，太香了！

C++ 中 VS 项目引入公共配置文件

做到真正0丢失、0重复：Apache SeaTunnel 实现万亿级数据一致性全解密

MySQL慢查询日志：性能优化的终极指南

大数据从业者必知必会的Hive SQL调优技巧

MySQL 备份 Shell 脚本：支持远程同步与阿里云 OSS 备份