3000-frame animation illustrates why MySQL needs binlog, redo log and undo log

This article assumes that MySQL's storage engine is InnoDB.

Let's first look at how a SQL statement ends up stored in the database:

This is a very simple update statement. On its way from being received by MySQL to being written to disk, it passes through the MySQL Server layer and the InnoDB storage engine.

The Server layer is like a product manager: it analyzes customer requirements and proposes a solution.
InnoDB is like a front-line programmer: it implements the specific solution the product manager hands down.

While MySQL is "analyzing requirements and implementing solutions", memory operations, disk operations, and the writing of various logs are all interleaved.

What are they all for? How do they fit together? Why is MySQL layered? What is the Buffer Pool in InnoDB?

Let's go through them one at a time.

Layered architecture

Why is MySQL divided into a Server layer and a storage engine layer?

There is no official answer to this question, but it is not hard to guess. In short, the goal is decoupling.

The Server layer and the storage engine each have their own responsibilities, with a clear division of labor. Users can pick the storage engine that suits their needs. A good design, right?

Later developments confirmed the advantage of the layered design. MySQL's original storage engine was the in-house ISAM, the predecessor of MyISAM, which supported only simple queries. MySQL later worked with Sleepycat to integrate the Berkeley DB engine, which added transaction support. While MySQL kept upgrading its own storage engines, InnoDB appeared seemingly out of nowhere, and its powerful feature set put MySQL under pressure.

What do you do when your own storage engine can't beat InnoDB?

If you can't beat them, join them!

MySQL chose to cooperate with InnoDB. Precisely because MySQL's storage engines are pluggable, the two companies' cooperation went very smoothly. Soon afterwards MySQL released version 4.0, which officially supported InnoDB, followed by the classic version 4.1.

This approach of partnering with and absorbing outside engines has also been an important factor in MySQL's success and has kept it highly competitive for a long time. Today MySQL still holds a very large share of the database market, second only to Oracle.

Buffer Pool

In InnoDB there is a very important structure: the Buffer Pool.

What is the Buffer Pool?

The Buffer Pool is a region of memory that caches MySQL data stored on disk.

Why cache data that lives on disk?

Let's illustrate with an example. Assume for now that there is no Buffer Pool, that the user table contains a single record, and that the record has age = 1. Suppose three SQL statements need to be executed:

Transaction A: update user set age = 2
Transaction B: update user set age = 3
Transaction C: update user set age = 4

Without a Buffer Pool, execution looks like this:

As the figure shows, each update has to read the data from disk (1 IO) and, after modifying it, flush it back to disk (1 IO). Each update therefore costs 2 disk IOs, and three updates cost 6 disk IOs.

With a Buffer Pool, execution becomes:

As the figure shows, the data only has to be read from disk into the Buffer Pool on the first execution (1 IO) and flushed back to disk after the third execution (1 IO). The whole process needs only 2 disk IOs, saving 4 disk IOs compared with having no Buffer Pool.
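
The IO arithmetic above can be checked with a small simulation. This is only a sketch under simplifying assumptions: the Disk class and both run_* functions are hypothetical names invented for illustration, the "table" is a single integer, and every read or write counts as exactly one IO. It is not how InnoDB actually manages pages.

```python
class Disk:
    """Hypothetical disk holding the single user record; every read or write is one IO."""

    def __init__(self, age):
        self.age = age
        self.io_count = 0

    def read(self):
        self.io_count += 1
        return self.age

    def write(self, age):
        self.io_count += 1
        self.age = age


def run_without_buffer_pool(disk, new_ages):
    """Each update reads the record from disk and writes it straight back."""
    for new_age in new_ages:
        disk.read()
        disk.write(new_age)
    return disk.io_count


def run_with_buffer_pool(disk, new_ages):
    """Load the record once, update it in memory, flush once at the end."""
    cached_age = disk.read()      # first access: disk -> Buffer Pool
    for new_age in new_ages:
        cached_age = new_age      # memory-only update, no disk IO
    disk.write(cached_age)        # single flush: Buffer Pool -> disk
    return disk.io_count


updates = [2, 3, 4]               # transactions A, B and C
io_without = run_without_buffer_pool(Disk(age=1), updates)
io_with = run_with_buffer_pool(Disk(age=1), updates)
print(io_without, io_with)        # 6 2
```

The saving grows with the number of updates that hit the same cached data: the uncached path costs two IOs per update, while the cached path stays at two IOs in total.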

Of course, the real Buffer Pool is not this simple. It involves many implementation details and optimizations that, for reasons of space, this article does not cover.

The point is this: the Buffer Pool turns disk IO into memory operations, which saves time and improves efficiency.

The Buffer Pool improves efficiency, but it introduces a problem. It lives in memory, and whenever the machine loses power, everything in memory is lost.

If the data in the Buffer Pool has not yet been flushed to disk when the power goes out, is that data lost?

Continuing the example above: if all three transactions have completed but age = 4 has not yet been flushed to disk when the power is suddenly cut, all of those changes are lost:

Imagine that the lost data is core user transaction data. Would users accept that?

The answer is no.

So how does InnoDB keep data from being lost?

Enter today's first log: the redo log.

restore - redo log

As the name suggests, the redo log is a log that lets changes be redone.

How does the redo log ensure that data is not lost?

After a modification, the modified value is first recorded in the redo log on disk. Even if the power is suddenly cut and everything in the Buffer Pool is lost, the Buffer Pool can be restored from the redo log once power returns. This keeps the memory-speed efficiency of the Buffer Pool while also ensuring that data is not lost.

Let's illustrate with an example. Assume the user table contains a single record with age = 1, and that one SQL statement needs to be executed:

Transaction A: update user set age = 2

The execution process is as follows:

As the figure above shows, once there is a redo log, age = 2 is written to the redo log immediately after age is changed to 2. If the power is suddenly cut at this point, the in-memory data is lost, but when power returns the data can be read back out of the redo log and restored, so nothing is lost.

You may ask: the redo log file is on disk, and the data files are on disk too. Both are disk operations, so why bother? Why not write the modified data directly to the data files?

Because the redo log is written to disk sequentially, whereas flushing data pages is a random write, and sequential disk writes are much more efficient than random ones.

This mechanism of writing the log first and flushing the data to disk later has a formal name: WAL (write-ahead logging).
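
To make the write-ahead idea concrete, here is a minimal sketch. Everything in it is an assumption made for illustration: TinyEngine and its methods are hypothetical, the "disk" is a pair of Python objects, and the redo log stores whole values rather than physical page changes as InnoDB does.

```python
class TinyEngine:
    """Toy model: data_file and redo_log stand for disk, buffer_pool for memory."""

    def __init__(self):
        self.data_file = {"age": 1}   # durable, expensive random writes
        self.redo_log = []            # durable, cheap sequential appends
        self.buffer_pool = {}         # volatile, lost on power failure

    def update(self, key, value):
        self.buffer_pool[key] = value          # change the cached copy in memory
        self.redo_log.append((key, value))     # log the change before any data-file flush

    def power_failure(self):
        self.buffer_pool = {}                  # memory is wiped; disk contents survive

    def recover(self):
        for key, value in self.redo_log:       # replay logged changes in order
            self.buffer_pool[key] = value

    def read(self, key):
        return self.buffer_pool.get(key, self.data_file[key])


engine = TinyEngine()
engine.update("age", 2)                    # transaction A
engine.power_failure()
age_before_recovery = engine.read("age")   # stale value from the data file
engine.recover()
age_after_recovery = engine.read("age")    # change restored from the redo log
print(age_before_recovery, age_after_recovery)
```

The data file is never touched in this run. The update survives only because the appended log record was already durable when the power failed.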

Although sequential disk writes are already very efficient, they are still noticeably slower than memory operations.

So, is there a way to optimize further?

Yes. The redo log gets its own in-memory buffer, the redo log buffer, and this nested, buffer-in-front-of-a-buffer approach improves efficiency further.

So how exactly does the redo log buffer work together with flushing to disk?

Before answering that question, let's look at the relationship between the MySQL server and the operating system:

The MySQL server is a process running on top of the operating system. If the operating system crashes, MySQL necessarily goes down with it; if MySQL crashes, the operating system does not necessarily go down.

So MySQL can go down in two ways:

MySQL crashes and the operating system crashes too, which is what is usually called server downtime. In this case all data in the Buffer Pool is lost, and the data in the operating system's cache (the OS cache) is lost as well.
MySQL crashes but the operating system does not. In this case all data in the Buffer Pool is lost, but the data in the OS cache is not.

With that relationship in mind, let's look at how the redo log is flushed to disk. Flushing is controlled by the parameter innodb_flush_log_at_trx_commit, which can be set to one of 3 values:

innodb_flush_log_at_trx_commit = 1: real-time write, real-time flush
innodb_flush_log_at_trx_commit = 0: delayed write, delayed flush
innodb_flush_log_at_trx_commit = 2: real-time write, delayed flush

Here, "write" means writing to the operating system's cache (the OS cache), and "flush" means flushing the OS cache to disk.

The three strategies differ as follows:

innodb_flush_log_at_trx_commit = 1: real-time write, real-time flush

This strategy writes the redo log and flushes it to disk at every transaction commit. In theory, as long as the disk itself is healthy, no data is lost.

In short, this strategy is the least efficient but carries the lowest risk of losing data.

innodb_flush_log_at_trx_commit = 0: delayed write, delayed flush

With this strategy, committing a transaction only writes the data to the redo log buffer; a background thread then periodically flushes the redo log buffer to disk.

This strategy is the most efficient, but periodic tasks leave gaps. If the background thread has not yet flushed the redo log to disk after a transaction commits, that part of the data is lost whether it is the MySQL process or the operating system that crashes.

In short, this strategy is the most efficient but carries the highest risk of losing data.

innodb_flush_log_at_trx_commit = 2: real-time write, delayed flush

This strategy writes the redo log to the OS cache when a transaction commits, but does not flush it to disk immediately; instead, a flush is performed once per second.

In this case, if the MySQL process crashes but the operating system does not, the operating system still flushes the OS cache to disk and no data is lost, as shown in the following figure:

However, if the server hosting MySQL goes down, that is, the operating system crashes, the OS cache is wiped as well and the data is still lost, as shown below:

This redo log flushing strategy is therefore a compromise between the other two: it is efficient and carries a relatively low risk of data loss, which is why the author recommends it in most cases. Note, however, that MySQL's default is 1, the only setting that provides full ACID durability; with 2, an operating system crash or power failure can lose roughly the last second of committed transactions.
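
The three strategies and the two crash scenarios can be laid side by side in a small model. This is a sketch, not MySQL's implementation: the function name redo_on_disk_after_crash is hypothetical, the three lists stand in for the redo log buffer, the OS cache, and the redo log file, and the crash is assumed to hit right after commit, before the once-per-second background flush has run.

```python
def redo_on_disk_after_crash(setting, crash):
    """Return the redo records that reach disk when a crash hits right after commit.

    setting: value of innodb_flush_log_at_trx_commit (0, 1 or 2)
    crash:   "mysql" (process only) or "os" (whole machine)
    """
    log_buffer = ["age=2"]   # redo log buffer inside the MySQL process
    os_cache = []            # operating system cache
    disk = []                # redo log file on disk

    if setting in (1, 2):    # "write" at commit: log buffer -> OS cache
        os_cache.extend(log_buffer)
        log_buffer.clear()
    if setting == 1:         # "flush" at commit: OS cache -> disk
        disk.extend(os_cache)
        os_cache.clear()

    log_buffer.clear()       # either kind of crash destroys process memory
    if crash == "os":
        os_cache.clear()     # a machine crash also destroys the OS cache
    else:
        disk.extend(os_cache)  # a surviving OS eventually writes its cache out
        os_cache.clear()
    return disk


for setting in (1, 0, 2):
    for crash in ("mysql", "os"):
        print(setting, crash, redo_on_disk_after_crash(setting, crash))
```

In this model, setting 1 keeps the record in both scenarios, setting 0 loses it in both, and setting 2 keeps it only when the operating system survives.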

To sum up, the redo log exists to restore data, and it is written to disk sequentially. There are three flushing strategies, controlled by the innodb_flush_log_at_trx_commit parameter. The author recommends 2, although the MySQL default of 1 remains the safest choice.

rollback - undo log

We all know that InnoDB supports transactions, and transactions can be rolled back.

Suppose a transaction changes age=1 to age=2, and a background thread flushes age=2 to disk before the transaction commits. Now age is 2 both in memory and on disk. If the transaction then needs to be rolled back, the original age=1 can no longer be found, so the rollback is impossible.

So what should we do?

It's simple: save age=1 before making the modification, and use the saved age=1 when rolling back.

MySQL does exactly that! The record of the data as it was before modification is called the undo log. As the name suggests, undo means to revoke or roll back, and the main function of the undo log is to roll data back.

How does the rollback work? See the figure below:

Before changing age = 1 to age = 2, MySQL first stores age = 1 in the undo log, so that when you need to roll back, you can read out age = 1 in the undo log and roll back.
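
The save-then-restore idea can be sketched in a few lines. The Transaction class below is hypothetical and heavily simplified: the table is a plain dictionary, and the undo log is an in-memory list of (key, old value) pairs rather than InnoDB's on-disk undo records.

```python
class Transaction:
    """Toy transaction that records old values so changes can be reverted."""

    def __init__(self, table):
        self.table = table
        self.undo_log = []

    def update(self, key, new_value):
        self.undo_log.append((key, self.table[key]))  # save the old value first
        self.table[key] = new_value                   # then apply the change

    def rollback(self):
        while self.undo_log:                          # undo in reverse order
            key, old_value = self.undo_log.pop()
            self.table[key] = old_value


user = {"age": 1}
tx = Transaction(user)
tx.update("age", 2)
age_during_tx = user["age"]      # 2: the change is visible in the table
tx.rollback()
age_after_rollback = user["age"] # 1: restored from the undo log
print(age_during_tx, age_after_rollback)
```

Undoing in reverse order matters once a transaction touches the same key more than once: the oldest saved value is applied last, so the record ends up in its original state.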

Note that the undo log has traditionally lived in the system tablespace by default (MySQL 8.0 uses separate undo tablespaces). You can think of the undo log as being recorded in a MySQL table as well: inserting an undo log record is similar to inserting an ordinary row. This means that writing the undo log also generates redo log records.

archive - binlog

The undo log records the data before the modification and provides the ability to roll back.

The redo log records the modified data and provides the capability of crash recovery.

What does binlog do?

The binlog records the modified data and is used for archiving.

Like the redo log, the binlog has its own flushing strategy, controlled by the sync_binlog parameter:

sync_binlog = 0: the binlog is written to the OS cache when a transaction commits, and the operating system decides when to flush it to disk
sync_binlog = 1: the binlog is synchronized to disk (fsync) at every transaction commit
sync_binlog = N: after every N transaction commits, fsync is called once to force the binlog in the OS cache to disk
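
The trade-off between these settings is how many fsync calls are issued versus how many commits sit unsynced in the OS cache. The following sketch counts both. BinlogWriter and simulate are hypothetical names, and the model counts individual commits, whereas the real server may group several commits together before syncing.

```python
class BinlogWriter:
    """Toy model of sync_binlog; counts commits rather than real bytes."""

    def __init__(self, sync_binlog):
        self.sync_binlog = sync_binlog
        self.unsynced_commits = 0   # written to the OS cache, not yet fsynced
        self.fsync_calls = 0

    def commit(self):
        self.unsynced_commits += 1  # binlog events go to the OS cache first
        if self.sync_binlog > 0 and self.unsynced_commits >= self.sync_binlog:
            self.fsync_calls += 1   # force the OS cache to disk
            self.unsynced_commits = 0


def simulate(sync_binlog, commits):
    """Return (fsync calls issued, commits still only in the OS cache)."""
    writer = BinlogWriter(sync_binlog)
    for _ in range(commits):
        writer.commit()
    return writer.fsync_calls, writer.unsynced_commits


print(simulate(0, 10))   # (0, 10)
print(simulate(1, 10))   # (10, 0)
print(simulate(4, 10))   # (2, 2)
```

The second number in each pair is what an operating system crash would put at risk in this model: everything with 0, nothing with 1, and fewer than N commits with N.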

So here is the question: the binlog and the redo log both record modified values, so what is the difference between them? Why is the binlog needed when there is already a redo log?

First, some of the differences between the two:

The binlog is a logical log: it records what modification was made to which row of which table. The redo log is a physical log: it records what modification was made to which record in which data page. If data pages are unfamiliar, think of it as recording which data on disk was modified.
The binlog is append-only; the redo log is written circularly. Its log files have a fixed size, and new records overwrite old ones.
The binlog belongs to the Server layer; the redo log belongs to InnoDB. Without the InnoDB engine there is no redo log.
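
The append-only versus circular distinction can be illustrated with two standard containers. This is only an analogy under stated assumptions: a Python list stands in for the binlog, a fixed-capacity deque for the redo log, the capacity of 3 is arbitrary, and the overwritten records are assumed to describe changes already flushed to the data files.

```python
from collections import deque

binlog = []                  # append-only: keeps the full history
redo_log = deque(maxlen=3)   # fixed capacity: the oldest record is dropped when full

for age in range(2, 7):      # five updates: age=2 ... age=6
    record = f"age={age}"
    binlog.append(record)
    redo_log.append(record)

print(binlog)                # ['age=2', 'age=3', 'age=4', 'age=5', 'age=6']
print(list(redo_log))        # ['age=4', 'age=5', 'age=6']
```

Because the binlog keeps the whole history, it suits archiving and replication; the redo log only needs to cover changes that may not have reached the data files yet.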

But honestly, I don't think these differences are why the redo log cannot replace the binlog. The MySQL developers could certainly adapt the redo log so that it merges with the binlog, yet they have not done so. Why?

I think the main reason the redo log has not replaced the binlog is simply that it is unnecessary.

Why?

First, the binlog ecosystem is already established. MySQL high availability relies mainly on binlog replication, and many companies' data analysis and data processing systems depend on the binlog as well. Replacing the binlog, and with it a whole ecosystem, would be a thankless task.

Second, the binlog is not MySQL's bottleneck, and there is no need to spend effort where there is no bottleneck.

Summary

In conclusion:

The Buffer Pool is a region of memory managed by the MySQL process; it reduces the number of disk IOs.
The redo log belongs to the InnoDB storage engine and is mainly used for crash recovery. It has three flushing strategies, controlled by the innodb_flush_log_at_trx_commit parameter; the author recommends 2, while the MySQL default is 1.
The undo log belongs to the InnoDB storage engine, and its main function is rollback.
The binlog belongs to the MySQL Server layer, and its main function is archiving.
MySQL can go down in two ways: the operating system crashes and takes the MySQL process with it, or the MySQL process crashes while the operating system keeps running.

Finally, one picture summarizes the key points of the whole article:

Closing notes

I wrote this article a year ago. I considered it filler and did not want to post it, but recently I had some spare time, so I revised it and published it. I hope the animations help readers who are still building their MySQL foundations. Experts can safely skip it.

I should stress that, given my own limited expertise, this article only explains the general purpose of several MySQL logs from scratch. Many details are omitted, such as the implementation of the Buffer Pool, the relationship between the undo log and MVCC, the existence of the binlog buffer and the change buffer, and the two-phase commit of the redo log.

If you have questions, let's discuss them, and if you find mistakes in the text, please point them out. Thank you very much!

That's all for today's article.

Thanks for reading! I'm CoderW, see you next time.

Finally, you are welcome to follow my public account "CoderW" so we can learn and improve together.

