Partial source code analysis of Redis 5.0

As the old saying goes, when Heaven is about to entrust someone with a great responsibility, it first makes them read Redis.
The ancients also said: bright moonlight in front of the window; I bow my head and think of Redis.
The ancients also said that all the answers are in the source code.
Someone told me yesterday that Redis is more convenient to use in applications than Tair.

I don't know the real face of Mount Lu, only because I am in this mountain

Let's start with a big picture of the overall Redis AOF Rewrite process.

Let's take a look at the major components in the big picture:

The main process and the child process. As we all know, Redis performs AOF Rewrite by creating a child process. An important property of parent and child processes is "share-on-read, copy-on-write", which we will discuss in detail later.
Three pipes used for communication between the parent and child processes.
The two data structures involved when a client writes to the Redis main process, aof_buf and aof_rewrite_buf_blocks, and the aof_child_diff data structure used by the child process.
Two AOF files: the one currently in use, an "active duty" file that is about to retire, and a "reserve" file that is being written by the child process.
That is roughly everything involved. Next, we will follow one AOF Rewrite in chronological order and see what happens.

Everything comes from the long wind

Let's first look at how the Redis AOF Rewrite mechanism is triggered.

There are two ways that everyone should already know well. One is automatic: the configuration file sets thresholds for how much the AOF file has grown since the last rewrite (auto-aof-rewrite-percentage and auto-aof-rewrite-min-size), and a rewrite is triggered when they are reached.
(Note that this check runs as part of a periodic task in the Redis main thread, so the exact moment it fires is not fixed.)

That periodic task is the serverCron method. When and how it triggers the rewrite is something we will come back to and elaborate on.
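
As a rough illustration of the automatic check, here is a small Python model of the decision, not the Redis C code itself. It assumes the check is driven by the auto-aof-rewrite-percentage and auto-aof-rewrite-min-size settings and by how much the AOF file has grown since the last rewrite. The function and parameter names are made up for this example.

```python
def should_auto_rewrite(current_size, base_size,
                        min_size=64 * 1024 * 1024, percentage=100,
                        child_running=False):
    """Model of the periodic check: rewrite once the AOF has grown enough.

    current_size:  size of the AOF file now, in bytes
    base_size:     size of the AOF file right after the last rewrite
    min_size:      models auto-aof-rewrite-min-size (64 MB in the stock config)
    percentage:    models auto-aof-rewrite-percentage (0 turns the check off)
    child_running: True if a rewrite or RDB child already exists
    """
    if child_running or percentage == 0:
        return False                  # never start a second child
    if current_size <= min_size:
        return False                  # small files are not worth rewriting
    base = base_size if base_size > 0 else 1
    growth = current_size * 100 // base - 100   # growth in percent
    return growth >= percentage


if __name__ == "__main__":
    mb = 1024 * 1024
    print(should_auto_rewrite(200 * mb, 100 * mb))   # doubled in size
    print(should_auto_rewrite(150 * mb, 100 * mb))   # grew by only 50%
```

Because the check only runs when the periodic task gets scheduled, the file can overshoot the threshold a little before the rewrite actually starts.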

The other is a client explicitly executing bgrewriteaof.
There is also a third case, which happens only once, when AOF is turned on. Normally AOF is enabled at startup.
But there is a special case when Redis runs as master and slave. When a slave synchronizes data with the master, the master generates an RDB file and sends it to the slave, and the slave loads that data into memory. During this stage the slave has to stop AOF (if it was enabled). After the synchronized data has been loaded, AOF is turned back on automatically, and an AOF rewrite is performed at that point.

Mountain ride, water ride, fork child process

Whatever the trigger, once it is decided to perform an AOF Rewrite, the first thing to do is run a series of checks and then fork a child process.

The line marked in red is where the child process is forked.
The if branch is the code the child process executes, and the else branch is the main process.
Before studying what the main and child processes each do, look at the checks that come first.
First, if an AOF rewrite child process or an RDB save child process already exists, this rewrite is not performed.
Second, if creating the pipes between the main and child processes fails, no rewrite is done.
After the child process is created successfully, it immediately starts rewriting the AOF file, while the main process continues to serve clients. It just performs a few extra operations to make sure no AOF data is lost during the rewrite.
The pipes created here are very important. There are three in total:
Pipe 1: the main process sends the data changed during the rewrite to the child process.
Pipe 2: the child process asks the main process to stop sending changed data.
Pipe 3: the main process confirms to the child process that it has stopped.
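
To make the fork-plus-pipes setup concrete, here is a minimal Python sketch of the same shape: create three pipes, fork, and let the if branch be the child and the else branch be the parent. It is only a model. The function name and the sample change "SET a 1" are invented, and only pipe 1 carries data in this example.

```python
import os


def fork_with_pipes(change=b"SET a 1"):
    """Create three pipes, fork, and pass one change from parent to child."""
    # os.pipe() returns (read_end, write_end).
    data_r, data_w = os.pipe()   # pipe 1: parent -> child, changed data
    stop_r, stop_w = os.pipe()   # pipe 2: child -> parent, "please stop sending"
    ack_r, ack_w = os.pipe()     # pipe 3: parent -> child, "I have stopped"

    pid = os.fork()
    if pid == 0:
        # Child branch (the "if"): the rewrite itself would run here.
        code = 2
        try:
            received = os.read(data_r, len(change))
            code = 0 if received == change else 1
        finally:
            os._exit(code)       # leave without running the parent's cleanup
    else:
        # Parent branch (the "else"): keep serving and forward the change.
        os.write(data_w, change)
        _, status = os.waitpid(pid, 0)
        for fd in (data_r, data_w, stop_r, stop_w, ack_r, ack_w):
            os.close(fd)
        return os.WEXITSTATUS(status)


if __name__ == "__main__":
    print("child exit code:", fork_with_pipes())
```

If creating any of the pipes fails, os.pipe() raises before the fork happens, which mirrors the rule that no rewrite is done when the pipes cannot be created.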

Tip: After fork, the main and child processes share memory, so the child process must know where the main process's data is stored.
That requires copying the main process's page table. This copy is blocking, so the fork call itself can stall.
Therefore, under heavy traffic, keep in mind that AOF Rewrite can block Redis and increase response time (RT).

Divided into two

child process

Let's see what the child process does:

Create a temporary file named temp-rewriteaof-{pid}.aof and initialize things such as the file handle.
Determine whether to use the mixed RDB mode or the pure AOF mode, and rewrite accordingly. The difference between the two is not covered here.
The actual rewriting is very simple: read the contents of each db one by one and write them to the file in the corresponding format.
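
As a hedged sketch of what "write it in the corresponding format" can look like in pure AOF mode, the Python model below walks a toy keyspace and emits one RESP-encoded command per key. It only handles string values, and the data layout (a dict of dicts holding value and expire time) is an assumption made for the example, not how Redis stores data.

```python
def resp(*args):
    """Encode one command as a RESP array of bulk strings."""
    out = b"*%d\r\n" % len(args)
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode()
        out += b"$%d\r\n%s\r\n" % (len(data), data)
    return out


def rewrite_snapshot(databases):
    """Turn {db_index: {key: (value, expire_at_ms or None)}} into AOF bytes."""
    chunks = []
    for index, keyspace in databases.items():
        if not keyspace:
            continue                                 # skip empty dbs
        chunks.append(resp("SELECT", index))         # switch to this db first
        for key, (value, expire_at_ms) in keyspace.items():
            chunks.append(resp("SET", key, value))   # one command rebuilds the key
            if expire_at_ms is not None:
                chunks.append(resp("PEXPIREAT", key, expire_at_ms))
    return b"".join(chunks)


if __name__ == "__main__":
    snapshot = {0: {"a": ("0", None)}, 1: {}}
    print(rewrite_snapshot(snapshot))
```

The point of the rewrite is visible here: no matter how many commands touched a key in the past, the new file needs only enough commands to rebuild its current value.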
Because the main and child processes "share on read, copy on write", both see the same memory as long as they only read. When one of them wants to write, the affected data is copied first, the write goes to the new copy, and the old copy remains unchanged.
Redis takes advantage of this to ensure that the child process reads the data exactly as it was at the moment of the fork.
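
The snapshot effect is easy to observe with a tiny experiment. The sketch below is a Python demonstration, not Redis code: the parent changes a value after the fork, and the child, which waits until that change has happened, still reports the old value. The two helper pipes exist only to order the steps and to report the result; they are not the three rewrite pipes.

```python
import os


def snapshot_after_fork():
    """Return (value seen by the parent, value seen by the child)."""
    db = {"a": "0"}
    go_r, go_w = os.pipe()           # parent -> child: "I have modified a"
    result_r, result_w = os.pipe()   # child -> parent: the value the child sees

    pid = os.fork()
    if pid == 0:
        try:
            os.read(go_r, 1)                       # wait for the parent's write
            os.write(result_w, db["a"].encode())   # read from the child's memory
        finally:
            os._exit(0)
    db["a"] = "1"                    # happens after fork, only the parent sees it
    os.write(go_w, b"x")
    child_value = os.read(result_r, 16).decode()
    os.waitpid(pid, 0)
    for fd in (go_r, go_w, result_r, result_w):
        os.close(fd)
    return db["a"], child_value


if __name__ == "__main__":
    print(snapshot_after_fork())
```

The parent ends up with "1" and the child with "0", which is exactly the property the rewrite relies on.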
That is the core logic of AOF Rewrite. The rest of the logic deals with data that changes while the rewrite is running.
The whole rewrite is very CPU-intensive. Let's see what the main process is doing while the child process is working overtime.

main process

After the main process forks the child process and hands the OKR over to it, it no longer looks after the child. It only checks occasionally whether the child has finished its work (the communication pipe has new data, or the child process no longer exists).
Suppose that during the AOF rewrite a client sends the request set a 1, which changes the value of a from 0 to 1.
Thanks to "share-on-read, copy-on-write", there is no need to worry about the child process: it will still read the old data.
After the in-memory change is done, execution reaches the feedAppendOnlyFile method. Let's take a closer look.

This method is in the aof.c file, and every operation that writes to the AOF goes through it.
It does a few things:
Redis has multiple dbs. If the db the command operates on is not the currently selected db, a SELECT command is inserted first. This is decided by the dictid parameter.
Convert commands that set an expiration time into PEXPIREAT (EXPIRE/PEXPIRE/EXPIREAT/SETEX/PSETEX/SET EX seconds).
Convert the command into RESP format.
Write the AOF-related buffers.
This last step, step 4, is the one related to AOF Rewrite. Let's focus on what it does:

First, it checks whether AOF is turned on, which in our case it obviously is. The statement is appended to aof_buf, which is what gets written to the old AOF file. This is very reasonable: in case the rewrite fails, no data may be lost.
Second, if an AOF child process pid exists, one more step is needed: aofRewriteBufferAppend(). What does it do?
It saves the statement just generated a second time, in the aof_rewrite_buf_blocks structure.
aof_rewrite_buf_blocks is a list whose elements are blocks of 10 MB each. The blocks hold the statements just generated.
Then the method returns, and this AOF write operation ends.
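
Putting the steps of this method together, the following Python model shows the overall shape: insert SELECT when the db changes, encode the command as RESP, append it to aof_buf, and append it a second time to the rewrite buffer when a child is running. The class and function names are illustrative, the rewrite buffer is simplified to one bytearray, and the expiration-time conversion is left out to keep the sketch short.

```python
class AofState:
    """Just enough server state for the sketch."""

    def __init__(self):
        self.selected_db = -1            # db the AOF stream currently points at
        self.aof_buf = bytearray()       # later flushed to the old AOF file
        self.rewrite_buf = bytearray()   # stand-in for aof_rewrite_buf_blocks
        self.child_running = False       # models "an AOF child pid exists"


def resp(*args):
    """Encode one command as a RESP array of bulk strings."""
    out = b"*%d\r\n" % len(args)
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode()
        out += b"$%d\r\n%s\r\n" % (len(data), data)
    return out


def feed_append_only_file(state, dictid, *command):
    """Model of the four steps; returns the bytes produced for this command."""
    buf = b""
    if dictid != state.selected_db:      # step 1: make the stream select the db
        buf += resp("SELECT", dictid)
        state.selected_db = dictid
    buf += resp(*command)                # step 3: RESP encoding
    state.aof_buf += buf                 # step 4a: always feed the old AOF
    if state.child_running:
        state.rewrite_buf += buf         # step 4b: second copy for the child
    return buf


if __name__ == "__main__":
    state = AofState()
    state.child_running = True
    print(feed_append_only_file(state, 0, "SET", "a", "1"))
```

Note that the same bytes land in both buffers, so a command issued during the rewrite is held twice until each buffer is drained.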
The data in aof_rewrite_buf_blocks waits until pipe 1, created earlier, becomes writable and is then written to it (the timing of the write is guaranteed by other mechanisms, skipped here). Once written, the data is released.
Tip: Note that aof_buf and aof_rewrite_buf_blocks are two separate data structures, and the data in them is two separate copies, not shared.
So during the rewrite, every data change makes the main process keep two copies of it in memory, which is extra pressure on the main process.
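
To picture the block list described above, here is a small Python model of a buffer made of fixed-size blocks that are released once they have been handed to the pipe. The class name and methods are invented for illustration, and the write callback stands in for the actual write to pipe 1.

```python
from collections import deque

BLOCK_SIZE = 10 * 1024 * 1024   # 10 MB per block, as described above


class RewriteBuffer:
    """Model of a list of fixed-size blocks holding pending changes."""

    def __init__(self, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.blocks = deque()            # each block is a bytearray

    def append(self, data):
        """Fill the last block, opening new blocks as needed."""
        data = bytes(data)
        pos = 0
        while pos < len(data):
            if not self.blocks or len(self.blocks[-1]) >= self.block_size:
                self.blocks.append(bytearray())
            free = self.block_size - len(self.blocks[-1])
            self.blocks[-1] += data[pos:pos + free]
            pos += free

    def drain(self, write, stop_sending=False):
        """Send blocks through write() in order and release each one."""
        sent = 0
        while self.blocks and not stop_sending:
            block = self.blocks.popleft()    # released from the list here
            write(bytes(block))
            sent += len(block)
        return sent


if __name__ == "__main__":
    buf = RewriteBuffer(block_size=4)
    buf.append(b"abcdefghij")
    print(len(buf.blocks))               # three blocks: 4 + 4 + 2 bytes
```

With stop_sending set, drain() sends nothing and the blocks simply keep piling up in the main process's memory.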

The child process gets the first KR

Let's turn our attention back to the child process.
At this point it has finished rewriting the original data and has achieved its first KR. Let's congratulate it~
We now know that data changed during the rewrite is passed along through the pipe. So how does the child process handle it?

Bit by bit, gather water into a river

In fact, this handling already starts while the old data is being rewritten, as described in the "child process" section. While rewriting the old data, the child process occasionally reads the pipe. This happens in the rdbSaveRio method and in the rewriteAppendOnlyFileRio method.

The data read is stored in the aof_child_diff data structure.
This way, the old data is written directly to the AOF rewrite file while the changed data is kept in memory in aof_child_diff, so the order of the data is never mixed up. At the end, the changed data can be written to the AOF rewrite file in one go.
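
The interleaving can be sketched as follows. This Python model writes the old records to an output buffer and, every so often, does a non-blocking read of the pipe into a separate diff buffer. The polling interval of 10 KB is an assumption made for the example, as are all function names. The pipe's read end must be in non-blocking mode.

```python
import os

READ_INTERVAL = 10 * 1024   # assumed: poll the pipe after this many bytes


def read_diff(pipe_r, child_diff):
    """Read whatever the parent has sent so far without blocking."""
    total = 0
    while True:
        try:
            chunk = os.read(pipe_r, 65536)
        except BlockingIOError:
            break                        # pipe is empty right now
        if not chunk:
            break                        # write end was closed
        child_diff += chunk
        total += len(chunk)
    return total


def rewrite_with_polling(records, pipe_r, out, child_diff,
                         interval=READ_INTERVAL):
    """Write old records to out and poll the pipe every `interval` bytes."""
    written = last_poll = 0
    for record in records:
        out += record                    # old data goes straight to the file
        written += len(record)
        if written > last_poll + interval:
            last_poll = written
            read_diff(pipe_r, child_diff)   # changes stay in memory, in order
    return written


if __name__ == "__main__":
    r, w = os.pipe()
    os.set_blocking(r, False)
    os.write(w, b"SET a 1")
    out, diff = bytearray(), bytearray()
    rewrite_with_polling([b"x" * 8] * 4, r, out, diff, interval=16)
    print(len(out), bytes(diff))
```

Keeping the two buffers apart is what preserves ordering: the snapshot is written first, and the changes are appended only after it.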

The most important thing is to manage upwards

After the child process achieves its first KR by rewriting the old data, it moves straight on to the next task: reading the changed data from pipe 1.

There are two points to note here. First, the main process may keep receiving new data, so pipe 1 may never run dry and the child process could go on reading forever, which is clearly unacceptable. So there is a limit: it reads for at most 1 s.
Second, if no data arrives, there is no point in waiting either. Everyone's time is precious. If there has been no data for 20 milliseconds, the child process stops reading.
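
The two limits can be expressed as one loop. The Python sketch below is a model under assumptions: it polls the pipe in 1 ms steps, gives up after 20 empty polls in a row, and never runs longer than 1 s in total. The function name and the use of select() are choices made for this example.

```python
import os
import select
import time


def drain_before_handshake(pipe_r, child_diff,
                           max_seconds=1.0, max_idle_polls=20):
    """Read changed data until 1 s has passed or 20 polls in a row were empty."""
    start = time.monotonic()
    idle = 0
    total = 0
    while time.monotonic() - start < max_seconds and idle < max_idle_polls:
        ready, _, _ = select.select([pipe_r], [], [], 0.001)   # wait up to 1 ms
        if not ready:
            idle += 1                    # nothing arrived during this poll
            continue
        idle = 0                         # data arrived, reset the idle counter
        chunk = os.read(pipe_r, 65536)
        if not chunk:
            break                        # the write end was closed
        child_diff += chunk
        total += len(chunk)
    return total


if __name__ == "__main__":
    r, w = os.pipe()
    os.write(w, b"SET a 1")
    diff = bytearray()
    print(drain_before_handshake(r, diff), bytes(diff))
```

The idle counter resets whenever data arrives, so a steady trickle of writes is cut off by the 1 s cap rather than by the idle limit.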
The next step is to tell the main process that 80% of the work is done. This is managing upwards.

The child process writes a "!" to pipe 2 to notify the main process: please stop writing data to pipe 1.

The main process accepts the invitation

After receiving the notification from pipe 2, the main process sets aof_stop_sending_diff to 1. Changed data still goes into aof_rewrite_buf_blocks, but it is no longer written to pipe 1 and all stays in the main process's own memory.
The main process then writes a "!" to pipe 3 to indicate that it has stopped writing data.
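
The handshake is small enough to model in a few lines. In the Python sketch below, both sides live in one process and take turns, which is enough to show the order of the messages. The class and function names are invented, and only the flag name aof_stop_sending_diff comes from the text above.

```python
import os


class MainProcess:
    """Parent side: reacts to the child's request on pipe 2."""

    def __init__(self, stop_r, ack_w):
        self.stop_r = stop_r             # read end of pipe 2
        self.ack_w = ack_w               # write end of pipe 3
        self.aof_stop_sending_diff = 0

    def on_child_message(self):
        if os.read(self.stop_r, 1) == b"!":
            self.aof_stop_sending_diff = 1   # stop feeding pipe 1 from now on
            os.write(self.ack_w, b"!")       # confirm on pipe 3


def child_request_stop(stop_w):
    """Child side, step 1: ask the parent to stop sending changes."""
    os.write(stop_w, b"!")


def child_wait_ack(ack_r):
    """Child side, step 2: wait for the parent's confirmation."""
    return os.read(ack_r, 1) == b"!"


if __name__ == "__main__":
    stop_r, stop_w = os.pipe()           # pipe 2: child -> parent
    ack_r, ack_w = os.pipe()             # pipe 3: parent -> child
    parent = MainProcess(stop_r, ack_w)
    child_request_stop(stop_w)
    parent.on_child_message()
    print(parent.aof_stop_sending_diff, child_wait_ack(ack_r))
```

Once the child has the acknowledgement, it knows that nothing new will be written to pipe 1, so whatever it reads next is the final remainder.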

One step at the door

After receiving this notification, the child process reads whatever data is left in pipe 1, flushes everything to disk, renames the AOF rewrite file to temp-rewriteaof-bg-{pid}.aof, and then exits on its own.
"Writing" here means writing the data held in memory in aof_child_diff to the file. The memory is released when the process exits.
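
The child's last steps can be sketched with ordinary file operations. The Python model below writes the rewritten data plus the buffered diff to a temporary file, forces it to disk, and renames it. The function name is made up, the file names follow the pattern described in the text, and writing the old data in the same call is a simplification, since in the real flow it has already been written earlier.

```python
import os
import tempfile


def finish_child_rewrite(directory, pid, rewritten, child_diff):
    """Append the buffered diff, flush to disk, then rename the file."""
    tmp = os.path.join(directory, "temp-rewriteaof-%d.aof" % pid)
    done = os.path.join(directory, "temp-rewriteaof-bg-%d.aof" % pid)
    with open(tmp, "wb") as f:
        f.write(rewritten)           # old data (simplified: written here)
        f.write(child_diff)          # changes collected from pipe 1
        f.flush()                    # push Python's buffer to the OS
        os.fsync(f.fileno())         # ask the OS to put it on disk
    os.rename(tmp, done)             # the name the main process will look for
    return done


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = finish_child_rewrite(d, os.getpid(), b"SET a 0\n", b"SET a 1\n")
        print(os.path.basename(path))
```

Renaming only after the data is on disk means the main process never sees a half-written file under the final name.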

Main process: I'll be the backstop!

At this point the child process has finished its work and exited. When the main process discovers in serverCron that the child process no longer exists, it calls the backgroundRewriteDoneHandler method to wrap things up:
Write the data still remaining in aof_rewrite_buf_blocks to the AOF rewrite file (temp-rewriteaof-bg-{pid}.aof).
Set the rewrite file as the AOF file. The rewritten AOF officially takes over, and the old file is retired.
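
Those two wrap-up steps map onto an append followed by a rename. The Python sketch below models them under assumptions: the function name is invented, the leftover blocks are passed in as a plain list of bytes, and appendonly.aof is used as the name of the live AOF file only as an example.

```python
import os
import tempfile


def background_rewrite_done(directory, child_pid, rewrite_blocks,
                            aof_name="appendonly.aof"):
    """Append the leftover changes, then swap the new file in."""
    tmp = os.path.join(directory, "temp-rewriteaof-bg-%d.aof" % child_pid)
    live = os.path.join(directory, aof_name)
    with open(tmp, "ab") as f:           # append after what the child wrote
        for block in rewrite_blocks:
            f.write(block)
    rewrite_blocks.clear()               # the buffered copies are released
    os.replace(tmp, live)                # the old file is retired in one step
    return live


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "temp-rewriteaof-bg-7.aof"), "wb") as f:
            f.write(b"snapshot+diff;")
        live = background_rewrite_done(d, 7, [b"SET a 1;"])
        with open(live, "rb") as f:
            print(f.read())
```

Because the swap is a rename, readers of the AOF path see either the complete old file or the complete new one.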

The road ends with white clouds, and spring and green streams grow

So far, a complete AOF Rewrite is over.
Looking at the whole process, we can see that the core focus is how to handle the data that changes during AOF Rewrite.
To keep this data correct, Redis 5.0 uses two in-memory structures (aof_rewrite_buf_blocks in the main process and aof_child_diff in the child process), two streams of disk IO (the main process writes the old AOF file, and the child process writes the new AOF rewrite file), three communication pipes, and twice the CPU overhead.
Interested readers can look into the Multi Part AOF implementation in Redis 7.0, which has not been officially released yet. That implementation addresses the redundancy described above.
OK, that's it for today's sharing. Thank you all.
