Illustrated Redis | RDB snapshots, which record the actual data

Abstract: A snapshot records a certain moment. For example, when we photograph a landscape, the scene and its details at that moment are captured in the photo. An RDB snapshot likewise records the in-memory data at a certain moment, and what it records is the actual data.

This article is shared from the HUAWEI CLOUD community: "Illustrated Redis | Not much to say, this is the RDB snapshot", original author: Xiaolin coding.

Although Redis is an in-memory database, it provides two technologies for data persistence: the AOF log and the RDB snapshot.

Both technologies record information in a file on disk, but what they record differs:

The content of the AOF file is the operation command;
The content of the RDB file is binary data.

Today I will mainly talk about the RDB snapshot.

A snapshot, as the name suggests, records a certain moment. For example, when we photograph a landscape, the scene and its details at that moment are captured in the photo.

An RDB snapshot therefore records the in-memory data at a certain moment, and what it records is the actual data, whereas the AOF file records a log of command operations rather than the data itself.

As a result, restoring data from RDB is more efficient than restoring from AOF: the RDB file can be read directly into memory, with no need to re-execute the logged commands one by one as AOF does.
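
To make the difference concrete, here is a minimal sketch of the two recovery styles. It is a toy model, not Redis code: JSON stands in for the binary RDB format, and the command set (SET and DEL only) and all function names are assumptions made for illustration.

```python
import json

def apply_command(state, command):
    """Apply one write command, given as a list such as ["SET", "k", "v"]."""
    op = command[0]
    if op == "SET":
        state[command[1]] = command[2]
    elif op == "DEL":
        state.pop(command[1], None)
    else:
        raise ValueError(f"unsupported command: {op}")

def restore_from_aof(commands):
    """AOF-style recovery: replay every logged command in order."""
    state = {}
    for command in commands:
        apply_command(state, command)
    return state

def restore_from_snapshot(snapshot_bytes):
    """RDB-style recovery: decode the stored data directly, nothing to replay."""
    return json.loads(snapshot_bytes)

# Hypothetical workload: 1000 overwrites of one key, then a key that is
# created and deleted again.
aof_log = [["SET", "counter", str(i)] for i in range(1000)]
aof_log.append(["SET", "tmp", "x"])
aof_log.append(["DEL", "tmp"])

live_state = restore_from_aof(aof_log)
snapshot = json.dumps(live_state).encode()

print(len(aof_log), "commands replayed for AOF recovery")
print("snapshot recovery yields:", restore_from_snapshot(snapshot))
```

Both paths end in the same state, but the command log has to be executed entry by entry, while the snapshot holds only the final data.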

Next, let's talk about RDB snapshots in detail.

How to use snapshots?

To be familiar with something, it is better to see how to use it first.

Redis provides two commands to generate RDB files, save and bgsave. The difference between them lies in whether they are executed in the "main thread":

The save command generates the RDB file in the main thread. Because that is the same thread that executes commands, writing the RDB file blocks the main thread if it takes too long;
The bgsave command creates a child process to generate the RDB file, which avoids blocking the main thread.
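
The contrast between the two commands can be sketched with a toy model. This is not how Redis is implemented internally; it only mirrors the idea that save does the write in the calling process while bgsave forks a child to do it. The function names, the JSON file format, and the temp-file-then-rename step are assumptions of this sketch, and it requires a POSIX system because it uses os.fork().

```python
import json
import os
import tempfile

def write_snapshot(data, path):
    """Serialize the whole dataset to a temp file, then rename it into place."""
    tmp_path = f"{path}.tmp"
    with open(tmp_path, "w") as f:
        json.dump(data, f)
    os.replace(tmp_path, path)

def save(data, path):
    """save-like: the caller does the write itself and is blocked until it finishes."""
    write_snapshot(data, path)

def bgsave(data, path):
    """bgsave-like: fork a child to do the write; the caller returns immediately."""
    pid = os.fork()
    if pid == 0:
        # Child process: write the snapshot, then exit without running
        # the parent's cleanup handlers.
        exit_code = 1
        try:
            write_snapshot(data, path)
            exit_code = 0
        finally:
            os._exit(exit_code)
    return pid  # Parent: free to keep serving commands.

data = {"name": "redis", "visits": "42"}
workdir = tempfile.mkdtemp()

save_path = os.path.join(workdir, "save.json")
save(data, save_path)

bgsave_path = os.path.join(workdir, "bgsave.json")
child_pid = bgsave(data, bgsave_path)
# Only this demo waits for the child; a server would carry on with other work.
_, status = os.waitpid(child_pid, 0)
print("child exit code:", os.WEXITSTATUS(status))
```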

The RDB file is loaded automatically when the server starts; Redis does not provide a dedicated command for loading RDB files.

Redis can also run the bgsave command automatically at intervals through options in the configuration file. The following configuration is provided by default:

save 900 1
save 300 10
save 60 10000

Although the option is named save, it actually runs the bgsave command, that is, a child process is created to generate the RDB snapshot file.

As soon as any one of the above conditions is met, bgsave is executed. Their meanings are:

Within 900 seconds, the database has been modified at least once;
Within 300 seconds, the database has been modified at least 10 times;
Within 60 seconds, at least 10,000 changes were made to the database.
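
The "any one condition" rule can be written down as a small check. This is a simplified reading of the rule described above, not Redis source code: it assumes each save point is evaluated as "at least this many seconds since the last save, and at least this many changes since then". The function and constant names are hypothetical.

```python
def should_snapshot(save_points, seconds_since_last_save, changes_since_last_save):
    """Return True if any (seconds, changes) save point is satisfied."""
    return any(
        seconds_since_last_save >= seconds and changes_since_last_save >= changes
        for seconds, changes in save_points
    )

# The three save points from the configuration above.
DEFAULT_SAVE_POINTS = [(900, 1), (300, 10), (60, 10000)]

print(should_snapshot(DEFAULT_SAVE_POINTS, 120, 50))     # False: no rule matches yet
print(should_snapshot(DEFAULT_SAVE_POINTS, 300, 50))     # True: "300 10" is met
print(should_snapshot(DEFAULT_SAVE_POINTS, 61, 10000))   # True: "60 10000" is met
print(should_snapshot(DEFAULT_SAVE_POINTS, 1000, 0))     # False: nothing changed
```
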

One thing to mention here is that a Redis snapshot is a full snapshot, which means that every time a snapshot is taken, all of the data in memory is written to disk.

Taking a snapshot is therefore a relatively heavy operation. If snapshots are taken too often, Redis performance may suffer; if they are taken too rarely, more data is lost when the server fails.

Typically a snapshot is saved no more often than every 5 minutes. With a 5-minute interval, if Redis goes down, up to 5 minutes of data may be lost.

This is the disadvantage of RDB snapshots: when the server fails, more data is lost than with AOF persistence. Because RDB snapshots are full snapshots, they cannot be taken too frequently without hurting Redis performance. The AOF log, by contrast, can persist operation commands at a granularity of seconds, so relatively little data is lost.

Can the data be modified when the snapshot is executed?

The question is this: while bgsave runs, the main thread can keep working because building the RDB file has been handed over to the child process. Can the main thread modify data during this time?

If the data cannot be modified, then the performance will be reduced a lot. If the data can be modified, how can it be done?

The conclusion first: while bgsave is running, Redis can still process commands, which means data can still be modified.

How is this done? The key technology is Copy-On-Write (COW).

When the bgsave command is executed, the child process is created through fork(). At this point the child process and the parent process share the same in-memory data: creating the child copies the parent's page table, but both page tables still point to the same physical memory.

Physical memory is copied only when the data in it is modified.

The purpose is to reduce the cost of creating the child process and so speed up its creation. After all, creating the child process blocks the main thread.

Therefore, once the bgsave child process has been created, because it shares all of the parent process's memory data, it can read the main thread's in-memory data directly and write it to the RDB file.

As long as the main thread performs only read operations on the shared memory data, the main thread and the bgsave child process do not affect each other.

However, if the main thread wants to modify a piece of the shared data (say, key-value pair A), copy-on-write occurs: the physical memory holding that data is copied (producing key-value pair A'), and the main thread then makes its modification on this copy (key-value pair A'). Meanwhile, the bgsave child process can continue to write the original data (key-value pair A) to the RDB file.
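
The effect on key-value pair A can be observed with a small fork() experiment. This is a sketch of the visible behavior only: the parent plays the main thread, the child plays the bgsave process, and a pipe stands in for the RDB file. It needs a POSIX system. It shows that the child keeps seeing the data as it was at fork time. It does not measure which physical pages are copied, which in a Python process also depends on interpreter internals.

```python
import json
import os
import time

data = {"A": "original"}

read_fd, write_fd = os.pipe()
pid = os.fork()

if pid == 0:
    # Child: plays the bgsave process. It waits briefly so the parent
    # modifies the key first, then serializes what it can still see.
    os.close(read_fd)
    time.sleep(0.2)
    os.write(write_fd, json.dumps(data).encode())
    os.close(write_fd)
    os._exit(0)

# Parent: plays the main thread and keeps accepting writes.
os.close(write_fd)
data["A"] = "modified"

with os.fdopen(read_fd, "rb") as pipe:
    snapshot = json.loads(pipe.read())
os.waitpid(pid, 0)

print("snapshot written by child:", snapshot)  # {'A': 'original'}
print("live data in parent:", data)            # {'A': 'modified'}
```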

That's it. Redis uses bgsave to take a snapshot of all the data in the current memory. This operation is completed by the bgsave child process in the background. The main thread is not blocked during execution, which allows the main thread to modify the data at the same time.

Careful readers will have noticed that if the main thread modifies shared data during a bgsave snapshot and copy-on-write occurs, the RDB snapshot saves the original memory data. The data just modified by the main thread cannot be written to the RDB file this time and has to wait for the next bgsave snapshot.

So if the main thread modifies memory data while a bgsave snapshot is in progress, the RDB snapshot cannot include that modification, whether or not the data was shared: by then the main thread's memory data and the child process's memory data have diverged, and the child process can only write the original memory data to the RDB file.

If the system crashes just after the RDB snapshot file is created, Redis will lose the data modified by the main thread during the snapshot.

In addition, copy-on-write has an extreme case.

During RDB persistence, right after fork the main process and the child process share the same physical memory. If the main process handles write operations along the way and modifies the shared memory, the physical memory of the modified data is copied.

In the extreme case where all of the shared memory is modified, memory usage at that point is twice the original.

Therefore, for scenarios with many write operations, we must pay attention to the changes in the memory during the snapshot process to prevent the memory from being full.
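
A back-of-the-envelope estimate shows why write-heavy workloads matter here. The sketch below assumes copying happens per memory page and that a page is 4096 bytes (a common default, but an assumption here). The function name and the idea of describing writes as byte offsets are purely illustrative.

```python
PAGE_SIZE = 4096  # bytes; a common default, assumed here

def extra_memory_during_snapshot(dataset_bytes, modified_offsets, page_size=PAGE_SIZE):
    """Estimate bytes copied by copy-on-write while a snapshot is running.

    modified_offsets: byte offsets in the dataset that the parent wrote to.
    Each distinct page touched is copied once, however small the write.
    """
    total_pages = -(-dataset_bytes // page_size)  # ceiling division
    touched = {
        offset // page_size
        for offset in modified_offsets
        if 0 <= offset < dataset_bytes
    }
    return min(len(touched), total_pages) * page_size

dataset = 1024 * PAGE_SIZE  # hypothetical 4 MiB dataset

# Two writes that land on the same page copy that page only once.
print(extra_memory_during_snapshot(dataset, [0, 100]))      # 4096

# One write on every page: the copy is as large as the dataset itself,
# so total usage is twice the original.
every_page = range(0, dataset, PAGE_SIZE)
print(extra_memory_during_snapshot(dataset, every_page))    # 4194304
```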

RDB and AOF combined

Although RDB is faster than AOF in data recovery, the frequency of snapshots is not easy to grasp:

If the frequency is too low, once the server goes down between the two snapshots, more data may be lost;
If the frequency is too high, frequent writes to disk and creation of child processes will bring additional performance overhead.

Is there any method that not only has the advantage of fast RDB recovery, but also has the advantage of AOF with less data loss?

Of course there is: use RDB and AOF together. This approach was introduced in Redis 4.0 and is called mixed use of AOF logs and memory snapshots, also known as hybrid persistence.

If you want to enable the hybrid persistence function, you can set the following configuration item to yes in the Redis configuration file:

aof-use-rdb-preamble yes

Hybrid persistence works during the AOF log rewriting process.

With hybrid persistence turned on, when the AOF log is rewritten, the forked rewrite child process first writes the memory data it shares with the main thread to the AOF file in RDB format. Operation commands processed by the main thread in the meantime are recorded in the rewrite buffer, and these incremental commands are then written to the AOF file in AOF format. Once writing is complete, the main process is notified to replace the old AOF file with the new one, which contains both RDB-format and AOF-format content.

In other words, with hybrid persistence, the first half of the AOF file is the full data in RDB format and the second half is incremental data in AOF format.

The advantage is that when Redis restarts and loads data, the first half is RDB content, so it loads very quickly.

After the RDB content is loaded, the second half, the AOF content, is loaded. It consists of the operation commands processed by the main thread while the background child process was rewriting the AOF, so less data is lost.
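
The two-part layout and the loading order can be sketched as a toy file format. The real file is a binary RDB preamble followed by logged commands. Here, as an assumption for illustration only, the first line is a JSON dump of the full data and every later line is one JSON-encoded command. All function names are hypothetical.

```python
import json

def apply_command(state, command):
    """Apply one write command, given as a list such as ["SET", "k", "v"]."""
    op = command[0]
    if op == "SET":
        state[command[1]] = command[2]
    elif op == "DEL":
        state.pop(command[1], None)
    else:
        raise ValueError(f"unsupported command: {op}")

def build_hybrid_file(snapshot_state, incremental_commands):
    """First line: full data (the RDB part). Later lines: commands (the AOF part)."""
    lines = [json.dumps(snapshot_state)]
    lines.extend(json.dumps(command) for command in incremental_commands)
    return "\n".join(lines)

def load_hybrid_file(text):
    """Load the full data directly, then replay only the incremental commands."""
    preamble, *tail = text.split("\n")
    state = json.loads(preamble)
    for line in tail:
        apply_command(state, json.loads(line))
    return state

# Data as seen by the rewrite child process at fork time.
at_fork = {"a": "1", "b": "2"}
# Commands the main thread processed while the rewrite was running.
during_rewrite = [["SET", "a", "10"], ["DEL", "b"], ["SET", "c", "3"]]

hybrid = build_hybrid_file(at_fork, during_rewrite)
print(load_hybrid_file(hybrid))  # {'a': '10', 'c': '3'}
```

Only the three commands issued during the rewrite are replayed; everything written before the fork comes from the snapshot part.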
