PolarDB-X 2.0 provides a global Binlog capability to address the data-island problem, giving the downstream ecosystem an incremental log consumption experience identical to that of MySQL Binlog. For data corruption, it provides data recovery capabilities at different granularities, such as instance level, table level, SQL level, and row level, including consistent backup and recovery, the table recycle bin, SQL flashback, and Flashback Query.

Conference portal: https://yqh.aliyun.com/live/polardbx2021

Background

As developers, we are all familiar with back-end systems. A back-end system can be abstracted into two components: the business system, which handles the business logic and, in a modern architecture, is usually designed as horizontally scalable stateless nodes; and the database system, which stores the state of the system, including the core business data.
From the database's perspective, data flows in from two sources. One is real-time writes from the business system, which form the main part of the core data; the other is one-time or periodic imports from upstream data systems. Because the core data is generated here for the first time, this database is also called the SSOT (Single Source of Truth).

image

The SSOT is the most important data asset in the back-end system, so two problems must be handled properly. The first is that, as the most important asset, this data usually needs to be synchronized to other systems in real time for further processing, such as BI analysis; without such a real-time synchronization mechanism, the data becomes a data island. The second is that this data may be damaged for various reasons, such as hardware failures or software bugs, improper operations, or incorrect SQL statements. Mechanisms that ensure the safety of this data are therefore very necessary.

Global Binlog

PolarDB-X is a distributed database product that is highly compatible with the MySQL ecosystem, so let's first look at how MySQL solves the data-island problem.

image

From the DB-Engines rankings, it can be seen that the popularity of MySQL is higher than that of all the other open-source databases combined, which means the MySQL ecosystem is very prosperous. For example, downstream systems of MySQL include Kafka, MySQL standby nodes, the Canal tool, Pulsar, and others. MySQL achieves real-time incremental data synchronization with downstream systems through the Binlog mechanism. Binlog can be regarded as a message queue in which the detailed incremental changes in MySQL are stored in order; by consuming this queue, downstream systems and tools synchronize data with MySQL in real time. This mechanism is also called CDC (Change Data Capture).
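To make the queue analogy concrete, here is a minimal Python sketch (not PolarDB-X or MySQL code) that models Binlog as an ordered queue of change events and a downstream CDC consumer that applies them to a replica. The event shape and table names are illustrative assumptions; real Binlog events are binary-encoded.

```python
from collections import deque

# A hypothetical change event: (operation, table, primary key, row values).
binlog = deque([
    ("INSERT", "users", 1, {"id": 1, "name": "Ralph"}),
    ("UPDATE", "users", 1, {"id": 1, "name": "Ralph", "title": "CTO"}),
    ("DELETE", "users", 1, None),
])

replica = {}  # table -> {pk: row}, the downstream copy being kept in sync

def apply_event(event):
    """Apply one change event to the replica, mimicking a CDC consumer."""
    op, table, pk, row = event
    rows = replica.setdefault(table, {})
    if op in ("INSERT", "UPDATE"):
        rows[pk] = row
    elif op == "DELETE":
        rows.pop(pk, None)

# Consuming the queue strictly in order keeps the replica consistent
# with the source database.
while binlog:
    apply_event(binlog.popleft())

print(replica)  # → {'users': {}}
```

Because the queue preserves commit order, the replica converges to the same state as the source no matter when the consumer starts, as long as it replays from the beginning.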

image

Providing CDC capabilities in a distributed database is more complex than in a standalone one. A distributed database usually contains multiple nodes, and these nodes generate multiple incremental log queues. If the downstream consumes these queues directly, several problems arise.

  1. With multiple queues, how does the downstream determine the order of change events across them?
  2. A distributed transaction's changes may span multiple queues. To guarantee transaction integrity during consumption, how are the change events of the same transaction found and merged?
  3. How does the downstream correctly handle system scale-out and scale-in (that is, the increase or decrease in the number of queues)?
  4. A DDL statement involves multiple queues. How does the downstream accurately identify each queue's position before and after the schema change and coordinate its consumption progress?

image

Facing these problems, the CDC capability of a distributed database involves trade-offs among implementation difficulty, supported features, and ease of use. Generally speaking, it is easier to implement a solution that exposes multiple queues to the downstream, does not guarantee transaction integrity, provides only eventual consistency, and emits incremental logs in a custom format. However, such a solution places higher demands on downstream consumers, such as developing dedicated consumer code or tools and coordinating multiple queues. A more user-friendly approach is to provide a CDC capability fully consistent with MySQL Binlog, so that the downstream can consume incremental changes from the distributed database as transparently as it consumes MySQL Binlog, greatly reducing the cost of building data synchronization links. This is the solution adopted by PolarDB-X 2.0, which we call global Binlog.

image

PolarDB-X 2.0 uses a horizontally scalable Share-Nothing architecture. The basic unit of the system is the node, and each node can be divided into two parts: a compute node (CN) and a data node (DN). As shown in the figure above, to provide the global Binlog capability, PolarDB-X 2.0 adds a CDC component on this basis; CDC itself is an elastic cluster.

image

The global Binlog generation process can be divided into three stages:

  1. The CDC component pulls the incremental log, that is, the physical Binlog, from each DN, and then performs single-queue sorting, internal event filtering, DDL-related shaping, and other operations in order to provide a "clean" incremental event queue for the next stage. If the system scales out or in, the CDC component automatically senses the change and handles it at this stage.
  2. The CDC component merges all the "clean" incremental event queues. During the merge, events belonging to the same distributed transaction are combined and globally sorted by transaction timestamp, yielding a globally ordered stream that preserves transaction integrity; the position of DDL events in the queue is also handled at this stage. The CDC component then writes the stream into files compatible with the MySQL Binlog format, that is, the global Binlog files.
  3. After the CN component receives a downstream subscription request for the global Binlog, it sends the global Binlog to downstream consumers in accordance with the MySQL dump protocol.
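The merge stage above can be sketched in a few lines of Python. This is a simplified model, not PolarDB-X code: it assumes each DN queue is already sorted by commit timestamp and that each event carries a hypothetical `(commit_ts, txn_id, change)` tuple, so a k-way merge plus grouping by transaction id reassembles each distributed transaction.

```python
import heapq
from itertools import groupby

# Hypothetical model: each DN emits events sorted by commit timestamp.
# A distributed transaction (same txn_id) may span several DN queues.
dn_queues = [
    [(100, "tx1", "dn0: update a"), (130, "tx3", "dn0: insert c")],
    [(100, "tx1", "dn1: update b"), (120, "tx2", "dn1: delete d")],
]

# A k-way merge by commit timestamp yields one globally ordered stream.
merged = list(heapq.merge(*dn_queues, key=lambda e: e[0]))

# After the merge, events of the same transaction are adjacent, so the
# pieces of each distributed transaction can be stitched back together.
global_binlog = [
    (key[0], key[1], [e[2] for e in events])
    for key, events in groupby(merged, key=lambda e: (e[0], e[1]))
]

for ts, txn, changes in global_binlog:
    print(ts, txn, changes)
# → 100 tx1 ['dn0: update a', 'dn1: update b']
#   120 tx2 ['dn1: delete d']
#   130 tx3 ['dn0: insert c']
```

`heapq.merge` is stable and streams its inputs, which is why a sorted-per-queue precondition (stage 1's "clean" queues) makes global ordering cheap even with many DNs.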

Through these three stages, PolarDB-X 2.0 realizes a global Binlog capability that is fully compatible with MySQL Binlog. If you are interested in the detailed implementation, please follow our column, the "PolarDB-X Knowledge Column".

Backup and restore

For data corruption, PolarDB-X 2.0 provides data recovery capabilities at different granularities: instance-level consistent backup and recovery, a table-level table recycle bin, SQL-level SQL flashback, and row-level Flashback Query. The characteristics and usage scenarios of these four capabilities are introduced below.

Consistent backup and recovery

image

Let's first look at consistent backup and recovery. This capability restores historical data at the instance level to any point in time, accurate to the second.
In a standalone database, the full data and the incremental log can be considered to reside on one machine, so consistent backup and recovery only requires backing up both. In a distributed database, the full data and incremental logs are spread across multiple machines, which adds complexity to the implementation.
PolarDB-X 2.0 achieves consistent backup and recovery by combining full backups of all DNs with the global Binlog.
Take the figure above as an example. Suppose a PolarDB-X 2.0 instance is backed up at midnight every Monday, Tuesday, and Friday, and one day we need to restore the data to 14:25:26 on Sunday. The system selects the full backup set closest to the recovery point, that is, the one from midnight on Friday, and replays the global Binlog from midnight on Friday until 14:25:26 on Sunday, producing the desired snapshot.
The consistent backup and recovery capability of PolarDB-X 2.0 does not lock the database during backup. Because it relies on the global Binlog, the recoverable interval equals the retention interval of the global Binlog. This capability currently has several limitations: the instance cannot be scaled out or in during backup, and only homogeneous recovery is supported.
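The backup-selection logic described above can be sketched as follows. This is an illustrative Python model with assumed dates matching the example (Friday backup, Sunday recovery point), not the actual PolarDB-X restore code.

```python
from datetime import datetime

# Hypothetical schedule: full backup sets taken at midnight on
# Monday, Tuesday, and Friday, plus a continuous global Binlog stream.
full_backups = [
    datetime(2021, 5, 17, 0, 0),  # Monday
    datetime(2021, 5, 18, 0, 0),  # Tuesday
    datetime(2021, 5, 21, 0, 0),  # Friday
]

def plan_restore(target):
    """Pick the newest full backup at or before the target time, then
    replay the global Binlog from that backup up to the target."""
    candidates = [b for b in full_backups if b <= target]
    if not candidates:
        raise ValueError("target predates the oldest full backup")
    base = max(candidates)
    return base, (base, target)  # (backup set, Binlog replay interval)

# Restore to 14:25:26 on Sunday: Friday's backup + ~2.6 days of Binlog.
base, replay = plan_restore(datetime(2021, 5, 23, 14, 25, 26))
print(base)  # → 2021-05-21 00:00:00
```

Choosing the backup closest to the recovery point minimizes the amount of Binlog that must be replayed, which is what makes second-level point-in-time recovery practical.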

Table Recycle Bin

image

The second data recovery capability of PolarDB-X 2.0 is the table recycle bin. As the name implies, a dropped table is temporarily moved into a recycle bin; if the table turns out to be needed again within two hours, it can be retrieved from there.
The table recycle bin provides complete management functions, such as viewing all tables in the recycle bin, permanently deleting a table, and restoring a table. The recycle bin currently only keeps tables dropped within the last two hours, and it does not support retrieving tables deleted via TRUNCATE TABLE.
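The behavior described above can be modeled with a small Python class. This is a toy sketch of the semantics (drop moves the table aside, restore works only within the retention window), with hypothetical method names; it is not the PolarDB-X implementation.

```python
import time

RETENTION_SECONDS = 2 * 60 * 60  # tables are kept for two hours

class TableRecycleBin:
    """Toy model: DROP moves a table here, and it can be restored
    until the two-hour retention window expires."""

    def __init__(self):
        self.tables = {}    # live tables: name -> rows
        self.recycled = {}  # dropped tables: name -> (drop_time, rows)

    def drop(self, name, now=None):
        now = time.time() if now is None else now
        self.recycled[name] = (now, self.tables.pop(name))

    def restore(self, name, now=None):
        now = time.time() if now is None else now
        dropped_at, rows = self.recycled[name]
        if now - dropped_at > RETENTION_SECONDS:
            raise KeyError(f"{name} has expired from the recycle bin")
        del self.recycled[name]
        self.tables[name] = rows

    def show_recycle_bin(self):
        return sorted(self.recycled)

bin_ = TableRecycleBin()
bin_.tables["orders"] = [("order-1",)]
bin_.drop("orders", now=0)
print(bin_.show_recycle_bin())    # → ['orders']
bin_.restore("orders", now=3600)  # within two hours: succeeds
print("orders" in bin_.tables)    # → True
```

Note that TRUNCATE TABLE bypasses this path entirely, which is why truncated tables cannot be retrieved from the recycle bin.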

SQL flashback (coming soon)

image

The third data recovery capability of PolarDB-X 2.0 is SQL flashback. It can precisely restore the data affected by a single erroneous SQL statement. This capability was also provided in PolarDB-X 1.0 and, since its launch, has helped many users who accidentally deleted data get it back; it is a widely recognized data recovery capability. Let's walk through an example of how it is used.
As shown in the figure above, at T1 we wanted to delete the record whose position is "Developer" and whose name is "Ralph", but we forgot to add "name='Ralph'" to the WHERE condition, so the record whose name is "Mary" was deleted as well. The two delete events and the corresponding SQL ID are recorded in the global Binlog.
At T2, we discovered the mistaken deletion and found the offending SQL and its ID through the audit function of PolarDB-X.
At T3, we generated the recovery SQL via the SQL ID and the SQL flashback capability. The principle of SQL flashback is: given the SQL ID, search the global Binlog for all change events corresponding to that SQL (here, the two delete events) and generate reverse recovery SQL for each one.
At T4, after executing the recovery SQL, we got back the two mistakenly deleted rows.
SQL flashback provides precise data recovery for SQL misoperation scenarios. As can be seen, its recoverable time interval depends on the retention interval of the global Binlog.
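The reverse-SQL generation at T3 can be sketched as below. This is an illustrative Python fragment assuming each DELETE event in the Binlog carries the full image of the deleted row (as row-based Binlog does); the table and column names come from the example, and a real implementation would also need proper value escaping and handling of INSERT/UPDATE events.

```python
# Hypothetical row images recorded in the global Binlog for one SQL ID:
# each DELETE event carries the full image of the deleted row.
delete_events = [
    {"table": "employee",
     "row": {"id": 1, "name": "Ralph", "position": "Developer"}},
    {"table": "employee",
     "row": {"id": 2, "name": "Mary", "position": "Developer"}},
]

def reverse_sql(event):
    """Turn a DELETE event into the INSERT that undoes it."""
    cols = ", ".join(event["row"])
    vals = ", ".join(repr(v) for v in event["row"].values())
    return f"INSERT INTO {event['table']} ({cols}) VALUES ({vals});"

recovery_sql = [reverse_sql(e) for e in delete_events]
for stmt in recovery_sql:
    print(stmt)
# → INSERT INTO employee (id, name, position) VALUES (1, 'Ralph', 'Developer');
#   INSERT INTO employee (id, name, position) VALUES (2, 'Mary', 'Developer');
```

The same idea generalizes: a mistaken INSERT reverses into a DELETE, and a mistaken UPDATE reverses into an UPDATE that restores the before-image.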

Flashback Query (coming soon)

image

The fourth data recovery capability of PolarDB-X 2.0 is Flashback Query. It provides precise row-level data recovery within a certain time range. Again we take an SQL misoperation scenario as an example.
As shown in the figure above, at T1 we wanted to update the record with position "Developer" and name "Ralph" to "CTO", but we forgot to add "name='Ralph'" to the WHERE condition, so all records with position "Developer" were updated to "CTO". These changes are recorded in the undo log of version Vn+1 (the undo log is a basic data structure in the database that records each row's changes in detail; it can be loosely compared to a Git commit log).
At T2, we immediately discovered the erroneous update and determined the time of the misoperation and the range of affected data.
At T3, we used the Flashback Query capability to directly read the correct values of the two affected rows as of T1.
At T4, we corrected the data based on the correct values returned by Flashback Query.
As can be seen, Flashback Query depends on the retention time of the undo log. Compared with SQL flashback, it provides faster, precise row-level recovery, but the undo log is usually retained for a shorter time than the global Binlog, so its recoverable interval is shorter than that of SQL flashback.
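The "read a row as of T1" step can be modeled with a simple multi-version store. This Python sketch is an assumption-laden analogy for what the undo log lets the engine reconstruct: each row keeps (commit_ts, value) versions, and a flashback read returns the newest version at or before the requested timestamp. The timestamps and rows are invented for the example.

```python
import bisect

# Hypothetical multi-version store: each row keeps (commit_ts, value)
# pairs, much like what the undo log lets the engine reconstruct.
versions = {
    ("employee", 1): [(10, {"name": "Ralph", "position": "Developer"}),
                      (50, {"name": "Ralph", "position": "CTO"})],
    ("employee", 2): [(10, {"name": "Mary", "position": "Developer"}),
                      (50, {"name": "Mary", "position": "CTO"})],
}

def flashback_query(table, pk, as_of_ts):
    """Return the latest version of a row committed at or before
    as_of_ts, i.e. what the row looked like at that point in time."""
    history = versions[(table, pk)]
    timestamps = [ts for ts, _ in history]
    idx = bisect.bisect_right(timestamps, as_of_ts) - 1
    if idx < 0:
        return None  # the row did not exist yet at as_of_ts
    return history[idx][1]

# Read Mary's row as of T1 (ts=20), before the bad UPDATE at ts=50.
print(flashback_query("employee", 2, 20))
# → {'name': 'Mary', 'position': 'Developer'}
```

Because old versions are only kept for the undo-log retention window, a flashback read further back than that window fails, which is exactly the trade-off against SQL flashback described above.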

Summary

PolarDB-X 2.0 provides a global Binlog capability for the data-island problem, giving the downstream ecosystem an incremental log consumption experience identical to that of MySQL Binlog. For data corruption, it provides data recovery capabilities at different granularities, such as instance level, table level, SQL level, and row level, including consistent backup and recovery, the table recycle bin, SQL flashback, and Flashback Query. PolarDB-X 2.0 is continuing to build more product capabilities, so stay tuned!


Author: 阿里云开发者 (Alibaba Cloud Developer), Alibaba's official technology account, covering technical innovation, hands-on experience, and engineers' growth stories from across the Alibaba ecosystem.