In today's digital age, we all agree that data creates value. To maximize that value, we keep building data migration pipelines: from homogeneous to heterogeneous systems, from relational to non-relational stores, from on-premises to the cloud, from the data warehouse to the data lake, mining the value of data in every scenario. In this criss-crossing data network, logical replication plays an extremely important role.
Let's pull our perspective back from this complex network to one of its endpoints, starting with PostgreSQL and deciphering the principles of its logical replication.
1 Concepts and Principles

Logical replication is a method of replicating data and data changes based on a replica identity (usually a primary key). Unlike physical replication, which describes operations on pages, logical replication describes transactions and data tuples.
Figure - WAL data flow example
As shown in the figure, the physical replication data flow describes block-level operations on tablespace/database/filenode files, while the logical replication stream describes tuples.
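Which columns identify a tuple in that stream is controlled by the table's replica identity. A minimal sketch, using a hypothetical table named orders:

```sql
-- Default: the primary key identifies updated/deleted rows in the change stream
CREATE TABLE orders (id bigint PRIMARY KEY, amount numeric);

-- If a table has no suitable key, the whole old row can be logged instead
ALTER TABLE orders REPLICA IDENTITY FULL;
```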
Next, let's look at several concepts in logical replication:
Replication slot
A replication slot is a set of information that records the state of replication. Since WAL (Write-Ahead Log) files are removed once the data they describe has been flushed to disk, replication slots prevent premature cleanup of the WAL still needed for logical decoding. In logical replication, each slot streams a series of changes from a single database; creating a logical replication slot requires specifying the output plugin it uses, and creating the slot also provides a snapshot.
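As a minimal sketch (the slot and plugin names are only examples), a logical replication slot can be created and inspected at the SQL level:

```sql
-- Create a logical slot bound to the test_decoding output plugin
SELECT * FROM pg_create_logical_replication_slot('my_slot', 'test_decoding');

-- Inspect slot state, including restart_lsn (the oldest WAL the slot still needs)
SELECT slot_name, plugin, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots;
```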
Output plugin
The output plugin is responsible for decoding the WAL into a readable format. Common plugins include test_decoding (mostly used for testing), pgoutput (the default used by built-in publication/subscription), and wal2json (which outputs JSON). PostgreSQL defines a series of callback functions, so besides using the plugins above, we can write our own output plugin by implementing those callbacks.
Figure - Replication slot data flow
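To see the readable format an output plugin produces, we can pull changes from the slot; a sketch, reusing the hypothetical my_slot created with test_decoding and the orders table above:

```sql
-- Generate a change, then decode it through the slot's output plugin
INSERT INTO orders VALUES (1, 9.90);

-- peek keeps the changes in the slot; the get variant consumes them
SELECT lsn, xid, data
FROM pg_logical_slot_peek_changes('my_slot', NULL, NULL);
-- data looks roughly like:
--   table public.orders: INSERT: id[bigint]:1 amount[numeric]:9.90
```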
Replication protocol and messages
Through the replication protocol, we can obtain the WAL data stream from the source. For example, we can establish a replication connection with the psql tool:
psql "dbname=postgres replication=database"
Then start streaming WAL:
START_REPLICATION [ SLOT slot_name ] [ PHYSICAL ] XXX/XXX [ TIMELINE tli ]
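For logical replication the command names the slot and uses the LOGICAL keyword; a sketch, reusing the hypothetical my_slot:

```sql
-- Inside the replication connection opened above;
-- 0/0 lets the server start at the slot's confirmed position
START_REPLICATION SLOT my_slot LOGICAL 0/0;
```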
Whether we use physical or logical replication, and whether streaming replication is built with PostgreSQL's publication/subscription or with pg_basebackup, both sides interact through the replication protocol using defined messages (the data stream content differs between physical and logical replication).
Figure - WAL data flow message types
Figure - XLogData message in logical replication
Workflow
Now that we understand the concepts, let's look at the entire decoding workflow. Since the records of a transaction are not necessarily contiguous in the WAL files, they first pass through the reorder buffer, where they are organized into messages by transaction ID. After the transaction COMMITs, the changes are handed to the output plugin, and the decoded message stream is then sent to the target end.
Figure - Logic Parsing Workflow
2 Problems and Evolution

Having grasped the principles of logical replication and planned to use it to build our data migration application, we still face a few unsolved problems. Let's look at what these pressing problems are and how to deal with them.
Problem 1: Failover slot
For high availability, the database is deployed with at least a primary-standby architecture. When the primary fails and a high-availability switchover takes place, the standby has no corresponding replication slot information; in other words, it lacks a failover slot. This is because the physical files that store slot information are not synchronized to the standby. So how do we create a failover slot manually?
- Create the replication slot on the primary and check that the standby's WAL files are continuous
- Copy the physical files that contain the slot information to the standby, under the pg_replslot directory
- Restart the standby; the replication slot information becomes visible only after the restart, because StartupReplicationSlots, the function that reads the slot's physical files, is called only when the postmaster process starts
- Periodically query the slot status on the primary and use the pg_replication_slot_advance function to advance the replication slot on the standby, as sketched below
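A sketch of that last step, assuming the hypothetical my_slot has already been copied to the standby (the LSN below is a placeholder read from the primary):

```sql
-- On the primary: read how far the slot has been confirmed
SELECT slot_name, confirmed_flush_lsn
FROM pg_replication_slots
WHERE slot_name = 'my_slot';

-- On the standby: advance the copied slot to the LSN read above
SELECT pg_replication_slot_advance('my_slot', '0/3000A88');
```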
With this, the standby also holds the corresponding slot information, and we have implemented a failover slot manually. Patroni, the well-known high-availability software in the PostgreSQL ecosystem, works the same way; the difference is that when Patroni queries the slot status on the primary it writes the information into the DCS, and the standby reads the position from the DCS to advance its slot.
Problem 2: DDL synchronization
Native logical replication does not support decoding DDL statements, but we can handle them with event triggers.
- Use event triggers to capture table structure changes, record them in a DDL_RECORD table, and publish that table through logical replication (see the sketch after this list).
- When the receiving end obtains changes to this table, it can turn them back into the corresponding DDL statements and execute them.
Figure - Event trigger to achieve DDL synchronization
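A minimal sketch of the sending side, with assumed names (ddl_record, log_ddl, ddl_pub); reconstructing the full DDL text on the receiving end may need more information than is captured here:

```sql
-- Table that records captured DDL and is itself published
CREATE TABLE ddl_record (
    id         bigserial PRIMARY KEY,
    event      text,
    command    text,
    created_at timestamptz DEFAULT now()
);

-- Event trigger function: list the DDL commands of the current event
CREATE OR REPLACE FUNCTION log_ddl() RETURNS event_trigger AS $$
DECLARE
    r record;
BEGIN
    FOR r IN SELECT * FROM pg_event_trigger_ddl_commands() LOOP
        INSERT INTO ddl_record(event, command)
        VALUES (TG_EVENT, r.command_tag || ' ' || r.object_identity);
    END LOOP;
END;
$$ LANGUAGE plpgsql;

CREATE EVENT TRIGGER capture_ddl ON ddl_command_end EXECUTE FUNCTION log_ddl();

-- Publish the record table so subscribers receive the DDL descriptions
CREATE PUBLICATION ddl_pub FOR TABLE ddl_record;
```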
Problem 3: Two-way synchronization
When data migration involves a bidirectional synchronization pipeline, for example to implement dual-master dual-write, and both sides operate on the same object in the database, the changes loop endlessly through the WAL.
Figure - Two-way synchronization of the same table results in a data loop
To solve this problem, some DTS applications create an auxiliary table and touch it first in each transaction; by decoding the operation on the auxiliary table they can tell that the records were inserted by the DTS application itself, filter out the transaction, and stop it from being decoded in a loop. PostgreSQL records an origin for each transaction, so no auxiliary table is needed: the origin ID can be specified through the pg_replication_origin_session_setup function, or via replorigin_create as is done in publication/subscription.
After the origin ID is specified, besides filtering in the DTS application after decoding, we can also filter during decoding through the FilterByOriginCB callback in the output plugin, which reduces data transmission and is more efficient.
Figure - OriginFilter function DEMO in test_decoding
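A sketch of the origin-based approach at the SQL level, with an assumed origin name (dts_origin); test_decoding exposes the origin filter through its only-local option, shown here against the hypothetical my_slot:

```sql
-- Register an origin once, then mark this session's changes with it
SELECT pg_replication_origin_create('dts_origin');
SELECT pg_replication_origin_session_setup('dts_origin');

-- Changes written by this session now carry a non-local origin and can be
-- skipped during decoding, e.g. via test_decoding's only-local option:
SELECT * FROM pg_logical_slot_get_changes('my_slot', NULL, NULL,
                                           'only-local', '1');
```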
Other problems
In addition to the three issues above, there are some other usage issues and limitations. A few are listed here with brief explanations rather than full discussion.
TOAST handling: for TOAST values (identifiable from the message format) a placeholder is generally used, and the receiving end does not touch that column when it sees the placeholder. Although this is a bit troublesome, it is the result of a trade-off in how TOAST values are decoded and transmitted.
Heartbeat table: since the xmin recorded by the replication slot is global, when the published tables receive no updates, the slot does not advance and WAL accumulates. We can create a heartbeat table, write to it periodically, and publish it so that the slot keeps advancing.
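A minimal sketch with assumed names (heartbeat, heartbeat_pub); the update would be scheduled periodically, e.g. by cron:

```sql
CREATE TABLE heartbeat (id int PRIMARY KEY, beat_at timestamptz);
INSERT INTO heartbeat VALUES (1, now());
CREATE PUBLICATION heartbeat_pub FOR TABLE heartbeat;

-- Executed periodically so the slot keeps consuming WAL and can advance
UPDATE heartbeat SET beat_at = now() WHERE id = 1;
```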
Large-transaction latency: from the workflow described above we know that, by default, a transaction is not decoded until it COMMITs, which inevitably delays large transactions. PostgreSQL 14 provides a streaming mode for decoding, in which an in-progress transaction is decoded and sent to the receiver before it commits.
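With built-in publication/subscription on PostgreSQL 14 or later, this is enabled on the subscriber side; a sketch with assumed names (mysub, mypub) and a placeholder connection string:

```sql
CREATE SUBSCRIPTION mysub
    CONNECTION 'host=source-host dbname=postgres user=repuser'
    PUBLICATION mypub
    WITH (streaming = on);  -- decode and send large transactions before COMMIT
```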
3 Application and Practice

In the first two sections we demystified PostgreSQL logical replication from the perspective of principles and problems. Next, let's look at how to apply the principles we have mastered to data migration in practice.
Full and incremental synchronization
Most real data migration scenarios require synchronizing both full and incremental data, and once the data transmission channel is opened, new requirements arise around its security, efficiency, and features, such as ETL capabilities like cleansing and desensitization. Let's first look at how to achieve full plus incremental synchronization.
Figure - Schematic diagram of data flow
The main steps include:
- Create a replication slot and export a snapshot (see the sketch after this list)
- Migrate the full data based on the snapshot
- Migrate incremental data based on the replication slot
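A sketch of the first two steps over the replication protocol, with a hypothetical slot name and a placeholder snapshot name (the real name is returned by CREATE_REPLICATION_SLOT):

```sql
-- In a replication connection (psql "dbname=postgres replication=database"):
-- creating the slot returns a consistent point and an exported snapshot name
CREATE_REPLICATION_SLOT migration_slot LOGICAL pgoutput EXPORT_SNAPSHOT;

-- In a normal connection: read the full data exactly as of that snapshot
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SET TRANSACTION SNAPSHOT '00000003-0000001B-1';   -- placeholder name
-- COPY / SELECT the published tables here for the full migration
COMMIT;

-- Incremental changes are then streamed from the slot starting at the
-- consistent point, e.g. START_REPLICATION SLOT migration_slot LOGICAL ...
```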
We use a PG database or a message queue (MQ) as the data broker, so full and incremental decoding can proceed at the same time. When the full data has been processed, a state machine notifies the incremental handler to start publishing incremental changes. The data held in the broker can also be preprocessed after decoding.
Practice of migrating self-built instances to the cloud
Finally, let me share a practical case of migrating a self-built instance to the cloud: a self-built PG 10 instance migrated to RDS for PostgreSQL 11 on JD Cloud. Data safety is ensured through incremental data reflow and data verification, allowing the business to switch over smoothly.
Figure - Data Migration to the Cloud
The DTS application is mainly divided into the following stages:
- Data check phase: check primary keys, permissions, and configurations
- Data migration phase: migrate schema, full data, and incremental data; monitor the migration status
- Application migration phase: switch domain names and direct traffic to the new instance
- Rollback phase: incremental data flows back to the source, so if a problem occurs the migration can be rolled back quickly
Author: Jiang Shuai