Background

TiDB's one-click horizontal scaling lets users say goodbye to the query complexity and operational burden of sharded databases and tables. However, when switching from a sharding solution to TiDB, that complexity is transferred to the data migration process. The TiDB DM tool provides the ability to merge and migrate sharded databases and tables into TiDB.

This article introduces Sync, the core processing unit of DM, which contains the logic for binlog reading, filtering, routing, conversion, optimization, and execution. It describes only the DML processing logic. For DDL-related content, see "Introduction to DM Shard Merge DDL 'Optimistic Coordination' Mode" and "Introduction to DM Shard Merge DDL 'Pessimistic Coordination' Mode".

Process flow

[Figure: logical processing flow of binlog replication]

From the figure above, you can get a rough picture of the logical processing flow of binlog replication:

1. Read binlog events from MySQL/MariaDB or relay log

2. Process and convert binlog events

  • Binlog Filter: filter binlog events according to binlog filtering rules, configured via filters
  • Routing: rename schemas/tables according to "schema/table" routing rules, configured via routes
  • Expression Filter: filter binlog rows according to SQL expressions, configured via expression-filter

3. Optimize DML execution

  • Compactor: merge multiple operations on the same record (same primary key) into one operation; enabled via syncer.compact
  • Causality: detect conflicts between different records (different primary keys) and dispatch them to different groups for concurrent processing
  • Merger: merge multiple binlog rows into one DML statement; enabled via syncer.multiple-rows

4. Execute DML downstream

5. Periodically save the binlog position/GTID to the checkpoint
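The filtering, routing, and optimization switches above live in the DM task file. The fragment below is an illustrative sketch only; the rule names and values are examples, and the exact fields may vary by DM version, so check the official task configuration documentation.

```yaml
# Illustrative DM task file fragment (example names/values, not a full task file)
routes:
  route-rule-1:
    schema-pattern: "shard_db_*"      # merge sharded schemas...
    target-schema: "merged_db"        # ...into one downstream schema

filters:
  filter-rule-1:
    schema-pattern: "shard_db_*"
    events: ["truncate table", "drop table"]
    action: Ignore                    # skip these events during migration

syncers:
  global:
    worker-count: 16                  # concurrency of DML workers
    batch: 100                        # DMLs per downstream transaction
    compact: true                     # enable the Compactor
    multiple-rows: true               # enable the Merger (multi-row DML)
```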

Optimization logic

Compactor

DM captures record changes from the upstream binlog and replicates them downstream. When the upstream makes multiple changes (insert/update/delete) to the same record in a short period, the Compactor can compress these changes into a single change, reducing downstream pressure and improving throughput, for example:

INSERT + UPDATE => INSERT
INSERT + DELETE => DELETE
UPDATE + UPDATE => UPDATE
UPDATE + DELETE => DELETE
DELETE + INSERT => UPDATE
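The rules above can be sketched as a small merge table. This is a minimal illustration of the idea, not DM's actual implementation; the data shapes here (`op`, `pk`, `row` tuples) are assumptions for the example.

```python
# Merge table: (earlier_op, later_op) -> compacted_op, per the rules above
COMPACT_RULES = {
    ("INSERT", "UPDATE"): "INSERT",
    ("INSERT", "DELETE"): "DELETE",
    ("UPDATE", "UPDATE"): "UPDATE",
    ("UPDATE", "DELETE"): "DELETE",
    ("DELETE", "INSERT"): "UPDATE",
}

def compact(ops):
    """Compact a list of (op, pk, row) changes: keep one change per primary key."""
    latest = {}   # pk -> (op, row)
    order = []    # preserve first-seen order of primary keys
    for op, pk, row in ops:
        if pk not in latest:
            latest[pk] = (op, row)
            order.append(pk)
        else:
            prev_op, _ = latest[pk]
            latest[pk] = (COMPACT_RULES[(prev_op, op)], row)
    return [(latest[pk][0], pk, latest[pk][1]) for pk in order]
```

For instance, `compact([("INSERT", 1, row_v1), ("UPDATE", 1, row_v2)])` yields a single `("INSERT", 1, row_v2)`, so the downstream executes one statement instead of two.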

Causality

MySQL's sequential binlog replication model requires that binlog events be applied one by one in binlog order. Such sequential replication cannot satisfy the requirements of high QPS and low replication latency, and not all operations in the binlog actually conflict with each other.

DM uses a conflict detection mechanism to identify the binlogs that must be executed sequentially. While guaranteeing the sequential execution of those binlogs, it keeps the remaining binlogs executing concurrently as much as possible to meet performance requirements.

Causality uses a union-find-like algorithm to classify each DML and put related DMLs into the same group. For the specific algorithm, see "TiDB Binlog source code reading series (8): Loader package introduction, Parallel execution of DML".
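As a minimal sketch of the union-find idea (not DM's actual implementation): DMLs that share any key value (primary or unique key) end up in the same group, while unrelated DMLs can be dispatched to different workers. Real DM additionally flushes pending jobs when one DML would link two existing groups; that detail is omitted here.

```python
class Causality:
    def __init__(self):
        self.parent = {}  # key -> representative key (union-find forest)

    def find(self, k):
        while self.parent[k] != k:
            self.parent[k] = self.parent[self.parent[k]]  # path halving
            k = self.parent[k]
        return k

    def group(self, keys):
        """Register a DML touching `keys`; return its group representative."""
        keys = list(keys)
        for k in keys:
            self.parent.setdefault(k, k)
        rep = self.find(keys[0])
        for k in keys[1:]:
            r = self.find(k)
            if r != rep:
                self.parent[r] = rep  # union: the two groups become one
        return rep
```

The representative can then be hashed to pick a worker, e.g. `hash(rep) % worker_count`, so conflicting DMLs always land on the same worker and stay ordered.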

Merger

In the MySQL binlog protocol, each row event corresponds to the change of one row of data. Through the Merger, DM can merge multiple binlog rows into one DML statement and execute it downstream, reducing network round trips, for example:

  INSERT INTO tb(a,b) VALUES(1,1);
+ INSERT INTO tb(a,b) VALUES(2,2);
= INSERT INTO tb(a,b) VALUES(1,1),(2,2);

  UPDATE tb SET a=1, b=1 WHERE a=1;
+ UPDATE tb SET a=2, b=2 WHERE a=2;
= INSERT INTO tb(a,b) VALUES(1,1),(2,2) ON DUPLICATE KEY UPDATE a=VALUES(a), b=VALUES(b);

  DELETE FROM tb WHERE a=1;
+ DELETE FROM tb WHERE a=2;
= DELETE FROM tb WHERE (a) IN ((1),(2));
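A minimal sketch of the multi-row INSERT case (illustrative only, not DM's code), building one parameterized statement from several single-row inserts:

```python
def merge_inserts(table, cols, rows):
    """Merge single-row INSERTs into one multi-row statement with placeholders."""
    placeholders = ",".join("(" + ",".join(["%s"] * len(cols)) + ")" for _ in rows)
    sql = f"INSERT INTO {table}({','.join(cols)}) VALUES {placeholders}"
    args = [v for row in rows for v in row]  # flatten row values in order
    return sql, args
```

For example, `merge_inserts("tb", ["a", "b"], [(1, 1), (2, 2)])` produces `"INSERT INTO tb(a,b) VALUES (%s,%s),(%s,%s)"` with arguments `[1, 1, 2, 2]`, i.e. one network round trip instead of two.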

Execution logic

DML generation

DM embeds a schema tracker to record upstream and downstream schema information. When a DDL is received, DM updates the table structure in its internal schema tracker. When a DML is received, DM generates the corresponding DML statement based on the schema tracker's table structure. The specific logic is as follows:

  1. When starting a full + incremental task, Sync uses the table structure dumped during the upstream full export as the initial upstream table structure
  2. When starting an incremental-only task, since MySQL binlog does not record table structure information, Sync uses the structure of the corresponding downstream table as the initial upstream table structure
  3. Because the user's upstream and downstream table structures may differ (for example, the downstream has more columns than the upstream, or the primary keys are inconsistent), DM records the primary key and unique key information of the corresponding downstream table to ensure the correctness of data replication
  4. When generating a DML statement, DM uses the upstream table structure recorded in the schema tracker to generate its columns, the column values recorded in the binlog to generate its values, and the downstream primary key/unique key recorded in the schema tracker to generate its WHERE condition. When the table structure has no unique key, DM uses all column values recorded in the binlog as the WHERE condition
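Step 4 can be sketched as follows. This is an illustration of the column/value/WHERE split described above, not DM's actual code; the function name and parameters are invented for the example.

```python
def build_update(table, upstream_cols, new_values, downstream_keys, old_values):
    """Sketch: columns come from the tracked upstream schema, values from the
    binlog row, and the WHERE condition from the downstream PK/UK (falling
    back to all columns when the table has no unique key)."""
    set_clause = ", ".join(f"{c} = %s" for c in upstream_cols)
    where_cols = downstream_keys if downstream_keys else upstream_cols
    where = " AND ".join(f"{c} = %s" for c in where_cols)
    args = [new_values[c] for c in upstream_cols] + [old_values[c] for c in where_cols]
    return f"UPDATE {table} SET {set_clause} WHERE {where}", args
```

With a downstream primary key on `a`, updating row `(1,1)` to `(2,2)` yields `UPDATE tb SET a = %s, b = %s WHERE a = %s` with arguments `[2, 2, 1]`; without any key, the WHERE clause would match on both columns.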

Worker Count

From the above, we know that Causality can divide binlogs into multiple groups through the conflict detection algorithm and execute them concurrently downstream. DM controls the degree of concurrency through the worker count. When the CPU usage of the downstream TiDB is not high, increasing concurrency can effectively improve replication throughput. Configured via syncer.worker-count.

Batch

DM packs multiple DMLs into a single transaction and executes it downstream. When a DML worker receives a DML, it adds it to its cache. When the number of DMLs in the cache reaches a predetermined threshold, or no DML has been received for a long time, the cached DMLs are executed downstream. Configured via syncer.batch.
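The batching behavior can be sketched with a queue-driven worker loop. This is a simplified illustration (assumed shape, not DM's code, which is written in Go): flush when the buffer reaches the batch size, when no job arrives within a short wait, or on shutdown.

```python
import queue

def dml_worker(jobs, execute, batch=100, wait=0.01):
    """Buffer incoming DML jobs; flush to the downstream when the buffer
    reaches `batch`, when no job arrives within `wait` seconds, or on a
    None shutdown signal."""
    buf = []
    while True:
        try:
            job = jobs.get(timeout=wait)
        except queue.Empty:
            if buf:                 # idle: flush whatever we have
                execute(buf)
                buf = []
            continue
        if job is None:             # shutdown signal: flush and exit
            if buf:
                execute(buf)
            return
        buf.append(job)
        if len(buf) >= batch:       # threshold reached: flush a full batch
            execute(buf)
            buf = []
```

The idle-timeout flush matters for latency: without it, a trickle of writes smaller than the batch size would sit in memory indefinitely.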

Checkpoint

From the flowchart above, we can see that DML execution and checkpoint updates are not atomic. In DM, the checkpoint is updated every 30 seconds by default. In addition, since there are multiple DML workers, the checkpoint process computes the binlog location representing the replication progress of all DML workers and uses it as the current replication checkpoint: all binlogs earlier than this location are guaranteed to have been successfully executed downstream.
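As a sketch of that calculation (illustrative only): with several workers at different binlog locations, the checkpoint can only advance to the smallest location any worker has fully executed, because only everything before that point is guaranteed to be downstream. Representing a location as a `(binlog_file, position)` pair, tuple comparison gives the right ordering:

```python
def safe_checkpoint(worker_progress):
    """worker_progress: list of (binlog_file, position) pairs, one per DML
    worker. The safe checkpoint is the minimum location across workers."""
    return min(worker_progress)  # lexicographic: file name first, then offset
```

For example, if one worker has reached `("mysql-bin.000002", 4)` but a slower one is still at `("mysql-bin.000001", 1500)`, the checkpoint stays at the latter.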

Transactional Consistency

From the description above, we can see that DM actually replicates data at the row level. An upstream transaction is split into multiple rows in DM and distributed to different DML workers for concurrent execution. When a DM replication task pauses due to an error, or the user pauses it manually, the downstream may be left in an intermediate state: some DML statements of an upstream transaction may have been replicated downstream while others have not, leaving the downstream inconsistent. To keep the downstream consistent when a task pauses, as far as possible, since v5.3.0 DM waits up to 10 seconds for all received upstream transactions to be replicated downstream before actually pausing the task. If replication does not finish within those 10 seconds, the downstream may still be in an inconsistent state.

Safemode

From the execution logic above, we can see that DML execution and the checkpoint write are not synchronized, and that writing the checkpoint and writing downstream data are not atomic. When DM exits abnormally for some reason, the checkpoint may only record a recovery point before the exit, so when the replication task restarts, DM may write some data again. In other words, DM provides at-least-once processing: the same data may be processed more than once. To make the data reentrant, DM enters safemode when restarting after an abnormal exit. The specific logic is as follows:

1. When a DM task pauses normally, all DMLs in memory are replicated downstream and the checkpoint is flushed. The task does not enter safemode after a normal pause: all data before the checkpoint has been replicated downstream, none of the data after the checkpoint has been, and no data will be processed twice.

2. When a task pauses abnormally, DM first tries to replicate all DMLs in memory downstream, which may fail (e.g. due to a downstream data conflict). DM then records the binlog position of the latest data pulled from the upstream, denoted safemode_exit_point, and flushes this position together with the checkpoint downstream. When the task resumes, the following situations are possible:

  • checkpoint == safemode_exit_point: all DMLs were replicated downstream when DM paused. This is handled like a normal pause, without entering safemode
  • checkpoint < safemode_exit_point: when DM paused, some DMLs in memory failed to execute downstream, so the checkpoint is still an "older" point. After the task resumes, safemode is enabled for the binlog range from checkpoint to safemode_exit_point, because those binlogs may already have been replicated once
  • safemode_exit_point does not exist: flushing safemode_exit_point failed while DM paused, or the DM process was forcibly killed. DM cannot determine exactly which data may be processed repeatedly, so after the task resumes it enables safemode for two checkpoint intervals (one minute by default), and then disables safemode for normal replication
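The three resume cases above can be summarized in a small decision function (an illustration of the logic described here, not DM's code; names are invented for the example):

```python
def resume_plan(checkpoint, safemode_exit_point):
    """Decide how to resume replication after a pause, per the cases above.
    Locations are comparable values; safemode_exit_point may be None."""
    if safemode_exit_point is None:
        # Cannot tell which binlogs may re-run: safemode for two
        # checkpoint intervals (one minute by default), then normal mode.
        return "safemode-for-two-checkpoint-intervals"
    if checkpoint == safemode_exit_point:
        # Everything was flushed: behave like a normal pause.
        return "no-safemode"
    # checkpoint < safemode_exit_point: only the range in between may
    # have been replicated once already.
    return "safemode-until-exit-point"
```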

During safemode, to ensure data reentrancy, DM performs the following conversions:

  1. Convert upstream insert statements into replace statements
  2. Convert upstream update statements into delete + replace statements
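A minimal sketch of these two rewrites (illustrative only, not DM's code; the SQL fragments are passed in pre-rendered for simplicity):

```python
def safemode_rewrite(op, table, cols, vals, where):
    """Rewrite a DML into a reentrant form while safemode is on.
    `cols`, `vals`, and `where` are pre-rendered SQL fragments."""
    if op == "INSERT":
        # REPLACE is idempotent: re-executing it converges to the same row
        return [f"REPLACE INTO {table}({cols}) VALUES ({vals})"]
    if op == "UPDATE":
        # Delete the old row (by its old key), then re-insert the new image
        return [f"DELETE FROM {table} WHERE {where}",
                f"REPLACE INTO {table}({cols}) VALUES ({vals})"]
    return [f"DELETE FROM {table} WHERE {where}"]  # DELETE is already reentrant
```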

Exactly-Once Processing

From the description above, we can see that DM's approach of splitting transactions and replicating them concurrently causes some problems: the downstream may stop in an inconsistent state; the order in which data is replicated may differ from the upstream; and data may be processed repeatedly (the replace statements used during safemode carry a performance cost, and if the downstream needs to capture data changes, e.g. via CDC, repeated processing is unacceptable).

In summary, we are considering implementing exactly-once processing. If you are interested, come to https://internals.tidb.io and join the discussion.


Posted by PingCAP