Background
As the "terminator" of the sub-database and sub-table (sharding) solution, TiDB has won the favor of many users. After switching to TiDB, users no longer face the complexity of querying, operating, and maintaining sub-databases and sub-tables. During the switch itself, however, that complexity moves into the data migration process. The TiDB DM tool lets users merge and migrate sub-databases and sub-tables. During migration it merges the DML events of the sub-tables, and it supports DDL changes on upstream sub-tables to a certain extent.
This article and its follow-ups mainly introduce how the DDL changes of the individual sub-tables are coordinated when sub-databases and sub-tables are merged and migrated. DM's sub-database and sub-table DDL coordination can be configured as "pessimistic coordination" or "optimistic coordination". This article covers the "pessimistic coordination" mode. A follow-up article will introduce the "optimistic coordination" mode.
The problem of sub-database and sub-table DDL (abbreviated version)
This section first briefly introduces the impact of sub-database and sub-table DDL on data migration with an example, and then gives a more formal definition of this issue.
Suppose the two upstreams contain two sub-tables, t1 and t2, and the downstream table is t.
The next synchronization event for t1 is INSERT (3,3), and the next synchronization event for t2 is DROP COLUMN c2. If DROP COLUMN c2 is synchronized to the downstream first, then synchronizing INSERT (3,3) afterwards fails because the downstream table no longer has column c2. DDL synchronization events therefore need special handling.
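The failure can be reproduced with a toy model. The sketch below is not DM code: the class `DownstreamTable`, the column names c1 and c2, and the two pre-existing rows are assumptions made only for illustration.

```python
class DownstreamTable:
    """Toy stand-in for the merged downstream table t (hypothetical, not DM code)."""

    def __init__(self, columns):
        self.columns = list(columns)
        self.rows = []

    def drop_column(self, name):
        # Remove the column from the schema and from every stored row.
        idx = self.columns.index(name)
        self.columns.pop(idx)
        self.rows = [row[:idx] + row[idx + 1:] for row in self.rows]

    def insert(self, values):
        # Reject rows whose shape no longer matches the schema.
        if len(values) != len(self.columns):
            raise ValueError(
                f"column count mismatch: table has {self.columns}, got {values}"
            )
        self.rows.append(tuple(values))


def replay_naively():
    """Apply t2's DROP COLUMN c2 before t1's INSERT (3,3), with no coordination."""
    t = DownstreamTable(["c1", "c2"])
    t.insert((1, 1))      # earlier row from t1 (assumed)
    t.insert((2, 2))      # earlier row from t2 (assumed)
    t.drop_column("c2")   # t2's DDL reaches the downstream first
    try:
        t.insert((3, 3))  # t1 still has c2, so its row carries two values
    except ValueError as err:
        return str(err)
    return "ok"
```

Calling `replay_naively()` returns the error message instead of "ok", which is the situation that DDL coordination has to prevent.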
Definition of sub-database and sub-table merge migration
Next, we describe the problem in more formal language, which in turn leads to a correct solution.
From the user's point of view, the main purpose of a database is to serve queries. A query should return the same results before and after the sub-databases and sub-tables are merged (ignoring operators such as LIMIT and non-deterministic queries). In other words, the union of the query results over the sub-tables should equal the query result over the migrated table. A sufficient condition follows easily: the union of the sub-table data should equal the data of the migrated table. Taking synchronization delay into account, this becomes: the data of the downstream table at the current time equals the union of the data of the sub-tables at some moment in the past.
Under this definition, data migration means continually moving the synchronization point of each sub-table forward. If the downstream table and the sub-tables satisfy the definition before an event is synchronized, then applying that sub-table's event to the downstream with the same effect correctly advances that sub-table's synchronization point, and the definition still holds.
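The definition can be written as a small invariant check. This is a minimal sketch under stated assumptions: the function names are hypothetical, rows are plain tuples, and the "union" is treated as a multiset so that identical rows from different sub-tables are counted separately.

```python
from collections import Counter


def satisfies_definition(downstream_rows, sub_table_snapshots):
    """Return True if the downstream rows equal the union of the snapshots.

    sub_table_snapshots maps a sub-table name to its rows at the point up to
    which that sub-table has been synchronized (a moment in the past).
    """
    expected = Counter()
    for rows in sub_table_snapshots.values():
        expected.update(rows)
    return Counter(downstream_rows) == expected


def apply_insert(downstream_rows, sub_table_snapshots, table, row):
    """Advance one sub-table by one INSERT event and mirror it downstream."""
    sub_table_snapshots[table].append(row)
    downstream_rows.append(row)
```

If the invariant holds before `apply_insert` is called, it still holds afterwards, which is the inductive step described in the paragraph above.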
The problem of sub-database and sub-table DDL (formal version)
Under the above definition, DDL causes two problems.
The first is that DDL may change the table structure. In the earlier example, suppose both t1 and t2 have a DROP COLUMN c2 event and DM reaches t2's event first. To apply that event downstream with the same effect, we would have to drop the column only for the downstream data that came from t2. But the downstream data from t1 and t2 share one table structure, so this cannot be done. The t2 event therefore cannot be synchronized for the time being.
The second is that a DDL may affect the data even if it leaves the table structure unchanged. For example, ALTER TABLE DROP COLUMN c, ADD COLUMN c DEFAULT xx sets column c to xx in every row of a sub-table. DM's current implementation cannot apply this event to the downstream with the same effect either.
Solution
To address the problems that DDL introduces, and based on the definition of synchronization correctness given above, we can derive a sufficient condition: when a DDL synchronization event occurs in a sub-table, we pause that sub-table's synchronization; once every sub-table has reached the same DDL synchronization event, we apply the DDL to the downstream and resume synchronization of all sub-tables. At that point, the effect of the DDL on the downstream table equals the effect of every sub-table executing the DDL (ignoring non-deterministic DDL, for example one that adds a column whose default value is current_timestamp).
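The pause-until-everyone-arrives rule can be sketched as a small state machine. The class `PessimisticCoordinator` and its methods are hypothetical names for illustration, not DM's API, and the sketch assumes each sub-table reports one DDL at a time.

```python
class PessimisticCoordinator:
    """Toy model of pessimistic DDL coordination (hypothetical, not DM code)."""

    def __init__(self, sub_tables):
        self.sub_tables = set(sub_tables)
        self.waiting = {}   # sub-table -> DDL it is paused on
        self.applied = []   # DDLs applied downstream, in order

    def is_paused(self, table):
        return table in self.waiting

    def on_ddl(self, table, ddl):
        """Record that `table` reached `ddl`.

        Returns True once the DDL has been applied downstream.
        """
        if table not in self.sub_tables:
            raise KeyError(table)
        pending = set(self.waiting.values())
        if pending and ddl not in pending:
            # Sub-tables must see the same DDLs in the same order.
            raise ValueError("sub-tables disagree on the next DDL")
        self.waiting[table] = ddl          # pause this sub-table
        if set(self.waiting) != self.sub_tables:
            return False                   # some sub-table has not arrived yet
        self.applied.append(ddl)           # apply once to the downstream
        self.waiting.clear()               # resume every sub-table
        return True
```

With sub-tables t1 and t2, the first call to `on_ddl` pauses its table and returns False; the second call with the same DDL applies it downstream and resumes both.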
Examples of pessimistic coordination
We again use the merge migration of tables t1 and t2 as an example and observe how binlog synchronization progresses.
In the figure on the left, sub-table t1 has reached a DDL, but t2's synchronization has not yet reached it, so t1's synchronization is paused. In the figure on the right, the same DDL has appeared in both t1 and t2, so the DDL can be applied downstream and synchronization of t1 and t2 resumes.
In some cases t1 and t2 live in the same binlog stream. The pause and resume of the seemingly independent streams in the figure above is then actually implemented by skipping events and rolling back the synchronization position within that one binlog stream.
As shown in the figure above, event 3 must be synchronized after events 1, 2, 4, 5, and 6. We therefore skip event 3 the first time we encounter it in the binlog stream, re-synchronize from event 3 once event 6 is done, and skip events 4, 5, and 6, which have already been synchronized.
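The skip-and-roll-back behavior can be sketched as follows. The event layout is an assumption, since the figure is not reproduced here: events 2 and 6 are taken to be the same DDL on t1 and t2, and event 3 is a t1 DML that follows t1's DDL. The function name is hypothetical, and the sketch handles DDLs one round at a time.

```python
def replay_single_stream(events, sub_tables):
    """Return the order in which events reach the downstream.

    events: list of (event_id, table, kind), kind being "DML" or "DDL".
    """
    applied = []            # downstream order
    synced = set()          # DML ids already applied (skipped on replay)
    paused = set()          # sub-tables waiting at the DDL
    first_ddl_index = None  # position to roll back to

    for index, (event_id, table, kind) in enumerate(events):
        if kind == "DDL":
            paused.add(table)
            if first_ddl_index is None:
                first_ddl_index = index
            if paused == set(sub_tables):
                applied.append("DDL")  # all sub-tables arrived: apply once
                # Roll back to the first DDL and replay only what was skipped.
                for old_id, _table, old_kind in events[first_ddl_index:index]:
                    if old_kind == "DML" and old_id not in synced:
                        applied.append(old_id)
                        synced.add(old_id)
                paused.clear()
                first_ddl_index = None
        elif table in paused:
            continue                   # skip for now, replayed later
        else:
            applied.append(event_id)
            synced.add(event_id)
    return applied


events = [
    (1, "t1", "DML"), (2, "t1", "DDL"), (3, "t1", "DML"),
    (4, "t2", "DML"), (5, "t2", "DML"), (6, "t2", "DDL"),
]
print(replay_single_stream(events, {"t1", "t2"}))  # [1, 4, 5, 'DDL', 3]
```

Under these assumptions event 3 reaches the downstream only after the merged DDL, and events 4 and 5 are not applied a second time during the replay.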
Pessimistic coordination mode restrictions
It can be seen that this coordination mode has the following limitations:
- When a DDL synchronization event occurs, the sub-table is paused, which increases the synchronization delay. The upstream binlog may already have been cleaned up by the time synchronization resumes.
- It does not support grayscale (canary) testing in which only some of the sub-tables are changed. Synchronization of the changed sub-tables is paused for the duration of the test. In addition, if the change is rolled back after the test, synchronization cannot be resumed.
- All sub-tables are required to have DDL synchronization events in the same order.
- If the sub-tables enter an inconsistent DDL state because of an operational mistake, the repair is fairly complicated.
- DM users may not control when upstream DDL is issued and therefore may be unable to meet these conditions.