Author: Yang Jiaxin

Senior DBA at Dmall (多点), skilled in fault analysis and performance optimization; enjoys exploring new technologies and photography.

Source of this article: original contribution

The original content is produced by the Aikesheng (爱可生) open source community and may not be used without authorization. For reprinting, please contact the editor and cite the source.


Background

To improve TiDB availability, Dmall's TiDB clusters, which hold hundreds of terabytes in total, need to be split, dividing each cluster into two.

Challenges

  1. Twelve TiDB clusters need to be split, spanning three versions (4.0.9, 5.1.1, and 5.1.2), and the split procedure differs between versions.
  2. Five of the clusters hold more than 10 TB of data, and the largest currently holds 62 TB; a single cluster's backup set is large and consumes a lot of disk space and bandwidth (the three largest clusters account for most of the space).
  3. TiDB is used in several different ways (each usage pattern needs a different split approach): direct reads/writes to TiDB, MySQL -> TiDB aggregation for analytical queries, and TiDB -> CDC -> downstream Hive.
  4. A full TiDB backup may affect performance during peak business hours.
  5. Consistency of the split data must be guaranteed despite the large data volume.

Solution

Currently, the official synchronization tools provided by TiDB are:

  • DM full + incremental (cannot be used for TiDB -> TiDB; it only applies to MySQL -> TiDB)
  • BR full physical backup + CDC incremental synchronization (CDC is expensive to repair after an OOM on tidb/tikv nodes: https://github.com/pingcap/tiflow/issues/3061 )
  • BR full physical backup + binlog incremental synchronization (similar to the MySQL binlog that records all changes, TiDB Binlog consists of Pump, which records the change log, and Drainer, which replays it; we used this approach for the full + incremental split)

Backup and Restore Tool BR: https://docs.pingcap.com/en/tidb/stable/backup-and-restore-tool

TiDB Binlog: https://docs.pingcap.com/en/tidb/stable/tidb-binlog-overview

Because a split based on BR full physical backup + binlog incremental synchronization spans a long period, we divided the work into four stages.

Stage 1

1. Clean up unused data in the existing TiDB clusters

Some databases contain obsolete tables, for example xxxx log tables older than 3 months.

2. Upgrade the existing 15 TiDB clusters in GZ (the 12 clusters that need to be split one into two) to version 5.1.2

Taking advantage of this split to unify the GZ TiDB version addresses challenge 1.

Stage 2

1. Deploy TiDB clusters of the same version, 5.1.2, on the new machines

set @@global.tidb_analyze_version = 1;

2. Mount NFS on all TiKV/TiFlash nodes on both the source and destination side, and install BR on the PD node

External storage uses Tencent Cloud NFS network disks, so the TiKV backup destination and the full-restore source can point to the same directory; the NFS disk grows automatically and dynamically, and combined with rate-limited backups this addresses challenge 2. A mount sketch follows.
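
Below is a minimal mount sketch, assuming a Tencent Cloud NFS/CFS share; the endpoint, mount options and mount point are placeholders and may differ in your environment.

 # run on every source TiKV/TiFlash node and on the destination nodes
 mkdir -p /tidbbr
 mount -t nfs -o vers=4.0,noresvport <nfs-endpoint>:/ /tidbbr
 # hand the directory to the tidb user so BR can write backup files into it
 chown -R tidb.tidb /tidbbr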

3. Deploy Pump for the 12 TiDB clusters independently on 3 machines

Pump collects binlogs (different ports distinguish the TiDB clusters); Pump and Drainer run on dedicated 16C/32G machines to give incremental synchronization maximum performance.

Note: To keep the tidb compute nodes available, the key binlog parameter ignore-error must be set ( https://docs.pingcap.com/zh/tidb/v5.1/tidb-binlog-deployment-topology )

 server_configs: 
  tidb: 
    binlog.enable: true 
    binlog.ignore-error: true

4. Change the Pump component's GC time to 7 days

Binlogs are retained for 7 days, ensuring the full backup can hand over seamlessly to incremental synchronization.
 pump_servers: 
  - host: xxxxx 
    config: 
      gc: 7 
# The tidb nodes must be reloaded/restarted for binlog recording to take effect
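
A hedged sketch of applying the binlog settings above with tiup; the cluster name is a placeholder and the exact edit/reload flow may differ in your environment.

 tiup cluster edit-config <cluster-name>      # set binlog.enable / binlog.ignore-error and the pump gc value
 tiup cluster reload <cluster-name> -R tidb   # reload the tidb nodes so binlog recording takes effect
 tiup cluster reload <cluster-name> -R pump   # reload pump so the new gc setting is picked up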

5. Back up each TiDB cluster's full data to NFS (see the Backup & Restore FAQ: https://docs.pingcap.com/zh/tidb/v5.1/backup-and-restore-faq )

Note: Each TiDB cluster creates different backup directories on the same NFS

Note: Full backups of the source TiDB clusters are rate-limited and run off-peak (read/write latency was essentially unchanged before and after backup; at one point several clusters were backed up simultaneously and saturated the 3 Gbps NFS network bandwidth), which reduces read/write pressure on the source TiDB and on NFS and addresses challenge 2.

 mkdir -p /tidbbr/0110_dfp
 chown -R tidb.tidb /tidbbr/0110_dfp
 # rate-limited full backup of all business application databases
 ./br backup full \
      --pd "xxxx:2379" \
      --storage "local:///tidbbr/0110_dfp" \
      --ratelimit 80 \
      --log-file /data/dbatemp/0110_backupdfp.log
 # rate-limited backup of a specified database
 ./br backup db \
      --pd "xxxx:2379" \
      --db db_name \
      --storage "local:///tidbbr/0110_dfp" \
      --ratelimit 80 \
      --log-file /data/dbatemp/0110_backupdfp.log
   
On Dec 30, the full backup of the 45 TB TiDB cluster took 19 h and occupied 12 TB of space:
[2021/12/30 09:33:23.768 +08:00] [INFO] [collector.go:66] ["Full backup success summary"] [total-ranges=1596156] [ranges-succeed=1596156] [ranges-failed=0] [backup-checksum=3h55m39.743147403s] [backup-fast-checksum=409.352223ms] [backup-total-ranges=3137] [total-take=19h12m22.227906678s] [total-kv-size=65.13TB] [average-speed=941.9MB/s] ["backup data size(after compressed)"=12.46TB] [BackupTS=430115090997182553] [total-kv=337461300978]

6. For each new TiDB cluster, synchronize the user and password information from the corresponding old cluster

Note: A BR full backup does not include TiDB's mysql system database. Application and administrator account grants can be exported with pt-show-grants from the open source Percona Toolkit (pt-toolkit).
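
A minimal sketch of migrating the grants with pt-show-grants; host names, ports and credentials are placeholders.

 # export grants from the old cluster, then replay them on the new cluster
 pt-show-grants --host=<old-tidb-host> --port=4000 --user=root --ask-pass > grants.sql
 mysql -h <new-tidb-host> -P 4000 -u root -p < grants.sql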

7. Restore the full backup from NFS into the new TiDB cluster

Note: The new TiDB cluster needs sufficient disk space. After the full restore, the new cluster occupied several terabytes more than the old cluster; according to the TiDB staff, the SST files generated during restore are compressed with lz4, so the compression ratio is not as high as in the old cluster.

Note: tidb_enable_clustered_index and sql_mode must be identical on the new and the old TiDB cluster.
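
A restore sketch mirroring the backup commands above; the PD address, NFS path, rate limit and log file are placeholders, not values confirmed by the article.

 # full restore from the shared NFS backup directory into the new cluster
 ./br restore full \
      --pd "xxxx:2379" \
      --storage "local:///tidbbr/0110_dfp" \
      --ratelimit 128 \
      --log-file /data/dbatemp/0110_restoredfp.log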

8. Scale out Drainer with tiup for incremental synchronization

Before scaling out, confirm that no checkpoint information exists downstream, or that it has been cleared.

If a drainer has replicated to this downstream before, its position is stored in the tidb_binlog.checkpoint table on the target side and must be cleaned up before redoing the sync.

Note: The largest source TiDB cluster sustains roughly 6k write TPS on average over long periods. Even after increasing the worker-count replay threads, and with the destination domain name resolving to 3 tidb nodes, a single drainer could not catch up with the delay (its replay speed peaked at about 3k TPS). After discussing with the TiDB team, we switched to 3 drainers running in parallel (each drainer replicating a different set of databases), and the incremental delay caught up (with 3 drainers there is no backlog, and data flowing into the source reaches the destination in time).

Note: When multiple drainers run incremental sync in parallel, each drainer's configuration must specify a distinct destination checkpoint.schema. The key configurations and a scale-out sketch follow.

 # get the TSO at which the full backup started from the backup log file
 grep "BackupTS=" /data/dbatemp/0110_backupdfp.log
 430388153465177629

 # first round: key configuration for incremental sync with a single drainer
 drainer_servers:
   - host: xxxxxx
     commit_ts: 430388153465177629
     deploy_dir: "/data/tidb-deploy/drainer-8249"
     config:
       syncer.db-type: "tidb"
       syncer.to.host: "xxxdmall.db.com"
       syncer.worker-count: 550


 # second round: multiple drainers performing parallel incremental sync
 drainer_servers:
   - host: xxxxxx
     commit_ts: 430505424238936397   # this TSO is the Commit_Ts in the destination checkpoint table after the first single-drainer sync was stopped
     config:
       syncer.replicate-do-db: [db1,db2,....]
       syncer.db-type: "tidb"
       syncer.to.host: "xxxdmall.db.com"
       syncer.worker-count: 550
       syncer.to.checkpoint.schema: "tidb_binlog2"
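
A hedged sketch of scaling out a drainer with a topology file like the ones above; the cluster name and file name are placeholders.

 tiup cluster scale-out <cluster-name> scale-out-drainer.yaml   # add the drainer node(s)
 tiup cluster display <cluster-name>                            # confirm the new drainer shows Up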
With a single drainer, the incremental delay kept growing.

With three drainers syncing in parallel, the slowest incremental link caught up with nearly a day of data in 9 hours.

With 3 drainers in parallel, the destination writes at 12k TPS, higher than the source's 6k write TPS.

9. Configure domain names for the new cluster's Grafana & Dashboard

The Grafana and Dashboard domain names point to the production nginx proxy, and nginx proxies the Grafana and Dashboard ports.

Stage 3

1. Check the consistency of the data synchronized to the new TiDB cluster

TiDB checks data consistency during both the full and the incremental phase. We mainly watched the incremental synchronization delay and spot-checked tables by running count(*) on the source and the destination (delay-check methods and a spot-check sketch below).
 # Delay check method 1: get the latest replayed TSO from the drainer status on the source TiDB,
 # then convert the TSO with pd-ctl to see the lag
mysql> show drainer status;
+-------------+-------------+--------+--------------------+---------------------+
| NodeID      | Address     | State  | Max_Commit_Ts      | Update_Time         |
+-------------+-------------+--------+--------------------+---------------------+
| xxxxxx:8249 | xxxxxx:8249 | online | 430547587152216733 | 2022-01-21 16:50:58 |


tiup ctl:v5.1.2 pd -u http://xxxxxx:2379 -i
» tso 430547587152216733;
system: 2022-01-17 16:38:23.431 +0800 CST
logic: 669


 # Delay check method 2: observe the drainer panels in Grafana;
 # compare the current value of tidb-Binlog -> drainer -> Pump Handle TSO with the actual current time to get the lag.
 # The steeper the curve, the faster the incremental sync.
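
A trivial spot-check sketch for the count(*) comparison; the table name is a placeholder. Run the same statement on the source and on the new cluster and compare the results.

 SELECT COUNT(*) FROM db1.some_table;   -- run on the source cluster and on the new cluster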

2. In the new TiDB cluster, recreate the TiFlash replicas, re-establish CDC synchronization, and close the loop on the new MySQL -> TiDB aggregation sync link (DRC-TIDB)

  • tiflash

Generate the TiFlash replica statements for the destination from the source TiDB:

 SELECT * FROM information_schema.tiflash_replica WHERE TABLE_SCHEMA = '<db_name>' and TABLE_NAME = '<table_name>' 
SELECT concat('alter table ',table_schema,'.',table_name,' set tiflash replica 1;') FROM information_schema.tiflash_replica where table_schema like 'dfp%';
  • CDC link closed loop

Pick a TSO from the old TiDB cluster's CDC synchronization and use it as the starting point to create CDC synchronization from the new TiDB cluster to the Kafka topic, as sketched below.
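
A hedged sketch of creating the changefeed on the new cluster from the chosen TSO; the PD address, Kafka broker, topic, protocol and start TSO are placeholders, not values confirmed by the article.

 tiup ctl:v5.1.2 cdc changefeed create \
     --pd=http://xxxx:2379 \
     --start-ts=<chosen-tso> \
     --sink-uri="kafka://<broker>:9092/<topic>?protocol=canal-json" \
     --changefeed-id="<changefeed-name>"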

  • DRC-TIDB link closed loop (DRC-TIDB is a self-developed tool that merges sharded MySQL databases and tables into TiDB)


The above picture shows the state before and after the DRC-TIDB split

1. Copy the synchronization rules from the left drc-tidb to the new drc-tidb on the right, but do not start the new drc-tidb sync (record the current time T1)

2. The drainer that replicates the existing TiDB data to the newly built TiDB cluster enables safe mode (syncer.safe-mode: true), so inserts are replayed as replace

3. Change the source/destination addresses of the left drc-tidb sync so that it forms a closed loop, and start drc-tidb (record the current time T2)

4. In the right TiDB cluster's Grafana drainer panels, check whether the current sync checkpoint is >= T2 (similar to tikv follower-read); if not, wait for the delay to catch up

5. Edit-config the drainer configuration of the right TiDB cluster's incremental sync, remove the database names that are synced from MySQL (for clusters replicating all databases, switch to an explicit list of the databases to keep), and reload the drainer node

 commit_ts: 431809362388058219
 config:
   syncer.db-type: tidb
   syncer.replicate-do-db:
   - dmall_db1   # this DB is read/written directly, keep it
   - dmall_db2   # this DB is synced from MySQL and must be removed

6. Change the source/destination addresses of the right drc-tidb sync so that it forms a closed loop, and start the right drc-tidb (drc-tidb uses idempotent synchronization, so it re-consumes the MySQL binlog from T1, when the sync rules were copied, up to the present)

3. Run ANALYZE TABLE on each new TiDB cluster to update table statistics

Not strictly required, but refreshing statistics to the latest state avoids wrong index choices in query SQL.
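
A minimal example; database and table names are placeholders.

 ANALYZE TABLE db1.some_table;
 ANALYZE TABLE db2.another_table;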

Stage 4

1. Resolve the left TiDB cluster's application domain names to the new TiDB compute nodes

2. Batch-kill the left application's connections on the right TiDB cluster

Scripts batch-killed the tidb connection IDs several times, but a large number of the left application's connections remained on the right tidb nodes; they were only released after the left application was rolling-restarted. A kill sketch follows.
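
A hedged sketch of finding and killing the leftover connections on one tidb node; the user name and connection id are placeholders, and KILL TIDB only affects the tidb instance you are connected to, so repeat it on every tidb node.

 -- list the left application's connections on this tidb instance
 SELECT id, user, host, db FROM information_schema.processlist WHERE user = '<left_app_user>';
 -- kill each id returned above (script this for batches)
 KILL TIDB <id>;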

3. Remove the drainer link that incrementally synchronizes from the old TiDB cluster to the new TiDB cluster

Note: Because multiple TiDB clusters share one well-provisioned drainer machine, node_exporter (the agent that collects machine metrics) is also shared by those clusters; when cluster A's drainer link is stopped, clusters B and C raise alerts that node_exporter is not alive.
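
A hedged sketch of removing the drainer link with tiup; the cluster name and drainer address are placeholders.

 tiup cluster scale-in <cluster-name> --node <drainer-host>:8249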

Summary

  • Unifying the different TiDB versions before the split is necessary: first, a common split procedure reduces the complexity of the split; second, it lets us benefit from new-version features and lowers operation and maintenance costs.
  • The disk space of the target TiDB cluster must be sufficient.
  • When the source TiDB has high write pressure, keeping the binlog incremental synchronization to the destination from lagging requires running drainers concurrently, partitioned by database name.
  • A TiDB split involves many steps; whatever can be done in advance should be done in advance, so the time window for the actual cutover is very short.
  • Thanks to the TiDB official community for their technical support. The road ahead is long, and we will keep exploring.
