Author: Yang Jiaxin
Senior DBA at Dmall, skilled in fault analysis and performance optimization; enjoys exploring new technologies and photography.
Source of this article: original contribution
*This article is original content produced by the Aikesheng open source community and may not be used without authorization. For reprinting, please contact the editor and cite the source.
Background
To improve TiDB availability, Dmall's TiDB clusters, holding hundreds of terabytes in total, needed to be split, each into two clusters.
Challenges
- 12 TiDB clusters currently need to be split, spread across three versions (4.0.9, 5.1.1, and 5.1.2), and the splitting procedure differs by version.
- 5 of the TiDB clusters each hold more than 10 TB of data, and the largest currently holds 62 TB; a single cluster's backup set is large and consumes a lot of disk space and bandwidth (the largest three clusters account for most of the space).
- TiDB is used in various ways (each usage pattern requires a different splitting approach): direct reads and writes on TiDB, MySQL -> TiDB aggregation for analytical queries, and TiDB -> CDC -> downstream Hive.
- Whether a full TiDB backup affects performance during peak business hours.
- Guaranteeing data consistency of the split with such large data volumes.
Solution
Currently, the official synchronization tools provided by TiDB are:
- DM full + incremental (this method cannot be used for TiDB -> TiDB; it applies to MySQL -> TiDB)
- BR full physical backup + CDC incremental synchronization (CDC replication is expensive to repair after tidb/tikv node OOMs: https://github.com/pingcap/tiflow/issues/3061 )
- BR full physical backup + binlog incremental synchronization (similar to the MySQL binlog that records all changes, TiDB Binlog consists of Pump, which records the change log, and Drainer, which replays it; we used this method for the full + incremental split)
Backup and Restore Tool BR: https://docs.pingcap.com/en/tidb/stable/backup-and-restore-tool
TiDB Binlog: https://docs.pingcap.com/en/tidb/stable/tidb-binlog-overview
Because a split based on BR full physical backup + binlog incremental synchronization spans a long period, we divided the work into four stages.
Stage 1
1. Clean up unused data in existing TiDB clusters
Some databases contain unused tables, for example xxxx log tables older than 3 months.
2. Upgrade the 15 existing TiDB clusters in GZ (including the 12 clusters to be split one-into-two) to version 5.1.2
Taking advantage of this split to unify the TiDB versions in GZ addresses challenge 1.
Stage 2
1. Deploy new TiDB clusters of the same version, 5.1.2, on the new machines
set @@global.tidb_analyze_version = 1;
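A minimal sketch of deploying a same-version cluster with tiup (the cluster name, topology file, and deploy user are placeholders, not from the original article); the tidb_analyze_version setting above is then applied on the new cluster:
# deploy and start the new 5.1.2 cluster from a prepared topology file
tiup cluster deploy tidb-clusterA-new v5.1.2 topology.yaml --user tidb
tiup cluster start tidb-clusterA-new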
2. Mount NFS on all TiKV and TiFlash nodes on both the source and destination sides, and install BR on a PD node
External storage uses a Tencent Cloud NFS network disk, so the TiKV backup destination and the full-restore source can be the same directory; NFS disk space grows automatically and dynamically, and rate-limited backups address challenge 2.
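A minimal sketch of the mount (the NFS endpoint and mount point are placeholders, not from the original article); the same directory is mounted on every tikv/tiflash node and on the PD node that runs BR:
# mount the shared NFS directory (endpoint 10.0.0.x is a placeholder)
mkdir -p /tidbbr
mount -t nfs -o vers=4.0,noresvport 10.0.0.x:/ /tidbbr
# optionally persist the mount across reboots
echo "10.0.0.x:/ /tidbbr nfs vers=4.0,noresvport 0 0" >> /etc/fstab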
3. Deploy the Pump components of the 12 TiDB clusters independently on 3 machines
Pump records the binlogs (different ports distinguish the different TiDB clusters); pump and drainer run on dedicated 16-core / 32 GB machines to guarantee maximum incremental synchronization performance.
Note: to keep the tidb compute nodes available, the key binlog parameter ignore-error must be set ( https://docs.pingcap.com/zh/tidb/v5.1/tidb-binlog-deployment-topology )
server_configs:
tidb:
binlog.enable: true
binlog.ignore-error: true
4. Modify the GC time of the pump component to 7 days
Binlogs are retained for 7 days to ensure that the full backup can be seamlessly followed by incremental synchronization.
pump_servers:
- host: xxxxx
config:
gc: 7
# A reload/restart of the tidb nodes is required for binlog recording to take effect
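With tiup, applying the binlog settings above to the tidb nodes is roughly the following (the cluster name is a placeholder, not from the original article):
# reload only the tidb role so binlog recording takes effect
tiup cluster reload tidb-clusterA -R tidb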
5. Back up the full data of each TiDB cluster to NFS; see the Backup & Restore FAQ ( https://docs.pingcap.com/zh/tidb/v5.1/backup-and-restore-faq )
Note: each TiDB cluster uses its own backup directory on the same NFS.
Note: full backups of the source TiDB clusters are rate-limited per cluster and run off-peak (read/write latency was essentially unaffected before and after backup; when several TiDB clusters were backed up at the same time, the 3 Gbps NFS network bandwidth was once saturated). This reduces the read/write pressure on the source TiDB clusters and on NFS, addressing challenge 4.
mkdir -p /tidbbr/0110_dfp
chown -R tidb.tidb /tidbbr/0110_dfp
# Rate-limited full backup of all business application databases
./br backup full \
--pd "xxxx:2379" \
--storage "local:///tidbbr/0110_dfp" \
--ratelimit 80 \
--log-file /data/dbatemp/0110_backupdfp.log
# Rate-limited backup of specified databases
./br backup db \
--pd "xxxx:2379" \
--db db_name \
--storage "local:///tidbbr/0110_dfp" \
--ratelimit 80 \
--log-file /data/dbatemp/0110_backupdfp.log
On December 30, the full backup of the 45 TB TiDB cluster took 19 hours and occupied 12 TB of space:
[2021/12/30 09:33:23.768 +08:00] [INFO] [collector.go:66] ["Full backup success summary"] [total-ranges=1596156] [ranges-succeed=1596156] [ranges-failed=0] [backup-checksum=3h55m39.743147403s] [backup-fast-checksum=409.352223ms] [backup-total-ranges=3137] [total-take=19h12m22.227906678s] [total-kv-size=65.13TB] [average-speed=941.9MB/s] ["backup data size(after compressed)"=12.46TB] [BackupTS=430115090997182553] [total-kv=337461300978]
6. Synchronize the user and password information from each old TiDB cluster to its new TiDB cluster
Note: BR full backup does not back up TiDB's mysql system database. Application and administrator account grants and passwords can be exported with pt-show-grants from the open-source Percona Toolkit (pt-toolkit).
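A minimal sketch of exporting and replaying the grants (hostnames, ports, and credentials are placeholders, not from the original article):
# export grants from the old cluster with pt-show-grants
pt-show-grants --host=old-tidb.example.com --port=4000 --user=root --ask-pass \
    > /data/dbatemp/old_cluster_grants.sql
# replay them on the new cluster
mysql -h new-tidb.example.com -P 4000 -u root -p < /data/dbatemp/old_cluster_grants.sql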
7. Restore NFS full backup to the new TiDB cluster
Note: the new TiDB cluster needs sufficient disk space. After the full restore, the new cluster occupied several TB more space than the old cluster; according to the PingCAP staff, this is because the SST files generated during restore use lz4 compression, whose compression ratio is lower than that of the old cluster.
Note: tidb_enable_clustered_index and sql_mode must be identical on the new and old TiDB clusters.
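The restore follows the same pattern as the backup commands above; a minimal sketch (the PD address and log-file path are placeholders, not from the original article):
# rate-limited full restore from the NFS backup directory
./br restore full \
    --pd "xxxx:2379" \
    --storage "local:///tidbbr/0110_dfp" \
    --ratelimit 80 \
    --log-file /data/dbatemp/0110_restoredfp.log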
8. Scale out drainer with tiup for incremental synchronization
Before scaling out, confirm that no checkpoint information exists downstream, or that it has been cleared.
If a drainer has previously targeted the downstream, its checkpoint is stored in the tidb_binlog.checkpoint table on the target side and must be cleaned up before redoing the synchronization.
Note: the largest source TiDB cluster has a long-term average write throughput of about 6k TPS. Even after increasing worker-count (the number of replay threads) and resolving the destination domain name to 3 tidb nodes, a single drainer could not catch up with the incremental delay (its replay speed peaked at about 3k TPS). After consulting with PingCAP, we switched to 3 drainers running in parallel, each synchronizing a different set of databases, and the incremental delay caught up (with 3 drainers there is no backlog: data flowing into the source reaches the destination in time).
Note: when multiple drainers perform parallel incremental synchronization, a distinct checkpoint.schema must be specified in each drainer's configuration (one per set of databases).
# Get the TSO at which the full backup started from the backup log
grep "BackupTS=" /data/dbatemp/0110_backupdfp.log
430388153465177629
# Key configuration for the first round: incremental sync with a single drainer
drainer_servers:
- host: xxxxxx
commit_ts: 430388153465177629
deploy_dir: "/data/tidb-deploy/drainer-8249"
config:
syncer.db-type: "tidb"
syncer.to.host: "xxxdmall.db.com"
syncer.worker-count: 550 # later raised (1516) while the single drainer tried to catch up
# Second round: parallel incremental sync with multiple drainers
drainer_servers:
- host: xxxxxx
commit_ts: 430505424238936397 # this TSO is the Commit_Ts from the destination-side checkpoint table after the first single-drainer incremental sync was stopped
config:
syncer.replicate-do-db: [db1,db2,....]
syncer.db-type: "tidb"
syncer.to.host: "xxxdmall.db.com"
syncer.worker-count: 550
syncer.to.checkpoint.schema: "tidb_binlog2"
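With the drainer_servers topology above, the scale-out itself can be run with tiup, roughly as follows (the cluster and topology file names are placeholders, not from the original article):
# scale out the drainer node(s) and verify the cluster topology
tiup cluster scale-out tidb-clusterA-new scale-out-drainer.yaml
tiup cluster display tidb-clusterA-new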
With a single drainer, the incremental delay kept growing.
With three drainers synchronizing in parallel, the slowest incremental link caught up with nearly one day of data in 9 hours.
With 3 drainers in parallel, the destination absorbs about 12k write TPS, exceeding the source's roughly 6k write TPS.
9. Configure domain names for the new cluster's Grafana & Dashboard
The Grafana and Dashboard domain names point to the production Nginx proxy, which proxies the Grafana port and the Dashboard port.
Stage 3
1. Check the consistency of data synchronization in the new TiDB cluster
TiDB performs data consistency checks during both the full and the incremental phases; we mainly watch the incremental synchronization delay and randomly run count(*) on source and destination tables (a spot-check sketch follows the delay-check examples below).
# Delay check method 1: obtain the latest replayed TSO from the drainer status on the source TiDB, then translate it through pd to measure the delay
mysql> show drainer status;
+-------------------+-------------------+--------+--------------------+---------------------+
| NodeID            | Address           | State  | Max_Commit_Ts      | Update_Time         |
+-------------------+-------------------+--------+--------------------+---------------------+
| xxxxxx:8249       | xxxxxx:8249       | online | 430547587152216733 | 2022-01-21 16:50:58 |
tiup ctl:v5.1.2 pd -u http://xxxxxx:2379 -i
» tso 430547587152216733;
system: 2022-01-17 16:38:23.431 +0800 CST
logic: 669
# Delay check method 2: observe the Grafana drainer dashboard
# Compare the current value of tidb-Binlog -> drainer -> Pump Handle TSO with the actual current time to gauge the delay
# The steeper the curve, the faster the incremental synchronization
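A minimal count(*) spot-check sketch (hosts, credentials, and table names are placeholders, not from the original article): compare row counts of the same tables on the source and destination clusters.
# compare source vs destination row counts for a few sampled tables
for t in db1.table_a db1.table_b db2.table_c; do
  s=$(mysql -h src-tidb.example.com -P 4000 -u checker -pxxxx -N -e "select count(*) from $t")
  d=$(mysql -h xxxdmall.db.com      -P 4000 -u checker -pxxxx -N -e "select count(*) from $t")
  [ "$s" = "$d" ] && status=OK || status=DIFF
  echo "$t source=$s dest=$d $status"
done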
2. Create TiFlash replicas and CDC synchronization on the new TiDB clusters, and close the loop of the new mysql -> tidb aggregation synchronization link (DRC-TIDB)
- tiflash
Generate the TiFlash replica statements for the destination from the source TiDB:
SELECT * FROM information_schema.tiflash_replica WHERE TABLE_SCHEMA = '<db_name>' and TABLE_NAME = '<table_name>'
SELECT concat('alter table ',table_schema,'.',table_name,' set tiflash replica 1;') FROM information_schema.tiflash_replica where table_schema like 'dfp%';
- CDC link closed loop
Pick a TSO checkpoint from the old TiDB cluster's CDC synchronization and use it to create the CDC-to-Kafka-topic synchronization on the new TiDB cluster.
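A minimal sketch of creating the new changefeed from the chosen TSO (the PD address, Kafka sink URI, changefeed ID, and START_TS value are placeholders, not from the original article):
# START_TS is the TSO checkpoint taken from the old cluster's changefeed
START_TS=430xxxxxxxxxxxxxxxx
./cdc cli changefeed create \
    --pd="http://xxxx:2379" \
    --sink-uri="kafka://kafka.example.com:9092/topic-name" \
    --start-ts=${START_TS} \
    --changefeed-id="dfp-to-kafka"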
- DRC-TIDB link closed loop (a self-developed tool that merges sharded MySQL databases and tables into TiDB)
(Figure: DRC-TIDB state before and after the split.)
1. Copy the synchronization rules of the left drc-tidb to the new drc-tidb on the right, but do not start drc-tidb synchronization yet (record the current time as T1)
2. The drainer link that synchronizes the existing TiDB data to the newly built TiDB enables safe-mode inserts with REPLACE (syncer.safe-mode: true)
3. Change the synchronization destination address of the left drc-tidb so the link forms a closed loop, and start drc-tidb (record the current time as T2)
4. In the right TiDB cluster's Grafana drainer dashboard, check whether the current synchronization checkpoint time is >= T2 (similar in spirit to a tikv follower read); if not, wait for the delay to catch up
5. Edit the drainer configuration (edit-config) of the right TiDB cluster's incremental sync: remove the database names that are synchronized from MySQL (switching from replicating all databases to an explicit list of database names), then reload the drainer node (tiup commands are sketched after the config snippet below)
commit_ts: 431809362388058219
config:
syncer.db-type: tidb
syncer.replicate-do-db:
- dmall_db1 # this DB is read/written directly on TiDB
- dmall_db2 # this DB is synced from MySQL and needs to be removed
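The edit and reload themselves can be done with tiup, roughly as follows (the cluster name is a placeholder, not from the original article):
# edit the topology config, then reload only the drainer role
tiup cluster edit-config tidb-clusterA-new
tiup cluster reload tidb-clusterA-new -R drainer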
6. Change the synchronization destination address of the right drc-tidb so the link forms a closed loop, and start the right drc-tidb (drc-tidb synchronizes idempotently, so it re-consumes the MySQL binlog from time T1, when the synchronization rules were copied, up to the present)
3. Run ANALYZE TABLE on each new TiDB cluster to refresh table statistics
Not strictly required, but bringing the statistics up to date avoids bad index selection for query SQL; a generation sketch follows.
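A minimal sketch of generating and running the ANALYZE statements (the host, credentials, and the 'dfp%' schema filter are placeholders, not from the original article):
# generate ANALYZE TABLE statements for the business databases and execute them
mysql -h new-tidb.example.com -P 4000 -u root -pxxxx -N -e \
  "select concat('ANALYZE TABLE ', table_schema, '.', table_name, ';')
     from information_schema.tables
    where table_schema like 'dfp%';" \
  | mysql -h new-tidb.example.com -P 4000 -u root -pxxxx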
Stage 4
1. Repoint the left TiDB cluster's application domain name to the new TiDB compute nodes
2. Batch-kill the left application's connections on the right TiDB cluster
Scripts batch-killed the TiDB connections several times, yet a large number of left-application connections remained on the right TiDB nodes; only after the left application was rolling-restarted were those connections on the right TiDB nodes released. A kill sketch follows.
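A minimal sketch of the batch kill (node list, application host prefix, and credentials are placeholders, not from the original article): on each tidb node of the right cluster, kill connections coming from the left application.
for node in tidb-r1.example.com tidb-r2.example.com tidb-r3.example.com; do
  # generate KILL TIDB statements for connections from the left application's subnet and run them
  mysql -h "$node" -P 4000 -u root -pxxxx -N -e \
    "select concat('KILL TIDB ', id, ';') from information_schema.processlist
      where host like '10.0.1.%';" \
  | mysql -h "$node" -P 4000 -u root -pxxxx
done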
3. Remove the drainer link that incrementally synchronizes from the old TiDB cluster to the new TiDB cluster
Note: because multiple TiDB clusters share one high-spec drainer machine, node_exporter (the machine-monitoring agent) is also shared by those clusters. When cluster A's drainer link is stopped, clusters B and C raise a node_exporter-not-alive alert.
Summary
- Unifying the versions of the different TiDB clusters before splitting is necessary: first, a common splitting procedure reduces complexity; second, the new version's features reduce operations and maintenance costs.
- The disk space of the target TiDB cluster must be sufficient
- When write pressure on the source TiDB is high, keeping the binlog incremental synchronization delay to the destination under control requires running multiple drainers concurrently, partitioned by database name.
- A TiDB split involves many steps; do everything that can be prepared in advance ahead of time, so the time window for the actual cutover stays very short.
- Thanks to the official TiDB community for their technical support. The road ahead is long; we will keep exploring.