TiFlash is a key component of TiDB's HTAP architecture. It is a columnar storage extension of TiKV that provides good workload isolation while also guaranteeing strong consistency. Its replicas are replicated asynchronously through the Raft Learner protocol, but at read time it achieves the Snapshot Isolation consistency level through the Raft read index and MVCC. This architecture solves the two problems of HTAP scenarios: workload isolation and keeping the columnar replicas in sync.
Before using TiFlash, you need to add TiFlash replicas to the tables you want to query. Many users have reported problems at this step: the TiFlash replica stays unavailable. The official documentation summarizes some basic troubleshooting steps.
This article explains how adding TiFlash replicas to tables works in current versions of TiDB (all currently released 4.x and 5.x versions). It is mainly intended to help DBAs troubleshoot related problems: where to collect information, and how to start solving them.
Basic concepts
From PD's perspective, a TiFlash instance is similar to a TiKV instance: both are stores, except that a TiFlash store carries the label `engine=tiflash`. After a TiFlash replica is added, PD relies on the Placement Rules feature to schedule Regions onto TiFlash and to ensure those Region peers exist only as learners.
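To make the mechanism concrete, here is a sketch of what the placement rule PD receives for one table might look like, following PD's Placement Rules schema (`group_id`, `id`, `start_key`/`end_key`, `role`, `count`, `label_constraints`). The rule id, key encoding, and helper name are illustrative assumptions, not the exact values TiFlash uses; real rules carry encoded hex keys.

```python
# Sketch only: the rule id and key strings below are illustrative placeholders,
# not the exact encoded values TiFlash writes to PD.
def tiflash_rule(table_id: int, replica_count: int) -> dict:
    return {
        "group_id": "tiflash",              # rule group for TiFlash-related rules
        "id": f"table-{table_id}-r",        # hypothetical rule id for this table
        "override": True,
        "start_key": f"t_{table_id}_r",     # record-data range of the table (real keys are hex-encoded)
        "end_key": f"t_{table_id + 1}_",
        "role": "learner",                  # TiFlash peers join Regions only as Raft learners
        "count": replica_count,             # number of TiFlash replicas requested
        "label_constraints": [
            # restrict placement to stores labeled engine=tiflash
            {"key": "engine", "op": "in", "values": ["tiflash"]},
        ],
    }
```

The `label_constraints` entry is what ties the rule to the `engine=tiflash` store label described above, and `role: learner` is what keeps TiFlash out of Raft quorum.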
The TiFlash instance contains a modified build of the TiKV code, which mainly cooperates with TiKV to handle Raft-layer operations; its log output is essentially the same as TiKV's. In a TiUP deployment, this log goes to tiflash_tikv.log.
The TiFlash instance also periodically starts a child process to handle operations related to adding and removing TiFlash replicas. If you occasionally see a short-lived process named tiflash_cluster_manager (called "pd buddy" in the official documentation) in the process list, that is normal. Its log goes to tiflash_cluster_manager.log.
TiFlash internal component architecture diagram
What each component in the cluster does at each stage of adding a TiFlash replica
Sequence diagram for adding a TiFlash replica
Executing the DDL that modifies the replica count
When `ALTER TABLE ... SET TIFLASH REPLICA ...` is run, the statement is executed as a DDL statement.
The synchronization process, from progress 0.0 to 1.0
TiDB provides an HTTP interface through which other components can query which tables have TiFlash replicas: `curl http://<tidb-ip>:<tidb-status-port>/tiflash/replica`.
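As an illustration of how a client might consume this interface, here is a minimal sketch that parses the response and lists tables whose replicas are not yet available. The field names used (`id`, `replica_count`, `available`) and the sample payload are assumptions for illustration; check the actual JSON returned by your TiDB version.

```python
# Sketch only: field names ("id", "replica_count", "available") and the sample
# payload are assumptions; verify against your TiDB version's actual output.
import json

def unavailable_tables(body: str) -> list:
    """Return the ids of tables whose TiFlash replicas are not yet available."""
    return [t["id"] for t in json.loads(body) if not t["available"]]

# Hypothetical response body for two tables, one still unavailable.
sample = ('[{"id": 45, "replica_count": 2, "available": true},'
          ' {"id": 48, "replica_count": 2, "available": false}]')
```

A monitoring script could poll the interface and alert on tables that stay in the unavailable list for too long.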
TiFlash runs periodic tasks that are responsible for:
- Pulling from TiDB's tiflash/replica interface which tables/partitions have TiFlash replicas. For tables that are not yet available, if the table has no corresponding Placement Rules on PD, the task sets up the rule, whose key range covers the table's record data (keys with the prefix t_<table_id>_r).
- For tables that are not yet available, pulling from PD the region_ids corresponding to that key range, and counting how many of those region_ids have been synchronized across all online TiFlash stores.
- Updating the synchronization progress, computed as the deduplicated number of region_ids in the TiFlash stores divided by the number of region_ids in PD, by sending a POST request to TiDB's tiflash/replica interface.
- If a placement rule exists on PD but the corresponding table_id no longer appears in tiflash/replica, the table/partition has been dropped and its GC lifetime has passed, so the task removes the corresponding rule from PD.
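The progress calculation in the steps above can be sketched as follows. The function name and argument shapes are our own; the point is only the deduplicate-then-divide logic: a Region synced on two TiFlash stores still counts once.

```python
# Sketch of the progress calculation described above; helper names are our own.
def replica_progress(pd_region_ids, flash_region_ids_per_store):
    """pd_region_ids: region ids PD reports for the table's key range.
    flash_region_ids_per_store: one list of synced region ids per TiFlash store."""
    synced = set()
    for ids in flash_region_ids_per_store:
        synced.update(ids)                      # deduplicate across stores
    total = len(set(pd_region_ids))
    if total == 0:
        return 1.0                              # nothing to sync
    # only count regions PD still knows about, capped at total
    return min(len(synced & set(pd_region_ids)), total) / total
```

For example, with four Regions in PD and two stores having synced {1, 2} and {2, 3}, progress is 3/4 = 0.75, not 4/4, because Region 2 is counted once.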
This component's log output is tiflash_cluster_manager.log. If there are multiple TiFlash nodes in the cluster, one of them is elected through PD's built-in etcd to run the tasks above. When checking logs, you need to find which node was responsible during the relevant time window, or collect the logs of all TiFlash nodes.
PD's behavior: after receiving the placement rules, PD will:
- First split the Regions so that no Region boundary spans both table data and index data (because TiFlash only synchronizes the record-data part of a table)
- Dispatch an AddLearner operator to the Leader of each Region, targeting the TiFlash store
TiKV's behavior:
- The Region Leader in TiKV accepts and executes PD's AddLearner command
- The Region Leader sends the Region data to TiFlash's Region peer in the form of a Raft Snapshot
The Add partition process for a partitioned table that already has TiFlash replicas
When TiDB adds a partition to a partitioned table that already has TiFlash replicas, it blocks after the partition is generated (but before it becomes visible to users). The DDL does not finish until TiFlash reports that the partition_id of the new partition is available. (For details, see the related TiDB PR.)
For TiFlash, adding a partition to a partitioned table is similar to adding an ordinary table, so the process above applies. The difference is that an extra accelerate-schedule request is sent to PD to raise the scheduling priority of the Regions in the new partition's key range, so that the partition becomes available sooner and the DDL blocks for less time when the cluster is busy.
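As a sketch of what that extra request looks like: PD exposes a `POST /pd/api/v1/regions/accelerate-schedule` endpoint that takes a key range. The helper below only builds the URL and payload; the address and hex keys are placeholders, and real requests use the hex-encoded start and end keys of the partition's record-data range.

```python
# Sketch, not a verified client: builds the PD accelerate-schedule request.
# pd_addr and the hex keys passed in are placeholders.
def accelerate_schedule_request(pd_addr: str, start_key_hex: str, end_key_hex: str):
    url = f"http://{pd_addr}/pd/api/v1/regions/accelerate-schedule"
    payload = {"start_key": start_key_hex, "end_key": end_key_hex}
    return url, payload
```

The effect is that PD's scheduler prioritizes the Regions in this range when dispatching AddLearner operators, shortening the window during which the Add partition DDL stays blocked.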
Why the Add partition operation on a partitioned table needs to block:
- If the Add partition DDL did not block, then when a user runs a query (such as count(*)) that chooses to read from TiFlash while the Regions of the new partition have not yet established TiFlash replicas, the query fails because of those few missing Regions. To the user, queries against the table look unstable and error-prone whenever Add partition is running.
- To avoid this query instability, the Add partition operation blocks, and the partition only becomes readable after the Regions of the newly created partition have finished creating their TiFlash replicas.
Troubleshooting directions for problems at different stages (examples)
Executing `ALTER TABLE ... SET TIFLASH REPLICA ...` is stuck
Generally, this DDL operation only modifies meta-information in TiDB and should not block for long. If the statement gets stuck, check whether other DDL operations are blocking it (for example, an ADD INDEX operation on the same table). For more information, see the DDL-stuck troubleshooting experience in the TiDB FAQ.
The replica count is modified successfully, but progress stays at zero, or progress advances but is "slow"
- First rule out basic problems by following the official documentation section on TiFlash replicas staying unavailable
- If the above checks pass, look at the tiflash_cluster_manager.log log first and check whether the connections to TiDB or PD are abnormal. If they are, first confirm whether API queries to the relevant component time out (`curl http://<tidb-ip>:10080/tiflash/replica`, see the TiDB-TiFlash synchronization interface) or whether there is a network connectivity problem.
- Then confirm that the problematic table has a placement rule (look for the keyword "Set placement rule" with an id like table-<table_id>-r in tiflash_cluster_manager.log) and that progress information (id, region_count, flash_region_count) is being reported to TiDB. Also confirm that the rule for the table can be queried on PD (see the Placement Rules documentation).
- Pin down what "slow" synchronization actually looks like for the table in question: has its flash_region_count "not changed" for a long time, or is it merely "changing slowly" (for example, gaining a few Regions every few minutes)?
- If there is "no change", you need to find where in the whole working link the problem occurs.
Check the link "TiFlash sets the rule on PD -> PD sends AddLearner scheduling to the Region leader in TiKV -> TiKV synchronizes Region data to TiFlash" for problems, and collect the logs of the related components.
You can check the warn/error messages in the tikv and tiflash-proxy logs to confirm whether there are errors such as network isolation.
- If progress is "changing slowly", check the current load of TiFlash and PD's scheduling.
Mainly watch the TiFlash-Summary dashboard in Grafana. Under Raft, "Applying snapshots Count", "Snapshot Predecode Duration", and "Snapshot Flush Duration" reflect how many snapshots TiFlash is applying concurrently and how long applying them takes. Under Storage Write Stall, "Write Stall Duration" shows whether writes are frequent enough to cause write stalls. Also collect CPU and disk IO load, and the TiFlash logs.
For tuning PD-related scheduling parameters, see: PD Scheduling Parameters.
The Add partition process on a partitioned table that already has TiFlash replicas is stuck
According to the comment in the PR, if the DDL is blocked because TiFlash has not yet established the replica, a log line "[ddl] partition replica check failed" is printed. From there, the troubleshooting directions are roughly: whether many other Regions are creating TiFlash replicas at the same time, the pressure of TiFlash apply snapshot, whether PD's scheduling priority took effect, and so on.
Appendix
Some APIs that help with troubleshooting:
Query TiFlash replicas, progress, etc. in TiDB
select * from information_schema.tiflash_replica
View recently executed or still-pending DDL jobs
admin show ddl jobs
API for obtaining TiFlash replica information from TiDB (the main interface for interacting with TiFlash): `curl http://<tidb-ip>:10080/tiflash/replica`
Query a table's Region information in TiDB
SHOW TABLE <table_name> REGIONS;
Query the Region information corresponding to a table_id on a single TiFlash node
echo "DBGInvoke dump_all_region(<table_id>,true)" | curl "http://<tiflash-ip>:<tiflash-http-port>/?query=" --data-binary @-
Query Region information in PD
tiup ctl pd -u http://<pd-ip>:<pd-port> region
Query Placement-rules information in PD
tiup ctl pd -u http://<pd-ip>:<pd-port> config placement-rules show