
In the binary world of 0s and 1s, developers and engineers face a constant stream of problems in the pursuit of faster systems and better server performance.

"Big Data Cost Reduction and Efficiency Improvement" is a series in which we summarize and share the pitfalls we have hit and the tuning skills we have accumulated in real-world big data work, as a reference for fellow practitioners. This article is the third installment of the series. It shares our hands-on experience of achieving a thousand-fold performance improvement in TiDB through tuning.

Getui's bond with TiDB

As a data intelligence company, Getui provides developer services such as message push for hundreds of thousands of apps, and delivers professional digital solutions to customers across many industries. As the business grew rapidly, so did the company's data volume. Over time, MySQL could no longer keep up with our need for fast queries and analysis over ever-larger datasets. We therefore began looking for a new database that supports horizontal elastic scaling, handles high-concurrency and massive-data scenarios well, and is highly compatible with MySQL.

After in-depth research, we found that TiDB, a database much talked about in the community, not only has all of the above characteristics but is also a cloud-native distributed database with financial-grade high availability, strong data consistency, and real-time HTAP support. We therefore decided to switch from MySQL to TiDB, hoping to keep data queries fast even as storage volume keeps growing, and to meet internal and external customers' needs for efficient data analysis, for example, providing developers with timely reports on push volume, delivery rate, and other metrics to help them make data-driven decisions.

Once the selection was settled, we started the data migration. The MySQL database instances to be migrated held several terabytes of data in total. We used TiDB's ecosystem tool Data Migration (DM) to migrate both full and incremental data:

Full data migration: migrate the table structures of the relevant tables from the data source to TiDB, then read the existing data and write it into the TiDB cluster.

Incremental data replication: after the full migration is complete, read subsequent changes to those tables from the data source and write them into the TiDB cluster.

Getui migrates MySQL data to TiDB

After data synchronization had run stably for a while, we gradually migrated the applications to the TiDB cluster. Once the last application was migrated, we stopped the DM cluster, completing the data migration from MySQL to TiDB.

Note: For the specific configuration and usage of DM, please refer to the official documentation.

Falling into a TiDB "anti-pattern"

However, after all applications were migrated to TiDB, a series of problems such as slow database response, freezes, and application unavailability occurred.

As shown below:

Lag when logging in to the database

Through investigation, we found a large number of slow SQL statements that used LOAD DATA to import data.

Slow import SQL taking dozens of minutes

After communicating with the business team, we found that some import statements contained tens of thousands of records and took dozens of minutes to complete.
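
The imports followed the LOAD DATA pattern, roughly like the sketch below (the table name, columns, and file path are hypothetical, for illustration only):

-- Hypothetical sketch of the batch-import pattern; all names are made up.
LOAD DATA LOCAL INFILE '/data/push_report_20220301.csv'
INTO TABLE push_report
FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
(app_id, msg_id, push_cnt, arrive_cnt, report_date);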

With MySQL, an equivalent import used to finish in a few minutes or even tens of seconds, but after migrating to TiDB it took twice or even several times as long. A TiDB cluster of several machines was performing worse than a single MySQL machine.

This was clearly not the right way to use TiDB; we needed to find the cause and optimize.

Single server load is too high

Checking the monitoring, we found that the load was concentrated on one machine (as shown in the figure above, the server in the red box bears most of the pressure). In other words, we were not making full use of all the resources and were failing to realize TiDB's performance advantages as a distributed database.

The right way to use TiDB

First, optimize the configuration parameters

So how did we optimize? We started with the configuration parameters. Many parameters were still at their system defaults, which do not make the best use of a server's capabilities. After an in-depth reading of the official documentation and multiple rounds of testing, we made targeted adjustments to TiDB's configuration parameters to fully utilize server resources and reach the performance we expected.

The following table describes how we adjusted the TiDB configuration parameters, for reference:
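
For example, concurrency-related system variables can be raised so that queries use more of the available CPU cores. The variables below exist in TiDB, but the values are illustrative examples rather than our exact settings; tune them against your own hardware:

-- Illustrative values only; tune against your own hardware.
SET GLOBAL tidb_distsql_scan_concurrency = 30;  -- default 15; parallelism of distributed scans
SET GLOBAL tidb_index_lookup_concurrency = 8;   -- default 4; parallelism of index lookups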

Focus on solving the hotspot problem

Adjusting configuration parameters was only the basic step; we still had to fundamentally solve the problem of load being concentrated on a single machine. To do that, we first needed a deep understanding of TiDB's architecture and of how table data is stored inside TiDB.

In TiDB's architecture, the distributed storage engine TiKV is responsible for storing data. TiKV splits data by key range, and the smallest unit of splitting is the Region. Each Region has a size limit (96 MiB by default) and multiple replicas, and each group of replicas forms a Raft group. Within each Raft group, the leader handles the reads and writes for that data range. Leaders are automatically and evenly scheduled across physical nodes by the PD component (Placement Driver, the management module of the whole cluster) to spread read/write pressure and achieve load balancing.
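
You can observe how a table's Regions and their leaders are distributed with the SHOW TABLE ... REGIONS statement (here t is a placeholder table name):

-- Lists each Region of table t: its key range, its leader, and the TiKV store holding the leader.
SHOW TABLE t REGIONS;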

TiDB architecture diagram (picture from TiDB official website)

TiDB assigns a TableID to each table, an IndexID to each index, and a RowID to each row (by default, if the table has an integer primary key, the primary key value is used as the RowID). Rows of the same table are stored in a key range prefixed with the TableID and are ordered by RowID. During inserts, if the RowID values keep increasing, new rows can only be appended at the end of the range.
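
Conceptually, the TiKV keys are laid out like this (a simplified sketch of the encoding):

-- Row data:   t{TableID}_r{RowID}             => [col1, col2, ...]
-- Index data: t{TableID}_i{IndexID}_{value}   => RowID
-- Rows are sorted by RowID, so monotonically increasing RowIDs
-- always land at the tail of the table's key range.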

When a Region reaches the size limit, it splits. But even after splitting, monotonically increasing inserts still land at the tail of the table's key range, so all inserts keep hitting the same (last) Region, forming a hotspot (that is, a single point under excessive load). We had fallen into a classic TiDB anti-pattern.

The common auto-increment primary key is exactly such a monotonically increasing value. By default, an integer primary key is used as the RowID, so the RowID also increases monotonically, and a large volume of inserts creates a write hotspot on the table. Moreover, when the primary key is not an integer type, TiDB by default still generates a monotonically increasing implicit RowID, so the write hotspot problem occurs all the same.

When we used MySQL, for convenience we habitually used auto-increment IDs as table primary keys. After migrating the data from MySQL to TiDB, we kept the original table structures, so the auto-increment IDs were still the primary keys. This caused TiDB write hotspots during batch imports, with Regions splitting continuously and consuming a lot of resources.

To fix this, we started from the table structure: we rebuilt the tables that used an auto-increment ID as the primary key, dropped the auto-increment ID, and let TiDB's implicit _tidb_rowid column serve as the row identifier. The original definition:

create table t (a int primary key auto_increment, b int);

was changed to:

create table t (a int, b int) SHARD_ROW_ID_BITS = 4 PRE_SPLIT_REGIONS = 2;

By setting SHARD_ROW_ID_BITS, RowIDs are scattered and written to multiple different Regions, alleviating the write hotspot problem.

It should be noted here that the SHARD_ROW_ID_BITS value determines the number of shards:
SHARD_ROW_ID_BITS = 0 means 1 shard
SHARD_ROW_ID_BITS = 4 means 16 shards
SHARD_ROW_ID_BITS = 6 means 64 shards

Setting SHARD_ROW_ID_BITS too high amplifies the number of RPC requests, increasing CPU and network overhead, so we set it to 4 here.
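
For existing tables that are hard to rebuild, the same setting can also be applied in place; note that it only scatters rows inserted afterwards, and it cannot be used on tables whose integer primary key serves as the RowID:

-- Scatter the RowIDs of subsequent inserts across 2^4 = 16 shards.
ALTER TABLE t SHARD_ROW_ID_BITS = 4;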

PRE_SPLIT_REGIONS pre-splits Regions evenly right after the table is created. Setting PRE_SPLIT_REGIONS = 2 pre-splits the table into 2^2 = 4 Regions as soon as it is created.
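
After creating the table, you can confirm the pre-split took effect (t again being the placeholder name):

-- A fresh table built with PRE_SPLIT_REGIONS = 2 should show 4 Regions,
-- with leaders spread across different TiKV stores.
SHOW TABLE t REGIONS;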

Experience summary

· Going forward, do not use auto-increment primary keys for new tables; consider business primary keys instead

· Add the parameters SHARD_ROW_ID_BITS = 4 and PRE_SPLIT_REGIONS = 2 when creating tables

In addition, because TiDB's optimizer differs from MySQL's, the same SQL statement that ran fine on MySQL could execute slowly on TiDB. We analyzed the specific slow SQL in depth and applied targeted index optimization, with good results.
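
The general workflow is sketched below; the table, filter columns, and index name are hypothetical:

-- Inspect the actual execution plan of a slow statement.
EXPLAIN ANALYZE SELECT * FROM push_report WHERE app_id = 10001 AND report_date = '2022-03-01';
-- If the plan shows a full table scan, add an index on the filter columns.
CREATE INDEX idx_app_date ON push_report (app_id, report_date);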

Optimization results

On the slow-SQL query platform, we can see that after optimization most imports complete in seconds, a thousand-fold performance improvement over the original dozens of minutes.


Slow SQL optimization results

At the same time, the performance monitoring charts show that under high load, several machines ramp up together rather than a single one, which proves our optimization is effective and TiDB's advantages as a distributed database are finally coming through.


Server load is balanced after optimization

Summary

As a new distributed relational database, TiDB provides a one-stop solution for both OLTP (Online Transactional Processing) and OLAP (Online Analytical Processing) scenarios. Getui not only uses TiDB for efficient queries over massive data, but also performs real-time data analysis and insight on top of it.

For more content in the "Big Data Cost Reduction and Efficiency Improvement" series, stay tuned to the Getui technical practice official account.

