"When users use software, they face two gulfs: one is the gulf of execution, where the user has to figure out how to operate and 'talk to' the software; the other is the gulf of evaluation, where the user has to interpret the result of the operation." As PingCAP co-founder and CTO Huang Dongxu put it in "Making Basic Software That People Love": "Our mission as designers is to help users bridge these two gulfs, which correspond to observability and interactivity."
On November 30, 2021, TiDB 5.3.0 was officially released. This version introduces Continuous Profiling (currently an experimental feature), which bridges the observability gulf by giving users performance insight at the level of database source code and helps answer questions about database behavior thoroughly.
While improving observability, TiDB 5.3.0 also delivers significant gains in HTAP performance and stability, and greatly improves data migration efficiency, high availability, and ease of use, bringing substantial benefits to all users.
5.3.0 Feature highlights and user value
- Supports Continuous Profiling, leading the trend of database observability
- Deeply optimizes distributed timestamp acquisition to improve overall system performance
- Continues to optimize the storage and compute engines to provide more agile and reliable HTAP services
- Further reduces the latency of synchronizing data from upstream systems to TiDB, supporting business growth during peak periods
- Adds parallel import to improve the efficiency of full data migration
- Introduces temporary tables, so one SQL statement can simplify business logic and improve performance
Supports Continuous Profiling, leading the trend of database observability
About 30% of the IT failures that enterprises encounter are database-related. When these failures also involve application systems, network environments, and hardware, recovery can take hours, damaging business continuity and hurting user experience and even revenue. In complex distributed systems, improving database observability, helping operations staff diagnose problems quickly, and streamlining troubleshooting have long been major challenges for enterprises. In TiDB 5.3.0, PingCAP is the first in the database field to launch Continuous Profiling (currently an experimental feature), providing enterprises with performance insight at the level of database source code.
Continuous Profiling continuously snapshots the database's internal runtime state (much like a CT scan) with a performance overhead below 0.5%, and presents resource consumption at the system-call level as flame graphs, turning the database from a black box into a white box. After enabling Continuous Profiling with one click in TiDB Dashboard, operations staff can quickly locate the root cause of performance problems, whether current or historical.
- When the database goes down unexpectedly, diagnosis time can be cut by at least 50%
In one case in the Internet industry, when an alert showed that a customer cluster's business was affected, operations staff struggled to find the root cause because no continuous profiling results were available, and it took 3 hours to locate the problem and restore the cluster. With TiDB's Continuous Profiling, they could have compared profiling results from normal operation with those from the failure and restored the business in only 20 minutes, greatly reducing losses.
- Enables cluster inspection and performance analysis services to keep clusters running stably
Continuous Profiling is key to the TiDB cluster inspection service, which offers commercial customers cluster inspection and reporting of inspection results. Customers can discover and locate potential risks on their own, apply the suggested optimizations, and keep each cluster running stably over time.
- Matches databases to business needs more efficiently during selection
When selecting a database, companies often need to complete functional and performance verification in a short time. Continuous Profiling helps them spot performance bottlenecks more intuitively and run multiple rounds of optimization quickly, ensuring the database fits the company's business characteristics and improving the efficiency of database selection.
Note: Profiling results are stored on the monitoring node and do not affect the nodes that handle business traffic.
Deeply optimizes distributed timestamp acquisition, providing strong support for processing massive business data
When an Internet company's core business system has a huge number of users and a large volume of business data, high-concurrency access may increase the latency of acquiring database timestamps, leading to slower business responses, frequent timeouts, and a sharp decline in user experience. Massive business data demands good database scalability. TiDB scales horizontally by design, but as business data keeps growing, timestamp acquisition gradually becomes a bottleneck that limits how far the cluster can expand.
To further improve timestamp acquisition, TiDB 5.3.0 keeps the original global timestamp management and adds two timestamp-related tuning parameters. When PD load reaches a bottleneck, they can effectively reduce the load, lower timestamp acquisition latency, and greatly improve overall system performance:
- A PD follower proxy switch: when enabled, followers are allowed to batch and forward timestamp requests.
- A maximum wait time for the PD client to batch timestamp requests, which increases the processing bandwidth of timestamp requests.
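Assuming these two knobs correspond to the `tidb_enable_tso_follower_proxy` and `tidb_tso_client_batch_max_wait_time` system variables introduced in 5.3.0, enabling them could look like the following sketch (the wait value is illustrative; tune it for your workload):

```sql
-- Allow PD followers to batch and forward TSO requests to the PD leader.
SET GLOBAL tidb_enable_tso_follower_proxy = ON;

-- Let the PD client wait up to 2 ms to batch more TSO requests together,
-- trading a little per-request latency for higher request bandwidth.
SET GLOBAL tidb_tso_client_batch_max_wait_time = 2;
```

Both variables only help when PD is the bottleneck; on lightly loaded clusters, the extra waiting and forwarding can slightly increase latency.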
With this optimization, TiDB can better support scaling out large clusters in the hundreds-of-TB or million-QPS range. In a Sysbench test with 512 threads, the TiDB cluster's overall QPS throughput increased by more than 100% after the timestamp processing optimization. The test environment was as follows:
| Role | Quantity | Specification |
|---|---|---|
| TiDB | 26 | 8 cores |
| PD | 3 | 4 cores |
| TiKV | 5 | 12 cores |
This optimization is applicable to the following scenarios:
- Large clusters in the hundreds of TB or above one million QPS that need to keep scaling out.
- Medium-sized clusters whose data has multiplied with rapid business growth and that need effectively unlimited scalability.
Continues to optimize the storage and compute engines to provide more agile and reliable HTAP services
In large logistics and financial services companies, scenarios such as online transactions and real-time business monitoring have high requirements for data consistency and freshness; heavy mixed read/write workloads in particular pose a serious challenge to the stability of the database system. During annual traffic peaks, the write/update and analytics workloads on a data platform often surge severalfold. For example, during the Double Eleven shopping festival, one partner (a leading logistics company) processed more than 250 billion update and insert records per day while also running analytics over massive historical data (5 to 10 billion records).
TiDB HTAP aims to provide a one-stack data service platform for enterprises' large-scale online transaction and real-time analytics applications, improving the timeliness of key services and reducing the complexity of the data technology stack. Building on the existing product, TiDB 5.3.0 further optimizes HTAP performance and stability, greatly improving concurrent query capacity and query execution speed under heavy mixed workloads. The main improvements include:
- Greatly improved performance (50% to 100%), better CPU/memory utilization, and fewer query failures: TiDB 5.3.0 optimizes the columnar storage engine, adjusting its underlying file structure and I/O model and optimizing how node replicas and file blocks are accessed. This alleviates write amplification and improves general code efficiency. Overall, failures caused by insufficient resources under high load are greatly reduced.
- More reliable remote data reads, higher task success rates, and more readable alerts: the MPP compute engine is optimized so that more string/time functions and operators can be pushed down to it, and the storage layer no longer lets internal processes time out waiting for data when write/update transaction volume is high. Alert messages for query requests are also improved, making problems easier to track down and locate.
- Easy node scale-out: in TiDB 5.3.0, the TiDB HTAP architecture can easily scale to 200 nodes or larger clusters as the business grows, with the design guaranteeing no resource conflicts or mutual performance interference between OLTP and OLAP.
- Enhanced operations capabilities: data validation is improved, possible internal processing issues on node restart are resolved, SQL alert messages are further refined, and log collection and retrieval are enhanced.
Synchronizing data to TiDB with low latency to help enterprises keep growing their business
As the business keeps growing, the pressure on an enterprise's order system database grows with it. Huge write traffic on the core transaction database lengthens order submission time and hurts the website's user experience. For this typical scenario, to help companies shorten order submission time, TiDB can serve business queries as a downstream read-only secondary, relieving pressure on the core transaction system.
TiDB Data Migration (DM) is a real-time data synchronization tool that replicates data from MySQL-protocol-compatible databases to TiDB, offloading business queries and reducing write pressure on the front-end order system during peak periods. The high immediacy of transaction scenarios demands extremely low query latency and highly real-time data, which places great demands on DM's synchronization performance.
To keep latency low, the data migration tool DM implements two optimizations in v5.3.0:
- Multiple changes to a single row are merged, reducing the amount of SQL synchronized downstream, improving migration efficiency, lowering data delay, and serving website users' business queries sooner;
- Batched updates are combined into single statements, reducing the number of remote procedure calls. The same number of binlog events can be synchronized faster, reducing delay and giving website users fresher query results.
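As an illustration of the first optimization (the table and values here are hypothetical, and this is a logical sketch rather than DM's actual internal representation), several upstream binlog changes to the same row can be compacted into one equivalent downstream statement:

```sql
-- Three upstream changes to the same row...
INSERT INTO orders (id, status) VALUES (1, 'created');
UPDATE orders SET status = 'paid'    WHERE id = 1;
UPDATE orders SET status = 'shipped' WHERE id = 1;

-- ...can be merged into a single statement applied downstream:
INSERT INTO orders (id, status) VALUES (1, 'shipped');
```

The downstream state is identical, but only one statement travels over the network and executes on TiDB.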
The extremely low synchronization delay keeps data queries on the downstream TiDB fresh. Enterprises can quickly adopt TiDB to strengthen real-time query and analytics capabilities while keeping their existing architecture, extracting data value better and faster without large-scale rework.
In real-world scenarios, under synchronization traffic of 300K QPS, DM's synchronization delay stays below 1 second 99.9% of the time, making it especially suitable for using TiDB as a read-only secondary under heavy business load.
Adds parallel import to improve the efficiency of full data migration
MySQL sharding architectures (multiple databases and tables) are increasingly common, and many enterprises' data volumes have reached the 100 TB level. As data grows, migrating from centralized databases to distributed databases such as TiDB has become inevitable. Until now, however, there was no convenient, efficient tool for migrating 100 TB of existing data.
To solve this problem, TiDB 5.3.0 releases the [TiDB Lightning](https://docs.pingcap.com/en/tidb/dev/tidb-lightning-distributed-import) parallel import feature, providing efficient TiDB cluster initialization. Users can start multiple TiDB Lightning instances at the same time to migrate single-table or multi-table data into TiDB, and import speed scales horizontally with the number of instances, greatly improving data migration efficiency.
The diagram below illustrates parallel import: multiple TiDB Lightning instances import MySQL shard tables into a downstream TiDB cluster.
Parallel import supports a variety of data sources, including text data in CSV/SQL format and data exported from MySQL tables. The maximum supported single-table size is 20 TB to 30 TB. We recommend using 1 to 8 TiDB Lightning instances to import a single table, with each instance ideally handling 2 TB to 5 TB. There is no upper limit on the total size across multiple tables or on the number of Lightning instances used.
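Each Lightning instance is driven by its own configuration file. A minimal sketch follows; the host names and paths are placeholders, and the `incremental-import` setting is an assumption based on the parallel import documentation, so check the docs for your exact version:

```toml
# tidb-lightning.toml (one file per instance; paths/hosts are placeholders)
[lightning]
status-addr = ":8289"

[tikv-importer]
backend = "local"                  # local backend for fast full import
sorted-kv-dir = "/data/sorted-kv"  # local scratch space for sorted KV pairs
incremental-import = true          # assumed requirement when instances import in parallel

[mydumper]
data-source-dir = "/data/export"   # this instance's share of the exported data

[tidb]
host = "tidb.example.com"
port = 4000
user = "root"
```

Each instance is then started with `tidb-lightning -config tidb-lightning.toml`, pointing at a disjoint slice of the source data.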
In testing, 10 TiDB Lightning instances imported 20 TB of MySQL data into TiDB within 8 hours, with a single TiDB Lightning sustaining an import speed of 250 GB/h, an overall efficiency improvement of 8x.
Introduces temporary tables: one SQL statement to simplify business logic and improve performance
Storing temporary intermediate data is not easy
In scenarios with large data volumes, user businesses often need to process huge amounts of intermediate data. If the business needs to reuse a subset of the data repeatedly, users usually save that data temporarily and release it after use. DBAs therefore have to create and drop tables frequently, or even design their own storage structures in the business module to hold intermediate data. This not only increases business complexity but also incurs large memory overhead; if poorly managed, memory leaks can even crash the system.
TiDB temporary tables help users simplify business logic and improve performance
To solve these pain points, TiDB 5.3.0 introduces the temporary table feature. It addresses the temporary storage of intermediate computation results and frees users from frequent table creation and deletion. Users can store intermediate business data in temporary tables, which are automatically cleaned up and recycled after use. This avoids excessive business complexity, reduces table management overhead, and improves performance.
TiDB temporary tables are mainly used in the following business scenarios:
- Caching intermediate business data: after computation, results are dumped to a regular table and the temporary table is released automatically.
- Performing multiple DML operations on the same data within a short time: for example, in an e-commerce shopping cart, adding, modifying, and deleting items, completing checkout, and removing the cart information.
- Quickly importing intermediate data in batches to speed up temporary data import.
- Updating data in batches: import data into temporary tables, modify it, then export it to files.
Create a temporary table with one SQL statement
You can create a temporary table with the CREATE [GLOBAL] TEMPORARY TABLE statement. Temporary table data is kept in memory, and users can cap its memory usage with the tidb_tmp_table_max_size variable.
TiDB provides two kinds of temporary tables, global and local. Either kind can effectively help users simplify business logic and improve performance:
- Global temporary tables:
  - Visible to all sessions in the cluster; the table schema is persistent.
  - Provide transaction-level data isolation: data is valid only within the transaction and is automatically deleted when the transaction ends.
- Local temporary tables:
  - Visible only to the current session; the table schema is not persistent.
  - Allow duplicate names, so users need not design complex table naming rules for the business.
  - Provide session-level data isolation, reducing business design complexity; the temporary table is dropped when the session ends.
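A minimal sketch of both kinds (table and column names are illustrative): a global temporary table declares `ON COMMIT DELETE ROWS` so its data is transaction-scoped, while a local one is created with plain `CREATE TEMPORARY TABLE`:

```sql
-- Global: schema visible to all sessions, data lives only inside a transaction.
CREATE GLOBAL TEMPORARY TABLE cart_calc (
    user_id  BIGINT,
    item_id  BIGINT,
    subtotal DECIMAL(10, 2)
) ON COMMIT DELETE ROWS;

-- Local: visible only to the current session, dropped when the session ends.
CREATE TEMPORARY TABLE batch_stage (
    id      BIGINT PRIMARY KEY,
    payload VARCHAR(255)
);

-- Cap the memory a temporary table may use (bytes; the default is 64 MB).
SET SESSION tidb_tmp_table_max_size = 268435456;
```

Since temporary table data lives in memory, size the cap to the largest intermediate result the business actually produces.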
Concluding remarks
This 5.3.0 release further improves system observability, strengthens the scalability of the distributed database, ensures low-latency data synchronization, greatly improves full data migration efficiency, and improves the stability of real-time analytics. It is an important milestone on TiDB's way to becoming a mature enterprise-grade HTAP platform.
Tang Liu, Chief Architect of PingCAP, said: "TiDB HTAP's mission is not limited to upgrading traditional databases or improving transaction and analytics performance. In essence, TiDB HTAP is an open ecosystem that supports the consumption of data services within the enterprise and takes on the role of building a unified real-time data service platform, bringing users innovation and improvement in both business and architecture."
Every TiDB release and every step forward is inseparable from each user's feedback, each developer's merged PR, and each QA engineer's tests. Thank you all. In subsequent releases, TiDB will continue to strengthen stability and ease of use in large-scale scenarios, stay true to its original aspiration, keep forging ahead, build basic software that people love, and give users an ever better experience.
Check the TiDB 5.3.0 Release Notes, then download and try it now to start your TiDB 5.3.0 journey.