Overview

We are pleased to introduce the latest version of TiDB, 6.0. Although it is an open source database, TiDB has always been positioned as an enterprise-grade, cloud-oriented database, and TiDB 6.0 continues to develop around that theme. In this version we have greatly enhanced manageability as befits an enterprise-grade product, while adding much of the infrastructure that cloud-native databases require.
For enterprise and cloud databases, manageability is as important a dimension as the usual ones of performance, availability, and functionality. Beyond the "hard" capabilities needed to accomplish a user's technical and business goals, whether a database is easy to use weighs heavily in users' choices, and manageability profoundly affects the hidden cost of running it. This is even more pronounced for cloud databases: delivering a database as a cloud service magnifies the importance of manageability, and a database that is easier to operate can serve its users better.
In this direction, TiDB 6.0 introduces the data placement framework (Placement Rules in SQL), adds the enterprise-grade cluster management component TiDB Enterprise Manager, opens a preview of the intelligent diagnosis service PingCAP Clinic, greatly enhances the operability of the ecosystem tools, and gives users more means of dealing with hotspot issues. Together, these efforts give users a smoother and more consistent experience whether they use public cloud services or private deployments, moving TiDB forward along the dimension of a mature enterprise-grade cloud database.
In addition, TiDB 6.0 has made great progress over the previous version this year, fixing 137 issues and incorporating 77 enhancements tempered by demanding real-world environments. TiFlash, whose open sourcing the community has long awaited, is now open source as well, and community developers are welcome to participate.
Comprehensively strengthened manageability

Manageability is an important capability dimension of a database: on the premise of meeting business needs, whether a database is flexible and easy to use determines the hidden cost behind a user's technology choice. That cost can be large or small; it may amount to a complaint, or it may come with catastrophic consequences. While developing this version, we combined customer and market feedback to summarize the manageability problems that still exist, including "daily cluster management is complex and unintuitive", "data storage location is uncontrollable", "the data ecosystem toolkit is hard to use", and "no solutions in the face of hotspots". TiDB 6.0 addresses these issues across the kernel, the data ecosystem toolkit, and enhanced components.
Autonomous Data Scheduling Framework

Let's start with the kernel.
TiDB 6.0 exposes the data scheduling framework (Placement Rules in SQL) to users in SQL form. In the past, TiDB's data scheduling was a black box to users: TiDB decided by itself which node a block of data should live on, regardless of data center boundaries or the differences between machines of different specifications. As a result, TiDB could not respond flexibly in scenarios such as multi-center deployment, separation of hot and cold data, and the buffer isolation required for heavy writes.
Let's look at two interesting scenarios first:
You run a business spanning multiple cities, with data centers in Beijing, Shanghai, and Guangzhou. You want to deploy TiDB across these three data centers, serving the user groups of North, East, and South China respectively, so that users in each region access their data nearby. In previous versions, users could indeed deploy a TiDB cluster across centers, but they could not store the data of different user groups in the corresponding data centers; TiDB would simply spread data across centers by its own logic of balancing hotspots and data volume. Under high-frequency access, user requests were therefore likely to cross regions frequently and suffer high latency.
You want to use a set of dedicated import nodes to isolate the performance jitter caused by importing data, migrating the imported data back to the worker nodes afterwards; or you want a set of low-spec nodes to store cold data and serve low-frequency historical data access. Previously, there were no dedicated means of supporting such use cases.
The open data scheduling framework in TiDB 6.0 provides an interface for freely placing partition-, table-, and database-level data across nodes with different labels. Users can make customized choices about where a table or data partition is stored. In the new version, users can label a set of nodes and define placement constraints against those labels. For example, you could define a placement policy for all TiDB storage nodes located in the New York data center:
CREATE PLACEMENT POLICY newyork CONSTRAINTS = "[+region=nyc]";
Then apply this strategy to the table:
CREATE TABLE nyc_account (id INT) PLACEMENT POLICY = newyork;
In this way, all nyc_account data is stored in the New York data center, and users' data access requests are naturally routed to the local data center.
Similarly, users can tag mechanical disk nodes for cold storage and infrequent access to save costs, and place old data partitions on low-cost nodes.
CREATE PLACEMENT POLICY storeonhdd CONSTRAINTS="[+disk=hdd]";
ALTER TABLE orders PARTITION p0 PLACEMENT POLICY = storeonhdd;
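Once policies are applied, the placement feature also ships SQL statements for inspecting them, for example:

```sql
-- List every object with a placement policy and its scheduling progress.
SHOW PLACEMENT;

-- Show the definition of a single policy.
SHOW CREATE PLACEMENT POLICY storeonhdd;
```

`SHOW PLACEMENT` reports whether the scheduler has finished moving data to satisfy each rule, which is useful after attaching a policy to a large table.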
In addition, this feature can be used for multi-tenant isolation. For example, within one cluster, users can assign different tenants' data to different nodes through placement rules, and each tenant's load is then automatically handled by its own nodes. This gives TiDB the ability to isolate tenants while, with reasonable permission settings, still allowing tenants to access each other's data when needed.
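As a hypothetical sketch of such tenant isolation (the `tenant` label key and the table names here are illustrative assumptions, not part of the release itself):

```sql
-- Assumes the TiKV nodes were started with labels such as tenant=a / tenant=b.
CREATE PLACEMENT POLICY tenant_a CONSTRAINTS = "[+tenant=a]";
CREATE PLACEMENT POLICY tenant_b CONSTRAINTS = "[+tenant=b]";

-- Each tenant's data, and therefore its load, lands only on its own nodes.
CREATE TABLE tenant_a_orders (id INT) PLACEMENT POLICY = tenant_a;
CREATE TABLE tenant_b_orders (id INT) PLACEMENT POLICY = tenant_b;
```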
Although this is a headline feature, the core of the framework was in fact released to users indirectly back in version 4.0, through TiFlash's ability to separate row and column replicas, and it has been iterated and polished for more than a year. So while it is a major change, the framework already has mature production cases. This release fully opens the Placement Rules capability to users in SQL form; beyond solving the problems above, we also hope the community's boundless imagination will uncover more valuable uses.
Dealing with hotspot scenarios

Distributed data architectures face an annoying topic: how to deal with hotspots. Under hotspot data access or lock conflicts, a distributed database cannot exploit the performance advantage of multiple computing nodes, creating performance bottlenecks that affect business stability and application experience. TiDB 6.0 adds a variety of solutions to this class of problem.
Small table cache

Sometimes a user's workload involves both large tables (such as orders) and several small, frequently read tables (such as exchange rate tables), and high-frequency access to the small tables can easily become a performance bottleneck. The small table cache newly introduced in TiDB 6.0 supports explicitly caching such small hotspot tables (exchange rate tables and the like) in memory, greatly improving access performance, raising throughput, and reducing access latency.
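A minimal sketch of using the cache (the table name is hypothetical; `ALTER TABLE ... CACHE` is the statement this feature introduces):

```sql
-- Cache a small, hot, rarely updated table in TiDB memory.
ALTER TABLE exchange_rates CACHE;

-- Turn caching off again, e.g. before running DDL on the table.
ALTER TABLE exchange_rates NOCACHE;
```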
In-memory pessimistic locks

In-memory pessimistic locks greatly reduce resource overhead in pessimistic transaction scenarios by keeping pessimistic locks in memory rather than persisting them, cutting CPU and IO overhead by about 20% and improving performance by about 5%-10%.
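A sketch of toggling the feature, assuming the relevant TiKV configuration item is `pessimistic-txn.in-memory` and that it can be changed online with `SET CONFIG` (otherwise it would be set under `[pessimistic-txn]` in the TiKV configuration file):

```sql
-- Enable in-memory pessimistic locks on the TiKV nodes.
SET CONFIG tikv `pessimistic-txn.in-memory` = 'true';

-- Disable them again if lock memory needs to be reclaimed.
SET CONFIG tikv `pessimistic-txn.in-memory` = 'false';
```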
In addition to the new functions above, TiDB will in the future also provide load-based automatic splitting of hotspot Regions, to improve access throughput for hotspot data and resolve performance problems caused by unexpected hotspot access.
Improved manageability of the data ecosystem toolkit

The data ecosystem toolkit is an important part of the TiDB product, and its manageability is particularly important. In data migration scenarios specifically, when users migrate a large-scale MySQL sharding system, they must configure and manage a great many migration tasks, migration rules, and source and target tables. Day-to-day management of the synchronization environment also frequently requires operations such as monitoring, diagnosing, creating, and deleting synchronization tasks, and command-line operation is usually inefficient and error-prone for complex maintenance or large numbers of tasks. In the new version, DM therefore introduces a web-based graphical management tool to help users manage the entire data migration environment more conveniently. It includes the following functions:
Dashboard: Contains the main monitoring information and running status information of synchronization tasks in DM, helping users to quickly understand the overall running status of the task, as well as important information related to latency and performance.
The data migration task management function helps users monitor, create, delete, and configure replication tasks.
The data migration upstream management function helps users manage upstream configurations in the data migration environment, including adding, deleting, and modifying upstream configurations and monitoring the status of the synchronization tasks corresponding to each upstream.
The migration task detailed information management function allows you to view the specific configuration and status information of synchronization tasks according to the filter conditions specified by the user, including upstream and downstream configuration information, upstream and downstream database names, table names, etc.
The cluster member information management function helps users view the configuration information of the current DM cluster and the status information of each worker.
New management platform and intelligent diagnostic suite
From its first version until now, daily operation and maintenance of TiDB has mainly relied on the command line. Although TiDB has shipped the TiUP tool since 4.0 to install, deploy, and operate TiDB clusters, reducing management complexity, it is still a command-line tool with a high learning cost, which is unsatisfactory for a considerable number of enterprise users. Moreover, users often manage multiple clusters for multiple businesses at the same time, with different configurations and specifications; any cluster change or monitoring task is a big challenge, and an inadvertent slip, logging in to the wrong management node and applying the wrong change, can lead to irreparable losses. We have seen cases where a production cluster was shut down as if it were a test cluster simply because the terminal's pagination was cut wrong. Many enterprise-grade databases now offer a graphical management interface, and TiDB 6.0 introduces one as well: TiEM (TiDB Enterprise Manager).
In the current version, TiEM integrates resource management, multi-cluster management, parameter group management, data import and export, system monitoring, and more. Through TiEM, users can manage multiple clusters from a single interface: scale out or in, back up and restore data, change parameters uniformly, upgrade versions, and switch between primary and standby clusters. TiEM also has built-in monitoring and log management, which makes cluster inspection easy and efficient and eliminates constant switching between multiple tools and monitors. Beyond a convenient unified interface for multi-cluster management, the TiEM platform largely avoids disasters caused by human negligence and relieves users of tedious, error-prone work.
PingCAP Clinic Auto-Diagnosis Service (Preview)
Like other database systems, TiDB has a certain inherent complexity, and diagnosing and solving problems is not always easy. In the cloud, service providers face a large number of users with widely varying usage patterns, which poses new challenges for locating, diagnosing, and fixing cluster problems. To handle diagnosis, location, and repair better and more efficiently, TiDB must approach the problem differently. Ultimately we want the database to be intelligently self-tuning and self-healing, but that is a very ambitious goal.
Traditionally, we rely on engineers and DBAs with expert knowledge for analysis and diagnosis. But can that knowledge be delivered as software to assist daily operation and maintenance? Can it even grow smarter and more capable by continuously accumulating real cases? As a new step for TiDB towards self-service, we hope to provide cluster operation analysis, risk warning, and problem location as intelligent services. Alongside TiDB 6.0, the new version therefore introduces a preview of the intelligent diagnosis service PingCAP Clinic. PingCAP Clinic safeguards stable cluster operation across the entire life cycle: it predicts and reduces the probability of problems, locates and fixes them quickly, and accelerates problem resolution. It integrates diagnostic data collection, intelligent diagnosis, intelligent inspection, and a cloud diagnosis platform, and these functions will be opened to users gradually.
PingCAP Clinic obtains fault information through an information collection component (with the user's permission) and continuously strengthens its capabilities as it troubleshoots problems. PingCAP Clinic will benefit from the cluster operation data contributed by thousands of users in different scenarios: we will keep distilling the rules abstracted from real problems into its intelligent diagnosis and distribute them to TiDB users through online and offline upgrades, so that users continuously gain from the accumulated experience of the entire TiDB community while using TiDB. Foreseeably, as TiDB gains more cloud customers, it will become easier for PingCAP Clinic to keep "learning" and improving itself. As the starting point of an ambitious goal, it deserves your attention and discussion; for more about PingCAP Clinic, please read the official documentation and follow the subsequent progress releases.
Observability for non-experts

As an important part of manageability, observability is something TiDB has been continuously enhancing. Beyond the basic monitoring and metrics that other distributed systems also provide, since 4.0 TiDB has successively released distributed-database-specific features such as Key Visualizer, SQL statement statistics and slow query views, the monitoring relationship diagram, and continuous profiling. These greatly enhance TiDB's observability and help DBAs and engineers better understand how their workloads behave on TiDB, so they can locate problems and tune the system more accurately. But these are, to varying degrees, expert-oriented features that require a certain technical understanding of the system.
Starting from 6.0, we will introduce more non-expert-oriented observability features, so that users who do not know much about distributed databases and TiDB can also troubleshoot system problems. The launch of Top SQL is the first step in implementing the concept.
Top SQL is an integrated, self-service database performance observation and diagnosis feature for operations staff and application developers. Unlike the various expert-oriented diagnostic functions in the existing TiDB Dashboard, Top SQL is aimed entirely at non-experts: you don't need to comb through thousands of monitoring charts looking for correlations, and you don't need to understand internals such as Raft snapshots, RocksDB, MVCC, or TSO. Knowing common database concepts, such as indexes, lock conflicts, and execution plans, is enough to start using it to quickly analyze database load and improve application performance. Together with PingCAP Clinic's automated rules, Top SQL gives users self-service coverage, judged and analyzed by the users themselves, of performance scenarios ranging from the common to the complex and rare.
Top SQL requires no additional configuration: it works out of the box in TiDB 6.0, is integrated into the TiDB Dashboard graphical interface, and does not affect the performance of existing applications on the database. The current version of Top SQL provides 30 days of per-node CPU load data, so you can see at a glance which SQL statements a node's high CPU load comes from and quickly analyze scenarios such as database hotspots and sudden load increases. In future versions we will keep iterating on Top SQL, reorganizing and integrating existing expert functions such as traffic visualization, slow queries, and lock views into it, so that operations staff and application developers can analyze database performance problems more simply and comprehensively through one integrated, non-expert-oriented feature.
More mature HTAP capabilities
TiDB 5.0 laid down the initial architecture of our analytics engine, introducing the MPP execution mode to serve a wider range of user scenarios. Over the past year, TiDB HTAP has withstood severe tests: from the Double Eleven scenario, where hundreds of thousands of TPS of writes were combined with dozens of real-time reports, to high-concurrency data services in which the optimizer automatically routes queries across mixed transactional and analytical workloads. These use cases underpin TiDB HTAP's continued maturation. Compared with TiDB 5.0, the analytics engine TiFlash in the latest version offers:
More operator and function support: Compared with 5.0, the TiDB analytics engine adds more than 110 commonly used built-in functions and several join operators. This lets more computations enjoy the order-of-magnitude performance improvement that the TiDB analytics engine's acceleration brings.
Better threading model: In MPP mode, TiDB used to exercise little control over thread resources. As a result, when the system had to process highly concurrent short queries, the overhead of creating and destroying large numbers of threads prevented it from making full use of the CPU, wasting a great deal of capacity; and when executing complex computations, the MPP engine could occupy too many threads, causing both performance and stability problems. The latest version introduces an elastic thread pool and substantially refactors the way operators hold threads. This makes resource usage in TiDB's MPP mode more reasonable, roughly doubling computing performance for short queries on the same computing resources and delivering better stability under high-pressure queries.
More efficient column storage engine: By adjusting the storage engine's underlying file structure and IO model, optimizing how replicas and file blocks on different nodes are accessed, and reducing write amplification and general code overhead, the column storage engine has become more efficient. Verified in real customer workloads, concurrency improves by 50% to 100% under extremely high mixed read-write load, while CPU and memory utilization drops significantly under the same load.
Enhanced disaster recovery capability

Beyond manageability, TiCDC, a key component for data disaster recovery, has also seen its core capabilities enhanced: by optimizing the entire incremental data processing pipeline and controlling the speed at which transaction logs are pulled, TiCDC has made great progress in disaster recovery for large-scale cluster data.
TiCDC has optimized multiple stages of incremental data processing, such as extraction, sorting, loading, and delivery, reducing the CPU and memory needed to process each table's incremental data and reducing the amount of communication between processes. This greatly improves the stability of TiCDC's data synchronization in large-scale clusters while lowering resource consumption and data latency. Tests in real user scenarios show that in version 6.0, for an upstream cluster of up to 100K tables, with the cluster's row change rate below 20K rows/s and its change volume below 20 MB/s, TiCDC keeps 99.9% of data latency under 10 seconds, with RTO < 5 minutes and RPO < 10 minutes. Even while an upstream TiDB cluster node undergoes a planned upgrade or outage, latency can generally be kept within 1 minute.
In addition, to reduce the performance impact on the upstream cluster during replication and keep data replication transparent to the business, TiCDC adds rate limiting for scanning the upstream cluster's transaction logs. In the vast majority of cases, this keeps TiCDC's impact on the upstream cluster's QPS and average SQL response time below 5%.
Anchoring the enterprise-level edition

TiDB continues to mature with the rhythm of its releases. With TiDB 6.0, we are again adjusting the release model to meet enterprise users' stability requirements. Starting from 6.0, on top of the 2-month iteration cycle, the TiDB release strategy introduces an LTS (Long Term Support) version with a half-year release cycle, offering both a long-term stable version that contains only fixes and a fast-iterating version, to accommodate different version preferences. The LTS version targets users who do not need the newest features but want to keep receiving bug fixes; its life cycle is 2 years. Non-LTS versions iterate faster and are maintained for only 2 months, suiting non-production settings where long-term stability is not required. TiDB 6.1 is planned as the first LTS version.
Outlook

Since cloud services do not emphasize versions, we have not gone into detail about TiDB Cloud above. Even so, it should be clear that 6.0 is not only another step for TiDB towards an enterprise-grade HTAP database, but also a new starting point for TiDB as a cloud database. Topics such as manageability, the data placement framework, and Clinic auto-diagnosis all serve private deployments, but in fact they will have even greater potential in the cloud.
Cloud-native databases are an interesting topic. Through continuous exploration, our understanding of cloud databases keeps deepening: from databases that can run on the cloud, to databases built on cloud infrastructure, to databases that can be operated and maintained on the cloud. Version 6.0 is an important step in putting this understanding into practice. Imagine that, combined with good manageability, a cloud database service supporting thousands of users could collect far more non-sensitive cluster operation data than is possible today. Fed into the database's self-operation and self-service capabilities, that data would improve the user experience while freeing the service's back-end team to concentrate on building the products users need, bringing advantages that private deployments cannot match.
In subsequent versions, we will try to give users an experience beyond what was previously possible, using technologies such as cloud storage services and on-demand starting and stopping of resources. With the power of open source, making users feel that the cloud service is more worthwhile than free private deployment, and turning that into new momentum for us, is a win-win goal for us and the entire community.
Check out TiDB 6.0.0 Release Notes, download and try it now, and start the journey of TiDB 6.0.0 enterprise-level cloud database.