TiDB has some capabilities that differ from ordinary features: they can serve as the basis for building other features and can be combined into new functionality. Such capabilities are called Meta Features.
-- Huang Dongxu, "Thinking about the Value of Basic Software Products"

For a distributed database, how data is distributed and stored in different nodes is always an interesting topic. Do you sometimes expect to have specific control over which nodes your data is stored on?

  • When you run multiple services on the same TiDB cluster to reduce costs, but worry that their workloads will interfere with each other once the data is co-located
  • When you want to increase the number of replicas of important data to improve the availability and reliability of key businesses
  • When you want to place the leaders of hot data on high-performance TiKV instances to improve OLTP performance
  • When you want to separate hot and cold data (storing hot data on high-performance media and cold data on cheaper media) to reduce storage costs
  • When you deploy across multiple data centers and want to store data according to its actual geographical ownership and data center location, reducing long-distance access and transmission

You may already know that TiDB uses the Placement Driver (PD) component to control replica scheduling, with various scheduling policies based on hotspots, storage capacity, and so on. However, this logic used to be almost invisible to users: you could not control where your data was placed. That control is exactly what the Placement Rules in SQL data placement framework in TiDB 6.0 aims to give you.

TiDB 6.0 officially provides a SQL-based data placement framework, Placement Rules in SQL. It supports flexible scheduling and management of any data along dimensions such as the number of replicas, replica roles, and placement locations. This allows TiDB to offer more flexible data management in scenarios such as multi-service shared clusters and cross-AZ deployments, meeting diverse business requirements.
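As a first taste, here is a minimal sketch of the workflow (the table name important_orders is made up for illustration): define a policy, attach it to a table, and let PD reschedule the replicas in the background.

-- Keep five replicas (1 leader + 4 followers) of a business-critical table.
CREATE PLACEMENT POLICY five_replicas FOLLOWERS=4;
-- Attach the policy to a (hypothetical) table.
ALTER TABLE important_orders PLACEMENT POLICY=five_replicas;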

Let's look at a few specific examples.

Deploy across regions to reduce latency

Imagine that you are a service provider with business all over the world. The early architecture was a centralized design; as the business expanded across regions, it was split up and deployed globally, because access latency to the central site was high and cross-region traffic was expensive. As your business evolves, you start to place data close to where it is used. Your data comes in two forms: regional data that is managed locally, and global configuration data that is accessed everywhere. Regional data is large and updated frequently, but is almost never accessed across regions. Global configuration data is small and rarely updated, but must be globally unique and accessible from any region. A traditional single-machine database, or a database deployed in a single region, cannot meet these requirements.

The following figure shows an example. TiDB is deployed across three data centers, serving user groups in North China, East China, and South China respectively, so that users in each region access their data locally. In earlier versions, you could indeed deploy a TiDB cluster across data centers, but you could not pin data belonging to different user groups to different data centers; the data could only be spread across the centers according to the built-in logic of evenly distributing hotspots and data volume. Under high-frequency access, users were therefore likely to hit high cross-region latency.

[Figure 1: TiDB deployed across three data centers serving North China, East China, and South China, with each region's data placed locally]

With the Placement Rules in SQL capability, you can define a placement policy that pins all replicas of regional data to a data center in a specific region. All storage and management of that data then happens locally, which reduces cross-region replication latency and traffic costs. All you need to do is label the nodes in each data center and create the corresponding placement policies:

-- Each policy pins data to nodes carrying the matching region label.
CREATE PLACEMENT POLICY east_cn CONSTRAINTS = "[+region=east_cn]";
CREATE PLACEMENT POLICY north_cn CONSTRAINTS = "[+region=north_cn]";
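The region labels referenced in the constraints must already be attached to the TiKV nodes through their store label configuration. One way to double-check which labels the cluster currently reports is:

-- Lists the label keys and values known to the cluster.
SHOW PLACEMENT LABELS;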

Then you control the placement of data through SQL statements. Here, partitions for different cities are used as an example:

ALTER TABLE orders PARTITION p_hangzhou PLACEMENT POLICY = east_cn;
ALTER TABLE orders PARTITION p_beijing PLACEMENT POLICY = north_cn;

In this way, the replicas of order data belonging to different cities are "pinned" to the corresponding data center.
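To confirm that the rules have taken effect and that scheduling has finished, one option is to check the placement status of a partition:

-- The Scheduling_State column reports whether the replicas have been fully scheduled.
SHOW PLACEMENT FOR TABLE orders PARTITION p_hangzhou;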

Business isolation

Suppose you are in charge of the data platform of a large Internet enterprise. There are more than 2,000 internal businesses, with related businesses grouped onto one or more MySQL sets; even so, the number of MySQL instances is close to 1,000. Upgrades, security hardening, and other routine tasks put enormous pressure on the operations team, and as the business keeps growing, operation and maintenance costs increase year by year. You want to cut these costs by reducing the number of database instances, but data isolation between services, access security, flexibility of data scheduling, and management costs become serious challenges.

With TiDB 6.0, data placement rules let you share a cluster easily and flexibly: for example, business A and business B share resources to reduce storage and management costs, while businesses C and D get dedicated resources for maximum isolation. Since multiple businesses share one TiDB cluster, routine operations such as upgrades, patching, backup planning, and capacity expansion happen far less often, which reduces the management burden and improves efficiency.

[Figure 2: multiple businesses sharing one TiDB cluster, with shared nodes for businesses A and B and dedicated nodes for businesses C and D]

CREATE PLACEMENT POLICY shared_nodes CONSTRAINTS = "[+region=shared_nodes]";
CREATE PLACEMENT POLICY business_c CONSTRAINTS = "[+region=business_c]";
CREATE PLACEMENT POLICY business_d CONSTRAINTS = "[+region=business_d]";

ALTER DATABASE a PLACEMENT POLICY = shared_nodes;
ALTER DATABASE b PLACEMENT POLICY = shared_nodes;
ALTER DATABASE c PLACEMENT POLICY = business_c;
ALTER DATABASE d PLACEMENT POLICY = business_d;
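If you later need to review which policies exist and how scheduling is progressing across the cluster, one way (a sketch, not the only option) is:

-- All placement policies defined in the cluster.
SELECT * FROM information_schema.placement_policies;
-- Placement targets (policies, databases, tables, partitions) and their scheduling state.
SHOW PLACEMENT;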

Based on these SQL-based data placement rules, a handful of TiDB clusters can take over a large number of MySQL instances. Data from different businesses is placed in different databases, and the placement rules schedule each database onto its own set of hardware nodes, so data belonging to different services is physically isolated and cannot interfere through resource contention or hardware failures. Cross-business data access is prevented through account and privilege management, which improves data quality and data security. With this deployment model, the number of clusters drops sharply, routine work such as upgrades and monitoring/alerting configuration shrinks accordingly, and resource isolation is balanced against cost efficiency, greatly reducing daily DBA operation and maintenance costs.

Primary/standby multi-data-center deployment + low-latency reads

Now you are an Internet architect who wants to build an intra-city multi-data-center architecture on TiDB. Through placement rule management, you can schedule follower replicas to the standby data center to achieve high availability within the same city.

-- 1 leader + 4 followers; MAJORITY_IN_PRIMARY keeps a quorum of replicas in the primary region us-east-1.
CREATE PLACEMENT POLICY eastnwest PRIMARY_REGION="us-east-1" REGIONS="us-east-1,us-east-2,us-west-1" SCHEDULE="MAJORITY_IN_PRIMARY" FOLLOWERS=4;
CREATE TABLE orders (order_id BIGINT PRIMARY KEY, cust_id BIGINT, prod_id BIGINT) PLACEMENT POLICY=eastnwest;

At the same time, you allow historical queries that can tolerate lower consistency and freshness to read data as of a given timestamp (Stale Read). This avoids the access latency caused by cross-center data synchronization and also improves the hardware utilization of the standby data center.

SELECT * FROM orders AS OF TIMESTAMP '2022-03-01 16:45:26' WHERE order_id = 14325;
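If you would rather not repeat the timestamp in every statement, a rough equivalent (shown here only as an illustration) is to set the staleness at the transaction or session level:

-- Read data as of roughly 5 seconds ago for the next transaction.
SET TRANSACTION READ ONLY AS OF TIMESTAMP NOW() - INTERVAL 5 SECOND;
-- Or allow all reads in this session to be up to 5 seconds stale.
SET @@tidb_read_staleness = "-5";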

Summary

Placement Rules in SQL in TiDB 6.0 is an interesting feature: it exposes scheduling capabilities that used to be internal and beyond users' control, and wraps them in a convenient SQL interface. Based on labels, you can freely place data at the partition, table, or database level, which opens up many scenarios that were not possible before. Besides the possibilities above, we look forward to exploring more interesting applications together with you.
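For instance, the hot/cold separation mentioned at the beginning could be sketched along the same lines. The disk label, archive table, and partition name below are made up for illustration and assume the TiKV nodes carry a corresponding disk label:

-- Cheap-capacity nodes labeled disk=hdd hold cold partitions; hot data stays on the remaining nodes.
CREATE PLACEMENT POLICY cold_storage CONSTRAINTS = "[+disk=hdd]";
ALTER TABLE orders_history PARTITION p2021 PLACEMENT POLICY = cold_storage;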

Check out the TiDB 6.0.0 Release Notes, download and try it now, and start your journey with the TiDB 6.0.0 enterprise-grade cloud database.
