Simple implementation of TiDB cold and hot data tiered storage | He3 team interview

Participating in the Hackathon puts us in touch with like-minded people working on the kernel, tools, and ecosystem, and we learn excellent ideas from their projects. Everyone's ideas are wonderful and innovative, and we rarely come across them in day-to-day research and development. The Hackathon opens our minds and shows us that TiDB can be used in ways we had not imagined. - He3 team

As the amount of user data in TiDB keeps growing, storage accounts for an increasing share of the total database cost. How to reduce the cost of database storage effectively is a question facing many users.
Among the many solutions, one approach is tiered storage of hot and cold data. In most scenarios, data can be divided into "cold data" and "hot data". The division can be based on how recent the data is, on hotspot versus non-hotspot users, and so on. Users typically access only the data from a recent period, such as the last week or month. If the data is not divided, some loss of performance and some extra cost are inevitable.

In the just-concluded TiDB Hackathon 2021, the He3 team chose hot and cold data tiered storage to reduce the storage cost of TiDB. In their design, hot data is stored on TiKV, and cold data that is less likely to be queried and analyzed is stored in S3, a cheap and general-purpose cloud storage. At the same time, the S3 storage engine supports the pushdown of some TiDB operators, which lets TiDB run analytical queries over cold data stored in S3. The project won unanimous praise from the judges and the first prize of the competition.

This project lays a good foundation for later integration of TiDB and S3, and its feasibility was verified in this Hackathon. The principle is actually very simple: put the cold data into S3, push operators down to S3 as much as possible, and speed up queries through S3's native Select function. Of course, once the data is in S3, you can also use other cloud services, such as Athena, to do more query and aggregation work and speed up queries further. This time everyone focused on partitions; after all, partitioning by time slice is a very common operation. We are currently researching integration with S3 through LSM, and I look forward to seeing many results this year. For example, a TiDB Cloud dev tier cluster could be fully validated with this mechanism.

- Comments from judge Tang Liu

Why choose the direction of hot and cold data tiered storage?

The captain of the He3 team, Xue Gang, and members Shi Pixian and Shen Zheng are all R&D engineers on the Mobile Cloud database team. Their day job is developing cloud database services, and reducing the cost of using databases on the cloud is both their responsibility and a goal they have long pursued.
At the Hacking Camp in July last year, He3 built the Serverlessdb for HTAP project, which provides a Serverlessdb service based on TiDB. Users are charged by usage instead of buying a monthly subscription as with traditional RDS, which greatly reduces the cost of using TiDB. The project also won the Hacking Camp Outstanding Graduate and Best Application awards.
After the product launched on Mobile Cloud, many users found that storage costs kept rising as their data grew. Xue Gang explained that on the public cloud, block storage costs much more than S3 object storage, and in some user scenarios part of the data is actually cold data that could be stored on S3. So in December last year, they began to think about how to reduce the storage cost of TiDB. Just then, TiDB Hackathon 2021 was launched. Xue Gang, Shi Pixian, and Shen Zheng discussed it and decided to make tiered storage of hot and cold data their entry for this year's competition. During the defense, they used one slide to analyze how costs change after adopting the project.

With the direction of the project set, it was time to sign up. Captain Xue Gang had become interested in the element helium-3 while watching TV. After reading up on it, he found that it can be used as a nuclear fusion fuel, yields more energy than existing nuclear fuels, and produces very little radioactivity, making it a clean, efficient, and safe fuel for power generation. These qualities are exactly what they expect from a distributed cloud database: security, high performance, ease of use, and low price. So He3 became the team name.
In the following period of less than a month, Xue Gang, as captain, was responsible for confirming the overall requirements, designing the architecture, verifying the scheme, and developing the framework. The other team members were mainly responsible for feature development: Shi Pixian handled operator pushdown and data type support, and Shen Zheng focused on performance optimization and TPC-H testing.

What problem does this project solve?

The TiDB cold and hot data tiered storage project developed by He3 can realize the separation of cold and hot data in a minimal way:

For ordinary tables: implement insert into ... select to separate hot and cold data:

Support creating S3 external tables;
Support dumping data from a TiKV internal table to S3 object storage through insert into s3_table select from tikv_table where ... ;
Support loading data from an S3 external table back into a TiKV internal table through insert into tikv_table select from s3_table where ... .

For partitioned tables: convert a partition into an S3 external table, retaining the master-slave relationship between the main table and the S3 external table.
An Alter operation on the partitioned table dumps the data of a TiKV internal partition to the corresponding S3 external table and automatically completes the following steps:

Dump the data of the internal TiKV partition to S3 object storage;
Change the partitioned table's metadata, converting the TiKV internal partition into an S3 external table. The key point is to retain the partition relationship between the S3 external table and the main table;
Delete the data of the TiKV internal partition.
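The three steps above can be sketched with in-memory stand-ins. This is only an illustration of the ordering, not He3's code: the `Partition` class, the `archive_partition` function, and the dictionaries that play the role of TiKV and S3 are all hypothetical names introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    name: str
    storage: str = "tikv"            # "tikv" (internal) or "s3" (external)
    rows: list = field(default_factory=list)


def archive_partition(table, partition_name, s3_store):
    """Move one partition's rows to the S3 stand-in, following the three steps."""
    part = table[partition_name]
    if part.storage == "s3":
        return  # already archived; nothing to do
    # Step 1: dump the partition's rows to object storage.
    s3_store[partition_name] = list(part.rows)
    # Step 2: flip the metadata; the partition stays attached to the main table.
    part.storage = "s3"
    # Step 3: delete the rows from the internal (TiKV) side.
    part.rows.clear()


# Hypothetical main table with two partitions, keyed by partition name.
employees = {
    "employees_01": Partition("employees_01", rows=[(1, "old"), (2, "older")]),
    "employees_02": Partition("employees_02", rows=[(3, "recent")]),
}
s3 = {}
archive_partition(employees, "employees_01", s3)
```

After the call, employees_01 is still an entry of the main table, but its rows live only in the S3 stand-in. A real implementation would also have to handle a failure between the steps, which this sketch ignores.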

After the conversion, the S3 external table is completely transparent to the user: it behaves as a partition of the main table. For example, if a query on the main table touches both TiKV internal partitions and partitions that have become S3 external tables, the returned result combines rows from both sources.
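A minimal sketch of that transparency, assuming each partition is tagged with where it is stored. The function `scan_main_table` and the sample partition names and rows are hypothetical; the point is only that the caller iterates one table and never picks a backend.

```python
def scan_main_table(partitions, tikv_rows, s3_rows, predicate=lambda row: True):
    """Yield matching rows from every partition, wherever it is stored.

    partitions maps partition name -> "tikv" or "s3".
    """
    for name, storage in partitions.items():
        source = tikv_rows if storage == "tikv" else s3_rows
        for row in source.get(name, []):
            if predicate(row):
                yield row


partitions = {"p2020": "s3", "p2021": "tikv"}
tikv_rows = {"p2021": [(3, 2021), (4, 2021)]}
s3_rows = {"p2020": [(1, 2020), (2, 2020)]}

all_rows = list(scan_main_table(partitions, tikv_rows, s3_rows))
even_ids = list(scan_main_table(partitions, tikv_rows, s3_rows, lambda r: r[0] % 2 == 0))
```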
The goal is that users see no difference between S3 external tables and TiKV internal tables:

S3 external tables support all data types;
S3 external tables support all operators;
The performance of operations on S3 external tables is optimized to stay within a range users can accept.

Predicates (logical, comparison, and numerical operations), aggregate functions, Limit, and other operators are pushed down to S3 nodes, so that the computing power of S3 is used to improve query performance.
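One way to picture the pushdown is as a query string handed to the object store in place of a full download. The sketch below assembles an expression in the S3 Select SQL dialect (SELECT ... FROM S3Object s WHERE ... LIMIT n). The helper `build_select_expression` is hypothetical, the column names s.id1 and s.id2 assume the object is serialized with named columns, and the article does not say how He3 actually issues its requests.

```python
def build_select_expression(columns, where=None, limit=None):
    """Build an S3 Select style SQL string for one object.

    columns: list of select-list items such as "s.id1" or "COUNT(*)".
    where:   optional predicate text, e.g. "s.id1 > 10" (trusted input only).
    limit:   optional non-negative row limit.
    """
    if not columns:
        raise ValueError("at least one select-list item is required")
    sql = "SELECT " + ", ".join(columns) + " FROM S3Object s"
    if where:
        sql += " WHERE " + where
    if limit is not None:
        if limit < 0:
            raise ValueError("limit must be non-negative")
        sql += " LIMIT " + str(int(limit))
    return sql


filtered = build_select_expression(["s.id1", "s.id2"], where="s.id1 > 10", limit=5)
counted = build_select_expression(["COUNT(*)"], where="s.id1 > 10")
```

With the predicate, aggregate, and limit evaluated on the storage side, only the matching rows or a single count travel back to TiDB.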
To achieve all of this, He3 developed and modified several TiDB modules during the Hackathon:

SQL Parser module, system table module

Add a new system table to save S3 metadata. Each record describes one S3 storage target: its S3 endpoint, access key, secret key, and S3 bucket.
insert into mysql.serverobject values("s3object","http://192.168.117.220:9000","minioadmin", "minioadmin","s3bucket");
Support creating external tables. Compared with ordinary tables, an s3options clause is added that names an S3 metadata object. An external table maps to the S3 storage path Bucketname/DBName/TableName.
create table s3_table(id1 int8,id2 char(30)) s3options s3object;
Support automatic conversion of partitions into S3 external tables:

Alter table employees alter partition employees_01 s3options s3object
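The mapping from a metadata record to the Bucketname/DBName/TableName path described above can be sketched as follows. The field names of `ServerObject` are assumptions (the article shows only the values of one row, not the column names), and the database name "test" is a placeholder chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerObject:
    """One row of the S3 metadata system table (field names are assumed)."""
    name: str
    endpoint: str
    access_key: str
    secret_key: str
    bucket: str


def external_table_path(server, db_name, table_name):
    """Return the Bucketname/DBName/TableName path for an external table."""
    return "/".join([server.bucket, db_name, table_name])


# Values copied from the insert statement in the article.
s3object = ServerObject("s3object", "http://192.168.117.220:9000",
                        "minioadmin", "minioadmin", "s3bucket")
path = external_table_path(s3object, "test", "s3_table")
```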

Executor module

Distinguish whether the table being operated on is an S3 external table. For an external table, writes save the data to S3 objects with a granularity of 256 MB, and queries stream the S3 objects and assemble them into chunks to support the upper-layer operators;
Support pushing operators down to S3 nodes, using the computing power of S3 nodes to speed up S3 external tables;
Support all data types in S3 external tables. Data read from S3 is placed in chunks according to the data types in the external table's schema, and each column is encoded based on its data type;
Support Alter to automatically dump the data of an internal partition to an S3 external table, while keeping the master-slave relationship between the main table and the S3 external table unchanged.
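The write and read paths in the first item above can be sketched as two small functions: one groups encoded rows into size-capped objects, the other streams those objects back as fixed-size chunks. Both `split_into_objects` and `stream_chunks` are hypothetical names, rows are plain byte strings, and a 10-byte cap stands in for 256 MB so the behavior is easy to follow.

```python
def split_into_objects(encoded_rows, max_object_bytes):
    """Group encoded rows into objects no larger than max_object_bytes.

    A single row larger than the limit still gets an object of its own.
    """
    objects, current, size = [], [], 0
    for row in encoded_rows:
        if current and size + len(row) > max_object_bytes:
            objects.append(current)
            current, size = [], 0
        current.append(row)
        size += len(row)
    if current:
        objects.append(current)
    return objects


def stream_chunks(objects, chunk_rows):
    """Walk the objects in order and yield chunks of at most chunk_rows rows."""
    chunk = []
    for obj in objects:
        for row in obj:
            chunk.append(row)
            if len(chunk) == chunk_rows:
                yield chunk
                chunk = []
    if chunk:
        yield chunk


rows = [b"aaaa", b"bbbb", b"cccc", b"dddd", b"eeee"]      # 4 bytes each
objects = split_into_objects(rows, max_object_bytes=10)  # stand-in for 256 MB
chunks = list(stream_chunks(objects, chunk_rows=3))
```

Here the five rows land in three objects, and the reader yields one chunk of three rows followed by one of two. Chunks may span object boundaries, so upper-layer operators never need to know how the data was split on write.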

Optimizer module

A small number of operators cannot be pushed down to S3, so He3 modified the optimizer to prevent them from being pushed down. The currently unsupported operators mainly include the TopN operator.
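The rule can be pictured as a gate in the planner. In this sketch the set `PUSHABLE` and the function `split_plan` are hypothetical, and the operator list is an assumption based on the operators the article names. Only the exclusion of TopN comes from the text.

```python
# Operators assumed to be evaluable on the S3 side; TopN is deliberately absent.
PUSHABLE = {"Selection", "Aggregation", "Limit"}


def split_plan(operators):
    """Split a bottom-up operator list into (pushed_down, kept_in_tidb).

    Pushdown stops at the first operator that is not pushable, because
    everything above it depends on its output.
    """
    pushed = []
    for i, op in enumerate(operators):
        if op not in PUSHABLE:
            return pushed, operators[i:]
        pushed.append(op)
    return pushed, []


pushed, local = split_plan(["Selection", "TopN", "Limit"])
all_pushed, none_local = split_plan(["Selection", "Aggregation"])
```

In the first plan only the Selection reaches S3, while TopN and the Limit above it stay in TiDB.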

Starting from performance testing

He3 initially set two goals: first, that hot and cold data can be separated in a relatively simple way; second, that after cold data is moved to S3, queries on it finish within a reasonable time. So from the beginning, the target was to run the full TPC-H benchmark.
The hot and cold data separation feature was developed quickly, but then the team hit one of its biggest problems: performance was never up to standard. The initial design read all data into TiDB for centralized processing, but testing showed that even with only 10 GB of data, TPC-H could not be run. Through discussion, research, and analysis, the three team members found that S3 actually has some computing capability. Would it be possible to push some computation down to S3, so that S3 carries part of the work the way TiKV does?
After changing the plan and pushing down operators for several scenarios, He3 found that the performance improvement was very obvious. In the subsequent development, every operator that could be pushed down was pushed down, and the project's performance improved by about 20% every day. Finally, on competition day, they got the entire TPC-H test to run.

TPC-H test score of He3 in Hackathon

In this Hackathon, another team, Interstellar, also chose tiered storage, which left the He3 team with an amusing memory: when Interstellar started its defense, the He3 members thought their own screen was being cast and hurriedly searched everywhere for the button to turn off screen sharing. Only when the other team began answering questions did they realize that the two teams had picked the same topic.

The mentality of competition

The He3 team members had actually participated in the TiDB Hackathon the previous year, but because they were new to TiDB, they did not touch the kernel. At that time, they resolved to do a sufficiently hard-core project next time. Working on database kernels is also the goal that Xue Gang set for himself after graduation, and he thinks it is a cool thing to do. So in this year's competition, He3 chose the most hard-core track: the kernel group.
Their work over the past year helped them a great deal. Because much of their work involves TiDB, they think about solutions whenever they run into problems, and that process readily produces good ideas. For example, for reducing the storage cost of TiDB, they came up with at least three schemes: the first is to change TiDB's underlying encoding so that compression reduces TiDB's overall storage footprint by a further 50% to 60%; the second is another hot and cold data separation scheme that integrates the LSM-tree with S3; the third is the current hot and cold data tiered storage scheme. The first two schemes were difficult to complete within the short span of a Hackathon, so they adopted the third.
In the future, He3 will continue to evolve and iterate on the project in three directions:

Use new encoding methods and acceleration algorithms to reduce the space that data occupies in S3, aiming for a further 50% reduction on top of what was achieved in this competition;
Continue to narrow the performance gap between data stored in TiDB and data stored in S3. In the later stage of this competition, this performance improved by about 20% every day, and He3 believes there is still a lot of room for improvement;
Further simplify the separation of hot and cold data for users. He3 has some regrets about the final result of this project. In the current design, separating cold and hot data still requires a DBA to perform some operations. If this work could be automated further, hot and cold data separation would be applicable much more widely, but this was not achieved because of limited time.

Besides continuing to improve the project itself, He3 also hopes to submit the code of the entire product to the community once it has matured through iteration, giving back in an open source way so that everyone can build on it together.

PingCAP