About the author: Tang Liu, PingCAP VP of Engineering, invited judge of TiDB Hackathon 2021.
TiDB Hackathon 2021 has finally come to an end. Going in, I wondered what new ideas this year's Hackathon could still produce, but the results far exceeded my expectations. Many projects can genuinely be described as stunning, and contestants kept joking about how "rolled up" (intensely competitive) the field has become. As a judge, I took part in the whole process, from the preliminary reviews in the kernel group to the final defenses.
Twenty projects made it into the finals. I will introduce them along several dimensions.
Performance and feature enhancements
A lot of the performance-improvement work on the TiDB kernel this time was genuinely impressive. Since I mainly reviewed the kernel group in the preliminaries, I will briefly summarize those projects here.
Incremental Analytic Table
This feature speeds up analyzing a whole table by caching statistics at the Region level: the larger the table, the more obvious the benefit. On one hand it makes analyze itself faster; on the other it avoids much of the IO and CPU overhead that analyze causes, reducing pressure on the system. The implementation does assume that most workloads are hot updates or incremental writes. Even with large-scale updates, though, the actual result may still beat the current full-analyze implementation, because the statistics now live directly in the Region cache. Later we may also be able to run the analyze operation through TiFlash, which would be much faster.
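The idea above can be shown in a toy sketch: the table is split into Regions, each carrying a cached row count; writes mark a Region dirty, and the next analyze rescans only the dirty Regions while reusing cached counts for the rest. All names here are illustrative, not TiDB internals.

```go
package main

import "fmt"

// regionStat holds the cached statistics for one Region of the table.
type regionStat struct {
	rows  int
	dirty bool // set when a write touched this Region since the last analyze
}

// analyzeIncremental returns the table's total row count plus how many
// Regions actually had to be rescanned.
func analyzeIncremental(regions []regionStat, rescan func(i int) int) (total, rescanned int) {
	for i := range regions {
		if regions[i].dirty {
			regions[i].rows = rescan(i) // only pay IO/CPU for changed Regions
			regions[i].dirty = false
			rescanned++
		}
		total += regions[i].rows
	}
	return total, rescanned
}

func main() {
	regions := []regionStat{{rows: 100}, {rows: 200, dirty: true}, {rows: 300}}
	total, rescanned := analyzeIncremental(regions, func(i int) int { return 250 })
	fmt.Println(total, rescanned) // 650 1
}
```

For a hot-update workload only a few Regions are dirty at any time, which is exactly why the benefit grows with table size.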
TiExec
Although this project was not on the kernel track, I think it is hardcore enough. It tackles a problem we actually hit in earlier PoCs: we found that tuning certain kernel settings, or adjusting the Go GC, could improve performance. This time the optimization targets iTLB cache misses: the team remaps the .text segment onto huge pages, which reduces the misses and improves performance. The judges initially worried that huge pages might hurt performance, but this is not transparent hugepage enabled globally; only the data of one specific range is remapped onto huge pages.
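As a rough illustration of the first step such a tool needs, here is a sketch that parses a /proc/&lt;pid&gt;/maps-style line for the executable (r-xp) mapping and computes the 2 MiB-aligned boundaries a remapping step could target. The parsing is simplified for illustration; a real implementation must handle many more cases.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

const hugePageSize = 2 << 20 // 2 MiB

// textRange parses one /proc/<pid>/maps line and, if it is an executable
// mapping, returns the 2 MiB-aligned span inside it.
func textRange(mapsLine string) (start, end uint64, ok bool) {
	fields := strings.Fields(mapsLine)
	if len(fields) < 2 || !strings.Contains(fields[1], "x") {
		return 0, 0, false
	}
	bounds := strings.Split(fields[0], "-")
	if len(bounds) != 2 {
		return 0, 0, false
	}
	start, err1 := strconv.ParseUint(bounds[0], 16, 64)
	end, err2 := strconv.ParseUint(bounds[1], 16, 64)
	if err1 != nil || err2 != nil {
		return 0, 0, false
	}
	// Align inward: only a fully covered 2 MiB span can be backed by a
	// huge page without touching neighboring mappings.
	start = (start + hugePageSize - 1) &^ (hugePageSize - 1)
	end = end &^ (hugePageSize - 1)
	return start, end, start < end
}

func main() {
	line := "00400000-00a00000 r-xp 00000000 fd:01 123 /usr/bin/tidb-server"
	s, e, ok := textRange(line)
	fmt.Printf("%x %x %v\n", s, e, ok) // 400000 a00000 true
}
```

The actual remapping (copying the code into a MAP_HUGETLB region and switching the mapping atomically) is the hard, platform-specific part.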
TPC
This is the most hardcore project in my view, and I personally gave it a very high score. Unfortunately it may have been too hardcore: many judges did not follow what it was about (not all judges know TiDB's internal mechanisms well), and in the end it only won third prize. Personally, I think it could have contended for first.
The TPC project covers a lot of ground. Although I can foresee how difficult the follow-up productization will be, I am still looking forward to it; after all, this is on our roadmap. The team used io_uring but seems to have hit quite a few pitfalls; later we could also choose AIO, or a separate asynchronous-thread mechanism. Because it builds on the new Raft Engine (which will GA in TiDB 5.4), parallel log writes become easy, making full use of multi-queue IO. That is critical on the cloud, because single-threaded write IOPS on EBS is not that high. Later we will also remove the WAL of the KV RocksDB, so the various thread pools can truly be merged to do computation only, with all IO fully asynchronous.
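The "compute-only thread pools, fully asynchronous IO" idea can be sketched in a few lines: compute goroutines hand write requests to a dedicated IO loop over a channel, the loop opportunistically batches whatever is pending into one simulated write, and acknowledges each request afterwards. This is a minimal model of the pattern, not TPC's actual implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// writeReq is a log-write request handed from a compute goroutine to the
// IO loop; done is closed once the entry is "persisted".
type writeReq struct {
	payload string
	done    chan struct{}
}

// ioLoop drains the queue, groups whatever is pending into one batch,
// performs a single (simulated) write for the batch, then acks every
// request. The compute side never performs IO itself.
func ioLoop(queue <-chan writeReq, log *[]string, wg *sync.WaitGroup) {
	defer wg.Done()
	for req := range queue {
		batch := []writeReq{req}
	drain:
		for { // opportunistic batching: grab everything already queued
			select {
			case r, ok := <-queue:
				if !ok {
					break drain
				}
				batch = append(batch, r)
			default:
				break drain
			}
		}
		for _, b := range batch {
			*log = append(*log, b.payload) // one batched "write + fsync"
			close(b.done)
		}
	}
}

// runBatchedWrites submits entries from the "compute" side, waits for
// their acknowledgements, and returns the persisted log.
func runBatchedWrites(entries []string) []string {
	queue := make(chan writeReq, len(entries))
	var log []string
	var wg sync.WaitGroup
	wg.Add(1)
	go ioLoop(queue, &log, &wg)
	acks := make([]chan struct{}, 0, len(entries))
	for _, e := range entries {
		d := make(chan struct{})
		queue <- writeReq{payload: e, done: d}
		acks = append(acks, d)
	}
	for _, d := range acks {
		<-d
	}
	close(queue)
	wg.Wait()
	return log
}

func main() {
	fmt.Println(runBatchedWrites([]string{"put k1", "put k2", "del k1"}))
}
```

In the real system the batched write would go through io_uring, AIO, or a dedicated IO thread, and there would be one such loop per IO queue to exploit multi-queue devices.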
Speed up add index with Lightning
This is another project I like very much, and it addresses a problem we have met many times in real PoCs. Ingesting SSTs directly through Lightning can greatly speed up add index, and the demo in the finals showed an order-of-magnitude performance improvement. This feature is also on our roadmap, so I am looking forward to it. Moreover, once conflicts with concurrent operations such as updates during the Lightning SST ingest are resolved, we could let Lightning support fully online import. On the cloud we could go further: sort the data with EMR, write it to S3 first, and then have TiKV pull it from S3, or even use the S3 data directly.
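A toy sketch of why this path is fast: instead of inserting index entries one by one through the transactional write path, generate every (index key, row handle) pair, sort the whole run, and hand the sorted file to the store as a single ingested SST. The types and key format below are illustrative, not Lightning's API.

```go
package main

import (
	"fmt"
	"sort"
)

// row stands in for a table row: its handle (row ID) and the value of
// the column being indexed.
type row struct {
	handle int64
	col    string
}

// buildIndexKVs produces the sorted run of index entries that would be
// ingested as one SST, skipping per-row transactional writes entirely.
func buildIndexKVs(rows []row) []string {
	kvs := make([]string, 0, len(rows))
	for _, r := range rows {
		kvs = append(kvs, fmt.Sprintf("idx/%s/%d", r.col, r.handle))
	}
	sort.Strings(kvs) // the external-sort stage before ingest
	return kvs
}

func main() {
	kvs := buildIndexKVs([]row{{2, "bob"}, {1, "alice"}, {3, "carol"}})
	fmt.Println(kvs) // [idx/alice/1 idx/bob/2 idx/carol/3]
}
```

The sort stage is also exactly what could be offloaded to EMR on the cloud, with the sorted output staged in S3.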
MVCC Time Machine
This project is, at certain moments, a lifesaver for operations engineers. TiDB stores data with MVCC, meaning a row may have multiple versions, so even if a user deletes a row by mistake, we can still recover it from an old version. The current recovery process is to query the old data at an old version (a timestamp) and then insert it back. That is simple enough for a single row, but restoring a batch of data this way is very tedious. This project solves the operational problem nicely through SQL. Even better, it introduces a multi-safepoint mechanism: you can periodically take global snapshots of the TiDB cluster as fast, lightweight logical backups. The trade-off is that as more snapshots are retained, more MVCC versions accumulate, which may slow down scans. If we later move historical versions somewhere else, this problem can be alleviated.
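The restore flow is easy to see in a toy MVCC store: each key keeps all (version, value) pairs, a deletion is just a newer tombstone version, and "restore" means re-inserting the value visible at an old snapshot as a fresh current version. This is a minimal model of the mechanism, not TiDB's storage format.

```go
package main

import "fmt"

// versioned is one MVCC version of a key; del marks a tombstone.
type versioned struct {
	ver   uint64
	value string
	del   bool
}

type mvccStore struct {
	data map[string][]versioned
	ts   uint64 // monotonically increasing version counter
}

func (s *mvccStore) put(key, val string) {
	s.ts++
	s.data[key] = append(s.data[key], versioned{s.ts, val, false})
}

func (s *mvccStore) delete(key string) {
	s.ts++
	s.data[key] = append(s.data[key], versioned{s.ts, "", true})
}

// getAt reads key as of snapshot version snap — the same idea as a
// historical read at an old timestamp.
func (s *mvccStore) getAt(key string, snap uint64) (string, bool) {
	var out versioned
	found := false
	for _, v := range s.data[key] { // versions are appended oldest-first
		if v.ver <= snap {
			out, found = v, true
		}
	}
	if !found || out.del {
		return "", false
	}
	return out.value, true
}

// restoreAt re-inserts the value visible at snapshot snap as a fresh,
// current version: the "time machine" restore in miniature.
func (s *mvccStore) restoreAt(key string, snap uint64) bool {
	v, ok := s.getAt(key, snap)
	if !ok {
		return false
	}
	s.put(key, v)
	return true
}

func main() {
	s := &mvccStore{data: map[string][]versioned{}}
	s.put("user:1", "alice")
	snap := s.ts // a safepoint-style snapshot taken before the mistake
	s.delete("user:1")
	s.restoreAt("user:1", snap)
	v, _ := s.getAt("user:1", s.ts)
	fmt.Println(v) // alice
}
```

A safepoint is just a remembered snap value that garbage collection promises not to reclaim, which is why keeping many of them inflates the version count.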
Making TiDB intelligent on the cloud
I think this project is very good too. Its benefits are not limited to the cloud: for intelligent operation of on-premises TiDB, TiUP could borrow the same ideas. During an upgrade, we could introduce more strategies, such as transferring each Region leader only once, or deciding whether a node can be upgraded based on TiKV's current hotspots and traffic. Such strategies can greatly reduce the impact of the upgrade process on user workloads.
TiDB hot and cold data tiered storage
This project won first prize at this Hackathon, and together with another similar project from the same event, it lays a good foundation for integrating TiDB with S3 later; at the very least, this Hackathon verified the feasibility. The principle is simple: put cold data into S3, push operators down to S3 as much as possible, and use S3's native Select capability to speed up queries. Of course, once the data is in S3, we can also use other cloud services such as Athena for more query and aggregation work. This time the team partitioned by time, which is a very common way to slice data. We are also researching S3 integration through the LSM tree itself, and I look forward to seeing results on all of this within the year. For example, our TiDB Cloud Developer Tier clusters could adopt this mechanism first to validate it.
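The pushdown step amounts to translating the query's projection and predicate into the restricted SQL dialect that S3 Select accepts, where the fixed table alias is `S3Object`. A minimal sketch of such a translator, with made-up column names:

```go
package main

import (
	"fmt"
	"strings"
)

// buildS3Select renders a pushed-down projection and predicate into an
// S3 Select statement. S3 Select only scans the one object it is given,
// so the engine would issue this per cold partition object.
func buildS3Select(cols []string, pred string) string {
	proj := "*"
	if len(cols) > 0 {
		qualified := make([]string, len(cols))
		for i, c := range cols {
			qualified[i] = "s." + c
		}
		proj = strings.Join(qualified, ", ")
	}
	q := fmt.Sprintf("SELECT %s FROM S3Object s", proj)
	if pred != "" {
		q += " WHERE " + pred
	}
	return q
}

func main() {
	fmt.Println(buildS3Select([]string{"order_id", "amount"}, "s.sale_date < '2021-01-01'"))
	// SELECT s.order_id, s.amount FROM S3Object s WHERE s.sale_date < '2021-01-01'
}
```

Only the rows and columns surviving the pushdown cross the network, which is where the speedup for cold time-partitioned data comes from.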
Diagnostics and ease of use
At this year's Hackathon, I think the happiest person must be Kai, who is now responsible for improving TiDB's observability and ease of diagnosis. Several of this year's projects can be put to good use once completed, and some of them are in fact already on our roadmap.
Automatic configuration (Matrix)
Matrix was built jointly by PingCAP engineers and students from Huazhong University of Science and Technology, and I have to say today's students are really impressive. It was only through this project that I learned TiDB 5.3 already has more than 800 parameters, so we really need to establish some development principles later, for example for how to add parameters, configuration items, and session variables; otherwise subsequent tuning will only get harder. Previous Hackathons saw plenty of attempts at automatic tuning, but the big breakthrough this time, I think, is that the team visualized the influence of the important parameters.
Automatic diagnosis (Collie)
The Collie team's name is quite amusing (good-natured judges like us won't take offense), and in fact the team is anything but weak: they also helped optimize TiUP diag, which is already very capable. It was only through this project that I learned we have nearly 8,000 metrics. Honestly, if we keep going like this, no human will be able to debug the system anymore...
Log analysis (Naglfar)
Logs are another area of TiDB in urgent need of optimization; we print quite a few useless logs that should be cleaned up later. Beyond that, when diagnosing problems we need to correlate logs with one another, and watch the trend of a particular log line to spot problems early. Naglfar is off to a good start here.
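The "watch the trend of one log line" idea can be reduced to a small sketch: bucket the timestamps of a given error line into fixed windows and flag any window whose count jumps by some factor over the previous one. This is a crude stand-in for the statistical analysis a tool like Naglfar would do.

```go
package main

import (
	"fmt"
	"sort"
)

// windowCounts buckets log timestamps (Unix seconds) into fixed windows.
func windowCounts(tsSecs []int64, window int64) map[int64]int {
	counts := map[int64]int{}
	for _, ts := range tsSecs {
		counts[ts/window]++
	}
	return counts
}

// spikes reports windows whose count is at least factor times the
// previous window's count — an early-warning signal on the trend.
func spikes(counts map[int64]int, factor int) []int64 {
	var out []int64
	for w, c := range counts {
		if prev, ok := counts[w-1]; ok && prev > 0 && c >= prev*factor {
			out = append(out, w)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

func main() {
	// Two errors in the first minute, eight in the second: a 4x jump.
	ts := []int64{5, 40, 65, 70, 75, 80, 90, 100, 110, 115}
	fmt.Println(spikes(windowCounts(ts, 60), 3)) // [1]
}
```

Correlation analysis across different log lines would then compare such per-line trend series against each other.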
Slow SQL Diagnostics (TiVP)
When I finally saw a visualized execution plan, I was almost in tears. Diagnosing slow SQL used to be genuinely painful: a large execution plan printed as text is almost impossible to read, and comparing the similarities and differences of two execution plans is even worse. With visualization, analyzing where a query is slow becomes far more efficient, and later we can integrate SQL-advisor features directly into TiVP, so that SQL binding and add/drop index operations can be performed right there online. After seeing this project I immediately asked my colleague wish, and he sent me an even prettier Visual Plan screenshot; it turns out this is already on the roadmap, so stay tuned.
Ecosystem expansion
This year's ecosystem projects were a hundred flowers blooming; I saw so many different things. I have always liked ecosystem-related projects: if the kernel projects deepen TiDB's technology, the ecosystem projects broaden its reach, and for a database, breadth translates into market share.
TiMatch - A distributed graph database with complete syntax
Last year TiGraph amazed everyone, and this year TiMatch is even more exciting. Usability is better this time, and old clusters can upgrade to it directly, because TiMatch only builds a set of graph indexes internally, updated together with the original relational data through TiDB's distributed transaction mechanism. The syntax follows Oracle's graph syntax, so the language support is complete, but I think the real challenge is performance, and I hope the team can show relevant numbers next time.
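A graph index inside a relational store essentially encodes each edge as an ordered key, so that all outgoing edges of a vertex under one label form a contiguous prefix range that one index range scan can answer. A toy sketch of such an encoding (illustrative, not TiMatch's actual format):

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"strings"
)

// edgeKey encodes (src, label, dst) into an ordered key. Keys sharing a
// (src, label) prefix sort adjacently, enabling range scans.
func edgeKey(src uint32, label string, dst uint32) string {
	var b bytes.Buffer
	binary.Write(&b, binary.BigEndian, src)
	b.WriteString(label)
	b.WriteByte(0) // terminator: keeps one label from being a prefix of another
	binary.Write(&b, binary.BigEndian, dst)
	return b.String()
}

func edgePrefix(src uint32, label string) string {
	var b bytes.Buffer
	binary.Write(&b, binary.BigEndian, src)
	b.WriteString(label)
	b.WriteByte(0)
	return b.String()
}

// neighbors counts the keys matching the (src, label) prefix — in a real
// engine this would be one index range scan, kept consistent with the
// row data by the same distributed transaction.
func neighbors(keys []string, src uint32, label string) int {
	p := edgePrefix(src, label)
	n := 0
	for _, k := range keys {
		if strings.HasPrefix(k, p) {
			n++
		}
	}
	return n
}

func main() {
	keys := []string{
		edgeKey(1, "knows", 2),
		edgeKey(1, "knows", 3),
		edgeKey(1, "likes", 4),
		edgeKey(2, "knows", 1),
	}
	fmt.Println(neighbors(keys, 1, "knows")) // 2
}
```

Since the index rows live in the same transactional store as the tables, an old cluster only needs the new index to be built, which is why in-place upgrade is possible.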
TiLaker: Giving TiDB wings into the data lake
Last year's Hackathon actually had many projects integrating with Flink, but this year only one made it to the finals, which honestly disappointed me a little. Still, TiLaker is quite complete: with Flink committers on the team, they implemented a CDC connector for Flink that lets Flink read TiDB's incremental data directly and synchronize it downstream. With Flink's capabilities, TiDB can connect much better with the downstream ecosystem, and I hope to see many application cases in the future.
pCloud
This is a very interesting project; our CTO, Dongxu, went on stage to pitch it himself. Setting aside his on-stage charisma, pCloud is genuinely well done from a practical standpoint. Dongxu only showed the product's effect and talked about the business model, but I know the underlying implementation is still very challenging. It also offers a lesson for students at the next Hackathon: it is easy to focus on the technology itself, but if you are building a product or a SaaS service, understanding users and the business matters just as much. So even if you feel you don't know TiDB deeply enough to write hardcore code, you can still break through from other directions.
Dujiang Weir
This project is also very interesting. Xiaoguang implemented plugins on Weir to extend TiDB's functionality; for example, he wrote a Redis plugin that provides caching support for TiDB. It got me thinking: shouldn't TiDB build a better plugin mechanism of its own? Our current plugin support uses Go's built-in plugin package, which is not great for extensibility or maintainability. If we used HashiCorp's go-plugin, which communicates over RPC, we would lose some performance, but wouldn't it work better overall? This is something to discuss with Xiaoguang and his team.
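To make the RPC-plugin idea concrete: hashicorp/go-plugin runs each plugin in a separate process and exposes it to the host over RPC, so a crashing plugin cannot take down the server. The sketch below simulates that boundary in-process with net.Pipe and the stdlib net/rpc package; the CachePlugin interface is a made-up stand-in for a Redis-style cache plugin.

```go
package main

import (
	"fmt"
	"net"
	"net/rpc"
)

// CachePlugin is a toy cache plugin; in go-plugin it would live in a
// child process, with only these RPC calls crossing the boundary.
type CachePlugin struct{ m map[string]string }

// KV is the argument type for Set; net/rpc requires exported fields.
type KV struct{ Key, Value string }

func (c *CachePlugin) Set(kv KV, ok *bool) error {
	c.m[kv.Key] = kv.Value
	*ok = true
	return nil
}

func (c *CachePlugin) Get(key string, val *string) error {
	*val = c.m[key]
	return nil
}

// dialPlugin wires a client to a freshly served plugin instance over an
// in-memory pipe standing in for the host/plugin process boundary.
func dialPlugin() *rpc.Client {
	srvConn, cliConn := net.Pipe()
	srv := rpc.NewServer()
	srv.Register(&CachePlugin{m: map[string]string{}})
	go srv.ServeConn(srvConn)
	return rpc.NewClient(cliConn)
}

func main() {
	client := dialPlugin()
	var ok bool
	client.Call("CachePlugin.Set", KV{Key: "user:1", Value: "cached row"}, &ok)
	var got string
	client.Call("CachePlugin.Get", "user:1", &got)
	fmt.Println(got) // cached row
}
```

The per-call serialization is the performance cost mentioned above; the gain is that plugins can be versioned, restarted, and even written in other languages.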
TiClick
This is one of my favorite projects, and I personally gave it the highest score, not because of Sai's passionate presentation, nor because of the cool web interface, but because it showed me one direction in which TiDB can better attract developers. For the problem of developers learning TiDB, I believe the answer is very likely a SaaS service through which they can learn TiDB directly in the browser. This project convinced me it can land, and I hope it lands quickly. But I also know that the most important things right now are to have TiDB Cloud support GitHub SSO login and an OpenAPI, so as to become more developer-friendly and lay the foundation for the subsequent ecosystem expansion.
Summary
Of course, I have only covered part of the projects here and have not reviewed the others in detail, such as TiTravel, oom.ai, TiMultiple, and Bubbles, which are all excellent projects too; do find time to check them out and follow them.
Overall, judging this Hackathon for two days seriously drained me both physically and mentally, but I was still excited, because I saw the infinite possibilities of TiDB's future. The slogan of this Hackathon was Explore the Sky, and the sky is not far from us; for instance, I am writing this article at high altitude, on a plane :-)
Still, this Hackathon left one small regret. I have always thought one essence of a Hackathon is 24 hours of high-intensity coding, which the pandemic made impossible this time. I hope the situation improves this year; the experience of everyone bringing sleeping bags and coding through the night in the PingCAP office really is something special.