
At the recently concluded TiDB Hackathon 2021, Team Matrix presented Tenseigan, an automated tuning test framework for TiDB, a distributed database. The tool provides automatic parameter tuning, parameter impact evaluation, and related functions, and it integrates a variety of workloads. Thanks to the project's innovation and scalability, it won the "Best Campus Award" and the "Best Market Potential Award Specially Sponsored by Mingshi Capital".

Shuning accepted the award on behalf of the Matrix team
Through a conversation between the Matrix team and Xu Zhihao, Executive Director of Mingshi Capital, this article shares the team's story on stage and behind the scenes, in the hope of inspiring developers to build their own applications and projects on TiDB.

The origin of the Matrix team name

Tong Jian: If you think of TiDB as the Matrix in the film The Matrix, our project Tenseigan is like Neo, the hero of the film: it has a strong ability to learn, evolves by itself, and eventually becomes a savior. Hence the name Matrix.

Inspiration for the project

Ding Chen: The project was inspired by a SIGMOD 2017 paper that introduced the OtterTune framework. Our original intention was to see how OtterTune performs on TiDB, whether it can be used in a production environment to solve practical problems, and what problems remain in the OtterTune framework itself. We also wanted to know whether distributed scenarios raise new problems and new research questions.
Tenseigan: an automated tuning test framework for TiDB

Project value in the eyes of Mingshi Capital

The Matrix team's creativity also left a deep impression on Xu Zhihao, Executive Director of Mingshi Capital. Before joining the investment industry, Xu Zhihao spent nearly 10 years in R&D, and he is very interested in infrastructure projects like this one.
Mingshi Capital Executive Director Xu Zhihao at the Hackathon Awards
Xu Zhihao: This team was selected for the Best Market Potential Award mainly because the project is highly scalable and also quite innovative. In the past, database optimization mostly stayed in the kernel: the optimizer, the executor, or the architecture. From the user's point of view, however, a good performance improvement can often be obtained just by adjusting configuration and parameters. Most people focus on kernel optimization and few pay attention to parameter tuning, so this direction is still relatively innovative.
In terms of technology choices, using AI instead of humans to select the optimal parameters is also relatively innovative. Google has recently been using AI to optimize its own AI training framework, which is essentially parameter tuning and a research direction with high ROI. If this could be offered as a productized plug-in for TiDB, TiDB's performance would become even stronger. A trained AI would then act like an operations expert that gives better parameter suggestions more efficiently, which has practical significance for machines replacing manual work in the future.

A special experience as student participants

The Matrix team can be called the representative team of Huazhong University of Science and Technology. The captain, Ding Chen, is a second-year doctoral student there whose research focuses on key-value storage systems based on new hardware; Xiong Yiqin is a graduate student; Tong Jian and Chen Shuning both graduated from Huazhong University of Science and Technology and currently work as R&D engineers at PingCAP.
Ding Chen: I participated in the competition as a student. My strongest impression is that a Hackathon is a high-intensity, immersive project experience that is rare at school. Work at school usually moves slowly and has no deadline, whereas here a small project has to be written within two days, which is a big test of students' abilities. In addition, this opportunity to get in touch with practical industry problems is very helpful for our research. Most of our research is based on logical reasoning, so once we understand the real problems in industry, we can try to solve the problems people actually encounter.
Yiqin: This is my first Hackathon. As a student and a first-time participant, I found the Hackathon a new and attractive experience. In just two days I threw myself into the project wholeheartedly, and I improved a great deal personally.

Teamwork to overcome technical difficulties

Ding Chen: In the early stage, Yiqin and I were mainly responsible for building the cluster and deploying the framework, as well as some TiDB adaptation work. In the later stage, Shuning and Tong Jian mainly constructed workloads modeled on production environments so that the effect of parameter tuning would stand out. We constructed one case each for TiDB, PD, and TiKV, and then I wrote the slides and the demo for the defense.
Shuning and Tong Jian at Hackathon Guangzhou

Q

What major technical difficulties did you encounter during the competition? How were they resolved? How far is the project from real production readiness?

Ding Chen: At the beginning, when we tested performance under a typical TPC-C workload, we found that OtterTune's parameter tuning was not very effective, because TiDB's default configuration parameters have already been optimized for this standard test. Later, we tried to construct cases with bottlenecks like those encountered in real production environments. Only in a case with a bottleneck can the effect of parameter tuning be clearly seen. So one technical difficulty was how to construct a realistic environment that shows the effect of parameter tuning.
In terms of ease of use, the OtterTune framework is actually not very easy to use. We wrote many scripts in this project, so using it is like running a set of scripts. To make it easier to use, the framework could be improved, for example by putting the scripts behind a web page and running them with one click, which would be more convenient for operations and R&D staff. In the future, if OtterTune can be combined with TiDB Cloud to provide a better environment for operations on the cloud and hide the complexity, that would be a more imaginative scenario, and it is where we hope to take the project.
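The observation that tuning only pays off when the workload has a bottleneck can be illustrated with a toy tuning loop. This is a minimal sketch, not the team's code: it uses plain random search rather than OtterTune's own method, and the knob names, ranges, default values, and the `synthetic_throughput` function are all hypothetical stand-ins for a real benchmark run.

```python
import random

# Hypothetical knob ranges and defaults; these are NOT real TiDB settings.
KNOBS = {"cache_mb": (256, 4096), "workers": (1, 32)}
DEFAULT = {"cache_mb": 1024, "workers": 8}


def synthetic_throughput(config, optimum):
    """Toy stand-in for a benchmark run: the score peaks when config == optimum."""
    score = 1000.0
    for name, (low, high) in KNOBS.items():
        distance = abs(config[name] - optimum[name]) / (high - low)
        score -= 400.0 * distance
    return score


def random_search(benchmark, trials, seed=0):
    """Start from the default config, try random configs, keep the best seen."""
    rng = random.Random(seed)
    best_config, best_score = dict(DEFAULT), benchmark(DEFAULT)
    for _ in range(trials):
        candidate = {n: rng.randint(lo, hi) for n, (lo, hi) in KNOBS.items()}
        score = benchmark(candidate)
        if score > best_score:
            best_config, best_score = candidate, score
    return best_config, best_score


def tuning_gain(workload_optimum, trials=200):
    """Improvement of the tuned score over the default configuration's score."""
    def benchmark(config):
        return synthetic_throughput(config, workload_optimum)

    baseline = benchmark(DEFAULT)
    _, tuned = random_search(benchmark, trials)
    return tuned - baseline


# A "standard" workload whose optimum is already the default: nothing to gain.
standard_gain = tuning_gain(DEFAULT)
# A hypothetical bottlenecked workload whose optimum is far from the default.
bottleneck_gain = tuning_gain({"cache_mb": 3584, "workers": 28})
```

In this toy setup the standard workload yields no gain because the defaults already sit at the optimum, while the bottlenecked workload leaves room for the search to improve on the defaults.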

Cost issues and regrets left by limited time

Xu Zhihao: I am still curious: have you calculated the cost of building such a training cluster? As I understand it, turning this into a production-ready product would still require a lot of data preparation. In that case, how do you keep the training cost under control?
Ding Chen: This is a key issue, and we didn't think much about it while building the project. The data we ran is on the scale of hundreds of points. To use it in a production environment, we would probably need a dedicated parameter tuning cluster that trains the AI model and collects data from various users, so that the model learns the workloads of different businesses and the experience gained under different hardware configurations. This experience can be reused, which also improves data efficiency. Once there is enough training, the cost can come down.

Q

Time was limited in this Hackathon. Do you have any regrets from the competition?

Ding Chen: At present, our project can only output one optimal set of parameters, and it cannot explain why that set is optimal; in other words, our AI still lacks interpretability. We also wanted to build a visualization module that intuitively shows the relationship between performance and the optimal configuration, to help operations and R&D staff analyze the reasons behind the parameters. Because of time, the visualization is not very intuitive yet; it is just a list. In addition, the test results for some cases did not meet expectations, which has more to do with the workload. How to integrate more workloads into the framework so that tuning experience can be reused across workloads is a question we need to consider.
Yiqin: Due to time constraints, our test coverage is still insufficient; both the number of tests and the range of parameters tested fall short.
Tong Jian: In fact, experienced DBAs have their own know-how about how to adjust parameters, but they can usually give only directional suggestions. What we want to do is make this relationship concrete: to know how performance changes with the parameters, so that we can quickly locate the optimal parameters. This can speed up convergence.
Shuning: I have two main regrets. One is that, because of the epidemic, two teammates could not come on site to code together. The other is that, because there were more participants this time, the preliminaries and the finals were separated, so there was no way to keep a full 24 hours of continuous programming. The 24-hour uninterrupted coding experience is actually very attractive.

TiDB Hackathon in the eyes of the judges

Xu Zhihao: This is my first time as a Hackathon judge. TiDB Hackathon feels very hard-core to me. When I was a programmer I took part in Hackathons organized by my company, but they did not feel as hard-core as this one. This time there were AI projects, and there were also projects such as graph databases built directly on TiKV. The level of difficulty was completely beyond my imagination.
Mingshi Capital Executive Director Xu Zhihao participated in the review at the Hackathon
This Hackathon is also very valuable for TiDB. Many projects are very close to being ready to enter the TiDB product, and they are highly practical. In the process, everyone let their imaginations run free and produced many valuable ideas for product iteration and for the ecosystem around the product. The Hackathon can therefore be seen as an annual burst of community wisdom, in which the creativity and potential of community members can be felt directly.
As for the organization of the event, I was deeply impressed by the publicity and momentum in the run-up, and by the very cool "finding teammates" short video that a community fan made at the time. In addition, in the finals, projects that did not make the top 20 but performed outstandingly in a single direction still had the chance to compete for the independent awards and showcase their work, which was a thoughtful arrangement. The only real test was the physical stamina of us older judges, since the entire defense was fairly long.

Looking ahead

Q

Mr. Xu, as an investor, do you have any suggestions for commercializing the project?

Xu Zhihao: I have quite a few suggestions. This project still has a long way to go before it can be commercialized. First, the product needs to be more stable: it cannot perform well on some workloads and have no effect on others. Parameter tuning targets databases or other infrastructure, which is a very core part of an enterprise, so it is difficult to commercialize if performance is unstable.
The second is cost. AI algorithms need to be combined with human experience. For example, human experience can first eliminate some irrelevant factors, which lets the AI approach better results faster and saves cost.
Third, I recommend first going deep into the scenarios of one particular piece of software. In this process, it is necessary to work closely with the original software vendor, such as PingCAP, for example to obtain more data from real environments. In addition, when thinking about the business model, don't treat parameter tuning as a one-off deal. Try to offer continuous, dynamic tuning suggestions, which will make the business model more imaginative.
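The second suggestion, using human experience to remove irrelevant factors before the AI searches, can be illustrated by counting how many configurations a search would have to consider. This is only a sketch under assumptions: the knob names and candidate values below are hypothetical, and an exhaustive grid is used purely to make the cost easy to count, not because the project searches this way.

```python
from itertools import product
from math import prod

# Hypothetical candidate values per knob; names are illustrative only.
GRID = {
    "cache_mb": [512, 1024, 2048, 4096],
    "workers": [4, 8, 16, 32],
    "log_level": ["info", "warn", "error"],
    "metrics_interval_s": [15, 30, 60],
}


def grid_size(grid):
    """Number of distinct configurations in an exhaustive grid."""
    return prod(len(values) for values in grid.values())


def prune(grid, irrelevant):
    """Drop knobs that an expert judges irrelevant to performance."""
    return {name: values for name, values in grid.items() if name not in irrelevant}


def configurations(grid):
    """Yield every configuration in the grid as a dict."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


# Assumption: an expert rules out the two knobs unrelated to throughput.
pruned = prune(GRID, {"log_level", "metrics_interval_s"})
full_count = grid_size(GRID)      # 4 * 4 * 3 * 3 = 144 benchmark runs
pruned_count = grid_size(pruned)  # 4 * 4 = 16 benchmark runs
```

Each configuration corresponds to one benchmark run, so removing two irrelevant knobs here cuts the number of runs from 144 to 16.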

Q

This time your project won the Best Campus Award and the Best Market Potential Award specially sponsored by Mingshi Capital. Do you want to continue this project in the future?

Ding Chen: Yes. On the one hand, we want to adapt it to the cloud; on the other hand, we hope to integrate more functions, such as visualization and AI causal reasoning. There is also the dynamic workload tuning that Mr. Xu mentioned earlier, which takes the relationship between benefit and cost into account. Finally, there is optimizing the framework itself, which also requires data from the community's real production environments.

Q

What improvements do you look forward to in next year's TiDB Hackathon?

Ding Chen, Xiong Yiqin: The organizers could consider attracting more university students to participate and strengthening publicity on campuses.
Tong Jian: First of all, students are not very familiar with TiDB or with engineering implementation, so I hope a separate track with its own awards can be set up for campus teams. In addition, contestants eliminated in the preliminary round are not very engaged with the finals. More showcase and interaction sessions could be added so that preliminary-round contestants keep following the finals.
Xu Zhihao: I hope the judges can be given brief introductions to the projects before the competition, to help them understand the projects better. In addition, the on-site review is quite intense; my brain was overloaded with information and I became a little fatigued, so I hope there can be more breaks in the middle. I would also like to see, before the competition, the status and adoption of the previous year's Hackathon projects, which would encourage both the contestants and the judges.

