TiFlash is finally open source and no longer closed source. We chose to open-source it and make this commitment on this special holiday, hoping that it comes across as more sincere :)

PingCAP has always believed in open source. This belief is rooted in the founder's convictions and has deeply influenced everyone gathered under this banner. Open source is not a marketing strategy for us: long before we saw what open source could bring, we were already committed to it, which keeps our open source spirit relatively pure. As an important force in the TiDB community, we contribute code alongside other community contributors. As community users use and polish the product, everyone shares the dividends of TiDB's continuous progress without artificial barriers or boundaries. This idea of freedom and mutual benefit is also part of our stubbornness about technology and open source.

TiDB is an HTAP distributed database, and TiFlash is an important part of what provides its HTAP capabilities. TiFlash began as an exploratory prototype, so at the start we had not settled on its final form and structure (it did in fact undergo a complete overhaul and rewrite in 2018). To avoid drawing too much outside discussion, we chose to keep TiFlash closed source temporarily and to open-source it once it had largely taken shape. Although we planned to open-source it from the beginning, when we actually set about doing so we found that it had accumulated a lot of "open source debt" and seemed out of step with the TiDB mainline: the release process was not fully integrated, the documents required for open source were missing, the compilation experience was quite bad, and some of the code style was even ugly. Paying off these debts requires a considerable investment of time and manpower, while the pressure of product iteration and limited resources became the biggest obstacles to open-sourcing TiFlash. For a team that regards open source as a belief, keeping TiFlash closed source felt like a fishbone stuck in our throat. So, although many shortcomings remain, we chose to invest the time and repay the debt gradually. After all, this engine also takes from the community: whether it is the excellent runtime foundation of ClickHouse or the real scenarios and improvement suggestions provided by many community users, all of it has benefited TiFlash. Without this help, there would be no TiFlash.

TiFlash benefits first of all from other projects in the open source community. Apart from the various basic libraries we use, the most important part is that the framework code of TiFlash is based on ClickHouse. We use ClickHouse as a standalone compute runtime and server framework, and we reuse its storage interface. Toward the larger goal of analyzing online transactional data, we added transaction-related logic, MPP capability, and a columnar storage engine that can be updated in real time; we also introduced the Raft protocol and MySQL compatibility so that TiFlash integrates into the entire TiDB system. Thank you, ClickHouse, for providing the community with a high-performance compute engine. For us, a good foundation greatly accelerated the development of TiFlash. It is worth mentioning that TiFlash and ClickHouse excel in completely different scenarios: TiFlash is focused entirely on the analysis of transactional data, and we do not want users to think of TiFlash as a better ClickHouse.

As a young engine, over more than two years TiFlash has been tolerated and helped by our angel users, and we have iterated on the product at high speed by listening closely to what users tell us. All of this in turn helps users simplify the architecture of their analytics pipelines and enjoy the dividends of real-time data. Even before it was open source, TiFlash was already benefiting from the TiDB community. We clearly remember that, with an early version, our friends in the tax system tested alongside us, trying to make the tax payment process more real-time. Although that project did not go into production, it gave us a first-hand feel for the pros and cons of the system design in real scenarios. This directly led to a major refactoring: we completely redesigned the storage layer, introduced Raft Learner as the replication protocol, and decided to use TiDB-Server rather than TiSpark as the unified entry point, which is the architecture everyone sees today. For the official version released together with TiDB 4.0, it can be said that we relied entirely on Xiaohongshu, who provided a scenario and explored it with us for several months before the release. That attempt finally landed successfully in a real-time e-commerce dashboard scenario, allowing transaction data to be displayed in real time. By the time of 5.0, from the 618 shopping festival, which we got through safely but with occasional glitches, to Double Eleven, where traffic doubled yet the system stayed stable, the performance of TiFlash in high-pressure real-time scenarios had been improved and verified. Behind this is not only deployment in the core business of the world's leading express delivery companies and real-time monitoring of tens of thousands of TPS and billions of orders, but also the tolerance, trust, and mutual assistance of users.

To this day, we often receive private messages from customers saying that they have gained real business benefits in some scenario thanks to the capabilities of TiFlash. We share the joy within the team every time we see these lovely messages. Helping everyone and providing unique value is what we pursue; at the same time, the support of the community is indispensable nourishment for product development, pushing TiFlash to be deployed in more demanding and more valuable scenarios. It can be said that without the power of the community, we could not provide better products to more people. Today, open source is a brand new channel that allows TiFlash and the community to help each other and win together in a deeper way.

Finally, let us say sorry. Although it is now open source, TiFlash still lacks the community-oriented code walkthroughs it needs. For an architectural analysis of TiFlash and TiDB HTAP, you can refer to our paper "TiDB: A Raft-based HTAP Database", published in VLDB 2020, or to the articles "In-depth Interpretation of TiDB HTAP" and "How is TiDB's columnar storage engine implemented?" (Community members have also compiled a list of TiFlash related articles.) Unfortunately, these are all based on TiDB 4.0, when TiFlash did not yet have its important MPP capability. In addition, source code analysis and reading guides are still missing. These will be added over the next few months to help everyone read and understand TiFlash. We hope that, with open source, everyone who follows TiFlash will have more ways to help it become better, whether through more functions and operator support, more intuitive tracing capabilities, or a faster and more stable Delta Tree columnar storage engine.

Yes, we look forward to your contributions. Carried on the wings of the community, this force will soar to awe-inspiring heights.

Happy holidays everyone.


PingCAP