On December 25, 2021, the 2021 China Big Data Technology Conference and CCF Big Data and Computational Intelligence Competition Summit Forum was held at the Institute of Computing Technology, Chinese Academy of Sciences. Fan Ruohan, Senior Vice President of PingCAP, was invited to give a speech on the theme of "Open Source Going to the World" at the main venue. Drawing on PingCAP's practice, he explained from the perspectives of collaboration and technology evolution how "open source" and "globalization" are interrelated and inseparable. This article is organized from the content of the speech and is divided into two parts. The theme of this part is: open source builds a global stage.
There is a very strong connection between "open source" and "globalization"; the two are not isolated from each other. From the moment it was founded in 2015, PingCAP has used open source, built open source, and invested in open source, laying the groundwork for globalization.
In fact, PingCAP has worked on GitHub since its very first code commit, and we take this home ground very seriously. Using TiDB Cloud, community members have built an online tool on top of a public data archive: GitHub Archive, which records public GitHub activity. By analyzing and comparing the raw log data (more than 4 billion records) and selecting typical domestic enterprises [1], we can see that over the past 10 years China's open source has surged like a torrent, and the number of active repositories [2] has increased significantly. Among these thousands of active repositories, more and more open source projects originating in China are recognized and praised by peers at home and abroad, and the field is flourishing. The conviction and determination of PingCAP's founding team were nourished by this torrent, and PingCAP has in turn made a small contribution to this curve over the past six years. The GitHub analysis tool we built will soon be available online so that everyone can run quick, flexible analyses of their own.
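To make the "active repository" count concrete, here is a minimal sketch in Python of how such a yearly figure could be derived from event records. It is not the community tool itself: the flat record shape, the field names `repo` and `created_at`, and the sample data are all assumptions for illustration, and the real archive holds far more fields per event.

```python
from collections import defaultdict

# Hypothetical sample records. The flat shape and field names are assumptions;
# real GitHub Archive events carry many more fields.
SAMPLE_EVENTS = [
    {"type": "PushEvent", "repo": "example/alpha", "created_at": "2020-03-01T08:00:00Z"},
    {"type": "IssueCommentEvent", "repo": "example/alpha", "created_at": "2020-07-15T12:30:00Z"},
    {"type": "PullRequestEvent", "repo": "example/beta", "created_at": "2021-01-10T09:45:00Z"},
    {"type": "ReleaseEvent", "repo": "example/alpha", "created_at": "2021-05-20T17:00:00Z"},
    {"type": "PushEvent", "repo": "example/gamma", "created_at": "2021-11-02T03:15:00Z"},
]


def active_repos_per_year(events):
    """Count repositories that have at least one event in each calendar year."""
    repos_by_year = defaultdict(set)
    for event in events:
        # The timestamp is assumed to be ISO 8601, so the year is the first four characters.
        year = int(event["created_at"][:4])
        repos_by_year[year].add(event["repo"])
    return {year: len(repos) for year, repos in sorted(repos_by_year.items())}
```

Calling `active_repos_per_year(SAMPLE_EVENTS)` on the sample above counts one active repository in 2020 and three in 2021. Using a set per year means a repository with many events is still counted once.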
It is gratifying that basic software has made real progress on the open source stage: the "14th Five-Year" development plan for the software and information technology service industry, released at the end of November, states that "open source reshapes the new ecosystem of software development", and open source projects in the fields of databases, artificial intelligence, and operating systems are the most active.
China is currently in a stage of rapid open source development. Open source software started at the bottom with operating systems, moved into databases and middleware, and is gradually extending into applications. In recent years it has begun to lead deep innovation in information technology, including data, cloud computing, artificial intelligence, blockchain, and cloud native, and has accelerated the independent development of national core technologies. The "2021 China Open Source Development Blue Book" provides incomplete statistics on active open source communities in China. We can see that open source projects in the fields of databases, artificial intelligence, and operating systems are the most active, which is also closely related to policy guidance.
A series of policy signals and trends in the global open source market have greatly increased the capital market's recognition of open source companies. When we studied the domestic ToB technology industry over the past three years, we found that many companies that raised tens of millions or even billions in financing share a common label behind their technology: open source. Their choice of field also matches closely: operating systems, databases, and artificial intelligence.
In the past, the commercial power of open source was seriously underestimated, and even now everyone is still feeling their way toward a high-leverage open source model. Open source is the door that truly helped PingCAP go global. Today, I will share some of PingCAP's explorations and experiences with you.
Solving the problem of trust is the premise of all work
PingCAP opens not only its source code to the world but also its documentation. The earliest TiDB had no Chinese documentation: all comments and even every commit message had to be in English, and all design documents had to be placed on GitHub. Early users with questions sometimes contacted us directly by WeChat or email, but we still recommended that they post their problems on GitHub. We even had a dedicated person who helped early users translate their issues from Chinese to English. The reason is that we established a principle from the very inception of the TiDB project: we want to show the why and the how of the project to participants in the global community.
The young man you see here is a TiDB community contributor from the Dominican Republic. Open source naturally solves the problem of mutual trust. This is its unique charm: it gives everyone in the global community the opportunity to get to know you, recognize you, and identify with you.
On the basis of trust, open source products can be efficiently disseminated on the global stage
TiDB's technical documents have been spontaneously translated into Russian, Ukrainian, Japanese, Spanish, and Portuguese by technical enthusiasts, which has greatly promoted the globalization of the product. NewSQL courses compiled by PingCAP have also been adopted in database courses at Wisconsin, Purdue, and Carnegie Mellon University.
The open source ecosystem greatly speeds up iteration
This picture shows the annual code changes of TiDB. Red is the TiDB code from 2015, and blue is from 2016. As you can see, more than 40% of TiDB's code is updated every year, and 40% of that code is contributed by external contributors, so the product can iterate quickly and keep its lead. A number of institutions at home and abroad have taken part in the development of TiDB, including Xiaomi, Meituan, Zhihu, and Yidianxin in China, as well as Databricks, Facebook (now Meta), and South Korea's Samsung Research Institute abroad.
The strongest products must be used all over the world. By holding to a global technical vision and an open source strategy, and by competing alongside global talent in global scenarios, TiDB aims to take the lead in the field of new-generation distributed databases. Next, I will share two stories that give a deeper sense of the global community co-building that open source brings.
The first story is about writing a book
In our experience, writing a book is usually slow, meticulous work that takes anywhere from half a year to several years. After the first draft is completed, it is reviewed by many people and only then published. But technology changes fast today and products iterate quickly, so can technical books be written in a faster and more efficient way? Before the release of TiDB 4.0 GA last year, community work on the TiDB master branch was in full swing. To leave something different for this milestone version, our CTO had a "crazy" idea: ask a group of TiDB community members to write a technical book about TiDB within 48 hours in a "distributed" way. The hope was to introduce the entire TiDB ecosystem and its surrounding tools systematically and concisely, so as to guide users to "use TiDB and use it well".
We acted on it right away. The idea of the TiDB Book Rush was published online on March 3. The event started at 21:00 on Friday, March 6, and lasted 48 hours, ending at 21:00 on Sunday. A total of 102 authors from the community took part, producing 421 commits and 199 PRs. The result was the first edition of the open source e-book "TiDB in Action", covering TiDB's basic architecture and principles, best practices and cases, and the development of the TiDB open source community and its surrounding ecosystem.
Access address: https://book.tidb.io/
The second story is about open source accelerating product iteration
The "4.0 Bug Catching Contest" was a challenge launched by the TiDB community. Contestants earned points by finding bugs in TiDB or submitting test reports, and the points could be exchanged for prizes. A total of 40 community members formed 23 teams to take part. Through everyone's unremitting efforts, 51 P1-level bugs and 8 P0-level bugs were found for TiDB 4.0 GA.
Manuel Rigger, the first-prize winner of this bug contest (who has also given himself the Chinese name Li Manu), is a postdoctoral researcher at ETH Zurich specializing in database testing. His testing framework has also helped MySQL, PostgreSQL, MariaDB, and other databases find more than 400 bugs.
These two stories show how attractive an open community can be: it inspires contributors across the global community to create and build together, and it accelerates product iteration.
This contribution is not one-way. By stimulating the creativity of engineers in the community, open source also lets a large number of developers, institutions, and enterprises use high-quality software without barriers, which creates enormous social value and forms a virtuous cycle.
The two men you see are a professor and a student from a university in Guatemala. In the picture, they are wearing sweaters from Chaos Mesh, an open source chaos engineering project founded by PingCAP. The professor and the student built a very interesting project on GitHub: a distributed system that records and analyzes COVID-19 vaccination data from around the world in real time. Chaos Mesh helps ensure the stability of this system. Before seeing this project, we might not have imagined that work done by a group of engineers in China would be noticed in a Central American country, nor did we know that the chaos testing tool we created could take part in this way in the fight against the COVID-19 pandemic that concerns all of humanity. At the same time, TiDB is also quietly at work in China in areas such as disease prevention and control management and herd immunity monitoring. These real stories reflect the global nature of PingCAP's community users. The circles on this world map represent the number of community users participating in PingCAP's open source projects. It is these developers from all over the world who allow us to reach more users, gain more attention, and build an early customer base.
As an advanced production model, open source lets PingCAP receive a great deal of help from the community while contributing the wisdom of open source back to the world and creating enormous social value. PingCAP is now making steady progress on the road from open source to commercialization. It draws commercial value from this social contribution, which helps the company and its projects thrive and forms a continuous closed loop of value for the success of both the enterprise and its products.
When we step back from our own products and look at the data technology industry around the world, we find that 95% of the data technology innovations that have emerged in the past decade are open source. This picture shows the annual top 10 data technology projects by PR count, based on our analysis of GitHub Archive data as of early December 2021. PR stands for Pull Request: a user has modified the code and asks for the change to be merged into the main repository. The PR count reflects the capacity of an open source project to innovate and iterate.
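A ranking of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the analysis behind the picture: the record shape, the `repo` field name, and the sample repositories are hypothetical, and the sketch counts `PullRequestEvent` records rather than distinct pull requests.

```python
from collections import Counter

# Hypothetical sample records; repository names and the flat shape are assumptions.
SAMPLE_EVENTS = [
    {"type": "PullRequestEvent", "repo": "example/db"},
    {"type": "PullRequestEvent", "repo": "example/db"},
    {"type": "PullRequestEvent", "repo": "example/db"},
    {"type": "PullRequestEvent", "repo": "example/stream"},
    {"type": "PullRequestEvent", "repo": "example/stream"},
    {"type": "PullRequestEvent", "repo": "example/cache"},
    {"type": "PushEvent", "repo": "example/cache"},
]


def top_repos_by_pull_requests(events, n=10):
    """Return the n repositories with the most PullRequestEvent records."""
    counts = Counter(
        event["repo"] for event in events if event["type"] == "PullRequestEvent"
    )
    # most_common sorts by count in descending order and returns (repo, count) pairs.
    return counts.most_common(n)
```

With the sample above, `top_repos_by_pull_requests(SAMPLE_EVENTS, n=2)` returns `example/db` with 3 events followed by `example/stream` with 2. Other event types, such as the `PushEvent` on `example/cache`, are ignored.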
It is fair to say that data technology innovation has entered an unprecedented stage of vigorous development over the past decade. Where does the driving force come from? How should one choose when the choice is difficult? In the next article, I will share what PingCAP has learned from nearly half a century of database development, and how the evolution of technology has shaped PingCAP's architecture choices and product direction.
[1] Typical domestic enterprises: more than 40 companies, including Alibaba, Baidu, Tencent, ByteDance, Meituan, Didi, Huawei, JD.com, Xiaomi, NetEase, Bilibili, WeBank, Ctrip, and PingCAP.
[2] Active repository: a repository that has at least one event within a year, such as PushEvent, IssueCommentEvent, ReleaseEvent, or PullRequestEvent (there are 21 event types in total).