Introduction: Who would have thought that someone who studied mathematics for both his undergraduate and master's degrees would go on to build the world's first native distributed database? Before 2010, Yang Zhenkun could not have imagined that he would one day become so closely tied to databases, let alone that the following ten years would be among the hardest of his career.
Interviewer/Author: Tian Weijing
Interviewee: Yang Zhenkun, founder of the OceanBase distributed relational database
"If we couldn't get in, we would miss Alipay's opportunity to 'go to O', and then OceanBase would have no chance."
Facing the interview camera, Yang Zhenkun calmly recounted this life-and-death battle. The OceanBase database he describes is a "child" that has been out of favor since birth and has stood on the brink of death several times.
Without engineering, it is all talk on paper
Born in 1965, Yang Zhenkun was not quite like most people. He followed the same path through junior high and high school as everyone else; the difference is that he was admitted to Peking University in 1984. How hard was it to get into university in the 1980s? According to historical college entrance examination statistics, 1.64 million people sat the exam in 1984 and 480,000 were admitted, an admission rate of 29%. On the surface that rate is not low, but those admitted made up only about 1.79% of their age group (according to historical birth data, 26.79 million people were born in 1965). In other words, fewer than 2 in every 100 people of that age could enter university, in an era when most families could only just afford food and clothing and many people gave up their studies because they could not pay tuition. Being able to attend high school was already "one in a hundred", and those admitted to university were called "heaven's favored ones". During his undergraduate and postgraduate studies, Yang Zhenkun not only studied mathematics but also took many foundational computer science courses. In his second year he took part in research and development at Peking University's computer research institute. Out of interest, he chose computer science for his Ph.D. After graduating he stayed at the university to teach, and he was exceptionally promoted to associate professor and then professor by the age of 32.
While his academic career was going smoothly, Yang Zhenkun chose to leave academia for industry. Perhaps influenced by his doctoral supervisor, Academician Wang Xuan (Academician of the Chinese Academy of Sciences, Academician of the Chinese Academy of Engineering, and founder of computer Chinese character laser phototypesetting technology), Yang Zhenkun felt that "not doing engineering amounts to talking on paper". He went on to work at Lenovo Research Institute, Microsoft Research Asia, Baidu and elsewhere.
So for the first 45 years of his life, Yang Zhenkun had nothing to do with databases.
Building a "big plane"
At that time, of the three Internet giants Baidu, Alibaba and Tencent (commonly known as "BAT"), only Alibaba (hereinafter "Ali") was optimistic about cloud computing. As a cloud computing expert, Yang Zhenkun was exactly the kind of senior talent Ali urgently needed. At the invitation of Ali partner Liu Zhenfei, Yang Zhenkun joined Taobao on May 12, 2010. Taobao's internal appointment email shows how highly it valued him: "Dr. Yang is a talent we have long been looking forward to, with very rich experience in system design and implementation, massive information processing, algorithm design and many other areas. He will work with Dr. Shen Xinyang and the team to build a high-performance, high-reliability, low-cost computing platform dedicated to large-scale e-commerce for Taobao, providing the underlying technology that Taobao's transaction volume requires."
But with a bright future ahead of him, he chose the most difficult path: building a distributed database from scratch.
In fact, while working at Microsoft Research Asia, Yang Zhenkun had met Wang Jian, the founder of Alibaba Cloud, and had been exposed to distributed systems. Both were very optimistic about them. Before joining Taobao, Yang Zhenkun spent a month at home thinking about what to do next. In his words, "At that time I had a rough idea. I was not sure whether I could do this at Taobao, but there was a chance."
At that time, centralized databases dominated the world. Representative products such as the Oracle database from Oracle Corporation of the United States led the international market, and Ali was its largest customer in China. Meanwhile, the widespread commercial use of the Internet, cloud computing, big data, the Internet of Things, blockchain, artificial intelligence and other technologies accelerated the arrival of the mobile Internet. Behind more convenient connections between people and a more intelligent society lay ever more frequent concurrent access to business systems and ever larger volumes of data to process. The high cost of centralized databases and the very limited scalability of their storage and computing were becoming glaring weaknesses.
A centralized database is in essence a single-machine database. In an environment where average daily Internet traffic exceeds 10 billion or even 100 billion, the storage capacity of a single-machine database is extremely limited, to say nothing of analyzing and processing the data. Because distributed databases are more complex, make faults harder to locate, suffer reduced distributed transaction performance and were not yet mature, traditional database vendors chose the "database and table sharding + middleware" approach: on top of centralized databases, the business is heavily reworked and split so that each part fits into a single centralized database. In effect, one large business in one database is split into many small businesses. Yang Zhenkun describes it as "if there is no big plane, the cargo is divided into pieces and carried by small planes", but heavy equipment such as tanks and cannons cannot be loaded onto small planes. So this approach never fully solves the problem, and it brings high costs and complicated operation and maintenance.
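The "sharding + middleware" idea described above can be illustrated with a minimal sketch. It is not taken from any vendor's product: the class name `ShardingMiddleware`, the CRC32 routing rule and the in-memory dictionaries standing in for single-machine databases are all assumptions made for illustration.

```python
import zlib

NUM_SHARDS = 4  # hypothetical number of single-machine databases


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a row key to one shard using a stable hash (CRC32 here)."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


class ShardingMiddleware:
    """Toy middleware: each dict stands in for one centralized database."""

    def __init__(self, num_shards: int = NUM_SHARDS):
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: str, row: dict) -> None:
        # A single-key write lands on exactly one "small plane".
        self.shards[shard_for(key, len(self.shards))][key] = row

    def get(self, key: str):
        return self.shards[shard_for(key, len(self.shards))].get(key)

    def shards_touched(self, keys) -> set:
        """Return the set of shards an operation over these keys must visit."""
        return {shard_for(k, len(self.shards)) for k in keys}


if __name__ == "__main__":
    mw = ShardingMiddleware()
    for i in range(100):
        mw.put(f"user-{i}", {"id": i})
    print(mw.get("user-42"))
    # An operation spanning many keys usually has to visit several shards,
    # which is where the application-side complexity comes from.
    print(mw.shards_touched(f"user-{i}" for i in range(100)))
```

Single-key operations stay simple, but anything that spans keys on different shards (the "tanks and cannons" of the analogy) has to be coordinated by the application or the middleware rather than by one database.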
Yang Zhenkun realized that "the opportunity is here!" and set up the project; "the direction and goal were clear". This was despite the fact that distributed online transaction processing is very complex and difficult to develop; that the bar for a distributed database is extremely high in areas such as the SQL optimizer and storage architecture; that such a database needs a long time and many real-world scenarios to mature; and that there was no prior experience to draw on in combining a distributed architecture with a relational database.
"We want to build a big plane. No matter how large your business volume is, the big plane of a distributed database can carry it for you." When system capacity is no longer a limit, far more becomes possible. This was the one thing Yang Zhenkun was sure of at the time: the value.
Yang Zhenkun's way of working is to "think things through before doing them; rushing in causes big trouble". He had only fixed the goal, and how to proceed was still an open question. Fortunately, there was no need to worry about which functions a distributed database should offer. Mature open source and commercial databases such as MySQL and Oracle already existed and had effectively defined the feature set. In other words, the task was to build a distributed environment across multiple machines and implement a distributed database against the feature list of existing mature databases.
But for basic software such as a database, and especially for a new kind of distributed database whose architecture is completely different from that of traditional centralized databases, it is never the case that you can design the architecture, write the code and then simply apply it to the business. Instead, the team had to start from one specific business, build the basic functions that met its basic needs, and then grow the database step by step. Only by finding an opening created by a real business need, getting in and going into production did the project have a chance to survive.
Must "survive"
That is easier said than done. The distributed database Yang Zhenkun wanted to build was a "difficult child" that could not find any business to serve.
Fortunately, Chu Cai, who was in charge of the OceanBase project at that time, was a Taobao veteran. He took Yang Zhenkun around Taobao's business and technical teams, and after a few days they finally found an opening with the Taobao Favorites team.
Why was Taobao Favorites willing to test the waters? "Favorites is a rather special requirement. Even now, we do not know of a better way to solve this problem." A user may have favorited thousands of products, and changes to product information, such as price cuts and delistings, must be reflected promptly so that users can keep track of those products. Every time a user opens their favorites, the back-end system has to query the database hundreds of times to fetch the latest information for each product. For the database, this amounts to millions or tens of millions of users reading data, with the number of reads then multiplied hundreds or thousands of times. Almost no database could cope with that volume, and users often complained that the page was "not responding" or "would not open".
"We built a rather special structure, and later found that it had great value," Yang Zhenkun said happily. Put simply, modifications to product information are not written directly to disk but are held temporarily in memory. Each time a user's favorites are displayed, only the modified product information needs to be fetched from memory. This turns the hundreds or thousands of I/O (input/output) operations that a favorites page used to need into a single I/O. The original plan would have required scaling out to hundreds of machines; now a little over 20 machines were enough. This greatly reduced the machine count, cut costs and improved stability, giving a first proof of OceanBase's right to exist. The good times did not last, however. After Favorites went live in 2011, OceanBase failed to find a second business of comparable value throughout 2012. Yang Zhenkun said bluntly, "In 2012, I really could not keep it going."
So the only goal in 2012 was to survive.
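The Favorites structure described earlier, in which product changes are held in memory and overlaid on a single read, can be sketched roughly as follows. This is an illustrative assumption, not OceanBase's actual design: the class `FavoritesStore`, its fields and the `disk_reads` counter are hypothetical names, and plain dictionaries stand in for disk and memory.

```python
class FavoritesStore:
    """Toy model: one baseline read per page view plus in-memory changes."""

    def __init__(self):
        # user -> {item_id: info snapshot}; stands in for the on-disk baseline
        self.disk_favorites = {}
        # item_id -> latest changed fields; held in memory only
        self.memory_updates = {}
        self.disk_reads = 0  # counts simulated disk I/O operations

    def add_favorite(self, user: str, item_id: str, info: dict) -> None:
        self.disk_favorites.setdefault(user, {})[item_id] = dict(info)

    def update_item(self, item_id: str, **changes) -> None:
        # Price cuts, delistings, etc. are recorded in memory, not on disk.
        self.memory_updates.setdefault(item_id, {}).update(changes)

    def show_favorites(self, user: str) -> dict:
        # One simulated I/O fetches the user's whole favorites row.
        self.disk_reads += 1
        snapshot = self.disk_favorites.get(user, {})
        # Overlay any in-memory changes on top of the baseline snapshot.
        return {
            item_id: {**info, **self.memory_updates.get(item_id, {})}
            for item_id, info in snapshot.items()
        }


if __name__ == "__main__":
    store = FavoritesStore()
    store.add_favorite("alice", "sku-1", {"price": 100, "on_sale": True})
    store.add_favorite("alice", "sku-2", {"price": 50, "on_sale": True})
    store.update_item("sku-1", price=80)  # price cut, kept in memory
    print(store.show_favorites("alice"))
    print("disk reads:", store.disk_reads)
```

However many items the user has favorited, the page view costs one baseline read in this model; the per-item lookups that used to hit the database are served from memory instead.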
With the team at risk of being disbanded at any time, Yang Zhenkun went to Wang Jian, then chief architect of Alibaba, and laid out his difficulties. On Wang Jian's recommendation, the OceanBase team moved from Taobao to Alipay on November 15, 2012. Because Alipay's business deals with money and demands stronger data consistency, the team hoped to find opportunities there. Yang Zhenkun knew that "if we can't find another opportunity at Alipay, it's over." In a new organization where they knew no one, they got to know people while looking for openings.
Fortunately, the OceanBase team met an open-minded leader in Cheng Li (nickname Lu Su, now CTO of Ali Group), then technical director of Alipay, who encouraged and supported new technologies. Around July and August, Alipay began discussing how to "go to O", that is, how to replace the Oracle database (Alibaba had proposed "going to O" as early as 2009). The great difficulty was this: without Oracle or shared storage, and with a single-machine database such as MySQL instead, what happens if data is lost? In finance, that is unacceptable. Yet continuing to use Oracle would mean enormous software and hardware costs for Alibaba, given its huge data volumes and concurrency.
Yang Zhenkun said confidently, "We have a solution for this." The solution was the three-replica approach that is widely used today. Each transaction is written to 3 or 5 nodes at the same time and is considered successful once more than half of them succeed. For example, if one of three machines fails, at least one of the remaining two is guaranteed to hold the correct data. This solved the core problem of replacing Oracle: data corruption and loss once shared storage is given up.
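The majority rule described above can be shown in a few lines. This is a minimal sketch of quorum-based commit only, assuming hypothetical names (`Replica`, `replicate`, `majority`); it leaves out everything a real consensus protocol such as Paxos or Raft adds, for example leader election and cleanup of entries that did not reach a majority.

```python
def majority(n: int) -> int:
    """Smallest number of replicas that is more than half of n."""
    return n // 2 + 1


class Replica:
    """Toy replica: a log that accepts writes only while healthy."""

    def __init__(self, name: str, healthy: bool = True):
        self.name = name
        self.healthy = healthy
        self.log = []

    def append(self, entry: str) -> bool:
        if not self.healthy:
            return False  # simulated machine failure
        self.log.append(entry)
        return True


def replicate(replicas, entry: str) -> bool:
    """Send the entry to every replica; commit if more than half acknowledge."""
    acks = sum(1 for r in replicas if r.append(entry))
    return acks >= majority(len(replicas))


if __name__ == "__main__":
    nodes = [Replica("a"), Replica("b"), Replica("c", healthy=False)]
    print(replicate(nodes, "txn-1"))  # one of three is down: still commits
    nodes[1].healthy = False
    print(replicate(nodes, "txn-2"))  # two of three are down: does not commit
```

Because any two majorities of the same replica set overlap in at least one node, a committed entry survives the loss of a minority of machines, which is the property the paragraph relies on.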
Thus OceanBase version 0.5 was born, opening a path forward for OceanBase.
But Yang Zhenkun understood that this only gave OceanBase a chance to prove itself. It took about 7 months from proposing the solution to the official release of version 0.5. Meanwhile, the business team found it hard to accept OceanBase all at once. First, was it feasible to put this version into a core business without verification in other businesses? The business team could not feel at ease letting OceanBase back Alipay's transaction database and handle transaction records. Second, the OceanBase design required three data centers, so another had to be added to the existing primary and standby ones. At that time, apart from Hangzhou, Ali and Alipay had no city with three or more data centers.
The two sides argued fiercely and repeatedly. For OceanBase, this was a life-and-death battle: failing to seize it would mean not only missing the historic opportunity of Alipay's "go to O", but also having no further chance at all. So the OceanBase team fought hard for it. The business team, for its part, remained very cautious, and the two sides were deadlocked.
In the end, Lu Su stepped in and persuaded the business team to route 1% of traffic to OceanBase; if anything went wrong, that 1% could be cut off at any time. Looking back, Yang Zhenkun says simply, "we were lucky". Before "Double 11" in 2014, around the end of September or the beginning of October, the business team began stress testing in preparation for the event. The traffic was so heavy that it exceeded the planned capacity of the Oracle database and the I/O capability of the system, and the system reported a large number of errors.
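The 1% diversion with an instant cut-off mentioned above is a common gradual rollout pattern, and a rough sketch may help. Nothing here reflects Alipay's real routing layer: `TrafficRouter`, the backend labels and the CRC32 bucketing are assumptions chosen for illustration.

```python
import zlib


class TrafficRouter:
    """Toy percentage-based router with a kill switch."""

    def __init__(self, percent_to_new: int = 1):
        self.percent_to_new = percent_to_new  # share of traffic for the new database

    def route(self, request_id: str) -> str:
        # Hash the request ID into one of 100 buckets so routing is deterministic.
        bucket = zlib.crc32(request_id.encode("utf-8")) % 100
        return "oceanbase" if bucket < self.percent_to_new else "oracle"

    def cut_off(self) -> None:
        """Kill switch: send all traffic back to the existing database."""
        self.percent_to_new = 0


if __name__ == "__main__":
    router = TrafficRouter(percent_to_new=1)
    sample = [router.route(f"req-{i}") for i in range(10_000)]
    print("share routed to the new database:", sample.count("oceanbase") / len(sample))
    router.percent_to_new = 10  # raising the share is a one-line change
    router.cut_off()            # and so is rolling it back
    print(router.route("req-1"))
```

Keeping the share as a single adjustable number is what makes a later request such as "can you take 10%?" cheap to answer in this model.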
Time was short: it was too late to buy new equipment and too late to expand server capacity. The business team remembered that OceanBase had been carrying 1% of the traffic during the stress test and asked Yang Zhenkun, "Can you take 10% of the traffic?" The OceanBase team was of course delighted: "Never mind 10%, we can hold up under 100%." In the war room on the night of "Double 11", faced with the concerns of Peng Lei, then CEO of Ant Financial (currently chairman of Ant Group), Yang Zhenkun even joked that he would jump out of the window if it failed.
Everything went smoothly, and this battle essentially secured OceanBase's position at Alipay. After Oracle was replaced by OceanBase, single-replica data shrank to 1/7 of its original size, and the investment in computing resources fell to 1/12 of the original. Storage alone saved about 2 billion yuan compared with Oracle, equivalent to a 90% saving in cost per account.
From then on the road was smooth. In 2015, OceanBase replaced Oracle in support of Alipay's payment system. In 2016, OceanBase 1.0 was released, upgrading the original semi-distributed database to a truly distributed one; in the same year it replaced Oracle in support of Alipay's account system, achieving Ant Group's goal of "going to O". In 2017, it stepped outside Ant Group and was commercialized externally.
Before commercialization, Yang Zhenkun had to answer a critical question: why should anyone else use OceanBase?
As mentioned above, a big plane is more valuable than a small one, but if someone already has a small plane, why should they take the trouble to switch to a big one?
Interests are eternal, and a fledgling product can survive only if it delivers enough value. At present, the vast majority of database deployments are split into a transaction system and an analysis system, both large and costly, and in practice data often reaches the analysis system with a delay. If a single database handled both transaction processing and analytical processing, the problems of cost and real-time synchronization would be solved. Yang Zhenkun is very sure: "If this gets done, it is very likely to upend the whole industry, and it can definitely be done." This is HTAP (Hybrid Transaction/Analytical Processing, integrated transaction and analytical processing).
The future depends on this
It is said that the child nobody believes in often turns out the most promising. In 2019, OceanBase broke the TPC-C (online transaction processing benchmark) world record that Oracle had held for 9 years, becoming the first Chinese database product to top the list. In 2020, OceanBase set a new TPC-C record of 707 million tpmC, holding firmly to the top spot. In 2021, it broke the TPC-H (data analysis benchmark) world record with 15.26 million QphH (currently ranked No. 2). In the same year, with the support of Ant Group, OceanBase announced the establishment of Beijing Aoxing Bass Technology Co., Ltd. and open sourced the database again.
Why "again"? OceanBase version 0.2 was open sourced as early as 2011, but after version 0.4 the open source updates stopped. From Ant Group's perspective at the time, OceanBase mainly supported internal businesses such as Taobao, Tmall and Alipay, and the team was too busy "surviving" to have time for open source work. It was not until 2021 that OceanBase, free of those concerns, embraced open source more firmly and shared its core technology and code with the outside world, including the storage engine, SQL engine, distributed engine, distributed transactions, multiple replicas, high performance, scalability, optimizer, failure recovery and multi-active disaster recovery.
Faced with industry doubts about its open source motives, Yang Zhenkun, in his article "Two Children Who Have Been 'Bullied': Electric Vehicles and Distributed Databases", quoted the description of Musk opening up Tesla's patents from the book "Silicon Valley Iron Man" as a metaphor for the thinking behind OceanBase's open source move: "When Musk announced in 2014 that Tesla would make all of its patents public, analysts tried to work out whether he was showing off, or whether there was a hidden motive or a trap. But Musk's decision was straightforward: he wanted people to make and buy electric cars. Musk believes the future of mankind depends on this. If disclosing Tesla's patents means other companies can build electric cars more easily, that is good for mankind, and these ideas should be free. Cynics will laugh at his views, but Musk planned to do it, and he was entirely sincere in explaining his thinking."
Clearly, Yang Zhenkun believes that the native distributed database is the inevitable direction of database development, and that the future of real-time data processing depends on it. He also hopes to see more truly distributed database products. From the perspective of domestic databases, this promises to achieve "de-IOE" and bring more Chinese databases to the international market. From a broader perspective, it benefits the development of the database ecosystem and the advance of the technology.
end
Today, OceanBase has more than 400 external corporate customers across important industries such as banking, securities, energy, electric power and social security, and no company has regretted adopting OceanBase and wanted to replace it, even though its ability to handle both transactions and analytical processing in one system is still maturing.
Yang Zhenkun has achieved the goal he held for many years: to build a truly distributed database.
When asked whether building OceanBase was the hardest thing in his career, and whether he had thought of giving up when the team faced disbandment, Yang Zhenkun's answer showed his tenacity: "It was quite hard. Much of the time you cannot control your own fate. You know something is right, but getting others to believe it is very difficult. So while building OceanBase, I looked for opportunities bit by bit. As long as the project had not been 'shot dead', we would keep going; giving up wouldn't be right, would it? If one day it was 'shot dead', we would have no choice, right? Anyway, as long as we could keep doing it, we would."
This article is from "New Programmer 004", in which we talked with Yang Zhenkun, founder of OceanBase, about his programming life. "New Programmer 004" will be available soon, so stay tuned. Under the theme "Our Technology Era, My Programming Life", the issue features in-depth conversations with many well-known technology pioneers from China and abroad and with representatives of the new generation of programmers, from Michael "Monty" Widenius, the father of MySQL and founder of MariaDB, to Bruce Momjian, co-founder of the PostgreSQL Global Development Group, Jia Yangqing, vice president of Alibaba, Pan Aimin, founder and CEO of Instruction Set, Wu Jun, a well-known technology author, and Vue.js author You Yuxi, in the hope that the technical journeys and life insights of these outstanding figures will inspire readers.
————————————————
Copyright statement: This article is an original article by the CSDN blogger "New Programmer" editorial department. It follows the CC 4.0 BY-SA copyright agreement. Please attach the original source link and this statement for reprinting.
Original link: https://blog.csdn.net/programmer_editor/article/details/123667034