Recently, the intelligent-industry media outlet "Smart Things" conducted an exclusive interview with OceanBase CTO Yang Chuanhui. In conversation with Smart Things, Yang Chuanhui explained the core technology, R&D difficulties, and implementation challenges of HTAP, and offered suggestions to help enterprises choose a database architecture that suits their business.
In addition, as an expert who has been deeply involved in databases for more than ten years, he also shared his observations on the development opportunities and core bottlenecks of domestic databases. (This article is reproduced from the public account "Smart Things" ID: zhidxcom)
The original interview follows:
Domestic databases are booming, taking advantage of distributed technology and rushing into the territory of international giants.
As the basic software for managing data, the database holds the lifeblood of an enterprise. Any change to it ripples through the whole system, and in core business even a slight error can cause irreparable losses. With growing calls for localization, this critical field, long monopolized by overseas giants, has become "lost ground" that local enterprises are determined to regain.
Just recently, Forrester, a globally recognized IT consulting firm, released its 2022 Translytical-oriented data platform vendor selection report, and the domestic self-developed native distributed database OceanBase was prominently listed. It is also the only vendor in the world to cover all of the distributed database segments Forrester defines (single cloud, hybrid cloud, and multi-cloud).
OceanBase has delved into the field of hybrid transactional and analytical processing for 12 years and is now gaining momentum in the database industry.
Able to handle both high-concurrency real-time transaction processing and real-time business decision-making over large-scale data, HTAP is expected to help enterprises mine the value of their data more efficiently and greatly reduce total cost. With domestic demand booming, investment in HTAP databases has spread like a prairie fire.
As one of the founding members, OceanBase CTO Yang Chuanhui has led the architecture design and technology R&D of successive generations of the database. Under his leadership it has survived the test of extreme-concurrency scenarios such as the Alipay transaction system and "Double 11", has come to serve more than 400 customers in industries such as finance, energy, and transportation, and has become the only domestic native distributed database to break world records on both of the international database benchmarks TPC-C and TPC-H.
In speaking with us, Yang Chuanhui drew on deep knowledge of HTAP databases to explain the core technology, R&D difficulties, and implementation challenges of HTAP, and offered suggestions to help enterprises choose a database architecture that suits their business. In addition, as an expert who has been deeply involved in databases for more than ten years, he shared his observations on the development opportunities and core bottlenecks of domestic databases.
What is real HTAP? HTAP ≠ OLTP + OLAP
As the saying goes, what has long been united must divide, and what has long been divided must unite. The development of databases is no exception.
In the early days, the database was all-inclusive. By the end of the last century, due to the increasingly rich application scenarios, it was gradually differentiated into two types: OLTP and OLAP. The former was in charge of transactions, and the latter was specialized in analysis. Now, these two functions are coming together again.
With the influx of big data, the amount of data is expanding rapidly, and many business scenarios need to cope with the ever-increasing demands for real-time transaction processing and analysis. The HTAP database that supports the two types of functions in a unified way was born, and it is gaining popularity in the enterprise-level market.
HTAP has two significant advantages: low cost and low latency. It is not difficult to understand that a system that can do two things at the same time is more cost-effective than two systems; it also saves the tedious and time-consuming ETL process, reduces latency, and better supports real-time analysis.
For a time, various databases began to be labeled "HTAP", and cloud computing companies were also gearing up.
For enterprises, however, replacing a database inevitably carries trial-and-error costs. It is therefore crucial to know how to use HTAP, how to select a product, and which factors to focus on. This touches on a hot topic in databases: what is real HTAP?
Yang Chuanhui's answer: a high-performance OLTP database extended with OLAP capabilities, so that it supports real-time analysis well.
Oracle and Microsoft SQL Server, from the international database giants, and OceanBase, the leading domestic distributed database, all take this approach. Unlike the first two, OceanBase is built on a native distributed architecture, which is highly scalable and can handle larger volumes of data.
Many startups take a different route, adding real-time writes on top of OLAP to form a real-time data warehouse. On this route, a team without OLTP experience in core business may find it difficult to support complete transaction processing. Yang Chuanhui explained that the poor transaction performance of some "HTAP products" in the industry is not a problem with HTAP itself but with product design and implementation.
No matter which HTAP route is taken, one premise must hold: one system, one data.
First, simply stacking and stitching two systems together not only increases cost and introduces inherent delay, but also leaves the two systems with different syntax, which exposes various problems and limits the development of enterprise-level applications.
Second, some schemes keep two copies of data and pull OLTP data into an OLAP system through an ETL mechanism. This is an inherent design flaw: because data movement cannot be avoided, neither cost performance nor latency can be optimized.
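The staleness that a two-copy design introduces can be shown with a toy sketch. The following uses Python's built-in sqlite3 module purely for illustration; the `orders` table, the `etl_copy` helper, and the two in-memory databases are hypothetical stand-ins, not OceanBase's design or any vendor's actual ETL pipeline.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a minimal, hypothetical orders table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
    return db


def etl_copy(src, dst):
    """Hypothetical batch ETL: overwrite the analytical copy with a snapshot."""
    rows = src.execute("SELECT id, amount FROM orders").fetchall()
    dst.execute("DELETE FROM orders")
    dst.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    dst.commit()


def total(db):
    """A stand-in analytical query: total order amount."""
    return db.execute("SELECT COALESCE(SUM(amount), 0) FROM orders").fetchone()[0]


# Two-system design: a transactional store plus a separate analytical copy.
oltp = make_db()
olap = make_db()

oltp.execute("INSERT INTO orders VALUES (1, 100.0)")
oltp.commit()
etl_copy(oltp, olap)  # the last ETL run

# A new transaction commits after the last ETL run.
oltp.execute("INSERT INTO orders VALUES (2, 50.0)")
oltp.commit()

stale_total = total(olap)  # the analytical copy has not seen order 2 yet
fresh_total = total(oltp)  # analysis on the single copy sees it immediately

print("two copies (after ETL):", stale_total)
print("one copy:", fresh_total)
```

Until the next ETL run, the analytical copy lags behind the transactional data, and the copy step itself costs storage and compute. Running analysis against a single copy avoids both, which is the point of the "one system, one data" premise.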
This is why, from the very beginning, the OceanBase team decided to build an HTAP database based on "one system, one data" to maximize cost performance.
Yang Chuanhui said that "one data" is defined from the user's point of view. In the actual implementation, as long as redundancy is minimized while the data processing requirements of HTAP are met, multiple replicas or multiple storage forms can still be regarded as "one data".
To give an OLTP database OLAP capability over large data volumes, HTAP needs a native distributed architecture and a low-cost storage engine, and must support resource isolation between OLTP and OLAP, complex queries and large-volume queries, and OLAP data development and modeling.
It should be noted that you cannot have it both ways: even a real HTAP system is not a panacea.
In theory, it does not sacrifice analytical power. In practice, because of engineering complexity and product maturity, an HTAP database built on OLTP has weaker OLAP capabilities than a specialized OLAP system. It is therefore better suited to OLTP workloads and to mixed OLTP and real-time OLAP workloads, and not suited to offline data warehouses or big-data processing of unstructured data.
Yang Chuanhui suggested that when an enterprise starts a new business, or an existing business runs into pain points that traditional database solutions cannot solve, it may be a good time to adopt HTAP.
So when an enterprise chooses HTAP, how can it judge whether a database solution is worth the trial-and-error cost, and whether it can be used for the long term and bring value to the business?
He offered several dimensions for reference. The first is deployment experience: an HTAP solution that has been applied at scale in the core business scenarios of benchmark customers can be considered mature and stable. The second is core capabilities, such as performance on public benchmarks and whether the ecosystem tools are complete and easy to use. If the business is expected to grow to a larger scale, the enterprise also needs to consider whether the technical architecture behind the HTAP product has defects, how stable and disaster-tolerant it is, and whether it can deliver the best price-performance ratio for the business.
By these criteria, OceanBase stands out as a rare choice. On the one hand, it is the most widely used domestic distributed database in financial scenarios. It has accumulated more than 400 external enterprise customers across banking, energy, electricity, social security, and other industries, and its financial-grade disaster recovery, maturity, and stability have been fully proven. On the other hand, in the past three years it has broken the world records of both the international online transaction processing benchmark TPC-C and the data analysis benchmark TPC-H, demonstrating its technological leadership.
Behind these achievements, as the pioneer of the distributed HTAP database, OceanBase has been crossing the river by feeling the stones since its birth in 2010.
[Figure: TPC-C benchmark results; the red box marks OceanBase's result]
Twelve years sharpening one sword, with a mature form of HTAP due next year
On the road of HTAP, OceanBase, a domestic self-developed native distributed database, has been working hard for 12 years.
Yang Chuanhui believes that insisting on in-house development and on deployment in core business scenarios is OceanBase's trump card for leaving its peers behind.
Only through independent research and development can a team fully grasp the core of the database and truly deliver a "one system, one data" solution. For this reason, every line of OceanBase code is written by its own team. The know-how it has accumulated in core business scenarios over the years has also built increasingly solid technical and market barriers for OceanBase.
In Yang Chuanhui's view, there will be a time lag for other companies to follow OceanBase's route and match its capabilities.
But many domestic databases do not even have the conditions for "imitation": how many businesses as critical and as highly concurrent as Alipay transactions and Double 11 would dare to let a fledgling database, untested in large-scale practice, try its hand?
Looking back on OceanBase's past 12 years, in Yang Chuanhui's words, it was "a struggle at every step".
OceanBase's development of a distributed HTAP database was a journey from 0 to 1, and at the beginning it had no business to serve at all. Just as the team was growing worried, in 2011 Taobao Favorites became the first to offer a deployment opportunity: millions, even tens of millions, of users were reading product information at the same time, the existing database kept collapsing, and a replacement was needed. The OceanBase team promptly tailored a dedicated architecture for it, proving the practical value of its database for the first time.
At this time, the OceanBase team was still under enormous pressure. Taobao Favorites is not a core business after all, and the database requirements are not so high. In order to go on in the long run, OceanBase must enter the core business scenario and withstand the most severe tests.
It was not until November 2012 that OceanBase got a new opportunity: Alipay, with its huge business data volume, high concurrency, and almost zero tolerance for failure, planned to "de-O", that is, to move off Oracle. If OceanBase could shoulder this load, its road into financial services would be smooth from then on.
After two years of tempering, OceanBase finally faced the big test of a core business scenario in 2014: taking over the Alipay transaction system, bearing the country's largest traffic peak during "Double 11", and keeping the entire system "silky smooth".
As a result, OceanBase became famous.
Since then, it has been a smooth journey: from full adoption across Ant Group's internal core business to the first external customers, and from supporting financial services such as banking, insurance, and securities to entering the core transaction scenarios of more non-financial sectors such as government, public utilities, and the State Grid, OceanBase's road to deployment has grown wider and wider.
"Building a database depends on accumulation," Yang Chuanhui said. "This is invaluable experience that no other domestic database has been able to obtain, and it has played the most important role in making OceanBase a leader in distributed databases today."
Growing from zero to more than 400 external customers, this experience has snowballed into a positive cycle. As more customers endorse it, OceanBase gains more practical experience, folds that experience back into each iteration of the product, and further widens its lead over peers in performance and stability.
Yang Chuanhui told Smart Things that OceanBase has made new progress in the past six months. It has further optimized analysis, resource isolation, and other capabilities in new versions, and has won more key customers across industries. "By the end of next year, we will more or less have a mature form of HTAP."
Domestic databases are in their prime, bound for the "distributed" sea of stars
As a key basic-software field subject to "chokepoints", domestic databases are bound to rise. In Yang Chuanhui's view, distributed databases offer the biggest opportunity for "overtaking on the curve".
In the centralized database field, giants such as Microsoft and Oracle are still going strong, and open source databases are also formidable, leaving little market space for domestic players. In distributed databases, however, domestic and foreign companies started from similar positions, and the business scenarios faced in China are even more demanding than those overseas. The tougher the challenge, the faster the growth it tends to drive.
"The centralized vendors did well in the past, and their products are mature and stable, but this sometimes becomes a burden when they build next-generation technology," Yang Chuanhui said. When distributed becomes the mainstream direction for next-generation databases, the advantages of domestic database vendors will show.
He said that domestic distributed databases, OceanBase included, have moved from peripheral scenarios into core business scenarios, and that OceanBase is the most widely used in core transaction scenarios. Even for small-data-volume workloads, OceanBase can now match the price-performance ratio of MySQL and Oracle.
In the past, enterprises may have used distributed databases more in edge scenarios as a supplement, but in recent years, OceanBase has used distributed HTAP in the core business scenarios of key customers in different industries, and has stably launched and continued to operate. Yang Chuanhui believes: "The distributed database market will be particularly large in the future, and almost all customers will give priority to distributed databases."
As more enterprises move towards digital transformation and have higher requirements for real-time performance, driven by the trend of cloud native and distributed superposition, Yang Chuanhui is very optimistic about the future of distributed HTAP.
He observed that user acceptance of distributed databases has been rising over the past two years, but that a big gap with Oracle and MySQL remains. "For example, MySQL and Oracle may have millions or tens of millions of users, while distributed databases may have hundreds or thousands. They are not on the same order of magnitude."
Recognition of distributed databases therefore still needs to improve. HTAP is at an early stage of development, and its core challenge is the ecosystem. This is an emerging technology route that many companies and developers may be unfamiliar with, so user habits need to be cultivated through open source, community operations, and university cooperation. Domestic distributed database products also need to solve problems such as language and documentation.
It is for this reason that OceanBase announced open source in June last year, opening up all core capabilities containing 3 million lines of code at one time, allowing more people to become developers of distributed HTAP databases. They will also continue to publish a series of articles on the interpretation of HTAP technology on the official account, and share their realized HTAP technical solutions and scene values. "We are confident that OceanBase's technology in the distributed industry is far ahead, and what we need is for this industry to become better." Yang Chuanhui said.
He believes that as distributed databases are adopted by more and more people and can also handle single-machine workloads, they will replace centralized databases in most scenarios, and that OceanBase will be the preferred choice when selecting a distributed database.
Closing thoughts
Building a domestic database requires both passion and a sense of awe. That OceanBase can comfortably support the core transaction business of customers in more and more industries is by no means down to technical advantages alone. Anchoring itself to the distributed HTAP track, riding the wave of the mobile Internet era, and continuously accumulating industry know-how all contributed to the staged success it enjoys today.
At this stage, domestic databases are entering the fast lane and making great strides in the era of big data and artificial intelligence. According to calculations by the China Academy of Information and Communications Technology, China's database market was about 24.1 billion yuan in 2020 and is expected to grow to 68.8 billion yuan by 2025. The market space is huge.
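As a quick check on what these figures imply, the compound annual growth rate can be worked out from the two endpoints. This is a back-of-the-envelope sketch that assumes smooth compounding over the five years from 2020 to 2025; the resulting rate is derived here and is not stated in the report.

```python
# Market size figures quoted above, in billions of yuan.
start_size = 24.1  # 2020
end_size = 68.8    # 2025 (projected)
years = 2025 - 2020

# Compound annual growth rate: (end / start) ** (1 / years) - 1
cagr = (end_size / start_size) ** (1 / years) - 1

print(f"Implied CAGR: {cagr:.1%}")
```

Under that assumption, the projection corresponds to growth of roughly 23% per year.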
However, when the wind is up, mud and sand are inevitably swept along with it. This requires companies with real core technology to hold their ground in the melee and stick to independent research, development, and innovation. It also requires the industry to set stricter standards that protect the domestic database vendors with real strength.
"I think what all domestic database vendors should pursue is replacing core systems." In Yang Chuanhui's eyes, this may be the most difficult and most socially meaningful thing to do, but its commercial value is not necessarily high, because the investment needed to replace a core system is especially large and is not on the same order of magnitude as replacing a peripheral system.
"But this cannot wait until everything is mature, because that day will never come," Yang Chuanhui said, his voice suddenly rising. "We should team up with enterprises that share this conviction and get it done quickly."