background:
As a first-tier Internet manufacturer that has been working on the Internet for many years, Ctrip is an industry benchmark in terms of database instance scale and data scale. What are the considerations when introducing a distributed database and what is the effect after business practice? With these questions, we interviewed Mr. Yu Ronggang, senior database director and head of Ctrip DBA.
As a first-tier Internet manufacturer that has been working on the Internet for many years, Ctrip has put forward relatively high requirements on relational databases due to its own business complexity, data level, data security, and strict transaction ACID. Product development trends, technology evolution attention and research enthusiasm.
Why pay attention to OceanBase?
As we all know, relational databases have basically gone through three stages since their large-scale application:
● Stand-alone or active-standby relational databases based on minicomputers and high-end storage, typically represented by Oracle, SQLserver, etc. In recent years, there are also solutions based on PC Server and using open source database cores, such as aurora, polar DB, etc.;
● Based on the distributed mode of sub-database and sub-table middleware, many underlying stand-alone relational databases are represented by middleware, such as TDDL, DRDS, TDSQL, etc.;
● Cloud-native distributed database, a solution to complex distributed problems such as multi-copy consistency and data partitioning through native distribution; such as Google Spanner, TiDB, OceanBase, etc.
As an enterprise that has been developing on the front line of the Internet for 20 years, in the early stage of development, like many companies, the solution of using high-end hardware (minicomputer, shared storage, etc.) with SQL Server quickly supported our business development. However, with the development of business and the increase of business types, different businesses have put forward different requirements for the database, and there will inevitably be competition for database resources. At the same time, the continuous accumulation of massive data makes our database cost higher and higher.
"The Inadequacy of MySQL"
Driven by the domestic digital transformation trend and the above factors, Ctrip has gradually introduced the solution of MySQL with PC Server in different businesses, and has completed the replacement of SQL Server after years of hard work. However, MySQL itself also has many shortcomings, that is, it needs to find solutions outside the kernel capabilities. The summary is as follows:
- Stability risks brought by MySQL's online DDL
- Multi-node data consistency problem based on binlog replication
- Single table data capacity problem
- The impact of operation and maintenance operations such as switchover of the active and standby nodes and relocation of the equipment room on the business.
For problem 1, the industry currently needs to rely on external tools to use replication, renaming, etc. to solve problems such as delay (for large tables in weeks), space expansion (at least 120% change table storage space) and other problems.
For problem 2, there is almost no solution in the industry, including the use of MGR and binlog replication methods are also proven hidden dangers; so all our online database nodes use disks in RAID10 mode. However, the data consistency problem at the computer room level cannot be solved.
For problem 3, even using a partition table cannot solve the limitation of computing and storage resources that can only be used on a single computer; at the same time, because of the well-known problems of the sub-database and sub-table scheme, we have not widely spread the sub-database and sub-table scheme.
For question 4, we developed a unified operation and maintenance platform, which can automatically release after various checks in the process of relocation and switching; however, due to the native defects of MySQL data replication, we cannot truly achieve business insensitivity, and each change is still Will be apprehensive.
We provide standard MySQL configurations for different businesses, but different business models use different resource usage patterns, including but not limited to heavy computation over storage, heavy storage over computation, etc., resulting in the overall utilization rate of our online resources On the low side, it cannot be easily increased.
"Advantage of OceanBase"
In view of the above problems, Ctrip has been following up on technological evolution and product trends. It coincides with the rise of domestic distributed databases, and OceanBase has come into our team's attention. After communicating with the OceanBase technical team and our use and testing, we believe that OceanBase has the following advantages.
● It has been honed for a long time. After all, a product that can support the business volume of Ant Group and has experienced the test of Double Eleven for more than ten years has a strong endorsement;
● The results of TPCC and TPCH are amazing. Once upon a time, they were familiar international products in these lists, but OceanBase's many tests and excellent results have strengthened our confidence;
● Well-thought-out product implementation, including multi-tenancy, LSM-like tree storage engine, efficient encoding and compression, HATP capability, truly usable multi-room deployment capabilities and cases;
● Mature peripheral supporting products, such as unified management and control platform, data migration platform, etc.;
● Strong technical team, program design ability and efficient and stable service ability, etc.;
● Has obvious performance advantages compared with similar products.
The above advantages have enabled Ctrip and OceanBase to open up a road of win-win cooperation.
How does Ctrip use OceanBase to solve problems?
Through the native distributed capabilities of OceanBase, the hidden dangers of MySQL are easily solved and many additional benefits are brought.
"How to troubleshoot online DDL?"
Online DDL is one of the nightmares that plagues technicians who have never operated and maintained MySQL. Back then, the Ctrip IM business group message table needed to add a status field. Considering the stability, a new table had to be created to expand this field, which introduced the inconvenience. The necessary join query is vivid in the past, but it has been improved because of the introduction of OceanBase; because the storage engine of OceanBase uses the LSM Tree-like design, and the table scheme also supports multiple versions, so it is no longer a problem in DDL, and it can be done in seconds. Completed and will not lock the table, you can safely do such operations.
In fact, you don't have to worry about operations such as adding or subtracting indexes. In simple terms, the storage mechanism of OceanBase can be understood as a table that has a part in memory and disk. When adding or subtracting data, the data in memory is real-time. When it takes effect, the amount of data in this part is not very large, and it can be completed quickly and returned for use. For some data on the disk, there will be background tasks to process, because most of the business is to operate near real-time data, background tasks There will be no business impact.
"How much data can a single table in OceanBase have?"
How big can OceanBase single table data be? In fact, there is no accurate answer to this question, because Ctrip has not yet reached this upper limit.
Taking Ctrip's IM business as an example, our group message table saves data for two months and occupies about 800G of storage space, which basically reaches the MySQL single table under the current configuration. After the same data is migrated to OceanBase, the data volume is about 200G.
Why is it so magical? The answer lies in OceanBase's efficient encoding and compression mechanism. Before data is written to disk, OceanBase can use dictionary and other methods to encode data columns. After encoding, the amount of data occupied by each field will be greatly reduced. Aggressive compression algorithms can achieve good space friendliness. In actual tests, the compression ratio of different services is between 1/3 and 1/10; even if OceanBase uses three copies, it can ensure that the storage space does not increase. In addition, Ctrip actually uses one master and two backups or MGR on MySQL, and the disk uses In raid10 mode, OceanBase can save nearly 5/6 of the storage space even at the lowest compression ratio.
And because the encoding field is stored in the memory, in fact, in the process of business processing, the data does not need to be decoded. It can be said that through efficient compression encoding, OceanBase has achieved a win-win situation in cost and performance.
Further, OceanBase provides a native partition table solution, which can organize data through rich partitioning methods. By rationally arranging the primary partition of the data partition (write operations must occur in the primary partition, and read operations can be selected according to the business), it is also possible to further utilize the machine resources where multiple copies are distributed to improve resource utilization. When the resource configuration is increased, the write performance is greatly improved while the efficiency of the read operation is guaranteed.
"How to ensure data consistency?"
As a cloud-native distributed database, OceanBase, like most products, uses multiple copies to solve data security problems, so how to ensure the data consistency between these copies? The biggest problem faced by MySQL here has become the least worrying problem in OceanBase.
OceanBase uses the optimized paxos protocol combined with physical log replication to strongly ensure the data consistency of the majority, that is, in the case of three copies, at least two copies of the data consistency are strongly guaranteed, using OceanBase to ensure RPO=0, the actual test RTO<30s, Meet the data consistency requirements of Ctrip's financial business.
In addition, we actually agree with the statement that "there is only one multi-copy consensus protocol in the world, paxos, and other protocols are variants". Through our technical research, paxos compares raft and other variant protocols:
● The speed of selecting the master during failure recovery is faster, and when it is reflected on the database, the service recovery time of node failure is shorter
● paxos allows log holes, which can make it easier for Ctrip to participate in the main election process, and more reasonable for the use of business coding resources
"How to do OceanBase master/slave switch, is it safe?"
After using OceanBase, I don't really care about the master-slave problem. Because of OceanBase's share nothing architecture, each copy is a peer node, and OceanBase can guarantee the consistency of the majority copy at any time. In the traditional sense, the master The standby switch is actually just marking the copy, which fundamentally ensures that the master-standby switch becomes a safe operation. In addition, OceanBase can also automatically determine the current system status according to the configuration sequence. If the machine is down, it can automatically discover and switch to other copies with complete data to ensure the continuous availability of services.
Further, OceanBase provides the Obproxy component in the access layer to automatically route database access. Combined with the DAL component (application data source access control layer) that we already have, it can provide traffic switching capability very conveniently and stably, and truly achieve no interruption to the business. Or be less intrusive.
Through the above native capabilities of OceanBase, Ctrip almost fundamentally solved the hidden dangers of MySQL, and also gained additional benefits.
"Resource pooling, database instance co-location"
Because different business models use the database in different ways, the business peaks are also different; in the original MySQL instance-based deployment model, each business needs to match resources according to the highest configuration required by the business, and actually run, The real average resource utilization is not very high, but due to the limitation of MySQL data replication and switching efficiency and stability, it cannot be easily expanded or reduced, so there is no good way to improve resource utilization.
By configuring a large resource pool to OceanBase, Ctrip uses the multi-tenant resource isolation capability provided by OceanBase to migrate the MySQL instances used by multiple businesses to the tenants of OceanBase, thereby realizing the hybrid deployment of multiple database instances in one cluster model.
● By configuring different resource isolation for tenants, ensure that each tenant has appropriate resource configuration;
● According to the different timing of business peaks, the vertical expansion and contraction method of tenant resources is used to realize the expansion and contraction of database instance resources in seconds. On the premise that the overall cluster resource usage remains unchanged, multiple business peaks can be stably achieved. Carrying capacity.
● If some businesses grow rapidly, you can also use the method of expanding the number of resource nodes used by the database instance to rapidly expand the business horizontally. Of course, data copying will occur when the business is scaled horizontally. OceanBase also provides the configuration parameters for the bandwidth resource usage by data copying to ensure service stability during data copying or migration. The measured data copying can effectively use up to 80% of the server bandwidth. %about.
● Going a step further, you can easily control the overall resource usage of the database cluster by adding or removing nodes; the expansion and contraction operations are all online operations and do not require that the expansion and contraction operations must be peer-to-peer expansion and contraction resources, which guarantees Ctrip's flexibility in resource usage.
"HTAP capability improves business response speed"
The HTAP capability currently provided by OceanBase opens up new possibilities for Ctrip's business. In the same database cluster, it is possible to perform TP operations for online business while providing business Ad Hoc query capabilities, because AP queries are located where the data is located. In situ, the data is fresh, and the service capability to the business is more time-sensitive than the data that passes through ETL. At the same time, because there is no data copy, the storage cost is also effectively controlled, which is equivalent to buying one get one free. .
In the scenario of mixed use of TP/AP services, OceanBase also wants to provide effective resource isolation and restriction capabilities for different services at the front, so that the introduction of AP will not lead to resource competition with TP services, which will eventually lead to service quality damage.
How to migrate to OceanBase smoothly?
"About Operation and Maintenance Control"
Ctrip's database scale is relatively large as a whole, so detailed operation and maintenance rules and procedures are implemented for large-scale operations, and a series of platforms developed in conjunction with it have formed Ctrip's database operation and maintenance system. It just so happens that OceanBase started from Alipay, and the design concept for large-scale operation is very consistent with Ctrip; the unified operation and maintenance monitoring platform OCP provided by OceanBase can manage multiple OceanBase clusters and provide APIs for development. By deploying different OCPs for development and testing, it is convenient to manage OceanBase clusters in each environment, and these platforms and clusters are quickly and easily integrated into the original operation and maintenance system through API.
"About data migration"
For any database migration, we need to be cautious, and we must ensure strong data consistency, because data is the lifeblood. The introduction of OceanBase is no exception. Fortunately, OceanBase provides an easy-to-use and stable data migration platform OMS.
By using OMS it is easy and guaranteed to do the following during data migration:
- Full migration, the scheme and data in MySQL can be migrated to the tenant of OceanBase's MySQL mode, and after the migration is completed, full data verification will be performed, and prompts and modified SQL will be given for each difference. At the same time, the OMS migration process will give full play to the concurrent capability of OceanBase data writing and perform automatic flow control according to the memory state of OceanBase to ensure the stability of the migration process.
- Incremental migration, before the full migration is initiated, OMS will pull the MySQL binlog to the local, and replay these logs after the full migration and full verification is completed, ensuring that the incremental data during the full migration process is not At the same time, incremental data verification will be performed to ensure the data consistency of incremental migration.
- Reverse migration, on the premise of ensuring that the incremental migration is completed and the data is consistent, when the business switches read and write to using OceanBase, OMS can also reversely migrate the incremental data in OceanBase to MySQL to ensure that during the data parallelism period It can be moved back, and I have to say that this is a thoughtful design.
- Data synchronization, OMS provides the ability to synchronize data in OceanBase to message queues such as kafka, providing data for subsequent big data analysis and machine learning. It can also be used as a logical backup method to provide a certain data security guarantee.
"What benefits does Ctrip get from using OceanBase?"
Through the introduction and business practice of OceanBase, the problems of online DDL, data consistency, large table performance, and active-standby switching that have long plagued Ctrip using MySQL have been solved. While providing more stable and efficient service capabilities, it reduces the overall cost of ownership of the database system and improves resource utilization.
At the same time, OceanBase itself is an open system architecture, whether it is Observer (database kernel), OCP (cluster management and control platform), OMS (cluster migration tool) or OB Agent (management node), Obproxy (lightweight data routing agent) provides convenience. interface or standard SQL interface, we can connect OceanBase's monitoring and alarm system to our existing monitoring system; by using API for control and access, we can quickly and easily connect OceanBase's resources, clusters, tenants, databases The management and control capabilities of multiple levels are involved in the current management and control platform, so that Ctrip can manage all database products in a unified manner. The convenience brought by these open architectures protects technology investment well, and also allows technical students to be more familiar with and Easy to use.
Everyone knows that there is no silver bullet in the software world, and using OceanBase is not easy. First of all, you need to understand the design concept and architecture design of OceanBase. Each functional feature also needs to be aware of its advantages and disadvantages. Taking advantage of strengths and avoiding weaknesses puts higher requirements on technical students; at the same time, OceanBase has not yet achieved complete database autonomy, and requires a large degree of manual participation, mainly focusing on table structure data, index design and query optimization. However, this is also a good thing for technical students. On the basis of understanding database technology, knowledge reserves of distributed systems are also required to promote technical students to be closer to the business, and only after understanding the business model can they better design and optimize the database. These all promote the breadth and depth of knowledge of technical students to a certain extent.
"Talk is cheap, show me code" means that if you are afraid of getting your hands dirty in the software world, you may not get a real understanding after all; even if you have a good control over the features of OceanBase and Ctrip's own business model, But these are all prior experiences. How can we truly be fully understood, we cannot do without the true posterior conclusions. After all, we cannot use online business for direct verification. For this, technical students from Ctrip and OceanBase The following projects were jointly built:
- The business SQL replay system, based on the trace system of Ctrip business and database, has replayed the full amount of business SQL to OceanBase for business adaptability verification; it has also verified business pressure by means of traffic replication and expanded replay multiples to ensure that OceanBase can Really satisfy the business without compromising the business.
- SQL trace system, Ctrip's business trace has only one trace record in the SQL layer trace, but there is no clear hierarchical trace at the specific execution level of the database. In addition to the characteristics of OceanBase distributed database and data routing, a complete SQL trace can be obtained to find the real The performance bottleneck is not easy. Through the joint construction of the two parties to realize the SQL trace system, we can visualize the execution process of each SQL, and then find the real bottleneck point to provide data support for subsequent optimization.
- Unified log analysis system. For any distributed system, unified analysis of logs is required. Log analysis of a distributed database such as OceanBase is very important to ensure service stability. It can be done by identifying and concatenating key log information. To early detection of problems and early warning, at the same time through the log linkage analysis can also better understand the operation process of the distributed system.
Through the co-construction of the above systems, Ctrip has a clearer understanding of the applicability of the OceanBase system to the business, and also has a clearer understanding of the operation process of the distributed system, which is of obvious help to the ability growth of technical students and the guarantee of business SLA.
At the same time, it is also clear that Ctrip can do more in the introduction and practice of OceanBase, and it needs to do more. There is also a tip here, that is, to communicate and cooperate with the architects of OceanBase more, and share their opinions on OceanBase. Understanding and long-term practice bring Ctrip more clearly, and working together can avoid many detours and avoid failures.
What are your expectations for OceanBase?
At present, all members of the Ctrip team have passed OBCA and are moving towards OBCP for all members. During this process, everyone's understanding of distributed, cloud-native and database has become stronger and stronger; I believe that through continuous learning, practice, and co-creation, There will also be more and more OBCEs in the team. It is also hoped that the projects jointly built by Ctrip and OceanBase will be implemented on Ctrip as soon as possible, so that business adaptability and tuning capabilities can be identified better and faster, and business migration can be carried out more practically and confidently. At the same time, it is hoped that more and more technical personnel can master OceanBase in depth, so that the ecology of OceanBase will become more and more abundant.
Go fast alone, go far. It is hoped that more and more practical cases can be shared, and domestic cloud-native databases can be used in mutual reference and learning to create higher value for our business, and also allow us to have more capabilities and higher value.
Postscript (the architect's perception)
OceanBase architect Han Bing (flower name: really difficult):
As a leading Internet company with more than 20 years of technology accumulation, the Ctrip DBA team has a profound technical background, and has profound attainments in databases and distributed systems; during the cooperation process, Ctrip's technical students quickly mastered The use of OceanBase, combined with the database operation and maintenance management system and platform tools that Ctrip has built and continued to develop, quickly integrates and integrates the enterprise-level service capabilities provided by OceanBase, and soon integrates OceanBase into R&D, testing and production. in each environment.
In the cooperative study with Ctrip students, I also went deep into the business system and mastered the business characteristics. We designed the database's adaptive use posture and targeted settings according to the business characteristics, so as to smooth the business migration and improve the efficiency of the database. Fast support is provided. In this process, I deeply feel that as the original architect, we must work closely with our users and learn from each other. Only in this way can we give full play to our advantages, master the business characteristics and user operation and maintenance system as soon as possible, and also better OceanBase's design concept, functional features, and usage posture are better brought to a cooperative environment, and business value can be obtained faster, better, and more efficiently.
**粗体** _斜体_ [链接](http://example.com) `代码` - 列表 > 引用
。你还可以使用@
来通知其他用户。