Introduction
In September 2021, a very large financial institution completed the final full migration of a core database of more than 20TB, laying a solid foundation for its subsequent evolution toward a cloud-native, multi-active architecture. The successful go-live of this full core-database migration has set a benchmark for the financial industry in putting technology capability into practice. Liu Weiguang, Vice President of Alibaba Group and General Manager of the Alibaba Cloud Smart New Finance & Internet Business Unit, has distilled the complete steps and technical strategies of the year-long migration into this article.
"Practice brings true knowledge", Alibaba Cloud and OceanBase have taken a solid step in assisting the comprehensive migration of domestic databases of super-large financial institutions, and accumulated precious experience. Therefore, this article is not an analysis and imagination of database replacement, but a technical guide to replace the technical platform of the actual large-scale and complex core application system. There are various problems that were unexpected in the "analysis" article in the process. Especially for various adaptations and compatibility of the existing operating environment, friendliness to applications, etc., how to solve these problems, this article gives detailed solutions one by one.
Against the national backdrop of building technological strength and achieving a high level of self-reliance in science and technology, a very large insurance (group) company pushed its digital transformation further, followed the trend of pioneering technologies, and launched a forward-looking program to move its IT architecture to a distributed model. In September 2021 it completed the final full migration of a core database of more than 20TB, laying a solid foundation for the subsequent evolution toward a cloud-native, multi-active architecture. The successful go-live of this database localization project has set a benchmark for the financial industry, supports the national strategy of technological self-reliance, and helps drive the rapid maturation of the entire domestic database industry chain, from management to applications.
In the insurance industry, although short-term concurrency pressure is lower than at Internet companies, business complexity and reliance on proprietary database features are far greater. Insurance processing is more complex: a single transaction must be completed across multiple systems, and the call chain is longer and more intricate than in banking or Internet businesses. Guaranteeing stability under such complex call chains and large transaction volumes is the key challenge in moving insurance business databases onto domestic products.
Because financial institutions have strict requirements for business continuity and data accuracy, no traditional leading financial institution had completed a full migration to domestic databases until this insurance company did so, achieving five breakthroughs.
Short migration time
From September 2020 to September 2021, the migration took only one year, a scale of full core-system migration that no traditional financial institution had yet achieved.
Record-breaking scale of migration
Within one year, the company completed the full migration of the online, traditionally centralized databases of nearly 100 business systems, including the traditional core, Internet core, individual insurance sales, group insurance sales, operations management, customer service management, and big data. The migrated data exceeded 400TB and more than 100 billion rows, the largest single database exceeded 20TB, and the project's overall server footprint exceeded 20,000 cores.
Ensure business continuity and data accuracy at the same time
No switchover had to be rolled back during the entire migration. In the nearly one year since going live, the system has run stably through the full cycle of the 2021 "business exams"; it has withstood the rigorous test of every business link, fully meets production needs, and marks the leap of domestic databases from merely usable to easy to use.
Realize 100% independent innovation of technology
The migration was built on a fully self-developed domestic native distributed database. During the migration, more than 50 version upgrades were released, and the longest turnaround for a requirement was two months (Pro*C + Tuxedo support). Through systematic training and knowledge transfer, more than 500 employees passed database certification exams, giving the company full command of the database.
Next-generation technology becomes key productivity
After the migration, storage costs dropped significantly and performance improved markedly. The database evolved from an active-standby model to a multi-active deployment across two regions and three data centers, and the handling time for production incidents was shortened from hours to minutes.
Looking back, the journey was difficult, but it produced valuable hands-on experience in migrating large financial institutions to domestic databases.
Domestic financial-grade database migration practice
Initial preparation work
1. Database selection
The database is the crown jewel of enterprise IT infrastructure: it stores the core data assets of the business, supports applications above it, and shields the underlying infrastructure below it. Under the financial industry's premise that "stability overrides everything", database selection is especially cautious. According to the "Database Development Research Report (2021)" of the China Academy of Information and Communications Technology, by the end of June 2021 there were as many as 81 domestic relational database vendors. Faced with such a crowded field, choosing the right database was the first question before the insurance company. Despite the abundance of products, after careful evaluation three products, including OceanBase and PolarDB, were selected for the initial pilot verification. The main selection criteria were as follows:
- Whether it supports smooth business migration and future architecture evolution;
- Whether it supports layered decoupling, in particular decoupling the database from the underlying hardware, operating system, and middleware;
- Whether there is sufficient talent reserve and capital investment to guarantee the product's long-term evolution and the business bottom line;
- Whether there are extensive industry practice cases;
- Whether it is fully independently developed;
- Whether it is compatible with the existing development and operations system, and whether the company's own technical staff can master it quickly.
2. Infrastructure preparation
The insurance company's core business systems originally ran on more than 60 IBM and HP high-end minicomputers and more than 70 high-end storage arrays, tightly coupled to the Oracle architecture and therefore hard to scale linearly in capacity and performance. For the domestic databases, rack-mounted servers with local storage fully replaced the imported minicomputers and traditional SAN storage, supporting the cloud-native, distributed transformation required by the full core-system migration. To avoid destabilizing the business systems with too many infrastructure changes at once, a hybrid deployment of Intel, Haiguang, and Kunpeng servers was adopted: Intel x86 dominated in the early phase, with a gradual transition to domestic servers based on Haiguang and Kunpeng chips. Different machine types can be adjusted online, easing dependence on infrastructure supply.
After the domestic database migration project officially launched in September 2020, it took less than two months to go from hardware model selection, target system selection, and capacity planning to completing, from scratch, the hardware and operating system adaptation of the domestic databases and the build-out of the entire server cluster.
3. Development of migration strategy
After years of development, the insurance company's business covers the whole country, with distinctive characteristics, many product lines, and intricate relationships between systems. Migrating the core databases required extensive research and rigorous validation: the database products had to match the original production databases in performance, security, and reliability, while enabling the smooth migration of many systems and providing elastic resources and horizontal scalability. Unified norms and standards for the migration were therefore established, a methodology of evaluation, implementation, control, and analysis and improvement was followed throughout, and the migration proceeded in an orderly fashion under three strategies:
- Migrate first, then transform and upgrade the business and architecture, so that multiple variables do not change at once and threaten business continuity; the original data model is not transformed, and the new database bears the main adaptation work;
- Organize migration batches by business system, moving from low load to high load and from peripheral systems to the core;
- Complete the full migration and transformation of all business-system databases within one year, with every migration action confined to 0:00-6:00 a.m. on Saturdays and Sundays, low-traffic verification over the weekend, and heightened support on Mondays, so that normal business is not affected.
Internet Core Migration
1. Business Background
Although the insurance company's core involves many systems, it is mainly divided into the Internet core and the traditional core, which are decoupled asynchronously through an ESB-like bus in between.
Since 2016, the company's Internet core and new traditional-core applications have been transformed from a monolithic architecture into a distributed microservice architecture. By 2020, the Internet core business systems had been split into more than 40 microservice modules and moved onto a service mesh. The Internet core is characterized by:
- Nationally centralized databases, both physically and logically, with many associated systems connecting to them;
- Despite the microservice split, the databases still contain a fair number of stored procedures and use advanced features such as triggers, custom types, functions, foreign keys, and partitioned tables;
- To serve more than one million agents well, the business places high demands on the elasticity and performance of database resources.
Therefore, the main technical challenges facing database migration at the core of the Internet are:
- A single point of failure in the nationally centralized deployment would affect the whole country;
- The customer master data system, the account-opening entry point of the entire insurance business chain, connects to 43 internal systems, holds more than 20TB of data with the largest single table exceeding 5 billion rows, and handles more than 20 million interface calls per day, the highest average daily database request volume of any system. With so many associated systems and its position at the heart of the business chain, the efficiency requirements on its SQL are very high, and the migration must not affect the original production system;
- After migrating to the new distributed database platform, data must still be synchronized to Kafka in real time, in a format compatible with the original, for consumption by downstream big data systems.
2. Technical solutions
(1) Overall selection
To address these challenges, PolarDB, whose architecture is closest to the original Oracle RAC, was chosen to replace the Internet core databases. As a new generation of cloud-native database, PolarDB has the following main characteristics:
- Compute and storage are separated, with shared distributed storage meeting elastic business growth and greatly reducing storage costs;
- Read/write splitting with one writer and multiple readers: the PolarDB engine uses a multi-node cluster with one primary node (read-write) and at least one read-only node (up to 15 read-only nodes are supported); writes go to the primary node while reads are distributed evenly across the read-only nodes, providing automatic read/write splitting;
- Kubernetes-based deployment provides minute-level specification changes and upgrades, second-level failure recovery, global data consistency, and complete backup and disaster-recovery services;
- Its centralized architecture requires no distributed-specific design, stays consistent with existing usage habits, and performs no worse than the original database;
- It is highly compatible with Oracle, so applications need almost no SQL changes.
(2) Migration method
To avoid impacting production and to guarantee strict consistency of the migrated data, a DTS full + incremental approach was used. For Oracle clusters with very large data volumes, such as the customer master data system, the migration link was started two weeks in advance. Before the full data migration begins, DTS starts its incremental capture module, which pulls the source instance's incremental updates and parses, packages, and stores them locally.
Once the full migration finishes, DTS starts the incremental replay module, which takes the captured changes, reverse-parses, filters, and repackages them, and applies them to the target instance, with the target's primary keys guaranteeing uniqueness. After the application switchover, interface response times improved by about 30% compared with Oracle. By the end of 2020, the two sides had jointly completed the migration of all Internet core modules, more than 40 business systems in total, including a billing app serving over one million agents, a life insurance app with over 100 million registered users, and the customer master data system.
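As a schematic illustration of the ordering this method relies on (incremental capture starts before the full copy, and replay afterwards is kept idempotent by the target's primary keys), the sketch below uses hypothetical stand-in functions; it is not DTS code, only the shape of the full + incremental pattern under those assumptions:

```c
/* Schematic sketch of the full + incremental migration pattern described
 * above.  All functions are hypothetical stand-ins, not a real DTS API.   */
#include <stdio.h>

/* 1. Remember a log position (e.g. an SCN) before anything else starts.   */
static long capture_start_position(void)        { return 1000L; }
/* 2. Continuously pull source changes from that position and stage them
 *    in local storage; this runs in parallel with the full copy.          */
static void stage_incremental_changes(long pos) { printf("staging changes since position %ld\n", pos); }
/* 3. Bulk-copy the existing rows; for a 20TB database this can take days. */
static void copy_full_snapshot(void)            { printf("copying the full snapshot\n"); }
/* 4. Replay the staged changes onto the target; primary keys make
 *    re-applied rows idempotent, so overlap with the snapshot is harmless.*/
static void replay_staged_changes(long pos)     { printf("replaying staged changes since %ld\n", pos); }
/* 5. Report how far the target still lags behind the source.              */
static long replication_lag_seconds(void)       { return 0L; }

int main(void)
{
    long pos = capture_start_position();   /* capture starts BEFORE the full copy */
    stage_incremental_changes(pos);
    copy_full_snapshot();
    replay_staged_changes(pos);

    if (replication_lag_seconds() == 0)     /* once caught up, pause writes briefly, */
        printf("lag ~0: suspend writes, verify data, switch the application\n");
    return 0;
}
```

The essential point the sketch encodes is that capture must begin before the snapshot copy; any change that lands during the copy is either already in the snapshot or replayed afterwards, and primary-key de-duplication makes both cases safe.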
To reduce the impact on downstream big data consumers during the migration, the synchronization link to big data was transformed in two steps.
The first step was to add real-time reverse synchronization from PolarDB back to Oracle, leaving the original Oracle-to-Kafka link unchanged so that the database switchover would not introduce too many changes at once;
The second step was to customize and extend DTS to produce output in the SharePlex format and, once verification was sufficient, directly replace the original SharePlex synchronization link.
(3) Main challenges
After the migration, PolarDB, as the Internet core database, had to stably support the business sprint in the first quarter of 2021. The front-end ordering system carried the brunt of the performance pressure; after the microservice transformation it had been split into more than 30 modules spread across multiple databases, any one of which could become overloaded. Before the move to PolarDB these modules were spread across multiple Oracle RAC clusters and monitored with internally developed database monitoring. After migrating to PolarDB, the overall architecture is better able to meet the demands of business elasticity:
- Unified management and control: clusters of machines are managed uniformly through PolarStack to provide DBaaS services;
- Resource elasticity: instances moved from physical-machine deployment to K8S pods, becoming far more flexible and elastic;
- Read/write splitting: the intelligent proxy provides automatic read/write splitting, minute-level scale-out, and automatic switchover on failure, with no application changes required.
On the day of the business sprint, traffic peaked three times, at 12:00, 17:00, and 21:00. Both the hourly and the full-day policy issuance counts ranked among the top three in the company's history, and peak issuance reached 9,000 orders per second.
(4) Migration process
- In September 2020, the first batch of Internet core application modules was migrated to PolarDB, with the entire adaptation taking less than a month; large-scale migration of the other Internet core modules then began;
- In November 2020, PolarDB completed the migration of the customer master data system, the largest single database;
- At the end of January 2021, PolarDB, as the database behind the Internet core ordering system, stably supported the company's business sprint in the first quarter of 2021.
Traditional core migration
1. Business Background
The traditional core systems of this large insurance company have a long history: some were built before 1998, others between 2004 and 2008. The time span is long and the data volumes are huge, with a single database exceeding 20TB. Even more challenging, many of the old cores are split by province and city, so a single old core system may have as many as 36 databases to migrate. Broadly, the traditional core falls into three categories of systems:
The first category: about 13 new core systems developed on the Java technology stack in 2016 and 2017;
The second category: about 6 old core systems built before 1998 or between 2004 and 2008;
The third category: systems likely to be decommissioned, which are outside the scope of this migration.
The main technical challenges these traditional core databases posed at the time were:
- The relationships between systems are complex, spanning both policy administration platforms and funds systems, and are hard to untangle;
- Some new cores are centralized both physically and logically, while the old cores are physically centralized but logically separated, deployed per province with one database set per province, making the migration workload enormous;
- They rely heavily on Oracle proprietary features, with extensive use of stored procedures, triggers, custom types, functions, and foreign keys. More challenging still, the old cores use Pro*C (C with embedded SQL) and Tuxedo (Oracle's middleware for distributed transaction processing) to handle policy workflows; one annuity system alone involves more than 1,500 Pro*C programs and about 1.4 million lines of code, which the business could not rewrite in a short time;
- Individual databases are very large: six exceed 10TB, the largest exceeds 20TB, and the downtime windows are short;
- Transaction volumes are large, with billions of database calls per day and many complex aggregate actuarial and settlement transactions.
2. Technical solutions
(1) Selection plan
To address these challenges, the distributed database OceanBase was chosen to replace the traditional core. As a native distributed database, OceanBase has the following main characteristics:
- A multi-replica, shared-nothing architecture with no single point of failure, ensuring continuous availability;
- An LSM-tree-based storage engine that, combined with modern hardware, provides high performance and scalability on a horizontally scalable distributed architecture;
- Very high application compatibility with the most widely used database ecosystems, Oracle and MySQL;
- Although the architecture is distributed, applications generally need no distributed-specific redesign, such as choosing distribution keys, and usage stays essentially consistent with existing Oracle habits;
- OceanBase is completely self-developed and relies on no external open-source code, making it genuinely independent.
(2) Migration method
Comprehensive verification of the traditional core's complex database landscape produced a 140-page migration operation manual and a detailed cutover calendar, accumulating valuable experience for subsequent migrations and large-scale rollout and forming a standard migration, cutover, and acceptance plan.
The overall migration process is broken down into four major stages: basic environment preparation, migration rehearsal, formal cutover, and monitoring and operations, with each task assigned to a specific person and scheduled to the minute.
For the migration of large-scale Oracle databases, we summarized the following four practices that help improve migration efficiency:
First, separate hot and cold data.
Business data has a life cycle, and access frequency shows clear hot and cold characteristics. For example, historical rows in transaction-flow tables and log tables are rarely accessed outside audit scenarios, yet this data is usually large, costly to migrate, and prolongs the migration. Such data can be archived and backed up in advance and then migrated separately, either statically or as a dedicated full migration with the OMS tool.
Second, tables with LOB columns.
Rows with LOB columns occupy much more space in Oracle, so each migration batch grows significantly beyond the base row size and the memory demand on the OMS side is several times that of non-LOB tables. The optimized strategy is to create dedicated links for LOB tables with lower concurrency to avoid JVM OOM, while running multiple links in parallel to keep the overall migration speed up.
Third, tables without LOB columns.
For tables without LOB columns, per-batch sizes are small and stable and memory requirements are controllable, so concurrency can be raised moderately to speed up migration; this data can be migrated over a single link or several links with higher concurrency.
Fourth, run multiple database migration links concurrently on different OMS nodes.
A single OMS instance can carry multiple migration tasks, but they share one network egress. Because very large databases pull data continuously, spreading their migrations across different OMS nodes reduces contention for network bandwidth.
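The four rules amount to a simple classification of tables into migration links. The sketch below is purely illustrative; the table names, flags, and grouping logic are hypothetical and not taken from OMS:

```c
/* Illustrative grouping of tables into migration links according to the
 * four rules above (hypothetical tables and flags, not OMS code).         */
#include <stdio.h>
#include <stdbool.h>

struct table_info {
    const char *name;
    bool        has_lob;   /* contains CLOB/BLOB columns                   */
    bool        is_cold;   /* historical flow/log data rarely accessed     */
};

int main(void)
{
    const struct table_info tables[] = {
        { "t_policy",         false, false },
        { "t_claim_document", true,  false },
        { "t_trans_history",  false, true  },
    };

    for (size_t i = 0; i < sizeof tables / sizeof tables[0]; i++) {
        const struct table_info *t = &tables[i];
        if (t->is_cold)
            /* Rule 1: archive in advance, migrate separately (static/offline link). */
            printf("%-17s -> archive first, migrate via a separate static link\n", t->name);
        else if (t->has_lob)
            /* Rule 2: dedicated link, small batches, low concurrency to avoid OOM.  */
            printf("%-17s -> dedicated LOB link, low concurrency\n", t->name);
        else
            /* Rule 3: shared link with higher concurrency.                          */
            printf("%-17s -> shared link, higher concurrency\n", t->name);
    }
    /* Rule 4: very large databases are additionally spread across different OMS
     * nodes so their continuous pulls do not contend for a single network egress.  */
    return 0;
}
```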
(3) Main challenges
The hardest part, however, was Pro*C compatibility. Pro*C is Oracle's application development tool that lets programs written in C embed SQL statements directly in the source code to operate on the database. Because it combines the traditional C development model with powerful database manipulation, it still has a large user base in insurance and other industries.
Tuxedo (Transactions for Unix, Extended for Distributed Operations), introduced by AT&T in the 1980s as one of the earliest distributed transaction products, is widely used in the traditional old core to invoke the Pro*C programs that process policy workflows and to guarantee cross-database transaction consistency.
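For readers who have not used it, the fragment below shows what a typical Pro*C source file looks like; the table, column, and credential names are hypothetical. The precompiler expands the EXEC SQL sections into calls to the database runtime library before the file is compiled as ordinary C, and it is exactly this style of embedded SQL that the compatible precompiler and runtime library had to reproduce:

```c
/* Minimal Pro*C example: ordinary C with embedded SQL.
 * Hypothetical schema; shown only to illustrate the programming model.   */
#include <stdio.h>
#include <string.h>
#include <sqlca.h>                       /* SQL communication area */

EXEC SQL BEGIN DECLARE SECTION;
    char   username[32] = "app_user";
    char   password[32] = "app_pass";
    char   policy_no[21];                /* host variables shared with SQL */
    double premium;
EXEC SQL END DECLARE SECTION;

int main(void)
{
    EXEC SQL WHENEVER SQLERROR GOTO sql_error;

    EXEC SQL CONNECT :username IDENTIFIED BY :password;

    strcpy(policy_no, "P2021000001");
    EXEC SQL SELECT premium INTO :premium
             FROM   t_policy             /* hypothetical table */
             WHERE  policy_no = :policy_no;

    printf("policy %s premium %.2f\n", policy_no, premium);

    EXEC SQL COMMIT WORK RELEASE;        /* commit and disconnect */
    return 0;

sql_error:
    fprintf(stderr, "SQL error: %.*s\n",
            sqlca.sqlerrm.sqlerrml, sqlca.sqlerrm.sqlerrmc);
    EXEC SQL WHENEVER SQLERROR CONTINUE;
    EXEC SQL ROLLBACK WORK RELEASE;
    return 1;
}
```

Because the host variables, the sqlca, and the WHENEVER error handling are all resolved at precompile time, code in this style can in principle be pointed at another database by swapping the precompiler and runtime library without touching the C source, which is what made the compatibility approach attractive.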
To solve the problem at its root and migrate the applications smoothly, Alibaba assembled a project team that built a Pro*C-compatible precompiler and runtime library from scratch within a month. Before the 2020 National Day holiday, the precompiler successfully compiled all of the annuity system's more than 1,000 Pro*C programs and correctly ran two typical batch jobs end to end, passing the company's acceptance well ahead of expectations. Winning this vendor bake-off earned the insurance company's confidence in OceanBase's product development capability.
That the old-core adaptation was completed so quickly is thanks to:
- A consistent commitment to self-development: the engineers have excellent individual ability, know the ins and outs of every line of the product's code, and can add or modify code quickly and with high quality, which is what genuinely independent development means;
- A research and development model that spans the full stack: Pro*C is only the external interface; underneath it depends on the database kernel, from the SQL mode and optimizer to the server. For example, during on-site joint debugging of batch jobs, when it was found that SQL did not support the 'J' format element of the to_date function, the issue was fed back to the SQL team, and the backend completed development, testing, and release in just one day;
- An agile model in which the task force's developers and testers sat together and adjusted the day's goals as the project progressed and changed. With the boundary between development and testing removed, testers wrote unit and integration test cases while features were still being developed, and every small increment on the development side was verified immediately, so development and testing finished nearly simultaneously.
(4) Migration process
In October 2020, the first traditional new-core system, claims settlement, went live successfully;
In March 2021, the smallest province of the traditional old core was migrated;
In April 2021, the migration of the 13 traditional new cores was completed;
In August 2021, the last major province of the traditional old core was migrated;
In September 2021, the last single database of the traditional old core was migrated and went live.
Comprehensive system migration
Another problem encountered during the core migration was how to migrate systematically and at scale. Although the smallest province was migrated in March 2021, many old cores were still deployed independently across 36 provinces and cities, with each province's Oracle containing more than 20 schemas. With the old approach, each province would need more than 20 migration links, consuming enormous resources and manpower and impossible to finish in a short time. Analysis showed that the biggest obstacle to engineering batch migration was the lack of end-to-end automation: too many steps were still manual. To solve this, the product, R&D, and on-site delivery teams did three things:
- At the technical level, the OMS data migration tool added support for multi-schema merging on the underlying link, so that the 20-plus links of one province could be merged into a single migration link;
- At the product level, the migration tool's underlying capabilities were broken apart, previously manual steps were automated, and everything was exposed through APIs so that front-line delivery engineers could assemble them like building blocks according to each customer's actual situation;
- Based on the exposed APIs and the 140-plus-page migration operation manual, the delivery engineers developed, within a month, a quick-migration tool that simplifies migration link configuration.
After four iterations, the quick-migration tool went into use, reducing the work requiring manual intervention by 80%. At the same time, unified norms and standards for migration implementation were established so that migrations could proceed in an orderly way. The standard go-live process comprises 8 major stages, 98 steps, and 5 rounds of peak load testing; the 8 stages of a systematic migration are as follows:
- Compatibility assessment: clarify the scope of changes, estimate the adaptation workload, and plan the work reasonably;
- Load assessment: capture SQL load from the original database and replay it in the new database's test environment to verify post-migration performance;
- Test migration and adaptation: carry out adaptation, full regression testing, and performance testing. Systems in good shape (well microserviced, refactored, etc.) can be transformed and migrated in batches; performance testing can use the pre-migration critical-business capacity baseline as the acceptance standard;
- Production full and incremental migration: systems with low business-continuity requirements generally use a one-off full-copy migration; systems with high requirements use full + incremental migration, switching over once the incremental data has caught up, which requires only a minute-level pause of the business at the switch point;
- Reverse synchronization: for key applications, data can be synchronized back to the original database to hedge against unknown risks;
- Data verification: after migration, verify data accuracy and validate the application;
- Continuous monitoring: monitor, and evaluate and analyze in detail, any problems that may arise;
- Online stress testing: after migration, run regular online stress tests, including full-link stress tests based on real production scenarios, to continuously ensure application capacity and performance.
In May 2021, a province in the west was migrated successfully within 2 hours, validating multi-schema merge migration on the Oracle side, a key technical difficulty. Efficiency improved severalfold compared with earlier migrations, clearing the way for migrating the remaining provinces in parallel. After optimization:
- Test environment: data migration and stress-test replay are performed independently, and the automatic SQL optimization and suggestion tool greatly improves migration verification efficiency, allowing more than 90% of issues to be resolved by the team itself;
- Production environment: the time-consuming, labor-intensive steps that previously required manual inspection are automated.
Next came the data migrations of the three northeastern provinces and Inner Mongolia; in the process, dirty data caused by invisible control characters at the Oracle source was resolved, keeping the data accurate.
In August 2021, after the preceding 11 migrations, the last and most important province, the one with the largest data volume, was finally migrated.
In September 2021, with all technical problems solved, all core databases migrated, and the "good start" sales season weathered, only one hurdle remained before a full business cycle of an insurance company could be completed: the actuarial run.
Actuarial work is a distinctive part of an insurance company's operations: it applies mathematics, statistics, finance, insurance, and demography to the items in commercial insurance and social security businesses that require precise calculation. It is usually performed at quarter-end and year-end to measure the company's operating condition and to design more competitive insurance products, and it is an indispensable part of the insurance business.
Actuarial analysis involves huge data volumes, complex models, and heavy data writes, and a run often lasts a week or even longer. In addition, the data as of the snapshot point must not change during the run; under the traditional IOE architecture this was usually achieved with storage-layer snapshots.
After the move to a distributed database, completing the actuarial run without stopping the applications was the last obstacle of the entire migration. After repeated evaluation, Alibaba Cloud devised a solution built on OceanBase's fast physical backup of underlying data blocks and its table-level restore capability. After nearly a month of stress testing and verification, cluster restore speed reached 800MB/s, fully meeting the time requirements for actuarial backup and restore. On September 30, 2021, the data was backed up within the specified window and imported into the actuarial database, supporting the fully migrated actuarial business and closing out the last loose end.
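As a rough sanity check using figures already quoted in this article (and assuming the throughput is sustained), restoring even the largest single database of about 20TB at 800MB/s takes roughly 20 × 1024 × 1024 MB ÷ 800 MB/s ≈ 26,000 seconds, a little over 7 hours, which fits comfortably within a weekend-scale window for preparing the actuarial environment.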
Summary of main issues
Of course, the migration was not entirely smooth. Although no major production accidents occurred, several failures did happen along the way. These failures reflect both how domestic databases improve their ability to handle complex scenarios and the fundamental changes brought by the distributed architecture.
1. A full connection pool repeatedly triggered high-availability switchovers
The biggest problem during the Internet core's migration to PolarDB occurred in January 2021, when two important systems serving C-end users completed data migration and application cutover in the early morning. As daytime traffic grew, a large number of slow queries caused application connections to pile up in both systems, blocking the database service and triggering automatic high-availability switchovers of the PolarDB instances several times during the day, each followed by node rebuild and recovery.
A database service node deployed as a cloud-native container is constrained not only by its own database memory parameters but also by the CPU and memory limits of its cgroup. Once the connection pool filled up, memory exceeded the limit, causing the instance to switch over and rebuild repeatedly. Container-based deployment of cloud-native databases therefore needs many enhancements in stability and self-protection. To address these issues, later versions added features such as global plan cache, resource manager, parallel log replay, and global indexes, and the database kernel parameters were tuned one by one for financial scenarios.
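A back-of-the-envelope illustration of the failure mode, with purely hypothetical numbers: a node's memory footprint is roughly the shared buffer pool plus the number of connections times the per-connection working memory. With, say, a 64GB cgroup limit, a 32GB buffer pool, and about 40MB of working memory per stalled backend, on the order of 800 piled-up connections is already enough to breach the limit, get the container killed, and trigger an HA switchover.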
To meet the extremely high stability demands of financial scenarios, this Internet core migration also added many management, control, and operations capabilities:
- An AWR function, with reports collected regularly to analyze performance and availability;
- A GAWR function to collect full metrics on hosts, containers, and RW/RO nodes;
- An online promote function to optimize online switchover and further shorten switch time;
- Automatic timeout and disconnection of idle sessions, reducing the number of background processes and promptly releasing and reclaiming idle-session memory;
- Optimized metadata caching, from a session-level cache to a global cache, to reduce background-process memory usage, plus overall memory resource management: once configured thresholds are reached, the database starts canceling queries, killing idle sessions, killing active sessions, and rejecting new connections, strengthening its ability to protect itself.
2. A SAN switch failure left the database temporarily leaderless
Because the original Oracle databases ran on SAN storage, and the local SSD hardware recommended for OceanBase had not yet been purchased when migration started in September 2020, the traditional new-core OceanBase cluster was initially deployed on SAN storage to get the migration moving quickly, which planted the seed of the first production problem.
After the first traditional new-core application, claims settlement, went live, the system ran fairly smoothly until one day at 14:07, when both application monitoring and database monitoring raised alerts. Monitoring showed the application had been blocked for 90 seconds, yet while both teams were still troubleshooting, the database recovered on its own.
In-depth analysis found that a port connecting the SAN storage switch to the core switch had failed. Although multipathing was configured, the mismatch between the operating-system kernel's I/O timeout and OceanBase's leader-switch timing triggered OceanBase's automatic leader election. During the election, the other physical machine, whose I/O also passed through the same faulty port, blocked as well, so OceanBase briefly entered a leaderless state; once the multipath software completed its switchover, OceanBase recovered automatically without any intervention. In essence, the fault came from a mismatch between software and hardware timeout parameters, a sign of insufficient run-in between the software and hardware stack; adjusting the relevant parameters can reduce the RTO.
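Stated roughly, as a reasoning aid rather than a vendor formula: if the storage path's failover time exceeds the database's failure-detection window, an election is triggered even though the node itself is healthy, and when every replica's I/O passes through the same faulty path the observed outage is bounded by roughly max(multipath failover time, leader election time). Tuning the parameters so that path failover completes inside the detection window, or ensuring replicas do not share a single storage path, keeps a transient link fault from escalating into an election at all.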
Before this failure, everyone's understanding of OceanBase's RPO = 0 and RTO < 30 seconds had stayed at the PowerPoint level; only after it did people truly appreciate how much fast switchover and automatic recovery matter when a fault occurs. There were also doubters inside the project team at the time, arguing that deploying OceanBase on SAN storage was inherently wrong and that OceanBase should not be used. Deeper analysis, however, showed the problem lay neither with OceanBase nor with the SAN storage, but with insufficient run-in and with whether the software and hardware parameters fit each other.
The IOE architecture became the best combination for centralized systems precisely because extensive practice and hardening across scenarios let both software and hardware run in their optimal state. In the end, everyone realized that tuning parameters alone would not solve the problem fundamentally: the OceanBase cluster originally deployed on SAN storage was moved onto servers with local disks, and later evolved into a multi-active deployment across two regions and three data centers.
3. An execution plan jump caused the business to freeze
Any database vendor claiming 100% Oracle compatibility and a guaranteed problem-free migration is overselling. Even with thorough stress testing beforehand and coverage of as many business scenarios as possible, stability and compatibility after cutover remain a question mark. The key is timely, effective monitoring and rapid emergency response when problems occur; for applications already in production, emergency handling always comes first.
On a weekend in November, slow SQL appeared in the claims settlement system, causing its bill summary job to hang and time out. Why, after more than half a month of stable operation with no business changes, did problems appear during a weekend business trough? On-site delivery experts quickly located the cause: OceanBase's execution plan had jumped to a wrong plan.
Like other databases, OceanBase uses a plan cache so that the same SQL text (with different parameters) can skip parsing and plan generation, improving SQL performance. In real scenarios, however, the parameters vary widely: just as Taobao's Double 11 has hot inventory items, the insurance industry has both very large and very small agency codes. The SQL looks identical, yet different parameter values call for different optimizations and execution paths.
To pick the optimal plan, Oracle periodically gathers object statistics (for example, in the nightly maintenance window), evicts old plans, and regenerates plans that reflect the latest data. OceanBase works similarly, but because it freezes and merges data daily and flushes incremental data to disk, the actual data characteristics of objects (row counts, column values, averages, and so on) change considerably. After each merge the plan cache is therefore cleared, and a plan is generated and cached from the first set of parameters that arrives, with only one plan retained by default. That weekend the first parameters to arrive happened to be unrepresentative, so the cached plan was wrong for subsequent executions and performance collapsed.
Plan instability is a fairly common database performance phenomenon; Oracle has introduced Cursor Sharing, Outlines, bind peeking, ACS, SPM, and other mechanisms over the years to mitigate it, yet wrong plans still cannot be entirely avoided in production. Closing the gap from 99% to 100% Oracle compatibility and optimization is the hardest part and cannot happen overnight. For such low-probability events, emergency handling is the last line of defense: binding an execution plan in the database, without touching the application, is an effective and easy remedy when an incident occurs. Over the course of the migration the project team became thoroughly familiar with occasional plan jumps, and they caused no further unexpected impact.
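To make the mechanism concrete, the fragment below uses ordinary Pro*C-style embedded SQL with a hypothetical table and agency codes (it is not OceanBase-specific syntax); it shows how the first bind value seen after a merge fixes the cached plan that every later value then reuses:

```c
/* Illustration of plan caching with skewed bind values (hypothetical schema). */
#include <stdio.h>
#include <string.h>
#include <sqlca.h>

EXEC SQL BEGIN DECLARE SECTION;
    char   username[32] = "app_user";
    char   password[32] = "app_pass";
    char   agency_no[11];
    double total_amount;
EXEC SQL END DECLARE SECTION;

int main(void)
{
    EXEC SQL WHENEVER SQLERROR CONTINUE;      /* error handling omitted for brevity */
    EXEC SQL CONNECT :username IDENTIFIED BY :password;

    /* First execution after the daily merge: a tiny agency.  The optimizer
     * generates and caches a plan (e.g. an index range scan on agency_no)
     * that suits this value.                                              */
    strcpy(agency_no, "A000000001");
    EXEC SQL SELECT NVL(SUM(bill_amount), 0) INTO :total_amount
             FROM   t_claim_bill
             WHERE  agency_no = :agency_no
               AND  bill_date >= TO_DATE('2021-11-01', 'YYYY-MM-DD');

    /* A later execution: a very large agency.  A plan driven by the date
     * predicate or a scan would now be cheaper, but by default the single
     * cached plan from the first value is reused, which is how one
     * unrepresentative first value degrades everything that follows.      */
    strcpy(agency_no, "A099999999");
    EXEC SQL SELECT NVL(SUM(bill_amount), 0) INTO :total_amount
             FROM   t_claim_bill
             WHERE  agency_no = :agency_no
               AND  bill_date >= TO_DATE('2021-11-01', 'YYYY-MM-DD');

    printf("summary amount for agency %s: %.2f\n", agency_no, total_amount);
    EXEC SQL COMMIT WORK RELEASE;
    return 0;
}
```

Binding a known-good plan in the database, as described above, changes which plan such statements pick up without requiring any change to application code of this kind.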
Overall effect
Over the past year, nearly 100 business systems have been fully upgraded to domestic databases, making the company the first large financial enterprise to complete a 100% domestic database upgrade of its core business systems, with OceanBase and PolarDB together accounting for more than 97% of the databases.
Through the full replacement of its databases, the company has gained comprehensive security assurance for its data assets and achieved the following:
- 100% secure and controllable database technology stack, get rid of the dependence on Oracle database;
- Get rid of reliance on minicomputers and high-end storage;
- Greater maturity of cloud-native and distributed database applications, moving from usable to easy to use while improving performance;
- Centralized management and control of database services, significantly reducing hardware and overall operation and maintenance costs;
- Genuine real-time elasticity and high availability, making peak events such as big promotions easy to handle.
Moving from a completely closed system architecture to a gradually, and then fully, open one gives the company genuine control over its core database technology. Thanks to the elasticity and resource pooling of the cloud-native and distributed architectures, "one database, multiple chip architectures" is now possible: a tenant can be switched to Haiguang server nodes with a single command, enabling smooth replacement with domestic hardware.
After the migration, the distributed database's efficient compression reduced storage to one third of the original size, and replacing high-end minicomputers with domestic rack-mounted servers saved nearly 200 million yuan in equipment investment.
Utilization of database servers and storage racks increased by 300%, and equipment power consumption fell to one third of the original level, an estimated saving of nearly 10 million kWh of electricity per year. This provides steady green momentum for the company's digital transformation, helps implement the national dual-carbon strategy, and reduces the incremental carbon emissions of the company's self-built data centers.
Conclusion
Today, the core business of most domestic financial institutions still runs on foreign databases, a reality we cannot avoid. Replacing a database is not merely swapping one product for another, and the goal is not just the label "domestic". What matters more is that the technology must advance: the new system must offer capabilities the old system and foreign products do not, not only in performance and stability, but in supporting the business with agility, handling massive volumes and unpredictable peaks, and raising financial-grade high availability to a higher level.
Over the years we have read many articles analyzing and imagining database replacement, but the actual replacement of the technology platform behind large, complex core application systems still throws up many problems those "analysis" articles never anticipate, especially around adaptation to and compatibility with the existing operating environment and friendliness to applications. On these questions, Alibaba has taken a solid step forward and accumulated valuable experience, setting a useful example for the localization journey ahead.
About the author
Liu Weiguang
Vice President of Alibaba Group and General Manager of the Alibaba Cloud Smart New Finance & Internet Business Unit. Before joining Alibaba Cloud, he was responsible at Ant Financial for the commercial promotion and ecosystem development of fintech and for the business development of Ant Blockchain. He has worked in the enterprise software market for many years, founded Pivotal Software's Greater China branch, and built its enterprise big data and enterprise-grade cloud computing PaaS platform business into the market leader.