Author: Xiong Lang, senior database architect at Ping An Technology, with extensive experience in relational and non-relational distributed database technology.
Ping An Technology is the technology-solutions arm of Ping An Group. Fulfilling the corporate mission of "Technology Empowers Finance, Technology Drives Ecology", it empowers the Group's financial services, healthcare, automotive services, and smart-city ecosystem construction, and is committed to becoming an internationally leading technology company.
Introduction to UbiSQL
The name UbiSQL may be unfamiliar to many readers. UbiSQL is a distributed database product built by Ping An Group. Its code is based on TiDB and is fully compatible with TiDB 4.0. On top of TiDB's characteristics, UbiSQL improves stability, security, and applicability, creating a financial-grade distributed database with independently controlled kernel source code and providing a one-stack HTAP solution.
UbiSQL's roadmap is to provide financial-grade security capabilities, such as encryption algorithms and enhancements to TDE (transparent data encryption), along with enhanced internal cluster management. Because the number of clusters will grow into the thousands, we have strengthened cluster management and consolidated monitoring. In addition, UbiSQL provides hot/cold data separation, supporting the offloading of a cluster's cold data to SATA disks to reduce storage costs.
The process of migrating from Oracle to UbiSQL
Next, I will share our Oracle migration practice in more detail, a summary of many years of work at Ping An Group that I hope can serve as a reference. The Group's core payment system migrated about 8 TB of data; because the source ran on Oracle RAC nodes, the workload was migrated to two UbiSQL instances to avoid interference between nodes.
Figure: Comparison of clusters before and after migration
The post-migration architecture routes traffic through F5 load balancing to the TiDB clusters in three data centers. F5 is deployed in all three machine rooms, and the corresponding UbiSQL instance is reached via DNS. UbiSQL's own Raft protocol provides strongly consistent data synchronization within the cluster, while the Drainer tool performs asynchronous replication to a remote disaster-recovery cluster.
Figure: UbiSQL architecture after migration
The first step is an analysis of the status quo before migration, which mainly covers three aspects:
Database analysis: the database load, plus the SQL, objects, and stored procedures to be migrated, determine whether migration is feasible. The DBA obtains this analysis through code scanning and reporting.
Application analysis: the application layer needs to assess whether it can adapt to and remain compatible with the new database, and we need to understand the application-layer architecture and the corresponding JDBC version.
Associated-system analysis: many Oracle databases synchronize data through OGG, ETL tools such as Kettle, or third-party calls, so the interface-call relationships also need to be sorted out.
Next, based on the results of the analysis, select the database:
Whether to choose an RDBMS, another open-source database, or a distributed database depends on the application layer's requirements — does it need active-active deployment, rapid scaling, or a distributed architecture? — and the selection is made accordingly. The database's schema design scheme also requires considerable work and must be designed according to the corresponding business needs.
The splitting and decoupling scheme of the application system determines whether the application is migrated in its entirety or only in part, after which the preliminary transformation of the application can begin.
Then comes the design of the associated data-synchronization scheme, covering OGG and other ETL data flows. Distributed databases and open-source PG currently cannot act as an OGG source, so third-party tools such as Kafka are needed to do the corresponding data synchronization.
Finally, the migration plan and fallback plan are designed, together with manpower planning — how much manpower and cost will be invested.
After these preparations are done, we build the databases for the development, test, and production environments. Once ready, data migration begins: the previously sorted table structures, objects, and so on are migrated to the new UbiSQL, and the table structures and data are compared. This part will be introduced in detail later.
The transformation and testing of application functionality is the most important and core part. Developers need to transform the corresponding functions, SQL, and stored procedures; this is the core of the entire de-Oracle effort. If the stored-procedure codebase is large — running to hundreds of thousands of lines — the required human effort must be evaluated. In the de-Oracle project we completed this year, the stored procedures totaled about 20,000 lines, and roughly 10 people spent about two months converting the roughly 100 packages into Java code, so the human investment was considerable. After this step, a full regression test of the application-layer functionality is performed to verify that the transformed functions are completely consistent with the original Oracle logic. The next step is application performance stress testing and database stress testing, followed by writing the corresponding migration manual.
Next comes the production-implementation stage, migrating according to the actual situation and the planned steps. It may be "migrate once, switch once" or "migrate in multiple batches and switch gradually": some core systems need safeguards during migration — the cutover must not cause any problems — so a portion of grayscale traffic may be cut over first. After entering the parallel-operation stage of the production database, we monitor and report on the migrated database, then decommission the old Oracle database and the rollback link, and summarize the project.
Traffic replication and playback scheme
Let's focus on our implementation of traffic replication and traffic playback. First, why do we need them? Traffic playback exists because, after the Oracle logic is converted into Java code, we do not know whether the converted logic is correct; relying entirely on testers' regression tests may leave coverage gaps or miss important cases. Through the traffic-playback scheme, we want production business traffic to be applied completely in the test environment.
Traffic replication enables stress testing: production traffic is introduced directly into the post-migration environment and then amplified two or three times for stress testing. The scheme can then be reused for subsequent core systems' de-Oracle work, saving the corresponding cost and manpower.
Figure: Traffic Replication
Traffic replication uses F5 or Nginx: all client traffic accesses the Oracle environment through F5, and the traffic is mirrored at the F5 layer, with the real-time traffic forwarded to the application layer that accesses UbiSQL; the corresponding business access is completed through a baffle (stub) program. The core of the entire link is F5 — all traffic must pass through it, because any traffic that bypasses F5 cannot be captured and its corresponding SQL calls cannot be replayed. The baffle program makes the replicated environment behave like production for all external calls: where the application would access other third-party databases, the baffle program intercepts the call and returns results directly instead of actually executing against those databases, which would otherwise cause data confusion.
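The baffle idea above can be sketched in a few lines of Python. This is a minimal illustration, not the actual Ping An implementation; the class and method names (`ThirdPartyGateway`, `call`) are assumptions for the example.

```python
# Sketch of a "baffle" (stub) program: in the replicated environment,
# calls to third-party databases are answered from previously recorded
# results instead of being executed for real.

class ThirdPartyGateway:
    def __init__(self, replay_mode, recorded_results=None):
        self.replay_mode = replay_mode
        # Results captured from production, keyed by (interface, params).
        self.recorded = recorded_results or {}

    def call(self, interface, params, real_executor=None):
        key = (interface, tuple(sorted(params.items())))
        if self.replay_mode:
            # Baffle: return the recorded result; never touch the real
            # third-party database, otherwise its data would be corrupted.
            return self.recorded.get(key)
        result = real_executor(interface, params)
        self.recorded[key] = result  # capture for later replay
        return result

# Production side: record a real call.
gw = ThirdPartyGateway(replay_mode=False)
gw.call("query_balance", {"user": "u1"},
        real_executor=lambda i, p: {"balance": 100})

# Replicated environment: the same call is answered from the recording.
replay_gw = ThirdPartyGateway(replay_mode=True, recorded_results=gw.recorded)
print(replay_gw.call("query_balance", {"user": "u1"}))  # {'balance': 100}
```

The key design point is that the replicated environment stays side-effect free toward external systems while still exercising the full application path against UbiSQL.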
Figure: Traffic playback scheme
Traffic playback is relatively complex and is mainly implemented by developers. The main logic: calls from peripheral systems reach the corresponding function point, identified by a TraceID; an agent at the function point records each call — which interface was invoked and with what parameters — and stores the record on a log platform. The stored records can then be used to call the UbiSQL instance of the payment system and replay the data.
Because the traffic is stored, it can be replayed at any time, and a corresponding baffle program exists here as well. Afterwards we can compare the Oracle and UbiSQL databases: since the SQL is identical and complete, we restore Oracle to a certain point in time, restore the UbiSQL database to the same point, and replay that day's traffic into UbiSQL. The data on both sides should then be essentially the same, and a data-comparison tool verifies whether the functionality is completely consistent.
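The record-and-replay flow described above can be sketched as follows. This is an illustrative Python mock-up; the log platform is stood in by an in-memory list, and all names (`agent_record`, `replay`, `ubisql_handler`) are assumptions, not Ping An's actual interfaces.

```python
# Sketch of traffic playback: an agent records each call with its
# TraceID and parameters; a replayer later re-issues the same calls
# in order against the target system (UbiSQL in the text).
import json

log_platform = []  # stands in for the real log platform

def agent_record(trace_id, interface, params):
    """Agent at the function point: persist the call for later playback."""
    log_platform.append(json.dumps(
        {"trace_id": trace_id, "interface": interface, "params": params}))

def replay(handler):
    """Re-issue every recorded call, in order, via the given handler."""
    results = []
    for line in log_platform:
        entry = json.loads(line)
        results.append(handler(entry["interface"], entry["params"]))
    return results

# Record two production calls...
agent_record("t-001", "pay", {"amount": 30})
agent_record("t-002", "pay", {"amount": 70})

# ...then play them back against a handler that targets UbiSQL.
total = {"sum": 0}
def ubisql_handler(interface, params):
    total["sum"] += params["amount"]
    return total["sum"]

print(replay(ubisql_handler))  # [30, 100]
```

Because the log is durable, replay can be run whenever needed, which is exactly what makes the point-in-time restore-and-replay comparison possible.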
The traffic replication and playback solution builds a production bypass verification environment and performs production-level verification of the fallback link in that environment. Compared with traditional database-level playback, this solution verifies at the level of the overall application-plus-database architecture, reducing the risk of introducing new technology into major core business systems. After parallel verification passes and data migration and comparison are complete, the system can be switched online directly, which shortens the production window.
Data comparison and switching scheme
For the data-switching and data-comparison scheme, first look at data comparison. Migrating from Oracle requires a data-comparison process. Ping An Group has an internal tool called Ludbgate that implements table-structure transformation, full data synchronization, incremental data synchronization, and comparison of both full and incremental data — a full-link tool covering data comparison throughout the migration process.
The general logic of data comparison: when data synchronization starts, the corresponding log point is recorded, and full synchronization begins. Because the start time point is recorded, incremental synchronization can proceed from that point once full synchronization completes. While the incremental runs, a shadow table is created in an intermediate MySQL; its role is to record the primary-key values of all DML operations on the synchronized tables after the process starts.
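The shadow-table mechanism can be demonstrated with a small self-contained example. This is only a sketch: it uses SQLite triggers in place of the intermediate MySQL, and the table names (`payment`, `shadow`) are invented for illustration.

```python
# Minimal sketch of the shadow table: triggers record the primary key
# of every DML operation on the synchronized table, so the comparator
# later knows exactly which rows changed after synchronization started.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payment (id INTEGER PRIMARY KEY, amount INTEGER);
CREATE TABLE shadow  (id INTEGER);  -- primary keys touched by DML

CREATE TRIGGER rec_ins AFTER INSERT ON payment
BEGIN INSERT INTO shadow VALUES (NEW.id); END;
CREATE TRIGGER rec_upd AFTER UPDATE ON payment
BEGIN INSERT INTO shadow VALUES (NEW.id); END;
CREATE TRIGGER rec_del AFTER DELETE ON payment
BEGIN INSERT INTO shadow VALUES (OLD.id); END;
""")

conn.execute("INSERT INTO payment VALUES (1, 100)")
conn.execute("UPDATE payment SET amount = 200 WHERE id = 1")
conn.execute("INSERT INTO payment VALUES (2, 50)")

changed = [r[0] for r in conn.execute("SELECT id FROM shadow")]
print(changed)  # [1, 1, 2]
```

Every changed primary key is captured, so the later incremental comparison only has to re-check rows listed in the shadow table rather than rescanning whole tables.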
After full synchronization completes, the full comparison is performed. At this point both incremental and full data are constantly changing, so the comparison is anchored to a time point, at which the data on both sides can be compared regardless of whether it is static. The full comparison works by splitting the data into partitions: table records are partitioned for multi-process processing; each process fetches its partition's records from Oracle by rowid and computes an MD5 value, fetches the rows with the same primary keys from the UbiSQL side, computes their MD5 values over the entire row, and compares them. Inconsistent results are stored in the shadow table.
After the full comparison completes, the incremental comparison starts, running once per hour. It takes the primary-key values from the shadow table, reads the corresponding records from Oracle and computes their MD5 values, fetches and hashes the corresponding records from UbiSQL, and compares them. If consistent, the primary key is deleted from the shadow table; if inconsistent, it is compared again at the next run. This reduces the comparison time and thus minimizes downtime.
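The MD5-based row comparison at the heart of both the full and incremental passes can be sketched as below. The data lives in in-memory dicts here purely for illustration; in the real tool the rows come from Oracle (by rowid range) and from UbiSQL (by primary key), and the function names are assumptions.

```python
# Sketch of the comparison step: hash each row's full contents with MD5
# on both sides; mismatching primary keys go to the shadow table for
# re-checking on the next incremental pass.
import hashlib

def row_md5(row):
    # Canonical textual form of the whole row, then MD5.
    return hashlib.md5("|".join(str(v) for v in row).encode()).hexdigest()

def compare_partition(oracle_rows, ubisql_rows):
    """oracle_rows / ubisql_rows: {primary_key: row_tuple}."""
    mismatched = []
    for pk, row in oracle_rows.items():
        other = ubisql_rows.get(pk)
        if other is None or row_md5(row) != row_md5(other):
            mismatched.append(pk)  # would be written to the shadow table
    return mismatched

oracle = {1: (1, "a", 100), 2: (2, "b", 200), 3: (3, "c", 300)}
ubisql = {1: (1, "a", 100), 2: (2, "b", 999)}  # 2 differs, 3 missing
print(compare_partition(oracle, ubisql))  # [2, 3]
```

Hashing whole rows keeps the cross-database traffic small (one digest per row instead of full row contents), which is why partitioned multi-process hashing scales to large tables.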
Figure: Data comparison and switching scheme
In the data-switching scheme, Ludbgate synchronizes from Oracle to the UbiSQL side, while Oracle keeps its OGG links to other Oracle databases. The migration proceeds by switching part of the traffic: because this is a tariff (billing) system, the migration must complete without any problems — anything involving money is sensitive — so we cut over a subset of users at a time.
We cut over a small amount of traffic, using a function switch on the application side to route a small amount of data to the UbiSQL side, and then verify the application. If there is no problem, traffic is switched to UbiSQL bit by bit. Synchronization to third parties is also enabled: changes are published to Kafka and synchronized to the other Oracle databases. This solution has a shortcoming: the rollback mechanism cannot be guaranteed, because data newly written to UbiSQL cannot be written back to Oracle — that is, this part of the data would be lost on rollback. We have raised a corresponding requirement with the TiDB community: if TiDB could distinguish user writes from replicated writes, two-way synchronization between Oracle and TiDB would be possible and the rollback problem would disappear.
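The per-user function switch described above amounts to a routing whitelist that is widened batch by batch. A minimal sketch, with all names (`migrated_users`, `route`) invented for illustration:

```python
# Sketch of a gradual-cutover function switch: a whitelist of users is
# routed to UbiSQL while everyone else stays on Oracle; widening the
# whitelist moves traffic over "bit by bit".

migrated_users = {"u1", "u2"}  # grows batch by batch

def route(user_id):
    """Pick the backing database for this user's requests."""
    return "ubisql" if user_id in migrated_users else "oracle"

print(route("u1"))  # ubisql
print(route("u9"))  # oracle

# Widen the switch for the next batch.
migrated_users.add("u9")
print(route("u9"))  # ubisql
```

Because the switch lives in the application layer, each batch can be verified in production before the next one is added, which matches the cautious, money-sensitive cutover the text describes.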
Figure: Performance comparison after migration
Finally, look at the performance comparison after migration. Because these are tariff systems, the average response time is very short. After migrating to UbiSQL, aggregate statistics improved: Count statements execute faster. Other queries and inserts showed some performance loss, but it was small — basically within 10% — and acceptable to the application side. Using this method, we migrated the unified payment, CF2 customer-relationship, and workflow systems, all core business systems within Ping An Group.