1 Background
The amount of billing data increases sharply, and it is necessary to split the data of the old database into multiple sub-databases and data sharding;
The splitting rule is the payment object (or ID) field, HASH, modulo (32), divided into 32 libraries
2 goals
Realize data migration from the old database to the sub-database according to the sharding rules to ensure smooth data migration, minimize the downtime as far as possible, support rollback, synchronization failure, and support rollback of a single database
3 options
3.1 Implementation based on honeycomb middleware
3.2 Semi-self-developed synchronous data processing program to develop data processing program and consume historical data MQ;
Consumption incremental data MQ synchronizes historical data based on dts (specified time point, synchronization history)
Synchronize real-time data based on JDQ (specify time point, restore real-time synchronization)
3.3 Based on open source middleware strategy
3.4 Completely self-developed data processing tool to develop data query program, send historical data query to MQ and write real-time data to double write to send MQ in a unified manner, which is asynchronously processed and written by MQ
3.5 Scheme comparison
To sum up the overall evaluation, we finally choose to do the overall data migration plan based on sharding-proxy
4 Proxy introduction and construction
4.1 Introduction
4.1.1 The design significance is positioned as a transparent database agent, providing a server version that encapsulates the database binary protocol to complete the support for heterogeneous languages. Currently, MySQL and PostgreSQL (compatible with PostgreSQL-based databases such as openGauss) are available. It can use any access client compatible with MySQL/PostgreSQL protocol (such as MySQL Command Client, MySQL Workbench, Navicat, etc.) to operate data, which is more friendly to DBAs.
It is completely transparent to the application and can be used directly as MySQL/PostgreSQL;
Works with any client compatible with the MySQL/PostgreSQL protocol.
4.1.2 Overall Architecture
The whole architecture can be divided into three parts: front-end, back-end and core components.
The front-end is responsible for network communication with the client, and adopts the NIO-based client/server framework. The NIO model is used under Windows and Mac operating systems, and the Linux system is automatically adapted to the Epoll model. In the process of communication, the encoding and decoding of the MySQL protocol is completed. After the core component gets the decoded MySQL command, it starts to call Sharding-Core to parse, route, rewrite, and merge the results and other core functions of the SQL. The interaction of the backend with the real database currently relies on the Hikari connection pool.
4.2 Build
4.2.1 Keyword Interpretation Download, unzip, install mysql driver, start, finish
4.2.2 Install shared-proxy
Download the installation package, select the appropriate version (4.1.1 is used in this article), and download it from the official website, the official website address https://shardingsphere.apache.org/document/current/cn/downloads
Decompress the installation package (automatically decompress, or command decompression), and specify the decompression directory at will (any directory with permission)
Shared-proxy directory after decompression
shared-proxy configuration directory conf, containing all configuration data
4.2.3 Install the mysql driver Put the mysql driver jar package (mysql-connector-java-5.1.44.jar) in the lib directory of the shared-proxy ShardingSphere-Proxy does not have the mysql driver jar package, you need to manually download the download address https ://dev.mysql.com/downloads/connector/j/
4.2.4 proxy start
Shared-proxy's bin directory start.sh, start through ./start.sh
4.2.5 Remark
The default startup port of Sharding-Proxy is 3307
4.3 Configuration
4.3.1 Keyword Interpretation Six major configurations - log configuration (logback.xml), basic service configuration (server.yaml), logical configuration (four conf configuration files, fragmentation (core) / shadow / read-write separation / encryption configuration)
This example configures the sharding strategy based on server.yaml and config-sharding.yaml
4.3.2 server.yaml
Basic service configuration, consisting of three parts
1) The orchestration governance configuration of shared-jdbc provides data governance functions, including the following:
Centralized and dynamic configuration. (Support data source, dynamic switching of table and fragment read-write separation strategy)
Data governance. Provide the ability to fuse the database access program to the database and disable the access from the library to support the registry of Zookeeper and etcd;
2) Permission configuration, configure user name and password, and authorize database The following example configures two users: root/root and sharding/sharding, where root authorizes all databases by default, and the sharding user authorizes the sharding_db database. The database (schema) here is a logical database, and the corresponding sub-database mapping is configured in config-*.yaml
Proxy data source parameter configuration Configure data link, thread, number of cores, etc.
4.3.3 config-sharding.yaml
Shared-proxy core configuration, sharding rule related configuration, including schemaName, dataSources, shardingRule three parts
1) The mapping configuration of the logical library corresponding to the sub-library data source in the following figure
schemaName logic library name, the authorized schema declared in server.yaml is the schemaName here
dataSources is the data source configuration. In this example, two sub-databases (ds0, ds_1) are mapped. ds${0..1} corresponds to the name of the logical sub-database, and the url is filled in with the actual database.
4.3.4 logback.xml
logback based log configuration
4.3.5 The remaining three configurations
config-shadow.yaml/config-master_slave.yaml/config-encrypt.yaml
For shadow library configuration, master-slave configuration, data field encryption configuration, you can see the link below if you are interested
5 Debug
Based on building ShardingSphere-Proxy proxy, choose direct connection tool client use Navicat or mysql command to directly connect to manual mysql command link as follows
The query without split key searches the entire database by default, and the new default route corresponds to the real database according to the split key
6 Data Migration
Migration in three steps
1) Install sharding-proxy online
2) Data synchronization: Create a migration task and start synchronization. The principle is to create a DTS task
3) Data integrity check full comparison, overall synchronization progress query
Time segment comparison, compare the total amount of the new library and the old library according to the sampling of each time period, and manually verify
Random sampling comparison: The data of a certain period of time in the random new database is compared one by one, and the manual tool verification is manually sampled and queried according to the development tools, and the queried data is compared with the old database. Full data verification: comparison of synchronized data It takes a long time to perform full data verification and verify it according to the DTS tool.
7 Configure the query machine
Configure the logistics designated query machine based on easyops or myops, and query the proxy agent through the query machine.
8 Questions and Conclusions
The biggest problem encountered in the overall data migration process is that the data is unmeasurable. For various historical data problems, data migration is interrupted, resulting in rework, cleaning garbage data, and re-migration.
8.1 The split key is empty and the split key is empty is not supported by default
8.2 Updating split keys
The update statement does not support updating split keys by default (actually 4.x does not support updating with split keys, 5.x already supports updating with split keys without changing) Unknown exception: [INSERT INTO .... ON DUPLICATE KEY UPDATE can not support update for sharding column.]
8.3 Solutions to the above two exceptions
The split key cannot be empty, set the default split key
Update with split key, upgrade sharding-proxy to 5.x or configure synchronous DTS to remove split key update
Author: Ren Hongbo
**粗体** _斜体_ [链接](http://example.com) `代码` - 列表 > 引用
。你还可以使用@
来通知其他用户。