
Background

Some time ago, business needs forced us to shard the core database into multiple databases and tables, migrating about 4 billion rows of data. I am recording the process here so we can revisit the pros and cons of this solution later.

Foreword

Why shard databases and tables?

With the rapid growth of the company's business, the amount of data in the database soared and access performance degraded, so optimization became urgent. Where is the problem? A relational database easily becomes the system bottleneck: a single machine's storage capacity, connection count, and processing power are all limited. Once a single table reaches roughly 10 million rows or 100 GB, and it is queried along many dimensions, performance still drops sharply even if more database instances are added and indexes are optimized.
In production there are generally two ways to respond, "hard" and "soft". "Hard" means upgrading the hardware: adding storage, adding CPUs, and so on. The cost is very high (unless money is no object), and it hurts if the bottleneck is not actually in the hardware. "Soft" means splitting at the design level: breaking up large tables and splitting overloaded databases (a single spark can start a prairie fire).

Common ways to split databases and tables

Sharding has two parts: splitting databases and splitting tables. In production it usually takes four forms: vertical database splitting, horizontal database splitting, vertical table splitting, and horizontal table splitting.
Let's first clarify the concepts of vertical and horizontal:

  • "Vertical" usually refers to dividing a table into multiple tables according to fields, and each table stores some of the fields.
  • "Level" usually means that the table structure will not be changed, and the data will be divided into multiple places according to certain rules.

With these concepts in mind, here are the four common sharding methods:

  • Vertical table splitting: split the columns of a wide table into multiple tables according to access frequency, whether a column is a large field, or other business-specific rules. This makes the business model clearer and improves some performance, but after splitting, try to avoid cross-table joins from the business side, otherwise the cost outweighs the gains.
  • Vertical database splitting: group tables into different databases according to how tightly their business logic is coupled. These databases can sit on different servers, so access pressure is spread across multiple machines, performance improves greatly, and the overall business structure becomes clearer; each business database can be tuned for its own workload. The trade-off is that you must now handle all the complexity that cross-database access brings.
  • Horizontal database splitting: distribute the rows of one table across multiple databases, each holding only part of the data. These databases can sit on different servers, so access pressure is spread across multiple machines and performance improves greatly. Besides all the cross-database complexity, you also have to solve the data-routing problem (introduced later; a minimal routing sketch follows this list).
  • Horizontal table splitting: distribute the rows of one table across multiple tables within the same database, each holding only part of the data. This improves performance only slightly and mainly serves as a supplement to horizontal database splitting.
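
To make data routing concrete, here is a minimal Go sketch of how a horizontal shard key could be mapped to a physical database and table. The 4-database × 8-table layout, the choice of shard key, and all names are illustrative assumptions, not the layout used in this migration.

```go
package main

import "fmt"

const (
	dbCount    = 4 // number of physical databases (illustrative)
	tableCount = 8 // number of tables per database (illustrative)
)

// route maps a numeric shard key (e.g. a user id) to a database and table name.
func route(userID int64) (db, table string) {
	slot := userID % (dbCount * tableCount) // 32 logical slots in total
	return fmt.Sprintf("order_db_%d", slot/tableCount),
		fmt.Sprintf("order_%d", slot%tableCount)
}

func main() {
	db, table := route(1234567)
	fmt.Println(db, table) // order_db_0 order_7
}
```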

Generally, the vertical splitting scheme (databases and tables) should be decided in the system design phase according to how tightly the business is coupled. When data volume and access pressure are not yet extreme, first consider caching, read/write separation, and indexing. Only if the data volume is huge and keeps growing should you consider horizontal database and table splitting. One more question deserves thought here: how do we migrate an existing business smoothly and losslessly from the original single database and single table to the new sharded layout? (Think about it; it is the focus of this article.)

Problems introduced by sharding

  • Global uniqueness of the primary key id.
  • Distributed transactions: after sharding, the data lives in different databases, which makes transaction management much harder.
  • Cross-database and cross-table joins: after sharding it is hard to avoid logically related data ending up in different tables and different databases, so join operations are restricted; we cannot join tables that live in different shards.

Preliminary preparation

Converge all direct DB connections

Why is this step needed? In business code it is hard to avoid scenarios where services connect directly to the DB for one reason or another. If those connections do not belong to your own team and are missed, the migration can have unpredictable consequences. If this part of the project is reasonably standardized and DB monitoring is in place, the audit is manageable; otherwise it is very hard to get a complete picture. So monitoring and coding conventions are not useless: if you think they are, try splitting a database without them.
Besides being easier to maintain, converging DB access gives the business side much stronger control over its own data, and every read and write is clearly recorded (imagine the trouble of not knowing who quietly changed data that you maintain).

Distributed ID generator

With sharding in place, the best practice is to use globally unique ids across all shard tables. At this point someone will ask: "Why? After sharding, the rows are split across tables, and since each MySQL table's auto-increment primary key is independent, a select on each table still returns the right rows, so a globally unique ID doesn't seem necessary." Consider it from another angle: in an e-commerce scenario the order volume is huge and the order database must be sharded, yet every order number is guaranteed to be different, which is exactly a globally unique ID at work. Of course, you could also add a separate column just to store that order id; that works too. For details, see: ID generator detailed explanation.
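
As a rough illustration of what a globally unique id generator can look like, here is a minimal Snowflake-style sketch in Go (41 bits of timestamp, 10 bits of worker id, 12 bits of sequence). The bit layout, custom epoch, and names are assumptions for illustration only; a production generator also needs clock-rollback handling and worker-id allocation, see the ID generator article referenced above.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

const (
	workerBits  = 10
	seqBits     = 12
	maxSeq      = (1 << seqBits) - 1
	customEpoch = 1600000000000 // milliseconds; illustrative custom epoch
)

// Snowflake packs a millisecond timestamp, a worker id and a per-millisecond
// sequence into one int64, so ids are unique across all shards.
type Snowflake struct {
	mu       sync.Mutex
	lastTime int64
	workerID int64
	seq      int64
}

func (s *Snowflake) Next() int64 {
	s.mu.Lock()
	defer s.mu.Unlock()

	now := time.Now().UnixMilli()
	if now == s.lastTime {
		s.seq = (s.seq + 1) & maxSeq
		if s.seq == 0 { // sequence exhausted in this millisecond: spin to the next one
			for now <= s.lastTime {
				now = time.Now().UnixMilli()
			}
		}
	} else {
		s.seq = 0
	}
	s.lastTime = now
	return (now-customEpoch)<<(workerBits+seqBits) | s.workerID<<seqBits | s.seq
}

func main() {
	gen := &Snowflake{workerID: 1}
	fmt.Println(gen.Next(), gen.Next())
}
```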

Assess the impact of the ID type change on dependent parties

Now that we have adopted a globally unique id, we have to consider its impact on dependent parties. The main points are:

  • Do downstream consumers of the sharded table's id force-cast it to another type, for example int32?
  • Does the front end read the id directly as an integer? JavaScript stores numbers as IEEE 754 double-precision floating point, which can only safely represent integers from -(2^53 - 1) to 2^53 - 1 (boundary values included), and JSON follows the same rule. The int64 range is -2^63 to 2^63 - 1, so serializing an out-of-range int64 as a JSON number causes precision loss or parsing errors on the front end (see the sketch after this list).
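
A small Go sketch of one common way to side-step the precision problem: serialize int64 ids as JSON strings so the front end never parses them as lossy doubles. The struct and field names are illustrative.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type Order struct {
	// The ",string" option tells encoding/json to emit the id as a JSON string.
	ID     int64 `json:"id,string"`
	Amount int64 `json:"amount"`
}

func main() {
	b, _ := json.Marshal(Order{ID: 9007199254740993, Amount: 100}) // 2^53 + 1
	fmt.Println(string(b))                                         // {"id":"9007199254740993","amount":100}
}
```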

Sort out binlog dependencies

Because the sharding operation changes where the data is written, anything that previously relied on the old database's binlog must be adjusted accordingly.

Check that the shard key can be obtained everywhere it is needed

Handle this according to your own business.

SOP

A detailed SOP (standard operating procedure) must be drawn up and strictly followed.

Scheme design (switching from a single database and single table to sharded databases and tables)

Option One

Phase 1: Create the new database and synchronize data from the old database to the new one

  1. Synchronize data via the binlog

Phase 2: Verify data consistency

  1. Verify the consistency of the data synchronized via the binlog

Phase 3: Business cuts traffic over

  1. Turn on the switch in the business code, route reads and writes to the new database, and gradually ramp up the traffic

Option Two

Phase 1: Create the new database and synchronize data to it

Phase 2: Stop service && verify data consistency

Phase 3: Business cuts traffic over

Option Three

Phase 1: Double writing

Double writing covers several operations: insert && update && delete (a minimal sketch of this flow follows the list below).
  1. Case 1: Normal case (no failures; the record being updated exists in both the new and old databases)

    1. Insert: the business side double-writes the new and old databases. Both inserts proceed normally; because the new database is sharded, the globally unique id replaces the original auto-increment id, while historical data keeps its original ids.
    2. Update: update both the new and old databases (only for records that already exist in the new database).
    3. Delete: the existing business has no hard deletes, so this can be ignored.
  2. Case 2: Abnormal case (the old database fails, or the new database fails)

    1. Old database failed (old database insert failed, old database update failed)

      1. Because the double write is serial and the old database is written first, a failure here aborts the whole operation, so no extra handling is needed.
    2. New database failed (new database insert failed, new database update failed)

      1. New database insert fails: record the write information, send a failure compensation message, and repair the data.
      2. New database update fails: record the write information, send a failure compensation message, and repair the data.
    3. Case 3: Retrying from the failure compensation message

      1. For data that failed to be written to the new database, retry: compare the record in the new and old databases; if they are the same, skip it; if not, overwrite the new database with the old database's data (with locking); if it still fails, push the message back onto the queue.
      2. When an update targets a record that does not exist in the new database, read it from the old database according to the update information and insert it into the new database, locking the record in both databases during the process to guarantee consistency.
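
A minimal Go sketch of the serial double-write flow described above, for the insert path only: the old database is written first; if the new write fails, a compensation message is emitted instead of failing the request. The schemas, table names, and the compensation mechanism are illustrative assumptions, not the article's actual implementation.

```go
package doublewrite

import (
	"database/sql"
	"log"
)

// Order is an illustrative record; ID is the new globally unique id.
type Order struct {
	ID     int64
	UserID int64
	Amount int64
}

// sendCompensation stands in for pushing a repair task onto a message queue.
func sendCompensation(table string, o Order) {
	log.Printf("compensation message: table=%s id=%d", table, o.ID)
}

// InsertBoth writes the old database first, then the routed shard table in the
// new database ("orders_3" here stands for whatever table the shard key maps to).
func InsertBoth(oldDB, newDB *sql.DB, o Order) error {
	// 1. Old database first: if this fails, the whole request fails and the
	//    new database is never left ahead of the old one.
	if _, err := oldDB.Exec(
		"INSERT INTO orders (id, user_id, amount) VALUES (?, ?, ?)",
		o.ID, o.UserID, o.Amount); err != nil {
		return err
	}
	// 2. New database second: on failure, do not fail the request; emit a
	//    compensation message so the row can be repaired asynchronously.
	if _, err := newDB.Exec(
		"INSERT INTO orders_3 (id, user_id, amount) VALUES (?, ?, ?)",
		o.ID, o.UserID, o.Amount); err != nil {
		sendCompensation("orders_3", o)
	}
	return nil
}
```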

Phase 2: Migrating data (script)

Synchronize the historical data from the old database to the new one, using insert ignore to avoid conflicts between old rows and rows already written by the double write (a minimal script sketch follows).
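
A minimal Go sketch of such a back-fill script, assuming MySQL on both sides: page through the old table by primary key and INSERT IGNORE each batch into the target shard table, so rows already created by the double write are skipped rather than overwritten. Table and column names are illustrative.

```go
package migrate

import "database/sql"

// Backfill copies the old table into the new shard table in primary-key order.
func Backfill(oldDB, newDB *sql.DB, batchSize int) error {
	var lastID int64
	for {
		rows, err := oldDB.Query(
			"SELECT id, user_id, amount FROM orders WHERE id > ? ORDER BY id LIMIT ?",
			lastID, batchSize)
		if err != nil {
			return err
		}
		n := 0
		for rows.Next() {
			var id, userID, amount int64
			if err := rows.Scan(&id, &userID, &amount); err != nil {
				rows.Close()
				return err
			}
			// INSERT IGNORE skips rows whose primary key already exists,
			// i.e. rows the double write has already inserted.
			if _, err := newDB.Exec(
				"INSERT IGNORE INTO orders_3 (id, user_id, amount) VALUES (?, ?, ?)",
				id, userID, amount); err != nil {
				rows.Close()
				return err
			}
			lastID = id
			n++
		}
		rows.Close()
		if err := rows.Err(); err != nil {
			return err
		}
		if n < batchSize { // a short page means the old table is exhausted
			return nil
		}
	}
}
```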

Phase 3: Data consistency guarantee

Compare the new and old database data in batches (script; a minimal sketch follows the list):

  1. Identical: skip.
  2. Different: overwrite the new database with the old database's data (locking only the old database is sufficient).
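
A minimal Go sketch of the batch comparison, assuming the same illustrative schema as the earlier sketches: identical rows are skipped, and rows that are missing or different in the new database are overwritten from the old one, which remains the source of truth.

```go
package verify

import "database/sql"

type orderRow struct{ UserID, Amount int64 }

// CheckBatch compares one batch of primary keys between the old and new databases.
func CheckBatch(oldDB, newDB *sql.DB, ids []int64) error {
	for _, id := range ids {
		var oldRec, newRec orderRow
		if err := oldDB.QueryRow(
			"SELECT user_id, amount FROM orders WHERE id = ?", id,
		).Scan(&oldRec.UserID, &oldRec.Amount); err != nil {
			return err
		}
		err := newDB.QueryRow(
			"SELECT user_id, amount FROM orders_3 WHERE id = ?", id,
		).Scan(&newRec.UserID, &newRec.Amount)
		if err == nil && oldRec == newRec {
			continue // same on both sides: skip
		}
		// Missing or different: the real flow would lock the old row here
		// (e.g. SELECT ... FOR UPDATE) before overwriting the new database.
		if _, err := newDB.Exec(
			"REPLACE INTO orders_3 (id, user_id, amount) VALUES (?, ?, ?)",
			id, oldRec.UserID, oldRec.Amount); err != nil {
			return err
		}
	}
	return nil
}
```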

Phase 4: Reading

  1. Gray-release reads against the new database and gradually increase the percentage (a minimal switch sketch follows this list)
  2. Cut over fully
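
A minimal Go sketch of a gray-release switch for the read path: a percentage knob decides whether a given user's reads go to the new database, and hashing the user id keeps each user sticky to one side while the percentage grows. The config source and all names are assumptions for illustration.

```go
package gray

import "hash/fnv"

// newDBPercent is the ramp-up knob, 0..100, normally pulled from a config center.
var newDBPercent uint32 = 10

// ReadFromNewDB reports whether this user's reads should hit the new database.
func ReadFromNewDB(userID string) bool {
	h := fnv.New32a()
	h.Write([]byte(userID))
	return h.Sum32()%100 < newDBPercent
}
```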

Phase 5: Migrate downstream dependencies

Mainly the binlog dependencies

Phase 6: Stop writing to the old database

Phase 7: Reclaim resources && clean up the switches

Option Four

Phase 1: Create the new database and synchronize data to it (enable old-to-new synchronization)

Phase 2: Verify data consistency

Phase 3: Stop the service and perform the rename (milliseconds)

Phase 4: Business cuts traffic to the new database, reads first and then writes

Phase 5: Start incremental synchronization from the new database back to the old one, so the new and old data stay consistent

Brief comparison of the pros and cons of each option

| | Option One | Option Two | Option Three | Option Four |
| --- | --- | --- | --- | --- |
| Advantages | The operation is relatively simple. | The operation is relatively simple. | 1. The whole process runs without stopping the service; migration is smooth and no data is lost. 2. Zero-risk rollback at any stage. 3. Downstream has ample time to migrate. 4. Convenient to gray-release reads on the new database. | 1. The operation is relatively simple. 2. Downstream has ample time to migrate. 3. Little intrusion into business code. |
| Disadvantages | 1. If the data does not meet expectations after traffic is cut to the new database, everything written in the meantime is problem data and cannot be repaired quickly. 2. Depends on the progress of downstream migration. 3. Writes that fail during the switch are lost. 4. The verification window after going live is short. 5. Going live again after a rollback is costly, and the problems above may recur. | Same as Option One. | 1. Intrusive to business code. 2. Double writing affects interface performance. | 1. The service is unavailable during the switch. 2. The verification window is short. 3. Switching writes from the old database to the new one before old writes are fully drained can cause data-ordering problems (when the old and new databases write the same record). 4. Going live again after a rollback risks recurrence of the problems above. |

Summary

Choose the solution that best fits your business scenario.

Follow us

Readers interested in this series are welcome to subscribe to our official account so you don't miss the next post~

