
1. Background

Sub-databases and sub-tables are not a new concept for most backend developers. As the business grows, the amount of data in our tables becomes larger and larger, and the number of fields may also grow as business complexity increases. To solve the query performance problems of a single table, we generally perform a sub-table (table splitting) operation.

At the same time, the user activity of our business keeps rising and the level of concurrency keeps increasing, so we may reach the upper limit of a single database's processing capacity. To break through this bottleneck, we generally perform a sub-database (database splitting) operation. Whether we split databases or split tables, there are generally two ways to do it: vertical splitting and horizontal splitting.

Regarding the differences and characteristics of the two splitting methods, there is plenty of reference material online and many people have written about them, so I will not go into detail here. Interested readers can search for it themselves.

This article mainly discusses some easily overlooked details of the most practical and most common approach: horizontal sub-database and sub-table. I hope it helps you avoid detours and find the sharding design that best fits your business.

[Note 1] The cases in this article are all based on the MySQL database, and "sub-database and sub-table" below refers to horizontal sharding.

[Note 2] "N tables in M databases" mentioned later means M databases in total, each containing N sub-tables; that is, the total number of tables is actually M*N.

2. What is a good sub-database sub-table plan?

2.1 Project sustainability

In the early stage of a business, when the data volume is small and traffic is low, we do not need to split databases and tables, and it is not recommended to do so. But once we do have to design a sharding scheme for the business, we must consider its sustainability.

So what is sustainability? It means that when the business data volume and traffic grow to a new level in the future, our sub-database and sub-table scheme can keep working without a redesign.

A simple example: suppose our current scheme is 10 databases with 100 tables each. If at some point in the future 10 databases can no longer cope with the traffic, or the disk usage of the 10 databases is about to reach its physical limit, our scheme should be able to expand smoothly.

Later in this article, we will introduce the doubling expansion method and the consistent Hash expansion method commonly used in the industry.

2.2 Data skew problem

A good sub-database and sub-table scheme should spread data fairly evenly across all database tables. If we design the sharding off the top of our heads, we can easily run into problems like the following:

a. In a certain database instance, some tables hold a lot of data while others hold very little, and business performance fluctuates accordingly.

b. In a database cluster, the disk usage of some instances grows especially fast while that of others grows very slowly. The inconsistent growth pace of each database makes subsequent expansion hard to schedule and operate uniformly.

Here we define the maximum data skew rate of a sharding scheme as: (row count of the largest sample - row count of the smallest sample) / row count of the smallest sample. Generally speaking, a maximum data skew rate within 5% is acceptable.
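As a quick illustration, the small helper below (a sketch written for this article, not part of any framework) computes the maximum data skew rate from a set of per-table row counts:

// Sketch: compute the maximum data skew rate from per-table row counts.
public static double maxSkewRate(long[] rowCounts) {
    long min = Long.MAX_VALUE;
    long max = Long.MIN_VALUE;
    for (long c : rowCounts) {
        min = Math.min(min, c);
        max = Math.max(max, c);
    }
    // e.g. counts of 1,050,000 and 1,000,000 give (1050000 - 1000000) / 1000000 = 5%
    return (max - min) / (double) min;
}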

3. Common sub-database sub-table schemes

3.1 Range sub-database sub-table

As the name implies, this scheme decides where data is stored according to the range of the data.

Take the simplest example: we can split the order table by year and store each year's data in a separate database (or table), as shown below:

/**
 * Shard the order table by year
 *
 * @param orderId the order id, whose first four characters are the year
 * @return the name of the year table
 */
public static String rangeShardByYear(String orderId) {
    int year = Integer.parseInt(orderId.substring(0, 4));
    return "t_order_" + year;
}

Sharding by data range is the simplest scheme, and it can also be flexibly combined with other sharding schemes. The currently popular distributed database TiDB also shards data in TiKV based on Range: different [StartKey, EndKey) ranges are assigned to different Regions.

Let's look at the shortcomings of this scheme:

  • a. The most obvious problem is data hotspots. For example, in the order table above, the tables for the current year obviously hold the hot data and have to carry most of the IO and computing resources.
  • b. Creating new databases and tables. Online applications usually do not have permission to create databases and tables, so the new tables must be created in advance to prevent online failures. This is very easy to forget, especially for modules that have run stably for years without new development, or whose maintainers change frequently.
  • c. Handling data that spans ranges. For example, the order module cannot avoid some compensation logic for intermediate states, i.e. a timed task needs to scan the order table for orders that have been waiting for payment confirmation for too long.

Note that because the databases and tables are split by year, around New Year's Day such a timed task can easily miss the data of the last day of the previous year.
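One way to guard against this is to derive the set of year tables from the scan window itself instead of from "today". The sketch below assumes a hypothetical scanPendingOrders helper and the usual java.time / java.util imports:

// Sketch: when the scan window [from, to] spans a year boundary, every
// year table in the window must be queried, not just the current one.
// scanPendingOrders() is a hypothetical helper used for illustration.
public static void scanPendingAcrossYears(LocalDate from, LocalDate to) {
    Set<String> yearTables = new LinkedHashSet<>();
    for (int year = from.getYear(); year <= to.getYear(); year++) {
        yearTables.add("t_order_" + year);
    }
    for (String table : yearTables) {
        scanPendingOrders(table, from, to);
    }
}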

3.2 Hash sub-database sub-table

Although there are many sharding schemes, Hash sub-database sub-table is the most popular and most common one, and it is also the part this article spends the most time on.

There is not much material covering the details of Hash sharding; most of it explains the concept with a few examples without going deep into the details. If you copy such a scheme into your own business without careful thought, all kinds of problems are likely to appear later.

Before formally introducing this sharding method, let's first look at a few common mistakes.

Common error case 1: Data skew caused by non-coprime relationship

public static ShardCfg shard(String userId) {
    int hash = userId.hashCode();
    // database index: hash modulo the number of databases
    int dbIdx = Math.abs(hash % DB_CNT);
    // table index: hash modulo the number of tables
    int tblIdx = Math.abs(hash % TBL_CNT);

    return new ShardCfg(dbIdx, tblIdx);
}

The scheme above is a trap that first-time designers fall into particularly easily: take the hash value modulo the number of databases and modulo the number of tables to get the database index and the table index respectively. In fact, a little thought reveals the problem. Taking 10 databases and 100 tables as an example, if a hash value is 0 modulo 100, it must also be 0 modulo 10.

This means that only table 0 of database 0 can ever contain data, while table 0 in every other database stays empty forever!

Similarly, we can deduce that of the 100 tables in database 0, only 10 tables (those whose index ends in 0) can ever hold data. This brings a very serious data skew problem: since some tables can never receive data, the maximum data skew rate is effectively infinite.

Obviously, this is simply a wrong scheme that does not achieve the expected result. The distribution of the data is illustrated below:

In fact, as long as the number of databases and the number of tables are not coprime, some tables will never receive any data.
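This is easy to verify by brute force. The short sketch below (assuming the usual java.util imports) counts how many (database, table) combinations can ever receive data under the 10-database, 100-table version of the scheme above:

// Sketch: count the distinct (dbIdx, tblIdx) pairs reachable when both
// indexes are taken from the same hash value (10 databases, 100 tables).
public static void main(String[] args) {
    int dbCnt = 10, tblCnt = 100;
    Set<Integer> reachable = new HashSet<>();
    for (int hash = 0; hash < 100_000; hash++) {
        int dbIdx = hash % dbCnt;
        int tblIdx = hash % tblCnt;
        reachable.add(dbIdx * tblCnt + tblIdx);
    }
    // Prints 100: only 100 of the 1,000 tables can ever hold data.
    System.out.println(reachable.size());
}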

The proof is as follows: let the number of databases be M, the number of tables be N, and g = gcd(M, N) > 1. For any hash value h, the database index is d = h mod M and the table index is t = h mod N, so d ≡ h (mod g) and t ≡ h (mod g), which means d ≡ t (mod g). Therefore, in database d only the tables whose index t satisfies t ≡ d (mod g) can ever receive data, i.e. only N/g of the N tables.

So as long as the number of databases and the number of tables are coprime, can we use this scheme? For example, would 11 databases with 100 tables be reasonable?

The answer is no. Besides data skew, we also have to consider sustainable expansion. The usual way to expand this kind of Hash sharding scheme later on is the doubling expansion method, and after doubling, 22 databases and 100 tables are no longer coprime.

Of course, if the number of databases and the number of tables are not only coprime but the number of tables is also odd (for example, 10 databases and 101 tables), the scheme can work in theory, but I suspect most people would find an odd number of tables rather strange.

Common error case 2: Expansion is difficult to continue

Even if we avoid the trap of case 1, we can easily fall into another one. The general idea is as follows:

We treat the 10 databases with 100 tables as 1,000 logical tables in total, take the hash value modulo 1,000 to get a number in [0, 999], and then map this number to a database and a table in two steps. The rough logic is as follows:

public static ShardCfg shard(String userId) {
    // ① compute the hash
    int hash = userId.hashCode();
    // ② total number of shards
    int sumSlot = DB_CNT * TBL_CNT;
    // ③ shard index
    int slot = Math.abs(hash % sumSlot);
    // ④ derive the database index and table index (the flawed step)
    int dbIdx = slot % DB_CNT;
    int tblIdx = slot / DB_CNT;

    return new ShardCfg(dbIdx, tblIdx);
}

This scheme does cleverly solve the data skew problem: as long as the hash values are uniform enough, the assigned shard indexes will in theory be uniform too, so the amount of data in each database and table stays reasonably balanced.

However, it has a big problem: the table index depends on the total number of databases. When we later use the doubling expansion method, data ends up in a different table before and after expansion, which makes the expansion impossible to carry out.

As shown in the figure above, the data whose hash value is 1986, for example, is stored in table 98 of database 6 before expansion, but after doubling to 20 databases with 100 tables it is assigned to table 99 of database 6: the table index has shifted. If that happens, future expansions would have to migrate data not only between databases but also between tables, which is very troublesome and error-prone.
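The shift is easy to reproduce; the numbers below match the example above:

// Sketch: reproduce the table-index shift of error case 2 for hash 1986
// when the database count doubles from 10 to 20 (100 tables throughout).
public static void main(String[] args) {
    int hash = 1986, tblCnt = 100;
    for (int dbCnt : new int[]{10, 20}) {
        int slot = hash % (dbCnt * tblCnt);
        int dbIdx = slot % dbCnt;   // error case 2: database index first
        int tblIdx = slot / dbCnt;  // table index depends on the database count
        System.out.println(dbCnt + " databases -> db " + dbIdx + ", table " + tblIdx);
    }
    // Prints: 10 databases -> db 6, table 98
    //         20 databases -> db 6, table 99
}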

Having seen the typical mistakes above, what do the correct solutions look like? Below we introduce several Hash sharding schemes combined with some real scenarios.

Commonly used scheme 1: The standard secondary sharding method

In error case 2 above, the overall idea is completely correct; the only flaw is that the number of databases is used as a factor of the table index in the final calculation, which causes the table index to shift during expansion and makes expansion impossible.

In fact, we only need to change that calculation to arrive at a widely used sharding scheme.

public static ShardCfg shard2(String userId) {
    // ① compute the hash
    int hash = userId.hashCode();
    // ② total number of shards
    int sumSlot = DB_CNT * TBL_CNT;
    // ③ shard index
    int slot = Math.abs(hash % sumSlot);
    // ④ the corrected secondary calculation
    int dbIdx = slot / TBL_CNT;
    int tblIdx = slot % TBL_CNT;

    return new ShardCfg(dbIdx, tblIdx);
}

Notice that the only difference from error case 2 is how the database index and table index are derived from the shard index. The resulting distribution is as follows:

So why does this scheme have good expansion sustainability? Here is a short proof: let the current scheme be M databases with N tables, so slot = hash % (M*N), dbIdx = slot / N and tblIdx = slot % N. After doubling to 2M databases, slot' = hash % (2*M*N), which equals either slot or slot + M*N. Since M*N is a multiple of N, tblIdx' = slot' % N = tblIdx, while dbIdx' = slot' / N equals either dbIdx or dbIdx + M.

From this we know that after doubling, the table index is guaranteed to stay the same, while the database index either stays in the original database or is shifted into a new database (the original database index plus the original number of databases). This is exactly the expansion sustainability we require.
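The property is also easy to check mechanically. The sketch below verifies it for a range of hash values, using 10 databases and 100 tables as example values:

// Sketch: after doubling the database count, the table index must stay
// the same and the database index must equal the old one or old + dbCnt.
public static void main(String[] args) {
    int dbCnt = 10, tblCnt = 100;
    for (int hash = 0; hash < 1_000_000; hash++) {
        int slot = hash % (dbCnt * tblCnt);
        int dbIdx = slot / tblCnt, tblIdx = slot % tblCnt;

        int slot2 = hash % (2 * dbCnt * tblCnt);
        int dbIdx2 = slot2 / tblCnt, tblIdx2 = slot2 % tblCnt;

        if (tblIdx2 != tblIdx || (dbIdx2 != dbIdx && dbIdx2 != dbIdx + dbCnt)) {
            throw new IllegalStateException("property violated for hash " + hash);
        }
    }
    System.out.println("table index stable; db index shifts by 0 or " + dbCnt);
}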

[Disadvantages of this scheme]

1. The doubling expansion method is easy to operate in the early stage, but once the number of databases has grown to dozens, every further expansion consumes a great deal of resources.

2. The hash values of consecutive shard keys tend to land in the same database, so some businesses may be prone to database hotspots (for example, newly generated user IDs hash to adjacent, increasing values, and new users are very likely to be active users, so the users created within a period of time are concentrated in a few adjacent databases).

Commonly used scheme 2: Relationship table redundancy

We can record the mapping between each shard key and its database in a separate relationship table, which we call the "routing relationship table".

public static ShardCfg shard(String userId) {
    int tblIdx = Math.abs(userId.hashCode() % TBL_CNT);
    // try the cache first
    Integer dbIdx = loadFromCache(userId);
    if (null == dbIdx) {
        // fall back to the routing relationship table
        dbIdx = loadFromRouteTable(userId);
        if (null != dbIdx) {
            // put the mapping into the cache
            saveRouteCache(userId, dbIdx);
        }
    }
    if (null == dbIdx) {
        // the logic that picks a database can be implemented freely here
        dbIdx = selectRandomDbIdx();
        saveToRouteTable(userId, dbIdx);
        saveRouteCache(userId, dbIdx);
    }

    return new ShardCfg(dbIdx, tblIdx);
}

This scheme still calculates the table index with a conventional hash algorithm, but reads the database index from the routing table. Because the routing table would otherwise be read on every query, we also cache the mapping between the shard key and the database index to improve performance.

In the example above, the selectRandomDbIdx method generates the database index for the shard key, and it can be configured very flexibly. For example, we can assign a weight to each database: the higher the weight, the higher the probability of being selected, and setting a weight to 0 stops new keys from being assigned to that database. When data skew is detected, the weights can also be adjusted so that the utilization of the databases converges.
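A minimal sketch of such a weighted selection is shown below. The weight map is assumed to be loaded from configuration, and the method signature (taking the map explicitly) is ours for illustration:

// Sketch: pick a database index with probability proportional to its weight.
// A weight of 0 stops new shard keys from being routed to that database.
// Assumes java.util.Map and java.util.concurrent.ThreadLocalRandom imports.
private static int selectRandomDbIdx(Map<Integer, Integer> dbWeights) {
    int total = dbWeights.values().stream().mapToInt(Integer::intValue).sum();
    int r = ThreadLocalRandom.current().nextInt(total);
    for (Map.Entry<Integer, Integer> e : dbWeights.entrySet()) {
        r -= e.getValue();
        if (r < 0) {
            return e.getKey();
        }
    }
    throw new IllegalStateException("weights must sum to a positive value");
}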

Another advantage of this scheme is that, in theory, a later expansion only requires mounting a new database node and giving it a relatively large weight; it can be completed without any data migration.

As shown in the figure below: at the beginning we assign the same weight to the 4 databases, so in theory data falls into each database with equal probability. However, because some users are high-frequency and some are low-frequency, some databases may grow faster than others. After mounting a new database node, we flexibly adjust the weights of all databases.

This scheme seems to solve a lot of problems, so are there scenarios it does not fit? Of course; there are quite a few scenarios where it is not really suitable. Some examples:

a. Every read needs to consult the routing table. Even with the cache, there is still some performance loss.

b. Some shard keys are not suitable for storing in a routing table. For example, the number of user IDs in the case above is at most around 1 billion, so a single database can hold the relationship table. But if, say, you want to use the MD5 digest of a file as the shard key, the sample space is far too large to record a mapping for every MD5 value (although we could store a mapping for the first N characters of the MD5 instead).

c. The "hunger occupation" problem, detailed below:

We said that the feature of this scheme is that no later expansion is needed: the weights can be modified at any time to adjust how fast each database grows. But this vision is rather vague and hard to realize in practice. Let's pick a simple business scenario and consider the following questions.

[Business Scenario]: take a cloud drive service where users store files in the cloud, and the users' file information needs a sharding design. Assume the following:

  • ① Assume there are 200 million theoretical users, of which currently about 30 million (3000W) are effective users who have generated data.
  • ② Each user has on average fewer than 2,000 files.
  • ③ The user ID is a random 16-character string.
  • ④ In the initial stage there are 10 databases, each with 100 tables.

We use a routing table to record the database index of each user. This scheme then has the following problems:

First: there are 200 million users in total but only about 30 million of them have generated data. If the code does not guard against it, any request from a user may create a routing table record, so routes are created in advance for a large number of users who actually never generate data.

The author ran into this problem when first storing cloud drive user data: the client app checked the user's space usage on the home page, which caused a route to be allocated for almost every user right from the start. As time goes by, these "silent" users with no data may start using the cloud drive at any moment and "come back to life", causing the databases they were assigned to to grow rapidly and exceed the capacity limit of a single database, forcing a split and expansion.

The fix is to allocate a route only on real transactional operations (such as purchasing space, uploading data, creating folders, and so on), which means some intrusion into the application code.

Second: in the scenario described above, a user ends up with 2,000 rows on average, and assume each row is about 1 KB. To keep the B+ tree within three levels, we cap each table at 20 million (2000W) rows. With 100 tables per database, each database can theoretically hold at most 1 million (100W) users.

That is, if 30 million users have generated data, we need to allocate 30 databases for them, so in the early stage of the business, when the average amount of data per user is still small, a huge amount of database resources is wasted.
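Spelling out the arithmetic:

20,000,000 rows/table × 100 tables = 2,000,000,000 rows per database
2,000,000,000 rows ÷ 2,000 rows/user = 1,000,000 (100W) users per database
30,000,000 (3000W) users ÷ 1,000,000 users/database = 30 databases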

To mitigate the second problem, we can usually place many logical databases on one instance and split them out as they grow, or use conventional means to split and migrate the databases that are nearly full.

Commonly used scheme 3: The gene method

Inspired by error case 1, we can see that the root cause of case 1 is that the common divisor links the database index and the table index in the calculation and destroys their independence.

So can we change our thinking and use relatively independent hash values to calculate the database index and the table index?

public static ShardCfg shard(String userId) {
    // database index: hash of the first four characters of the shard key
    int dbIdx = Math.abs(userId.substring(0, 4).hashCode() % DB_CNT);
    // table index: hash of the whole shard key
    int tblIdx = Math.abs(userId.hashCode() % TBL_CNT);
    return new ShardCfg(dbIdx, tblIdx);
}

As shown above, we change how the database index is computed: we hash only the first four characters of the shard key to get the database index.

This is also a commonly used scheme, which we call the gene method: some "genes" of the shard key (for example its first four characters) serve as the calculation factor of the database index, while other genes serve as the calculation factor of the table index. Many practical schemes found online are this scheme or variants of it. It looks like a very clever solution, but it still needs to be treated with caution in actual production.

The author used this scheme in the sharding of the cloud drive's space module, splitting the data into 16 databases with 100 tables each. The data looked normal right after launch, but as the volume grew, the number of users in each database turned out to be severely unequal, so we suspected the scheme had a certain data skew.

To verify this, the following test was carried out: 200 million user IDs (random 16-character strings) were generated, and for different "M databases, N tables" schemes the test was repeated several times and averaged. The conclusions are as follows:

8 databases, 100 tables
min=248305 (dbIdx=2, tblIdx=64), max=251419 (dbIdx=7, tblIdx=8), rate= 1.25%            √
16 databases, 100 tables
min=95560 (dbIdx=8, tblIdx=42), max=154476 (dbIdx=0, tblIdx=87), rate= 61.65%           ×
20 databases, 100 tables
min=98351 (dbIdx=14, tblIdx=78), max=101228 (dbIdx=6, tblIdx=71), rate= 2.93%

We found that with 16 databases and 100 tables, the smallest slot held fewer than 100,000 (10W) users while the largest held more than 150,000 (15W), a maximum data skew rate of over 61%. At this rate, one database would most likely fill up completely while another still has more than 30% capacity left.
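For reference, this kind of measurement is straightforward to reproduce. The sketch below counts users per (database, table) slot for randomly generated 16-character IDs; the generator and helper name are ours, and the exact numbers will differ from run to run:

// Sketch: measure the data skew of the gene method for dbCnt databases
// and tblCnt tables, using random 16-character user IDs.
public static void measureSkew(int dbCnt, int tblCnt, int users) {
    String alphabet = "abcdefghijklmnopqrstuvwxyz0123456789";
    int[][] counts = new int[dbCnt][tblCnt];
    ThreadLocalRandom rnd = ThreadLocalRandom.current();
    for (int i = 0; i < users; i++) {
        StringBuilder sb = new StringBuilder(16);
        for (int j = 0; j < 16; j++) {
            sb.append(alphabet.charAt(rnd.nextInt(alphabet.length())));
        }
        String userId = sb.toString();
        int dbIdx = Math.abs(userId.substring(0, 4).hashCode() % dbCnt);
        int tblIdx = Math.abs(userId.hashCode() % tblCnt);
        counts[dbIdx][tblIdx]++;
    }
    int min = Integer.MAX_VALUE, max = 0;
    for (int[] row : counts) {
        for (int c : row) {
            min = Math.min(min, c);
            max = Math.max(max, c);
        }
    }
    System.out.printf("min=%d, max=%d, rate=%.2f%%%n",
            min, max, (max - min) * 100.0 / min);
}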

This scheme is not necessarily unusable, but when adopting it we must consider the sample characteristics of the shard key, the number of prefix characters chosen, the number of databases and the number of tables together: all four variables affect the final skew.

For example, in the case above, had we used 8 databases with 100 tables, or 20 databases with 100 tables, instead of 16 databases with 100 tables, the data skew rate would have dropped into the acceptable range below 5%. This scheme therefore hides many "pits": we not only need to estimate the skew rate at launch, but also measure the skew rate after several rounds of doubling expansion.

For example, if you start with the seemingly perfect 8 databases with 100 tables and later expand to 16 databases with 100 tables, trouble will follow.

Commonly used scheme 4: Eliminating the common factor

This is again inspired by the error case. In many scenarios we still want adjacent hash values to be spread across different databases; just like the "N databases, single table" case, the database index is usually obtained by taking the hash value modulo the number of databases.

So is there a way to remove the influence of the common factor? Here is one implementation worth considering:

public static ShardCfg shard(String userId) {
    int dbIdx = Math.abs(userId.hashCode() % DB_CNT);
    // remove the influence of the common divisor before taking the table index
    int tblIdx = Math.abs((userId.hashCode() / TBL_CNT) % TBL_CNT);
    return new ShardCfg(dbIdx, tblIdx);
}

Measurements show that the maximum data skew of this scheme is also fairly small. For businesses that upgrade from "N databases, 1 table" to "N databases, M tables", it is worth considering in scenarios where the database index has to stay unchanged.

Commonly used scheme 5: The consistent Hash method

Consistent hashing is also a popular algorithm for partitioning data across a cluster; Redis Cluster, for example, manages its shard data with 16,384 hash slots, a scheme in a similar spirit. The specific principle of consistent hashing will not be repeated here; readers can look it up themselves.

Here we introduce in detail how to design sub-databases and sub-tables using consistent Hash.

We usually persist the configuration of each actual node in a configuration item or database and load it when the application starts or when a switch-over is performed. A configuration entry generally consists of a left-closed, right-open interval [StartKey, EndKey) together with the database node it maps to, for example:
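An illustrative configuration might look like the following (the concrete boundary values are made up for the example; each range maps to an actual database node):

[Long.MIN_VALUE, -10000)   ->  db0
[-10000, 10000)            ->  db1
[10000, Long.MAX_VALUE)    ->  db2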

Sample code:

private TreeMap<Long, Integer> nodeTreeMap = new TreeMap<>();

@Override
public void afterPropertiesSet() {
    // load the partition configuration at startup
    List<HashCfg> cfgList = fetchCfgFromDb();
    for (HashCfg cfg : cfgList) {
        nodeTreeMap.put(cfg.endKey, cfg.nodeIdx);
    }
}

public ShardCfg shard(String userId) {
    int hash = userId.hashCode();
    // find the first range whose EndKey is greater than the hash value
    int dbIdx = nodeTreeMap.tailMap((long) hash, false).firstEntry().getValue();
    int tblIdx = Math.abs(hash % 100);
    return new ShardCfg(dbIdx, tblIdx);
}

You can see that this form is very similar to the Range sharding described earlier: Range sharding partitions the value range of the shard key itself, while consistent Hash partitions the range of the shard key's hash value.

The formal consistent Hash algorithm introduces virtual nodes, each of which points to a real physical node. This is mainly to ensure that when a new node is added, the amount of data migrated from each node and the pressure on each node after the migration stay roughly equal.

However, when consistent Hash is used for sub-databases and sub-tables, most implementations use only actual nodes, and virtual nodes are rarely introduced. The main reasons are:

a. The application has to spend extra time and memory loading the virtual node configuration; with many virtual nodes, the memory usage becomes less attractive.

b. Since MySQL has a very mature master-slave replication mechanism, it is simpler and more controllable to promote a replica and then delete the redundant data afterwards than to migrate data by filtering out the ranges belonging to each virtual node.

c. The main pain point virtual nodes solve is load imbalance during node data relocation, spreading the migration work across all nodes so they share the pressure.

For an OLTP database we rarely need to take a database offline suddenly, and when a new node is added we generally do not move data from other nodes starting from zero, but prepare most of the data in advance via replication. So there is usually no need to introduce virtual nodes and add complexity.

4. Common expansion schemes

4.1 The doubling expansion method

The main idea of the doubling expansion method is that each expansion doubles the number of databases. The new data sources are usually obtained by promoting slave libraries of the original data sources (kept in sync via master-slave replication) to master libraries that serve traffic, which is why some documents call it the "slave promotion method".

In theory, after a doubling expansion we have twice as many databases to store data and absorb traffic, and roughly half of the disk space of the original databases can be released, as shown below:

The specific process is roughly as follows:

①. Time point t1: add a new slave library for each node and enable master-slave synchronization.

②. Time point t2: once the master-slave synchronization has caught up, prohibit writes to the master libraries.

The write prohibition here is mainly to guarantee data correctness. Without it, data inconsistency can arise in the following two time windows:

a. After the master-slave link is disconnected, if the master library is not write-prohibited and data keeps being written, that data will never be synchronized to the slave library.

b. The application cluster cannot recognize the new number of databases at exactly the same instant. At a given moment, two application instances may use different database counts, compute different database indexes, and write data to the wrong place.

③. Time point t3: after synchronization is complete, disconnect the master-slave relationship. In theory the slave library and the master library now hold exactly the same data set.

④. Time point t4: promote the slave libraries to cluster nodes; once the business applications recognize the new number of databases, the new routing algorithm takes effect.

Under normal circumstances we keep the number of databases in a configuration center. After the previous steps are completed, we change it to twice the original value, and once the change takes effect the application services use the new configuration. Note that the business applications do not all receive the new configuration at the same moment, so there is inevitably a time window during which some machines use the original number of databases while others already use the new one. This is exactly why the write prohibition must not be lifted before this step completes.

⑤. Time point t5: after confirming that all applications have received the new database-count configuration, lift the write prohibition on the original master libraries. The service is now fully restored.

⑥. Start offline timed tasks to clear the roughly 50% of redundant data in each database.

To reclaim disk space, we can run offline timed tasks to clear the redundant data. It also helps if, when the table structure is designed at the very beginning, the hash value of the shard key is stored as a column. Taking the common scheme 4 above as an example, the offline cleanup can then be done with simple SQL (take care not to lock the whole table; split the statement into several sub-statements executed over ID ranges):

delete from db0.tbl0 where hash_val mod 4 <> 0;

delete from db1.tbl0 where hash_val mod 4 <> 1;

delete from db2.tbl0 where hash_val mod 4 <> 2;

delete from db3.tbl0 where hash_val mod 4 <> 3;

The specific expansion steps are shown in the following figure:

Summary: as the migration flow above shows, the databases must stay write-prohibited from time point t2 to time point t5, so the service is partially lossy within that window, which in practice lasts only a few minutes. If the business can accept this, the operation can be performed during off-peak hours.

Of course, many applications cannot tolerate even minute-level write unavailability, for example those where writes far outnumber reads. In that case, the open-source framework Canal can be used to double-write data during the window and keep the data consistent.

This scheme relies mainly on MySQL's powerful and mature master-slave synchronization mechanism, which prepares most of the data needed on the new nodes in advance and saves a great deal of manual data migration.

But the drawbacks are also obvious. First, the service may have to run in a degraded state during the process. Second, every expansion doubles the number of databases, which wastes a lot of database resources up front.

4.2 Consistent Hash expansion

Here we mainly look at consistent Hash expansion without virtual slots. If the current database node DB0 is overloaded or its disk usage is too high and it needs to be expanded, we can achieve the effect shown in the figure below.

In the figure below, three hash ranges are configured before expansion. When the range [-Inf, -10000) holds too much data or is under too much pressure, it needs to be expanded.

The main steps are as follows:

①. Time point t1: add slave nodes for the database nodes that need to be expanded, and enable master-slave synchronization.

②. Time point t2: once the synchronization has caught up, prohibit writes to the original master library.

The reason is the same as in the doubling expansion method: the new slave library must end up with data consistent with the original master library.

③. Time point t3: after synchronization is complete, disconnect the master-slave relationship. In theory the slave library and the master library now hold exactly the same data set.

④. Time point t4: modify the consistent Hash range configuration and make the application services reload it.

⑤. Time point t5: after confirming that all applications have received the new consistent Hash range configuration, lift the write prohibition on the original master library. The service is now fully restored.

⑥. Start offline timed tasks to clear the redundant data.
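Concretely, and continuing the illustrative range configuration given earlier, step ④ amounts to splitting the overloaded range and pointing one sub-range at the newly promoted node. The split point -50000 below is made up for the example, and which half stays on the old node is a free choice:

Before: [Long.MIN_VALUE, -10000)  ->  db0
After:  [Long.MIN_VALUE, -50000)  ->  db0
        [-50000, -10000)          ->  db3 (the node promoted from db0's slave)

The other ranges stay unchanged.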

As you can see, this scheme is similar to the doubling expansion method but more flexible: it expands selectively according to the pressure on each node in the cluster, without requiring the whole cluster to double at the same time.

5. Summary

This article has described some common schemes for horizontal sub-database and sub-table design.

When designing the sharding, we can choose among range sharding, Hash sharding, routing tables, consistent Hash sharding and other schemes. When making the choice, we must fully consider factors such as the sustainability of later expansion and the maximum data skew rate.

The article also lists some common mistakes, such as the influence of the common divisor in the index calculation, and the data skew that can arise when the first few characters of the shard key are used to compute the database index.

When actually making a choice, we must take our own business characteristics into account, fully verify the data skew of the shard key under each combination of parameters, and plan the subsequent expansion scheme in advance.

Author: vivo platform product development team-Han Lei
