1. Introduction

To meet the "three highs" of Internet systems (high availability, high performance, and high concurrency), the database layer is often split into multiple databases and tables (sharding). That raises an unavoidable question: how do we generate primary keys once the data is sharded? This article walks through the design of a distributed ID generator that supports QPS at the million level.

2. Project Background

As the business expanded, order volume grew sharply, and we decided to shard the existing databases and tables to keep up. Once a logical table is spread across several databases and tables, guaranteeing that its primary keys stay unique becomes the first problem to solve. At the beginning the primary keys were still generated by the database, but the database is a bottleneck for throughput and overall system performance. After some investigation, we decided to develop and deploy a distributed ID generator that can sustain QPS at the million level, first for the existing business and then gradually for other businesses.

3. Technology Selection

With the project background clear, the next step was technology selection.

I compared UUIDs, Redis counters, database number segments, the snowflake algorithm, Meituan Leaf, and other ID generation approaches. UUIDs are random and unordered, which easily causes B+Tree index page splits, so they are a poor fit for MySQL indexes. A Redis counter depends on how its persistence is configured, and after a crash it may hand out duplicate numbers. These two approaches were set aside first. I then weighed the remaining options, such as database number segments and the snowflake algorithm, by their advantages and disadvantages, whether they introduce new technical dependencies, and their complexity, and finally decided to use an approach similar to Meituan Leaf to generate distributed primary key IDs.

4. Architecture Design

4.1 Overall Architecture

Overall, a double-cache architecture is used. The number segment for each business key is persisted in the database. Each time, a segment of the configured step size is loaded from the database into the local cache, and business requests get their IDs from the local cache first.


The execution steps are as follows:

STEP1: When the service starts, or when a business key is requested for the first time, read that business key's configuration from the database and load a number segment of the configured step size into the local cache.

STEP2: When an ID is requested for a business key, it is served from local cache A first.

  • Step 2.1: If the usage of "local cache A" exceeds 15% (dynamically adjustable), asynchronously load the next number segment from the database into local cache B;
  • Step 2.2: If the segment in "local cache A" is used up, switch to "local cache B" and continue serving.

STEP3: Return the result. In the extreme case where cache segment A is exhausted and cache segment B has not finished loading, the request fails after a certain number of retries.


4.2 Detailed Design

How can the service support QPS at the million level while staying highly available? How should the data structures be designed so that number segments hold up under high concurrency? The following sections go through the detailed design, first the table structure and then the cache structure:

4.2.1 Table Structure Design

The core fields of the table are as follows:
id: primary key
biz_key: business key
max_id: the maximum ID already handed out in number segments for this business key
step: step size (the number of IDs loaded into the local cache each time)

 <sql id="id_generator_sql">
    id as id,
    biz_key as bizKey,
    max_id as maxId,
    step as step,
    create_time as createTime,
    update_time as updateTime,
    version as version,
    app_name as appName,
    description as description,
    is_del as isDel
</sql>

<insert id="insert" parameterType="com.jd.presell.idgenerator.model.Segment">
    insert into id_generator
    (biz_key,max_id,step,create_time,update_time,version,app_name,description,is_del)
    values
    (#{bizKey},#{maxId},#{step},now(),now(),0,#{appName},#{description},#{isDel})
</insert>

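The mapper above only shows the column list and the insert; the statement that advances max_id is not shown. Since the table carries a version column, one plausible way to reserve a segment is an optimistic-lock update. The following is a minimal sketch under that assumption, not the author's mapper. SegmentRow, InMemoryIdGeneratorTable and SegmentAllocator are hypothetical names, the SQL in the comments is assumed, and a map stands in for MySQL so the example runs on its own.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** One id_generator row, reduced to the fields needed for allocation. */
final class SegmentRow {
    final long maxId;
    final int step;
    final long version;

    SegmentRow(long maxId, int step, long version) {
        this.maxId = maxId;
        this.step = step;
        this.version = version;
    }
}

/** Hypothetical in-memory stand-in for the id_generator table. */
final class InMemoryIdGeneratorTable {
    private final Map<String, SegmentRow> rows = new ConcurrentHashMap<>();

    void insert(String bizKey, long maxId, int step) {
        rows.put(bizKey, new SegmentRow(maxId, step, 0L));
    }

    // Assumed SQL: select max_id, step, version from id_generator where biz_key = #{bizKey}
    SegmentRow select(String bizKey) {
        return rows.get(bizKey);
    }

    // Assumed SQL: update id_generator
    //              set max_id = max_id + step, version = version + 1, update_time = now()
    //              where biz_key = #{bizKey} and version = #{version}
    // Returns true when the row was still the one that was read (one row updated).
    boolean advance(String bizKey, SegmentRow seen) {
        SegmentRow next = new SegmentRow(seen.maxId + seen.step, seen.step, seen.version + 1);
        return rows.replace(bizKey, seen, next);
    }
}

/** Reserves the next number segment, retrying when another instance wins the update. */
final class SegmentAllocator {
    private final InMemoryIdGeneratorTable table;
    private final int maxRetries;

    SegmentAllocator(InMemoryIdGeneratorTable table, int maxRetries) {
        this.table = table;
        this.maxRetries = maxRetries;
    }

    /** Returns {firstId, lastId} of the newly reserved segment. */
    long[] allocate(String bizKey) {
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            SegmentRow seen = table.select(bizKey);
            if (seen == null) {
                throw new IllegalArgumentException("unknown bizKey: " + bizKey);
            }
            if (table.advance(bizKey, seen)) {
                return new long[] {seen.maxId + 1, seen.maxId + seen.step};
            }
        }
        throw new IllegalStateException("could not reserve a segment for " + bizKey);
    }
}
```

With a row inserted as max_id = 0 and step = 1000, the first call reserves IDs 1 to 1000 and the second reserves 1001 to 2000. Each service instance then hands out IDs from its own range without touching the database again until the range runs low.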
4.2.2 Cache Structure Design

The table structure alone leaves a question open: if every ID came straight from the database, neither the achievable QPS nor the stability of the system could be guaranteed. So what data structure provides high concurrency and high availability? The answer lies in the design of the cache structure:


  • Buffer (buffer manager)

bizKey: business key
segments: array holding the two caches (double cache)
currentIndex: cursor pointing to the cache in segments that is currently in use
segmentModifyStatus: status flag for updating the segment, changed by CAS
readWritelock: read-write lock that guards reading and updating the number segment (this scenario reads far more often than it writes)

private String bizKey; // business key being operated on
private Segment[] segments; // double cache
private volatile int currentIndex; // index of the Segment currently in use
private volatile int segmentModifyStatus; // segment update status, changed by CAS
private final ReadWriteLock readWritelock = new ReentrantReadWriteLock(); // read-write lock

  • Segment (the cache that is actually operated on)

bizKey: business key
maxId: the maximum ID that the current cache can serve
step: the step size of the business key when loaded from the database
current: the value already used in the current number segment
threshold: the threshold at which loading of the next cache starts

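private String bizKey;                   // business key
private volatile long maxId;             // maximum ID of the current number segment
private volatile int step;               // step size
private volatile AtomicLong current;     // value already used in the current number segment
private long threshold;                  // threshold for loading the next cache
private Date modifyTime;                 // update time, later used to compute step dynamically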
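
To make the relationship between maxId, step, current and threshold concrete, here is a small sketch of how a segment could hand out IDs and report when preloading should start. It is an illustration only: the class name NumberSegment, the methods nextId and shouldPreload, the -1 "exhausted" return value and the integer preloadPercent parameter are assumptions, not the author's implementation.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical single number segment covering the IDs (maxId - step, maxId]. */
final class NumberSegment {
    private final long maxId;          // last ID of this segment (inclusive)
    private final AtomicLong current;  // last ID handed out so far
    private final long threshold;      // once current passes this value, preload the next segment

    NumberSegment(long maxId, int step, int preloadPercent) {
        long start = maxId - step;     // ID just before the first usable one
        this.maxId = maxId;
        this.current = new AtomicLong(start);
        this.threshold = start + (long) step * preloadPercent / 100;
    }

    /** Returns the next ID, or -1 when the segment is exhausted. */
    long nextId() {
        long id = current.incrementAndGet();
        return id <= maxId ? id : -1L;
    }

    /** True once more than preloadPercent of the segment has been used. */
    boolean shouldPreload() {
        return current.get() > threshold;
    }
}
```

For a segment created with maxId = 100, step = 100 and preloadPercent = 15, the IDs run from 1 to 100, and shouldPreload() turns true as soon as the 16th ID has been handed out.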

4.3 Key Process Links

With the table structure and cache structure clear, let's look at the key process links to see how the table and the cache are used in practice. They are as follows:

  • Service initialization - load the business bizKeys
  • Get an ID by business bizKey
  • Double cache - preload (load the next cache ahead of time)
  • Double cache - cache switch

[Figure: key process links]

The figure above shows the key steps and their implementation details. The following is a brief overview of the business and the technology.

(1) Business overview

  • Loading number segments at service initialization: so that request latency is not affected right after the service is released, eager initialization is used, and a number segment of the configured step size is loaded into the local cache when the service starts;
  • Business key maintenance: a scheduled JOB maintains newly added and retired business keys, adding new bizKeys to the local cache and removing expired bizKeys from it (while there are few business keys the job scans the full table; once there are more, it can switch to notifications or to scanning only the bizKeys changed within a given time window);
  • Preloading: once usage of the current cache exceeds the threshold, the other cache is loaded asynchronously. To keep the business as stable as possible, preloading generally starts when usage of the current cache reaches about 15% (dynamically adjustable);
  • Cache switching: when the number segment in the current cache is exhausted, switch to the next cache and continue serving.

(2) Key technologies

  • ReadWriteLock: this business scenario reads far more often than it writes, so a read-write lock is used.
    Read lock: obtaining a distributed ID;
    Write lock: preloading the next cache and switching caches.
  • CAS atomic operation: when preloading the next cache, multiple threads on one machine must not start the load at the same time, so the status flag of the Buffer is updated by CAS, and only the thread whose update succeeds performs the asynchronous preload.
  • volatile: guarantees visibility of shared variables, so that an update made by one thread is seen by the other threads.
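
The three techniques listed under "Key technologies" can be combined roughly as follows. This is a minimal sketch under stated assumptions rather than the production code: DoubleCacheBuffer, segmentLoader (a stand-in for the database call that reserves a segment and returns its max ID), the three status constants, the 1 ms back-off and the retry limit of 1000 are all hypothetical. One deliberate simplification: the slow load runs outside the lock, and only publishing the loaded segment and switching caches take the write lock.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.LockSupport;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.LongSupplier;

/** Hypothetical double-cache buffer for a single business key. */
final class DoubleCacheBuffer {
    private static final int IDLE = 0;     // next cache not loaded
    private static final int LOADING = 1;  // a preload is in flight
    private static final int READY = 2;    // next cache loaded, waiting for the switch

    private final int step;
    private final LongSupplier segmentLoader;   // assumed: reserves a segment, returns its max ID
    private final long[] maxId = new long[2];   // guarded by the read-write lock
    private final AtomicLong[] current = {new AtomicLong(), new AtomicLong()};
    private volatile int currentIndex = 0;      // which of the two caches is in use
    private final AtomicInteger nextStatus = new AtomicInteger(IDLE);
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final ExecutorService preloader = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "segment-preloader");
        t.setDaemon(true);
        return t;
    });

    DoubleCacheBuffer(int step, LongSupplier segmentLoader) {
        this.step = step;
        this.segmentLoader = segmentLoader;
        long max = segmentLoader.getAsLong();   // eager load at startup
        maxId[0] = max;
        current[0].set(max - step);
    }

    long nextId() {
        for (int attempt = 0; attempt < 1000; attempt++) {
            lock.readLock().lock();             // read lock: many threads take IDs concurrently
            try {
                int idx = currentIndex;
                long id = current[idx].incrementAndGet();
                if (id <= maxId[idx]) {
                    long used = id - (maxId[idx] - step);
                    if (used * 100 > step * 15L) {
                        triggerPreload(1 - idx); // more than 15% used
                    }
                    return id;
                }
            } finally {
                lock.readLock().unlock();
            }
            switchOrWait();                     // current cache is exhausted
        }
        throw new IllegalStateException("next number segment was not loaded in time");
    }

    private void triggerPreload(int target) {
        // CAS: only the thread that flips IDLE -> LOADING submits the preload.
        if (nextStatus.compareAndSet(IDLE, LOADING)) {
            preloader.execute(() -> preload(target));
        }
    }

    private void preload(int target) {
        long max;
        try {
            max = segmentLoader.getAsLong();    // slow call stays outside the lock
        } catch (RuntimeException e) {
            nextStatus.set(IDLE);               // let a later request retry the load
            return;
        }
        lock.writeLock().lock();                // write lock: publish the loaded segment
        try {
            maxId[target] = max;
            current[target].set(max - step);
            nextStatus.set(READY);
        } finally {
            lock.writeLock().unlock();
        }
    }

    private void switchOrWait() {
        lock.writeLock().lock();                // write lock: switch caches
        try {
            int idx = currentIndex;
            if (current[idx].get() < maxId[idx]) {
                return;                         // another thread already switched
            }
            if (nextStatus.get() == READY) {
                currentIndex = 1 - idx;
                nextStatus.set(IDLE);
                return;
            }
            triggerPreload(1 - idx);            // covers a preload that failed earlier
        } finally {
            lock.writeLock().unlock();
        }
        LockSupport.parkNanos(1_000_000L);      // next cache not ready: back off 1 ms, then retry
    }
}
```

A caller would create one buffer per bizKey, for example `new DoubleCacheBuffer(1000, loader)`, and call `nextId()` from any number of threads. Because only one preload can be in flight at a time, segments are reserved and consumed in order.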

5. Summary & Outlook

After the project was completed, we ran stress tests. With a reasonable step size, a single machine can support nearly 100,000 QPS. During the stress tests the TP metrics were normal, with TP99 and TP999 staying basically within 5 milliseconds, which meets the current business needs.

Although the current design meets business needs, there is still plenty of room for optimization, such as the wasted number segments and the dynamic step size described below.

(1) Wasted number segments
Number segments are loaded when the application starts, so part of a segment is wasted whenever the service restarts, a new version is released, and so on.
For this problem you can:

  • Initialize with 10% of the step size at service startup, keeping the first segment as small as possible
  • Add a hook at service shutdown that saves the unused part of the number segment to Redis. After startup, the service can then load from this Redis number segment pool into the local cache first.

(2) Dynamic step size
The step size is currently configured manually. Later, the step size for each business key can be adjusted dynamically according to how often its number segment is refreshed and a set of matching rules (the adjustment rule could be configured when the business key is applied for).

(3) Database and table sharding
At this stage there are few bizKeys. If the need arises later, the database and tables can be sharded by bizKey.

(4) Optimization of the persistence mode
Currently number segment information is persisted only in MySQL. A multi-level cache can be added according to business needs by introducing Redis: the database preloads number segments into Redis, and the local cache fetches segments from Redis first.

(5) Monitoring and alerting
Using company components, the QPS, availability, and TP of each interface and each bizKey are currently monitored. On top of this, monitoring can be added for number segment update frequency and the per-machine distribution of number segments (segments allocated versus segments used).
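
For item (2), one possible rule is to look at how long the previous segment lasted: double the step when it was used up faster than a target duration, and halve it when it lasted more than twice as long. The sketch below only illustrates that idea. DynamicStepRule, its bounds and the target duration are hypothetical, and the article does not say which rule will actually be adopted.

```java
import java.time.Duration;

/** Hypothetical step-size rule driven by how long the previous segment lasted. */
final class DynamicStepRule {
    private final int minStep;
    private final int maxStep;
    private final Duration target;   // desired lifetime of one segment

    DynamicStepRule(int minStep, int maxStep, Duration target) {
        this.minStep = minStep;
        this.maxStep = maxStep;
        this.target = target;
    }

    /**
     * @param currentStep step size used for the segment that was just consumed
     * @param consumedIn  time between the two most recent segment loads
     * @return the step size to use for the next load
     */
    int nextStep(int currentStep, Duration consumedIn) {
        if (consumedIn.compareTo(target) < 0) {
            // Used up too quickly: double the step, capped at maxStep.
            return (int) Math.min((long) currentStep * 2, maxStep);
        }
        if (consumedIn.compareTo(target.multipliedBy(2)) > 0) {
            // Lasted more than twice the target: halve the step, floored at minStep.
            return Math.max(currentStep / 2, minStep);
        }
        return currentStep;          // within the acceptable window: keep the step
    }
}
```

The modifyTime field of the segment could supply consumedIn, as the difference between two consecutive load times. A larger step means fewer database updates per second but more IDs lost on restart, so the upper bound matters as much as the target duration.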

6. Conclusion

The above briefly summarizes the project's background, technology selection, and design. The overall plan may not be the optimal solution, and there are still many points to improve. It follows the idea of getting a working version first and then gradually expanding and iterating, implemented in stages as needs arise, so that it can be delivered quickly, stably, and continuously while satisfying the current business.

Thank you for your support. I hope this article shows that designing for business volume at the million level is not complicated. As it turns out, only ten servers are enough to support a million-QPS business!

*Text/Yuan Xiangfei
@德物科技public account

