
If you were building an e-commerce inventory system, what would worry you most? Think about it for a moment: almost certainly high concurrency and overselling. This article presents a design that raises concurrency while preventing overselling and keeping inventory data accurate. Readers can borrow the design directly or adapt it to their own scenarios.

Background

The following uses e-commerce inventory as an example to illustrate how to deduct inventory under high concurrency. The principle also applies to other scenarios that need concurrent writes and data consistency.

1.1 Inventory Quantity Model Example

For ease of description, a simplified inventory quantity model is used below. Real systems carry many more inventory data items than this example, but it is enough to illustrate the principle. As shown in the following table, the stock quantity table (stockNum) contains two fields: commodity ID and stock quantity. The stock quantity represents how many items can still be sold.

[Image]

To keep inventory from being oversold, the traditional solution relies on database transactions: the SQL statement checks that the remaining inventory is sufficient, so when several update statements compete for the last items, only those the remaining stock can cover succeed. To keep a deduction from being applied twice, an anti-duplication table prevents repeated submissions and makes the operation idempotent. An example design of the anti-duplication table (antiRe) is as follows:

[Image]
For example, the deduction flow for placing an order looks like this:

 Begin transaction
Insert into antiRe(code) values ('order number + SKU')
Update stockNum set num = num - order quantity where skuId = commodity ID and num - order quantity >= 0
Commit transaction

As system traffic grows, the database becomes the performance bottleneck, and even sharding across databases and tables does not help: during a promotion, high concurrency targets a small number of products, so the concurrent traffic ends up on a small number of tables. To raise the load a single shard can withstand, we design a solution that uses the Redis cache for inventory deduction.

Inventory deduction scheme based on the Redis cache

2.1 Principle: combining the database and Redis for high-concurrency deduction

Deducting inventory actually involves two steps: the first is the oversell check, and the second is persisting the deduction data. In the traditional database approach, the two steps are done together. The principle behind handling high write concurrency is a clever use of separation: the oversell check and data persistence are decoupled. The oversell check is done by Redis; once a request passes it, the deduction only needs to be persisted to the database. Persistence goes through the task engine: the business database is sharded by commodity, while the task engine's task database is sharded by document (order) number. Persistence for hot commodities is therefore spread out by the task engine (state machine), which eliminates the hotspot. The overall architecture is as follows:

[Image]

The first checkpoint handles the oversell check: put the inventory data into Redis and deduct it there with incrby on every deduction. If the returned quantity is greater than or equal to 0, the inventory is sufficient; because Redis executes commands on a single thread, the returned result can be trusted. Redis, as the first checkpoint, withstands high concurrency and performs well. After passing the oversell check, the request enters the second checkpoint.

The second checkpoint handles the inventory deduction itself: after the first checkpoint, there is no need to check whether the quantity is sufficient, so the inventory can simply be deducted unconditionally by executing the following statements on the database. Because the SQL no longer has to check that the remaining quantity stays non-negative, it can be written as follows.

 Begin transaction
Insert into antiRe(code) values ('order number + SKU')
Update stockNum set num = num - order quantity where skuId = commodity ID
Commit transaction

Key point: in the end the database is still needed, so how is the hotspot resolved? The task database is sharded by order number, so different orders for the same product are hashed to different shards of the task database. The database still absorbs the full write volume, but the database hotspot is gone.
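The routing idea can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class name `TaskShardRouter`, the use of `String.hashCode()` as the hash, and the `order-N` order-number format are all hypothetical and not taken from the original system.

```java
import java.util.HashSet;
import java.util.Set;

/** Routes a task to a task-database shard by order number (hypothetical helper). */
class TaskShardRouter {
    private final int shardCount;

    TaskShardRouter(int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        this.shardCount = shardCount;
    }

    /** The shard depends only on the order number, never on the SKU. */
    int shardFor(String orderNo) {
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(orderNo.hashCode(), shardCount);
    }

    /** Counts how many distinct shards receive `orders` orders that all buy the same hot SKU. */
    static int distinctShards(TaskShardRouter router, int orders) {
        Set<Integer> shards = new HashSet<>();
        for (int i = 0; i < orders; i++) {
            shards.add(router.shardFor("order-" + i)); // assumed order-number format
        }
        return shards.size();
    }
}
```

Because the SKU never enters the hash, a burst of orders for one hot product is spread over all task shards instead of landing on the single shard that owns that product in the business database.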

The overall interaction sequence diagram is as follows:

[Image]

2.2 Hotspot protection against request floods

Redis has a bottleneck too. An overheated SKU sends all its traffic to a single Redis shard, which makes that shard's performance jitter. Hotspot protection has one precondition: it must not leave orders stuck. A rate limit with a millisecond-level time window can be custom-built inside the JVM. Its purpose is to protect Redis by limiting traffic as early as possible. In the extreme case, rate limiting means a product that should have sold out within one second actually takes two seconds; normally it causes no delay in sales. The limiter lives in the JVM because a remote centralized cache would collect the counts too late: by the time the data arrived, the traffic would already have overwhelmed Redis.

The implementation can use a library such as Guava: open a time window every 10 ms, count requests within each window, and rate-limit on a single server once the count is exceeded. For example, if requests beyond 2 per 10 ms window are limited, one server allows 200 per second, and 50 servers can sell 10,000 items per second. Adjust the threshold to the actual situation.
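As a rough illustration of the per-JVM window counter, here is a minimal sketch in plain Java. It is not Guava's API and not the original implementation: the class `WindowLimiter`, its constructor parameters, and the injected clock are assumptions made for this example.

```java
import java.util.function.LongSupplier;

/** Fixed-window limiter: at most `limit` permits per `windowMillis` window, per JVM. */
class WindowLimiter {
    private final long windowMillis;
    private final int limit;
    private final LongSupplier clockMillis; // injected so the clock can be faked in tests
    private long currentWindow = Long.MIN_VALUE;
    private int used;

    WindowLimiter(long windowMillis, int limit, LongSupplier clockMillis) {
        this.windowMillis = windowMillis;
        this.limit = limit;
        this.clockMillis = clockMillis;
    }

    /** Returns true if the request may go on to Redis, false if it is rate-limited. */
    synchronized boolean tryAcquire() {
        long window = clockMillis.getAsLong() / windowMillis;
        if (window != currentWindow) {
            // A new time window has started: reset the counter.
            currentWindow = window;
            used = 0;
        }
        if (used >= limit) {
            return false;
        }
        used++;
        return true;
    }

    /** Upper bound on permits per second across `servers` identical servers. */
    static long maxPerSecond(long windowMillis, int limit, int servers) {
        return (1000 / windowMillis) * limit * servers;
    }
}
```

In production the limiter would be created as `new WindowLimiter(10, 2, System::currentTimeMillis)`. With the numbers from the text, `maxPerSecond(10, 2, 1)` is 200 and `maxPerSecond(10, 2, 50)` is 10,000.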

[Image]

2.3 Redis deduction principle

The incrby command of Redis can be used for inventory deduction. Since a deduction may cover several data items, the hincrby command of the Hash structure is used. The whole process is first simulated with native Redis commands. To simplify the model, only the operation on a single data item is demonstrated below; the principle is exactly the same for multiple data items.

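 127.0.0.1:6379> hset iphone inStock 1 # set one unit of sellable stock for the iPhone
(integer) 1
127.0.0.1:6379> hget iphone inStock   # check that the iPhone's sellable stock is 1
"1"
127.0.0.1:6379> hincrby iphone inStock -1 # sell one unit; 0 remaining is returned, the order succeeds
(integer) 0
127.0.0.1:6379> hget iphone inStock # verify that 0 remain
"0"
127.0.0.1:6379> hincrby iphone inStock -1 # a concurrent request oversells, but single-threaded Redis returns -1 remaining; the order fails
(integer) -1
127.0.0.1:6379> hincrby iphone inStock 1 # on seeing -1, roll back by adding one; 0 remain
(integer) 0
127.0.0.1:6379> hget iphone inStock # stock is back to normal
"0"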

2.3.1 Idempotency Guarantee for Deductions

If the application does not know whether a deduction succeeded after calling Redis, an anti-duplication code can be added to the batch of deduction commands by executing setnx (or set with the NX option) on that code. When an exception occurs, whether the deduction succeeded can be determined by whether the anti-duplication code exists. For batches of commands, a pipeline can be used to improve the success rate.

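 // Initialize the stock
127.0.0.1:6379> hset iphone inStock 1 # set one unit of sellable stock for the iPhone
(integer) 1
127.0.0.1:6379> hget iphone inStock   # check that the iPhone's sellable stock is 1
"1"
// Application thread 1 deducts stock for order number a100; jedis opens a pipeline
127.0.0.1:6379> set a100_iphone "1" NX EX 10 # anti-duplication code built from the order number and the commodity
OK
127.0.0.1:6379> hincrby iphone inStock -1 # sell one unit; 0 remaining is returned, the order succeeds
(integer) 0
// End of the pipeline; the results OK and 0 are returned together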

Check after concurrent deductions: to guard against concurrent deductions, check whether the value returned by the Redis hincrby command is negative to determine whether a high-concurrency oversell has occurred. If the result is negative, the deducted quantity must be added back.
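The deduct, check, and add-back sequence can be sketched as follows. To keep the example self-contained, an in-memory `AtomicLong` stands in for the Redis hash field, and the private `hincrby` method only mimics the atomic add-and-return behavior of the real command; the class `StockGate` and its method names are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

/** In-memory stand-in for the Redis oversell check (not a Redis client). */
class StockGate {
    private final ConcurrentHashMap<String, AtomicLong> inStock = new ConcurrentHashMap<>();

    void setStock(String sku, long qty) {
        inStock.put(sku, new AtomicLong(qty));
    }

    long getStock(String sku) {
        AtomicLong value = inStock.get(sku);
        return value == null ? 0 : value.get();
    }

    /** Mimics HINCRBY: atomically adds delta and returns the new value. */
    private long hincrby(String sku, long delta) {
        return inStock.computeIfAbsent(sku, k -> new AtomicLong()).addAndGet(delta);
    }

    /** Deducts first, then checks the returned value; a negative result is added back. */
    boolean tryDeduct(String sku, long qty) {
        long remaining = hincrby(sku, -qty);
        if (remaining < 0) {
            hincrby(sku, qty); // oversold: roll the deduction back
            return false;
        }
        return true;
    }

    /** Lets `buyers` threads each try to buy one unit at once; returns how many succeeded. */
    static int concurrentBuyers(StockGate gate, String sku, int buyers) {
        AtomicInteger successes = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] threads = new Thread[buyers];
        for (int i = 0; i < buyers; i++) {
            threads[i] = new Thread(() -> {
                try {
                    start.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                if (gate.tryDeduct(sku, 1)) {
                    successes.incrementAndGet();
                }
            });
            threads[i].start();
        }
        start.countDown();
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return successes.get();
    }
}
```

With 10 units in stock and 50 concurrent single-unit buyers, exactly 10 deductions succeed and the counter ends at 0, because a request can only fail after the stock has already been fully claimed.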

If network jitter occurs during the call and the call to Redis times out, the application does not know the result of the operation. It can use the get command to check whether the anti-duplication code exists and thus determine whether the deduction succeeded.

 127.0.0.1:6379> get a100_iphone   # deduction succeeded
"1"
127.0.0.1:6379> get a100_iphone   # deduction failed
(nil)

2.3.2 One-Way Guarantee

In many scenarios, because transactions are not used, it is hard to guarantee both no overselling and no underselling. In extreme cases, then, choose never to oversell, even though that may mean selling less than the available stock. Of course, every effort should be made to keep the data accurate, neither oversold nor undersold; where that cannot be fully guaranteed, choose the one-way guarantee of no overselling, and take measures to make underselling as unlikely as possible.
For example, suppose the commands for a Redis deduction are ordered so that the anti-duplication code is set first and the deduction command runs second. If the network jitters during execution, the anti-duplication code may be written successfully while the deduction fails; a retry then treats the deduction as already successful, which leads to overselling. The command order above is therefore wrong, and the correct order is:

To deduct inventory: 1. Deduct the inventory. 2. Write the anti-duplication code.
To roll back inventory: 1. Write the anti-duplication code. 2. Add the inventory back.
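The effect of this ordering can be illustrated with a small in-memory model. Everything here is an assumption made for the example: the class `OrderedDeduction`, the `rollback_` key prefix for the rollback's own anti-duplication code, and the use of plain Java collections in place of Redis (`writeCode` mimics SET with NX and ignores expiry).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** In-memory model of the two command orders (hypothetical, not a Redis client). */
class OrderedDeduction {
    private final Map<String, Long> stock = new HashMap<>();
    private final Set<String> codes = new HashSet<>();

    synchronized void setStock(String sku, long qty) {
        stock.put(sku, qty);
    }

    synchronized long getStock(String sku) {
        return stock.getOrDefault(sku, 0L);
    }

    /** Deduction step 1: deduct, and undo at once if the result is negative. */
    synchronized boolean deductStock(String sku, long qty) {
        long remaining = stock.merge(sku, -qty, Long::sum);
        if (remaining < 0) {
            stock.merge(sku, qty, Long::sum);
            return false;
        }
        return true;
    }

    /** Deduction step 2 (and rollback step 1): true only for the first writer of the key. */
    synchronized boolean writeCode(String key) {
        return codes.add(key);
    }

    /** What a caller checks after a timeout: does the anti-duplication code exist? */
    synchronized boolean hasCode(String key) {
        return codes.contains(key);
    }

    /** Rollback: write the code first, then add the stock back, so a retry never adds twice. */
    synchronized boolean rollback(String orderNo, String sku, long qty) {
        if (!writeCode("rollback_" + orderNo + "_" + sku)) {
            return false; // a rollback for this order was already recorded
        }
        stock.merge(sku, qty, Long::sum);
        return true;
    }
}
```

If the process dies after `deductStock` but before `writeCode`, the caller finds no code and treats the order as failed: one unit stays deducted (possible underselling), but no second buyer can get it (no overselling). The same reasoning applies to `rollback`, where a crash between the two steps leaves the stock unreturned rather than returned twice.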

2.4 Why use Pipeline

The commands above use the Redis Pipeline, so let us look at how Pipeline works.

Non-pipeline mode

 request-->execute-->response
request-->execute-->response

Pipeline mode

 request-->execute, server queues the response
request-->execute, server queues the response-->response-->response

Using Pipeline makes it as likely as possible that the results of multiple commands are returned together and complete. Readers may consider using Redis transactions instead of Pipeline. In real projects, the author has had successful experience with Pipeline and has not used Redis transactions, which are somewhat slower than Pipeline.

Redis transactions
1) multi: opens the transaction; all subsequent commands are added to the "operation queue" of the current connection's transaction
2) exec: commits the transaction
3) discard: cancels execution of the queue
4) watch: if a watched key is modified before exec, the transaction is aborted, as if discard had been triggered.

2.5 Achieving eventual consistency of the database through the task engine

The task engine ensures that the data is eventually persisted to the database. Its design, described below, abstracts task scheduling into a business-independent framework. The "task engine" supports simple process orchestration and guarantees that each task succeeds at least once. It can also act as the engine of a state machine and schedule its transitions, so the "task engine" may equally be called a "state machine engine"; in this article the two terms mean the same thing.

The core principle of the task engine design: first persist the task to the database, then use database transactions to keep subtask splitting and parent task completion transactionally consistent.

Task database sharding: the task database is sharded across databases and tables, which supports horizontal scaling. Because the task database's sharding field is designed to differ from the business database's sharding field, there is no data hotspot.

2.5.1 The core processing flow of the task engine

[Image]

Step 1: the caller synchronously submits the task. The task is first persisted to the database with the status "locked for processing", which guarantees that it will be handled.

Note: in the original version, a task was persisted with the status "pending" and then picked up by a scanning worker. To prevent concurrent duplicate processing, the worker locked each task after scanning it and processed it only once the lock succeeded. This was later optimized so that a task is persisted directly with the status "locked for processing". The change is for performance: it removes the need to rescan and compete for the task, which is instead processed asynchronously by a thread inside the same process. Reference lock SQL: UPDATE <task table>_<shard number> SET status = 100, modifyTime = now() WHERE id = #{id} AND status = 0

Step 2: the asynchronous thread calls the external processing logic and receives the list of subtasks it returns. In a single database transaction, the parent task's status is set to completed and the subtasks are persisted; the subtasks are then added to the thread pool. Key point: subtask creation and parent task completion must be transactional.

Step 3: each subtask is scheduled and executed, and any new subtasks it produces are persisted in turn. When no subtask is returned, the whole flow ends.

Exception-handling workers: the abnormal-unlock worker unlocks tasks that have stayed locked without being processed for a long time, so that a task is not left locked with nothing executing it after a server restart or a full thread pool. The catch-up worker handles tasks in the thread pool that were never completed because of a server restart: it locks them again to trigger execution.
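The conditional lock update and the parent/child transaction can be modeled in memory as follows. This is a sketch only: the class `TaskStore` is hypothetical, a `synchronized` method stands in for a database transaction, and the status value 200 for "completed" is an assumption (only 0 and 100 appear in the reference SQL).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory model of the task table (hypothetical; a real engine would use the database). */
class TaskStore {
    static final int PENDING = 0;  // status 0 in the reference lock SQL
    static final int LOCKED = 100; // status 100 in the reference lock SQL
    static final int DONE = 200;   // assumed value for "completed"

    private final Map<Long, Integer> statusById = new HashMap<>();
    private long nextId = 1;

    /** Persists a new task with the given status and returns its id. */
    synchronized long insert(int status) {
        long id = nextId++;
        statusById.put(id, status);
        return id;
    }

    /** Models UPDATE ... SET status = 100 WHERE id = ? AND status = 0: true if one row changed. */
    synchronized boolean tryLock(long id) {
        Integer status = statusById.get(id);
        if (status == null || status != PENDING) {
            return false;
        }
        statusById.put(id, LOCKED);
        return true;
    }

    /** One "transaction": the parent becomes DONE and its subtasks are persisted as LOCKED. */
    synchronized List<Long> completeAndSpawn(long parentId, int childCount) {
        Integer status = statusById.get(parentId);
        if (status == null || status != LOCKED) {
            throw new IllegalStateException("parent task " + parentId + " is not locked");
        }
        statusById.put(parentId, DONE);
        List<Long> children = new ArrayList<>();
        for (int i = 0; i < childCount; i++) {
            children.add(insert(LOCKED));
        }
        return children;
    }

    synchronized int statusOf(long id) {
        Integer status = statusById.get(id);
        if (status == null) {
            throw new IllegalArgumentException("unknown task " + id);
        }
        return status;
    }
}
```

Because `tryLock` only succeeds on a pending task, two workers that scan the same task cannot both process it, and because completion and subtask creation happen in one step, a crash cannot leave a completed parent without its children.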

Task state transition process

[Image]

2.5.2 Task engine database design

Example structure of the task table (for illustration only; real use needs refinement)

[Image]

Task engine database disaster recovery

The task database is sharded across databases and tables. When one database goes down, the traffic routed to it can be re-hashed to the surviving databases, either through manual configuration or automatically through system monitoring. As shown in the figure below, when task database 2 is down, its traffic can be routed to task databases 1 and 3 by changing the configuration. The catch-up engine keeps scanning task database 2, because once task database 2 recovers through master-slave failover, the tasks that were left unprocessed when it went down can be picked up and completed.
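A minimal sketch of the re-hashing idea, assuming shards numbered from 1 and `String.hashCode()` as the hash; the class `FailoverRouter` and the `markDown`/`markUp` switches are hypothetical stand-ins for the manual configuration or monitoring hook mentioned above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Routes tasks by order number and re-hashes around task databases marked as down. */
class FailoverRouter {
    private final int shardCount;
    private final Set<Integer> down = new HashSet<>();

    FailoverRouter(int shardCount) {
        this.shardCount = shardCount;
    }

    synchronized void markDown(int shard) {
        down.add(shard);
    }

    synchronized void markUp(int shard) {
        down.remove(shard);
    }

    /** The shard (1..shardCount) used when every task database is healthy. */
    int primary(String orderNo) {
        return Math.floorMod(orderNo.hashCode(), shardCount) + 1;
    }

    /** Returns the primary shard, or a surviving shard if the primary is down. */
    synchronized int route(String orderNo) {
        int primary = primary(orderNo);
        if (!down.contains(primary)) {
            return primary;
        }
        List<Integer> alive = new ArrayList<>();
        for (int shard = 1; shard <= shardCount; shard++) {
            if (!down.contains(shard)) {
                alive.add(shard);
            }
        }
        if (alive.isEmpty()) {
            throw new IllegalStateException("no task database is available");
        }
        // Re-hash only the displaced traffic over the survivors.
        return alive.get(Math.floorMod(orderNo.hashCode(), alive.size()));
    }
}
```

Orders whose primary shard is healthy keep their original route, so only the traffic of the failed database moves. Tasks already stored in the failed database stay there and are handled after it recovers.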

[Image]

An example of task engine scheduling

For example, a user buys two mobile phones and one computer, and the phones and the computer are stored in two different databases. The task engine first persists the task, then drives its split into two subtasks, and finally ensures that both subtasks succeed, achieving eventual consistency of the data. The tasks over the whole execution are arranged as follows:

[Image]
Figure 7 Task engine scheduling example

Task engine interaction process

[Image]
Figure 8 Task engine interaction process

Summary

Wherever heterogeneous stores exist, differences will appear. To keep the impact of those differences under control, the ultimate safeguard is difference comparison. Given the limited space here, it is not expanded on and will be covered in a separate article. The rough process for comparing DB and Redis is: receive inventory change messages, then keep following up and comparing whether the Redis and DB data agree. If they remain inconsistent, repair the data by overwriting the Redis data with the DB data.
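The compare-then-repair loop might look roughly like this. The sketch uses two in-memory maps in place of the real DB and Redis, and the two-pass structure (first `diff`, later `repair` only what still differs) is one possible reading of "keep following up"; the class `StockReconciler` and both method names are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Compares DB and Redis stock and repairs Redis from the DB (in-memory illustration). */
class StockReconciler {

    /** First pass: SKUs whose Redis value differs from the DB value. */
    static Set<String> diff(Map<String, Long> db, Map<String, Long> redis) {
        Set<String> mismatched = new TreeSet<>();
        for (Map.Entry<String, Long> entry : db.entrySet()) {
            if (!entry.getValue().equals(redis.get(entry.getKey()))) {
                mismatched.add(entry.getKey());
            }
        }
        return mismatched;
    }

    /** Later pass: overwrite Redis with the DB value for suspects that are still inconsistent. */
    static Set<String> repair(Map<String, Long> db, Map<String, Long> redis, Set<String> suspects) {
        Set<String> repaired = new TreeSet<>();
        for (String sku : suspects) {
            Long truth = db.get(sku);
            if (truth != null && !truth.equals(redis.get(sku))) {
                redis.put(sku, truth);
                repaired.add(sku);
            }
        }
        return repaired;
    }
}
```

Separating the two passes gives in-flight deductions time to reach the database, so that a difference caused only by persistence lag is not "repaired" by mistake.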

FAQ

Q: The first step is the oversell check by deducting in Redis memory, and the second step is persisting the deduction data. What happens if the process is interrupted between the two (for example, by a service restart)?

A: For a planned restart, traffic to the server is stopped before it restarts. Even so, this solution cannot guarantee absolute data consistency: for example, the application server may crash right after deducting in Redis. Handling that case requires a more complex solution that ensures real-time consistency (none is currently adopted). Alternatively, inventory data and user order data can be compared and repaired to reach eventual consistency.


京东云开发者

京东云开发者 (Developer of JD Technology) is a platform under JD Cloud where developers in AI, cloud computing, IoT, and related fields share and exchange technical knowledge.