A meaty interview topic: how to improve throughput and timeliness in delayed-task scenarios!

Author: Brother Xiaofu
Blog: https://bugstack.cn

Accumulate, share, and grow, so that both you and others gain something! 😄

`1. Introduction`

No more rat race; if it works, that's enough!

Haha, opting out of the rat race is fine; making things work is enough. But every time I receive a new requirement I get the itch. I want to draw on my earlier architecture design and implementation experience, iterate on the requirement, or find a better solution and overturn the old one completely. I always feel refreshed the moment I finish the code.

In fact, most coders who simply love writing code are fairly competitive in this way. For example, someone who can implement a requirement so that it works is probably a P6; someone who, beyond making it work, distills the common requirements and develops them into a general component service is a P7. Every coder who has grown has tested their ideas and put them into practice again and again while building wheels. Memorizing stock interview answers alone is definitely not enough to produce a senior technical expert.

`2. Delayed task scenarios`

What is a delayed task?

In real business scenarios, things such as status changes before an activity starts, T+1 reconciliation after order settlement, and the generation of interest charges on loan orders all rely on delayed tasks. In practice, Quartz or Schedule is typically used to scan and process database table data on a timer: when the conditions are met, the row's status is changed or new data is inserted into the table.
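The timed scan described above can be sketched as follows. This is a minimal illustration, not the author's implementation: `ActivityRow` and `ActivityScanner` are hypothetical names, and an in-memory list stands in for the database table that a real Quartz or Schedule job would query.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical row of a business table that carries its own status and start time.
class ActivityRow {
    final long id;
    final Instant startTime;    // moment at which the status should flip
    String status = "PENDING";  // PENDING -> ACTIVE

    ActivityRow(long id, Instant startTime) {
        this.id = id;
        this.startTime = startTime;
    }
}

// In-memory stand-in for the table; a real job would issue SQL instead.
class ActivityScanner {
    private final List<ActivityRow> table = new ArrayList<>();

    void insert(ActivityRow row) {
        table.add(row);
    }

    // One scan pass: flip every PENDING row whose start time has been reached.
    // Returns the number of rows changed, much as an UPDATE statement would.
    int scanOnce(Instant now) {
        int changed = 0;
        for (ActivityRow row : table) {
            if ("PENDING".equals(row.status) && !row.startTime.isAfter(now)) {
                row.status = "ACTIVE";
                changed++;
            }
        }
        return changed;
    }
}
```

A scheduler would simply call `scanOnce(Instant.now())` at a fixed interval. The interval is the worst-case delay of this approach, which is why a plain scan struggles once timeliness matters.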

Such a simple requirement is where delayed tasks start. If the requirement is small at first and there are few users, a single machine may be all that is used in development. As the business grows and the functionality becomes more complex, however, the design and implementation that R&D has to deliver are often no longer that simple. For example, scanning and processing a large volume of data must finish with the lowest possible delay. Otherwise, take the interest charges on loan orders: if the next day has arrived and a user still cannot see their interest information, or reconciliation has not happened after a repayment, a customer complaint is likely.

So how should a scenario like this be designed?

`3. Delayed task design`

The usual task-center flow is as follows: a scheduled job scans the task table and puts the tasks that are about to reach their timeout into a processing queue (in memory or as MQ messages); the business system then processes them, and once processing completes the task status in the table is updated.

Problems:

With massive, large-scale task data, the task tables must be scanned quickly across sharded databases and tables.
The task scanning service is coupled with business logic, so it is neither general nor reusable.
Some specialized task systems require low-latency processing and cannot wait long.

`1. Task table method`

First, consider some minor status-change scenarios. For example, a business's own table contains a status field that is changed by program logic and is also changed automatically by the task service once the specified expiration time is reached. Functions like this can generally be designed directly into the business's own tables.

Then there are larger, more frequently used scenarios. If such fields were added to each of the N tables that every system maintains, the result would be redundant and hard to maintain. A scenario like this is well suited to a general delayed-task system: each business system submits the actions that need delayed execution to the delay system, and the delay system calls back at the specified time, either through an interface or through an MQ message. For example, a task scheduling table can be designed along the following lines:

The extracted task scheduling table mainly records which tasks exist and when their actions should be initiated. The actual processing of each action is still handed back to the business project.
Because it centrally handles a large number of tasks from many businesses, the table should be designed with sharded databases and tables to accommodate later growth in business volume.
The house number design concerns how a table is scanned. If the data volume is large and you do not want a single scan job per table, several scan jobs can scan the same table to raise scanning throughput. A house number is then needed to isolate the scan range of each job and avoid scanning the same task data twice.
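The points above can be made concrete with a small sketch. Everything here is an assumption made for illustration: the field names of `DelayTask`, the `HOUSE_COUNT` of 4, and deriving the house number from a hash of the task id are not taken from the author's schema, and a list again stands in for the sharded table.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical task row; field names are illustrative, not the author's schema.
class DelayTask {
    final String taskId;
    final String topic;          // which business the task belongs to
    final long executeAtMillis;  // when the callback should be initiated
    final int houseNumber;       // scan partition that owns this row
    boolean done = false;

    DelayTask(String taskId, String topic, long executeAtMillis) {
        this.taskId = taskId;
        this.topic = topic;
        this.executeAtMillis = executeAtMillis;
        this.houseNumber = HouseNumberScanner.houseNumberOf(taskId);
    }
}

class HouseNumberScanner {
    static final int HOUSE_COUNT = 4; // assumed number of scan partitions per table

    // Assign a house number when the row is written (assumption: hash of the task id).
    static int houseNumberOf(String taskId) {
        return Math.floorMod(taskId.hashCode(), HOUSE_COUNT);
    }

    // A scan job that owns one house number only sees due, unfinished rows
    // carrying that number, so parallel jobs never pick up the same task.
    static List<DelayTask> scan(List<DelayTask> table, int houseNumber, long nowMillis) {
        List<DelayTask> due = new ArrayList<>();
        for (DelayTask task : table) {
            if (!task.done && task.houseNumber == houseNumber
                    && task.executeAtMillis <= nowMillis) {
                due.add(task);
            }
        }
        return due;
    }
}
```

Running one scan job per house number lets several jobs work on the same table in parallel while each row is visited by exactly one of them.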

`2. Low latency method`

The low-latency scheme builds on the task table method and adds time-based control. Tasks that are about to expire within an upcoming time window are placed in a Redis cluster queue and popped from the queue when consumed, so that tasks are processed closer to their scheduled time and execution is not delayed by the long interval between database scans.

When a delayed task submitted by a business system is received, it is placed in the task table or synchronized to the Redis cluster according to its execution time. Tasks with a later execution time can be stored in the task table first and added to the timeout task execution queue later by scanning.
The core of this design is the use of Redis queues. To make consumption reliable, two-stage consumption is introduced, together with registration in a ZK registry, so that each message is consumed at least once. This article focuses on the Redis queue design; the rest of the logic can be extended and refined according to business needs.
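The routing decision on submission might look like the sketch below. `TaskRouter`, its `Target` enum, and the configurable `windowMillis` threshold are hypothetical names introduced here; the article does not say how large the window is or whether near-due tasks are also persisted to the table, so both are left open.

```java
// Hypothetical router: decides where a newly submitted delayed task goes.
class TaskRouter {
    enum Target { REDIS_QUEUE, TASK_TABLE }

    // Assumed tunable: tasks due within this window go straight to Redis.
    private final long windowMillis;

    TaskRouter(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    // Tasks due soon (or already overdue) are queued in Redis for low latency;
    // later tasks wait in the task table until a scan promotes them.
    Target route(long executeAtMillis, long nowMillis) {
        return (executeAtMillis - nowMillis <= windowMillis)
                ? Target.REDIS_QUEUE
                : Target.TASK_TABLE;
    }
}
```

Choosing the window is a trade-off: a larger window keeps more tasks in Redis memory, while a smaller one leans more heavily on the database scan.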

Redis consumption queue

Compute the slot each message belongs to from the message body: index = CRC32 & 7, that is, the low three bits of the CRC32 checksum, which gives 8 slots.
StoreQueue stores the task execution information per slot under SlotKey = #{topic}_#{index}, using a Sorted Set ordered by each task's execution-time score. Timed messages use their timestamp as the score, and each consumption pass pops the messages whose score is less than the current timestamp.
To ensure that every message is consumed at least once, the consumer does not pop elements from the sorted set directly. Instead, elements are moved from StoreQueue to PrepareQueue and returned to the consumer. When consumption succeeds, the element is deleted from PrepareQueue; when it fails, the element is moved back from PrepareQueue to StoreQueue. This is the two-stage consumption process.
Reference documents: 2021 Alibaba Technician's Baibao Black Book PDF, Low-latency timeout center implementation
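The slot calculation and the two-stage consumption described above can be sketched in plain Java. This is an in-memory stand-in, not Redis code: two maps play the role of the StoreQueue and PrepareQueue sorted sets, and the names `SlotKeys`, `TwoStageQueue`, `ack` and `nack` are hypothetical. The sketch treats a message as due when its score is at or before the current time.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.zip.CRC32;

class SlotKeys {
    // index = CRC32 & 7 keeps the low three bits, so the result is in 0..7.
    static int slotIndex(String messageBody) {
        CRC32 crc = new CRC32();
        crc.update(messageBody.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() & 7);
    }

    // SlotKey = #{topic}_#{index}
    static String slotKey(String topic, String messageBody) {
        return topic + "_" + slotIndex(messageBody);
    }
}

// In-memory model of one slot: member -> score (execution timestamp).
class TwoStageQueue {
    private final Map<String, Long> storeQueue = new HashMap<>();
    private final Map<String, Long> prepareQueue = new HashMap<>();

    void offer(String message, long executeAtMillis) {
        storeQueue.put(message, executeAtMillis);
    }

    // Stage 1: move every due message from StoreQueue to PrepareQueue
    // and hand it to the consumer.
    List<String> poll(long nowMillis) {
        List<String> due = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = storeQueue.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (entry.getValue() <= nowMillis) {
                prepareQueue.put(entry.getKey(), entry.getValue());
                due.add(entry.getKey());
                it.remove();
            }
        }
        return due;
    }

    // Stage 2, success: the message is finished, drop it from PrepareQueue.
    void ack(String message) {
        prepareQueue.remove(message);
    }

    // Stage 2, failure: move the message back so it is delivered again.
    void nack(String message) {
        Long score = prepareQueue.remove(message);
        if (score != null) {
            storeQueue.put(message, score);
        }
    }

    int storeSize() { return storeQueue.size(); }
    int prepareSize() { return prepareQueue.size(); }
}
```

Against a real Redis cluster, the move between the two sorted sets would have to be atomic, for example through a server-side script, and messages left in PrepareQueue by a crashed consumer would need a timeout-based return path. Neither is shown here.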

Simple Case

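// Imports assumed for this snippet (the @Test import depends on the JUnit version in use):
import org.redisson.api.RBlockingQueue;
import org.redisson.api.RDelayedQueue;
import java.util.concurrent.TimeUnit;

// redissonClient is assumed to be a RedissonClient field provided by the test class.
@Test
public void test_delay_queue() throws InterruptedException {
    // Destination queue: elements land here once their delay has elapsed.
    RBlockingQueue<Object> blockingQueue = redissonClient.getBlockingQueue("TASK");
    // Delayed queue bound to the destination queue.
    RDelayedQueue<Object> delayedQueue = redissonClient.getDelayedQueue(blockingQueue);
    // Consumer thread: blocks on take() until an element has been transferred.
    new Thread(() -> {
        try {
            while (true) {
                Object take = blockingQueue.take();
                System.out.println(take);
                Thread.sleep(10);
            }
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }).start();
    // Producer: once per second, offer a message ("测试" means "test") with a 100 ms delay.
    // The loop never exits, so the test runs until the process is stopped manually.
    int i = 0;
    while (true) {
        delayedQueue.offerAsync("测试" + ++i, 100L, TimeUnit.MILLISECONDS);
        Thread.sleep(1000L);
    }
}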

Test output

2022-02-13  WARN 204760 --- [      Finalizer] i.l.c.resource.DefaultClientResources    : io.lettuce.core.resource.DefaultClientResources was not shut down properly, shutdown() was not called before it's garbage-collected. Call shutdown() or shutdown(long,long,TimeUnit) 
测试1
测试2
测试3
测试4
测试5

Process finished with exit code -1

Source code: https://github.com/fuzhengwei/TimeOutCenter
Description: uses the DelayedQueue in Redisson as the message queue; after a message is written, it is popped for consumption once its delay has elapsed.

`4. Summary`

Scheduled tasks are used very frequently in real scenarios. For example, we often use xxl-job, and there are also distributed task scheduling components developed by large companies. These may have started out as small, simple functions, but after abstraction, integration, and refinement they became core general-purpose middleware services.
Whichever way task scheduling is designed and implemented, the iteration and maintainability of the function need to be considered. If the scenario is very small and few people use it, just run it on your own machine. Over-design sometimes drags R&D resources into a quagmire.
In fact, the knowledge points of each technology are like tools, the proverbial knives, spears, staffs, axes, and hooks. Learning to combine them according to their characteristics is how a programmer keeps growing. If you want to learn more in-depth technical content like this, you can join the Lottery distributed lottery flash-sale system project to learn more valuable, battle-tested practices.
