Author: Bu Fang (Alibaba Cloud Serverless Technology Leader)

When we build an application, we want it to be both responsive and inexpensive. In practice, however, the system faces many challenges: unpredictable traffic spikes, downstream dependencies that become slow, or a small number of requests that consume disproportionate CPU and memory. These factors can slow the whole system down, or even leave it unable to respond at all. To keep an application responsive, teams often have to over-provision computing resources that then sit idle most of the time. A better approach is to separate the time-consuming or resource-heavy processing from the main request path and hand it to a more elastic system for asynchronous execution. This lets requests be processed and returned to the user quickly, and it also saves cost.

Generally speaking, logic that is time-consuming, resource-heavy, or error-prone is a good candidate for being split out of the main request path and executed asynchronously. For example, when a new user registers, the system usually sends a welcome email after registration succeeds; sending that email can be decoupled from the registration flow. Similarly, when a user uploads an image, the system typically generates thumbnails in several sizes, but the image processing does not need to happen inside the upload request: the upload can complete as soon as the file is stored, and thumbnail generation can run as an asynchronous task. This keeps the application servers from being overwhelmed by compute-intensive work such as image processing, and users get a faster response. Common asynchronous tasks include:

  • Sending emails and instant messages
  • Checking for spam
  • Document processing (format conversion, export, ...)
  • Audio/video and image processing (thumbnail generation, watermarking, content moderation, transcoding, ...)
  • Calling external third-party services
  • Rebuilding search indexes
  • Importing/exporting large amounts of data
  • Web crawling
  • Data cleaning
  • ...

Companies such as Slack, Pinterest, and Facebook make extensive use of asynchronous tasks to achieve better service availability at lower cost. According to Dropbox, its business scenarios involve more than 100 different types of asynchronous tasks. A fully featured asynchronous task processing system brings significant benefits:

  • Faster response times. Moving time-consuming, resource-heavy logic out of the request path and executing it asynchronously elsewhere reduces request latency and improves the user experience.
  • Better handling of bursty traffic. In scenarios such as e-commerce, sudden bursts of requests often hit the system. If the resource-heavy logic is stripped from the request path and executed asynchronously elsewhere, a system with the same resource capacity can absorb much larger traffic peaks.
  • Lower cost. Asynchronous tasks typically run anywhere from hundreds of milliseconds to several hours. Choosing execution times sensibly for each task type and using resources more flexibly yields lower costs.
  • Better retry and error handling. Tasks are executed reliably (at-least-once) and retried according to a configured policy, giving better fault tolerance. For example, if a call to a third-party downstream service is turned into an asynchronous task with a sensible retry policy, occasional instability in that service will not hurt the task success rate.
  • Faster completion of bulk work. Tasks execute with a high degree of parallelism. By scaling the resources of the task processing system, massive numbers of tasks can be completed faster at reasonable cost.
  • Better priority management and flow control. Tasks of different types are usually processed at different priorities. A task processing system can isolate tasks by priority, so that high-priority tasks run sooner while low-priority tasks are not starved.
  • More diverse triggering methods. Tasks can be triggered in various ways: submitted directly through an API, triggered by events, or executed on a schedule.
  • Better observability. Task processing systems usually provide logs, metrics, status queries, and tracing, making tasks easier to observe and problems easier to diagnose.
  • Higher R&D efficiency. Users focus on implementing the task processing logic, while scheduling, resource scaling, high availability, flow control, and priority management are handled by the system, greatly improving development efficiency.

Task Processing System Architecture

A task processing system usually consists of three parts: the task API and observability, task distribution, and task execution. We first introduce these three subsystems, then discuss the technical challenges facing the whole system and how to address them.

Task API/Dashboard

This subsystem provides a set of task-related APIs: creating, querying, and deleting tasks, and so on. Users access these functions through a GUI, command-line tools, or by calling the APIs directly. Observability, for example through a dashboard, is equally important. A good task processing system should include the following observability capabilities:

  • Logs: collect and display task logs, so that users can quickly query the logs of a given task.
  • Metrics: the system should expose key metrics, such as the number of queued tasks, to help users quickly assess how tasks are progressing.
  • Tracing: the time a task spends in each stage from submission to execution, such as time queued and actual execution time. The figure below shows the tracing capabilities of the Netflix Cosmos platform.

Task Distribution

Task distribution is responsible for scheduling and dispatching tasks. A production-grade task distribution system usually provides the following functions:

  • Reliable task distribution: once a task is submitted successfully, the system should guarantee that it is scheduled for execution, no matter what happens.
  • Scheduled/delayed task distribution: many task types should run at a specified time, such as sending emails or messages on a schedule, or generating periodic data reports. In other cases a task can tolerate a long delay; for example, a data analysis job submitted at the end of the workday only needs to finish by the next morning, so it can run in the early hours when resource demand is low, reducing cost by shifting work off peak.
  • Task deduplication: we never want tasks to run more than once. Besides wasting resources, duplicate execution can have serious consequences, for example a metering task that miscalculates a bill because it ran twice. Executing a task exactly once requires exactly-once behavior at every step of the chain, including submission, distribution, and execution, and in the user's own task handling code, whether a run succeeds or fails. Implementing full exactly-once semantics is complex and beyond the scope of this article. In many cases it is valuable for the system to offer a simplified guarantee: a task is successfully executed only once. For deduplication, the user specifies a task ID at submission time, and the system uses that ID to determine whether the task has already been submitted and executed successfully.
  • Task retry on error: a sensible retry strategy is critical to completing tasks efficiently and reliably. Retries should consider several factors: 1) they must match the processing capacity of the downstream execution system. For example, on receiving a flow-control error from downstream, or on detecting that task execution has become a bottleneck, retries should use exponential backoff; retrying must not add pressure and overwhelm the downstream. 2) The retry policy should be simple, clear, and easy for users to understand and configure. This starts with classifying errors into non-retryable errors, retryable errors, and flow-control errors. Non-retryable errors fail deterministically, so retrying is pointless; examples are invalid parameters and permission problems. Retryable errors are caused by transient factors, such as network timeouts or internal system errors, and will eventually succeed on retry. Flow-control errors are a special kind of retryable error that usually means the downstream is at full capacity; retries must use a backoff mode to limit the volume of requests sent downstream.
  • Task load balancing: task execution times vary widely, from hundreds of milliseconds to tens of hours, so distributing tasks in simple round-robin fashion leads to uneven load across execution nodes. A common pattern in practice is to place tasks in a queue and have execution nodes actively pull tasks according to their own current load, which naturally balances load across nodes. Task load balancing usually requires cooperation between the distribution and execution subsystems.
  • Priority-based task distribution: a task processing system usually serves many business scenarios with varying task types and priorities. Tasks tied to the core user experience deserve higher priority than peripheral ones. Even among notifications, a product-review notification for a Taobao buyer clearly matters less than a COVID-19 test-result notification during the pandemic. On the other hand, the system must maintain a degree of fairness, so that high-priority tasks do not permanently preempt resources and starve low-priority tasks.
  • Task flow control: the typical use is load smoothing ("peak shaving"). For example, a user submits hundreds of thousands of tasks at once and is happy for them to be processed gradually over a few hours; the system limits the distribution rate to match the downstream execution capacity. Flow control is also an important reliability mechanism: if submissions of certain task types suddenly explode, the system should contain their impact through flow control and shield other tasks.
  • Batch suspension and deletion of tasks: in a real production environment this capability is essential. Users run into all kinds of situations: if a problem appears during execution, it is best to suspend subsequent tasks, investigate manually, and resume once everything checks out; or low-priority tasks may be temporarily suspended to free computing resources for higher-priority ones. In other cases the submitted tasks are simply wrong and executing them is pointless, so the system should let users easily delete tasks that are running or queued. Suspension and deletion must be implemented jointly by the distribution and execution subsystems.
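The retry classification described above can be made concrete with a small sketch. This is an illustrative implementation, not the actual policy of any specific system; the exception class names and the `run_with_retry` helper are assumptions for the example.

```python
import random
import time

class NonRetryableError(Exception):
    """Deterministic failure (bad parameters, permissions): retrying is pointless."""

class RetryableError(Exception):
    """Transient failure (network timeout, internal error): retry may succeed."""

class FlowControlError(RetryableError):
    """Downstream is at full capacity: retry, but back off harder."""

def run_with_retry(task, max_attempts=5, base_delay=0.1, max_delay=10.0, sleep=time.sleep):
    """Execute `task` under the policy sketched in the text: fail fast on
    non-retryable errors, retry transient ones with exponential backoff
    plus jitter, and back off harder on flow-control errors."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except NonRetryableError:
            raise                      # deterministic failure: do not retry
        except RetryableError as err:
            if attempt == max_attempts:
                raise
            # Exponential backoff: delay doubles per attempt, capped at max_delay.
            delay = min(base_delay * (2 ** (attempt - 1)), max_delay)
            if isinstance(err, FlowControlError):
                delay = min(delay * 2, max_delay)   # drain downstream pressure
            sleep(delay + random.uniform(0, base_delay))  # jitter avoids retry storms
```

The jitter term is worth noting: without it, many tasks that failed together would retry together, re-creating the very burst that overloaded the downstream.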

The architecture of task distribution falls into pull mode and push mode. In pull mode, tasks are distributed through a task queue: the instances that execute tasks actively pull from the queue and fetch a new task once the current one is done. Push mode adds a dispatcher role: the dispatcher reads tasks from the task queue, schedules them, and pushes them to appropriate execution instances.
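A minimal pull-mode worker can be sketched as follows. An in-memory `queue.Queue` stands in for a real broker such as Redis (an assumption of this sketch); the key property is that a busy worker simply stops pulling, so back pressure is implicit.

```python
import queue
import threading

def pull_worker(task_queue, handle, stop):
    """Pull-mode worker loop: fetch a task, process it, fetch the next one."""
    while not stop.is_set():
        try:
            task = task_queue.get(timeout=0.1)   # blocking pull from the queue
        except queue.Empty:
            continue                             # nothing to do; poll again
        handle(task)
        task_queue.task_done()

# Stand-in broker and a single worker thread.
q = queue.Queue()
results = []
stop = threading.Event()
worker = threading.Thread(target=pull_worker, args=(q, results.append, stop))
worker.start()
for i in range(5):
    q.put(i)
q.join()          # wait until every task has been processed
stop.set()
worker.join()
```

Scaling out in this model means starting more such workers against the same queue, which is exactly where the connection-resource pressure discussed below comes from.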

The pull mode has a clear structure, and a task distribution system can be built quickly on popular software such as Redis; it performs well in simple task scenarios. But to support what complex business scenarios require, such as task deduplication, task priorities, batch suspension or deletion, and elastic resource scaling, the implementation complexity of pull mode grows quickly. In practice, pull mode faces the following main challenges:

  • Resource auto-scaling and load balancing are complex. Each execution instance holds a connection to the task queue while pulling tasks; at large scale this puts heavy pressure on the queue's connection resources, so a layer of mapping and assignment is needed in which instances connect only to their assigned queues. The figure below shows the architecture of Slack's asynchronous task processing system, where worker nodes connect only to a subset of the Redis instances. This lets worker nodes scale massively, but increases the complexity of scheduling and load balancing.
  • Supporting task priorities, isolation, and flow control argues for using separate queues, but too many queues drive up management overhead and connection resource consumption. Balancing these concerns is challenging.
  • Functions such as task deduplication and batch suspension or deletion depend on the message queue's capabilities, but few messaging products satisfy all these requirements, so teams often have to build them in-house. For example, for scalability reasons it is usually impossible to give every task type its own queue; once a queue holds multiple task types, suspending or deleting one type in bulk becomes much more complicated.
  • The task types in a queue are coupled to the task processing logic: if a queue contains multiple task types, the processing logic must handle each of them, which is not user-friendly. In practice, user A's handler should not receive other users' tasks, so the task queue is usually managed by the user, further increasing the user's burden.

The core idea of push mode is to decouple task queues from task execution instances, giving a clearer boundary between the platform and the user: users focus only on implementing the task processing logic, while the platform manages the task queues and the pool of execution nodes. Decoupling also means the scaling of execution nodes is no longer limited by the queue's connection resources, enabling greater elasticity. But push mode introduces its own complexity: priority management, load balancing, scheduling and dispatching, and flow control are all handled by the dispatcher, which must coordinate with upstream and downstream systems.

In general, once task scenarios become complex, the system is non-trivial in either mode. But push mode gives a clearer platform/user boundary and is simpler for users to adopt, so teams with strong engineering capability usually choose push mode when building a platform-level task processing system.

Task Execution

The task execution subsystem manages a fleet of worker nodes and executes tasks elastically and reliably. A typical task execution subsystem needs the following functions:

  • Reliable execution. Once a task is submitted successfully, the system should ensure it is executed no matter what; for example, if the node executing it goes down, the task should be rescheduled onto another node. Reliable execution is usually achieved by the distribution and execution subsystems working together.
  • Shared resource pool. Different task types share a unified resource pool, which smooths load peaks, improves resource utilization, and reduces cost. For example, scheduling compute-intensive and IO-intensive tasks onto the same worker node makes fuller use of the node's CPU, memory, and network. A shared pool, in turn, demands stronger capacity management, task resource quotas, priority management, and resource isolation.
  • Elastic resource scaling. The system scales execution-node resources with load to reduce cost; the timing and amount of scaling are critical. Scaling on the CPU and memory utilization of execution nodes is common but slow, and cannot satisfy scenarios with tight real-time requirements, so many systems also scale on metrics such as the number of queued tasks. Scaling must also match the capabilities of upstream and downstream systems: for example, when the distribution subsystem uses queues, the number of worker nodes should match the queues' connection capacity.
  • Task resource isolation. When multiple tasks run on the same worker node, their resources are isolated from one another, usually via container isolation mechanisms.
  • Task resource quotas. Usage scenarios are diverse and often mix task types and priorities. The system should let users set resource quotas per priority of task or processing function: reserving resources for high-priority tasks, or capping the resources available to low-priority ones.
  • Simple task-processing code. A good task processing system lets users focus on implementing the logic for a single task, while the system itself runs tasks in parallel, elastically, and reliably.
  • Smooth upgrades. Upgrades to the underlying system do not interrupt the execution of long-running tasks.
  • Execution result notification. The system notifies users of task status and results in real time. For failed tasks, the task input is preserved in a dead-letter queue so users can retry manually at any time.
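The dead-letter behavior in the last point can be sketched in a few lines. This is a simplified illustration, not any platform's actual implementation; the `execute_with_dlq` helper and its signature are assumptions for the example.

```python
from collections import deque

def execute_with_dlq(task_input, handler, dead_letters, max_attempts=3):
    """Try `handler` up to `max_attempts` times; on exhaustion, park the
    original task input (plus the failure reason) in a dead-letter queue
    so it can be inspected and retried manually later."""
    last_err = None
    for _ in range(max_attempts):
        try:
            return ("ok", handler(task_input))
        except Exception as err:
            last_err = err
    dead_letters.append((task_input, repr(last_err)))  # input preserved for manual retry
    return ("dead-lettered", None)

dlq = deque()
status_ok = execute_with_dlq(2, lambda x: x * x, dlq)     # succeeds normally
status_bad = execute_with_dlq(1, lambda x: x / 0, dlq)    # always fails -> DLQ
```

The essential design point is that the DLQ stores the task *input*, not a partial result, so a manual retry is indistinguishable from a fresh submission.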

The task execution subsystem usually uses a container cluster managed by Kubernetes as its resource pool. K8s manages the nodes and schedules the container instances that execute tasks onto appropriate nodes. Its built-in support for Jobs and CronJobs lowers the barrier to running job workloads, and it helps implement shared resource pools, task resource isolation, and related functions. However, K8s's main strength remains Pod/instance management; in many cases more functionality must be built on top to meet the needs of asynchronous task scenarios. For example:

  • K8s's HPA is generally insufficient for autoscaling in task scenarios. Open source projects such as KEDA provide models that scale on metrics like the number of queued tasks; AWS offers a similar solution in conjunction with CloudWatch.
  • K8s generally needs to be paired with queues to implement asynchronous tasks, and managing those queue resources is left to the user.
  • K8s's native job scheduling and startup are relatively slow, and job submission throughput is generally below 200 TPS, so it is not suitable for high-throughput, low-latency task scenarios.

Note: a K8s Job differs somewhat from the tasks discussed in this article. A K8s Job usually processes one or more tasks, whereas a task here is an atomic concept: a single task runs on exactly one instance, with execution times ranging from tens of milliseconds to hours.

Practice of a Large-Scale Multi-Tenant Asynchronous Task Processing System

Next, using the asynchronous task processing system of Alibaba Cloud Function Compute as an example, I discuss the technical challenges of large-scale multi-tenant asynchronous task processing systems and strategies for meeting them. On the Function Compute platform, users only need to create a task processing function and submit tasks; the entire asynchronous handling is elastic and highly available, with full observability. In practice, we adopted a variety of strategies to achieve isolation, scaling, load balancing, and flow control in a multi-tenant environment, and to smoothly handle the highly dynamic loads of a massive user base.

Dynamic queue resource scaling and traffic routing

As mentioned earlier, asynchronous task systems usually rely on queues for task distribution. When the platform serves many business parties, allocating separate queue resources to every application/function, let alone every user, is no longer feasible: most applications are long-tail and rarely invoked, so this would create a huge number of queues and waste connection resources, and polling a large number of queues also hurts the system's scalability.

However, if all users share the same set of queues, we face the classic multi-tenant "noisy neighbor" problem: a burst of load from application A can saturate the queue's processing capacity and affect other applications.

In practice, Function Compute builds a dynamic pool of queue resources. Initially the pool holds a preset set of queues, and applications are hash-mapped onto them. When some applications' traffic grows rapidly, the system applies several strategies:

  • If an application's traffic stays high and its queues back up, the system automatically creates dedicated queues for it and redirects its traffic there.
  • Migrate the traffic of latency-sensitive or high-priority applications to other queues, so they are not stuck behind backlogs caused by high-traffic applications.
  • Let users set an expiration time on tasks. For latency-sensitive workloads, expired tasks are quickly discarded when a backlog forms, so new tasks are processed sooner.
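The hash mapping plus "promote a hot application to its own queue" behavior can be sketched as follows. The `QueueRouter` class, the queue naming, and the promotion trigger are illustrative assumptions, not Function Compute's actual implementation.

```python
import hashlib

class QueueRouter:
    """Applications hash onto a shared pool of queues; a 'hot' application
    can be promoted to a dedicated queue so its burst traffic stops
    crowding out its neighbours."""

    def __init__(self, shared_queues=8):
        self.shared_queues = shared_queues
        self.dedicated = {}                      # app -> dedicated queue name

    def route(self, app):
        if app in self.dedicated:
            return self.dedicated[app]
        # Stable hash so an app always lands on the same shared queue.
        h = int(hashlib.md5(app.encode()).hexdigest(), 16)
        return f"shared-{h % self.shared_queues}"

    def promote(self, app):
        """Called when `app` keeps its shared queue backlogged."""
        self.dedicated[app] = f"dedicated-{app}"
        return self.dedicated[app]
```

A real system would also need to drain in-flight tasks from the old queue during promotion; that migration step is omitted here for brevity.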

Random Load Sharding

In a multi-tenant environment, preventing a "disruptor" from causing catastrophic damage is the biggest challenge in system design. The disruptor might be a user under DDoS attack, or, in some corner case, a workload that happens to trigger a system bug. The figure below shows a very popular architecture in which all user traffic is spread evenly across multiple servers in round-robin fashion. When every user's traffic behaves as expected, the system works well: each server is evenly loaded, and the loss of a few servers does not affect overall availability. But once a disruptor appears, system availability is at serious risk.

As shown in the figure below, suppose the red user is under DDoS attack, or some of its requests trigger a bug that crashes servers; its load can then overwhelm every server and make the entire service unavailable.

The essence of the problem is that any user's traffic can be routed to every server; a model with no load isolation is extremely vulnerable to disruptors. What if each user's load could only ever reach a subset of the servers? As the figure below shows, if any user's traffic is routed to at most two servers, then even if those two servers go down, the green user's requests are unaffected. This load sharding pattern, which maps each user's load to some but not all of the servers, provides good load isolation and reduces the risk of service unavailability; the price is that the system needs more redundant resources.

Next, let's adjust how user load is mapped. As shown in the figure below, each user's load is mapped evenly onto two servers. Not only is the load better balanced; even if both of those servers go down, every user other than red is unaffected. With a partition size of 2, there are C(3,2) = 3 ways to choose 2 servers out of 3, i.e. 3 possible partitions, and a random algorithm maps load evenly across them. If any one partition becomes unserviceable, at most 1/3 of the load is affected. Now suppose we have 100 servers with the partition size still 2: there are C(100,2) = 4950 partitions, and the unavailability of a single partition affects only 1/4950 ≈ 0.02% of the load. As the number of servers grows, the effect of random partitioning becomes even more pronounced. Random load sharding is a very simple but powerful pattern that plays a key role in keeping multi-tenant systems available.
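The combinatorics above are easy to verify, and the per-user partition assignment can be sketched with a deterministic random choice. The `shard_for` helper and its seeding scheme are assumptions for this illustration.

```python
import itertools
import random

def shard_for(user, servers, shard_size=2, seed=0):
    """Deterministically map `user` onto one of the C(n, k) possible shards
    (subsets of servers); a broken shard then affects only its own users."""
    shards = list(itertools.combinations(sorted(servers), shard_size))
    picker = random.Random(f"{seed}:{user}")    # stable per-user random choice
    return picker.choice(shards)

servers = [f"srv-{i:02d}" for i in range(100)]
# C(100, 2) = 4950 possible partitions of size 2.
num_shards = len(list(itertools.combinations(servers, 2)))
```

With 100 servers and shard size 2, `num_shards` is 4950, so a single bad shard touches roughly 1/4950 ≈ 0.02% of users, matching the figure in the text.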

Adaptive task distribution for downstream processing capabilities

Function Compute's task distribution uses the push mode, so users need only focus on developing the task processing logic, and the boundary between platform and user is clear. Push mode introduces a task dispatcher, responsible for pulling tasks from the task queue and assigning them to downstream task processing instances. The dispatcher should adapt its dispatch rate to the downstream processing capacity: when the user's queue is backlogged, we want the dispatch worker pool's capacity to keep growing; when the downstream processing ceiling is reached, the pool must sense this state and hold a relatively stable dispatch rate; and when the backlog drains, the pool should shrink and release dispatch capacity to other task processing functions.


In practice, we borrow from TCP congestion control and use the AIMD algorithm (Additive Increase, Multiplicative Decrease) to grow and shrink the worker pool. When a user submits a large number of tasks in a short time, the dispatcher does not immediately flood the downstream; following the "additive increase" strategy, it raises the dispatch rate linearly to avoid shocking downstream services. When flow-control errors arrive from the downstream service, the "multiplicative decrease" strategy shrinks the worker pool by a fixed proportion. Shrinking is triggered only when the error rate and error count cross thresholds, which keeps the pool from oscillating between expansion and contraction.
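The AIMD sizing logic just described can be sketched as a small state machine. The class name, the specific constants, and the thresholding are illustrative assumptions, not Function Compute's actual parameters.

```python
class AimdPool:
    """AIMD sizing of a dispatch worker pool: grow linearly while the
    downstream keeps up, shrink multiplicatively on sustained flow control."""

    def __init__(self, min_size=1, max_size=100, decrease_factor=0.5,
                 error_threshold=3):
        self.size = min_size
        self.min_size = min_size
        self.max_size = max_size
        self.decrease_factor = decrease_factor
        self.error_threshold = error_threshold
        self.flow_control_errors = 0

    def on_dispatch_ok(self):
        """Additive increase: one more worker per successful dispatch round."""
        self.flow_control_errors = 0
        self.size = min(self.size + 1, self.max_size)

    def on_flow_control_error(self):
        """Multiplicative decrease, but only after repeated errors, so a
        single noisy error does not trigger a shrink."""
        self.flow_control_errors += 1
        if self.flow_control_errors >= self.error_threshold:
            self.size = max(int(self.size * self.decrease_factor), self.min_size)
            self.flow_control_errors = 0
```

As with TCP, the asymmetry (slow growth, fast shrink) is the point: the pool probes for capacity gently but retreats quickly when the downstream signals overload.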

Send back pressure to upstream task producers

If task processing capacity lags behind task production for a long time, the backlog in the queues keeps growing. Multiple queues and traffic routing reduce interference between tenants, but once the backlog exceeds a threshold, the pressure should be fed back more actively to the upstream task producers, for example by applying flow control to task submission requests. In multi-tenant, shared-resource scenarios, implementing back pressure is more challenging. Suppose applications A and B share the task distribution system's resources and A has a task backlog; the back pressure should be:

  • Fair. Throttle application A rather than B whenever possible. Flow control is at heart a probability problem: the system computes a throttling probability for each object, and the more accurate the probability, the fairer the flow control.
  • Timely. Back pressure should be propagated to the outermost layer of the system, for example applying flow control to A at task submission time, so the impact on the system is minimized.

Identifying which objects need flow control in a multi-tenant scenario is challenging. In practice we borrowed the Sample-and-Hold algorithm and achieved good results; interested readers can consult the related papers.

Capability Layering of Asynchronous Task Processing System

Based on the preceding analysis of architecture and functionality, we divide the capabilities of asynchronous task processing systems into the following three levels:

  • Level 1: typically requires a team of 1-5 engineers. The system is assembled by integrating open source software and cloud services such as K8s and message queues. Its capabilities are limited by the underlying software/services and are hard to customize to business needs. Resource usage is static, with no resource scaling or load balancing. The business scale it can carry is limited, and as scale and complexity grow, the cost of developing and maintaining the system rises quickly.
  • Level 2: typically requires a team of 5-10 engineers. On top of open source software and cloud services, the team has some in-house development capability and can meet common business needs. It lacks complete task priority, isolation, and flow control capabilities, usually configuring separate queues and computing resources per business party. Resource management is coarse, without real-time scaling or capacity management. The system lacks the scalability and fine-grained resource management needed to support large, complex business scenarios.
  • Level 3: typically requires a team of 10+ engineers to build a platform-level system able to support large-scale, complex business scenarios. It uses a shared resource pool, with complete task scheduling, isolation, flow control, load balancing, and resource scaling. The boundary between platform and users is clear; business teams only need to focus on task processing logic. It has full observability.
| Capability | Level 1 | Level 2 | Level 3 |
| --- | --- | --- | --- |
| Reliable task delivery | Supported | Supported | Supported |
| Scheduled/delayed task submission | Depends on the chosen message queue; scheduled tasks are generally supported, delayed tasks are not | Supported | Supported |
| Task deduplication | Not supported | Supported | Supported |
| Automatic retry on task errors | Limited. Generally relies on the built-in retry strategy of K8s Jobs; for tasks not using K8s Jobs, users must implement retries in the task-processing logic | Limited. Generally relies on the built-in retry strategy of K8s Jobs; for tasks not using K8s Jobs, users must implement retries in the task-processing logic | Supported. Clear boundary between platform and user; the platform retries according to the policy set by the user |
| Task load balancing | Limited. Implemented through message queues when the fleet of task-execution instances is small | Limited. Implemented through message queues when the fleet of task-execution instances is small | Supported. The system load-balances across large fleets of nodes |
| Task priority | Not supported | Limited. Users can reserve resources for high-priority tasks or cap resource usage for low-priority tasks | Supported. High-priority tasks can preempt resources from low-priority tasks, with fairness taken into account so low-priority tasks are not starved |
| Task flow control | Not supported | Not supported. Generally, independent queues and compute resources are configured per task type or business party | Supported. Flow control at every stage of the system prevents bursts of task submissions from causing avalanches |
| Batch suspend/delete of tasks | Not supported | Limited. Depends on whether separate queues and compute resources are configured per task type or business party | Supported |
| Shared resource pool | Limited. Depends on K8s scheduling; generally, separate clusters are built per business party | Limited. Depends on K8s scheduling; generally, separate clusters are built per business party | Supported. Different task types and business scenarios share the same resource pool |
| Elastic resource scaling | Not supported. K8s HPA is usually insufficient for scaling in task scenarios | Not supported. K8s HPA is usually insufficient for scaling in task scenarios | Supported. Scales in real time by queued-task count, node resource utilization, and other signals |
| Task resource isolation | Supported. Relies on container resource isolation | Supported. Relies on container resource isolation | Supported. Relies on container resource isolation |
| Task resource quotas | Not supported | Supported | Supported |
| Simplified coding of task-processing logic | Not supported. Task-processing logic must pull and execute tasks itself | Not supported. Task-processing logic must pull and execute tasks itself | Supported |
| Smooth system upgrades | Not supported | Not supported | Supported |
| Execution result notification | Not supported | Not supported | Supported |
| Observability | Relies on the observability of K8s, message queues, and other underlying software; basic task-status queries only | Relies on the observability of K8s, message queues, and other underlying software; basic task-status queries only | Full observability from the task level to the system level |
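The Level 3 distinction of retrying "according to the policy set by the user" means retry behavior is declared as data that the platform enforces, rather than hand-coded inside each task. A minimal sketch of that separation, in Python with purely illustrative names (this is not an actual platform API):

```python
import time

class RetryPolicy:
    """A user-declared retry policy; the platform, not the task code, enforces it."""
    def __init__(self, max_attempts=3, initial_backoff=0.1, multiplier=2.0):
        self.max_attempts = max_attempts
        self.initial_backoff = initial_backoff
        self.multiplier = multiplier

def run_with_policy(task, policy):
    """Execute `task` under `policy`, backing off exponentially between attempts."""
    backoff = policy.initial_backoff
    for attempt in range(1, policy.max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == policy.max_attempts:
                raise  # attempts exhausted: surface the failure (e.g. to a dead-letter path)
            time.sleep(backoff)       # wait before the next attempt
            backoff *= policy.multiplier
```

With this split, the task-processing function stays free of retry boilerplate, and operators can tune `max_attempts` or backoff per business party without touching task code.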

Conclusion

Asynchronous tasks are an important means of building resilient, highly available, and responsive applications. This article introduced the applicable scenarios and benefits of asynchronous tasks, and discussed the architecture, capabilities, and engineering practices of typical asynchronous task systems. Building a flexible, scalable asynchronous task processing platform that meets the needs of diverse business scenarios is highly complex. Alibaba Cloud Function Compute (FC) provides an out-of-the-box asynchronous task processing service with capabilities close to Level 3. Users only need to create a task-processing function and submit tasks through the console, command-line tools, API/SDK, or event triggers, and can then process tasks in an elastic, reliable, and observable manner. Function Compute asynchronous tasks cover processing times ranging from milliseconds to 24 hours, and are widely used inside and outside Alibaba by customers such as Alibaba Cloud Database Autonomy Service (DAS), the Alipay mini-program stress-testing platform, NetEase Cloud Music, New Oriental, Focus Media, and Milian.

Appendix

  1. A comparison of the capabilities of Function Compute asynchronous tasks and K8s Jobs:

| Aspect | Function Compute asynchronous tasks | K8s Jobs |
| --- | --- | --- |
| Applicable scenarios | Covers both real-time tasks running for tens of milliseconds and offline tasks running for tens of hours | Suited to offline tasks with low submission-rate requirements, relatively fixed load, and low real-time requirements |
| Task observability | Supported. Provides logs, task-queue metrics, per-stage task latency, task status queries, and other rich observability capabilities | Implemented by integrating open-source software yourself |
| Automatic scaling of task instances | Supported. Instances scale up and down automatically based on queued-task count and instance resource utilization | Not supported. Generally implemented via task queues, with high complexity |
| Instance scaling speed | Milliseconds | Minutes |
| Instance resource utilization | Users only choose an instance size; instances scale automatically and are billed by actual task processing time, so utilization is high | Instance size and count must be fixed at job submission; instances are hard to auto-scale and load-balance, so utilization is low |
| Task submission rate | A single user can submit tens of thousands of tasks per second | The entire cluster can start at most a few hundred Jobs per second |
| Scheduled/delayed task submission | Supported | Supports scheduled tasks, but not delayed tasks |
| Task deduplication | Supported | Not supported |
| Pause/resume task execution | Supported | Alpha feature (K8s v1.21) |
| Terminate a specified task | Supported | Limited. Done indirectly by terminating the task's instance |
| Task flow control | Supported. Flow control can be applied at different granularities, such as per user or per task-processing function | Not supported |
| Automatic callback of task results | Supported | Not supported |
| DevOps cost | Only the task-processing logic needs to be implemented | A K8s cluster must be maintained |
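The table notes that instances scale "based on queued-task count and instance resource utilization". A minimal sketch of the queue-length half of such a heuristic, in Python with illustrative names and thresholds (not the actual Function Compute algorithm), assuming a target throughput per instance and a quota range:

```python
import math

def desired_instances(queued_tasks, tasks_per_instance_per_interval,
                      min_instances=0, max_instances=100):
    """Scale the worker fleet to the task backlog, clamped to a quota range.

    queued_tasks: current number of tasks waiting in the queue.
    tasks_per_instance_per_interval: how many tasks one instance can
        drain per scaling interval (an assumed, measured constant).
    """
    if tasks_per_instance_per_interval <= 0:
        raise ValueError("per-instance throughput must be positive")
    # Instances needed to drain the backlog within one interval.
    need = math.ceil(queued_tasks / tasks_per_instance_per_interval)
    # Clamp to the user's resource quota.
    return max(min_instances, min(need, max_instances))
```

For example, `desired_instances(250, 10)` yields 25 instances, while `desired_instances(5000, 10)` is capped at the quota of 100. A real platform would combine this with utilization signals and smoothing to avoid oscillation.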
  2. NetEase Cloud Music's Serverless Jobs practice: audio-processing algorithms reach production 10x faster.
  3. Other asynchronous task case studies.


For more content, follow the Serverless WeChat official account (ID: serverlessdevs), which gathers comprehensive serverless technical content and regularly hosts serverless events, live streams, and user best practices.
