Under a serverless architecture, although we can focus more on our business code, we still need to pay attention to configuration and cost, and, where necessary, optimize both the configuration and the code of our serverless applications.

Resource assessment remains important

Although the serverless architecture is pay-as-you-go, that does not mean it is necessarily cheaper than renting a traditional server. If we do not evaluate our project accurately and configure some parameters unreasonably, the cost of the serverless architecture can be enormous.

In general, the charges on a FaaS platform are directly related to three factors:

the configured memory specification;
the execution time of the program;
and the traffic charges incurred.

Usually, the time consumed by the program is related to the memory specification and to the business logic the program handles, while the traffic cost depends on the size of the data packets exchanged between the program and the client. Of these three factors, the memory specification is the one most likely to cause a large billing deviation through careless configuration. Taking Alibaba Cloud Function Compute as an example, assume a Hello World program that is executed 10,000 times a day; the costs incurred by instances of different specifications (excluding network costs) are as follows:

[Table: Alibaba Cloud Function Compute cost by memory specification]
As the table shows, when the program can run normally with 128MB of memory, mistakenly setting the memory specification to 3072MB can increase the monthly cost by roughly 25 times. Therefore, before launching a serverless application, we need to evaluate its resource usage in order to choose a more reasonable configuration and further reduce cost.
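The effect of the memory specification on cost can be sketched with a simple GB-second billing model. The unit price below is illustrative, not Alibaba Cloud's actual rate; check the vendor's pricing page for real numbers.

```python
# Rough cost comparison for different memory specifications, assuming a
# simple GB-second billing model (excluding request and traffic fees).
PRICE_PER_GB_SECOND = 0.0001  # hypothetical unit price, not a real rate

def monthly_compute_cost(memory_mb, duration_s, invocations_per_day, days=30):
    """Estimate monthly compute cost from memory size and execution time."""
    gb_seconds = (memory_mb / 1024) * duration_s * invocations_per_day * days
    return gb_seconds * PRICE_PER_GB_SECOND

small = monthly_compute_cost(memory_mb=128, duration_s=0.1, invocations_per_day=10000)
large = monthly_compute_cost(memory_mb=3072, duration_s=0.1, invocations_per_day=10000)
print(f"128MB:  {small:.4f}")
print(f"3072MB: {large:.4f}")
print(f"ratio:  {large / small:.0f}x")  # 3072/128 = 24x on compute alone
```

Since cost scales linearly with the memory specification in this model, the 3072MB configuration costs 24 times as much on compute alone; with per-request fees on top, the gap reported in the table can reach about 25 times.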

Reasonable code package size

Every cloud vendor's FaaS platform limits the size of the code package. Setting those vendor limits aside, the impact of the code package size can be seen in the function's cold start process:

[Figure: function cold start process]
Starting a function involves a code-loading step. When the uploaded code package is too large, or decompression is slow because it contains too many files, code loading takes longer, which in turn directly lengthens the cold start.

Imagine two code packages, one of only 100KB and one of 200MB, both downloaded at the same time over an idealized gigabit intranet connection (that is, ignoring disk storage speed and so on). Even at the maximum speed of 125MB/s, the former downloads in well under 0.01s, while the latter takes 1.6s. Adding file decompression time on top of the download, the cold start times of the two may differ by about 2s.
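The back-of-the-envelope comparison above can be reproduced directly:

```python
# Idealized cold-start download time at gigabit intranet bandwidth
# (~125 MB/s), ignoring disk speed and decompression overhead.
BANDWIDTH_MB_PER_S = 125

def download_time_s(package_size_mb):
    """Seconds needed to transfer a code package of the given size."""
    return package_size_mb / BANDWIDTH_MB_PER_S

print(f"100KB package: {download_time_s(0.1):.4f}s")  # well under 0.01s
print(f"200MB package: {download_time_s(200):.2f}s")  # 1.6s
```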

Under normal circumstances, a traditional web interface that takes more than 2s to respond is unacceptable for many businesses, so when packaging code we must reduce the package size as much as possible. Taking a Node.js project as an example, tools such as Webpack can be used when packaging to shrink the dependencies, reduce the overall code package size, and improve the function's cold start efficiency.

Reasonable use of instance reuse

On the FaaS platforms of the various cloud vendors, instances are reused in order to mitigate cold starts and use resources more efficiently. Instance reuse means that when an instance finishes a request it is not released but instead enters a "silent" state. If a new request arrives within a certain time window, it is dispatched to this instance and the corresponding method is invoked directly, without re-initializing resources, which greatly reduces function cold starts. To verify this, we can create two functions:
Function 1:

# -*- coding: utf-8 -*-

def handler(event, context):
  print("Test")
  return 'hello world'

Function 2:

# -*- coding: utf-8 -*-

print("Test")

def handler(event, context):
  return 'hello world'

We click the "Test" button on the console several times to run both functions, check whether each invocation prints "Test" in the log, and record the results:


[Table: log output of Function 1 and Function 2 across repeated invocations]
The results show that instance reuse does occur: "Function 2" does not execute the statements outside the entry function on every invocation. This suggests a further thought: if the print("Test") statement instead initialized a database connection or loaded a deep learning model, the "Function 1" style would execute it on every request, while the "Function 2" style reuses the existing objects.

So in an actual project, some initialization operations can be implemented in the style of "Function 2", for example:

  • In machine learning scenarios, load the model during initialization to avoid reloading it every time the function is triggered, improving response efficiency when instances are reused;
  • For connection-oriented resources such as databases, establish the connection object during initialization instead of creating one for each request;
  • For other scenarios that need to download or load files on first use, perform these operations during initialization so that instance reuse is more efficient.
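The pattern behind these examples can be sketched as follows. The connect_database and load_model functions below are placeholders, not a real database driver or model framework.

```python
# -*- coding: utf-8 -*-
# Sketch of the "Function 2" pattern: expensive initialization happens at
# module load time, so reused ("warm") instances skip it entirely.

def connect_database():
    # placeholder for creating a real connection pool
    return {"connected": True}

def load_model():
    # placeholder for deserializing a deep learning model from disk
    return {"model": "ready"}

# Runs once per instance (on cold start), not once per request.
db = connect_database()
model = load_model()

def handler(event, context):
    # Each request reuses the already initialized objects.
    return {"db_connected": db["connected"], "model": model["model"]}
```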

Make good use of platform features

The FaaS platforms of the various cloud vendors have some "platform features": capabilities that are not necessarily specified or described in the "CNCF WG-Serverless Whitepaper v1.0", but that a cloud platform has identified and implemented from the user's point of view, based on its own business development and demands. They may exist on only one or a few cloud platforms. Used properly, these features can bring a qualitative improvement to our business performance.

1. Pre-freeze & Pre-stop

Taking Alibaba Cloud Function Compute as an example, the platform team identified the following user pain points (especially around smoothly migrating traditional applications to a serverless architecture):

  • Delay or loss of asynchronously sent metrics: if metrics are not sent successfully during a request, they may be delayed until the next request, or data points may be dropped.
  • Synchronous sending of metrics increases latency: calling a Flush-like interface at the end of each request not only increases the latency of every request, but also creates unnecessary pressure on backend services.
  • Graceful function offline: when an instance shuts down, the application needs to clean up connections, close processes, and report status. In Function Compute, developers cannot know when an instance will go offline, and there is no webhook to notify them of instance-offline events.

Runtime extensions were released to address these pain points. This feature extends the existing HTTP server programming model by adding PreFreeze and PreStop webhooks. The extension developer implements an HTTP handler to listen for function instance lifecycle events, as shown in the following figure:

[Figure: runtime extension lifecycle events]

  • PreFreeze: each time Function Compute decides to freeze the current function instance, it calls HTTP GET /pre-freeze. The extension developer implements the corresponding logic to perform the operations needed before the instance is frozen, such as waiting for metrics to be sent successfully. The InvokeFunction call duration does not include the execution time of the PreFreeze hook.


[Figure: PreFreeze hook timing]

  • PreStop: each time before Function Compute decides to stop the current function instance, it calls HTTP GET /pre-stop. The extension developer implements the corresponding logic to perform the operations needed before the instance is released, such as closing database connections, reporting, and updating state.


[Figure: PreStop hook timing]
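A minimal sketch of an extension handler for these hooks follows. The /pre-freeze and /pre-stop paths come from the description above; how the extension process is registered with Function Compute is vendor-specific and omitted, and the two cleanup functions are placeholders.

```python
# Minimal sketch of a runtime-extension HTTP handler for the PreFreeze
# and PreStop lifecycle hooks described above.
from http.server import BaseHTTPRequestHandler, HTTPServer

def flush_metrics():
    # placeholder: wait for buffered metrics to be sent successfully
    print("flushing metrics before freeze")

def close_connections():
    # placeholder: close database connections, report and update state
    print("closing connections before stop")

# Map the lifecycle paths to the cleanup logic they should trigger.
HOOKS = {"/pre-freeze": flush_metrics, "/pre-stop": close_connections}

class LifecycleHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        hook = HOOKS.get(self.path)
        if hook is None:
            self.send_response(404)
        else:
            hook()  # run the cleanup before acknowledging the platform
            self.send_response(200)
        self.end_headers()

# To serve the hooks (blocks forever, so not executed here):
# HTTPServer(("0.0.0.0", 9000), LifecycleHandler).serve_forever()
```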

2. Single instance with multiple concurrency

As we all know, the function compute offerings of the various vendors usually isolate at the request level: when a client initiates three requests at the same time, three instances are in principle spawned to handle them, which can involve cold starts and issues of state shared between requests. However, some cloud vendors (such as Alibaba Cloud Function Compute) provide single-instance multi-concurrency, which lets users set an instance concurrency (InstanceConcurrency) for a function, that is, how many requests a single function instance can process at the same time.

As shown in the figure below, assume 3 requests need to be processed at the same time. When the instance concurrency is set to 1, Function Compute must create 3 instances, each handling one request; when the instance concurrency is set to 10 (that is, one instance can handle 10 requests at the same time), Function Compute only needs to create 1 instance to handle all 3 requests.


Single instance multi-concurrency effect diagram

The advantages of single instance multiple concurrency are as follows:

  • Reduced execution time and cost. For example, I/O-bound functions can process requests concurrently within one instance, reducing the number of instances and thus the overall execution time.
  • State can be shared between requests. Multiple requests can share the database connection pool within one instance, thereby reducing the number of connections to the database.
  • Reduce cold start probability. Since multiple requests can be handled within one instance, fewer new instances are created and the probability of cold starts is reduced.
  • Reduced VPC IP usage. Under the same load, single-instance multi-concurrency lowers the total number of instances, thereby reducing VPC IP occupation.

Single-instance multi-concurrency applies to a fairly wide range of scenarios; for example, functions that spend much of their time waiting for responses from downstream services are a good fit. But it is not suitable for every scenario: when the function holds shared state that cannot be accessed concurrently, or when a single request consumes large amounts of CPU and memory, single-instance multi-concurrency should not be used.
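The shared-state point can be illustrated with a sketch: with InstanceConcurrency > 1, several requests run in the same process, so module-level objects (like a connection pool) are shared, and mutable shared state must be protected. The pool object below is a stand-in, not a real database driver.

```python
# Sketch of the single-instance multi-concurrency pattern: concurrent
# requests in one instance share module-level state, so mutation needs a lock.
import threading

pool = {"connections": 10}      # shared by all requests in the instance
request_count = 0               # shared mutable state
count_lock = threading.Lock()   # guards request_count against races

def handler(event, context):
    global request_count
    with count_lock:            # without the lock, concurrent requests race
        request_count += 1
        current = request_count
    return {"pool_size": pool["connections"], "request_no": current}

# Simulate 3 requests arriving concurrently on one instance.
threads = [threading.Thread(target=handler, args=(None, None)) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(request_count)  # 3
```

All three simulated requests reuse the same pool object, which is exactly why single-instance multi-concurrency reduces the number of database connections, and exactly why state that cannot tolerate concurrent access makes this feature unsuitable.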


