Spring Boot Serverless Hands-On Series | Performance Tuning - Alibaba Cloud Native

Author: Xiliu | Alibaba Cloud Function Compute Expert

Introduction: Spring Boot is built on the Java Spring framework. It comes preconfigured with a set of Spring components, allowing developers to create stand-alone applications with minimal configuration. In a cloud-native environment, Spring Boot applications can run on many platforms, such as virtual machines and containers. The most attractive option of all, however, is to run Spring Boot applications in a serverless fashion.

In this series of articles, I analyze the advantages and disadvantages of running Spring Boot applications on a serverless platform from five aspects: architecture, deployment, monitoring, performance, and security. To make the analysis more representative, I chose the open-source e-commerce application mall, which has more than 50k stars on GitHub, as the example. This is the fourth article in the series, and it shows you how to tune the performance of serverless applications.

Instance startup speed optimization

The hands-on tutorial in the previous article showed how convenient serverless is: you can launch an elastic, highly available web application just by uploading a code package or container image.

However, the first request still suffers from cold start latency. A mall application instance takes about 30 seconds to start, so users experience a long cold start delay, and in an era that expects instant responses the application feels slow. Still, this flaw does not outweigh the benefits. (A cold start works as follows: when a function receives no requests for a period of time, the serverless platform reclaims its instances; when the next request arrives, the system has to start a new instance in real time. This process is called a cold start.)
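The reclaim-then-restart cycle described above can be modeled in a few lines. The following is a minimal Java sketch, not Function Compute's actual scheduler: the class name `IdleInstanceModel` and the five-minute idle timeout are hypothetical, chosen only to illustrate when a request lands on a cold instance.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical model of one on-demand function instance; not the FC scheduler. */
public class IdleInstanceModel {
    private final Duration idleTimeout; // assumed reclamation threshold (hypothetical)
    private Instant lastRequest;        // null means no instance is currently running

    public IdleInstanceModel(Duration idleTimeout) {
        this.idleTimeout = idleTimeout;
    }

    /** Handles a request arriving at `now`; returns true if it required a cold start. */
    public boolean handle(Instant now) {
        boolean coldStart = lastRequest == null
                || Duration.between(lastRequest, now).compareTo(idleTimeout) > 0;
        lastRequest = now; // the instance is (re)started or kept warm by this request
        return coldStart;
    }

    public static void main(String[] args) {
        IdleInstanceModel model = new IdleInstanceModel(Duration.ofMinutes(5));
        Instant t0 = Instant.parse("2024-01-01T00:00:00Z");
        System.out.println(model.handle(t0));                   // true: first request, cold start
        System.out.println(model.handle(t0.plusSeconds(60)));   // false: instance still warm
        System.out.println(model.handle(t0.plusSeconds(660)));  // true: idle for 10 minutes, reclaimed
    }
}
```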

Before optimizing the cold start, we first need to analyze how long each stage of the cold start takes.

First, enable tracing on the service configuration page of the Function Compute (FC) console.


Send a request to the mall-admin service. After it succeeds, open the FC console to view the corresponding request record. Make sure "View function errors only" is turned off so that all requests are displayed. Metrics and trace data are collected with some delay; if the request does not appear yet, wait a while and refresh. Find the request flagged as a cold start and click Request Details under More.


The trace shows how long each stage of the cold start takes. A cold start consists of the following steps:

Code preparation (PrepareCode): mainly downloading the code package or image. Because image acceleration is enabled, the entire image does not need to be downloaded, so this step is very short.
Runtime initialization (RuntimeInitialization): from the start of the function instance until the Function Compute (FC) system detects that the application port is ready. This includes the application startup time. Running s mall-admin logs on the command line shows the corresponding log timestamps, which confirm that starting the Spring Boot application takes a lot of time.
Application initialization (Initialization): Function Compute provides an Initializer interface, in which users can place initialization logic to run.
Invocation (Invocation): the time spent processing the request itself, which is very short.
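The RuntimeInitialization phase ends once the platform sees that the application port is ready. The sketch below illustrates that idea with a plain TCP probe that polls a port and reports how long it took to accept a connection. It assumes a simple connect-based check; FC's actual detection mechanism is not described in this article, and `PortReadiness` and `waitForPort` are hypothetical names.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.time.Duration;

/** Hypothetical TCP readiness probe; illustrates the idea, not FC's implementation. */
public class PortReadiness {

    /**
     * Polls host:port until a TCP connection succeeds or the timeout expires.
     * Returns the elapsed time until the port accepted a connection, or null on timeout.
     */
    public static Duration waitForPort(String host, int port, Duration timeout, Duration interval)
            throws InterruptedException {
        long start = System.nanoTime();
        long deadline = start + timeout.toNanos();
        while (System.nanoTime() < deadline) {
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress(host, port), (int) interval.toMillis());
                return Duration.ofNanos(System.nanoTime() - start); // port is accepting connections
            } catch (IOException notReadyYet) {
                Thread.sleep(interval.toMillis()); // application still starting; retry later
            }
        }
        return null; // never became ready within the timeout
    }
}
```

For example, assuming the application listens on port 8080, `waitForPort("127.0.0.1", 8080, Duration.ofSeconds(60), Duration.ofMillis(200))` would return roughly the application's startup time, which is the dominant part of RuntimeInitialization in the trace.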


The trace shows that instance startup time is the bottleneck, and there are several ways to optimize it.

Use Reserved Instances

Java applications generally start slowly. During initialization they also often need to interact with many external services, which takes a long time. These steps are required by the business logic, so their latency is hard to optimize away. For this reason, Function Compute provides reserved instances. The user controls when a reserved instance starts and stops, and it keeps running even when there are no requests, so there is no cold start. The trade-off is that the user pays for the instance for its entire running time, even if it processes no requests.

In the Function Compute console, we can set reserved instances for functions on the "Auto Scaling" page.


The user configures the minimum and maximum number of instances in the console. The platform keeps the minimum number of instances reserved, and the maximum is the upper limit on the number of instances for this function. Users can also set scheduled reservation rules and metric-based reservation rules.


Once a reservation rule is created, the reserved instances are created. When they are ready, accessing the function again no longer triggers a cold start.


Optimize instance startup speed

Lazy initialization

In Spring Boot 2.2 and higher, you can turn on a global lazy initialization flag. This speeds up startup, at the cost of potentially higher latency for the first request, which has to wait for the components it uses to be initialized.

Configure the following environment variable for the relevant application in s.yaml (it sets the Spring property spring.main.lazy-initialization):

SPRING_MAIN_LAZY_INITIALIZATION=true
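The trade-off of lazy initialization (faster startup, slower first use) can be illustrated without Spring by a memoizing holder: constructing it does no work, and the first caller pays the initialization cost. This is a stdlib-only sketch of the general technique; `LazyComponent` is a hypothetical name and is not how Spring implements lazy beans internally.

```java
import java.util.function.Supplier;

/** Thread-safe lazy holder: the factory runs once, on the first get(). */
public final class LazyComponent<T> implements Supplier<T> {
    private final Supplier<T> factory;
    private volatile T instance; // null until first access
    private int initCount;       // how many times the factory ran (guarded by this)

    public LazyComponent(Supplier<T> factory) {
        this.factory = factory; // nothing is created at "startup"
    }

    @Override
    public T get() {
        T result = instance;
        if (result == null) {
            synchronized (this) {
                result = instance;
                if (result == null) {
                    result = factory.get(); // expensive work happens on first use
                    initCount++;
                    instance = result;
                }
            }
        }
        return result;
    }

    public synchronized int initCount() {
        return initCount;
    }

    public static void main(String[] args) {
        LazyComponent<String> db = new LazyComponent<>(() -> {
            System.out.println("initializing");
            return "connection";
        });
        System.out.println("started");  // printed before "initializing"
        System.out.println(db.get());   // first use triggers initialization
        System.out.println(db.get());   // cached: no second "initializing"
    }
}
```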

Turn off the optimizing compiler

By default, the JVM uses multiple tiers of JIT compilation. Although the higher tiers gradually make the application run more efficiently, they also increase memory usage and startup time. For short-lived serverless applications, consider turning this optimization off, trading long-term efficiency for shorter startup times. Setting -XX:TieredStopAtLevel=1 stops compilation at tier 1 (the C1 compiler), so the C2 optimizing compiler is never used.

Configure the following environment variable for the relevant application in s.yaml:

JAVA_TOOL_OPTIONS="-XX:+TieredCompilation -XX:TieredStopAtLevel=1"
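To confirm from inside the application that the JVM actually picked up these options, you can query the effective VM flags at runtime. A minimal sketch, assuming a HotSpot-based JDK (the `com.sun.management.HotSpotDiagnosticMXBean` interface is HotSpot-specific); the class name `JitFlags` is hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

/** Prints the effective tiered-compilation flags of the running HotSpot JVM. */
public class JitFlags {

    /** Returns the current value of a HotSpot VM option, e.g. "TieredStopAtLevel". */
    public static String vmOption(String name) {
        HotSpotDiagnosticMXBean bean = ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption(name).getValue();
    }

    public static void main(String[] args) {
        // With -XX:TieredStopAtLevel=1 this reports 1; the HotSpot default is 4 (C2 enabled).
        System.out.println("TieredCompilation = " + vmOption("TieredCompilation"));
        System.out.println("TieredStopAtLevel = " + vmOption("TieredStopAtLevel"));
    }
}
```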

To set these environment variables, add them to the environment variable configuration of the mall-admin function in s.yaml, then run sudo -E s mall-admin deploy to deploy it.


Find the corresponding request in the request list on the function details page of the console, and click "Instance Details" under More.


On the instance details page, click "Login to Instance".


Run the echo command in the shell to check that the environment variables are set correctly, for example echo $JAVA_TOOL_OPTIONS and echo $SPRING_MAIN_LAZY_INITIALIZATION.

Note: For non-reserved instances, Function Compute automatically reclaims the instance after a period without requests. After that, you can no longer log in to it (the "Login to Instance" button on the instance details page is grayed out). So log in soon after invoking the function, before the instance is reclaimed.


Configure reasonable instance parameters

When choosing an instance size, such as 2C4G or 4C8G (2 vCPUs with 4 GB of memory, or 4 vCPUs with 8 GB), we want to know how many requests one instance should handle to make full use of its resources while maintaining performance. When the incoming requests exceed that limit, the system can quickly scale out new instances to keep the application performing smoothly.

Instance overload can be measured along multiple dimensions, such as QPS exceeding a threshold, or instance CPU, memory, network, or load metrics exceeding their thresholds. Function Compute uses instance concurrency as the measure of instance load and as the basis for scaling instances.

Instance concurrency refers to the number of requests that an instance can execute at the same time. For example, setting the instance concurrency to 20 means that an instance can execute a maximum of 20 requests at any time.
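Conceptually, an instance concurrency of 20 acts like a 20-permit gate in front of the request handler. The sketch below uses `java.util.concurrent.Semaphore` to illustrate the definition. `ConcurrencyGate` is a hypothetical illustration, not FC code: here the extra request is simply rejected, whereas the platform would instead dispatch it to another (possibly new) instance.

```java
import java.util.concurrent.Semaphore;

/** Hypothetical per-instance gate: at most `limit` requests execute at the same time. */
public class ConcurrencyGate {
    private final Semaphore permits;

    public ConcurrencyGate(int limit) {
        this.permits = new Semaphore(limit);
    }

    /** Tries to admit a request; false means the instance is already at its concurrency limit. */
    public boolean tryEnter() {
        return permits.tryAcquire();
    }

    /** Must be called when an admitted request finishes, freeing its slot. */
    public void exit() {
        permits.release();
    }

    public static void main(String[] args) {
        ConcurrencyGate gate = new ConcurrencyGate(20);
        int admitted = 0;
        for (int i = 0; i < 21; i++) {    // 21 requests arrive while none has finished
            if (gate.tryEnter()) {
                admitted++;
            }
        }
        System.out.println(admitted);         // 20: the 21st concurrent request is not admitted
        gate.exit();                          // one request completes
        System.out.println(gate.tryEnter());  // true: the freed slot can be reused
    }
}
```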

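Note: Instance concurrency is not the same as QPS. QPS counts how many requests are processed per second, while instance concurrency counts how many requests are in flight at the same moment.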
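The two quantities are linked by Little's law: average concurrency equals throughput multiplied by average latency. For example, 100 QPS at an average latency of 200 ms keeps about 20 requests in flight, while the same 100 QPS at 50 ms keeps only about 5. A small Java helper makes the arithmetic explicit; `LittlesLaw` is a hypothetical name.

```java
/** Little's law applied to request serving: concurrency = QPS x average latency. */
public class LittlesLaw {

    /** Average number of in-flight requests for a given throughput and mean latency. */
    public static double concurrency(double qps, double avgLatencyMillis) {
        return qps * avgLatencyMillis / 1000.0;
    }

    /** Throughput sustainable at a given concurrency and mean latency. */
    public static double maxQps(int concurrency, double avgLatencyMillis) {
        return concurrency * 1000.0 / avgLatencyMillis;
    }

    public static void main(String[] args) {
        System.out.println(concurrency(100, 200)); // 20.0
        System.out.println(concurrency(100, 50));  // 5.0
        System.out.println(maxQps(20, 200));       // 100.0
    }
}
```

This is why the same concurrency setting can correspond to very different QPS values, depending on how long each request takes.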

Using instance concurrency to measure load has the following advantages:

The system can compute the instance concurrency metric quickly and use it to scale out and in. Instance-level metrics such as CPU, memory, network, and load are usually aggregated in the background, and it takes tens of seconds before they can drive scaling, which is too slow for the elastic scaling needs of online applications.
Instance concurrency reflects the system load level consistently under different conditions. If request latency were used as the metric, the system could not easily tell whether latency rose because the instance was overloaded or because a downstream service became the bottleneck. For example, a typical web application accesses a MySQL database. If the database becomes the bottleneck and request latency increases, scaling out would not help at all; it would put even more pressure on the database and make the situation worse. QPS depends on request latency, so it has the same problem.
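A scaling decision driven by this metric can be approximated as follows: divide the current number of in-flight requests by the per-instance concurrency, round up, and clamp the result to the minimum and maximum instance counts configured on the Auto Scaling page. This is a simplified model under that assumption, not Function Compute's actual algorithm; `ConcurrencyScaling` and `targetInstances` are hypothetical names.

```java
/** Simplified, hypothetical model of concurrency-based scaling. */
public class ConcurrencyScaling {

    /**
     * Instances needed so that no instance exceeds its concurrency limit,
     * clamped to the configured [minInstances, maxInstances] range.
     */
    public static int targetInstances(int inFlight, int perInstanceConcurrency,
                                      int minInstances, int maxInstances) {
        if (perInstanceConcurrency <= 0 || minInstances > maxInstances) {
            throw new IllegalArgumentException("invalid scaling configuration");
        }
        // Ceiling division: 45 in-flight requests at 20 per instance need 3 instances.
        int needed = (inFlight + perInstanceConcurrency - 1) / perInstanceConcurrency;
        return Math.max(minInstances, Math.min(maxInstances, needed));
    }

    public static void main(String[] args) {
        System.out.println(targetInstances(45, 20, 1, 10)); // 3
        System.out.println(targetInstances(45, 20, 5, 10)); // 5: the reserved minimum wins
        System.out.println(targetInstances(45, 20, 1, 2));  // 2: capped by the maximum
    }
}
```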

Although instance concurrency has these advantages as a scaling basis, users often do not know what value to set. I recommend the following process to determine a reasonable concurrency:

Set the maximum number of instances for the application's function to 1, so that you measure the performance of a single instance.

Use a load testing tool to stress the application and observe metrics such as TPS and request latency.

Gradually increase the instance concurrency. If performance is still good, keep increasing it; if performance falls short of expectations, reduce it.
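The tuning procedure above is essentially a step-wise search. The sketch below assumes you wrap one stress-test run in a function that returns the observed latency (for example p99, in milliseconds) for a given concurrency setting. `ConcurrencyTuner`, the `loadTest` function, the step size, and the latency target are all hypothetical; in practice each step means updating the function's concurrency and rerunning your load testing tool.

```java
import java.util.function.IntToDoubleFunction;

/** Step-wise search for the highest instance concurrency that still meets a latency target. */
public class ConcurrencyTuner {

    /**
     * Increases concurrency by `step` from `start` up to `limit` while the measured latency
     * stays within `targetMillis`; returns the last setting that met the target, or 0 if none did.
     */
    public static int tune(IntToDoubleFunction loadTest, int start, int step, int limit,
                           double targetMillis) {
        int best = 0;
        for (int c = start; c <= limit; c += step) {
            double latency = loadTest.applyAsDouble(c); // one stress-test run at concurrency c
            if (latency > targetMillis) {
                break; // performance no longer meets expectations: keep the previous value
            }
            best = c;
        }
        return best;
    }

    public static void main(String[] args) {
        // Synthetic stand-in for a real load test, not measured data: latency stays at 50 ms
        // up to 20 concurrent requests, then grows by 10 ms per additional request.
        IntToDoubleFunction synthetic = c -> 50 + Math.max(0, c - 20) * 10;
        System.out.println(tune(synthetic, 5, 5, 100, 100.0)); // 25
    }
}
```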
