Author: Jing Xiao
The Coming of the Serverless Era
Serverless, as the name suggests, is a "serverless" architecture: it shields developers from the operational complexity of servers, letting them devote their energy to designing and implementing business logic. Under a serverless architecture, developers only need to focus on upper-level application logic, while server-related work such as resource provisioning, environment setup, load balancing, and capacity scaling is handled by the platform. The Cloud Native Architecture White Paper summarizes the characteristics of Serverless as follows:
- Fully managed compute service: customers only need to write code to build applications, without worrying about the development, operation, security, and high availability of homogeneous, burdensome server-based infrastructure;
- Universal: able to support all important types of applications on the cloud;
- Automatically elastic: no need to plan capacity for resource usage in advance;
- Pay-as-you-go: enterprises can effectively reduce costs without paying for idle resources.
Looking back at the development of Serverless, from the concept first being proposed in 2012 to AWS launching its Lambda product, interest in Serverless exploded. Expectation and imagination gradually ignited the entire industry, but the path to adoption and production use has not been smooth: the gap between the serverless concept and the realities of production challenges users' established experience and habits.
Alibaba Cloud firmly believes that Serverless is the deterministic direction of development after cloud native. It has successively launched cloud products such as FC and SAE that apply serverless technology to different fields and different types of application workloads, and continues to promote the popularization and development of the serverless concept.
As far as the current serverless market is concerned, Alibaba Cloud's serverless product capabilities rank first in China and are world-leading. In last year's Forrester evaluation quadrant, Alibaba Cloud is clearly on par with AWS in the serverless field. Alibaba Cloud also has the largest share of serverless users in China: the 2020 China Cloud Native User Survey Report puts its share at 66%. Surveys on the adoption of serverless technology show that more and more developers and enterprise users are applying, or plan to apply, serverless technology to their core business.
Exploring Serverless Elasticity
As one of the core capabilities of the cloud, elasticity addresses the contradiction between capacity planning and actual cluster load. Comparing the two figures shows that if resources are arranged through advance planning, the mismatch between prepared resources and actual demand leads either to wasted resources or to resource shortages, which in turn cause excessive cost or even business damage. What we expect from extreme elasticity is that the prepared resources almost match actual demand, so that the application's overall resource utilization is high, cost rises and falls with the business, and no capacity problems affect application availability. This is the value of elasticity.
In implementation, elasticity divides into scalability and fault tolerance. Scalability means the underlying resources can adapt to changes in metrics, while fault tolerance means elastic self-healing keeps the instances behind an application or service healthy. The value of these capabilities is reduced cost together with improved application availability: on one hand, resource usage matches the application's actual consumption; on the other, availability at peak improves, allowing the application to adapt flexibly to continuous market change.
The following explains and analyzes three common elastic scaling modes.
The first is IaaS elastic scaling, represented by the cloud-server scaling products of the major cloud vendors. For example, Alibaba Cloud ESS can trigger the corresponding increase or decrease of ECS instances via CloudMonitor alarm rules; it also supports dynamically adding or removing SLB backend servers and RDS whitelist entries to ensure availability, and achieves elastic self-healing through health checks. ESS defines the scaling group, the basic unit of elastic scaling, as a collection of ECS instances in the same application scenario together with their associated SLB and RDS resources. It supports multiple kinds of scaling rules, such as simple rules, step rules, target tracking rules, and predictive rules. The user's workflow is to create scaling groups and scaling configurations, create scaling rules, and then monitor and review elastic execution.
The second is Kubernetes elastic scaling, focusing here on horizontal Pod autoscaling (HPA). Representative products are K8s and its managed cloud offerings, such as Alibaba Cloud Container Service. As an application-oriented operations infrastructure and a platform for building platforms, K8s provides built-in capabilities mainly around container-level management and orchestration, while its elasticity focuses on dynamic horizontal scaling of the underlying Pods. The K8s HPA polls Pod monitoring data, compares it with the target value, computes the desired number of replicas through an algorithm in real time, and then increases or decreases the workload's replica count. In practice, users create and configure the corresponding metric sources, elastic rules, and workloads, and can track elastic execution through events.
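The HPA decision described above reduces to a simple target-tracking formula, which the Kubernetes documentation gives as `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`. A minimal sketch:

```python
import math

def hpa_desired_replicas(current_replicas: int,
                         current_metric: float,
                         target_metric: float) -> int:
    """Core formula of the Kubernetes HPA controller:
    desired = ceil(current * currentMetric / targetMetric)."""
    return math.ceil(current_replicas * current_metric / target_metric)

# 4 Pods averaging 90% CPU against a 60% target -> scale out to 6.
print(hpa_desired_replicas(4, 90.0, 60.0))  # 6
# 6 Pods averaging 30% CPU against a 60% target -> scale in to 3.
print(hpa_desired_replicas(6, 30.0, 60.0))  # 3
```

In the real controller this formula is wrapped with a tolerance band and stabilization windows, which is why small metric fluctuations do not cause replica churn.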
Finally, there is elastic scaling driven by application profiles, mainly used inside Internet companies, for example the Alibaba ASI capacity platform. The capacity platform provides capacity prediction and capacity change decision services, guides underlying capacity change components such as AHPA/VPA to carry out elastic scaling, and revises the capacity profile based on the elastic results. It uses profile-driven scaling as the primary mechanism with metric-driven scaling as a supplement, and reduces scaling risk through advance scaling plus real-time correction.
The overall system uses ODPS and machine learning to process instance monitoring and other data and generate application profiles, such as baseline profiles, elastic profiles, and big-promotion profiles, and uses the capacity platform to handle profile injection, change control, and fault circuit-breaking. The user's workflow is: onboard the application, generate a capacity profile from historical data and experience, correct the profile with real-time monitoring metrics, and monitor and review elastic execution.
The comparison shows that, in the abstract, the elastic scaling modes of these products are basically the same: each consists of a trigger source, an elastic decision, and a trigger action. The trigger source generally relies on an external monitoring system to collect and process node and application metrics. Decisions are generally made by periodic polling plus an algorithm; some are based on analysis and prediction of historical data or on user-defined timed strategies. The trigger action scales instances horizontally and provides change records and external notifications. On this foundation, each product competes on scenario richness, efficiency, and stability, and uses observability to make the elastic system more transparent, which eases troubleshooting, guides elastic tuning, and strengthens user experience and stickiness.
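The trigger-source / decision / action decomposition above can be sketched as one iteration of a polling control loop. All function names here are illustrative, not any product's actual API:

```python
import math
from typing import Callable

def scaling_loop_step(fetch_metric: Callable[[], float],
                      decide: Callable[[float, int], int],
                      scale_to: Callable[[int], None],
                      current_replicas: int) -> int:
    """One iteration of the trigger-source -> decision -> action loop
    shared by all three scaling modes discussed above."""
    metric = fetch_metric()                      # trigger source: monitoring
    desired = decide(metric, current_replicas)   # elastic decision
    if desired != current_replicas:
        scale_to(desired)                        # trigger action: scale out/in
    return desired

# Toy run: CPU at 80% against a 40% target doubles the replica count.
desired = scaling_loop_step(
    fetch_metric=lambda: 80.0,
    decide=lambda m, cur: math.ceil(cur * m / 40.0),
    scale_to=lambda n: None,   # stand-in for the real scaling API call
    current_replicas=3,
)
print(desired)  # 6
```

A real controller would also write change records and send notifications from the action step, as the text notes.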
There are also clear differences among these elastic scaling modes. IaaS elastic scaling, as a long-established capability, has matured over a long time and offers powerful, rich functionality, with cloud vendors' offerings tending toward homogeneity; compared with containers its elastic efficiency is limited, and it is strongly bound to each vendor's underlying IaaS resources. Kubernetes, as an open source product, continuously optimizes and iterates its elasticity and best practices through the community, which better fits the demands of most development and operations staff; its elastic behavior and API are highly abstracted, but its extensibility is limited and it cannot cover every custom requirement. Profile-based elastic scaling bears the marks of Alibaba's internal environment: it is designed around the group's application profile and elasticity requirements and focuses on pain points such as resource-pool budget and cost optimization, scale-in risk, and complexity. It is not easy to replicate and extend, and is especially unsuitable for external small and medium customers.
Their end goals show that public clouds and Internet companies head in different directions:
- Internet companies' internal applications often have pronounced traffic patterns, many startup dependencies, and slow startup, along with many organizational demands around resource-pool capacity levels, inventory and financial management, and offline-online colocation. They therefore rely mainly on profile-based advance scale-out, with capacity computed from real-time metrics as a correction. The goal is a capacity profile accurate enough to achieve the expected resource utilization.
- Public cloud vendors serve external customers, provide more general and universal capabilities, and meet the differentiated needs of different users through extensibility. In serverless scenarios in particular, the emphasis is on the application's ability to cope with sudden traffic. The goal is to eliminate capacity planning and, through metric monitoring and extreme elasticity, achieve nearly on-demand use of application resources and end-to-end service availability.
Putting Elasticity into Practice
Serverless, as the best practice of cloud computing, the direction of cloud native development, and the future evolutionary trend, has its core value in fast delivery, intelligent elasticity, and lower cost.
Against this background, SAE came into being. SAE is an application-oriented Serverless PaaS platform that supports mainstream development frameworks such as Spring Cloud and Dubbo. Users can deploy applications directly to SAE with zero code changes, and pay-as-you-go billing fully exploits the Serverless advantage of saving the cost of idle resources. The experience is fully managed and operations-free: users only need to focus on core business development, while application lifecycle management, microservice governance, logging, monitoring, and other functions are handled by SAE.
Elastic competitiveness mainly lies in scenario richness, efficiency, and stability. First, let's look at how SAE optimizes elastic efficiency.
By collecting statistics on and visually analyzing the entire lifecycle of SAE applications, covering scheduling, init container creation, pulling the user image, creating the user container, and starting the user container and application, we can see that the time cost concentrates in the scheduling, image pulling, and application cold start stages.
In the scheduling phase, time is mainly spent on SAE connecting to the user's VPC. Because this step is strongly coupled with scheduling, it takes long, and long-tail creation timeouts and failure retries make the overall scheduling path even longer. The questions that arise: can scheduling be sped up? Can the scheduling phase be skipped altogether? For pulling the user image, the time covers both downloading and decompressing the image, which is especially costly for large images.
The optimization ideas are whether image pulls can be served from a cache, and whether decompression can be optimized. As for application cold start, SAE hosts a large number of monolithic and microservice Java applications. Java applications often have many startup dependencies, slow configuration loading, and long initialization, so cold starts can take minutes. The optimization direction is whether the cold start process can be avoided while keeping users largely unaware of the change and requiring no application modification.
First, SAE adopted in-place upgrade. SAE originally used the native K8s Deployment rolling upgrade strategy for releases: a new-version Pod is created first, and then the old-version Pod is destroyed. In-place upgrade means upgrading the version of one or more containers in a Pod without affecting the Pod object as a whole or its other containers. The principle is to use the K8s patch capability to upgrade a container in place, and the K8s readinessGates capability to keep traffic lossless during the upgrade.
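A minimal sketch of what an in-place upgrade boils down to, with hypothetical names: a patch body that changes only one container's image, so the Pod object, its node binding, and its sidecars survive untouched, plus a readinessGate that holds traffic until the new container is ready.

```python
def build_inplace_upgrade_patch(container_name: str, new_image: str) -> dict:
    """Strategic-merge-style patch body: only the named container's image
    changes, leaving the rest of the Pod spec intact."""
    return {"spec": {"containers": [{"name": container_name,
                                     "image": new_image}]}}

patch = build_inplace_upgrade_patch("app", "registry.example.com/app:v2")
print(patch["spec"]["containers"][0]["image"])

# A readinessGate (OpenKruise uses the InPlaceUpdateReady condition type)
# keeps traffic off the Pod until the controller flips the condition back
# to True after the upgraded container passes its checks.
readiness_gate = {"conditionType": "InPlaceUpdateReady"}
```

In production this patch would be sent to the API server by a controller such as CloneSet rather than built by hand; the registry path above is purely illustrative.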
In-place upgrade brings SAE many benefits, the most important being that it avoids rescheduling and avoids rebuilding sidecar containers (ARMS, SLS, AHAS), shrinking deployment time from the whole Pod lifecycle to just pulling the image and creating the container. Because the business container needs no scheduling, the new image can be cached on the node in advance to further improve elastic efficiency. SAE uses CloneSet, provided by Alibaba's open source OpenKruise project, as its new application workload; with its in-place upgrade capability, overall elastic efficiency increased by 42%.
SAE also adopts image preheating, in two forms. The first is preheating before scheduling: SAE caches common base images on all nodes to avoid frequent pulls from remote registries. The second, for batch release scenarios, is preheating during scheduling: with CloneSet's in-place upgrade capability, the instances' node distribution is known during the upgrade, so while the first batch deploys the new image, the nodes hosting instances of later batches can pre-pull it, making scheduling and image pulling parallel. This technique improved SAE's elastic efficiency by 30%.
The optimizations above address image pulling. For image decompression, a traditional container launch must download the full image data and then unpack it, even though startup may use only part of the content, so startup takes long. SAE uses image acceleration technology to automatically convert the original standard image format into an accelerated image that supports random reads, enabling on-demand download and online decompression of image data and greatly improving application distribution efficiency. It can also use the P2P distribution capability provided by ACR EE to effectively reduce image distribution time.
For slow Java cold starts, SAE works with Dragonwell 11 to provide an enhanced AppCDS (Application Class Data Sharing) startup acceleration strategy. With this technology, the class list is obtained at application startup and the shared class files in it are dumped; on subsequent startups the shared archive is used to start the application, effectively shortening the cold start. Mapped onto SAE's deployment scenario, a cache file is generated on shared NAS after the application starts, and the next release starts from that cache file, improving overall cold start efficiency by 45%.
Beyond optimizing the application lifecycle itself, SAE also optimizes elastic scaling. The scaling process consists of three parts: acquiring elastic metrics, deciding on the metrics, and executing the scaling operation. For metric acquisition, basic monitoring metric data is already at second-level latency, and for layer-7 application monitoring metrics SAE is planning a transparent traffic-interception scheme to ensure real-time acquisition. In the decision phase, the elastic component runs multiple queues concurrently for Reconcile and monitors queue backlog and delay in real time.
SAE elastic scaling offers a powerful metric matrix, rich strategy configuration, complete notification and alarm mechanisms, and full observability. It supports multiple data sources — native MetricsServer and MetricsAdapter, Prometheus, the cloud products SLS, CMS, and SLB, and external gateway routing — and multiple metric types: CPU, memory, QPS, RT, TCP connection count, inbound/outbound bytes, disk usage, Java thread count, GC count, and custom metrics. After metrics are captured and preprocessed, elastic strategies can be customized to the application's specific scenario: fast expand/fast shrink, fast expand/slow shrink, expand-only, shrink-only, DRYRUN, adaptive scaling, and more. Finer-grained elastic parameters can also be configured, such as instance upper and lower bounds, metric intervals, step ratio ranges, cooldown and warm-up times, metric collection period and aggregation logic, and CRON expressions; event-driven capabilities will be supported later.
When elasticity is triggered, the corresponding scale-out or scale-in operation is performed with traffic cut over so that no traffic is lost, and complete notification and alarm capabilities (DingTalk, webhook, phone, email, SMS) reach users in real time. Elastic scaling provides full observability: decision times and decision contexts are displayed clearly, instance states are traceable, and instance SLAs can be monitored.
SAE's elasticity is also competitive in scenario richness. Here are four scenarios SAE currently supports:
- Scheduled elasticity: for when the application's traffic load cycle is known. The number of instances can be scaled on a schedule by time of day, day of week, or date — for example, keeping 10 instances between 8 am and 8 pm to match daytime traffic, and keeping 2 instances, or even scaling to 0, for the rest of the day when traffic is low. Suitable for applications with cyclically regular resource usage, it is mostly used in the securities, medical, government, and education industries.
- Metric elasticity: configure the desired monitoring metric rules, and SAE keeps the application's metrics within them, defaulting to fast-expand/slow-shrink to ensure stability. For example, set the application's CPU metric target to 60%, QPS to 1000, and the instance count range to 2-50. This suits applications with burst traffic or typical periodic traffic, and is mostly used in industries such as the Internet, games, and social platforms.
- Hybrid elasticity: combines scheduled elasticity with metric elasticity, allowing different metric rules for different times, days of the week, or dates to handle complex scenarios more flexibly. For example, set the CPU target to 60% with an instance range of 10-50 between 8 am and 8 pm, and reduce the instance range to 2-5 for the rest of the day. This suits applications that have both cyclic regularity and burst or typical periodic traffic, and is mostly used in industries such as the Internet, education, and catering.
- Adaptive elasticity: SAE's optimization for traffic-surge scenarios. Using a surge window, it computes whether the current metric indicates a traffic surge, sizes the scale-out by the surge's intensity with some added redundancy, and disallows scale-in while in surge mode.
Stability is a very important part of building SAE's elasticity: the focus is ensuring that user applications scale according to the expected behavior during the elastic process, and that availability is guaranteed throughout. SAE elastic scaling generally follows the fast-expand/slow-shrink principle and guarantees execution stability through multi-level smoothing and anti-jitter; for rapidly rising metrics, adaptive capabilities expand capacity in advance. SAE currently supports four levels of elastic smoothing configuration to ensure stability:
- First-level smoothing: configure the metric collection period, the time window of a single collection, and the metric aggregation logic;
- Second-level smoothing: configure the tolerance of metric values and interval elasticity;
- Third-level smoothing: configure the scaling step size, percentage, and upper and lower bounds per unit time;
- Fourth-level smoothing: configure the scaling cooldown window and the instance warm-up time.
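The levels above compose naturally into one decision function. A minimal sketch, with all thresholds illustrative (level-1 metric aggregation is assumed to have happened upstream):

```python
import math

def smooth_decision(current: int, metric: float, target: float,
                    tolerance: float = 0.1,       # level 2: dead band
                    max_step: int = 4,            # level 3: step limit
                    last_scale_ts: float = 0.0,
                    cooldown_s: float = 300.0,    # level 4: cooldown
                    now: float = 600.0) -> int:
    """Apply tolerance, step clamping, and cooldown on top of the raw
    target-tracking formula. All threshold values are illustrative."""
    if abs(metric / target - 1.0) <= tolerance:   # inside the dead band
        return current
    if now - last_scale_ts < cooldown_s:          # still cooling down
        return current
    desired = math.ceil(current * metric / target)
    step = max(-max_step, min(max_step, desired - current))  # clamp step
    return current + step

print(smooth_decision(10, 63.0, 60.0))   # 10: within the 10% tolerance
print(smooth_decision(10, 120.0, 60.0))  # 14: raw desire 20, clamped to +4
```

Each guard suppresses a different kind of jitter: the dead band absorbs metric noise, the step clamp prevents violent replica swings, and the cooldown stops oscillation between consecutive decisions.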
Serverless Elasticity Best Practices
SAE elastic scaling effectively handles automatic scale-out when an instantaneous traffic peak arrives and automatic scale-in after the peak, and its high reliability, freedom from operations, and low cost keep applications running smoothly. When using it, the following elastic configuration best practices are recommended.
Configure health check and life cycle management
It is recommended to configure application health checks to ensure overall availability during elastic scaling, so that your application receives traffic only when it is started, running, and ready. It is also recommended to configure the Prestop lifecycle hook to ensure your application goes offline gracefully during scale-in.
Use an exponential retry mechanism
To avoid abnormal service calls caused by slow elasticity, slow application startup, or an application not going online or offline gracefully, it is recommended that callers use an exponential retry mechanism for service invocation.
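An exponential backoff retry, as recommended above, can look like this. The cap, base delay, and jitter factor are illustrative choices; `flaky` stands in for a real service call:

```python
import random
import time

def call_with_backoff(call, max_attempts: int = 5,
                      base_delay: float = 0.1, cap: float = 5.0):
    """Retry a service call with exponential backoff plus jitter, so
    callers ride out the window while new instances come online."""
    for attempt in range(max_attempts):
        try:
            return call()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise                      # out of attempts: propagate
            delay = min(cap, base_delay * (2 ** attempt))
            time.sleep(delay * random.uniform(0.5, 1.0))  # jittered wait

# Toy service that fails twice before an instance is ready.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("instance not ready")
    return "ok"

print(call_with_backoff(flaky))  # ok, after 3 attempts
```

The jitter prevents many callers from retrying in lockstep, which would otherwise re-create the very traffic spike the scaling system is absorbing.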
Application startup speed optimization
In order to improve flexibility and efficiency, it is recommended that you optimize the speed of application creation. You can consider optimization from the following aspects:
- Package optimization: optimize application startup time and reduce slow startup caused by external dependencies such as class loading and caching;
- Image optimization: reduce image size to cut the image-pull time when creating an instance; the open source tool Dive can analyze image layer information to guide targeted slimming;
- Java startup optimization: SAE, together with Dragonwell 11, provides startup acceleration for Java 11 users.
Elastic scaling metric configuration
For elastic scaling metric configuration, SAE supports combined configuration of basic monitoring and application monitoring metrics, which you can choose flexibly according to the application's profile (CPU-sensitive / memory-sensitive / IO-sensitive).
You can review and estimate a metric's target value from the historical data of the corresponding basic and application monitoring metrics (such as the peak, P99, and P95 values over the past 6h, 12h, 1 day, or 7 days), and use load-testing tools such as PTS to understand how many concurrent requests the application can handle, how much CPU and memory it needs, and how it responds under high load, in order to evaluate peak application capacity.
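Estimating a target from historical percentiles can be as simple as a nearest-rank percentile over the samples. The sample data below is hypothetical, standing in for, say, 20 per-interval CPU% readings:

```python
import math

def percentile(samples: list, p: float) -> float:
    """Nearest-rank percentile over historical metric samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100.0 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical historical CPU% samples for one application.
history = [30, 32, 35, 38, 40, 41, 43, 45, 47, 50,
           52, 55, 58, 60, 62, 65, 70, 75, 85, 98]

print("P95 =", percentile(history, 95))  # 85: sizes for near-peak load
print("peak =", max(history))            # 98: the absolute worst case
```

Sizing against P95 rather than the absolute peak trades a little headroom for much lower steady-state cost, which is exactly the strategic trade-off discussed next.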
The metric target value requires a strategic trade-off between availability and cost, for example:
- Availability-first strategy: configure the metric target at 40%;
- Availability-cost balanced strategy: configure the metric target at 50%;
- Cost-first strategy: configure the metric target at 70%.
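The three strategies above translate directly into different fleet sizes for the same peak. A small sketch (the 10-instance baseline at 100% peak load is a made-up example):

```python
import math

def replicas_for_target(peak_load_pct: float, target_pct: float,
                        baseline_replicas: int = 10) -> int:
    """Replicas needed so that peak load, spread across instances,
    lands at the configured target utilization."""
    return math.ceil(baseline_replicas * peak_load_pct / target_pct)

# The same 10-instance, 100%-load peak under each strategy:
for name, target in [("availability-first", 40),
                     ("balanced", 50),
                     ("cost-first", 70)]:
    print(name, replicas_for_target(100.0, target))
# availability-first 25, balanced 20, cost-first 15
```

A lower target buys more headroom per instance for sudden load at the price of more idle capacity; the 70% cost-first target keeps the fleet small but leaves less slack for a surge.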
Elastic configuration should also account for related dependencies such as upstream and downstream services, middleware, and databases, with corresponding elastic rules or throttling and degradation measures configured so that the full call chain stays available during scale-out.
After configuring elastic rules, continuously monitor and adjust them so that capacity tracks the application's actual load more closely.
Memory metric configuration
For memory metrics, note that some application types manage memory dynamically (such as the Java JVM's memory management, or glibc's malloc and free): idle application memory is not promptly returned to the operating system, so the physical memory an instance consumes does not shrink in time, and adding new instances does not lower average memory consumption and thus cannot trigger scale-in. Memory metrics are not recommended for such applications.
Java application runtime optimization
To release physical memory and strengthen the correlation between memory metrics and business load, the Dragonwell runtime can enable the ElasticHeap capability via additional JVM parameters, supporting dynamic elastic scaling of the Java heap and saving the physical memory actually occupied by the Java process.
Minimum number of instances configuration
It is recommended to set the minimum number of instances for elastic scaling to at least 2 and to configure vSwitches across multiple availability zones, to prevent the application from stopping because instances are evicted by an abnormal underlying node or because an availability zone has no available instances, and to ensure the application's overall high availability.
Maximum number of instances configuration
When configuring the maximum number of instances for elastic scaling, consider whether the availability zone has enough IP addresses, to avoid failures when adding new instances. You can view the current application's available IPs on the console's vSwitch page; if available IPs are insufficient, consider replacing the vSwitch or adding more vSwitches.
Maximum elasticity
You can use the application overview to find applications with elastic scaling enabled whose current instance count has reached the configured peak, and re-evaluate in time whether the maximum scaling configuration is reasonable. If you expect the maximum number of instances to exceed the product limit (currently 50 instances per application), you can submit a ticket to request a higher limit.
Availability zone rebalancing
After elastic scale-in, instances may be unevenly distributed across availability zones. You can view each instance's availability zone in the instance list; if the distribution is unbalanced, you can rebalance it by restarting the application.
Automatically restore elastic configuration
When change orders such as application deployments are executed, SAE stops the application's current elastic scaling configuration to avoid the two operations conflicting. If you want the elastic configuration restored after the change order completes, check the automatic-restore option during deployment.
Elasticity history
SAE's elastic behavior can currently be reviewed through events, showing scaling times and actions, along with real-time and historical decision records and visualized decision context, so you can measure the effectiveness of the elastic scaling strategy and adjust it when necessary.
Elastic event notification
Combined with notification channels such as DingTalk, webhook, and SMS, it is convenient to learn of elastic triggers in time.
Finally, a customer case using SAE's elastic scaling. During the 2020 COVID-19 pandemic, an online education customer's business traffic surged 7-8x, putting hardware cost and business stability at great risk. Had the customer stayed on a traditional ECS architecture, it would have had to upgrade its infrastructure in a very short time, a huge challenge in cost and effort. With SAE, the customer enjoyed the technical dividends of Serverless at zero transformation cost. Combining SAE's multi-scenario elastic strategy configuration, elastic self-adaptation, and real-time observability guaranteed the application's business SLA during peak periods, and extreme elastic efficiency cut hardware costs by up to 35%.
To sum up, in the development of elasticity, and especially in serverless scenarios, the emphasis is on the ability to handle sudden traffic. The goal is to eliminate capacity planning and, through metric monitoring and extreme elasticity, achieve nearly on-demand use of application resources and end-to-end service availability. By continuously optimizing its elastic components and the full application lifecycle, SAE achieves second-level elasticity and is competitive in elastic efficiency, scenario richness, and stability, making it an excellent choice for the serverless transformation of traditional applications.