Introduction: SAE achieves second-level elasticity by continuously optimizing the full life cycle of its elastic components and of applications, and is competitive in elasticity, scenario richness, and stability, making it the best choice for the serverless transformation of traditional applications.

Author: Jing Xiao

The Coming of the Serverless Era

Serverless, as the name suggests, is a "serverless" architecture: it hides the operational complexity of servers, letting developers devote more energy to designing and implementing business logic. Under a serverless architecture, developers only need to focus on upper-layer application logic; complex server-side work such as resource provisioning, environment setup, load balancing, and capacity scaling is handled entirely by the platform. The Cloud Native Architecture White Paper summarizes the characteristics of Serverless as follows:

  • Fully managed compute: customers only need to write code to build applications, without worrying about the homogeneous, burdensome work of server-based infrastructure development, operations, security, and high availability;
  • Versatility: able to support all important application types on the cloud;
  • Automatic elastic scaling: users no longer need to plan capacity for resources in advance;
  • Pay-as-you-go billing: enterprises effectively reduce costs by not paying for idle resources.

Looking back at the development of Serverless, from the concept first being proposed in 2012 to AWS launching the Lambda product, attention to Serverless exploded, and expectations and imagination gradually ignited the whole industry. The road to production adoption, however, has not been smooth: the gap between the serverless concept and the challenges of real-world production runs up against users' ingrained experience and habits.

Alibaba Cloud firmly believes that Serverless is the deterministic direction after cloud native. It has successively launched FC, SAE, and other cloud products to apply serverless technology to different domains and different types of application workloads, and continues to drive the popularization and development of the serverless concept.

As for the current serverless market, Alibaba Cloud's serverless product capability ranks first in China and is among the leaders worldwide. In last year's Forrester evaluation, Alibaba Cloud was clearly on par with AWS in the serverless field. Alibaba Cloud also has the largest share of serverless users in China: in the 2020 China Cloud Native User Survey Report, 66% of serverless users were on Alibaba Cloud, and surveys of serverless adoption show that more and more developers and enterprises are applying, or planning to apply, serverless technology to their core business.

Exploring Serverless Elasticity

Elasticity, one of the core capabilities of the cloud, addresses the tension between capacity planning and actual cluster load. Comparing the two figures makes this clear: if resources are planned in advance, the mismatch between prepared resources and actual demand leads either to waste or to shortage, and in turn to excessive cost or even business impact. What we want is extreme elasticity, where prepared resources almost exactly match actual demand: overall resource utilization stays high, cost rises and falls with the business, and capacity issues never threaten application availability. That is the value of elasticity.

In terms of implementation, elasticity divides into scalability and fault tolerance. Scalability means the underlying resources adapt to changes in metrics; fault tolerance means elastic self-healing keeps application instances healthy. The benefit of both is lower cost together with higher application availability: on one hand, resource usage tracks the application's actual consumption; on the other, availability at peak improves, letting the application adapt flexibly as the market evolves.

The following analyzes three common elastic scaling modes.

The first is IaaS elastic scaling, represented by the auto scaling of cloud servers at the various cloud vendors. Alibaba Cloud ESS, for example, can trigger ECS scale-out and scale-in by configuring CloudMonitor alarm rules; it also dynamically adds and removes SLB backend servers and RDS whitelist entries to maintain availability, and achieves elastic self-healing through health checks. ESS defines the scaling group, the basic unit of elastic scaling, as a collection of ECS instances in the same application scenario together with the associated SLB and RDS, and supports multiple kinds of scaling rules, such as simple rules, step rules, target tracking rules, and predictive rules. The user's workflow is to create the scaling group and scaling configuration, create scaling rules, and then monitor elastic execution.

Next is Kubernetes elastic scaling, focusing here on horizontal scaling with HPA; representative products are K8s itself and the corresponding managed offerings such as Alibaba Cloud Container Service. As an application-oriented operations infrastructure and Platform for Platform, K8s's built-in capabilities revolve around container-level management and orchestration, while its elasticity centers on dynamic horizontal scaling of Pods. The K8s HPA polls the Pods' monitoring data, compares it against the target value, computes the desired number of replicas in real time, and then scales the Workload's replica count up or down. In practice, users create and configure the metric sources, elastic rules, and the corresponding Workload, and can follow elastic execution through events.
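The replica computation just described can be sketched as follows — a simplified form of the formula documented for the Kubernetes HPA, not SAE's actual implementation:

```python
import math

def desired_replicas(current_replicas: int, current_metric: float, target_metric: float) -> int:
    """Simplified Kubernetes HPA rule:
    desiredReplicas = ceil(currentReplicas * currentMetricValue / targetMetricValue)"""
    return math.ceil(current_replicas * current_metric / target_metric)

# 4 Pods averaging 80% CPU against a 50% target scale out to 7 Pods.
print(desired_replicas(4, 80, 50))  # -> 7
```

The real controller adds tolerance bands, stabilization windows, and readiness handling on top of this core ratio.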

Finally, there is application-profile-based elastic scaling, used mainly inside Internet companies, for example Alibaba's ASI capacity platform. The capacity platform provides capacity prediction and capacity change decision services, directs underlying capacity-change components such as AHPA/VPA to carry out elastic scaling, and revises the capacity profile based on the results. It is profile-driven, with metric-driven scaling as a supplement, and reduces scaling risk by scaling out in advance and correcting in real time. The pipeline uses ODPS and machine learning to process instance monitoring and other data into application profiles — baseline profiles, elastic profiles, big-promotion profiles, and so on — while the capacity platform handles profile injection, change control, and fault circuit breaking. The user's workflow is: onboard the application, generate the capacity profile from historical data and experience, correct the profile with real-time monitoring metrics, and monitor elastic execution.

Comparison shows that, in the abstract, the elastic scaling of these products follows the same pattern: a trigger source, an elastic decision, and a trigger action. The trigger source generally relies on an external monitoring system to collect and process node and application metrics. The decision is generally made by periodic polling plus an algorithm; some products base it on historical data analysis and prediction, or on user-defined scheduled policies. The trigger action scales instances horizontally and provides change records and external notifications. On this common basis, each product competes on scenario richness, efficiency, and stability, and uses observability to make the elastic system transparent, which eases troubleshooting, guides elastic tuning, and improves user experience and stickiness.

There are also clear differences between the models. IaaS elastic scaling is the veteran capability: long-established, powerful, and feature-rich, but increasingly homogeneous across vendors, less efficient than container-based elasticity, and strongly bound to each vendor's underlying IaaS resources. Kubernetes, as an open source product, keeps iterating on its elasticity and best practices through the community and fits the demands of most developers and operators; its elastic behavior and API are highly abstracted, but its extensibility is limited and custom requirements are hard to support. Profile-based elastic scaling is tailored to Alibaba's internal application landscape and elasticity demands, focusing on resource-pool budget and cost optimization, scale-in risk, and operational complexity; it is hard to replicate and generalize, and especially unsuitable for external small and medium customers.

From the end-state goals, it is clear that public cloud vendors and Internet companies are headed in different directions:

  • Internet companies' internal applications usually show pronounced traffic patterns, have heavy startup dependencies and slow startup, and face organizational demands around overall resource-pool watermarks, inventory and financial management, and offline/online colocation. They therefore scale out in advance based mainly on capacity profiles, using capacity computed from Metrics for real-time correction; the goal is a profile accurate enough to hit the expected resource utilization.
  • Public cloud vendors serve external customers with general, universal capabilities, meeting differentiated needs through extensibility. In serverless scenarios especially, the emphasis is on handling traffic bursts: the goal is to free users from capacity planning altogether and, through metric monitoring plus extreme elasticity, approach on-demand resource usage with end-to-end service availability.

Landing Serverless Elasticity

Serverless, as the best practice of cloud computing, the direction of cloud native development, and the future evolution trend, delivers its core value in fast delivery, intelligent elasticity, and lower cost.

Against this backdrop, SAE came into being. SAE is an application-oriented serverless PaaS platform that supports mainstream development frameworks such as Spring Cloud and Dubbo. Users can deploy applications to SAE with zero code changes and use resources on demand with pay-as-you-go billing, fully exploiting the serverless advantage of eliminating the cost of idle resources. The experience is fully managed and operations-free: users focus only on core business development, while application lifecycle management, microservice governance, logging, monitoring, and other functions are handled by SAE.

Elastic competitiveness mainly comes down to scenario richness, efficiency, and stability. Let me start with how SAE optimizes elastic efficiency.

By collecting statistics on and visually analyzing the full life cycle of SAE applications — scheduling, init container creation, pulling the user image, creating the user container, and starting the user container and application — we can see that the time cost concentrates in scheduling, pulling the user image, and application cold start. In the scheduling phase, the time mainly goes to connecting the user's VPC; because this step is strongly coupled with scheduling it takes long, and long-tail creation timeouts and failure retries further stretch the overall scheduling time.

The questions that follow are: can scheduling be made faster, or skipped entirely? Pulling the user image covers both downloading and decompressing it, which is especially costly for large images; the optimization ideas are whether pulls can be cached and whether decompression can be optimized. As for application cold start, SAE runs a large number of monolithic and microservice Java applications, which often have many startup dependencies, slow configuration loading, and long initialization, so cold starts frequently take minutes. The direction of optimization is to avoid the cold-start path as much as possible while keeping it invisible to users and requiring no application changes.

First, SAE adopted in-place upgrade. SAE initially used the native K8s Deployment rolling-upgrade strategy for releases: create the new-version Pod first, then destroy the old-version Pod. In-place upgrade instead updates the version of one or more containers in a Pod without affecting the Pod object as a whole or its other containers. The principle is to use the K8s patch capability to upgrade a container in place, and the K8s readinessGates capability to keep traffic lossless during the upgrade.
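A minimal sketch of the patch body involved — the container name and image here are hypothetical, and this illustrates the general K8s patch mechanism rather than SAE's or OpenKruise's actual code:

```python
import json

def inplace_image_patch(container_name: str, new_image: str) -> dict:
    """Strategic-merge patch body that updates a single container's image in an
    existing Pod. Applied via `kubectl patch` or the equivalent API call, only
    that container is recreated; the Pod object, its sidecar containers, and
    its node placement are untouched."""
    return {"spec": {"containers": [{"name": container_name, "image": new_image}]}}

patch = inplace_image_patch("app", "registry.example.com/demo/app:v2")
print(json.dumps(patch))
```

Because the Pod keeps its identity, readinessGates can gate traffic off the Pod while the container restarts and back on once it is ready.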

In-place upgrade brings SAE many benefits, the most important being that it avoids rescheduling and avoids rebuilding the sidecar containers (ARMS, SLS, AHAS), cutting the deployment time from the whole Pod life cycle down to just pulling and creating the business container. And because no scheduling is needed, the new image can be cached on the node in advance to further improve elastic efficiency. SAE uses CloneSet, provided by Alibaba's open source OpenKruise project, as the new application workload; with its in-place upgrade capability, overall elastic efficiency improved by 42%.

SAE also adopts image pre-warming, in two forms. The first is pre-warming before scheduling: SAE caches common base images on every node to avoid repeated pulls from the remote registry. The second, for batch-release scenarios, is pre-warming during scheduling: with CloneSet's in-place upgrade, the node distribution of instances is known during the upgrade, so while the first batch deploys the new image, the nodes hosting later batches can pre-pull it, overlapping scheduling with image pulls. This technique improved SAE's elastic efficiency by 30%.

The optimization just described targets the image pull; next is image decompression. A traditional container launch must download the full image data and then unpack it, even though startup may touch only part of the content, which makes startup slow. SAE uses image acceleration technology to automatically convert standard images into an accelerated format that supports random reads, enabling on-demand download and online decompression of image data and greatly improving application distribution efficiency. It can also use the P2P distribution capability of Alibaba Cloud Container Registry (ACR) to further cut image distribution time.

For the slow cold start of Java applications, SAE together with Dragonwell 11 provides an enhanced AppCDS (Application Class Data Sharing) startup acceleration strategy. With AppCDS, the class list is captured at application startup and the shared classes in it are dumped to an archive; subsequent startups load from the shared archive, effectively cutting cold-start time. Mapped onto SAE's deployment model, the cache file is generated on shared NAS after the application starts and is reused to start the next release; overall cold-start efficiency improved by 45%.
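The dump-then-reuse flow can be sketched roughly as below. This is an illustration, not SAE's implementation: the NAS path is hypothetical, and the one-step `-XX:ArchiveClassesAtExit` flag shown is the dynamic-CDS form (JDK 13+; Dragonwell 11's enhanced AppCDS provides comparable one-step archiving, and plain JDK 11 needs the two-step class-list dump instead):

```python
import os

def java_launch_args(jar: str, archive: str = "/mnt/nas/app-cds.jsa") -> list:
    """CDS-aware launch sketch: on the first start, arrange for loaded classes
    to be dumped into a shared archive on NAS at exit; on later starts, map
    the existing archive to skip class loading/verification work."""
    if os.path.exists(archive):
        # Warm start: reuse the shared class archive.
        return ["java", f"-XX:SharedArchiveFile={archive}", "-Xshare:on", "-jar", jar]
    # Cold start: dump the archive so the next release can reuse it.
    return ["java", f"-XX:ArchiveClassesAtExit={archive}", "-jar", jar]

print(java_launch_args("app.jar", archive="/tmp/definitely-missing.jsa"))
```

Putting the archive on shared NAS is what lets a later release, on a different instance, still hit the warm path.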

Beyond the application life cycle, SAE also optimizes the elastic scaling pipeline itself, which consists of three parts: acquiring elastic metrics, making decisions on them, and executing the scaling operation. For metric acquisition, basic monitoring metrics are already at second-level granularity; for layer-7 application monitoring metrics, SAE plans a transparent traffic-interception scheme to keep acquisition real-time. In the decision phase, the elastic component runs multi-queue concurrent Reconcile and monitors queue backlog and latency in real time.

SAE elastic scaling includes a powerful metric matrix, rich policy configuration, a complete notification and alerting mechanism, and all-round observability. It supports multiple data sources — native MetricsServer, MetricsAdapter, Prometheus, the cloud products SLS, CMS, and SLB, and external gateway routing — and many metric types: CPU, memory, QPS, RT, TCP connection count, bytes in/out, disk usage, Java thread count, GC count, and custom metrics. After a metric is captured and preprocessed, a custom elastic policy can be chosen to fit the scenario: fast scale-out/fast scale-in, fast scale-out/slow scale-in, scale-out only, scale-in only, DRYRUN, adaptive scaling, and so on.

Finer-grained elastic parameters can also be configured: instance count upper and lower bounds, metric intervals, step sizes and percentage ranges, cooldown and warm-up times, metric collection period and aggregation logic, and CRON expressions, with event-driven capability to follow. Once elasticity triggers, the corresponding scale-out or scale-in runs with traffic switching to keep traffic lossless, and the complete notification channels (DingTalk, webhook, phone, email, SMS) inform users in real time. Elastic scaling also offers full observability: elastic decision times and decision contexts are clearly displayed, instance states are traceable, and instance SLAs are monitorable.

SAE's elasticity is also competitive in scenario richness. Here are the four scenarios SAE currently supports:

  • Scheduled elasticity: configured when the application's traffic cycle is known. The instance count expands and contracts on a schedule by time of day, day of week, or date — for example, keep 10 instances from 8 am to 8 pm for daytime traffic, and drop to 2 instances, or even 0, for the low-traffic remainder of the day. Suited to applications with regular, cyclical resource usage; common in the securities, healthcare, government, and education industries.
  • Metric-based elasticity: configure rules on the desired monitoring metrics, and SAE keeps the application's metrics stable within the configured rules, defaulting to fast scale-out and slow scale-in to protect stability — for example, target 60% CPU and 1000 QPS with an instance range of 2-50. Suited to burst traffic and typical periodic traffic; common in Internet services, games, and social platforms.
  • Hybrid elasticity: combines scheduled and metric-based elasticity, letting you configure different metric rules for different times, days of week, and dates to handle complex scenarios more flexibly — for example, target 60% CPU with instance range 10-50 from 8 am to 8 pm, shrinking the range to 2-5 for the rest of the day. Suited to applications that have both cyclical regularity and burst traffic; common in Internet services, education, and catering.
  • Adaptive elasticity: SAE's optimization for traffic-surge scenarios. Using a surge window, it determines whether the current metric indicates a surge, computes the required scale-out from the surge's intensity plus a measure of redundancy, and forbids scale-in while in surge mode.
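The adaptive scenario above can be illustrated with a toy surge detector — the window comparison, surge ratio, and redundancy factor are illustrative assumptions, not SAE's actual algorithm or parameters:

```python
def surge_scale(window: list, current: float, replicas: int,
                ratio: float = 2.0, redundancy: float = 0.5) -> int:
    """Adaptive (surge-aware) scaling sketch: compare the latest metric sample
    against the recent window's average; on a surge, scale out in proportion
    to its intensity plus a redundancy buffer, and never scale in while the
    surge holds."""
    baseline = sum(window) / len(window)
    if current >= ratio * baseline:          # surge detected
        intensity = current / baseline
        return max(replicas, round(replicas * intensity * (1 + redundancy)))
    return replicas                          # no surge: leave scale-in to the normal rules

# QPS jumps from ~100 to 400: scale 4 instances up aggressively, with headroom.
print(surge_scale([100, 110, 90, 100], current=400, replicas=4))
```

The point of the redundancy factor is to over-provision slightly during a surge, since under-scaling in that moment is far more costly than briefly running extra instances.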

Stability is a crucial part of building SAE's elasticity: during scaling, user applications must expand and contract exactly as expected while availability is preserved throughout. SAE elastic scaling generally follows the principle of fast scale-out and slow scale-in, ensures execution stability through multiple levels of smoothing and anti-jitter, and for metric-surge scenarios scales out in advance via the adaptive capability. SAE currently supports four levels of elastic smoothing configuration to protect stability:

  • First-level smoothing: configure the metric collection period, the time window for each collection, and the metric aggregation logic
  • Second-level smoothing: configure the tolerance of metric values and interval-based elasticity
  • Third-level smoothing: configure the scaling step, percentage, and upper/lower bounds per unit time
  • Fourth-level smoothing: configure the scaling cooldown window and the instance warm-up time
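How these smoothing levels interact can be sketched as follows, with illustrative numbers (the tolerance, step cap, and bounds here are assumptions; metric aggregation and the cooldown/warm-up windows of levels one and four are omitted for brevity):

```python
import math

def smoothed_target(current: int, metric: float, target: float,
                    tolerance: float = 0.1, max_step: int = 4,
                    min_replicas: int = 2, max_replicas: int = 50) -> int:
    """Smoothing sketch: level 2's tolerance band suppresses jitter around the
    target; level 3 caps the per-adjustment step; the result is always clamped
    to the configured instance bounds."""
    if abs(metric / target - 1.0) <= tolerance:
        return current                                      # within tolerance: no action
    desired = math.ceil(current * metric / target)          # raw HPA-style target
    step = max(-max_step, min(max_step, desired - current)) # cap the step per adjustment
    return max(min_replicas, min(max_replicas, current + step))

# Metric at 95 against a target of 50 wants 19 replicas, but the step cap
# limits this round to +4.
print(smoothed_target(10, metric=95, target=50))
```

Capping the step trades reaction speed for stability: a transient metric spike moves capacity a bounded amount per cycle instead of doubling the fleet at once.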

Serverless Elasticity Best Practices

SAE elastic scaling effectively lets applications expand automatically when an instantaneous traffic peak arrives and contract automatically after it passes, with high reliability, no operations burden, and low cost guaranteeing smooth operation. The following best practices are recommended when configuring elasticity.

  • Configure health check and life cycle management

It is recommended to configure application health checks to ensure overall availability during scaling, so that your application receives traffic only when it is up, running, and ready. It is also recommended to configure lifecycle management (PreStop) so that your application goes offline gracefully, as expected, during scale-in.

  • Use exponential retry mechanism

To avoid service-call failures caused by elasticity lagging, slow application startup, or applications not going on/offline gracefully, callers are advised to use an exponential retry mechanism for service invocation.
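A minimal sketch of such a caller-side retry, using exponential backoff with full jitter (the parameters are illustrative):

```python
import random
import time

def call_with_backoff(fn, max_attempts: int = 5, base: float = 0.1, cap: float = 5.0):
    """Retry a failing call with delays of roughly base * 2**attempt (jittered,
    capped), so callers ride out the short window in which a newly scaled-out
    instance is still starting or an instance is being drained."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise                         # out of attempts: surface the error
            time.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))

calls = {"n": 0}
def flaky():
    """Simulated dependency that is unavailable for the first two calls."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("instance not ready yet")
    return "ok"

result = call_with_backoff(flaky, base=0.01)
print(result)  # -> ok
```

The jitter matters: if every caller retried on the same schedule, the retries themselves would arrive as a synchronized burst.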

  • Application startup speed optimization

To improve elastic efficiency, it is recommended to optimize how fast application instances are created, from the following angles:

  • Package optimization: optimize application startup itself, reducing time spent on external dependencies such as class loading and cache warm-up
  • Image optimization: shrink the image to reduce pull time at instance creation; the open source tool Dive can analyze image layer information for targeted slimming
  • Java startup optimization: with SAE and Dragonwell 11, the startup acceleration function is available to Java 11 users

  • Elastic scaling metric configuration

For elastic scaling metrics, SAE supports combining multiple basic-monitoring and application-monitoring metrics; choose them flexibly according to the application's profile (CPU-sensitive, memory-sensitive, or IO-sensitive).

Target metric values can be estimated from the historical data of the corresponding basic-monitoring and application-monitoring metrics (for example the peak, P99, and P95 values over the past 6 h, 12 h, 1 day, or 7 days). Load testing with tools such as PTS helps establish how many concurrent requests the application can handle, how much CPU and memory it needs, and how it behaves under high load, in order to evaluate peak capacity.

The metric target value is a strategic trade-off between availability and cost, for example:

  • Availability-first strategy: set the metric target to 40%
  • Availability/cost balanced strategy: set the metric target to 50%
  • Cost-first strategy: set the metric target to 70%
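Combining the load-test results with the chosen target utilization gives a back-of-the-envelope instance range — a sketch under assumed numbers, not an SAE formula:

```python
import math

def replica_range(peak_qps: float, qps_per_instance: float, target_utilization: float,
                  min_replicas: int = 2) -> tuple:
    """Size the instance range from load-test data: given the peak QPS one
    instance sustains and the chosen target utilization (0.4 availability-first,
    0.5 balanced, 0.7 cost-first), derive the instance ceiling; the floor stays
    >= 2 for availability."""
    max_replicas = math.ceil(peak_qps / (qps_per_instance * target_utilization))
    return (min_replicas, max(min_replicas, max_replicas))

# e.g. 12,000 QPS at peak, 500 QPS per instance, balanced 50% target.
print(replica_range(12000, 500, 0.5))  # -> (2, 48)
```

Lowering the target utilization buys headroom for bursts at the cost of more idle capacity, which is exactly the strategy trade-off listed above.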

Elastic configuration should also take related dependencies into account — upstream and downstream services, middleware, databases — configuring corresponding elastic rules or throttling/degradation measures for them, so that the full call chain stays available during scale-out.

After configuring the elastic rules, continuously monitor and adjust the elastic rules to make the capacity closer to the actual load of the application.

  • Memory indicator configuration

Regarding memory metrics: some application runtimes manage memory dynamically (for example JVM memory management, or Glibc malloc/free), so idle application memory is not promptly returned to the operating system. The physical memory consumed by an instance then does not shrink in time, adding instances cannot lower average memory consumption, and scale-in never triggers. Memory metrics are not recommended for such applications.

  • Java application runtime optimization: release physical memory, strengthening the correlation between memory metrics and actual business load

With the Dragonwell runtime, ElasticHeap can be enabled via JVM parameters to support dynamic elastic scaling of the Java heap, saving the physical memory occupied by the Java process.

  • Minimum number of instances configuration

It is recommended to set the minimum instance count for elastic scaling to 2 or more and to configure vSwitches across multiple availability zones. This prevents the application from stopping when instances are evicted due to an abnormal underlying node, or when a zone has no available instances, and keeps the application highly available overall.

  • Maximum number of instances configuration

When configuring the maximum instance count, consider whether the availability zone has enough IP addresses, to avoid failures when adding instances. You can check the current application's available IPs under the console's vSwitch view; if few remain, consider replacing the vSwitch or adding another.

  • Maximum elasticity

In the application overview you can see which applications have elastic scaling enabled, promptly spot those whose instance count has reached its peak, and re-evaluate whether the maximum scaling configuration is reasonable. If the expected maximum exceeds the product limit (currently 50 instances per application), you can submit a ticket to request a higher limit.

  • Availability zone rebalancing

After a scale-in, instances may be unevenly distributed across availability zones. The instance list shows each instance's zone; if the distribution is unbalanced, restarting the application rebalances it.

  • Automatically restore elastic configuration

During a change order such as an application deployment, SAE suspends the application's elastic scaling configuration to avoid conflict between the two operations. To restore the elastic configuration automatically once the change completes, check the automatic-recovery option during deployment.

  • Elasticity history

SAE's elastic actions can currently be viewed through events: scaling times, scaling actions, and real-time and historical decision records with visualized decision context, to gauge whether the elastic scaling strategy is effective and adjust it if necessary.

  • Elasticity event notification

Combined with notification channels such as DingTalk, Webhook, and SMS, it is easy to learn of elastic triggering in time.

Finally, a customer case using SAE elastic scaling. During the COVID-19 epidemic in 2020, an online education customer's business traffic surged 7-8x, putting hardware cost and business stability at serious risk. With a traditional ECS architecture, the customer would have had to upgrade the infrastructure in a very short time, a huge challenge in cost and effort. With SAE, the user enjoyed the technical dividends of Serverless at zero migration cost; combined with SAE's multi-scenario elastic policies, elastic self-adaptation, and real-time observability, the application's business SLA was guaranteed at peak, and extreme elastic efficiency saved up to 35% in hardware costs.

To sum up: in the evolution of elasticity, and in serverless scenarios especially, the emphasis is on handling burst traffic. The goal is to free users from capacity planning and, through metric monitoring and extreme elasticity, approach on-demand use of application resources with end-to-end service availability. SAE achieves second-level elasticity by continuously optimizing the full life cycle of its elastic components and of applications, is competitive in elasticity, scenario richness, and stability, and is the best choice for the serverless transformation of traditional applications.
