AHPA: Opening the Door to Kubernetes Elastic Prediction

Author: Yuan Yi

Introduction

In the cloud-native container era, users face a variety of business scenarios, such as periodic workloads and on-demand Serverless usage. Automatic elasticity brings its own problems in practice, the most important of which are elastic lag and cold start. To address them, the Alibaba Cloud Native team and the Decision Intelligence Time Series team at Alibaba DAMO Academy jointly developed AHPA, an elastic prediction product. Its core idea is to do "scheduled planning" based on the detected period of a workload: by planning ahead, capacity is scaled out in advance, so that business stability is guaranteed while resources are still used truly on demand.

Background

User expectations for cloud elasticity keep rising, driven mainly by two trends. The first is the rise of the cloud-native concept: from the VM era to the container era, the way the cloud is used is changing. The second is the rise of new business models that are designed to run on the cloud from the start and therefore naturally demand elasticity.

With the cloud, users no longer need to build infrastructure starting from physical machines and data centers. The cloud provides a highly flexible infrastructure, and its biggest advantage is an elastic supply of resources. In the cloud-native era in particular, the demand for elasticity is growing stronger. In the VM era, elasticity was still a matter of manual operations at the minute level; in the container era, the requirement has reached the second level. Across different business scenarios, users' expectations of the cloud are changing as well:

Periodic business scenarios: newer businesses such as live streaming, online education, and gaming share one trait, which is that their load is strongly periodic. This pushes customers toward elasticity-oriented architectures. Combined with the cloud-native concept, it is natural to want to bring up a batch of services on demand and release them when they are no longer needed.
The arrival of Serverless: the core ideas of Serverless are on-demand use and automatic elasticity, with no capacity planning required from users. In practice, however, problems appear, the most important of which are elastic lag and cold start. These are unacceptable for latency-sensitive services.

Can the existing elasticity solutions in Kubernetes handle these scenarios?

Problems with traditional elasticity solutions

There are generally three ways to manage the number of application instances in Kubernetes: a fixed number of instances, HPA, and CronHPA. A fixed number of instances is the most common, and its biggest problem is the obvious waste of resources during business troughs. HPA addresses that waste, but it triggers scaling only after a metric has already changed, so resources arrive late, and late resources can reduce business stability. CronHPA scales on a schedule, which appears to solve the lag problem, but it raises other questions: how fine should the schedule granularity be, and does the schedule have to be adjusted by hand every time business volume changes? If so, it brings heavy operational complexity and is error-prone.
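To make the HPA lag concrete, here is a minimal sketch of the replica calculation described in the Kubernetes HPA documentation, desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). The function name is made up for illustration, and the sketch leaves out the tolerance band and stabilization windows that the real controller applies.

```python
import math


def hpa_desired_replicas(current_replicas, current_metric, target_metric):
    """Replica count from the documented HPA ratio formula.

    Simplified: ignores the controller's tolerance and stabilization logic.
    """
    return math.ceil(current_replicas * (current_metric / target_metric))


# 4 Pods already running at 90% average CPU against a 60% target.
print(hpa_desired_replicas(4, 90, 60))
```

The input is a metric that has already been observed, so the scale-out is computed only after the load has risen, and the new Pods then still need time to start. That is the lag described above.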

AHPA elastic prediction

The core idea of AHPA (Advanced Horizontal Pod Autoscaler) elastic prediction is to do "scheduled planning" based on the detected period, and to scale out ahead of time according to the plan. Because any plan can miss something, the planned number of instances must also be adjustable in real time. AHPA therefore has two elasticity strategies: active prediction and passive prediction. Active prediction uses the RobustPeriod algorithm [1] from DAMO Academy to identify the period length, then uses the RobustSTL algorithm [2] to extract the periodic trend, and proactively predicts the number of application instances for the next period. Passive prediction sets the number of instances from the application's real-time data, which handles burst traffic well. In addition, AHPA adds a fallback protection policy: users can set upper and lower bounds on the number of instances. The number of instances that finally takes effect is the maximum of the active prediction, the passive prediction, and the fallback protection value.
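The final selection rule can be sketched in a few lines. This is an illustration under stated assumptions, not AHPA's implementation: the function and parameter names are hypothetical, the fallback value is taken to be the user-configured lower bound, and capping the result at the upper bound is an assumption about how that bound is applied.

```python
def effective_replicas(active, passive, lower_bound, upper_bound=None):
    """Pick the instance count that takes effect.

    active:      instances planned by active (periodic) prediction
    passive:     instances derived from real-time data
    lower_bound: user-configured floor, used as the fallback value
    upper_bound: optional user-configured ceiling (the cap is an assumption)
    """
    desired = max(active, passive, lower_bound)
    if upper_bound is not None:
        desired = min(desired, upper_bound)
    return desired


# A burst pushes the passive value above the plan, so it wins.
print(effective_replicas(active=3, passive=12, lower_bound=2))
```

Taking the maximum favors stability over cost: a missed period or a sudden burst can only raise the instance count, never lower it below what any one strategy asks for.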

Architecture

First of all, elasticity must never come at the expense of business stability. The goal of elastic scaling is not only to save cost but also to improve overall business stability, to remove operations and maintenance work, and to help build core competitiveness. The basic design principles of the AHPA architecture are:

Stability: elastic scaling happens only while user services remain stable.
O&M-free: no extra operations and maintenance burden for users. No new controller is added on the user side, and the Autoscaler configuration semantics are clearer than those of HPA.
Application-oriented: the design is application-centric. Users do not need to care about configuring the number of instances; resources are used on demand and scaled automatically.

The architecture has the following characteristics:

Rich metrics: supports CPU, memory, QPS, RT (response time), external metrics, and more.
Stability guarantee: AHPA's elastic logic combines proactive warm-up with a passive fallback, together with downgrade protection, to keep resources stable.

Active prediction: predicts the trend for a coming period of time from historical data; suited to periodic applications.
Passive prediction: predicts in real time; for burst traffic, resources are prepared in real time.
Downgrade protection: supports configuring maximum and minimum instance counts for multiple time intervals.
Multiple scaling methods: AHPA supports scaling through Knative, HPA, and Deployment:

Knative: solves elastic cold start in Serverless scenarios, based on concurrency, QPS, or RT.
HPA: simplifies HPA elastic policy configuration, lowers the barrier to using elasticity, and solves the cold start problem seen with HPA.
Deployment: scales a Deployment up and down directly and automatically.
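The downgrade protection item above, with minimum and maximum instance counts per time interval, could be modeled roughly as follows. This is a sketch only: the BoundRule type, the interval values, and the default bounds are all hypothetical and do not reflect AHPA's actual configuration format.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class BoundRule:
    """Hypothetical rule: instance bounds valid from start (inclusive) to end (exclusive)."""
    start: time
    end: time
    min_replicas: int
    max_replicas: int


def bounds_at(now, rules, default=(1, 10)):
    """Return (min, max) from the first rule covering `now`, else the default."""
    for rule in rules:
        if rule.start <= now < rule.end:
            return rule.min_replicas, rule.max_replicas
    return default


def clamp_prediction(predicted, now, rules, default=(1, 10)):
    """Keep a predicted instance count inside the bounds active at `now`."""
    low, high = bounds_at(now, rules, default)
    return max(low, min(predicted, high))


rules = [
    BoundRule(time(9, 0), time(18, 0), min_replicas=5, max_replicas=50),
    BoundRule(time(18, 0), time(23, 0), min_replicas=2, max_replicas=20),
]
print(clamp_prediction(3, time(10, 30), rules))
```

Per-interval bounds let the floor be higher during business hours than at night, so that a wrong prediction cannot drop capacity below a safe level when it matters most.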

Applicable scenarios

AHPA is suited to the following scenarios:

Scenarios with clear periodicity, such as live streaming, online education, and game services.
A fixed number of instances plus an elastic fallback, for example to absorb burst traffic on top of normal business load.
Scenarios that need a recommended instance count configuration.

Prediction results

After AHPA is enabled, a visualization page shows its effect. The following is an example of prediction based on CPU metrics, compared with using HPA:

Notes:

Predict CPU Observer: blue is the actual CPU usage with HPA, and green is the predicted CPU usage. The green curve lies above the blue one, which indicates that the predicted capacity is sufficient.

Predict POD Observer: blue is the actual number of Pods scaled out with HPA, and green is the predicted number of Pods. The green curve lies below the blue one, which indicates that predictive elasticity needs fewer Pods.
Periodicity: based on 7 days of historical data, the prediction algorithm detects that the application is periodic.
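As a rough illustration of detecting a period from 7 days of history, the sketch below uses plain autocorrelation to find the period and per-phase averaging to forecast the next cycle. These are deliberately simple stand-ins, not the RobustPeriod or RobustSTL algorithms that AHPA uses, and the function names, threshold, and synthetic hourly data are assumptions.

```python
import math


def autocorrelation(series, lag):
    """Biased sample autocorrelation of `series` at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    if var == 0:
        return 0.0  # a constant series has no periodic structure
    cov = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return cov / var


def detect_period(series, threshold=0.5):
    """Return the first lag with a local autocorrelation peak above `threshold`, or None."""
    max_lag = len(series) // 2
    acf = [autocorrelation(series, k) for k in range(max_lag + 1)]
    for k in range(2, max_lag):
        if acf[k] > threshold and acf[k] > acf[k - 1] and acf[k] >= acf[k + 1]:
            return k
    return None


def seasonal_profile(series, period):
    """Average the history phase by phase as a naive forecast for the next period."""
    profile = []
    for phase in range(period):
        samples = series[phase::period]
        profile.append(sum(samples) / len(samples))
    return profile


# Synthetic data: 7 days of hourly CPU usage with a daily cycle.
history = [50 + 30 * math.sin(2 * math.pi * t / 24) for t in range(7 * 24)]
period = detect_period(history)
forecast = seasonal_profile(history, period)
print(period, round(max(forecast), 1))
```

Real traffic is noisy and often has several overlapping periods, such as daily and weekly cycles, which is the case robust methods like those cited in [1] and [2] are designed to handle.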

Conclusion: the prediction results show that the elastic prediction trend matches expectations.

Invitational testing

For details, see the Container Service AHPA elastic prediction product documentation. AHPA is currently open for invitational testing. Interested users are welcome to click "submit a ticket" in the documentation to apply for the whitelist. We look forward to your trial and feedback.

References

[1] (Alibaba DAMO Academy Decision Intelligence Time Series Team) Qingsong Wen, Kai He, Liang Sun, Yingying Zhang, Min Ke, and Huan Xu. RobustPeriod: Robust Time-Frequency Mining for Multiple Periodicity Detection, in Proc. of 2021 ACM SIGMOD/PODS International Conference on Management of Data (SIGMOD 2021), Xi'an, China, Jun. 2021.

[2] (Alibaba DAMO Academy Decision Intelligence Time Series Team) Qingsong Wen, Jingkun Gao, Xiaomin Song, Liang Sun, Huan Xu, and Shenghuo Zhu. RobustSTL: A Robust Seasonal-Trend Decomposition Algorithm for Long Time Series, in Proc. of the 33rd AAAI Conference on Artificial Intelligence (AAAI 2019), pp. 5409-5416, Honolulu, Hawaii, Jan. 2019.

[3] (Alibaba DAMO Academy Decision Intelligence Time Series Team) Qingsong Wen, Zhe Zhang, Yan Li, and Liang Sun. Fast RobustSTL: Efficient and Robust Seasonal-Trend Decomposition for Time Series with Complex Patterns, in Proc. of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD 2020), San Diego, CA, Aug. 2020.
