Authors
Hu Qiming, Tencent Cloud expert engineer, focuses on Kubernetes, cost optimization, and other cloud-native fields. A core developer of Crane, he is currently responsible for the open source governance of the Crane cost optimization project and the implementation of its elasticity capabilities.
Yu Yufei, Tencent Cloud expert engineer, focuses on cloud-native observability, cost optimization, and related fields. A core developer of Crane, he is currently responsible for Crane's resource forecasting, recommendation implementation, and operations platform construction.
Tian Qi, Tencent senior engineer and Kubernetes contributor, focuses on distributed resource management and scheduling, elasticity, and colocation. He is currently responsible for Crane-related research and development.
Introduction
The tension between business stability and cost is long-standing. In the cloud-native era, the pay-as-you-go billing model has given rise to auto scaling techniques that reduce cost while still meeting business needs, by dynamically requesting and returning cloud resources.
What is HPA
When it comes to elasticity in the cloud-native world, people naturally think of Kubernetes' various auto scaling capabilities, the most representative of which is the Horizontal Pod Autoscaler (HPA).
As a built-in feature of Kubernetes, HPA offers a number of benefits:
- Balances stability and cost: handles business peaks stably while cutting cost during troughs.
- Stable and community-maintained: as Kubernetes versions iterate, HPA's features keep growing, yet its core mechanism has remained stable, which shows it covers the most common elastic scaling scenarios.
- Fits the serverless trend: with major vendors releasing serverless container products and virtual-node technology emerging, HPA largely covers the role of the Cluster Autoscaler (CA), making auto scaling lighter and more agile.
- Complete extension mechanism: HPA supports extension metrics such as custom metrics (custom_metrics) and external metrics (external_metrics), so users can configure the metrics that best fit their actual workloads.
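For instance, an external metric lets the HPA scale on a signal from outside the cluster, such as a message-queue depth. The snippet below is a hypothetical community-HPA example of this extension mechanism; the workload and metric names and the target value are illustrative, not taken from this article:

```yaml
# Hypothetical HPA driven by an External metric (e.g. queue depth
# exposed through an external metrics adapter).
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: queue-worker
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: queue-worker
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: External
    external:
      metric:
        name: queue_messages_ready   # illustrative metric name
      target:
        type: AverageValue
        averageValue: "30"           # target ~30 messages per replica
```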
Problems with traditional HPA
HPA is not perfect, either:
- Hard to configure: the effect of HPA depends on the user's resource configuration (target, minReplicas, maxReplicas, etc.). If the configuration is too aggressive, application availability and stability may suffer; if it is too conservative, the benefit of elasticity is greatly reduced. Reasonable configuration is the key to using HPA well.
- Insufficient agility and timeliness: native HPA reacts passively to monitoring data, and the application itself needs time to start and warm up, so HPA is inherently lagging, which may affect business stability. This is also an important reason why many users do not trust, and dare not use, HPA.
- Low observability: HPA cannot be tested in a dry-run fashion; once enabled, it actually modifies the application's replica count, which is risky, and the scaling process itself is hard to observe.
Time series forecasting
HPA is usually applied to workloads with tidal load. Looking at metrics such as traffic or resource consumption along the time dimension reveals obvious peaks and troughs. On closer observation, such fluctuating workloads are often naturally periodic in their time series, especially those that directly or indirectly serve people.
This periodicity is determined by people's daily rhythms. For example, people habitually order food delivery at noon and in the evening; there are commute peaks in the morning and evening; even for services with no obvious business hours, such as search, the number of requests at night is far lower than during the day. For such workloads, a natural idea is to predict today's data from the data of the past few days. With predicted data (for example, the workload's CPU usage over the next 24 hours), elastic scaling can "deploy in advance", which is also the key to how Effective HPA overcomes the lagging behavior of native HPA.
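As a minimal sketch of this idea (illustrative only, not Crane's actual algorithm), today's curve could be forecast as the per-slot average of the same time-of-day slots over the past few days:

```python
def predict_today(past_days):
    """Naive seasonal-mean forecast: each time-of-day slot for today is
    the average of the same slot over the past few days.

    `past_days` is a list of equal-length lists, one per day, each
    holding metric samples (e.g. CPU usage) at fixed time slots.
    """
    slots = len(past_days[0])
    return [sum(day[i] for day in past_days) / len(past_days)
            for i in range(slots)]

# Two past days with two slots each: the forecast is the slot-wise mean.
forecast = predict_today([[1, 2], [3, 4]])
```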
What is Effective HPA
Effective HPA (EHPA for short) is the elastic scaling product in the open source project Crane. It uses the community HPA for the underlying scaling control, supports richer triggering strategies (prediction, monitoring, cron), and makes scaling more effective while guaranteeing service quality:
- Scale up in advance to guarantee service quality: algorithms predict future traffic peaks so capacity is expanded ahead of time, avoiding avalanches and stability incidents caused by late scale-ups.
- Reduce invalid scaling: predicting the future reduces unnecessary scaling actions, stabilizes workload resource utilization, and eliminates false-positive scaling decisions.
- Cron support: cron-based scaling configuration handles exceptional traffic peaks such as major sales events.
- Community compatible: the community HPA serves as the execution layer of scaling control, so EHPA remains fully compatible with community capabilities.
Architecture
The main architecture of EHPA is as follows:
- EHPA Controller: handles the control logic of EHPA objects, including create/read/update/delete of EHPA objects and synchronization of the underlying HPA
- Metric Adapter: generates prediction metrics and other related metrics
- Predictor: performs time series data analysis and prediction
- TimeSeriesPrediction: a CRD for time series prediction data, consumed mainly by EHPA and the Metric Adapter
- HPA Controller: the community-native HPA controller; EHPA is fully compatible with it and lets users keep HPAs they have already configured
- KubeApiServer: Community-native Kubernetes ApiServer
- Metric Server: Community native Metric Server
Main features
Prediction-based scaling
EHPA fully mines the workload's related metrics. For workloads whose resource consumption and traffic show clear periodicity, it predicts their time series metrics over a future window. With the prediction window data, the metric observed by HPA becomes forward-looking: currently, EHPA takes the maximum metric value within the future window as HPA's current observed value.
In this way, when future traffic will rise beyond the HPA tolerance, HPA can complete the scale-up in advance; and when traffic will drop only briefly in the near future (i.e., short-term jitter), EHPA's use of the maximum value prevents an immediate scale-down, avoiding invalid scale-downs.
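This forward-looking logic can be roughly sketched as follows, assuming a list of predicted metric values for the coming window (the function names are illustrative, not Crane's API; the replica formula is the standard community HPA calculation):

```python
import math

def effective_metric(predicted_window):
    """EHPA-style observed value: the maximum predicted value in the
    coming window, so scale-ups happen early and short dips in the
    prediction do not trigger a scale-down."""
    return max(predicted_window)

def desired_replicas(current_replicas, observed, target):
    """The standard HPA formula: ceil(current * observed / target)."""
    return math.ceil(current_replicas * observed / target)

# With CPU utilization predicted to peak at 70% against a 50% target,
# four replicas are scaled to six before the peak actually arrives.
peak = effective_metric([40, 70, 55])
replicas = desired_replicas(4, peak, 50)
```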
Users can configure the prediction metrics as follows:
```yaml
apiVersion: autoscaling.crane.io/v1alpha1
kind: EffectiveHorizontalPodAutoscaler
spec:
  # Metrics defines the scaling thresholds: the level at which the
  # workload's resource usage should be kept.
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  # Prediction defines the prediction configuration.
  # If unset, prediction-based scaling is disabled.
  prediction:
    predictionWindowSeconds: 3600   # how far into the future to predict
    predictionAlgorithm:
      algorithmType: dsp
      dsp:
        sampleInterval: "60s"
        historyLength: "3d"
```
- MetricSpec: consistent with HPA's configuration, to keep a consistent user experience
- Prediction: sets prediction-related parameters, including the prediction window and the algorithm type. In the future, the strategy for processing the window's time series metrics will be user-customizable.
- PredictionWindowSeconds: the length of the future time window to predict
- Dsp: a time series analysis algorithm based on the FFT (Fast Fourier Transform). It predicts strongly periodic series well and requires no model training, making it simple and efficient
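To illustrate the idea behind dsp (only the idea; Crane's implementation is more elaborate and uses a real FFT), a series' dominant period can be read off its Fourier spectrum and the last observed cycle repeated forward:

```python
import cmath

def dominant_period(series):
    """Find the dominant period via a discrete Fourier transform.
    Naive O(n^2) DFT for clarity; a real implementation would use an FFT.
    """
    n = len(series)
    mean = sum(series) / n
    x = [v - mean for v in series]  # remove the DC component
    best_k, best_mag = 1, 0.0
    for k in range(1, n // 2 + 1):
        coef = sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                   for t in range(n))
        if abs(coef) > best_mag:
            best_k, best_mag = k, abs(coef)
    return round(n / best_k)  # strongest frequency -> period in samples

def predict_periodic(history, horizon):
    """Extrapolate a strongly periodic series by repeating its last
    observed cycle over the requested horizon."""
    period = dominant_period(history)
    last_cycle = history[-period:]
    return [last_cycle[i % period] for i in range(horizon)]
```

For a cleanly periodic series (say, three days of samples with a 24-sample daily cycle), this reproduces the next cycle exactly; real workload data would need smoothing and outlier handling on top.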
Cron-based scaling
Besides metric-driven scaling, load on holidays sometimes differs from working days, and a simple prediction algorithm may not handle it well. In that case, you can compensate for the prediction's shortcomings by setting a cron rule for the weekend with a larger replica count.
For some non-web-traffic applications, for example ones that do not need to work on weekends, you may want to reduce the replica count to 1; a cron rule can also scale them down to cut cost.
A cron-based scaling spec looks like this:
```yaml
apiVersion: autoscaling.crane.io/v1alpha1
kind: EffectiveHorizontalPodAutoscaler
spec:
  crons:
  - name: "cron1"
    description: "scale up"
    start: "0 6 ? * *"
    end: "0 9 ? * *"
    targetReplicas: 100
  - name: "cron2"
    description: "scale down"
    start: "00 9 ? * *"
    end: "00 11 ? * *"
    targetReplicas: 1
```
CronSpec: multiple cron scaling rules can be configured. Each cron rule defines the start time and end time of its cycle; within that time range, the workload's replica count is continuously held at the configured target value.
- Name: the cron rule's identifier
- TargetReplicas: the workload's target replica count within this cron time range
- Start: the cron start time, in standard Linux crontab format
- End: the cron end time, in standard Linux crontab format
At present, the cron scaling capabilities of some vendors and communities have shortcomings:
- Cron is provided as a standalone capability with no holistic view of elasticity, and its poor compatibility with HPA causes conflicts
- Cron semantics and behavior are a poor match for user intent, and can even be hard to understand, which easily leads to misuse
The following figure compares EHPA's current cron implementation with other cron capabilities:
To address these problems, EHPA's cron scaling is designed on top of HPA compatibility: cron is just another HPA metric acting on the workload alongside the other metrics. Configuration is also simple: when only cron is configured, the workload is by default not scaled outside the active time windows.
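The resolution of such cron windows can be sketched as follows; this is a toy model using fixed daily hour windows in place of real crontab parsing, mirroring the example spec above:

```python
from datetime import datetime

# Toy equivalents of the cron rules above: scale to 100 replicas
# between 06:00 and 09:00, down to 1 between 09:00 and 11:00.
CRONS = [
    {"name": "cron1", "start_hour": 6, "end_hour": 9, "target": 100},
    {"name": "cron2", "start_hour": 9, "end_hour": 11, "target": 1},
]

def cron_target(now: datetime):
    """Return the active cron rule's target replica count, or None when
    no window is active (EHPA then leaves the workload to its other
    metrics, scaling nothing by cron)."""
    for cron in CRONS:
        if cron["start_hour"] <= now.hour < cron["end_hour"]:
            return cron["target"]
    return None
```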
Scaling result preview
EHPA supports previewing (dry-run) the result of horizontal scaling. In preview mode, EHPA does not actually modify the target workload's replica count, so you can preview the effect of EHPA scaling before deciding whether to really enable automatic scaling. Preview mode can also be used to temporarily turn automatic scaling off.
- ScaleStrategy: Preview selects preview mode; Auto selects automatic scaling mode
- SpecificReplicas: in preview mode, you can pin the workload's replica count by setting SpecificReplicas
```yaml
apiVersion: autoscaling.crane.io/v1alpha1
kind: EffectiveHorizontalPodAutoscaler
spec:
  scaleStrategy: Preview   # scaling strategy, either "Auto" or "Preview"
  specificReplicas: 5      # in "Preview" mode, pins the workload's replica count
status:
  expectReplicas: 4        # final replica count recommended by EHPA; equals
                           # spec.specificReplicas when that field is set
  currentReplicas: 4       # the workload's actual replica count
```
Implementation principle: when EHPA is in preview mode, the EHPA controller points the underlying HPA object at a Substitute object, so the underlying HPA computation and scaling act only on the Substitute and the actual workload is left unchanged.
Results in production
EHPA is already used inside Tencent to support the elasticity needs of online services. Below are the production results of one online application using EHPA.
The first graph shows the application's CPU usage over one day. The red curve is actual usage; the green curve is the usage predicted by the algorithm. The algorithm tracks the usage trend well, and its parameters allow a deliberate bias (for example, predicting slightly high).
The second graph shows the trend of the application's replica count over one day with scaling enabled. The red curve is the replica count adjusted by native HPA; the green curve is that adjusted by EHPA. EHPA's scaling strategy is clearly more reasonable: it scales up earlier and performs fewer invalid scaling actions.
Further reading: what is Crane
To help cloud-native users achieve maximum cost reduction while keeping business stable, Tencent launched Crane (Cloud Resource Analytics and Economics), the industry's first cost optimization open source project based on cloud-native technology. Crane follows the FinOps standard and aims to provide cloud-native users with a one-stop solution for cloud cost optimization.
Crane's intelligent horizontal scaling is implemented on top of Effective HPA. After installing Crane, users can start their intelligent elasticity journey directly with Effective HPA.
The main contributors to the Crane project currently include industry experts from well-known companies such as Tencent, Xiaohongshu, Google, eBay, Microsoft, and Tesla.
Reference links
- Crane open source project: https://github.com/gocrane/crane/
- Crane official website: https://docs.gocrane.io/
- Effective HPA usage documentation: https://docs.gocrane.io/dev/zh/tutorials/using-effective-hpa-to-scaling-with-effectiveness/