The Default HPA Scaling Policy

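Kubernetes v1.18 added the behavior field to the HPA v2beta2 API, which allows fine-grained control over how scaling is carried out.

If the behavior field is not specified, scaling follows the default behavior.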

I. Default behavior

The default behavior:

behavior:
  scaleDown:
    stabilizationWindowSeconds: 300            # cooldown = 5min
    policies:
    - type: Percent
      value: 100
      periodSeconds: 15
  scaleUp:
    stabilizationWindowSeconds: 0              # no cooldown
    policies:
    - type: Percent
      value: 100
      periodSeconds: 15                        # add up to 100% of the current replicas (double)
    - type: Pods
      value: 4
      periodSeconds: 15                        # add up to 4 replicas
    selectPolicy: Max                          # use the policy that allows the largest change

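Scale-up is fast:

Past recommendations are ignored (stabilizationWindowSeconds=0): once reconcile() sees the metric change and computes the new replicas, it scales immediately.
Every 15 seconds, at most:
- the replicas may double (add currentReplicas*100% replicas) or 4 replicas may be added, whichever is larger;
- in the source code analyzed in section III (the path taken when behavior is not specified), this upper bound is max(2*currentReplicas, 4).

Scale-down is slow:

The final replica count after a scale-down must not be lower than the highest recommendation of the past 5 minutes (stabilizationWindowSeconds=300), i.e. the cooldown is 300s.
Every 15 seconds, at most currentReplicas*100% replicas may be removed, so once the cooldown has passed the replicas can drop to the target in a single step.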

II. Demo

Scale-up: raise the metric sharply (1 --> 13):

First, the replicas scale from 1 to 4;
then, from 4 to 8;
finally, from 8 to the 13 replicas computed from the metric.

# kubectl describe hpa
Name:                    sample-app
Namespace:               default
Labels:                  <none>
Annotations:             <none>
Reference:               Deployment/sample-app
Metrics:                 ( current / target )
  "metric_hpa" on pods:  1 / 1
Min replicas:            1
Max replicas:            15
Deployment pods:         13 current / 13 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    ReadyForNewScale    recommended size matches current size
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from pods metric metric_hpa
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events:
  Type    Reason             Age    From                       Message
  ----    ------             ----   ----                       -------
  Normal  SuccessfulRescale  2m14s  horizontal-pod-autoscaler  New size: 4; reason: pods metric metric_hpa above target
  Normal  SuccessfulRescale  119s   horizontal-pod-autoscaler  New size: 8; reason: pods metric metric_hpa above target
  Normal  SuccessfulRescale  103s   horizontal-pod-autoscaler  New size: 13; reason: pods metric metric_hpa above target

Scale-down: drop the metric sharply (13 --> 1):

After 5 minutes, the HPA scales the replicas directly down to minReplicas=1.

# kubectl describe hpa
Name:                    sample-app
Namespace:               default
Labels:                  <none>
Annotations:             <none>
Reference:               Deployment/sample-app
Metrics:                 ( current / target )
  "metric_hpa" on pods:  1 / 1
Min replicas:            1
Max replicas:            15
Deployment pods:         1 current / 1 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    ReadyForNewScale    recommended size matches current size
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from pods metric metric_hpa
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events:
  Type    Reason             Age   From                       Message
  ----    ------             ----  ----                       -------
  Normal  SuccessfulRescale  11m   horizontal-pod-autoscaler  New size: 4; reason: pods metric metric_hpa above target
  Normal  SuccessfulRescale  11m   horizontal-pod-autoscaler  New size: 8; reason: pods metric metric_hpa above target
  Normal  SuccessfulRescale  11m   horizontal-pod-autoscaler  New size: 13; reason: pods metric metric_hpa above target
  Normal  SuccessfulRescale  98s   horizontal-pod-autoscaler  New size: 1; reason: All metrics below target

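III. Source code analysis

Based on the v1.20 source code.

Entry point:

Input: prenormalizedDesiredReplicas = the replicas required according to the metrics.
First, the default cooldown (stabilization window) is applied to obtain the stabilizedRecommendation replica count;
then, the default scaling rules are applied to stabilizedRecommendation to obtain the final replica count.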

// pkg/controller/podautoscaler/horizontal.go
func (a *HorizontalController) normalizeDesiredReplicas(hpa *autoscalingv2.HorizontalPodAutoscaler, key string, currentReplicas int32, prenormalizedDesiredReplicas int32, minReplicas int32) int32 {
    // cooldown (stabilization window)
    stabilizedRecommendation := a.stabilizeRecommendation(key, prenormalizedDesiredReplicas)
    ...
    // scaling rules
    desiredReplicas, condition, reason := convertDesiredReplicasWithRules(currentReplicas, stabilizedRecommendation, minReplicas, hpa.Spec.MaxReplicas)
    ...
    return desiredReplicas
}

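First, the cooldown (stabilization window):

Compute maxRecommendation:
- initially it is prenormalizedDesiredReplicas, i.e. the replicas computed from the metrics;
Iterate over the recorded recommendations:
- if a recommendation was recorded within the last 5 minutes and is greater than maxRecommendation, it overwrites maxRecommendation;
As a result:
- on scale-up, the recent recommendations are smaller than the new one, so the return value is prenormalizedDesiredReplicas and the scale-up happens immediately;
- on scale-down, the result is max(recommendations of the last 5 minutes, new recommendation); since the older recommendations are larger than the new one, the scale-down only takes effect once they are more than 5 minutes old.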

// pkg/controller/podautoscaler/horizontal.go
func (a *HorizontalController) stabilizeRecommendation(key string, prenormalizedDesiredReplicas int32) int32 {
    maxRecommendation := prenormalizedDesiredReplicas
    foundOldSample := false
    oldSampleIndex := 0
    cutoff := time.Now().Add(-a.downscaleStabilisationWindow)    // 300s
    for i, rec := range a.recommendations[key] {
        if rec.timestamp.Before(cutoff) {
            foundOldSample = true
            oldSampleIndex = i
        } else if rec.recommendation > maxRecommendation {
            maxRecommendation = rec.recommendation
        }
    }
    if foundOldSample {
        a.recommendations[key][oldSampleIndex] = timestampedRecommendation{prenormalizedDesiredReplicas, time.Now()}
    } else {
        a.recommendations[key] = append(a.recommendations[key], timestampedRecommendation{prenormalizedDesiredReplicas, time.Now()})
    }
    return maxRecommendation
}

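Next, the scaling rules:

On scale-up, final replicas = min(hpaMaxReplicas, max(2*currentReplicas, 4), desiredReplicas);
- in other words, if the computed desiredReplicas is too large, it is capped at max(2*currentReplicas, 4);
On scale-down, final replicas = max(desiredReplicas, hpaMinReplicas).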

// pkg/controller/podautoscaler/horizontal.go
func convertDesiredReplicasWithRules(currentReplicas, desiredReplicas, hpaMinReplicas, hpaMaxReplicas int32) (int32, string, string) {
    var minimumAllowedReplicas int32
    var maximumAllowedReplicas int32
    var possibleLimitingCondition string
    var possibleLimitingReason string
    minimumAllowedReplicas = hpaMinReplicas
    // Do not upscale too much to prevent incorrect rapid increase of the number of master replicas caused by
    // bogus CPU usage report from heapster/kubelet (like in issue #32304).
    scaleUpLimit := calculateScaleUpLimit(currentReplicas)
    if hpaMaxReplicas > scaleUpLimit {
        maximumAllowedReplicas = scaleUpLimit
        possibleLimitingCondition = "ScaleUpLimit"
        possibleLimitingReason = "the desired replica count is increasing faster than the maximum scale rate"
    } else {
        maximumAllowedReplicas = hpaMaxReplicas
        possibleLimitingCondition = "TooManyReplicas"
        possibleLimitingReason = "the desired replica count is more than the maximum replica count"
    }
    if desiredReplicas < minimumAllowedReplicas {
        possibleLimitingCondition = "TooFewReplicas"
        possibleLimitingReason = "the desired replica count is less than the minimum replica count"
        return minimumAllowedReplicas, possibleLimitingCondition, possibleLimitingReason
    } else if desiredReplicas > maximumAllowedReplicas {
        return maximumAllowedReplicas, possibleLimitingCondition, possibleLimitingReason
    }
    return desiredReplicas, "DesiredWithinRange", "the desired count is within the acceptable range"
}

func calculateScaleUpLimit(currentReplicas int32) int32 {
    return int32(math.Max(scaleUpLimitFactor*float64(currentReplicas), scaleUpLimitMinimum))    // scaleUpLimitFactor=2, scaleUpLimitMinimum=4
}

References:

1. https://zhuanlan.zhihu.com/p/245208287

a朋
