Why do we need load shedding

In a microservice cluster, call chains are complex. As a service provider, a service needs a mechanism to protect itself, so that unexpected traffic from callers cannot overwhelm it and its own high availability is preserved.

The most common protection mechanism is rate limiting. The prerequisite for using a rate limiter is knowing the maximum concurrency the service can handle, which is usually measured by load testing before going live. Moreover, each interface has its own rate-limiting parameters, and as the system keeps iterating, its processing capacity changes accordingly. Running a load test and re-tuning the rate-limiting parameters before every release becomes very cumbersome.

So is there a simpler rate-limiting mechanism that still achieves maximum self-protection?

What is adaptive load shedding

Adaptive load shedding protects the service intelligently: it dynamically decides whether to shed load based on the system load of the service itself.

Design goals:

  1. Ensure that the system is not dragged down.
  2. Maintain the system's throughput while keeping it stable.

So the key question is: how do we measure the load of the service itself?

High load is judged mainly by two indicators:

  1. Whether the CPU is overloaded.
  2. Whether the maximum concurrency is exceeded.

Only when both conditions hold at the same time is the service considered to be under high load, and adaptive load shedding kicks in.

Note that CPU load and concurrency fluctuate heavily in high-concurrency scenarios; in the data this shows up as spikes, which we call glitches. Glitches may cause the system to shed load far too frequently, so we usually average the indicator over a period of time to make it smoother. One implementation is to record every indicator value within the period precisely and then compute the average directly, but that consumes a certain amount of system resources.

Statistics offers an algorithm for this: the exponential moving average, which estimates the local mean of a variable so that each update depends on the variable's history over a period of time. The average can be estimated without recording all the historical values, saving valuable server resources.

The principle of the moving average algorithm is explained very clearly in the article linked in the references below.

Let Vt denote the value of variable V at time t, and θt the observed value of V at time t. Without the moving average model, Vt = θt; with it, Vt is updated as follows (a minimal Go sketch follows the list below):

Vt = β·Vt−1 + (1−β)·θt

  • When β = 0, Vt = θt, i.e. no averaging
  • When β = 0.9, Vt is roughly the average of the past 10 values of θt
  • When β = 0.99, Vt is roughly the average of the past 100 values of θt
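
To make the update rule concrete, here is a minimal, self-contained Go sketch (illustration only, not go-zero code; the ewma type and the sample values are made up):

package main

import "fmt"

// ewma holds an exponentially weighted moving average.
type ewma struct {
    beta  float64 // weight given to the historical value
    value float64 // current estimate, i.e. Vt
}

// update applies Vt = beta*Vt-1 + (1-beta)*θt and returns the new estimate.
// Note: starting from zero biases the first few estimates; real
// implementations often seed with the first sample.
func (e *ewma) update(sample float64) float64 {
    e.value = e.beta*e.value + (1-e.beta)*sample
    return e.value
}

func main() {
    avg := ewma{beta: 0.9}
    // a spiky series: the glitch at 900 barely moves the smoothed value
    for _, v := range []float64{100, 110, 900, 105, 95} {
        fmt.Printf("sample=%.0f smoothed=%.1f\n", v, avg.update(v))
    }
}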

Code

Next, let's look at go-zero's implementation of adaptive load shedding.

core/load/adaptiveshedder.go

Definition of the adaptive load shedding interface:

// Callback functions
Promise interface {
    // Pass is called back when the request succeeds
    Pass()
    // Fail is called back when the request fails
    Fail()
}

// Shedder interface definition
Shedder interface {
    // Shedding check:
    // 1. If the call is allowed, Promise.Pass()/Promise.Fail() must be called
    //    manually to report the actual execution result
    // 2. If the call is rejected, the overload error ErrServiceOverloaded is returned directly
    Allow() (Promise, error)
}

The interface definition is very concise, which means it is genuinely simple to use: a single `Allow() (Promise, error)` method is exposed to callers.

Example of go-zero usage:

In business code, you only need to call this method to determine whether to shed load. If the request is shed, the flow is terminated immediately; otherwise, the returned Promise is used to report the execution result via callback.

func UnarySheddingInterceptor(shedder load.Shedder, metrics *stat.Metrics) grpc.UnaryServerInterceptor {
    ensureSheddingStat()

    return func(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo,
        handler grpc.UnaryHandler) (val interface{}, err error) {
        sheddingStat.IncrementTotal()
        var promise load.Promise
        // check whether the request gets shed
        promise, err = shedder.Allow()
        // shed: record the related logs and metrics
        if err != nil {
            metrics.AddDrop()
            sheddingStat.IncrementDrop()
            return
        }
        // finally, report the execution result via callback
        defer func() {
            // execution failed
            if err == context.DeadlineExceeded {
                promise.Fail()
            // execution succeeded
            } else {
                sheddingStat.IncrementPass()
                promise.Pass()
            }
        }()
        // execute the business handler
        return handler(ctx, req)
    }
}
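
For context, this is roughly how such an interceptor attaches to a plain grpc-go server. A minimal sketch, assuming the interceptor above is in scope (in go-zero it lives in an internal package, and the framework's rpc server performs this wiring for you):

import (
    "google.golang.org/grpc"

    "github.com/zeromicro/go-zero/core/load"
    "github.com/zeromicro/go-zero/core/stat"
)

func newServer() *grpc.Server {
    shedder := load.NewAdaptiveShedder()
    metrics := stat.NewMetrics("demo-rpc")
    // every unary call now goes through the shedding check first
    return grpc.NewServer(
        grpc.UnaryInterceptor(UnarySheddingInterceptor(shedder, metrics)),
    )
}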

Definition of the implementing struct:

It mainly contains three kinds of fields:

  1. CPU load threshold: exceeding this value means the CPU is under high load.
  2. Cooling-off period: if the service has shed load recently, it enters a cooling-off period. This prevents back-and-forth jitter where pressure is piled back on while the service is still recovering, since bringing the load down takes some time. During the cooling-off period the service should keep checking whether the concurrency exceeds the limit, and keep dropping requests if it does.
  3. Concurrency: the number of requests currently in flight, the average number in flight, and the request count and response times over the recent window. These are used to determine whether the current concurrency exceeds the maximum concurrency the system can sustain.

// options pattern
ShedderOption func(opts *shedderOptions)

// optional configuration parameters
shedderOptions struct {
    // sliding time window size
    window time.Duration
    // number of sliding time window buckets
    buckets int
    // cpu load threshold
    cpuThreshold int64
}

// adaptive shedder struct, must implement the Shedder interface
adaptiveShedder struct {
    // cpu load threshold
    // above this value the service is under high load and sheds to protect itself
    cpuThreshold int64
    // number of buckets within 1s
    windows int64
    // number of in-flight requests
    flying int64
    // moving average of in-flight requests
    avgFlying float64
    // spin lock; the whole service shares one shedder
    // must be held while counting the requests currently being processed
    // spinning instead of blocking keeps concurrency cheap and fast
    avgFlyingLock syncx.SpinLock
    // time of the last rejection
    dropTime *syncx.AtomicDuration
    // whether a request was rejected recently
    droppedRecently *syncx.AtomicBool
    // request count over the recent period, via a sliding time window
    passCounter *collection.RollingWindow
    // response times over the recent period, via a sliding time window
    rtCounter *collection.RollingWindow
}

Adaptive load shedding constructor:

func NewAdaptiveShedder(opts ...ShedderOption) Shedder {
    // to keep the calling code uniform,
    // return the no-op implementation when shedding is disabled
    // go-zero uses this design in many places, e.g. Breaker and the logging component
    if !enabled.True() {
        return newNopShedder()
    }
    // set optional configuration via the options pattern
    options := shedderOptions{
        // by default, track the last 5s of data
        window: defaultWindow,
        // 50 buckets by default
        buckets:      defaultBuckets,
        // cpu load threshold
        cpuThreshold: defaultCpuThreshold,
    }
    for _, opt := range opts {
        opt(&options)
    }
    // duration of each bucket, 100ms by default
    bucketDuration := options.window / time.Duration(options.buckets)
    return &adaptiveShedder{
        // cpu load threshold
        cpuThreshold:    options.cpuThreshold,
        // number of sliding-window buckets within 1s
        windows:         int64(time.Second / bucketDuration),
        // time of the last rejection
        dropTime:        syncx.NewAtomicDuration(),
        // whether a request was rejected recently
        droppedRecently: syncx.NewAtomicBool(),
        // qps statistics, sliding time window
        // ignore the bucket currently being written; its incomplete time span could skew the data
        passCounter: collection.NewRollingWindow(options.buckets, bucketDuration,
            collection.IgnoreCurrentBucket()),
        // response time statistics, sliding time window
        // ignore the bucket currently being written; its incomplete time span could skew the data
        rtCounter: collection.NewRollingWindow(options.buckets, bucketDuration,
            collection.IgnoreCurrentBucket()),
    }
}
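
As a usage illustration, here is a minimal sketch of constructing a shedder with a custom threshold and guarding a task with it. The load.WithCpuThreshold helper is assumed from the options pattern above (check the go-zero source for the exact option names), doWork stands in for any business function, and the per-mille scale matches what stat.CpuUsage() reports:

// cpu usage is measured in per-mille, so 800 means 80%
var shedder = load.NewAdaptiveShedder(load.WithCpuThreshold(800))

func guard(doWork func() error) error {
    promise, err := shedder.Allow()
    if err != nil {
        return err // ErrServiceOverloaded: shed the request immediately
    }
    if err := doWork(); err != nil {
        promise.Fail() // only decrements the in-flight count
        return err
    }
    promise.Pass() // also records latency and the pass count
    return nil
}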

The Allow() check:

It checks whether the current request should be dropped. If it is dropped, the business side must terminate the request immediately to protect the service; this also means shedding has taken effect and the cooling-off period begins. If the request is let through, a Promise is returned, and the business side is expected to invoke its callback so the metrics get recorded.

// shedding check
func (as *adaptiveShedder) Allow() (Promise, error) {
    // check whether the request should be dropped
    if as.shouldDrop() {
        // record the drop time
        as.dropTime.Set(timex.Now())
        // mark that a request was dropped recently
        as.droppedRecently.Set(true)
        // return the overload error
        return nil, ErrServiceOverloaded
    }
    // increment the number of in-flight requests
    as.addFlying(1)
    // every allowed request gets a new promise object
    // that holds a pointer back to the shedder
    return &promise{
        start:   timex.Now(),
        shedder: as,
    }, nil
}

Checking whether a request should be dropped, shouldDrop():

// whether the request should be dropped
func (as *adaptiveShedder) shouldDrop() bool {
    // the current cpu load exceeds the threshold, or the service is in the
    // cooling-off period and should keep checking the load and trying to drop requests
    if as.systemOverloaded() || as.stillHot() {
        // check whether the in-flight concurrency exceeds the maximum
        // concurrency the system can currently sustain; drop the request if so
        if as.highThru() {
            flying := atomic.LoadInt64(&as.flying)
            as.avgFlyingLock.Lock()
            avgFlying := as.avgFlying
            as.avgFlyingLock.Unlock()
            msg := fmt.Sprintf(
                "dropreq, cpu: %d, maxPass: %d, minRt: %.2f, hot: %t, flying: %d, avgFlying: %.2f",
                stat.CpuUsage(), as.maxPass(), as.minRt(), as.stillHot(), flying, avgFlying)
            logx.Error(msg)
            stat.Report(msg)
            return true
        }
    }
    return false
}

The CPU threshold check, systemOverloaded():

The CPU load calculation uses the moving average algorithm to smooth out glitches. Samples are taken every 250ms with β = 0.95, which is roughly the average of the last 20 CPU readings (1/(1−0.95) = 20 samples × 250ms), covering a time span of about 5s.

// whether the cpu is overloaded
func (as *adaptiveShedder) systemOverloaded() bool {
    return systemOverloadChecker(as.cpuThreshold)
}

// cpu check function
systemOverloadChecker = func(cpuThreshold int64) bool {
        return stat.CpuUsage() >= cpuThreshold
}

// moving average of cpu usage
curUsage := internal.RefreshCpu()
prevUsage := atomic.LoadInt64(&cpuUsage)
// cpu = cpuᵗ⁻¹ * beta + cpuᵗ * (1 - beta)
// moving average algorithm
usage := int64(float64(prevUsage)*beta + float64(curUsage)*(1-beta))
atomic.StoreInt64(&cpuUsage, usage)

Checking whether the service is in the cooling-off period, stillHot():

This determines whether the system is currently in the cooling-off period; if so, it should keep trying to drop requests. The main purpose is to prevent jitter: while an overloaded system is recovering, piling pressure back on immediately would push it into overload again, so during this period the service keeps trying to drop requests.

func (as *adaptiveShedder) stillHot() bool {
    // no request was dropped recently,
    // meaning the service is healthy
    if !as.droppedRecently.True() {
        return false
    }
    // not in the cooling-off period
    dropTime := as.dropTime.Load()
    if dropTime == 0 {
        return false
    }
    // the cooling-off duration defaults to 1s
    hot := timex.Since(dropTime) < coolOffDuration
    // out of the cooling-off period, handling requests normally again
    if !hot {
        // reset the drop record
        as.droppedRecently.Set(false)
    }

    return hot
}

Checking the current in-flight concurrency, highThru():

Once both the current number of in-flight requests and the moving average of in-flight requests exceed the system's maximum concurrency, the request is dropped; checking both the instantaneous value and the smoothed average avoids reacting to a momentary spike alone.

Why is there a lock here? Because the adaptive shedder is shared globally, the lock ensures the correctness of the average concurrency value.

Why a spin lock? Concurrent request processing must not block other goroutines from executing tasks; spinning instead of parking keeps the critical section cheap and improves performance.

func (as *adaptiveShedder) highThru() bool {
    // lock
    as.avgFlyingLock.Lock()
    // read the moving average of in-flight requests,
    // updated at the end of each request
    avgFlying := as.avgFlying
    // unlock
    as.avgFlyingLock.Unlock()
    // the system's current maximum concurrency
    maxFlight := as.maxFlight()
    // whether both the average in-flight count and the current in-flight count
    // exceed the system's maximum concurrency
    return int64(avgFlying) > maxFlight && atomic.LoadInt64(&as.flying) > maxFlight
}

So how do we obtain the in-flight concurrency and the average concurrency?

Counting in-flight requests is straightforward: each time a request is allowed, the count is incremented by 1; when the request finishes, the promise object's callback decrements it by 1. The average concurrency is then estimated with the moving average algorithm.

type promise struct {
    // request start time,
    // used to measure the request latency
    start   time.Duration
    shedder *adaptiveShedder
}

func (p *promise) Fail() {
    // the request has finished; decrement the in-flight count
    p.shedder.addFlying(-1)
}

func (p *promise) Pass() {
    // response time in milliseconds
    rt := float64(timex.Since(p.start)) / float64(time.Millisecond)
    // the request has finished; decrement the in-flight count
    p.shedder.addFlying(-1)
    p.shedder.rtCounter.Add(math.Ceil(rt))
    p.shedder.passCounter.Add(1)
}

func (as *adaptiveShedder) addFlying(delta int64) {
    flying := atomic.AddInt64(&as.flying, delta)
    // when a request finishes, update the average in-flight concurrency
    if delta < 0 {
        as.avgFlyingLock.Lock()
        // estimate the service's average concurrency over the recent period
        as.avgFlying = as.avgFlying*flyingBeta + float64(flying)*(1-flyingBeta)
        as.avgFlyingLock.Unlock()
    }
}

Knowing the current concurrency is not enough; we also need to know the upper limit the system can handle, that is, the maximum concurrency.

The request count and response times are both recorded with sliding windows; for the sliding window implementation, see the earlier article on the adaptive circuit breaker.

The system's current maximum concurrency = the maximum number of requests passed per unit time × the minimum response time within the window. In other words, concurrency = throughput × latency, which is Little's law.
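
As a worked example with made-up numbers: if the busiest bucket in the window passed 50 requests (maxPass = 50) and there are 10 buckets per second (windows = 10), the estimated peak throughput is 50 × 10 = 500 QPS; with a minimum average response time of 100ms, the maximum concurrency is 500 × 0.1 = 50 in-flight requests.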

// compute the system's maximum concurrency per second
// max concurrency = max request rate (qps) * min response time (rt)
func (as *adaptiveShedder) maxFlight() int64 {
    // windows = buckets per second
    // maxQPS = maxPASS * windows
    // minRT = min average response time in milliseconds
    // maxQPS * minRT / milliseconds_per_second
    // as.maxPass()*as.windows - the busiest bucket's request count * buckets per 1s
    // as.minRt()/1e3 - the smallest average response time across buckets, divided by 1000ms to convert to seconds
    return int64(math.Max(1, float64(as.maxPass()*as.windows)*(as.minRt()/1e3)))
}

// the sliding time window holds multiple buckets;
// find the one that passed the most requests
// each bucket spans `interval` ms
// qps is the number of requests within 1s: qps = maxPass * time.Second/interval
func (as *adaptiveShedder) maxPass() int64 {
    var result float64 = 1
    // the bucket with the most requests in the current time window
    as.passCounter.Reduce(func(b *collection.Bucket) {
        if b.Sum > result {
            result = b.Sum
        }
    })

    return int64(result)
}

// the sliding time window holds multiple buckets;
// compute the smallest average response time,
// which is needed to estimate the maximum concurrency
// the system can handle over the recent period
func (as *adaptiveShedder) minRt() float64 {
    // defaults to 1000ms
    result := defaultMinRt

    as.rtCounter.Reduce(func(b *collection.Bucket) {
        if b.Count <= 0 {
            return
        }
        // the bucket's average response time
        avg := math.Round(b.Sum / float64(b.Count))
        if avg < result {
            result = avg
        }
    })

    return result
}

References

Google BBR congestion control algorithm

Moving average algorithm principle

go-zero adaptive load shedding

Project address

https://github.com/zeromicro/go-zero

Welcome to use go-zero and star it to support us!

WeChat Exchange Group

Follow the "Practice" official account and click "exchange group" to get the QR code of the community group.

