Preface
Members of the go-zero community often ask:
What algorithm is used to implement service monitoring?
How does the sliding window work? Can you explain the principle behind it?
How is the circuit-breaking algorithm designed? Why is there no half-open/half-closed state?
In this article, let's analyze the implementation algorithm and logic behind indicator statistics in go-zero.
How to count the indicators
For this, let's look directly at the breaker:
type googleBreaker struct {
	k     float64
	stat  *collection.RollingWindow
	proba *mathx.Proba
}
The default breaker in go-zero uses the approach described in the Google SRE book as its implementation blueprint.
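The Google SRE book describes client-side throttling with a rejection probability of max(0, (requests - K * accepts) / (requests + 1)). A minimal sketch of that formula follows. The function name dropRatio is hypothetical, and treating the k field of googleBreaker as the multiplier K is an assumption, not something the struct definition alone confirms.

```go
package main

import "math"

// dropRatio returns the probability of rejecting a request locally,
// following the client-side throttling formula from the Google SRE book:
//
//	max(0, (requests - k*accepts) / (requests + 1))
//
// requests is the number of requests attempted in the window,
// accepts is the number the backend accepted, and k is the multiplier.
// The name dropRatio is hypothetical and used only for illustration.
func dropRatio(requests, accepts, k float64) float64 {
	return math.Max(0, (requests-k*accepts)/(requests+1))
}
```

With k = 2, 99 requests and 25 accepts give (99 - 50) / 100, so roughly half of new requests would be rejected locally; when every request is accepted the ratio is 0.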
When the breaker intercepts a request, it records the current success/failure rate of this type of request:
func (b *googleBreaker) doReq(req func() error, fallback func(err error) error, acceptable Acceptable) error {
	...
	// Execute the actual request function.
	err := req()
	if acceptable(err) {
		// What actually runs: b.stat.Add(1)
		// In other words: the internal success counter is incremented by 1.
		b.markSuccess()
	} else {
		// Same principle as above.
		b.markFailure()
	}
	return err
}
So in fact, what happens underneath is this: once the request has been executed, the internal statistical data structure adds a statistical value (positive or negative) according to whether an error occurred. At the same time, as time moves on, the statistical values also need to evolve over time.
Simply put, it is a time-series store [comparable to a time-series database, just an in-memory version].
Let's talk about what data structure is used to organize this time series.
Sliding window
Let's take a look at the data structure defined by RollingWindow:
type RollingWindow struct {
	lock          sync.RWMutex
	size          int
	win           *window
	interval      time.Duration
	offset        int
	ignoreCurrent bool
	lastTime      time.Duration
}
In the above structure definition, window stores the indicator records.
A RollingWindow contains several buckets (how many is up to the developer).
Each bucket stores Sum, the total number of successes, and Count, the total number of requests. When the breaker does its calculation, it adds up Sum across the buckets into accepts and Count across the buckets into total, so the current error rate can be computed.
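The aggregation described above can be sketched as follows. This is a simplified illustration, not the go-zero source: the bucket type and the function names history and errorRate are assumptions, and it assumes a success adds 1 to Sum while a failure adds 0.

```go
package main

// bucket mirrors the two per-bucket fields described in the text.
// The type itself is a simplified stand-in for illustration.
type bucket struct {
	Sum   float64 // total of recorded values (assumed: 1 per success, 0 per failure)
	Count int64   // number of requests recorded
}

// history adds up Sum into accepts and Count into total across all buckets.
func history(buckets []bucket) (accepts float64, total int64) {
	for _, b := range buckets {
		accepts += b.Sum
		total += b.Count
	}
	return accepts, total
}

// errorRate returns the fraction of failed requests in the window,
// or 0 when no requests have been recorded.
func errorRate(buckets []bucket) float64 {
	accepts, total := history(buckets)
	if total == 0 {
		return 0
	}
	return 1 - accepts/float64(total)
}
```

For example, three buckets holding (Sum, Count) of (3, 4), (1, 1) and (0, 0) give accepts = 4 and total = 5, so the error rate is about 0.2.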
How does the sliding happen
First of all, the breaker needs to count the request status within a unit of time (such as 1s). Mapped onto the buckets above, each bucket only needs to record the indicator data for one unit of time.
Then how do we ensure that a given bucket holds exactly the data for its unit of time as time progresses?
The first method that comes to mind: start a timer in the background that creates a bucket every unit of time; when a request arrives, its status is recorded in the bucket that the current timestamp falls into. Creating buckets periodically has an edge case, though: data may arrive before the bucket has been built.
The second method is to create buckets lazily: check for the bucket, and create it if needed, whenever a piece of data arrives. This way a bucket sometimes exists and sometimes does not, and a large number of buckets end up being created. Can we reuse them instead?
The go-zero method is: the RollingWindow creates all buckets in advance, the time of the current request is mapped to a bucket through an algorithm, and the request status is recorded there.
Let's take a look at what happens when the breaker calls b.stat.Add(1):
func (rw *RollingWindow) Add(v float64) {
	rw.lock.Lock()
	defer rw.lock.Unlock()
	// The sliding happens here.
	rw.updateOffset()
	rw.win.add(rw.offset, v)
}

func (rw *RollingWindow) updateOffset() {
	span := rw.span()
	if span <= 0 {
		return
	}
	offset := rw.offset
	// Reset the expired buckets.
	for i := 0; i < span; i++ {
		rw.win.resetBucket((offset + i + 1) % rw.size)
	}
	rw.offset = (offset + span) % rw.size
	now := timex.Now()
	// Update the time, aligned to the start of the current interval.
	rw.lastTime = now - (now-rw.lastTime)%rw.interval
}

func (w *window) add(offset int, v float64) {
	// Add the given value to the target bucket.
	w.buckets[offset%w.size].add(v)
}
The figure above shows how the buckets in the window change during Add(delta). Explanation:
- updateOffset updates the buckets: it determines which bucket the current time falls on [returning the number of buckets directly if that number is exceeded] and resets the buckets before it
  - Determine the span of the current time relative to the bucket interval [return the number of buckets directly if the number of buckets is exceeded]
  - Clear the data of every bucket within the span (reset)
  - Update offset, which is the bucket that data is about to be written to
  - Update the execution time lastTime, which also serves as the marker for the next move
- Starting from the offset that was just updated, write the data to the corresponding bucket
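The steps above can be sketched as a small, self-contained rolling window. This is a simplified illustration under stated assumptions, not the go-zero implementation: all type and function names are hypothetical, locking is omitted, and a manually advanced clock (manualClock) replaces the real clock so the sliding can be observed deterministically.

```go
package main

import "time"

// manualClock is a hypothetical test clock that is advanced by hand.
type manualClock struct{ t time.Duration }

func (c *manualClock) now() time.Duration      { return c.t }
func (c *manualClock) advance(d time.Duration) { c.t += d }

type bucket struct {
	sum   float64
	count int64
}

// rollingWindow is a simplified, lock-free sketch of a ring of buckets.
type rollingWindow struct {
	buckets  []bucket
	interval time.Duration
	offset   int                  // index of the bucket currently written to
	lastTime time.Duration        // start of the current bucket's interval
	now      func() time.Duration // injectable clock (an assumption of this sketch)
}

func newRollingWindow(size int, interval time.Duration, now func() time.Duration) *rollingWindow {
	return &rollingWindow{
		buckets:  make([]bucket, size),
		interval: interval,
		lastTime: now(),
		now:      now,
	}
}

// span returns how many whole intervals have elapsed since lastTime,
// capped at the number of buckets.
func (rw *rollingWindow) span() int {
	n := int((rw.now() - rw.lastTime) / rw.interval)
	if n >= 0 && n < len(rw.buckets) {
		return n
	}
	return len(rw.buckets)
}

// updateOffset resets the buckets that expired, moves offset forward,
// and aligns lastTime to the start of the current interval.
func (rw *rollingWindow) updateOffset() {
	span := rw.span()
	if span <= 0 {
		return
	}
	size := len(rw.buckets)
	for i := 0; i < span; i++ {
		rw.buckets[(rw.offset+i+1)%size] = bucket{}
	}
	rw.offset = (rw.offset + span) % size
	now := rw.now()
	rw.lastTime = now - (now-rw.lastTime)%rw.interval
}

// add slides the window if needed, then records v in the current bucket.
func (rw *rollingWindow) add(v float64) {
	rw.updateOffset()
	b := &rw.buckets[rw.offset]
	b.sum += v
	b.count++
}

// totals slides the window if needed, then sums every bucket.
func (rw *rollingWindow) totals() (sum float64, count int64) {
	rw.updateOffset()
	for _, b := range rw.buckets {
		sum += b.sum
		count += b.count
	}
	return sum, count
}
```

Because the ring is allocated once and expired buckets are reset in place, no bucket is created while requests are being recorded, which is the reuse the text asks for.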
In this process, the key questions are how to determine the bucket expiration point and how to update the time. The most important thing about the sliding window is the time update. The following figure explains this process:
The bucket expiration point is, put plainly, the number of buckets spanned between lastTime (the time of the last update) and the current time: timex.Since(rw.lastTime) / rw.interval
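As a worked example of this arithmetic, the two calculations can be written as pure functions. The names span and alignLastTime are hypothetical, and the times are passed in explicitly instead of being read from a clock, so this is only a sketch of the computation shown in the article's code.

```go
package main

import "time"

// span returns the number of whole intervals between lastTime and now,
// capped at size (the number of buckets).
func span(now, lastTime, interval time.Duration, size int) int {
	n := int((now - lastTime) / interval)
	if n >= 0 && n < size {
		return n
	}
	return size
}

// alignLastTime rounds now down to the start of its interval, measured
// from lastTime, mirroring: now - (now-lastTime)%interval.
func alignLastTime(now, lastTime, interval time.Duration) time.Duration {
	return now - (now-lastTime)%interval
}
```

With an interval of 250ms, lastTime at 1s and the current time at 1.6s, 600ms have elapsed, so the span is 2 buckets and the new lastTime is 1.5s. The remaining 100ms belongs to the bucket that is now current.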
Thus, during Add(), lastTime and the current time serve as the markers: expired buckets in the window are reset continuously and new data keeps filling in, which is how the windowed calculation is achieved.
Summary
This article analyzed indicator statistics in the go-zero framework and the implementation of the sliding window RollingWindow. In addition, store/redis also has indicator statistics. It does not need sliding-window counting, because it only needs to calculate the hit rate: add 1 to hit on a hit and 1 to miss on a miss, keep both counters, and compute the hit rate at the end.
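A minimal sketch of such hit/miss counting is shown below. The type and method names are hypothetical and are not taken from the go-zero source; the point is only that two monotonically increasing counters are enough when no time window is involved.

```go
package main

import "sync/atomic"

// cacheStat is a hypothetical pair of counters for hit-rate statistics.
type cacheStat struct {
	hit  uint64
	miss uint64
}

// incrementHit records one cache hit.
func (s *cacheStat) incrementHit() { atomic.AddUint64(&s.hit, 1) }

// incrementMiss records one cache miss.
func (s *cacheStat) incrementMiss() { atomic.AddUint64(&s.miss, 1) }

// hitRatio returns hits / (hits + misses), or 0 when nothing was recorded.
func (s *cacheStat) hitRatio() float64 {
	hit := atomic.LoadUint64(&s.hit)
	miss := atomic.LoadUint64(&s.miss)
	total := hit + miss
	if total == 0 {
		return 0
	}
	return float64(hit) / float64(total)
}
```

Atomic increments keep the counters safe under concurrent requests without a lock.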
A sliding window is well suited to computing the indicators used for flow control, and it can also be used to control the flow itself.
More articles on the design and implementation of go-zero are available through the channels below.
Project address
https://github.com/tal-tech/go-zero
You are welcome to use go-zero, and a star would support us!
WeChat Exchange Group
Follow the "Microservice Practice" official account and open its exchange group entry to get the QR code of the community group.
For the go-zero series of articles, please refer to the "Microservice Practice" official account.