Circuit Breaking
Calls between services are very common in microservices. A problem in a downstream service can affect every request or function of its caller, which is something we want to avoid. To keep a failing callee from dragging down the caller, the caller needs to protect itself, and a common means of self-protection is circuit breaking.
How circuit breaking works
Circuit breaking works much like a household fuse: when the circuit is short-circuited or overloaded and a certain condition or threshold is reached, the fuse disconnects, protecting the appliances on the circuit from damage.
However, there is a problem. Once a fuse blows, it has to be replaced by hand before the circuit works again, and we cannot intervene manually in real time in a running system. A circuit breaker therefore also needs a mechanism to detect and decide when to recover. There are many such mechanisms: the decision can be a probability derived from the ratio of successes to total requests, as in the Google SRE approach, or it can be based on the number of successes and failures within a time window.
Circuit breaker granularity
Databases have table locks and row locks, and finer-grained locks keep the impact on other data small. The same applies to circuit breakers: a breaker can cover a service, a domain name, or even a specific method. Choose the granularity according to business needs; finer is not always better.
Circuit breaker states
- Closed: circuit breaker protection is not triggered, and all requests pass through normally.
- Open: when the error threshold is reached, the breaker enters the open state. All requests are rejected and no traffic is sent downstream.
- Half-open: after a period in the open state, the breaker lets a request through to probe whether the downstream service can accept new traffic. If the probe succeeds, the breaker returns to the closed state; if it fails, it returns to the open state.
Using circuit breakers in microservices
This article does not introduce circuit breaking itself; it mainly records how circuit breaking is implemented and used in several Go microservice frameworks. Based on some articles and my own practice, I cover three aspects. (The article does not go deep into circuit breaker implementation code; it only describes usage within the frameworks.)
- Using a circuit breaker on its own, without a microservice framework (hystrix-go).
- Circuit breaking in Kratos, the microservice framework from Bilibili, and how to implement a custom circuit breaker and use it in Kratos.
- Circuit breaking in go-zero.
Using a circuit breaker in Go
The classic circuit breaker implementation is Hystrix, and Go has a corresponding port: hystrix-go.
Let's look at an example. The server function starts an HTTP service with gin. Requests during the first 200 ms return a 500 error; later requests return HTTP status 200.
Then we create a circuit breaker named test. Several parameters can be set:
- Timeout: the timeout for executing the command, in milliseconds
- MaxConcurrentRequests: the maximum number of concurrent requests
- RequestVolumeThreshold: the number of requests within one statistical window (10 seconds) that must be reached before the breaker evaluates whether to open
- SleepWindow: how long after the breaker opens before it tries to check whether the service is available again, in milliseconds
- ErrorPercentThreshold: the error percentage. If the ratio of errors to requests within the window reaches this threshold, the breaker opens.
The client code sends 20 requests to the HTTP service, roughly one every 100 ms. The first two requests get a 500 error, but the breaker stays closed because fewer than 10 requests (RequestVolumeThreshold) have been recorded in the window. By the 10th request, the error rate of 2 out of 10 reaches the 20% threshold we set, and the breaker opens. After the 500 ms SleepWindow, that is, about 5 requests later, it lets a request through again.
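The two checks behind that walkthrough can be sketched in a few lines. This is a simplified model under stated assumptions, not hystrix-go's actual code: shouldOpen and its parameters are hypothetical names, and the real library tracks rolling buckets and rounds the percentage slightly differently.

```go
package main

import "fmt"

// shouldOpen models the decision: the breaker is only evaluated once the
// window holds at least volumeThreshold requests, and it opens when the
// error percentage reaches errorPercent.
func shouldOpen(requests, errs, volumeThreshold, errorPercent int) bool {
	if requests < volumeThreshold {
		return false // too few samples in the window to judge
	}
	return errs*100/requests >= errorPercent
}

func main() {
	// Values from the example: RequestVolumeThreshold 10, ErrorPercentThreshold 20.
	fmt.Println(shouldOpen(2, 2, 10, 20))  // false: only 2 requests so far
	fmt.Println(shouldOpen(10, 2, 10, 20)) // true: 2 of 10 is 20%
}
```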
There are two things to notice in this example:
- Regarding the granularity issue mentioned earlier: in hystrix-go, each command name is one circuit breaker, distinguished by the name given at creation. We can choose this name per business and dimension; it can be a domain name, a specific method, and so on. That logic is ours to implement.
- When the error rate reaches the threshold within the window, all requests under this command are cut off. If the granularity is coarse, every request in that dimension fails. The Google SRE approach described below addresses this problem.
package main

import (
	"fmt"
	"net/http"
	"time"

	"github.com/afex/hystrix-go/hystrix"
	"github.com/gin-gonic/gin"
	"gopkg.in/resty.v1"
)

// server starts an HTTP service that returns 500 for roughly the first
// 200 ms after startup and 200 afterwards.
func server() {
	e := gin.Default()
	start := time.Now()
	e.GET("/ping", func(ctx *gin.Context) {
		if time.Since(start) < 201*time.Millisecond {
			ctx.String(http.StatusInternalServerError, "pong")
			return
		}
		ctx.String(http.StatusOK, "pong")
	})
	err := e.Run(":8080")
	if err != nil {
		fmt.Printf("START SERVER OCCUR ERROR, %s", err.Error())
	}
}

func main() {
	go server()

	hystrix.ConfigureCommand("test", hystrix.CommandConfig{
		// Timeout for executing the command, in milliseconds.
		Timeout: 10,
		// Maximum number of concurrent requests.
		MaxConcurrentRequests: 100,
		// Number of requests within one 10-second statistical window.
		// The breaker is only evaluated once this many requests are reached.
		RequestVolumeThreshold: 10,
		// After the breaker opens, SleepWindow controls how long to wait
		// before checking whether the service is available again.
		// The unit is milliseconds.
		SleepWindow: 500,
		// Error percentage. The breaker opens when the request count is at
		// least RequestVolumeThreshold and the error rate reaches this value.
		ErrorPercentThreshold: 20,
	})

	// Simulate 20 client requests.
	for i := 0; i < 20; i++ {
		_ = hystrix.Do("test", func() error {
			resp, err := resty.New().R().Get("http://localhost:8080/ping")
			if err != nil {
				// Network-level failure, e.g. the server is not listening yet.
				return err
			}
			if resp.IsError() {
				return fmt.Errorf("err code: %s", resp.Status())
			}
			return nil
		}, func(err error) error {
			fmt.Println("fallback err: ", err)
			return err
		})
		time.Sleep(100 * time.Millisecond)
	}
}
Using circuit breakers in Kratos
Kratos is a lightweight Go microservice framework open-sourced by Bilibili that bundles a large number of microservice-related components and tools, including logging, service registration and discovery, routing and load balancing, and of course circuit breaking.
Interface
Most microservice frameworks let you swap components freely. Kratos ships with a default circuit breaker implementation. If it does not meet your needs, you can implement your own; any type that implements the CircuitBreaker interface will do. The interface is as follows:
// CircuitBreaker is a circuit breaker.
type CircuitBreaker interface {
	Allow() error // reports whether the request may be sent; a non-nil error means it is rejected
	MarkSuccess() // marks the request as successful
	MarkFailed()  // marks the request as failed
}
Implementing these three methods is enough to completely replace Kratos's built-in circuit breaker logic.
Usage
Use the circuit breaker in client requests:
// http
conn, err := http.NewClient(
	context.Background(),
	http.WithMiddleware(
		circuitbreaker.Client(),
	),
	http.WithEndpoint("127.0.0.1:8000"),
)
// grpc
conn, err := transgrpc.Dial(
	context.Background(),
	grpc.WithMiddleware(
		circuitbreaker.Client(),
	),
	grpc.WithEndpoint("127.0.0.1:9000"),
)
Google SRE overload algorithm
$\max\left(0, \frac{requests - K * accepts}{requests + 1}\right)$
The formula above gives the probability that a request is dropped:
- requests: the number of requests over a period of time
- accepts: the number of successful requests
- K: a multiplier. The smaller K is, the more aggressive the algorithm and the more readily requests are dropped: with a smaller K, $requests - K * accepts$ is larger, so the computed drop probability is larger.
Core function
The Allow function uses the Google SRE algorithm described above. It first reads the number of successful and total requests, then decides probabilistically whether to reject the request.
func (b *sreBreaker) Allow() error {
	// Count successful requests and total requests.
	success, total := b.summary()
	// k here is K * accepts from the formula.
	k := b.k * float64(success)
	if log.V(5) {
		log.Info("breaker: request: %d, succee: %d, fail: %d", total, success, total-success)
	}
	// Check the request volume and the success rate.
	// If the request volume is small, do not trip the breaker.
	// If the success rate is high, do not trip the breaker; with K = 2,
	// the breaker does not trip while the success rate stays above 50%.
	if total < b.request || float64(total) < k {
		if atomic.LoadInt32(&b.state) == StateOpen {
			atomic.CompareAndSwapInt32(&b.state, StateOpen, StateClosed)
		}
		return nil
	}
	if atomic.LoadInt32(&b.state) == StateClosed {
		atomic.CompareAndSwapInt32(&b.state, StateClosed, StateOpen)
	}
	// Compute a probability: the larger dr is, the more likely the request is dropped.
	// dr grows as the failure rate rises or as K gets smaller.
	dr := math.Max(0, (float64(total)-k)/float64(total+1))
	drop := b.trueOnProba(dr)
	if log.V(5) {
		log.Info("breaker: drop ratio: %f, drop: %t", dr, drop)
	}
	if drop {
		return ecode.ServiceUnavailable
	}
	return nil
}

// trueOnProba decides at random, with the given probability, whether to drop the request.
func (b *sreBreaker) trueOnProba(proba float64) (truth bool) {
	b.randLock.Lock()
	truth = b.r.Float64() < proba
	b.randLock.Unlock()
	return
}
One thing to note is that circuit breakers in Kratos are wrapped in a Group object, and the request path serves as the breaker's dimension. For example, for the request http://192.168.0.1:8080/helloworld.v1.Greeter/SayHello, the breaker's dimension is helloworld.v1.Greeter/SayHello. Each such string identifies one circuit breaker, and the breakers are stored in a map. Here is the definition of the Group object:
type Group struct {
	new  func() interface{}
	vals map[string]interface{}
	sync.RWMutex
}
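How such a struct can hand out one breaker per key is sketched below. The struct fields are the ones shown above; the NewGroup constructor, the Get method with its double-checked locking, and the fakeBreaker type are assumptions written for illustration and may differ from the real Kratos code.

```go
package main

import (
	"fmt"
	"sync"
)

// Group lazily creates and caches one value per key.
type Group struct {
	new  func() interface{}
	vals map[string]interface{}
	sync.RWMutex
}

// NewGroup builds a Group that uses create to make missing values.
func NewGroup(create func() interface{}) *Group {
	return &Group{new: create, vals: make(map[string]interface{})}
}

// Get returns the value stored under key, creating it on first use.
func (g *Group) Get(key string) interface{} {
	g.RLock()
	v, ok := g.vals[key]
	g.RUnlock()
	if ok {
		return v
	}
	g.Lock()
	defer g.Unlock()
	// Re-check under the write lock: another goroutine may have created it.
	if v, ok := g.vals[key]; ok {
		return v
	}
	v = g.new()
	g.vals[key] = v
	return v
}

// fakeBreaker is a hypothetical stand-in for a real circuit breaker.
type fakeBreaker struct{ failures int }

func main() {
	g := NewGroup(func() interface{} { return &fakeBreaker{} })
	a := g.Get("helloworld.v1.Greeter/SayHello").(*fakeBreaker)
	a.failures++
	b := g.Get("helloworld.v1.Greeter/SayHello").(*fakeBreaker)
	fmt.Println(a == b, b.failures) // true 1: same key, same breaker
}
```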
How to implement a custom circuit breaker and use it in Kratos
First, we need to implement CircuitBreaker, the Kratos interface that defines a circuit breaker. Here is the simplest possible implementation; it contains no algorithm at all:
package mybreak

import (
	"context"

	"github.com/go-kratos/kratos/v2/errors"
	"github.com/go-kratos/kratos/v2/middleware"
)

// ErrNotAllowed is returned when the breaker rejects a request.
var ErrNotAllowed = errors.New(503, "MyBreak", "request failed due to circuit breaker triggered")

// MyBreaker is a placeholder breaker with no real logic.
type MyBreaker struct {
	Count int
}

func NewMyBreaker() *MyBreaker {
	r := &MyBreaker{Count: 0}
	return r
}

// Allow always lets the request through.
func (mb *MyBreaker) Allow() error {
	return nil
}

// MarkSuccess is intentionally empty.
func (mb *MyBreaker) MarkSuccess() {
}

// MarkFailed is intentionally empty, so Count never changes.
func (mb *MyBreaker) MarkFailed() {
}

// Client returns a client middleware that wraps each call with the breaker.
func Client() middleware.Middleware {
	opt := NewMyBreaker()
	return func(handler middleware.Handler) middleware.Handler {
		return func(ctx context.Context, req interface{}) (interface{}, error) {
			// Count is never incremented here, so this branch is never taken.
			if opt.Count > 10 {
				return nil, ErrNotAllowed
			}
			reply, err := handler(ctx, req)
			if err != nil && (errors.IsInternalServer(err) || errors.IsServiceUnavailable(err) || errors.IsGatewayTimeout(err)) {
				opt.MarkFailed()
			} else {
				opt.MarkSuccess()
			}
			return reply, err
		}
	}
}
As the code above shows, we only need to implement the three methods of the interface and wrap them in a middleware to replace the default circuit breaker in Kratos. Note again that this implementation contains no real logic; it is only an exercise in replacing the framework's built-in circuit breaker.
Circuit breaking in go-zero
go-zero is another Go microservice framework that is getting a lot of attention; it has reached 19k stars on GitHub and its community is very active. Such a framework naturally includes circuit breaking, with its own default implementation. Notably, go-zero's circuit breaker is also based on the Google SRE algorithm.
The go-zero circuit breaker is built on the framework's interceptors. A circuit breaker mainly protects the caller, so every outgoing request has to pass through it, and a client interceptor does exactly that. In the zRPC framework, the circuit breaker is therefore implemented inside the client-side interceptor. The schematic diagram of go-zero's interceptors is as follows:
The code is:
func BreakerInterceptor(ctx context.Context, method string, req, reply interface{},
	cc *grpc.ClientConn, invoker grpc.UnaryInvoker, opts ...grpc.CallOption) error {
	// Break per request method.
	breakerName := path.Join(cc.Target(), method)
	return breaker.DoWithAcceptable(breakerName, func() error {
		// Actually make the call.
		return invoker(ctx, method, req, reply, cc, opts...)
		// codes.Acceptable decides which errors count toward the breaker's error total.
	}, codes.Acceptable)
}
This article does not go into detail on how the circuit breaker is implemented in go-zero; its core is the Google SRE algorithm.
How to implement your own circuit breaker in go-zero
In go-zero, I think you can model a circuit breaker with your own logic on go-zero's built-in implementation. Alternatively, you can use the interceptor mechanism and implement it directly in an interceptor like BreakerInterceptor, for example with hystrix-go.