Why do we need timeout control?

A common problem in many cascading failure scenarios is that the server spends resources processing requests that have already exceeded the client's deadline. The server burns those resources without doing any useful work, because its responses go to requests that have already timed out and that nobody is waiting for.

Timeout control is an important line of defense for service stability. Its essence is fail fast: a good timeout strategy drops high-latency requests early and releases their resources promptly, so that requests do not pile up.

Timeout propagation between services

If a request has multiple stages, such as a series of RPC calls, the service should check the deadline before each stage, that is, check whether enough time is left to process the request, so that it avoids unnecessary work.
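
The per-stage check can be sketched roughly as follows. This is a minimal illustration, not code from any framework: the `stage` type, its `cost` estimate, and the stage names are assumptions made up for the example.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errNotEnoughTime reports that a stage was skipped because the remaining
// budget could not cover its estimated cost.
var errNotEnoughTime = errors.New("not enough time left before the deadline")

// stage is a hypothetical unit of work, for example one RPC call.
type stage struct {
	name string
	cost time.Duration // estimated duration; an assumption made for this sketch
	run  func(ctx context.Context) error
}

// runStages runs the stages in order. Before each stage it checks that the
// context is still live and that the time left covers the stage's estimate.
// It returns the names of the stages that completed.
func runStages(ctx context.Context, stages []stage) ([]string, error) {
	var done []string
	for _, s := range stages {
		if err := ctx.Err(); err != nil {
			return done, err // deadline already passed or request canceled
		}
		if dl, ok := ctx.Deadline(); ok && time.Until(dl) < s.cost {
			return done, fmt.Errorf("stage %q skipped: %w", s.name, errNotEnoughTime)
		}
		if err := s.run(ctx); err != nil {
			return done, err
		}
		done = append(done, s.name)
	}
	return done, nil
}

// sleepFor simulates work that takes d but stops early if ctx is done.
func sleepFor(d time.Duration) func(context.Context) error {
	return func(ctx context.Context) error {
		select {
		case <-time.After(d):
			return nil
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}

// demo runs three stages against a 300ms budget. The third stage needs more
// time than is left after the first two, so it is skipped.
func demo() ([]string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), 300*time.Millisecond)
	defer cancel()
	return runStages(ctx, []stage{
		{"auth", 50 * time.Millisecond, sleepFor(50 * time.Millisecond)},
		{"query", 100 * time.Millisecond, sleepFor(150 * time.Millisecond)},
		{"render", 200 * time.Millisecond, sleepFor(200 * time.Millisecond)},
	})
}

func main() {
	done, err := demo()
	fmt.Println(done, err)
}
```

Skipping a stage that cannot finish in time is the fail-fast behavior described above: the work is refused before any resources are spent on it.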

A common mistake is to configure a fixed timeout for each RPC service. Instead, the timeout should be passed from service to service. Set it once at the top of the call chain, so that the whole RPC tree triggered by the initial request shares the same absolute deadline. For example, set the timeout at the top level to 3s. Service A calls service B, which runs for 1s and then calls service C, so service C has 2s left. Service C runs for 1s and then calls service D with 1s left, service D runs for 500ms, and so on. Ideally, the entire call chain uses the same timeout propagation mechanism.
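
The arithmetic of this example can be sketched as below. The `hop` type and the `budgets` helper are hypothetical names introduced only to illustrate how one absolute deadline shrinks the time left at each service; they are not part of go-zero or gRPC.

```go
package main

import (
	"fmt"
	"time"
)

// hop is one service in the call chain and the time it spends working
// before it calls the next service.
type hop struct {
	name string
	work time.Duration
}

// budgets returns, for each hop, how much time is left until the shared
// absolute deadline at the moment that hop receives the request.
func budgets(start time.Time, timeout time.Duration, chain []hop) []time.Duration {
	deadline := start.Add(timeout) // fixed once, at the top of the chain
	now := start
	left := make([]time.Duration, 0, len(chain))
	for _, h := range chain {
		left = append(left, deadline.Sub(now))
		now = now.Add(h.work) // the hop does its work, then calls the next one
	}
	return left
}

// exampleChain reproduces the numbers from the text: a 3s timeout,
// B works for 1s, C for 1s, D for 500ms.
func exampleChain() ([]hop, []time.Duration) {
	chain := []hop{{"B", time.Second}, {"C", time.Second}, {"D", 500 * time.Millisecond}}
	return chain, budgets(time.Unix(0, 0), 3*time.Second, chain)
}

func main() {
	chain, left := exampleChain()
	for i, h := range chain {
		fmt.Printf("service %s receives the request with %v left\n", h.name, left[i])
	}
}
```

Because the deadline is an absolute point in time, no service needs to know how long the others took; each one only compares the deadline with its own clock.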

Without timeout propagation, the following happens:

  1. Service A sends a request to service B with a timeout of 3s.
  2. Service B takes 2s to process the request, then calls service C.
  3. With timeout propagation, service C would get a timeout of 1s. Without it, service C uses the 3s timeout hard-coded in its configuration.
  4. Service C runs for 2s. By then the deadline set at the top level has already passed, so any further work is wasted.
  5. Service C still goes on to call service D.

If service B propagated the timeout, service C would abandon the request as soon as its 1s budget ran out, because by then the deadline has passed on the client side and the client has probably already reported an error. When propagating a deadline, we usually shorten it a little, for example by 100 milliseconds, to allow for network transmission time and for the processing the client does after it receives the reply.
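
Shortening the deadline before passing it downstream might look like the following sketch. The `withMargin` helper is a hypothetical name, and the 100ms margin is simply the figure used in the text.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// withMargin derives the context to pass downstream: the caller's deadline
// moved earlier by margin. If the caller set no deadline, there is nothing
// to shorten and the context is only made cancelable.
func withMargin(ctx context.Context, margin time.Duration) (context.Context, context.CancelFunc) {
	dl, ok := ctx.Deadline()
	if !ok {
		return context.WithCancel(ctx)
	}
	return context.WithDeadline(ctx, dl.Add(-margin))
}

// marginApplied reports how much earlier the downstream deadline is than the
// caller's deadline for a 3s request and the given margin.
func marginApplied(margin time.Duration) time.Duration {
	parent, cancel := context.WithTimeout(context.Background(), 3*time.Second)
	defer cancel()
	child, cancelChild := withMargin(parent, margin)
	defer cancelChild()

	parentDL, _ := parent.Deadline()
	childDL, _ := child.Deadline()
	return parentDL.Sub(childDL)
}

func main() {
	fmt.Println("downstream deadline is earlier by", marginApplied(100*time.Millisecond))
}
```

The margin gives the downstream reply time to travel back and be handled before the caller's own deadline fires.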

In-process timeout propagation

Timeouts need to be propagated not only between services but also within a process. For example, suppose a process calls MySQL, Redis, and service B serially, with a total request budget of 3s. The MySQL query takes 1s, so the Redis call gets a timeout of 2s. Redis takes 500ms, so the call to service B gets 1.5s. Because each middleware client or service also has a fixed timeout in the configuration file, the timeout actually used must be the smaller of the remaining time and the configured value.
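
A small worked version of this example is sketched below. The configured timeouts (2s for MySQL, 3s for Redis, 2s for service B) are assumed values chosen for illustration, and `call`, `effectiveTimeout`, and `plan` are hypothetical names.

```go
package main

import (
	"fmt"
	"time"
)

// call is one serial dependency of the request.
type call struct {
	name       string
	configured time.Duration // fixed timeout from the configuration file (assumed values)
	took       time.Duration // how long the call takes in this walk-through
}

// effectiveTimeout returns the smaller of the remaining request budget and
// the configured timeout.
func effectiveTimeout(remaining, configured time.Duration) time.Duration {
	if remaining < configured {
		return remaining
	}
	return configured
}

// plan walks the calls serially and records the timeout each one is given.
func plan(total time.Duration, calls []call) []time.Duration {
	remaining := total
	out := make([]time.Duration, 0, len(calls))
	for _, c := range calls {
		out = append(out, effectiveTimeout(remaining, c.configured))
		remaining -= c.took
	}
	return out
}

// example uses the numbers from the text: 3s in total, MySQL takes 1s,
// Redis takes 500ms, then service B is called.
func example() ([]call, []time.Duration) {
	calls := []call{
		{"MySQL", 2 * time.Second, time.Second},
		{"Redis", 3 * time.Second, 500 * time.Millisecond},
		{"service B", 2 * time.Second, 0},
	}
	return calls, plan(3*time.Second, calls)
}

func main() {
	calls, timeouts := example()
	for i, c := range calls {
		fmt.Printf("%s: configured %v, effective %v\n", c.name, c.configured, timeouts[i])
	}
}
```

With these assumed settings, MySQL is limited by its own configuration, while Redis and service B are limited by what is left of the request budget.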

Implementing timeout propagation with context

The principle behind context is simple, but it is very powerful. Go's standard library provides the context package, and many open source frameworks support it as well. Context has become the standard, and timeout propagation is implemented on top of it.

We usually create the initial context at the top of the service to carry the timeout, for example with a timeout of 3s:

ctx, cancel := context.WithTimeout(context.Background(), time.Second*3)
defer cancel()

When passing the context down, for example when calling Redis as in the example above, obtain the request deadline as follows and compare it with the timeout configured for Redis, taking the earlier of the two:

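// Start from the timeout configured for Redis (3s here), as an absolute time.
timeout := time.Now().Add(time.Second * 3)
// Use the request deadline instead, but only if one is set and it is earlier.
if dl, ok := ctx.Deadline(); ok && dl.Before(timeout) {
    timeout = dl
}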
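
When the callee accepts a context directly, the standard library already applies the same rule: a context derived with `context.WithTimeout` never outlives its parent. The sketch below checks this; `childDeadlines` is a hypothetical helper written for the demonstration.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// childDeadlines derives a child context with the configured timeout from a
// parent whose timeout is parentTimeout, and returns both deadlines.
func childDeadlines(parentTimeout, configured time.Duration) (parentDL, childDL time.Time) {
	parent, cancel := context.WithTimeout(context.Background(), parentTimeout)
	defer cancel()
	// WithTimeout never extends the parent's deadline: if the parent expires
	// first, the child keeps the parent's deadline.
	child, cancelChild := context.WithTimeout(parent, configured)
	defer cancelChild()

	parentDL, _ = parent.Deadline()
	childDL, _ = child.Deadline()
	return parentDL, childDL
}

func main() {
	// 1s left on the request, 3s configured: the child expires with the parent.
	p, c := childDeadlines(time.Second, 3*time.Second)
	fmt.Println("child keeps parent deadline:", c.Equal(p))

	// 3s left on the request, 1s configured: the configured timeout wins.
	p, c = childDeadlines(3*time.Second, time.Second)
	fmt.Println("child expires before parent:", c.Before(p))
}
```

The manual comparison is still needed for clients that take a plain duration or an absolute time instead of a context.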

Timeout propagation between services mainly means propagation across RPC calls. With gRPC no extra work is needed, because gRPC supports it natively. The principle is the same as above: the deadline is carried in the request metadata and ends up as the value of the grpc-timeout header, as the following code from grpc-go/internal/transport/handler_server.go:79 shows:

if v := r.Header.Get("grpc-timeout"); v != "" {
    to, err := decodeTimeout(v)
    if err != nil {
        return nil, status.Errorf(codes.Internal, "malformed time-out: %v", err)
    }
    st.timeoutSet = true
    st.timeout = to
}
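
To make the header format concrete, here is a rough sketch of how such a value could be parsed. It is not grpc-go's `decodeTimeout`. It only follows the gRPC over HTTP/2 wire format, in which the value is at most 8 ASCII digits followed by a one-letter unit (H, M, S, m, u, n). The name `parseTimeoutHeader` is hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"time"
)

// units maps the unit suffixes of the grpc-timeout header to durations.
var units = map[byte]time.Duration{
	'H': time.Hour,
	'M': time.Minute,
	'S': time.Second,
	'm': time.Millisecond,
	'u': time.Microsecond,
	'n': time.Nanosecond,
}

// parseTimeoutHeader parses a grpc-timeout value such as "2S" or "1500m":
// at most 8 ASCII digits followed by a one-letter unit.
func parseTimeoutHeader(s string) (time.Duration, error) {
	if len(s) < 2 || len(s) > 9 {
		return 0, fmt.Errorf("malformed timeout %q", s)
	}
	unit, ok := units[s[len(s)-1]]
	if !ok {
		return 0, fmt.Errorf("unknown timeout unit in %q", s)
	}
	n, err := strconv.ParseUint(s[:len(s)-1], 10, 64)
	if err != nil {
		return 0, fmt.Errorf("malformed timeout %q: %w", s, err)
	}
	// Large hour values would overflow time.Duration, so clamp them.
	if n > uint64(math.MaxInt64)/uint64(unit) {
		return time.Duration(math.MaxInt64), nil
	}
	return time.Duration(n) * unit, nil
}

func main() {
	for _, v := range []string{"2S", "1500m", "abc"} {
		d, err := parseTimeoutHeader(v)
		fmt.Println(v, d, err)
	}
}
```

Note that the header carries a relative timeout, not an absolute time, so the receiving side turns it back into a deadline against its own clock.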

Timeout propagation is an important line of defense for service stability, and both the principle and the implementation are simple. Does your framework implement it? If not, now is a good time to start.

Timeout propagation in go-zero

In go-zero, both the api gateway and rpc services can set Timeout in the configuration file, and the timeout is automatically propagated between services.

An earlier article on how to implement timeout control in Go explains how to use timeout control.

References

"Site Reliability Engineering: How Google Runs Production Systems"

Project address

https://github.com/zeromicro/go-zero

You are welcome to use go-zero, and to star or fork the project to support us!

WeChat Exchange Group

Follow the "Practice" public account and click "exchange group" to get the QR code of the community group.


kevinwan

Author of go-zero