
When a server program is updated or restarted, simply running kill -9 on the old process and then starting a new one causes the following problems:

  1. Requests still in flight are not finished. If the server process exits abruptly, the client connections are interrupted (the client receives an RST).
  2. New requests arrive before the service has restarted, resulting in connection refused.
  3. Even if you only want to stop the program, kill -9 still interrupts the requests being processed.

The most direct effect is that, for a period of time during the restart, the service cannot serve users normally. In addition, if the service is shut down abruptly, it may corrupt the state of stateful services that the business depends on, such as the database.

Therefore, when restarting or re-releasing a service, we need to switch seamlessly between the old and new instances while guaranteeing zero downtime!

How does the go-zero framework help developers exit gracefully? Let's take a look together.

Graceful exit

Before tackling a graceful restart, the first problem to solve is how to exit gracefully:

For an http service, the general idea is to close the listening fd so that no new requests come in, finish processing the requests already accepted, and then exit.

Go's native http package provides server.Shutdown(). Let's first take a look at how it is implemented:

  1. Set the inShutdown flag
  2. Close the listeners to ensure that no new requests come in
  3. Wait for all active connections to become idle
  4. Return from the function; done

Let's look at each of these steps in turn:

inShutdown

func (srv *Server) ListenAndServe() error {
    if srv.shuttingDown() {
        return ErrServerClosed
    }
    ....
    // actually listen on the port; this creates a listener
    ln, err := net.Listen("tcp", addr)
    if err != nil {
        return err
    }
    // do the actual request handling, injecting this listener
    return srv.Serve(tcpKeepAliveListener{ln.(*net.TCPListener)})
}

func (s *Server) shuttingDown() bool {
    return atomic.LoadInt32(&s.inShutdown) != 0
}

ListenAndServe is the function http uses to start the server. The first thing it does is check whether the Server is shutting down.

inShutdown is an atomic variable; a non-zero value means the server is shutting down.
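The flag check above can be sketched in isolation. This is a minimal illustration, not the net/http source: miniServer, start, shutdown and errClosed are hypothetical names standing in for http.Server, ListenAndServe, Shutdown and ErrServerClosed.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// errClosed is a hypothetical stand-in for http.ErrServerClosed.
var errClosed = errors.New("server closed")

// miniServer is a hypothetical type used only for this sketch.
type miniServer struct {
	inShutdown int32 // accessed atomically; non-zero means shutting down
}

// shuttingDown reports whether shutdown has begun.
func (s *miniServer) shuttingDown() bool {
	return atomic.LoadInt32(&s.inShutdown) != 0
}

// start refuses to run once shutdown has begun.
func (s *miniServer) start() error {
	if s.shuttingDown() {
		return errClosed
	}
	return nil
}

// shutdown flips the flag; it is safe to call from any goroutine.
func (s *miniServer) shutdown() {
	atomic.StoreInt32(&s.inShutdown, 1)
}

func main() {
	s := &miniServer{}
	fmt.Println(s.start()) // <nil>
	s.shutdown()
	fmt.Println(s.start()) // server closed
}
```

Using an atomic integer instead of a mutex keeps the check cheap, which matters because it sits on the start-up path and may be read from many goroutines.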

listeners

func (srv *Server) Serve(l net.Listener) error {
    ...
    // add the injected listener to the internal map
    // so that connections accepted from this listener can be controlled later
    if !srv.trackListener(&l, true) {
        return ErrServerClosed
    }
    defer srv.trackListener(&l, false)
    ...
}

Serve registers the listener in the server's internal listeners map. Shutdown can then take each listener from that map and call listener.Close() on it; once the listening socket is closed, no new connections are accepted.

closeIdleConns

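To put it simply: it closes the connections that are currently idle and reports whether any active connections remain. Shutdown keeps calling it until every connection has become idle and been closed, and then returns.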

shut down

func (srv *Server) Serve(l net.Listener) error {
  ...
  for {
    rw, err := l.Accept()
    // Accept returns an error at this point, because the listener was closed earlier
    if err != nil {
      select {
      // yet another flag: doneChan
      case <-srv.getDoneChan():
        return ErrServerClosed
      default:
      }
    }
  }
}

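Because the listener was closed earlier, Accept returns an error. Shutdown has also closed doneChan, so the receive from getDoneChan() succeeds and Serve returns ErrServerClosed.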

To summarize: Shutdown can terminate the service gracefully without interrupting connections that are already active.

But once the service is running, how does the program know that it is about to be interrupted, so that it can call Shutdown? Next, let's look at the role of system signal notification.

Service interruption

This is where the signals provided by the OS come in. In Go, the Notify function of the native signal package provides system signal notification.

https://github.com/tal-tech/go-zero/blob/master/core/proc/signals.go
func init() {
  go func() {
    var profiler Stopper
    
    signals := make(chan os.Signal, 1)
    signal.Notify(signals, syscall.SIGUSR1, syscall.SIGUSR2, syscall.SIGTERM)

    for {
      v := <-signals
      switch v {
      case syscall.SIGUSR1:
        dumpGoroutines()
      case syscall.SIGUSR2:
        if profiler == nil {
          profiler = StartProfile()
        } else {
          profiler.Stop()
          profiler = nil
        }
      case syscall.SIGTERM:
        // this is where the graceful shutdown is performed
        gracefulStop(signals)
      default:
        logx.Error("Got unregistered signal:", v)
      }
    }
  }()
}
  • SIGUSR1 -> dump goroutines, which is quite useful for error analysis
  • SIGUSR2 -> turn profiling of all metrics on or off, so you control the profiling window yourself
  • SIGTERM -> actually start gracefulStop and shut down gracefully

gracefulStop works as follows:

  1. Stop listening for signals; since the process is about to quit, there is no need to keep listening
  2. wrap up: stop serving current requests and start closing resources
  3. time.Sleep(): wait for the resource cleanup to finish
  4. shutdown: notify listeners that the process is exiting
  5. If the main goroutine has still not exited, send SIGKILL to force the process to exit

In this way, the service stops accepting new requests, waits for active requests to finish, and waits for resources (database connections, etc.) to be closed. If this takes too long, the process is forced to exit.

Overall process

Our Go programs currently all run in docker containers, so during a release k8s sends a SIGTERM signal to the container; the program in the container receives the signal and starts executing Shutdown.

At this point, the entire graceful shutdown process is sorted out.

What remains is the smooth restart, which relies on k8s. The basic process is as follows:

  • Before the old pod exits, a new pod is started
  • The old pod continues to process the requests it has already accepted and no longer accepts new ones
  • The new pod accepts and processes new requests
  • The old pod exits

In this way the whole service restarts successfully. Even if the new pod fails to start, the old pod can still provide service, so the current online service is not affected.

Project address

https://github.com/tal-tech/go-zero

You are welcome to use go-zero and to star the project to support us!

WeChat Exchange Group

Follow the "practice" public account and open its communication group entry to get the QR code of the community group.


kevinwan

Author of go-zero