When a server program is updated or restarted, simply running kill -9 on the old process and then starting a new one causes the following problems:
- Requests still in flight are not finished. If the server process exits abruptly, client connections are broken (the client receives RST).
- New requests arrive before the service has come back up, resulting in connection refused.
- Even if you only want to stop the program, kill -9 still interrupts the requests being processed.
The most visible effect is that the service cannot serve users normally for a period of time during the restart. In addition, if the service is shut down abruptly, it may corrupt the state of stateful services the business depends on, such as databases.
Therefore, when restarting or redeploying a service, we need to switch seamlessly between the old and new instances while guaranteeing zero downtime.
How does the go-zero framework help developers exit gracefully? Let's take a look together.
Graceful exit
Before tackling graceful restart, the first problem to solve is how to exit gracefully.
For an HTTP service, the general idea is to close the listening fd so that no new requests come in, finish processing the requests that have already arrived, and then exit.
Go's native http package provides server.Shutdown(). Let's first look at how it is implemented:
- Set the inShutdown flag
- Close the listeners to ensure that no new requests come in
- Wait for all active connections to become idle
- Return from the function; done
Let's go through these steps one by one:
inShutdown
func (srv *Server) ListenAndServe() error {
	if srv.shuttingDown() {
		return ErrServerClosed
	}
	....
	// Actually listen on the port; this creates a listener
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return err
	}
	// Run the actual serving logic, passing the listener in
	return srv.Serve(tcpKeepAliveListener{ln.(*net.TCPListener)})
}

func (s *Server) shuttingDown() bool {
	return atomic.LoadInt32(&s.inShutdown) != 0
}
ListenAndServe is the function the http package requires to start a server. Its first statement checks whether the Server has already been shut down.
inShutdown is an atomic variable; a non-zero value means the server has been shut down.
listeners
func (srv *Server) Serve(l net.Listener) error {
	...
	// Add the injected listener to the internal map,
	// so that requests arriving through this listener can be controlled later
	if !srv.trackListener(&l, true) {
		return ErrServerClosed
	}
	defer srv.trackListener(&l, false)
	...
}
Serve registers the listener in the internal listeners map. Shutdown can then fetch it directly from listeners and call listener.Close(). Once the listening socket is closed, no new connections can be accepted, so no new requests come in.
closeIdleConns
Put simply: close the connections that have become idle, and return once the Server has no active connections left.
shut down
func (srv *Server) Serve(l net.Listener) error {
	...
	for {
		rw, err := l.Accept()
		// Accept now returns an error, because the listener was closed earlier
		if err != nil {
			select {
			// Another flag: doneChan
			case <-srv.getDoneChan():
				return ErrServerClosed
			default:
			}
		}
	}
}
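When the listeners are closed during Shutdown, the doneChan channel is closed as well. The failed Accept therefore finds the channel returned by getDoneChan ready, and Serve returns ErrServerClosed.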
To summarize: Shutdown can terminate the service gracefully without interrupting connections that are already active.
But once the service is running, how does the program know that it is being stopped? And how is the program notified so that it can call Shutdown? Next, let's look at the role of system signal notification.
Service interruption
This is where the signals provided by the OS come in. In native Go, the Notify function of the signal package provides system signal notification.
https://github.com/tal-tech/go-zero/blob/master/core/proc/signals.go
func init() {
	go func() {
		var profiler Stopper

		signals := make(chan os.Signal, 1)
		signal.Notify(signals, syscall.SIGUSR1, syscall.SIGUSR2, syscall.SIGTERM)

		for {
			v := <-signals
			switch v {
			case syscall.SIGUSR1:
				dumpGoroutines()
			case syscall.SIGUSR2:
				if profiler == nil {
					profiler = StartProfile()
				} else {
					profiler.Stop()
					profiler = nil
				}
			case syscall.SIGTERM:
				// This is where the graceful shutdown is performed
				gracefulStop(signals)
			default:
				logx.Error("Got unregistered signal:", v)
			}
		}
	}()
}
- SIGUSR1 -> dump goroutines, which is quite useful for error analysis
- SIGUSR2 -> turn all metrics monitoring on or off, so you control the profiling duration yourself
- SIGTERM -> actually start gracefulStop and shut down gracefully
gracefulStop works as follows:
- Stop listening for signals; the process is about to exit, so there is no need to keep listening
- wrap up: close the current service requests and resources
- time.Sleep(): wait for resource handling to complete
- shutdown: notify the program to quit
- If the main goroutine still has not exited, send SIGKILL to terminate the process
In this way, the service stops accepting new requests, active requests are given time to finish, and resources (database connections and so on) are closed. If this takes too long, the process is forced to exit.
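The sequence described above can be sketched as follows. This is not go-zero's code: the stopPlan type, its fields, and the durations are hypothetical, and the final force-quit step is an injected function so that the sketch can run without killing its own process.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"strings"
	"syscall"
	"time"
)

// stopPlan is a hypothetical description of a staged shutdown.
type stopPlan struct {
	wrapUp     []func()      // stop taking work, finish current requests
	shutdown   []func()      // release resources, tell the program to quit
	wrapUpTime time.Duration // time given to the wrap-up stage
	forceAfter time.Duration // total time before the forced exit
	forceQuit  func()        // in a real service this would kill the process
}

// run executes the stages in order.
func (p stopPlan) run(signals chan os.Signal) {
	signal.Stop(signals) // we are exiting: stop listening for signals

	for _, fn := range p.wrapUp {
		fn()
	}
	time.Sleep(p.wrapUpTime)

	for _, fn := range p.shutdown {
		fn()
	}
	time.Sleep(p.forceAfter - p.wrapUpTime)

	// In a real service this line is only reached if the main goroutine has
	// not exited on its own by now.
	p.forceQuit()
}

// demoStop runs a plan with tiny durations and records the order of steps.
func demoStop() string {
	var steps []string
	signals := make(chan os.Signal, 1)
	signal.Notify(signals, syscall.SIGTERM)

	plan := stopPlan{
		wrapUp:     []func(){func() { steps = append(steps, "wrap up") }},
		shutdown:   []func(){func() { steps = append(steps, "shutdown") }},
		wrapUpTime: 10 * time.Millisecond,
		forceAfter: 30 * time.Millisecond,
		forceQuit:  func() { steps = append(steps, "force quit") },
	}
	plan.run(signals)
	return strings.Join(steps, " -> ")
}

func main() {
	fmt.Println(demoStop()) // prints "wrap up -> shutdown -> force quit"
}
```

Splitting the work into a wrap-up stage and a shutdown stage gives in-flight requests a window to complete before the resources they rely on are released.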
Overall process
Our Go programs currently all run in docker containers, so during a release k8s sends a SIGTERM signal to the container. The program in the container receives the signal and starts executing Shutdown.
At this point, the entire graceful shutdown process is sorted out.
That leaves smooth restart, which depends on k8s. The basic process is as follows:
- Before the old pod exits, start the new pod
- The old pod continues to process the requests it has accepted and no longer accepts new ones
- The new pod accepts and processes new requests
- The old pod exits
In this way, the whole service restarts successfully. And if the new pod fails to start, the old pod can keep serving, so the current online service is not affected.
project address
https://github.com/tal-tech/go-zero
You are welcome to use go-zero and to star the project to support us!
WeChat Exchange Group
Follow the "practice" official account and join its communication group to get the QR code of the community group.