Original article: Go Exec Zombies and Orphaned Processes
Recently, I used Go to manage the life cycle of local applications and ran into several interesting issues along the way. Let's take a look at them today.
Scenario one
Let's see what goes wrong with the following two scripts.
Create two shell scripts:
- start.sh
#!/bin/sh
sh sub.sh
- sub.sh
#!/bin/sh
n=0
while [ $n -le 100 ]
do
    echo $n
    # POSIX arithmetic; `let` is a bash builtin and is not available in every /bin/sh
    n=$((n + 1))
    sleep 1
done
Run the script
Output:
$ ./start.sh
0
1
2
...
process relationship
View process information
ps -j
USER PID PPID PGID SESS JOBC STAT TT TIME COMMAND
root 31758 31346 31758 0 1 S+ s000 0:00.00 /bin/sh ./start.sh
root 31759 31758 31758 0 1 S+ s000 0:00.01 sh sub.sh
- The parent process ID (PPID) of sub.sh is the process ID (PID) of start.sh.
- sub.sh and start.sh have the same PGID, so they belong to the same process group.
Kill the start.sh process
kill -9 31758
# Check the process group again
ps -j
## Output
USER PID PPID PGID SESS JOBC STAT TT TIME COMMAND
root 31759 1 31758 0 0 S s000 0:00.03 sh sub.sh
- The start.sh process is gone.
- The sub.sh process is still running.
- The PPID of sub.sh has become 1.
Question 1:
Which process does sub.sh belong to now?
Scenario two
Suppose sub.sh is the actual application and start.sh is the application's startup script.
So how does Go manage them? Let's continue with a Go scenario.
Building on the two scripts above, we use Go's os/exec package to call the start.sh script:
package main

import (
	"context"
	"log"
	"os"
	"os/exec"
	"time"
)

func main() {
	cmd := exec.CommandContext(context.Background(), "./start.sh")
	// Put start.sh and sub.sh in this directory
	cmd.Dir = "/Go/src/go-code/cmd/"
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Start(); err != nil {
		// cmd.Process is nil when Start fails, so stop here
		log.Fatalf("cmd.Start error %+v \n", err)
	}
	for {
		select {
		default:
			log.Println(cmd.Process.Pid)
			time.Sleep(2 * time.Second)
		}
	}
}
Run the program
go run ./main.go
View the processes
ps -j
USER PID PPID PGID SESS JOBC STAT TT TIME COMMAND
root 45458 45457 45457 0 0 Ss+ s004 0:00.03 ...___1go_build_go_code_cmd
root 45462 45458 45457 0 0 S+ s004 0:00.01 /bin/sh ./start.sh
root 45463 45462 45457 0 0 S+ s004 0:00.03 sh sub.sh
The three processes (the Go program, start.sh and sub.sh) belong to the same process group (same PGID).
The parent-child relationship is: main.go -> start.sh -> sub.sh
Kill the start.sh process
In a real deployment the startup program may die, leaving us unable to monitor the application. Kill the start.sh process to simulate this:
kill -9 45462
Check the processes again
ps -j
USER PID PPID PGID SESS JOBC STAT TT TIME COMMAND
root 45458 45457 45457 0 0 Ss+ s004 0:00.03 ...___1go_build_go_code_cmd
root 45462 1 45457 0 0 S+ s004 0:00.01 (bash)
root 45463 45462 45457 0 0 S+ s004 0:00.03 sh sub.sh
- Notice that the PPID of start.sh is 1.
- Even though the PPID of start.sh has become 1, log.Println(cmd.Process.Pid) keeps printing.
Question 2:
If the PPID is 1, can the Go program no longer manage the process? We would not even know when sub.sh exits. What should we do?
problem analysis
- Both scenarios have one thing in common: a PPID of 1. Such a process is a child nobody wants: an orphan process.
- In scenario two, the Go program never reaps the process started by cmd and can no longer manage it, so start.sh has become a child process that holds a process-table entry while doing nothing: a zombie process.
So what exactly are orphan processes and zombie processes?
orphan process
In Unix-like operating systems, an orphan process is a process that keeps running after its parent process has finished or been terminated.
So that an orphan can still have its resources released when it exits, every orphan process is automatically adopted by the system process init or systemd as soon as it is orphaned. This is called adoption. Note that although the process now has init as its parent, the process that created it no longer exists, so it is still called an orphan process. Orphan processes waste server resources and can even exhaust them.
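Adoption can be observed from Go. The following is a rough sketch rather than anything from the original article: orphanSurvives is a hypothetical helper, and it assumes a Unix-like system with sh and sleep on the PATH. It kills only the parent shell and then checks that the shell's background child is still alive.

```go
package main

import (
	"bufio"
	"fmt"
	"os/exec"
	"strconv"
	"strings"
	"syscall"
)

// orphanSurvives is a hypothetical helper. It starts a shell that launches a
// background sleep, kills only the shell, and reports whether the sleep
// process still exists afterwards.
func orphanSurvives() (bool, error) {
	// The shell prints the PID of its background child, then waits for it.
	cmd := exec.Command("sh", "-c", "sleep 30 & echo $!; wait")
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		return false, err
	}
	if err := cmd.Start(); err != nil {
		return false, err
	}
	line, readErr := bufio.NewReader(stdout).ReadString('\n')
	// Kill only the parent shell, as kill -9 does in the scenarios above,
	// then reap it; Wait is expected to report "signal: killed".
	cmd.Process.Kill()
	cmd.Wait()
	if readErr != nil {
		return false, readErr
	}
	childPid, err := strconv.Atoi(strings.TrimSpace(line))
	if err != nil {
		return false, err
	}
	// Signal 0 delivers nothing; it only checks that the process exists.
	alive := syscall.Kill(childPid, 0) == nil
	// Clean up the orphan so the demo leaves nothing behind.
	syscall.Kill(childPid, syscall.SIGKILL)
	return alive, nil
}

func main() {
	alive, err := orphanSurvives()
	fmt.Println("child alive after parent was killed:", alive, "error:", err)
}
```

Killing the parent does not touch its children, which is why sub.sh kept counting in scenario one.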
Solution & Prevention
- Termination mechanism: forcibly kill orphaned processes (the most common means);
- Regeneration mechanism: The server searches for the calling client within a specified time, and if it is not found, it directly kills the orphan process;
- Timeout mechanism: Give each process a fixed running time and forcibly terminate it if it has not finished when the time is up. If needed, the process can request an extension before the time expires.
- Process group: The termination or crash of a parent process turns its children into orphans, and it is impossible to predict whether a child will be "abandoned" while it runs. For this reason, most Unix-like systems provide process groups, which let a whole family of processes be signaled at once instead of leaving orphans behind.
zombie process
In Unix-like operating systems, a zombie process is a process that has finished executing (by calling the exit system call, or because it hit a fatal error or received a terminating signal at run time) but still has its process control block in the operating system's process table. It is a process in the "terminated" state.
Normally, a process that exits is reaped by its parent through wait and then reclaimed by the system. A zombie is different from a normal process: the kill command has no effect on it, and until it is reaped its process-table entry stays allocated, which is a resource leak.
Solution & Prevention
To reap a zombie process, manually send the SIGCHLD signal to its parent process with the kill command. If the parent still refuses to reap the zombie, terminate the parent so that the init process adopts the zombie. The init process periodically executes the wait system call to reap every zombie it has adopted.
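The life of a zombie can be sketched in Go as follows. This is an illustration under assumptions, not code from the original article: zombieThenReap is a hypothetical helper, sh must be on the PATH, and the 500 ms pause is an arbitrary margin for the child to exit. It relies on the fact that signal 0 succeeds for any PID that still has a process-table entry, zombies included.

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
	"time"
)

// zombieThenReap is a hypothetical helper. It starts a short-lived child,
// lets it exit without waiting on it, and probes the PID with signal 0
// before and after the parent calls Wait.
func zombieThenReap() (beforeWait, afterWait bool, err error) {
	cmd := exec.Command("sh", "-c", "exit 0")
	if err := cmd.Start(); err != nil {
		return false, false, err
	}
	pid := cmd.Process.Pid
	// The child should have exited by now. Nobody has waited on it yet,
	// so its process-table entry is still there (a zombie).
	time.Sleep(500 * time.Millisecond)
	beforeWait = syscall.Kill(pid, 0) == nil
	// Wait reaps the child and releases its process-table entry.
	if err := cmd.Wait(); err != nil {
		return beforeWait, false, err
	}
	afterWait = syscall.Kill(pid, 0) == nil
	return beforeWait, afterWait, nil
}

func main() {
	before, after, err := zombieThenReap()
	fmt.Println("entry exists before Wait:", before)
	fmt.Println("entry exists after Wait:", after)
	fmt.Println("error:", err)
}
```

The parent calling wait is what removes the entry; the child cannot clean up after itself.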
View process details
# List processes
ps -l
- USER: the user of the process
- PID: The process ID number of the process
- RSS: The resident memory used by the process (KB)
- S: View process status
- CMD: The actual program corresponding to the process
Process state(s)
- R: Running or runnable (on the run queue)
- S: Sleeping; blocked while waiting for a condition to be met or for a signal
- I: Idle
- Z: Zombie (a defunct process); the process has terminated, but its process descriptor remains until the parent process calls the wait4() system call to release it
- D: Uninterruptible sleep (usually I/O); the process does not wake up when it receives a signal and must wait until the awaited event occurs
- T: Stopped; the process stopped running after receiving SIGSTOP, SIGTSTP, SIGTTIN or SIGTTOU
- P: Waiting for a page swap
- W: Has no resident pages; there were not enough memory pages to allocate
- X: Dead process
Go solution
The fix combines two things: killing the whole process group rather than just the parent process (on Linux: kill -- -PGID), and waiting on the process. The code is as follows:
package main

import (
	"context"
	"log"
	"os"
	"os/exec"
	"syscall"
	"time"
)

func main() {
	// context.Background() is never cancelled; pass a cancellable context
	// if the ctx.Done() branch below should ever fire.
	ctx := context.Background()
	cmd := exec.CommandContext(ctx, "./start.sh")
	// Put the child in its own process group (its PGID equals its PID)
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Setpgid: true,
	}
	cmd.Dir = "/Users/Wilbur/Project/Go/src/go-code/cmd/"
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Start(); err != nil {
		// cmd.Process is nil when Start fails, so stop here
		log.Fatalf("cmd.Start error %+v \n", err)
	}
	// Wait in a goroutine so that the process is reaped when it exits
	errCmdCh := make(chan error, 1)
	go func() {
		errCmdCh <- cmd.Wait()
	}()
	for {
		select {
		case <-ctx.Done():
			log.Println("ctx.done")
			pid := cmd.Process.Pid
			// A negative PID sends the signal to the whole process group
			if err := syscall.Kill(-1*pid, syscall.SIGKILL); err != nil {
				return
			}
		case err := <-errCmdCh:
			log.Printf("errCmdCh error %+v \n", err)
			return
		default:
			log.Println(cmd.Process.Pid)
			time.Sleep(2 * time.Second)
		}
	}
}
Analysis of the cmd.Wait() source code
In os/exec_unix.go:
var (
	status syscall.WaitStatus
	rusage syscall.Rusage
	pid1   int
	e      error
)
for {
	pid1, e = syscall.Wait4(p.Pid, &status, 0, &rusage)
	if e != syscall.EINTR {
		break
	}
}
It calls syscall.Wait4 to wait on the child process, which matches the description of the Z state above: "Zombie (a defunct process); the process has terminated, but its process descriptor remains until the parent process calls the wait4() system call to release it".
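To see the same call outside of os/exec, here is a small sketch that starts a child with the raw syscall package and reaps it with Wait4. It is only an illustration: waitExitCode is a hypothetical helper, and it assumes a Unix-like system with a shell at /bin/sh.

```go
package main

import (
	"fmt"
	"syscall"
)

// waitExitCode is a hypothetical helper. It starts /bin/sh with the raw
// syscall package and reaps it with Wait4, retrying on EINTR in the same
// way as the loop quoted above.
func waitExitCode(script string) (int, error) {
	// The child inherits stdin, stdout and stderr from this process.
	attr := &syscall.ProcAttr{Files: []uintptr{0, 1, 2}}
	pid, err := syscall.ForkExec("/bin/sh", []string{"sh", "-c", script}, attr)
	if err != nil {
		return 0, err
	}
	var status syscall.WaitStatus
	for {
		// Blocks until the child exits, then releases its process-table entry.
		_, err = syscall.Wait4(pid, &status, 0, nil)
		if err != syscall.EINTR {
			break
		}
	}
	if err != nil {
		return 0, err
	}
	return status.ExitStatus(), nil
}

func main() {
	code, err := waitExitCode("exit 3")
	fmt.Println("exit code:", code, "error:", err)
}
```

In everyday code cmd.Wait is the better choice; the raw version is here only to show what happens underneath.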
Summary
Strictly speaking, zombie processes are not the root of the problem; the culprit is the parent process that spawns them and never reaps them.
Therefore, rather than only looking for ways to clear out large numbers of zombie processes, we should think about how to avoid creating them in the first place during development.
Reference:
https://cs.opensource.google/go/go/+/refs/tags/go1.17.7:src/syscall/syscall_linux.go;l=279