
Original address: Go Exec Zombies and Orphaned Processes

Recently, I used Go to manage the life cycle of local applications and ran into several interesting issues along the way. Let's take a look at them today.

Scenario 1

Let's see what the problem is with the following two scripts:

Create two shell scripts

  • start.sh
#!/bin/sh
sh sub.sh
  • sub.sh
#!/bin/sh
n=0
while [ $n -le 100 ]
do
  echo $n
  n=$((n + 1))
  sleep 1
done

Run the script

Output

$ ./start.sh 
0
1
2
...

process relationship

View process information

ps -j

USER   PID    PPID   PGID   SESS  JOBC  STAT   TT     TIME     COMMAND
root   31758  31346  31758  0     1     S+     s000   0:00.00  /bin/sh ./start.sh
root   31759  31758  31758  0     1     S+     s000   0:00.01  sh sub.sh
  • The parent process ID (PPID) of sub.sh is the process ID (PID) of start.sh.
  • sub.sh and start.sh have the same PGID, so they belong to the same process group.

Kill the start.sh process

kill -9 31758

# Check the process group again
ps -j

## Output
USER     PID       PPID  PGID     SESS  JOBC   STAT    TT       TIME     COMMAND
root     31759     1     31758    0      0     S       s000     0:00.03  sh sub.sh
  • The start.sh process is gone.
  • The sub.sh process is still running.
  • The PPID of sub.sh has become 1.

Question 1:

What kind of process is sub.sh now?

Scenario 2

Suppose sub.sh is the actual application and start.sh is the application's startup script.

So how does Go manage them? Let's look at a second scenario that involves Go.

Building on the two scripts above, we use Go's os/exec package to call the start.sh script:

package main

import (
    "context"
    "log"
    "os"
    "os/exec"
    "time"
)

func main() {
    cmd := exec.CommandContext(context.Background(), "./start.sh")

    // start.sh and sub.sh must be in this directory
    cmd.Dir = "/Go/src/go-code/cmd/"
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr

    if err := cmd.Start(); err != nil {
        // cmd.Process is nil when Start fails, so stop here.
        log.Fatalf("cmd.Start error %+v \n", err)
    }

    // Note: cmd.Wait() is never called, so the Go program never learns
    // that start.sh has exited.
    for {
        log.Println(cmd.Process.Pid)
        time.Sleep(2 * time.Second)
    }
}

Run the program

go run ./main.go

View the processes

ps -j

USER   PID    PPID   PGID     SESS  JOBC  STAT   TT      TIME     COMMAND
root   45458  45457  45457    0     0     Ss+    s004    0:00.03  ...___1go_build_go_code_cmd
root   45462  45458  45457    0     0     S+     s004    0:00.01  /bin/sh ./start.sh
root   45463  45462  45457    0     0     S+     s004    0:00.03  sh sub.sh

The three processes (the Go program, start.sh and sub.sh) belong to the same process group (same PGID).

The parent-child relationship is: main.go -> start.sh -> sub.sh

Kill the start.sh process

In a real deployment the startup script may die unexpectedly, leaving us unable to monitor the application. To simulate this, kill the start.sh process:

kill -9 45462

Check the process again

ps -j

USER   PID    PPID   PGID     SESS  JOBC  STAT   TT      TIME     COMMAND
root   45458  45457  45457    0     0     Ss+    s004    0:00.03  ...___1go_build_go_code_cmd
root   45462  1      45457    0     0     S+     s004    0:00.01  (bash)
root   45463  45462  45457    0     0     S+     s004    0:00.03  sh sub.sh
  • Notice that the PPID of start.sh is now 1.
  • Even so, log.Println(cmd.Process.Pid) keeps printing: the Go program has not noticed that the process it started is gone.

Question 2:

So once the PPID is 1, can the Go program no longer manage the process? It would not even know when sub.sh exits. What should we do?

problem analysis

  • Both scenarios have one thing in common: the PPID becomes 1. The process is a child that nobody looks after any more, that is, an orphan process.
  • In Scenario 2, if the Go program never reaps the process started through cmd, start.sh keeps occupying an entry in the process table without doing any work. That is a zombie process.

So what exactly are orphan processes and zombie processes?

orphan process

In UNIX-like operating systems, an orphan process is a process that continues to run after its parent process has finished or been terminated.

So that an orphan's resources can still be released when it exits, every orphan process is adopted by the system process init (or systemd) as soon as it is orphaned. This is called adoption (re-parenting). Note that although the process now has init as its parent, the process that created it no longer exists, so it is still called an orphan process. Orphan processes that keep running unattended waste server resources and can eventually exhaust them.
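A process can notice its own adoption by watching its parent PID. The following is a minimal sketch, not part of the original program: orphaned is a hypothetical helper, and it compares against the parent PID recorded at startup rather than against 1, on the assumption that the adopting process is not always PID 1 on every system.

```go
package main

import (
    "fmt"
    "os"
    "time"
)

// orphaned reports whether the parent PID seen now differs from the one
// recorded at startup, which means the original parent has exited and the
// process has been adopted by another process (usually PID 1).
func orphaned(startPPID, currentPPID int) bool {
    return currentPPID != startPPID
}

func main() {
    startPPID := os.Getppid()
    // Poll a few times; a long-running service would do this in a loop.
    for i := 0; i < 3; i++ {
        if now := os.Getppid(); orphaned(startPPID, now) {
            fmt.Println("parent exited; adopted by PID", now)
            return
        }
        time.Sleep(100 * time.Millisecond)
    }
    fmt.Println("parent still alive:", startPPID)
}
```

A child that detects this can shut itself down instead of running on unattended.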

Solution & Prevention

  1. Termination: forcibly kill the orphan process (the most common approach).
  2. Reincarnation: the server periodically tries to locate the client that made the call; if the client cannot be found, the orphan process is killed.
  3. Timeout: give each process a fixed amount of running time and terminate it if it has not finished by then. If needed, a process can ask for more time before its allotted time expires.
  4. Process group: when a parent process terminates or crashes, its children become orphans, and it is impossible to predict whether a child will be "abandoned" while it runs. For this reason, UNIX-like systems provide process groups, which let a parent and all of its descendants be signalled and terminated together.
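The timeout and process-group ideas from the list above can be combined in Go. This is a minimal sketch for UNIX-like systems only; runWithTimeout is a hypothetical helper, and the commands it runs in main are placeholders chosen for illustration.

```go
package main

import (
    "fmt"
    "os/exec"
    "syscall"
    "time"
)

// runWithTimeout starts name in its own process group and waits for it.
// If it is still running after timeout, the whole group is killed.
// The first result reports whether the timeout fired.
func runWithTimeout(timeout time.Duration, name string, args ...string) (bool, error) {
    cmd := exec.Command(name, args...)
    // New process group whose PGID equals the child's PID.
    cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
    if err := cmd.Start(); err != nil {
        return false, err
    }

    done := make(chan error, 1)
    go func() { done <- cmd.Wait() }() // always reap the child

    select {
    case err := <-done:
        return false, err
    case <-time.After(timeout):
        // A negative PID signals every process in the group,
        // including grandchildren started by the child.
        _ = syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL)
        <-done // reap the killed child so it does not stay a zombie
        return true, nil
    }
}

func main() {
    // The shell starts a background sleep, much like start.sh starts sub.sh.
    timedOut, err := runWithTimeout(500*time.Millisecond, "sh", "-c", "sleep 30 & wait")
    fmt.Println("timed out:", timedOut, "err:", err)
}
```

Because the kill targets the group, the background sleep dies together with the shell instead of being adopted by init.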

zombie process

In UNIX-like operating systems, a zombie process is a process that has finished executing (by calling the exit system call, or after a fatal error or a terminating signal at runtime) but whose process control block still exists in the operating system's process table. Such a process is in the "terminated" state.
Normally, when a process exits, its parent collects its exit status with wait and the system then removes its entry. A zombie is a process for which this has not happened. Because it is already dead, the kill command has no effect on it; if the parent never reaps it, its process table entry is leaked.

Solution & Prevention

The way to reap a zombie process is to send a SIGCHLD signal manually to its parent process with the kill command. If the parent still refuses to reap the zombie, terminate the parent so that the init process adopts the zombie. The init process periodically executes the wait system call to reap all zombie processes it has adopted.
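Inside a Go parent, the same idea (react to SIGCHLD, then call wait) might look like the sketch below. It is an illustration for UNIX-like systems, not the author's code: reapExited and spawnAndReap are hypothetical helpers. Waiting on PID -1 collects any child, so this pattern should be assumed to conflict with cmd.Wait() from os/exec in the same program.

```go
package main

import (
    "fmt"
    "os"
    "os/signal"
    "syscall"
    "time"
)

// reapExited collects every child that has already exited, without
// blocking, and returns their PIDs. Wait4 with pid -1 means "any child".
func reapExited() []int {
    var reaped []int
    for {
        var status syscall.WaitStatus
        pid, err := syscall.Wait4(-1, &status, syscall.WNOHANG, nil)
        if err == syscall.EINTR {
            continue
        }
        if err != nil || pid <= 0 {
            // No children left, or none of them has exited yet.
            return reaped
        }
        reaped = append(reaped, pid)
    }
}

// spawnAndReap starts a short-lived child and reaps on every SIGCHLD
// until that child has been collected or five seconds have passed.
func spawnAndReap() (int, []int, error) {
    sigs := make(chan os.Signal, 1)
    signal.Notify(sigs, syscall.SIGCHLD)
    defer signal.Stop(sigs)

    p, err := os.StartProcess("/bin/sh", []string{"sh", "-c", "exit 0"}, &os.ProcAttr{})
    if err != nil {
        return 0, nil, err
    }
    var reaped []int
    deadline := time.After(5 * time.Second)
    for {
        select {
        case <-sigs: // sent by the kernel when a child changes state
            reaped = append(reaped, reapExited()...)
            for _, pid := range reaped {
                if pid == p.Pid {
                    return p.Pid, reaped, nil
                }
            }
        case <-deadline:
            return p.Pid, reaped, nil
        }
    }
}

func main() {
    started, reaped, err := spawnAndReap()
    fmt.Println("started:", started, "reaped:", reaped, "err:", err)
}
```

Reaping in a loop on each signal matters because several children may exit while only one SIGCHLD is delivered.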

View process details

# List processes
ps -l
  • USER: the user who owns the process
  • PID: the process ID
  • RSS: the resident set size, i.e. the physical memory occupied by the process (in kilobytes)
  • S: the process state
  • CMD: the command the process is running

Process state(s)

  • R: Running or runnable (on the run queue)
  • S: Sleeping: blocked, waiting for a condition to be met or for a signal to arrive
  • I: Idle
  • Z: Zombie (a defunct process): the process has terminated, but its process descriptor remains until the parent process calls the wait4() system call to release it
  • D: Uninterruptible sleep (usually I/O): the process does not wake up or run when a signal is received; it must wait until the event it is waiting for (typically an I/O interrupt) occurs
  • T: Stopped: the process stopped running after receiving a SIGSTOP, SIGTSTP, SIGTTIN or SIGTTOU signal
  • P: Waiting for a page swap
  • W: Has no resident pages (not enough memory pages to allocate)
  • X: Dead process

Go solution

The solution combines two things: kill the whole process group rather than just the parent process (on Linux, kill -- -PGID), and wait on the child process. The code is as follows:

package main

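import (
    "context"
    "log"
    "os"
    "os/exec"
    "os/signal"
    "syscall"
    "time"
)

func main() {
    // context.Background() alone is never cancelled, so cancel ctx on
    // Ctrl-C or SIGTERM to let the ctx.Done() branch below fire.
    ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
    defer stop()

    cmd := exec.CommandContext(ctx, "./start.sh")

    // Put start.sh in a new process group; its descendants inherit it.
    cmd.SysProcAttr = &syscall.SysProcAttr{
        Setpgid: true,
    }

    cmd.Dir = "/Users/Wilbur/Project/Go/src/go-code/cmd/"
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr

    if err := cmd.Start(); err != nil {
        // cmd.Process is nil when Start fails, so stop here.
        log.Fatalf("cmd.Start error %+v \n", err)
    }
    pid := cmd.Process.Pid

    // Wait on the child in the background so that it is always reaped.
    errCmdCh := make(chan error, 1)
    go func() {
        errCmdCh <- cmd.Wait()
    }()

    for {
        select {
        case <-ctx.Done():
            log.Println("ctx.done")
            // A negative PID addresses the whole process group.
            if err := syscall.Kill(-pid, syscall.SIGKILL); err != nil {
                log.Printf("kill process group error %+v \n", err)
            }
            <-errCmdCh // wait until start.sh has been reaped
            return
        case err := <-errCmdCh:
            log.Printf("errCmdCh error %+v \n", err)
            return
        default:
            log.Println(pid)
            time.Sleep(2 * time.Second)
        }
    }
}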
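The code above passes a negative PID to syscall.Kill, which only works because Setpgid gives the child a process group whose ID equals the child's own PID. The sketch below checks that relationship on UNIX-like systems; childGroup is a hypothetical helper, and "sleep 5" is just a stand-in for start.sh.

```go
package main

import (
    "fmt"
    "os/exec"
    "syscall"
)

// childGroup starts "sleep 5", optionally in a new process group, and
// returns the child's PID, the child's PGID and the caller's own PGID.
func childGroup(newGroup bool) (pid, pgid, ownPgid int, err error) {
    cmd := exec.Command("sleep", "5")
    cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: newGroup}
    if err = cmd.Start(); err != nil {
        return 0, 0, 0, err
    }
    defer func() {
        _ = cmd.Process.Kill()
        _ = cmd.Wait() // reap the child so no zombie is left behind
    }()
    pid = cmd.Process.Pid
    pgid, err = syscall.Getpgid(pid)
    return pid, pgid, syscall.Getpgrp(), err
}

func main() {
    for _, newGroup := range []bool{false, true} {
        pid, pgid, own, err := childGroup(newGroup)
        fmt.Printf("Setpgid=%v pid=%d pgid=%d own pgid=%d err=%v\n",
            newGroup, pid, pgid, own, err)
    }
}
```

Without Setpgid the child shares the Go program's group, so signalling that group would also hit the Go program itself.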

Analysis of the cmd.Wait() source code

In os/exec_unix.go:

var (
    status syscall.WaitStatus
    rusage syscall.Rusage
    pid1   int
    e      error
    )

for {
    pid1, e = syscall.Wait4(p.Pid, &status, 0, &rusage)
    if e != syscall.EINTR {
        break
    }
}

cmd.Wait() ends up calling syscall.Wait4 to wait for the child. This matches the description of the Z state above: the zombie process has terminated, but its process descriptor remains until the parent process calls the wait4() system call to release it.
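To see what that call returns, the same retry loop can be used directly on UNIX-like systems. This is a sketch only: waitExitCode and runAndWait are hypothetical helpers, and in real code cmd.Wait() is the normal way to do this. The manual call should not be mixed with cmd.Wait() for the same process.

```go
package main

import (
    "fmt"
    "os/exec"
    "syscall"
)

// waitExitCode blocks until child pid exits, retrying on EINTR, and
// returns its exit code, or -1 if it did not exit normally.
func waitExitCode(pid int) (int, error) {
    var status syscall.WaitStatus
    for {
        _, err := syscall.Wait4(pid, &status, 0, nil)
        if err == syscall.EINTR {
            continue // interrupted by a signal: try again
        }
        if err != nil {
            return 0, err
        }
        break
    }
    if status.Exited() {
        return status.ExitStatus(), nil
    }
    return -1, nil
}

// runAndWait starts a shell script and reaps it with waitExitCode.
func runAndWait(script string) (int, error) {
    cmd := exec.Command("sh", "-c", script)
    if err := cmd.Start(); err != nil {
        return 0, err
    }
    return waitExitCode(cmd.Process.Pid)
}

func main() {
    code, err := runAndWait("exit 3")
    fmt.Println(code, err) // prints "3 <nil>"
}
```

Once Wait4 has returned, the kernel has released the child's process table entry, so the child can no longer appear as a zombie.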

Summary

Strictly speaking, the zombie process is not the root of the problem. The culprit is the parent process that creates children and never reaps them.

So rather than looking for ways to clean up a large number of zombie processes after the fact, we should think during development about how to avoid creating them in the first place.

References:

https://pkg.go.dev/syscall

https://cs.opensource.google/go/go/+/refs/tags/go1.17.7:src/syscall/syscall_linux.go;l=279

https://pkg.go.dev/os/exec


WilburXu