Daemon Processes, Orphan Processes, Zombie Processes, and waitpid

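A daemon is a program that runs in the background, detached from any terminal. Daemon names conventionally end in d, daemons usually start with the system, and their parent process (PPID) is usually the init process.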

To run a program in the background in a daemon-like way, it is usually enough to append & to the command and redirect its output:

$ python someprogram.py > /dev/null 2>&1 &

Running the command under nohup also works.
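A comparable background launch can be sketched from Python with the standard subprocess module. This is only an illustration: the helper name launch_detached is hypothetical and not part of the original post, and start_new_session=True goes slightly further than the shell's &, since it also puts the child in its own session.

```python
import subprocess


def launch_detached(argv):
    """Start argv in the background with its standard streams on /dev/null.

    Hypothetical helper for illustration only.
    """
    return subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,   # same effect as "> /dev/null"
        stderr=subprocess.DEVNULL,   # same effect as "2>&1"
        start_new_session=True,      # the child calls setsid() before exec
    )
```

For example, launch_detached(["python", "someprogram.py"]) returns immediately with a Popen object whose pid attribute identifies the background process.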
These are ways of launching a program directly. To get the same result from within a program's own code, first look at how a daemon is set up:

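Fork a child process and let the parent exit.
When a parent exits before its child, the child becomes an orphan. Whenever the system finds an orphan, process 1 (init) adopts it automatically, so the former child becomes a child of init.
In the child, create a new session and reset the inherited environment:
1. Change the working directory to /, so that the daemon does not keep a filesystem busy and block its umount.
2. Reset the file mode creation mask (umask), so that the mask inherited from the parent no longer restricts the permissions of files the daemon creates.
3. Call setsid, which makes the process the leader of a new session and of a new process group, with no controlling terminal.
Fork a second time. The resulting process is not a session leader, so it cannot acquire a controlling terminal again.
Close the inherited file descriptors, usually standard input, standard output, and standard error, and redirect them to /dev/null.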

Python code:
https://gist.github.com/jamiesun/3097215
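For readers who do not want to follow the link, the steps can be condensed into a minimal double-fork sketch. It is not the code from the gist: the function name daemonize and the choice to point all three standard streams at os.devnull are assumptions made here, and a production daemon would also need a PID file, signal handling, and error checks.

```python
import os


def daemonize():
    """Detach the calling process from its terminal (double-fork sketch)."""
    # First fork: the parent exits, so the child is orphaned and re-parented.
    if os.fork() > 0:
        os._exit(0)

    os.chdir("/")   # do not keep any mounted filesystem busy
    os.umask(0)     # drop the file mode creation mask inherited from the parent
    os.setsid()     # become leader of a new session and a new process group

    # Second fork: the grandchild is not a session leader, so it cannot
    # acquire a controlling terminal again.
    if os.fork() > 0:
        os._exit(0)

    # Redirect stdin, stdout, and stderr (fds 0, 1, 2) to /dev/null.
    devnull = os.open(os.devnull, os.O_RDWR)
    for fd in (0, 1, 2):
        os.dup2(devnull, fd)
    if devnull > 2:
        os.close(devnull)
```

Only the grandchild returns from daemonize(); the two intermediate processes leave through os._exit(0), which skips cleanup handlers that belong to the original process.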

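The steps for creating a daemon involve orphan processes.
As soon as a process is orphaned, the system process init takes it over as its own child, which is known as "adoption". Because the process that originally created it no longer exists, it is still called an orphan process.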
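Adoption can be observed directly by watching os.getppid() change after the parent exits. The sketch below is illustrative only (the name orphan_demo and the pipe-based reporting are choices made here). It does not assume the new parent is PID 1, because some systems hand orphans to a different reaper process.

```python
import os
import time


def orphan_demo(timeout=5.0):
    """Return (parent pid at fork time, parent pid after that parent exited)."""
    read_fd, write_fd = os.pipe()
    middle = os.fork()
    if middle == 0:
        # Intermediate process: fork a child, then exit right away.
        my_pid = os.getpid()
        if os.fork() == 0:
            # Grandchild: poll until getppid() stops reporting the dead parent.
            deadline = time.monotonic() + timeout
            while os.getppid() == my_pid and time.monotonic() < deadline:
                time.sleep(0.01)
            os.write(write_fd, f"{my_pid} {os.getppid()}".encode())
            os._exit(0)
        os._exit(0)
    # Original process: reap the intermediate one, then read the report.
    os.close(write_fd)
    os.waitpid(middle, 0)
    old_ppid, new_ppid = map(int, os.read(read_fd, 64).split())
    os.close(read_fd)
    return old_ppid, new_ppid
```

Calling orphan_demo() returns two different PIDs: the first is the intermediate process that exited, the second is whichever process adopted the orphan.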

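A closely related concept is the zombie process. When a child exits, the kernel releases almost all of its resources, including open files and memory, but keeps an entry (PCB) in the process table that records the PID, the exit status, and similar information. Until the parent reads that exit status with the wait/waitpid system calls, the child remains a "zombie"; once the parent has waited, the system removes the entry. If the parent never waits, the zombie keeps its PID occupied. The number of PIDs and processes available is limited (ulimit -u shows the per-user process limit, and on Linux the system-wide PID ceiling is in /proc/sys/kernel/pid_max), so a large number of zombies can leave the system unable to create new processes.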
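The zombie state is easy to reproduce. The following sketch is Linux-only, since it reads the state letter from /proc/<pid>/stat. The function names and the exit status 7 are arbitrary choices for illustration.

```python
import os
import time


def proc_state(pid):
    """Return the one-letter process state from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        # The state is the first field after the parenthesised command name.
        return f.read().rsplit(")", 1)[1].split()[0]


def zombie_demo(timeout=5.0):
    """Fork a child that exits at once; inspect it before and after waitpid."""
    pid = os.fork()
    if pid == 0:
        os._exit(7)  # child: exit immediately with status 7

    # The parent has not waited yet, so the exited child lingers as a zombie.
    deadline = time.monotonic() + timeout
    while proc_state(pid) != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
    state_before_wait = proc_state(pid)

    # waitpid reads the exit status and lets the kernel drop the table entry.
    _, status = os.waitpid(pid, 0)
    still_listed = os.path.exists(f"/proc/{pid}")
    return state_before_wait, os.WEXITSTATUS(status), still_listed
```

Before the wait the child shows state Z. After the wait its exit status has been collected and its /proc entry is gone.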

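This is why many services with built-in restart support are implemented with wait/waitpid.
waitpid() suspends the calling process until a signal arrives or a child process exits.
Tornado's multi-process fork works this way: the parent monitors its child processes and automatically restarts any child that exits unexpectedly: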

def fork_processes(num_processes, max_restarts=100):
    """Starts multiple worker processes.

    If ``num_processes`` is None or <= 0, we detect the number of cores
    available on this machine and fork that number of child
    processes. If ``num_processes`` is given and > 0, we fork that
    specific number of sub-processes.

    Since we use processes and not threads, there is no shared memory
    between any server code.

    Note that multiple processes are not compatible with the autoreload
    module (or the ``autoreload=True`` option to `tornado.web.Application`
    which defaults to True when ``debug=True``).
    When using multiple processes, no IOLoops can be created or
    referenced until after the call to ``fork_processes``.

    In each child process, ``fork_processes`` returns its *task id*, a
    number between 0 and ``num_processes``.  Processes that exit
    abnormally (due to a signal or non-zero exit status) are restarted
    with the same id (up to ``max_restarts`` times).  In the parent
    process, ``fork_processes`` returns None if all child processes
    have exited normally, but will otherwise only exit by throwing an
    exception.
    """
    global _task_id
    assert _task_id is None
    if num_processes is None or num_processes <= 0:
        num_processes = cpu_count()
    if ioloop.IOLoop.initialized():
        raise RuntimeError("Cannot run in multiple processes: IOLoop instance "
                           "has already been initialized. You cannot call "
                           "IOLoop.instance() before calling start_processes()")
    gen_log.info("Starting %d processes", num_processes)
    children = {}

    def start_child(i):
        pid = os.fork()
        if pid == 0:
            # child process
            _reseed_random()
            global _task_id
            _task_id = i
            return i
        else:
            children[pid] = i
            return None
    for i in range(num_processes):
        id = start_child(i)
        if id is not None:
            return id
    num_restarts = 0
    while children:
        try:
            pid, status = os.wait()
        except OSError as e:
            if errno_from_exception(e) == errno.EINTR:
                continue
            raise
        if pid not in children:
            continue
        id = children.pop(pid)
        if os.WIFSIGNALED(status):
            gen_log.warning("child %d (pid %d) killed by signal %d, restarting",
                            id, pid, os.WTERMSIG(status))
        elif os.WEXITSTATUS(status) != 0:
            gen_log.warning("child %d (pid %d) exited with status %d, restarting",
                            id, pid, os.WEXITSTATUS(status))
        else:
            gen_log.info("child %d (pid %d) exited normally", id, pid)
            continue
        num_restarts += 1
        if num_restarts > max_restarts:
            raise RuntimeError("Too many child restarts, giving up")
        new_id = start_child(id)
        if new_id is not None:
            return new_id
    # All child processes exited cleanly, so exit the master process
    # instead of just returning to right after the call to
    # fork_processes (which will probably just start up another IOLoop
    # unless the caller checks the return value).
    sys.exit(0)
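Stripped of Tornado's logging and IOLoop checks, the same supervision pattern fits in a few lines. The sketch below is a simplified illustration, not Tornado's API: the names supervise and worker are hypothetical, and unlike fork_processes it stops restarting quietly once max_restarts is reached instead of raising an exception.

```python
import os


def supervise(worker, num_workers=2, max_restarts=3):
    """Run worker(task_id) in child processes and restart the ones that fail.

    Returns the number of restarts that were performed.
    """
    children = {}  # pid -> task id

    def start(task_id):
        pid = os.fork()
        if pid == 0:
            # Child: run the worker and use its return value as exit status.
            try:
                code = worker(task_id) or 0
            except BaseException:
                code = 1
            os._exit(code)
        children[pid] = task_id

    for task_id in range(num_workers):
        start(task_id)

    restarts = 0
    while children:
        # Block until any child exits; this also reaps it, so no zombie is left.
        pid, status = os.waitpid(-1, 0)
        task_id = children.pop(pid, None)
        if task_id is None:
            continue  # a child this supervisor did not start
        if os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0:
            continue  # clean exit: do not restart
        if restarts < max_restarts:
            restarts += 1
            start(task_id)
    return restarts
```

A worker that returns a non-zero value, raises, or is killed by a signal is started again under the same task id, which mirrors how the Tornado code reuses the id of the child it replaces.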

Reference: http://bbs.chinaunix.net/thread-4071026-1-1.html

quietin
