
Processes and threads are basic operating-system concepts, and programmers use the words all the time. What do they actually mean, and how do they apply to Node.js?


1. Processes and threads

1.1, formal definitions

  • A process is one execution of a program on a given set of data. It is the basic unit of resource allocation and scheduling in the system and the foundation of the operating system's structure. A process is a container for threads.
  • A thread is the smallest unit that the operating system can schedule. It lives inside a process and is the unit that actually executes within it.

1.2, an everyday analogy

The definitions above are rather dry; they are hard to take in on first reading and hard to remember. So let's take a simple example:

Suppose you are a courier at a delivery station. At first the area served by the station has few residents, and you are the only one handling parcels. You deliver a parcel to Zhang San's home, then go to Li Si's home to pick one up, and every job has to be done one after another. This is called single-threading: all work is executed in order.
Later the area gained more residents, and the station assigned several couriers and a team leader to it, so more residents could be served. This is called multi-threading. The team leader is the main thread, and each courier is a thread.
The trolleys and other tools are provided by the station and can be used by every courier, not just one. This is called resource sharing between threads.
The station currently has only one trolley, and everyone needs it. This is called a conflict. There are several ways to resolve it, such as queuing or waiting to be notified when another courier has finished with it. This is called thread synchronization.


The head office runs many stations, and each station operates in almost the same way. This is called multi-process. The head office is the main process, and each station is a child process.
Between the head office and a station, and between one station and another, trolleys are independent and cannot be shared. This is called non-sharing of resources between processes. Stations can contact each other by telephone and similar means, which corresponds to a pipe. Stations also have other ways of cooperating to complete larger computing tasks, which is called inter-process synchronization.

See also Ruan Yifeng's brief explanation of processes and threads.

2. Processes and threads in Node.js

Node.js runs JavaScript on a single main thread. Its event-driven, non-blocking I/O model makes it efficient and lightweight. The advantages are that it avoids frequent thread switching and resource conflicts, and that it is good at I/O-intensive work (the underlying libuv library uses the operating system's asynchronous I/O facilities, plus multiple threads of its own, to run many tasks at once). On the server side, however, Node.js may have to handle hundreds of requests per second, and because the JavaScript runs on a single thread, a CPU-intensive request will inevitably block the others.
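The blocking effect can be seen without any web framework. The following is a minimal sketch using only Node.js built-ins; the names busyWait and events are made up for this illustration. A timer scheduled for 0 ms cannot fire until the synchronous loop has returned, because both share the one main thread.

```javascript
// Hypothetical demo: a 0 ms timer has to wait for synchronous CPU work.
const events = []

// Spin the CPU for roughly `ms` milliseconds without yielding to the event loop.
const busyWait = (ms) => {
    const end = Date.now() + ms
    while (Date.now() < end) { /* burn CPU */ }
}

const scheduledAt = Date.now()
setTimeout(() => {
    events.push('timer fired')
    console.log(`timer fired after ${Date.now() - scheduledAt}ms (requested 0ms)`)
}, 0)

busyWait(200)
events.push('sync work done')
console.log(events) // the timer callback has not run yet at this point
```

The timer reports a delay of about 200 ms or more, even though 0 ms was requested. An HTTP request that arrives during CPU-bound work waits in the same way.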

2.1, Node.js blocking

We use Koa to build a simple web service and use a Fibonacci function to simulate a CPU-intensive computing task in Node.js:

The Fibonacci sequence, also known as the golden ratio sequence, is defined so that from the third term on, each term equals the sum of the previous two: 0, 1, 1, 2, 3, 5, 8, 13, 21, ...

// app.js
const Koa = require('koa')
const router = require('koa-router')()
const app = new Koa()

// Used to test whether the server is blocked
router.get('/test', (ctx) => {
    ctx.body = {
        pid: process.pid,
        msg: 'Hello World'
    }
})
router.get('/fibo', (ctx) => {
    const { num = 38 } = ctx.query
    const start = Date.now()
    // Fibonacci sequence
    const fibo = (n) => {
        return n > 1 ? fibo(n - 1) + fibo(n - 2) : 1
    }
    fibo(num)

    ctx.body = {
        pid: process.pid,
        duration: Date.now() - start
    }
})

app.use(router.routes())
app.listen(9000, () => {
    console.log('Server is running on 9000')
})

Run node app.js to start the service and send a request with Postman. Computing fibo(38) takes 617 ms. In other words, while this CPU-intensive task runs, the Node.js main thread is blocked for more than 600 milliseconds. If more such requests arrive at the same time, or if the computation is heavier, every request queued behind them is delayed.
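To see why this recursion is so expensive, it helps to count the function calls. The sketch below reuses the same fibo definition as app.js; the calls counter and the countCalls helper are additions for illustration only.

```javascript
// Same naive recursion as in app.js, plus a call counter (added for illustration).
let calls = 0
const fibo = (n) => {
    calls++
    return n > 1 ? fibo(n - 1) + fibo(n - 2) : 1
}

// Hypothetical helper: run fibo(n) and report how many calls it took.
const countCalls = (n) => {
    calls = 0
    const value = fibo(n)
    return { n, value, calls }
}

console.log(countCalls(10)) // { n: 10, value: 89, calls: 177 }
console.log(countCalls(20)) // { n: 20, value: 10946, calls: 21891 }
console.log(countCalls(30)) // { n: 30, value: 1346269, calls: 2692537 }
```

The number of calls is always 2 * fibo(n) - 1, so it grows by a factor of about 1.6 for each increment of n. By that formula fibo(38) needs roughly 126 million calls, all of them on the main thread.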


We create a new axios.js to simulate sending several requests. This time the fibo argument is raised to 43 to simulate a heavier computing task:

// axios.js
const axios = require('axios')

const start = Date.now()
const fn = (url) => {
    axios.get(`http://127.0.0.1:9000/${ url }`).then((res) => {
        console.log(res.data, `elapsed: ${ Date.now() - start }ms`)
    })
}

fn('test')
fn('fibo?num=43')
fn('test')


As you can see, when one request performs a CPU-intensive task, the requests behind it are blocked and have to wait. With too many such requests the service is effectively stalled. Node.js has been working to make up for this shortcoming.

2.2, master-worker

Master-worker is a parallel pattern. The core idea is that when two or more processes or threads work together, the master receives tasks, distributes them, and integrates the results, while the workers process the tasks.
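The pattern can be sketched in a single file with worker threads. This is only an illustration under assumptions: the task (a sum of squares) and the names splitTasks, runWorker, and sumOfSquares are invented here, and the worker code is passed as a string with eval: true purely to keep the example self-contained.

```javascript
const { Worker } = require('worker_threads')

// Worker side, passed as a string so the sketch fits in one file (eval: true).
const workerSource = `
const { workerData, parentPort } = require('worker_threads')
// Process the assigned chunk: here, a sum of squares.
const partial = workerData.chunk.reduce((acc, x) => acc + x * x, 0)
parentPort.postMessage(partial)
`

// Master: split the task list into chunks of nearly equal size.
const splitTasks = (items, parts) => {
    const size = Math.ceil(items.length / parts)
    const chunks = []
    for (let i = 0; i < items.length; i += size) {
        chunks.push(items.slice(i, i + size))
    }
    return chunks
}

// Master: hand one chunk to one worker and wait for its partial result.
const runWorker = (chunk) => new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { chunk } })
    worker.once('message', resolve)
    worker.once('error', reject)
})

// Master: distribute the chunks, then integrate the partial results.
const sumOfSquares = async (items, parts = 2) => {
    const partials = await Promise.all(splitTasks(items, parts).map(runWorker))
    return partials.reduce((acc, x) => acc + x, 0)
}

sumOfSquares([1, 2, 3, 4, 5, 6], 3).then((total) => console.log(total)) // 91
```

The master never computes a square itself. It only splits, dispatches, and adds up what comes back, which is the division of labor described above.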


2.3, multithreading

A thread is the basic unit of CPU scheduling. A CPU core can execute only one thread at a time, and a thread can run on only one core at a time. A single-threaded program therefore cannot take full advantage of a multi-core CPU.

Multi-threading gives us flexible ways to program, but it requires learning more APIs, and writing more code brings more risk. Thread switching and locks also add system overhead.

worker_threads is the multi-threading API provided by Node.js. It is useful for CPU-intensive computing tasks and of little help for I/O-intensive work, because the asynchronous I/O built into Node.js is more efficient than worker_threads. Worker and parentPort from worker_threads are mainly used to exchange messages between the worker thread and the main thread.

Slightly modify app.js to hand the CPU-intensive task over to a worker thread:

// app.js
const Koa = require('koa')
const router = require('koa-router')()
const { Worker } = require('worker_threads')
const app = new Koa()

// Used to test whether the server is blocked
router.get('/test', (ctx) => {
    ctx.body = {
        pid: process.pid,
        msg: 'Hello World'
    }
})
router.get('/fibo', async (ctx) => {
    const { num = 38 } = ctx.query
    ctx.body = await asyncFibo(num)
})

const asyncFibo = (num) => {
    return new Promise((resolve, reject) => {
        // Create a worker thread and pass data to it
        const worker = new Worker('./fibo.js', { workerData: { num } })
        // The main thread listens for messages sent by the worker thread
        worker.on('message', resolve)
        worker.on('error', reject)
        worker.on('exit', (code) => {
            if (code !== 0) reject(new Error(`Worker stopped with exit code ${code}`))
        })
    })
}

app.use(router.routes())
app.listen(9000, () => {
    console.log('Server is running on 9000')
})

Add a fibo.js file to handle the CPU-intensive computation:

// fibo.js
const { workerData, parentPort } = require('worker_threads')
const { num } = workerData

const start = Date.now()
// Fibonacci sequence
const fibo = (n) => {
    return n > 1 ? fibo(n - 1) + fibo(n - 2) : 1
}
fibo(num)

parentPort.postMessage({
    pid: process.pid,
    duration: Date.now() - start
})

Run the axios.js script above again, with the fibo argument still set to 43 to simulate a heavier computing task:


As you can see, once the CPU-intensive task is handed to a worker thread, the main thread is no longer blocked. It only waits for the worker thread to finish and then receives the result it returns, and other requests are no longer affected.
The code above only demonstrates how worker threads are created and what effect they have. In real development, use a thread pool instead, because creating threads frequently also costs resources.
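As a rough idea of what a thread pool looks like, here is a minimal sketch built only on worker_threads. It is not a production pool: FiboPool and its methods are hypothetical names, a crashed worker is not replaced, and the worker code is inlined with eval: true just so the example fits in one file.

```javascript
const { Worker } = require('worker_threads')

// Long-lived worker: receives { num } and replies with fibo(num).
const workerSource = `
const { parentPort } = require('worker_threads')
const fibo = (n) => (n > 1 ? fibo(n - 1) + fibo(n - 2) : 1)
parentPort.on('message', ({ num }) => parentPort.postMessage(fibo(num)))
`

class FiboPool {
    constructor(size) {
        this.idle = []    // workers waiting for a task
        this.queue = []   // tasks waiting for a worker
        this.workers = []
        for (let i = 0; i < size; i++) {
            const worker = new Worker(workerSource, { eval: true })
            this.workers.push(worker)
            this.idle.push(worker)
        }
    }

    // Queue a task and return a promise for its result.
    run(num) {
        return new Promise((resolve, reject) => {
            this.queue.push({ num, resolve, reject })
            this.dispatch()
        })
    }

    // Pair idle workers with queued tasks until one of the two lists is empty.
    dispatch() {
        while (this.idle.length > 0 && this.queue.length > 0) {
            const worker = this.idle.pop()
            const { num, resolve, reject } = this.queue.shift()
            const onMessage = (result) => {
                worker.off('error', onError)
                this.idle.push(worker) // the worker is reused, not recreated
                resolve(result)
                this.dispatch()
            }
            const onError = (err) => {
                worker.off('message', onMessage)
                reject(err) // a crashed worker is not replaced in this sketch
            }
            worker.once('message', onMessage)
            worker.once('error', onError)
            worker.postMessage({ num })
        }
    }

    // Stop all workers so the process can exit.
    close() {
        return Promise.all(this.workers.map((w) => w.terminate()))
    }
}

const pool = new FiboPool(2)
Promise.all([20, 21, 22].map((n) => pool.run(n)))
    .then((results) => console.log(results)) // [ 10946, 17711, 28657 ]
    .finally(() => pool.close())
```

Three tasks are served by two threads here: the third task waits in the queue until a worker becomes idle, so no thread is created per request.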

Recall the description of threads and CPUs from the beginning of this section: a CPU core can execute only one thread at a time, and a thread can run on only one core at a time. Because the workers are new threads, they can run on other CPU cores, so a multi-core CPU is used more fully.

2.4, multi-process

To make full use of a multi-core CPU, Node.js provides the cluster module, which implements clustering by having a parent process manage several child processes.

  • child_process (child process): its fork method spawns a new Node.js process and runs the specified module, with an IPC communication channel established between parent and child.
  • cluster: creates child processes that share a server port; the worker processes are spawned with the fork method of child_process.

cluster is built on top of child_process. The master process starts n worker processes; in some setups it also starts one agent process that handles shared chores such as logging (this is a convention layered on top of cluster, not something the module itself provides). Each worker process communicates with the master process over the established IPC (Inter-Process Communication) channel and shares the service port with it.
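The IPC channel mentioned above can be tried directly with fork from child_process. The following is a sketch under assumptions: the child script is written to a temporary file only so that the example is self-contained (the file is not cleaned up), and askChild and the message shape are invented for illustration.

```javascript
const { fork } = require('child_process')
const fs = require('fs')
const os = require('os')
const path = require('path')

// Write a tiny child script to a temp directory so the sketch is self-contained.
const childFile = path.join(fs.mkdtempSync(path.join(os.tmpdir(), 'ipc-demo-')), 'child.js')
fs.writeFileSync(childFile, `
process.on('message', (msg) => {
    // Reply to the parent over the IPC channel.
    process.send({ pid: process.pid, echo: msg.text.toUpperCase() })
})
`)

// Parent: fork the child, send one message, and resolve with the reply.
const askChild = (text) => new Promise((resolve, reject) => {
    const child = fork(childFile)
    child.once('message', (reply) => {
        child.disconnect() // close the IPC channel so the child can exit
        resolve(reply)
    })
    child.once('error', reject)
    child.send({ text })
})

askChild('hello').then((reply) => {
    console.log(`parent ${process.pid} got`, reply)
})
```

The reply carries the child's own pid, which differs from the parent's: the two are separate processes that talk only through messages.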


Add fibo-10.js to simulate sending 10 requests:

// fibo-10.js
const axios = require('axios')

const url = `http://127.0.0.1:9000/fibo?num=38`
const start = Date.now()

for (let i = 0; i < 10; i++) {
    axios.get(url).then((res) => {
        console.log(res.data, `elapsed: ${ Date.now() - start }ms`)
    })
}

With only one process, the 10 requests block one another and are handled one by one; in total they take 15 seconds:


Next, slightly modify app.js and introduce the cluster module:

// app.js
const cluster = require('cluster')
const numCPUs = require('os').cpus().length
// const numCPUs = 10 // the number of worker processes usually equals the number of CPU cores
const Koa = require('koa')
const router = require('koa-router')()
const app = new Koa()

// Used to test whether the server is blocked
router.get('/test', (ctx) => {
    ctx.body = {
        pid: process.pid,
        msg: 'Hello World'
    }
})
router.get('/fibo', (ctx) => {
    const { num = 38 } = ctx.query
    const start = Date.now()
    // Fibonacci sequence
    const fibo = (n) => {
        return n > 1 ? fibo(n - 1) + fibo(n - 2) : 1
    }
    fibo(num)

    ctx.body = {
        pid: process.pid,
        duration: Date.now() - start
    }
})
app.use(router.routes())

// Note: cluster.isMaster is deprecated since Node.js 16 in favor of cluster.isPrimary
if (cluster.isMaster) {
    console.log(`Master ${process.pid} is running`)

    // Fork worker processes
    for (let i = 0; i < numCPUs; i++) {
        cluster.fork()
    }

    cluster.on('exit', (worker, code, signal) => {
        console.log(`worker ${worker.process.pid} died`)
    })
} else {
    app.listen(9000)
    console.log(`Worker ${process.pid} started`)
}

Run node app.js to start the service. You can see that cluster created 1 master process and 4 worker processes:


Sending 10 requests with fibo-10.js shows that the four worker processes need nearly 9 seconds to handle them:


Now start 10 worker processes and look at the effect:


This takes less than 3 seconds, but the number of processes cannot grow without limit. In day-to-day development, the number of worker processes is generally the same as the number of CPU cores.
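One way to apply that rule of thumb is to clamp any requested worker count to the number of cores. The helper below is a hypothetical sketch, not part of the original project; it prefers os.availableParallelism() where the running Node.js version provides it and falls back to os.cpus().length otherwise.

```javascript
const os = require('os')

// Number of logical cores; os.availableParallelism() only exists in newer Node.js versions.
const coreCount = () =>
    typeof os.availableParallelism === 'function'
        ? os.availableParallelism()
        : os.cpus().length

// Hypothetical helper: clamp a requested worker count to the range [1, number of cores].
const pickWorkerCount = (requested = coreCount()) =>
    Math.max(1, Math.min(Math.floor(requested), coreCount()))

console.log(pickWorkerCount())   // one worker per core
console.log(pickWorkerCount(10)) // never more than the number of cores
console.log(pickWorkerCount(0))  // at least 1
```

The result could be used as the loop bound around cluster.fork() in place of a hard-coded number.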

2.5, multi-process notes

Multi-processing is not only about handling high concurrency; it is a way to address Node.js's under-use of multi-core CPUs.
A child process created by the parent through fork has the same kinds of resources as the parent, but the processes are independent and do not share resources with each other. Because system resources are limited, the number of processes is usually set according to the number of CPU cores.
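The independence of processes is easy to check. The sketch below deliberately uses execFileSync instead of fork, so that the child's output can be read synchronously; the variable counter and the inline child script are made up for the demonstration.

```javascript
const { execFileSync } = require('child_process')

// A variable that lives in the parent process's memory.
let counter = 42

// Script run in a separate Node.js process (passed with -e).
const childScript = `
console.log(JSON.stringify({ pid: process.pid, counterType: typeof counter }))
`

// Run the child synchronously and parse what it printed.
const child = JSON.parse(
    execFileSync(process.execPath, ['-e', childScript], { encoding: 'utf8' })
)

console.log({ parentPid: process.pid, counter })
console.log(child) // counterType is 'undefined': the child never sees the parent's variable
```

Any data the child needs has to be sent to it explicitly, for example over an IPC channel, which is the price paid for the isolation.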

3. Summary

1. Most solutions that use multi-threading for CPU-intensive computing tasks can be replaced by a multi-process solution;
2. Although Node.js is asynchronous, that does not mean it never blocks. It is best not to run CPU-intensive tasks on the main thread, so that the main thread keeps flowing;
3. Don't blindly chase high performance and high concurrency; meeting the needs of the system is enough. Efficiency and agility are what a project needs, and they are also where the lightweight nature of Node.js shows;
4. Many concepts around processes and threads in Node.js were mentioned here without detail, or not mentioned at all, for example: libuv (the I/O layer underneath Node.js), IPC communication channels, how to keep multiple processes alive, and, given that processes do not share resources, how to handle scheduled tasks, agent processes, and so on;
5. The code above is available at https://github.com/liuxy0551/node-process-thread .


袋鼠云数栈UED