Abstract: Both Reactor and Proactor are network programming patterns based on "event dispatching". The difference is that the Reactor pattern is based on "to-be-completed" I/O events, while the Proactor pattern is based on "completed" I/O events.

This article is shared from the Huawei Cloud Community post "High-Performance Network Framework: Reactor and Proactor", original author: Xiaolin coding.

This time, I will illustrate the two high-performance network patterns: Reactor and Proactor.

Don't underestimate these two patterns, especially Reactor. Many popular open source projects, such as Redis, Nginx, and Netty, adopt it, so learning the design ideas behind this pattern will not only help us understand a lot of open source software, it will also come in handy in interviews.

Start!

Evolution

If you want the server to serve multiple clients, the most straightforward way is to create a thread for each connection.

In fact, a process could be created instead; the principle is the same. The difference is that a thread is lighter: the cost of creating a thread and of switching between threads is smaller. For ease of description, threads are used as the example below.

After the business logic has been processed, the thread is destroyed as the connection closes. However, continuously creating and destroying threads not only brings performance overhead but also wastes resources, and if you need to handle tens of thousands of connections, creating tens of thousands of threads is simply unrealistic.

How can this problem be solved? We can take the "resource reuse" approach.

That is, instead of creating a thread for each connection, we create a "thread pool" and assign connections to threads, so that one thread handles the business of multiple connections.

However, this raises a new question: how can a thread efficiently handle the business of multiple connections?

When one connection corresponds to one thread, the thread generally follows a "read -> business processing -> send" flow. If the connection has no data to read, the thread blocks on the read call (a socket defaults to blocking I/O), but this blocking does not affect other threads.

But once a thread pool is introduced, a single thread has to handle the business of multiple connections. If the thread blocks while reading from one connection that has no data, it cannot move on to handle the business of the other connections.

The easiest way to solve this is to make the socket non-blocking, and have the thread continuously poll the read call to check whether data has arrived. Although this does solve the blocking problem, it is a rather crude solution: polling consumes CPU, and the more connections a thread handles, the less efficient the polling becomes.
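To make the cost of this polling concrete, here is a minimal Python sketch (the names are illustrative, and a local socketpair stands in for a real network connection): each recv on a non-blocking socket that has no data returns immediately with an error, burning one poll.

```python
import socket

def poll_read(sock, attempts):
    """Repeatedly try to read from a non-blocking socket, counting empty polls."""
    empty_polls = 0
    for _ in range(attempts):
        try:
            data = sock.recv(1024)
            return data, empty_polls        # data finally arrived
        except BlockingIOError:
            empty_polls += 1                # nothing to read yet; CPU burned
    return None, empty_polls

# Demo: a connected pair where one end has nothing to say at first.
a, b = socket.socketpair()
a.setblocking(False)                        # non-blocking read end

data, misses = poll_read(a, attempts=3)     # no data yet: every poll misses
b.sendall(b"hello")
data2, _ = poll_read(a, attempts=3)         # now the first poll succeeds
a.close()
b.close()
```

Each miss is a wasted system call; with thousands of mostly idle connections, almost all polls miss, which is exactly the inefficiency described above.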

The root of the problem above is that the thread does not know whether the current connection has data to read, so it has to probe with read each time.

Is there a way for the thread to issue a read only when the connection actually has data? Yes: this is exactly what I/O multiplexing provides.

I/O multiplexing uses a single system call to monitor all the connections we care about, which means one monitoring thread can watch many connections at once.

[Figure: one thread monitoring many connections via I/O multiplexing]

The familiar select/poll/epoll interfaces are the multiplexing system calls the kernel provides to user space: with one system call, a thread can obtain multiple events from the kernel.

PS: If you want to know the differences between select/poll/epoll, you can read Xiaolin's article: "This time, promise me: master I/O multiplexing in one go!"

How do select/poll/epoll obtain network events?

To obtain events, we first pass the connections we care about to the kernel, and the kernel then checks them:

  • If no event has occurred, the thread simply blocks in the system call, instead of probing each connection with read in turn as in the thread-pool scheme above.
  • If an event has occurred, the kernel returns the connections that produced events; the thread returns from the blocked state and processes the business of those connections in user mode.
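The contrast with polling can be sketched in a few lines of Python using the standard selectors module (which wraps select/poll/epoll); the socketpairs and connection IDs here are made up for illustration:

```python
import selectors
import socket

# One selector watches many connections; the thread blocks in a single
# select() call instead of polling each socket in turn.
sel = selectors.DefaultSelector()

pairs = [socket.socketpair() for _ in range(3)]
for i, (reader, writer) in enumerate(pairs):
    reader.setblocking(False)
    sel.register(reader, selectors.EVENT_READ, data=i)  # tag with conn id

pairs[1][1].sendall(b"ping")           # only connection 1 becomes readable

events = sel.select(timeout=1)          # wakes with just the ready ones
ready_ids = [key.data for key, _ in events]
payload = pairs[1][0].recv(1024)        # read is guaranteed not to block

sel.close()
for r, w in pairs:
    r.close()
    w.close()
```

The thread sleeps in one system call and is woken only for the single connection that actually has data, regardless of how many idle connections are registered.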

Is I/O multiplexing the reason today's open source software achieves such high network performance?

Yes, it is basically all built on I/O multiplexing. Anyone who has written network programs directly against the multiplexing interfaces knows that the code comes out procedural, and development in that style is not efficient.

Therefore, guided by object-oriented thinking, some experts wrapped a layer of encapsulation around I/O multiplexing, so that users no longer need to think about the details of the underlying network API and can focus on writing application code.

They also gave this pattern a name that is hard to parse at first sight: the Reactor pattern.

Reactor translates as "reactor". That may bring nuclear reactors to mind, but that is not what is meant here.

The "reaction" here is a "reaction to an event": when an event arrives, the Reactor has a corresponding reaction/response.

In fact, the Reactor pattern is also called the Dispatcher pattern, a name I think fits its meaning better: I/O multiplexing monitors events, and when an event arrives, it is dispatched to some process or thread according to the event type.

The Reactor pattern consists of two core parts, a Reactor and a pool of processing resources, with the following responsibilities:

  • The Reactor is responsible for listening for and dispatching events; event types include connection events and read/write events;
  • The processing resource pool is responsible for handling events, e.g. read -> business logic -> send;

The Reactor pattern is flexible and can adapt to different business scenarios. The flexibility lies in:

  • There can be a single Reactor or multiple Reactors;
  • The processing resource pool can be a single process/thread or multiple processes/threads;

Permuting and combining these two factors gives, in theory, four schemes:

  • Single Reactor single process/thread;
  • Single Reactor multi-process/thread;
  • Multi-Reactor single process/thread;
  • Multi-Reactor multi-process/thread;

Among them, the "multi-Reactor single process/thread" scheme is not only more complicated but also offers no performance advantage over "single Reactor single process/thread", so it is not used in practice.

The remaining 3 schemes are relatively classic, and they are all applied in actual projects:

  • Single Reactor single process/thread;
  • Single Reactor multi-thread/process;
  • Multi-Reactor multi-process/thread;

Whether a scheme uses processes or threads depends on the programming language and platform:

  • Java generally uses threads, e.g. Netty;
  • C can use either processes or threads; e.g. Nginx uses processes and Memcached uses threads.

Next, let's look at these three classic Reactor schemes in turn.

Reactor

Single Reactor Single Process/Thread

Generally, C implementations use the "single Reactor single process" scheme, because a C program runs as a standalone process and there is no need to create threads within it.

Java implementations use the "single Reactor single thread" scheme, because a Java program runs inside a JVM process that already contains many threads; the Java code we write is just one of those threads.

Let's take a look at the schematic diagram of the "single Reactor single process" scheme:

[Figure: single Reactor single process scheme]

As you can see, the process contains three kinds of objects: Reactor, Acceptor, and Handler:

  • The role of the Reactor object is to monitor and distribute events;
  • The role of the Acceptor object is to get the connection;
  • The role of the Handler object is to handle business;

In the diagram, select, accept, read, and send are system calls; dispatch and "business processing" are operations that must be implemented, where dispatch is the event-dispatching operation.

The "single Reactor single process" scheme works as follows:

  • The Reactor object listens for events via select (the I/O multiplexing interface) and, on receiving an event, dispatches it. Whether it goes to the Acceptor object or a Handler object depends on the event type;
  • If it is a connection-establishment event, it is handled by the Acceptor object, which obtains the connection via accept and creates a Handler object to handle subsequent events on that connection;
  • If it is not a connection-establishment event, the Handler object corresponding to the current connection responds;
  • The Handler object completes the full business flow via read -> business processing -> send.
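As a rough illustration (not Redis's or any real project's code), the steps above can be sketched in Python, with a selector playing the Reactor's select/dispatch role and plain functions standing in for the Acceptor and Handler objects:

```python
import selectors
import socket
import threading

def reactor_echo_server(ready_evt, stop_evt, port_box):
    """Single-Reactor single-thread loop: one selector monitors every
    socket, and dispatch routes each event to the Acceptor or a Handler."""
    sel = selectors.DefaultSelector()

    def acceptor(server):                        # connection event
        conn, _ = server.accept()
        conn.setblocking(False)
        sel.register(conn, selectors.EVENT_READ, data=handler)

    def handler(conn):                           # read -> process -> send
        data = conn.recv(1024)
        if data:
            conn.sendall(data.upper())           # stand-in business logic
        else:                                    # peer closed the connection
            sel.unregister(conn)
            conn.close()

    server = socket.socket()
    server.bind(("127.0.0.1", 0))
    server.listen()
    server.setblocking(False)
    port_box.append(server.getsockname()[1])
    sel.register(server, selectors.EVENT_READ, data=acceptor)
    ready_evt.set()

    while not stop_evt.is_set():                 # the Reactor's dispatch loop
        for key, _ in sel.select(timeout=0.1):
            key.data(key.fileobj)                # dispatch by registered role
    sel.close()
    server.close()

# Drive the reactor from a client.
ready, stop, box = threading.Event(), threading.Event(), []
t = threading.Thread(target=reactor_echo_server, args=(ready, stop, box))
t.start()
ready.wait()

client = socket.create_connection(("127.0.0.1", box[0]))
client.sendall(b"hello reactor")
reply = client.recv(1024)
client.close()
stop.set()
t.join()
```

Note how everything, accepting, reading, "business processing", and sending, happens in one thread: if `handler` took a long time, every other connection would have to wait, which is precisely the drawback discussed next.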

The single-Reactor single-process scheme is relatively simple to implement: all work is done within one process, so there is no inter-process communication to design and no multi-process contention to worry about.

However, this scheme has two disadvantages:

  • First, with only one process, the scheme cannot exploit a multi-core CPU;
  • Second, while a Handler object is busy with business processing, the whole process cannot respond to events on other connections, so long-running business logic causes response delays;

Therefore, the single Reactor single process scheme is not suitable for compute-intensive scenarios; it only suits scenarios where business processing is very fast.

Redis, implemented in C, uses the "single Reactor single process" scheme. Because Redis does its business processing mainly in memory, operations are very fast and the performance bottleneck is not the CPU, so single-process command processing works well for Redis.

Single Reactor Multi-Thread / Multi-Process

To overcome the shortcomings of the "single Reactor single thread/process" scheme, we must introduce multiple threads/processes, which yields the single-Reactor multi-thread/multi-process scheme.

Rather than just hearing the name, it's better to see the picture. Here is the schematic of the "single Reactor multi-thread" scheme:

[Figure: single Reactor multi-thread scheme]

In detail, this scheme works as follows:

  • The Reactor object listens for events via select (the I/O multiplexing interface) and, on receiving an event, dispatches it. Whether it goes to the Acceptor object or a Handler object depends on the event type;
  • If it is a connection-establishment event, it is handled by the Acceptor object, which obtains the connection via accept and creates a Handler object to handle subsequent events on that connection;
  • If it is not a connection-establishment event, the Handler object corresponding to the current connection responds;

The first three steps are the same as in the single-Reactor single-thread scheme; the following steps differ:

  • The Handler object is no longer responsible for business processing, only for receiving and sending data. After reading data via read, the Handler hands it to a Processor object in a child thread for business processing;
  • The Processor object in the child thread performs the business processing and, when done, sends the result back to the Handler in the main thread, which then sends the response to the client via send;
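A minimal Python sketch of this handoff, with invented names, a thread pool standing in for the child threads, and a socketpair replacing a real client connection: the Handler only reads and sends, while the Processor runs in a worker thread and returns its result through a thread-safe queue.

```python
import queue
import socket
from concurrent.futures import ThreadPoolExecutor

# Handler (main thread) only reads and sends; Processor (worker thread)
# does the business logic; results come back via a thread-safe queue.
pool = ThreadPoolExecutor(max_workers=2)
outbox = queue.Queue()

def processor(conn, data):
    """Runs in a worker thread: business processing only."""
    result = data[::-1]                  # stand-in business logic: reverse
    outbox.put((conn, result))           # hand the result back to the Handler

def handler_read(conn):
    """Runs in the reactor thread: read, then offload the heavy part."""
    data = conn.recv(1024)
    pool.submit(processor, conn, data)

def handler_send_pending():
    """Runs in the reactor thread: drain finished results and send them."""
    while True:
        try:
            conn, result = outbox.get_nowait()
        except queue.Empty:
            return
        conn.sendall(result)

a, b = socket.socketpair()
b.sendall(b"abc")
handler_read(a)                          # read + dispatch to the worker
pool.shutdown(wait=True)                 # wait for the Processor to finish
handler_send_pending()                   # Handler sends the response
reply = b.recv(1024)
a.close()
b.close()
```

The `queue.Queue` is what makes the worker-to-main-thread handoff safe; it is one concrete form of the shared-data protection discussed just below.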

The advantage of the single-Reactor multi-thread scheme is that it can make full use of a multi-core CPU. But introducing multi-threading naturally brings contention for shared resources.

For example, after a child thread finishes its business processing, the result must be passed back to the main thread's Reactor for sending, which involves contention over shared data.

To avoid data corruption from multiple threads contending for a shared resource, a mutex must be acquired before operating on the resource, ensuring that only one thread touches it at any time; once that thread releases the mutex, other threads get their chance to operate on the shared data.
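A small Python illustration of the mutex idea (the thread and iteration counts are arbitrary): without the lock, the read-modify-write on the shared counter could interleave across threads and lose updates.

```python
import threading

counter = 0                      # the shared resource
lock = threading.Lock()          # the mutex guarding it

def worker(n):
    global counter
    for _ in range(n):
        with lock:               # only one thread inside at a time
            counter += 1         # read-modify-write is now atomic

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With the lock held around each increment, the final count is exactly 4 × 10,000, no matter how the threads are scheduled.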

Having covered the single-Reactor multi-thread scheme, let's look at the single-Reactor multi-process scheme.

In fact, single Reactor multi-process is more troublesome than single Reactor multi-thread, mainly because bidirectional parent <-> child process communication must be handled, and the parent process must also know which client the child process's result should be sent to.

Threads, by contrast, can share data directly; although concurrency then needs extra care, that is far less complex than inter-process communication. Therefore the single Reactor multi-process scheme is not seen in practice.

In addition, the "single Reactor" approach has another problem: one Reactor object handles the monitoring of and response to all events, and it runs only in the main thread, so under a burst of high concurrency it can easily become a performance bottleneck.

Multi-Reactor Multi-Process/Thread

The way to solve the "single Reactor" problem is to turn the "single Reactor" into "multiple Reactors", which yields the multi-Reactor multi-process/thread scheme.

As usual, the picture beats the name. The schematic of the multi-Reactor multi-process/thread scheme (using threads as the example) is as follows:

[Figure: multi-Reactor multi-thread scheme]

The scheme works as follows:

  • The MainReactor object in the main thread monitors connection-establishment events via select; on receiving one, it obtains the connection via accept in the Acceptor object and assigns the new connection to a child thread;
  • The SubReactor object in the child thread adds the connection assigned by the MainReactor to its own select for continued monitoring, and creates a Handler to handle the connection's events;
  • When a new event occurs on the connection, the SubReactor object calls the corresponding Handler object to respond;
  • The Handler object completes the full business flow via read -> business processing -> send.

Although the multi-Reactor multi-thread scheme looks complicated, it is actually much simpler to implement than single-Reactor multi-thread, for these reasons:

  • The main thread and child threads have a clear division of labor: the main thread only accepts new connections, while child threads complete all subsequent business processing.
  • The interaction between the main thread and child threads is very simple: the main thread only has to hand a new connection to a child thread, and the child thread never needs to return data; it can send the processing result to the client directly.
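A simplified Python sketch of this division of labor (names are invented, and the cross-thread registration is deliberately naive): the MainReactor role only accepts and assigns connections round-robin, while each SubReactor thread runs its own selector loop and does read -> business processing -> send by itself.

```python
import itertools
import selectors
import socket
import threading

class SubReactor(threading.Thread):
    """Each SubReactor runs its own selector loop in its own thread."""
    def __init__(self):
        super().__init__(daemon=True)
        self.sel = selectors.DefaultSelector()
        self.stop_evt = threading.Event()

    def add_connection(self, conn):
        # Simplification: a production sub-reactor would queue this and
        # wake its loop via a pipe instead of registering cross-thread.
        conn.setblocking(False)
        self.sel.register(conn, selectors.EVENT_READ)

    def run(self):
        while not self.stop_evt.is_set():
            for key, _ in self.sel.select(timeout=0.1):
                conn = key.fileobj
                data = conn.recv(1024)
                if data:
                    conn.sendall(data.upper())   # stand-in business logic
                else:
                    self.sel.unregister(conn)
                    conn.close()

subs = [SubReactor() for _ in range(2)]
for s in subs:
    s.start()
assign = itertools.cycle(subs)                   # round-robin assignment

server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen()
port = server.getsockname()[1]

def main_reactor_accept():
    """MainReactor role: accept a connection, hand it to a SubReactor."""
    conn, _ = server.accept()
    next(assign).add_connection(conn)

clients = []
for _ in range(2):
    c = socket.create_connection(("127.0.0.1", port))
    main_reactor_accept()
    clients.append(c)

replies = []
for c in clients:
    c.sendall(b"hi")
    replies.append(c.recv(1024))
    c.close()
for s in subs:
    s.stop_evt.set()
server.close()
```

Notice that the sub-reactor never passes anything back to the main thread: it sends the response to the client itself, which is exactly why this scheme's thread interaction stays so simple.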

Two well-known open source projects, Netty and Memcached, adopt the "multi-Reactor multi-thread" scheme.

An open source project that adopts the "multi-Reactor multi-process" scheme is Nginx, though its design differs somewhat from the standard multi-Reactor multi-process scheme.

The specific differences: the main process only initializes the socket; no mainReactor is created to accept connections. Instead, each child process's Reactor accepts connections, with a lock ensuring that only one child process accepts at a time (to prevent the thundering-herd problem). Once a child process accepts a new connection, the connection is handled in that process's own Reactor and is never reassigned to other child processes.

Proactor
The Reactor described so far is a non-blocking synchronous network pattern, whereas Proactor is an asynchronous network pattern.

Let me review the concepts of blocking, non-blocking, synchronous, and asynchronous I/O.

First, blocking I/O: when the user program calls read, the thread blocks until the kernel data is ready and the data has been copied from the kernel buffer into the application buffer; only when the copy is complete does read return.

Note that the blocking covers both stages: "waiting for the kernel data to be ready" and "copying the data from kernel mode to user mode". The process is as follows:

[Figure: blocking I/O flow]

Now that we know blocking I/O, let's look at non-blocking I/O. A non-blocking read returns immediately if the data is not ready, and the thread can keep running. The application then keeps polling the kernel until the data is ready; at that point the kernel copies the data into the application buffer, and that read call obtains the result. The process is as follows:

[Figure: non-blocking I/O flow]

Note that the final read call here, the one that actually obtains the data, is a synchronous step that must be waited for. "Synchronous" here refers to the kernel copying the data from kernel mode into the user program's buffer.

For example, setting the O_NONBLOCK flag on a socket selects non-blocking I/O; with no flags set, the default is blocking I/O.
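On POSIX systems this flag can be observed directly. The Python sketch below (Linux/Unix only, since it uses fcntl) shows that `setblocking(False)` is simply setting O_NONBLOCK on the socket's file descriptor:

```python
import fcntl
import os
import socket

sock = socket.socket()

# Read the descriptor's status flags before and after setblocking(False).
flags_before = fcntl.fcntl(sock.fileno(), fcntl.F_GETFL)
sock.setblocking(False)
flags_after = fcntl.fcntl(sock.fileno(), fcntl.F_GETFL)

# O_NONBLOCK was clear by default and is set afterwards.
nonblock_was_set = (
    bool(flags_after & os.O_NONBLOCK)
    and not (flags_before & os.O_NONBLOCK)
)
sock.close()
```

The same effect can be achieved in C with `fcntl(fd, F_SETFL, flags | O_NONBLOCK)`.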

Therefore, whether read and send use blocking I/O or non-blocking I/O, they are synchronous calls, because during the read call the kernel's copy from kernel space to user space must be waited for; that stage is synchronous. If the kernel's copy implementation is inefficient, the read call will wait a long time in this synchronous stage.

True asynchronous I/O requires no waiting in either stage: neither "kernel data ready" nor "copying the data from kernel mode to user mode".

When we issue aio_read (asynchronous I/O), the call returns immediately. The kernel automatically copies the data from kernel space to user space; the copy itself is asynchronous, completed by the kernel on its own. Unlike the synchronous case, the application does not actively trigger the copy. The process is as follows:
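POSIX aio_read is a C interface with no direct Python equivalent, but the completion model it embodies can be imitated: the caller hands over a buffer up front and is notified only after the data has already been copied into it. The helper below is purely illustrative (a background thread plays the kernel's role; all names are invented):

```python
import os
import tempfile
import threading

def async_read(path, buf, on_complete):
    """Completion-model sketch: the caller supplies the buffer up front and
    is notified only after the data has been copied into it (like aio_read)."""
    def work():
        with open(path, "rb") as f:
            data = f.read(len(buf))
        buf[:len(data)] = data               # "kernel" copies into user buffer
        on_complete(len(data))               # completion notification
    threading.Thread(target=work).start()

# Usage: write a file, issue the read, and wait for the completion event.
done = threading.Event()
nread = []

fd, path = tempfile.mkstemp()
os.write(fd, b"proactor")
os.close(fd)

buf = bytearray(64)
async_read(path, buf, lambda n: (nread.append(n), done.set()))
done.wait(timeout=2)                         # caller was free to do other work
os.unlink(path)
```

When the callback fires, the data is already sitting in `buf`; the caller never issues a second, "fetch the result" read, which is the essential difference from the readiness model.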

[Figure: asynchronous I/O flow]

Here is an everyday analogy: going to the canteen for a meal. You are the application, and the canteen is the operating system.

Blocking I/O is like going to the canteen when the food is not ready yet: you wait and wait, until at last the canteen auntie brings the dish out (the data-preparation stage), and then you still wait while she ladles the food (kernel space) into your lunch box (user space). Only after both stages can you leave.

Non-blocking I/O is like going to the canteen and asking the auntie whether the dish is ready. If it isn't, you leave, come back a while later and ask again, and so on, until she says it's done and scoops the food into your lunch box; that scooping you still have to wait for.

Asynchronous I/O is like asking the canteen to prepare the food, put it in your lunch box, and deliver the box to you. You don't have to wait for any part of the process.

Clearly, asynchronous I/O performs better than synchronous I/O, because it waits in neither the "kernel data preparation" stage nor the "copy from kernel space to user space" stage.

Proactor uses asynchronous I/O technology, so it is called the asynchronous network model.

With this background, the difference between Reactor and Proactor becomes clearer:

  • Reactor is a non-blocking synchronous network pattern that senses ready read/write events. Each time it senses an event (e.g. a readable event), the application process must actively call read to complete the data transfer, i.e. actively copy the data from the socket receive buffer into application memory. This step is synchronous, and the application can process the data only after reading it.
  • Proactor is an asynchronous network pattern that senses completed read/write events. When issuing an asynchronous read/write request, you pass in, among other things, the address of a data buffer (to hold the result), so that the system kernel can complete the data transfer for us. The entire read/write is performed by the operating system; the application does not need to issue read/write calls itself as in Reactor. When the operating system finishes, it notifies the application, which can process the data directly.

Therefore, Reactor can be understood as "the operating system notifies the application that an event has arrived, and the application handles it", while Proactor is "the operating system handles the event itself, and notifies the application once it is done". The "events" here are I/O events such as a new connection, data readable, or data writable; the "handling" includes reading from the device driver into the kernel and from the kernel into user space.

A real-life example: in Reactor mode, the courier calls to tell you the package has arrived at your building, and you go downstairs to fetch it yourself. In Proactor mode, the courier delivers the package right to your door and then tells you.

Both Reactor and Proactor are network programming patterns based on "event dispatching"; the difference is that Reactor is based on "to-be-completed" I/O events, while Proactor is based on "completed" I/O events.

Next, take a look at the schematic diagram of the Proactor mode:

[Figure: Proactor pattern]

The workflow of the Proactor pattern:

  • The Proactor Initiator creates the Proactor and Handler objects, and registers both with the Asynchronous Operation Processor in the kernel;
  • The Asynchronous Operation Processor handles registration requests and performs the I/O operations;
  • When an I/O operation completes, the Asynchronous Operation Processor notifies the Proactor;
  • The Proactor calls back the appropriate Handler for business processing according to the event type;
  • The Handler completes the business processing.
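For a taste of completion-style application code, Python's asyncio can serve as a hedged illustration: on Windows its default event loop (ProactorEventLoop) is built on IOCP, while on Linux it falls back to a readiness-based selector loop, but either way the application code below only ever sees reads and writes that have already completed:

```python
import asyncio

async def handle(reader, writer):
    """Server-side handler: awaiting read() resumes only once data
    has been delivered; the handler never polls for readiness itself."""
    data = await reader.read(1024)
    writer.write(data.upper())         # stand-in business logic
    await writer.drain()
    writer.close()

async def demo():
    # Start an echo-style server on an ephemeral port, then talk to it.
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b"proactor style")
    await writer.drain()
    reply = await reader.read(1024)

    writer.close()
    server.close()
    await server.wait_closed()
    return reply

reply = asyncio.run(demo())
```

The awaited `read()` plays the role of the Handler callback in the diagram: the coroutine is resumed by the event loop only after the data is available, rather than being told "the socket is readable, go read it yourself".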

Unfortunately, asynchronous I/O on Linux is imperfect. The aio family of functions are the asynchronous operation interfaces defined by POSIX, but they are not truly supported at the operating-system level; they are simulated in user space, and they only support asynchronous operations on local files; sockets used in network programming are not supported. This is why high-performance network programs on Linux all use Reactor schemes.

Windows, by contrast, provides a complete set of asynchronous programming interfaces that support sockets: IOCP, which is asynchronous I/O implemented at the operating-system level, asynchronous I/O in the true sense. High-performance network programs on Windows can therefore use the more efficient Proactor scheme.

Summary

There are three common Reactor implementation schemes.

The first scheme, single Reactor single process/thread, involves no inter-process communication or data synchronization, so it is relatively simple to implement. Its drawbacks are that it cannot make full use of a multi-core CPU, and business logic must not take too long or responses get delayed; it therefore does not suit compute-intensive scenarios, only those where business processing is fast. Redis, for example, uses single Reactor single process.

The second scheme, single Reactor multi-thread, uses multiple threads to overcome the first scheme's shortcomings, but it still falls a little short of high concurrency: there is only one Reactor object handling the monitoring of and response to all events, and it runs only in the main thread, so under a burst of high concurrency it easily becomes a performance bottleneck.

The third scheme, multi-Reactor multi-process/thread, solves the second scheme's defect with multiple Reactors: the main Reactor only monitors for connection events, handing the work of responding to other events to the sub-Reactors. Netty and Memcached adopt "multi-Reactor multi-thread", while Nginx adopts a variant of "multi-Reactor multi-process".

Reactor can be understood as "the operating system notifies the application that an event has arrived, and the application handles it", while Proactor is "the operating system handles the event itself, and notifies the application once it is done".

The real heavyweight, then, is Proactor: an asynchronous network model built on asynchronous I/O. It senses completed read/write events; after sensing an event, it does not need to call read to fetch the data from the kernel the way Reactor does.

Nevertheless, both Reactor and Proactor are network programming patterns based on "event dispatching"; the difference is that Reactor is based on "to-be-completed" I/O events, while Proactor is based on "completed" I/O events.

