Introduction

Every program depends on IO. Some of it is obvious, such as reading and writing files, and some is less obvious, such as sending data over the network. So what IO models are there? How should we choose among them? And how do the advanced IO models kqueue and epoll work? Let's take a look.

Blocking IO and non-blocking IO

Let's start with the two simplest IO models: blocking IO and non-blocking IO.

Suppose several threads need to read data from a socket. The read can be divided into two parts: first, waiting for the socket's data to become ready; second, reading that data and doing the business processing. With blocking IO, the workflow is as follows:

  1. A thread calls read and waits for the socket's data to be ready.
  2. When the data is ready, the call returns and the thread processes the data.
  3. Any other thread waiting on the same socket repeats the process once the first thread has finished.

Why is it called blocking IO? Because the thread that issues the read is suspended until the data arrives and can do nothing else in the meantime. Other threads that need the same socket can only wait as well.

What is non-blocking IO?

Take the same example again. With non-blocking IO, it works like this:

  1. A thread attempts to read data from a socket.
  2. If the data in the socket is not ready, return immediately.
  3. The thread continues to try to read the socket's data.
  4. If the data in the socket is ready, then the thread continues to execute subsequent program processing steps.

Why is it called non-blocking IO? Because the read call returns immediately when the socket has no data, so the thread is never suspended waiting on the socket.

As the analysis above shows, non-blocking IO does not block the thread, but it does not free the thread either: the thread has to keep polling the socket until the data arrives, which wastes CPU time.
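The non-blocking steps above can be sketched in C roughly as follows. This is only an illustration, assuming a POSIX system; the helper names set_nonblocking, try_read and poll_read and the NOT_READY sentinel are made up for this example and are not part of any library.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

#define NOT_READY (-2) /* hypothetical sentinel: no data available yet */

/* Switch fd to non-blocking mode. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* One read attempt that never suspends the caller.
 * Returns bytes read, 0 at end of stream, NOT_READY if there is
 * no data yet, or -1 on any other error. */
long try_read(int fd, char *buf, size_t len) {
    assert(buf != NULL && len > 0);
    ssize_t n = read(fd, buf, len);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) return NOT_READY;
    return (long)n;
}

/* Keep retrying until data arrives or the attempts run out.
 * This is the busy polling described above. */
long poll_read(int fd, char *buf, size_t len, int max_attempts) {
    for (int i = 0; i < max_attempts; i++) {
        long n = try_read(fd, buf, len);
        if (n != NOT_READY) return n;
        /* a real program would do other work or back off here */
    }
    return NOT_READY;
}
```

Calling try_read on an empty non-blocking pipe or socket returns NOT_READY at once instead of suspending the thread, and the loop in poll_read is exactly where the CPU time goes.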

IO multiplexing and select

There are many IO multiplexing models, and select is the most common one. In fact, both Netty and Java NIO are built on a selector model of this kind.

How does the select model work?

The select model is somewhat similar to non-blocking IO, except that a separate thread is dedicated to checking whether data is ready on the sockets. When it finds that data is ready, it notifies a specific data-processing thread through a previously registered event handler.

The advantage is that although the select thread itself blocks, the threads that actually process the data do not. A single select thread can also monitor many socket connections, which improves IO efficiency, so the select model is widely used.

To understand how select works in more detail, let's look at the select function on Unix:

int select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *errorfds, struct timeval *timeout);

Let's first explain what these parameters mean. On Unix, everything is a file, so fd here stands for file descriptor.

fds stands for file descriptor set: each of these parameters is a set of file descriptors of type fd_set.

nfds is an integer: the highest-numbered file descriptor in any of the sets, plus 1.

readfds is the set of descriptors to check for readability.

writefds is the set of descriptors to check for writability.

errorfds is the set of descriptors to check for exceptional conditions.

timeout is the maximum time to wait for the selection to complete.

select works by scanning every file descriptor in these sets and reporting the ones that are ready. On return, the sets are overwritten so that they contain only the ready descriptors.
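As an illustration of how these parameters fit together, here is a minimal sketch that waits for one descriptor to become readable and then reads from it. It assumes a POSIX system; the function name read_with_select and the TIMED_OUT sentinel are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

#define TIMED_OUT (-2) /* hypothetical sentinel: nothing arrived in time */

/* Wait up to timeout_ms for fd to become readable, then read from it.
 * Returns bytes read, 0 at end of stream, TIMED_OUT, or -1 on error. */
long read_with_select(int fd, char *buf, size_t len, int timeout_ms) {
    /* an fd_set cannot hold descriptors >= FD_SETSIZE */
    assert(fd >= 0 && fd < FD_SETSIZE);

    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);

    struct timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* nfds is the highest descriptor in the sets plus 1 */
    int ready = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (ready < 0) return -1;
    if (ready == 0 || !FD_ISSET(fd, &readfds)) return TIMED_OUT;

    return (long)read(fd, buf, len);
}
```

Because select overwrites the sets it is given, a loop that calls it repeatedly has to rebuild readfds before every call, as this function does on each invocation.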

poll

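poll is very similar to select; the main difference is how the set of fds is described. select takes fixed-size fd_set bitmaps, while poll takes an array of struct pollfd entries. poll is specified by POSIX and is available on POSIX systems.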
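To make the difference in how the descriptors are described concrete, the following sketch watches several descriptors with poll and reads from the first one that becomes readable. It assumes a POSIX system; read_first_ready, MAX_FDS and TIMED_OUT are hypothetical names used only for this example.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

#define MAX_FDS 16     /* hypothetical cap that keeps the sketch allocation-free */
#define TIMED_OUT (-2) /* hypothetical sentinel: nothing arrived in time */

/* Wait on n descriptors and read from the first readable one.
 * Stores the index of that descriptor in *which.
 * Returns bytes read, 0 at end of stream, TIMED_OUT, or -1 on error. */
long read_first_ready(const int *fds, int n, int *which,
                      char *buf, size_t len, int timeout_ms) {
    assert(n > 0 && n <= MAX_FDS);

    /* one pollfd entry per descriptor instead of a bitmap */
    struct pollfd pfds[MAX_FDS];
    for (int i = 0; i < n; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }

    int ready = poll(pfds, (nfds_t)n, timeout_ms);
    if (ready < 0) return -1;
    if (ready == 0) return TIMED_OUT;

    for (int i = 0; i < n; i++) {
        if (pfds[i].revents & (POLLIN | POLLHUP)) {
            *which = i;
            return (long)read(fds[i], buf, len);
        }
    }
    return -1; /* only error conditions (POLLERR, POLLNVAL) were reported */
}
```

Note that the kernel still has to walk the whole pfds array on every call, which is the cost that epoll and kqueue try to avoid.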

epoll

In fact, although select and poll both provide multiplexed IO, they each have shortcomings. epoll and kqueue are optimized replacements for them.

epoll is a system call interface on Linux; the name can be read as event poll. It was first introduced in version 2.5.44 of the Linux kernel.

It is mainly used to monitor whether the IO in multiple file descriptors is ready.

With traditional select and poll, every call has to traverse all the monitored file descriptors, so each call costs O(n) in the number of descriptors. With epoll, the cost of a wait no longer grows with the number of monitored descriptors, which is why it is usually described as O(1).

This is because epoll is notified when a monitored event occurs, so it does not need to poll every descriptor the way select does, which makes it more efficient.

epoll uses a red-black tree (RB-tree) data structure to keep track of all file descriptors currently being monitored.

epoll has three API functions:

int epoll_create1(int flags);

epoll_create1 creates an epoll instance and returns its file descriptor. The flags argument controls the behavior of the instance.

int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);

epoll_ctl controls the epoll instance: it adds, modifies, or removes the file descriptors and events to be monitored.

The op argument is one of EPOLL_CTL_ADD, EPOLL_CTL_MOD, or EPOLL_CTL_DEL.

int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);

epoll_wait waits for events on the descriptors registered through epoll_ctl.
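The three functions are typically used together in the order create, register, wait. The sketch below shows that sequence for a single descriptor. It is Linux only, and the helper name wait_readable_epoll is hypothetical. A real server would create the epoll instance once and reuse it for many descriptors rather than creating one per call as this example does.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Wait up to timeout_ms for fd to become readable using epoll.
 * Returns the number of ready events (0 on timeout), or -1 on error. */
int wait_readable_epoll(int fd, int timeout_ms) {
    /* 1. create the epoll instance */
    int epfd = epoll_create1(0);
    if (epfd < 0) return -1;

    /* 2. register interest in "readable" events on fd */
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
        close(epfd);
        return -1;
    }

    /* 3. wait; only descriptors that are ready are written to `ready` */
    struct epoll_event ready[1];
    int n = epoll_wait(epfd, ready, 1, timeout_ms);
    if (n == 1) assert(ready[0].data.fd == fd);

    close(epfd);
    return n;
}
```

The data field is returned untouched with each event, which is how the caller learns which descriptor became ready without scanning the others.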

epoll provides two trigger modes, edge-triggered and level-triggered.

Suppose a pipe registered with epoll receives data. A call to epoll_wait returns, indicating that there is data to read. In level-triggered mode, every call to epoll_wait returns as long as the pipe's buffer still contains unread data. In edge-triggered mode, epoll_wait returns again only after new data has been written to the pipe.
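The difference between the two modes can be observed with a small experiment: register a descriptor that already has unread data, call epoll_wait twice without reading in between, and look at the second result. The sketch below does this. It is Linux only, and second_wait_result is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Register fd, call epoll_wait twice WITHOUT reading in between,
 * and return the result of the second call (or -1 on error). */
int second_wait_result(int fd, int edge_triggered) {
    assert(fd >= 0);

    int epfd = epoll_create1(0);
    if (epfd < 0) return -1;

    uint32_t events = EPOLLIN;
    if (edge_triggered) events |= EPOLLET; /* level-triggered is the default */

    struct epoll_event ev = { .events = events, .data.fd = fd };
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
        close(epfd);
        return -1;
    }

    struct epoll_event out;
    int first = epoll_wait(epfd, &out, 1, 100);
    /* the data is still unread; does epoll report it again? */
    int second = (first == 1) ? epoll_wait(epfd, &out, 1, 100) : -1;

    close(epfd);
    return second;
}
```

With unread data sitting in the pipe, the level-triggered registration reports the descriptor again on the second call, while the edge-triggered one times out. This is why edge-triggered code normally reads in a loop until the read would block.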

kqueue

Like epoll, kqueue is a replacement for select and poll. The difference is that kqueue is available on FreeBSD, NetBSD, OpenBSD, DragonFly BSD, and macOS.

kqueue handles not only file descriptor events but also various other notifications, such as file modification monitoring, signals, asynchronous I/O (AIO) events, child process state changes, and timers with nanosecond resolution. In addition, kqueue supports user-defined events alongside the ones provided by the kernel.

kqueue provides two APIs. The first creates a kqueue:

int kqueue(void);

The second, kevent, registers events with the queue and retrieves pending events:

int kevent(int kq, const struct kevent *changelist, int nchanges, struct kevent *eventlist, int nevents, const struct timespec *timeout);

The first parameter of kevent is the kqueue to operate on. changelist is the list of events to register, and nchanges is the length of that list. eventlist is the buffer that receives the events returned by kevent, and nevents is its capacity. The last parameter is the timeout.

In addition, kqueue has an EV_SET macro used to initialize the kevent structure:

EV_SET(&kev, ident, filter, flags, fflags, data, udata);
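Putting kqueue, EV_SET and kevent together, a minimal sketch of waiting for one descriptor to become readable might look like this. The helper name wait_readable_kqueue and the HAVE_KQUEUE and KQ_UNSUPPORTED macros are hypothetical. The platform check is an assumption added so that the file still compiles on systems without kqueue, where the function simply reports that kqueue is unavailable.

```c
#include <assert.h>
#include <time.h>
#include <unistd.h>

#if defined(__APPLE__) || defined(__FreeBSD__) || defined(__NetBSD__) || \
    defined(__OpenBSD__) || defined(__DragonFly__)
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#define HAVE_KQUEUE 1
#else
#define HAVE_KQUEUE 0
#endif

#define KQ_UNSUPPORTED (-2) /* hypothetical sentinel: no kqueue on this OS */

/* Wait up to timeout_ms for fd to become readable using kqueue.
 * Returns the number of events (0 on timeout), -1 on error,
 * or KQ_UNSUPPORTED where kqueue does not exist. */
int wait_readable_kqueue(int fd, int timeout_ms) {
    assert(fd >= 0);
#if HAVE_KQUEUE
    int kq = kqueue();
    if (kq < 0) return -1;

    /* describe the change: start watching fd for readability */
    struct kevent change;
    EV_SET(&change, fd, EVFILT_READ, EV_ADD | EV_ENABLE, 0, 0, 0);

    struct timespec ts;
    ts.tv_sec = timeout_ms / 1000;
    ts.tv_nsec = (long)(timeout_ms % 1000) * 1000000L;

    /* a single kevent call both registers the change and waits */
    struct kevent event;
    int n = kevent(kq, &change, 1, &event, 1, &ts);

    close(kq);
    return n;
#else
    (void)timeout_ms;
    return KQ_UNSUPPORTED;
#endif
}
```

Unlike epoll, which separates registration (epoll_ctl) from waiting (epoll_wait), kevent can do both in one system call, as the single call here shows.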

Advantages of epoll and kqueue

epoll and kqueue are more advanced than select and poll because they make full use of the underlying facilities of the operating system. The operating system always knows when data becomes ready, so by registering the events of interest with it, a program can avoid the polling that select performs and run more efficiently.

Note that epoll and kqueue depend on support from the underlying operating system, so check that the corresponding native libraries are available on your platform before using them.

This article has been included in http://www.flydean.com/14-kqueue-epoll/

flydean