
In a distributed architecture, network communication is the underlying foundation. Without a network, there is no so-called distributed architecture. Only through the network can a large number of machines cooperate with each other to accomplish one thing together.

Similarly, in a large-scale system architecture, when application throughput stops improving or communication latency appears, the network is usually the first thing we examine, so its importance is self-evident.

For a modern application programmer, developing a network communication application is very simple: there are not only mature APIs but also very convenient communication frameworks.

Because of that, you may have forgotten how important network communication is. This article analyzes the underlying principles of network communication in detail.

1.1 Understand the nature of communication

As shown in Figure 1-1, when we visit a website through a browser, the browser renders the returned content after a short while. How is this process achieved?


<center>Figure 1-1</center>

Most readers will know that this communication is based on the http protocol. There is one very important word here: "protocol".

To achieve data communication, two computers must follow the same protocol; otherwise it is like a conversation in which one person speaks English and the other speaks Chinese, and normal communication is impossible. In computers, protocols are everywhere.

1.1.1 The composition of a protocol

The Java code we write can be understood and executed by the computer because the human and the computer follow the same language, Java. As shown in Figure 1-2, the process of compiling .java files into .class files also involves a protocol.


<center>Figure 1-2 java compilation process</center>

Therefore, in computing, a protocol refers to rules that all parties must follow together. Only when unified rules are in place can data communication between different nodes be realized, which makes computer applications far more powerful.

To form a protocol, three elements are required:

  • Syntax: the content must conform to certain rules and formats. For example, parentheses must be paired and statements end with a semicolon.
  • Semantics: the content must carry a specific meaning. For example, subtracting a number from a number is meaningful, while subtracting text from a number generally is not.
  • Timing: what is done first and what is done later. For example, a value can be added first and then subtracted.

1.1.2 http protocol

Now that we understand the role of a protocol, what does a protocol actually look like?

Look again at the scenario in Figure 1-3: a user visits a website through a browser, using the http protocol.


<center>Figure 1-3 http protocol</center>

The http protocol contains several parts:

  • http request composition

    • Request line
    • Request header
    • Message body
  • http response composition

    • Status line
    • Response header
    • Response body

An http response message is shown in Figure 1-4. The three elements of this protocol are:

  • Syntax: an http message is made up of a status line (or request line), headers, and content.
  • Semantics: for example the status code: 200 means success, 404 means the requested path does not exist, and so on. Both communicating parties must follow these semantics.
  • Timing: the order of the parts that make up the message, and the rule that a response is only generated after a request.

Because the browser handles all of this according to the http protocol, everyone can access all kinds of information on the Internet through a URL.


<center>Figure 1-4</center>
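To make this structure concrete, here is a minimal sketch in Java (the host example.com and the class name are just placeholders for any host that still serves plain HTTP on port 80) that writes a raw request line plus headers over a TCP socket and prints the status line and response headers described above.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class RawHttpExample {
    public static void main(String[] args) throws Exception {
        // example.com is only a placeholder for a host that serves plain HTTP on port 80
        try (Socket socket = new Socket("example.com", 80)) {
            OutputStream out = socket.getOutputStream();
            // Request line + headers + empty line: the "syntax" of an http request
            String request = "GET / HTTP/1.1\r\n"
                    + "Host: example.com\r\n"
                    + "Connection: close\r\n"
                    + "\r\n";
            out.write(request.getBytes(StandardCharsets.US_ASCII));
            out.flush();

            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.US_ASCII));
            // The first line printed is the status line, e.g. "HTTP/1.1 200 OK",
            // followed by the response headers; an empty line separates them from the body.
            String line;
            while ((line = in.readLine()) != null && !line.isEmpty()) {
                System.out.println(line);
            }
        }
    }
}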

1.1.3 Commonly used network protocols

DNS, Http, SSH, TCP, FTP and so on are the commonly used protocols. Whatever the protocol, it is still essentially composed of the same three elements; only the application scenarios differ.

The layer where DNS, HTTP, and HTTPS live is called the application layer. After application-layer encapsulation, the browser hands the application-layer packet to the next layer, which is done through socket programming. The next layer is the transport layer, which has two protocols: the connectionless protocol UDP and the connection-oriented protocol TCP. For scenarios that require reliable communication, TCP is usually used. Being connection-oriented means that TCP ensures the packet reaches its destination; if it does not arrive, it is resent until it does.
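As a contrast with the connection-oriented TCP examples later in this article, the following minimal sketch sends a single datagram over UDP: there is no connection setup and no delivery guarantee. The address 127.0.0.1:9999 and the class name are placeholders.

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

public class UdpSendExample {
    public static void main(String[] args) throws Exception {
        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket socket = new DatagramSocket()) {
            // No handshake: the datagram is simply handed to the network,
            // and may be lost, duplicated or reordered without notice.
            DatagramPacket packet = new DatagramPacket(
                    data, data.length, InetAddress.getByName("127.0.0.1"), 9999);
            socket.send(packet);
        }
    }
}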

1.2 Analysis of TCP/IP communication principle

How is a network communication completed?

When it comes to network communication, the concept of a network model inevitably comes up. Figure 1-5 shows the four-layer conceptual model of TCP/IP and the seven-layer OSI network model. The OSI model is a conceptual model proposed by the International Organization for Standardization in an attempt to let computers all over the world interconnect on a common standard.


<center>Figure 1-5</center>

Why should the network model be layered? Looking at today's layered business architectures, it is not hard to see that once any system becomes complex, a layered design is adopted. Its main benefits are:

  • Achieve high cohesion and low coupling
  • Each layer has its own single responsibility
  • Improve reusability and reduce maintenance costs

1.2.1 Sending the data packet in an http communication

Since this course is not specifically about networking, we only touch on the layered network model. To make it easier to understand how the layered model works, let us analyze the transmission of a data packet in one network communication, as shown in Figure 1-6.


<center>Figure 1-6</center>

The workflow of Figure 1-6 is described as follows:

  • Suppose we want to log in to a website. The browser constructs an http message according to the http protocol specification, containing among other things the user name and password to be transmitted. This belongs to the application layer protocol.
  • After the application layer encapsulation, the browser hands the application-layer packet to the next layer of the TCP/IP four-layer model, the transport layer. The transport layer has two protocols:

    • TCP protocol, a reliable communication protocol, this protocol will ensure that the data packet can reach the destination
    • UDP protocol, unreliable communication protocol, there may be data loss

    Http communication uses the TCP protocol. The TCP header carries two ports: the source port used by the browser and the destination port of the target server process. The operating system uses the port to decide which process a packet should be delivered to.

  • After the transport layer encapsulation is complete, the packet is handed to the network layer for processing. The network layer protocol is the IP protocol, and the IP header contains the source IP address (the client's IP) and the IP address of the target server.
  • Once the operating system knows the target IP address, it starts looking for the target machine. The target server is usually deployed elsewhere, and this kind of cross-network access requires a gateway (a gateway is simply the entry and exit point from one network to another).

    Therefore, the data packet first has to leave through the gateway of the network it is currently in before it can travel on toward the target server. Before the packet is transmitted, the MAC header has to be assembled.

    The MAC header contains the local MAC address and the MAC address of the next hop (the gateway, or the target machine itself if it is on the same LAN). How are these MAC addresses obtained?

    • To obtain the MAC address of a machine on the local network, the operating system sends a broadcast asking who owns the gateway address (192.168.1.1). The gateway that receives the broadcast replies with its MAC address. This broadcast is implemented with the ARP protocol (simply put: the IP of the target machine is known and its MAC address is needed, so a broadcast asks "who owns this IP, please claim it", and the machine that owns the IP replies with its MAC address).

      To avoid sending an ARP request every time, the machine also caches ARP results locally. Of course, machines keep going online and offline and IPs can change, so cached ARP entries expire after a period of time.

    • The method of obtaining the MAC address of the remote machine is also implemented based on the ARP protocol.

After the MAC address is assembled, the data packet is complete. It is then handed to the network card, which sends it out. Because the packet carries the MAC address, it can reach the gateway. When the gateway receives the packet, it decides the next hop based on its routing information; the gateway is usually a router, and the table that records how to reach a given IP address is called the routing table.

1.2.2 Receiving the data packet in an http communication

When a data packet arrives at a gateway, the gateway uses its routing information to decide which network segment the packet should be forwarded to. Data sent from the client to the target server may pass through multiple gateways. So after the packet enters the next gateway according to the route, it is re-addressed to the MAC of the gateway after that, and so on, until it reaches the target server's network.

When the packet arrives at the last gateway, that gateway knows the packet is destined for its own local area network, so it takes the target IP and asks, via the ARP protocol, "who owns this IP?". The target server replies with its MAC address. The gateway then rewrites the destination MAC of the packet to this address, and with it the packet finally reaches the target server.

When the target server sees that the destination MAC address matches its own, it strips off the MAC header and hands the packet to the operating system's network layer. The network layer removes the IP header, which indicates that a TCP segment is encapsulated inside, and hands it to the transport layer for processing. The process is shown in Figure 1-7.

At this layer, a reply is sent for every data packet received, telling the sender that the server got the packet. If the client does not receive the acknowledgment within a period of time, the sender's TCP layer resends the packet, going through the same process as above, until the reply finally arrives.

This retry is implemented by the TCP protocol layer and does not require our application to initiate it actively.


<center>Figure 1-7</center>

Why do we still need the IP layer when we already have the MAC layer?

We mentioned earlier that the mac address is unique. In theory, between any two devices, I should be able to send data through the mac address. Why do I need an ip address?

A MAC address is like a person's ID number. An ID number is related to the city of household registration and the date of birth, but it says nothing about where the person currently is; people move around, and knowing someone's ID number is not enough to find them. A MAC address is similar: it is tied to the device's manufacturer, batch and production date, and knowing a device's MAC is not enough to send data to it over the network, unless the device and the sender are in the same network.

Therefore, to realize communication between machines, we also need the concept of an IP address. An IP address expresses where a machine currently sits in the network, similar to city name + street + house number. Through addressing at the IP layer, a path for transmitting data can be found between any two machines on the Internet.
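To see the two kinds of addresses side by side on your own machine, a small sketch like the following enumerates the local network interfaces and prints each one's MAC address and IP addresses (the class name is arbitrary, and the output naturally differs per machine).

import java.net.InetAddress;
import java.net.NetworkInterface;
import java.util.Collections;

public class AddressExample {
    public static void main(String[] args) throws Exception {
        for (NetworkInterface nic : Collections.list(NetworkInterface.getNetworkInterfaces())) {
            byte[] mac = nic.getHardwareAddress(); // may be null for loopback/virtual interfaces
            StringBuilder macStr = new StringBuilder();
            if (mac != null) {
                for (byte b : mac) {
                    macStr.append(String.format("%02x:", b));
                }
            }
            System.out.println(nic.getName() + " MAC=" + macStr);
            for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                System.out.println("  IP=" + addr.getHostAddress());
            }
        }
    }
}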

1.3 The characteristics of TCP reliable communication in detail

We know that TCP is a reliable communication protocol that can ensure data packets are not lost. Let's first look at TCP's three-way handshake and four-way wave.

1.3.1 TCP three-way handshake

Two nodes need to communicate with each other, and a connection must be established first. When establishing a connection, TCP uses a three-way handshake to establish the connection. As shown in Figure 1-8.


<center>Figure 1-8</center>

First handshake (SYN=1, seq=x)

The client sends a TCP segment with the SYN flag set to 1, indicating the server port it wants to connect to, and an initial sequence number x stored in the Sequence Number field of the header. After sending it, the client enters the SYN_SENT state.

Second handshake (SYN=1, ACK=1, seq=y, ACKnum=x+1):

The server replies with an acknowledgment packet in which both the SYN and ACK flags are set to 1. The server chooses its own initial sequence number (ISN) y, puts it in the Seq field, and sets the Acknowledgment Number to the client's ISN plus 1, i.e. x+1. After sending it, the server enters the SYN_RCVD state.

Third handshake (ACK=1, ACKnum=y+1)

The client sends an acknowledgment packet again: the SYN flag is 0, the ACK flag is 1, the Acknowledgment Number is the server's sequence number plus 1 (y+1), and the segment's own sequence number is x+1. After sending it, the client enters the ESTABLISHED state; when the server receives this packet, it also enters ESTABLISHED, and the TCP handshake is complete.
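From the client's point of view, all of this is hidden behind a single call: for a blocking socket, connect() returns only after the three-way handshake has completed, or throws if it cannot be completed. A minimal sketch, reusing the test address 192.168.221.128:8888 that appears later in this article:

import java.net.InetSocketAddress;
import java.net.Socket;

public class ConnectExample {
    public static void main(String[] args) throws Exception {
        try (Socket socket = new Socket()) {
            // Blocks until SYN / SYN+ACK / ACK have been exchanged,
            // or throws if no connection can be established within 3000 ms.
            socket.connect(new InetSocketAddress("192.168.221.128", 8888), 3000);
            System.out.println("connected: " + socket.isConnected());
        }
    }
}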

1.3.2 Why does TCP need a three-way handshake?

TCP is full duplex. Without the third handshake, the server cannot confirm whether the client is ready, and it does not know when it may start sending data packets to the client. With three handshakes, both sides have confirmed that the other is ready.

Assume the network is unreliable.

A initiates a connection. If it gets no feedback after sending the request, there are many possible reasons: the request packet was lost, it timed out, or B simply did not respond.

Since A cannot confirm the result, it sends the request again. Even when a request packet does reach B, A does not know that it arrived, so it may keep retrying.

Once B receives the request, it knows that A exists and wants to establish a connection, so B sends an ACK to A, telling A that the request packet has been received.

For B, this response is itself a network transmission: how does B know it reached A? So B cannot simply assume the connection is established; it has to wait until A sends one more acknowledgment to confirm.

1.3.3 TCP four-way wave

As shown in Figure 1-9, the disconnection of the TCP connection will be completed by the so-called four waves.

"Four waves" means that when a TCP connection is torn down, the client and server need to exchange four packets in total to confirm the disconnection. Either the client or the server can initiate the wave (because TCP is a full-duplex protocol); in socket programming, a close() operation by either side triggers it.


<center>Figure 1-9</center>

The above interaction process is as follows:

  • When disconnecting, client A says "I want to disconnect" and enters the FIN_WAIT_1 state.
  • When server B receives the "I want to disconnect" message, it replies "I know" to client A and enters the CLOSE_WAIT state.
  • When A receives B's "I know", it enters the FIN_WAIT_2 state. If server B crashed at this point, A would stay in this state; the TCP protocol itself does not handle this case, but Linux does: the tcp_fin_timeout parameter sets a timeout for it.
  • If server B is working normally, then when B's own "I want to close the connection" request reaches A, A sends an ACK saying "I know B wants to close the connection" and leaves the FIN_WAIT_2 state.
  • Ordinarily A could leave at this point, but what if B never receives that final ACK? B would then resend its "I want to close the connection". If A had already gone away, B would never receive an ACK. The TCP protocol therefore requires A to wait in TIME_WAIT at the end, long enough that if B retransmits its FIN, A can retransmit the ACK and the ACK has enough time to reach B.

This wait lasts 2MSL. MSL (Maximum Segment Lifetime) is the longest time any segment can survive on the network; after that it is discarded. Once the wait is over, A enters the CLOSED state. The protocol suggests an MSL of 2 minutes; in practice 30 seconds, 1 minute and 2 minutes are all commonly used.

First wave (FIN=1, seq=x)

Suppose the client wants to close the connection, and the client sends a packet with the FIN flag set to 1, indicating that it has no data to send, but it can still receive data. After sending, the client enters the FIN_WAIT_1 state.

Second wave (ACK=1, ACKnum=x+1)

The server confirms the client's FIN packet and sends a confirmation packet to indicate that it has accepted the client's request to close the connection, but is not yet ready to close the connection. After sending, the server enters the CLOSE_WAIT state. After the client receives this confirmation packet, it enters the FIN_WAIT_2 state and waits for the server to close the connection.

Third wave (FIN=1, seq=w)

When the server is ready to close the connection, it sends a request to end the connection to the client, and FIN is set to 1. After sending, the server enters the LAST_ACK state and waits for the last ACK from the client.

Fourth wave (ACK=1, ACKnum=w+1)

The client receives the server's close request, sends an acknowledgment packet, and enters the TIME_WAIT state, in case this ACK is lost and needs to be retransmitted. After the server receives the acknowledgment, it closes the connection and enters the CLOSED state.
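The first wave ("I have no more data to send, but I can still receive") corresponds to a half-close. In Java this can be expressed with shutdownOutput(), which causes a FIN to be sent while the read direction stays open. A minimal sketch, again using the placeholder address from this article:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.Socket;

public class HalfCloseExample {
    public static void main(String[] args) throws Exception {
        try (Socket socket = new Socket("192.168.221.128", 8888)) {
            socket.getOutputStream().write("bye\n".getBytes());
            socket.getOutputStream().flush();

            // First "wave": send a FIN, meaning "I will not send any more data".
            // The read direction of the socket stays open.
            socket.shutdownOutput();

            BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()));
            String line;
            while ((line = in.readLine()) != null) { // keeps receiving until the peer closes its side
                System.out.println("received: " + line);
            }
            // close() releases the socket; the kernel completes the remaining waves.
        }
    }
}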

[Question 1] Why is there a three-way handshake when connecting, but a four-way wave when closing?

Answer: During connection setup, when the server receives the client's SYN request, it can send the SYN and ACK in a single segment: the ACK acknowledges the client's SYN, and the SYN synchronizes the server's own sequence number. When closing, however, receiving a FIN only means the peer has no more data to send; the server may still have data of its own that has not been processed or sent, so it can only reply with an ACK first and send its own FIN later, once all of its data has gone out. The ACK and FIN therefore cannot be combined, which is why four steps are needed.

[Question 2] Why does the TIME_WAIT state need to pass 2MSL (maximum segment survival time) to return to the CLOSE state?

Answer: Although in theory all four packets have been sent and we could enter the CLOSED state directly, we must assume the network is unreliable and the last ACK may be lost. The TIME_WAIT state exists precisely so that a possibly lost ACK can be retransmitted.

1.3.4 TCP protocol message transmission

After the connection is established, data packet transmission begins. As a reliable communication protocol, how does TCP ensure the reliability of message transmission?

TCP uses acknowledgments to ensure that data is transmitted reliably: after the client sends a data packet to the server, the server returns an acknowledgment to the client; if the client does not receive the acknowledgment, it resends the packet.

To guarantee ordering, each packet has a sequence number. The initial sequence number is negotiated when the connection is established, and packets are then sent one after another in sequence. To guarantee that no packets are lost, every packet sent must be acknowledged, but acknowledgments are not sent one per packet; instead, acknowledging a given sequence number means that everything before it has been received. This mode is called cumulative acknowledgment.

As shown in Figure 1-10, in order to keep track of all packets sent and received, TCP maintains a send buffer at the sender and a receive buffer at the receiver. TCP's full-duplex operation and its sliding window both depend on these two independent buffers and on how full they are.

The receive buffer caches data in the kernel. If the application has not yet called the Socket's read method, the data stays in the receive buffer; whether or not the process reads from the Socket, data sent by the peer is received by the kernel and buffered in the Socket's kernel receive buffer.

What read does is copy data from the kernel receive buffer into the application's user-space buffer. When the process calls the Socket's send, the data is usually copied from the application's user-space buffer into the Socket's kernel send buffer, and then send returns. In other words, when send returns, the data has not necessarily been delivered to the peer.


<center>Figure 1-10</center>
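From Java, the kernel send/receive buffers described above can be inspected and tuned, although the requested sizes are only hints to the kernel. The sketch below (placeholder address again) also illustrates the point made earlier: write() returning only means the bytes were copied into the kernel send buffer, not that the peer has received them.

import java.net.InetSocketAddress;
import java.net.Socket;

public class BufferExample {
    public static void main(String[] args) throws Exception {
        try (Socket socket = new Socket()) {
            // These are only hints; the OS may round or cap the values.
            socket.setSendBufferSize(64 * 1024);
            socket.setReceiveBufferSize(64 * 1024);
            socket.connect(new InetSocketAddress("192.168.221.128", 8888), 3000);

            System.out.println("send buffer:    " + socket.getSendBufferSize());
            System.out.println("receive buffer: " + socket.getReceiveBufferSize());

            // Returning from write() only means the bytes were copied into the kernel send buffer;
            // TCP transmits (and retransmits if necessary) in the background.
            socket.getOutputStream().write("hello".getBytes());
        }
    }
}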

The sender's/receiver's buffer is ordered by packet sequence number and divided into four parts according to processing state:

  • The first part: Sent and confirmed.
  • The second part: Sent and not yet confirmed. You need to wait for confirmation before you can remove it.
  • The third part: not sent, but already waiting to be sent.
  • The fourth part: it has not been sent, and will not be sent yet.

The reason for distinguishing the third and fourth parts is that TCP applies flow control: a sliding window is used to shape the traffic and avoid congestion.


<center>Figure 1-11</center>

To better understand the communication process of the data packets, you can experiment with the animation at the following URL:

https://media.pearsoncmg.com/aw/ecs_kurose_compnetwork_7/cw/content/interactiveanimations/selective-repeat-protocol/index.html

1.3.5 Sliding Window Protocol

The animation at the address above demonstrates the packet sending and acknowledgment mechanism, and it also involves the sliding window protocol.

The sliding window is a flow-control technique. In early network communication, both parties would send data without considering network congestion; since nobody knew the congestion state and everyone sent at the same time, intermediate nodes were overwhelmed and dropped packets, and nobody could get data through. The sliding window mechanism was introduced to solve this problem: both the sender and the receiver maintain a sequence of data frames, and this sequence is called a window.

send window

It is the sequence number table of the frames that the sender is allowed to send continuously.

The maximum number of frames that the sender can send continuously without waiting for a response is called the size of the sending window.

Receive window

The sequence number table of the frames that the receiver is allowed to receive. All frames that fall within the receive window must be processed by the receiver, and frames that fall outside the receive window are discarded.

The number of frames that the receiver allows to receive at a time is called the size of the receiving window.
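The following toy sketch simulates only the bookkeeping of a send window, with no real I/O: sequence numbers below base are acknowledged, those between base and nextSeq are in flight, and new frames may be sent only while the window is not full. The class and field names are invented purely for illustration.

public class SendWindowDemo {
    static final int WINDOW_SIZE = 4; // max frames in flight without an ack

    int base = 0;     // oldest unacknowledged sequence number
    int nextSeq = 0;  // next sequence number to send

    boolean canSend() {
        return nextSeq < base + WINDOW_SIZE;
    }

    void send() {
        System.out.println("send seq=" + nextSeq);
        nextSeq++;
    }

    void onAck(int ackUpTo) {
        // Cumulative acknowledgment: everything up to and including ackUpTo is confirmed,
        // so the window slides forward.
        base = Math.max(base, ackUpTo + 1);
        System.out.println("ack " + ackUpTo + " -> window now [" + base + ", " + (base + WINDOW_SIZE) + ")");
    }

    public static void main(String[] args) {
        SendWindowDemo w = new SendWindowDemo();
        while (w.canSend()) {
            w.send();          // fills the window: seq 0..3
        }
        w.onAck(1);            // frames 0 and 1 confirmed, the window slides by 2
        while (w.canSend()) {
            w.send();          // seq 4 and 5 can now be sent
        }
    }
}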

1.4 Understand the nature of blocking communication

Having understood how TCP communication works, let's see how Java uses sockets to implement network communication. The following code demonstrates Socket communication.

public class ServerSocketExample {

    public static void main(String[] args) throws IOException {
        final int DEFAULT_PORT = 8888;
        ServerSocket serverSocket = new ServerSocket(DEFAULT_PORT);
        System.out.println("启动服务,监听端口:" + DEFAULT_PORT);
        while (true) {
            Socket socket = serverSocket.accept(); // blocks until a client connects
            System.out.println("客户端:" + socket.getPort() + "已连接");
            new Thread(new Runnable() {
                Socket socket;
                public Runnable setSocket(Socket s) {
                    this.socket = s;
                    return this;
                }
                @Override
                public void run() {
                    try {
                        BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(socket.getInputStream()));
                        // read one line of the client's message (blocks until data arrives)
                        String clientStr = bufferedReader.readLine();
                        System.out.println("客户端发了一段消息:" + clientStr);
                        BufferedWriter bufferedWriter = new BufferedWriter(new OutputStreamWriter(socket.getOutputStream()));
                        bufferedWriter.write("我已经收到你的消息了");
                        bufferedWriter.flush(); // flush the buffer to trigger sending
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
            }.setSocket(socket)).start();
        }
    }
}

As discussed in detail in our Redis topic, the communication above is the BIO model, a blocking communication model. The main blocking points are:

  • accept(): blocks while waiting for a client connection
  • I/O: blocks while waiting for the client to transmit data (illustrated by the client sketch below).
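For completeness, here is a matching client sketch for the server above (the address 192.168.221.128:8888 is the test machine used later in this chapter; adjust as needed). It sends one line and then blocks on read until the server's reply arrives, which is exactly the second kind of blocking listed above.

import java.io.BufferedWriter;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.net.Socket;

public class SocketClientExample {
    public static void main(String[] args) throws Exception {
        try (Socket socket = new Socket("192.168.221.128", 8888)) {
            BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(socket.getOutputStream()));
            writer.write("hello server");
            writer.newLine(); // the server reads a full line, so terminate it with a newline
            writer.flush();

            InputStreamReader reader = new InputStreamReader(socket.getInputStream());
            char[] buf = new char[1024];
            int n = reader.read(buf); // blocks here until the server's reply arrives
            if (n > 0) {
                System.out.println("server replied: " + new String(buf, 0, n));
            }
        }
    }
}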

Like me, you have probably wondered how this blocking and waking up actually works under the hood. Let's take a brief look below.

1.4.1 The nature of blocking operations

Blocking means a process waits for an event to occur before it can continue; it is handled by operating-system-level scheduling. Below we use the following steps to trace how many threads the Java program creates and which system calls each thread makes to the kernel.

strace, a command in the Linux operating system

  1. Copy ServerSocketExample.java, remove the package declaration, and copy it to the /data/app directory of the Linux server.
  2. Use javac ServerSocketExample.java to compile and get the .class file
  3. Use the following command to trace it (open a new terminal window)

    According to the description of strace's official website, strace is a Linux user space tracer that can be used for diagnosis, debugging and teaching. We use it to monitor the interaction between user space processes and the kernel, such as system calls, signal transmission, process state changes, etc.
    strace -ff -o out java ServerSocketExample
  • -ff: also trace the child processes (threads) created by the target process, writing each one's trace separately
  • -o: write strace's output to the specified file; combined with -ff, each process/thread gets its own file (out.PID)
  4. After the above command runs, you will find many out.* files in the /data/app directory; each file corresponds to one thread, because the JVM itself is multi-threaded.

    [root@localhost app]# ll
    total 748
    -rw-r--r--. 1 root root  14808 Aug 23 12:51 out.33320 // the smallest one is the main thread
    -rw-r--r--. 1 root root 186893 Aug 23 12:51 out.33321
    -rw-r--r--. 1 root root    961 Aug 23 12:51 out.33322
    -rw-r--r--. 1 root root    917 Aug 23 12:51 out.33323
    -rw-r--r--. 1 root root    833 Aug 23 12:51 out.33324
    -rw-r--r--. 1 root root    819 Aug 23 12:51 out.33325
    -rw-r--r--. 1 root root  23627 Aug 23 12:53 out.33326
    -rw-r--r--. 1 root root   1326 Aug 23 12:51 out.33327
    -rw-r--r--. 1 root root   1144 Aug 23 12:51 out.33328
    -rw-r--r--. 1 root root   1270 Aug 23 12:51 out.33329
    -rw-r--r--. 1 root root   8136 Aug 23 12:53 out.33330
    -rw-r--r--. 1 root root   8158 Aug 23 12:53 out.33331
    -rw-r--r--. 1 root root   6966 Aug 23 12:53 out.33332
    -rw-r--r--. 1 root root   1040 Aug 23 12:51 out.33333
    -rw-r--r--. 1 root root 445489 Aug 23 12:53 out.33334
  5. Open the file out.33321 (the one right after the main thread's), press shift+g to jump to the end of the file, and you will see the following content.

    The following lines are all system calls, i.e. invocations of kernel functions provided by the operating system to trigger the corresponding operations.
    # create the socket fd
    socket(AF_INET6, SOCK_STREAM, IPPROTO_IP) = 5 
    ....
    # bind port 8888
    bind(5, {sa_family=AF_INET6, sin6_port=htons(8888), inet_pton(AF_INET6, "::", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
    # listen for incoming connections on the socket; 5 is the sockfd, 50 is the backlog (the maximum length of the pending-connection queue)
    listen(5, 50)                           = 0
    mprotect(0x7f21d00df000, 4096, PROT_READ|PROT_WRITE) = 0
    write(1, "\345\220\257\345\212\250\346\234\215\345\212\241\357\274\214\347\233\221\345\220\254\347\253\257\345\217\243\357\274\23288"..., 34) = 34
    write(1, "\n", 1)                       = 1
    lseek(3, 58916778, SEEK_SET)            = 58916778
    read(3, "PK\3\4\n\0\0\10\0\0U\23\213O\336\274\205\24X8\0\0X8\0\0\25\0\0\0", 30) = 30
    lseek(3, 58916829, SEEK_SET)            = 58916829
    read(3, "\312\376\272\276\0\0\0004\1\367\n\0\6\1\37\t\0\237\1 \t\0\237\1!\t\0\237\1\"\t\0"..., 14424) = 14424
    # poll: put the current file descriptor (fd=5) on the wait queue; in short, the current thread blocks until an event wakes it up
    # events: the requested events, POLLIN (normal or priority data readable) and POLLERR (an error occurred)
    poll([{fd=5, events=POLLIN|POLLERR}], 1, -1

As this trace shows, the Socket accept method ultimately calls the system's poll function to block the thread.

Run man 2 poll on the Linux server:

man: help manual

2: Indicates the function related to the system call

DESCRIPTION
       poll()  performs  a  similar  task  to  select(2): it waits for one of a set of file
       descriptors to become ready to perform I/O.

poll is similar to select: it waits for an I/O-ready event on a set of file descriptors.

  6. Access the socket server with the following command.

    telnet 192.168.221.128 8888

At this point, tail -f out.33321 shows that the blocked poll() call has been woken up by a POLLIN event, indicating that a connection has arrived:

poll([{fd=5, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=5, revents=POLLIN}])
accept(5, {sa_family=AF_INET6, sin6_port=htons(53778), inet_pton(AF_INET6, "::ffff:192.168.221.1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 6

1.4.2 How a blocked thread is woken up

As shown in Figure 1-12, a network packet travels over the cable to the target server's network card, is transferred along the hardware path shown in step 2, and is finally written to an address in memory. The network card then notifies the CPU through an interrupt signal that data has arrived, so the operating system knows that a new packet has come in and the CPU starts executing the interrupt handler. The interrupt handler's main logic is:

  • First write the data received by the network card into the corresponding Socket receiving buffer
  • Then wake up the thread that is blocked on the poll() method


<center>Figure 1-12</center>

1.4.3 Analysis of the overall principle of blocking

To support multitasking, the operating system implements process scheduling. A running process is one that currently holds the CPU; when a process (or thread) blocks on some operation, it gives up the CPU so that the operating system can continue running other tasks.

When multiple processes are running and waiting for CPU scheduling, these processes will be saved to a runnable queue, as shown in Figure 1-13.


<center>Figure 1-13</center>
When process A executes the statement that creates a Socket, the Linux kernel creates a Socket object managed by the file system. This object contains a send buffer, a receive buffer, a wait queue and so on. The wait queue is a very important structure: it points to all the processes that are waiting for events on this Socket, as shown in Figure 1-14.

When process A calls poll() and blocks, the operating system moves process A from the work queue to the Socket's wait queue (it stores a pointer to process A there so that it can be woken up later). At this point A is blocked, and the CPU continues executing the next process.


<center>Figure 1-14</center>

When the Socket receives data, the processes waiting on that Socket fd are woken up. As shown in Figure 1-15, the computer receives the client's data through the network card, the network card writes the data into memory, and then notifies the CPU through an interrupt signal that data has arrived, so the CPU starts executing the interrupt handler.

An interrupt means that the operating system needs to step in and do management work. Because this work (process switching, I/O device allocation and so on) requires privileged instructions, the CPU has to switch from user mode to kernel mode; the interrupt performs exactly that switch and gives the operating system control of the machine. It is interrupts that make concurrent execution of multiple programs possible.

The interrupt handler here does two main things: first it writes the network data into the corresponding Socket's receive buffer (step ④), then it wakes up process A (step ⑤) and puts A back on the work queue.


<center>Figure 1-15</center>

1.5 The essence of the select/poll model in Linux

Section 1.4 was really about the blocking model, in which a call such as recv() or accept() can only wait on a single Socket. In real applications, monitoring a single Socket per thread clearly limits the number of client connections, so we need a way to monitor multiple Sockets at the same time. select/poll were created in exactly this context; the poll call already appeared in the earlier trace, since the poll model is used by default there.

Let's first look at the select model. From the previous analysis we know that a blocking call can only monitor a single socket; when the number of client connections is large, throughput becomes very low. So the question is: can we monitor multiple sockets at the same time, and wake the process as soon as any one of them has an I/O-ready event?

As shown in Figure 1-16, suppose the program listens on the two connections socket1 and socket2 at the same time. When the application calls select, the operating system adds process A to the wait queue of each socket. When any socket receives data, the interrupt handler wakes the process.

When process A is awakened, it knows that at least one Socket has received data. The program only needs to traverse the Socket list once to get the ready Socket.


<center>Figure 1-16</center>

The select model has two problems:

  • Every call to select has to add the process to the wait queue of every monitored socket, and every wake-up has to remove it again. That is two traversals per call, which carries a certain performance cost.
  • After the process is awakened, it does not know which sockets have received the data, so it is necessary to traverse all the sockets once to get the list of ready sockets.

Because of the performance impact of these two issues, select by default allows only 1024 sockets to be monitored. The number of monitorable file descriptors can be raised, but that makes things even less efficient. The poll model is basically the same as select; the biggest difference is that poll has no hard limit on the number of file descriptors.

1.6 epoll model in Linux

Is there a more efficient method that can reduce the traversal and achieve the purpose of monitoring multiple fd at the same time? The epoll model can solve this problem.

epoll is short for event poll. Its biggest difference from select is that epoll tells the application which I/O events occurred on which sockets, so epoll is truly event-driven. The principle is shown in Figure 1-17.

epoll provides three system calls: epoll_create, epoll_ctl and epoll_wait. The execution flow is as follows:

  • First, epoll_create creates an eventpoll object in the kernel. This object maintains a collection of epitems organized as a red-black tree; you can simply think of it as the set of monitored fds.
  • Then, epoll_ctl wraps the given fd into an epitem, adds it to the eventpoll object, and registers a callback for this epitem with the kernel. When the fd receives a network I/O event, its epitem is appended to the ready list rdlist (a doubly linked list) inside eventpoll, and the blocked process A is woken up.
  • Process A then calls epoll_wait and reads the epitems directly from the ready list rdlist in epoll; if rdlist is empty, it blocks (or waits until a timeout).

From the way epoll works, it is clear that thanks to rdlist, process A knows exactly which sockets (fds) have I/O events when it wakes up, so all ready socket connections can be obtained without traversing every socket.


<center>Figure 1-17</center>
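epoll is a Linux-specific API; from Java it is normally reached through NIO's Selector, whose default implementation on Linux is epoll-based. The sketch below is a minimal echo server on port 8888 that maps roughly onto the three calls above: Selector.open() ≈ epoll_create, register() ≈ epoll_ctl, select() ≈ epoll_wait.

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

public class EpollStyleServer {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();                   // roughly epoll_create
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(8888));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);     // roughly epoll_ctl(ADD)

        while (true) {
            selector.select();                                 // roughly epoll_wait: blocks until some fd is ready
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {                             // only the ready channels, no full traversal
                SelectionKey key = it.next();
                it.remove();
                if (key.isAcceptable()) {
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    SocketChannel client = (SocketChannel) key.channel();
                    ByteBuffer buf = ByteBuffer.allocate(1024);
                    int n = client.read(buf);
                    if (n == -1) {
                        client.close();                        // peer closed the connection
                    } else {
                        buf.flip();
                        client.write(buf);                     // echo back what was received
                    }
                }
            }
        }
    }
}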

Copyright statement: All articles in this blog, except for special statements, adopt the CC BY-NC-SA 4.0 license agreement. Please indicate the reprint from Mic takes you to learn architecture!