This article was shared by its author, Kobayashi Coding, from the public account "Kobayashi Coding", and has been revised and edited.

1. Introduction

When it comes to the TCP protocol, developers working on instant messaging (IM) applications know it all too well. As our understanding of TCP deepens, it is time to go back and make up for the many TCP concepts and questions that we ran into but never had time to explore in depth.

In this article, we will discuss an interesting TCP technical problem in depth from the system level: after unplugging the network cable and plugging it in again, is the original TCP connection still there? Or is it still "good"?

Some people may say: unplugging the network cable disconnects the physical layer (for the layered model of network protocols, please refer to "Quick Understanding of Network Communication Protocols (Part 1)"), so the transport layer above the physical layer must be disconnected as well, and the original TCP connection no longer exists. It is like a landline phone call: if one party's phone line is unplugged, the call is cut off completely.

Is that really the answer? It may not be what you expect, so let's follow the author and look at it in depth.

(This article was simultaneously published at: http://www.52im.net/thread-3846-1-1.html )

2. Series of articles

This article is the 14th in a series of articles, the outline of which is as follows:

"Unknown Network Programming (1): Analysis of Intractable Diseases in the TCP Protocol (Part 1)"
"Unknown Network Programming (2): Analysis of Intractable Diseases in the TCP Protocol (Part 2)"
"Unknown Network Programming (3): Why TIME_WAIT, CLOSE_WAIT When Closing a TCP Connection"
"Unknown Network Programming (4): In-depth study and analysis of abnormal shutdown of TCP"
"Unknown Network Programming (5): UDP Connectivity and Load Balancing"
"Unknown Network Programming (6): In-depth understanding of the UDP protocol and making good use of it"
"Unknown Network Programming (7): How to Make Unreliable UDP Reliable?"
"Unknown Network Programming (8): Deep Decryption of HTTP from the Data Transport Layer"
"Unknown Network Programming (9): Combining theory with practice, comprehensive and in-depth understanding of DNS"
"Unknown Network Programming (10): Deepening the Operating System and Understanding the Receiving Process of Network Packets from the Kernel (Linux)"
"Unknown Network Programming (11): Starting from the bottom, in-depth analysis of the secret of TCP connection time-consuming"
"Unknown Network Programming (12): Thoroughly Understand the KeepAlive Mechanism of the TCP Protocol Layer"
"Unknown Network Programming (13): Go deep into the operating system and thoroughly understand 127.0.0.1 local network communication"
"Unknown Network Programming (14): Unplug the network cable and plug it in again, is the TCP connection still there? Understand it in one sentence!" (* This article)

3. A more general answer

3.1 Answers
In the introduction, we said: Some people think that if the network cable is unplugged, it means that the physical layer is disconnected, then the transport layer above the physical layer will definitely be disconnected, so the original TCP connection naturally does not exist. (PS: For a detailed explanation of computer network layering, please refer to "Detailed Explanation of the Most Popular Computer Network Layering in History")

The above logic is flawed.

The problem is: it assumes that unplugging the network cable affects the transport layer, when in fact it does not!

In fact: in the Linux kernel, a TCP connection is represented by a structure called struct socket, whose contents include information such as the state of the TCP connection.

So: when the network cable is unplugged, the operating system does not change anything in that structure, and the state of the TCP connection stays the same.

3.2 Experiment to verify
I did a small experiment: I connected to my cloud server from an ssh terminal, then simulated unplugging the network cable by disconnecting the wifi. When I checked afterwards, the state of the TCP connection had not changed: it was still ESTABLISHED.

The above experimental results can verify my conclusion: the action of unplugging the network cable does not affect the state of the TCP connection.
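On Linux, one way to check the state yourself is to read /proc/net/tcp, where each row lists a connection's local address, remote address and a hexadecimal state code (01 means ESTABLISHED). The Python sketch below parses that format. It assumes a little-endian machine and IPv4 only, the helper names are made up for this example, and the sample row is invented for illustration rather than taken from the experiment above.

```python
import socket
import struct

# State codes used in /proc/net/tcp (hexadecimal, as printed by the kernel).
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}

# Invented sample in the /proc/net/tcp layout (header line + one row).
SAMPLE = (
    "  sl  local_address rem_address   st tx_queue rx_queue\n"
    "   0: 0100007F:0016 0200007F:D431 01 00000000:00000000\n"
)


def _decode(addr: str) -> str:
    """Turn 'HEXIP:HEXPORT' into 'a.b.c.d:port' (assumes little-endian host)."""
    ip_hex, port_hex = addr.split(":")
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    return f"{ip}:{int(port_hex, 16)}"


def parse_proc_net_tcp(text: str):
    """Return (local, remote, state) tuples for every row after the header."""
    rows = []
    for line in text.splitlines()[1:]:
        fields = line.split()
        if len(fields) < 4:
            continue
        state = TCP_STATES.get(fields[3].upper(), fields[3])
        rows.append((_decode(fields[1]), _decode(fields[2]), state))
    return rows


if __name__ == "__main__":
    for row in parse_proc_net_tcp(SAMPLE):
        print(row)
```

On a real Linux machine you could pass the contents of /proc/net/tcp instead of SAMPLE and look for the row whose remote address is your server.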

However, this answer is still a bit general. In fact, we should look at this question in a more specific scenario, and the answer will be more accurate.

The two specific scenarios are:

1) After the network cable is unplugged, there is data transmission;
2) After the network cable is unplugged, there is no data transmission.

Let me analyze these two scenarios in more detail. Read on.

4. Specific scenario 1: After unplugging the network cable, when there is data transmission

4.1 During the data transmission process, the network cable was just plugged back in
If the client's network cable is unplugged, the data packets that the server sends to the client get no response. After waiting for a certain period of time, the server triggers TCP's timeout retransmission mechanism (see: "TCP/IP Detailed Explanation - Chapter 21. TCP Timeout and Retransmission") and retransmits the data packets that were not acknowledged.

If the client happens to plug the network cable back in while the server is still retransmitting, then, because unplugging the cable did not change the client's TCP connection state (it is still ESTABLISHED), the client can receive the server's data packets normally and reply with an ACK.

At this point: the TCP connection between the client and the server still exists and keeps working as before; to the application layer it feels as if nothing happened.

4.2 During data transmission, the network cable has not been plugged back in
Continuing the case above: if the client has still not plugged the network cable back in by the time the server's TCP retransmissions reach a certain threshold, the server's kernel decides that something is wrong with this TCP connection. It then reports the problem to the application through the Socket interface, and the server's TCP connection is torn down.

Later, when the client plugs the network cable back in and sends data to the server, the server no longer has any TCP connection information matching this client, so the server's kernel replies with an RST packet. On receiving it, the client releases its TCP connection as well.

At this point: the TCP connection between the client and the server has been clearly disconnected, and the original connection does not exist.

4.3 Getting to the bottom of things: How many times are TCP data packets retransmitted?
In the spirit of understanding not just what happens but why, let's get to the bottom of it: how many times are TCP data packets retransmitted?

Linux provides a configuration item called tcp_retries2, whose default value is 15.

This kernel parameter controls the maximum number of timeout retransmissions once a TCP connection has been established.

However, setting tcp_retries2 to 15 does not mean that the kernel notifies the application and terminates the TCP connection only after exactly 15 timeout retransmissions. The kernel also makes the decision based on a "maximum timeout".

The timeout doubles with each round: for example, the first timeout retransmission is triggered after 2s, the second after 4s, the third after 8s, and so on.

The kernel calculates a maximum timeout from the value of tcp_retries2.

While a packet is being retransmitted without any response from the peer, as soon as either condition is met ("maximum number of retransmissions" or "maximum timeout"), the kernel stops retransmitting and tears down the TCP connection.
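To get a feel for the "maximum timeout" derived from tcp_retries2, the following Python sketch adds up the doubling waits. It assumes a starting timeout of 200 ms and a cap of 120 s per wait, taken here to be the Linux kernel's lower and upper RTO bounds; these numbers are not stated in this article and differ from the 2s/4s/8s illustration above. The function name hypothetical_timeout is made up for this example.

```python
def hypothetical_timeout(retries: int,
                         rto_min: float = 0.2,
                         rto_max: float = 120.0) -> float:
    """Total wait, in seconds, for the first send plus `retries` retransmissions.

    Assumption: every wait is double the previous one, starting at rto_min
    and never exceeding rto_max.
    """
    total = 0.0
    rto = rto_min
    for _ in range(retries + 1):  # one wait after the original send, one per retry
        total += rto
        rto = min(rto * 2, rto_max)
    return total


if __name__ == "__main__":
    print(f"tcp_retries2=15 -> about {hypothetical_timeout(15):.1f} s")
```

Under these assumptions, tcp_retries2 = 15 works out to roughly 924.6 seconds, a little over 15 minutes. A real connection's waits depend on its measured round-trip time, so the actual figure can be different.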

PS: For details of the TCP timeout retransmission mechanism, you can read "Analysis of Difficulties in the TCP Protocol (Part 2)".

5. Specific scenario 2: After unplugging the network cable, when there is no data transmission

5.1 Scenario Analysis
For the scenario where there is no data transmission after the network cable is unplugged, it is necessary to check whether the TCP KeepAlive mechanism is enabled (see "Understanding the KeepAlive Mechanism of the TCP Protocol Layer" for details).

1) If the TCP KeepAlive mechanism is not enabled:

After the client unplugs the network cable, and neither side is transmitting data, the TCP connection between the client and the server will always exist.

2) If the TCP KeepAlive mechanism is enabled:

After the client unplugs the network cable, TCP will send a KeepAlive detection packet after a period of time, even if neither side is transmitting data.

According to the response to the KeepAlive detection packet, there are two possibilities:

1) If the peer is working normally: the peer receives the probe packet and responds as usual, the TCP keep-alive timer is reset, and TCP waits for the next keep-alive time to arrive;
2) If the peer host has crashed or the peer is unreachable for some other reason: the probe packet gets no response at all. After this happens several times in a row and the configured number of keep-alive probes is reached, TCP reports that the connection is dead.

Therefore, when the two sides are not exchanging data, the TCP keep-alive mechanism can use its probe packets to determine whether the other side's TCP connection is still alive.

5.2 Getting to the bottom of things: What exactly is the TCP KeepAlive mechanism?
The principle of the TCP KeepAlive mechanism is as follows:

Define a time period. If there is no connection-related activity during this period, the TCP keep-alive mechanism kicks in and sends a probe packet at a fixed interval. Each probe packet carries very little data. If several probe packets in a row get no response, the current TCP connection is considered dead, and the kernel reports the error to the upper-layer application.

There are corresponding parameters in the Linux kernel to set the keep-alive time, the number of keep-alive detections, and the time interval of keep-alive detections.

The following are the default values in Linux:

net.ipv4.tcp_keepalive_time=7200
net.ipv4.tcp_keepalive_intvl=75
net.ipv4.tcp_keepalive_probes=9

Explanation:

1) tcp_keepalive_time=7200: the keep-alive time is 7200 seconds (2 hours), that is, if there is no connection-related activity for 2 hours, the keep-alive mechanism is activated;
2) tcp_keepalive_intvl=75: the interval between probes is 75 seconds;
3) tcp_keepalive_probes=9: if 9 probes in a row get no response, the other party is considered unreachable and the connection is terminated.

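That is to say, in a Linux system, it takes at least 2 hours, 11 minutes and 15 seconds to find a "dead" connection.

The calculation formula is: tcp_keepalive_time + tcp_keepalive_intvl × tcp_keepalive_probes = 7200 + 75 × 9 = 7875 seconds, i.e. 2 hours, 11 minutes and 15 seconds.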

Note: If the application wants to use the TCP keep-alive mechanism, it needs to set the SO_KEEPALIVE option through the socket interface to take effect. If it is not set, then the TCP keep-alive mechanism cannot be used.
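As a minimal sketch of what setting SO_KEEPALIVE looks like, the Python snippet below enables it on a socket and, where the platform exposes them, also overrides the three timers per socket through TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT (these are Linux socket options that this article does not mention). The helper name enable_keepalive and the 60/10/3 values are made up for illustration.

```python
import socket


def enable_keepalive(sock: socket.socket,
                     idle: int = 7200,
                     interval: int = 75,
                     probes: int = 9) -> int:
    """Turn on TCP keep-alive and return the detection time in seconds."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket overrides of the system-wide defaults; skipped on platforms
    # where Python's socket module does not define the option.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", probes)):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    # Same formula as above: idle time plus one interval per unanswered probe.
    return idle + interval * probes


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print("dead peer detected after about", enable_keepalive(s, 60, 10, 3), "s")
    s.close()
```

With the hypothetical values 60/10/3, a dead peer would be noticed after about 90 seconds instead of more than two hours.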

PS: For details on the KeepAlive mechanism of the TCP protocol, see "Comprehend the KeepAlive Mechanism of the TCP Protocol Layer" and "Understanding the Network Heartbeat Packet Mechanism in Instant Messaging Applications: Function, Principle, Implementation Ideas, etc.".

5.3 Getting to the bottom of it: Is the detection time of the TCP KeepAlive mechanism too long?
Yes, it is indeed a bit long.

The TCP KeepAlive mechanism is implemented by the TCP layer (in kernel space), and it serves as a fallback for all programs built on the TCP transport protocol.

In fact: we usually implement a detection mechanism at the application layer, which can detect whether the other party is alive in a short period of time.

For example, a typical Web server provides a keepalive_timeout parameter that specifies the timeout for HTTP long connections. If this timeout is set to 60 seconds, the Web server software starts a timer; if the client does not send a new request within 60 seconds of completing its last HTTP request, the timer expires and triggers a callback that releases the connection.

Another example: IM and message push systems use an application-layer heartbeat (sent by the client, with the server replying with a response packet) to flexibly monitor and control the health of long connections.
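The server side of such a heartbeat can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not how any particular IM system does it: the class name HeartbeatMonitor, the "pong" reply and the 30-second timeout are all made up, and the clock is injectable so the logic can be exercised without waiting.

```python
import time


class HeartbeatMonitor:
    """Track the last heartbeat per client and report clients that went silent."""

    def __init__(self, timeout: float, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.last_seen = {}

    def on_heartbeat(self, client_id: str) -> str:
        """Record a heartbeat from the client and return the response packet."""
        self.last_seen[client_id] = self.clock()
        return "pong"

    def expired(self) -> list:
        """Return (and forget) clients whose last heartbeat is older than timeout."""
        now = self.clock()
        dead = [cid for cid, seen in self.last_seen.items()
                if now - seen > self.timeout]
        for cid in dead:
            del self.last_seen[cid]
        return dead


if __name__ == "__main__":
    monitor = HeartbeatMonitor(timeout=30)
    print(monitor.on_heartbeat("client-1"))
    print(monitor.expired())
```

A server would call expired() periodically and close the connections it returns, which lets it notice a silent peer within seconds rather than hours.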

"Why does the mobile IM based on the TCP protocol still need the heartbeat keep-alive mechanism?" explains why application-layer heartbeat keep-alive is necessary in applications such as IM; read it if you are interested.

If you are not sure how application-layer heartbeats are used in practice, take a look at these two articles about WeChat:

"WeChat team original sharing: Android version WeChat background keep-alive actual combat sharing (network keep-alive)"
"Mobile IM Practice: Realizing the Intelligent Heartbeat Mechanism of Android WeChat"

Here are a few heartbeat implementations for IM-like applications, with code, that you can read and learn from:

"Correctly understand the heartbeat and reconnection mechanism of IM long connection, and implement it by hand (with complete IM source code)"
"Discussion on the design and implementation of an Android-side IM intelligent heartbeat algorithm (including sample code)"
"Is it so difficult to develop IM by yourself? Teach you to create a simple Android version of IM by yourself (with source code)"
"Teach you to use Netty to implement the heartbeat mechanism and disconnection reconnection mechanism of network communication programs"

6. Summary of this article

The following is a brief summary of the article. The question posed at the beginning cannot be answered accurately and clearly in a single sentence; the answer depends on the situation.

That is: unplugging the network cable on the client does not directly affect the state of the TCP connection. Whether the TCP connection still exists afterwards depends on whether data is transmitted after the cable is unplugged.

1) In the case of data transmission:

After the client unplugs the network cable: if the server sends a data packet, and the client plugs the cable back in before the server's retransmissions reach the maximum, the original TCP connection between the two sides still exists and works normally, as if nothing happened.

After the client unplugs the network cable: if the server sends a data packet, and the server's retransmissions reach the maximum before the client plugs the cable back in, the server tears down the TCP connection. When the client later plugs the cable back in and sends data to the server, the server no longer has a TCP connection with the same four-tuple as the client, so it replies with an RST packet, and the client tears down its TCP connection on receiving it. At this point, the TCP connection is gone on both sides.

2) In the case of no data transfer:

a. If neither side has enabled the TCP keepalive mechanism, then after the client unplugs the network cable, the TCP connection between the client and the server will continue to exist for as long as the cable stays unplugged;
b. If both sides have enabled the TCP keepalive mechanism, then after the client unplugs the network cable, if the cable is not plugged back in, the keepalive mechanism detects that the other side's TCP connection is not alive and tears down the TCP connection. However, if the client plugs the cable back in during the keepalive detection period, the original TCP connection between the two sides continues to exist normally.

Besides the client unplugging the network cable, there are two other scenarios to consider: the client host goes down, and the client process is killed.

Scenario 1: The client host goes down. Just like unplugging the network cable, this cannot be sensed by the server. So if there is no data transmission and the TCP keepalive mechanism is not enabled, the server's TCP connection stays in the ESTABLISHED state until the server process is restarted.

So: when the TCP keep-alive mechanism is not used and the two sides are not transmitting data, the fact that one side's TCP connection is in the ESTABLISHED state does not mean that the other side's TCP connection is still healthy.

Scenario 2: The client process is killed. The client's kernel sends a FIN packet to the server and performs the four-way wave with the server (see "Learn TCP Three-way Handshake and Four Waves with Animation").

So: even if TCP KeepAlive is not enabled and the two sides are not exchanging data, when the process on one side crashes, that side's operating system notices, sends a FIN packet to the other side, and completes the TCP four-way wave with it.
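The effect of that FIN on the other side can be observed with a small loopback experiment in Python. In this sketch, calling close() on the client socket stands in for the kernel closing a killed process's sockets (an assumption made to keep the example in one process); the function name peer_close_is_visible is made up.

```python
import socket


def peer_close_is_visible() -> bytes:
    """Close one end of a loopback TCP connection and read from the other end."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()
    server_side.settimeout(5)

    # Closing the client socket makes its kernel send a FIN, much as it would
    # when the owning process is killed.
    client.close()

    # Once the FIN arrives, recv() returns an empty bytes object (end of
    # stream), even though keep-alive is off and no data was ever exchanged.
    data = server_side.recv(1024)

    server_side.close()
    listener.close()
    return data


if __name__ == "__main__":
    print(repr(peer_close_is_visible()))
```

An empty result from recv() is how the server application learns that the peer has closed the connection.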

7. References

[1] Detailed explanation of TCP/IP - Chapter 21. TCP timeout and retransmission
[2] Easy to understand - in-depth understanding of the TCP protocol (I): theoretical basis
[3] Introduction to Lazy Network Programming (3): A quick understanding of the TCP protocol is enough
[4] An Introduction to Brain Stupid Network Programming (1): Follow the animation to learn TCP three-way handshake and four-way wave
[5] Introduction to Brain Stupid Network Programming (7): Necessary for face-to-face, detailed explanation of the most popular computer network layers in history
[6] The sharing of technical master Chen Shuo: from the shallow to the deep, the summary of network programming learning experience
[7] Getting started with network programming has never been easier (2): If you were to design the TCP protocol, what would you do?
[8] Unknown Network Programming (10): Go deep into the operating system and understand the receiving process of network packets from the kernel (Linux)
[9] Why does the TCP-based mobile IM still need the heartbeat keep-alive mechanism?
[10] One article to understand the network heartbeat packet mechanism in instant messaging applications: role, principle, implementation ideas, etc.
[11] Web-side instant messaging practice dry goods: how to make your WebSocket disconnect and reconnect faster?



JackJiang