A note up front: about four years ago, I heard operations colleagues mention the problem of too many TCP connections in the TIME_WAIT state, but I did not look into it at the time. I have since run into the problem myself, so I took some time to study it in detail.

This article approaches the topic from these aspects:

  • Problem description: what is the phenomenon, and what is its impact?
  • Problem analysis
  • Solution
  • Underlying principles

Problem Description

When a high-concurrency scenario is simulated, batches of TCP connections in the TIME_WAIT state appear:

[Image]

After a short time, all the TIME_WAIT connections disappeared and were reclaimed, and the ports and services were all normal.

In other words, TIME_WAIT connections are a normal phenomenon in a high-concurrency scenario.

In production, under sustained high concurrency:

  • Some TIME_WAIT connections are reclaimed, but new ones keep being generated;
  • In some extreme cases, a large number of TIME_WAIT connections accumulate.

Question:

Does a large number of TCP connections in the TIME_WAIT state affect the business?

When Nginx is used as a reverse proxy, a large number of short connections may leave many TCP connections on the Nginx host in the TIME_WAIT state:

  • Each connection in TIME_WAIT occupies a "local port", and the upper limit is 65535 (16 bits, 2 bytes);
  • When a large number of connections are in TIME_WAIT, establishing a new TCP connection may fail with the error "address already in use: connect".

Counting TCP connections by state:

// Count the number of connections in each state
$ netstat -n | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}'
ESTABLISHED 1154
TIME_WAIT 1645

Tip: the largest TCP local port number is 65535 (about 65,000). The TCP header stores the port number in 16 bits, so the upper limit is 65535.
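To get a feel for what this limit means, the arithmetic can be sketched as follows. This is a rough upper bound, assuming every short connection goes to the same destination IP and port and that each closed connection holds its local port for the full TIME_WAIT duration. The function name and the sample numbers (ports below 1024 treated as reserved, TIME_WAIT durations of 240 and 60 seconds) are illustrative assumptions, not measurements from this article.

```python
def max_new_connections_per_second(usable_ports: int, time_wait_seconds: float) -> float:
    """Rough upper bound on the sustained rate of new short connections
    to ONE destination (IP, port) when each closed connection keeps its
    local port busy for time_wait_seconds."""
    if usable_ports <= 0 or time_wait_seconds <= 0:
        raise ValueError("usable_ports and time_wait_seconds must be positive")
    return usable_ports / time_wait_seconds


# Assumption: ports 1..1023 are reserved, leaving 1024..65535 for clients.
usable = 65535 - 1023

# Assumption: TIME_WAIT lasts 2 MSL = 240 seconds (MSL of 2 minutes).
rate_4min = max_new_connections_per_second(usable, 240)

# Assumption: a shorter TIME_WAIT of 60 seconds, for comparison.
rate_60s = max_new_connections_per_second(usable, 60)

print(usable, rate_4min, rate_60s)
```

With these assumed numbers, 64512 usable ports sustain at most 268.8 new connections per second at a 240-second TIME_WAIT, and 1075.2 per second at 60 seconds. Beyond that rate, new connections to the same destination start to fail.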

Problem Analysis

What is the root cause of a large number of TCP connections in the TIME_WAIT state?

  • A large number of short connections exist
  • In HTTP requests in particular, if the Connection header is set to close, it is basically the "server" that actively closes the connection
  • In TCP's four-way wave for closing a connection, the active closer stays in TIME_WAIT for 2 MSL (Maximum Segment Lifetime), so that the final ACK can be retransmitted and delayed segments are discarded

The TIME_WAIT state:

  • It is the state of the party that actively closed the TCP connection (on receiving the peer's FIN, it returns an ACK and enters TIME_WAIT)
  • It lasts for 2 MSL, that is, 4 minutes (with an MSL of 2 minutes)
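The sequence of states on each side of a close can be sketched as a small transition table. This is a simplified illustration using the RFC 793 state names; it leaves out simultaneous close (the CLOSING state), and the dictionary and function names are hypothetical.

```python
# Simplified state transitions for closing a TCP connection (RFC 793 names).
# Assumption: the simultaneous-close path (CLOSING) is left out.
ACTIVE_CLOSE = {
    ("ESTABLISHED", "send FIN"): "FIN_WAIT_1",
    ("FIN_WAIT_1", "recv ACK"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "recv FIN, send ACK"): "TIME_WAIT",
    ("TIME_WAIT", "2MSL timer expires"): "CLOSED",
}

PASSIVE_CLOSE = {
    ("ESTABLISHED", "recv FIN, send ACK"): "CLOSE_WAIT",
    ("CLOSE_WAIT", "send FIN"): "LAST_ACK",
    ("LAST_ACK", "recv ACK"): "CLOSED",
}


def walk(table, start="ESTABLISHED"):
    """Follow the single outgoing transition of each state until CLOSED."""
    path = [start]
    while path[-1] != "CLOSED":
        next_states = [nxt for (state, _event), nxt in table.items()
                       if state == path[-1]]
        path.append(next_states[0])
    return path


print(" -> ".join(walk(ACTIVE_CLOSE)))
print(" -> ".join(walk(PASSIVE_CLOSE)))
```

Only the active closer's path passes through TIME_WAIT; the passive closer goes from LAST_ACK straight to CLOSED.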

Solution

To keep a large number of TIME_WAIT connections from causing new connections to fail, the general solutions are:

  • Client side: set the Connection request header to keep-alive so that the connection stays open for a period of time; current browsers generally do this
  • Server side: allow sockets in the TIME_WAIT state to be reused
  • Server side: reduce the TIME_WAIT duration to 1 MSL (i.e., 2 minutes)
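The client-side option can be illustrated with a minimal sketch in which several requests share one TCP connection instead of opening and closing one per request. It uses Python's standard http.client against a throwaway local server. The handler, the helper name local_ports_used, and the loopback setup are assumptions made for the demonstration, not part of the original setup.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class Handler(BaseHTTPRequestHandler):
    # HTTP/1.1 lets http.server keep the connection open between requests.
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def local_ports_used(request_count):
    """Send request_count GETs over ONE connection and return the set of
    client-side local ports that were used."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection(
        "127.0.0.1", server.server_address[1], timeout=5)
    ports = set()
    try:
        for _ in range(request_count):
            conn.request("GET", "/", headers={"Connection": "keep-alive"})
            response = conn.getresponse()
            response.read()  # drain the body so the connection can be reused
            ports.add(conn.sock.getsockname()[1])
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return ports


print(len(local_ports_used(3)))
```

All three requests go out from the same local port, so only one connection is closed at the end instead of three.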

Conclusion: a few core points

The impact of the TIME_WAIT state:

  • In a TCP connection, the end that actively initiates the close enters the TIME_WAIT state
  • By default, the TIME_WAIT state lasts for 2 MSL (Maximum Segment Lifetime), generally 2 x 2 minutes
  • While a connection is in TIME_WAIT, the port it occupies cannot be used again
  • The number of TCP ports has an upper limit of about 65,000 (65535, 16 bits)
  • A large number of connections in TIME_WAIT causes new TCP connections to fail with the error "address already in use: connect"

Real-world scenarios

  • The server side is generally set up not to actively close connections
  • However, if the Connection header of an HTTP request is set to close, the server actively closes the TCP connection after processing the request
  • Browsers today generally set the Connection header of HTTP requests to keep-alive
  • In the Nginx reverse proxy scenario, there may be a large number of short connections, so TIME_WAIT connections may accumulate on the server side

Solution

  • Server side: allow sockets in the TIME_WAIT state to be reused
  • Server side: reduce the TIME_WAIT duration to 1 MSL (i.e., 2 minutes)

Appendix

The appendix covers several aspects:

  • Querying TCP connection status
  • MSL time
  • TCP three-way handshake and four-way wave

Appendix A: Query TCP connection status

On a Mac, the commands for querying TCP connection status are:

// On Mac: query TCP connection status
$ netstat -nat | grep TIME_WAIT

// On Mac: query TCP connection status; -E enables extended regular expressions, so "|" means "or"
$ netstat -nat | grep -E "TIME_WAIT|Local Address"
Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
tcp4       0      0  127.0.0.1.1080         127.0.0.1.59061        TIME_WAIT

// Count the number of connections in each state
$ netstat -n | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}'
ESTABLISHED 1154
TIME_WAIT 1645
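The same per-state count can be sketched in Python for cases where the netstat output has already been captured as text. The function name and the sample text are illustrative assumptions; only the first data row of the sample comes from the output shown above. Unlike the awk pattern /^tcp/, this version also accepts lines with leading whitespace.

```python
from collections import Counter


def count_tcp_states(netstat_output: str) -> Counter:
    """Count TCP connections per state, using the last column of every
    line whose first field starts with "tcp" (tcp, tcp4, tcp6, ...)."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("tcp"):
            counts[fields[-1]] += 1
    return counts


# Hypothetical captured output; the second and third data rows are made up.
SAMPLE = """\
Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
tcp4       0      0  127.0.0.1.1080         127.0.0.1.59061        TIME_WAIT
tcp4       0      0  127.0.0.1.1080         127.0.0.1.59062        TIME_WAIT
tcp4       0      0  192.168.1.5.52344      93.184.216.34.443      ESTABLISHED
"""

states = count_tcp_states(SAMPLE)
for state, count in sorted(states.items()):
    print(state, count)
```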

Appendix B: MSL time

MSL stands for Maximum Segment Lifetime:

  • It is the longest time that any packet can exist on the network; after this time, the packet (an IP datagram) is discarded.
  • A TCP segment is carried as the data part of an IP datagram.

Tip:

RFC 793 specifies an MSL of 2 minutes; in practice, 30 seconds, 1 minute, and 2 minutes are commonly used.

2MSL: TCP's TIME_WAIT state is also known as the 2MSL wait state:

  • When one end of a TCP connection performs the active close, it receives the peer's FIN (the third wave), sends the final ACK (the fourth wave), and then enters the TIME_WAIT state.
  • It must stay in this state for twice the MSL. The main purpose of the 2MSL wait is to guard against the final ACK being lost: the peer then retransmits its FIN (the third wave) after a timeout, and the active closer, on receiving the retransmitted FIN, can send the ACK again.
  • While the connection is in TIME_WAIT, the socket pair that defined it (the IP addresses and ports at both ends) cannot be reused until the 2MSL wait ends.
  • While the connection is in the 2MSL wait, any late segment is discarded.

In practice, however, setting the SO_REUSEADDR socket option lets you use the occupied port without waiting for the 2MSL wait to end.
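As a minimal sketch of how SO_REUSEADDR is typically set, the following uses Python's standard socket module. The helper name and the choice of 127.0.0.1 with port 0 (let the OS pick a free port) are assumptions for illustration. The option has to be set before bind(), and its exact semantics vary between operating systems.

```python
import socket


def make_reusable_listener(host="127.0.0.1", port=0):
    """Create a listening TCP socket with SO_REUSEADDR enabled.

    Assumption: port 0 asks the OS for any free port; a real server
    would pass its fixed service port instead.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must be set before bind() to have any effect on the bind.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen()
    return sock


listener = make_reusable_listener()
reuse_enabled = listener.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR) != 0
print("SO_REUSEADDR enabled:", reuse_enabled)
listener.close()
```

This mainly helps a server that is restarted on its fixed port while connections from the previous run are still in TIME_WAIT.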

Appendix C: TCP three-way handshake and four-way wave

For details, refer to: TCP three-way handshake and four-way wave

Schematic diagram:

  • Three-way handshake: the process of establishing a connection
  • Four-way wave: the process of releasing a connection

[Image]

A few core questions:

Is TIME_WAIT a "server-side" state or a "client-side" state?
  • A: TIME_WAIT is the state of the party that actively closes the TCP connection, which may be the "client" or the "server"
  • Under normal circumstances it is a "client" state; the "server" is generally set up not to actively close connections
When a server is serving external requests, is the disconnect initiated by the "client" or by the "server"?
  • Under normal circumstances, the "client" initiates the disconnect
  • The "server" is generally set up not to actively close connections, so it usually performs a passive close
  • However, if the Connection header of an HTTP request is set to close, the server actively closes the TCP connection after processing the request
On the mechanism for actively closing the TCP connection in HTTP: TIME_WAIT appears only on the party that actively closes the connection, so is that party the server?
  • The answer is yes, when the connection is closed after each request. HTTP/1.1 has a Connection header with two common values, close and keep-alive. With it the client tells the server whether to close the connection or keep it open after completing the request. While the connection is kept open, it is normally the client that disconnects. There is also a Keep-Alive header, whose value indicates how long the server keeps the connection open.
  • In HTTP/1.0 the connection is closed after each response unless keep-alive is requested, so the close is almost always initiated by the server, and it is normal for the server to accumulate many TIME_WAIT connections. In HTTP/1.1 connections are persistent by default unless Connection: close is sent.
  • Current browsers generally set Connection to keep-alive when sending a request. Therefore, some people say there is no need to tune parameters to reduce TIME_WAIT.
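The decision described above can be sketched as a small function. It follows the persistence rules of the HTTP/1.1 specification (RFC 7230, section 6.3) in simplified form: a close token always ends the connection, HTTP/1.1 is persistent otherwise, and HTTP/1.0 is persistent only when keep-alive is requested. The function name is hypothetical, and real servers apply additional conditions such as timeouts and request limits.

```python
def connection_persists(http_version: str, connection_header: str = "") -> bool:
    """Return True if the connection stays open after the response."""
    tokens = {t.strip().lower() for t in connection_header.split(",") if t.strip()}
    if "close" in tokens:
        return False                    # an explicit close always wins
    if http_version == "HTTP/1.1":
        return True                     # persistent by default
    if http_version == "HTTP/1.0":
        return "keep-alive" in tokens   # persistent only on request
    return False                        # unknown version: be conservative


for version, header in [("HTTP/1.1", ""), ("HTTP/1.1", "close"),
                        ("HTTP/1.0", ""), ("HTTP/1.0", "Keep-Alive")]:
    print(version, repr(header), connection_persists(version, header))
```

Whenever the function returns False, the server closes the connection after responding, and it is the server that ends up holding the TIME_WAIT state.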
About TIME_WAIT:
  • After a TCP connection is established, the end that actively closes it sends an ACK on receiving the peer's FIN and then enters the TIME_WAIT state;
  • Why the TIME_WAIT state is necessary:
  • Reliable termination of the full-duplex TCP connection: in the four-way wave, the last ACK is sent by the end that actively closes the connection. If that ACK is lost, the peer resends its FIN, so the active closer needs to stay in TIME_WAIT to handle the retransmitted FIN;
  • Handling delayed packets: because of possible jitter in routers, TCP packets may arrive late. To keep a delayed packet from being mistaken for data on a new TCP connection, the old connection must stay unavailable, before a new one with the same addresses and ports is allowed, until all delayed packets have expired. The wait is generally set to 2 times the MSL (Maximum Segment Lifetime);
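The first reason can be illustrated with a toy model of the active closer. It is a sketch under stated assumptions: time is a plain number of seconds since entering TIME_WAIT, the class and method names are hypothetical, and only the handling of a retransmitted FIN is modeled. Following RFC 793, a FIN received in TIME_WAIT is acknowledged again and restarts the 2MSL timer, while a segment for a connection that is already CLOSED is answered with a reset.

```python
class ActiveCloser:
    """Toy model of the active closer after it has sent the final ACK."""

    def __init__(self, msl_seconds: float):
        self.msl = msl_seconds
        self.state = "TIME_WAIT"
        self.expires_at = 2 * msl_seconds  # measured from entering TIME_WAIT

    def _expire(self, now: float) -> None:
        if self.state == "TIME_WAIT" and now >= self.expires_at:
            self.state = "CLOSED"

    def on_fin(self, now: float) -> str:
        """Handle a (retransmitted) FIN from the peer at time `now`."""
        self._expire(now)
        if self.state == "TIME_WAIT":
            self.expires_at = now + 2 * self.msl  # restart the 2MSL timer
            return "ACK"
        return "RST"  # connection state is gone, the peer sees an error


closer = ActiveCloser(msl_seconds=120)
print(closer.on_fin(now=10))    # still in TIME_WAIT: the FIN is ACKed again
print(closer.on_fin(now=300))   # past the restarted timer: answered with RST
```

Without TIME_WAIT, the retransmitted FIN would always hit the second branch, and the passive closer would see a reset instead of a clean close.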

Source: ningg.top/computer-basic-theory-tcp-time-wait/


民工哥