Author: Bao Fengqi
A developer on the dble team at Aikesheng, mainly responsible for dble feature development, troubleshooting, and answering community questions. Less talk, show me the code.
Source of this article: original contribution
*This original content is produced by the Aikesheng open source community and may not be used without authorization. To reprint it, please contact the editor and indicate the source.
background
In a stable environment, when dble initialized its back-end connection pool, the pool's connection counter (totalConnections) did not match the actual number of connections (allConnections). In theory, the two values should be eventually consistent.
Searching online turned up a relevant article: https://mp.weixin.qq.com/s/FEybf-jRL8gMUYPBDH3aQg . Refer to it for the detailed analysis.
Put simply, while dble initializes the back-end connection pool, it may create too many connections at the same instant. This triggers the TCP syn_cookie mechanism during the handshake of some connections, and the ACK of the third handshake is lost, which leads to the mismatch described above.
The problem was temporarily resolved by turning off TCP syn_cookie in the stable environment.
But suppose one of the following three failures occurs during a normal TCP three-way handshake:
- The SYN packet of the first handshake is lost
- The SYN + ACK packet of the second handshake is lost
- The ACK packet of the third handshake is lost
How do the client and server handle each case? If they retransmit, how many times, and at what interval?
lab environment
Start the MySQL service on one server, listening on port 3306, IP address 10.186.60.69.
Use the MySQL client on another server, IP address 10.186.60.60, to connect to the MySQL service.
first scenario
What happens when the SYN packet of the first TCP handshake is lost?
On the MySQL server, use iptables to drop all incoming TCP packets destined for port 3306:
$ iptables -i eth0 -A INPUT -p tcp --dport 3306 -j DROP
Start capturing packets on the MySQL server:
$ tcpdump -i eth0 tcp and port 3306 -w tcp_syn_timeout.cap
Connect to the MySQL server with the MySQL client. After more than a minute, the client returns an error, as shown below; then stop the packet capture:
$ mysql -h10.186.60.69 -uroot -p123456 -P3306
mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 2003 (HY000): Can't connect to MySQL server on '10.186.60.69' (110)
Analyze the captured file with Wireshark:
When the client receives no SYN + ACK in response to its SYN, it keeps retrying. In this example it retries six times, with a different RTO (retransmission timeout) each time:
- The first retry comes after 1 second
- The second retry comes after 3 seconds, about 2s after the first
- The third retry comes after 7 seconds, about 4s after the second
- The fourth retry comes after 15 seconds, about 8s after the third
- The fifth retry comes after 31 seconds, about 16s after the fourth
- The sixth retry comes after 65 seconds, about 32s after the fifth
The RTO doubles after each timeout. The number of retries is configurable through the following kernel parameter on the client machine:
$ cat /proc/sys/net/ipv4/tcp_syn_retries
6
# The default value may differ between distributions
$ uname -a
Linux ubuntu 4.15.0-36-generic #39-Ubuntu SMP Mon Sep 24 16:19:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
You can try to modify this parameter to see the effect:
$ echo 2 > /proc/sys/net/ipv4/tcp_syn_retries
In summary:
What happens when the SYN packet of the first TCP handshake is lost?
The client retransmits the SYN until it receives a SYN + ACK or reaches the maximum number of retries, and the wait before each retry doubles.
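The doubling schedule described above can be reproduced with a few lines of Python. This is a minimal sketch, assuming an initial RTO of 1 second as seen in the capture; the function name syn_retry_offsets is made up for illustration. It yields the idealized schedule, so the timestamps in a real capture will deviate slightly.

```python
def syn_retry_offsets(syn_retries, initial_rto=1):
    """Return (retry offsets in seconds, time at which connect() gives up).

    Assumption: the first RTO is `initial_rto` seconds and it doubles
    after every timeout, as observed in the capture.
    """
    offsets = []
    elapsed = 0
    rto = initial_rto
    for _ in range(syn_retries):
        elapsed += rto          # wait one RTO, then retransmit the SYN
        offsets.append(elapsed)
        rto *= 2                # exponential backoff
    # After the last retry the client waits one more (doubled) RTO
    # before reporting a timeout to the application.
    give_up_at = elapsed + rto
    return offsets, give_up_at


if __name__ == "__main__":
    for retries in (6, 2):
        offsets, give_up_at = syn_retry_offsets(retries)
        print(f"tcp_syn_retries={retries}: retries at {offsets}, give up at {give_up_at}s")
```

With tcp_syn_retries set to 6, the sketch yields retries at 1, 3, 7, 15, 31 and 63 seconds and a give-up time of 127 seconds; with 2 it yields retries at 1 and 3 seconds and a give-up time of 7 seconds.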
second scenario
What happens when the SYN + ACK packet of the second TCP handshake is lost?
To simulate loss of the SYN + ACK, set up a firewall rule on the client that drops all packets from the MySQL server:
$ iptables -A INPUT -p tcp -s 10.186.60.69 -j DROP
Capture packets on the MySQL server side:
$ tcpdump -i eth0 tcp and port 3306 -w tcp_syn_ack_timeout.cap
Analyze the TCP flow with the Flow Graph function under Wireshark's Statistics menu:
The figure can be read from two perspectives:
Client perspective: after sending the SYN, the client never receives the SYN + ACK because of the firewall rule. It therefore keeps retrying until it receives a SYN + ACK or reaches the maximum number of retries.
Server perspective: after receiving the SYN, the server sends a SYN + ACK but never receives the ACK of the final handshake, so it keeps retransmitting the SYN + ACK.
When a SYN that the client retransmitted after a timeout reaches the server, the server replies with another SYN + ACK, but the retransmission timer of the SYN + ACK is not reset; retransmission continues on the existing schedule.
As the figure shows, after the server receives the third SYN and replies with a SYN + ACK, it retransmits four times, counting the earlier retransmission.
This number of retries is also controlled by a kernel parameter:
$ cat /proc/sys/net/ipv4/tcp_synack_retries
5
After setting the kernel parameter tcp_synack_retries to 1 on the server, the TCP interaction looks like this:
This makes the number of retries easier to see.
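The point that a duplicate SYN triggers an extra SYN + ACK without resetting the retransmission timer can be modelled with a small simulation. This is only a sketch of the behaviour described above, not of the kernel code: the function name synack_timeline, the 1-second initial RTO, and the assumption that every listed SYN arrives while the half-open connection still exists are all mine.

```python
def synack_timeline(syn_arrivals, synack_retries=5, initial_rto=1):
    """Return sorted (time, reason) tuples for every SYN + ACK the server sends.

    syn_arrivals: times (seconds) at which client SYNs reach the server;
    the first one creates the half-open connection.
    """
    first = syn_arrivals[0]
    events = [(first, "reply to SYN")]

    # Timer-driven retransmissions follow a fixed backoff schedule that
    # starts at the first SYN + ACK and is never reset.
    t, rto = first, initial_rto
    for _ in range(synack_retries):
        t += rto
        events.append((t, "timer retransmit"))
        rto *= 2

    # Each duplicate SYN gets an immediate SYN + ACK, but the timer
    # schedule computed above is left untouched.
    for arrival in syn_arrivals[1:]:
        events.append((arrival, "reply to SYN"))

    return sorted(events)


if __name__ == "__main__":
    # Client SYNs at 0s, 1s and 3s; server limited to one timer retry.
    for when, reason in synack_timeline([0, 1, 3], synack_retries=1):
        print(f"{when:>3}s  SYN + ACK ({reason})")
```

Because the client's SYN retries and the server's SYN + ACK timer both start from a 1-second RTO in this model, replies and timer retransmissions land close together, which is why the two are easy to confuse in a flow graph.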
third scenario
What happens when the ACK of the third TCP handshake is lost?
Set up a firewall rule on the MySQL server to drop the ACK of the third handshake:
$ iptables -A INPUT -p tcp --tcp-flags ACK ACK --dport 3306 -j DROP
Capture packets on the MySQL server side:
$ tcpdump -i eth0 tcp and port 3306 -w tcp_3th_ack_timeout.cap
Analyze the captured file:
The figure can be read from two perspectives:
Client perspective: for the client, the connection is already established.
Check the connection state with the netstat command:
$ netstat -napt|grep 3306
tcp 0 0 10.186.60.60:42490 10.186.60.69:3306 ESTABLISHED 14391/mysql
The connection is in the ESTABLISHED state.
Server perspective: because the ACK of the third handshake never arrives, the server, as in the second scenario, keeps resending the SYN + ACK until the maximum number of retries is reached.
During the retries, the connection on the server stays in the SYN_RECV state:
$ netstat -napt|grep 3306
tcp 0 0 10.186.60.69:3306 10.186.60.60:42868 SYN_RECV -
After about half a minute, once the server has exhausted its retries, the connection that was in the SYN_RECV state disappears from the server.
The client's connection, however, still exists.
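That the client considers a connection established regardless of what the server application does can be seen in a loopback experiment. The sketch below is an illustration only and does not reproduce the lost ACK: it assumes a local listener that never calls accept(), and the function name connect_without_accept is invented for this example.

```python
import socket


def connect_without_accept():
    """Connect to a local listener that never calls accept().

    Returns True if connect() succeeds, i.e. the client's kernel has
    finished its side of the three-way handshake.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        server.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        server.listen(1)
        port = server.getsockname()[1]

        client.settimeout(5)
        client.connect(("127.0.0.1", port))  # returns once the handshake is done
        # The server application has not accepted anything, yet the
        # client already has a connected socket with a valid peer.
        return client.getpeername()[1] == port
    finally:
        client.close()
        server.close()


if __name__ == "__main__":
    print("client sees an established connection:", connect_without_accept())
```

The client's state is decided locally, from the packets the client itself has seen, which is why it can show ESTABLISHED while the other side holds a different view.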
What happens after the client is connected?
Two scenarios are worth discussing here:
In the first, the client sends data as soon as the TCP connection is established.
In the second, the client does nothing. Both cases are discussed below.
Client sends data
To simulate this scenario, first connect to the MySQL server from the client.
Once connected, block the client's packets with a firewall rule on the MySQL server:
$ iptables -A INPUT -p tcp -s 10.186.60.60 -j DROP
Capture packets on the MySQL server:
$ tcpdump -i eth0 tcp and port 3306 -w tcp_data.cap
Then issue the use test; statement over the connection just established.
Analyze the packet capture file:
The client retransmitted a total of eleven times.
For data sent after a TCP connection is established, the maximum number of timeout retransmissions is governed by tcp_retries2. The default is 15; here it was lowered to 10 to make observation easier:
$ cat /proc/sys/net/ipv4/tcp_retries2
10
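However, the capture shows eleven transmitted packets rather than ten. Linux does not treat tcp_retries2 as a strict retransmission count; it converts the value into an overall timeout, so the observed count can differ. For details, see: https://perthcharles.github.io/2015/09/07/wiki-tcp-retries/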
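The timeout that Linux derives from tcp_retries2 can be sketched as follows. This is my reading of the rule documented for the kernel, not a copy of its code: the constants of 200 ms for the minimum RTO and 120 s for the maximum RTO are the usual Linux values and are assumptions here, and the function name retries2_timeout_ms is hypothetical.

```python
RTO_MIN_MS = 200        # assumed TCP_RTO_MIN
RTO_MAX_MS = 120_000    # assumed TCP_RTO_MAX


def retries2_timeout_ms(retries2, rto_min_ms=RTO_MIN_MS, rto_max_ms=RTO_MAX_MS):
    """Hypothetical timeout (ms) that corresponds to a tcp_retries2 value.

    The kernel pretends the RTO starts at the minimum and doubles on every
    retransmission until it reaches the maximum, after which each further
    retransmission adds one maximum RTO.
    """
    # Number of doublings that fit before the RTO is capped (floor(log2)).
    thresh = (rto_max_ms // rto_min_ms).bit_length() - 1
    if retries2 <= thresh:
        return ((2 << retries2) - 1) * rto_min_ms
    return ((2 << thresh) - 1) * rto_min_ms + (retries2 - thresh) * rto_max_ms


if __name__ == "__main__":
    for value in (15, 10):
        print(f"tcp_retries2={value}: {retries2_timeout_ms(value) / 1000:.1f}s")
```

For tcp_retries2 = 15 this gives 924.6 seconds, and for 10 it gives 324.6 seconds. Because the real RTO of a connection is usually not the 200 ms minimum, the number of packets sent before that deadline need not equal the configured value.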
Client does nothing
In the MySQL protocol, the server sends a handshake packet once the TCP connection is established. Because the connection no longer exists on the MySQL server, the handshake packet is never sent, and the client hangs:
$ mysql -h10.186.60.69 -uroot -p123456 -P3306
mysql: [Warning] Using a password on the command line interface can be insecure.
At this point, whether the client's connection is still alive is determined by the TCP keep-alive mechanism.
keep-alive mechanism:
- Precondition: the keep-alive mechanism starts working only after the connection has been idle for a certain period of time.
- Once active, it sends a probe packet at a fixed interval. If several consecutive probes go unanswered, the TCP connection is considered dead and the kernel reports the error to the upper-layer application.
The Linux kernel has parameters for the keep-alive time, the number of keep-alive probes, and the interval between probes. Their default values are:
$ sysctl -a|grep keepalive
net.ipv4.tcp_keepalive_intvl = 75
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_time = 7200
# They can also be viewed as follows:
$ cat /proc/sys/net/ipv4/tcp_keepalive_intvl
$ cat /proc/sys/net/ipv4/tcp_keepalive_probes
$ cat /proc/sys/net/ipv4/tcp_keepalive_time
Parameter explanation:
- tcp_keepalive_intvl=75: the interval between probes is 75 seconds.
- tcp_keepalive_probes=9: if 9 consecutive probes receive no response, the peer is considered unreachable and the connection is terminated.
- tcp_keepalive_time=7200: the keep-alive time is 7200 seconds (2 hours); if the connection sees no activity for 2 hours, the keep-alive mechanism is activated.
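The kernel parameters only apply to sockets on which the application has enabled keep-alive with SO_KEEPALIVE, and on Linux an application can also override the three values per socket. The following is a minimal sketch, assuming the Linux socket options TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT as exposed by Python's socket module; the helper name enable_keepalive is made up, and its defaults simply mirror the kernel defaults listed above.

```python
import socket


def enable_keepalive(sock, idle=7200, interval=75, probes=9):
    """Turn on TCP keep-alive for one socket and set its three timers.

    idle     -> counterpart of tcp_keepalive_time
    interval -> counterpart of tcp_keepalive_intvl
    probes   -> counterpart of tcp_keepalive_probes
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    per_socket = (
        ("TCP_KEEPIDLE", idle),
        ("TCP_KEEPINTVL", interval),
        ("TCP_KEEPCNT", probes),
    )
    for name, value in per_socket:
        option = getattr(socket, name, None)  # not every platform defines these
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s, idle=40, interval=10, probes=2)
    print("SO_KEEPALIVE:", s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))
    s.close()
```

Setting the values per socket leaves the system-wide defaults untouched, which is convenient when only one application needs faster detection of dead connections.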
We can modify the parameters to see the effect:
$ echo 10 > /proc/sys/net/ipv4/tcp_keepalive_intvl
$ echo 40 > /proc/sys/net/ipv4/tcp_keepalive_time
$ echo 2 > /proc/sys/net/ipv4/tcp_keepalive_probes
Check the result in the packet capture file:
The capture shows that keep-alive probing started at the 40s mark and that two probes were sent, 10s apart, which matches the modified parameters.
With the modified parameters, the client reported an error about a minute later:
$ mysql -h10.186.60.69 -uroot -p123456 -P3306
mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 2013 (HY000): Lost connection to MySQL server at 'reading initial communication packet', system error: 110
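The "about a minute" above follows directly from the three keep-alive parameters. As a rough sketch, assuming the connection is declared dead one probe interval after the last unanswered probe (the function name keepalive_detection_time is hypothetical):

```python
def keepalive_detection_time(keepalive_time, keepalive_intvl, keepalive_probes):
    """Seconds of silence before an idle, dead connection is reported.

    The first probe goes out after `keepalive_time` seconds of idleness;
    each of the `keepalive_probes` probes is given `keepalive_intvl`
    seconds to be answered.
    """
    return keepalive_time + keepalive_intvl * keepalive_probes


if __name__ == "__main__":
    print("defaults:", keepalive_detection_time(7200, 75, 9), "seconds")
    print("modified:", keepalive_detection_time(40, 10, 2), "seconds")
```

With the defaults this comes to 7875 seconds, a little over two hours; with the modified values it comes to 60 seconds, in line with the error appearing after about a minute.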