This article was written by Zhang Yanfei; the original title, "Talking about the time-consuming of TCP connections", has been changed slightly.
1. Introduction
Internet-based communication applications (such as IM chat and push systems) mostly use the TCP protocol for data transmission. This is because, among the transport layer protocols of the TCP/IP suite, TCP offers reliable connections, retransmission on error, and congestion control, so it is currently used in far more application scenarios than UDP.
I believe you have also heard that TCP has some shortcomings, the most commonly repeated one being that its overhead is slightly high. However, various technical blogs only say vaguely that the overhead is large or small, and rarely give a concrete, quantitative analysis. To put it bluntly, such discussions are empty talk with little nutritional value.
Reflecting on my daily work, what I really want to understand is how large the TCP overhead actually is, and whether it can be quantified. How long does it take to establish a TCP connection: how many milliseconds, or how many microseconds? Can we at least get a rough, quantitative estimate? Of course, many factors affect TCP time consumption, such as network packet loss. Today I will only share the high-incidence situations I have personally encountered in my work.
A note before we start: thanks to the Linux kernel being open source, the low-level, kernel-specific details and code mentioned in this article are all based on the Linux system.
Study and exchange:
- Introductory article on mobile IM development: "One entry is enough for beginners: developing mobile IM from scratch"
- Open source IM framework source code: https://github.com/JackJiang2011/MobileIMSDK
(This article has been published simultaneously at: http://www.52im.net/thread-3265-1-1.html )
2. Series of articles
This article is the 11th in a series of articles, the outline of which is as follows:
"Unknown Network Programming (1): Analysis of Intractable Diseases in the TCP Protocol (Part 1)"
"Unknown Network Programming (2): Analysis of Intractable Diseases in the TCP Protocol (Part 2)"
"Unknown Network Programming (3): Why TIME_WAIT, CLOSE_WAIT When Closing a TCP Connection"
"Unknown Network Programming (4): In-depth study and analysis of abnormal shutdown of TCP"
"Unknown Network Programming (5): UDP Connectivity and Load Balancing"
"Unknown Network Programming (6): In-depth understanding of the UDP protocol and making good use of it"
"Unknown Network Programming (7): How to Make Unreliable UDP Reliable? 》
"Unknown Network Programming (8): Deep Decryption of HTTP from the Data Transport Layer"
"Unknown Network Programming (9): Combining theory with practice, comprehensive and in-depth understanding of DNS"
"Unknown Network Programming (10): Deepening the Operating System and Understanding the Receiving Process of Network Packets from the Kernel (Linux)"
"Unknown Network Programming (11): Starting from the bottom, in-depth analysis of the secrets of TCP connection time-consuming" (this article)
"Unknown Network Programming (12): Thoroughly Understand the KeepAlive Mechanism of the TCP Protocol Layer"
"Unknown Network Programming (13): Go deep into the operating system and thoroughly understand 127.0.0.1 local network communication"
"Unknown Network Programming (14): Unplug the network cable and plug it in again, is the TCP connection still there? Understand it in one sentence! 》
3. Time-consuming analysis of TCP connections under ideal conditions
To understand the time-consuming TCP connection, we need to understand the connection establishment process in detail.
In the previous article "Go deep into the operating system and understand the receiving process of network packets from the kernel (Linux)", we introduced how data packets are received at the receiving end: the packet leaves the sender and travels across the network to the receiver's network card; after the network card DMAs the packet into the RingBuffer, the kernel processes it through the hard interrupt, soft interrupt and other mechanisms (if it is user data, it is finally placed into the socket's receive queue and the user process is woken up).
In the soft interrupt, when the kernel takes a packet out of the RingBuffer, it represents it with the struct sk_buff structure (see the kernel code include/linux/skbuff.h). Its data member holds the received bytes; as the packet is processed layer by layer up the protocol stack, each protocol layer finds the part it cares about simply by adjusting pointers into this data.
For TCP protocol packets, there is an important field - flags in its Header.
As shown below:
By setting different flag bits, TCP packets are divided into SYN, FIN, ACK, RST and other types (a small capture sketch follows the list below):
1) The client uses the connect system call to instruct the kernel to send SYN, ACK and other packets to establish a TCP connection with the server;
2) On the server side, many connection requests may arrive, so the kernel also needs some auxiliary data structures: the semi-connection queue and the full connection queue.
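As a quick way to see these flag bits in the wild, a packet capture works well. Below is a minimal sketch (the interface eth0 and port 8080 are just example values, not from the original article); in tcpdump's output, Flags [S] is a SYN, [S.] a SYN/ACK, a bare [.] an ACK, F marks a FIN and R a RST:
$ tcpdump -i eth0 -nn 'tcp port 8080'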
Let's take a look at the entire connection process:
In this connection process, let's briefly analyze the time-consuming of each step:
1) The client sends a SYN packet: the client generally sends the SYN through the connect system call, which involves the CPU overhead of the local system call and the soft interrupt;
2) SYN is transmitted to the server: SYN is sent from the client's network card, and begins to "cross the mountains and the sea, and also through the sea of people...", which is a long-distance network transmission;
3) The server processes the SYN packet: the kernel receives the packet via a soft interrupt, puts it into the semi-connection queue, and then sends out a SYN/ACK response. Again, this is local CPU overhead;
4) The SYN/ACK is transmitted to the client: after the SYN/ACK leaves the server, it likewise crosses many mountains and possibly many seas to reach the client. Another long-distance network journey;
5) The client processes the SYN/ACK: the client kernel receives the packet, handles the SYN/ACK, and after a few us of CPU processing sends out the ACK. Again, soft interrupt processing overhead;
6) The ACK is transmitted to the server: the same as the SYN packet, it is transmitted through almost the same distance. Another long-distance network journey;
7) The server receives the ACK: the server kernel receives and processes the ACK, and then takes the corresponding connection from the semi-connected queue and puts it in the full-connected queue. A soft interrupt CPU overhead;
8) Server-side user process wakeup: the user process blocked on the accept system call is woken up and takes the established connection out of the full connection queue. CPU overhead of one context switch.
The above steps can be simply divided into two categories:
The first category: the CPU the kernel spends on receiving, sending or processing, including system calls, soft interrupts and context switches. These basically cost a few us each;
The second category is network transmission. After the packet is sent from a machine, it has to go through various network cables, various switches and routers in the middle. Therefore, the time-consuming of network transmission is much higher than that of local CPU processing. According to the distance of the network, it generally ranges from a few ms to several hundred ms.
1 ms equals 1,000 us, so network transmission takes roughly 1,000 times, or even up to 100,000 times, longer than the CPU overhead at the two ends.
Therefore, for normal TCP connection establishment, we generally only need to consider the network delay.
PS: An RTT refers to a round-trip delay time of a packet from one server to another server.
So from a global point of view: the network time required for the establishment of a TCP connection takes about three transmissions, plus a little CPU overhead on both sides, which is a little larger than 1.5 times the RTT in total.
However, from the client's point of view, as soon as its ACK packet is sent, the kernel considers the connection successfully established. So if you measure TCP connection time on the client side, you only need two transmissions, that is, a little more than 1 RTT. (Similarly, from the server's perspective, the time from receiving the SYN to receiving the ACK is also about one RTT.)
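If you want to verify this roughly 1 RTT figure without writing any code, curl can report the client-side connect time directly. A minimal sketch (the address is a placeholder; when you use an IP address the DNS lookup time is negligible, so time_connect is roughly the handshake time as seen by the client):
$ curl -o /dev/null -s -w 'TCP connect: %{time_connect}s\n' http://{server ip}:{server port}/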
4. Time-consuming analysis of TCP connections in extreme cases
As can be seen in the previous section: from the client's perspective, under normal circumstances the total time of a TCP connection is roughly one network RTT. If things were always that simple, I don't think my sharing would be necessary. Things are not always that good, and surprises are inevitable.
In some cases, it may lead to increased network transmission time during TCP connection, increased CPU processing overhead, or even connection failure. In this section, I will analyze the time-consuming situation of TCP connections in extreme cases based on the various personal experiences I have encountered online.
4.1 The case where the client connect call's time consumption gets out of control
Normally, a system call takes about a few us (microseconds). However, in my article "Tracking the Murderer Who Exhausted Server CPU!", one of my servers ran into a situation at the time: an operations colleague reported that the service's CPU capacity was not enough and needed to be expanded.
The server monitoring at that time is as follows:
The service had previously been handling about 2,000 qps, with CPU idle consistently at 70%+; why was the CPU suddenly not enough?
What was even stranger: when CPU idle hit bottom, the load was not high (the server is a 4-core machine, and a load of 3 to 4 is fairly normal).
Later investigation found that when the TCP client side has about 30,000 connections in TIME_WAIT and available ports are not particularly plentiful, the CPU overhead of the connect system call rises directly by more than 100 times, reaching about 2,500 us (microseconds) per call, i.e. into the millisecond range.
When this problem occurs, the TCP connection establishment time only increases by about 2 ms, which seems acceptable for the overall connection time. But the problem is that these 2+ ms are spent entirely burning CPU cycles, so the problem is not small at all.
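If you suspect you are hitting this, one way to confirm that it is the connect system call itself that has become slow is to trace it. A rough sketch (the pid placeholder is yours to fill in; this command is not from the original investigation):
$ strace -f -T -e trace=connect -p {pid of the client process}
The -T option prints the time spent inside each connect call, so a jump from a few us to the millisecond range is easy to spot.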
The solution is also simple, and there is more than one way: modify the kernel parameter net.ipv4.ip_local_port_range to reserve more ephemeral port numbers, or switch to long connections.
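As a quick check before tuning, you can see roughly how many local ports are tied up in TIME_WAIT and how wide the current port range is (a sketch only; the range values below are examples, pick what fits your machine):
$ ss -tn state time-wait | wc -l
$ cat /proc/sys/net/ipv4/ip_local_port_range
$ echo "10000 65000" > /proc/sys/net/ipv4/ip_local_port_range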
4.2 The case where the TCP half/full connection queue is full
If either queue is full during connection establishment, the SYN or ACK sent by the client will be discarded. After waiting a long time without result, the client then issues a TCP retransmission.
Take the semi-connection queue as an example:
Note that the TCP handshake timeout retransmission above is measured in seconds. In other words, once the server-side connection queues cause connection establishment to fail, it takes at least seconds to get the connection up. Within the same computer room it would normally take less than 1 millisecond, so this is roughly a 1,000-fold increase.
Especially for programs that provide real-time services to users, the user experience suffers greatly. If the handshake does not even succeed on the retransmission, it is very likely the user will not wait for a second retry and the access will simply time out.
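How long the client ends up waiting is governed by the kernel's SYN retransmission behavior: on Linux the first SYN is typically retransmitted after about 1 second, the interval doubles each time, and the number of retries is controlled by a sysctl. A quick way to check it (the value shown is a common default; yours may differ):
cat /proc/sys/net/ipv4/tcp_syn_retries
6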
There is another worse situation: it may affect other users.
If you use the process/thread pool model to provide services, for example php-fpm, things get worse. We know that an fpm process is blocking: while it is responding to one user request, it has no way to respond to others. Suppose you run 100 processes/threads, and during some period 50 of them are stuck in handshakes with the redis or mysql server (note: at this point your server is the client side of the TCP connection). During that period you effectively have only 50 normally working processes/threads, and those 50 workers may not be able to keep up at all, so your service may become congested. If this lasts a little longer, an avalanche may occur and the entire service can be affected.
Since the consequences can be so severe, how do we check whether the service at hand is suffering because the half/full connection queue is full?
On the client side: capture packets and look for SYN TCP retransmissions. Even occasional SYN retransmissions suggest that the corresponding server's connection queues may have a problem.
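A minimal capture sketch for this (interface and port are example values; 6379 here just stands in for a backend such as redis):
$ tcpdump -i eth0 -nn 'tcp[tcpflags] & (tcp-syn) != 0 and dst port 6379'
If the same SYN (same source port and sequence number) shows up again after about a second, that is the handshake retransmission we are worried about.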
On the server side it is even easier to check. netstat -s shows the packet-drop statistics caused by the semi-connection queue being full, but the number is a cumulative total, so you need the watch command to observe it dynamically. If the number below keeps growing while you watch, it means the current server is dropping packets because the semi-connection queue is full, and you probably need to increase the semi-connection queue length.
$ watch 'netstat -s | grep LISTEN'
8 SYNs to LISTEN sockets ignored
For a fully connected queue, the viewing method is similar:
$ watch 'netstat -s | grep overflowed'
160 times the listen queue of a socket overflowed
If your service is dropping packets because a queue is full, one option is to increase the queue length. In the Linux kernel, the semi-connection queue length is mainly affected by tcp_max_syn_backlog, which can be raised to an appropriate value.
cat /proc/sys/net/ipv4/tcp_max_syn_backlog
1024
echo "2048" > /proc/sys/net/ipv4/tcp_max_syn_backlog
The full connection queue length is the smaller of the backlog passed in when the application calls listen and the kernel parameter net.core.somaxconn. You may need to tune both your application and this kernel parameter.
cat /proc/sys/net/core/somaxconn
128
echo "256" > /proc/sys/net/core/somaxconn
After the modification, we can confirm the final effective length through the Send-Q output by the ss command:
$ ss -nlt
Recv-Q   Send-Q   Local Address:Port   Peer Address:Port
0        128      *:80                 *:*
Recv-Q tells us the current usage of the process's full connection queue. If Recv-Q is already approaching Send-Q, you do not need to wait for packet loss to happen: you should be preparing to increase the full connection queue right away.
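To keep an eye on a specific listening socket over time, the same ss output can be watched continuously. A small sketch (port 80 is just an example):
$ watch -n 1 'ss -nlt sport = :80'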
If there are still very occasional queue overflows after increasing the queue, we can tolerate it for the time being.
But what if overflows keep happening for long stretches and cannot be absorbed?
Another approach is to fail fast and report an error directly, instead of letting the client wait until it times out.
For example, on backend machines such as redis and mysql servers, set the kernel parameter tcp_abort_on_overflow to 1: when the queue is full, a reset is sent straight back to the client, telling the caller not to keep waiting pointlessly. The client will then receive the error "connection reset by peer". Sacrificing one user's request is better than dragging down the whole site.
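In the same cat/echo style as above, this switch also lives under /proc (0, the default, drops the packet and lets the client retransmit; 1 replies with a RST immediately):
cat /proc/sys/net/ipv4/tcp_abort_on_overflow
0
echo "1" > /proc/sys/net/ipv4/tcp_abort_on_overflow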
5. Measured analysis of TCP connection time consumption
5.1 Preparation before the test
I wrote a very simple piece of code to measure how long it takes the client side to create a TCP connection.
<?php
$ip    = '{server ip}';    // fill in the server IP
$port  = '{server port}';  // fill in the server port
$count = 50000;

function buildConnect($ip, $port, $num) {
    for ($i = 0; $i < $num; $i++) {
        $socket = socket_create(AF_INET, SOCK_STREAM, SOL_TCP);
        if ($socket == false) {
            echo "$ip $port socket_create() failed: " . socket_strerror(socket_last_error()) . "\n";
            sleep(5);
            continue;
        }
        if (false == socket_connect($socket, $ip, $port)) {
            echo "$ip $port socket_connect() failed: " . socket_strerror(socket_last_error($socket)) . "\n";
            sleep(5);
            continue;
        }
        socket_close($socket);
    }
}

$t1 = microtime(true);
buildConnect($ip, $port, $count);
$t2 = microtime(true);
echo (($t2 - $t1) * 1000) . 'ms';
Before testing, we need enough available local ports on the Linux client machine. If there are not enough, it is best to widen the range first:
echo "5000 65000" > /proc/sys/net/ipv4/ip_local_port_range
5.2 Test under normal circumstances
Note: neither the client nor the server should be a machine running online (production) services, otherwise your test may affect normal user access.
First: my client is located in an IDC computer room in Huailai, Hebei, and the server chosen is a machine in the company's Guangdong computer room. The latency reported by ping is about 37 ms. After using the above script to establish 50,000 connections, the average connection time is also 37 ms.
This is because, as we said earlier, from the client's point of view the handshake is considered successful as soon as the third handshake packet is sent out, so only one RTT (two transmissions) is needed. Although there are system call and soft interrupt overheads on the client and the server in between, these normally cost only a few us (microseconds) each and have little effect on the total connection setup latency.
Next, I switched to a target server whose computer room is in Beijing: still some distance from Huailai, but much closer than Guangdong. The RTT reported by ping is about 1.6 to 1.7 ms. After the client established 50,000 connections, the measured time per connection was 1.64 ms.
One more experiment: this time the server and client are in the same computer room, with a ping latency of about 0.2 ms to 0.3 ms. Running the script again, 50,000 TCP connections took a total of 11,605 ms, an average of 0.23 ms each.
Online architecture reminder: here we see that the latency within the same computer room is only a few tenths of a millisecond, but merely crossing to a nearby computer room already makes the TCP handshake alone 4 times more expensive, and going cross-region to Guangdong widens the gap to more than a hundred times. When deploying online, the ideal practice is to deploy the mysql, redis and other services that your own service depends on in the same region and the same computer room (or, taken to the extreme, even the same rack). That way all network packet transfers, including TCP connection establishment, are much faster. Avoid long-distance cross-region computer room calls as much as possible.
5.3 Test in the case of TCP connection queue overflow
Having tested across regions, across computer rooms and across machines, this time, for speed, what happens if we establish connections directly to the local machine?
Pinging the local IP or 127.0.0.1 gives a latency of about 0.02 ms; the local machine's RTT is definitely shorter than to any other machine. I figured the connections would be extremely fast, so let's experiment.
Establishing 50,000 TCP connections in a row: the total time was 27,154 ms, an average of about 0.54 ms each.
Huh? How can this be so much longer than the cross-machine case?
With the theoretical groundwork above, we should be able to guess the reason: because the local RTT is so short, the connection requests arrive almost instantaneously in a huge burst, which causes the full connection queue or the semi-connection queue to fill up. Once a queue is full, the connection requests that hit it at that moment need 3+ seconds of connection establishment delay, so in the results above the average time appears much higher than the RTT.
During the experiment I captured packets with tcpdump and saw the following scene: a small number of handshakes took 3 s+, because the semi-connection queue was full and the client only retransmitted the SYN after its timer expired.
I then changed the script to sleep for 1 second after every 500 connections, and finally there was no more stalling (alternatively, you can increase the connection queue lengths).
The conclusion: measured on the client, 50,000 local TCP connections took a total of 102,399 ms; after subtracting the 100 seconds of sleep, each TCP connection takes 0.048 ms on average, slightly higher than the ping latency.
This is because when the RTT becomes small enough, the kernel's CPU time overhead starts to show. Besides, a TCP connection is more complex than the ICMP protocol used by ping, so it is normal for the latency to be slightly higher than ping's roughly 0.02 ms.
6. Summary of this article
In the case of abnormal establishment of a TCP connection, it may take several seconds. One disadvantage is that it will affect the user experience, and may even cause the current user access to timeout. Another disadvantage is that it may induce avalanches.
So when your server uses short connections to access data: be sure to monitor whether connection establishment on your server runs into these abnormal situations, and if it does, learn to optimize them away. Of course, you can also use a local in-memory cache, or maintain long connections with a connection pool; both approaches avoid the various overheads of the TCP handshake entirely.
Besides, under normal circumstances, TCP connection establishment costs about one RTT between the two machines, and that is unavoidable. But you can control the physical distance between the two machines to reduce the RTT. For example, deploy the redis you need to access as close to the back-end interface machines as possible, so that the RTT can be cut from tens of ms down to as low as a few tenths of a ms.
Finally, let's think again: if we deploy the server in Beijing, is it feasible for users in New York to access it?
Whether within the same computer room or across computer rooms, the propagation time of the electrical signal can basically be ignored (because the physical distance is very short); the network delay is essentially the forwarding time of the devices along the path. But once we span half the globe, we do have to account for the signal's travel time. The great-circle distance from Beijing to New York is about 15,000 kilometers, so even ignoring device forwarding delay, one round trip at the speed of light (RTT is the round-trip time, so the signal covers the distance twice) takes about 15,000 km × 2 / 300,000 km/s = 0.1 s = 100 ms. The actual latency is usually larger, generally 200 ms or more. On top of such latency, it is hard to provide a service experience that users can accept. Therefore, for overseas users it is best to build a local computer room or buy overseas servers.
Appendix: More Network Programming Essentials
[1] Network programming (basic) information:
"TCP/IP Explained - Chapter 17 TCP: Transmission Control Protocol"
"Technology Past: The TCP/IP Protocol that Changed the World (Precious and Multi-Picture, Be Careful with Your Mobile Phone)"
"Easy to Understand - In-depth Understanding of the TCP Protocol (Part 1): Theoretical Basis"
"Theoretical Classics: Detailed explanation of the 3-way handshake and 4-way wave process of the TCP protocol"
"Detailed Explanation of P2P Technology (1): Detailed Explanation of NAT - Detailed Principles, Introduction to P2P"
"Introduction to Network Programming Lazy People (1): Quickly Understand Network Communication Protocols (Part 1)"
"Introduction to Network Programming Lazy People (2): Quickly Understand Network Communication Protocols (Part 2)"
"Introduction to Network Programming Lazy People (3): A Quick Understanding of the TCP Protocol is Enough"
"Introduction to Network Programming Lazy People (4): Quickly Understand the Difference Between TCP and UDP"
"Introduction to Network Programming for Lazy People (5): Quickly Understand Why UDP is Sometimes More Advantageous than TCP"
"Introduction to Network Programming Lazy People (6): Introduction to the Functional Principles of the Most Popular Hubs, Switches, and Routers in History"
"Introduction to Lazy People in Network Programming (7): Explain the profound things in simple language and fully understand the HTTP protocol"
"Introduction to Lazy People in Network Programming (8): Teach you how to write TCP-based Socket long connections"
"Introduction to Network Programming Lazy People (9): A popular explanation, why use a MAC address when you have an IP address? 》
"Introduction to Lazy People in Network Programming (10): Quickly Read the QUIC Protocol in a Pissing Time"
"Introduction to Network Programming for Lazy People (11): Understand what IPv6 is in one article"
"Introduction to Lazy People in Network Programming (12): Quickly read and understand the Http/3 protocol, one article is enough! 》
"Introduction to Network Programming for Lazy People (13): Quickly understand the difference between TCP and UDP in a pee time"
"Introduction to Lazy People in Network Programming (14): What is Socket? Understand it in one sentence! 》
"Technical literacy: a new generation of UDP-based low-latency network transport layer protocol - QUIC detailed"
"Making the Internet Faster: The Technical Practice Sharing of the New Generation of QUIC Protocol in Tencent"
"Talking about the long connection of network programming in iOS"
"Detailed Explanation of IPv6 Technology: Basic Concepts, Application Status, and Technical Practice (Part 1)"
"Detailed Explanation of IPv6 Technology: Basic Concepts, Application Status, and Technical Practice (Part 2)"
"Detailed explanation of Java's support for IPv6: support, related API, demo code"
"From HTTP/0.9 to HTTP/2: Understanding the Historical Evolution and Design Ideas of the HTTP Protocol in One Article"
"Introduction to Brain Stupid Network Programming (1): Follow the animation to learn TCP three-way handshake and four-way wave"
"Introduction to Brain Stupid Network Programming (2): When we read and write Socket, what are we reading and writing? 》
"Introduction to Brain Stupid Network Programming (3): Some Knowledge You Must Know about the HTTP Protocol"
"Introduction to Brain Stupid Network Programming (4): Quickly Understand HTTP/2 Server Push"
"Introduction to Brain Stupid Network Programming (5): The Ping command I use every day, what is it? 》
"Introduction to Brain Stupid Network Programming (6): What are public IP and intranet IP? What the hell is NAT translation? 》
"Introduction to Brain Stupid Network Programming (7): A must-have for face-to-face, the most popular computer network layering in history"
"Introduction to Brain Stupid Network Programming (8): Do you really understand the difference between 127.0.0.1 and 0.0.0.0? 》
"Introduction to Brain Stupid Network Programming (9): Interview Required, Detailed Explanation of the Most Popular Big and Small Endian Byte Order in History"
"Towards the Advanced Level: Network Basics that Excellent Android Programmers Must Know and Know"
"Android programmers must know the network communication transport layer protocol - UDP and TCP"
"Technology Daniel Chen Shuo's Sharing: From Simple to Deep, Summary of Network Programming Learning Experience"
"May screw up your interview: Do you know how many HTTP requests can be made on a TCP connection? 》
"The 5G era has arrived, TCP/IP is old, can you still eat? 》
More similar articles...
[2] Network programming (high-level) information:
"High-performance network programming (1): How many concurrent TCP connections can a single server have"
"High-performance network programming (2): the last 10 years, the famous C10K concurrent connection problem"
"High-performance network programming (3): In the next 10 years, it is time to consider C10M concurrency"
"High-performance network programming (4): Theoretical exploration of high-performance network applications from C10K to C10M"
"High-performance network programming (5): One article to understand the I/O model in high-performance network programming"
"High-performance network programming (6): One article to understand the threading model in high-performance network programming"
"High-performance network programming (7): What exactly is high concurrency? Understand it in one sentence! 》
"Introduction to Zero-Based Communication Technology for IM Developers (10): Zero-Based, the Strongest 5G Technology Literacy in History"
"Introduction to Zero-Based Communication Technology for IM Developers (11): Why is the WiFi Signal Poor? Understand it in one sentence! 》
"Introduction to Zero-Based Communication Technology for IM Developers (12): Internet Stalling? Internet disconected? Understand it in one sentence! 》
"Introduction to Zero-Based Communication Technology for IM Developers (13): Why is the mobile phone signal poor? Understand it in one sentence! 》
"Introduction to Zero-Based Communication Technology for IM Developers (14): How Difficult is Wireless Internet Access on High-speed Rail? Understand it in one sentence! 》
"Introduction to Zero-Based Communication Technology for IM Developers (15): Understanding Positioning Technology, One Article is Enough"
"Take the design of the network access layer of the online game server as an example to understand the technical challenges of real-time communication"
"Zhuhu Technology Sharing: Zhihu Technology Practice of High-performance Long Connection Gateway with Tens of Millions of Concurrency"
"Taobao Technology Sharing: The Road to Technological Evolution of the Mobile Access Layer Gateway of Hand Taobao Billion"
More similar articles...
(This article has been published simultaneously at: http://www.52im.net/thread-3265-1-1.html )