
This article draws on "Understanding the difference between TCP and UDP" by Fundebug. Thanks to the author for sharing it.

1. Introduction

Network protocols are basic knowledge that every programmer who develops network communication applications (IM, push, gateways, and so on) must master. The two most representative transport layer protocols in the TCP/IP protocol suite are TCP and UDP.

Anyone with experience in network communication development knows that TCP and UDP are the two most commonly used protocols. When, and in which scenarios, should you choose TCP or UDP? That question is debated constantly.

Unlike longer articles, this one uses concise text to summarize the main differences between TCP and UDP, so that readers who want to master the topic without spending much time on a systematic study of network theory can get up to speed quickly.

Recommended reading: to deepen your understanding, read this together with another article in the series, "Introduction to Network Programming for Lazy People (4): Quickly understand the difference between TCP and UDP".

(This article has been simultaneously published at: http://www.52im.net/thread-3793-1-1.html )

2. Quickly understand the TCP/IP protocol suite

For computers and network devices to communicate, both sides must follow the same conventions: how to locate the communication target, which side initiates, which language to use, and how to end the exchange all have to be agreed in advance. Communication between different hardware and operating systems requires such rules, and we call these rules a protocol.

TCP/IP is the general term for the family of Internet-related protocols: TCP, UDP, IP, FTP, HTTP, ICMP, SMTP, and others all belong to it.

The TCP/IP model is the foundation of the Internet, and it is the general term for a series of network protocols. These protocols can be divided into four layers, namely the link layer, network layer, transport layer and application layer.

Specifically:

1) Link layer: Responsible for encapsulating and de-encapsulating IP messages, sending and receiving ARP/RARP messages, etc.;
2) Network layer: responsible for routing and sending packet messages to the target network or host;
3) Transport layer: Responsible for segmenting and reassembling messages, and encapsulating them in TCP or UDP format;
4) Application layer: Responsible for providing applications to users, such as HTTP, FTP, Telnet, DNS, SMTP, etc.

The following table summarizes:

The following picture more vividly reflects the relationship between the TCP/IP protocol family (the high-definition picture can be downloaded from here):

In the network architecture, communication always takes place between the peer layers of the two parties; a layer on one side never talks to a different layer on the other.

Throughout data transmission, as data passes down through each layer at the sending end, that layer adds its own protocol header (and, at the data link layer only, a protocol trailer). In other words, the data is encapsulated by each protocol in turn, which identifies the communication protocol used at that layer.
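The wrapping described above can be pictured with a toy model. This is only a sketch: the bracketed header and trailer strings are placeholders rather than real wire formats, and the names `LAYERS`, `encapsulate`, and `decapsulate` are made up for illustration.

```python
# Toy model of protocol encapsulation: each layer prepends its own header
# on the way down; only the link layer also appends a trailer.
# The byte strings below are placeholders, NOT real protocol formats.
LAYERS = [
    ("transport", b"[TCP]", b""),
    ("network", b"[IP]", b""),
    ("link", b"[ETH]", b"[FCS]"),
]


def encapsulate(app_data: bytes) -> bytes:
    """Wrap application data as it travels down the sender's stack."""
    frame = app_data
    for _name, header, trailer in LAYERS:
        frame = header + frame + trailer
    return frame


def decapsulate(frame: bytes) -> bytes:
    """Strip headers and trailers in reverse order at the receiver."""
    for _name, header, trailer in reversed(LAYERS):
        assert frame.startswith(header) and frame.endswith(trailer)
        frame = frame[len(header): len(frame) - len(trailer)]
    return frame


frame = encapsulate(b"hello")
print(frame)               # outermost layer first: link, network, transport
print(decapsulate(frame))  # the original application data
```

Each layer at the receiver only looks at the header written by its peer layer at the sender, which is the peer-layer rule in miniature.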

The TCP/IP protocol suite could fill several books, so I won't go into more detail here. If you are interested, you can read "TCP/IP Detailed Explanation Volume 1: Protocol (Online Reading)".

I also enjoy background reading beyond the purely technical, such as the following two articles:

"Technical Past: The TCP/IP Protocol That Changed the World (Precious Multi-Pictures, Be Careful on Mobile Phones)"
"The 5G era has arrived and TCP/IP is getting old: can it still hold up?"

Next, we return to the topic: the two representative transport layer protocols in TCP/IP, TCP and UDP.

3. Quickly understand the UDP protocol

3.1 Basic introduction
UDP protocol: The full name is the User Datagram Protocol. Like TCP, it is used to handle data packets on the network, but it is a connectionless protocol.

In the OSI model it sits at layer 4, the transport layer, directly above the IP protocol (see the figure below).


▲ The above picture is quoted from "Computer Network Communication Protocol Diagram"

UDP's drawbacks are that it does not segment or reassemble data packets and cannot order them. In other words, once a message is sent, the sender has no way of knowing whether it arrived safely and completely.

The main features of UDP are summarized below and explained one by one in the following subsections.

3.2 Connectionless
First, unlike TCP, UDP does not need a three-way handshake to establish a connection before sending data; you can send whenever you have data to send. It is also merely a carrier of data messages and does not split or splice them.

Specifically:

1) At the sending end: the application layer hands the data to UDP in the transport layer. UDP only adds a UDP header to the data and then passes the datagram to the network layer;
2) At the receiving end: the network layer strips the IP header and passes the datagram up to the transport layer; UDP only removes the UDP header and hands the data to the application layer without any splicing.
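A minimal sketch of this connectionless behaviour, using Python's standard `socket` module on the loopback interface. The address, the OS-chosen port, and the payload are assumptions for illustration.

```python
import socket

# Receiver: bind a UDP socket to a loopback port chosen by the OS.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2.0)          # do not block forever if the datagram is lost
addr = receiver.getsockname()

# Sender: no connect(), no handshake; sendto() hands the data straight to UDP.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"ping", addr)

# The application receives the payload only; the UDP and IP headers were
# added and removed by the operating system's network stack.
data, peer = receiver.recvfrom(2048)
print(data, peer)

sender.close()
receiver.close()
```

Note that `sendto()` succeeds even if nobody is listening on the destination port, which is exactly the "send whenever you want" behaviour described above.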

3.3 Supports unicast, multicast, and broadcast
UDP not only supports one-to-one transmission, but also supports one-to-many, many-to-many, and many-to-one methods. That is to say, UDP provides unicast, multicast, and broadcast functions.

3.4 Message-oriented
The UDP protocol is message-oriented.

The sender's UDP adds a header to the message handed down by the application and delivers it to the IP layer. UDP neither merges nor splits the messages handed over by the application layer; it preserves their boundaries.

Therefore, the application program must select the appropriate size of the packet (see "What is the maximum size of a packet in UDP?").
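The preserved boundaries are easy to observe. In the sketch below (loopback address and message contents are assumptions), two `sendto()` calls produce two separate datagrams, and each `recvfrom()` returns exactly one of them, never a merged or partial message.

```python
import socket

receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
receiver.settimeout(2.0)

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
for message in (b"first", b"second message"):
    sender.sendto(message, receiver.getsockname())

# One recvfrom() call per datagram. UDP does not guarantee ordering,
# so the two messages are collected without assuming which comes first.
received = [receiver.recvfrom(2048)[0] for _ in range(2)]
print(received)

sender.close()
receiver.close()
```

If the buffer passed to `recvfrom()` is smaller than the datagram, the excess bytes are discarded rather than kept for the next call, which is one more reason the application has to think about message size.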

3.5 Unreliability
UDP's unreliability shows first in its connectionlessness: the two parties do not establish a connection and simply send whenever they want, which by itself gives no delivery guarantee.

In addition, UDP passes data on exactly as it receives it and keeps no copy for retransmission; the sender does not care whether the other side received the data correctly.

Finally, network conditions vary, but UDP has no congestion control: it keeps sending at whatever rate the application supplies data and does not adjust that rate even when the network is in poor condition.

The drawback is possible packet loss under poor network conditions, but the advantage is equally clear: in scenarios with strict real-time requirements (such as teleconferencing), UDP is preferable to TCP (see "Introduction to Network Programming for Lazy People (5): Quickly understand why UDP is sometimes more advantageous than TCP").

The following animation can illustrate the unreliability of UDP:

As the animation shows, UDP simply throws the messages it wants to send at the other party and does not care whether they arrive safely and completely.

3.6 Low header overhead
The UDP header is small (as shown in the figure below), which makes UDP very efficient at transmitting data.


▲ The above picture is quoted from "TCP/IP Detailed Explanation-Chapter 11 UDP Protocol"

The UDP header contains the following data:

1) Two 16-bit port numbers: the source port (an optional field) and the destination port;
2) The length of the entire datagram (header plus data);
3) The checksum of the entire datagram (optional in IPv4), used to detect errors in the header and data.

The UDP header is therefore only 8 bytes, far smaller than TCP's minimum of 20 bytes, which makes UDP very efficient when transmitting data messages.
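The 8-byte figure follows directly from the four 16-bit fields. The sketch below packs them with Python's `struct` module; the helper name `build_udp_datagram` and the port numbers are made up for illustration, and the checksum is left at zero rather than computed as a real stack would.

```python
import struct

# Four unsigned 16-bit fields in network byte order:
# source port, destination port, length, checksum.
UDP_HEADER = struct.Struct("!HHHH")


def build_udp_datagram(src_port: int, dst_port: int, payload: bytes) -> bytes:
    """Build a UDP datagram (header + payload) without a real checksum."""
    length = UDP_HEADER.size + len(payload)   # length covers header and data
    checksum = 0                              # placeholder: not computed in this sketch
    return UDP_HEADER.pack(src_port, dst_port, length, checksum) + payload


datagram = build_udp_datagram(50000, 53, b"hello")
src, dst, length, checksum = UDP_HEADER.unpack(datagram[:UDP_HEADER.size])
print(UDP_HEADER.size, src, dst, length)
```

For a 5-byte payload the length field is 13: 8 bytes of header plus 5 bytes of data.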

For comparison, the following figure shows the header overhead of the TCP protocol:


▲ The above picture is quoted from "TCP/IP Detailed Explanation-Chapter 17 TCP Protocol"

3.7 Learn UDP protocol more comprehensively
UDP is relatively simple and easy to learn. If you want more theory, supplement it with the relevant chapter of the classic network book, "TCP/IP Detailed-Chapter 11 · UDP: User Datagram Protocol".

In production, however, UDP has its complicated side. The following articles are worth reading:

"The Unknown Network Programming (5): UDP Connectivity and Load Balancing"
"The Unknown Network Programming (6): Deeply understand the UDP protocol and use it well"
"The Unknown Network Programming (7): How to make the unreliable UDP reliable?"

In addition, as Internet companies such as Google have vigorously promoted the QUIC protocol in recent years, UDP may find more application scenarios in the mobile Internet environment. Interested readers can learn about QUIC from "Introduction to Network Programming for Lazy People (10): Quickly understand the QUIC protocol in the time of a bathroom break", "Technical literacy: a new generation of UDP-based low-latency network transport layer protocol-QUIC detailed explanation", and "Make the Internet faster: a new generation of QUIC protocol in Tencent's Technical Practice Sharing".

4. Quickly understand the TCP protocol

4.1 Basic introduction
When one computer communicates with another, the communication needs to be smooth and reliable so that data is sent and received correctly.

For example, when you view a webpage or an email, you want to see all of it, in order, with nothing missing. When you download a file, you want the complete file rather than just part of it. If data is lost or arrives out of order, the result is not what you want, so TCP is used.

TCP protocol: The full name is Transmission Control Protocol. It is a connection-oriented, reliable, byte stream-based transport layer communication protocol, defined by IETF RFC 793.

TCP is a connection-oriented, reliable streaming protocol. A stream is an uninterrupted sequence of data; you can think of it as water flowing through a pipe.

For the theory of the TCP protocol, you can continue with "TCP/IP Detailed Explanation-Chapter 17 TCP: Transmission Control Protocol"; it will not be expanded here due to space limitations.

Next, we will introduce the most important features of TCP one by one.

4.2 TCP connection process (3-way handshake)
As shown in the figure below, this is the process of establishing a TCP connection (commonly known as "3-way handshake"):

1) First handshake: the client sends a connection request segment (SYN) to the server, containing the client's initial sequence number. After sending it, the client enters the SYN-SENT state.

2) Second handshake: after receiving the connection request segment, the server, if it agrees to connect, sends a response (SYN+ACK) that also contains its own initial sequence number, and then enters the SYN-RECEIVED state.

3) Third handshake: on receiving the server's response, the client sends an acknowledgment (ACK) back to the server. After sending it, the client enters the ESTABLISHED state, and the server also enters ESTABLISHED once it receives this acknowledgment. The connection is now established.

A common question here: why does TCP need a three-way handshake rather than a two-way one? The reason is to prevent a stale connection request segment (one that was delayed in the network and is no longer valid) from reaching the server and being mistaken for a new connection, which would cause errors.

The following animation demonstrates the 3-way handshake process and may make it easier to follow:

▲ The animation is quoted from "Follow the animation to learn TCP three-way handshake and four waved hands"
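Applications never send the handshake segments themselves; the operating system does it when the socket API is called. The sketch below (loopback address and payload are assumptions) shows where the handshake happens from the application's point of view: `listen()` makes the server ready to answer connection requests, `connect()` triggers the handshake, and `accept()` returns a connection the kernel has already established.

```python
import socket

# Server side: after listen(), the kernel answers incoming connection requests.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
server.listen(1)
server.settimeout(2.0)

# Client side: connect() starts the handshake and returns once the
# client side of the connection is established.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.settimeout(2.0)
client.connect(server.getsockname())
client_addr = client.getsockname()

# accept() hands the application an already-established connection.
conn, peer = server.accept()
conn.settimeout(2.0)

# Data can flow only now that the connection exists.
client.sendall(b"hello")
data = conn.recv(1024)
print(peer == client_addr, data)

for s in (conn, client, server):
    s.close()
```

To see the individual SYN, SYN+ACK, and ACK segments, capture the loopback traffic with a packet analyzer while running the script.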

4.3 TCP disconnection (four-way wave)

TCP is full duplex. As shown in the figure above, both ends need to send FIN and ACK when disconnecting.

1) First wave: when client A considers its data transmission complete, it sends a connection release request (FIN) to server B.

2) Second wave: on receiving the connection release request, B tells its application layer that the connection is being released, sends an ACK, and enters the CLOSE-WAIT state. At this point the A-to-B direction is closed and B will receive no more data from A, but because a TCP connection is bidirectional, B can still send data to A.

3) Third wave: if B still has unsent data, it continues to send it; when finished, it sends its own connection release request (FIN) to A and enters the LAST-ACK state.

4) Fourth wave: on receiving the release request, A sends an acknowledgment to B and enters the TIME-WAIT state, which lasts for 2MSL (twice the maximum segment lifetime, where MSL is the longest a segment can survive in the network before being discarded). If B does not retransmit its release request during this period, A enters the CLOSED state. B enters CLOSED as soon as it receives the acknowledgment.

The following animation illustrates TCP's four-way wave more vividly:

▲ The animation is quoted from "Follow the animation to learn TCP three-way handshake and four waved hands"
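The half-closed state in the middle of the four-way wave can be observed from application code. In this sketch (loopback address and payload are assumptions; `a` and `b` stand in for client A and server B), A closes only its sending direction with `shutdown(SHUT_WR)`, B sees end-of-stream but can still send, and the connection is fully closed only after B closes too.

```python
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
server.settimeout(2.0)

a = socket.socket(socket.AF_INET, socket.SOCK_STREAM)   # plays client A
a.settimeout(2.0)
a.connect(server.getsockname())
b, _ = server.accept()                                   # plays server B
b.settimeout(2.0)

a.shutdown(socket.SHUT_WR)   # A: "I have no more data to send" (A's FIN)
eof = b.recv(1024)           # B reads end-of-stream: an empty bytes object
b.sendall(b"late data")      # B can still send in the other direction
b.close()                    # B is done too (B's FIN)

# A keeps reading until it also sees end-of-stream.
late = b""
while True:
    chunk = a.recv(1024)
    if not chunk:
        break
    late += chunk
print(eof, late)

a.close()
server.close()
```

The ACKs and the TIME-WAIT timer are handled by the operating system and are not visible at this level.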

It is very important to understand the TCP three-way handshake and four-way wave correctly. Space does not allow this article to go further; interested readers can continue with these articles:

"Introduction to Brain Disabled Network Programming (1): Follow the animation to learn TCP three-way handshake and four waved hands"
"Theoretical Classics: Detailed Explanation of the Process of 3 Handshake and 4 Waves of TCP Protocol"
"Integration of Theory with Practice: Wireshark Captures Packets and Analyzes the Process of TCP 3-way Handshake, 4 Waves"

4.4 Summary of the main points of the TCP protocol
1) Connection-oriented:

Connection-oriented means that a connection must be established at both ends before sending data.

The method of establishing a connection is the "three-way handshake", so that a reliable connection can be established. Establishing a connection lays the foundation for reliable data transmission.

2) Unicast only:

Each TCP transmission connection can only have two endpoints, can only carry out point-to-point data transmission, and does not support multicast and broadcast transmission methods.

3) Byte-stream oriented:

Unlike UDP, TCP does not transmit individual messages independently; it transmits a byte stream and does not preserve message boundaries.
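Because a byte stream has no message boundaries, an application that needs messages must add its own framing. One common approach, sketched here as an assumption rather than anything TCP itself provides, is to prefix every message with its length. The helper names `send_msg`, `recv_exact`, and `recv_msg` are made up for illustration.

```python
import socket
import struct


def send_msg(sock: socket.socket, payload: bytes) -> None:
    """Send one message, prefixed with its length as 4 bytes (big-endian)."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; a single recv() may return fewer than requested."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed before the message was complete")
        buf.extend(chunk)
    return bytes(buf)


def recv_msg(sock: socket.socket) -> bytes:
    """Read the 4-byte length prefix, then exactly that many payload bytes."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)


server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
server.settimeout(2.0)
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.settimeout(2.0)
client.connect(server.getsockname())
conn, _ = server.accept()
conn.settimeout(2.0)

send_msg(client, b"first")
send_msg(client, b"second message")
messages = [recv_msg(conn), recv_msg(conn)]
print(messages)

for s in (conn, client, server):
    s.close()
```

Without the length prefix, a plain `recv()` on the server could return both messages glued together or only part of one, depending on timing.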

4) Reliable transmission:

To transmit reliably, TCP relies on segment sequence numbers and acknowledgment numbers to detect packet loss and errors.

To ensure reliable delivery, TCP assigns sequence numbers to the data it sends, which also lets the receiver deliver the data to the receiving application in order.

The receiver then sends back an acknowledgment (ACK) for the bytes it has received successfully. If the sender does not receive the acknowledgment within a reasonable round-trip time (RTT), the corresponding data is assumed lost and is retransmitted.

For the theory of reliable transmission, see "TCP/IP Detailed Explanation-Chapter 21 · TCP Timeout and Retransmission"; I won't go into it here.
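The acknowledge-or-retransmit idea can be sketched as a small simulation. This is a deliberately simplified stop-and-wait model, not how real TCP works (real TCP keeps many segments in flight and uses cumulative acknowledgments and adaptive timers). The function name `send_reliably` and the scripted `drop_attempts` set are made up for illustration.

```python
def send_reliably(segments, drop_attempts, max_retries=5):
    """Stop-and-wait sender over a simulated lossy channel.

    drop_attempts is a set of (seq, attempt) pairs that the fake network
    loses. A lost segment produces no ACK, which stands in for the
    retransmission timer expiring.
    """
    delivered, log = [], []
    for seq, data in enumerate(segments):
        for attempt in range(1, max_retries + 1):
            log.append((seq, attempt))       # record every transmission
            if (seq, attempt) in drop_attempts:
                continue                     # no ACK arrives: retransmit
            delivered.append(data)           # receiver gets it and ACKs
            break
        else:
            raise TimeoutError(f"segment {seq} lost {max_retries} times")
    return delivered, log


# Segment 1 is lost on its first two attempts and gets through on the third.
delivered, log = send_reliably(["a", "b", "c"], drop_attempts={(1, 1), (1, 2)})
print(delivered)
print(log)
```

The receiver still ends up with every segment, in order, even though the network dropped two transmissions along the way.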

5) Provide congestion control:

When the network is congested, TCP can reduce the rate and amount of data injected into the network to alleviate the congestion.

Articles about TCP congestion control are generally dry. "Easy to Understand-Deep Understanding of TCP Protocol (Part 2): RTT, Sliding Window, Congestion Handling" is relatively approachable; read it if you are interested.
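As a rough picture of how a sender can reduce its rate under congestion, the toy simulation below is loosely modelled on slow start followed by additive increase and multiplicative decrease. It is an illustration under assumed parameters, not the algorithm of any particular TCP implementation. The function name `simulate_cwnd`, the initial threshold, and the round in which loss occurs are all made up.

```python
def simulate_cwnd(rounds, loss_rounds, cwnd=1.0, ssthresh=8.0):
    """Return the congestion window (in segments) at the start of each round."""
    history = []
    for r in range(rounds):
        history.append(cwnd)
        if r in loss_rounds:
            ssthresh = max(cwnd / 2, 1.0)    # loss seen: halve the window
            cwnd = ssthresh
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)   # slow start: double each round
        else:
            cwnd += 1.0                      # steady state: grow by one segment
    return history


history = simulate_cwnd(rounds=10, loss_rounds={6})
print(history)
```

The window grows quickly at first, then cautiously, and is cut in half when loss signals congestion, which is the sense in which TCP "reduces the rate and amount of data injected into the network".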

6) TCP provides full-duplex communication:

TCP allows applications on both sides of the communication to send data at any time, because both ends of the TCP connection are equipped with buffers to temporarily store data for two-way communication.

Of course, TCP can send a data segment immediately, or buffer data for a while so that more can be sent at once (the maximum segment size is determined by the MSS).

4.5 Learn the TCP protocol more comprehensively
TCP is a rich topic; covering every aspect would take three days and three nights and still not be finished. For network application developers, though, it is enough to learn on demand, based on the technology their own applications involve.

Beginners are advised to consolidate the theory first, for example from the classic book "TCP/IP Detailed Explanation-Chapter 17 TCP: Transmission Control Protocol".

If you find the theory too dry, try the following lively introductory articles:

"Introduction to Network Programming for Lazy People (1): Quickly Understand Network Communication Protocol (Part 1)"
"Introduction to Network Programming for Lazy People (2): Quickly Understand Network Communication Protocol (Part 2)"
"Introduction to Network Programming for Lazy People (3): A quick understanding of TCP protocol is enough"
"Introduction to Network Programming for Lazy People (6): Introduction to the Function Principles of the Most Popular Hubs, Switches, and Routers in History"
"Introduction to Brain Disabled Network Programming (1): Follow the animation to learn TCP three-way handshake and four waved hands"
"Introduction to network programming has never been easier (1): If you were to design a network, what would you do?"
"Introduction to network programming has never been easier (2): If you were to design the TCP protocol, what would you do?"

In addition, while learning TCP or practicing network programming, you will need some related network knowledge. The following articles make it easy to pick up:

"Introduction to Brain Disabled Network Programming (5): What is the Ping command that I use every day?"
"Introduction to Brain Disabled Network Programming (6): What are public IP and intranet IP? What the hell is NAT?"
"Introduction to Brain Disabled Network Programming (7): Interview essentials, the most popular computer network hierarchical explanation in history"
"Introduction to Brain Disabled Network Programming (8): Do you really understand the difference between 127.0.0.1 and 0.0.0.0?"
"Introduction to Brain Disabled Network Programming (9): Interview must-know, the most popular explanation of big-endian and little-endian byte order in history"

Production applications inevitably involve high-performance, high-concurrency networking. The following articles are worth reading:

"High-performance network programming (1): How many concurrent TCP connections can a single server have"
"High-performance network programming (2): The famous C10K concurrent connection problem in the last 10 years"
"High-Performance Network Programming (3): In the next 10 years, it's time to consider C10M concurrency"
"High-performance network programming (4): Theoretical exploration of high-performance network applications from C10K to C10M"
"Understanding high performance and high concurrency from the root (1): Going deep into the bottom of the computer, understanding threads and thread pools"
"Understanding High Performance and High Concurrency from the Root (2): In-depth Operating System, Understanding I/O and Zero Copy Technology"
"Understanding high performance and high concurrency from the root (3): In-depth operating system, thorough understanding of I/O multiplexing"
"Understanding high performance and high concurrency from the root (4): In-depth operating system, thorough understanding of synchronization and asynchrony"
"Understanding high performance and high concurrency from the root (5): in-depth operating system, understanding the coroutine in high concurrency"
"Understanding high performance and high concurrency from the root (6): easy to understand, how high-performance servers are implemented in the end"
"Understanding high performance and high concurrency from the root (7): In-depth operating system, one article to understand processes, threads, and coroutines"

As you use TCP more widely, you will run into all kinds of thorny problems:

"The Unknown Network Programming (1): Analysis of the Intractable Diseases in the TCP Protocol (Part 1)"
"The Unknown Network Programming (2): Analysis of the Intractable Diseases in the TCP Protocol (Part 2)"
"Unknown Network Programming (3): Why TIME_WAIT, CLOSE_WAIT When Close TCP Connection"
"The Unknown Network Programming (4): In-depth study and analysis of TCP's abnormal shutdown"

With TCP, the more you learn, the more you realize you don't know. The following hard-to-find articles may resolve some of your lingering doubts:

"The Unknown Network Programming (10): Go Deep into the Operating System and Understand the Process of Receiving Network Packets from the Kernel (Linux)"
"Unknown Network Programming (11): Starting from the bottom, in-depth analysis of the time-consuming secrets of TCP connections"
"Unknown Network Programming (12): Thoroughly understand the KeepAlive mechanism of the TCP protocol layer"
"Unknown Network Programming (13): In-depth operating system, thoroughly understand 127.0.0.1 native network communication"

5. Summary

The difference between TCP and UDP can be summarized in the following table:

Simply put, the difference between TCP and UDP is:

1) TCP provides a connection-oriented, reliable service to the upper layer, while UDP provides a connectionless, unreliable one;
2) Although UDP's delivery is not as dependable as TCP's, it is valuable in many scenarios with strict real-time requirements;
3) When data accuracy matters most and some loss of speed is acceptable, use TCP.

Finally, I want to use a picture to vividly summarize the difference between TCP and UDP:

As the picture shows, TCP is like the girl on the left, drinking methodically without spilling a drop, while UDP is like the girl on the right: drink as much as you can manage...

6. Series of articles

This article is the 13th article in a series. The outline of this series is as follows:

[1] Introduction to Network Programming for Lazy People (1): Quickly Understand Network Communication Protocol (Part 1)
[2] Introduction to Network Programming for Lazy People (2): Quickly Understand Network Communication Protocol (Part 2)
[3] Introduction to Network Programming for Lazy People (3): A quick understanding of TCP protocol is enough
[4] Introduction to Network Programming for Lazy People (4): Quickly understand the difference between TCP and UDP
[5] Introduction to Network Programming for Lazy People (5): Quickly understand why UDP is sometimes more advantageous than TCP
[6] Introduction to network programming for lazy people (6): Introduction to the most popular hubs, switches, and routers in history
[7] Introduction to Network Programming for Lazy People (7): Explain the basics in a simple way, fully understand the HTTP protocol
[8] Introduction to Network Programming for Lazy People (8): Teach you how to write TCP-based Socket long connections
[9] Introduction to Network Programming for Lazy People (9): Popular explanation, with an IP address, why use a MAC address?
[11] Introduction to Network Programming for Lazy People (10): Quickly understand the QUIC protocol in the time of a bathroom break
[12] Introduction to Network Programming for Lazy People (11): One article to understand what IPv6 is
[13] Introduction to Network Programming for Lazy People (12): Quickly understand the Http/3 protocol, one article is enough!
[14] Introduction to Network Programming for Lazy People (13): Quickly understand the difference between TCP and UDP in the time of a bathroom break (* this article)



JackJiang

Focused on learning and researching instant messaging (IM/push) technology.