Text | Zeng Ke (nickname: Yi Si)
Senior Engineer at Ant Group
Responsible for building Ant Group's access layer
Main focus: the design and optimization of high-performance secure network protocols
This article is 10,279 words, about an 18-minute read
PART. 1 Introduction
As the opening article of the series, this introduction is a little long-winded, so that everyone can get a basic picture of what the series is about.
First, the background behind this series. Terms like QUIC and HTTP/3 are hardly unfamiliar; in my observation, most developers already have some background knowledge, for example that the core of HTTP/3 is using QUIC to provide the transport-layer and TLS-layer capabilities. But when it comes to the details, most of us know very little. Most articles on HTTP/3 give only a brief introduction to some of its mechanisms and features, with little in-depth analysis, and analysis of the reasons and design ideas behind those mechanisms is rarer still.
From my own (modest) experience of reading RFCs and writing drafts: like papers, RFCs keep themselves concise and precise (and keep the writing process manageable) by bringing in related protocols through direct citation. This makes learning a network protocol by reading the RFC directly a rather steep process: just as you reach a key part, you have to jump to other documents, and this headache repeats; by the time you return to the original document, you may have forgotten its context.
And HTTP/3 involves the QUIC, TLS, HTTP/2, and QPACK standards, each of which in turn has a large body of related documents, so learning it is no easy task.
Of course, this series is titled "Deep Dive into HTTP/3" rather than "Deep Dive into QUIC". The reason is that HTTP/3 is not just QUIC as a single point: it is the organic combination of a large amount of the existing HTTP protocol family with QUIC. Later articles in the series will analyze this part in depth.
Whether a protocol performs well depends not only on its own design, but also on a large amount of engineering practice: software and hardware optimization, architecture and implementation, and special-purpose design. So this series will share not only the features of HTTP/3 itself, but also Ant Group's HTTP/3 solution.
The end of the introduction is also the formal beginning of this article.
It is said that when learning new knowledge, people are more accustomed to reasoning by analogy and inference from what they already know, in order to build a deeper intuitive and rational understanding. For most readers, "Why does TCP take a three-way handshake to open and a four-way wave to close?" is a question as classic as classic gets, so today's article will likewise start from the process of establishing and closing a QUIC connection, and with that begin the first article of our series.
PART. 2 Connection establishment
2.1 Revisiting TCP
"Why does TCP need a three-way handshake?"
Before answering that, we need to understand what TCP fundamentally is. TCP is designed as a connection-oriented, reliable, byte-stream-based, full-duplex transport-layer communication protocol.
"Reliable" means that when data is transmitted over TCP, if TCP reports success, the data is guaranteed to have reached the peer. To ensure reliable delivery, we first need an acknowledgment mechanism to confirm that the peer has received the data, and that is the ACK mechanism we are all familiar with.
"Byte stream" is an abstraction presented to the user (sender and receiver need not care about the underlying transmission and simply write and read data as one continuous stream of bytes), and this abstraction depends heavily on in-order delivery, so we need a mechanism to guarantee data ordering. TCP's design is to tag the bytes it sends with a sequence number, seq (in practice one seq covers a range of bytes, but the real effect is that every byte is numbered in order). The receiver checks the seq of incoming data against the seq it currently records for the peer to confirm the data's order.
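A toy sketch may make this ordering mechanism concrete. The following is illustrative Python, not real TCP: segments are tagged with a byte offset playing the role of seq, and the receiver buffers out-of-order arrivals until the gap in front of them is filled.

```python
# Toy sketch (not real TCP): rebuild an in-order byte stream from segments
# tagged with a sequence number, buffering out-of-order arrivals.
def reassemble(segments, initial_seq=0):
    """segments: iterable of (seq, data) pairs in arrival order."""
    buffered = {}           # seq -> data, for segments that arrived early
    expected = initial_seq  # next byte offset deliverable to the application
    stream = b""
    for seq, data in segments:
        buffered[seq] = data
        # Deliver every contiguous segment starting at `expected`.
        while expected in buffered:
            chunk = buffered.pop(expected)
            stream += chunk
            expected += len(chunk)
    return stream
```

For example, if the segment at seq 2 arrives last, the bytes at seq 5 are held back until the gap is filled, and only then is the whole stream delivered in order.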
"Full duplex" means that at either end of the connection, both receiving and sending are reliable and stream-oriented, and receiving and sending are two completely independent activities that do not interfere with each other.
As you can see, all of these TCP features are implemented with the seq and ACK fields as their carrier, and every TCP interaction serves the features above. The three-way handshake is no exception. Let's look at a schematic diagram of the TCP three-way handshake:
To ensure that both parties can confirm the ordering of the peer's data, each endpoint must record the peer's current seq and confirm that the peer has synchronized its own seq. Guaranteeing this takes at least four messages; for efficiency, the actual implementation carries the server's seq and its ACK in one message, which gives us the familiar three-way handshake.
Of course, the three-way handshake does more than synchronize seq; it also verifies that the client is a normal client. For example, TCP may face problems like these:
(1) Some TCP attacks send only SYN requests and never any data, wasting socket resources;
(2) A stale connection-request segment suddenly reaches the server, with no subsequent data ever following. How do we prevent such a request from wasting resources?
These are problems the three-way handshake happens to solve in passing, not problems it was specifically designed for.
The careful reader may have spotted an issue: if we agree that the client's and server's seq both start from 0 (or from some fixed value everyone knows), can't we skip synchronizing seq? Then the troublesome three-way handshake seems unnecessary. Can't we just start sending data directly?
Of course the protocol's designers considered this scheme. As for why it was not done this way, let's look at the problems TCP faces in the next section.
2.2 The problems TCP faces
2.2.1 seq attacks
In the previous section we mentioned that TCP relies on seq and ACK to achieve its reliable, stream-oriented, full-duplex transmission model, but in practice a three-way handshake is needed to synchronize the seq on both ends. If the two parties agreed on the initial seq in advance, the three-way handshake could actually be avoided. So why wasn't it done that way? The answer is security.
We know that TCP data has no security protection at all, neither header nor payload; an attacker can forge a legitimate-looking TCP segment from any corner of the world.
A typical example: an attacker can forge a reset (RST) segment to forcibly close a TCP connection. The key to a successful attack is the seq and ACK fields: as long as these two values fall within the receiver's sliding window, the segment is considered legal. TCP therefore uses a random seq at handshake time (not completely random, but increasing roughly linearly with time and wrapping around at 2^32) to make the seq harder for an attacker to guess, thereby improving security.
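A back-of-the-envelope sketch may help show why the random initial seq matters. This is a simplified model (real stacks apply stricter checks than plain "in window"): a forged RST is only honoured if its seq lands inside the receiver's window, so randomizing the starting point shrinks each blind guess to roughly a window/2^32 chance.

```python
WINDOW = 65535  # a typical receive window size (illustrative)

def rst_accepted(rst_seq, rcv_nxt, rcv_wnd=WINDOW, mod=2**32):
    """Simplified rule: a forged RST is honoured only if
    rcv_nxt <= rst_seq < rcv_nxt + rcv_wnd (mod 2^32)."""
    return (rst_seq - rcv_nxt) % mod < rcv_wnd

# If seq always started at a known value, the attacker's first guesses would
# land in-window; with a random ISN each blind guess hits with probability
# of only about rcv_wnd / 2^32.
```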
For this reason as well, TCP has to perform a three-way handshake to synchronize seq. This method has some effect against off-path attackers, but is completely ineffective against on-path attackers, who can still reset the connection at will, forge segments, and even tamper with user data.
So although TCP has made some efforts toward security, it is in essence a transmission protocol, and security was not among its original considerations. In today's network environment, TCP runs into a great many security problems.
2.2.2 Unavoidable data security issues
I believe terms like SSL/TLS/HTTPS are familiar to everyone. TLS (Transport Layer Security) essentially solves the security problem of the TCP payload, which is of course the most important problem.
For example, a user might tolerate a failed transfer, but certainly cannot tolerate the money landing with an attacker instead. TLS gives users a mechanism that ensures a man-in-the-middle can neither read nor tamper with TCP payload data, and it also provides a secure identity authentication system to keep attackers from impersonating a web service provider. The TCP header, however, remains outside the scope of protection: an on-/off-path attacker still, in theory, has the ability to close the TCP connection at any time.
2.2.3 The efficiency cost of security
In today's network environment, secure communication has become the most basic requirement. Readers familiar with TLS know that TLS also requires handshake interactions. Although TLS has been hardened through years of practice and has designed and implemented a large number of optimizations (TLS 1.3, session resumption, PSK, 0-RTT, and similar technologies), the layered design of TLS over TCP means that establishing a secure data channel is still a fairly cumbersome process. Take the creation of a brand-new TLS 1.3 secure data channel as an example; the detailed interaction looks like this:
As you can see, before a client can formally start sending application-layer data, three RTTs of interaction are needed, which counts as a very large overhead. Procedurally, the TCP handshake and the TLS handshake look rather similar, and merging them seems feasible. There have indeed been documents discussing the feasibility of embedding the ClientHello in the SYN segment, but for the following reasons this line of exploration slowly came to a halt:
- TLS is itself a protocol designed around in-order transport; integrating it into TCP would require a great deal of redesign;
- For security reasons, TCP's SYN segment is designed not to carry data. Making it carry a ClientHello would require extensive changes to the protocol stack, and since TCP is a core protocol stack, changing and iterating on it is a painful and hard-to-land process;
- The new protocol would be hard to make compatible with traditional TCP, so the odds of large-scale adoption are very low.
2.2.4 Design issues of TCP itself
TCP was designed against a historical background where the network was nowhere near as complex as today and bandwidth was the bottleneck of the entire network, so TCP's field design is extremely lean. The result, however, is that the control channel and the data channel are coupled together in the design, which causes problems in certain scenarios.
For example:
Ambiguity of seq: imagine the sender sends a TCP segment that gets delayed by a congested intermediate device; having received no ACK for a long while, the sender retransmits the segment. The retransmission reaches the receiver, then the delayed original arrives as well, and the receiver replies with only a single ACK. When the sender receives that ACK, it cannot tell whether the ACK is for the delayed packet or for the new one. The consequence is inaccurate RTT estimation, which affects the behavior of the congestion control algorithm and reduces network efficiency.
Hard-to-use TCP keepalive: suppose the peer of a TCP connection suddenly goes offline due to a power failure, and we do not know the connection is gone. Failed sends then trigger retransmission, and because retransmitted packets take priority over keepalive probes, the keepalive probes cannot be sent out; only after a long run of failed retransmissions can we conclude the connection is broken.
Head-of-line blocking: strictly speaking this is not TCP's own fault. TCP is a connection-oriented protocol that guarantees reliable delivery of data on one connection, and by that measure it has done its job. But as the Internet spread, people transmit more and more data over the network. If all of it is carried on one TCP connection and one piece of data is lost, the transmission of everything behind it is blocked, which severely hurts efficiency. Of course, using multiple TCP connections is one workaround, but multiple connections bring new overhead and connection-management problems.
Knowing these problems of TCP, we can pick apart QUIC's series of complex mechanisms one by one and see the origins of QUIC's design.
2.3 QUIC's converged design
Like TCP, QUIC's primary goal is to provide a reliable, ordered, stream-oriented protocol. Beyond that, QUIC must also natively guarantee data security and transmission efficiency.
It is fair to say that QUIC benchmarks itself against TCP+TLS with a leaner, more efficient set of mechanisms. And just as with TCP+TLS, the essence of QUIC's connection establishment process is to serve the features above. Since QUIC is a protocol redesigned from scratch on top of UDP, it carries little historical baggage. The new protocol's demands are:
With the demands sorted out, let's take a look at what QUIC achieves.
First, the establishment of a QUIC connection. A rough schematic of QUIC connection establishment looks like this:
As you can see, compared to TCP+TLS, QUIC needs only 1.5 RTTs to complete connection establishment, a great improvement in efficiency. Readers familiar with TLS may notice that QUIC's connection process does not look much different from a TLS handshake. Yet TLS is itself a protocol that strongly depends on ordered, reliable data transport, while QUIC relies on TLS to achieve its ordered, reliable capabilities. This looks like a chicken-and-egg problem; how does QUIC solve it?
We need a deeper look at the QUIC connection process. The rough schematic only helps us get a general feel for QUIC's efficiency relative to TCP+TLS; let's look closely at a more refined view of QUIC connection establishment:
The figure here is a bit busy, and it leaves out the details of the TLS handshake (QUIC's TLS design will get a dedicated article later in the series). The whole flow is actually a request-response model just like TCP's, but compared with TCP+TLS we do see some differences:
1. The figure adds the concepts of "initial packet", "handshake packet", and "short-header packet";
2. The figure adds the concept of pkt_number and the concept of stream + offset;
3. The way the pkt_number subscripts change looks a little strange.
These differing mechanisms are precisely where QUIC's implementation is more efficient than TCP's. Let's analyze them one by one.
2.3.1 The design of pkt_number
In the flow diagram, pkt_number looks similar to TCP's seq field. There are nonetheless many differences; one could say pkt_number was designed precisely to solve the TCP problems described earlier. Let's look at its design:
- Numbering starts from 0
As mentioned earlier, if TCP's seq were a field starting from 0, then no handshake would be needed before ordered data transmission could begin. So the solution to the chicken-and-egg problem between TLS and ordered, reliable transport is very simple: pkt_number starts counting from 0, which directly guarantees the ordering of the TLS data.
- Encrypting pkt_number for security
Of course, starting pkt_number from 0 runs into the same security problems as TCP. The solution is also very simple: encrypt pkt_number. Once pkt_number is encrypted, a man-in-the-middle cannot obtain the key that protects it, and therefore cannot recover the actual pkt_number, so subsequent transmissions cannot be predicted by observing pkt_number. But here lies yet another problem: TLS needs a completed handshake to produce a key the middleman cannot obtain, while pkt_number exists before the TLS handshake. That again looks like a chicken-and-egg problem. As for its solution, let's keep it in suspense here and leave it for the later feature article on QUIC-TLS.
- A fine-grained pkt_number space design
Strictly speaking, TLS is not a strictly forward-progressing protocol: each time it enters a new state, it may still receive data belonging to the previous state. That is a bit abstract, so here is an example.
TLS 1.3 introduces 0-RTT, a technology that lets the client send some application-layer data at the same time it initiates the TLS handshake via ClientHello. Naturally, we expect the application-data flow to be asynchronous and non-interfering with respect to the handshake flow; if both were marked with the same pkt_number sequence, loss of application-data packets would inevitably disturb the handshake. QUIC therefore designs three different pkt_number spaces for the handshake states:
(1) Initial;
(2) Handshake;
(3) Application Data.
These correspond, respectively, to:
(1) the plaintext data exchanged during the TLS handshake, i.e. the initial packets in the figure;
(2) the handshake data encrypted with the TLS traffic secret, i.e. the handshake packets in the figure;
(3) the application-layer data after the handshake completes plus the 0-RTT data, i.e. the short-header packets in the figure and the 0-RTT packets not shown in the figure.
Three different spaces ensure that loss detection for the three phases does not interfere across phases. This part will be analyzed in depth later in the series (on QUIC loss detection).
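A minimal sketch (illustrative Python, not a real QUIC stack) of what "three packet-number spaces" means in practice: each encryption level counts its own packets from 0, so sending and acknowledgment tracking in one space never touches another.

```python
# Sketch: QUIC keeps an independent packet-number sequence per encryption
# level, so loss detection in one space never disturbs another.
class PacketNumberSpaces:
    SPACES = ("initial", "handshake", "application")

    def __init__(self):
        # Every space starts counting from 0, independently.
        self.next_pn = {space: 0 for space in self.SPACES}

    def send(self, space):
        """Assign the next packet number in the given space."""
        pn = self.next_pn[space]
        self.next_pn[space] += 1
        return pn
```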
- An always-increasing pkt_number
"Always increasing" here means that the plaintext pkt_number increments by 1 with every QUIC packet sent. The self-increment of pkt_number solves the ambiguity problem: on receiving the ACK for a given pkt_number, the sender knows clearly whether it was the retransmitted packet or the new packet that was ACKed, so RTT estimation and loss detection can be much more refined. But a self-incrementing pkt_number alone cannot guarantee the ordering of the data, so let's look at what mechanism QUIC provides to guarantee ordering.
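The difference can be sketched in a few lines (illustrative Python with made-up timestamps): because QUIC never reuses a packet number, even when retransmitting the same data, an ACK identifies exactly one send time and the RTT sample is unambiguous, unlike TCP, where the original and the retransmission share one seq.

```python
def rtt_sample(send_times, acked_pn, ack_time):
    """send_times maps each pkt_number to its (unique) send timestamp,
    so one ACK yields exactly one unambiguous RTT sample."""
    return ack_time - send_times[acked_pn]

# The same CRYPTO data sent twice still gets two distinct packet numbers:
send_times = {0: 1.0,   # original packet
              1: 2.0}   # retransmission of the same data
```

An ACK for packet 1 at time 2.3 is clearly a sample for the retransmission, not the original, which is exactly the distinction TCP cannot make.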
2.3.2 Ordered transmission based on streams
We know that QUIC is a protocol implemented on top of UDP, and UDP is an unreliable, datagram-oriented protocol. This is not essentially different from TCP being implemented on top of the IP layer; in both cases:
(1) the lower layer is responsible only for best-effort, packet-based delivery;
(2) the upper-layer protocol implements the more critical features, such as reliability, ordering, and security.
We know from the foregoing that TCP's design causes head-of-line blocking at the connection level, and that ordering cannot be achieved with pkt_number alone, so QUIC necessarily needs a finer-grained mechanism to solve these problems:
- The stream: both an abstraction and a unit
The root cause of TCP head-of-line blocking is that one connection has only one delivery stream, so the stall of any packet in that stream holds up all other data. The solution is not complicated: we just abstract multiple streams over one QUIC connection. The overall idea looks like this:
As long as each stream's delivery can be guaranteed to be independent, we have actually avoided head-of-line blocking at the level of the QUIC connection itself; that is, when one stream is blocked, we can still send data on the other streams.
With the single-connection, multi-stream abstraction in place, let's look at QUIC's design for ordered delivery. At the stream level, QUIC in fact has an even finer-grained unit called the frame. A frame that carries data carries an offset field indicating its offset relative to the start of the data, with the initial offset being 0. The idea is the same as having pkt_number start at 0: data transmission can begin without a handshake. Readers familiar with HTTP/2 and gRPC will recognize the design of this offset field; it is the same as streaming data transfer there.
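A sketch of per-stream ordering (illustrative Python, not a real implementation): each frame carries (stream_id, offset, data), and a gap on one stream delays only that stream while the others keep delivering.

```python
# Sketch: per-stream reassembly keyed by (stream_id, offset). A gap on one
# stream delays only that stream; other streams keep delivering.
def deliver_streams(frames):
    """frames: iterable of (stream_id, offset, data) in arrival order.
    Returns stream_id -> contiguous bytes deliverable from offset 0."""
    pending = {}   # stream_id -> {offset: data} awaiting delivery
    out = {}       # stream_id -> bytes already delivered in order
    for sid, off, data in frames:
        pending.setdefault(sid, {})[off] = data
        out.setdefault(sid, b"")
        # Deliver as far as the contiguous prefix of this stream allows.
        while len(out[sid]) in pending[sid]:
            out[sid] += pending[sid].pop(len(out[sid]))
    return out
```

Here stream 8 is stuck waiting for its offset-0 data, but stream 4 delivers everything, which is exactly the escape from connection-level head-of-line blocking described above.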
- The TLS handshake is a stream too
Although TLS data carries no explicit stream ID, it can be regarded as a special stream, or rather the initial stream upon which all other streams can be established, because it is in fact carried in a dedicated frame type that likewise uses the offset field. That is where the ordering of the TLS data is guaranteed.
- Frame-based control
With the frame abstraction, we can of course do more. Besides carrying actual data, frames can also carry control data. QUIC's design draws on TCP's experience and lessons: for keepalive, ACK, stream operations, and other behaviors, dedicated control frames are defined. This achieves complete decoupling of data from control while guaranteeing fine-grained, per-stream control, making QUIC a more refined transport-layer protocol.
Having come this far, we can see that the discussion of QUIC's connection establishment process has clarified the goals of QUIC's design, just as this article has kept emphasizing:
"Whether connection establishment or any other process, everything serves the realization of QUIC's features."
We now have some understanding of QUIC's features and their implementation; let's summarize:
At this point, some of the designs in the QUIC connection establishment process will no longer trouble us with their complicated appearance; we can go straight to their essence, because these designs are minor points hanging off the connection-establishment framework, even though such points tend to occupy a great deal of space in the RFC and consume the reader's attention.
Take, for example, the amplification attack against QUIC and how it is handled. The principle of the amplification attack is that the ClientHello data in the TLS handshake is very small, while the server may respond with a great deal of data, which can form an amplification attack. For example, attacker A initiates a large number of ClientHellos but rewrites its source IP to that of victim B, so that A multiplies its own traffic to attack B. The mitigation is also very simple: QUIC requires each client's first packet to be padded to a certain length, provides an address validation mechanism on the server side, and at the same time limits the amount of data the server may send in response before the handshake completes.
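The two rules can be sketched as follows (the 1200-byte minimum for the client's first datagram and the three-times send limit before address validation are the figures given in RFC 9000; the function names are mine):

```python
# Anti-amplification sketch per RFC 9000 (function names are illustrative).
MIN_INITIAL_SIZE = 1200     # client's first datagram must be padded to this
AMPLIFICATION_FACTOR = 3    # pre-validation send limit multiplier

def accept_initial(datagram_len):
    """A server drops undersized Initial datagrams, forcing the attacker
    to spend at least 1200 bytes per spoofed handshake attempt."""
    return datagram_len >= MIN_INITIAL_SIZE

def server_send_budget(bytes_received, bytes_sent, address_validated):
    """Before the client's address is validated, the server may send at
    most 3x the bytes it has received from that address."""
    if address_validated:
        return float("inf")
    return max(0, AMPLIFICATION_FACTOR * bytes_received - bytes_sent)
```

Together these cap the amplification ratio at 3x, which makes QUIC a poor reflector compared with an unpadded, unlimited handshake.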
RFC 9000 spends an entire chapter introducing this mechanism, yet its essence is just a patch on QUIC's existing handshake process; the handshake was not designed for the sake of this mechanism.
PART. 3 Connection closure
3.1 Looking at QUIC's graceful connection closure from TCP
Closing a connection is a simple demand, which can be sorted into two goals:
1. A user can actively and gracefully close the connection, and can notify the peer to release its resources.
2. When one end of the communication can no longer maintain the connection, there is a mechanism to notify the other side.
The demand is very simple, but TCP's implementation of it is not. Let's look at the state-machine transition diagram for the TCP connection closing process:
This process looks complicated enough, and there are even more problems buried in it, such as the classic interview questions:
"Why is TIME_WAIT needed?"
"What should I do when there are too many connections in TIME_WAIT?"
"What do kernel parameters like tcp_tw_reuse and tcp_tw_recycle do, and how do they differ?"
The root cause of all these problems is the binding of the TCP connection to its stream, in other words the coupling of control signaling with the data channel.
We can't help but ask the soul-searching question: "What we need is a full-duplex data transmission model, but do we really have to realize it at the connection level?" There seems to be a missing abstraction here. Taking TCP's TIME_WAIT design as the example, let's look at TCP's problem from the bottom up:
Back to our question: if we separate streams from the connection, guarantee reliable delivery of stream control instructions at the connection level, and let the connection itself implement a simple simplex closing process, that is, whichever communicating end actively closes, the whole connection closes, doesn't everything become simple?
This is of course QUIC's solution. With this layer of thinking, let's sort out the demands for closing a QUIC connection:
Setting aside the design of the stream closing process (the part about streams will be covered in the series' article on stream design), at the connection level we get a refreshingly simple state machine:
As you can see, thanks to the simplex closing model, the entire QUIC connection closing process involves only one closing instruction, CONNECTION_CLOSE in the figure, and only two closing states, closing and draining in the figure. Let's first look at the endpoint's behavior in the two states:
- closing state: entered when the user actively closes the connection. In this state, on receiving any further application-layer data, the endpoint replies only with CONNECTION_CLOSE
- draining state: entered when the endpoint receives a CONNECTION_CLOSE. In this state the endpoint no longer replies with any data at all
More simply still, CONNECTION_CLOSE is an instruction that needs no ACK, which means it needs no retransmission. Because at the connection level we only need to ensure that the old connection can eventually be closed, and that new connections are unaffected by stale close instructions, this one simple CONNECTION_CLOSE instruction satisfies all the demands.
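The two states can be sketched as a tiny state machine (illustrative Python; a real endpoint also rate-limits its CONNECTION_CLOSE replies and eventually discards all state after a timeout):

```python
# Sketch of the connection-close states described above: an endpoint that
# closes enters "closing" and only ever replies with CONNECTION_CLOSE;
# a peer that receives CONNECTION_CLOSE enters "draining" and sends nothing.
class Endpoint:
    def __init__(self):
        self.state = "open"

    def close(self):
        """Actively close: enter closing and emit CONNECTION_CLOSE."""
        self.state = "closing"
        return "CONNECTION_CLOSE"

    def receive(self, packet):
        if self.state == "draining":
            return None                   # draining: send nothing at all
        if packet == "CONNECTION_CLOSE":
            self.state = "draining"
            return None
        if self.state == "closing":
            return "CONNECTION_CLOSE"     # the only reply we still send
        return "ACK"                      # placeholder for normal processing
```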
3.2 A safer way to reset
Of course, connection closure also comes in many situations. Like TCP, besides the mode in which a QUIC endpoint actively closes the connection as described in the previous section, QUIC also needs the ability to reset the peer's connection directly when it cannot reply with a response.
QUIC's way of resetting the peer's connection is more secure than TCP's. The mechanism is called stateless reset, and it is not very complicated: after the QUIC connection is established, the two endpoints synchronize a token, and a subsequent reset of the connection is validated against this token to decide whether the peer has the right to reset the connection. This mechanism fundamentally forecloses the malicious-reset attack on TCP described earlier. The whole process looks like this:
Of course, the stateless reset scheme is no silver bullet. The price of safety is a narrower range of applicability, because to guarantee safety, the token must be transmitted over the secure data channel (in QUIC, via the stateless_reset_token transport parameter or the NEW_CONNECTION_ID frame), and the receiving end must maintain state recording this token; only through this state can the token's validity be guaranteed.
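The validation step can be sketched as follows (illustrative Python; in a real stack the candidate token is taken from the trailing 16 bytes of the suspect datagram, and the comparison must be constant-time so the token is not leaked byte by byte):

```python
import hmac  # hmac.compare_digest gives a constant-time comparison

class ConnectionState:
    """Sketch: a stateless reset is honoured only if the token in the
    incoming datagram matches the one previously received over the
    encrypted channel."""
    def __init__(self, reset_token):
        self.reset_token = reset_token   # learned during the handshake
        self.closed = False

    def on_possible_stateless_reset(self, candidate_token):
        if hmac.compare_digest(candidate_token, self.reset_token):
            self.closed = True           # peer proved the right to reset
            return True
        return False                     # forged or stale: ignore it
```

An off-path attacker who never saw the encrypted channel cannot produce a matching token, which is precisely the guarantee TCP's RST lacks.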
Therefore, stateless reset is reserved as the last resort for closing a QUIC connection, and it can only be used when client and server are otherwise in a relatively normal state. For example, stateless reset does not apply in this case: the server is not listening on port 443 at all, yet the client sends data to port 443. Under the TCP protocol stack, the kernel could simply reply with an RST in this situation.
3.3 Timeouts and disconnection: engineering considerations
The keepalive mechanism itself involves no tricks: a timer plus a probe message gets it done. But QUIC benefits from the split between the connection and its data streams: closing the connection becomes a very simple matter, and keepalive becomes simpler and easier to use as well. QUIC provides a connection-level control instruction named the PING frame for active probing and keepalive.
More simply still, QUIC's way of closing a connection after a timeout is the silent close: without notifying the peer, the endpoint directly releases all of the connection's resources. The advantage of silent close is that resources can be released immediately, which is a great benefit especially for QUIC, where a single connection must maintain both TLS state and stream state.
Its drawback is that if data from the old connection arrives later, the peer can only be told to close the connection through a stateless reset, and closing via stateless reset is relatively more expensive than CONNECTION_CLOSE. So this part is entirely a trade-off, and its final design rests more on the experience and results of extensive engineering practice.
PART. 4 Protocol evolution as seen from QUIC connection establishment and closure
From TCP to QUIC is only one evolution in network protocol technology, but through it we can also glimpse the development trend of the entire network. Connection establishment and closure are merely our entry point into the QUIC protocol. As this article has kept emphasizing, whether establishment or closure, every process exists to realize QUIC's features, and besides analyzing the QUIC connection establishment and closing processes in detail, this article has also tried to summarize the origin of these features and design ideas.
Reading through the full text, we can see that a modern network protocol can no longer sidestep the demand for security. It might be said that "security is the foundation of everything, and efficiency is the eternal pursuit."
QUIC starts precisely from the idea of converging layered protocols, unifying the two interactive demands of security and reliability. This seems to remind us that the development of future protocols need not strictly follow the OSI model: layering serves the division of labor and coordination among components, while convergence serves the ultimate pursuit of performance. TCP+TLS can converge into QUIC; likewise, as with the NEW IP technology proposed by Huawei, if combined with technologies such as intelligent routing, it is not impossible that all network protocols at layer 3 and above could converge into a brand-new secure IP protocol.
Of course, that is looking far ahead. QUIC itself is a very down-to-earth protocol: in the open-source-led process of forming its standard, it absorbed a great deal of engineering experience, so it carries few overly idealized traits and its extensibility is very strong. I believe this working model will also be a mainstream model in the future.
Concluding remarks
Writing this article was a painful process. Just as with reading the RFCs, it is almost impossible to make the technology of any one direction of QUIC fully self-contained within one article, so this article chose to treat some of the dependent technologies only lightly, hoping to give each a focused introduction in its own article later.
Therefore, if readers want to understand HTTP/3 or QUIC thoroughly, please follow the subsequent articles of the series and read them through for a deeper appreciation.
Of course, this article is based on the author's own personal understanding, and shortcomings are inevitable. If readers find related problems, feel free to discuss them in depth together.
Recommended reading this week
The next five years of cloud native runtime
Accumulate a Thousand Miles: A Summary of the Landing of the QUIC Agreement in the Ant Group
Service Mesh Exploration and Practice in Industrial and Commercial Bank of China