Overview: As a leading audio and video service provider, NetEase Yunxin is committed to delivering a top-tier audio and video call experience and reliable audio and video services even in harsh network environments. Keeping those services reliable under extremely weak network conditions is NetEase Yunxin's top priority. This article describes how NetEase Yunxin has applied and optimized the QUIC protocol.
Introduction
Several advantages of the QUIC protocol over TCP at the transport level
- 0-RTT connection
QUIC runs over UDP, so it does not need a separate transport-layer handshake such as TCP's three-way handshake. Its cryptographic handshake (TLS 1.3 in IETF QUIC, using (EC)DHE key exchange) completes key negotiation with the peer in a single RTT. When a client reconnects to a server it has talked to before, QUIC uses TLS 1.3 session resumption and sends encrypted application data in its very first flight as 0-RTT data (TLS early_data).
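To make the RTT savings concrete, the sketch below counts the round trips spent on handshakes before the first request can be sent, plus one round trip for the request and response. The counts are textbook values for full handshakes and assume no TCP Fast Open, no TLS False Start, and no packet loss; the names `HANDSHAKE_RTTS` and `time_to_first_response_ms` are illustrative only.

```python
# Illustrative handshake round-trip counts (full handshakes, no packet loss,
# no TCP Fast Open or TLS False Start).
HANDSHAKE_RTTS = {
    "TCP + TLS 1.2": 1 + 2,  # TCP three-way handshake + full TLS 1.2 handshake
    "TCP + TLS 1.3": 1 + 1,  # TCP three-way handshake + TLS 1.3 handshake
    "QUIC 1-RTT": 1,         # transport and TLS 1.3 handshakes are combined
    "QUIC 0-RTT": 0,         # resumed session: the request rides in the first flight
}


def time_to_first_response_ms(protocol: str, rtt_ms: float) -> float:
    """Handshake round trips plus one round trip for the request and its response."""
    return (HANDSHAKE_RTTS[protocol] + 1) * rtt_ms


if __name__ == "__main__":
    for name in HANDSHAKE_RTTS:
        print(f"{name:<14} {time_to_first_response_ms(name, 200):5.0f} ms at 200 ms RTT")
```

At a 200 ms RTT this gives 800, 600, 400, and 200 ms respectively, i.e. QUIC saves 2 to 3 RTTs relative to TCP + TLS 1.2, and the absolute saving grows with the RTT.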
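- Multiplexing without head-of-line blocking
HTTP/2 multiplexes many streams over a single TCP connection, so one lost TCP segment stalls every stream until it is retransmitted (head-of-line blocking at the transport layer). QUIC streams are independent at the transport layer: a lost packet delays only the streams whose data it carried, so multiplexing works considerably better.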
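The difference can be shown with a toy model. Assumptions for this sketch: each packet carries one chunk of one stream, packets are sent in id order, and exactly one packet is lost, with its retransmission arriving after all other packets. The functions return, for each packet, the arrival step at which its data can be handed to the application.

```python
from typing import Dict, List, Tuple

Packet = Tuple[int, str]  # (packet id, stream id)


def arrival_order(packets: List[Packet], lost: int) -> List[Packet]:
    """All packets except the lost one, then its retransmission last."""
    kept = [p for p in packets if p[0] != lost]
    return kept + [p for p in packets if p[0] == lost]


def tcp_delivery(packets: List[Packet], lost: int) -> Dict[int, int]:
    """One ordered byte stream (HTTP/2 over TCP): data is delivered only after
    every earlier packet, from any stream, has arrived."""
    delivered: Dict[int, int] = {}
    received = set()
    next_expected = 0
    for t, (pid, _stream) in enumerate(arrival_order(packets, lost)):
        received.add(pid)
        while next_expected in received:
            delivered[next_expected] = t
            next_expected += 1
    return delivered


def quic_delivery(packets: List[Packet], lost: int) -> Dict[int, int]:
    """Independent QUIC streams: ordering is enforced only within each stream."""
    per_stream: Dict[str, List[int]] = {}
    for pid, stream in packets:
        per_stream.setdefault(stream, []).append(pid)
    position = {s: 0 for s in per_stream}
    received = set()
    delivered: Dict[int, int] = {}
    for t, (pid, stream) in enumerate(arrival_order(packets, lost)):
        received.add(pid)
        order = per_stream[stream]
        while position[stream] < len(order) and order[position[stream]] in received:
            delivered[order[position[stream]]] = t
            position[stream] += 1
    return delivered
```

With packets alternating between streams A and B and packet 1 (stream B) lost, stream A's packets 0, 2, and 4 are delivered at steps 0, 1, and 3 under the QUIC model, but at steps 0, 5, and 5 under the TCP model, where they wait for stream B's retransmission.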
- Connection migration
TCP identifies a connection by its 4-tuple (source IP, source port, destination IP, destination port). QUIC instead identifies a connection by a Connection ID (64 bits in Google QUIC; variable length, up to 20 bytes, in IETF QUIC). This enables connection migration: when the 4-tuple changes, for example when a client switches from Wi-Fi to a cellular network, QUIC tries to keep the existing connection alive, so data transfer continues without interruption.
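A minimal sketch of why the Connection ID makes migration possible: the server looks connections up by Connection ID rather than by client address. `Connection` and `Demultiplexer` are hypothetical names; real QUIC additionally validates a new path with PATH_CHALLENGE/PATH_RESPONSE frames and may switch to a fresh Connection ID, both of which are omitted here.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Address = Tuple[str, int]  # client (IP, UDP port) as seen by the server


@dataclass
class Connection:
    cid: bytes
    peer: Address
    bytes_received: int = 0


@dataclass
class Demultiplexer:
    """Routes incoming UDP datagrams to connections by Connection ID.
    A TCP stack would key this table on the 4-tuple instead."""
    by_cid: Dict[bytes, Connection] = field(default_factory=dict)

    def accept(self, cid: bytes, peer: Address) -> Connection:
        conn = Connection(cid, peer)
        self.by_cid[cid] = conn
        return conn

    def on_datagram(self, cid: bytes, peer: Address, payload: bytes) -> Optional[Connection]:
        conn = self.by_cid.get(cid)
        if conn is None:
            return None  # unknown Connection ID: not part of an existing connection
        if conn.peer != peer:
            # The client's address changed (e.g. Wi-Fi to cellular). The
            # connection survives; this sketch simply records the new address.
            conn.peer = peer
        conn.bytes_received += len(payload)
        return conn
```

Because the lookup never uses the client's address, a datagram arriving from a new IP and port still reaches the same `Connection` object.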
- Customizable congestion control
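QUIC does not mandate a particular congestion control algorithm: RFC 9002 describes a NewReno-style default, but endpoints may use others such as CUBIC or BBR. Because QUIC is usually implemented in user space (at the application layer) rather than in the operating system kernel, developers can tune, replace, and iterate on the algorithm themselves.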
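Because the algorithm lives in user space, it can sit behind a small interface that the transport consults. The sketch below is hypothetical: `CongestionController`, `NewRenoLike`, `FixedWindow`, and `can_send` are illustrative names, not the API of any QUIC library, and the NewReno logic is simplified (no recovery period, no pacing).

```python
from abc import ABC, abstractmethod

MAX_DATAGRAM_SIZE = 1200  # bytes; assumed constant for simplicity


class CongestionController(ABC):
    @abstractmethod
    def on_ack(self, acked_bytes: int) -> None:
        """Newly acknowledged bytes have left the network."""

    @abstractmethod
    def on_loss(self) -> None:
        """A congestion event was detected."""

    @abstractmethod
    def window(self) -> int:
        """Bytes the sender may have in flight."""


class NewRenoLike(CongestionController):
    """Simplified NewReno: slow start, then additive increase / multiplicative decrease."""

    def __init__(self) -> None:
        self.cwnd = 10 * MAX_DATAGRAM_SIZE
        self.ssthresh = float("inf")

    def on_ack(self, acked_bytes: int) -> None:
        if self.cwnd < self.ssthresh:
            self.cwnd += acked_bytes  # slow start
        else:
            self.cwnd += MAX_DATAGRAM_SIZE * acked_bytes // self.cwnd  # ~1 datagram per RTT

    def on_loss(self) -> None:
        self.ssthresh = max(self.cwnd // 2, 2 * MAX_DATAGRAM_SIZE)
        self.cwnd = self.ssthresh

    def window(self) -> int:
        return self.cwnd


class FixedWindow(CongestionController):
    """Deliberately naive alternative that ignores feedback (lab experiments only)."""

    def __init__(self, window_bytes: int) -> None:
        self._window = window_bytes

    def on_ack(self, acked_bytes: int) -> None:
        pass

    def on_loss(self) -> None:
        pass

    def window(self) -> int:
        return self._window


def can_send(cc: CongestionController, bytes_in_flight: int, packet_size: int) -> bool:
    """The transport only uses the interface, so algorithms are interchangeable."""
    return bytes_in_flight + packet_size <= cc.window()
```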
Several differences between the QUIC protocol and TCP at the protocol level
- Separate Packet Number Spaces
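QUIC uses four encryption levels (Initial, 0-RTT, Handshake, and 1-RTT) and keeps a separate packet number space for each of them, except that 0-RTT and 1-RTT packets share one space. The result is three packet number spaces: Initial, Handshake, and Application Data.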
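The mapping from encryption levels to packet number spaces can be sketched as follows; each space keeps its own counter starting at 0. The class names are illustrative, while the mapping itself follows RFC 9000, Section 12.3.

```python
from enum import Enum


class EncryptionLevel(Enum):
    INITIAL = "Initial"
    ZERO_RTT = "0-RTT"
    HANDSHAKE = "Handshake"
    ONE_RTT = "1-RTT"


class PacketNumberSpace(Enum):
    INITIAL = "Initial"
    HANDSHAKE = "Handshake"
    APPLICATION_DATA = "Application Data"


# 0-RTT and 1-RTT packets share the Application Data space.
SPACE_FOR_LEVEL = {
    EncryptionLevel.INITIAL: PacketNumberSpace.INITIAL,
    EncryptionLevel.HANDSHAKE: PacketNumberSpace.HANDSHAKE,
    EncryptionLevel.ZERO_RTT: PacketNumberSpace.APPLICATION_DATA,
    EncryptionLevel.ONE_RTT: PacketNumberSpace.APPLICATION_DATA,
}


class PacketNumberAllocator:
    """Hands out packet numbers independently per packet number space."""

    def __init__(self) -> None:
        self._next = {space: 0 for space in PacketNumberSpace}

    def next_packet_number(self, level: EncryptionLevel) -> int:
        space = SPACE_FOR_LEVEL[level]
        pn = self._next[space]
        self._next[space] += 1
        return pn
```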
- Monotonically Increasing Packet Numbers
Packet numbers within one space increase monotonically and are never reused: a retransmission goes out in a packet with a new number, so an acknowledgment always identifies exactly which transmission arrived. This avoids TCP's retransmission ambiguity. Packet numbers therefore express only the order of transmission; the order of the data itself is given by the offset field in STREAM frames.
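A minimal sketch of this separation: lost data is resent with the same STREAM offset but a fresh packet number, and the receiver reassembles by offset. `StreamChunk`, `Sender`, and `reassemble` are hypothetical names; a real stack re-bundles lost frames into new packets rather than copying whole packets.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class StreamChunk:
    stream_id: int
    offset: int  # position of the data in the stream; unchanged on retransmission
    data: bytes


class Sender:
    def __init__(self) -> None:
        self.next_pn = 0
        self.sent: Dict[int, StreamChunk] = {}  # packet number -> data it carried

    def send(self, chunk: StreamChunk) -> int:
        pn = self.next_pn  # packet numbers only ever increase
        self.next_pn += 1
        self.sent[pn] = chunk
        return pn

    def retransmit(self, lost_pn: int) -> int:
        # Same stream bytes (same offset), new packet number, so an ACK for
        # the new number unambiguously refers to the retransmission.
        return self.send(self.sent.pop(lost_pn))


def reassemble(chunks: List[StreamChunk]) -> bytes:
    """The receiver orders data by STREAM offset, not by packet number."""
    return b"".join(c.data for c in sorted(chunks, key=lambda c: c.offset))
```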
- Clearer Loss Epoch
When QUIC declares a packet lost, it starts a loss epoch (a congestion recovery period); the epoch ends as soon as any packet sent after the epoch began is acknowledged. TCP, by contrast, waits for the gap in its sequence number space to be filled, so if the same segment is lost several times in a row, a single loss epoch can stretch over several round trips. Because both protocols reduce the congestion window only once per epoch, QUIC reacts once in every round trip that experiences loss, while TCP may react only once across several round trips. This lets QUIC adjust its congestion window more precisely within each round-trip time (RTT).
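The epoch logic can be sketched after the pseudocode in RFC 9002, Appendix B (`InCongestionRecovery` / `OnCongestionEvent`): a loss reduces the window only if the lost packet was sent after the current epoch began. The class name and the explicit `now` argument are assumptions for testability; the floor applies QUIC's two-packet minimum congestion window.

```python
MAX_DATAGRAM_SIZE = 1200                   # bytes, assumed
K_LOSS_REDUCTION_FACTOR = 0.5
K_MINIMUM_WINDOW = 2 * MAX_DATAGRAM_SIZE   # QUIC's two-packet minimum window


class RecoveryState:
    def __init__(self, cwnd: int) -> None:
        self.cwnd = cwnd
        self.recovery_start_time = float("-inf")  # no epoch has started yet

    def in_congestion_recovery(self, sent_time: float) -> bool:
        # Packets sent before the epoch started belong to that epoch. Any packet
        # sent later marks a new round trip, so its loss starts a new epoch.
        return sent_time <= self.recovery_start_time

    def on_packet_lost(self, sent_time: float, now: float) -> None:
        if self.in_congestion_recovery(sent_time):
            return  # same epoch: the window was already reduced once
        self.recovery_start_time = now
        self.cwnd = max(int(self.cwnd * K_LOSS_REDUCTION_FACTOR), K_MINIMUM_WINDOW)
```

Two losses of packets sent before the epoch began cause a single reduction, whereas a loss of a packet sent afterwards starts a new epoch and reduces the window again.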
- No Reneging
Once the peer has acknowledged a packet, it cannot take the acknowledgment back, and the packet can no longer be declared lost. TCP's SACK, by contrast, allows a receiver to renege on selectively acknowledged data, so a TCP sender must keep that data buffered. Ruling out reneging greatly simplifies the implementation on both endpoints and reduces memory pressure on the sender.
- More ACK Ranges
TCP's SACK option can report only a few ranges (at most three when the TCP timestamp option is also in use, because TCP option space is limited), whereas a QUIC ACK frame can carry many ACK ranges. In high packet loss scenarios, where the received packets form many scattered ranges, the receiver can report all of them at once, which speeds up loss recovery and avoids stalls caused by incomplete acknowledgment information.
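The sketch below groups received packet numbers into ranges and converts them into the ACK frame fields defined in RFC 9000, Section 19.3 (Largest Acknowledged, First ACK Range, and Gap/ACK Range Length pairs). It stops short of the wire encoding, and the function names are illustrative.

```python
from typing import Iterable, List, Tuple

Range = Tuple[int, int]  # (smallest, largest) acknowledged packet numbers


def ack_ranges(received: Iterable[int]) -> List[Range]:
    """Contiguous ranges of received packet numbers, largest first,
    which is the order in which a QUIC ACK frame lists them."""
    ranges: List[Range] = []
    for pn in sorted(set(received), reverse=True):
        if ranges and ranges[-1][0] == pn + 1:
            ranges[-1] = (pn, ranges[-1][1])  # extend the current range downwards
        else:
            ranges.append((pn, pn))  # a gap: start a new range
    return ranges


def ack_frame_fields(ranges: List[Range]) -> Tuple[int, int, List[Tuple[int, int]]]:
    """Returns (Largest Acknowledged, First ACK Range, [(Gap, ACK Range Length), ...])."""
    if not ranges:
        raise ValueError("an ACK frame must acknowledge at least one packet")
    first_smallest, largest_acked = ranges[0]
    pairs = []
    previous_smallest = first_smallest
    for smallest, largest in ranges[1:]:
        gap = previous_smallest - largest - 2  # missing packets in the gap, minus one
        pairs.append((gap, largest - smallest))
        previous_smallest = smallest
    return largest_acked, largest_acked - first_smallest, pairs
```

For received packets 0-2, 4-5, 7, 9-11, and 13, one ACK frame carries all five ranges. TCP would cover 0-2 with its cumulative ACK but, with timestamps enabled, could report only three of the remaining four blocks via SACK.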
- Explicit Correction For Delayed Acknowledgements
A QUIC receiver measures the delay between receiving a packet and sending the acknowledgment for it, and writes that delay explicitly into the ACK Delay field of the ACK frame. The sender subtracts it from the measured RTT sample, which gives a more accurate estimate of the network round-trip time.
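Here is a sketch of the sender-side RTT estimator following the update rules of RFC 9002, Section 5.3, assuming the handshake is confirmed (so the reported delay is capped at the peer's `max_ack_delay`, 25 ms by default). The class name is illustrative; times are in milliseconds.

```python
from typing import Optional


class RttEstimator:
    def __init__(self, max_ack_delay_ms: float = 25.0) -> None:
        self.max_ack_delay = max_ack_delay_ms  # peer's max_ack_delay transport parameter
        self.min_rtt: Optional[float] = None
        self.smoothed_rtt: Optional[float] = None
        self.rttvar: Optional[float] = None

    def on_rtt_sample(self, latest_rtt: float, ack_delay: float) -> None:
        """latest_rtt: send-to-ACK time; ack_delay: the ACK frame's ACK Delay field."""
        if self.smoothed_rtt is None or self.min_rtt is None or self.rttvar is None:
            self.min_rtt = latest_rtt  # first sample initializes everything
            self.smoothed_rtt = latest_rtt
            self.rttvar = latest_rtt / 2
            return
        self.min_rtt = min(self.min_rtt, latest_rtt)  # min_rtt ignores ACK delay
        ack_delay = min(ack_delay, self.max_ack_delay)
        adjusted_rtt = latest_rtt
        # Remove the receiver's deliberate delay, but never go below min_rtt.
        if latest_rtt >= self.min_rtt + ack_delay:
            adjusted_rtt = latest_rtt - ack_delay
        self.smoothed_rtt = 7 / 8 * self.smoothed_rtt + 1 / 8 * adjusted_rtt
        rttvar_sample = abs(self.smoothed_rtt - adjusted_rtt)
        self.rttvar = 3 / 4 * self.rttvar + 1 / 4 * rttvar_sample
```

After a 100 ms first sample, a 120 ms sample whose ACK was deliberately delayed by 20 ms leaves `smoothed_rtt` at 100 ms; without the correction it would drift up to 102.5 ms.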
- Probe Timeout Replaces RTO and TLP
QUIC uses a probe timeout (PTO) instead of TCP's retransmission timeout (RTO) and tail loss probe (TLP). The PTO includes the peer's maximum expected acknowledgment delay instead of relying on a fixed minimum timeout. Unlike TCP on an RTO, QUIC does not collapse the congestion window when a PTO expires, because losing the tail of a transmission does not by itself indicate persistent network congestion; the sender may keep sending as long as the congestion window allows, even after a PTO has fired. Compared with TCP's RTO, the PTO mechanism is therefore more aggressive.
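The PTO period from RFC 9002, Section 6.2.1 is `smoothed_rtt + max(4 * rttvar, kGranularity) + max_ack_delay`, doubled on each consecutive expiry. The sketch below computes it and shows that an expiry only schedules probes and backs off the timer, without touching the congestion window. `PtoState` and `on_pto_expired` are illustrative names.

```python
from dataclasses import dataclass

K_GRANULARITY_MS = 1.0  # timer granularity recommended by RFC 9002


def probe_timeout_ms(smoothed_rtt: float, rttvar: float, max_ack_delay: float,
                     pto_count: int = 0, application_space: bool = True) -> float:
    pto = smoothed_rtt + max(4 * rttvar, K_GRANULARITY_MS)
    if application_space:
        # Peers are not expected to delay ACKs for Initial/Handshake packets,
        # so max_ack_delay is only added in the Application Data space.
        pto += max_ack_delay
    return pto * (2 ** pto_count)  # exponential backoff on consecutive PTOs


@dataclass
class PtoState:
    cwnd: int
    pto_count: int = 0


def on_pto_expired(state: PtoState) -> int:
    """Returns the number of ack-eliciting probe packets to send.
    Unlike TCP's RTO, the congestion window is left unchanged."""
    state.pto_count += 1  # the next PTO period will be twice as long
    return 2  # RFC 9002 allows sending up to two probe datagrams
```

With `smoothed_rtt` = 100 ms, `rttvar` = 37.5 ms, and `max_ack_delay` = 25 ms, the PTO is 275 ms, then 550 ms after one expiry.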
- The Minimum Congestion Window is Two Packets
TCP uses a minimum congestion window of one packet. If that single packet is lost, the sender has no later packets whose acknowledgments could reveal the loss, so it must wait for a timeout, which can be much longer than one round-trip time (RTT). QUIC therefore recommends a minimum congestion window of two packets; although this slightly increases traffic, it is considered safe.
Application of the QUIC protocol in NetEase Yunxin
In the architecture of NetEase Yunxin's audio and video services, signaling is used for SDP exchange, for creating and managing session rooms, and for uploading and delivering user information, so the stability and timeliness of its transmission are crucial. WebRTC itself leaves the signaling transport unspecified, and traditional WebRTC deployments commonly use WebSocket for it. Because WebSocket runs over TCP, it inherits TCP's shortcomings: connection setup time, transmission efficiency, and resistance to weak networks are all unsatisfactory. These problems directly affect the baseline metrics of audio and video services, such as first-frame time, link stability, and weak-network resistance.
Yunxin QUIC acceleration service design:
NetEase Yunxin replaced WebSocket with QUIC for signaling transmission and made several optimizations at the application and protocol levels:
- Multiplexing: Requests are classified according to the characteristics of each type of signaling. Request/Response messages, which have the strictest reliability and real-time requirements, are sent on a high-priority STREAM, while heartbeat messages used for link keep-alive are sent on a lower-priority STREAM.
- Unreliable transmission extension: Notify messages do not require a reply from the receiver and are often used to broadcast the network status or other information of users at each end. They have strict real-time requirements but loose reliability requirements, so this kind of signaling can be carried over QUIC's unreliable datagram extension (RFC 9221).
This kind of transmission uses DATAGRAM frames. To announce support for them, an endpoint declares the max_datagram_frame_size transport parameter (parameter ID 0x20) in the QUIC transport parameters extension of the TLS ClientHello (CH) carried in the Initial packet. The value of max_datagram_frame_size is an integer, encoded as a variable-length integer, giving the maximum size in bytes of a DATAGRAM frame (including frame type, length, and payload) that the endpoint is willing to receive.
DATAGRAM frames carry application data unreliably. The frame Type takes the form 0b0011000X (the values 0x30 and 0x31); its least significant bit is the LEN bit (0x01), which indicates whether a Length field is present. If the bit is 0, there is no Length field and the Datagram Data field extends to the end of the packet; if the bit is 1, the Length field is present. The DATAGRAM frame has the following structure, where (i) denotes a variable-length integer and brackets mark an optional field:
DATAGRAM Frame {
  Type (i) = 0x30..0x31,
  [Length (i)],
  Datagram Data (..),
}
Although DATAGRAM frames are not retransmitted when their loss is detected, they are ack-eliciting, so the receiver still acknowledges them.
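The encoding rules above can be sketched in a few functions: QUIC variable-length integers, both DATAGRAM frame forms, and the max_datagram_frame_size transport parameter (ID 0x20 in RFC 9221). The function names are illustrative, and the advertised value of 1200 used in the check below is a hypothetical choice, not a value from this article.

```python
from typing import Tuple

DATAGRAM_NO_LENGTH = 0x30        # LEN bit clear: data runs to the end of the packet
DATAGRAM_WITH_LENGTH = 0x31      # LEN bit set: explicit Length field
MAX_DATAGRAM_FRAME_SIZE = 0x20   # transport parameter ID (RFC 9221)


def encode_varint(value: int) -> bytes:
    """QUIC varint: the top two bits select a 1-, 2-, 4- or 8-byte encoding."""
    if not 0 <= value < (1 << 62):
        raise ValueError("value out of varint range")
    for length, prefix in ((1, 0b00), (2, 0b01), (4, 0b10), (8, 0b11)):
        if value < (1 << (8 * length - 2)):
            return (value | (prefix << (8 * length - 2))).to_bytes(length, "big")
    raise AssertionError("unreachable")


def decode_varint(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Returns (value, position just after the integer)."""
    length = 1 << (buf[pos] >> 6)
    if pos + length > len(buf):
        raise ValueError("truncated varint")
    raw = int.from_bytes(buf[pos:pos + length], "big")
    return raw & ((1 << (8 * length - 2)) - 1), pos + length


def encode_datagram_frame(payload: bytes, with_length: bool = True) -> bytes:
    if with_length:
        return encode_varint(DATAGRAM_WITH_LENGTH) + encode_varint(len(payload)) + payload
    # Without a Length field this frame must be the last one in the packet.
    return encode_varint(DATAGRAM_NO_LENGTH) + payload


def decode_datagram_frame(packet: bytes, pos: int = 0) -> Tuple[bytes, int]:
    """Returns (datagram data, position after the frame)."""
    frame_type, pos = decode_varint(packet, pos)
    if frame_type not in (DATAGRAM_NO_LENGTH, DATAGRAM_WITH_LENGTH):
        raise ValueError(f"not a DATAGRAM frame: {frame_type:#x}")
    if frame_type & 0x01:  # LEN bit set
        length, pos = decode_varint(packet, pos)
        end = pos + length
        if end > len(packet):
            raise ValueError("truncated DATAGRAM frame")
    else:  # LEN bit clear: data extends to the end of the packet
        end = len(packet)
    return packet[pos:end], end


def encode_transport_parameter(param_id: int, value: int) -> bytes:
    """Transport parameter layout: ID (varint), Length (varint), Value."""
    body = encode_varint(value)
    return encode_varint(param_id) + encode_varint(len(body)) + body
```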
- Packet compression: Yunxin applies the Deflate algorithm at the transport layer to compress STREAM frames, reducing the bandwidth consumed by signaling.
- Dynamic redundancy strategy: Signaling is intermittent rather than continuous streaming data, so FEC, which protects data by encoding across groups of packets, is a poor fit. Instead, Yunxin dynamically increases redundancy based on metrics such as RTT and packet loss rate, which noticeably improves transmission robustness on weak networks.
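For the packet compression item above, a minimal sketch of Deflate-compressing a signaling payload with Python's `zlib` in raw Deflate mode (`wbits=-15`, no zlib header). The one-byte flag that marks whether a payload was compressed is a hypothetical framing choice, not part of QUIC or a documented Yunxin format; it lets very small messages that would grow under compression be sent as-is.

```python
import zlib

FLAG_RAW = 0x00      # hypothetical header value: payload sent uncompressed
FLAG_DEFLATE = 0x01  # hypothetical header value: payload is raw Deflate


def compress_signaling(payload: bytes) -> bytes:
    co = zlib.compressobj(level=6, method=zlib.DEFLATED, wbits=-15)  # raw Deflate
    packed = co.compress(payload) + co.flush()
    if len(packed) < len(payload):
        return bytes([FLAG_DEFLATE]) + packed
    return bytes([FLAG_RAW]) + payload  # tiny messages may not shrink


def decompress_signaling(message: bytes) -> bytes:
    flag, body = message[0], message[1:]
    if flag == FLAG_DEFLATE:
        return zlib.decompress(body, wbits=-15)
    if flag == FLAG_RAW:
        return body
    raise ValueError(f"unknown compression flag {flag:#x}")
```

Repetitive text such as JSON field names compresses well, which is why signaling is a reasonable target.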
Weak network performance of Yunxin QUIC
First-frame time and login time
The above figure compares TCP and QUIC as the connection protocol for Yunxin's audio and video services. QUIC improves first-frame time by 20% and login time by nearly 30%. The main reason is that QUIC's connection establishment saves 2 to 3 RTTs compared with TCP+TLS, which shortens the handshake especially in high-RTT scenarios. In the live-streaming scenario, Yunxin QUIC additionally uses an optimized proprietary 0-RTT handshake, which makes connection setup even faster.
Packet loss resistance
The figure above shows the maximum packet loss rate that Yunxin signaling can withstand over QUIC and over TCP. Over QUIC, the service keeps working with 70% upstream packet loss, and in the downstream direction it can even withstand 75% packet loss. The TCP link, by contrast, disconnects and reconnects at 45% packet loss. Compared with the TCP signaling link, the packet loss tolerance of the QUIC link is therefore more than 50% higher (70% vs. 45% upstream).
Main reasons:
- Yunxin implements dynamic redundancy: once packet loss is detected, redundancy is increased, and the redundant packets compensate for the high loss rate, improving packet loss resistance.
- QUIC's improved flow control and congestion control mechanisms give it a clear transmission advantage on weak networks.
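The article does not specify how Yunxin sizes its redundancy, so the following is only a hypothetical policy illustrating the idea of increasing redundancy once loss is detected. Assuming independent losses, sending k copies of a packet loses all of them with probability p^k, so the sketch picks the smallest k that keeps this below a target, protects harder on high-RTT paths where retransmission is costly, and caps the number of copies. All names, thresholds, and defaults are assumptions.

```python
import math


def redundancy_copies(loss_rate: float, rtt_ms: float,
                      target_residual_loss: float = 0.01,
                      high_rtt_ms: float = 300.0, max_copies: int = 4) -> int:
    """Hypothetical policy: copies to send per signaling packet.
    Assumes independent losses, so all k copies are lost with probability loss_rate ** k."""
    if loss_rate <= 0.0:
        return 1  # no loss detected: no redundancy
    if loss_rate >= 1.0:
        return max_copies
    target = target_residual_loss
    if rtt_ms >= high_rtt_ms:
        target /= 10  # retransmission is expensive on long paths: protect harder
    k = math.ceil(math.log(target) / math.log(loss_rate))
    return max(1, min(k, max_copies))
```

At 5% loss and 50 ms RTT this sends 2 copies, at 20% loss 3 copies, and at 20% loss on a 400 ms path it hits the 4-copy cap.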
Bandwidth-limited scenarios
We also measured bandwidth utilization when bandwidth is limited. QUIC generally uses more than 90% of the available bandwidth, while TCP's utilization is much lower.
Outlook & Summary
NetEase Yunxin has substantially improved reliable data transmission through QUIC-based acceleration, but some areas still need optimization. For example, the current redundancy strategy adds redundancy in both directions even when packet loss occurs in only one direction, which leads to unnecessary redundancy. Next, Yunxin plans to detect unidirectional packet loss and increase redundancy only on the link that is actually losing packets. For high-RTT, high-loss scenarios, the QUIC congestion control algorithm also needs continued optimization. NetEase Yunxin will continue to provide users with high-quality audio and video services under all kinds of extreme conditions.