
The enduring popularity of "Li Jiaqi girls" and the more recent wave of "Liu Genghong girls" both show that the live-streaming industry is still booming. After years of development, live streaming has spawned increasingly rich forms of interaction, such as co-streaming PK battles and watch-together sessions, and these scenarios depend heavily on the rise of RTC (real-time audio and video) technology.

With repeated waves of the epidemic, many people have once again found themselves confined at home. Moving life and work online has become the common answer to "shutdown" anxiety. Beyond entertainment and social scenarios such as live streaming and chat rooms, there are also online classes, video conferences, and even online medical consultations.
As entertainment, social interaction, and daily life move online, RTC technology has developed rapidly, continually unlocking new applications and penetrating new scenarios.

There are two sides to this coin: while the technology drives huge growth in online scenarios, it also faces ever higher user experience requirements, namely lower latency, higher image quality, and smoother playback.
These three factors correspond to the three core indicators of RTC: real-time performance, clarity, and fluency.
The three often cannot all be maximized at once. Scenarios with strict real-time requirements may sacrifice clarity to guarantee low latency, while scenarios that demand high definition may trade some delay for higher-quality audio and video data.

But only children make choices. To "have it all", we usually pursue lower latency together with higher clarity and fluency through network transmission optimization. The big boss in this fight is the weak network, which is the main source of congestion, packet loss, delay jitter, and other problems that degrade the user experience.

Weak network countermeasure technology is the general term for the technical solutions to these and other network impairment problems. This article gives an overview of weak network countermeasure techniques for audio and video transmission in RTC systems.


On the Choice of Transport Layer Protocol

First, a brief introduction to the transport layer protocol.

In the TCP/IP layered model, the transport layer sits below the application layer and is generally implemented by the operating system; its main protocols are TCP and UDP. TCP is a connection-oriented, reliable transport protocol that guarantees the integrity and order of transmitted data; UDP is a connectionless, unreliable transport protocol that leaves reliability entirely to the application layer.

In real-time audio and video scenarios, the industry broadly agrees that UDP is the preferred choice, mainly for the following reasons:

TCP was not designed for real-time audio and video. Its internal mechanisms, such as congestion control and error control, trade increased latency for reliability and high throughput, and this latency penalty becomes much worse on weak networks. ITU-T Recommendation G.114 notes that once end-to-end delay exceeds 400 ms, the user's interactive experience is significantly degraded.

TCP's congestion control and error control are implemented inside the operating system; the application layer cannot tune them for different scenarios, which makes them seriously inflexible.
UDP itself has lower overhead than TCP, and its transmission control strategy is implemented entirely in the application layer, which gives it ample flexibility.

Therefore, the weak network problems and the corresponding countermeasures discussed in the rest of this article are based on UDP and on the RTP/RTCP protocols that run on top of it and are widely used in the audio and video field.


Main Weak Network Problems and Their Countermeasures

In short, the weak network problems of audio and video transmission are the network environment problems that degrade the audio and video communication experience, mainly network congestion, packet loss, and jitter.

These problems are the main causes of audio and video stutter and poor real-time performance. Because network environments are highly complex and heterogeneous, the severity of these problems varies greatly from one environment to another. Ensuring smooth communication between users across complex networks has always been a key issue in the RTC field.

The Congestion Problem

Congestion arises when the traffic transmitted over the network exceeds the capacity of the network bottleneck.
Its direct consequences are bursts of packet loss or sudden jitter. If congestion is not predicted in time and the amount of data sent is not reduced promptly, the receiving end will suffer stutter, large delays, and poor image quality.

The main way to combat congestion is to design a congestion control algorithm that detects network congestion promptly and recovers from the congested state as quickly as possible, minimizing the impact on the user experience.

Requirements for Congestion Control Algorithms

RFC 8836 gives a comprehensive summary of the congestion control requirements of real-time interactive audio and video applications, briefly summarized as follows:

  • Latency: the algorithm should keep latency as low as possible, especially the latency it introduces itself, while still providing a usable level of bandwidth.
  • Throughput: throughput should be as high as the scenario allows.
  • Fairness: the algorithm should share link bandwidth fairly with other real-time traffic and with TCP traffic.
  • Avoiding "starvation": media streams should neither be "starved" by competing TCP flows nor "starve" them.
  • Convergence speed: the algorithm should converge to a steady state as quickly as possible during the media stream's startup phase.
  • Network support: the algorithm should not require special support from network features.
  • Stability: the algorithm should remain stable when the media stream changes, for example during a temporary interruption of the stream.
  • Fast response: the algorithm should respond quickly to changes in the network environment, such as changes in bottleneck bandwidth or link delay.

Based on these requirements, the problems a congestion control algorithm must solve fall into two parts: first, how to detect network congestion quickly and accurately; second, how to take appropriate countermeasures to avoid congestion and recover from it as quickly as possible.

Congestion Detection Algorithm

Congestion detection algorithms fall into two categories, depending on the data they observe:

Loss-based: detect network congestion through packet loss events.
Delay-based: detect network congestion by measuring delay.

For real-time interactive audio and video applications, delay-based algorithms are the better choice, mainly because they can detect congestion earlier and thus avoid the packet loss that congestion causes.

In addition, loss-based algorithms tend to keep increasing the sending bandwidth in order to probe link capacity until a packet loss event occurs. This strategy causes network queuing delay to grow uncontrollably, especially when network nodes have large buffers, and can even push delay to the order of seconds.

When choosing a delay-based algorithm, particular attention should be paid to the "starvation" problem listed in the requirements above. Delay-based algorithms are relatively sensitive to delay increases, so when they compete for network resources with loss-based algorithms, an appropriate strategy is needed to share bandwidth reasonably fairly.

Delay-based algorithms generally detect congestion by measuring RTT (round-trip time) or OWD (one-way delay).

RTT is easy to measure, but because it reflects the combined delay of both directions, a delay change on the reverse path can interfere with the congestion judgment for the media direction. Observing OWD avoids this problem, as shown below:

(one-way delay change)

OWD-based detection infers the state of network queuing delay by observing how the packet sending interval and receiving interval change relative to each other.
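To make the idea concrete, the minimal sketch below (Python, with illustrative names) computes the delay variation between two consecutive packets or packet groups from their send and receive timestamps. A run of positive values suggests that queuing delay is building up, i.e. congestion is forming. Note that the absolute clock offset between sender and receiver cancels out, so the two clocks do not need to be synchronized.

```python
def one_way_delay_variation(prev_send_ts, prev_recv_ts, send_ts, recv_ts):
    """Delay variation between two consecutive packets (or packet groups).

    All timestamps are in milliseconds. A positive result means the receive
    interval grew faster than the send interval, i.e. queuing delay increased.
    """
    send_interval = send_ts - prev_send_ts
    recv_interval = recv_ts - prev_recv_ts
    return recv_interval - send_interval


# Example: two packets sent 20 ms apart arrive 27 ms apart,
# so 7 ms of extra queuing delay accumulated between them.
print(one_way_delay_variation(0, 100, 20, 127))  # 7
```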

Congestion countermeasures

In short, the congestion countermeasure is to compute an appropriate sending rate from the current congestion state of the network and use it to cap the media sender's total sending rate. Depending on which other weak network countermeasures the sender uses, the rate can be adjusted by changing the bitrate of the audio and video encoders, the retransmission rate, or the FEC redundancy (all described later in this article). If the sender is an intermediate forwarding node, the options also include SVC layer adaptation and switching between large and small streams, as shown in the following figure:

(SVC code stream distribution diagram)

Typical Congestion Control Framework

A typical congestion control algorithm framework is shown in the following figure:

(WebRTC Congestion Control Architecture 1)

This is the congestion control framework adopted in early versions of Google's WebRTC. It uses a hybrid sender/receiver design, with the sender running a loss-based algorithm whose basic strategy is:

  • Packet loss rate < 2%: increase the sending bandwidth by 8%;
  • Packet loss rate between 2% and 10%: keep the sending bandwidth unchanged;
  • Packet loss rate > 10%: reduce the sending bandwidth to (1 - 0.5 * packet loss rate) of its current value.

The receiver runs a delay-based algorithm: it computes the one-way delay variation, estimates the current transmission delay with a Kalman filter, combines it with the currently measured receive bandwidth to produce an optimal target bandwidth, and feeds this back to the sender via RTCP messages. The sender takes the minimum of the two estimates as the final target bandwidth.
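The loss-based rule above, and the way it is combined with the receiver's delay-based feedback, can be sketched as follows (the function and variable names are illustrative, not WebRTC's actual API):

```python
def loss_based_bitrate(current_bps: float, loss_rate: float) -> float:
    """Sender-side loss-based rule as described above."""
    if loss_rate < 0.02:          # < 2% loss: probe upward by 8%
        return current_bps * 1.08
    if loss_rate <= 0.10:         # 2%-10% loss: hold the current rate
        return current_bps
    # > 10% loss: back off proportionally to the loss rate
    return current_bps * (1 - 0.5 * loss_rate)


def target_bitrate(current_bps: float, loss_rate: float, delay_based_bps: float) -> float:
    """Final target is the minimum of the loss-based estimate and the
    delay-based estimate fed back by the receiver over RTCP."""
    return min(loss_based_bitrate(current_bps, loss_rate), delay_based_bps)


# Example: 1 Mbps current rate, 15% loss, receiver estimates 900 kbps.
print(target_bitrate(1_000_000, 0.15, 900_000))  # 900000: the delay-based estimate wins
```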

The following figure shows WebRTC's improved congestion control framework.
(WebRTC Congestion Control Architecture 2)

As the figure shows, the entire congestion control mechanism now runs at the sender; the receiver simply feeds back the corresponding measurement data.

The new framework improves the network delay estimation algorithm. It performs linear regression on the one-way delay variation samples to estimate the current trend of the network queuing delay, i.e. whether the delay is increasing, steady, or decreasing, and combines this with the current sending rate to produce an optimal target bandwidth estimate.
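A minimal sketch of this kind of trend estimation follows, assuming a window of (arrival time, accumulated one-way delay variation) samples and an arbitrary illustrative threshold; WebRTC's actual trendline estimator adds adaptive thresholds and gains on top of this basic idea.

```python
def delay_trend(samples, threshold_ms=1.0):
    """Classify the queuing-delay trend from (arrival_time_ms, accumulated_delay_ms)
    samples using a least-squares slope. Returns 'increasing', 'decreasing' or 'steady'."""
    n = len(samples)
    if n < 2:
        return "steady"
    mean_t = sum(t for t, _ in samples) / n
    mean_d = sum(d for _, d in samples) / n
    num = sum((t - mean_t) * (d - mean_d) for t, d in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    slope = num / den if den else 0.0
    # Scale the slope by the window duration so the threshold is in milliseconds.
    window_ms = samples[-1][0] - samples[0][0]
    modified = slope * window_ms
    if modified > threshold_ms:
        return "increasing"   # queuing delay is building up -> back off
    if modified < -threshold_ms:
        return "decreasing"   # queues are draining -> the rate can be raised
    return "steady"
```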

In addition to improving congestion detection, the new framework introduces an active bandwidth probing mechanism to optimize the overall performance of the algorithm. Comparisons show clear improvements in convergence speed during the startup phase and in responsiveness to changes in the network environment.

The Packet Loss Problem

As mentioned above, real-time interactive media transmission is based on RTP over UDP, so packet loss must be handled by the application layer.

On the network transmission (channel) side, the main anti-packet-loss techniques are retransmission (ARQ) and forward error correction (FEC). On the source coding side, certain anti-packet-loss capabilities can also be provided depending on the data and the codec; for example, B frames can be used in video coding to reduce the impact of packet loss. The following sections focus on the anti-packet-loss techniques used in network transmission.

Lost Packet Retransmission (ARQ)

In the RTP/RTCP protocol, packet loss retransmission simply means that the receiver detects missing packets from gaps in the packet sequence numbers and asks the source to retransmit the specified packets by sending a NACK request over RTCP. The retransmission process is shown in the following figure:
(Packet loss retransmission process)

The retransmission process involves the following considerations (a simplified sketch follows the list):

  • Delay of the first request: whether to send a request immediately upon detecting a loss should be decided together with other strategies, such as the FEC strategy.
  • Repeat request interval: the interval between repeated requests for the same packet should be greater than the current RTT.
  • Limit on the number of requests: computed from the current RTT and the maximum tolerable delay.
  • Retransmission bandwidth limit at the sender: retransmission bandwidth is part of the total sending bandwidth and must not exceed the overall limit.
  • Return path for retransmitted packets: sending them in a separate RTP stream is recommended, as it simplifies packet loss statistics and retransmission bandwidth accounting.
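The following simplified sketch illustrates the receiver-side bookkeeping implied by these points (class and field names are invented for illustration): it detects gaps in the RTP sequence numbers, repeats a request for the same packet no more often than once per RTT, and gives up once a retransmission could no longer arrive within the maximum tolerable delay.

```python
class NackTracker:
    """Toy receiver-side NACK list (illustrative, not a real library API).
    Sequence-number wraparound is ignored for brevity."""

    def __init__(self, rtt_ms: float, max_delay_ms: float):
        self.rtt_ms = rtt_ms
        self.max_delay_ms = max_delay_ms
        # Request-count limit derived from the RTT and the delay budget.
        self.max_requests = max(1, int(max_delay_ms // rtt_ms))
        self.highest_seq = None
        self.pending = {}  # seq -> {"first_miss", "last_req", "count"}

    def on_packet(self, seq: int, now_ms: float):
        """Record a received packet; any gap in sequence numbers is marked as lost."""
        if self.highest_seq is not None and seq > self.highest_seq + 1:
            for missing in range(self.highest_seq + 1, seq):
                self.pending.setdefault(
                    missing, {"first_miss": now_ms, "last_req": None, "count": 0})
        self.highest_seq = seq if self.highest_seq is None else max(self.highest_seq, seq)
        self.pending.pop(seq, None)  # a retransmitted or reordered packet arrived

    def build_nack(self, now_ms: float):
        """Sequence numbers to include in the next RTCP NACK message."""
        nacks = []
        for seq, st in list(self.pending.items()):
            too_late = now_ms - st["first_miss"] > self.max_delay_ms
            if too_late or st["count"] >= self.max_requests:
                del self.pending[seq]  # further retransmission would be useless
                continue
            if st["last_req"] is None or now_ms - st["last_req"] >= self.rtt_ms:
                st["last_req"], st["count"] = now_ms, st["count"] + 1
                nacks.append(seq)
            # otherwise wait at least one RTT before repeating the same request
        return nacks
```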

Forward Error Correction (FEC)

In real-time audio and video applications, forward error correction (FEC) is widely used against packet loss because of its good real-time properties.

Roughly speaking, the basic idea of FEC is that, in addition to the audio and video packets themselves, the sender transmits a certain amount of redundant packets, sized according to the observed packet loss, so that the receiver can recover lost packets. The basic process is shown in the following figure:

(FEC processing flow, where D is the data packet and C is the repair packet)

Commonly used algorithms for generating repair packets include XOR-based codes and matrix-based codes, among others; they are not described in detail in this article.
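As an illustration of the XOR-based approach, the sketch below builds one repair packet protecting a small group of source packets and recovers a single lost packet. Real schemes such as FlexFEC add packet headers and protection-relationship signaling on top of this basic operation, and a single XOR repair packet can only recover one loss within its group.

```python
def xor_fec_encode(sources):
    """Build one XOR repair payload protecting the given source payloads.
    Shorter payloads are zero-padded to the length of the longest one."""
    length = max(len(s) for s in sources)
    repair = bytearray(length)
    for s in sources:
        padded = s + bytes(length - len(s))
        for i, b in enumerate(padded):
            repair[i] ^= b
    return bytes(repair)


def xor_fec_recover(received, repair, lost_len):
    """Recover a single lost source payload by XOR-ing the repair payload
    with all the source payloads that did arrive."""
    recovered = bytearray(repair)
    for s in received:
        padded = s + bytes(len(repair) - len(s))
        for i, b in enumerate(padded):
            recovered[i] ^= b
    return bytes(recovered[:lost_len])


# Example: protect 3 packets with 1 repair packet, then lose the second one.
d1, d2, d3 = b"hello", b"rtc", b"world"
r = xor_fec_encode([d1, d2, d3])
print(xor_fec_recover([d1, d3], r, len(d2)))  # b'rtc'
```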

The following introduces the basic application framework of FEC in a real-time audio and video system. The sender-side framework is shown in the following figure:
(FEC sender frame)

The ADU (application data unit) in the figure is the audio or video packet in the RTC system; it is fed into the FEC encoder as a source block. The FEC encoder generates FEC repair packets (repair payloads) at a certain redundancy ratio, and the repair packets are sent to the receiver together with the source packets and the protection relationship between them.

The FEC processing framework of the receiving end is shown in the following figure:
(FEC Receiver Frame)

The receiver feeds the received FEC source packets and repair packets into the FEC decoder. If source packets have been lost, the decoder recovers them according to the protection relationship of the packet group and the set of packets actually received, and then passes the repaired audio and video packets to the upper layer for decoding and playback.

In short, the protection relationship between repair packets and source packets specifies which source packets are protected by which repair packets; the sender communicates this relationship to the receiver through a defined packet format.

Under the RTP/RTCP framework, two standards, ULPFEC (RFC 5109) and FlexFEC (RFC 8627), define two different protection policies and packet formats:

ULPFEC: ULP (Unequal Level Protection) applies different levels of protection according to the importance of the data packets.

FlexFEC: Flexible Forward Error Correction; this standard defines the formats of interleaved and non-interleaved parity FEC packets under the RTP framework.

Cooperation between ARQ and FEC

Compared with FEC, ARQ has the disadvantage of introducing delay and the advantage of higher bandwidth utilization. Generally, the optimization goal of anti-packet-loss techniques is to obtain sufficient protection at the minimum cost in extra bandwidth and computation, while still meeting the delay requirements.

Therefore, the cooperation strategy between ARQ and FEC should follow these principles (a sketch of the decision follows the list):

  • Use ARQ as much as possible within the delay budget; the maximum number of retransmissions can be computed from the current RTT and the maximum allowed delay;
  • If that maximum number of retransmissions already brings the residual packet loss rate below a certain threshold (< 1%), FEC protection need not be enabled;
  • If FEC does need to be enabled, its protection ratio should be computed from the residual packet loss probability after ARQ repair, so that FEC serves as the final safety net.
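Under the simplifying assumptions that retransmission attempts are independent and that each retransmission round costs roughly one RTT of the delay budget, the decision can be sketched as follows (the formulas and the 1% threshold are illustrative, not a production policy):

```python
def plan_loss_protection(rtt_ms: float, delay_budget_ms: float,
                         loss_rate: float, residual_target: float = 0.01):
    """Decide how much work ARQ can do within the delay budget and how much
    residual loss is left for FEC to cover.

    Assumes independent losses and that each retransmission round consumes
    roughly one RTT of the delay budget.
    """
    max_retransmits = max(0, int(delay_budget_ms // rtt_ms))
    # Probability that the original send and every retransmission are all lost.
    residual_loss = loss_rate ** (max_retransmits + 1)
    fec_needed = residual_loss > residual_target
    return max_retransmits, residual_loss, fec_needed


# Numbers from the scenario below: RTT 20 ms, 100 ms delay budget, 30% loss.
print(plan_loss_protection(20, 100, 0.30))
# -> (5, ~0.0007, False): ARQ alone pushes residual loss well below 1%.
```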

The following figure illustrates the cooperation between FEC and ARQ in one scenario: with a transmission delay budget of 100 ms and an RTT within 20 ms, ARQ alone can reduce the packet loss rate to 1% or below even on a weak network link with 30% packet loss, so ARQ is solely responsible for loss repair.

As the RTT grows, the FEC protection ratio increases, until eventually FEC alone is responsible for loss repair.
(ARQ cooperates with FEC mechanism)

The Jitter Problem

In a nutshell, jitter is variation in network transmission delay; the larger the jitter, the larger the variation.

Jitter causes problems such as stutter and fast-forwarded playback at the receiving end, which seriously degrade the audio and video communication experience. Jitter has many causes, for example intensified competition for network resources when new streams join, an unstable sending rate at the source, and other network conditions.

The common strategy for handling jitter is to maintain a jitter buffer (JitterBuffer) at the receiving end to absorb it. Its principle is shown in the following figure:
(Jitter buffer principle)

The receiver absorbs the uneven delay by adding a jitter delay (JitterDelay), so that playback can proceed at an even pace.

The calculation of the jitter delay is the key to jitter buffer performance: a jitter delay that is too large introduces extra latency, which real-time interactive applications want to avoid; one that is too small cannot absorb all of the jitter.

For jitter delay estimation, Google uses two different methods in its WebRTC framework. For audio, the jitter delay is estimated with a histogram plus a forgetting factor, as shown in the following figure:

(NetEQ Jitter Delay Statistics)

WebRTC estimates the audio jitter delay in its internal NetEQ module. "Iat" in the figure stands for inter-arrival time. WebRTC maintains a histogram of audio packet inter-arrival times and takes the delay at the 95th percentile as the audio jitter delay.

To track changes in delay, NetEQ applies a forgetting factor to age out historical data in the histogram. NetEQ is in fact a comprehensive audio QoS technology; handling audio jitter also involves many other techniques, such as jitter peak detection and time-scale modification of speech, which are not described in detail here.
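The idea can be sketched as follows; this is a simplified illustration, not NetEQ's actual implementation: keep a histogram of inter-arrival times measured in packet durations, age all bins with a forgetting factor on every update, and read the target jitter delay off the 95th percentile.

```python
class IatHistogram:
    """Simplified inter-arrival-time histogram with a forgetting factor
    (illustrative stand-in for NetEQ's buffer level estimation)."""

    def __init__(self, max_iat: int = 64, forget: float = 0.997):
        self.bins = [0.0] * (max_iat + 1)
        self.forget = forget

    def update(self, iat_packets: int):
        """iat_packets: inter-arrival time in whole packet durations
        (e.g. a 40 ms gap with 20 ms packets -> 2)."""
        for i in range(len(self.bins)):
            self.bins[i] *= self.forget          # forget old observations
        idx = min(iat_packets, len(self.bins) - 1)
        self.bins[idx] += 1.0

    def target_delay_packets(self, quantile: float = 0.95) -> int:
        """Smallest IAT whose cumulative frequency reaches the quantile."""
        total = sum(self.bins)
        if total == 0:
            return 0
        acc = 0.0
        for i, count in enumerate(self.bins):
            acc += count
            if acc / total >= quantile:
                return i
        return len(self.bins) - 1


# Usage: feed measured inter-arrival times, read the 95th-percentile delay.
h = IatHistogram()
for iat in [1, 1, 2, 1, 1, 3, 1, 1, 1, 2]:
    h.update(iat)
print(h.target_delay_packets())  # 3 for these samples
```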

For video anti-jitter, WebRTC uses a different jitter delay estimation algorithm: it measures and tracks the variation in frame size and frame delay and uses a Kalman filter to dynamically estimate the optimal jitter delay.
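A heavily simplified stand-in for that estimator is sketched below: it tracks the exponentially weighted mean and variance of the per-frame delay variation and provisions enough jitter delay to cover a few standard deviations. The frame-size term that WebRTC's Kalman filter also models is omitted, and all constants and names are illustrative.

```python
class SimpleFrameJitterEstimator:
    """Illustrative video jitter-delay estimator (not WebRTC's Kalman filter):
    exponentially weighted mean/variance of per-frame delay variation."""

    def __init__(self, alpha: float = 0.05, std_devs: float = 2.33):
        self.alpha = alpha            # smoothing factor for the statistics
        self.std_devs = std_devs      # how many sigmas of jitter to absorb
        self.mean = 0.0               # EW mean of delay variation (ms)
        self.var = 0.0                # EW variance of delay variation (ms^2)

    def update(self, frame_delay_delta_ms: float):
        """frame_delay_delta_ms: (receive interval - send interval) for this frame."""
        diff = frame_delay_delta_ms - self.mean
        self.mean += self.alpha * diff
        self.var += self.alpha * (diff * diff - self.var)

    def jitter_delay_ms(self) -> float:
        return max(0.0, self.mean + self.std_devs * self.var ** 0.5)
```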

However, WebRTC is mainly designed for one-to-one real-time communication. In one-to-many and many-to-many scenarios such as video conferencing, audio and video streams are mostly forwarded through media relay servers. Our measurements show that the algorithm's full-link jitter delay estimation needs improvement in such multi-node forwarding scenarios.


Rongyun's Weak Network Countermeasure Optimization Practice

Rongyun's optimization measures

  • Congestion control: improved on the basis of Google's GCC algorithm; in addition to using one-way delay variation to judge the congestion trend, packet loss patterns are further analyzed to improve the accuracy of bandwidth estimation.
  • Anti-packet-loss: based on the FlexFEC framework, an FEC code with high repair capability is adopted and comprehensively tuned to improve resistance to packet loss.
  • ARQ/FEC cooperation: the cooperation mechanism between ARQ and FEC is optimized to minimize the cost of packet loss protection.
  • Anti-jitter: a jitter delay estimation method with stronger scene adaptability is adopted to improve fluency and reduce delay.

Selected Results

The figure below shows the results of a weak network resistance test in high packet loss scenarios.
(Test results in high packet loss scenarios)

In the high packet loss scenarios of 60% and 70%, the fluency MOS scores measured on Android and iOS mobile devices still reach 4.

In the actual weak network optimization process, the complexity and heterogeneity of networks and the diversity of scenario requirements mean that real-time audio and video transmission strategies and weak network countermeasures are full of choices, balances, and trade-offs. Ideas from online learning and reinforcement learning have also been tried in this field.

We will continue to explore and practice, and keep working to improve the overall user experience of Rongyun's real-time audio and video RTC products.

