In the previous article, "RTC System Audio and Video Transmission Weak Network Countermeasure Technology Overview", we saw that the three core indicators of RTC are real-time performance, clarity, and fluency. Users get a good basic experience only when these core indicators meet the standard throughout the entire call.

However, the network that carries audio and video data is constantly changing and unpredictable: users move between subways, buses, homes, and company or public WiFi, and traffic peaks at different times of day change network conditions as well. The resulting jitter and dynamic changes in bandwidth capacity ultimately degrade the audio and video experience.

Therefore, we need to effectively and dynamically detect and evaluate changes in network bandwidth to ensure that the audio and video data streams sent on the audio and video links do not exceed the upper limit of the link capacity. Otherwise, a large number of packets will be lost, which will be difficult to recover, and eventually cause video freezes, voice freezes, and word loss at one or both ends of the communication.

This article covers countermeasures against network congestion, one of the weak-network problems in audio and video transmission.


Typical Dynamic Bandwidth Probing Methods

Research on network congestion is not new; it has a history of nearly 40 years. Because memory used to be relatively expensive, traditional dynamic bandwidth detection was generally based on packet loss as the congestion signal. However, by the time packet loss is detected in a congested network, the audio and video experience has already stalled, so this approach is not suitable for audio and video applications.

As memory prices fell, network queue buffers grew. When the network is congested, packets that cannot be sent immediately are generally held in the network queue buffer for subsequent transmission, and packets are discarded only when the buffer is full.

Taking advantage of the large network queue buffer, the typical method now is a bandwidth estimation method based on a combination of delay gradient estimation and packet loss.

There are three more classic solutions:
The GCC algorithm [1] is a congestion control algorithm that combines delay estimation and packet loss from Google, and is used by default in WebRTC.

The NADA algorithm [2] is a method based on delay estimation proposed by Cisco. This algorithm has high bandwidth utilization and excellent performance in tracking bandwidth changes.

SCReAM algorithm [3] is an algorithm based on delay estimation proposed by Ericsson, which is adopted in OpenWebRTC.

This paper [4] compares the performance of the above three algorithms:
With multiple flows, GCC shares the total bandwidth evenly, but its rate shows a sawtooth pattern on dynamic links and it converges more slowly than NADA. It performs better on lossy links, which makes it especially suitable for wireless networks.

NADA stabilizes its rate quickly on a dynamic link and converges fast. When the link is free of random packet loss its bandwidth utilization is high, but under random packet loss its utilization becomes very low. It also suffers from a "latecomer effect": flows that join later take more bandwidth, so it is not as good as GCC at sharing bandwidth fairly.

SCReAM keeps the link queuing delay low, but it has low bandwidth utilization and reacts relatively slowly to bandwidth changes.
Overall, GCC performs best of the three.

The GCC algorithm is a congestion control algorithm based on delay gradient estimation and packet loss proposed by Google. It has been widely used in WebRTC and is currently the default algorithm used by WebRTC.

There are two versions of this algorithm. In the first, the work is split between the sender and the receiver: the receiver estimates bandwidth from delay and feeds the estimate back in REMB RTCP packets, and the sender combines the packet loss rate with the bandwidth fed back by the receiver to calculate the final bitrate and then adjusts the encoder bitrate. This framework is called REMB-GCC.
In the second version, all estimation is done on the sender side, that is, both the delay-based bandwidth estimation and the packet-loss-based estimation run at the sender. This method is generally referred to as TFB-GCC.

The two algorithms share essentially the same framework. TFB-GCC performs better than REMB-GCC and is the congestion control algorithm recommended by WebRTC. Both are analyzed in detail below.


REMB-GCC algorithm


(REMB-GCC algorithm architecture diagram)

The GCC algorithm based on REMB is divided into two parts, the sender part and the receiver part.

The receiver part estimates bandwidth changes from changes in the delay gradient. It consists of five submodules: Arrival filter, Adaptive threshold, Overuse Detector, Remote Rate Controller, and Remb Processing.

The bandwidth estimated by the receiver is fed back to the sender in a REMB RTCP packet. The sender calculates the final bandwidth from the packet loss rate and the fed-back bandwidth, and uses this final bandwidth to adjust the encoder bitrate, the FEC bandwidth, the pacing bandwidth, and the retransmission bandwidth. Each module is described in detail below.

receiver module

Arrival filter

With RTP, each frame may be split into multiple RTP packets for transmission. Each RTP packet carries an RTP header extension called abs-send-time, which records the time at which the sender sent the packet. The receiver, in turn, records the arrival time of each RTP packet it receives.

Using Equation 1 below, the algorithm calculates the difference between the receive-time interval and the send-time interval of two adjacent frames. This difference is the change in the frames' network transit time, and it includes the following three parts:

① The change in the frame's network transmission time relative to the previous frame, which reflects the change in packet size
② The change in the time the packet spends queued in the network queue
③ Network noise

These three components show up in the measurement given by Equation 1 and the schematic diagram below.
Equation 1: $dm_i = t_i - t_{i-1} - (T_i - T_{i-1})$

(Equation 1 schematic diagram)

Notes:
① ti is the time at which the last RTP packet of the i-th frame was received
② ti-1 is the time at which the last RTP packet of the (i-1)-th frame was received
③ Ti is the time at which the first RTP packet of the i-th frame was sent
④ Ti-1 is the time at which the first RTP packet of the (i-1)-th frame was sent
⑤ The sending time of an RTP packet is carried in its abs-send-time extension
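
As a minimal sketch of Equation 1, the delay gradient sample can be computed from two adjacent frames as follows. The `Frame` type and the sample timestamps are hypothetical and exist only for illustration; times are assumed to be in milliseconds.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    first_send_ms: float  # T_i: send time of the frame's first RTP packet
    last_recv_ms: float   # t_i: arrival time of the frame's last RTP packet


def delay_gradient(prev: Frame, cur: Frame) -> float:
    """Equation 1: dm_i = (t_i - t_{i-1}) - (T_i - T_{i-1})."""
    recv_delta = cur.last_recv_ms - prev.last_recv_ms
    send_delta = cur.first_send_ms - prev.first_send_ms
    return recv_delta - send_delta


# Three frames sent 33 ms apart; the third one arrives 10 ms "late".
frames = [Frame(0.0, 50.0), Frame(33.0, 83.0), Frame(66.0, 126.0)]
gradients = [delay_gradient(a, b) for a, b in zip(frames, frames[1:])]
print(gradients)
```

A positive sample means the frame took longer to cross the network than the previous one; a single sample is noisy, which is why it is filtered before use.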

Another representation of the delay gradient estimation is:

Equation 2: $dm_i = \frac{dL_i}{C_i} + m_i + n_i$

Ci is the estimated link capacity at the moment the i-th frame is received, and dLi/Ci is the change in the packet's network transmission time at that moment. This term is mainly affected by changes in packet size and in link capacity, and it can also serve as a key indicator for evaluating jitter.

ni is the noise introduced by network jitter at the moment the i-th frame is received, and mi is the estimated change in network queuing delay at that moment. dLi is the size difference between the i-th frame and the (i-1)-th frame, calculated according to Equation 3 below:

Equation 3: $dL_i = L_i - L_{i-1}$

After the two samples dmi and dLi are obtained, they are fed into the Kalman filter, which estimates the delay gradient change mi.

Additional notes:
In delay-based bandwidth estimation, the Kalman filter processes the input samples, but only the network queuing delay variation is used as an indicator.

In fact, network jitter can be estimated in the same way, and the video jitter buffer does use the same method to estimate it. The delay-based bandwidth estimator, however, uses only the indicator "change in packet queuing delay in the network queue" and not the jitter, while the jitter buffer uses the Kalman filter only to estimate jitter.
Therefore, for anti-jitter work you can try using these indicators for related optimization, and Rongyun will continue to optimize in this area.

In summary, the Arrival filter uses the Kalman filter to estimate mi from the measured dmi and dLi. mi reflects how the queue depth of the network link's buffer is changing, and it is also the input to the sibling Adaptive threshold and Overuse Detector modules.

The Kalman filter is an efficient recursive filter (autoregressive filter) that can estimate the state of a dynamic system from a series of incomplete and noisy measurements.
The Kalman filter will consider the joint distribution at each time according to the value of each measurement at different times, and then generate an estimate of the unknown variable, so it is more accurate than the estimation method based on only a single measurement.
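
To make the recursion concrete, here is a simplified scalar Kalman filter that tracks only mi from the dmi samples. It is a sketch under assumptions: it follows the scalar form described in the GCC draft [1] rather than the two-input (dmi, dLi) filter described above, and the constants `q`, `e0`, and `alpha` are illustrative values, not tuned ones.

```python
class ScalarKalman:
    """Simplified scalar Kalman filter for the queuing delay gradient m_i."""

    def __init__(self, q: float = 1e-3, e0: float = 0.1, alpha: float = 0.95):
        self.m = 0.0        # current estimate of m_i
        self.e = e0         # variance of the estimation error
        self.q = q          # process noise variance (illustrative)
        self.var_v = 1.0    # running estimate of measurement noise variance
        self.alpha = alpha  # smoothing factor for var_v (illustrative)

    def update(self, dm: float) -> float:
        z = dm - self.m  # innovation: measurement minus prediction
        # Track the measurement noise, never letting it drop below 1.
        self.var_v = max(self.alpha * self.var_v + (1 - self.alpha) * z * z, 1.0)
        k = (self.e + self.q) / (self.var_v + self.e + self.q)  # Kalman gain
        self.m += k * z
        self.e = (1 - k) * (self.e + self.q)
        return self.m


kf = ScalarKalman()
first = kf.update(5.0)        # one large sample moves the estimate only a little
for _ in range(500):
    last = kf.update(5.0)     # a sustained gradient is eventually followed
```

The small gain is the point of the design: an isolated spike barely changes the estimate, while a persistent delay gradient pulls it toward the measured value.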

Adaptive threshold

This module periodically updates the threshold γi according to the estimated mi of the Arrival filter, and the mi and γi will be used as the input of the Overuse Detector module to evaluate whether the current network is in an overload state.

The threshold γi is a dynamic adjustment process and is calculated as follows:

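Equation 4: $\gamma_i = \gamma_{i-1} + \Delta T \cdot k_\gamma(i) \cdot (|m_i| - \gamma_{i-1})$
Equation 5: $\Delta T = t_i - t_{i-1}$
Equation 6: $k_\gamma(i) = k_d$ if $|m_i| < \gamma_{i-1}$, otherwise $k_\gamma(i) = k_u$. The recommended value of kd is 0.039, and the recommended value of ku is 0.0087.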
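
The threshold update in Equations 4 to 6 can be sketched as below. The gain selection (kd when |mi| is below the previous threshold, ku otherwise) is assumed from the GCC draft [1]; the starting threshold of 12.5 and the 10 ms interval in the usage lines are hypothetical values for illustration.

```python
K_UP = 0.0087    # k_u: gain used when |m_i| is at or above the threshold
K_DOWN = 0.039   # k_d: gain used when |m_i| is below the threshold


def update_threshold(gamma_prev: float, m_i: float, delta_t_ms: float) -> float:
    """Equation 4: move the threshold toward |m_i| at a rate set by k."""
    k = K_DOWN if abs(m_i) < gamma_prev else K_UP
    return gamma_prev + delta_t_ms * k * (abs(m_i) - gamma_prev)


raised = update_threshold(12.5, 20.0, 10.0)   # |m_i| above threshold: rises slowly
lowered = update_threshold(12.5, 2.0, 10.0)   # |m_i| below threshold: falls faster
print(raised, lowered)
```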

Overuse Detector

This sub-module evaluates whether the current network is overloaded according to the thresholds and queue delay variation estimates output by the first two modules.

If mi is greater than 0, the queue depth in the network is growing: more packets are being sent into the link than it can carry, and the delay is increasing. If nothing is done, the network queue will fill up and cause heavy packet loss.
If mi is equal to 0, the network transmission delay is unchanged and no queue is building up.
If mi is less than 0, the network queue is draining and congestion is easing.

The algorithm judges the current network load status by comparing the values of mi and γi, as shown in the following formula 7 and figure:


(Equation 7 schematic diagram)
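
Equation 7 is not reproduced in the text, so the following is only a sketch that assumes the comparison defined in the GCC draft [1]: overuse when mi exceeds γi, underuse when mi is below −γi, and normal otherwise. The draft additionally requires overuse to persist for a short time before it is signaled; that condition is omitted here for brevity.

```python
def detect(m_i: float, gamma_i: float) -> str:
    """Classify the network state from the delay gradient and the threshold."""
    if m_i > gamma_i:
        return "overuse"    # queue is building up faster than the threshold allows
    if m_i < -gamma_i:
        return "underuse"   # queue is draining
    return "normal"


signals = [detect(m, 12.5) for m in (20.0, 3.0, -15.0)]
print(signals)
```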

Remote Rate Controller

This submodule will adjust the bandwidth based on the network overload status output by the Overuse Detector module.

GCC maintains three states: increase, decrease, and hold. The transition relationship between the three states is shown in the following figure:

(Three state relationships maintained by GCC)
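
Since the state diagram is not reproduced in the text, the sketch below assumes the transition table given in the GCC draft [1]: overuse always leads to decrease, underuse always leads to hold, and a normal signal moves hold to increase and decrease to hold.

```python
# (current state, detector signal) -> next state, assumed from the GCC draft [1]
TRANSITIONS = {
    ("hold", "overuse"): "decrease",
    ("hold", "normal"): "increase",
    ("hold", "underuse"): "hold",
    ("increase", "overuse"): "decrease",
    ("increase", "normal"): "increase",
    ("increase", "underuse"): "hold",
    ("decrease", "overuse"): "decrease",
    ("decrease", "normal"): "hold",
    ("decrease", "underuse"): "hold",
}


def next_state(state: str, signal: str) -> str:
    return TRANSITIONS[(state, signal)]


state, trace = "hold", []
for signal in ["normal", "overuse", "normal", "normal", "underuse"]:
    state = next_state(state, signal)
    trace.append(state)
print(trace)
```

Passing through hold after a decrease gives the queues time to drain before the rate is raised again.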

The bandwidth adjustment policies in these three states are shown in Equation 8 below:

Ri is the receiver's actual receive bitrate, computed from reception statistics, at the time the i-th frame is received.
Ari is the network bandwidth estimated from delay at the time the i-th frame is received.

In the increase state, the bandwidth is increased by 8% based on the last estimated bandwidth, but the actual value does not exceed 1.5 times the received bit rate, that is, it does not exceed 1.5 ∗ Ri;

In the decrease state (overload), the bitrate needs to be reduced: the new estimated bandwidth is set to 0.85 times the bitrate actually received by the receiver, that is, 0.85 ∗ Ri.
This quickly lowers the sender's bandwidth and relieves the network congestion.
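
The adjustment policy described above can be sketched as a single function. The function and parameter names are illustrative; bitrates are assumed to be in bits per second.

```python
def adjust_bandwidth(state: str, ar_prev: float, r_i: float) -> float:
    """Return the new delay-based estimate Ar_i for the given state."""
    if state == "increase":
        # Grow by 8%, but never beyond 1.5 times the actual receive bitrate.
        return min(1.08 * ar_prev, 1.5 * r_i)
    if state == "decrease":
        # Drop to 85% of what the receiver is actually getting.
        return 0.85 * r_i
    return ar_prev  # hold: keep the previous estimate


grown = adjust_bandwidth("increase", 1_000_000, 900_000)
capped = adjust_bandwidth("increase", 1_000_000, 600_000)
reduced = adjust_bandwidth("decrease", 1_000_000, 900_000)
```

The 1.5 ∗ Ri cap keeps the estimate from drifting far above what the link has actually been shown to deliver.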

Remb Processing[5]

After the Remote Rate Controller sub-module calculates the final estimated bandwidth of the receiver, it will feed it back to the sender through the REMB RTCP message to inform the sender of the estimated link bandwidth of the receiver. The message format of REMB is as follows:

The message contains the following information:

  • Feedback Message Type (FMT): 15
  • Payload Type (PT): 206
  • SSRC of the sender of the message
  • Media source SSRC, usually 0
  • Unique identifier: the ASCII string "REMB"
  • Num SSRC: the number of SSRCs listed in the message
  • Estimated bandwidth value
  • SSRC feedback: the SSRCs of the media streams received over the link this estimate applies to, 1 or more
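
As an illustration of how these fields fit together, the sketch below packs and parses a REMB message. The bit layout (8-bit Num SSRC, 6-bit exponent, and 18-bit mantissa for the bandwidth value) is assumed from the REMB draft [5]; the function names and the SSRC values are hypothetical.

```python
import struct


def pack_remb(sender_ssrc: int, bitrate_bps: int, media_ssrcs: list) -> bytes:
    # Encode the bitrate as mantissa * 2^exp with an 18-bit mantissa.
    exp, mantissa = 0, bitrate_bps
    while mantissa > 0x3FFFF:
        mantissa >>= 1
        exp += 1
    n = len(media_ssrcs)
    first_byte = (2 << 6) | 15   # V=2, P=0, FMT=15
    length = 4 + n               # packet length in 32-bit words, minus one
    header = struct.pack("!BBHII", first_byte, 206, length, sender_ssrc, 0)
    body = b"REMB" + struct.pack("!I", (n << 24) | (exp << 18) | mantissa)
    return header + body + b"".join(struct.pack("!I", s) for s in media_ssrcs)


def parse_remb(packet: bytes):
    first_byte, pt, _length, sender_ssrc, _media = struct.unpack_from("!BBHII", packet, 0)
    if first_byte & 0x1F != 15 or pt != 206 or packet[12:16] != b"REMB":
        raise ValueError("not a REMB packet")
    (word,) = struct.unpack_from("!I", packet, 16)
    n, exp, mantissa = word >> 24, (word >> 18) & 0x3F, word & 0x3FFFF
    ssrcs = list(struct.unpack_from(f"!{n}I", packet, 20))
    return sender_ssrc, mantissa << exp, ssrcs


packet = pack_remb(0x11111111, 1_000_000, [0x22222222])
print(len(packet), parse_remb(packet))
```

Because only 18 bits of mantissa are kept, large bitrates are rounded down slightly when they are encoded.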

When GCC feeds back REMB RTCP packets, it generally sends one every 200 ms. When overload is detected and the newly estimated bandwidth is less than 95% of the previous value, the feedback is sent immediately. The goal is to drop quickly and rise smoothly.

sender module

The sender mainly adjusts the bandwidth according to the packet loss rate. The purpose is to fall back on packet loss when the delay estimation module has not reduced the sender's bandwidth in time and congestion persists.

The sender's final bandwidth is the smaller of the bandwidth adjusted for packet loss and the bandwidth fed back by REMB. The logic is given by Equation 9 and Equation 10:

Equation 10: $A_i = \min(As_i, Ar_i)$

Ai is the final bandwidth estimated by GCC at the time of the i-th frame, combining the sender and the receiver: it is the smaller of the bandwidth the sender calculates from packet loss and the bandwidth the receiver estimates from delay.

The encoder bandwidth, sender pacing bandwidth, and retransmission bandwidth will be adjusted with reference to this bandwidth. Generally, the encoder bandwidth is: max(0.5 ∗ Ai, Ai − FecRi − RtxRi) , the pacing bandwidth is 2.0 ∗ Ai, and the retransmission bandwidth is 1.5∗Ai.
Note: Asi is the bandwidth value estimated by the sender from the packet loss rate at the i-th frame, and fli is the packet loss rate reported by the receiver at the i-th frame.
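
The sender-side logic can be sketched as follows. Equation 9 is not reproduced in the text, so the loss thresholds used here (reduce above 10% loss, increase by 5% below 2% loss, otherwise hold) are an assumption taken from the GCC draft [1]; the bandwidth split follows the figures quoted above. All function names are illustrative.

```python
def loss_based_bandwidth(as_prev: float, fl: float) -> float:
    """Assumed form of Equation 9: adjust As_i from the loss rate fl (0..1)."""
    if fl > 0.10:
        return as_prev * (1 - 0.5 * fl)   # heavy loss: back off
    if fl < 0.02:
        return 1.05 * as_prev             # negligible loss: probe upward
    return as_prev                        # moderate loss: hold


def final_bandwidth(as_i: float, ar_i: float) -> float:
    """Equation 10: A_i = min(As_i, Ar_i)."""
    return min(as_i, ar_i)


def split_budget(a_i: float, fec_rate: float, rtx_rate: float) -> dict:
    """Derive encoder, pacing, and retransmission bandwidth from A_i."""
    return {
        "encoder": max(0.5 * a_i, a_i - fec_rate - rtx_rate),
        "pacing": 2.0 * a_i,
        "retransmission": 1.5 * a_i,
    }


as_i = loss_based_bandwidth(1_000_000, 0.20)      # 20% loss
a_i = final_bandwidth(as_i, ar_i=800_000)         # delay-based estimate is lower
budget = split_budget(a_i, fec_rate=100_000, rtx_rate=50_000)
```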

Summary of REMB-GCC Algorithm

Google no longer maintains the REMB-GCC algorithm. Because its logic is split between the sending end and the receiving end, the two ends must cooperate with each other, and it relies on Kalman filtering to estimate the delay gradient change. This causes problems in practice: the Kalman estimate is inaccurate and complicated, and requiring both the receiver and the sender to take part in bandwidth estimation makes it less convenient, accurate, and fast.

Therefore, Google used TFB-GCC instead of REMB-GCC in subsequent versions of WebRTC.


TFB-GCC algorithm


(TFB-GCC algorithm architecture diagram)

As the figure above shows, most of the bandwidth estimation work is placed on the sender, and the receiver only does two things: it reports packet arrival information in Transport-wide feedback RTCP packets, and it reports the packet loss rate in RR packets. When there are multiple streams, the actual calculation uses SR and RR together. Both the delay-based estimation and the packet-loss-based estimation run at the sender. The loss-based estimation is unchanged from REMB-GCC; the delay-based estimation mainly replaces the Kalman filter with the Trendline filter.

Transport-wide sequence number[7]

The RTP packet sent by the sender will carry an extension header Transport-wide sequence number. The content of the extension header is as follows:

Here 0xBEDE identifies the one-byte form of the extension header; length = 1 means the extension data occupies 4 bytes; and L = 1 means the Transport-wide sequence number occupies 2 bytes.

Each time the sender sends a packet, it increments the Transport-wide sequence number in the packet's extension field by 1. Note that when the sender sends multiple streams (each with a different SSRC), the RTP packets of all streams share one continuous counter for this extension field; the streams are not counted independently. The purpose of the extension header is to match each sent packet with the feedback about it.
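
A minimal sketch of this extension and its shared counter is shown below. The extension ID is negotiated per session, so `EXT_ID = 5` is a hypothetical value, and the starting sequence number of 1 is likewise an assumption; the class and function names are illustrative.

```python
import struct

EXT_ID = 5  # hypothetical; the real ID is negotiated in signaling


def pack_twcc_extension(seq: int, ext_id: int = EXT_ID) -> bytes:
    """Build the 8-byte extension block: 0xBEDE, length=1, one element, padding."""
    element_header = (ext_id << 4) | 1  # L=1 means 2 bytes of data follow
    return struct.pack("!HHBHB", 0xBEDE, 1, element_header, seq & 0xFFFF, 0)


class TransportSequencer:
    """One counter shared by every stream (SSRC) on the same transport."""

    def __init__(self, start: int = 1):
        self._next = start & 0xFFFF

    def next_seq(self) -> int:
        seq = self._next
        self._next = (self._next + 1) & 0xFFFF  # 16-bit wrap-around
        return seq


sequencer = TransportSequencer()
# Audio (SSRC 0xA) and video (SSRC 0xB) packets draw from the same counter.
sent = [(ssrc, sequencer.next_seq()) for ssrc in (0xA, 0xB, 0xA)]
block = pack_twcc_extension(258)
print(sent, block.hex())
```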

The receiver feeds back Transport-wide RTCP packets [6]

Under the TFB-GCC framework, the receiver's main job is to periodically send Transport-wide feedback packets that tell the sender how its packets were received, including which packets arrived and when. The message format and its core fields are as follows:

base sequence number: The transport-wide sequence number of the first packet in this feedback. This number does not necessarily increase with each feedback; it may decrease when packets are reordered.

packet status count: The number of RTP packets this feedback covers, starting from the packet identified by the base sequence number. For example, the first recorded RTP packet has a transport sequence number equal to the base sequence number, and the second recorded RTP packet has base sequence number + 1.

reference time: The reference time, in units of 64 ms. The RTP packet arrival times recorded in this RTCP packet are calculated relative to it, and the first recv delta in the packet is relative to the reference time. Because the reference time always uses the same time base, the delta between feedbacks can be calculated even if some feedback packets are lost.

feedback packets count: The number of Transport-wide feedback packets sent by the receiver. The counter is incremented by one for each feedback packet sent, so this field can be used to detect lost feedback packets.

packet chunk: A list of packet status chunks that indicate the arrival status of packets. The RTP packets covered are the consecutive packets starting from the packet identified by the base sequence number.

recv delta: For each packet marked "packet received" in the packet chunks, the corresponding arrival time interval is appended to the recv delta list to record the packet's arrival time. From the reference time and the recv deltas, the sender can calculate the arrival time of each RTP packet at the receiver.
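
To show how the sender turns these fields back into arrival times, here is a sketch that works on already-decoded fields. It assumes recv deltas are expressed in 250 µs ticks, as specified in the transport-wide congestion control draft rather than in this article, and the `received` list stands in for the decoded packet chunks.

```python
def arrival_times_ms(reference_time: int, base_seq: int,
                     received: list, recv_deltas: list) -> dict:
    """Map transport-wide sequence numbers to receiver arrival times in ms.

    received:    one bool per packet, starting at base_seq (decoded chunks)
    recv_deltas: one delta per received packet, in 250 us ticks (assumed)
    """
    t = reference_time * 64.0       # reference time is in units of 64 ms
    deltas = iter(recv_deltas)
    arrivals = {}
    for offset, got in enumerate(received):
        if not got:
            continue                # lost packets carry no delta
        t += next(deltas) * 0.25    # each delta is relative to the previous arrival
        arrivals[(base_seq + offset) & 0xFFFF] = t
    return arrivals


arrivals = arrival_times_ms(10, 100, [True, False, True], [8, 40])
print(arrivals)
```

Combined with the send times the sender recorded per sequence number, these arrival times are all that is needed to compute the delay gradient on the sender side.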

Delay-based controller

This module is the delay-based bandwidth estimation module, equivalent to the receiver part of REMB-GCC.
It consists of ATF (equivalent to the Arrival filter and Adaptive threshold in REMB-GCC), the Overuse Detector, and the Remote Rate Controller.

ATF

Its main function is to estimate the delay gradient change mi. It uses the least squares method of the Trendline filter to compute the best estimate of mi from the input dmi. The Trendline least squares method works as follows:

Equation 11: $dm_i = t_i - t_{i-1} - (T_i - T_{i-1})$

Note: ti and ti-1 are obtained from the arrival times carried in the Transport-wide RTCP packets returned by the receiver. When sending RTP packets, the sender records a sending time T for each RTP packet, keyed by its Transport-wide sequence number.

Equation 12 below gives the accumulated queuing delay:

Equation 12: $\mathit{accuDelay}_i = \sum_{j=1}^{i} dm_j$

Equation 13 below gives the smoothed accumulated queuing delay:

Equation 13: $\mathit{smoothedDelay}_i = \mathit{smoothingCoef} \cdot \mathit{smoothedDelay}_{i-1} + (1 - \mathit{smoothingCoef}) \cdot \mathit{accuDelay}_i$

Equation 14 below builds a two-tuple from the relative arrival time of the i-th frame and the smoothed delay:

Equation 14: $(x_i, y_i) = (t_i - t_1, \mathit{smoothedDelay}_i)$

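The two-tuples are used to estimate mi according to Equation 15 below, the least squares slope of the points:

Equation 15: $\mathit{TrendlineSlope} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$

Here TrendlineSlope is mi.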

The slope of the trend line reflects the state of the link queue. When it is greater than 0, the link queue is growing and the packet arrival interval tends to increase; when it is less than 0, the link queue is shrinking and the packet arrival interval is decreasing; when it is equal to 0, the packet arrival interval is constant.
The Adaptive threshold is the same as in REMB-GCC, that is, the same as Equation 4.
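
The Trendline steps (Equations 11 to 15) can be sketched as a small class: accumulate the delay gradients, smooth the sum, keep a sliding window of (relative arrival time, smoothed delay) points, and fit a least squares slope through them. The window size of 20 and the smoothing coefficient of 0.9 are assumptions chosen for illustration, as are all names.

```python
from collections import deque


class TrendlineFilter:
    def __init__(self, window_size: int = 20, smoothing_coef: float = 0.9):
        self.window = deque(maxlen=window_size)  # (x_i, y_i) points
        self.smoothing_coef = smoothing_coef
        self.accu_delay = 0.0
        self.smoothed_delay = 0.0
        self.first_arrival_ms = None

    def update(self, dm: float, arrival_ms: float) -> float:
        if self.first_arrival_ms is None:
            self.first_arrival_ms = arrival_ms
        self.accu_delay += dm                                   # Equation 12
        self.smoothed_delay = (self.smoothing_coef * self.smoothed_delay
                               + (1 - self.smoothing_coef) * self.accu_delay)  # Eq. 13
        self.window.append((arrival_ms - self.first_arrival_ms,
                            self.smoothed_delay))               # Equation 14
        return self.slope()

    def slope(self) -> float:
        """Least squares slope of the points in the window (Equation 15)."""
        n = len(self.window)
        if n < 2:
            return 0.0
        x_avg = sum(x for x, _ in self.window) / n
        y_avg = sum(y for _, y in self.window) / n
        num = sum((x - x_avg) * (y - y_avg) for x, y in self.window)
        den = sum((x - x_avg) ** 2 for x, _ in self.window)
        return num / den if den else 0.0


growing = TrendlineFilter()
for i in range(100):
    m_growing = growing.update(1.0, i * 10.0)   # delay grows 1 ms every 10 ms

steady = TrendlineFilter()
for i in range(30):
    m_steady = steady.update(0.0, i * 10.0)     # no change in delay
```

Fitting a line over a window, rather than reacting to each sample, is what makes the estimate robust to isolated noisy measurements.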

Overuse Detector

Based on the mi and threshold γi calculated in the previous section, this module judges whether the network is currently overloaded, underloaded, or normal, and from that decides whether the bandwidth estimate should next be increased, decreased, or kept unchanged. This is the same as in REMB-GCC.

Remote Rate Controller

Here the estimated bandwidth is adjusted according to the output of the Overuse Detector. Unlike REMB-GCC, it mainly uses the AIMD (additive increase, multiplicative decrease) method, so an increase of the estimated bandwidth can be either gentle or aggressive.

TFB-GCC records the link bandwidth each time the link is overloaded. When the bandwidth needs to be increased and the current actual receive bandwidth is close to that recorded link bandwidth, it increases gently, in small steps;

When the current actual receive bandwidth deviates greatly from the recorded link bandwidth, it increases aggressively, in large steps;

When network overload is detected, the bandwidth needs to be reduced, and 0.85 times the current actual receive bandwidth is used as the estimated bandwidth.
The bandwidth finally calculated from the delay estimation is Ari.
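
The gentle and aggressive increase modes can be sketched roughly as below. This is only an illustration: the 10% "close to the link bandwidth" margin, the fixed additive step, and the 8% multiplicative gain are hypothetical parameters, not values given in this article or taken from WebRTC.

```python
def aimd_adjust(state: str, ar_prev: float, r_i: float, link_capacity_est=None,
                near_ratio: float = 0.1,             # hypothetical closeness margin
                additive_step_bps: float = 10_000,   # hypothetical small step
                multiplicative_gain: float = 1.08) -> float:  # hypothetical gain
    """Return the new delay-based estimate Ar_i."""
    if state == "decrease":
        return 0.85 * r_i
    if state == "increase":
        near_capacity = (
            link_capacity_est is not None
            and abs(r_i - link_capacity_est) <= near_ratio * link_capacity_est
        )
        if near_capacity:
            return ar_prev + additive_step_bps   # gentle: additive increase
        return ar_prev * multiplicative_gain     # aggressive: multiplicative increase
    return ar_prev


gentle = aimd_adjust("increase", 1_000_000, 980_000, link_capacity_est=1_000_000)
aggressive = aimd_adjust("increase", 1_000_000, 500_000, link_capacity_est=2_000_000)
reduced = aimd_adjust("decrease", 1_000_000, 900_000)
```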

Network congestion bandwidth estimation based on packet loss

This is the same as the packet-loss-based congestion estimation of REMB-GCC, and it guards against failure of the delay-based estimation. The packet loss rate fli is calculated from the RR packets fed back by the receiver, and fli is used to estimate the congestion state. Equation 9 is again used to determine Asi, and the final estimated bandwidth Ai is calculated as follows:

Equation 16: $A_i = \min(As_i, Ar_i)$

Summary of TFB-GCC Algorithm

With this architecture, bandwidth estimation lives entirely on the sender side, so the receiver does not need to be upgraded in step, which makes later versions easier to deploy and optimize. Accuracy and timeliness are also better, and the Trendline filter is simpler, more accurate, and more sensitive than the Kalman filter.
Bandwidth increases are more flexible and safer because the algorithm takes the link bandwidth estimated at each overload into account.

The paper [4] compares TFB-GCC and REMB-GCC under limited bandwidth. TFB-GCC reacts faster than REMB-GCC, tracks the link bandwidth more accurately, and performs better overall.
(Comparison of TFB-GCC and REMB-GCC)


GCC algorithm optimization points

REMB-GCC optimization points

  • Replace the Kalman filter on the receiver with the Trendline filter
  • Optimize the overload judgment logic to eliminate misjudgments introduced by noise
  • Optimize the actual receive bandwidth calculation
  • Scenario-specific optimization of the packet-loss-based congestion estimation
  • Optimize the algorithm that increases or decreases bandwidth based on delay estimation
  • Handle cases where overload may fail to reduce the bandwidth
  • Handle the growing accumulated error when the delay-based bandwidth estimate keeps rising while the actual receive bandwidth fluctuates
  • ...

TFB-GCC optimization points

  • Optimize the receive bandwidth estimate based on Transport-wide RTCP packets
  • Trendline filter optimization
  • Optimize the packet loss rate based on Transport-wide RTCP packet statistics
  • Use RTT in delay estimation
  • Overload logic optimization and threshold optimization
  • Handle cases where overload may be ignored
  • Optimize the AIMD bandwidth calculation and related logic
  • Link bandwidth estimation optimization
  • Optimize the packet-loss-based congestion bandwidth estimation
  • Support receiving REMB on the TFB-GCC sender
  • ...

References:

[1] https://datatracker.ietf.org/doc/html/draft-ietf-rmcat-gcc-02
[2] https://datatracker.ietf.org/doc/html/rfc8698
[3] https://www.rfc-editor.org/rfc/rfc8298.html
[4] Congestion Control for RTP Media: a Comparison on Simulated Environment
[5] https://tools.ietf.org/id/draft-avestrand-rmcat-remb-03.html


融云RongCloud