
Audio and Video Learning: Weak-Network Countermeasure Techniques in Practice

Background introduction

Real-time audio and video calls are now part of everyday life, in social apps, security, transportation, and many other areas. Complex user scenarios, demanding requirements, and inconsistent network environments pose great challenges for real-time audio and video calls. We have done some work in this direction. We still lag behind the optimization work of larger companies, and much remains to be optimized and improved, but we would like to share our current progress and what we have done.

0.1 Network transmission:

Network transmission currently relies on two transport protocols, TCP and UDP, whose advantages and disadvantages are compared in the figure below. Many factors affect transmission quality, including network congestion and packet loss. These factors directly determine the quality of a real-time video call and strongly affect user experience, which is the fundamental reason optimization is needed.

img

0.2 Definition of weak network:

For real-time audio and video calls, a weak network is a network environment disrupted by complexity, heterogeneity, irregular protocols, network abnormalities, network errors, and similar characteristics. A weak network cannot provide high-quality transmission. The receiving end cannot receive a continuous stream of media packets, which results in abnormal sound, video mosaics, garbled frames, black screens, and so on. This is fatal for real-time audio and video calls: it directly affects user experience and leads to product quality problems or customer complaints.

img

0.3 Real-time audio and video features

For real-time audio and video calls, the two most important requirements are low latency and high quality, and the two naturally conflict. High quality requires the sender to send audio and video streams at the highest possible resolution and quality, which places high demands on bandwidth and the network environment and tolerates little packet loss or jitter. Low latency is less strict about the network: it allows a certain amount of packet loss and a certain range of receive jitter, because the alternative is to buffer more data, which costs real-time audio and video its timeliness and fails the low-latency requirement. This is the classic story of the spear and the shield. Only by seeking a breakthrough with the spear while strengthening the protection of the shield can such harsh conditions be met.

img

One FEC

WebRTC can implement FEC in three ways: RED (RFC 2198), ULPFEC (RFC 5109), and FLEXFEC (a draft when this work was done, since published as RFC 8627). ULPFEC packets are carried inside RED encapsulation, and we use ULPFEC for FEC protection.

img

The basic principle: the sender adds a certain number of redundant FEC packets to the media packets, and each FEC packet is computed by XORing the media packets it protects. If some media packets are lost in transit, the receiver XORs the received media packets with the FEC packets to recover the lost media packets, so no NACK retransmission packets have to be sent and no extra network resources are consumed.
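The XOR principle can be illustrated with a minimal sketch. It is not the ULPFEC wire format: the function names are hypothetical, all payloads are assumed to have the same length, and a single parity packet protects one group, so it can recover at most one lost packet from that group.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length payloads")
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets):
    """Build one parity (FEC) payload by XORing every media payload in the group."""
    return reduce(xor_bytes, packets)


def recover_missing(received, parity):
    """Recover the single lost payload from the surviving payloads and the parity."""
    return reduce(xor_bytes, received, parity)


media = [b"pkt-0000", b"pkt-0001", b"pkt-0002"]
parity = make_parity(media)

# Suppose the second packet is lost on the network.
survivors = [media[0], media[2]]
print(recover_missing(survivors, parity))
```

Because XOR is its own inverse, XORing the parity with every surviving packet cancels them out and leaves only the missing payload.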

Since WebRTC already implements this part well, it only needs compatibility testing. It is not a key optimization target, and we use the WebRTC implementation as is.

Two NACK

2.1 NACK introduction:

NACK stands for Negative Acknowledgement. It is one of the error recovery mechanisms in WebRTC. NACK is a way for the receiving end to indicate that it has not received a specific packet.

The NACK message is sent to the media sender over RTCP. The sender then decides whether to retransmit the lost packet based on whether the packet is still available in its buffer and on an estimate of whether the retransmission would still be useful. The sender maintains a buffer queue. If the requested packet is in the queue, it is taken out and sent again; if it is not, nothing is resent, and the decoder never receives that packet.
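The sender-side behaviour described above can be sketched as a bounded history of sent packets. The class name, method names, and capacity are assumptions for illustration, not WebRTC's API, and the usefulness estimate is left out.

```python
from collections import OrderedDict


class RetransmissionBuffer:
    """Keeps the most recently sent packets so NACKed ones can be resent."""

    def __init__(self, capacity=512):
        self.capacity = capacity
        self._packets = OrderedDict()  # sequence number -> payload, oldest first

    def store(self, seq, payload):
        """Remember a packet that was just sent, evicting the oldest if full."""
        self._packets[seq] = payload
        self._packets.move_to_end(seq)
        while len(self._packets) > self.capacity:
            self._packets.popitem(last=False)

    def handle_nack(self, seqs):
        """Return (seq, payload) pairs for the requested packets still buffered.

        Requests for packets that have already been evicted are ignored,
        so the receiver never gets those packets back.
        """
        return [(s, self._packets[s]) for s in seqs if s in self._packets]


buf = RetransmissionBuffer(capacity=3)
for seq in range(1, 6):
    buf.store(seq, b"p%d" % seq)

# Packet 2 has been evicted, so only 4 and 5 can be retransmitted.
print(buf.handle_nack([2, 4, 5]))
```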

img

2.2 Optimization and improvement:

Our current system still uses the NACK process from an older WebRTC version. The process is as follows:

  • After the RTP module decapsulates the packets, it passes them to the JitterBuffer module in order of arrival;
  • Each packet is compared and sorted by sequence number when it is inserted into the JitterBuffer module. If its sequence number is newer than the last one saved, the NACK list is built: starting from the last saved sequence number + 1 and ending at the newly received sequence number, every sequence number in between is first cached in missing_sequence_numbers;
  • WebRTC polls from a thread in the JitterBuffer. At a fixed interval it traverses the NACK list and checks whether any packets in it have still not been received. If so, it assembles an RTCP NACK packet and sends it to the sender to request the packets that have not been received.
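The gap detection in the steps above can be sketched as follows. This is an illustration under assumptions, not the WebRTC code: the class and method names are hypothetical, RTP sequence numbers are treated as 16-bit values that wrap around, and the periodic polling thread is reduced to a `nack_list()` call.

```python
SEQ_MOD = 1 << 16  # RTP sequence numbers are 16 bits wide


def is_newer(seq, prev):
    """True if seq is ahead of prev, taking 16-bit wraparound into account."""
    return seq != prev and ((seq - prev) % SEQ_MOD) < SEQ_MOD // 2


class NackTracker:
    def __init__(self):
        self.latest = None    # newest sequence number seen so far
        self.missing = set()  # plays the role of missing_sequence_numbers

    def on_packet(self, seq):
        if self.latest is None:
            self.latest = seq
            return
        if is_newer(seq, self.latest):
            # Everything between latest + 1 and seq (exclusive) is missing.
            s = (self.latest + 1) % SEQ_MOD
            while s != seq:
                self.missing.add(s)
                s = (s + 1) % SEQ_MOD
            self.latest = seq
        else:
            # A late or retransmitted packet fills a gap.
            self.missing.discard(seq)

    def nack_list(self):
        """Sequence numbers to request, oldest first."""
        return sorted(self.missing, key=lambda s: (s - self.latest) % SEQ_MOD)


tracker = NackTracker()
for seq in [65533, 65534, 1, 65535, 3]:
    tracker.on_packet(seq)
print(tracker.nack_list())
```

In the example, packets 65535 and 0 are first marked missing when packet 1 arrives; 65535 then arrives late and is removed, and packet 3 adds 2 to the list.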

NACK is a confirmation process for packets that have not arrived. The original process contains many nested functions and convoluted steps, so once we understood it we optimized it in two rounds. The approximate flow chart of the two rounds of optimization is as follows:

img

NACK also has a problem it cannot avoid. When the packet loss rate is high, or when jitter and network heterogeneity cause packets to arrive badly out of order (for example, when jitter exceeds 200 ms), some missing packets are merely delayed rather than lost. Retransmissions of those packets are then requested and sent repeatedly, which causes network congestion. Especially at higher resolutions, this easily leaves video frames that cannot be fully decoded, producing mosaics or a black screen.

We therefore added and modified several judgment conditions, optimized the conditions for buffering and emptying the queue, and adjusted parts of the process of obtaining a complete video frame, to minimize the impact of these situations and improve user experience.

Three Bandwidth adaptation

3.1 Bandwidth adaptation:

During a real-time video call, the device captures with parameter sets for different frame rates, and the sender maintains the parameter set for the maximum frame rate. The captured images are then encoded, packetized, and sent to the network. If the network changes while frames are still being sent to the uplink at the current bit rate, or if the encoder on the capture side is unstable and cannot keep up with the captured frame sequence, the sender lowers the frame rate or drops frames to relieve its sending pressure.

In WebRTC, when the encoded bit rate overloads the link or the load is uneven, the MediaOptimization interface is called to lower or raise the frame rate and, in turn, the bit rate. This makes effective use of the current bandwidth and keeps user experience from suffering when the network deteriorates or bandwidth is insufficient.

3.2 Framework diagram

Sender: Estimate the current available bandwidth based on the packet loss rate

Receiving end: calculate the available bandwidth based on the arrival time of the packet

Combined: the receiving end sends REMB feedback to the sending end, and the final sending rate is determined from both the sender's and the receiver's bandwidth estimates.

img

3.3 Sender

The sender-side bandwidth estimation algorithm reads the packet loss rate reported in RTCP, dynamically assesses the current state of the network, and decides whether to increase or decrease the sending bandwidth. If the bandwidth needs to be reduced, the TFRC algorithm is used to smooth the change and reduce the risk of a sudden increase or decrease.
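A minimal sketch of a loss-based update follows. The thresholds (2% and 10%) and factors are taken from the Google Congestion Control draft and are an assumption here, since the text above gives no numbers; the TFRC smoothing step is omitted. Combining the two estimates with `min` is likewise an assumption about how the final rate is chosen, and both function names are hypothetical.

```python
def loss_based_update(rate_bps, loss_fraction):
    """Update the sender-side estimate from the RTCP-reported loss fraction.

    Assumed rule (GCC draft): back off when loss exceeds 10%,
    probe upward when loss is below 2%, otherwise hold the rate.
    """
    if loss_fraction > 0.10:
        return rate_bps * (1.0 - 0.5 * loss_fraction)
    if loss_fraction < 0.02:
        return rate_bps * 1.05
    return rate_bps


def target_send_rate(sender_estimate_bps, remb_bps):
    """Combine both estimates; taking the lower one is an assumption of this sketch."""
    return min(sender_estimate_bps, remb_bps)


rate = 1_000_000
for loss in (0.01, 0.05, 0.20):
    rate = loss_based_update(rate, loss)
    print(f"loss={loss:.2f} -> estimate={rate:.0f} bps")

print(target_send_rate(rate, remb_bps=800_000))
```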

img

3.4 Receiver

The receiver-side bandwidth estimation algorithm estimates the current network bandwidth from statistics on the received RTP data. WebRTC uses a Kalman filter over the send and receive timestamps of each frame to estimate the current congestion state and utilization of the network, then assesses and corrects the bandwidth estimate, which in turn adjusts how much bandwidth is used.
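The idea can be sketched with a scalar Kalman filter that tracks the delay gradient, that is, how much the spacing between arrivals grows relative to the spacing between sends. This is a simplified illustration, not WebRTC's estimator: the class name, noise values, and initial variance are assumptions, and the over-use detector and rate controller that would consume the estimate are left out.

```python
class DelayGradientFilter:
    """Scalar Kalman filter over the inter-frame delay variation (in ms)."""

    def __init__(self, process_noise=1e-3, measurement_noise=1.0):
        self.m_hat = 0.0    # estimated delay gradient
        self.err_var = 0.1  # variance of the estimate (assumed starting value)
        self.q = process_noise
        self.r = measurement_noise
        self._prev_send = None
        self._prev_recv = None

    def update(self, send_ms, recv_ms):
        """Feed one frame's send and receive timestamps, return the estimate."""
        if self._prev_send is None:
            self._prev_send, self._prev_recv = send_ms, recv_ms
            return self.m_hat
        # Measurement: change in arrival spacing minus change in send spacing.
        d = (recv_ms - self._prev_recv) - (send_ms - self._prev_send)
        self._prev_send, self._prev_recv = send_ms, recv_ms
        # Predict, then correct with the Kalman gain.
        prior_var = self.err_var + self.q
        gain = prior_var / (prior_var + self.r)
        self.m_hat += gain * (d - self.m_hat)
        self.err_var = (1.0 - gain) * prior_var
        return self.m_hat


# Frames sent every 20 ms; each one is queued 5 ms longer than the previous one.
congested = DelayGradientFilter()
for i in range(200):
    estimate = congested.update(send_ms=20 * i, recv_ms=50 + 25 * i)
print(f"estimated gradient: {estimate:.2f} ms per frame")
```

A persistently positive estimate suggests that queues are building up along the path, while an estimate near zero suggests the path is keeping up with the sending rate.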

img

Four optimization and improvement:

4.1 Optimization and improvement

The main optimization points of our work are as follows:

Two rounds of NACK optimization: an overall improvement of the previous version of the algorithm (refactoring the original R4X-related code to use the same NACK acquisition algorithm as iOS; on that basis, adjusting and optimizing the conditions for buffering and emptying the queue; optimizing the process of obtaining complete decodable frames; adjusting the strategy for discarding frames that have been buffered too long; and so on), plus tuning of the JitterBuffer parameters;

Overall strategy optimization for FEC, dynamic resolution, and NACK: based on the packet loss rate, RTT, the average jitter over 5 s, and other parameters under different network conditions, we designed an overall scheme that dynamically adjusts the current combination. It adds a certain amount of redundancy to keep FEC from becoming ineffective under high packet loss, and it can gradually lower the resolution so that playback stays smooth when packet loss is high.

Network storm suppression: WebRTC has a suppression strategy for retransmitted packets. NACK retransmission packets and FEC redundant packets are no longer sent once the proportion of retransmission packets reaches 35%, but this threshold works poorly for 720P with more than 30% packet loss. After extensive real-world verification tests, we found that capping the proportion of retransmission and FEC redundant packets at 30% reasonably avoids network storms while still serving NACK retransmission requests: at 720P with 30% packet loss, playback remains smooth at 15 to 25 fps.
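The suppression rule can be sketched as a simple budget check. How the proportion is measured is not specified above, so this sketch assumes it is the share of redundant bytes (NACK retransmissions plus FEC) in all bytes sent, counted over the whole session rather than a sliding window. The class and method names are hypothetical.

```python
class RedundancyLimiter:
    """Refuses retransmission/FEC packets once they exceed a share of all bytes sent."""

    def __init__(self, max_ratio=0.30):
        self.max_ratio = max_ratio
        self.media_bytes = 0
        self.redundant_bytes = 0

    def on_media_sent(self, size):
        """Account for an ordinary media packet."""
        self.media_bytes += size

    def allow_redundant(self, size):
        """Return True and account for the packet if it fits within the budget."""
        total = self.media_bytes + self.redundant_bytes + size
        if total == 0 or (self.redundant_bytes + size) / total > self.max_ratio:
            return False
        self.redundant_bytes += size
        return True


limiter = RedundancyLimiter(max_ratio=0.30)
limiter.on_media_sent(700)
print(limiter.allow_redundant(200))  # 200 / 900 is within the 30% budget
print(limiter.allow_redundant(300))  # 500 / 1200 would exceed the budget
```

A production version would more likely measure the ratio over a recent time window so that the budget recovers after a burst of loss.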

4.2 Optimization results

img

The overall solution has been tested and accepted in several laboratory scenarios, including IP and SIP calls at 720P and VGA resolution, and it improves user experience to a certain extent over the previous version. IP live broadcast over wired networks improved most noticeably, with a clear gain in the overall subjective score. Delay resilience, which we did not have before, was built from scratch and now covers environments with up to 300 ms of delay. The solution has been deployed in production, has been well received by users, and has come out ahead in head-to-head comparisons with competitors.

Five issues and measures:

5.1 Congestion detection:

We need to locate the current network state more quickly and accurately, so that the sending strategy can be adjusted promptly once congestion occurs. Pre-research has already begun in one of our video projects.

5.2 The latest WebRTC weak network control

In the latest WebRTC, the NACK module is integrated into the VIE module as an independent module, decoupled from the JitterBuffer module, so that it monitors network packet loss in real time and sends NACK requests independently.

Advantages: the NACK list can be obtained in real time, and because the module is decoupled from the JitterBuffer, obtaining the list is more convenient and faster. Pre-research has already begun in one of our video projects.

5.3 Frontier of weak-network technology

On 2020-10-24 we attended RTE2020, the Real-Time Internet Conference hosted by Shengwang (Agora). Shengwang has achieved smooth playback at 720P on 2.0M bandwidth under 65% packet loss.

img

1. Deeply strengthen the application of algorithms in congestion control;

2. Deep optimization of real-time H264 video encoder algorithm.

img

img

These are the areas we will investigate and learn about next, and we look forward to discussing and learning together with peers who have relevant experience.

