
This article is based on a live talk, "MCtalk Live #4: The Balance of Video QoE", by Qi Jiyue, senior engine engineer at NetEase Yunxin. A link to the video replay and a Q&A summary can be found at the end of the article.

Introduction

With the rapid development of the Internet, demand for Real Time Communication (RTC) is growing by the day. Delivering the best video quality of experience (QoE) across complex network quality of service (QoS) conditions and uneven hardware terminals is an important part of RTC technology.

Starting from the Video Quality Controller (VQC) module, this article introduces some of the work NetEase Yunxin NERTC has done to improve video QoE.

The role of VQC in video QoE

Video QoE mainly comprises three indicators: video clarity, video smoothness, and video latency. These are determined mainly by network QoS, the video processing algorithms, and the VQC:

  • Network QoS: provides as much usable bandwidth as possible
  • Video processing algorithms: produce the best possible video quality at a given bit rate
  • VQC:

    • toward QoS: controls the bit rate to guarantee smoothness and low latency
    • toward the video algorithms: safeguards performance and balances clarity against smoothness

VQC monitors the video QoS status and the video algorithm status, and outputs control signals to achieve the best QoE for the scenario, balancing clarity, smoothness, and latency. Today we mainly share the VQC implementation in NetEase Yunxin NERTC and the related QoE tuning work.

VQC implementation

NetEase Yunxin's VQC module partly follows the module design of WebRTC. The overall structure, shown in the figure, consists of four monitoring modules and one strategy module. Input parameters pass through the monitoring modules to produce current status results, and the VQC strategy module then determines the final control signals that drive the video pipeline. Each module is introduced in detail below.

QualityScaler

The QualityScaler module monitors the current encoding quality and is mainly responsible for clarity and encoder stability.

Its inputs are the QP thresholds (determined by the encoder type and encoding algorithm), the QP value of each encoded output frame, and statistics on current frame drops; its output is a verdict on the video quality.

The QpSmoother module uses an exponentially weighted moving average to compute the QP statistic:

y_t = α · y_{t−1} + (1 − α) · sample_t

In this formula:

  • sample_t is the QP of the current frame
  • the output y is the smoothed QP statistic
  • α is a coefficient derived from the encoder's QP behavior, with different values for the upper-limit and lower-limit trackers. For the OpenH264 encoder, for example, our tests settled on 0.9995 for the upper-limit tracker and 0.9999 for the lower-limit tracker. Differentiating the two coefficients makes the statistic respond quickly when quality worsens (QP rises) and slightly more slowly when it improves (QP falls).

The resulting upper-limit and lower-limit QP statistics are then compared against the input QP thresholds to judge the current picture quality. The thresholds likewise differ by encoder: for OpenH264 our tests use a lower threshold of 24 and an upper threshold of 37, while hardware encoders on iOS devices use other values and hardware encoders on Android differ again. All of these must be obtained through extensive verification on real devices.
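A minimal sketch of this smoothing mechanism is given below, in C++ for illustration. The class shape and the seeding behavior are assumptions, not NERTC's actual API; the alpha values and OpenH264 thresholds are the ones quoted above:

```cpp
#include <optional>

// Exponentially weighted moving average of QP: a sketch, assuming
// per-frame updates.
class QpSmoother {
 public:
  explicit QpSmoother(double alpha) : alpha_(alpha) {}

  // y_t = alpha * y_{t-1} + (1 - alpha) * sample_t
  void AddSample(int qp) {
    if (!value_) {
      value_ = static_cast<double>(qp);  // seed with the first sample
    } else {
      value_ = alpha_ * (*value_) + (1.0 - alpha_) * qp;
    }
  }

  std::optional<double> value() const { return value_; }

 private:
  const double alpha_;           // closer to 1 => slower response
  std::optional<double> value_;  // smoothed QP statistic
};

// Two trackers with different alphas: the upper-limit tracker (0.9995)
// reacts faster, so rising QP (worsening quality) is caught quickly.
QpSmoother qp_upper(0.9995);  // compared against the upper threshold (37)
QpSmoother qp_lower(0.9999);  // compared against the lower threshold (24)
```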

MovingAverage is a sliding-window function: it measures the ratio of dropped frames within the window, and if the ratio exceeds a threshold, quality is considered to have deteriorated.

In the end, an internal periodic query collects the statistics from QpSmoother and MovingAverage and emits one of two results (if neither condition holds, no result is emitted):

  1. Good video quality

    • QpSmoother's lower-limit QP statistic is less than or equal to the lower QP threshold
  2. Poor video quality

    • QpSmoother's upper-limit QP statistic is greater than the upper QP threshold
    • MovingAverage's frame-drop ratio exceeds its threshold
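Putting the pieces together, the periodic check might look like the following sketch, reusing the QpSmoother above. The 60% frame-drop threshold is an assumed illustrative value, not quoted from the talk:

```cpp
enum class QualityCheckResult { kNoResult, kGood, kBad };

QualityCheckResult CheckQuality(const QpSmoother& qp_lower,
                                const QpSmoother& qp_upper,
                                double dropped_frame_ratio) {
  constexpr int kQpLowThreshold = 24;              // OpenH264, from the text
  constexpr int kQpHighThreshold = 37;             // OpenH264, from the text
  constexpr double kDropRatioThreshold = 0.60;     // assumed value

  // Heavy frame dropping or a high smoothed QP both mean poor quality.
  if (dropped_frame_ratio > kDropRatioThreshold)
    return QualityCheckResult::kBad;
  if (qp_upper.value() && *qp_upper.value() > kQpHighThreshold)
    return QualityCheckResult::kBad;
  // A low smoothed QP means there is quality headroom.
  if (qp_lower.value() && *qp_lower.value() <= kQpLowThreshold)
    return QualityCheckResult::kGood;
  return QualityCheckResult::kNoResult;  // no signal this period
}
```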

OveruseFrameDetector

The main function of the OveruseFrameDetector module is to monitor whether current performance can sustain the current frame rate; it is responsible for video smoothness.

Its inputs are the current target frame rate, the resolution, a CPU usage threshold, and the capture and send times of video frames; its output is a verdict on whether performance is good or bad.

The ProcessingUsage module uses the capture and send times of each frame to measure the whole video sending pipeline, i.e. the time from capture to send. This time is smoothed into a statistic, which is compared against the theoretical frame interval derived from the current frame rate; each time the statistic exceeds the theoretical value, a counter is incremented. The counter is collected periodically: if it exceeds a certain count, a "CPU bad" (poor performance) result is emitted; if it stays below a certain count, a "CPU good" result is emitted.
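As a rough illustration of this logic (the EWMA weight and the overuse count threshold below are assumed values, not NERTC's):

```cpp
class ProcessingUsage {
 public:
  // processing_ms: measured capture-to-send time of one frame.
  // target_fps:    current target frame rate.
  void OnFrameSent(double processing_ms, int target_fps) {
    // Smooth the measured processing time.
    smoothed_ms_ = 0.9 * smoothed_ms_ + 0.1 * processing_ms;
    // Compare against the theoretical frame interval.
    const double frame_interval_ms = 1000.0 / target_fps;
    if (smoothed_ms_ > frame_interval_ms) ++overuse_count_;
  }

  // Collected periodically: many overuse hits => "CPU bad",
  // few => "CPU good".
  bool IsOverusing() {
    const bool overusing = overuse_count_ > kOveruseCountThreshold;
    overuse_count_ = 0;  // reset for the next period
    return overusing;
  }

 private:
  static constexpr int kOveruseCountThreshold = 10;  // assumed value
  double smoothed_ms_ = 0.0;
  int overuse_count_ = 0;
};
```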

This module needs to guard against spurious CPU good/bad results, for example:

  • When there are few samples (e.g. at a low frame rate), the periodic collection interval stays the same, which can easily skew the results
  • When a new frame rate or resolution has just taken effect, the per-stage processing times measured so far are not representative, so special handling is required

RateAllocator

The RateAllocator module decides how the current bit rate is used, and acts as the strategy module for large and small streams in simulcast scenarios.

This module has several key functions, illustrated in the sketch after this list:

  1. When there are multiple remote users, some subscribing to the small stream and some to the large stream, the module decides the most appropriate split of the limited bit rate
  2. In the same scenario, when the bit rate is severely insufficient, the module can decide to merge the large and small streams into a single stream to improve image quality
  3. When downlink bandwidth is limited, the module decides whether the sending end needs to reduce its sending bandwidth
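Below is a hedged sketch of such an allocator, assuming one large and one small stream; all the bit-rate constants are illustrative assumptions, not NERTC's tuned values:

```cpp
#include <algorithm>

struct StreamAllocation {
  bool send_large = false;
  bool send_small = false;
  int large_kbps = 0;
  int small_kbps = 0;
};

StreamAllocation Allocate(int total_kbps,
                          bool has_large_subscribers,
                          bool has_small_subscribers) {
  constexpr int kMinLargeKbps = 500;  // assumed floor for the large stream
  constexpr int kSmallKbps = 150;     // assumed small-stream budget
  StreamAllocation out;

  if (has_large_subscribers && has_small_subscribers) {
    if (total_kbps < kMinLargeKbps + kSmallKbps) {
      // Bit rate too low for both: merge into a single stream so the
      // whole budget improves one stream's image quality.
      out.send_large = true;
      out.large_kbps = total_kbps;
    } else {
      out.send_large = out.send_small = true;
      out.small_kbps = kSmallKbps;
      out.large_kbps = total_kbps - kSmallKbps;
    }
  } else if (has_small_subscribers) {
    out.send_small = true;
    out.small_kbps = std::min(total_kbps, kSmallKbps);
  } else if (has_large_subscribers) {
    out.send_large = true;
    out.large_kbps = total_kbps;
  }
  return out;
}
```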

MediaOptimization

The MediaOptimization module monitors and corrects the real-time bit rate and frame rate to prevent network congestion caused by over-sending; once the network is congested it deteriorates further, dragging down image quality, smoothness, and latency all at once.

The module controls the real-time bit rate mainly through its internal FrameDropper module, which uses a funnel (leaky-bucket) algorithm to determine whether the current bit rate is being exceeded and whether frames need to be dropped to stabilize it.

Before each frame is encoded, the frame's share of the target bit rate drains out of the funnel; after encoding, the actual encoded size of the frame is poured in. The funnel is then checked: if it is full, the next frame to be encoded is dropped to rein in the bit rate. The size of the funnel is tied to the tolerable delay and must be set per scenario.
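A leaky-bucket sketch of this behavior, with actual encoded bits filling the bucket and the per-frame target budget draining it; the class shape is illustrative:

```cpp
#include <algorithm>
#include <cstddef>

class FrameDropper {
 public:
  explicit FrameDropper(double bucket_size_bits)
      : bucket_size_(bucket_size_bits) {}

  // Before encoding a frame: drain one frame's share of the target rate.
  void Leak(double target_bps, double fps) {
    level_ = std::max(0.0, level_ - target_bps / fps);
  }

  // After encoding: pour in the actual size of the encoded frame.
  void Fill(size_t encoded_bytes) { level_ += encoded_bytes * 8.0; }

  // If the funnel is full, the next frame is dropped to hold the rate.
  bool DropNextFrame() const { return level_ > bucket_size_; }

 private:
  const double bucket_size_;  // larger bucket => more tolerable delay
  double level_ = 0.0;        // current fill level, in bits
};
```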

Whether or not a frame was dropped is also fed to the QualityScaler module as one basis for evaluating encoding quality.

VQC decision module

The VQC decision module determines the current video strategy from the results of all the modules above, combined with the user's scenario settings.

It contains two state machines and one decision module.

The two state machines run independently of each other:

  • Video quality state machine
  • Performance state machine

The decision module's key functions include (a simplified decision sketch follows this list):

  • Sets the various internal adjustment thresholds according to the user's scenario and desired video parameters
  • Based on the state machine results, decides whether to raise or lower video parameters (resolution, frame rate), and with what strategy
  • Based on other information, determines other encoding parameters for the current frame, e.g. whether to encode the large or the small stream in a simulcast dual-stream scenario
  • Based on other information, decides whether algorithms need adjusting, e.g. the encoding algorithm or post-processing algorithms
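A much-simplified sketch of the decision step, combining the two state machine results; the names and the preference flag are assumptions for illustration:

```cpp
enum class MachineState { kGood, kStable, kBad };

enum class Action {
  kNone,
  kLowerResolution,
  kRaiseResolution,
  kLowerFrameRate,
  kRaiseFrameRate,
};

// prefer_frame_rate_drop: true in clarity-first scenes, where frame
// rate is sacrificed before resolution (and vice versa).
Action Decide(MachineState quality, MachineState performance,
              bool prefer_frame_rate_drop) {
  if (quality == MachineState::kBad || performance == MachineState::kBad) {
    return prefer_frame_rate_drop ? Action::kLowerFrameRate
                                  : Action::kLowerResolution;
  }
  if (quality == MachineState::kGood && performance == MachineState::kGood) {
    return prefer_frame_rate_drop ? Action::kRaiseFrameRate
                                  : Action::kRaiseResolution;
  }
  return Action::kNone;
}
```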

Video QoE tuning through VQC

VQC safeguards video QoE through full-link monitoring and adjustment of video quality. Below we introduce some of NetEase Yunxin RTC's QoE tuning work through VQC.

Correctly judging encoding quality

Many metrics characterize encoding quality: PSNR, SSIM, QP, VMAF, and so on. Because of the peculiarities of hardware encoders and the computational cost of obtaining the metrics, QP was chosen as the criterion.

To use QP as an indicator that correctly reflects encoding quality, the following points must be considered (a threshold-selection sketch follows the list):

  • With conventional slice-QP signaling in H.264/H.265, a typical encoder's reported QP only reflects the quality of the first few encoded macroblocks. With software encoders, a proper frame-average QP can be used as the frame's QP, which reflects software encoding quality better.
  • QP thresholds differ across encoding algorithms. For example, (24, 37) works as the lower and upper QP limits for OpenH264, but the values must be re-tuned for other encoders and algorithms; our NE264, NE265, and NEVC encoding algorithms all required corresponding adjustments.
  • QP thresholds of hardware encoders differ across platforms: iOS, Android, and even different Android chip platforms each require their own adaptation.
  • Different encoding algorithms and hardware platforms have different QP-versus-quality curves, so the coefficients of the statistical method must be tuned to capture each one's characteristics.
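For illustration, threshold selection could be organized as a per-encoder lookup. Only the OpenH264 pair (24, 37) is quoted in the text, so the other entries are placeholders pending measurement:

```cpp
struct QpThresholds {
  int low;   // at or below this, quality has headroom
  int high;  // above this, quality is considered poor
};

enum class EncoderKind { kOpenH264, kIosHardware, kAndroidHardware };

QpThresholds ThresholdsFor(EncoderKind kind) {
  switch (kind) {
    case EncoderKind::kOpenH264:
      return {24, 37};  // values quoted in the text
    case EncoderKind::kIosHardware:
    case EncoderKind::kAndroidHardware:
    default:
      // Placeholder: real values come from extensive device testing.
      return {24, 37};
  }
}
```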

Correctly judging performance problems

To prevent video QoE from degrading due to performance problems, we must identify them accurately and adjust correctly and effectively. In the current VQC, video frame processing time is used to characterize the performance status. To identify that status correctly, the following aspects must be considered:

  • The judgment must cover the whole pipeline of pre-processing and encoding
  • Some hardware has pipeline delay that must be taken into account
  • Uneven frame intervals can cause performance problems to be misjudged, so this pattern must be recognized

To make effective adjustments, we mainly consider the following aspects (a degradation-order sketch follows this list):

  • Adjust in order of measured performance cost. For example, the priority we measured for some modules is: pre-processing > encoding algorithm > frame-rate adjustment > resolution adjustment
  • If the measured performance status does not change after an adjustment, there must be a fallback: the adjustment and its result are fed back to the state machine, which reports to the decision module for the next decision
  • If the performance status swings too widely, the adjustment step size should be increased
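The degradation order could be expressed as a prioritized list of knobs, walked until the state machine reports improvement. This is an illustration under the priorities quoted above, not NERTC's actual control flow:

```cpp
#include <array>

enum class Knob { kPreProcessing, kEncodingAlgorithm, kFrameRate, kResolution };

constexpr std::array<Knob, 4> kDegradationOrder = {
    Knob::kPreProcessing,      // cheapest quality loss first
    Knob::kEncodingAlgorithm,
    Knob::kFrameRate,
    Knob::kResolution,         // resolution is touched last
};

// Try knobs in priority order; if the feedback callback reports no
// improvement after an adjustment, fall through to the next knob.
template <typename AdjustFn, typename ImprovedFn>
void Degrade(AdjustFn adjust, ImprovedFn improved) {
  for (Knob knob : kDegradationOrder) {
    adjust(knob);
    if (improved()) return;  // state machine says the adjustment worked
  }
}
```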

Making effective adjustments

An adjustment is effective when the video QoE improves noticeably afterwards. We can mainly adjust along the following dimensions:

  • Resolution adjustment
  • Frame rate adjustment
  • Simulcast stream adjustment
  • Toggling certain pre-processing algorithms
  • Codec adjustment

VQC performs this optimization and adjustment as follows:

  • Supports user-configurable scenarios and strategies

    • Communication mode, live-streaming mode
    • Deep user customization: special scene modes, constant-resolution mode, constant-frame-rate mode, minimum frame rate, minimum bit rate, and other settings
  • Adapts internally: parameter combinations, adjustment step sizes, and optimal adjustment paths for specific scenarios are determined through extensive testing, for example the step sizes and paths for adjusting video resolution and frame rate

Conclusion

This article introduced the design of the VQC video quality control system in NetEase Yunxin RTC and some of our QoE tuning work. No single strategy is perfect; as the saying goes, you cannot have both the fish and the bear's paw. QoE tuning is about balancing clarity, smoothness, and latency under given conditions, weighing the trade-offs, and finding the optimal strategy through coordinated policies and extensive data testing and verification.

Q&A summary

The following content is organized from the Q&A records of the live session group chat:

  1. @一 Weiyihang asks:
    Q: When the RateController is used, it is usually in SFU forwarding mode. In simulcast, is the feedback from all subscribers, relayed by the server to the sender, taken into account when adjusting the bit rates of the large and small streams?
    A: Our server implements strategies for various scenarios. The default is a configurable TopN strategy: most viewers receive the high-definition, high-smoothness large stream, while the few with poor network quality receive the small stream. The server computes an appropriate feedback bit rate from the network conditions of all downstream viewers.
    Q: My question is not about the server's SFU forwarding strategy, but the sender-side bit rate allocation between the large and small streams. In the RateControl module you described, the sender receives congestion control (cc) bandwidth feedback while also providing simulcast. Can you adjust the bit rates of the large and small streams according to the network status of the different receivers?
    A: We run congestion control on the downlink: the server combines the bandwidth estimates from the downstream cc into a suitable bandwidth and feeds it back to the sender. Based on the total bit rate available at the sender, our module decides whether to send the large stream, the small stream, or both. The server also decides, per receiver, whether to forward the large or the small stream.

    Q: How much latency do beautification and super-resolution add?
    A: If their processing time is less than the frame interval, they add no delay to the pipeline; we adapt dynamically, and if the processing time exceeds the frame interval the algorithm is dynamically disabled. The delay added to the whole pipeline is therefore less than one frame interval, i.e. under about 33 ms at 30 fps.

    Q: This kind of system is usually built on WebRTC, where switching codecs mid-call means recreating the PeerConnection and renegotiating. Is your support for this self-developed, or does WebRTC itself support it?
    A: We referred to WebRTC in part, but codec switching requires no renegotiation. Capability negotiation is done in the channel over a private protocol, so unlike WebRTC we need no SDP exchange; with our own capability negotiation protocol, the switch then happens inside the audio/video engine.

    Q: One more question: how do you handle key frame requests in an RTC conference room? If every new user sends a key frame request it generates a lot of traffic in the room, but without one a newcomer may wait until the next GOP for the first frame. What strategy do you use to balance this?
    A: 1. The general logic is that each newly joined user triggers an intra request, and the sending side enforces a minimum key frame interval. This keeps the number of key frames down without making the first frame slow to appear.
    2. We have also made some optimizations for fast first-frame rendering, such as the server issuing the intra request in advance and caching recent key frames.
    These are details we tuned in practice; they need to be adapted to each scenario, and our strategies are scenario-specific: live streaming and communication use different strategies.
  2. @galen asks:
    Q: Could you talk about how stutter is detected? Is there anything special about the frame-interval threshold used for stutter detection?
    A: For stutter we count 200 ms and 500 ms stalls: we measure the actual rendered frame interval, and if it exceeds 200 ms or 500 ms we record a minor or major stutter respectively.

Author introduction

Qi Jiyue is a senior engine engineer at NetEase Yunxin. He has long worked in audio and video development and has studied the WebRTC engine, audio/video conferencing systems, and video codecs in depth. He is currently responsible for the video experience of the NetEase Yunxin NERTC engine.

Video review address: https://mctalk.yunxin.163.com/details-live/13

