This article is a translation of Pranav Sodhani's WWDC 2021 session "Evaluate videos with the Advanced Video Quality Tool." Pranav Sodhani, of Apple's display and color technologies team, has expertise in algorithm development, machine learning, color science, and video technology. The translator, Tao Jinliang, is a senior audio and video development engineer at NetEase Yunxin with many years of client-side audio and video experience.
This article explains how the Advanced Video Quality Tool (AVQT) helps us accurately assess the perceived quality of compressed video files. Built on the AVFoundation framework, AVQT supports a wide range of video formats, codecs, resolutions, and frame rates in both the SDR and HDR domains, enabling a simple and efficient workflow; for example, there is no need to decode videos to a raw pixel format beforehand. AVQT uses Metal to offload heavy pixel-level computation to the GPU, achieving high processing speeds; it typically analyzes video faster than real time. With its ease of use and computational efficiency, AVQT can help remove low-quality videos from a video catalog before they negatively affect an application's users.
Background
In this session, we introduce a video quality tool, AVQT (Advanced Video Quality Tool), and show how to use it to evaluate the perceived quality of compressed video in an application or content creation workflow. Let's start with a typical video delivery workflow.
In such a workflow, a high-quality source video undergoes video compression and, optionally, downscaling, to produce a lower-bit-rate video. These low-bit-rate videos can then be transmitted efficiently over bandwidth-constrained networks.
This workflow can take several forms: AVFoundation APIs such as AVAssetWriter, applications such as Compressor, or your own custom video compression pipeline.
Downscaling and compressing the source video, however, can introduce degradations: the subjective quality drops below that of the source, and visible artifacts appear. One example is the blocking artifacts in compressed video, as shown in the frame on the right.
Another example is blurring, where the video looks soft and fine detail begins to disappear. Artifacts like these can seriously degrade the consumer's video quality experience, and consumers expect high-quality video, so catching such artifacts before delivery is important.
The first step is to evaluate the quality of the delivered content. The most accurate method is to have real people watch the videos and rate their quality, but if we want to evaluate a large number of videos, this is time-consuming and does not scale. The alternative is an objective method of characterizing video quality, which lets us automate the process for speed and scalability.
In such a setting, a perceptual video quality tool takes the compressed video and the source video as input and outputs a video quality score, typically a floating-point number in the range 1 to 5, that simulates how real people would rate the compressed video.
What is AVQT?
Today, we are very happy to offer developers exactly such a perceptual video quality tool: the Advanced Video Quality Tool, or AVQT for short. Let us learn more about it below.
So what exactly is AVQT? AVQT is a macOS command-line executable that tries to mimic how real people rate the quality of compressed video. We can use AVQT to compute both frame-level and segment-level scores, where a segment is typically a few seconds long. AVQT also supports all AVFoundation-based video formats, including SDR and HDR formats such as HDR10, HLG, and Dolby Vision.
Three characteristics of AVQT
Next, we discuss three key attributes of AVQT that make it useful across applications. First, we look at its consistency with subjective perception; second, its fast computation speed; and finally, why setting the viewing parameters matters when predicting video quality. Let us go through each in detail.
Subjective perceptual consistency
AVQT aligns closely with human perception of video quality across many types of content, such as animation, natural scenes, and sports. We found that traditional video quality metrics, such as PSNR and structural similarity (SSIM), often fail to evaluate quality consistently across different content types.
Let us look at an example.
This is a frame from a high-quality sports clip, our first source video. Looking at the same frame in the compressed video, we can see that it is still of high perceptual quality: its PSNR score is about 35, and its AVQT score is 4.4.
Next, we run the same test on a second source video. The compressed video in this case shows visible artifacts; in particular, we can see some artifacts on the faces. Interestingly, it receives roughly the same PSNR score of about 35 as the previous video, yet this time AVQT rates it around 2.5, meaning poor quality. We believe the AVQT score is the correct prediction here. This is just one hand-picked example that illustrates the problems that can arise in cross-content evaluation.
We also want to test the perceptual accuracy of AVQT on diverse video sets, so we evaluated it on publicly available video quality datasets. These datasets include source videos, compressed videos, and video quality scores given by human subjects.
Here, we look at the results on two datasets, Waterloo IVC 4K and VQEG HD3:

- Waterloo IVC 4K dataset: includes 20 source videos and 480 compressed videos, spanning encoding and scaling artifacts across four different video resolutions and two different video encoding standards.
- VQEG HD3 dataset: relatively small, with 9 source videos and 72 compressed videos, generated by video encoding at 1080p resolution.
To measure the performance of a video quality metric objectively, we use the Pearson correlation coefficient and the RMSE distance metric (formal definitions follow the list):
- The Pearson correlation coefficient, or PCC for short, measures how well the predicted scores correlate with the subjective scores; a higher PCC means better correlation.
- RMSE measures the distance between the predictions and the subjective scores; a lower RMSE means higher prediction accuracy.
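For reference, a minimal formulation of both metrics, where $p_i$ is the predicted score and $s_i$ the subjective score for video $i$ out of $N$ videos (the notation is ours, not from the session):

```latex
\mathrm{PCC} = \frac{\sum_{i=1}^{N}(p_i - \bar{p})(s_i - \bar{s})}
                    {\sqrt{\sum_{i=1}^{N}(p_i - \bar{p})^2}\,\sqrt{\sum_{i=1}^{N}(s_i - \bar{s})^2}},
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(p_i - s_i)^2}
```

where $\bar{p}$ and $\bar{s}$ are the means of the predicted and subjective scores, respectively.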
Now we evaluate AVQT's ability to predict the scores given by human subjects. In the figure below, the x-axis shows the true subjective video quality score and the y-axis shows the score predicted by AVQT; each point represents one compressed video.
The scatter plot shows that, apart from a few outliers, AVQT does a good job of predicting the subjective scores on this dataset, which is reflected in the high PCC and low RMSE. We see similarly strong performance on the VQEG HD3 dataset.
Fast calculation
Let us turn to AVQT's computation speed. High speed is essential for scalability. AVQT's algorithm is designed and optimized to run fast on Metal, which lets us churn through large video files very quickly. It also handles all video preprocessing natively, so we don't have to decode and scale videos offline. AVQT can process 1080p video at 175 frames per second, so for a 10-minute, 24 fps 1080p video, AVQT can compute the quality scores in about 1.5 minutes.
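To spell out that estimate as a quick sanity check:

```latex
10\ \text{min} \times 60\ \tfrac{\text{s}}{\text{min}} \times 24\ \tfrac{\text{frames}}{\text{s}} = 14\,400\ \text{frames},
\qquad
\frac{14\,400\ \text{frames}}{175\ \tfrac{\text{frames}}{\text{s}}} \approx 82\ \text{s} \approx 1.4\ \text{min}
```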
Set viewing parameters
The last property we discuss is setting the viewing parameters. The setup in which we watch a video affects the quality we perceive. In particular, display size, display resolution, and viewing distance can mask or exaggerate artifacts in the video.
To account for this, AVQT takes these viewing parameters as inputs and tries to predict the correct trend as they change. Let's consider two scenarios:
In scenario A, we watch a 4K video on a 4K display at a viewing distance of 1.5 times the screen height. In scenario B, we watch the same video on the same display, but from a distance of 3 times the screen height. Clearly, in scenario B we will miss some of the detail that is visible up close, which means the quality we perceive in scenario B will be higher than in scenario A. AVQT should reflect this trend in the scores it computes at different viewing distances.
As shown in the figure above, as the viewing distance increases from 1.5H to 3H, the AVQT score also increases. For more technical details, see the README file that ships with the tool. A sketch of two such runs follows.
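To make this concrete, here is a minimal command-line sketch (the `--viewing-distance` flag is described later in this article; the value format, in multiples of screen height, is an assumption on our part, so check the README):

```bash
# Score the same video pair at two viewing distances (1.5H vs. 3H).
# Value format (multiples of screen height) is an assumption; see the README.
AVQT --reference source.mov --test compressed.mov \
     --output score_1p5H.csv --viewing-distance 1.5

AVQT --reference source.mov --test compressed.mov \
     --output score_3H.csv --viewing-distance 3.0
# Per the session, the 3H run should report the higher score, since a
# larger viewing distance masks compression artifacts.
```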
Now that everyone is excited about AVQT, let's see how to use the tool. AVQT will soon be available to everyone through the Apple Developer Portal (https://developer.apple.com/).
Let's start with a demonstration. First, I have downloaded AVQT and installed it on the system; running `which AVQT` shows that it is installed in /usr/local/bin. Now we can invoke AVQT with its help flag to read about the flags it supports and other usage information.
There is a sample reference video and a sample compressed video in the current directory, and I use them to run AVQT. We pass the reference and test files as input and specify an output file named sample_output.csv. The tool prints its progress on screen and reports the segment scores. The default segment duration is 6 seconds, and since this clip is only 5 seconds long, we get a single segment. Looking at the output file, we can see the frame-level scores, with the segment-level score at the bottom. The invocation looks roughly like the sketch below.
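A minimal sketch of the demo's invocation, with illustrative file names (the flag spellings follow the session's demo, but verify them against the tool's README):

```bash
# Confirm where the tool was installed and print its usage text.
which AVQT
AVQT --help

# Score a compressed video against its reference; frame-level scores
# and the segment-level score are written to sample_output.csv.
AVQT --reference sample_reference.mov \
     --test sample_compressed.mov \
     --output sample_output.csv
```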
In addition to the options shown in the demonstration, the tool has several more. For example, we can use the segment-duration and temporal-pooling flags to change how the frame-level scores are aggregated. Similarly, the viewing-distance and display-resolution flags can be used to specify the viewing setup, as sketched below.
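A hedged sketch of the aggregation flags (the flag names are the ones the session mentions; the pooling-method value shown is an assumption, so consult the README for the supported spellings):

```bash
# Aggregate frame-level scores into 10-second segments instead of the
# default 6 seconds; "harmonic_mean" is an assumed example value.
AVQT --reference source.mov --test compressed.mov \
     --output scores.csv \
     --segment-duration 10 \
     --temporal-pooling harmonic_mean
```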
Please refer to the README file for more details. So far, we have covered some of AVQT's key attributes and shown how to use the command-line tool on a pair of videos to generate video quality scores.
AVQT use cases
Let us now look at a concrete case where AVQT helps: optimizing the bit rates of HLS tiers.
HLS (HTTP Live Streaming, Apple's HTTP-based streaming protocol) tiers are encoded at different bit rates, and choosing those bit rates is not always a simple process.
To help with this, we have published bit rate guidelines in the HLS authoring specification. These bit rates are only initial encoding targets for delivering typical content via HLS. Different content has different encoding complexity, which means the optimal bit rates vary from one piece of content to another.
A bit rate that suits one type of content, for example an animated movie, may not suit a sports event.
Let's see how to use AVQT scores as feedback to find the right bit rates for our content. First, we start from the initial target bit rates and use them to encode the source video and create the HLS tiers. Then we run AVQT on the source video and the encoded HLS tiers to compute video quality scores. Finally, we analyze the AVQT scores to decide whether to raise or lower the target bit rate for each tier. A sketch of this loop appears below.
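A minimal sketch of that feedback loop as a script. The encoder invocation (ffmpeg with the VideoToolbox HEVC encoder) and the CSV parsing are assumptions for illustration; AVQT's actual output layout is documented in its README:

```bash
#!/bin/sh
# Hypothetical feedback loop: encode one tier, score it with AVQT,
# and compare the score against a quality threshold.
SOURCE=source.mov
BITRATE=11600k        # initial target from the HLS authoring guidelines
THRESHOLD=4.5         # desired segment-level AVQT score

# Encode one tier (ffmpeg stands in here for your encoder of choice).
ffmpeg -i "$SOURCE" -c:v hevc_videotoolbox -b:v "$BITRATE" tier.mov

# Score the encoded tier against the source.
AVQT --reference "$SOURCE" --test tier.mov --output tier_scores.csv

# Pull the segment-level score; per the demo it sits at the bottom of the
# CSV, but the exact column layout is an assumption, so adjust as needed.
SCORE=$(tail -1 tier_scores.csv | awk -F, '{print $NF}')
echo "AVQT score at ${BITRATE}: ${SCORE}"
# If SCORE < THRESHOLD, raise BITRATE (e.g., by ~10%) and repeat.
```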
To demonstrate, we pick a specific HLS tier: the 2160p tier, with a target bit rate of 11.6 Mbps. We encode two sequences, an animation clip and a sports clip, at the recommended bit rate. Once the tiers are ready, we use AVQT to compute their video quality scores.
The figure below shows the AVQT scores of the two sequences. For this particular tier, we want high video quality, so we set the threshold at 4.5, which corresponds to near-excellent quality. We can see that although this bit rate is sufficient for the animation clip, it is not sufficient for the sports clip.
So we go back and use this feedback to adjust our bit rate targets: we increase the target for the sports clip and recompute its AVQT score.
Here, we increase the bit rate by 10% and plot the new AVQT score for the sports clip. The updated score is now above our desired threshold of 4.5, and it is also much closer to the video quality of the animation content.
Finally, if you take one thing away from this session, let it be this: video compression can introduce visible artifacts that degrade the consumer's video quality experience.
We can use AVQT to evaluate the quality of compressed video. AVQT ships as a macOS command-line tool; it is fast, lets you set the viewing parameters, and supports all AVFoundation-based video formats. It can also be used to optimize the video quality of HLS tiers.
Summary
The above is a translation of everything Pranav Sodhani shared at WWDC 2021. If anything in the translation is off, corrections are welcome.
At present, the video super-resolution feature that NetEase Yunxin has implemented across all platforms is a natural fit for AVQT, which can be used to evaluate subjective picture quality after super-resolution. You are welcome to try the latest SDK and experience the super-resolution feature for yourself.
About the author
Pranav Sodhani is on Apple's display and color technologies team and has expertise in algorithm development, machine learning, color science, and video technology. He received a master's degree in computer science from the University of California, Los Angeles (UCLA) in 2017, and a bachelor's degree in electrical engineering from the Indian Institute of Technology Guwahati (IIT G) in 2015. He has conducted research at universities in Canada and South Korea and published papers at international machine learning conferences. He has won a number of scholarships and awards, including the OP Jindal Engineering and Management Scholarship (OPJEMS), the Mitacs Globalink Award, and a gold medal at the 4th International Mathematical Olympiad. He is also the author of "Haha Reacts Only: Pranav Sodhani's Original Joke Collection," published for Indian audiences in 2018.
- Session recording: https://developer.apple.com/videos/play/wwdc2021/10145/
- Reference: https://www.its.bldrdoc.gov/vqeg/vqeg-home.aspx
For more technical content, follow the [Netease Smart Enterprise Technology+] WeChat official account.