The rise of scenarios such as online meetings, online education, and e-commerce live streaming has brought real-time interactive technology from behind the scenes to center stage, attracting wider attention. A range of RTE-related technologies, such as codecs, network transmission, and computer vision, are showing new vitality. In 2021, with the support of deep learning, 5G, and other technologies, what new possibilities will RTE open up?
The Agora developer community and InfoQ jointly planned the "2021 Real-time Interactive Technology Outlook Series", inviting a number of technical experts from the Agora developer community to co-author articles from the perspectives of video transmission, computer vision, codec standard development, WebRTC, machine learning, audio technology, and more, to offer a glimpse of new technology trends. The author of this article is Zoe Liu, chief scientist and co-founder of Microframe Technology. This series was first published on InfoQ.
In June 2018, the Alliance for Open Media (AOM) released a new-generation video coding standard, AV1 (AOMedia Video 1). So far, AOM has 47 corporate members, including 14 Board Members and 33 Promoter members.
The initial version of AV1 was derived from libvpx, the code library of the likewise open-source and royalty-free VP9 codec, and it also absorbed the research and development results of three open-source coding projects: Google's VP10, Mozilla's Daala, and Cisco's Thor. The AV1 specification was finalized in June 2018. Compared with its predecessor VP9, AV1 introduces more than 100 new coding tools, representing the latest coding technology in the industry.
In this article, we discuss the technical trends AV1 is likely to show in real-time scenarios. Because public data on AV1 in real-time scenarios is limited, we illustrate the expected changes with performance data from Aurora AV1 in real-time scenarios and share comparison statistics against existing encoders, including open-source H264 and VP9 codecs. The main purpose of the Aurora examples and data is to show that the AV1 standard has fully entered the practical stage in real-time scenarios. We also look forward to further exchange and discussion with industry colleagues around this data.
AV1 application practice and ecosystem development in RTC scenarios
RTC technology upgrades and application expansion have surged in recent years. During the 2020 pandemic in particular, the RTC field grew explosively, spanning video conferencing, online education, remote terminals, game interaction, interactive e-commerce live streaming, telemedicine, online finance, and more. In these fields, typical video content falls into two categories: screen content and camera talking-head content. For ultra-low-latency interactive RTC scenarios, tuning and deploying a video encoder involves strict requirements beyond the basic considerations of coding efficiency and video quality: encoding delay, encoding speed, encoding complexity, adaptive rate control, and error resilience when adapting to the network layer. AV1's rich coding tools, such as its screen content coding tools, give it great potential for improving the user experience in real-time interactive RTC scenarios.
WebRTC is currently the industry's most influential open-source real-time interaction project, providing audio and video APIs for Web and mobile RTC applications. In January 2021, the W3C officially published WebRTC 1.0 as a Recommendation. The WebRTC open-source code library mainly includes three open-source video encoders: VP8 and VP9 from libvpx, and H264 from OpenH264. AV1 is derived from VP9 and fits naturally with WebRTC, including support for temporal scalability and other features. AV1 is also the first video coding standard to include screen content coding (SCC) tools in its main profile; that is, any standard-compliant AV1 decoder must support SCC. Compared with other standards, this is a major advantage for AV1 when processing computer-generated content in real-time scenarios.
An efficient AV1 software implementation is indispensable for RTC scenarios, whether on PC or mobile platforms. Open-source AV1 software decoders currently include libaom, maintained by AOM/Google; SVT-AV1, maintained by AOM/Intel; libgav1, released by Google specifically for Android devices; and dav1d, maintained by the VideoLAN and FFmpeg open-source communities and funded by AOM. According to our evaluation, dav1d has the best overall performance. dav1d 0.8 was released in January 2021 with further optimizations for AMD and Arm architectures.
The real-time mode of libaom, the open-source AOM AV1 codec, also known as libaom-RT, has been integrated into WebRTC and officially adopted starting with Chrome 89. In 2020, Google's real-time calling product Duo and video conferencing product Meet, both based on libaom-RT AV1, were the first to deploy AV1 in RTC scenarios. Cisco WebEx later announced that it had begun using the AV1 codec on PC in its video conferencing scenarios, especially for screen sharing.
The Microframe team launched its fully in-house Aurora AV1 encoder in 2019, becoming the world's first commercial provider of an AV1 encoder for RTC scenarios. Aurora AV1 has been continuously refined and upgraded in practical applications. It now runs stably for PC screen content encoding and for camera talking-head scenes, and its use on mobile devices and other ARM platforms is also maturing. All performance data in this article is based on Aurora AV1.
Of course, no matter how advanced a coding standard is, it needs a complete and sustainable ecosystem to support it. AOM members cover the complete video ecosystem, from capture and production through transmission and sharing to playback and consumption. For the RTC field, AOM members also include a number of global leaders in RTC technology and applications, such as Agora, Cisco (WebEx), and Poly. AOM members further include browser providers such as Google (Chrome), Apple (Safari), Microsoft (Edge), and Mozilla (Firefox); hardware manufacturers such as Intel, AMD, NVIDIA, Arm, Samsung, Xilinx, Broadcom, and China's Huawei; cloud service providers such as North America's Amazon (AWS), Microsoft (Azure), Google (GCP), and IBM, and China's Alibaba (Ali Cloud), Tencent (Tencent Cloud), Kingsoft Cloud, and Huawei (Huawei Cloud); and network and system providers such as Cisco. AV1 therefore has a natural ecosystem advantage.
AV1 for RTC is currently supported by browsers (except Safari) and by the Android mobile OS, and hardware support is gradually improving. Apple is a member of the AOM board of directors and currently takes a positive attitude toward advancing AV2, so Apple is expected to support AV1 before long. In addition, although Qualcomm is not a member of AOM, the industry generally expects Qualcomm to launch a chip with AV1 hardware support between late 2021 and early 2022 at the latest.
AV1 RTC screen content encoding
The AV1 standard provides tools that are particularly suited to encoding screen content, such as IntraBC and Palette mode. In addition, CfL (Chroma-from-Luma), although not designed specifically for screen content, is also especially effective for screen content encoding.
Note: x264 in the figure uses the ffmpeg command line: ffmpeg -r 30 -s 1920x1080 -c:v libx264 -x264-params bframes=0 -tune zerolatency -preset superfast -threads 1
Aurora AV1 has a clear advantage over encoders for existing coding standards, including VP9 and H264, in screen content compression efficiency at different resolutions. As shown in the figure, for example, when encoding on a single core of an ordinary PC, Aurora is compared with the open-source x264 superfast real-time preset. For the 1080p30 screen content test sequence set, the BD-rate (PSNR) gain is 81.25%. That is, for the evaluation set, Aurora AV1 needs only (1 - 81.25%) = 18.75% of the x264 bit rate, less than 1/5, to reach similar objective PSNR quality.
The figure above shows the encoding speed comparison between Aurora AV1 and the x264 superfast preset. For 1080p screen content video encoded in a single thread, x264 reaches 132+ FPS (frames per second), while Aurora reaches 46+ FPS, about 1/3 of the x264 encoding speed. Although Aurora encodes much more slowly than x264, the frame rate required for screen content is in most scenarios lower than for ordinary camera content. For screen content RTC scenarios, AV1 therefore already meets practical requirements.
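To make the speed comparison concrete, the following minimal Python sketch converts the two quoted encoding speeds into per-frame encode times and checks them against a frame budget. The 30 FPS and 15 FPS targets and the function names are illustrative assumptions, not values taken from the measurements.

```python
def encode_time_ms(encoder_fps: float) -> float:
    """Average time the encoder spends on one frame, in milliseconds."""
    return 1000.0 / encoder_fps


def headroom(encoder_fps: float, target_fps: float) -> float:
    """Ratio of encoder speed to target frame rate (above 1.0 keeps up in real time)."""
    return encoder_fps / target_fps


# Single-thread 1080p screen-content speeds quoted in the text.
speeds = {"x264 superfast": 132.0, "Aurora AV1": 46.0}

# Assumed target frame rates (hypothetical values chosen for illustration).
for target_fps in (30.0, 15.0):
    budget_ms = 1000.0 / target_fps
    for name, fps in speeds.items():
        print(f"{name}: {encode_time_ms(fps):.1f} ms/frame, "
              f"budget {budget_ms:.1f} ms at {target_fps:.0f} FPS, "
              f"headroom {headroom(fps, target_fps):.2f}x")
```

Under these assumptions, 46 FPS corresponds to roughly 21.7 ms per frame against a 33.3 ms budget at 30 FPS, so a single core keeps up in real time even before the lower frame rates typical of screen sharing are taken into account.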
AV1 RTC temporal scalability coding
Temporal scalability and adaptive frame dropping are especially important for RTC scenarios. Because network conditions such as bandwidth, round-trip time (RTT), jitter, and packet loss change dynamically, the encoder must work with the network control layer to adapt. For a video encoder, temporal scalability matters more than spatial scalability: it performs better overall in coping with dynamic bandwidth changes, error resilience, coding efficiency, and subjective video experience, and it allows dynamic adjustment while keeping subjective quality stable.
As shown below, two temporal scalability modes are currently implemented in the Aurora AV1 encoder. In both modes, video frames outside the base layer can be adaptively dropped to match dynamic network bandwidth. AV1's temporal scalability carries over the features of the VP8 and VP9 encoders already in the WebRTC platform, so it fits WebRTC naturally.
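The idea of dropping frames outside the base layer can be sketched in a few lines of Python. The exact layer structures of the two Aurora modes are not given in the text, so this sketch assumes the common dyadic two-layer and three-layer patterns (known in WebRTC as L1T2 and L1T3); the function names are hypothetical.

```python
def temporal_layer(frame_index: int, num_layers: int) -> int:
    """Temporal layer id of a frame in an assumed dyadic hierarchy.

    With 2 layers the pattern is 0,1,0,1,...; with 3 layers it is 0,2,1,2,...
    """
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    period = 1 << (num_layers - 1)          # pattern length: 1, 2, 4, ...
    pos = frame_index % period
    if pos == 0:
        return 0                            # base layer frame
    trailing_zeros = (pos & -pos).bit_length() - 1
    return num_layers - 1 - trailing_zeros


def keep_frames(num_frames: int, num_layers: int, max_layer: int) -> list[int]:
    """Indices of frames that survive when layers above max_layer are dropped."""
    return [i for i in range(num_frames)
            if temporal_layer(i, num_layers) <= max_layer]


if __name__ == "__main__":
    print([temporal_layer(i, 3) for i in range(8)])  # layer id per frame
    print(keep_frames(8, 3, 1))  # drop the top layer: half the frame rate
    print(keep_frames(8, 3, 0))  # base layer only: a quarter of the frame rate
```

In such a hierarchy each frame is meant to reference only frames in the same or a lower layer, which is why the network layer can discard upper layers under bandwidth pressure without breaking decoding of the frames that remain.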
AV1 RTC camera content encoding
Beyond screen content, AV1 can also demonstrate the advantages of the standard in video-conference talking-head scenes after careful optimization.
As shown in the following two figures, in 480p and 720p video-conference scenes, encoding with 2 threads on an AMD Ryzen 9 3900X (12 cores, 24 threads), Aurora superfast achieves an average BD-rate (PSNR) gain of more than 20% over the x264 medium preset, while also encoding more than 30% faster.
Note: The command line used by x264 is --nal-hrd none --preset medium --profile main --threads 2 --tune zerolatency --no-psy --aq-mode 0 --no-scenecut
AV1 RTC mobile platform encoding performance
The complexity of AV1 standard tools makes it more challenging to implement on mobile phones.
At the same time, as mentioned earlier in this article, WebRTC/Chrome has enabled AV1 RTC support based on libaom-RT, and the performance of the open-source libaom-RT encoder keeps improving.
As shown in the figure below, we compare Aurora with libvpx-VP9, x264, and libaom-RT for mobile RTC application scenarios in terms of coding efficiency and encoding speed. The encoding platform is a Snapdragon 845 mobile phone with single-threaded CBR settings, the test set consists of 40 typical 180p real-time scene videos, and the target bit rate ranges from 50 kbps to 200 kbps.
Each curve in the figure represents the performance of one encoder, and each point on a curve represents a specific speed preset of that encoder. The vertical axis represents BD-rate (PSNR). All encoder presets are measured against the x264 medium preset as the anchor. A negative BD-rate means that, compared with the anchor, a lower bit rate is needed to obtain the same video quality. Therefore, the lower a point sits, the greater the compression advantage of the encoder. The horizontal axis marks the encoding speed: the further right a point sits, the faster the encoding.
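The way BD-rate is read in the chart can be illustrated with a simplified calculation. The Python sketch below averages the log-bitrate gap between a test encoder and the anchor over their shared PSNR range. It uses piecewise-linear interpolation rather than the cubic fit of the usual Bjontegaard method, and the rate/PSNR points are made-up illustrative values, not measurements from this article.

```python
import math


def _interp(x, xs, ys):
    """Linearly interpolate y at x on a curve with strictly increasing xs."""
    for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]):
        if x0 <= x <= x1:
            return y0 + (x - x0) / (x1 - x0) * (y1 - y0)
    raise ValueError("x is outside the curve range")


def bd_rate(anchor, test, steps=1000):
    """Simplified BD-rate in percent; negative means the test encoder saves bits.

    anchor and test are lists of (bitrate_kbps, psnr_db) points with
    distinct PSNR values.
    """
    def prepare(points):
        pts = sorted(points, key=lambda p: p[1])
        return [p[1] for p in pts], [math.log(p[0]) for p in pts]

    a_psnr, a_logr = prepare(anchor)
    t_psnr, t_logr = prepare(test)
    lo = max(a_psnr[0], t_psnr[0])      # shared quality range
    hi = min(a_psnr[-1], t_psnr[-1])
    if hi <= lo:
        raise ValueError("curves do not overlap in PSNR")

    # Midpoint-rule average of the log-bitrate difference at equal PSNR.
    total = 0.0
    for i in range(steps):
        q = lo + (i + 0.5) * (hi - lo) / steps
        total += _interp(q, t_psnr, t_logr) - _interp(q, a_psnr, a_logr)
    return (math.exp(total / steps) - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical anchor curve and a test curve needing 18.75% of its bit rate.
    anchor = [(100, 30.0), (200, 33.0), (400, 36.0), (800, 39.0)]
    test = [(r * 0.1875, q) for r, q in anchor]
    print(f"{bd_rate(anchor, test):.2f}%")
```

With the hypothetical points above, the test encoder reaches every quality level at 18.75% of the anchor bit rate, so the sketch reports a BD-rate of about -81.25%, a point that would sit far below the anchor on the vertical axis.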
The figure shows that Aurora is far ahead of VP9 and x264 in coding efficiency. Aurora is still being optimized. The current superfast and ultrafast presets will most likely become lower-speed configurations, and Aurora will provide multiple speed presets suitable for RTC scenarios, from medium, fast, faster, veryfast, and superfast to ultrafast. Compared with libaom-RT AV1 in WebRTC, Aurora is significantly ahead in both encoding speed and coding efficiency. While speeding up, Aurora will try to retain the full advantages of the AV1 standard. (Note: Aurora and libaom-RT are both the versions as of March 5, 2021.)
Whether for the open-source library libaom-RT or the commercial encoder Aurora, AV1 optimization on mobile platforms will continue along its historical trajectory. Performance will keep improving to meet the needs of more and more RTC scenarios and, building on existing coding standard solutions, will further and substantially enhance the user experience.
Combination of AV1 and AI
In RTC scenarios, combining AV1 with AI should greatly improve the optimization of every part of the encoder, including pre-processing, content classification, ROI scene optimization, and the design and implementation of intelligent rate control. With AI techniques, AV1 can show further potential. The Microframe team has collaborated with a number of universities in China and abroad on the paper "Advances In Video Compression System Using Deep Neural Network: A Review And Case Studies", which has been accepted by the leading IEEE journal "Proceedings of the IEEE". Using AV1 as a benchmark, the paper offers a preliminary exploration of combining video coding with AI in pre-processing and post-processing, and of the use of AI in future coding standards such as AV2. The paper can be downloaded directly from arXiv.org (link: https://arxiv.org/abs/2101.06341).
AV1 subjective coding performance
As shown in the figure, at the same bit rate, that is, the same bandwidth, the image quality of Aurora AV1 encoding is significantly better than the x264 encoding result.
Given the strong performance of AV1 described above and its natural fit with RTC applications, we expect that AV1, driven by the WebRTC, browser, and Android mobile ecosystems and by the explosive growth of RTC applications, will see rapid ecosystem development over the next 2 to 3 years.
Related reading in this series
2021 Technology Outlook | Real-time generation technology towards the future
2021 Technology Outlook | Extreme real-time video communication under weak network