To understand what a video codec is, we first need to understand what a video is.
At its core, a video is a sequence of still image frames. When those frames are played back at a sufficient rate, the human eye perceives continuous motion, and that is what constitutes a video.
Why do we need to encode and decode video at all? Because the amount of data in a digitized video signal is enormous; transmitting or storing it raw would waste a great deal of bandwidth and storage space. Take the current mainstream format of 1080p resolution at 30 frames per second as an example. A 1080p frame is 1920 pixels wide and 1080 pixels high, and each pixel is represented by the three RGB primaries (three bytes per pixel), so a single frame amounts to 1080 x 1920 x 3 x 8 = 49,766,400 bits. At 30 frames per second, the raw bit rate is 49,766,400 x 30 = 1,492,992,000 bps, roughly 1.5 Gbps. This is why video codec technology was born.
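To make the arithmetic concrete, here is a minimal Python sketch of the raw-bit-rate calculation above (the resolution and frame rate are just this example's parameters):

```python
# Raw (uncompressed) bit rate of an RGB video stream.
WIDTH, HEIGHT = 1920, 1080   # 1080p frame dimensions
BYTES_PER_PIXEL = 3          # one byte each for R, G, B
FPS = 30                     # frames per second

bits_per_frame = WIDTH * HEIGHT * BYTES_PER_PIXEL * 8
raw_bitrate = bits_per_frame * FPS

print(f"bits per frame: {bits_per_frame:,}")    # 49,766,400
print(f"raw bit rate:   {raw_bitrate:,} bps")   # 1,492,992,000 bps (~1.5 Gbps)
```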
Why can video be compressed at all? Consider two aspects:
1. Within a single image there are often regions of similar color, which contain a great deal of redundant information. This spatial redundancy can be compressed through transform coding and quantization (see the first sketch after this list).
2. Between two consecutive images there is usually a great deal of identical or similar content, so motion estimation and motion compensation were developed to describe that inter-frame redundancy with motion vectors and compress it away (see the second sketch after this list).
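For the first point, here is a minimal Python sketch of transform coding plus quantization, assuming SciPy's `scipy.fft` DCT helpers. Real codecs add integer transforms, zig-zag scanning, and entropy coding on top; this shows only the core idea that smooth regions concentrate their energy in a few coefficients:

```python
import numpy as np
from scipy.fft import dctn, idctn

# An 8x8 block from a smooth image region: its energy concentrates in a
# few low-frequency DCT coefficients, so coarse quantization loses little.
block = np.tile(np.linspace(100, 120, 8), (8, 1))   # gently varying block

coeffs = dctn(block, norm="ortho")            # 2-D DCT, as used in JPEG/H.26x
Q = 10                                        # uniform quantizer step (toy value)
quantized = np.round(coeffs / Q).astype(int)  # most entries round to 0
print("nonzero coefficients:", np.count_nonzero(quantized), "of 64")

reconstructed = idctn((quantized * Q).astype(float), norm="ortho")  # decoder side
print("max pixel error:", np.abs(block - reconstructed).max())      # small, bounded by Q
```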
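And for the second point, a self-contained toy sketch of full-search block matching with the sum of absolute differences (SAD) metric, the basic operation behind motion estimation (real encoders use much faster search patterns and sub-pixel refinement):

```python
import numpy as np

def best_motion_vector(ref, cur, bx, by, block=16, search=8):
    """Full-search block matching: find the motion vector (dx, dy) that
    minimizes the sum of absolute differences (SAD) between a block of the
    current frame and candidate blocks in the reference frame."""
    target = cur[by:by + block, bx:bx + block].astype(np.int32)
    best, best_sad = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + block > ref.shape[0] or x + block > ref.shape[1]:
                continue  # candidate block falls outside the reference frame
            sad = np.abs(target - ref[y:y + block, x:x + block].astype(np.int32)).sum()
            if sad < best_sad:
                best_sad, best = sad, (dx, dy)
    return best, best_sad

# Toy frames: a bright square that moves 3 pixels to the right between frames.
ref = np.zeros((64, 64), dtype=np.uint8)
cur = np.zeros((64, 64), dtype=np.uint8)
ref[16:32, 16:32] = 255
cur[16:32, 19:35] = 255
print(best_motion_vector(ref, cur, 19, 16))  # ((-3, 0), 0): exact match found in ref
```

The encoder then transmits only the motion vector plus the (here zero) residual, instead of the whole block.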
Based on these two principles, intra-frame predictive coding and inter-frame predictive coding, numerous video codecs have been born:
The H.26x series: from H.261 and H.263 to today's mainstream H.264 and H.265, and on to the latest standard, H.266. The goal of the H.26x line has always been to use better compression technology to achieve better video quality from less data. H.265, for example, needs only half the bandwidth of H.264 to play video of the same quality; it shares a similar algorithmic architecture with H.264 while improving several of the underlying techniques.
The MPEG series: MPEG-1, MPEG-2, and MPEG-4 (after MPEG-4, the line converged with H.264, which is also standardized as MPEG-4 Part 10/AVC).
The VP series: VP8 and VP9, Google's self-developed, open-source codecs. Google created the VP series because H.264 carries patent royalties; if a browser ships WebRTC with H.264, the browser vendor has to pay the related license fees (H.264 nonetheless enjoys wide support, in large part because Cisco open-sourced OpenH264 and covers the license fees for its binary releases). VP8 is positioned against H.264, and outside the WebRTC field its adoption and support are relatively limited. VP9 is positioned against H.265; one of VP9's goals was to reduce the bit rate by about 50% compared with VP8 while keeping the same quality, which means that at the same bit rate VP9 can deliver noticeably better image quality than VP8.
The domestic (Chinese) series: the AVS standards, AVS1.0 and AVS2.0. AVS is China's second-generation source-coding standard with independent intellectual property rights, and AVS2.0 is a new-generation standard on the same level as H.265 and VP9. Although AVS adoption and popularity do not appear to be high, the standard shows that China has taken note of the competition in this field.
The SVAC standard: a codec used in China's video-surveillance field. Its main characteristics are:
1. High-security encryption and authentication: it specifies encryption and authentication interfaces and data formats to guarantee data confidentiality, integrity, and non-repudiation.
2. Region-of-interest (ROI) coding: the image is divided into several regions of interest plus a background region; the key monitoring areas keep real-time video detail at a high frame rate, while the bit cost spent on non-interest areas is reduced.
3. Embedded video information: recognized voice feature parameters, special events, timestamps, and similar data can be embedded in the encoded stream, then extracted, quickly retrieved, and classified without having to decode the video.
4. Scalable Video Coding (SVC): video data is encoded in layers to meet the requirements of different network-bandwidth and data-storage environments. Ordinary encoders transmit a main stream plus sub-streams, which occupies considerable bandwidth; SVAC transmits only one stream, and image information at different resolutions is obtained by peeling layers off that stream (see the sketch below).
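To make the layering idea concrete, here is a minimal Python sketch of two-layer spatial scalability. It is my own illustration of the general SVC concept, not the SVAC bitstream format: the base layer is a quarter-resolution image, and the enhancement layer is the residual needed to rebuild the full frame:

```python
import numpy as np

def encode_two_layers(frame):
    """Split a frame into a quarter-resolution base layer plus an
    enhancement residual (a real encoder also quantizes and entropy-codes)."""
    base = frame[::2, ::2]                             # downsample 2x per axis
    upsampled = base.repeat(2, axis=0).repeat(2, axis=1)
    enhancement = frame.astype(np.int16) - upsampled   # the missing detail
    return base, enhancement

def decode(base, enhancement=None):
    """A constrained receiver decodes only the base layer; a full receiver
    adds the enhancement residual to recover the original frame."""
    up = base.repeat(2, axis=0).repeat(2, axis=1).astype(np.int16)
    if enhancement is None:
        return up.clip(0, 255).astype(np.uint8)        # low-resolution preview
    return (up + enhancement).clip(0, 255).astype(np.uint8)

frame = (np.random.rand(64, 64) * 255).astype(np.uint8)
base, enh = encode_two_layers(frame)
assert np.array_equal(decode(base, enh), frame)  # both layers: exact reconstruction
low_res = decode(base)                           # base layer alone still decodes
```

A receiver on a constrained link simply drops the enhancement layer and still gets a usable lower-resolution picture from the single transmitted stream.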
WebRTC was originally proposed by Google, primarily for browser-based RTC communication, hence the name WebRTC. In the early days the major browsers, such as Chrome, Mozilla's Firefox, and Opera, differed in how strongly they supported WebRTC and its video codecs. Chrome at first supported only the VP series, for the patent reasons described above, and then gradually extended support to H.264. Since most existing RTC systems use the H.264 codec, H.264 support made cross-domain RTC interoperation far more convenient and, in my view, accelerated the development of WebRTC to a certain extent. For example, a browser and a mobile phone can join the same video conference, or a browser can hold a point-to-point call with an existing SIP terminal. H.264 support greatly reduces the need for transcoding, which is very performance-intensive in software and otherwise requires dedicated hardware.
Of course, more and more vendors have since entered the WebRTC field. Agora's RTC system has gone beyond plain WebRTC: SDKs adapted to a wide range of hardware and chip platforms, and the SD-RTN network, which selects preferred transmission paths to guarantee high transmission quality (after all, communication is not a pure end-device function; the network's influence on call quality, whether video or audio, is also huge). Its weak-network countermeasure algorithms can withstand 70% video packet loss and still keep calls smooth.
With the ongoing development of the Internet of Things, RTC is being used in more and more fields beyond person-to-person calls and audio/video conferencing, such as security surveillance and smart hardware terminals. Video-processing hardware keeps getting smaller, even miniature, and purely software-based encoding and decoding shows serious disadvantages in memory, CPU, and other resource usage. Many manufacturers have recognized this, so the trend toward dedicated chips doing dedicated work grows ever more obvious. In the surveillance field, Huawei HiSilicon's ARM-plus-dedicated-video-processing-unit chips account for more than 70% of the domestic Chinese video market; NVIDIA has launched the Jetson series for edge-computing scenarios. The ARM + GPU approach is more general-purpose, and thanks to ARM's low power consumption it gives edge devices video-processing, machine-vision, and AI-analysis capabilities, greatly enriching intelligent-IoT applications.
Over the past two years, the epidemic, and with it online education and live streaming, has brought great development opportunities to Web real-time communication, and its commercial success in turn keeps injecting vitality into the technology. With the spread of 5G and the emergence of new application scenarios such as VR/AR and autonomous driving, WebRTC will surely gain new momentum, spurring the development of Internet-based real-time audio and video communication.