
This article quotes parts of "Pano Cloud Pano", "In-depth Understanding of Video Codec Technology", and "Prediction Technology: The Technical Principles Behind Thousand-fold Video Compression"; thanks to the original authors for sharing.

1. Introduction

Since the 1990s, digital audio and video codec technology has developed rapidly and has remained a hot research field in China and abroad. With 5G maturing and entering wide commercial use, bandwidth keeps growing and audio/video transmission keeps getting easier; live video and video chat have become part of everyday life.

Why is video so popular? Because a large amount of information can be obtained easily and quickly through video. But the volume of video data is huge, so transmitting video over a network faces huge challenges. Hence video codec technology.

In real-time video scenarios, data volume is not the only issue: real-time communication also places very high demands on latency, device adaptation, and bandwidth adaptation. Solving these problems never strays far from video codec technology.

This article starts with the basics of video codec technology and then introduces the fundamental and important prediction techniques in video coding: the technical principles of intra-frame prediction and inter-frame prediction.

(This article has been simultaneously published at: http://www.52im.net/thread-3581-1-1.html)

2. Related articles

If you are a beginner in audio and video technology, the following 3 entry-level articles are highly recommended:

"Zero Foundation, Introduction to the Most Popular Video Coding Technology in History"
"Introduction to Zero Basics: A Comprehensive Inventory of Basic Knowledge of Real-time Audio and Video Technology"
"Necessary for real-time audio and video face-to-face viewing: quickly master 11 basic concepts related to video technology"

3. Why video codecs are needed

First, let's review the theory behind video encoding and decoding.

A video is composed of a series of pictures arranged in chronological order:

  • 1) Each picture is a frame;
  • 2) Each frame can be understood as a two-dimensional matrix;
  • 3) Each element of the matrix is a pixel.

A pixel is usually expressed with three color components. For example, in the RGB color space, each pixel consists of three components, each stored in 1 byte with a value range of 0~255. The YUV formats commonly used in encoding are similar and will not be expanded on here.

Take a 1280x720@60fps video sequence as an example: ten seconds of raw video amount to 1280 × 720 × 3 × 60 × 10 ≈ 1.6 GB.
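
To sanity-check that arithmetic, here is the same calculation as a few lines of Python (the numbers are just the ones above):

```python
# Raw size of ten seconds of 1280x720 @ 60fps video, 3 bytes per pixel.
width, height = 1280, 720
bytes_per_pixel = 3            # one byte per color component
fps, seconds = 60, 10

raw_bytes = width * height * bytes_per_pixel * fps * seconds
print(raw_bytes)               # 1658880000 bytes
print(raw_bytes / 1e9, "GB")   # ~1.66 GB of raw data for just ten seconds
```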

Such a volume of data poses huge challenges to both storage and transmission. The purpose of video compression (encoding) is to shrink the video while preserving its quality, making it easier to transmit and store; to play the video back correctly, it must then be decoded.

PS: Due to space limits, the technical principles of video coding and decoding are not expanded here. If you are interested, this article is strongly recommended: "Instant Messaging Audio and Video Development (19): Zero Foundation, Introduction to the Most Popular Video Coding Technology in History".

In short, the main job of video codec technology is to pursue the highest possible reconstruction quality and the highest possible compression ratio within the available computing resources, so as to meet bandwidth and storage constraints.

Why highlight "reconstruction quality"?

Because video encoding is a lossy process, users can only recover a "reconstructed" picture from the received video stream, which differs from the original picture, for example the "blocking" artifacts often seen when watching low-quality video.

How to preserve as much video quality as possible within a given bandwidth budget, or conversely to use as little bandwidth as possible while maintaining quality, is the basic goal of video coding.

In technical terms, this is the "rate-distortion" performance of a video coding standard:

  • 1) "Rate" refers to the code rate or bandwidth occupation;
  • 2) "Distortion" is used to describe the quality of the reconstructed video.
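
Encoders typically weigh the two against each other with a Lagrangian cost J = D + λ·R when choosing between coding options. A minimal sketch of that trade-off (the function and all numbers are illustrative, not taken from any standard):

```python
def rd_cost(distortion: float, rate_bits: float, lmbda: float) -> float:
    """Lagrangian rate-distortion cost: J = D + lambda * R."""
    return distortion + lmbda * rate_bits

# Two hypothetical ways to code the same block:
# A spends many bits for low distortion, B the opposite.
lmbda = 10.0
cost_a = rd_cost(distortion=100.0, rate_bits=50.0, lmbda=lmbda)  # 600.0
cost_b = rd_cost(distortion=400.0, rate_bits=10.0, lmbda=lmbda)  # 500.0
print(min([("A", cost_a), ("B", cost_b)], key=lambda t: t[1]))
# ('B', 500.0): at this lambda, the cheaper-to-transmit option wins
```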

The counterpart of encoding is decoding (decompression), which reconstructs the received bitstream, or a compressed bitstream stored on some medium, back into a video signal for display on all kinds of devices.

4. What is a video codec standard

Video codec standards usually only define the above-mentioned decoding process.

Take the H.264/AVC standard as an example: it defines what constitutes a standard-compliant video bitstream, strictly specifying the order and meaning of every bit and precisely defining how each bit, or group of bits, is to be used.

It is this rigor and precision that lets video services from different vendors interoperate easily: for example, you can watch the same video on the same online video site from an iPhone, an Android phone, or a Windows PC.

Many organizations around the world work on video coding standards, such as the MPEG group of the International Organization for Standardization (ISO), the VCEG group of the International Telecommunication Union (ITU-T), China's AVS working group, and the Alliance for Open Media formed by Google and other major vendors.

Video coding standards and their development history:

Since VCEG formulated the H.120 standard, video coding technology has developed continuously, producing a series of standards for different application scenarios. VCEG formulated H.120, H.261, H.262 (MPEG-2 Part 2), H.263, H.263+, and H.263++.

MPEG formulated MPEG-1, MPEG-2, and MPEG-4 Part 2, while H.264/AVC, H.265/HEVC, and H.266/VVC were developed jointly by the two international organizations.

China has the AVS, AVS2, and AVS3 video coding standards with independent intellectual property rights; Google formulated VP8 and VP9.

AV1 was developed by the Alliance for Open Media (AOM), formed by Google, Cisco, Microsoft, Apple, and other companies.

A special mention for H.264/AVC: although it is nearly 20 years old, its excellent compression performance, moderate computational complexity, excellent open-source community support, friendly patent policy, and strong ecosystem keep it very much alive, especially in real-time communication. Video conferencing products such as Zoom and Cisco Webex, and video services built on WebRTC SDKs, mostly use H.264/AVC in their mainstream scenarios.

I will not go deeper into video coding standards here. For more detail, you can read the following selected articles:

  • "Instant Messaging Audio and Video Development (5): Understanding the Mainstream Video Coding Technology H.264"
  • "Instant Messaging Audio and Video Development (13): Features and Advantages of Real-time Video Coding H.264"
  • "Instant Messaging Audio and Video Development (17): The Past and Present of Video Coding H.264 and VP8"
  • "IQIYI Technology Sharing: Easy and humorous, explaining the past, present and future of video codec technology"

5. Hybrid coding framework

Throughout the history of video coding standards, each generation has significantly improved rate-distortion performance, and they all share one core framework: the block-based hybrid coding framework (shown in the figure below). It is the hybrid framework based on block motion compensation and transform coding proposed by J.R. Jain and A.K. Jain at the Picture Coding Symposium (PCS) in 1979.

Let's disassemble and analyze the framework together.

A frame of video captured from the camera, usually raw data in YUV format, is divided into square pixel blocks that are processed in turn (for example, the 16x16 macroblock is the basic unit in H.264/AVC), going through intra-/inter-frame prediction, forward transform, quantization, inverse quantization, inverse transform, loop filtering, and entropy coding to finally produce the video bitstream. Spatial prediction starts from the first block of the first frame. Because the block currently being encoded resembles its surrounding blocks, we can use the surrounding pixels to predict the current pixels. Subtracting the predicted pixels from the original pixels yields the prediction residual; the residual is then transformed and quantized to obtain transform coefficients, which are entropy-coded into the video bitstream.

Next, so that subsequent blocks can be predicted from already-coded blocks, we inverse-quantize and inverse-transform the transform coefficients to obtain the reconstructed residual, add it to the prediction to obtain the reconstructed image, and finally apply loop filtering to the reconstruction to remove blocking artifacts and the like. The resulting reconstructed image can then be used to predict subsequent blocks, and we process the remaining blocks in the same way.
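
A heavily simplified per-block sketch of that loop in Python/NumPy (vertical prediction stands in for the prediction step, the transform is omitted, entropy coding is left out, and all names and sizes are illustrative):

```python
import numpy as np

Q_STEP = 8  # quantization step: larger saves bits but loses more detail

def quantize(x):   return np.round(x / Q_STEP).astype(np.int32)
def dequantize(q): return q * Q_STEP

def encode_block(original: np.ndarray, row_above: np.ndarray):
    """Encode one block given the reconstructed row of pixels above it."""
    # 1) Predict: copy the reconstructed row above down every column
    #    (vertical intra prediction).
    pred = np.tile(row_above, (original.shape[0], 1))
    # 2) Residual and quantization (the transform is omitted for brevity);
    #    these coefficients are what would be entropy-coded.
    coeffs = quantize(original.astype(np.int64) - pred)
    # 3) Inverse-quantize and add the prediction back: this reconstruction is
    #    what the decoder will see, so it (not the original) must be used to
    #    predict the following blocks.
    recon = np.clip(pred + dequantize(coeffs), 0, 255)
    return coeffs, recon

rng = np.random.default_rng(0)
row = rng.integers(0, 256, size=(1, 16))                        # pixels above the block
block = np.clip(row + rng.integers(-10, 11, (16, 16)), 0, 255)  # similar content below
coeffs, recon = encode_block(block, row)
print(np.abs(coeffs).max())  # tiny coefficients -> cheap to entropy-code
```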

For video: the interval between frames is only on the order of tens of milliseconds, and the content being shot usually does not change drastically, so consecutive frames are very strongly correlated.

As shown in the figure below, the video image is divided into blocks, each block is matched against temporally adjacent images, and the residual left after matching is encoded. This removes much of the frame-to-frame redundancy in the video signal, achieving compression. This is motion compensation, and it remains one of the core techniques of video coding to this day.

Motion estimation and motion compensation:

The core idea of transform coding is to divide the video data into blocks and use an orthogonal transform to concentrate the data's energy into a few transform coefficients; combined with quantization and entropy coding, this yields more effective compression. The information loss in video coding, and much of the compression gain, comes from the quantization module, which maps each sample of the source signal to a fixed value, a many-to-one mapping that achieves compression but of course introduces loss.

The quantized signal then undergoes lossless entropy coding to remove its statistical redundancy. Research on entropy coding dates back to the 1950s; after decades of development, its use in video coding has become mature and refined, making full use of context in the video data to estimate probability models more accurately and thus improve coding efficiency. Examples are CAVLC (context-adaptive variable-length coding) and CABAC (context-adaptive binary arithmetic coding) in H.264/AVC; arithmetic coding is also used in later video coding standards such as HEVC/H.265, VVC/H.266, and AV1.
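
A minimal NumPy sketch of this energy-compaction idea, using a hand-built 8x8 orthonormal DCT on an illustrative smooth block (the quantization step of 16 is arbitrary):

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix (rows are frequencies)."""
    i = np.arange(n)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i[None, :] + 1) * i[:, None] / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m

D = dct_matrix(8)
# A smooth 8x8 block (a horizontal ramp), typical of natural image content.
block = np.tile(np.arange(0, 80, 10, dtype=float), (8, 1))

coeffs = D @ block @ D.T            # 2-D DCT: energy piles into a few coefficients
quantized = np.round(coeffs / 16)   # coarse uniform quantization -> mostly zeros
print(np.count_nonzero(quantized), "of", quantized.size, "coefficients survive")

recon = D.T @ (quantized * 16) @ D  # decoder: dequantize and inverse-transform
print(np.abs(recon - block).max())  # small loss, traded for a much sparser signal
```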

As the latest standard in this line of development, VVC/H.266 adopts a series of advanced techniques and optimizes every part of the hybrid coding framework, roughly doubling its rate-distortion performance compared with the previous-generation standard.

For example: VVC/H.266 uses a 128x128 basic coding unit, which can be further partitioned with quadtree splits plus binary and ternary splits; the chroma components can be partitioned independently of luma; intra-frame prediction directions and inter-frame prediction modes are more refined; and multiple transform sizes and types, in-loop filtering, and more are supported.

The formulation of VVC/H.266 aims to better support diverse video content, such as screen-sharing content, games, animation, and virtual reality content (VR, AR). Specific techniques have been adopted into the standard for this, such as palette mode, intra-frame motion compensation, affine transformation, transform skip, adaptive color transform, and so on.

Back to the topic of this article: in what follows, we focus on the prediction techniques used in video codecs.

6. Intra prediction technology

After the video frame is divided into blocks, the pixels within a block and those in adjacent blocks often change gradually and are strongly similar. This similarity is spatial redundancy, and where there is redundancy, the same characteristics can be expressed with a smaller amount of data.

For example: first transmit the value of the first pixel, then transmit each subsequent pixel as its change relative to the previous one. This change value usually has a much smaller range, so a pixel value that originally needed 8 bits may need fewer than 8.
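
A tiny Python sketch of this difference idea on one scanline (the pixel values are made up):

```python
import numpy as np

# One scanline of a smooth region: neighboring pixels differ only slightly.
pixels = np.array([100, 102, 103, 103, 105, 108, 110, 111])

diffs = np.diff(pixels, prepend=0)  # the first value, then pixel-to-pixel deltas
print(diffs)             # [100   2   1   0   2   3   2   1]: deltas need few bits
print(np.cumsum(diffs))  # lossless reconstruction of the original scanline
```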

In the same way, a similar "difference" operation can be performed with a pixel block as the basic unit. The example pictures below show this similarity more intuitively.

In the two 8x8 blocks marked in the figure above, the luma component (Y) changes continuously and only slightly along the "upper-left to lower-right" direction.

If we design some "mode" that uses the block on the left to "predict" the block on the right, then transmitting "original pixels" minus "predicted pixels" reduces the amount of data, while writing the "mode" into the final bitstream lets the decoder use the left block to "reconstruct" the right one.

Taking this to the extreme: if, after some computation, the pixels of the left block can exactly match the right block, the encoder only needs to spend the cost of one "mode" to transmit the right block.

Of course, video contains many kinds of texture, and no single mode suits them all. The standards therefore design a variety of intra-prediction modes to make full use of the correlation between pixels and achieve compression.

For example, the figure below shows the 9 intra-prediction directions in H.264. Take mode 0 (vertical prediction) as an example: each (reconstructed) pixel value of the block above is copied down its column to form the intra prediction. The other modes work in a similar way, differing only in how the prediction is generated. With so many modes, a question arises: which mode should be used to encode a given block? The optimal approach is to try every mode and measure the bits required and the quality loss for each, i.e., rate-distortion optimization. That is obviously very expensive, so there are many cheaper ways to infer which mode is better, for example based on SATD or edge detection.
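
A simplified sketch of three H.264-style intra modes and a SAD-based mode decision (a real encoder predicts from reconstructed neighbors and uses RD or SATD costs; all values here are made up, and the DC mode is simplified):

```python
import numpy as np

def predict_vertical(above, size=4):
    """Mode 0: copy each reconstructed pixel above down its column."""
    return np.tile(above.reshape(1, -1), (size, 1))

def predict_horizontal(left, size=4):
    """Mode 1: copy each reconstructed pixel on the left across its row."""
    return np.tile(left.reshape(-1, 1), (1, size))

def predict_dc(above, left, size=4):
    """Mode 2 (DC, simplified): predict everything with the neighbor mean."""
    return np.full((size, size), round((above.mean() + left.mean()) / 2))

def sad(a, b):  # a cheap cost, standing in for full rate-distortion optimization
    return int(np.abs(a.astype(int) - b).sum())

above = np.array([50, 52, 54, 56])  # reconstructed row above the block
left = np.array([50, 51, 52, 53])   # reconstructed column to its left
block = np.tile(above, (4, 1))      # this block happens to extend the row above

candidates = {0: predict_vertical(above),
              1: predict_horizontal(left),
              2: predict_dc(above, left)}
best_mode = min(candidates, key=lambda m: sad(block, candidates[m]))
print(best_mode)  # 0: vertical prediction matches this block exactly
```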

From the 9 prediction modes of H.264 to the 56 intra directional prediction modes of AV1, ever more modes are used to predict uncoded blocks more accurately. But more modes raise, on one hand, the bitrate overhead of signaling which mode was used, and on the other hand the difficulty of picking the best mode out of so many so that a higher compression ratio is actually achieved, which places higher demands on encoder design and implementation.

7. Inter-frame prediction technology

The following 5 pictures are the first 5 frames of a video. As you can see, only Mario and the bricks move; the rest of the scene stays mostly the same. This similarity is called temporal redundancy. When encoding, we first encode and transmit the first frame using the intra-frame prediction described above, and for subsequent frames we transmit only the motion of Mario and the bricks. When decoding, the motion information is combined with the first frame to synthesize the subsequent frames, which greatly reduces the number of bits to transmit. This compression technique that exploits temporal redundancy is motion compensation, adopted as early as the H.261 standard.

Careful readers may have wondered: how can objects such as Mario and the bricks be described so that they can be fully reproduced from motion information alone?

In fact, video coding does not need to know the shape of the moving object. Instead, the whole frame is divided into pixel blocks and each block carries one piece of motion information: block-based motion compensation.

The white arrows circled in red in the figure below are the motion information used when encoding the bricks and Mario; each points to a position in the previous frame. Mario and the bricks each have two arrows, meaning each is split into two blocks, and every block has its own motion information. This motion information is the motion vector. A motion vector has a horizontal and a vertical component, representing the position change of a block relative to its reference frame, where the reference frame is one (or more) of the already-coded frames.
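
A minimal integer-pel sketch of block-based motion compensation, assuming the motion vector is already known (frame sizes and contents are illustrative):

```python
import numpy as np

def motion_compensate(ref, x, y, mv, size=8):
    """Predict the block at (x, y) from the reference frame, displaced by the
    integer-pel motion vector mv = (mvx, mvy)."""
    mvx, mvy = mv
    return ref[y + mvy : y + mvy + size, x + mvx : x + mvx + size]

rng = np.random.default_rng(1)
ref = rng.integers(0, 256, (64, 64)).astype(np.int16)  # previously coded frame
cur = np.roll(ref, shift=(0, 2), axis=(0, 1))          # same scene, moved 2 px right

x, y = 16, 16
mv = (-2, 0)  # this block's content came from 2 px to the left in the reference
residual = cur[y:y + 8, x:x + 8] - motion_compensate(ref, x, y, mv)
print(np.abs(residual).max())  # 0: a perfect match leaves nothing to transmit
                               # beyond the motion vector itself
```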

Of course, transmitting the motion vectors themselves consumes a fair number of bits. To improve the transmission efficiency of motion vectors, the main measures are as follows.

On the one hand: blocks can be made as large as possible so that they share one motion vector, since flat areas and larger objects tend to move consistently. Variable-block-size motion compensation has been widely adopted since H.264.

On the other hand: the motion of adjacent blocks is often quite similar, so their motion vectors are similar as well. A motion vector can therefore itself be predicted from the motion vectors of neighboring blocks, i.e., motion vector prediction (a small sketch follows after these measures);

Finally: expressing motion with a motion vector involves a choice of precision. Pixels are a discretized representation, and real objects obviously do not move in whole-pixel steps, so an appropriate precision must be chosen to define the motion vector; each video coding standard defines this precision. The higher it is, the more accurately motion can be expressed, but the more bits it costs to transmit the motion vector.

Motion vectors in H.261 use integer-pixel precision, in H.264 quarter-pixel precision, and AV1 raises this to one-eighth-pixel precision. In general, the closer two frames are in time, the more similar they are, but there are exceptions: in a back-and-forth scene, a frame several frames away, or even further, may be more similar.
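
As promised above, here is a sketch of motion vector prediction, in the spirit of H.264's component-wise median predictor (all vectors are made-up quarter-pel values):

```python
def median_mv_predictor(mv_left, mv_above, mv_above_right):
    """Predict a block's motion vector component-wise from its already-coded
    neighbors (in the spirit of H.264's median predictor); only the small
    difference to the actual MV then needs to be transmitted."""
    med = lambda a, b, c: sorted((a, b, c))[1]
    return (med(mv_left[0], mv_above[0], mv_above_right[0]),
            med(mv_left[1], mv_above[1], mv_above_right[1]))

# Neighboring blocks moving almost identically (made-up quarter-pel units):
mv_pred = median_mv_predictor((8, 2), (9, 2), (8, 3))       # -> (8, 2)
mv_actual = (9, 2)
mvd = (mv_actual[0] - mv_pred[0], mv_actual[1] - mv_pred[1])
print(mv_pred, mvd)  # (8, 2) (1, 0): the difference costs very few bits
```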

To make full use of already-encoded frames and improve the accuracy of motion compensation, multi-reference-frame technology was introduced starting with H.264.

That is, a block can be motion-matched against any of several already-coded reference frames, and the index of the matched frame is transmitted together with the motion vector.

So how is the motion information of a block obtained? The simplest idea is to compare the block against its reference frame position by position; the best-matching position yields the final motion vector.

Matching quality is commonly measured with SAD (Sum of Absolute Differences), SSD (Sum of Squared Differences), and the like. Checking the match position by position is so-called full-search motion estimation, whose computational cost is, as you can imagine, very high. To speed up motion estimation, we can reduce the number of positions searched; there are many such algorithms, e.g. diamond search, hexagon search, and the asymmetric-cross multi-hierarchical hexagon grid search.

Take diamond search as an example (figure below). First compute the SAD at the 9 positions of the large diamond centered on the initial blue point. If the smallest SAD is at the center, next compute the SAD of the 4 green points of the small diamond around that center, pick the smallest, and the search narrows to its end; if the smallest SAD in the first step is not at the center, re-center the large diamond on that position, add the 5 or 3 new brown points, compute their SADs, and iterate until the best match is found.
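
A compact sketch of that search pattern (the test scene, block position, and sizes are made up; a smooth image is used so the SAD landscape guides the search):

```python
import numpy as np

LDSP = [(0, 0), (2, 0), (-2, 0), (0, 2), (0, -2), (1, 1), (1, -1), (-1, 1), (-1, -1)]
SDSP = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # small diamond for the final refinement

def sad(cur, ref, x, y, mvx, mvy, n=8):
    """SAD between the current block and a displaced reference block;
    candidates falling outside the frame are rejected."""
    patch = ref[y + mvy : y + mvy + n, x + mvx : x + mvx + n]
    if patch.shape != (n, n):
        return float("inf")
    return float(np.abs(cur[y : y + n, x : x + n] - patch).sum())

def diamond_search(cur, ref, x, y):
    mv = (0, 0)
    while True:  # large diamond: move toward the best point until the center wins
        best = min(((mv[0] + dx, mv[1] + dy) for dx, dy in LDSP),
                   key=lambda c: sad(cur, ref, x, y, *c))
        if best == mv:
            break
        mv = best
    return min([mv] + [(mv[0] + dx, mv[1] + dy) for dx, dy in SDSP],
               key=lambda c: sad(cur, ref, x, y, *c))

# A smooth test scene (a Gaussian blob), shifted right by 3 px and down by 1 px.
yy, xx = np.mgrid[0:64, 0:64]
ref = 255.0 * np.exp(-((xx - 28) ** 2 + (yy - 26) ** 2) / 100.0)
cur = np.roll(ref, shift=(1, 3), axis=(0, 1))
print(diamond_search(cur, ref, 24, 24))  # (-3, -1): points back to the source
```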

When the encoder is implemented, the search algorithm can be selected according to the actual application scenario.

For example, in real-time audio and video scenarios where computing power is limited, the motion estimation module should choose a lower-complexity algorithm to balance complexity against coding efficiency. Of course, the complexity of motion estimation and compensation also depends on block sizes, the number of reference frames, sub-pixel interpolation, and so on, which will not be elaborated further here.

The principles of further prediction techniques will not be covered here. If the prediction techniques above feel hard to follow, here is an entry-level article: "Instant Messaging Audio and Video Development (4): Introduction to the Prediction Technology of Video Coding and Decoding".

8. Closing remarks

Audio and video codec technology is, in the end, about delivering clearer sound and higher-quality video with limited resources (network bandwidth, computing power, and so on).

For video in particular, many hot topics remain to be studied in depth to further improve quality.

For example, subjective quality optimization for the human eye: exploiting visual characteristics of the eye, such as masking effects, contrast sensitivity, and attention models, in the encoder to allocate bitrate sensibly and reduce the visual annoyance caused by coding loss.

AI in video coding and decoding: a variety of machine-learning algorithms, such as classifiers, support vector machines, and CNNs, are used to select coding parameters quickly; deep learning can also process video outside and inside the coding loop, e.g. out-of-loop super-resolution, denoising, defogging, and adaptive dynamic-range adjustment, to improve video quality.

In addition, there are deep-neural-network codecs that break away from the traditional hybrid coding framework, such as NVIDIA's Maxine video conferencing service, which uses deep learning to extract features and then transmits the features instead, saving bandwidth.

Appendix: More Essence Articles

[1] Other essential materials for real-time audio and video development:

"Instant Messaging Audio and Video Development (1): Theoretical Overview of Video Coding and Decoding"
"Instant Messaging Audio and Video Development (2): Introduction to Digital Video of Video Coding and Decoding"
"Instant Messaging Audio and Video Development (3): Coding Fundamentals of Video Coding and Decoding"
"Instant Messaging Audio and Video Development (4): Introduction to the Prediction Technology of Video Coding and Decoding"
"Instant Messaging Audio and Video Development (5): Understanding the Mainstream Video Coding Technology H.264"
"Instant Messaging Audio and Video Development (6): How to Start Learning Audio Codec Technology"
"Instant Messaging Audio and Video Development (7): Introduction to Audio Basics and Coding Principles"
"Instant Messaging Audio and Video Development (8): Common Real-time Voice Communication Coding Standards"
"Instant Messaging Audio and Video Development (9): Overview of Echo and Echo Cancellation in Real-time Voice Communication"
"Instant Messaging Audio and Video Development (10): Detailed Explanation of Echo Cancellation Technology for Real-time Voice Communication"
"Instant Messaging Audio and Video Development (11): Detailed Explanation of Packet Loss Compensation Technology for Real-time Voice Communication"
"Instant Messaging Audio and Video Development (12): Discussion on Multi-person Real-time Audio and Video Chat Architecture"
"Instant Messaging Audio and Video Development (13): Features and Advantages of Real-time Video Coding H.264"
"Instant Messaging Audio and Video Development (14): Introduction to Real-time Audio and Video Data Transmission Protocol"
"Instant Messaging Audio and Video Development (15): Talk about the application of P2P and real-time audio and video"
"Instant Messaging Audio and Video Development (16): Several Suggestions for Real-time Audio and Video Development on Mobile"
"Instant Messaging Audio and Video Development (17): The Past and Present of Video Coding H.264 and VP8"
"Instant Messaging Audio and Video Development (18): Detailed Explanation of the Principle, Evolution and Application Selection of Audio Codec"
"Instant Messaging Audio and Video Development (19): Zero Foundation, Introduction to the Most Popular Video Coding Technology in History"
"A Brief Introduction to Audio Processing and Coding Compression Technology in Real-time Voice Chat"
"NetEase Video Cloud Technology Sharing: Quick Start of Audio Processing and Compression Technology"
"Learning RFC3550: Basic knowledge of RTP/RTCP real-time transport protocol"
"Research on Real-time Streaming Media Technology Based on RTMP Data Transmission Protocol (Full Paper)"
"Sound Network Architects Talk about the Difficulties of Real-time Audio and Video Cloud (Video Interview)"
"On the technical points of developing a real-time video live broadcast platform"
"Still relying on "Hey He He" to test the quality of real-time voice calls? This article teaches you scientific evaluation methods! 》
"Practical sharing of real-time live audio and video live broadcast at 1080P with a delay of less than 500 milliseconds"
"Real-time video live broadcast technology practice on mobile terminal: How to achieve real-time seconds, smooth and non-blocking"
"How to use the easiest way to test your real-time audio and video solutions"
"Technical Secret: Facebook Live Video Broadcasting Supporting Millions of Fan Interactions"
"A brief description of the working principle of end-to-end encryption (E2EE) in real-time audio and video chats"
"Detailed explanation of real-time audio and video live broadcast technology on mobile terminal (1): opening"
"Detailed Explanation of Real-time Audio and Video Live Broadcast Technology on Mobile Terminal (2): Acquisition"
"Detailed Explanation of Real-time Audio and Video Live Broadcast Technology on Mobile Terminal (3): Processing"
"Detailed Explanation of Real-time Audio and Video Live Broadcast Technology on Mobile Terminal (4): Encoding and Packaging"
"Detailed Explanation of Real-time Audio and Video Live Broadcast Technology on Mobile Terminal (5): Push Streaming and Transmission"
"Detailed explanation of real-time audio and video live broadcast technology on mobile terminal (6): Delay optimization"
"Integration of Theory with Practice: Realizing a Real-Time Video Broadcasting Simply Based on HTML5"
"Detailed Explanation of Echo Cancellation Technology in IM Real-time Audio and Video Chat"
"Talking about several key technical indicators that directly affect user experience in real-time audio and video live broadcast"
"How to optimize the transmission mechanism to achieve ultra-low latency of real-time audio and video? 》
"First Disclosure: How does Kuaishou make it possible for millions of viewers to watch the live broadcast at the same time and still be able to start in seconds without lag? 》
"Android live broadcast introductory practice: hands-on to build a simple live broadcast system"
"Some optimization ideas for NetEase Yunxin real-time video live broadcast at the TCP data transmission layer"
"Real-time audio and video chat technology sharing: anti-packet loss codec for unreliable networks"
"How does P2P technology reduce the bandwidth of real-time video broadcast by 75%? 》
"Interview with the person in charge of WeChat video technology: the evolution of WeChat real-time video chat technology"
"Tencent Audio and Video Lab: Using AI Black Technology to Achieve Ultra-low Bit Rate HD Real-time Video Chat"
"WeChat Team Sharing: The Technology Decryption Behind WeChat Real-time Audio and Video Chats 100 Million Times a Day"
"Recently hot real-time live broadcast answering system realization ideas and technical difficulties sharing"
"Welfare Post: A Summary of Open Source Projects Used in the Most Complete Real-time Audio and Video Development"
"Qiniu Cloud Technology Sharing: Use QUIC Protocol to Realize Real-time Video Broadcasting 0 Caton! 》
"Thinking and technical practice of ultra-low latency architecture in real-time audio and video chat"
"Understanding the delay problem in real-time audio and video chat is enough."
"Real-time video broadcast client technology inventory: Native, HTML5, WebRTC, WeChat applet"
"An Introduction to Real-time Audio and Video Technology for Xiaobai"
"Interview with WeChat Multimedia Team: Learning from Audio and Video Development, WeChat Audio and Video Technology and Challenges, etc."
"Tencent Technology Sharing: The Story Behind WeChat Mini Program Audio and Video Technology"
"Interview with Liang Junbin from the WeChat Multimedia Team: Talk about the audio and video technologies I know"
"Sina Weibo Technology Sharing: The Optimal Practice Road of Weibo Short Video Service"
"Summary of technical principles and practice of real-time audio mixing in live video applications"
"Take the network access layer design of the online game server as an example to understand the technical challenges of real-time communication"
"Tencent Technology Sharing: Technical Ideas and Practice of Intercommunication Between WeChat Mini Program Audio and Video and WebRTC"
"Sina Weibo Technology Sharing: Practice of Million High Concurrency Architecture for Weibo Real-time Live Broadcast Answering Questions"
"Technical dry goods: real-time live video broadcast first screen optimization practice within 400ms"
"IQIYI Technology Sharing: Easy and humorous, explaining the past, present and future of video codec technology"
"Introduction to Zero Basics: A Comprehensive Inventory of Basic Knowledge of Real-time Audio and Video Technology"
"Necessary for real-time audio and video face-to-face viewing: quickly master 11 basic concepts related to video technology"
"Taobao live broadcast technology dry goods: high-definition, low-latency real-time video live broadcast technology decryption"
"Theoretical Essentials for Real-time Audio and Video Development: How to Save Traffic? Prediction Technology Behind High Video Compression》

More similar articles...

[2] An article on the open source real-time audio and video technology WebRTC:

"The Status Quo of Open Source Real-time Audio and Video Technology WebRTC"
"Brief description of the advantages and disadvantages of the open source real-time audio and video technology WebRTC"
"Interview with the Father of the WebRTC Standard: The Past, Present and Future of WebRTC"
"Conscience Sharing: WebRTC Zero-based Developer Tutorial (Chinese) [Attachment Download]"
"Introduction to the overall architecture of WebRTC real-time audio and video technology"
"Beginner's Introduction: What is a WebRTC server, and how does it connect to calls?" 》
"WebRTC real-time audio and video technology foundation: basic architecture and protocol stack"
"On the technical points of developing a real-time video live broadcast platform"
"[Opinion] Four Reasons Why WebRTC Should Choose H.264 Video Coding"
"Is it reliable to develop real-time audio and video based on open source WebRTC? What are the third-party SDKs? 》
"Application of RTP/RTCP Data Transmission Protocol in Open Source Real-time Audio and Video Technology WebRTC"
"A brief description of the working principle of end-to-end encryption (E2EE) in real-time audio and video chats"
"Real-time Communication RTC Technology Stack: Video Codec"
"Concise Compilation Tutorial of Open Source Real-time Audio and Video Technology WebRTC under Windows"
"WebRTC real-time audio and video technology on the web: It looks beautiful, but how many pits are there to fill before the production application? 》
"Amazing WebRTC: The ecology is getting better and better, or real-time audio and video technology will become a cabbage"
"Tencent Technology Sharing: Technical Ideas and Practice of Intercommunication Between WeChat Mini Program Audio and Video and WebRTC"
"Rongyun Technology Sharing: Real-time Audio and Video First Frame Display Time Optimization Practice Based on WebRTC"

More similar articles...

This article has been simultaneously published on the official account of "Instant Messaging Technology Circle".

▲ The synchronous publishing link is: http://www.52im.net/thread-3581-1-1.html

