In audio and video development, the codec is undoubtedly the core function, and the evolution of codec standards has greatly advanced audio/video technology and changed how it is used. From TV to web video, and now to live streaming, video on demand, and video conferencing, these changes are all inseparable from the iteration of codec technology: for example H.264 (still the most commonly used codec specification), H.265/HEVC (already adopted by major players such as Youku and Tencent), and the domestic AVS series.

H.26x series

A brief history of the development of video coding standards

img

LoveYFan

H.261: founder of video coding

H.261 was designed for transmission over ISDN (Integrated Services Digital Network), whose bandwidth comes in multiples of 64 kbps. The encoder was designed to work at bit rates between 40 kbps and 2 Mbps and supports video at CIF and QCIF resolutions, that is, luma resolutions of 352x288 and 176x144; with 4:2:0 chroma sampling, the corresponding chroma resolutions are 176x144 and 88x72.
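As a quick check on those numbers, here is a minimal sketch (plain Python, illustrative only) of how 4:2:0 subsampling halves each chroma dimension:

```python
# Illustrative sketch: under 4:2:0 sampling, each chroma plane is
# subsampled by 2 in both dimensions relative to the luma plane.

def chroma_size_420(luma_w, luma_h):
    """Chroma plane dimensions for a given luma resolution under 4:2:0."""
    return luma_w // 2, luma_h // 2

for name, (w, h) in [("CIF", (352, 288)), ("QCIF", (176, 144))]:
    cw, ch = chroma_size_420(w, h)
    print(f"{name}: luma {w}x{h}, chroma {cw}x{ch}")
# → CIF: luma 352x288, chroma 176x144
# → QCIF: luma 176x144, chroma 88x72
```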

H.261 adopted the discrete cosine transform (DCT) algorithm that we are now familiar with from image coding, and which later played a major role in JPEG. More than that, it introduced a series of video-specific features that laid the foundation for modern video coding, including macroblocks and macroblock-based motion compensation.

H.261 uses the YCbCr color space with 4:2:0 chroma sampling. Each macroblock contains a 16x16 block of luma samples and two corresponding 8x8 blocks of chroma samples. YCbCr (often loosely called YUV) is still the color space adopted by today's codec specifications.

Macroblocks and motion-compensated inter prediction

We know that a video is a sequence of images played frame by frame. One second of video typically contains 24, 25, 30, 60 or more pictures, shown at fixed intervals; thanks to persistence of vision, they merge into a smooth moving picture. Between consecutive frames there is actually a great deal of repetition, as in the following example:

img

A white billiard ball is moving on the green tabletop

img

Use the direction and distance of the ball to describe the changes in the image

If each frame were compressed independently in the traditional way, the compressed video would obviously still contain a lot of redundancy. So what can be done? H.261 introduced the idea of macroblocks: divide the whole picture into many small blocks, then apply motion-compensated inter-frame prediction. Most of the picture does not move, so the unmoved blocks simply reuse the previous compression result, while the moving parts are described by a vector giving the direction and distance of the motion. Doesn't that save a lot of storage space?
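The idea can be sketched with a toy exhaustive block search (illustrative Python only; real encoders use 16x16 macroblocks and fast search patterns, shrunk here to 4x4 for brevity):

```python
# Toy motion search: find the offset into the previous frame whose block
# best matches the current block, by minimizing the sum of absolute
# differences (SAD). Illustrative only; real encoders use fast searches.

def sad(cur, ref, bx, by, dx, dy, n):
    return sum(
        abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
        for y in range(n) for x in range(n)
    )

def motion_search(cur, ref, bx, by, search=2, n=4):
    """Return the (dx, dy) motion vector minimizing SAD within +/-search pixels."""
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            if not (0 <= by + dy <= len(ref) - n):
                continue
            if not (0 <= bx + dx <= len(ref[0]) - n):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2]

# A bright 4x4 "ball" that moved one pixel to the right between frames:
ref = [[9 if 2 <= y < 6 and 2 <= x < 6 else 0 for x in range(8)] for y in range(8)]
cur = [[9 if 2 <= y < 6 and 3 <= x < 7 else 0 for x in range(8)] for y in range(8)]
print(motion_search(cur, ref, bx=3, by=2))  # → (-1, 0): fetch the block from one pixel left
```

Instead of re-encoding the moved block, the encoder only needs to signal the vector (-1, 0), which is exactly the "direction plus distance" description above.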

DCT algorithm

img

The image is divided into 8x8-pixel blocks

img

The DCT algorithm originated in the 1970s, and in the mid-to-late 1980s researchers began applying it to image compression. It converts an image from the spatial domain to the frequency domain; quantization then discards the high-frequency information the human eye is less sensitive to while retaining most of the low-frequency information, reducing the size of the image. Finally, the processed data is further compressed with an efficient entropy coding method, in this case zig-zag scanning plus variable-length coding.
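For illustration, the zig-zag scan order mentioned above can be generated in a few lines of Python (a sketch of the JPEG-style scan, not production code):

```python
# Sketch: generate the zig-zag scan order for an 8x8 coefficient block.
# Low-frequency coefficients (top-left) come first, so the trailing
# high-frequency zeros cluster together for run-length/variable-length coding.

def zigzag_order(n=8):
    """(row, col) pairs in zig-zag order: walk anti-diagonals, alternating direction."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda rc: (rc[0] + rc[1],                             # which anti-diagonal
                        rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),  # direction alternates
    )

print(zigzag_order()[:6])  # → [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
```

After quantization, most nonzero coefficients sit near the start of this sequence, which is what makes run-length plus variable-length coding so effective.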

In H.261, and in later video codecs based on its framework, the DCT is mainly used to compress key frames. A key frame is a frame used as the reference in motion compensation. Much like a key frame in a Flash animation, it defines a starting point, and subsequent frames are computed from it. Because it is compressed entirely within the frame, without reference to any other frame, it is also called an intra-coded frame, or I-frame for short.

MPEG-1: introducing frame types

MPEG-1 is a video and audio compression format designed for the CD. It uses block-based motion compensation, the discrete cosine transform (DCT), quantization and other techniques, and is optimized for a transmission rate of about 1.2 Mbps. MPEG-1 was subsequently adopted as the core technology of the Video CD.

Audio: MP3

MPEG-1 audio is divided into three layers. The most famous, Layer 3, gave its name to MPEG-1 Layer 3, or MP3 for short, which is still a widely used audio compression format.

Video: introducing B-frames and the GOP

H.261 already had an implicit notion of frame types: the key frame described above (a complete still image that can be decoded directly), with all other frames derived from it through the motion compensation algorithm.

img

MPEG-1, however, formally introduced frame categories. The original key frame is called the I-frame, and a frame computed through inter-frame prediction is a P-frame. Beyond these two frame types inherited from H.261, MPEG-1 introduced a new one: the bidirectionally predicted frame, or B-frame, which is predicted from both a preceding and a following reference frame.

img

However, the introduction of B-frames also increases encoding and decoding complexity. MPEG-1 also proposed the GOP (Group of Pictures), which describes the arrangement of frames between one I-frame and the next.

A group of pictures is a set of consecutive images in an MPEG-encoded video or video stream; every MPEG-encoded film or stream is made up of consecutive GOPs.
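As a sketch, the classic MPEG GOP layout can be described by two parameters, N (the I-frame interval) and M (the spacing between anchor I/P frames); the names N and M are the conventional ones, but this little generator is illustrative only:

```python
# Sketch: expand (N, M) GOP parameters into display-order frame types.
# Frame 0 is the I-frame, every M-th frame after it is a P-frame (anchor),
# and the frames in between are bidirectionally predicted B-frames.

def gop_pattern(n=12, m=3):
    """Return frame types in display order for one GOP, e.g. N=12, M=3."""
    types = []
    for i in range(n):
        if i == 0:
            types.append("I")
        elif i % m == 0:
            types.append("P")
        else:
            types.append("B")
    return "".join(types)

print(gop_pattern(12, 3))  # → IBBPBBPBBPBB
```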

The following figure shows an example of a GOP:

img

MPEG-2: DVD standard

Compared with MPEG-1, MPEG-2 did not change much; it mainly targets DVD applications and improvements for the digital broadcast era.

Support for interlaced scanning

Interlaced scanning (English: interlaced) is a method of displaying images on a scanning display device. Compared with progressive scanning, interlaced scanning occupies less bandwidth: the scanning device alternately scans the even-numbered and odd-numbered lines of each frame.

img

A slow-motion illustration of interlaced scanning

H.263: Familiar 3GP video

The original H.261 and MPEG-1 targeted low-bit-rate applications. With the rapid development of the Internet and communications technology, demand for online video kept growing, and delivering higher-quality video at low bit rates became the new goal. As a major standard-setter in the communications industry, ITU-T launched H.263, the direct successor to H.261, in 1995.

3GP was also all the rage for a time. It reduced storage and bandwidth requirements, making good use of the limited storage space on mobile phones. H.263 still occupies the mainstream position within 3GP.

H.264/MPEG-4 AVC: into familiar territory

H.264/AVC is a block-oriented, motion-compensation-based video coding standard. By 2014 it had become one of the most commonly used formats for recording, compressing, and distributing high-definition video.

H.264/AVC contains a series of new features that make it not only more efficient than previous codecs but also usable in a wide variety of network environments. These new features include:

  • Multi-reference-frame motion compensation. Compared with previous video coding standards, H.264/AVC uses more encoded frames as reference frames, and in a more flexible way: in some cases up to 32 reference frames can be used (in previous standards the number was either one, or two for B-frames). For most scene sequences this brings a modest bit-rate reduction or quality improvement; for certain kinds of sequences, such as rapid repetitive flashing, repeated cuts, or background occlusion, it can reduce the bit rate significantly.
  • Variable block-size motion compensation. Blocks from 16x16 down to 4x4 can be used for motion estimation and compensation, segmenting the moving regions of a picture more precisely. The supported partition sizes are 16×16, 16×8, 8×16, 8×8, 8×4, 4×8, and 4×4.
  • A six-tap filter is used to generate half-pixel luma prediction values, reducing aliasing (Aliasing) and yielding sharper images.
  • Flexible interlaced-scan video coding.
  • ...and more.
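The half-pixel interpolation item above can be made concrete. H.264 derives half-pel luma samples with the six-tap filter (1, -5, 20, 20, -5, 1)/32; the sketch below applies it to six neighboring integer samples (illustrative Python, not the reference implementation):

```python
# Sketch of H.264's six-tap half-pixel luma interpolation.
# Given six horizontally adjacent integer samples e..j, the half-pel
# sample between g and h is a weighted sum with rounding and clipping.

def half_pel(e, f, g, h, i, j):
    """Interpolate the half-pel sample between g and h (8-bit samples)."""
    val = (e - 5 * f + 20 * g + 20 * h - 5 * i + j + 16) >> 5
    return max(0, min(255, val))

# On a flat area the interpolated value equals its neighbors:
print(half_pel(100, 100, 100, 100, 100, 100))  # → 100
```

The heavy center weights (20, 20) keep the result close to the two nearest samples, while the negative outer taps sharpen the response, which is exactly the anti-aliasing effect described above.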

H.265/HEVC: the embarrassing successor

As the successor to H.264, HEVC is considered not only to improve image quality but also to double the compression ratio of H.264/MPEG-4 AVC (the same picture quality at half the bit rate). It supports 4K and even ultra-high-definition television (UHDTV), with resolutions up to 8192×4320 (8K).

The figure below compares the subjective video quality of H.265 and H.264:

img

The comparison above shows that H.265 beats H.264 on a variety of metrics, so why call it an embarrassing successor?

  1. Most existing audio and video is still encoded in H.264, and H.264 meets the needs of most scenarios.
  2. The license fees are too expensive. Audio/video service providers at home and abroad have already been stung by H.264 licensing; supporting H.265 means paying yet again, so at present only some major players (Tencent, Youku) use it, and only for specific videos.
  3. H.266 has already been released, so those with a real need may simply wait for H.266, while those with less need will still not adopt H.265, leaving it in an awkward position.

However, H.266 may still need several years to mature, so although not much time is left for H.265, it still has a chance.

H.266/VVC: future coding

Here it comes, arriving at a pace that stops for nothing and no one, sure to lead the development of the next generation of the audio and video world: H.266/VVC.

In July 2020, editing of the H.266/VVC video coding standard was announced complete, a moment that points the direction of audio and video development for the next 10 years.

VVC stands for Versatile Video Coding; it is also known as H.266, MPEG-I Part 3, or Future Video Coding (FVC). The original intention of VVC is to achieve a larger compression ratio and a lower bit rate at the same video quality, enabling more scenarios such as 4K/8K ultra-high-definition video and 360° panoramic video. VVC has other characteristics as well:

The expected encoding complexity of the standard is several times that of HEVC (up to ten times), though this depends on the encoding algorithm; its decoding complexity is expected to be about twice that of HEVC.

Video standard timeline and coding-efficiency improvements

img

PS

• VTM = VVC Test Model; the latest version is VTM-10.0 (the reference software test platform)

• JVET = Joint Video Experts Team of the ITU-T VCEG and ISO/IEC MPEG (VVC Standards Committee)

H.266/VVC advantages

Reduced costs

The existing H.264/H.265 standards meet most audio and video business needs, but have hit bottlenecks in some businesses, and CDN bandwidth and traffic represent a huge expense. Achieving a larger compression ratio and a lower bit rate at the same video quality means the same CDN servers can serve more customers, reducing costs and improving efficiency.

Empower more scenarios

Emerging services such as VR (Virtual Reality), AR (Augmented Reality), and 360° panoramas require 4K or even 8K resolution to be effective. In these situations data must be transmitted faster (low latency), better (high resolution), and leaner (low bit rate), which existing codec schemes can no longer satisfy.

Current status of VVC development at home and abroad

The development of domestic H.266

  1. Active participation in drafting the H.266 standard, with Tencent and Alibaba as representatives: they submitted hundreds of proposals during the standardization process, more than half of which were adopted. Only by actively participating in rule-making does one gain a voice; this is a lesson paid for in blood.
  2. Tencent open-sourced the first H.266 codec, https://github.com/TencentCloud/O266player; see https://www.infoq.cn/article/auuthrzodb8j2k8lmrsz

AVS series

AVS development history

img

Because the domestic AVS codec standard started late and most patents are held abroad, most domestic companies would still be at the mercy of others. In addition, the performance of the AVS encoding system is still lacking. According to an HEVC/VP9/AVS2 coding-efficiency comparison published by the IEEE, under random-access conditions HEVC outperforms VP9 by 24.9% and AVS2 by 6.5%; under low-delay conditions HEVC outperforms VP9 by 8.7% and AVS2 by 14.5%. In some areas AVS2 is not far from HEVC, but in terms of overall performance and deployment scale AVS2 still has a long way to go. So even with the country vigorously promoting AVS, its application scenarios remain limited, mostly within state-owned enterprises.

Google series

VP8

From a technical point of view, VP8 is similar to H.264. Although marketing claimed that VP8 compressed more efficiently than H.264, design flaws left its real-world performance behind H.264. In the end, although it made it into web standards, almost no one used it; instead, WebP, which was extracted from its intra-frame compression technology, became popular.

VP9

VP8's performance was not ideal, and Google soon launched its successor, VP9. This time they took HEVC as a reference, likewise targeting efficient coding at high resolutions; some of VP9's designs, such as the superblock of up to 64x64, were influenced by HEVC. VP9 ultimately delivered up to 50% better efficiency than VP8, seemingly putting it on par with HEVC, but it ran into the same adoption problems as VP8. In practice, VP9's use is largely limited to Google's own YouTube; it simply lacks broader real-world application scenarios.

Thoughts and Prospects on the Development of Audio and Video in the Future

Deep learning and end intelligence will empower the future development of audio and video.

Deep learning

AI models are trained to adjust codec parameters intelligently.

End Intelligence

Establish end-to-end communication links.


RTE开发者社区