Kill the I frame when switching between large and small streams! Stream-switch coding jointly optimized by Alibaba Cloud RTC QoS and video coding

Alibaba Cloud Video Cloud

Switching a subscriber between two video streams of different resolutions is hard even when the two streams show essentially the same picture. Their reference frames differ and their resolutions differ, so no current video coding standard can use inter-frame prediction here and still keep the encoder and decoder matched. The fallback is intra-frame prediction, that is, an I frame, whose compression efficiency is very low, so the switch point tends to suffer either a drop in video quality or stalls caused by a sudden bit-rate spike. Built on the previous-generation standard, the Alibaba Cloud RTC codec combines its original stream-switch coding technology with the network-layer QoS system so that, in this scenario, it can still encode P frames with inter-frame prediction while keeping the encoder and decoder matched. Compared with I frames, this significantly improves compression efficiency and enhances the visual experience.

Author|An Jicheng, Tian Weifeng
Proofreading|Taiichi

1. Background

When the resolution of a video stream changes midway, the current mainstream standards H.264/AVC and H.265/HEVC require an I frame to be encoded, which means only the redundancy within that frame can be exploited, as shown in Figure 1 (left). Newer coding standards such as AV1 and H.266/VVC can exploit inter-frame redundancy without encoding an I frame, which improves compression efficiency. The basic principle is to scale the reference frame so that its resolution matches that of the current frame, as shown in Figure 1 (right). The Resolution Change Coding (RCC) technology of the Alibaba Cloud RTC codec also has this capability. For details, see our earlier article "RTC QoS Weak Network Confrontation".

The Stream Switch Coding (SSC) technology introduced in this article is an upgrade of the RCC technology.


Figure 1. Schematic diagram of variable resolution (left: traditional I-frame insertion method; right: reference frame scaling technology)
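
The reference-frame scaling idea in Figure 1 (right) can be illustrated with a toy example. This is only a sketch under stated assumptions: the sample values are made up, nearest-neighbour sampling stands in for the filtered interpolation a real codec would use, and the helper names `scale_nearest`, `residual`, and `sum_abs` are hypothetical rather than part of any codec API. It shows that once the reference frame is scaled to the new resolution, the prediction residual is much smaller than the raw samples an encoder without inter prediction would have to start from.

```python
def scale_nearest(frame, out_h, out_w):
    """Resize a 2-D list of luma samples with nearest-neighbour sampling."""
    in_h, in_w = len(frame), len(frame[0])
    return [
        [frame[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def residual(current, prediction):
    """Sample-wise difference between the current frame and its prediction."""
    return [[c - p for c, p in zip(cur_row, pred_row)]
            for cur_row, pred_row in zip(current, prediction)]


def sum_abs(block):
    """Sum of absolute values, a crude proxy for how much must be coded."""
    return sum(abs(v) for row in block for v in row)


# Hypothetical 4x4 reference frame (the previous frame, higher resolution).
reference = [
    [10, 12, 14, 16],
    [20, 22, 24, 26],
    [30, 32, 34, 36],
    [40, 42, 44, 46],
]

# Hypothetical current frame after the resolution drops to 2x2,
# showing nearly the same content.
current = [
    [11, 14],
    [31, 35],
]

scaled_ref = scale_nearest(reference, 2, 2)       # [[10, 14], [30, 34]]
inter_residual = residual(current, scaled_ref)    # [[1, 0], [1, 1]]

print("residual energy with scaled reference:", sum_abs(inter_residual))
print("sample energy without any reference:  ", sum_abs(current))
```

In this toy case the residual sums to 3 while the raw samples sum to 91, which is the intuition behind preferring inter prediction over an I frame at a resolution change.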

The SP slice technology of the H.264/AVC standard can switch between two video streams of the same resolution, but it cannot switch between two streams of different resolutions.

The S frame of the AV1 standard can be used to switch from a high-resolution stream to a low-resolution stream, but it causes a codec mismatch and carries a risk of error propagation.

2. The stream-switching scenario

image.png
Figure 2. Schematic diagram of a multi-stream scenario

Figure 2 shows a multi-stream scenario. A publisher runs two encoders, Enc0 and Enc1, which send a large-resolution stream and a small-resolution stream respectively (hereinafter the large stream and the small stream). The two streams carry the same picture content but differ in resolution and bit rate, and therefore in clarity. Each subscriber chooses which stream to subscribe to according to its own network conditions: for example, the large stream when the network is good and the small stream when the network is poor. In Figure 2 there are six subscribers, that is, six decoders. Dec0, Dec1, and Dec2 receive the large stream, and Dec3, Dec4, and Dec5 receive the small stream.
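
The subscriber-side choice described above can be sketched as a simple threshold rule. Everything concrete here is an assumption made for illustration: the resolutions, the bit rates, the `headroom` factor, and the names `Stream` and `choose_stream` do not come from the article, and a production QoS system would use a far richer bandwidth estimate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stream:
    name: str
    width: int
    height: int
    bitrate_kbps: int


# Hypothetical large and small streams published by Enc0 and Enc1.
LARGE = Stream("large", 1280, 720, 1500)
SMALL = Stream("small", 640, 360, 400)


def choose_stream(estimated_bandwidth_kbps, headroom=1.2):
    """Pick the large stream only when bandwidth covers it with some margin."""
    if estimated_bandwidth_kbps >= LARGE.bitrate_kbps * headroom:
        return LARGE
    return SMALL


# A subscriber on a good network and one on a poor network.
print(choose_stream(2500).name)
print(choose_stream(600).name)
```

Each change in the returned stream is a stream switch, which is exactly the event whose coding cost the rest of the article is concerned with.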
image.png
Figure 3. Schematic diagram of conventional stream switching

Figure 3 shows what happens at a stream switch. Dec3 initially receives the small stream and then switches to the large stream for some reason (for example, its network improves). To make the switch possible, Enc0 must send an I frame. This I frame affects every subscriber that is receiving the large stream (Dec0, Dec1, and Dec2 in the figure; in practice there may be many more), causing either a drop in coding quality or a sudden bit-rate spike at the moment of the switch. The green arrows in the figure mark the frames received by Dec3. Sending Enc0's P frame directly to Dec3 does not work, because the two streams have different reference frames and different resolutions, which inevitably causes decoding errors (codec mismatch). Because of these difficulties, no current video coding standard addresses this pain point. The Alibaba Cloud RTC codec, however, uses its original SSC technology to switch between two streams of different resolutions while still exploiting inter-frame redundancy, without encoding an I frame, to improve compression efficiency.

image.png
Figure 4. Schematic diagram of stream switching with the SSC technology described in this article

Figure 4 shows stream switching with SSC. As before, Dec3 switches from the small stream to the large stream. At the switch, Enc0 encodes a PDS frame and Enc1 encodes a PSS frame. The green arrows in the figure mark the frames received by Dec3, which performs the switch by receiving a PSS frame. In this article, a PDS frame is a P frame for Destination-stream Switch and a PSS frame is a P frame for Source-stream Switch. Compared with Figure 3, the I frame received by Dec0, Dec1, and Dec2 becomes a PDS frame, and the I frame received by Dec3 becomes a PSS frame. Both PDS and PSS frames are coded using inter-frame redundancy, so their compression efficiency is significantly higher than that of an I frame.
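
The difference between Figure 3 and Figure 4 can be summarized as a lookup of which frame type each decoder receives at the switch instant. The sketch below is only an illustration: the function name `frames_at_switch` and the dictionary layout are hypothetical, and it assumes that small-stream subscribers that do not switch keep receiving ordinary P frames, which the article does not state explicitly.

```python
def frames_at_switch(subscriptions, switcher, use_ssc):
    """Frame type delivered to each decoder when `switcher` moves small -> large.

    subscriptions maps a decoder name to the stream it receives before the
    switch ("large" or "small").
    """
    delivered = {}
    for decoder, stream in subscriptions.items():
        if decoder == switcher:
            # The switching decoder needs a frame it can decode from its
            # old (small-stream) reference: PSS with SSC, otherwise an I frame.
            delivered[decoder] = "PSS" if use_ssc else "I"
        elif stream == "large":
            # Existing large-stream subscribers share the destination stream.
            delivered[decoder] = "PDS" if use_ssc else "I"
        else:
            # Assumption: remaining small-stream subscribers are unaffected.
            delivered[decoder] = "P"
    return delivered


subscriptions = {
    "Dec0": "large", "Dec1": "large", "Dec2": "large",
    "Dec3": "small", "Dec4": "small", "Dec5": "small",
}

conventional = frames_at_switch(subscriptions, "Dec3", use_ssc=False)
with_ssc = frames_at_switch(subscriptions, "Dec3", use_ssc=True)

print(conventional)
print(with_ssc)
```

With the conventional method four decoders receive an I frame; with SSC none of them do.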

3. Test results

PDS frame compression performance test

This article compares the compression performance of I frames, P frames, and PDS frames on the video conferencing test sequence FourPeople. The sequence is encoded three ways: as all I frames, as all P frames (except that the first frame is an I frame), and as all PDS frames (except that the first frame is an I frame). Figure 5 shows the results, with bit rate on the horizontal axis and PSNR on the vertical axis. Measured by BD-rate, at the same quality P frames save 93% of the bit rate relative to I frames, and PDS frames save 66% of the bit rate relative to I frames.
image.png
Figure 5. PDS frame compression performance display
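
For readers unfamiliar with BD-rate, the following is a minimal sketch of the idea: average the difference in log bit rate between two rate-distortion curves over their common PSNR range. It is a simplified variant that uses piecewise-linear interpolation and trapezoidal integration instead of the cubic fit used by the standard Bjontegaard method, and the rate-distortion points are hypothetical values constructed so that the test curve needs exactly 34% of the anchor's bit rate at every quality. They are not the measurements behind Figure 5.

```python
import math


def _interp(x, xs, ys):
    """Piecewise-linear interpolation; xs must be strictly increasing."""
    for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x is outside the range of the curve")


def bd_rate_percent(anchor, test, samples=100):
    """Average bit-rate change of `test` versus `anchor` at equal PSNR, in percent.

    Both curves are lists of (bitrate, psnr) points sorted by increasing psnr.
    A negative result means the test curve saves bit rate.
    """
    a_psnr = [p for _, p in anchor]
    a_log = [math.log(r) for r, _ in anchor]
    t_psnr = [p for _, p in test]
    t_log = [math.log(r) for r, _ in test]

    lo = max(a_psnr[0], t_psnr[0])      # common PSNR interval
    hi = min(a_psnr[-1], t_psnr[-1])
    step = (hi - lo) / samples

    diff_sum = 0.0
    for i in range(samples + 1):
        q = min(lo + i * step, hi)
        weight = 0.5 if i in (0, samples) else 1.0   # trapezoid rule
        diff_sum += weight * (_interp(q, t_psnr, t_log) - _interp(q, a_psnr, a_log))

    avg_log_diff = diff_sum / samples
    return (math.exp(avg_log_diff) - 1.0) * 100.0


# Hypothetical anchor curve (all I frames): (bitrate in kbps, PSNR in dB).
anchor_curve = [(1000.0, 32.0), (2000.0, 35.0), (4000.0, 38.0), (8000.0, 41.0)]
# Hypothetical test curve needing 34% of the anchor bit rate at equal quality.
test_curve = [(rate * 0.34, psnr) for rate, psnr in anchor_curve]

bd_rate = bd_rate_percent(anchor_curve, test_curve)
print(f"BD-rate: {bd_rate:.2f}%")
```

A BD-rate of about -66% is what "saves 66% of the bit rate at the same quality" means in this kind of comparison.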

This test shows the trade-off directly. If every frame of a sequence is encoded as an I frame, every frame can serve as a switch point, but compression performance is sacrificed. If every frame is encoded as a P frame, compression performance is best, but no frame can serve as a switch point. If every frame is encoded as a PDS frame, the sequence saves 66% of the bit rate compared with I frames while retaining the switching capability of I frames.

In practice, a stream switch does not occur at every frame. What this test shows is that, at a switch point, the destination stream can save 66% of the bit rate by using a PDS frame instead of an I frame.

PSS frame compression performance test

Because PSS frames involve a resolution change, conventional P frames (such as those of H.264 and H.265) cannot be used in this case, so this article compares only I frames and PSS frames. The test uses a video conferencing sequence in which large-resolution and small-resolution frames are interleaved: even-numbered frames are large resolution and odd-numbered frames are small resolution. The sequence is encoded as all I frames and as all PSS frames (except that the first frame is an I frame). At the same quality, PSS frames save 29% of the bit rate relative to I frames.
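
The layout of this interleaved test sequence can be written down as a short sketch. The two resolutions and the name `interleaved_sequence` are assumptions made for illustration; the article does not give the actual frame sizes.

```python
# Hypothetical resolutions for the large and small streams.
LARGE_RES = (1280, 720)
SMALL_RES = (640, 360)


def interleaved_sequence(num_frames, mode):
    """Describe the test sequence as (frame index, resolution, frame type).

    mode is "all_I" (every frame is an I frame) or "all_PSS"
    (every frame except the first is a PSS frame).
    """
    if mode not in ("all_I", "all_PSS"):
        raise ValueError("mode must be 'all_I' or 'all_PSS'")
    frames = []
    for n in range(num_frames):
        # Even-numbered frames are large resolution, odd-numbered are small.
        resolution = LARGE_RES if n % 2 == 0 else SMALL_RES
        frame_type = "I" if mode == "all_I" or n == 0 else "PSS"
        frames.append((n, resolution, frame_type))
    return frames


for frame in interleaved_sequence(4, "all_PSS"):
    print(frame)
```

Every frame after the first has a reference of the other resolution, so each one exercises the resolution-changing prediction that a PSS frame is designed for.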
image.png
Figure 6. Example of conventional continuous stream switching

image.png
Figure 7. Example of continuous stream switching with the SSC technology described in this article

This test corresponds directly to a scenario of continuous stream switching, in which Dec3 switches back and forth between the large and small streams. Figure 6 shows the conventional method, which encodes an I frame at every switch, so every frame Dec3 receives is an I frame. Figure 7 shows the method using the SSC technology described in this article, where every frame Dec3 receives is a PSS frame. In this case, PSS frames save 29% of the bit rate compared with I frames. The rate-distortion curves are shown in Figure 8.
image.png
Figure 8. PSS frame compression performance display

In practice, stream switches do not occur continuously. What this test shows is that, at a switch point, the source stream can save 29% of the bit rate by using a PSS frame instead of an I frame.

In summary, with Alibaba Cloud RTC's original SSC technology described in this article, the destination stream saves 66% of the bit rate compared with an I frame, and the source stream saves 29% of the bit rate compared with an I frame.
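
To make the two percentages concrete, the arithmetic below applies them to a single switch in the six-subscriber scenario of Figure 2. The I-frame size of 100,000 bits is a made-up number, and applying an average BD-rate saving to one individual frame is itself an approximation, so the result should be read as a rough illustration rather than a measurement.

```python
def bits_after_saving(i_frame_bits, saving_percent):
    """Size of a frame that saves `saving_percent` relative to an I frame."""
    return i_frame_bits * (100 - saving_percent) // 100


I_FRAME_BITS = 100_000   # hypothetical I-frame size at the switch point
PDS_SAVING = 66          # destination-stream saving reported in the article
PSS_SAVING = 29          # source-stream saving reported in the article

pds_bits = bits_after_saving(I_FRAME_BITS, PDS_SAVING)
pss_bits = bits_after_saving(I_FRAME_BITS, PSS_SAVING)

# Three decoders already on the large stream plus one switching decoder.
large_subscribers = 3
conventional_total = (large_subscribers + 1) * I_FRAME_BITS
ssc_total = large_subscribers * pds_bits + pss_bits

print("PDS frame bits:", pds_bits)
print("PSS frame bits:", pss_bits)
print("bits delivered at the switch, conventional:", conventional_total)
print("bits delivered at the switch, SSC:", ssc_total)
```

Under these assumptions the switch delivers 173,000 bits in total instead of 400,000, and the gap widens as more subscribers share the large stream.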