Audio and video codecs: the CRF encoding parameter


I have worked with various encoding and decoding parameters before, CRF among them. Recently, while chatting with friends about CRF and what it actually does inside FFmpeg, I realized I had never traced it through the code or written it down in detail. That made me curious, so I decided to work it out, and this journey into CRF began.

Introduction to CRF

Constant Rate Factor (CRF) is an encoding mode that adjusts the data rate up or down to reach a selected quality level, rather than targeting a specific data rate.

If you want the best quality and are not concerned about file size, use the CRF rate control mode; it is the recommended mode in most cases. When the output size does not matter, CRF lets the encoder aim for the desired quality across the whole file, so a single encoding pass achieves the best compression efficiency at the expected quality. CRF works by dynamically adjusting the QP of each frame during encoding, letting the bitrate vary as needed to hold the required quality level.

However, the drawback of CRF is that you cannot tell the encoder to produce a file of a given size, or to stay below a given size or bitrate. For the same reason, encoding directly with CRF is not recommended for video intended for streaming.

Two rate control modes are generally recommended: constant rate factor (CRF) or 2-pass ABR. Rate control decides how many bits each frame receives, which in turn determines the file size and how quality is distributed.

CRF practical demonstration

Use the ffmpeg binary with the CRF parameter to compress a file, as shown in the figure below:

The source file is compressed with FFmpeg at CRF 18 and at CRF 24, and both results are compared with the source file.

ffmpeg -i test.mp4 -c:v libx264 -crf 18 test18.mp4

Actual transcoding

When transcoding finishes, FFmpeg prints encoding details, including ref, the crf value, the QP (quantization parameter), and the proportions of I, P, and B frames, along with audio information, as shown below:

Use the command ffmpeg -i test.mp4 -c:v libx264 -crf 24 test24.mp4 to transcode at CRF 24. The result is shown in the figure below:

After transcoding, check the parameters of the three files and compare them. The results are shown in the figure below:
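When comparing the three files by hand, two numbers are usually enough to see the effect of the CRF value: the average bitrate and the size relative to the source. The helpers below are a minimal sketch of that arithmetic; the function names are hypothetical, and the sizes and durations are whatever your own tool reports for each file.

```c
#include <assert.h>
#include <stdint.h>

/* Average bitrate in kbit/s from a file size in bytes and a duration in seconds. */
double average_bitrate_kbps(uint64_t size_bytes, double duration_s)
{
    assert(duration_s > 0.0);   /* a zero-length clip has no meaningful bitrate */
    return (double)size_bytes * 8.0 / duration_s / 1000.0;
}

/* Size of an encoded file relative to the source file, as a percentage. */
double size_percent_of_source(uint64_t encoded_bytes, uint64_t source_bytes)
{
    assert(source_bytes > 0);
    return 100.0 * (double)encoded_bytes / (double)source_bytes;
}
```

Calling both helpers for the source, test18.mp4, and test24.mp4 gives one row per file that can be compared directly.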

The parameters above give only a rough picture of the three videos. To see what actually changed, the three streams were analyzed with the Elecard StreamEye professional tool:

The comparison of the three files is summarized as follows:

After encoding with CRF, the number of I frames drops sharply and B frames appear; entropy coding also uses CABAC, so compression improves considerably and the file shrinks. As the CRF value increases, P and B frames are compressed more heavily and the file becomes smaller still.

CRF code walkthrough

Although I had read the FFmpeg code before, I had never paid close attention to CRF specifically. Rather than settle for a half understanding, I made myself walk through the code to deepen my own understanding and to help others who care about this parameter.

• CRF definition

First, the value is defined in the X264Context structure of FFmpeg's libx264 wrapper:

typedef struct X264Context {
    AVClass        *class;
    x264_param_t    params;
    ......

    float crf;

    ......
} X264Context;

The specific definition in AVOption is as follows:

static const AVOption options[] = {
    { "preset",        "Set the encoding preset (cf. x264 --fullhelp)",   OFFSET(preset),        AV_OPT_TYPE_STRING, { .str = "medium" }, 0, 0, VE},
    { "tune",          "Tune the encoding params (cf. x264 --fullhelp)",  OFFSET(tune),          AV_OPT_TYPE_STRING, { 0 }, 0, 0, VE},
    { "profile",       "Set profile restrictions (cf. x264 --fullhelp) ", OFFSET(profile),       AV_OPT_TYPE_STRING, { 0 }, 0, 0, VE},
......
    {"x264opts", "x264 options", OFFSET(x264opts), AV_OPT_TYPE_STRING, {.str=NULL}, 0, 0, VE},
    { "crf",           "Select the quality for constant quality mode",    OFFSET(crf),           AV_OPT_TYPE_FLOAT,  {.dbl = -1 }, -1, FLT_MAX, VE },
    { "crf_max",       "In CRF mode, prevents VBV from lowering quality beyond this point.",OFFSET(crf_max), AV_OPT_TYPE_FLOAT, {.dbl = -1 }, -1, FLT_MAX, VE },
......
};
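The table above works because OFFSET(crf) records where the crf field lives inside X264Context, so generic option code can write to it by name. The following is a minimal, self-contained imitation of that idea, not FFmpeg's actual AVOption API: MiniContext, MiniOption, mini_set_defaults, and mini_set_float are all hypothetical names used only for illustration.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for X264Context: only the two fields this sketch needs. */
typedef struct MiniContext {
    float crf;
    float crf_max;
} MiniContext;

/* Stand-in for AVOption: name, field offset, default value, valid range. */
typedef struct MiniOption {
    const char *name;
    size_t      offset;
    double      default_val;
    double      min;
    double      max;
} MiniOption;

#define MINI_OFFSET(field) offsetof(MiniContext, field)

static const MiniOption mini_options[] = {
    { "crf",     MINI_OFFSET(crf),     -1.0, -1.0, FLT_MAX },
    { "crf_max", MINI_OFFSET(crf_max), -1.0, -1.0, FLT_MAX },
};

#define MINI_OPTION_COUNT (sizeof(mini_options) / sizeof(mini_options[0]))

/* Write every option's default into the context (-1 means "not set"). */
void mini_set_defaults(MiniContext *ctx)
{
    assert(ctx);
    for (size_t i = 0; i < MINI_OPTION_COUNT; i++)
        *(float *)((char *)ctx + mini_options[i].offset) =
            (float)mini_options[i].default_val;
}

/* Set one option by name; returns 0 on success, -1 if unknown or out of range. */
int mini_set_float(MiniContext *ctx, const char *name, double value)
{
    assert(ctx && name);
    for (size_t i = 0; i < MINI_OPTION_COUNT; i++) {
        const MiniOption *o = &mini_options[i];
        if (strcmp(o->name, name) != 0)
            continue;
        if (value < o->min || value > o->max)
            return -1;
        *(float *)((char *)ctx + o->offset) = (float)value;
        return 0;
    }
    return -1;
}
```

With this layout, passing -crf 18 on the command line amounts to a lookup of "crf" followed by a write of 18 at the recorded offset; a value left at -1 signals that the user did not set it.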

CRF is still a form of rate control, so it appears among x264's RC method definitions:

#define X264_RC_CQP                  0
#define X264_RC_CRF                  1
#define X264_RC_ABR                  2

• FFmpeg interface walkthrough

Too much of the FFmpeg code is involved to cover here, so I describe only the parts relevant to CRF; for the rest of the codec flow, see the code walkthroughs other experts have published online. This article assumes the reader has enough background: the x264 codec entry points conform to the FFmpeg interface definition, and the correspondence is shown in the following figure:

A diagram by Lei Xiaohua ("Thor") explains this: (https://blog.csdn.net/leixiaohua1020/article/details/45960409)

X264_init()

X264_init passes the previously assigned and initialized option values to libx264 one by one, initializing the x264 parameters and setting the RC parameters. The values come from AVCodecContext and from the defaults in X264Context. As those familiar with FFmpeg know, AVCodecContext holds the generic codec options given on the command line, while X264Context holds the x264-specific options. Together they form the complete set of x264 encoder options.

At the end of X264_init, the x264 encoder is opened and the global header is encoded.

x264_param_default

x264_param_default sets the default parameters, including many other options; only the CRF-related ones matter here. In x264_param_default, CRF is the default rate control method and the CRF option f_rf_constant is set to 23. This is the source of the default value of 23 cited in many other articles.

Note also that x264_param_default enables B frames and turns CABAC on by default. This is the root cause of the CABAC entropy coding and the added B frames seen in the analysis tool for files transcoded with the ffmpeg binary.
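How the defaults and the command-line values combine can be sketched as follows. This is a simplified model based on my reading of the wrapper, not a copy of it: SketchParams, sketch_param_default, and sketch_init are hypothetical stand-ins, the B-frame count of 3 is the x264 default as far as I recall, and the real X264_init handles many more options (presets, CQP, VBV, and so on).

```c
#include <assert.h>
#include <stdint.h>

enum { RC_CQP = 0, RC_CRF = 1, RC_ABR = 2 };   /* mirrors X264_RC_* */

/* Stand-in for the few x264_param_t fields discussed in the text. */
typedef struct SketchParams {
    int   rc_method;
    float rf_constant;    /* plays the role of rc.f_rf_constant */
    int   bitrate_kbps;
    int   bframes;
    int   cabac;
} SketchParams;

/* Defaults as described above: CRF 23, B frames and CABAC enabled. */
void sketch_param_default(SketchParams *p)
{
    assert(p);
    p->rc_method    = RC_CRF;
    p->rf_constant  = 23.0f;
    p->bitrate_kbps = 0;
    p->bframes      = 3;    /* assumed default count */
    p->cabac        = 1;
}

/* Apply user settings on top of the defaults.
 * bit_rate comes from the generic codec context (0 = not set);
 * crf comes from the private x264 options (-1 = not set). */
void sketch_init(SketchParams *p, int64_t bit_rate, float crf)
{
    sketch_param_default(p);
    if (bit_rate > 0) {               /* a target bitrate selects ABR */
        p->bitrate_kbps = (int)(bit_rate / 1000);
        p->rc_method    = RC_ABR;
    }
    if (crf >= 0.0f) {                /* an explicit CRF takes over */
        p->rc_method   = RC_CRF;
        p->rf_constant = crf;
    }
}
```

Because the crf option defaults to -1, running without -crf leaves the library default of 23 in place, which matches what the encoder log reports.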

x264_encoder_open

After the parameters are initialized, the init function calls x264_encoder_open (the code is in encoder/encoder.c), which opens the H.264 encoder inside x264.

x264_encoder_open validates and initializes the variables libx264 needs for encoding, and completes the initialization of the SPS, PPS, and quantization matrices (qm).

validate_parameters

validate_parameters checks the input parameters so that abnormal values do not cause encoding to fail. This function also validates, updates, and assigns the CRF-related parameters.
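For CRF, the most visible part of this validation is bringing an out-of-range value back into the legal range rather than failing. The sketch below shows only that idea. The [0, 51] range is an assumption that holds for 8-bit H.264 QP values; the exact bounds used by x264 depend on bit depth, and validate_crf is a hypothetical name.

```c
#include <assert.h>

#define SKETCH_QP_MAX 51   /* highest QP for 8-bit H.264 (assumed range) */

/* Clamp v into [lo, hi]. */
float clip3f(float v, float lo, float hi)
{
    assert(lo <= hi);
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Bring a requested CRF into the valid range instead of failing later. */
float validate_crf(float requested)
{
    return clip3f(requested, 0.0f, (float)SKETCH_QP_MAX);
}
```

A request such as -crf 60 would therefore be treated as 51 in this sketch, while ordinary values like 18 or 24 pass through unchanged.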

For other parts of the process, see other authors' articles; I will not repeat them here. (Lei Xiaohua's analysis is very detailed and well worth reading: "x264 source code simple analysis: the main part of the encoder - 1", on the Lei Xiaohua (leixiaohua1020) blog.)

x264_ratecontrol_new

x264_encoder_open finally calls x264_ratecontrol_new to initialize the rate control variables.

x264_ratecontrol_new mainly sets the core rate control parameters. You need a good understanding of x264 rate control to follow it properly; otherwise it is easy to get lost.

In x264_ratecontrol_new, because the incoming mode is CRF and b_stat_read defaults to 0, b_abr is set to 1 and b_2pass is set to 0. In other words, rate control handles CRF as ABR without 2-pass.
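The two flags follow from the rate control method and b_stat_read alone. The sketch below expresses that derivation as I understand it from the x264 source; RcFlags and derive_rc_flags are hypothetical names, and the real function sets many other fields as well.

```c
#include <assert.h>

enum { RC_CQP = 0, RC_CRF = 1, RC_ABR = 2 };   /* mirrors X264_RC_* */

typedef struct RcFlags {
    int b_abr;     /* 1 if the ABR code path drives QP selection */
    int b_2pass;   /* 1 if a first-pass stats file is being read */
} RcFlags;

/* Derive the two flags from the method and from whether stats are read. */
RcFlags derive_rc_flags(int rc_method, int b_stat_read)
{
    RcFlags f;
    assert(rc_method >= RC_CQP && rc_method <= RC_ABR);
    f.b_abr   = rc_method != RC_CQP && !b_stat_read;
    f.b_2pass = rc_method == RC_ABR && b_stat_read;
    return f;
}
```

For CRF with b_stat_read equal to 0 this yields b_abr = 1 and b_2pass = 0, which is the case described above; only constant QP stays off the ABR path.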

In the x264_ratecontrol_init_reconfigurable function, the VBV parameters will be initialized, and the CRF related parameters base_cplx and rate_factor_constant will be updated.

x264_ratecontrol_init_reconfigurable is called here with b_init=1. At this point the VBV settings are established for CRF, paving the way for the subsequent rate control.

X264_frame

X264_frame() performs a complete encode of one frame of video data into the packet passed in. The function is defined as follows.

reconfig_encoder

reconfig_encoder compares the RC-related parameters with those in AVCodecContext and reconfigures the encoder if they differ. For example, if the CRF value was initially 24 but the command line specifies 18, the two values disagree, so the value from the command line is assigned and the encoder is reconfigured to match what the user asked for. The configuration itself is straightforward, so I will not expand on it here.
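The CRF part of that comparison can be sketched as below. SketchEncoder and sketch_reconfig_crf are hypothetical stand-ins; the counter marks the point where, as far as I can tell, the real wrapper would call x264_encoder_reconfig() with the updated parameters.

```c
#include <assert.h>

enum { RC_CQP = 0, RC_CRF = 1, RC_ABR = 2 };   /* mirrors X264_RC_* */

typedef struct SketchEncoder {
    int   rc_method;
    float rf_constant;      /* CRF value the encoder is currently using */
    int   reconfig_count;   /* how many times it has been reconfigured */
} SketchEncoder;

/* Update the encoder if the requested CRF differs from the active one.
 * Returns 1 if a reconfiguration happened, 0 otherwise. */
int sketch_reconfig_crf(SketchEncoder *enc, float requested_crf)
{
    assert(enc);
    if (enc->rc_method != RC_CRF || enc->rf_constant == requested_crf)
        return 0;                       /* nothing to do */
    enc->rf_constant = requested_crf;   /* take the user's value */
    enc->reconfig_count++;              /* stands in for the reconfig call */
    return 1;
}
```

With the example from the text, an encoder running at 24 that is asked for 18 is reconfigured once, and repeating the same request afterwards does nothing.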

x264_encoder_encode

x264_encoder_encode is where real encoding begins: it encodes a complete YUV image into an H.264 video stream. For this process, see the article by Lei Xiaohua ("Thor"), which analyzes it very well: https://blog.csdn.net/leixiaohua1020/article/details/45644367

What concerns me here is the part that involves CRF. The rate-control-related calls in x264_encoder_encode are mainly the following:

x264_thread_sync_ratecontrol()

x264_ratecontrol_zone_init()

x264_ratecontrol_start(): Starts rate control for each frame. Depending on the rate control mode, x264_ratecontrol_start selects a different QP for compression. As analyzed above, CRF is handled as ABR and B frames are enabled, so the QP differs from frame to frame, and the size of the encoded file at a given quality cannot be known in advance.

x264_ratecontrol_qp()

Rate control is a large topic and the algorithms involved are fairly complex. This article only covers how CRF mode maps onto the VBV path, along with some parameters that affect encoding. The next article will analyze and trace the whole process.

The above is my personal understanding and may contain mistakes. Discussion and corrections are welcome.

If this article is helpful to you, please like, bookmark, forward, follow, and continue to update the audio and video related content.
