
The Audio/Video Codec Pipeline and How to Use FFMPEG Commands to Process Audio and Video

1. Introduction

FFMPEG is a particularly powerful open-source library dedicated to processing audio and video. You can call its API to process audio and video, or use the tools it provides, such as ffmpeg, ffplay, and ffprobe, to edit your audio and video files.

This article will briefly introduce the basic directory structure and functions of the FFMPEG library, and then explain in detail how to use the tools provided by ffmpeg to process audio and video files in daily work.


2. FFMPEG Directory Structure and Functions

libavcodec: provides a series of encoder and decoder implementations.

libavformat: implements streaming protocols, container formats, and basic I/O access.

libavutil: includes hash functions, decompressors, and miscellaneous utility functions.

libavfilter: provides various audio and video filters.

libavdevice: provides interfaces for accessing capture and playback devices.

libswresample: implements audio mixing and resampling.

libswscale: implements color conversion and scaling.


3. Basic Concepts of FFMPEG

Before explaining the FFMPEG commands, we first need to introduce some basic concepts of audio and video formats.

(1) Audio/video stream
In the audio and video field, we call each piece of audio or video a stream. For example, when we were young we often watched Hong Kong movies on VCD, where you could choose between Cantonese and Mandarin audio. In that case the video file on the disc actually stores two audio streams, and the user chooses one of them to play.
(2) Container
We generally refer to file formats such as MP4, FLV, and MOV as containers. That is, these commonly used formats can store multiple audio and video streams in one file. Taking MP4 as an example, it can hold one video stream, multiple audio streams, and multiple subtitle streams.
(3) Channel
A channel is an audio concept. A single audio stream can be mono, dual-channel, or stereo.
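To see for yourself which streams a container holds, you can inspect it with ffprobe; a minimal check, assuming a local file named input.mp4:

# list every stream (video/audio/subtitle) with its codec, resolution/sample rate, and channel count
ffprobe -v error -show_streams input.mp4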

4. Categories of FFMPEG Commands

We can divide FFMPEG commands into the following categories according to their purpose:

  • Basic information queries
  • Recording
  • Demuxing/muxing
  • Processing raw data
  • Filters
  • Cutting and merging
  • Image/video conversion
  • Live streaming

Except for the basic information query commands, all other FFMPEG commands process audio and video according to the flow shown in the figure below.

[Figure: the transcoding pipeline — demuxer → decoder → filters → encoder → muxer]

ffmpeg first calls the libavformat library (which contains the demuxers) to read the input file and obtain packets of encoded data from it. These encoded packets are then passed to the decoder (unless stream copy is selected for the stream; see below). The decoder produces uncompressed frames (raw video/PCM audio/...), which can be further processed by filtering (see the next section). After filtering, the frames are passed to the encoder, which outputs encoded packets. Finally, these are passed to the muxer, which writes the encoded packets to the output file.

By default, ffmpeg selects only one stream of each type (video, audio, subtitle) from the input files and adds it to each output file. It picks the "best" of each according to the following criteria: for video, the stream with the highest resolution; for audio, the stream with the most channels; for subtitles, the first subtitle stream. When several streams of the same type rate equally, the stream with the lowest index is selected.

You can disable some of these defaults with the -vn / -an / -sn / -dn options. For full manual control, use the -map option, which disables the default selection just described.
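For example, a sketch of manual selection with -map, assuming input.mkv carries one video stream and at least two audio streams:

# keep the video stream and only the second audio stream, without re-encoding
ffmpeg -i input.mkv -map 0:v:0 -map 0:a:1 -c copy output.mkv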


5. Basic Information Query Commands

FFMPEG can take the following parameters to perform basic information queries. For example, to check which filters the current FFMPEG build supports, run ffmpeg -filters. The detailed parameters are as follows:

Parameter descriptions:

  • -version: show the version.
  • -formats: show available formats (including devices).
  • -demuxers: show available demuxers.
  • -muxers: show available muxers.
  • -devices: show available devices.
  • -codecs: show all codecs known to libavcodec.
  • -decoders: show available decoders.
  • -encoders: show available encoders.
  • -bsfs: show available bitstream filters.
  • -protocols: show available protocols.
  • -filters: show available libavfilter filters.
  • -pix_fmts: show available pixel formats.
  • -sample_fmts: show available sample formats.
  • -layouts: show channel names and standard channel layouts.
  • -colors: show recognized color names.
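On a Unix-like shell these queries can be filtered with grep, e.g. to check whether your build includes an H.264 encoder (output depends on how FFMPEG was built):

ffmpeg -encoders | grep 264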


The following sections describe the command format and parameters that FFMPEG uses to process audio and video.

6. Basic Command Format and Parameters

The basic command format of FFMPEG is as follows:

ffmpeg [global_options] {[input_file_options] -i input_url} ...
       {[output_file_options] output_url} ...

ffmpeg reads any number of input "files" (regular files, pipes, network streams, capture devices, etc.) specified by the -i option, and writes to any number of output "files".

In principle, each input/output "file" can contain any number of streams of different types (video/audio/subtitle/attachment/data). The number and/or types of streams are limited by the container format. The selection of which streams go from which inputs into which outputs is done either automatically or with the -map option.

To refer to input files in options, you must use their indexes (starting from 0): the first input file is 0, the second is 1, and so on. Similarly, streams within a file are referred to by their indexes, e.g. 2:3 refers to the fourth stream in the third input file.
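For example, a sketch that picks streams by index from two inputs (assuming stream 0 of a.mp4 is video and stream 1 of b.mp4 is audio):

# take stream 0 from input 0 and stream 1 from input 1, copying both
ffmpeg -i a.mp4 -i b.mp4 -map 0:0 -map 1:1 -c copy out.mp4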


The above is the general command form used by FFMPEG to process audio and video. Some common parameters follow:

(1) Main parameters

-f fmt (input/output)

Force the input or output file format. The format is normally auto-detected for input files and guessed from the file extension for output files, so in most cases this option is not needed.

-i url (input)

URL of the input file.

-y (global parameter)

Overwrite the output file without asking.

-n (global parameter)

Do not overwrite the output file; exit immediately if the specified output file already exists.

-c [:stream_specifier] codec (input/output, per stream)

Select an encoder (when used before an output file) or a decoder (when used before an input file) for one or more streams. codec is the name of a decoder/encoder, or copy (output only) to indicate that the stream is not to be re-encoded. For example: ffmpeg -i INPUT -map 0 -c:v libx264 -c:a copy OUTPUT

-codec [:stream_specifier] codec (input/output, per stream)

Same as -c

-t duration (input/output)

When used as an input option (before -i), it limits the duration of data read from the input file. When used as an output option (before an output url), writing stops once the output duration reaches duration.

-ss position (input/output)

When used as an input option (before -i), seeks to position in the input file. Note that in most formats it is not possible to seek exactly, so ffmpeg will seek to the nearest seek point before position. When transcoding with -accurate_seek enabled (the default), the extra segment between the seek point and position is decoded and discarded. When stream copy is used, or with -noaccurate_seek, it is preserved. When used as an output option (before an output url), the input is decoded but discarded until the timestamps reach position.
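The two placements can be combined with -t as follows (file names illustrative):

# fast, keyframe-aligned cut: seek on the input side, then stream copy
ffmpeg -ss 00:01:00 -i input.mp4 -t 10 -c copy fast_cut.mp4
# frame-accurate cut: decode and discard everything before the position
ffmpeg -i input.mp4 -ss 00:01:00 -t 10 precise_cut.mp4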

-frames [:stream_specifier] framecount (output, per stream)

Stop writing to the stream after framecount frames.

-filter [:stream_specifier] filtergraph (output, per stream)

Create the filtergraph specified by filtergraph and use it to filter the stream. filtergraph is a description of the filtergraph to apply to the stream; it must have a single input and a single output of the same type as the stream. In the filtergraph, the input is associated with the label in, and the output with the label out. For more information about filtergraph syntax, see the ffmpeg-filters manual.
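For example (-filter:v is equivalent to the -vf alias used later in this article):

# scale the video stream to 640x360 through a simple filtergraph
ffmpeg -i input.mp4 -filter:v "scale=640:360" out.mp4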

(2) Video parameters

-vframes num (output)

Set the number of video frames to output. This is an obsolete alias for -frames:v, which you should use instead.

-r [:stream_specifier] fps (input/output, per stream)

Set the frame rate (as a Hz value, a fraction, or an abbreviation). As an input option, any timestamps stored in the file are ignored and new timestamps are generated based on this rate. This differs from the -framerate option (in older versions of FFmpeg the two were the same). If in doubt, use -framerate instead of the input option -r. As an output option, input frames are duplicated or dropped to achieve the constant output frame rate fps.
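For example, a sketch that forces a constant 24 fps output while copying the audio (file names illustrative):

ffmpeg -i input.mp4 -r 24 -c:a copy out24.mp4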

-s [:stream_specifier] size (input/output, per stream)

Set the frame size. As an input option, this is a shortcut for the video_size private option, recognized by some demuxers for which the frame size is not stored in the file. As an output option, this inserts the scale video filter at the end of the corresponding filtergraph; to insert it at the beginning or some other place, use the scale filter directly. The format is 'wxh' (default: same as the source).

-aspect [:stream_specifier] aspect (output, per stream)

Set the video display aspect ratio specified by aspect. aspect can be a floating-point number string, or a string of the form num:den, where num and den are the numerator and denominator of the aspect ratio. For example, "4:3", "16:9", "1.3333", and "1.7777" are valid values. If used together with -vcodec copy, it affects the aspect ratio stored at the container level, but not the aspect ratio stored in the encoded frames, if one exists.
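Since -aspect works together with stream copy, a small sketch that rewrites only the container-level aspect ratio (assuming the container supports a display-aspect field):

# change the display aspect ratio to 16:9 without re-encoding
ffmpeg -i input.mp4 -aspect 16:9 -c copy out.mp4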

-vn (output)

Disable video recording.

-vcodec codec (output)

Set the video codec. This is an alias for -codec:v.

-vf filtergraph (output)

Create the filtergraph specified by filtergraph and use it to filter the stream. This is an alias for -filter:v.

(3) Audio parameters

-aframes number (output)

Set the number of audio frames to output. This is an obsolete alias for -frames:a.

-ar [:stream_specifier] freq (input/output, per stream)

Set the audio sampling frequency. For output streams, it defaults to the frequency of the corresponding input stream. For input streams, this option only applies to audio capture devices and raw demuxers, and is mapped to the corresponding demuxer option.

-ac [:stream_specifier] channels (input/output, per stream)

Set the number of audio channels. For output streams, it defaults to the number of input audio channels. For input streams, this option only applies to audio capture devices and raw demuxers, and is mapped to the corresponding demuxer option.

-an (output)

Disable audio recording.

-acodec codec (input/output)

Set the audio codec. This is an alias for -codec:a.

-sample_fmt [:stream_specifier] sample_fmt (output, per stream)

Set the audio sample format. Use -sample_fmts to get a list of supported sample formats.

-af filtergraph (output)

Create the filtergraph specified by filtergraph and use it to filter the stream. This is an alias for -filter:a.
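Putting several of these audio options together, a minimal sketch that extracts mono 48 kHz, 16-bit audio (file names illustrative):

# drop video, resample to 48 kHz mono with 16-bit samples
ffmpeg -i input.mp4 -vn -ar 48000 -ac 1 -sample_fmt s16 mono.wav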

7. Commands by Function

(1) Listing audio/video devices
ffmpeg -f avfoundation -list_devices true -i ""
(2) Screen recording
ffmpeg -f avfoundation -i 1 -r 30 out.yuv
  • -f specifies that avfoundation is used to capture the data.
  • -i specifies where to capture the data from; it is a device index number. On my machine, 1 is the desktop (you can query device index numbers with the command above).
  • -r specifies the frame rate. According to the official ffmpeg documentation, -r and -framerate have the same effect, but in actual testing they differ: -framerate constrains the input, while -r constrains the output.

Note that desktop capture imposes no frame-rate requirement, so there is no need to limit the desktop frame rate; in fact, such a limit has no effect.

(3) Screen recording + sound
ffmpeg  -f avfoundation -i 1:0  -r 29.97 -c:v libx264 -crf 0 -c:a libfdk_aac -profile:a aac_he_v2 -b:a 32k  out.flv
  • -i 1:0 — the "1" before the colon is the screen index number; the "0" after the colon is the audio device number.
  • -c:v is the same as -vcodec and specifies the video encoder. c is short for codec, and v is short for video.
  • -crf is an x264 parameter; 0 means lossless compression.
  • -c:a is the same as -acodec and specifies the audio encoder.
  • -profile is an fdk_aac parameter; aac_he_v2 means the data is compressed with AAC HE v2.
  • -b:a specifies the audio bitrate. b is the abbreviation of bitrate, and a is the abbreviation of audio.
(4) Video recording
ffmpeg -framerate 30 -f avfoundation -i 0 out.mp4
  • -framerate limits the video capture frame rate. It must be set as the error prompt requires; otherwise an error is reported.
  • -f specifies the use of avfoundation to collect data.
  • -i specifies the index number of the video device.
(5) Video + audio
ffmpeg -framerate 30 -f avfoundation -i 0:0 out.mp4
(6) Audio recording
ffmpeg -f avfoundation -i :0 out.wav
(7) Record audio raw data
ffmpeg  -f avfoundation -i :0 -ar 44100 -f s16le out.pcm
(8) Demuxing and muxing (stream copy)
Stream copy is a mode selected by supplying the copy parameter to the -codec option. It makes ffmpeg omit the decoding and encoding steps for the specified stream, so it can only demux and mux. This is useful for changing the container format or modifying container-level metadata. In this case, the diagram above simplifies to:

[Figure: the stream-copy pipeline — demuxer → muxer, with no decoding or encoding]

Since there is no decoding or encoding, it is very fast and there is no quality loss. However, due to many factors, it may not work properly in some cases. Applying filters is obviously impossible, because filters work on uncompressed data.
(9) Extract audio stream
ffmpeg -i input.mp4 -acodec copy -vn out.aac
  • acodec: specifies the audio codec; copy means the stream is only copied, not decoded or re-encoded.
  • vn: v stands for video, n stands for no, i.e. no video.
(10) Convert to MP3 format
ffmpeg -i input.mp4 -acodec libmp3lame  out.mp3
(11) Extract the video stream
ffmpeg -i input.mp4 -vcodec copy -an out.h264
  • vcodec: specifies the video codec; copy means the stream is only copied, not decoded or re-encoded.
  • an: a stands for audio, n stands for no, i.e. no audio.
(12) Format conversion
ffmpeg -i out.mp4 -vcodec copy -acodec copy out.flv
With the command above, the audio and video streams are copied directly; only the container format is converted from MP4 to FLV.
(13) Audio and video merge
ffmpeg -i out.h264 -i out.aac -vcodec copy -acodec copy out.mp4
(14) Extract YUV data
ffmpeg -i input.mp4 -an -c:v rawvideo -pix_fmt yuv420p out.yuv
# play (replace wxh with the actual frame size, e.g. 640x480)
ffplay -s wxh out.yuv
  • -c:v rawvideo specifies converting the video to raw video data
  • -pix_fmt yuv420p specifies yuv420p as the conversion format
(15) YUV to H264
ffmpeg -f rawvideo -pix_fmt yuv420p -s 320x240 -r 30 -i out.yuv -c:v libx264 -f rawvideo out.h264
(16) Extract PCM data
ffmpeg -i out.mp4 -vn -ar 44100 -ac 2 -f s16le out.pcm
# play
ffplay -ar 44100 -ac 2 -f s16le -i out.pcm
(17) PCM to WAV
ffmpeg -f s16be -ar 8000 -ac 2 -acodec pcm_s16be -i input.raw output.wav
(18) Simple filters
Before encoding, ffmpeg can process raw audio and video frames with filters from the libavfilter library. Several chained filters form a filtergraph. ffmpeg distinguishes two kinds of filtergraphs: simple and complex.
Simple filtergraphs have exactly one input and one output, both of the same type. In the diagram above, they can be represented by inserting one extra step between decoding and encoding:

[Figure: the transcoding pipeline with a simple filtergraph between decoder and encoder]

Simple filtergraphs are configured with the per-stream -filter option (with the aliases -vf and -af for video and audio respectively). A simple video filtergraph can, for example, look like this:

[Figure: example of a simple video filtergraph, e.g. an fps filter between decoder and encoder]

Note that some filters change frame properties but not frame contents. For example, the fps filter in the example above changes the number of frames but does not touch their contents. Another example is the setpts filter, which only sets timestamps and otherwise leaves the frames unchanged.
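A minimal sketch of such a simple filtergraph (file name illustrative):
ffmpeg -i input.mp4 -vf fps=15 out.mp4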
(19) Complex filter
Complex filtergraphs are those that cannot be described as a simple linear processing chain applied to one stream, e.g. when the graph has more than one input and/or output, or when the output stream type differs from the input. They can be represented by the following diagram:

[Figure: the transcoding pipeline with a complex filtergraph combining multiple inputs/outputs]

Complex filtergraphs are configured with the -filter_complex option. Note that this option is global, since a complex filtergraph by its nature cannot be unambiguously associated with a single stream or file.

The -lavfi option is equivalent to -filter_complex.

A simple example of a complex filtergraph is the overlay filter, which has two video inputs and one video output and superimposes one video on top of the other. Its audio counterpart is the amix filter.
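As a sketch of the audio counterpart, mixing two tracks with amix (input names illustrative):
ffmpeg -i voice.mp3 -i music.mp3 -filter_complex amix=inputs=2:duration=shortest mixed.mp3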

(20) Add watermark
ffmpeg -i out.mp4  -vf "movie=logo.png,scale=64:48[watermask];[in][watermask] overlay=30:10 [out]" water.mp4
  • In -vf, movie specifies the logo file, scale specifies the logo's size, and overlay specifies the logo's position.
(21) Delete watermark
First, use ffplay to find the position of the logo to be removed:
ffplay -i test.flv -vf delogo=x=806:y=20:w=70:h=80:show=1
Then use the delogo filter to remove it:
ffmpeg -i test.flv -vf delogo=x=806:y=20:w=70:h=80 output.flv
(22) Halving the video size
ffmpeg -i out.mp4 -vf scale=iw/2:-1 scale.mp4
  • -vf scale specifies the simple scale filter; iw in iw/2:-1 is the input video's width. -1 means the height scales proportionally with the width.
(23) Video cropping
ffmpeg -i VR.mov  -vf crop=in_w-200:in_h-200 -c:v libx264 -c:a copy -video_size 1280x720 vr_new.mp4

Crop format: crop=out_w:out_h:x:y

  • out_w: width of the output. You can use in_w for the input video's width.
  • out_h: height of the output. You can use in_h for the input video's height.
  • x: X coordinate
  • y: Y coordinate

If x and y are set to 0, cropping starts from the upper-left corner. If they are omitted, the crop is centered, as the sketches below show.
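Two sketches of the crop geometry, reusing the input above (output names illustrative):
# centered 640x360 crop (x and y omitted)
ffmpeg -i VR.mov -vf crop=640:360 -c:a copy cropped.mp4
# 640x360 crop anchored at the top-left corner
ffmpeg -i VR.mov -vf crop=640:360:0:0 -c:a copy cropped_tl.mp4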

(24) Double speed playback
ffmpeg -i out.mp4 -filter_complex "[0:v]setpts=0.5*PTS[v];[0:a]atempo=2.0[a]" -map "[v]" -map "[a]" speed2.0.mp4

-filter_complex applies a complex filter. [0:v] means the video of the first file (file index 0) is used as input. setpts=0.5*PTS multiplies the PTS timestamp of each video frame by 0.5, i.e. the interval between frames is halved, so the video plays at twice the speed. [v] is an alias for the output. The audio works the same way and is not detailed here.

map can be used to handle complex outputs: it can send multiple specified streams to one output file, or direct output to multiple files. Here, "[v]" and "[a]", the aliased outputs of the complex filter, are used as streams of the output file. The -map usage above sends the video and audio produced by the complex filter to the specified file.

(25) Symmetrical video
ffmpeg  -i out.mp4 -filter_complex "[0:v]pad=w=2*iw[a];[0:v]hflip[b];[a][b]overlay=x=w" duicheng.mp4
  • hflip: flips the video horizontally

To flip vertically instead, use vflip, as in the sketch below.
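A sketch by analogy with the horizontal command above (untested; it pads the height instead of the width):
ffmpeg -i out.mp4 -filter_complex "[0:v]pad=h=2*ih[a];[0:v]vflip[b];[a][b]overlay=y=h" duicheng_v.mp4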

(26) Picture-in-picture
ffmpeg -i out.mp4 -i out1.mp4 -filter_complex "[1:v]scale=w=176:h=144:force_original_aspect_ratio=decrease[ckout];[0:v][ckout]overlay=x=W-w-10:y=0[out]" -map "[out]" -movflags faststart new.mp4
(27) Recording picture-in-picture
ffmpeg -f avfoundation -i "1" -framerate 30 -f avfoundation -i "0:0" -r 30 -c:v libx264 -preset ultrafast -c:a libfdk_aac -profile:a aac_he_v2 -ar 44100 -ac 2 -filter_complex "[1:v]scale=w=176:h=144:force_original_aspect_ratio=decrease[a];[0:v][a]overlay=x=W-w-10:y=0[out]" -map "[out]" -movflags faststart -map 1:a b.mp4
(28) Multi-channel video stitching
ffmpeg  -f avfoundation -i "1" -framerate 30 -f avfoundation   -i "0:0" -r 30 -c:v libx264 -preset ultrafast -c:a libfdk_aac -profile:a aac_he_v2 -ar 44100 -ac 2 -filter_complex "[0:v]scale=320:240[a];[a]pad=640:240[b];[b][1:v]overlay=320:0[out]" -map "[out]" -movflags faststart  -map 1:a  c.mp4
(29) Cutting
ffmpeg -i out.mp4 -ss 00:00:00 -t 10 out1.mp4
  • -ss specifies the start time of the cut, accurate to the second.
  • -t specifies the duration of the cut segment.
(30) Merging

First create a file inputs.txt with the following content:

file '1.flv'
file '2.flv'
file '3.flv'

Then execute the following command:

ffmpeg -f concat -i inputs.txt -c copy output.flv

(31) HLS slicing
ffmpeg -i out.mp4 -c:v libx264 -c:a libfdk_aac -strict -2 -f hls out.m3u8
  • -strict -2 specifies AAC audio.
  • -f hls converts to the m3u8 format.
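If you need control over segment length and playlist size, the hls muxer also accepts options such as -hls_time and -hls_list_size; a sketch:
ffmpeg -i out.mp4 -c:v libx264 -c:a libfdk_aac -strict -2 -hls_time 10 -hls_list_size 0 -f hls out.m3u8
  • -hls_time 10 targets 10-second segments; -hls_list_size 0 keeps every segment in the playlist.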
(32) Video to JPEG
ffmpeg -i test.flv -r 1 -f image2 image-%3d.jpeg
(33) Video to GIF
ffmpeg -i test.flv -ss 00:00:00 -t 10 out.gif
(34) Images to video
ffmpeg -f image2 -i image-%3d.jpeg images.mp4
(35) Push stream
ffmpeg -re -i out.mp4 -c copy -f flv rtmp://server/live/streamName
(36) Pull a stream and save it
ffmpeg -i rtmp://server/live/streamName -c copy dump.flv
(37) Stream forwarding
ffmpeg -i rtmp://server/live/originalStream -c:a copy -c:v copy -f flv rtmp://server/live/h264Stream
(38) Real-time streaming
ffmpeg -framerate 15 -f avfoundation -i "1" -s 1280x720 -c:v libx264  -f  flv rtmp://localhost:1935/live/room
(39) Play YUV data
ffplay -pix_fmt nv12 -s 192x144 1.yuv
(40) Play the Y plane in YUV
ffplay -pix_fmt nv21 -s 640x480 -vf extractplanes='y' 1.yuv
