Some background:
Once again, this requirement went beyond what I already knew how to do.
There were two requirements:
1. Wrap an H.264 video stream file into an MP4 container.
2. Merge an audio track and a video track into a single file.
From my basic understanding of multimedia processing, both can be done with FFmpeg. The most naive approach is to shell out to the FFmpeg binary installed on the system:
Runtime.getRuntime().exec("command line");
But this approach has two things I dislike:
First, it is hard to guarantee that FFmpeg is correctly installed on the target machine. These two multimedia operations are only a tiny feature of the system, and cross-platform support makes deployment even harder; having to install FFmpeg across the whole deployment just for this seems unfair to the colleagues doing the rollout.
Second, I simply don't like invoking other programs through a shell. No deep reason: juggling the three process streams (stdin, stdout, stderr) never reads elegantly. A properly packaged Java API would be much nicer.
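To make the complaint concrete, here is a rough sketch of what the shell-out approach tends to look like (the ffmpeg arguments, file names, and the ShellOutSketch class are invented for illustration). Even in this minimal form you still have to wire up the process streams and the exit code yourself:

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class ShellOutSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical command: remux an H.264 stream into MP4 using a system-installed ffmpeg.
        ProcessBuilder pb = new ProcessBuilder("ffmpeg", "-i", "input.h264", "-c", "copy", "output.mp4");
        pb.redirectErrorStream(true); // merge stderr into stdout so only one stream needs draining
        Process process = pb.start();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(process.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // ffmpeg writes its progress here; it must be consumed or the child process may block
            }
        }
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IllegalStateException("ffmpeg exited with code " + exitCode);
        }
    }
}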
So the obvious answer is JavaCV: its Maven-managed dependencies cover all the major platforms, and it exposes a well-packaged API.
Now for the practical part:
A quick introduction to JavaCV. JavaCV is a computer vision and multimedia processing library for Java; it gives Java programs integrated access to OpenCV (a computer vision library) and FFmpeg (a multimedia framework). With JavaCV, developers can use the functionality of OpenCV and FFmpeg directly from Java for image processing, video processing, and general multimedia handling.
To use it, first add the dependencies:
<dependency>
    <groupId>org.bytedeco</groupId>
    <artifactId>javacv</artifactId>
    <version>1.5.6</version>
</dependency>
<dependency>
    <groupId>org.bytedeco</groupId>
    <artifactId>ffmpeg-platform</artifactId>
    <version>4.4-1.5.6</version>
</dependency>
Here we use version 1.5.6 of the JavaCV library and pull in ffmpeg-platform, the FFmpeg artifact that bundles native binaries for every supported platform.
For concrete usage, here is an example:
public static VideoCodecResult codecVideoFile(Path videoFilePath) {
    String mainName = FileNameUtil.getPrefix(videoFilePath.toFile());
    // Write to a temporary file first, then move it into place once the conversion succeeds.
    Path targetPath = videoFilePath.getParent().resolve(mainName + MP4_FILE_SUFFIX + TMP_FILE_SUFFIX);
    File file = videoFilePath.toFile();
    try (FFmpegFrameGrabber grabber = new FFmpegFrameGrabber(file);
         FFmpegFrameRecorder recorder = new FFmpegFrameRecorder(targetPath.toFile(), grabber.getImageWidth(), grabber.getImageHeight())) {
        grabber.start();
        recorder.setVideoCodec(avcodec.AV_CODEC_ID_H264);
        recorder.setFormat("mp4");
        // recorder.setFrameRate(24);
        // recorder.setVideoBitrate(1024);
        // recorder.setVideoOption("bEnableFrameSkip", "1");
        // recorder.setVideoQuality(2);
        Frame frame;
        AtomicBoolean recorderInit = new AtomicBoolean(false);
        while ((frame = grabber.grab()) != null) {
            // Start the recorder lazily: the real image size is only known once the first frame arrives.
            if (!recorderInit.get()) {
                recorder.setImageWidth(frame.imageWidth);
                recorder.setImageHeight(frame.imageHeight);
                recorder.start();
                recorderInit.set(true);
            }
            recorder.record(frame);
        }
        recorder.stop();
        grabber.stop();
    } catch (Throwable throwable) {
        throw new RuntimeException(throwable);
    }
    Path finalFilePath = videoFilePath.getParent().resolve(mainName + MP4_FILE_SUFFIX);
    FileUtil.move(targetPath.toFile(), finalFilePath.toFile(), true);
    if (!finalFilePath.toFile().exists() || !finalFilePath.toFile().isFile() || finalFilePath.toFile().length() == 0) {
        throw new RuntimeException("file not found:" + finalFilePath);
    }
    return new VideoCodecResult(finalFilePath);
}
@Data
private static class VideoCodecResult {
    private boolean success;
    private Throwable caught;
    private String message;
    private Path sourceFilePath;
    private Path resultFilePath;

    public VideoCodecResult(Path resultFilePath) {
        this.success = true;
        this.resultFilePath = resultFilePath;
    }

    private VideoCodecResult(Throwable throwable) {
        this.success = false;
        this.caught = throwable;
        this.message = throwable.getMessage();
    }
}
This is the method that wraps an H.264 video stream file into an MP4 container. The core logic is to read Frames from the stream file with FFmpegFrameGrabber and write them into FFmpegFrameRecorder.
Because the image size of the input stream is unknown in advance, we grab a frame first, read the size from the frame's properties, and only then initialize the recorder. Before starting the recorder you could also set a specific frame rate, bitrate, image quality, and so on; some of those values would have to come from the grabber and cannot always be obtained, so the approach here is to leave them alone and use the defaults.
Note that even though I never told the grabber how the file is encoded, it still reads Frames correctly: FFmpeg probes the source file and detects the encoding itself. The whole conversion flow is therefore quite simple and only needs the two core classes, FFmpegFrameGrabber and FFmpegFrameRecorder.
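As a side note, after grabber.start() you can inspect what FFmpeg detected from the source. A minimal sketch (the file path is a placeholder):

import java.io.File;
import org.bytedeco.javacv.FFmpegFrameGrabber;

try (FFmpegFrameGrabber grabber = new FFmpegFrameGrabber(new File("input.h264"))) { // placeholder path
    grabber.start();
    // All of these are filled in by FFmpeg's own probing of the source file.
    System.out.println("format       : " + grabber.getFormat());
    System.out.println("video codec  : " + grabber.getVideoCodec());
    System.out.println("size         : " + grabber.getImageWidth() + "x" + grabber.getImageHeight());
    System.out.println("frame rate   : " + grabber.getFrameRate());
    System.out.println("duration (us): " + grabber.getLengthInTime());
    grabber.stop();
}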
Next comes the audio/video merging method, which adds audio handling:
private static AVCombineResult combineAVFile(Path videoFilePath, Path audioFilePath) {
    String mainName = FileNameUtil.getPrefix(videoFilePath.toFile());
    Path targetPath = videoFilePath.getParent().resolve(mainName + MP4_FILE_SUFFIX + TMP_FILE_SUFFIX);
    File videoFile = videoFilePath.toFile();
    File audioFile = audioFilePath.toFile();
    try (FFmpegFrameGrabber videoGrabber = new FFmpegFrameGrabber(videoFile);
         FFmpegFrameGrabber audioGrabber = new FFmpegFrameGrabber(audioFile);
         FFmpegFrameRecorder recorder = new FFmpegFrameRecorder(targetPath.toFile(), videoGrabber.getImageWidth(), videoGrabber.getImageHeight())) {
        videoGrabber.start();
        audioGrabber.start();
        recorder.setVideoCodec(avcodec.AV_CODEC_ID_H264);
        recorder.setAudioCodec(avcodec.AV_CODEC_ID_AAC);
        recorder.setFormat("mp4");
        recorder.setAudioChannels(1);
        // Scale the video in time so it ends roughly together with the audio (rate < 1 means the video is shorter).
        double rate = 1.0 * videoGrabber.getLengthInTime() / audioGrabber.getLengthInTime();
        if (videoGrabber.getLengthInTime() < audioGrabber.getLengthInTime()) {
            recorder.setFrameRate(videoGrabber.getFrameRate() * rate);
        }
        // recorder.setVideoBitrate(1024);
        // recorder.setVideoOption("bEnableFrameSkip", "1");
        // recorder.setVideoQuality(2);
        // recorder.setAudioBitrate(1024);
        // recorder.setAudioQuality(2);
        Frame videoFrame = null;
        Frame audioFrame = null;
        AtomicBoolean recorderInit = new AtomicBoolean(false);
        // Note the non-short-circuiting "|": both grabbers must advance on every iteration.
        while ((videoFrame = videoGrabber.grab()) != null | (audioFrame = audioGrabber.grab()) != null) {
            if (!recorderInit.get()) {
                if (videoFrame != null) {
                    recorder.setImageWidth(videoFrame.imageWidth);
                    recorder.setImageHeight(videoFrame.imageHeight);
                }
                recorder.start();
                recorderInit.set(true);
            }
            if (videoFrame != null) {
                videoFrame.timestamp /= rate;
                recorder.record(videoFrame);
            }
            if (audioFrame != null) {
                recorder.record(audioFrame);
            }
        }
        recorder.stop();
        videoGrabber.stop();
        audioGrabber.stop();
    } catch (Throwable throwable) {
        throw new RuntimeException(throwable);
    }
    Path finalFilePath = videoFilePath.getParent().resolve(mainName + MP4_FILE_SUFFIX);
    FileUtil.move(targetPath.toFile(), finalFilePath.toFile(), true);
    if (!finalFilePath.toFile().exists() || !finalFilePath.toFile().isFile() || finalFilePath.toFile().length() == 0) {
        throw new RuntimeException("file not found:" + finalFilePath);
    }
    return new AVCombineResult(finalFilePath);
}
The core logic is still the same: read Frames from the video file and the audio file with FFmpegFrameGrabber and write them into FFmpegFrameRecorder. Constructing an FFmpegFrameGrabber only needs the video or audio file as an argument, and extracting Frames works the same way for both.
However, merging audio and video raises some harder questions: are the audio and the video the same length? Are both of them continuous? Do they start at the same moment? All of these affect the time offset between the two tracks and, ultimately, how the merged file feels to watch.
As the code above shows, the method merely aligns the two tracks at the beginning and stretches the video to a reasonable length, which obviously cannot keep audio and video precisely in sync. Outside of scenarios that are synchronized by construction (live streaming and the like), there is no guarantee that the supplied audio and video line up exactly, so this has to be judged case by case for the business at hand.
When the audio and the video are shifted against each other in absolute time (say the audio was recorded first and the video later, or the other way round), we would like to fill the missing stretches with blank images or silence.
For the merged file to look right, i.e. for real audio/video synchronization, we need to know at least: the offset between the audio and video start times, whether either track has interruptions in the middle and how long they last, and the time-scaling factor of each track. With these we can compute the real length of each track, and from that the real frame rate. Knowing the real frame rate lets us configure the recorder correctly and also tells us how many blank frames each gap corresponds to. Inserting the right number of blank frames at the right times then fills in the missing video or audio, aligns the two tracks in time, and achieves synchronization. A small numeric sketch of this arithmetic follows.
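To make the arithmetic concrete, here is a small sketch with made-up numbers (the variable names are mine, not from the method below):

// Gap-filling arithmetic, with invented numbers.
long gapNanos = 2_000_000_000L;  // suppose the video is missing 2 seconds relative to the audio
double frameRate = 25.0;         // the real frame rate computed from the real video length
int sampleRate = 16_000;         // audio sample rate
int samplesPerFrame = 1024;      // samples carried by one blank audio frame (AUDIO_SAMPLE_UNIT below)

long blankVideoFrames = Math.round(gapNanos / 1_000_000_000.0 * frameRate);                 // 50 blank pictures
long blankAudioFrames = (long) (gapNanos / 1_000_000_000.0 * sampleRate) / samplesPerFrame; // 31 frames of silence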
Here is the revised code:
private static AVCombineResult combineAVFileAccurate(Path videoFilePath, Path audioFilePath, List<long[]> videoSegTime, List<long[]> audioSegTime) {
    // Work out how much earlier one track started than the other (segment times and gaps are handled in nanoseconds).
    long videoStartTime = videoSegTime.get(0)[0];
    long audioStartTime = audioSegTime.get(0)[0];
    long videoExtraDelay = 0;
    long audioExtraDelay = 0;
    if (videoStartTime > audioStartTime) {
        videoExtraDelay = videoStartTime - audioStartTime;
    } else {
        audioExtraDelay = audioStartTime - videoStartTime;
    }
    long expectVideoLengthInTime = videoSegTime.stream().mapToLong(x -> x[1] - x[0]).sum();
    if (expectVideoLengthInTime > VIDEO_TIME_LENGTH_MAGIC_NUM) {
        expectVideoLengthInTime -= VIDEO_TIME_LENGTH_MAGIC_NUM;
    }
    // Convert to microseconds to match FFmpegFrameGrabber.getLengthInTime().
    expectVideoLengthInTime /= 1000L;
    // Collect the gaps in the video track: each gap is stored as (offset into the track, gap length).
    long videoTotalLength = 0;
    LinkedList<TimestampDelay> videoDelays = new LinkedList<>();
    if (videoExtraDelay > 0) {
        videoDelays.add(new TimestampDelay(0, videoExtraDelay));
    }
    for (int i = 0; i < videoSegTime.size(); i++) {
        long[] startStopTime = videoSegTime.get(i);
        if (i > 0) {
            long[] lastStartStopTime = videoSegTime.get(i - 1);
            long timeDelay = startStopTime[0] - lastStartStopTime[1];
            if (timeDelay > 0) {
                videoDelays.add(new TimestampDelay(videoTotalLength, timeDelay));
            }
        }
        videoTotalLength += startStopTime[1] - startStopTime[0];
    }
    // Same bookkeeping for the audio track.
    long audioTotalLength = 0;
    LinkedList<TimestampDelay> audioDelays = new LinkedList<>();
    if (audioExtraDelay > 0) {
        audioDelays.add(new TimestampDelay(0, audioExtraDelay));
    }
    for (int i = 0; i < audioSegTime.size(); i++) {
        long[] startStopTime = audioSegTime.get(i);
        if (i > 0) {
            long[] lastStartStopTime = audioSegTime.get(i - 1);
            long timeDelay = startStopTime[0] - lastStartStopTime[1];
            if (timeDelay > 0) {
                audioDelays.add(new TimestampDelay(audioTotalLength, timeDelay));
            }
        }
        audioTotalLength += startStopTime[1] - startStopTime[0];
    }
    String mainName = FileNameUtil.getPrefix(videoFilePath.toFile());
    Path targetPath = videoFilePath.getParent().resolve(mainName + MP4_FILE_SUFFIX + TMP_FILE_SUFFIX);
    log.info("do av combine, target:{}, source video:{}, source audio: {}", targetPath, videoFilePath, audioFilePath);
    log.info("video segment num: {}, time delay:{}", videoSegTime.size(), videoDelays);
    log.info("audio segment num: {}, time delay:{}", audioSegTime.size(), audioDelays);
    File videoFile = videoFilePath.toFile();
    File audioFile = audioFilePath.toFile();
    long startTime = System.currentTimeMillis();
    try (FFmpegFrameGrabber videoGrabber = new FFmpegFrameGrabber(videoFile);
         FFmpegFrameGrabber audioGrabber = new FFmpegFrameGrabber(audioFile);
         FFmpegFrameRecorder recorder = new FFmpegFrameRecorder(targetPath.toFile(), videoGrabber.getImageWidth(), videoGrabber.getImageHeight())) {
        videoGrabber.start();
        audioGrabber.start();
        recorder.setVideoCodec(avcodec.AV_CODEC_ID_H264);
        recorder.setAudioCodec(avcodec.AV_CODEC_ID_AAC);
        recorder.setFormat("mp4");
        recorder.setAudioChannels(1);
        // The real frame rate = nominal frame rate scaled by (stream duration / expected duration).
        double rate = 1.0 * videoGrabber.getLengthInTime() / expectVideoLengthInTime;
        double videoFrameRate = videoGrabber.getFrameRate() * rate;
        recorder.setFrameRate(videoFrameRate);
        // recorder.setVideoBitrate(1024);
        // recorder.setVideoOption("bEnableFrameSkip", "1");
        // recorder.setVideoQuality(2);
        // recorder.setAudioBitrate(1024);
        // recorder.setAudioQuality(2);
        Frame videoFrame = null;
        Frame audioFrame = null;
        int imageWidth = 0;
        int imageHeight = 0;
        int imageDepth = 0;
        int sampleRate = 0;
        AtomicBoolean recorderInit = new AtomicBoolean(false);
        Frame emptyVideoFrame = null;
        Frame emptyAudioFrame = null;
        // Non-short-circuiting "|": both grabbers must advance on every iteration.
        while ((videoFrame = videoGrabber.grab()) != null | (audioFrame = audioGrabber.grab()) != null) {
            if (!recorderInit.get()) {
                if (videoFrame != null) {
                    imageWidth = videoFrame.imageWidth;
                    imageHeight = videoFrame.imageHeight;
                    imageDepth = videoFrame.imageDepth;
                    recorder.setImageWidth(imageWidth);
                    recorder.setImageHeight(imageHeight);
                }
                if (audioFrame != null) {
                    sampleRate = audioFrame.sampleRate;
                    recorder.setSampleRate(sampleRate);
                }
                recorder.start();
                recorderInit.set(true);
            }
            if (videoFrame != null) {
                // If a video gap starts before the current output position, pad it with blank frames first.
                if (videoDelays.size() > 0 && videoDelays.peekFirst().isBefore(recorder.getTimestamp())) {
                    TimestampDelay timestampDelay = videoDelays.pollFirst();
                    double emptyFrameNum = timestampDelay.getDelay() * videoFrameRate / 1000000000L;
                    int i = 0;
                    while (i++ < emptyFrameNum) {
                        if (emptyVideoFrame == null) {
                            emptyVideoFrame = createEmptyVideoFrame(imageWidth, imageHeight, imageDepth, 1);
                        }
                        recorder.record(emptyVideoFrame);
                    }
                }
                recorder.record(videoFrame);
            }
            if (audioFrame != null) {
                // Likewise, pad audio gaps with frames of silence before writing the real audio frame.
                if (audioDelays.size() > 0 && audioDelays.peekFirst().isBefore(audioFrame.timestamp)) {
                    TimestampDelay timestampDelay = audioDelays.pollFirst();
                    int totalSamples = (int) (sampleRate * timestampDelay.getDelay() / 1000000000L);
                    int frameNum = totalSamples / AUDIO_SAMPLE_UNIT;
                    if (emptyAudioFrame == null) {
                        emptyAudioFrame = createEmptyAudioFrame(sampleRate, 1, AUDIO_SAMPLE_UNIT);
                    }
                    for (int i = 0; i < frameNum; i++) {
                        recorder.record(emptyAudioFrame);
                    }
                }
                recorder.record(audioFrame);
            }
        }
        recorder.stop();
        videoGrabber.stop();
        audioGrabber.stop();
    } catch (Throwable throwable) {
        throw new RuntimeException(throwable);
    }
    Path finalFilePath = videoFilePath.getParent().resolve(mainName + MP4_FILE_SUFFIX);
    FileUtil.move(targetPath.toFile(), finalFilePath.toFile(), true);
    long endTime = System.currentTimeMillis();
    FILE_NUM_COUNTER.getAndIncrement();
    TIME_COST_COUNTER.addAndGet(endTime - startTime);
    if (!finalFilePath.toFile().exists() || !finalFilePath.toFile().isFile() || finalFilePath.toFile().length() == 0) {
        throw new RuntimeException("file not found:" + finalFilePath);
    }
    return new AVCombineResult(finalFilePath);
}
@Data
@AllArgsConstructor
private static class TimestampDelay {
    /** Offset into the track (in nanoseconds) at which the gap starts. */
    private long startTimeNsec;
    /** Length of the gap, in nanoseconds. */
    private long delay;

    public boolean isBefore(long timeUsec) {
        // Recorder/frame timestamps are in microseconds, hence the *1000 conversion to nanoseconds.
        return startTimeNsec <= timeUsec * 1000L;
    }
}
The two helper methods that build the blank frames are:
private static Frame createEmptyVideoFrame(int width, int height, int depth, int channel) {
    // A freshly allocated Frame whose image buffers are left as allocated renders as a blank picture;
    // clear() only resets each buffer's position and limit.
    Frame frame = new Frame(width, height, depth, channel);
    for (int i = 0; i < frame.image.length; i++) {
        frame.image[i].clear();
    }
    return frame;
}

private static Frame createEmptyAudioFrame(int sampleRate, int channels, int unitSample) {
    // One frame of silence: `unitSample` zero-valued samples at the given sample rate.
    Frame frame = new Frame();
    frame.audioChannels = channels;
    frame.sampleRate = sampleRate;
    Buffer[] samples = new Buffer[1];
    samples[0] = ShortBuffer.wrap(new short[unitSample]);
    frame.samples = samples;
    return frame;
}
The segmentation information is passed in via List<long[]> videoSegTime and List<long[]> audioSegTime for the video and the audio respectively. For each segment, long[0] is the start timestamp and long[1] is the end timestamp; the audio list works the same way.
By walking through these two lists we can derive, for each track, how much time is missing and at which positions, and then use the frame rate to turn the missing time into a number of frames.
The core merging logic is the same as before: grab frames and write them to the recorder. But before writing, we read the current position in time (recorder.getTimestamp() for video, Frame.timestamp for audio) and use it to check whether a gap lies just before the current frame; if so, we fill it with the appropriate number of blank frames.
if (videoFrame != null) {
    if (videoDelays.size() > 0 && videoDelays.peekFirst().isBefore(recorder.getTimestamp())) {
        TimestampDelay timestampDelay = videoDelays.pollFirst();
        double emptyFrameNum = timestampDelay.getDelay() * videoFrameRate / 1000000000L;
        int i = 0;
        while (i++ < emptyFrameNum) {
            if (emptyVideoFrame == null) {
                emptyVideoFrame = createEmptyVideoFrame(imageWidth, imageHeight, imageDepth, 1);
            }
            recorder.record(emptyVideoFrame);
        }
    }
    recorder.record(videoFrame);
}
if (audioFrame != null) {
    if (audioDelays.size() > 0 && audioDelays.peekFirst().isBefore(audioFrame.timestamp)) {
        TimestampDelay timestampDelay = audioDelays.pollFirst();
        int totalSamples = (int) (sampleRate * timestampDelay.getDelay() / 1000000000L);
        int frameNum = totalSamples / AUDIO_SAMPLE_UNIT;
        if (emptyAudioFrame == null) {
            emptyAudioFrame = createEmptyAudioFrame(sampleRate, 1, AUDIO_SAMPLE_UNIT);
        }
        for (int i = 0; i < frameNum; i++) {
            recorder.record(emptyAudioFrame);
        }
    }
    recorder.record(audioFrame);
}
With this, the audio and video can be merged together nicely, provided of course that the business context can supply the segment timestamps for the audio and the video, i.e. where the gaps are.
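For illustration, here is a hedged sketch of how the segment lists might be assembled and the method called from within the same class (the paths and timestamps are invented; the values are in nanoseconds, which is what the conversions in the code above assume):

import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical input: the video was captured in two segments with a 3-second gap between them,
// and the audio started 1 second earlier than the video and ran continuously.
Path videoPath = Paths.get("/data/session-001/video.h264");   // made-up path
Path audioPath = Paths.get("/data/session-001/audio.aac");    // made-up path

List<long[]> videoSegTime = Arrays.asList(
        new long[]{1_000_000_000L, 11_000_000_000L},    // segment 1: 1 s .. 11 s
        new long[]{14_000_000_000L, 24_000_000_000L});  // segment 2: 14 s .. 24 s (3 s gap before it)
List<long[]> audioSegTime = Arrays.asList(
        new long[]{0L, 24_000_000_000L});               // one continuous segment: 0 s .. 24 s

AVCombineResult result = combineAVFileAccurate(videoPath, audioPath, videoSegTime, audioSegTime);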
After the feature was finished there was one more small episode: the deployment colleagues complained that the package was too big. Fair enough: pulling in ffmpeg-platform drags in native binaries for every platform, and several platforms' worth of FFmpeg adds up to well over a hundred megabytes. My solution was as follows:
Drop the ffmpeg-platform dependency and instead declare the ffmpeg artifact only for the platforms we might actually run on, distinguishing the platforms with the <classifier> tag.
<dependency>
    <groupId>org.bytedeco</groupId>
    <artifactId>ffmpeg</artifactId>
    <version>4.4-1.5.6</version>
    <classifier>windows-x86_64</classifier>
</dependency>
<dependency>
    <groupId>org.bytedeco</groupId>
    <artifactId>ffmpeg</artifactId>
    <version>4.4-1.5.6</version>
    <classifier>windows-x86</classifier>
</dependency>
<dependency>
    <groupId>org.bytedeco</groupId>
    <artifactId>ffmpeg</artifactId>
    <version>4.4-1.5.6</version>
    <classifier>linux-x86_64</classifier>
</dependency>
<dependency>
    <groupId>org.bytedeco</groupId>
    <artifactId>ffmpeg</artifactId>
    <version>4.4-1.5.6</version>
    <classifier>linux-x86</classifier>
</dependency>
<dependency>
    <groupId>org.bytedeco</groupId>
    <artifactId>ffmpeg</artifactId>
    <version>4.4-1.5.6</version>
    <classifier>linux-ppc64le</classifier>
</dependency>
<dependency>
    <groupId>org.bytedeco</groupId>
    <artifactId>ffmpeg</artifactId>
    <version>4.4-1.5.6</version>
    <classifier>linux-armhf</classifier>
</dependency>
<dependency>
    <groupId>org.bytedeco</groupId>
    <artifactId>ffmpeg</artifactId>
    <version>4.4-1.5.6</version>
    <classifier>linux-arm64</classifier>
</dependency>
<dependency>
    <groupId>org.bytedeco</groupId>
    <artifactId>javacpp</artifactId>
    <version>1.5.6</version>
    <classifier>windows-x86_64</classifier>
</dependency>
<dependency>
    <groupId>org.bytedeco</groupId>
    <artifactId>javacpp</artifactId>
    <version>1.5.6</version>
    <classifier>windows-x86</classifier>
</dependency>
<dependency>
    <groupId>org.bytedeco</groupId>
    <artifactId>javacpp</artifactId>
    <version>1.5.6</version>
    <classifier>linux-x86_64</classifier>
</dependency>
<dependency>
    <groupId>org.bytedeco</groupId>
    <artifactId>javacpp</artifactId>
    <version>1.5.6</version>
    <classifier>linux-x86</classifier>
</dependency>
<dependency>
    <groupId>org.bytedeco</groupId>
    <artifactId>javacpp</artifactId>
    <version>1.5.6</version>
    <classifier>linux-ppc64le</classifier>
</dependency>
<dependency>
    <groupId>org.bytedeco</groupId>
    <artifactId>javacpp</artifactId>
    <version>1.5.6</version>
    <classifier>linux-armhf</classifier>
</dependency>
<dependency>
    <groupId>org.bytedeco</groupId>
    <artifactId>javacpp</artifactId>
    <version>1.5.6</version>
    <classifier>linux-arm64</classifier>
</dependency>
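When trimming the platform list like this it is easy to forget one, so a small sanity check I would consider (a sketch, not something from the original project) is to force-load the FFmpeg natives at application startup, so that a missing native bundle fails fast instead of failing at the first conversion:

import org.bytedeco.ffmpeg.global.avutil;
import org.bytedeco.javacpp.Loader;

// Fails fast (UnsatisfiedLinkError) if no native FFmpeg bundle matching the current platform was packaged.
Loader.load(avutil.class);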
Technical summary:
1. Using FFmpeg through JavaCV is straightforward: construct a grabber and a recorder, then loop, grabbing Frames and writing them to the recorder (a stripped-down skeleton with the required imports follows below).
2. Take care that the parameters passed when constructing and configuring the recorder are accurate.
3. For building blank frames, refer to the code in this article.
4. The needed dependencies can be pulled in per platform via the <classifier> tag.
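To close, here is the stripped-down skeleton referenced in point 1, mainly to show the imports the snippets above rely on (the class name and the zero width/height placeholders are mine; treat it as a sketch rather than production code):

import java.io.File;
import org.bytedeco.ffmpeg.global.avcodec;
import org.bytedeco.javacv.FFmpegFrameGrabber;
import org.bytedeco.javacv.FFmpegFrameRecorder;
import org.bytedeco.javacv.Frame;

public class RemuxSkeleton {
    public static void remux(File source, File target) throws Exception {
        try (FFmpegFrameGrabber grabber = new FFmpegFrameGrabber(source);
             FFmpegFrameRecorder recorder = new FFmpegFrameRecorder(target, 0, 0)) { // real size is set below
            grabber.start();
            recorder.setImageWidth(grabber.getImageWidth());
            recorder.setImageHeight(grabber.getImageHeight());
            recorder.setVideoCodec(avcodec.AV_CODEC_ID_H264);
            recorder.setFormat("mp4");
            recorder.start();
            Frame frame;
            while ((frame = grabber.grab()) != null) {
                recorder.record(frame);
            }
            recorder.stop();
            grabber.stop();
        }
    }
}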