
Some background:

This requirement, once again, went beyond what I already knew.
There were two requirements:
1. Wrap an H.264 video stream file into an MP4 container.
2. Merge audio and video into one file.

Based on my basic understanding of multimedia processing, both requirements can be met with ffmpeg. The most naive approach is to call an ffmpeg installed on the system through the shell:

Runtime.getRuntime().exec("command line");

But this approach has two points that bother me:
First, it is hard to guarantee that ffmpeg is correctly installed on the target system. These two multimedia operations are only a small feature of the system, and cross-platform support would make deployment even harder. Forcing the whole project deployment to go through the trouble of installing ffmpeg seems unkind to the colleagues who do the rollout.
Second, I dislike calling other programs through the shell. No deep reason; handling the three streams (stdin, stdout, stderr) just never comes out elegant, as the sketch below shows. It would be much nicer to have a wrapped Java API.
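For the record, a minimal sketch of what the shell approach entails (the command line here is hypothetical); both output streams must be drained, or the child process can block once a pipe buffer fills:

private static int runFfmpeg() throws IOException, InterruptedException {
    Process process = Runtime.getRuntime()
            .exec(new String[]{"ffmpeg", "-i", "input.h264", "-c", "copy", "output.mp4"});
    // ffmpeg writes its progress to stderr; drain it on its own thread
    // so that reading stdout below cannot deadlock.
    Thread stderrDrainer = new Thread(() -> {
        try (BufferedReader err = new BufferedReader(new InputStreamReader(process.getErrorStream()))) {
            while (err.readLine() != null) { /* discard */ }
        } catch (IOException ignored) {
        }
    });
    stderrDrainer.start();
    try (BufferedReader out = new BufferedReader(new InputStreamReader(process.getInputStream()))) {
        while (out.readLine() != null) { /* discard */ }
    }
    stderrDrainer.join();
    return process.waitFor();
}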
So the obvious answer is JavaCV: its Maven-managed dependencies adapt to every platform, and it comes with a wrapped API.


Now, down to business:

A quick introduction to JavaCV: JavaCV is a computer-vision and multimedia-processing library for the Java programming language. It gives Java integrated access to OpenCV (a computer-vision library) and FFmpeg (a multimedia framework). With JavaCV, developers can easily use the functionality of OpenCV and FFmpeg from Java for image processing, video processing, and general multimedia work.
To use it, first add the dependencies:

<dependency>
    <groupId>org.bytedeco</groupId>
    <artifactId>javacv</artifactId>
    <version>1.5.6</version>
</dependency>
<dependency>
    <groupId>org.bytedeco</groupId>
    <artifactId>ffmpeg-platform</artifactId>
    <version>4.4-1.5.6</version>
</dependency>

This pulls in version 1.5.6 of the JavaCV library, plus ffmpeg-platform, the all-platform ffmpeg dependency.
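One optional tip while we are at setup: JavaCV can forward FFmpeg's native log output to the JVM, which makes opaque grab/record failures much easier to diagnose. A single call, made once at startup, enables it:

// Route FFmpeg's native logging through JavaCV (call once, e.g. at application startup).
org.bytedeco.javacv.FFmpegLogCallback.set();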
For concrete usage, here is an example:

public static VideoCodecResult codecVideoFile(Path videoFilePath) {
    // Strip the extension to get the base file name (FileNameUtil/FileUtil are Hutool helpers).
    String mainName = FileNameUtil.getPrefix(videoFilePath.toFile());
    // Write to a temp file first, then move it into place once encoding succeeds.
    Path targetPath = videoFilePath.getParent().resolve(mainName + MP4_FILE_SUFFIX + TMP_FILE_SUFFIX);
    File file = videoFilePath.toFile();
    try (FFmpegFrameGrabber grabber = new FFmpegFrameGrabber(file);
         FFmpegFrameRecorder recorder = new FFmpegFrameRecorder(targetPath.toFile(), grabber.getImageWidth(), grabber.getImageHeight())) {
        grabber.start();
        recorder.setVideoCodec(avcodec.AV_CODEC_ID_H264);
        recorder.setFormat("mp4");
//        recorder.setFrameRate(24);
//        recorder.setVideoBitrate(1024);
//        recorder.setVideoOption("bEnableFrameSkip", "1");
//        recorder.setVideoQuality(2);
        Frame frame;
        // The image size is unknown until the first frame arrives,
        // so the recorder is started lazily inside the loop.
        boolean recorderInit = false;
        while ((frame = grabber.grab()) != null) {
            if (!recorderInit) {
                recorder.setImageWidth(frame.imageWidth);
                recorder.setImageHeight(frame.imageHeight);
                recorder.start();
                recorderInit = true;
            }
            recorder.record(frame);
        }
        recorder.stop();
        grabber.stop();
    } catch (Throwable throwable) {
        throw new RuntimeException(throwable);
    }
    Path finalFilePath = videoFilePath.getParent().resolve(mainName + MP4_FILE_SUFFIX);
    FileUtil.move(targetPath.toFile(), finalFilePath.toFile(), true);
    if (!finalFilePath.toFile().exists() || !finalFilePath.toFile().isFile() || finalFilePath.toFile().length() == 0) {
        throw new RuntimeException("file not found:" + finalFilePath);
    }
    return new VideoCodecResult(finalFilePath);
}
@Data
private static class VideoCodecResult {
    private boolean success;
    private Throwable caught;
    private String message;
    private Path sourceFilePath;
    private Path resultFilePath;

    public VideoCodecResult(Path resultFilePath) {
        this.success = true;
        this.resultFilePath = resultFilePath;
    }

    private VideoCodecResult(Throwable throwable) {
        this.success = false;
        this.caught = throwable;
        this.message = throwable.getMessage();
    }
}

This method wraps an H.264 stream file into the MP4 format. The core logic reads Frames from the stream file with an FFmpegFrameGrabber and writes them into an FFmpegFrameRecorder.
Because the image dimensions of the input stream are not known up front, the recorder is initialized lazily: once the first frame arrives, its properties supply the image size. Before the recorder is started you may also set a specific frame rate, bitrate, image quality, and so on; some of these properties would have to come from the grabber and are not always available there, so the code above leaves them alone and relies on the defaults.
Notice that although the example never tells the grabber how the source file is encoded, the grabber still reads Frames without trouble: ffmpeg probes the source and detects the encoding by itself. Overall, transcoding is quite simple; the two core classes, FFmpegFrameGrabber and FFmpegFrameRecorder, are all you need.
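If you want to see what the probe detected, the grabber exposes it after start(). A small sketch (the file name is made up):

private static void printProbedProperties() throws Exception {
    try (FFmpegFrameGrabber grabber = new FFmpegFrameGrabber(new File("input.h264"))) {
        grabber.start();
        // All of these are populated by ffmpeg's probing of the source.
        System.out.println("format:      " + grabber.getFormat());
        System.out.println("video codec: " + grabber.getVideoCodec()); // AVCodecID, as an int
        System.out.println("frame rate:  " + grabber.getFrameRate());
        System.out.println("size:        " + grabber.getImageWidth() + "x" + grabber.getImageHeight());
        grabber.stop();
    }
}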

Next, the audio-video merging method, with audio handling added:

private static AVCombineResult combineAVFile(Path videoFilePath, Path audioFilePath) {
    String mainName = FileNameUtil.getPrefix(videoFilePath.toFile());
    Path targetPath = videoFilePath.getParent().resolve(mainName + MP4_FILE_SUFFIX + TMP_FILE_SUFFIX);
    File videoFile = videoFilePath.toFile();
    File audioFile = audioFilePath.toFile();
    try (FFmpegFrameGrabber videoGrabber = new FFmpegFrameGrabber(videoFile);
         FFmpegFrameGrabber audioGrabber = new FFmpegFrameGrabber(audioFile);
         FFmpegFrameRecorder recorder = new FFmpegFrameRecorder(targetPath.toFile(), videoGrabber.getImageWidth(), videoGrabber.getImageHeight())) {
        videoGrabber.start();
        audioGrabber.start();
        recorder.setVideoCodec(avcodec.AV_CODEC_ID_H264);
        recorder.setAudioCodec(avcodec.AV_CODEC_ID_AAC);
        recorder.setFormat("mp4");
        recorder.setAudioChannels(1);
        // Length ratio of the two streams; lower the frame rate when the video is shorter
        // so that the same frames span the audio's duration.
        double rate = 1.0 * videoGrabber.getLengthInTime() / audioGrabber.getLengthInTime();
        if (videoGrabber.getLengthInTime() < audioGrabber.getLengthInTime()) {
            recorder.setFrameRate(videoGrabber.getFrameRate() * rate);
        }
//        recorder.setVideoBitrate(1024);
//        recorder.setVideoOption("bEnableFrameSkip", "1");
//        recorder.setVideoQuality(2);
//        recorder.setAudioBitrate(1024);
//        recorder.setAudioQuality(2);
        Frame videoFrame = null;
        Frame audioFrame = null;
        boolean recorderInit = false;
        // Note the non-short-circuit "|": both grabbers must advance every iteration.
        while ((videoFrame = videoGrabber.grab()) != null | (audioFrame = audioGrabber.grab()) != null) {
            if (!recorderInit) {
                if (videoFrame != null) {
                    recorder.setImageWidth(videoFrame.imageWidth);
                    recorder.setImageHeight(videoFrame.imageHeight);
                }
                recorder.start();
                recorderInit = true;
            }
            if (videoFrame != null) {
                // Rescale the video timestamps by the length ratio.
                videoFrame.timestamp /= rate;
                recorder.record(videoFrame);
            }
            if (audioFrame != null) {
                recorder.record(audioFrame);
            }
        }
        recorder.stop();
        videoGrabber.stop();
        audioGrabber.stop();
    } catch (Throwable throwable) {
        throw new RuntimeException(throwable);
    }
    Path finalFilePath = videoFilePath.getParent().resolve(mainName + MP4_FILE_SUFFIX);
    FileUtil.move(targetPath.toFile(), finalFilePath.toFile(), true);
    if (!finalFilePath.toFile().exists() || !finalFilePath.toFile().isFile() || finalFilePath.toFile().length() == 0) {
        throw new RuntimeException("file not found:" + finalFilePath);
    }
    return new AVCombineResult(finalFilePath);
}

The core logic is again to read Frames from the video file and the audio file through FFmpegFrameGrabbers and write them into an FFmpegFrameRecorder. Constructing an FFmpegFrameGrabber only requires the video or audio file as the argument, and pulling Frames out of a grabber works the same way for video and audio.

However, merging audio and video raises some more advanced questions: are the audio and the video the same length? Are they both continuous? Did they start at the same time? All of these affect the time offset between the streams and, ultimately, how the merged file feels to watch.
As the code above shows, this method merely aligns the audio and video at the start and stretches the video to a suitable length, which obviously cannot keep them precisely in sync. In real business scenarios, outside of naturally synchronized cases such as live streaming, there is no guarantee that the supplied audio and video line up at all, so this has to be judged case by case.
When the audio and the video are offset in absolute time (say the audio was recorded first and the video later, or the other way round), we want to fill the gaps with blank images or silent audio.
To make the merged file watchable, i.e. to synchronize audio and video, we need at least: the interval between the audio and video start times, whether either stream was interrupted in the middle, how long each interruption lasted, and each stream's time-scaling ratio. With those in hand we can compute the true length of each stream, and from that the true frame rate. The true frame rate lets us set the recorder's frame rate at initialization, and also tells us how many blank frames a gap corresponds to. Then, by inserting the right number of blank frames at the right times, the missing video or audio is padded out, the streams align in time, and audio-video sync is achieved.
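To make the arithmetic concrete before reading the code, here is a small sketch of the two conversions it relies on (all numbers illustrative; AUDIO_SAMPLE_UNIT in the real code plays the role of samplesPerBlankFrame here):

// A 2-second gap, expressed in nanoseconds, at 25 fps and 44100 Hz.
long gapNs = 2_000_000_000L;
double videoFrameRate = 25.0;
int sampleRate = 44_100;
int samplesPerBlankFrame = 1024; // assumed size of one silent audio frame

// Blank video frames: gap in seconds times frames per second.
double blankVideoFrames = gapNs * videoFrameRate / 1_000_000_000L; // = 50.0

// Silent audio: total missing samples, grouped into fixed-size frames.
int totalSamples = (int) (sampleRate * gapNs / 1_000_000_000L); // = 88200
int blankAudioFrames = totalSamples / samplesPerBlankFrame;     // = 86 (integer division)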

Here is the revised code:

private static AVCombineResult combineAVFileAccurate(Path videoFilePath, Path audioFilePath, List<long[]> videoSegTime, List<long[]> audioSegTime) {
    // Whichever stream started later gets an extra blank lead-in.
    long videoStartTime = videoSegTime.get(0)[0];
    long audioStartTime = audioSegTime.get(0)[0];
    long videoExtraDelay = 0;
    long audioExtraDelay = 0;
    if (videoStartTime > audioStartTime) {
        videoExtraDelay = videoStartTime - audioStartTime;
    } else {
        audioExtraDelay = audioStartTime - videoStartTime;
    }
    long expectVideoLengthInTime = videoSegTime.stream().mapToLong(x -> x[1] - x[0]).sum();
    // Business-specific adjustment to the expected length.
    if (expectVideoLengthInTime > VIDEO_TIME_LENGTH_MAGIC_NUM) {
        expectVideoLengthInTime -= VIDEO_TIME_LENGTH_MAGIC_NUM;
    }
    // Convert nanoseconds to microseconds (getLengthInTime() below is in microseconds).
    expectVideoLengthInTime /= 1000L;
    // Collect the video gaps: each entry records where a gap starts and how long it lasts.
    long videoTotalLength = 0;
    LinkedList<TimestampDelay> videoDelays = new LinkedList<>();
    if (videoExtraDelay > 0) {
        videoDelays.add(new TimestampDelay(0, videoExtraDelay));
    }
    for (int i = 0; i < videoSegTime.size(); i++) {
        long[] startStopTime = videoSegTime.get(i);
        if (i > 0) {
            long[] lastStartStopTime = videoSegTime.get(i - 1);
            long timeDelay = startStopTime[0] - lastStartStopTime[1];
            if (timeDelay > 0) {
                videoDelays.add(new TimestampDelay(videoTotalLength, timeDelay));
            }
        }
        videoTotalLength += startStopTime[1] - startStopTime[0];
    }
    // Same for the audio gaps.
    long audioTotalLength = 0;
    LinkedList<TimestampDelay> audioDelays = new LinkedList<>();
    if (audioExtraDelay > 0) {
        audioDelays.add(new TimestampDelay(0, audioExtraDelay));
    }
    for (int i = 0; i < audioSegTime.size(); i++) {
        long[] startStopTime = audioSegTime.get(i);
        if (i > 0) {
            long[] lastStartStopTime = audioSegTime.get(i - 1);
            long timeDelay = startStopTime[0] - lastStartStopTime[1];
            if (timeDelay > 0) {
                audioDelays.add(new TimestampDelay(audioTotalLength, timeDelay));
            }
        }
        audioTotalLength += startStopTime[1] - startStopTime[0];
    }
    String mainName = FileNameUtil.getPrefix(videoFilePath.toFile());
    Path targetPath = videoFilePath.getParent().resolve(mainName + MP4_FILE_SUFFIX + TMP_FILE_SUFFIX);
    log.info("do av combine, target:{}, source video:{}, source audio: {}", targetPath, videoFilePath, audioFilePath);
    log.info("video segment num: {}, time delay:{}", videoSegTime.size(), videoDelays);
    log.info("audio segment num: {}, time delay:{}", audioSegTime.size(), audioDelays);
    File videoFile = videoFilePath.toFile();
    File audioFile = audioFilePath.toFile();
    long startTime = System.currentTimeMillis();
    try (FFmpegFrameGrabber videoGrabber = new FFmpegFrameGrabber(videoFile);
         FFmpegFrameGrabber audioGrabber = new FFmpegFrameGrabber(audioFile);
         FFmpegFrameRecorder recorder = new FFmpegFrameRecorder(targetPath.toFile(), videoGrabber.getImageWidth(), videoGrabber.getImageHeight())) {
        videoGrabber.start();
        audioGrabber.start();
        recorder.setVideoCodec(avcodec.AV_CODEC_ID_H264);
        recorder.setAudioCodec(avcodec.AV_CODEC_ID_AAC);
        recorder.setFormat("mp4");
        recorder.setAudioChannels(1);
        // The true frame rate: actual recorded length divided by the expected wall-clock length.
        double rate = 1.0 * videoGrabber.getLengthInTime() / expectVideoLengthInTime;
        double videoFrameRate = videoGrabber.getFrameRate() * rate;
        recorder.setFrameRate(videoFrameRate);
//        recorder.setVideoBitrate(1024);
//        recorder.setVideoOption("bEnableFrameSkip", "1");
//        recorder.setVideoQuality(2);
//        recorder.setAudioBitrate(1024);
//        recorder.setAudioQuality(2);
        Frame videoFrame = null;
        Frame audioFrame = null;
        int imageWidth = 0;
        int imageHeight = 0;
        int imageDepth = 0;
        int sampleRate = 0;
        boolean recorderInit = false;
        Frame emptyVideoFrame = null;
        Frame emptyAudioFrame = null;
        // Non-short-circuit "|": both grabbers must advance every iteration.
        while ((videoFrame = videoGrabber.grab()) != null | (audioFrame = audioGrabber.grab()) != null) {
            if (!recorderInit) {
                if (videoFrame != null) {
                    imageWidth = videoFrame.imageWidth;
                    imageHeight = videoFrame.imageHeight;
                    imageDepth = videoFrame.imageDepth;
                    recorder.setImageWidth(imageWidth);
                    recorder.setImageHeight(imageHeight);
                }
                if (audioFrame != null) {
                    sampleRate = audioFrame.sampleRate;
                    recorder.setSampleRate(sampleRate);
                }
                recorder.start();
                recorderInit = true;
            }
            if (videoFrame != null) {
                // If a gap starts at or before the current position, fill it with blank frames.
                if (videoDelays.size() > 0 && videoDelays.peekFirst().isBefore(recorder.getTimestamp())) {
                    TimestampDelay timestampDelay = videoDelays.pollFirst();
                    // gap (ns) * frames per second / 1e9 = number of blank frames
                    double emptyFrameNum = timestampDelay.getDelay() * videoFrameRate / 1000000000L;
                    int i = 0;
                    while (i++ < emptyFrameNum) {
                        if (emptyVideoFrame == null) {
                            emptyVideoFrame = createEmptyVideoFrame(imageWidth, imageHeight, imageDepth, 1);
                        }
                        recorder.record(emptyVideoFrame);
                    }
                }
                recorder.record(videoFrame);
            }
            if (audioFrame != null) {
                if (audioDelays.size() > 0 && audioDelays.peekFirst().isBefore(audioFrame.timestamp)) {
                    TimestampDelay timestampDelay = audioDelays.pollFirst();
                    // gap (ns) * samples per second / 1e9 = number of silent samples
                    int totalSamples = (int) (sampleRate * timestampDelay.getDelay() / 1000000000L);
                    int frameNum = totalSamples / AUDIO_SAMPLE_UNIT;
                    if (emptyAudioFrame == null) {
                        emptyAudioFrame = createEmptyAudioFrame(sampleRate, 1, AUDIO_SAMPLE_UNIT);
                    }
                    for (int i = 0; i < frameNum; i++) {
                        recorder.record(emptyAudioFrame);
                    }
                }
                recorder.record(audioFrame);
            }
        }
        recorder.stop();
        videoGrabber.stop();
        audioGrabber.stop();
    } catch (Throwable throwable) {
        throw new RuntimeException(throwable);
    }
    Path finalFilePath = videoFilePath.getParent().resolve(mainName + MP4_FILE_SUFFIX);
    FileUtil.move(targetPath.toFile(), finalFilePath.toFile(), true);
    long endTime = System.currentTimeMillis();
    FILE_NUM_COUNTER.getAndIncrement();
    TIME_COST_COUNTER.addAndGet(endTime - startTime);
    if (!finalFilePath.toFile().exists() || !finalFilePath.toFile().isFile() || finalFilePath.toFile().length() == 0) {
        throw new RuntimeException("file not found:" + finalFilePath);
    }
    return new AVCombineResult(finalFilePath);
}

@Data
@AllArgsConstructor
private static class TimestampDelay {
    private long startTimeNsec; // where the gap starts, in nanoseconds
    private long delay;         // gap length, in nanoseconds

    // Recorder/frame timestamps are in microseconds, hence the * 1000L.
    public boolean isBefore(long timeUsec) {
        return startTimeNsec <= timeUsec * 1000L;
    }
}

The two helper methods that construct the blank frames:

private static Frame createEmptyVideoFrame(int width, int height, int depth, int channel) {
    // Frame's constructor allocates fresh (zero-filled) image buffers;
    // clear() just resets their positions so the whole buffer gets written out.
    Frame frame = new Frame(width, height, depth, channel);
    for (int i = 0; i < frame.image.length; i++) {
        frame.image[i].clear();
    }
    return frame;
}

private static Frame createEmptyAudioFrame(int sampleRate, int channels, int unitSample) {
    Frame frame = new Frame();
    frame.audioChannels = channels;
    frame.sampleRate = sampleRate;
    // One buffer of unitSample zeroed 16-bit samples == silence.
    Buffer[] samples = new Buffer[1];
    samples[0] = ShortBuffer.wrap(new short[unitSample]);
    frame.samples = samples;
    return frame;
}

Two lists, List<long[]> videoSegTime and List<long[]> audioSegTime, pass in the segmentation of the video and the audio respectively: long[0] is the start timestamp of a segment and long[1] its end timestamp, and likewise for audio.
By walking these two lists we can derive, for both streams, the length of each missing span and the timestamp where it occurs, and then use the frame rate to turn the missing time into a frame count.
The core merging logic is unchanged: pull frames from the grabbers and write them into the recorder. But before each write, the current timestamp (recorder.getTimestamp() for video, Frame.timestamp for audio) tells us whether there is a gap ahead of this frame; if so, we fill it with the right number of blank frames.

if (videoFrame != null) {
    // If a gap starts at or before the current position, fill it with blank frames.
    if (videoDelays.size() > 0 && videoDelays.peekFirst().isBefore(recorder.getTimestamp())) {
        TimestampDelay timestampDelay = videoDelays.pollFirst();
        // gap (ns) * frames per second / 1e9 = number of blank frames
        double emptyFrameNum = timestampDelay.getDelay() * videoFrameRate / 1000000000L;
        int i = 0;
        while (i++ < emptyFrameNum) {
            if (emptyVideoFrame == null) {
                emptyVideoFrame = createEmptyVideoFrame(imageWidth, imageHeight, imageDepth, 1);
            }
            recorder.record(emptyVideoFrame);
        }
    }
    recorder.record(videoFrame);
}
if (audioFrame != null) {
    if (audioDelays.size() > 0 && audioDelays.peekFirst().isBefore(audioFrame.timestamp)) {
        TimestampDelay timestampDelay = audioDelays.pollFirst();
        // gap (ns) * samples per second / 1e9 = number of silent samples
        int totalSamples = (int) (sampleRate * timestampDelay.getDelay() / 1000000000L);
        int frameNum = totalSamples / AUDIO_SAMPLE_UNIT;
        if (emptyAudioFrame == null) {
            emptyAudioFrame = createEmptyAudioFrame(sampleRate, 1, AUDIO_SAMPLE_UNIT);
        }
        for (int i = 0; i < frameNum; i++) {
            recorder.record(emptyAudioFrame);
        }
    }
    recorder.record(audioFrame);
}

With that, the audio and video merge together nicely; the precondition, of course, is that the business context can actually supply the audio and video timestamps.
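To illustrate the segment lists, here is a hypothetical call (paths and timestamps invented): the audio runs uninterrupted from 0s to 12s, while the video starts one second later and has a two-second hole in the middle; the method fills the video's lead-in and the hole with blank frames so both streams line up.

// Timestamps in nanoseconds, on the same clock for both streams.
List<long[]> videoSegTime = List.of(
        new long[]{1_000_000_000L, 5_000_000_000L},   // video seg 1: 1s..5s
        new long[]{7_000_000_000L, 12_000_000_000L}); // video seg 2: 7s..12s (2s hole before it)
List<long[]> audioSegTime = List.of(
        new long[]{0L, 12_000_000_000L});             // audio: 0s..12s, no interruptions

AVCombineResult result = combineAVFileAccurate(
        Paths.get("video.mp4"), Paths.get("audio.aac"),
        videoSegTime, audioSegTime);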

After the feature was finished there was one more small episode: the rollout colleagues reported that the package was too big. Fair enough: pulling in ffmpeg-platform brings in dependencies for every platform, and several platforms' worth of ffmpeg binaries add up to well over a hundred megabytes. My solution:
Remove the ffmpeg-platform dependency and instead add ffmpeg dependencies only for the platforms that may actually be used, distinguishing them with the <classifier> tag.
<dependency>
    <groupId>org.bytedeco</groupId>
    <artifactId>ffmpeg</artifactId>
    <version>4.4-1.5.6</version>
    <classifier>windows-x86_64</classifier>
</dependency>
<dependency>
    <groupId>org.bytedeco</groupId>
    <artifactId>ffmpeg</artifactId>
    <version>4.4-1.5.6</version>
    <classifier>windows-x86</classifier>
</dependency>
<dependency>
    <groupId>org.bytedeco</groupId>
    <artifactId>ffmpeg</artifactId>
    <version>4.4-1.5.6</version>
    <classifier>linux-x86_64</classifier>
</dependency>
<dependency>
    <groupId>org.bytedeco</groupId>
    <artifactId>ffmpeg</artifactId>
    <version>4.4-1.5.6</version>
    <classifier>linux-x86</classifier>
</dependency>
<dependency>
    <groupId>org.bytedeco</groupId>
    <artifactId>ffmpeg</artifactId>
    <version>4.4-1.5.6</version>
    <classifier>linux-ppc64le</classifier>
</dependency>
<dependency>
    <groupId>org.bytedeco</groupId>
    <artifactId>ffmpeg</artifactId>
    <version>4.4-1.5.6</version>
    <classifier>linux-armhf</classifier>
</dependency>
<dependency>
    <groupId>org.bytedeco</groupId>
    <artifactId>ffmpeg</artifactId>
    <version>4.4-1.5.6</version>
    <classifier>linux-arm64</classifier>
</dependency>

<dependency>
    <groupId>org.bytedeco</groupId>
    <artifactId>javacpp</artifactId>
    <version>1.5.6</version>
    <classifier>windows-x86_64</classifier>
</dependency>
<dependency>
    <groupId>org.bytedeco</groupId>
    <artifactId>javacpp</artifactId>
    <version>1.5.6</version>
    <classifier>windows-x86</classifier>
</dependency>
<dependency>
    <groupId>org.bytedeco</groupId>
    <artifactId>javacpp</artifactId>
    <version>1.5.6</version>
    <classifier>linux-x86_64</classifier>
</dependency>
<dependency>
    <groupId>org.bytedeco</groupId>
    <artifactId>javacpp</artifactId>
    <version>1.5.6</version>
    <classifier>linux-x86</classifier>
</dependency>
<dependency>
    <groupId>org.bytedeco</groupId>
    <artifactId>javacpp</artifactId>
    <version>1.5.6</version>
    <classifier>linux-ppc64le</classifier>
</dependency>
<dependency>
    <groupId>org.bytedeco</groupId>
    <artifactId>javacpp</artifactId>
    <version>1.5.6</version>
    <classifier>linux-armhf</classifier>
</dependency>
<dependency>
    <groupId>org.bytedeco</groupId>
    <artifactId>javacpp</artifactId>
    <version>1.5.6</version>
    <classifier>linux-arm64</classifier>
</dependency>

Finally, the technical summary:

1. Using ffmpeg through JavaCV is simple: construct a grabber and a recorder, then iterate, pulling each Frame from the grabber and writing it to the recorder.
2. Take care to pass accurate parameters when constructing the recorder.
3. The code in this post can serve as a reference for constructing blank frames.
4. The required dependencies can be pulled in individually via the <classifier> tag.

