Audio and video capture

In the overall audio and video pipeline, capture at the sending end is the starting point of the whole chain. Android and iOS both provide the relevant hardware devices, the camera and the microphone, as input sources. In this chapter we analyze how to capture data on Android through the Camera and audio recording APIs. This chapter can be combined with the previously published article Android Audio and Video - MediaCodec Codec Audio and Video to build a complete demo.

Camera

The picture/video capture device on Android is the Camera. Before Android SDK API 21, only the Camera1 API is available; from API 21 onward, Camera1 is marked as deprecated and Google recommends using Camera2. Let's take a look at them separately.

Camera1

Let's first take a look at the class diagram of the Camera1 system.

[Figure: class diagram of the Camera1 system]

The Camera class is the core class of the Camera1 system; it contains a number of inner classes, as shown in the figure above:

The Camera.CameraInfo class describes camera information such as the facing and orientation of the camera.

The Camera.Parameters class holds camera parameter settings, such as the preview size and the rotation angle.

The Camera class exposes APIs for opening the camera, setting parameters, setting up the preview, and so on. Let's walk through the process of opening the system camera with the Camera API.

[Figure: flow of opening the camera with the Camera1 API]

1. Release the Camera before starting it. The purpose of this step is to reset the Camera's state and clear its previewCallback:

  • Call Camera's release() to release it

  • Set the Camera object to null

/**
 * Release the Camera
 */
private fun releaseCamera() {
    // Reset the previewCallback to null
    cameraInstance?.setPreviewCallback(null)
    cameraInstance?.release()
    cameraInstance = null
}

2. Get the Id of the Camera

/**
 * Get the Camera Id
 */
private fun getCurrentCameraId(): Int {
    val cameraInfo = Camera.CameraInfo()
    // Iterate over all Camera ids and compare CameraInfo.facing
    for (id in 0 until Camera.getNumberOfCameras()) {
        Camera.getCameraInfo(id, cameraInfo)
        if (cameraInfo.facing == cameraFacing) {
            return id
        }
    }
    return 0
}

3. Open the Camera to get a Camera object

/**
 * Get a Camera instance
 */
private fun getCameraInstance(id: Int): Camera {
    return try {
        // Call Camera.open() to get a Camera instance
        Camera.open(id)
    } catch (e: Exception) {
        throw IllegalAccessError("Camera not found")
    }
}

4. Set the relevant parameters of Camera

// [3] Set the parameters
val parameters = cameraInstance!!.parameters

if (parameters.supportedFocusModes.contains(Camera.Parameters.FOCUS_MODE_CONTINUOUS_PICTURE)) {
    parameters.focusMode = Camera.Parameters.FOCUS_MODE_CONTINUOUS_PICTURE
}
cameraInstance!!.parameters = parameters

5. Set previewDisplay

// [4] Call the Camera API to set the preview Surface
surfaceHolder?.let { cameraInstance!!.setPreviewDisplay(it) }

6. Set preview callback

// [5] Call the Camera API to set the preview callback
cameraInstance!!.setPreviewCallback { data, camera ->
    if (data == null || camera == null) {
        return@setPreviewCallback
    }
    val size = camera.parameters.previewSize
    onPreviewFrame?.invoke(data, size.width, size.height)
}

7. Start the preview

// [6] Call the Camera API to start the preview
cameraInstance!!.startPreview()

Steps [3], [4], [5], and [6] in the code above are all performed by calling the Camera API.

After the above process, the camera preview is displayed on the Surface that was passed in, and onPreviewFrame(byte[] data, Camera camera) is called back continuously until the camera stops. The real-time image data is delivered in byte[] data, whose format is NV21, one of the YUV formats.
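
If you want to quickly check the NV21 frames delivered by this callback, one simple option (a minimal sketch, not part of the article's demo) is to wrap the buffer in a YuvImage and compress it to a JPEG:

import android.graphics.Bitmap
import android.graphics.BitmapFactory
import android.graphics.ImageFormat
import android.graphics.Rect
import android.graphics.YuvImage
import java.io.ByteArrayOutputStream

// Convert one NV21 preview frame to a Bitmap for quick inspection
fun nv21ToBitmap(data: ByteArray, width: Int, height: Int): Bitmap? {
    val yuvImage = YuvImage(data, ImageFormat.NV21, width, height, null)
    val out = ByteArrayOutputStream()
    // Compress the full frame to JPEG, then decode it back into a Bitmap
    yuvImage.compressToJpeg(Rect(0, 0, width, height), 90, out)
    val jpegBytes = out.toByteArray()
    return BitmapFactory.decodeByteArray(jpegBytes, 0, jpegBytes.size)
}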

YUV image format

Color space

Here we only talk about two commonly used color spaces.

RGB: RGB is probably the color space we are most familiar with, and it is widely used in today's electronic devices. By mixing the three primary colors R, G, and B, any color can be produced.

YUV: this is the focus here. This color space is less familiar to us; it is a color format in which luminance and chrominance are stored separately.

Early TVs were black and white, carrying only the luminance value, i.e. Y. With the arrival of color TV, the two chrominance components U and V were added, forming today's YUV, also called YCbCr.

Y: luminance, i.e. the gray value. Besides carrying the luminance signal, it also contains a large contribution from the green channel.

U: The difference between the blue channel and the brightness.

V: The difference between the red channel and the brightness.

What are the advantages of using YUV?

The human eye is sensitive to luminance but insensitive to chrominance, so the amount of UV data can be reduced without the eye perceiving it. By lowering the resolution of the UV components, the size of the video is reduced without affecting perceived quality. For example, a 1920x1080 frame takes about 6.2 MB in RGB24 (3 bytes per pixel) but only about 3.1 MB in YUV420 (1.5 bytes per pixel).

Conversion of RGB and YUV

Y = 0.299R + 0.587G + 0.114B

U = -0.147R - 0.289G + 0.436B

V = 0.615R - 0.515G - 0.100B

——————————————————

R = Y + 1.14V

G = Y - 0.39U - 0.58V

B = Y + 2.03U
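
To make the conversion concrete, here is a small sketch that applies the coefficients above to a single pixel. It assumes R, G, and B are in the 0-255 range and does not clamp or offset the results the way a real codec would:

// Convert one RGB pixel to YUV using the coefficients listed above.
// R, G, B are in [0, 255]; Y ends up in [0, 255], U and V are signed values around 0.
data class Yuv(val y: Float, val u: Float, val v: Float)

fun rgbToYuv(r: Float, g: Float, b: Float): Yuv {
    val y = 0.299f * r + 0.587f * g + 0.114f * b
    val u = -0.147f * r - 0.289f * g + 0.436f * b
    val v = 0.615f * r - 0.515f * g - 0.100f * b
    return Yuv(y, u, v)
}

fun yuvToRgb(y: Float, u: Float, v: Float): Triple<Float, Float, Float> {
    val r = y + 1.14f * v
    val g = y - 0.39f * u - 0.58f * v
    val b = y + 2.03f * u
    return Triple(r, g, b)
}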

YUV format

YUV storage methods are divided into two categories: planar and packed.

  • Planar: Store all Y first, then store all U, and finally V;

    [Figure: planar YUV storage layout]

  • packed: The Y, U, and V of each pixel are continuously interleaved and stored.

    [Figure: packed YUV storage layout]

The packed storage method is very rarely used nowadays; most video is stored in the planar form.

With planar storage, part of the chrominance information can be omitted, i.e. several luminance samples share one set of chrominance samples, which saves storage space. Based on how the chroma is sampled, planar formats are divided into YUV444, YUV422, and YUV420.

YUV 4:4:4 sampling, each Y corresponds to a set of UV components.

[Figure: YUV 4:4:4 sampling]

YUV 4:2:2 sampling, every two Y shares a set of UV components.

[Figure: YUV 4:2:2 sampling]

YUV 4:2:0 sampling, every four Y shares a set of UV components.

[Figure: YUV 4:2:0 sampling]

Among them, the most commonly used is YUV420.

YUV420 storage is divided into two types (see the conversion sketch after this list):

  • YUV420P: three-plane storage. The data layout is YYYYYYYYUUVV (e.g. I420) or YYYYYYYYVVUU (e.g. YV12).
  • YUV420SP: two-plane storage. The data layout is YYYYYYYYUVUV (e.g. NV12) or YYYYYYYYVUVU (e.g. NV21).
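
To see the difference between the two layouts in code, the following sketch converts an NV21 buffer (what Camera1 delivers) into an I420 buffer. It assumes the width and height are even; both buffers hold width × height × 3 / 2 bytes, only the chroma arrangement differs:

// NV21: [Y plane][VU interleaved]   I420: [Y plane][U plane][V plane]
fun nv21ToI420(nv21: ByteArray, width: Int, height: Int): ByteArray {
    val ySize = width * height
    val i420 = ByteArray(ySize * 3 / 2)
    // The luma plane is identical in both layouts
    System.arraycopy(nv21, 0, i420, 0, ySize)
    val uOffset = ySize
    val vOffset = ySize + ySize / 4
    var i = 0
    while (i < ySize / 4) {
        // NV21 stores V first, then U, interleaved after the Y plane
        i420[vOffset + i] = nv21[ySize + 2 * i]
        i420[uOffset + i] = nv21[ySize + 2 * i + 1]
        i++
    }
    return i420
}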

Camera2

From Android SDK API 21 onward, Google recommends using the Camera2 system to manage camera devices. Camera2 is quite different from Camera1. As before, let's first look at the class diagram of the Camera2 system.

[Figure: class diagram of the Camera2 system]

Camera2 is much more complicated than Camera1. CameraManager and CameraCaptureSession are the core classes of the Camera2 system: CameraManager manages opening and closing the camera, while CameraCaptureSession, newly introduced in Camera2, manages the capture session.

Let's walk through the flow in more detail below.

1. Release the Camera before starting it. The purpose of this step is to reset the Camera's state.

private fun releaseCamera() {
    imageReader?.close()
    cameraInstance?.close()
    captureSession?.close()
    imageReader = null
    cameraInstance = null
    captureSession = null
}

2. Get the Id of the Camera

/**
 * [1] Get the Camera Id
 */
private fun getCameraId(facing: Int): String? {
    return cameraManager.cameraIdList.find { id ->
        cameraManager.getCameraCharacteristics(id).get(CameraCharacteristics.LENS_FACING) == facing
    }
}

3. Open Camera

try {
    // [2] Open the Camera; the CameraDeviceCallback() passed in is the camera device state callback
    cameraManager.openCamera(cameraId, CameraDeviceCallback(), null)
} catch (e: CameraAccessException) {
    Log.e(TAG, "Opening camera (ID: $cameraId) failed.")
}

// Device state callback
private inner class CameraDeviceCallback : CameraDevice.StateCallback() {
    override fun onOpened(camera: CameraDevice) {
        cameraInstance = camera
        // [3] Start the capture session
        startCaptureSession()
    }

    override fun onDisconnected(camera: CameraDevice) {
        camera.close()
        cameraInstance = null
    }

    override fun onError(camera: CameraDevice, error: Int) {
        camera.close()
        cameraInstance = null
    }
}

4. Start the capture session

// [3] Start the capture session
private fun startCaptureSession() {
    val size = chooseOptimalSize()
    // Create the ImageReader and set its callback
    imageReader =
        ImageReader.newInstance(size.width, size.height, ImageFormat.YUV_420_888, 2).apply {
            setOnImageAvailableListener({ reader ->
                val image = reader?.acquireNextImage() ?: return@setOnImageAvailableListener
                onPreviewFrame?.invoke(image.generateNV21Data(), image.width, image.height)
                image.close()
            }, null)
        }

    try {
        if (surfaceHolder == null) {
            // Give the ImageReader's surface to cameraInstance so that preview data is rendered
            // onto it later, which triggers the ImageReader callback
            cameraInstance?.createCaptureSession(
                listOf(imageReader!!.surface),
                // [4] CaptureStateCallback extends CameraCaptureSession.StateCallback and is the
                // camera session state callback
                CaptureStateCallback(),
                null
            )
        } else {
            cameraInstance?.createCaptureSession(
                listOf(imageReader!!.surface, surfaceHolder!!.surface),
                CaptureStateCallback(),
                null
            )
        }
    } catch (e: CameraAccessException) {
        Log.e(TAG, "Failed to start camera session")
    }
}

// Camera session state callback
private inner class CaptureStateCallback : CameraCaptureSession.StateCallback() {
    override fun onConfigureFailed(session: CameraCaptureSession) {
        Log.e(TAG, "Failed to configure capture session.")
    }

    // The camera has been configured
    override fun onConfigured(session: CameraCaptureSession) {
        cameraInstance ?: return
        captureSession = session
        // Build the preview CaptureRequest
        val builder = cameraInstance!!.createCaptureRequest(CameraDevice.TEMPLATE_PREVIEW)
        builder.addTarget(imageReader!!.surface)
        surfaceHolder?.let {
            builder.addTarget(it.surface)
        }

        try {
            // Start the repeating preview request
            session.setRepeatingRequest(builder.build(), null, null)
        } catch (e: CameraAccessException) {
            Log.e(TAG, "Failed to start camera preview because it couldn't access camera", e)
        } catch (e: IllegalStateException) {
            Log.e(TAG, "Failed to start camera preview.", e)
        }
    }
}
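
Note that chooseOptimalSize(), called in startCaptureSession() above, is a helper from the demo and is not shown here. A simplified stand-in might look like the sketch below; cameraManager and cameraId are assumed to be the same fields used above, the selection policy (the first YUV_420_888 output size within 1920x1080) is only an example, and it uses android.util.Size, android.graphics.ImageFormat and CameraCharacteristics:

private fun chooseOptimalSize(): Size {
    val characteristics = cameraManager.getCameraCharacteristics(cameraId)
    val map = characteristics.get(CameraCharacteristics.SCALER_STREAM_CONFIGURATION_MAP)
    // Fall back to a fixed size if no stream configuration map is available
    val sizes = map?.getOutputSizes(ImageFormat.YUV_420_888) ?: return Size(1280, 720)
    // Pick the first supported YUV_420_888 output size that fits within 1920x1080
    return sizes.firstOrNull { it.width <= 1920 && it.height <= 1080 } ?: sizes[0]
}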

PS

ImageReader provides direct access to the image data rendered onto its Surface. The way it works is simple: create an instance and set a callback; the callback is invoked whenever a new image is available on the Surface associated with the ImageReader.
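
The generateNV21Data() call used in the session code above is an extension function from the demo, not a framework API. A simplified version might look like the sketch below; it assumes the common case where the chroma planes have a pixelStride of 2 and there is no row padding, whereas production code must honor every plane's rowStride and pixelStride:

import android.media.Image

// Repack a YUV_420_888 Image into an NV21 byte array (simplified sketch)
fun Image.generateNV21Data(): ByteArray {
    val ySize = width * height
    val nv21 = ByteArray(ySize * 3 / 2)
    // Copy the Y plane as-is (assumes rowStride == width)
    planes[0].buffer.get(nv21, 0, ySize)
    val uBuffer = planes[1].buffer
    val vBuffer = planes[2].buffer
    var out = ySize
    var i = 0
    // NV21 expects interleaved V and U samples after the Y plane
    while (out + 1 < nv21.size && i < vBuffer.remaining() && i < uBuffer.remaining()) {
        nv21[out++] = vBuffer.get(i) // V sample
        nv21[out++] = uBuffer.get(i) // U sample
        i += 2                       // pixelStride == 2 (assumption)
    }
    return nv21
}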

Above we analyzed how data is collected from the Camera. For the complete code, please see the GitHub address at the end of the article.

AudioRecord

Having covered video, let's look at audio. For recording we use the AudioRecord API. The recording process is much simpler than video capture. As before, let's look at the simple class diagram first.

[Figure: class diagram of AudioRecord]

There is just one class, and the API is simple and clear. Let's take a look at the process.

[Figure: AudioRecord recording flow]

The code is shown below:

public void startRecord() {
    // Start recording
    mAudioRecord.startRecording();
    mIsRecording = true;
    // Poll for audio data on a new thread
    ExecutorService executorService = Executors.newSingleThreadExecutor();
    executorService.execute(new Runnable() {
        @Override
        public void run() {
            byte[] buffer = new byte[DEFAULT_BUFFER_SIZE_IN_BYTES];
            while (mIsRecording) {
                int len = mAudioRecord.read(buffer, 0, DEFAULT_BUFFER_SIZE_IN_BYTES);
                if (len > 0) {
                    byte[] data = new byte[len];
                    System.arraycopy(buffer, 0, data, 0, len);
                    // Process data here
                }
            }
        }
    });
}

public void stopRecord() {
    mIsRecording = false;
    // Stop the AAC encoder used in the accompanying demo
    mAACMediaCodecEncoder.stopEncoder();
    mAudioRecord.stop();
}

The byte[] data generated by AudioRecord is PCM audio data.
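
The snippet above assumes that mAudioRecord and DEFAULT_BUFFER_SIZE_IN_BYTES have already been prepared elsewhere. A typical setup, shown here in Kotlin like the camera code above, is only a sketch with commonly chosen values (44.1 kHz, mono, 16-bit PCM; the RECORD_AUDIO permission must already be granted, and the helper name createAudioRecord is ours):

import android.media.AudioFormat
import android.media.AudioRecord
import android.media.MediaRecorder

// Typical AudioRecord configuration: 44.1 kHz, mono, 16-bit PCM
const val SAMPLE_RATE = 44100

// Use a bit more than the minimum buffer size reported by the system
val DEFAULT_BUFFER_SIZE_IN_BYTES = AudioRecord.getMinBufferSize(
    SAMPLE_RATE,
    AudioFormat.CHANNEL_IN_MONO,
    AudioFormat.ENCODING_PCM_16BIT
) * 2

fun createAudioRecord(): AudioRecord = AudioRecord(
    MediaRecorder.AudioSource.MIC,   // record from the microphone
    SAMPLE_RATE,
    AudioFormat.CHANNEL_IN_MONO,
    AudioFormat.ENCODING_PCM_16BIT,
    DEFAULT_BUFFER_SIZE_IN_BYTES
)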

Summary

In this chapter we gave a detailed introduction to the native audio and video input APIs, which is the foundation for the rest of this series. Once we have YUV and PCM data, we can encode it. In the next article, we will analyze MediaCodec and use it to hard-encode the raw audio and video data into an MP4 file.

