Introduction: The Android VDM (Video Device Manager) module in WebRTC refers to how WebRTC manages video capture, encoding, decoding, and rendering on top of the Android system. When you pick up an Android phone and use the NetEase Yunxin SDK for RTC communication, have you ever wondered how the Android VDM is implemented, and how WebRTC uses it? This article breaks down and walks through the implementation of Android VDM in WebRTC.

Text|Iven
Senior Audio and Video Client Development Engineer, NetEase Yunxin

01 Introduction to Android Graphics System

A video is a sequence of images. In the Android system, the carrier of an image is the Surface. A Surface can be understood as a drawing buffer in Android system memory: no matter which rendering API a developer uses, everything is rendered onto a Surface. Android capture, encoding, decoding, and rendering are all built on processing Surfaces. A Surface represents the producer side of a buffer queue, and the buffer queue is typically consumed by SurfaceFlinger or by an OpenGL ES-based consumer for display. Every window created on the Android platform is backed by a Surface. All visible, rendered Surfaces are composited by SurfaceFlinger and finally shown on the phone's screen.

Producer and consumer models in the Android graphics system

As mentioned earlier, a Surface represents the producer in a buffer queue, corresponding to the Producer in the figure below. BufferQueue is the glue between Android graphics components: it is a queue that connects a component that generates graphics buffers (the producer) to a component that receives the data for display or further processing (the consumer). Once the producer hands off its buffer, the consumer takes over responsibility for processing the produced content.

An image-stream producer can be anything that generates graphics buffers for consumption, for example Canvas 2D or the mediaserver video decoder. The most common consumer of image streams is SurfaceFlinger, the system service that consumes the currently visible Surfaces and composites them onto the screen using information provided by the window manager.

The data pipeline of the whole Android graphics stack is very long, so the roles of producer and consumer are relative, and the same module can act as both. OpenGL ES can act as a producer that supplies an image stream captured by the camera, or as a consumer that consumes an image stream decoded by a video decoder.

1.png

Android graphics display pipeline

SurfaceFlinger is the core of Android screen display. The SurfaceFlinger process is created by the init process; it merges the final drawing results of all applications in the system into one and displays the result on the physical screen.

Before the final image reaches SurfaceFlinger, multiple producers work at the same time, that is, multiple Surfaces send data to SurfaceFlinger through BufferQueues. As shown in the figure below, these include the Status Bar, System Bar, Icons/Widgets, and so on; yet no matter how many producers there are, SurfaceFlinger, as the consumer, eventually merges them into a single output.

SurfaceFlinger's hardware-accelerated composition is worth mentioning: if all composition were handed to SurfaceFlinger (that is, to the GPU), the GPU load would grow, so most phones now support hardware-accelerated composition through HWComposer. HWComposer offloads work from the GPU onto dedicated hardware, helping SurfaceFlinger composite Surfaces quickly and efficiently.

2.png

Taking the system Camera as an example, the screenshot shown in the figure below actually corresponds to 6 Surfaces. 4 of them are marked in the figure; the other 2 are not visible, so they are not marked. The Surface that displays the camera data (the SurfaceView layer) accounts for most of SurfaceFlinger's composition work, because the camera content changes continuously and has to be redrawn by the GPU.

3.png

So what are the other two invisible Surfaces? One is the NavigationBar shown in the picture below; because it is hidden, it is not visible in the screenshot.

The other is the Dim Layer: because the "USB for" dialog is placed on top, a darkened translucent layer is drawn over the window behind it, and this layer is also a separate Surface.

4.png

02 Android VDM in WebRTC

Having covered the Android graphics system, how does WebRTC manage and use the four VDM modules: Capturer, Encoder, Decoder, and Render? Following the producer-consumer model, they divide into:

  • Producer (green): Capturer, Decoder;
    Data captured by the Capturer or decoded by the Decoder is continuously written to the Surface of a SurfaceTexture, and the SurfaceTexture notifies consumers through OnFrameAvailableListener. In WebRTC, the Capturer is implemented with Camera1/Camera2 and the Decoder with MediaCodec.
  • Consumer (blue): Render, Encoder;
    The Surfaces of the Render and the Encoder are wrapped into EGLSurfaces through eglCreateWindowSurface(), and these EGLSurfaces share the same EGLContext as the SurfaceTexture. In this way, EGLSurface opens a data channel between the SurfaceTexture and the Render/Encoder: EGLSurface reads the SurfaceTexture's data and draws it with shaders, the current frame is submitted through eglSwapBuffers(), and the data finally lands on the Surfaces of the Render and the Encoder.

In WebRTC, the Render uses SurfaceView; a TextureView-based renderer is not implemented in the open-source WebRTC code, and interested readers can implement one themselves. The Encoder is implemented with MediaCodec.

5.png
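
To make the Surface/EGL plumbing described above more concrete, here is a minimal sketch (not WebRTC's actual code) of bridging a SurfaceTexture producer to a consumer Surface through EGL14; the EGLDisplay/EGLContext/EGLSurface setup and the OES shader drawing are assumed to exist elsewhere.

```java
import android.graphics.SurfaceTexture;
import android.opengl.EGL14;
import android.opengl.EGLContext;
import android.opengl.EGLDisplay;
import android.opengl.EGLSurface;
import android.view.Surface;

// Minimal sketch: bridge a SurfaceTexture (producer side) to a consumer Surface
// (Render/Encoder input) through EGL. Error handling and shader code are omitted.
public class SurfaceTextureBridge implements SurfaceTexture.OnFrameAvailableListener {
    private final int oesTextureId;                // GL_TEXTURE_EXTERNAL_OES texture created elsewhere
    private final SurfaceTexture surfaceTexture;   // fed by the Capturer or the MediaCodec decoder
    private final EGLDisplay eglDisplay;           // assumed already initialized
    private final EGLContext sharedContext;        // shared with the context that owns the texture
    private final EGLSurface consumerEglSurface;   // eglCreateWindowSurface() on the Render/Encoder Surface

    public SurfaceTextureBridge(int oesTextureId, EGLDisplay display,
                                EGLContext context, EGLSurface consumerSurface) {
        this.oesTextureId = oesTextureId;
        this.eglDisplay = display;
        this.sharedContext = context;
        this.consumerEglSurface = consumerSurface;
        this.surfaceTexture = new SurfaceTexture(oesTextureId);
        // The listener fires every time the producer queues a new frame.
        this.surfaceTexture.setOnFrameAvailableListener(this);
    }

    /** The Surface handed to Camera1/Camera2, or to MediaCodec.configure() as decoder output. */
    public Surface producerSurface() {
        return new Surface(surfaceTexture);
    }

    @Override
    public void onFrameAvailable(SurfaceTexture st) {
        // In real code this must run on the thread that owns the EGL context.
        EGL14.eglMakeCurrent(eglDisplay, consumerEglSurface, consumerEglSurface, sharedContext);
        surfaceTexture.updateTexImage();          // latch the latest producer frame into the OES texture
        // drawOesTexture(oesTextureId);          // draw with a GL_TEXTURE_EXTERNAL_OES shader (omitted)
        EGL14.eglSwapBuffers(eglDisplay, consumerEglSurface); // push the frame to the consumer Surface
    }
}
```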

Capture

On Android, WebRTC mainly uses camera capture and screen capture. WebRTC also supports external capture, such as feeding in texture or buffer data from outside, but external capture does not depend on native Android system features, so it is beyond the scope of this article.

  • Camera capture

The system camera architecture has gone through multiple iterations; currently there are three ways to use it: Camera1, Camera2, and CameraX.

Camera1 targets systems before Android 5.0 and is relatively simple to use, though the parameters developers can set are limited. Texture data can be obtained through SurfaceTexture; if buffer data is needed for video pre-processing or software encoding, the camera's preview stream format can be set so that it delivers NV21 data.
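
As a rough illustration (not WebRTC's own Camera1 capturer), the sketch below opens Camera1, binds the preview to a SurfaceTexture for the texture path, and also requests NV21 preview buffers; the camera index, preview size, and error handling are simplified assumptions.

```java
import android.graphics.ImageFormat;
import android.graphics.SurfaceTexture;
import android.hardware.Camera;

// Sketch: Camera1 capture delivering both texture frames (via SurfaceTexture)
// and NV21 buffers (via the preview callback). Deprecated API, shown for illustration.
public class Camera1Sketch {
    public Camera start(SurfaceTexture surfaceTexture) throws Exception {
        Camera camera = Camera.open(0);                 // back camera, index assumed
        Camera.Parameters params = camera.getParameters();
        params.setPreviewFormat(ImageFormat.NV21);      // buffer path: NV21 frames
        params.setPreviewSize(1280, 720);               // must be one of the supported sizes
        camera.setParameters(params);

        camera.setPreviewTexture(surfaceTexture);       // texture path: frames go to the SurfaceTexture
        camera.setPreviewCallback(new Camera.PreviewCallback() {
            @Override
            public void onPreviewFrame(byte[] nv21, Camera cam) {
                // NV21 buffer for pre-processing / software encoding.
            }
        });
        camera.startPreview();
        return camera;
    }
}
```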

With Android 5.0, Google launched Camera2, a new set of camera APIs. It is more complex to develop against than Camera1 but offers many more camera control parameters. Texture data can still be obtained through SurfaceTexture; if buffer data is needed for video pre-processing or software encoding, a listener can be set on an ImageReader to obtain I420/RGBA and other formats.
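
For the Camera2 buffer path, the key piece is an ImageReader whose Surface is added as an output target of the capture session. Below is a minimal sketch of just that part; the CameraDevice/capture-session setup is omitted, and the size is an assumed value.

```java
import android.graphics.ImageFormat;
import android.media.Image;
import android.media.ImageReader;
import android.os.Handler;
import android.view.Surface;

// Sketch: Camera2 buffer path. The ImageReader's Surface is added (together with the
// SurfaceTexture's Surface for the texture path) as an output target of the capture session.
public class Camera2BufferSketch {
    private ImageReader reader;

    public Surface createYuvReaderSurface(Handler cameraHandler) {
        reader = ImageReader.newInstance(1280, 720, ImageFormat.YUV_420_888, /* maxImages= */ 2);
        reader.setOnImageAvailableListener(r -> {
            Image image = r.acquireLatestImage();
            if (image == null) {
                return;
            }
            // image.getPlanes() exposes the Y/U/V planes; convert to I420 here for
            // pre-processing or software encoding.
            image.close();
        }, cameraHandler);
        // Pass this Surface to CameraDevice.createCaptureSession() and add it as a
        // target of the repeating CaptureRequest (session setup omitted here).
        return reader.getSurface();
    }
}
```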

CameraX is provided by a Jetpack support library and is simpler to use than Camera2. Judging from its source code, it mainly wraps the Camera1/Camera2 implementations so that users do not have to decide when to use Camera1 and when to use Camera2. CameraX lets users focus on the captured data itself rather than on complicated call sequences and troublesome compatibility/stability issues. CameraX is not used in the WebRTC source code; interested readers can study it on their own.

  • Screen capture

Since Android 5.0, Google has provided the screen-sharing API MediaProjection. A screen-recording permission dialog pops up, and recording can start only after the user agrees. When targetSdkVersion is 29 or higher, the system tightens the restrictions on screen capture: the corresponding foreground service must be started before getMediaProjection() can be called successfully. Data acquisition is similar to Camera2: texture data is obtained through SurfaceTexture, or I420/RGBA data through ImageReader. The author tried to obtain I420 during screen sharing without success; it appears that most phones do not support outputting I420 for screen capture. The capture frame rate of screen sharing cannot be controlled directly: the general rule is that the frame rate drops when the screen is static, while a moving picture can reach up to 60 fps. If the aspect ratio of the Surface does not match the screen's aspect ratio, black borders may appear on some phones.
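
A condensed sketch of the MediaProjection flow described above: the permission intent, then a VirtualDisplay that outputs into the capture Surface. The output Surface is assumed to come from a SurfaceTexture or ImageReader as discussed, and the foreground-service requirement is only indicated in comments.

```java
import android.app.Activity;
import android.content.Context;
import android.content.Intent;
import android.hardware.display.DisplayManager;
import android.hardware.display.VirtualDisplay;
import android.media.projection.MediaProjection;
import android.media.projection.MediaProjectionManager;
import android.view.Surface;

// Sketch: screen capture with MediaProjection. The user must grant the permission dialog,
// and on targetSdkVersion >= 29 a foreground service must be running before
// getMediaProjection() is called.
public class ScreenCaptureSketch {
    private static final int REQUEST_CODE = 1001; // arbitrary request code for this sketch

    public void requestPermission(Activity activity) {
        MediaProjectionManager manager = (MediaProjectionManager)
                activity.getSystemService(Context.MEDIA_PROJECTION_SERVICE);
        activity.startActivityForResult(manager.createScreenCaptureIntent(), REQUEST_CODE);
    }

    // Call from onActivityResult() after the user has agreed (and after starting the
    // mediaProjection-type foreground service when targeting API 29+).
    public VirtualDisplay startCapture(Activity activity, int resultCode, Intent data,
                                       Surface outputSurface, int width, int height, int dpi) {
        MediaProjectionManager manager = (MediaProjectionManager)
                activity.getSystemService(Context.MEDIA_PROJECTION_SERVICE);
        MediaProjection projection = manager.getMediaProjection(resultCode, data);
        // Frames are produced into outputSurface (from a SurfaceTexture or ImageReader).
        return projection.createVirtualDisplay("screen-capture",
                width, height, dpi,
                DisplayManager.VIRTUAL_DISPLAY_FLAG_AUTO_MIRROR,
                outputSurface, /* callback= */ null, /* handler= */ null);
    }
}
```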

Codec

When it comes to Android MediaCodec, this diagram is always brought up. MediaCodec's job is to process input data and produce output data: the client obtains an empty input buffer, fills it with data, and hands it to the codec; the codec processes the input asynchronously and hands filled output buffers to the consumer; once the consumer has consumed an output buffer, it returns the buffer to the codec.
6.png
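
As a hedged sketch of that input/output buffer loop (synchronous mode, with format configuration, end-of-stream handling, and error handling omitted):

```java
import java.nio.ByteBuffer;
import android.media.MediaCodec;

// Sketch: the synchronous MediaCodec buffer loop. The client feeds empty input buffers
// and returns consumed output buffers to the codec.
public class CodecLoopSketch {
    public void drive(MediaCodec codec, byte[] inputData, long ptsUs) {
        // Producer side: obtain an empty input buffer, fill it, queue it back to the codec.
        int inIndex = codec.dequeueInputBuffer(10_000 /* timeoutUs */);
        if (inIndex >= 0) {
            ByteBuffer in = codec.getInputBuffer(inIndex);
            in.clear();
            in.put(inputData);
            codec.queueInputBuffer(inIndex, 0, inputData.length, ptsUs, 0 /* flags */);
        }

        // Consumer side: take a filled output buffer, consume it, release it back to the codec.
        MediaCodec.BufferInfo info = new MediaCodec.BufferInfo();
        int outIndex = codec.dequeueOutputBuffer(info, 10_000 /* timeoutUs */);
        if (outIndex >= 0) {
            ByteBuffer out = codec.getOutputBuffer(outIndex);
            // ... consume info.size bytes starting at info.offset ...
            codec.releaseOutputBuffer(outIndex, false /* render */);
        }
    }
}
```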

When encoding, if the input is a texture, you obtain an input Surface from MediaCodec and draw the texture onto that Surface through an EGLSurface. This path stays on the Android Surface drawing pipeline from start to finish and is considered the most efficient.
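
A minimal sketch of this encoder texture path, assuming H.264 and placeholder bitrate/frame-rate values:

```java
import android.media.MediaCodec;
import android.media.MediaCodecInfo;
import android.media.MediaFormat;
import android.view.Surface;

// Sketch: hardware encoder with Surface input. The texture is drawn onto the returned
// input Surface through EGL (eglCreateWindowSurface + eglSwapBuffers).
public class SurfaceEncoderSketch {
    public Surface start(int width, int height) throws Exception {
        MediaFormat format = MediaFormat.createVideoFormat(MediaFormat.MIMETYPE_VIDEO_AVC, width, height);
        format.setInteger(MediaFormat.KEY_COLOR_FORMAT,
                MediaCodecInfo.CodecCapabilities.COLOR_FormatSurface); // input comes from a Surface
        format.setInteger(MediaFormat.KEY_BIT_RATE, 1_500_000);
        format.setInteger(MediaFormat.KEY_FRAME_RATE, 30);
        format.setInteger(MediaFormat.KEY_I_FRAME_INTERVAL, 2);

        MediaCodec encoder = MediaCodec.createEncoderByType(MediaFormat.MIMETYPE_VIDEO_AVC);
        encoder.configure(format, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE);
        Surface inputSurface = encoder.createInputSurface(); // must be called between configure() and start()
        encoder.start();
        return inputSurface; // wrap with eglCreateWindowSurface() and draw textures into it
    }
}
```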

When decoding, if you want the output as a texture, you set the Surface of a SurfaceTexture on MediaCodec, and MediaCodec, as the producer, continuously delivers decoded data to the SurfaceTexture. This path likewise stays on the Android Surface drawing pipeline from start to finish and is considered the most efficient.
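
The mirror-image decoder sketch, where a Surface created from the SurfaceTexture is handed to configure() so decoded frames flow straight into the texture (codec-specific data and the input feeding loop are omitted):

```java
import android.graphics.SurfaceTexture;
import android.media.MediaCodec;
import android.media.MediaFormat;
import android.view.Surface;

// Sketch: hardware decoder with Surface output. Decoded frames are produced into the
// SurfaceTexture; a frame is delivered when releaseOutputBuffer(index, true) is called.
public class SurfaceDecoderSketch {
    public MediaCodec start(SurfaceTexture surfaceTexture, int width, int height) throws Exception {
        MediaFormat format = MediaFormat.createVideoFormat(MediaFormat.MIMETYPE_VIDEO_AVC, width, height);
        MediaCodec decoder = MediaCodec.createDecoderByType(MediaFormat.MIMETYPE_VIDEO_AVC);
        decoder.configure(format, new Surface(surfaceTexture), null, 0); // output goes to the SurfaceTexture
        decoder.start();
        // Feed encoded input buffers as in the buffer loop above; when releasing output
        // buffers, pass render = true so the frame is handed to the SurfaceTexture.
        return decoder;
    }
}
```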

In addition to the efficient texture-based path, MediaCodec can decode compressed video data into NV12 buffers and also supports encoding I420/NV12 buffers.

Besides the MediaCodec-based hardware codecs, the WebRTC source code also implements software codecs; its software/hardware switching strategy strikes a good balance between performance and stability.

In fact, MediaCodec itself also contains software codecs: its underlying implementation is based on the open-source OpenMAX framework and integrates multiple hardware and software codecs. In practice, however, the software codecs that ship with the Android system are rarely used; we mostly rely on the hardware codecs.

7.png

When using a MediaCodec hardware codec, you can query codec-related information. Taking the "OMX.MTK.VIDEO.ENCODER.AVC" encoder as an example, as shown below, MediaCodecInfo can provide the encoder name, supported color formats, encoding profiles/levels, and the maximum number of instances that can be created.

8.png
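
The information shown above can be queried programmatically; a short sketch using MediaCodecList/MediaCodecInfo follows (getMaxSupportedInstances() requires API 23):

```java
import android.media.MediaCodecInfo;
import android.media.MediaCodecList;
import android.media.MediaFormat;
import android.util.Log;

// Sketch: enumerate the H.264 encoders on the device and dump their capabilities.
public class CodecInfoSketch {
    public void dumpAvcEncoders() {
        MediaCodecList list = new MediaCodecList(MediaCodecList.ALL_CODECS);
        for (MediaCodecInfo info : list.getCodecInfos()) {
            if (!info.isEncoder()) {
                continue;
            }
            for (String type : info.getSupportedTypes()) {
                if (!MediaFormat.MIMETYPE_VIDEO_AVC.equalsIgnoreCase(type)) {
                    continue;
                }
                MediaCodecInfo.CodecCapabilities caps = info.getCapabilitiesForType(type);
                Log.d("CodecInfo", "name=" + info.getName()
                        + " colorFormats=" + java.util.Arrays.toString(caps.colorFormats)
                        + " profileLevels=" + caps.profileLevels.length
                        + " maxInstances=" + caps.getMaxSupportedInstances()); // API 23+
            }
        }
    }
}
```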

Rendering

SurfaceView: available since Android 1.0 (API level 1). Unlike an ordinary View, a SurfaceView has its own Surface, managed through a SurfaceHolder. Video content can be rendered onto this Surface from a separate thread without affecting the main thread's responsiveness to events, but the view cannot be moved, rotated, scaled, or animated.

TextureView: introduced in Android 4.0. It can be moved, rotated, scaled, and animated like a normal View, but it must live in a hardware-accelerated window. When other Views sit on top of a TextureView, updating the TextureView's content triggers those Views to redraw, which inevitably adds performance overhead. In RTC scenarios, control buttons are usually layered on top of the video playback window, so using SurfaceView generally has the performance advantage.
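
As a rough, simplified illustration of rendering on a SurfaceView's own Surface from a separate thread (the Canvas drawing here is only a placeholder; WebRTC's SurfaceViewRenderer uses an EGL/GL thread instead):

```java
import android.graphics.Canvas;
import android.graphics.Color;
import android.view.SurfaceHolder;
import android.view.SurfaceView;

// Sketch: draw on a SurfaceView's Surface from a worker thread via SurfaceHolder,
// leaving the main thread free to handle events.
public class SurfaceViewRenderSketch implements SurfaceHolder.Callback {
    private Thread renderThread;
    private volatile boolean running;

    public void attach(SurfaceView surfaceView) {
        surfaceView.getHolder().addCallback(this);
    }

    @Override
    public void surfaceCreated(final SurfaceHolder holder) {
        running = true;
        renderThread = new Thread(() -> {
            while (running) {
                Canvas canvas = holder.lockCanvas();
                if (canvas != null) {
                    canvas.drawColor(Color.BLACK); // placeholder for drawing a video frame
                    holder.unlockCanvasAndPost(canvas);
                }
            }
        }, "render-thread");
        renderThread.start();
    }

    @Override
    public void surfaceChanged(SurfaceHolder holder, int format, int width, int height) {}

    @Override
    public void surfaceDestroyed(SurfaceHolder holder) {
        running = false;
        try {
            renderThread.join(); // stop drawing before the Surface is released
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```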

03 Cross-platform engineering implementation of VDM

When speaking of WebRTC, its cross-platform nature has to be mentioned. So how does Android VDM work through the cross-platform framework?

As the author understands it, the implementation of Android VDM in WebRTC is divided into 4 layers, from top to bottom: Android SDK Application, Java API, C++ Wrapper, All In One API.

9.png

  • All In One API:

Those familiar with WebRTC know that its cross-platform code is implemented in C/C++, because C/C++ is highly portable across platforms. WebRTC abstracts the Encoder/Decoder/Capturer/Render modules of the various platforms (Android/iOS/Windows/Mac) into the All In One API, and each platform implements the corresponding functions on its own operating system based on these APIs. One has to admire the power of C++ polymorphism. Built on top of the All In One API, WebRTC implements media data transmission after the PeerConnection is established, codec policy control, simulcast (large/small streams), and main/auxiliary stream switching. The All In One API is the foundation on which the entire audio and video communication is built.

  • C++ Wrapper:

This layer wraps the corresponding Android Java modules at the native layer; the wrapper classes inherit from the corresponding All In One API classes and are implemented in C++. Through the wrappers, the C++ layer can access Android Java objects transparently; technically this is achieved through Android JNI. Take VideoEncoderWrapper as an example: it wraps the Java VideoEncoder object and inherits from the All In One API's VideoEncoder, so a call to the All In One API's VideoEncoder ends up in the concrete Android Java implementation. Besides Android, other platforms can be wrapped in this layer in the same way. This layer can be called a melting pot in which the platform-specific parts of Android/iOS/Windows/Mac are encapsulated; the C++ Wrapper layer is truly the bridge between WebRTC's cross-platform layer and each platform's concrete implementation.

  • Java API

This layer provides the API interfaces that WebRTC exposes in Java. By building on these APIs, the Android SDK Application implementation gains good extensibility. For example, CameraVideoCapturer and ScreenCapturerAndroid implement camera and screen capture on top of VideoCapturer; when later maintainers want to add another video capture method, they can achieve the same extensibility through VideoCapturer. Likewise, the SurfaceViewRenderer in the figure implements VideoSink; developers who want a TextureView-based renderer can also implement one quickly via VideoSink, as sketched below.
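
Below is a hedged skeleton of such a TextureView-based renderer plugged in through the Java API's VideoSink interface; the GL drawing onto the TextureView's SurfaceTexture is left out, and this is only an illustrative sketch, not WebRTC code.

```java
import android.view.TextureView;
import org.webrtc.VideoFrame;
import org.webrtc.VideoSink;

// Skeleton: a TextureView-based renderer hooked in via org.webrtc.VideoSink.
// The actual GL rendering onto the TextureView's SurfaceTexture is omitted.
public class TextureViewRenderer implements VideoSink {
    private final TextureView textureView;

    public TextureViewRenderer(TextureView textureView) {
        this.textureView = textureView;
        // In a full implementation: listen for the TextureView's SurfaceTexture,
        // create an EGL surface on it, and draw incoming frames there.
    }

    @Override
    public void onFrame(VideoFrame frame) {
        frame.retain();               // keep the frame alive while it is queued for rendering
        // post to the render thread, draw frame.getBuffer() with GL, then:
        frame.release();
    }
}
```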

  • Android SDK Application

This layer is where the real Android VDM lives: the concrete implementation of Encode/Decode/Capture/Render on top of the Android SDK APIs, the layer closest to the Android system. One point worth noting in this layer: for the Capturer/Encoder/Decoder, object creation and destruction are triggered by the cross-platform layer, whereas the Render object is created in Java and then actively passed down to the cross-platform layer. Special attention must therefore be paid to the creation and destruction of the Render to avoid dangling pointers.

04 VDM parameter adaptation and optimization in RTC scenarios

The previous chapter mentioned that the Android SDK Application layer is where the concrete functionality is implemented, while the All In One API layer is an abstraction over all platforms, so callers of that layer should not have to care about platform-specific compatibility issues. For the Android system, however, compatibility issues cannot be avoided. Therefore, if the Android VDM is to keep running stably under the complex call patterns driven by the All In One API layer, compatibility adaptation cannot be ignored, and a reasonably complete adaptation framework is needed: online configuration delivery, local configuration reading, logic handling at the code level, and so on, fully adapted and optimized for different device models, CPU models, Android system versions, and business scenarios. The figure below shows the framework for delivering compatibility configurations: by maintaining a set of compatibility parameters and applying them to each VDM module through Compat, compatibility problems are resolved.

10.png
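
As a purely illustrative sketch of such a Compat layer (all class and field names here are hypothetical, not NetEase Yunxin's actual implementation), a delivered configuration could be matched against the device and applied to a codec module like this:

```java
import android.os.Build;

// Hypothetical sketch of a compatibility ("Compat") layer: a configuration delivered
// online (or read locally) is matched against the device and applied to a VDM module.
public class MediaCodecCompat {
    /** One delivered rule; in practice parsed from JSON pushed by the server. */
    public static class CompatRule {
        public String model;              // device model this rule applies to, e.g. Build.MODEL
        public boolean disableHwEncoder;  // fall back to software encoding on this model
        public int maxEncodeFps;          // cap the encode frame rate
    }

    private final java.util.List<CompatRule> rules;

    public MediaCodecCompat(java.util.List<CompatRule> rules) {
        this.rules = rules;
    }

    /** Decide whether the hardware encoder may be used on the current device. */
    public boolean hardwareEncoderAllowed() {
        for (CompatRule rule : rules) {
            if (Build.MODEL.equalsIgnoreCase(rule.model) && rule.disableHwEncoder) {
                return false; // known-problematic model: switch to the software encoder path
            }
        }
        return true;
    }
}
```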

05 Summary

This article introduced the Android display system, then moved on to WebRTC's VDM implementation on the Android platform, digging into the WebRTC source code and dissecting the Android VDM implementation into 4 layers, from top to bottom: Android SDK Application, Java API, C++ Wrapper, All In One API.

It also briefly introduced the engineering approach to the compatibility issues that cannot be ignored on Android. By analyzing how WebRTC implements Android VDM, we can gain a deeper understanding of the architecture of WebRTC's video system and of its cross-platform design.

About the author

Iven, a senior audio and video client development engineer at NetEase Yunxin, is mainly responsible for video engineering and Android VDM related work.

