
This article is based on the talk given by Li Yang ("Big Cousin"), head of Cocos Engine evangelism, at ECUG Meetup Phase 1 | 2021 Audio and Video Technology Best Practices · Hangzhou on June 26, 2021. The article runs to more than 5,800 words and is organized as follows:

  • Cocos engine introduction
  • Cocos Creator && Engine
  • Cocos Creator video player based on FFmpeg
  • Cocos audio and video playback and transformation

To get the full version of the speaker's slides, add the ECUG assistant on WeChat (WeChat ID: ECUGCON) with the note "ECUG PPT". Talks from the other speakers will also be published later, so stay tuned.


The following is the transcript of the talk:


Hello everyone, I'm Li Yang, head of Cocos engine evangelism. I have been writing code for about seven years, doing mobile game development based on the Cocos engine. One game you may be familiar with is Fishing Master.

I am now in charge of evangelism at Cocos. In my view, an evangelist is a developer's assistant and technical navigator: the job is mainly to help developers who build games on our engine work more efficiently, help them solve the problems they run into, and bring in support from the engine team for technical issues.

1. Introduction to Cocos Engine

Cocos is built around the idea that "technology drives the efficiency of the digital content industry". Since its founding in 2011, it has gathered 1.4 million registered developers worldwide and nearly 100,000 registered games. Games and apps made with the Cocos engine have reached roughly 1.6 billion devices across 203 countries and regions.


The figure above shows some of the Cocos engine's market share: 40% of Chinese mobile games, 64% of Chinese mini games, and 90% of online education apps. For online education, Cocos has a dedicated education editor with which interactive games can be built in a PPT-like way.


The left side of the figure above shows some data from May 2020, and the right side shows some game cases. You may have played "Sword and Crusade". "Animal Restaurant" is a creative WeChat mini game with more than 100 million users and strong sales. "World Conquest" was developed in one month, though admittedly by a fairly large team. The last game shown was built by a team of only three people, with just one programmer. So our engine and editor make it possible to develop games very efficiently and quickly.

We are involved in many areas, including education, in-vehicle systems, digital twins, online exhibitions, and so on. Partners such as Tencent, Huawei, and NetEase are all developing with the Cocos engine.

The Cocos editor supports visual editing: games are assembled from modules in a WYSIWYG manner.


The Cocos editor can build both 2D and 3D games. For the art workflow it also provides an animation editor, a particle system, and more, lowering the threshold of game development through simple modularization.


The picture above is our cross-platform workflow. 3ds Max and Maya are third-party tools used to make models; UI and animation are edited visually in the Cocos editor; logic code is written in a code IDE. Together they give a WYSIWYG development experience.

2. Cocos Creator && Engine


The picture above is the architecture of the Cocos Creator engine. Game Assets are our resources and Game Logic is the code we write. Below them sits the Cocos Creator Engine itself, covering rendering, the scene graph, resources and data, as well as effects, physics, componentization and UI, and finally publishing to the target platform. At the bottom are the rendering backends: Metal for Apple platforms, Vulkan for Android, and WebGL for the web.

The Cocos engine currently has a relatively large share in 2D mobile games, but it also performs well in 3D. Huawei HiSilicon helped us implement the deferred rendering pipeline, which I will introduce in more detail later and which produces fairly realistic results.


Our engine is 100% open source, and the code is on GitHub. The editor also has a preferences panel where you choose which engine version to use. Because the engine is open, you can modify it yourself: point the editor at your own copy of the TypeScript engine or the native C++ engine and build exactly the version you want. This gives all developers a more flexible, customizable entry point.


Let me briefly go over how rendering works. The CPU reads data from memory and sends the vertex data to the vertex shader; after clipping it reaches the fragment shader; after blending and testing, the result finally lands on the screen. That is the whole rendering process.


In the Cocos Creator renderer, RenderScene manages all the nodes in the engine, and FrameGraph describes the data dependencies. We have two pipelines here; one of them is Forward, i.e. forward rendering.

What is the problem with forward rendering? It is fairly linear: each piece of geometry is pushed down the pipeline, one at a time, to produce the final image. But models overlap. Say there is one model in front, one in the middle, and one in the back; the camera really only sees the front one. Forward rendering still shades all three, even though only one ends up visible, so the computation spent on the occluded parts is wasted.

In deferred rendering, shading is delayed until all the geometry has passed through the pipeline, and only then is the final image generated. Vertex data is passed to the vertex shader and then to the fragment shader, which stores the surface data in the G-Buffer. In the next stage, the G-Buffer data is fed through the shaders again; the program that processes the G-Buffer this time is different from the one in the previous stage, so there are two sets of logic, each handling the data differently.

Then we reach the fragment shader. The fragment shader works per screen pixel: it only shades the surface that will actually be displayed at that pixel, and anything occluded is outside the calculation entirely, which saves a lot of computation time.
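To make the two-stage idea concrete, here is a minimal CPU-side sketch of the deferred flow; the types and functions (GBufferTexel, geometryPass, lightingPass) are illustrative stand-ins, not the actual Cocos Creator pipeline code.

```cpp
// Illustrative sketch of the deferred flow described above; not engine code.
#include <cstddef>
#include <cstdint>
#include <vector>

// What the geometry pass leaves behind for every screen pixel.
struct GBufferTexel {
    float albedo[3] = {0, 0, 0};
    float normal[3] = {0, 0, 1};
    float depth     = 1.0f;   // 1.0 == nothing written here yet
};

struct Mesh { /* vertices, indices, material ... */ };

// Pass 1: rasterize all geometry, keep only the closest surface's attributes.
// No lighting happens here, so occluded fragments cost very little.
void geometryPass(const std::vector<Mesh>& scene, std::vector<GBufferTexel>& gbuf) {
    for (const Mesh& m : scene) {
        (void)m;
        // rasterize m; for each covered pixel, depth-test and store
        // albedo/normal/depth into gbuf[pixel]
    }
}

// Pass 2: shade each pixel exactly once, using only what is visible.
void lightingPass(const std::vector<GBufferTexel>& gbuf, std::vector<uint32_t>& frame) {
    for (std::size_t i = 0; i < gbuf.size(); ++i) {
        if (gbuf[i].depth >= 1.0f) continue;   // empty pixel, skip
        // expensive lighting runs once per visible pixel
        frame[i] = 0xFF000000;                 // placeholder shading result
    }
}
```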

I just mentioned that different platforms have different rendering interfaces, so we built a module called GFX. It wraps these interfaces behind one unified API. If developers had to deal with a different API for every platform they publish to, it would add a lot of time and work, so we encapsulated it: during development the same set of APIs is used regardless of platform, and when the game is built for release the appropriate rendering backend is selected.
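As a rough illustration of what "one API, many backends" means in practice, here is a tiny C++ sketch; the class and method names are invented for the example and are not the actual GFX interface.

```cpp
// Invented names for illustration; the real GFX module is much richer.
#include <memory>

class IGFXDevice {
public:
    virtual ~IGFXDevice() = default;
    virtual void beginFrame() = 0;
    virtual void draw(/* pipeline state, vertex/index buffers */) = 0;
    virtual void endFrame() = 0;
};

// One concrete backend per platform API, all behind the same interface.
class GLES3Device : public IGFXDevice {
public:
    void beginFrame() override { /* bind default framebuffer, clear ... */ }
    void draw() override       { /* glDrawElements ... */ }
    void endFrame() override   { /* eglSwapBuffers ... */ }
};

class VulkanDevice : public IGFXDevice {
public:
    void beginFrame() override { /* acquire swapchain image ... */ }
    void draw() override       { /* vkCmdDrawIndexed ... */ }
    void endFrame() override   { /* vkQueuePresentKHR ... */ }
};

// Game code never branches on the platform; the backend is picked at build
// or startup time.
std::unique_ptr<IGFXDevice> createDevice(bool preferVulkan) {
    if (preferVulkan) return std::make_unique<VulkanDevice>();
    return std::make_unique<GLES3Device>();
}
```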


This is the native rendering part; let me talk about the GFX Agent. Rendering commands are queued up on the CPU before being sent to the GPU. If there are, say, 20 commands in the queue, then each frame we read the queue, execute the commands in it, and remove them from the queue.

But sometimes there are simply too many commands. Normally we run at 60 frames per second, which is fine; but if the work cannot be finished within one frame's time budget, the commands the CPU sends to the GPU get stuck. If completing one rendering now takes two frames, the frame rate drops from 60 to 30; if it takes three frames, it drops straight from 60 to 20 (these are all real-time frame rates).

To address this, we built the GFX Agent. The idea is that we no longer encode GPU commands in real time on the current thread; instead, we only collect and package the commands and put them in a queue, and a separate thread is started to execute them.

That thread has its own queue. If the CPU is late sending instructions to the GPU (keep in mind that most stalls come from the GPU waiting on the CPU), a stall occurs. With the extra thread, even when the game thread has not yet handed new instructions over, there is still work queued up for the render thread to execute, so it does not cause a stall.
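Below is a minimal sketch of the record-and-submit pattern described here, assuming commands can be modelled as callables; the real GFX Agent encodes actual GPU command buffers, but the threading shape is similar.

```cpp
// Simplified model of "collect and package on the game thread, execute on a
// dedicated render thread". Not the actual GFX Agent implementation.
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class RenderAgent {
public:
    RenderAgent() : worker_([this] { run(); }) {}
    ~RenderAgent() {
        { std::lock_guard<std::mutex> lk(m_); done_ = true; }
        cv_.notify_one();
        worker_.join();
    }

    // Game thread: commands are only recorded, never executed here.
    void record(std::function<void()> cmd) { pending_.push_back(std::move(cmd)); }

    // Game thread, once per frame: hand the whole package to the render thread.
    void flush() {
        { std::lock_guard<std::mutex> lk(m_); packages_.push(std::move(pending_)); }
        pending_.clear();
        cv_.notify_one();
    }

private:
    void run() {  // render thread: keeps draining packages, so the GPU still
                  // has work even while the game thread prepares the next frame
        for (;;) {
            std::vector<std::function<void()>> pkg;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return done_ || !packages_.empty(); });
                if (done_ && packages_.empty()) return;
                pkg = std::move(packages_.front());
                packages_.pop();
            }
            for (auto& cmd : pkg) cmd();   // actually encode/submit GPU work
        }
    }

    std::vector<std::function<void()>> pending_;               // game thread only
    std::queue<std::vector<std::function<void()>>> packages_;  // shared
    std::mutex m_;
    std::condition_variable cv_;
    bool done_ = false;
    std::thread worker_;   // declared last so it starts after the other members
};
```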


Our current version is Cocos Creator v3, which brings several performance improvements:

  • First, multi-backend GFX for modern graphics interfaces.
  • Second, a load-balanced multi-threaded renderer: when commands need to be issued, we check the worker threads and place each task on whichever thread currently has the fewest tasks, so the load on each thread stays roughly even (see the sketch after this list).
  • Third, a highly customizable rendering pipeline.
  • Fourth, a high-performance deferred rendering pipeline contributed by Huawei.
  • Fifth, a memoryless architecture based on the TBR/TBDR GPUs found in mobile devices.
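The load-balancing rule in the second point can be sketched very simply: each new task goes to whichever worker currently has the shortest queue. A toy illustration (not engine code), assuming tasks can be modelled as callables:

```cpp
// Toy illustration of load-balanced task submission; not engine code.
#include <algorithm>
#include <deque>
#include <functional>
#include <vector>

struct Worker {
    std::deque<std::function<void()>> tasks;   // pending work for this thread
};

void submit(std::vector<Worker>& workers, std::function<void()> task) {
    // Pick the worker with the shortest queue so all threads stay similarly loaded.
    auto least = std::min_element(workers.begin(), workers.end(),
        [](const Worker& a, const Worker& b) { return a.tasks.size() < b.tasks.size(); });
    least->tasks.push_back(std::move(task));
}
```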

We know mobile devices are much more constrained by hardware; when the workload is very heavy, for example, the phone may heat up or respond with delay. TBR divides the image to be rendered into tiles, and the coordinates of each tile are stored in system memory as lists through an intermediate buffer. One of the keys to TBDR is that, on top of this tiling, the rendering itself is also deferred.

3. Cocos Creator video player based on FFmpeg

Next I will talk about the Cocos Creator video player based on FFmpeg, taking Happy Monkey as an example. Educational courseware contains interactive games, and those games inevitably need video, which is where the problems start.


For example, in this picture we use the same design, yet the visual result on Android, iOS, and the Web is inconsistent. Because the built-in video player is always rendered on top, I cannot use the Mask component to constrain its shape, and rounded-corner clipping is not supported. There is also an issue when switching scenes: after leaving a scene, an afterimage of the video component can remain. These are the problems we ran into when developing games with the engine's built-in player.

So, what is our solution? It is mainly divided into two parts: FFmpeg and Web Video Element.


On mobile we use FFmpeg for decoding and OpenGL for rendering; Android uses Oboe for audio playback and iOS uses AudioQueue. On the Web we use the video element for decoding, WebGL for video rendering, and the Web Audio API for audio playback. This is the overall idea of the solution.


This is the division of work. First, FFmpeg is used to decode the audio and video: we modified ffplay and turned it into AVPlayer. After decoding, the way the decoded audio and video are hooked up differs between Android, iOS, and the Web. Finally, there is a JSB-bound video component interface. Our engine is written in JavaScript, while the player is written in C++, so we must bind them through JSB so that JS can call objects written in another language.

Next is audio playback. As mentioned earlier, Android, iOS, and the Web each use different audio playback handling. Finally, there are some optimizations and extensions, such as playing while downloading, precise seek, and using libyuv instead of swscale.

Let me add a bit of background on why we do this. If we simply drop a whole third-party playback framework into the game, it has no way to interact with the components in our engine and no place in the node hierarchy: I cannot operate on the video, and the user's actions cannot interact with it. That is why we make changes based on FFmpeg instead. The cost is that every frame of the video has to be rendered through our engine, which raises efficiency concerns, and that is also why we use libyuv instead of swscale.
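For the libyuv point, converting a decoded YUV420P frame into the RGBA buffer that gets uploaded as a texture looks roughly like this (error handling and pixel-format checks omitted; the function name frameToRGBA is just for illustration):

```cpp
// Sketch: convert one decoded frame (AV_PIX_FMT_YUV420P assumed) to RGBA
// with libyuv instead of sws_scale.
extern "C" {
#include <libavutil/frame.h>
}
#include <libyuv.h>
#include <cstddef>
#include <cstdint>
#include <vector>

std::vector<uint8_t> frameToRGBA(const AVFrame* frame) {
    std::vector<uint8_t> rgba(static_cast<std::size_t>(frame->width) * frame->height * 4);
    // libyuv's "ABGR" layout stores bytes as R,G,B,A on little-endian devices,
    // which is what a GL_RGBA texture upload expects.
    libyuv::I420ToABGR(frame->data[0], frame->linesize[0],
                       frame->data[1], frame->linesize[1],
                       frame->data[2], frame->linesize[2],
                       rgba.data(), frame->width * 4,
                       frame->width, frame->height);
    return rgba;
}
```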

Let me talk about the AVPlayer transformation. After compiling the FFmpeg source code, we get three executable programs:

  • First, ffmpeg, used for audio/video format conversion, video editing, and so on;
  • Second, ffplay, used to play audio and video; it depends on SDL;
  • Third, ffprobe, used to analyze audio and video streaming data.

What are the problems? ffplay satisfies the basic need to play audio and video, but some issues surfaced during our transformation. For example, it is very inefficient at texture scaling and pixel format conversion, and it does not support AndroidSL file reading, so we modified these parts.


This is the overall design of the AVPlayer transformation. First, a function is called to initialize all of the state, and then a read thread and a refresh thread are created, along with the audio, video, and subtitle decoders. The refresh thread consumes what they produce: the picture queue for video, the sample queue for audio, and the string sequence for subtitles. That is the overall architecture of AVPlayer.
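As a condensed sketch of the demux/decode side (the real AVPlayer derived from ffplay is much larger: it runs decoding on separate threads and also handles audio, subtitles, seeking, and clock sync), the video path looks roughly like this with the modern FFmpeg API:

```cpp
// Condensed sketch of the read/decode path; not the full AVPlayer.
extern "C" {
#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>
}

// Read loop: pull packets from the container, decode video packets, and hand
// each frame to the picture queue that the refresh thread consumes.
void readLoop(AVFormatContext* fmt, AVCodecContext* videoDec, int videoStreamIndex) {
    AVPacket* pkt   = av_packet_alloc();
    AVFrame*  frame = av_frame_alloc();
    while (av_read_frame(fmt, pkt) >= 0) {
        if (pkt->stream_index == videoStreamIndex &&
            avcodec_send_packet(videoDec, pkt) == 0) {
            // One packet can yield zero or more frames; drain them all.
            while (avcodec_receive_frame(videoDec, frame) == 0) {
                /* push a reference to `frame` into the picture queue */
            }
        }
        av_packet_unref(pkt);
    }
    av_frame_free(&frame);
    av_packet_free(&pkt);
}
```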


This is the JSB binding of the video component interface. The reason for the JSB binding is to let JS call C++ objects: since the engine's language and the video player's language are not the same, we have to give them a way to call each other, and that is done by creating bindings on the JSB side. We also wrapped everything in a class called Video; this is its UML.
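For readers unfamiliar with JSB, a manual binding in the se:: script-binding layer used by Cocos Creator's native runtime has roughly the following shape. The header path, the constructor handling, and the Video methods here are simplified placeholders; exact details vary by engine version.

```cpp
// Rough shape of a manual JSB binding; simplified for illustration.
#include "scripting/js-bindings/jswrapper/SeApi.h"   // path differs per engine version

class Video { public: void play() {} void pause() {} };

static bool js_Video_play(se::State& s) {
    auto* self = static_cast<Video*>(s.nativeThisObject());  // the bound C++ object
    self->play();
    return true;
}
SE_BIND_FUNC(js_Video_play)

static bool js_Video_pause(se::State& s) {
    static_cast<Video*>(s.nativeThisObject())->pause();
    return true;
}
SE_BIND_FUNC(js_Video_pause)

// Registered during script-engine startup so that a Video object created in JS
// can forward calls like video.play() to the C++ Video instance.
bool register_video_bindings(se::Object* global) {
    se::Class* cls = se::Class::create("Video", global, nullptr, nullptr);
    cls->defineFunction("play",  _SE(js_Video_play));
    cls->defineFunction("pause", _SE(js_Video_pause));
    cls->install();
    return true;
}
```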


After the JSB binding is done, we come to the rendering part, which is what I mentioned earlier: rendering every frame of the video through our engine. It mainly consists of three steps:

  • First, customize the material; the material carries the shader program;
  • Second, customize the Assembler, which is responsible for passing vertex attributes;
  • Third, set the material's dynamic parameters, such as the texture and the translation, rotation, and scaling matrices.

For the custom material, the shader program is written in an effect file, and the effect is referenced by the material; every rendering component needs a material property mounted on it. Since video display can be understood as rendering a frame-by-frame animation, you can directly use the builtin-2d-sprite material that Cocos Creator's CCSprite uses.

For the custom Assembler: once you have the material, all you need to care about is passing the position coordinates and texture coordinates, which is exactly what a custom Assembler does; you can follow the official documentation to write one. Note that the world-coordinate calculation (updateWorldVerts) differs between the native and web platforms; handle both, otherwise the display position will be wrong.

As for the dynamic material parameters: in a video player the texture data must be modified continuously. On mobile, during playback the AVPlayer derived from ffplay calls the void Video::setImage method through the ITextureRenderer.render interface to keep updating the texture data.
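A hedged sketch of what that continuous texture update can look like on the native side: Video::setImage is the method named above, but the GL calls here are just an illustrative OpenGL ES path, not the engine's actual renderer code, which goes through its own texture abstraction.

```cpp
// Illustrative OpenGL ES upload path for Video::setImage.
#include <GLES2/gl2.h>
#include <cstdint>

class Video {
public:
    // Called once per decoded frame with tightly packed RGBA pixels.
    void setImage(const std::uint8_t* rgba, int width, int height) {
        glBindTexture(GL_TEXTURE_2D, texture_);
        if (width != width_ || height != height_) {
            // First frame (or size change): allocate storage.
            glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0,
                         GL_RGBA, GL_UNSIGNED_BYTE, rgba);
            width_  = width;
            height_ = height;
        } else {
            // Steady state: overwrite the existing storage, no reallocation.
            glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                            GL_RGBA, GL_UNSIGNED_BYTE, rgba);
        }
    }

private:
    GLuint texture_ = 0;   // created elsewhere, sampled by the video material
    int width_  = 0;
    int height_ = 0;
};
```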


That is the FFmpeg-based transformation, and this is the scene after we finished it (illustration). The video you just saw is actually rendered frame by frame by the engine, which is why it can be interactive.

4. Audio playback and transformation of Cocos

Finally, let me talk about Cocos audio playback and its transformation.

Audio playback in Cocos is relatively simple: iOS uses OpenAL, Android uses OpenSL, and the Web uses Web Audio and DOM Audio, so again it is implemented separately per platform.


This is the audio in the Cocos engine. It is actually quite simple: one kind is looping audio, such as background music, and the other is sound effects, such as the sound of a bullet.

Here, still based on the example above, we also made a small modification to audio on mobile: we replaced the SDL audio interfaces in the ffplay program, mainly open, close, pause, and resume.
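On Android, a replacement for SDL's audio callback can be built on Oboe. This is a hedged sketch of the open/pause/close wiring, with fillPcm() standing in for whatever function drains the player's decoded sample queue.

```cpp
// Sketch of an Oboe-based audio output replacing ffplay's SDL audio path.
#include <oboe/Oboe.h>
#include <cstdint>
#include <memory>

class AudioOut : public oboe::AudioStreamCallback {
public:
    bool open(int sampleRate, int channels) {
        oboe::AudioStreamBuilder builder;
        builder.setFormat(oboe::AudioFormat::I16)
               ->setChannelCount(channels)
               ->setSampleRate(sampleRate)
               ->setCallback(this);
        return builder.openStream(stream_) == oboe::Result::OK;
    }
    void play()   { if (stream_) stream_->requestStart(); }
    void pause()  { if (stream_) stream_->requestPause(); }
    void resume() { if (stream_) stream_->requestStart(); }
    void close()  { if (stream_) { stream_->requestStop(); stream_->close(); } }

    // Oboe calls this whenever the device needs more PCM data.
    oboe::DataCallbackResult onAudioReady(oboe::AudioStream*, void* audioData,
                                          int32_t numFrames) override {
        fillPcm(static_cast<int16_t*>(audioData), numFrames);
        return oboe::DataCallbackResult::Continue;
    }

private:
    void fillPcm(int16_t* dst, int32_t frames) { /* drain decoded sample queue */ }
    std::shared_ptr<oboe::AudioStream> stream_;
};
```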


Next, we also plan to add 3D sound effects and support some mainstream audio-effect platforms. As for the education applications I mentioned earlier, the Cocos engine can also be used to build interactive games for live-streamed classes. In the future we will share more cases from other scenarios with you.

The above is my sharing, thank you!


About Qiniu Cloud, ECUG and ECUG Meetup

Qiniu Cloud: Founded in 2011, Qiniu Cloud is a well-known domestic cloud computing and data service provider. It continues to invest deeply in core technologies for massive file storage, CDN content distribution, video on demand, interactive live streaming, and large-scale heterogeneous data analysis and processing, committed to driving the digital future with data technology and empowering every industry to fully enter the data age.

ECUG: Short for Effective Cloud User Group, it grew out of CN Erlounge II and was founded in 2007 by Xu Shiwei. It is a high-end, forward-looking group in the technology field. As a window onto the industry's technical progress, ECUG brings together many technical people, follows current hot technologies and cutting-edge practices, and jointly leads the technological transformation of the industry.

ECUG Meetup: A series of technology-sharing events created jointly by ECUG and Qiniu Cloud, positioned as offline gatherings for developers and technical practitioners. The goal is to provide a high-quality learning and networking platform where participants can co-create and share knowledge, generate new insights that advance understanding and technology, promote progress across the industry through exchange, and build a better space for developers and practitioners to communicate and grow.

