Content source: the keynote speech "The Status Quo and Development Trend of Audio Production" from the HMS Core 6 Media Technology Forum at Huawei Developer Conference 2021.

Speaker: a senior engineer from Huawei's audio algorithm research team

Hello everyone! I am an engineer from Huawei's Audio Engineering Department. Today I am very happy to share with you the current status of audio production and the development trends as we see them.

From phonographs and vinyl records to CDs and MP3s, audio has been developing for many years along two main lines. One is the push toward ultra-high definition: the digital quality of audio keeps getting higher, while the bit rate needed to deliver it keeps getting lower. For humans, though, the audible range is roughly 20 Hz to 22 kHz, and content above about 17 kHz is basically inaudible to most adults.

The other main line is the topic I am going to talk about today: immersive audio. The sounds we normally hear come from all directions. For example, birdsong may come from the upper right and the click of a camera shutter from the left; sound surrounds us. Early audio recording and playback, however, was subject to real restrictions. In the beginning there was only mono. A mono recording presents sound as coming from a single point, a point source, and cannot carry much spatial content. Many mixing engineers and musicians experimented a great deal in the mono era, adding reverberation and coloration to create some sense of space, but limited to a single channel, the effect fell far short.

In the era of stereo, with two channels, the left and right channels could carry different content and present more space and story. In the 1960s and 70s, many cutting-edge bands used the two channels to give listeners a new kind of experience. For example, splitting the vocals and guitar between the left and right channels makes listeners feel as if they are hearing the performance live; in Bohemian Rhapsody, the two singers' parts are placed left and right to form a dialogue. Giving the left and right channels distinct content creates a stronger sense of story and carries the narrative logic of the music to the listener. At the time, the move from mono to stereo was a big leap.
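To make the basic stereo tool concrete, here is a minimal sketch (my illustration, not something from the talk) of constant-power panning, which is how a mixer places a mono source anywhere between the left and right channels:

```kotlin
import kotlin.math.PI
import kotlin.math.cos
import kotlin.math.sin

// Constant-power panning: place a mono source anywhere between the
// left and right channels. pan = -1.0 (hard left) .. +1.0 (hard right).
fun panMono(mono: DoubleArray, pan: Double): Pair<DoubleArray, DoubleArray> {
    val theta = (pan + 1.0) * PI / 4.0   // map [-1, 1] to [0, pi/2]
    val gainL = cos(theta)
    val gainR = sin(theta)
    val left = DoubleArray(mono.size) { mono[it] * gainL }
    val right = DoubleArray(mono.size) { mono[it] * gainR }
    return left to right
}
```

The constant-power law (cos² θ + sin² θ = 1) keeps the perceived loudness steady as the source moves across the stereo field.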

Later, moving from surround sound to three-dimensional sound, the added content was no longer just what sat in the left and right channels but distinct sound sources from the front, rear, left, right, and even above and below. Mixers gained a much larger space in which to place their sound sources, and listeners gained a stronger sense of immersion. Further ahead we will enter the era of 6-DoF spatial sound, which will bring an even stronger sense of space. Right now we stand at the historical node of moving from stereo and surround sound to three-dimensional sound.
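Placing a source above, behind, or to the side ultimately means assigning it coordinates. A minimal sketch of the usual conversion (my illustration, assuming an x-right, y-up, z-front convention):

```kotlin
import kotlin.math.cos
import kotlin.math.sin

// Map a mixer's placement intent (azimuth, elevation, distance) to
// Cartesian coordinates: x = right, y = up, z = front.
fun sphericalToCartesian(azimuthRad: Double, elevationRad: Double, distance: Double): Triple<Double, Double, Double> {
    val x = distance * cos(elevationRad) * sin(azimuthRad)
    val y = distance * sin(elevationRad)
    val z = distance * cos(elevationRad) * cos(azimuthRad)
    return Triple(x, y, z)
}
```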

Nowadays more and more musicians make music in three-dimensional sound, but its production is still at a relatively early stage, and mixing engineers are still exploring the mixing paradigm. For professional mixers, three-dimensional sound means a multi-channel playback environment, yet most users still listen on headphones. The main partners promoting three-dimensional music on headphones today are Sony, with 360 Reality Audio, and Apple Music in collaboration with Dolby, with Dolby Atmos music. Three-dimensional playback can also be explored with IMU-equipped, head-tracking headphones: as you turn your head, the sound sources appear to stay fixed in place.
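That "sources stay put when your head turns" effect comes from counter-rotating the virtual sources by the head's orientation before rendering. A sketch of the yaw-only case (my illustration, not an HMS Core API):

```kotlin
import kotlin.math.PI

// Head tracking keeps a virtual source fixed in the room: when the head
// yaws by headYawRad, render the source at its world azimuth minus the
// head yaw. (Yaw-only case; full head tracking uses 3-axis rotation.)
fun apparentAzimuth(sourceAzimuthRad: Double, headYawRad: Double): Double {
    var az = (sourceAzimuthRad - headYawRad) % (2 * PI)
    if (az <= -PI) az += 2 * PI   // normalize to (-pi, pi]
    if (az > PI) az -= 2 * PI
    return az
}
```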

So what does the current 3D sound production process look like?

First comes composition and arrangement. Professional performers are brought in to record the vocals and instruments, and the recorded sub-track material is imported into a digital audio workstation, where a professional mixing engineer applies plug-ins by hand. Finally, three-dimensional audio is generated and played back over multiple channels.
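Stripped of the plug-ins, what the workstation's mixer does with those sub-tracks is a gain-weighted sum. A deliberately bare sketch (my simplification; real mix buses insert per-track processing before summing):

```kotlin
// What a DAW mix bus boils down to: a gain-weighted sum of sub-tracks.
fun mixBus(tracks: List<DoubleArray>, gains: List<Double>): DoubleArray {
    require(tracks.size == gains.size) { "one gain per track" }
    val out = DoubleArray(tracks.maxOf { it.size })
    for ((track, gain) in tracks.zip(gains))
        for (i in track.indices) out[i] += track[i] * gain
    return out
}
```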

So how can the same thing be done with HMS Core?

First, our AI composition capability assists you with composing and arranging. Then the TTSing singing voice synthesis capability we provide lets you synthesize professional-sounding vocals faster. Audio Editor Kit then converts the 2D audio to 3D, so that in a UGC or PUGC workflow creators generate their own 3D audio, which is finally rendered for binaural or multi-channel playback and monitored on headphones.
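Purely as an orientation aid, here is the order in which the stages chain together, written as a hypothetical interface; composeFromHum, synthesizeVocals, and render2dTo3d are placeholder names of mine, not actual HMS Core signatures:

```kotlin
// Hypothetical outline of the pipeline order described above.
// None of these are real HMS Core APIs; consult the official
// Audio Editor Kit documentation for the actual calls.
interface ThreeDMusicPipeline {
    fun composeFromHum(hummedMelody: FloatArray): FloatArray                // AI composition
    fun synthesizeVocals(score: FloatArray, lyrics: String): FloatArray     // TTSing synthesis
    fun render2dTo3d(tracks: List<FloatArray>): Pair<FloatArray, FloatArray> // binaural L/R out
}
```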

AI composition and orchestration

Next, let me introduce AI composition and orchestration. We often hum little tunes when inspiration strikes on a walk, and AI composition can automatically turn such a hummed melody into a full piece of music. The current AI composition capability focuses mainly on folk and Chinese-national styles, which lets us serve the consumer market well. On the consumer (to C) side, it can assist PGC and enable UGC music creation, lowering the threshold for creating music; on the business (to B) side, it can supply library music, reduce music licensing fees, and support our own applications and our partners' business success.

TTSing singing voice synthesis

With lyrics and a melody in hand, how is the singing voice synthesized? The user feeds the score information into the TTSing singing voice synthesis capability, which matches the lyrics to the melody; combined with the piece just produced by AI composition, you can hear the finished song, lyrics and all.
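In practice, "score information" is roughly a sequence of notes, each carrying a pitch, a duration, and the syllable to be sung. A minimal, hypothetical representation (mine; the actual TTSing input format is defined by HMS Core):

```kotlin
// A minimal, hypothetical score representation for singing synthesis:
// each note carries a MIDI pitch, a duration, and the syllable sung on it.
data class SungNote(val midiPitch: Int, val durationMs: Int, val syllable: String)

// The opening of "Twinkle, Twinkle, Little Star" as an example phrase.
val phrase = listOf(
    SungNote(60, 500, "twin"), SungNote(60, 500, "kle"),  // C4 C4
    SungNote(67, 500, "twin"), SungNote(67, 500, "kle"),  // G4 G4
    SungNote(69, 500, "lit"),  SungNote(69, 500, "tle"),  // A4 A4
    SungNote(67, 1000, "star"),                           // G4
)
```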

2D to 3D

With the instrument tracks, accompaniment, and synthesized vocals ready, you can import them into the 2D-to-3D audio capability. Through an interactive interface you can simply drag each track, or use the phone's motion sensors, to render the tracks to any position in space, and then hear the final synthesized song through binaural playback. And if you have no separate tracks, is spatial rendering impossible? No: we also support converting ordinary two-channel audio, such as an MP3, from 2D to 3D. The capability automatically analyzes the file into elements such as piano and vocals; you can then pick an element, render it to any position in space, and apply binaural rendering. Even old songs can be converted into three-dimensional music, letting you create your own version of the music.
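Under the hood, placing a source at an azimuth for headphone playback means giving the two ears slightly different signals. A deliberately crude sketch using only interaural time and level differences (real 2D-to-3D conversion uses HRTF filtering; the Woodworth ITD model and the ILD curve below are textbook approximations of mine, not HMS Core internals):

```kotlin
import kotlin.math.PI
import kotlin.math.abs
import kotlin.math.cos
import kotlin.math.roundToInt
import kotlin.math.sin

// Place a mono source at azimuthRad (0 = front, positive = right) using
// an interaural time difference (Woodworth model) plus a crude level
// difference. Returns the (left, right) channel pair.
fun placeSource(mono: DoubleArray, azimuthRad: Double, sampleRate: Int = 48_000): Pair<DoubleArray, DoubleArray> {
    val headRadius = 0.0875              // meters, average adult head
    val speedOfSound = 343.0             // m/s
    val az = abs(azimuthRad).coerceAtMost(PI / 2)
    val itdSeconds = headRadius / speedOfSound * (az + sin(az))
    val delay = (itdSeconds * sampleRate).roundToInt()
    val farGain = 0.4 + 0.6 * cos(az)    // 1.0 at front, quieter off to the side
    val near = mono.copyOf()
    val far = DoubleArray(mono.size) { i -> if (i >= delay) mono[i - delay] * farGain else 0.0 }
    // Source on the right -> right ear is near, left ear is far (and vice versa).
    return if (azimuthRad >= 0) far to near else near to far
}
```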

This is the HMS Core 3D sound production process: we provide AI composition and orchestration, TTSing singing voice synthesis, and 2D-to-3D music conversion, giving UGC and PUGC creators a convenient production path through HMS Core services.

We now stand at the junction of the real and the virtual. The popular concept today is the metaverse, and the larger future for audio is the sound metaverse, the Soundverse. The Soundverse is built on spatial acoustics, and in the future we will provide more computational acoustics, spatial acoustics, sound-source synthesis, and spatial rendering capabilities, helping everyone quickly build an acoustic universe with HMS Core.
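At its very simplest, the computational-acoustics ingredient looks like free-field propagation: attenuate by 1/r and delay by r/c (my illustration; real spatial acoustics adds reflections, occlusion, and reverberation on top):

```kotlin
import kotlin.math.roundToInt

// Free-field propagation: attenuate by 1/r and delay by r/c, the two
// most basic ingredients of rendering a source at a distance.
fun propagate(src: DoubleArray, distanceMeters: Double, sampleRate: Int = 48_000): DoubleArray {
    val speedOfSound = 343.0
    val delay = (distanceMeters / speedOfSound * sampleRate).roundToInt()
    val gain = 1.0 / maxOf(distanceMeters, 1.0)  // clamp so very near sources don't blow up
    return DoubleArray(src.size) { i -> if (i >= delay) src[i - delay] * gain else 0.0 }
}
```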

Thank you all!

For more details:

Visit the Huawei Developer Alliance official website
Obtain the development guide documents
Huawei Mobile Services open-source repositories: GitHub, Gitee

Follow us to be the first to learn about the latest HMS Core technical news.

