8 Questions and 8 Answers: Understand Spatial Sound Effects in One Article

The first NetEase Group Innovation Awards recently came to a close. The "Close to the limit of the human ear: audio calls" project from NetEase Smart Enterprise stood out from a number of entries and took three awards in the "0-1 Innovation Award".

The award-winning project was born in the audio laboratory of NetEase Yunxin, a subsidiary of NetEase Smart Enterprise. Since the beginning of 2020, the audio laboratory team has kept exploring and innovating on top of stable audio communication quality. Going "from 0 to 1", it has developed and shipped a number of innovative algorithms, including real-time AI audio noise reduction, noise injection, mobile double-talk detection, real-time voice 3D sound effects, and real-time intelligent music scene detection.

Among them, real-time voice 3D sound effects are a first: they not only deliver real-time 3D spatial sound effects but also add distance attenuation and room modeling.

Many readers know spatial sound effects from first-person shooter scenes in "eating chicken" (battle royale) games. But how are spatial sound effects achieved? What are the current mainstream solutions? What scenarios can they be applied to? What value do they bring to a product, and even to the industry?

Today, in 8 questions and 8 answers, this article gives you a comprehensive understanding of spatial sound effects.

This article contains the following 3 parts:

#01 Basics: beginners can follow too

Q1 What are spatial sound effects?
Q2 How to hear spatial sound effects?
Q3 What is the basic principle of spatial sound effects?
Q4 What factors affect the effect of spatial sound effects?

#02 Technical: for the experts

Q5 What are the main technical difficulties of spatial sound effects?
Q6 What are the current mainstream solutions for spatial sound effects?

#03 Scenarios: what is the value? Look here!

Q7 What are the characteristics and advantages of spatial sound effects?
Q8 What scenarios can spatial sound effects be applied to?

Basics: beginners can follow too

Q1 What are spatial sound effects?

Wikipedia describes it like this: 3D sound effects, also known as spatial sound, are a group of sound effects that manipulate the sound produced by stereo speakers, surround speakers, speaker arrays, or headphones. They can virtually place a sound source at a specific location in three-dimensional space, including in front of, behind, and to the left or right of the listener in the horizontal plane, as well as above or below.

In essence, spatial sound effects are based on certain psychoacoustic properties of human hearing. Acoustic algorithms compute and simulate a sound that seems to be there but is actually virtual.

Examples in a game include the footsteps of an enemy sneaking up behind you on the left, the sound of a teammate changing magazines on your right, a window being broken on your left, and a grenade exploding ahead of you on the right.

Q2 How to hear spatial sound effects?

In fact, there are many ways to hear spatial sound effects, for example through speakers or headphones. Here are four methods, grouped by purpose and application scenario:

1. Using multiple speakers

One way to create spatial sound effects is to place multiple speakers in a space. When you listen to a film score or music through a surround sound system, a single element can be panned to any position in the same plane as the listener's head. Dialogue, music, and sound effects seem to come from the speakers or from anywhere in between. This is a common solution for movie theaters and home theaters.
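
The idea that a sound can appear "anywhere in between" two speakers can be sketched with constant-power panning between a single speaker pair. This is only an illustration: the function name `constant_power_pan` and the mapping of the pan position to an angle are assumptions, and real surround systems apply more elaborate panning laws across more speakers.

```python
import math

def constant_power_pan(pan):
    """Return (left_gain, right_gain) for a pan position in [-1.0, 1.0].

    -1.0 is fully left, 0.0 is centered, +1.0 is fully right.
    The gains satisfy left**2 + right**2 == 1, so the total power
    stays constant while the phantom source moves between the speakers.
    """
    if not -1.0 <= pan <= 1.0:
        raise ValueError("pan must be in [-1, 1]")
    angle = (pan + 1.0) * math.pi / 4.0  # map [-1, 1] onto [0, pi/2]
    return math.cos(angle), math.sin(angle)

# A centered source is fed equally to both speakers.
left, right = constant_power_pan(0.0)
```

Multiplying each output sample by `left` and `right` moves the perceived source along the line between the two speakers.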

2. Using a soundbar or stereo speakers with crosstalk cancellation

If you want a home theater, this may be a more cost-effective and convenient choice. Smart soundbars that use crosstalk cancellation can now provide a complete 3D experience. Crosstalk cancellation plays an important role when binaural signals are rendered over speakers. It mainly uses predistortion filters so that the sound played by the speakers cancels in phase along specific acoustic transmission paths. Simply put, the sound from the right speaker to the left ear and from the left speaker to the right ear is canceled. The crosstalk cancellation filter should be updated in real time according to the head position, so head tracking is required for best performance.
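
As a rough sketch of the predistortion idea, the four speaker-to-ear paths at one frequency can be written as a 2x2 complex matrix and inverted, so that the combined system passes each binaural channel to its own ear only. The transfer values below are made up for illustration, and the helper names `crosstalk_canceller` and `matmul2` are hypothetical. A practical system would repeat this per frequency bin, add regularization, and update the filters as the head moves.

```python
def crosstalk_canceller(h_ll, h_lr, h_rl, h_rr):
    """Invert the 2x2 speaker-to-ear matrix H at a single frequency.

    h_xy is the complex transfer from speaker y to ear x, so h_lr is the
    crosstalk path from the right speaker to the left ear.
    Returns the canceller C such that H @ C is the identity matrix.
    """
    det = h_ll * h_rr - h_lr * h_rl
    if abs(det) < 1e-12:
        raise ValueError("matrix is (nearly) singular at this frequency")
    return ((h_rr / det, -h_lr / det),
            (-h_rl / det, h_ll / det))

def matmul2(a, b):
    """Product of two 2x2 complex matrices given as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

# Made-up example: direct paths have unit gain, crosstalk paths are
# weaker and phase-shifted.
H = ((1.0 + 0.0j, 0.4 - 0.3j),
     (0.4 - 0.3j, 1.0 + 0.0j))
C = crosstalk_canceller(H[0][0], H[0][1], H[1][0], H[1][1])
HC = matmul2(H, C)  # approximately the identity: the crosstalk terms vanish
```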

3. Headphones using static binaural mixing

With headphones, a multi-channel audio source can be generated using techniques such as upmixing or diffuser filters, and each channel is then filtered by HRTF convolution to strengthen the sense of direction. Combined with a suitable reverb effect, this can produce a specific 3D sound field. One of the main advantages of this method is that it can eliminate the in-head effect, where the sound seems to come from inside the head. It suits games and movies and brings a certain sense of immersion. The 3D immersion and 3D grand modes in the Histen sound effects common on Huawei phones are mainly based on this kind of technology.

4. Combining head tracking with head-locked audio

Binaural audio over headphones often does not sound real, partly because it does not change when you turn your head, so head tracking is very important. For example, optical camera methods or gyroscope sensors can track the position and orientation of your head. Binaural rendering can then take your movements into account, which means the rendering is updated based on your head rotation and position.
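
The core of such an update can be sketched in a few lines: the direction used for rendering is the source direction minus the tracked head yaw. The function name `relative_azimuth` and the degree-based convention are assumptions for illustration, and a full tracker would also handle pitch, roll, and position.

```python
def relative_azimuth(source_az_deg, head_yaw_deg):
    """Azimuth of a source relative to where the head is facing.

    Both angles are in degrees and measured in the same rotational
    direction. The result is wrapped into the range (-180, 180].
    """
    rel = (source_az_deg - head_yaw_deg) % 360.0
    if rel > 180.0:
        rel -= 360.0
    return rel

# The source stays fixed in the room at 30 degrees. After the listener
# turns the head by 30 degrees toward it, it should be rendered dead ahead.
ahead = relative_azimuth(30.0, 30.0)
```

The renderer would then pick the HRTF for `ahead` instead of the original 30 degrees, which keeps the source anchored in the room rather than glued to the head.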

Apple uses the built-in accelerometer and gyroscope of AirPods Pro to track the wearer's head in real time. When the head moves, the audio is recalculated so that the sound field the wearer hears stays consistent with the original. In addition to tracking the wearer's head, the AirPods Pro sensors can also track the relative movement between the head and the device and compare the two sets of data, so that surround sound is not disrupted when, for example, a subway or bus brakes suddenly.

Q3 What is the basic principle of spatial sound effects?

In real life, the sound we hear has direction and distance, and the sound source itself also has a certain width. Sounds of different directions, distances, and widths together constitute the sound source location of the sounds we hear.

Spatial sound effects generally use the head-related transfer function (HRTF) and spatial convolution of sound waves to imitate the propagation of natural sound, making it seem to come from a point in three-dimensional space.

The HRTF describes the impact of your head and ears on the sound you perceive. When sounds from different directions reach the two ears, there are slight differences in phase and frequency content. These differences allow us to locate the sound source instinctively.

In simple terms, the HRTF is an attempt to model how our ears receive sound, and to use this model to give the ear the sensation of a virtual source at any position. Building it therefore first requires measurements of a large number of human ears, from which a black-box acoustic model is established. The key is how to measure more accurate HRTF data and how to build a more suitable model.
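
A minimal sketch of the convolution step: one mono signal is filtered with a left and a right head-related impulse response (HRIR, the time-domain form of the HRTF). The impulse responses below are toy values, not measured data, and the function names `convolve` and `render_binaural` are hypothetical. Real renderers use measured HRIR sets and fast (FFT-based) convolution.

```python
def convolve(signal, kernel):
    """Direct-form linear convolution of two sequences of floats."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out

def render_binaural(mono, hrir_left, hrir_right):
    """Filter one mono source with a left/right HRIR pair."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)

# Toy impulse responses, NOT measured data: a source on the right reaches
# the right ear immediately and the left ear 3 samples later and quieter.
hrir_right = [1.0, 0.0, 0.0, 0.0]
hrir_left = [0.0, 0.0, 0.0, 0.5]

mono = [1.0, 0.5, 0.25]
left, right = render_binaural(mono, hrir_left, hrir_right)
# left  == [0.0, 0.0, 0.0, 0.5, 0.25, 0.125]
# right == [1.0, 0.5, 0.25, 0.0, 0.0, 0.0]
```

Even with these toy filters, the two outputs carry the time and level differences between the ears that let a listener place the source on the right.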

Q4 What factors affect the effect of spatial sound effects?

The first unavoidable factor is direction.

When the sound source is on our right side, the sound waves usually reach the right ear first and the left ear slightly later. These small time differences are enough for the brain to judge that the sound comes from the right. This is the ITD (interaural time difference). Because the right ear receives the sound waves directly, the sound there is also slightly louder than at the left ear. In addition, part of the sound received by the left ear arrives by reflection and diffraction, so its timbre changes. This is the ILD (interaural level difference). People themselves are the biggest variable: when we listen, our heads and ears never stay perfectly still. ITD, ILD, and the influence of the body together form the HRTF, and because ears, heads, and shoulders differ from person to person, the HRTF needs to be personalized.
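
To get a feel for how small these time differences are, the ITD can be estimated with Woodworth's classic spherical-head approximation, ITD = (a / c) * (theta + sin(theta)), where a is the head radius, c the speed of sound, and theta the source azimuth. This formula is not from the original article; the default head radius of 8.75 cm and the function name `woodworth_itd` are assumptions for illustration.

```python
import math

def woodworth_itd(azimuth_deg, head_radius_m=0.0875, speed_of_sound=343.0):
    """Approximate interaural time difference in seconds for a distant source.

    azimuth_deg: 0 is straight ahead, +90 is directly to the right.
    Valid for -90..90 degrees. A positive result means the sound
    reaches the right ear first.
    """
    if not -90.0 <= azimuth_deg <= 90.0:
        raise ValueError("azimuth must be within [-90, 90] degrees")
    theta = math.radians(azimuth_deg)
    return head_radius_m / speed_of_sound * (theta + math.sin(theta))

# A source directly to the right gives the largest difference,
# roughly two thirds of a millisecond for an average-sized head.
itd_right = woodworth_itd(90.0)
```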

The second factor is distance, which involves subjective loudness perception, high-frequency attenuation, the influence of the head on the sound, reflections, and so on. In addition, the Doppler effect (the wavelength or frequency changes because the listener and the sound source move relative to each other) also affects spatial sound effects.

The third factor is the environment, such as reflection, reverberation, absorption, obstacles, propagation, diffraction, etc.

Finally, there are some other factors. For example, listeners often confuse front and rear sound images, while visual cues and movement tend to improve localization.

Technical: for the experts

Q5 What are the main technical difficulties of spatial sound effects?

The factors from Q4 that affect spatial sound effects also determine their technical difficulties. Here we mainly share the following 3 points:

1 Construction of high-quality HRTF database:

To accurately record how sound from any position in space is transmitted to the human ear, it is necessary to measure as many distances and angles as possible. As a result, the collected HRTF database is relatively large, which limits its use in some application scenarios.

Studies have shown that sound direction information correlates strongly with the difference in arrival time and the difference in intensity between the two ears. It is also affected by the pinna, the ear canal, and shoulder width. This directly makes it impossible to create a single HRTF database that is perfect for everyone.

2 Construction of a sense of distance:

The human ear can judge the distance of a sound from its loudness and from differences in its frequency components. In addition, when a sound source moves closer or farther away, the frequency perceived by the ear changes, which is the so-called Doppler effect. When developing spatial audio, appropriate algorithms are needed to simulate how sound attenuates with distance as it propagates, and the Doppler effect of a moving sound source.
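
Both effects have simple textbook models, sketched here under stated assumptions: the inverse-distance law for a point source in free field (about 6 dB quieter per doubling of distance) and the classical Doppler formula for motion along the line between source and listener. The function names and the 1 m reference distance are hypothetical, and frequency-dependent air absorption is not modeled.

```python
import math

def distance_gain(distance_m, ref_distance_m=1.0):
    """Amplitude gain under the inverse-distance law.

    The gain is 1.0 at the reference distance and is clamped to 1.0
    for anything closer, which avoids blowing up near the source.
    """
    return ref_distance_m / max(distance_m, ref_distance_m)

def gain_to_db(gain):
    """Convert an amplitude gain to decibels."""
    return 20.0 * math.log10(gain)

def doppler_frequency(f_source_hz, v_source_mps, v_listener_mps=0.0, c=343.0):
    """Perceived frequency for motion along the source-listener line.

    Velocities are positive when source and listener approach each other.
    """
    if abs(v_source_mps) >= c:
        raise ValueError("source speed must be below the speed of sound")
    return f_source_hz * (c + v_listener_mps) / (c - v_source_mps)

half = distance_gain(2.0)                      # twice as far: half the amplitude
approaching = doppler_frequency(440.0, 34.3)   # pitch rises
receding = doppler_frequency(440.0, -34.3)     # pitch falls
```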

3 Construction of acoustic environment:

The propagation of sound in space is analogous to the propagation of light. When sound meets a wall, it is reflected, and depending on the wall material, part of it is absorbed. In a given three-dimensional room, the sound that travels from a point to the listener may be direct sound, sound that has been reflected (and partly absorbed) once, or sound that arrives after two or even more reflections. Efficiently modeling this kind of propagation in a specific environment is a complex and challenging problem.
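
One common way to model such reflections, shown here only as a sketch and not as the method used by any particular product, is the image-source method: each wall mirrors the source, and the mirrored "image" behaves like an extra source whose distance gives the delay and attenuation of the reflection. The example assumes a rectangular room with one corner at the origin, a single frequency-independent reflection coefficient, and first-order reflections only. The function name `first_order_paths` is hypothetical.

```python
import math

def first_order_paths(source, listener, room, reflection=0.8, c=343.0):
    """Return sorted (delay_s, gain) pairs for the direct path and the
    six first-order wall reflections in a rectangular room.

    source, listener: (x, y, z) positions in meters.
    room: (length, width, height); walls lie at 0 and at each dimension.
    reflection: assumed amplitude reflection coefficient of every wall.
    """
    images = [(tuple(source), 1.0)]          # the direct sound
    for axis in range(3):
        for wall in (0.0, room[axis]):
            image = list(source)
            image[axis] = 2.0 * wall - source[axis]  # mirror across the wall
            images.append((tuple(image), reflection))
    paths = []
    for position, coeff in images:
        d = math.dist(position, listener)
        paths.append((d / c, coeff / max(d, 1e-6)))  # 1/d spreading loss
    return sorted(paths)

paths = first_order_paths(source=(1.0, 2.0, 1.5),
                          listener=(4.0, 2.0, 1.5),
                          room=(5.0, 4.0, 3.0))
```

The first entry of `paths` is the direct sound, and the remaining six arrive later and weaker. Higher-order reflections multiply the number of images quickly, which is part of why efficient room modeling is hard.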

Q6 What are the current mainstream solutions for spatial sound effects?

The first is a 3D audio solution based on multi-channels.

This is the earliest and most widely used solution. It uses multiple speakers arranged in the space to play sound directly from specific directions, so that the sound appears to be emitted from a specific position in the space. The most common multi-channel 3D audio schemes in home theaters are the 5.1-channel and 7.1-channel systems. These can only produce spatial effects in the horizontal plane and have no effect in the vertical direction. Professional movie theaters now also use 11.1 or 22.2 channel playback systems, which improve the spatial effect above and below the horizontal plane by arranging speakers at different heights.

Common multi-channel playback solutions today include Dolby Surround 7.1, Dolby Digital 5.1, and the Auro 9.1, Auro 10.1, and Auro 13.1 formats launched by Auro. Japan's NHK has introduced a 22.2 multi-channel playback system.

(Dolby 7.1)

The second is object-based audio.

The spatial audio solutions of this type currently on the market are based on Dolby Atmos®, the DTS:X surround sound system, and MPEG-H.

Dolby Atmos is an advanced surround sound standard launched by Dolby Laboratories in 2012. It combines front, side, rear, and overhead speakers with complex audio processing and algorithms to provide up to 64 channels of surround sound, increasing the sense of spatial immersion. The core of Dolby Atmos technology is spatial coding, in which sound signals are assigned to locations in space rather than to specific channels or speakers.

DTS:X is an open, new-generation codec standard, and it is also an object-based multi-dimensional spatial audio technology. Unlike existing surround sound systems, DTS:X audio is no longer restricted by fixed speaker positions or specific channel signals. It can be flexibly adjusted to the playback environment to obtain the best sound performance in that environment. It can also create realistic sound effects at precise points around the audience, creating a richer soundscape. DTS:X and Dolby Atmos both use recording technology based on sound objects.

The third option is Ambisonics.

This solution records and encodes the audio source in Ambisonics format at the capture end, and then decodes it during playback into the format that matches the speaker arrangement of the playback system. A variety of audio capture devices on the market support this format.
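
As an illustration of the encoding step, a mono sample arriving from a given direction can be encoded into the four channels of first-order Ambisonics. The sketch below assumes the traditional B-format convention (W scaled by 1/sqrt(2), azimuth measured counterclockwise from straight ahead). Other channel orderings and normalizations exist, and the function name `encode_first_order` is hypothetical.

```python
import math

def encode_first_order(sample, azimuth_deg, elevation_deg):
    """Encode one mono sample into first-order B-format (W, X, Y, Z).

    azimuth_deg: 0 is straight ahead, positive is toward the left.
    elevation_deg: 0 is the horizontal plane, positive is upward.
    W is the omnidirectional component; X, Y, and Z point to the
    front, the left, and upward.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2.0)
    x = sample * math.cos(az) * math.cos(el)
    y = sample * math.sin(az) * math.cos(el)
    z = sample * math.sin(el)
    return w, x, y, z

# A source straight ahead excites only W and X;
# a source directly overhead excites only W and Z.
front = encode_first_order(1.0, 0.0, 0.0)
above = encode_first_order(1.0, 0.0, 90.0)
```

Because the four channels describe the sound field rather than speaker feeds, the same recording can later be decoded for whatever speaker layout, or for headphones, the playback side provides.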

The last one is based on binaural rendering.

This solution is currently widely used in music apps and consumer electronic devices, for example the 5.1 panorama and 3D Nicam effects in Viper sound effects, and the Whale Cloud sound effects of NetEase Cloud Music.

Scenarios: what is the value? Look here!

Q7 What are the characteristics and advantages of spatial sound effects?

1 The spatial sound effect replicates the way the sound is processed in real life

The sounds we hear every day are complex. The extraordinary thing about spatial audio is that it digitally reproduces the sounds we hear in real life.

The sound will change according to how close or far you are to the sound source. When you tilt or turn your head, the sound will change according to the direction of your ears, and you can feel the height of the sound. Spatial audio opens up a full range of sounds and provides a 3D soundscape.

2 Spatial sound effects provide an immersive dynamic experience

Spatial sound effects make the digital world more real. When you interact with 3D images, the sound should also give you a sense of space. Only then can you truly feel immersed. For example, while playing a game, you may hear the air conditioner buzzing overhead as you walk through a dark corridor. When you get closer, the sound becomes louder. Birds chirping in the trees and waterfalls roaring in the distance can all appear in a lush 3D environment that uses spatial audio.

In this ever-changing era, one trend we can all feel is "immersion". The deep integration of the real and the virtual is opening up a "metaverse" in which humans and machines blend. Spatial audio and similar immersive audio technologies will strengthen the immersion of the metaverse through sound, so that in the metaverse we can be fully immersed, from sight to hearing.

3 Spatial audio provides more accurate and clearer audio

Spatial audio allows us to pinpoint the location of a sound and distinguish it among multiple sources, which is very valuable in remote communication scenarios. Take video conferencing as an example. The limitations of video conferencing highlight the importance of realistic audio. Spatial sound effects make it easier to tell who is talking. When two or more people talk at the same time, it is also easier to make out what each is saying. Over the course of a day, this plays an important role in reducing fatigue and making conversation enjoyable.

Clarity makes real-time spatial audio shine.

Q8 What scenarios can spatial sound effects be applied to?

1 Game industry

The application we know best is spatial sound effects in FPS (first-person shooter) games. Because players rely on correctly judging where sound cues come from, spatial sound effects improve their awareness of the environment. Skilled players can accurately locate danger from only a faint sound or a skill sound effect. When on voice chat with teammates, they can locate a teammate from a call for help and start a rescue.

But it is not limited to first-person shooter games. As one of the key factors to enhance the immersive experience, spatial audio can improve the gaming experience to a certain extent for most games.

For example, spatial sound effects let small-screen games such as mobile games create the experience of a big game. Games centered on sound can help visually impaired people enjoy gaming. Horror games can take advantage of darkness and poor visibility to make players rely on 3D sound cues, creating a more immersive experience.

In addition, traditional audio is confined to a two-dimensional plane, which does not match the field of view that VR provides. Combining head-mounted devices (such as Oculus Rift) with spatial sound effects allows players to determine the direction of a sound source by turning their heads, further enhancing the VR experience.

2 Music performance

If you are a listener, spatial sound effects let you choose different sound effects within the same venue. If you like, you can stand next to the singer while listening, or sit in the middle of the stage and enjoy a symphony. To a certain extent, this addresses the current lack of immersion in online performances.

If you are a creator, spatial sound effects bring you unlimited possibilities. The added freedom of sound not only helps composers express emotion, but also lets the soundtrack leave more room for action and dialogue. In the future, more music will be created around spatial sound effects and recorded with them in mind from the recording stage, and the music market may enter an era of immersive creation.

3 Corporate Services

As mentioned in Q7, spatial sound effects bring spatial information into the audio. In an audio call with several participants, this makes it easier to know who is speaking. When several people talk at the same time, it is also easier to make out what each is saying. This improves efficiency and reduces the fatigue of communication.

Digital exhibitions and commercial exhibition halls are also possible directions. By combining VR with spatial sound effects, company employees can present their booths and talk with customers as if face to face. For a realistic VR experience, spatial sound effects are indispensable alongside sensory experiences such as touch and vision.

4 Healthcare

Spatial sound effects can also be used in healthcare, for example in sports rehabilitation systems, electronic travel aids, and other assistive devices for the visually impaired. For visually impaired people, spatial sound effects can serve as the main cue for their sense of direction and make daily life more convenient.

With the rapid development of technology, from mono to stereo to today's spatial audio, the colorful world is also being brought to our ears. When sound enters the spatial environment, it is not only a revolution in audio technology but also becomes a basic building block of entertainment and many other ecosystems. To better serve enterprise customers in all walks of life, NetEase Yunxin will officially release real-time voice 3D sound effects in the near future, so stay tuned.
