At the end of May, our weekend screens were taken over one after another by the online concerts of Luo Dayou and Stefanie Sun. Together we relived our youth, and it stirred something deep in all of us.

Because of the pandemic, gathering in a crowded stadium has become a distant memory, so artists have met their audiences online instead. The result has been a number of online concerts that struck a strong chord with everyone.

In fact, we are moving many offline scenarios online, and concerts are just one of them: voice chat rooms, KTV, teaching... Backed by RTC (real-time communication) audio and video technology, online experiences keep getting richer, and that also brings new challenges for the technical implementation.

This is especially true when music is involved. High sound-quality requirements call for 3A algorithms, with a solution designed specifically for each scenario.


3A algorithms

3A audio processing technology is a collective name for three audio algorithms: Acoustic Echo Cancellation (AEC), Background Noise Suppression (ANS), and Automatic Gain Control (AGC).

After audio is captured, it must be preprocessed before anything else is done with it, and 3A is the core of that preprocessing stage.

(Audio processing flow chart)

AEC echo cancellation: principle and detailed explanation

Echo cancellation originated in two-wire transmission. There, the signals traveling in both directions occupy the same line at the same time and in the same frequency band. The two directions become completely mixed, so the echo of the signal sent by the local end turns into interference in what the local end receives. An adaptive filter can cancel this echo and restore good received-signal quality; this is echo cancellation. Acoustic echo cancellation (AEC) applies the same idea to far-end sound that travels from the loudspeaker back into the microphone.

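In principle, echo cancellation works in three steps. It compares the audio received from the far end with the locally captured audio, synthesizes an artificial copy of the echo, and adds that copy in inverted form (that is, subtracts it). This cancels the far-end sound picked up by the microphone.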
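Below is a minimal sketch of this idea. It assumes a normalized least-mean-squares (NLMS) adaptive filter, a common textbook choice and not necessarily the algorithm Rongyun uses. The filter learns the echo path from the far-end signal, predicts the echo contained in the microphone capture, and subtracts it. The function name and parameters (`taps`, `mu`, `eps`) are illustrative. A production AEC would also need delay estimation, double-talk handling, and residual echo suppression.

```python
def nlms_echo_canceller(far_end, mic, taps=32, mu=0.5, eps=1e-8):
    """Cancel the echo of `far_end` contained in `mic` with an NLMS adaptive filter.

    Returns e[n] = mic[n] - y[n]: the microphone signal with the estimated echo
    y[n] removed (ideally only near-end speech remains).
    """
    weights = [0.0] * taps  # adaptive estimate of the echo path impulse response
    history = [0.0] * taps  # most recent far-end samples, newest first
    output = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        # Predicted echo: far-end history filtered through the current echo-path estimate.
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        # Adding the inverted artificial echo, i.e. subtracting the prediction.
        error = d - echo_estimate
        # Normalizing by input energy makes the step size independent of signal level.
        step = mu * error / (eps + sum(h * h for h in history))
        weights = [w + step * h for w, h in zip(weights, history)]
        output.append(error)
    return output
```

For example, white noise fed through a short synthetic echo path and then into `nlms_echo_canceller` should leave a residual that shrinks toward zero once the filter has converged.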

Wherever communication is prone to echo, adaptive echo cancellation is indispensable. Its main applications include video conferencing systems, hands-free phones, videophone terminals, mobile communications, voice control systems, and conference microphones.

Echo is especially damaging in video conferencing and hands-free calls, where it seriously degrades the quality of the conversation. In remote enterprise meetings, the longer transmission delay makes echo even more disruptive.

ANS background noise suppression: methods and role

Background Noise Suppression (ANS) is the process of identifying and removing background noise from audio.

Background noise falls into two types: stationary noise and transient noise. Stationary noise has a stable spectrum whose energy varies little over time. Because of this, it can be estimated and removed by adding an inverted copy of it to the audio data (that is, by subtracting it).
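Spectral subtraction is one classic way to remove stationary noise. It is used here as an illustrative stand-in, not as Rongyun's method. The sketch estimates the average noise magnitude spectrum from frames assumed to contain only noise. It then subtracts that spectrum from each frame's magnitude while keeping the original phase.

Several details are assumptions chosen to keep the sketch short:
- a naive DFT in plain Python, for self-containment;
- non-overlapping frames;
- the `alpha` and `floor` parameters.

Real implementations use an FFT, overlapping windows, and a continuously updated noise estimate.

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) discrete Fourier transform (an FFT would be used in practice)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT keeping only the real part (valid for conjugate-symmetric spectra)."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def estimate_noise_spectrum(noise_frames):
    """Average magnitude spectrum over frames assumed to contain only noise."""
    n = len(noise_frames[0])
    avg = [0.0] * n
    for frame in noise_frames:
        for k, value in enumerate(dft(frame)):
            avg[k] += abs(value) / len(noise_frames)
    return avg


def spectral_subtract(frame, noise_mag, alpha=1.0, floor=0.0):
    """Subtract alpha * noise magnitude in each bin, keep the noisy phase, resynthesize."""
    cleaned = []
    for value, noise in zip(dft(frame), noise_mag):
        # Clamp at a fraction of the original magnitude instead of going negative.
        magnitude = max(abs(value) - alpha * noise, floor * abs(value))
        cleaned.append(cmath.rect(magnitude, cmath.phase(value)))
    return idft(cleaned)
```

Because the same magnitude operation is applied to mirrored bins, a real input frame keeps a (nearly) conjugate-symmetric spectrum. Taking the real part after the inverse transform only discards rounding error.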

Many simple and effective methods already exist for suppressing stationary noise. Good methods are still lacking, however, for many of the transient noises common in everyday life.

Transient noise shares several traits. It is extremely sudden; in the time domain it oscillates and decays, lasting anywhere from about ten to several hundred milliseconds. In the frequency domain it is spread widely, and its spectrum largely overlaps that of normal speech, which makes it hard to suppress.
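Because transient noise is so sudden, a simple first step is to flag frames whose energy jumps far above a running estimate of the noise floor. This hypothetical detector is illustrative only. It assumes 160-sample frames (10 ms at an assumed 16 kHz sample rate) and an energy ratio of 4. It captures only the "sudden onset" property and cannot tell a door slam from the start of a word. That overlap with speech is exactly what makes transient noise hard to suppress.

```python
def detect_transients(samples, frame_len=160, ratio=4.0, smoothing=0.95):
    """Flag frames whose energy jumps well above the running noise-floor estimate."""
    flags = []
    floor = None  # running average energy of non-transient frames
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        if floor is None:
            floor = energy  # initialize the floor from the first frame
        is_transient = energy > ratio * floor
        flags.append(is_transient)
        if not is_transient:
            # Update the floor only with ordinary frames so bursts do not inflate it.
            floor = smoothing * floor + (1.0 - smoothing) * energy
    return flags
```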

AGC automatic gain control: implementation and impact

Automatic Gain Control (AGC) adjusts the signal amplitude (volume), mainly to improve the performance of voice communication systems in noisy environments.

Normal conversation measures between 40 and 60 dB. Sounds below 25 dB are hard to hear, and sounds above 100 dB are uncomfortable. AGC brings the volume into an acceptable range.
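Below is a minimal frame-based AGC sketch, assuming samples normalized to [-1, 1]. For each frame it:
- measures the RMS level;
- computes the gain needed to reach a target level and caps it;
- smooths the gain over time to avoid audible "pumping".

The 25 to 100 dB figures above are acoustic sound pressure levels, whereas this sketch works on digital sample amplitudes. `target_rms`, `max_gain`, and `smoothing` are illustrative values, not recommended settings.

```python
import math


def agc(samples, target_rms=0.1, frame_len=160, max_gain=10.0, smoothing=0.8):
    """Frame-based automatic gain control on samples in [-1, 1]."""
    output = []
    gain = 1.0
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        level = math.sqrt(sum(s * s for s in frame) / len(frame))
        # Gain that would bring this frame to the target level, capped so that
        # near-silence (or noise) is not amplified without limit.
        desired = min(max_gain, target_rms / level) if level > 1e-9 else gain
        # Exponential smoothing keeps the gain from jumping between frames.
        gain = smoothing * gain + (1.0 - smoothing) * desired
        # Hard clip as a crude limiter so amplified peaks stay within [-1, 1].
        output.extend(max(-1.0, min(1.0, s * gain)) for s in frame)
    return output
```

Both a quiet input and a loud input converge to roughly the same output level. The quiet input is boosted and the loud one attenuated, which is the "adjust the volume into an acceptable range" behavior described above.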

Controlling audio loudness and microphone pickup is an important means of ensuring audio and video communication quality. Differing audio standards, transmission conditions, and human error can all cause sudden jumps in level or inconsistent loudness between audio signals. When that happens, the signal must be amplified or attenuated to keep voice communication natural and clear.


Applying 3A processing in different scenarios

Each scenario has its own sound-quality requirements and processing methods. To serve customers better, Rongyun tailors its 3A strategy to each scenario, aiming for the best possible sound quality.

Audio and video calls: voice first, dynamically adjusted

In calls, the top priority is a clear human voice. The user may be outdoors in a noisy environment or on a weak network in an underground garage. Either way, receiving clear, stable speech is the core requirement of the call scenario.

So how does Rongyun tailor 3A processing for calls?

First, ANS suppresses the background noise in the user's environment as much as possible.

Stationary noise makes up most of the background noise, and it is not hard to suppress. The difficulty lies in transient noise, which appears suddenly during a call: an airplane roaring overhead, a school bell, a car horn. Handling it requires intelligent, AI-based noise reduction.

Through model training, AI noise reduction learns to identify the transient noises that should be filtered out. As the training corpus grows, its results keep improving.

Second comes AEC echo cancellation. Whether it is enabled depends on the user's audio device. For example, a user on earphones produces no echo, so there is no need to enable echo cancellation.

If the user calls directly through the phone's own microphone and speaker, echo cancellation must be enabled. Echo cancellation also suppresses the local speaker's voice to some degree. Its strength is therefore adjusted to the playback volume and to how much voice clarity is required.

For example, when a call is on speakerphone at maximum volume, echo suppression can also be set to its maximum to remove as much echo as possible. The local voice is then suppressed as much as possible too. The suppression level must therefore be tuned so that the other party can still hear clearly.
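The decision described above can be sketched as a small helper, under clearly stated assumptions. The `Route` enum, the volume normalized to [0, 1], and the linear mapping from volume to suppression strength are all hypothetical; none of them is a real SDK API. The `max_strength` cap stands in for "tuning the suppression so the other party can still hear clearly".

```python
from enum import Enum


class Route(Enum):
    """Hypothetical audio output routes."""
    HEADSET = "headset"  # earphones: no speaker-to-microphone echo path
    SPEAKER = "speaker"  # phone loudspeaker: echo path exists


def aec_strength(route: Route, volume: float, max_strength: float = 0.8) -> float:
    """Return a hypothetical echo-suppression strength in [0, max_strength].

    volume is the playback volume normalized to [0, 1]; louder playback means
    more echo, so suppression grows with volume, but it is capped so the
    near-end voice is not suppressed beyond intelligibility.
    """
    if route is Route.HEADSET:
        return 0.0  # no echo is generated, so echo cancellation can stay off
    volume = min(max(volume, 0.0), 1.0)
    return max_strength * volume
```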

Last comes AGC, which follows echo cancellation in the processing order. After echo cancellation has suppressed the voice, automatic gain amplifies it again so the far end can hear it clearly. AGC also helps on the playback side: some phone speakers have very low volume gain, and per-model gain adjustment can make local playback louder and clearer.

Chat rooms: a delicate balance between music and voice

In chat rooms the human voice is still very important. A key feature of chat rooms, however, is that music plays in the room to set the mood and provide entertainment for the business scenario.

Hosts singing and showing off their talents is also an important way to raise user engagement and revenue.

So how do you make sure the songs a host sings still sound good to the audience? Much of that depends on 3A.

First, ANS suppresses background noise. The requirement is the same as for calls: remove all background noise.

Second comes AEC echo cancellation. Most hosts broadcast with external earphones, in which case echo is not a concern.

When the host uses the phone's microphone and speaker, echo cancellation should not be too aggressive, or it will degrade the host's voice captured by the microphone. This calls for a delicate balance. The music must be played back in high quality, and the host's voice must not be over-suppressed by echo cancellation.

Last comes AGC, which must control the gain of both the host's voice and the music picked up by the microphone, based on the original capture volume settings. The music should not be so loud that it drowns out the voice, nor so quiet that it fails to set the mood.
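One common way to keep music from drowning out the voice is ducking. The music gain is lowered while the host is speaking and restored when they stop, but the music is never muted. This is a hypothetical sketch of that idea, not Rongyun's mixer. The voice-activity threshold, `duck_gain`, and frame size are assumptions, and a real mixer would ramp the gain smoothly to avoid clicks.

```python
import math


def rms(frame):
    """Root-mean-square level of a frame (0.0 for an empty frame)."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


def mix_with_ducking(voice, music, frame_len=160, voice_threshold=0.02,
                     duck_gain=0.4, music_gain=1.0):
    """Mix voice and music frame by frame, lowering the music while the host
    speaks so it sets the mood without covering the voice."""
    out = []
    for start in range(0, min(len(voice), len(music)), frame_len):
        v = voice[start:start + frame_len]
        m = music[start:start + frame_len]
        # Duck the music only in frames where the voice is active.
        g = music_gain * (duck_gain if rms(v) > voice_threshold else 1.0)
        # Sum and hard-clip to keep the mix within [-1, 1].
        out.extend(max(-1.0, min(1.0, a + g * b)) for a, b in zip(v, m))
    return out
```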

Music teaching: restoring the fidelity of instruments

Music teaching still requires a clear, stable voice, but it places higher demands on the music than voice chat rooms do. Different instruments cover a much wider range of pitches that must be captured. To reproduce an instrument's original sound at the far end, 3A therefore needs special handling.

First, ANS suppresses background noise. In music teaching, suppression is kept relatively weak so that the instrument's sound is not attenuated at its peaks, which would lose pitch information.

Unlike calls and voice chat rooms, music teaching does not put fine-grained noise reduction first. Music teachers usually work in fairly quiet indoor spaces with good sound insulation, so the primary goal is to keep the music undistorted.

Second comes AEC echo cancellation. Music teachers generally use fairly standard equipment, so the demands on echo cancellation are relatively low. Sound is captured with an external microphone and played back through headphones, so no echo occurs.

Finally comes AGC. In music teaching, the teacher rarely plays and speaks at the same time; explanation and demonstration alternate. Here AGC needs to convey both the voice and the instrument captured by the microphone clearly to the far end.

Gaming voice chat: smooth team communication

When players team up with voice chat, voice communication matters most and background music is hardly needed. The focus of 3A processing is good double-talk performance (both sides speaking at once) and echo cancellation that meets the bar.

First, ANS suppresses background noise. Mobile gaming happens anytime and anywhere. A crowded subway, a speeding shuttle bus, or the produce market near home can all become the place where mobile gamers team up.

The background noise in mobile gaming is therefore extremely complex, and the need for suppression is especially pressing. The requirements are nearly identical to those for calls.

Second comes AEC echo cancellation. In mobile gaming, players with and without headphones are roughly equal in number, so this scenario needs particular attention.

With headphones, echo cancellation needs essentially no processing. With speaker playback in a noisy environment, however, echo cancellation must be set stronger while double talk stays clear and stable, which is a harder technical challenge. Rongyun has invested heavily in developing and tuning the echo cancellation algorithm for this scenario to ensure good communication among players.
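Double talk is where echo cancellation is hardest, because the adaptive filter must stop adapting while near-end speech is present. A classic, simple detector is the Geigel algorithm, sketched below as an illustration rather than Rongyun's implementation. It declares double talk when the microphone sample exceeds a fraction of the recent far-end peak. The 0.5 threshold assumes the echo is at least 6 dB below the far-end signal; both it and the window length are assumptions.

```python
from collections import deque


def geigel_double_talk(far_end, mic, window=256, threshold=0.5):
    """Return one boolean per sample: True means near-end speech (double talk)
    is likely, so an echo canceller should freeze filter adaptation."""
    recent = deque(maxlen=window)  # |far_end| over the last `window` samples
    flags = []
    for x, d in zip(far_end, mic):
        recent.append(abs(x))
        # Pure echo stays below threshold * (recent far-end peak);
        # anything louder must contain near-end speech.
        flags.append(abs(d) > threshold * max(recent))
    return flags
```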

Last comes AGC. The background in gaming voice chat is noisier, so the captured sound is suppressed more heavily. A larger gain is therefore needed so that teammates can hear the local player clearly and discuss tactics smoothly during battle.
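The per-scenario choices described above can be summarized as a hypothetical configuration table. The qualitative levels paraphrase this article's descriptions; they are not Rongyun's actual parameters. `ThreeAProfile` and `select_profile` are illustrative names, not an SDK API.

```python
from dataclasses import dataclass, replace
from enum import Enum


class Level(Enum):
    OFF = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class ThreeAProfile:
    ans: Level  # background noise suppression strength
    aec: Level  # echo cancellation strength
    agc: Level  # automatic gain strength


# Hypothetical levels for speaker playback, paraphrasing the scenarios above.
SPEAKER_PROFILES = {
    "call":           ThreeAProfile(ans=Level.HIGH, aec=Level.HIGH,   agc=Level.MEDIUM),
    "chat_room":      ThreeAProfile(ans=Level.HIGH, aec=Level.MEDIUM, agc=Level.MEDIUM),
    "music_teaching": ThreeAProfile(ans=Level.LOW,  aec=Level.LOW,    agc=Level.MEDIUM),
    "game_voice":     ThreeAProfile(ans=Level.HIGH, aec=Level.HIGH,   agc=Level.HIGH),
}


def select_profile(scenario: str, headset: bool) -> ThreeAProfile:
    """Pick the 3A profile for a scenario, turning AEC off when earphones are used."""
    profile = SPEAKER_PROFILES[scenario]
    if headset:
        # With earphones there is no speaker-to-microphone echo path.
        return replace(profile, aec=Level.OFF)
    return profile
```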


Audio 3A applies to many scenarios; this article covers only the mainstream ones.

Popular scenarios such as Werewolf, murder-mystery role-playing games (jubensha), and video blind dates also depend on 3A processing for good audio and video quality.

We believe that as its applications deepen, 3A technology will make further progress, eventually removing all target noise. Rongyun will keep improving its technology and is committed to giving every user an immersive experience in real-time audio and video scenarios.


融云RongCloud
