Guide:

As audio and video conferencing becomes more widespread, participants encounter increasingly noticeable reverberation that differs from one environment to another: large conference rooms, glass-walled conference rooms, small rooms with poor sound insulation, and so on. To keep speech intelligible and comfortable to listen to, speech de-reverberation in communications is becoming increasingly important and urgent. This article explains some of NetEase Yunxin's research and development progress and viewpoints on speech de-reverberation and on improving communication quality, focusing on a solution based on the adaptive combination of dual-microphone signal correlation. The overall goal is to improve the de-reverberation effect while preserving speech fidelity.

Text | Zhang Long, Senior Algorithm Engineer, NetEase Yunxin

1. Introduction to speech reverberation

(1) Introduction to reverberation

The following figure describes the cause and process of speech reverberation. The degree of reverberation in a signal depends on:

  • whether the room is enclosed;
  • the size of the room;
  • the reflective materials in the room;
  • the distance between the speaker and the microphone, etc.

(Note that reverberation is not the same thing as echo.)
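
How room size and surface material interact can be illustrated with Sabine's classical reverberation-time formula, RT60 = 0.161 · V / A (metric units), where V is the room volume and A the total absorption. The sketch below is only an illustration: the room dimensions and the absorption coefficients are made-up values, and the function name `sabine_rt60` is hypothetical.

```python
def sabine_rt60(volume_m3, surfaces):
    """Estimate RT60 in seconds with Sabine's formula: 0.161 * V / A.

    surfaces: iterable of (area_m2, absorption_coefficient) pairs.
    """
    absorption = sum(area * alpha for area, alpha in surfaces)
    if absorption <= 0:
        raise ValueError("total absorption must be positive")
    return 0.161 * volume_m3 / absorption


# Hypothetical 6 m x 5 m x 3 m meeting room (assumed values throughout).
volume = 6 * 5 * 3
floor_and_ceiling = 2 * 6 * 5
walls = 2 * (6 * 3) + 2 * (5 * 3)

# Hard, reflective surfaces (glass, bare concrete): low absorption.
glass_room = [(floor_and_ceiling, 0.05), (walls, 0.03)]
# Carpet and acoustic panels: higher absorption.
treated_room = [(floor_and_ceiling, 0.30), (walls, 0.25)]

rt60_glass = sabine_rt60(volume, glass_room)
rt60_treated = sabine_rt60(volume, treated_room)
```

With these assumed coefficients the reflective room comes out at roughly 2.9 s and the treated room at about 0.42 s, which is why glass-walled rooms are among the harder cases.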

As shown in the figure below, reverberation is generally divided by arrival time into direct sound, early reverberation, and late reverberation, each of which plays a different role in how sound is perceived.
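
This three-way split can be made concrete on a room impulse response (RIR). The sketch below builds a toy RIR and separates it by arrival time. The 2.5 ms direct-sound window and the 50 ms early/late boundary are common conventions assumed here, not values from the article, and `synth_rir` and `split_rir` are hypothetical helper names.

```python
import math
import random


def synth_rir(fs=16000, rt60=0.6, direct_delay_s=0.005, seed=0):
    """Toy room impulse response: a direct-path impulse plus decaying noise."""
    rng = random.Random(seed)
    n = int(fs * rt60)
    d = int(fs * direct_delay_s)
    # Amplitude decays by 60 dB (a factor of 1000) over rt60 seconds.
    decay = 3.0 * math.log(10.0) / (rt60 * fs)
    rir = [0.0] * n
    rir[d] = 1.0
    for i in range(d + 1, n):
        rir[i] = 0.1 * rng.gauss(0.0, 1.0) * math.exp(-decay * (i - d))
    return rir


def split_rir(rir, fs, direct_win_s=0.0025, early_s=0.05):
    """Split an RIR into direct, early, and late parts by arrival time."""
    peak = max(range(len(rir)), key=lambda i: abs(rir[i]))
    direct_end = peak + int(fs * direct_win_s) + 1
    early_end = peak + int(fs * early_s)
    direct = [v if i < direct_end else 0.0 for i, v in enumerate(rir)]
    early = [v if direct_end <= i < early_end else 0.0 for i, v in enumerate(rir)]
    late = [v if i >= early_end else 0.0 for i, v in enumerate(rir)]
    return direct, early, late


def energy(x):
    return sum(v * v for v in x)


fs = 16000
rir = synth_rir(fs=fs)
direct, early, late = split_rir(rir, fs)
# Early-to-late energy ratio in dB (often called C50 for a 50 ms boundary).
c50_db = 10.0 * math.log10((energy(direct) + energy(early)) / energy(late))
```

The three parts add up to the original RIR sample by sample, so convolving speech with each part separately gives the direct, early, and late components of the reverberant signal.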

The following figure shows the effect of reverberation on speech:

(2) Development history of reverberation and de-reverberation research

  • The earliest research grew out of basic studies of how sound propagates in rooms, and was then applied to the acoustic design of spaces such as concert halls and classrooms in order to better convey sound, including music and the human voice;
  • Researchers next studied the effect of reverberation on speech intelligibility;
  • Some researchers have focused on the positive effects of reverberation: it improves the naturalness, layering, and spatial impression of speech, and can even improve intelligibility. Artificial reverberation is used to enhance experiences such as entertainment, games, and music; as shown in the figure below, NetEase Yunxin provides artificial reverberation based on a Feedback Delay Network scheme;

  • Beginning in the 1970s, research on speech de-reverberation focused on the negative effects of reverberation on calls and recordings, with the aim of improving intelligibility and quality;
  • Around 2004 and 2005, hands-free communication and video conferencing emerged. Together with the development of voice assistants (especially far-field ones) after 2011, this made the research and application of speech de-reverberation more and more widespread.
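
The Feedback Delay Network mentioned in the list above can be sketched in a few lines. This is a minimal illustration, not NetEase Yunxin's implementation: the four delay lengths, the Hadamard feedback matrix, and the `feedback`, `mix`, and `tail` parameters are all assumptions chosen for the example.

```python
def fdn_reverb(dry, delays=(149, 211, 263, 293), feedback=0.8, mix=0.3, tail=4000):
    """Four-line feedback delay network with a Hadamard feedback matrix.

    dry: iterable of input samples. Returns len(dry) + tail output samples.
    """
    # 0.5 * H is orthogonal for this 4x4 Hadamard matrix, so the loop itself
    # is lossless and `feedback` < 1 alone controls how fast the tail decays.
    hadamard = [[1, 1, 1, 1],
                [1, -1, 1, -1],
                [1, 1, -1, -1],
                [1, -1, -1, 1]]
    bufs = [[0.0] * d for d in delays]  # one circular buffer per delay line
    pos = [0] * 4
    out = []
    for x in list(dry) + [0.0] * tail:
        # Read the sample written `delays[i]` steps ago on each line.
        taps = [bufs[i][pos[i]] for i in range(4)]
        wet = sum(taps) / 4.0
        for i in range(4):
            mixed = 0.5 * sum(hadamard[i][j] * taps[j] for j in range(4))
            bufs[i][pos[i]] = x + feedback * mixed
            pos[i] = (pos[i] + 1) % delays[i]
        out.append((1.0 - mix) * x + mix * wet)
    return out


# Impulse response of the network: a dense, exponentially decaying tail.
impulse_response = fdn_reverb([1.0] + [0.0] * 99)
```

Mutually prime delay lengths are a common choice because they keep the echoes of the different lines from lining up, which makes the tail sound denser.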

We categorize the metrics for evaluating performance according to the application of speech de-reverberation:

2. Key Algorithms and Research Progress

Taking both algorithm practice and runtime cost into account, NetEase Yunxin currently implements speech de-reverberation with traditional signal-processing algorithms, which work together with the noise reduction algorithms to improve the communication experience.

The following figure roughly categorizes speech de-reverberation algorithms according to the signal model and target:

This article mainly focuses on the following points:

  • Algorithms that evolved from linear prediction;
  • Correlation-based suppression algorithms;
  • Plans to incorporate deep learning in the future.

(1) AWPE algorithm

Transforming the model gives:

Here $x_t^m$ is the signal received by the $m$-th microphone at time $t$, and $L_m$ is the number of microphones; $h_k^m$ is the impulse response from the source $s$ to the $m$-th microphone, and $L_h$ is the length of that impulse response; $n_t^m$ is the additive noise component received by the $m$-th microphone at time $t$.
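
With the symbols defined above, the convolutive microphone signal model can be written as below. This is the standard form implied by those definitions, given here as a sketch; the exact layout of the original formula may differ.

```latex
x_t^m = \sum_{k=0}^{L_h - 1} h_k^m \, s_{t-k} + n_t^m ,
\qquad m = 1, \dots, L_m
```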

where the delayed observation vector (written here as $\mathbf{x}_{t-D}^m$) contains the data received by microphone $m$ at time $t-D$ and earlier. $d_t^m$ is the early reflection signal mentioned above, that is, the target signal for de-reverberation. Some models instead take the source signal $s$ itself as the target, but this is not the mainstream choice, because early reverberation is generally beneficial to both human listening and recognition systems.
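
In the WPE literature, a delayed linear prediction model of this kind is commonly written as below. This is a sketch under that assumption: the number of prediction taps $L_g$ and the filter symbol $\mathbf{g}^{m,m'}$ are notation introduced here, not taken from the original figures.

```latex
x_t^m = d_t^m + \sum_{m'=1}^{L_m} \bigl(\mathbf{g}^{m,m'}\bigr)^{\mathsf T} \mathbf{x}_{t-D}^{m'} ,
\qquad
\mathbf{x}_{t-D}^{m'} = \bigl[\, x_{t-D}^{m'},\; x_{t-D-1}^{m'},\; \dots,\; x_{t-D-L_g+1}^{m'} \,\bigr]^{\mathsf T}
```

The prediction delay $D$ keeps the filter from predicting the direct sound and early reflections, so only the late part is subtracted and $d_t^m$ remains as the residual.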

Solving the above model further gives:

Transforming the above model to the time-frequency domain and applying Recursive Least Squares (RLS) gives:

Solving the above objective function yields the following solution:

The above solution can be summarized as the following steps:
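
For a single frequency bin, steps of this kind can be sketched as follows. This is a minimal RLS-based adaptive WPE sketch in the spirit of [1], not the author's exact implementation: the function name `rls_wpe_bin`, the default parameters, and the short-window power estimate used as the target variance are all assumptions.

```python
import numpy as np


def rls_wpe_bin(X, taps=5, delay=2, alpha=0.99, eps=1e-6):
    """Adaptive WPE for one frequency bin.

    X: complex array of shape (frames, mics) with STFT coefficients.
    Returns the de-reverberated coefficients with the same shape.
    """
    n_frames, n_mics = X.shape
    dim = taps * n_mics
    G = np.zeros((dim, n_mics), dtype=complex)  # prediction filter
    P = np.eye(dim, dtype=complex)              # inverse weighted covariance
    D = np.zeros_like(X)
    for l in range(n_frames):
        # Stack the delayed frames x(l - delay), ..., x(l - delay - taps + 1).
        x_past = np.zeros(dim, dtype=complex)
        for k in range(taps):
            idx = l - delay - k
            if idx >= 0:
                x_past[k * n_mics:(k + 1) * n_mics] = X[idx]
        # 1) Predict the late reverberation and subtract it.
        d = X[l] - G.conj().T @ x_past
        # 2) Variance estimate: mean power over a short window (assumption).
        lo = max(0, l - delay - taps + 1)
        var = max(np.mean(np.abs(X[lo:l + 1]) ** 2), eps)
        # 3) RLS gain vector.
        Px = P @ x_past
        gain = Px / (alpha * var + np.real(np.vdot(x_past, Px)))
        # 4) Update the inverse covariance without any explicit inversion.
        P = (P - np.outer(gain, x_past.conj() @ P)) / alpha
        # 5) Update the prediction filter with the a priori error.
        G = G + np.outer(gain, d.conj())
        D[l] = d
    return D
```

In a full system the function would run independently on every frequency bin of the STFT; `alpha` is the forgetting factor and `taps` sets how much history the filter can model.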

(2) Introduction to correlation-based noise reduction and de-reverberation

This approach assumes that the late reverberation behaves like scattered (diffuse) field noise. The inter-microphone correlation of scattered field noise is used to estimate the size of the late reverberation component, and spectral subtraction is then used to estimate a de-reverberation gain. In practice, this class of algorithms performs well at reducing scattered field noise.

Signal model:
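
Following the spatial coherence formulation in [4], a two-microphone model of this kind is commonly written as below. This is a sketch under that assumption: the coherence symbols $\Gamma_x$, $\Gamma_s$, $\Gamma_n$, the microphone spacing $d$, and the speed of sound $c$ are notation introduced here rather than taken from the original figures.

```latex
x_i(l,f) = s_i(l,f) + n_i(l,f), \qquad i = 1, 2
\\[4pt]
\Gamma_x(l,f) = \frac{\mathrm{CDR}(l,f)\,\Gamma_s(f) + \Gamma_n(f)}{\mathrm{CDR}(l,f) + 1},
\qquad
\Gamma_n(f) = \frac{\sin(2\pi f d / c)}{2\pi f d / c}
```

Here $s_i$ is the coherent (direct) component, $n_i$ collects late reverberation and diffuse noise, and CDR is the coherent-to-diffuse power ratio.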

Calculate the following intermediate results:

Finally, the following noise reduction gain is obtained; applying it to the input signal de-reverberates the target speech:
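
A per-bin version of this chain (coherence, coherent-to-diffuse ratio, spectral-subtraction gain) can be sketched as below. It assumes an ideal spherically diffuse noise field and a known direct-path delay between the microphones. The function names, the over-subtraction factor `mu`, and the gain floor `g_min` are assumptions for the example, and the power-subtraction gain shown is one common choice rather than necessarily the article's exact formula. In practice `coh_x` would be estimated from recursively smoothed auto- and cross-power spectra of the two microphones.

```python
import cmath
import math


def diffuse_coherence(freq_hz, mic_dist_m, c=343.0):
    """Coherence of an ideal diffuse field: sin(x) / x with x = 2*pi*f*d/c."""
    x = 2.0 * math.pi * freq_hz * mic_dist_m / c
    return 1.0 if x == 0.0 else math.sin(x) / x


def cdr_gain(coh_x, freq_hz, mic_dist_m, tdoa_s=0.0, mu=1.0, g_min=0.1):
    """Spectral-subtraction gain from the estimated inter-mic coherence."""
    coh_n = diffuse_coherence(freq_hz, mic_dist_m)
    # Coherence of the direct path: a pure phase term set by the delay.
    coh_s = cmath.exp(2j * math.pi * freq_hz * tdoa_s)
    denom = coh_x - coh_s
    if abs(denom) < 1e-6:
        cdr = float("inf")  # fully coherent input: nothing to suppress
    else:
        # Solve coh_x = (CDR * coh_s + coh_n) / (CDR + 1) for CDR.
        cdr = max(((coh_n - coh_x) / denom).real, 0.0)
    # The diffuse share of the power is 1 / (CDR + 1); subtract it.
    power_gain = 1.0 - mu / (cdr + 1.0)
    return math.sqrt(max(power_gain, g_min ** 2))


# Example: 8 cm spacing, 1 kHz bin, and an input whose true CDR is 3.
gamma_n = diffuse_coherence(1000.0, 0.08)
gamma_x = (3.0 * 1.0 + gamma_n) / (3.0 + 1.0)
gain = cdr_gain(gamma_x, 1000.0, 0.08)
```

The gain floor `g_min` limits how hard any bin is attenuated, which is one simple way to trade suppression against speech fidelity.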

(3) Comprehensive application

  • For communication tasks, NetEase Yunxin currently focuses on implementing the AWPE family of schemes, and also considers combining them with the CDR (coherent-to-diffuse ratio) suppression algorithm in scenes with strong scattered field noise to improve performance;

The uplink of a communication system must include a noise reduction module. The de-reverberation algorithm therefore has to be tuned jointly with noise reduction, which is generally done by adjusting the modules and their parameters.

  • For intelligent voice tasks, linear speech de-reverberation is generally used as a front-end enhancement step:

Future trends:

3. Algorithm implementation and operation optimization

The following points deserve attention when implementing the algorithms from Section 2:

  • Design the buffer access scheme (covering the number of microphones, the number of history frames, frequency bins, etc.) to reduce computation time; in the RLS algorithm, use the Woodbury matrix identity in place of explicit matrix inversion;

  • As shown in the formula, statistics of this kind can be replaced by a recursive smoothing update;

  • Where possible, approximate matrices as diagonal, or even as real-valued, to reduce the amount of computation;

  • The amount of computation can be reduced further with lookup tables indexed by frequency;

  • Use the ideal scattered field noise model.
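
Three of the points above can be sketched with NumPy: a rank-one Woodbury (Sherman-Morrison) update that avoids re-inverting the covariance matrix, a recursive smoothing update for statistics, and a precomputed diffuse-field coherence table indexed by frequency bin. The function names and default values are hypothetical, and reading the lookup-table bullet as a coherence table is an assumption made for this example.

```python
import numpy as np


def sherman_morrison(P, x, alpha):
    """Return inv(alpha * R + x x^H) given P = inv(R), with P Hermitian.

    This is the rank-one case of the Woodbury identity, so no matrix
    inversion is needed per frame.
    """
    Px = P @ x
    denom = alpha + np.real(np.vdot(x, Px))
    return (P - np.outer(Px, Px.conj()) / denom) / alpha


def smooth_psd(prev_psd, frame, beta=0.9):
    """First-order recursive average that replaces a batch expectation."""
    return beta * prev_psd + (1.0 - beta) * np.abs(frame) ** 2


def diffuse_coherence_table(n_fft, fs, mic_dist_m, c=343.0):
    """Ideal diffuse-field coherence, computed once and indexed by bin."""
    freqs = np.arange(n_fft // 2 + 1) * fs / n_fft
    # np.sinc(t) is sin(pi * t) / (pi * t), so this equals sin(x) / x
    # with x = 2 * pi * f * d / c.
    return np.sinc(2.0 * freqs * mic_dist_m / c)


# The table is built once at start-up and then only read per frame.
coherence_table = diffuse_coherence_table(n_fft=512, fs=16000, mic_dist_m=0.08)
```

Because the table depends only on the FFT size, sample rate, and microphone spacing, it never has to be recomputed while audio is running.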

4. Results report and follow-up outlook

(1) Display of current results

In the current system, which is combined with noise reduction, speech fidelity is given priority in the de-reverberation stage. The current algorithm can handle reverberation of roughly 800 ms to 1 s, and the most important tuning parameters are the forgetting factor and the number of blocks.
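
A rough feel for these two parameters comes from simple arithmetic. The sketch below assumes a 10 ms frame hop and reads "number of blocks" as the number of past frames the prediction filter spans; both are assumptions, and the helper names are hypothetical.

```python
import math


def forgetting_memory_ms(alpha, hop_ms):
    """Effective memory of an exponential window: about 1 / (1 - alpha) frames."""
    return hop_ms / (1.0 - alpha)


def taps_for_tail(tail_ms, hop_ms, delay_frames):
    """Past frames needed for the delayed filter to span `tail_ms`."""
    return max(1, math.ceil(tail_ms / hop_ms) - delay_frames)


memory_ms = forgetting_memory_ms(alpha=0.99, hop_ms=10.0)
n_taps = taps_for_tail(tail_ms=800.0, hop_ms=10.0, delay_frames=2)
```

Under these assumptions a forgetting factor of 0.99 averages over about one second of signal, and covering an 800 ms tail takes 78 past frames. A larger forgetting factor gives smoother statistics but adapts more slowly when the speaker moves.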

(2) Future Outlook

In the field of communication:

  • An adaptive scheme for the forgetting factor;
  • A deep learning solution that fuses speech de-reverberation and noise reduction, replacing the current combination of traditional algorithms.

NetEase Yunxin's current optimization:

Future:

References

[1] Xiang, Teng, Jing Lu, and Kai Chen. "Multi-channel adaptive dereverberation robust to abrupt change of target speaker position." The Journal of the Acoustical Society of America 145.3 (2019): EL250-EL256.

[2] Taniguchi, Toru, et al. "Generalized weighted-prediction-error dereverberation with varying source priors for reverberant speech recognition." 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). IEEE, 2019.

[3] Tang, Xinyu, et al. "A Time-Varying Forgetting Factor-Based QRRLS Algorithm for Multichannel Speech Dereverberation." 2020 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT). IEEE, 2020.

[4] Schwarz, Andreas. Dereverberation and Robust Speech Recognition Using Spatial Coherence Models. Diss. Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), 2019.

About the author

Zhang Long works in the NetEase Yunxin Audio and Video Laboratory and is currently engaged in research and development of audio signal enhancement and dynamic gain control.
