Talking about Voice Quality Assurance: How to Test Audio Quality in RTC

In everyday audio and video meetings, most of us have run into moments like these: "Hello, can you hear me? Your voice keeps cutting out", "Why am I hearing an echo?", "It's too noisy, I can't hear what you are saying", and so on. These voice quality problems hurt the meeting experience, and in an important meeting they are enough to turn annoyance into anger. So how can we effectively reduce how often they occur? This series of articles shares the testing experience of Alibaba Cloud Video Cloud in ensuring RTC voice quality.

Author | Ke Huai
Review｜Taiyi

Background introduction

Audio quality here refers to auditory quality and the quality of the audio 3A algorithms under normal network conditions. Auditory quality is the listener's subjective impression of how good the voice sounds when the network is lossless. In practice, different people may judge the same sound differently, and the judgment is also affected by the listening environment and the listener's state of mind. In testing, we can start from the three elements of sound (loudness, pitch, and timbre) and evaluate a set of indicators quantitatively. In addition, industry standards such as POLQA and PESQ weight these quantitative indicators so that the result approximates subjective perception.

The audio 3A algorithm refers to:

AGC: Automatic gain control

ANS: Adaptive noise suppression

AEC: Acoustic echo cancellation

Several articles on this public account already explain the principles and implementation of these algorithms in detail, so they are not repeated here:

Detailed explanation of the high sound quality and low latency behind WebRTC-AGC (Automatic Gain Control)

Hard Goods Column｜WebRTC AEC (Acoustic Echo Cancellation)

This series of articles covers audio quality, adaptation testing, QoS quality, and automation solutions. This article introduces the first part, audio quality (auditory quality and audio 3A algorithm quality under normal network conditions).

RTC voice test link disassembly

Before formal testing, let us look at the frame diagram of the entire RTC voice transmission link. Sound is captured by the microphone, pre-processed by the upstream audio algorithms, encoded, transmitted, decoded, and finally played out through the speaker. To test the upstream audio algorithms alone, input the sound at point (1) and capture the output audio at point (2) for analysis. When testing the whole system, we usually evaluate it end to end: input the sound at point (1) and capture it for analysis at point (4). All test methods in the rest of this article are end-to-end.

Audio quality test program

Alibaba Cloud Video Cloud ensures audio quality with the combination of objective indicators and subjective evaluation that is common in the industry. The specific indicators are shown in the following figure:

Objective test method

Effective bandwidth

Feed a sweep file plus human voice audio at a 48 kHz sampling rate into Line in (audio material reference is as follows), record the output audio from Line out, and read the effective bandwidth through frequency analysis.
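The frequency analysis step can be sketched as follows. This is a minimal illustration, not the tooling used by the authors: the function names, the 100 Hz probing step, and the rule "highest frequency within 40 dB of the spectral peak" are all assumptions made for the example.

```python
import math

def goertzel_power(samples, sample_rate, freq):
    """Signal power at a single frequency, computed with the Goertzel algorithm."""
    w = 2.0 * math.pi * freq / sample_rate
    coeff = 2.0 * math.cos(w)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

def effective_bandwidth(samples, sample_rate, step_hz=100, floor_db=-40.0):
    """Highest probed frequency (Hz) whose power is within floor_db of the peak.

    step_hz and floor_db are illustrative assumptions, not values from a standard.
    """
    freqs = range(step_hz, sample_rate // 2, step_hz)
    powers = [goertzel_power(samples, sample_rate, f) for f in freqs]
    peak = max(powers)
    if peak <= 0.0:
        return 0  # silent recording: no usable bandwidth
    threshold = peak * 10.0 ** (floor_db / 10.0)
    return max(f for f, p in zip(freqs, powers) if p >= threshold)
```

For a recorded sweep, the returned value approximates the upper edge of the band that survives the link. On long 48 kHz recordings a pure-Python loop like this is slow, so an FFT library would be the more practical choice.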

End-to-end delay

Method 1: Run the VQT test; the delay is reported in the test result.

Method 2: Self-developed. Feed the test material into Line in, record both the untransmitted audio and the transmitted output audio from Line out, and calculate the audio delay.

Test material: a continuous single tone.
Index calculation: In the recording file, read the start time of the audio that was not transmitted and record it as t1, and read the start time of the audio that was transmitted through the conference and record it as t2. Then Delay = t2 - t1.
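The Delay = t2 - t1 calculation can be sketched as below. It assumes the untransmitted and transmitted signals are available as two sample lists from the same recording (for example the two channels of one file), and it uses a simple amplitude threshold to find where the tone starts; the function names and the threshold value are hypothetical.

```python
def onset_time(samples, sample_rate, threshold=0.1):
    """Time in seconds of the first sample whose magnitude reaches threshold."""
    for i, x in enumerate(samples):
        if abs(x) >= threshold:
            return i / sample_rate
    return None  # the tone never appears in this channel

def end_to_end_delay(direct, transmitted, sample_rate, threshold=0.1):
    """Delay = t2 - t1, where t1/t2 are the tone onsets of the two channels."""
    t1 = onset_time(direct, sample_rate, threshold)
    t2 = onset_time(transmitted, sample_rate, threshold)
    if t1 is None or t2 is None:
        raise ValueError("tone onset not found in one of the channels")
    return t2 - t1
```

A fixed threshold works for a clean single tone; with background noise in the recording, a frame-energy or cross-correlation approach would be more robust.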

ANS

Evaluate the performance of the ANS algorithm in pure-noise and mixed speech-and-noise scenarios. The analysis indicators include noise reduction consistency, signal-to-noise ratio improvement, convergence time, and voice quality after noise reduction.

Test topology

Input the background noise material and the voice material through Line in or through an external loudspeaker, and record the output audio from Line out at the streaming end for indicator analysis.

Test material

Index calculation

Signal-to-noise ratio improvement: Let the signal-to-noise ratio of the denoised audio be A; then signal-to-noise ratio improvement = A - input signal-to-noise ratio.
Noise reduction consistency: Calculate the residual noise level after each type of noise input, and check whether the residual is consistent across the noise types.
Convergence time: Record the time when the noise energy begins to fall as t1 and the time when the noise first settles on a plateau as t2; then convergence time = t2 - t1.
Sound quality: Modify the VQT POLQA test script to calculate the MOS score of the output audio under different input signal-to-noise ratios. The following table shows the output audio MOS for noisy human voice with an input signal-to-noise ratio of 10 dB:
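The signal-to-noise ratio improvement and convergence time indicators can be sketched as follows. This is an illustration under stated assumptions: the SNR is estimated from a speech-plus-noise segment and a noise-only segment chosen by the tester, the levels are measured in 10 ms frames (160 samples at 16 kHz), and "begins to fall" / "plateau" are detected with a hypothetical 1.5 dB tolerance.

```python
import math

def power(samples):
    """Mean square value of a segment."""
    return sum(x * x for x in samples) / len(samples)

def snr_db(speech_and_noise, noise_only):
    """SNR estimate: speech power is the mixed power minus the noise power."""
    p_noise = power(noise_only)
    p_speech = max(power(speech_and_noise) - p_noise, 1e-12)
    return 10.0 * math.log10(p_speech / p_noise)

def snr_improvement_db(input_snr_db, output_snr_db):
    """Improvement = A - input SNR, where A is the SNR after denoising."""
    return output_snr_db - input_snr_db

def frame_levels_db(samples, frame_len):
    """Level of each full frame in dB (floored to avoid log of zero)."""
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        p = power(samples[start:start + frame_len])
        levels.append(10.0 * math.log10(max(p, 1e-12)))
    return levels

def convergence_time(noise_output, sample_rate, frame_len=160, tol_db=1.5):
    """t2 - t1 for a pure-noise recording of the ANS output.

    t1: first frame more than tol_db below the initial level (noise starts falling).
    t2: first frame within tol_db of the final level (plateau reached).
    """
    levels = frame_levels_db(noise_output, frame_len)
    start_level, final_level = levels[0], levels[-1]
    if final_level >= start_level - 2 * tol_db:
        return 0.0  # the noise level never dropped noticeably
    i1 = next(i for i, lv in enumerate(levels) if lv < start_level - tol_db)
    i2 = next(i for i, lv in enumerate(levels) if lv <= final_level + tol_db)
    return (i2 - i1) * frame_len / sample_rate
```

Running `frame_levels_db` on the residual noise of each noise type and comparing the plateau levels gives a simple view of noise reduction consistency.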

AGC

Evaluate the performance of the AGC algorithm at different input volume levels. The analysis indicators include sound stability and output loudness.

Test topology

Refer to the ANS test topology diagram: input the voice material through Line in or through an external loudspeaker, and record the output audio from Line out at the streaming end for indicator analysis.

Test material

Index calculation

Sound stability: Calculate the average RMS of each volume segment of the output audio, then compute the variance of these average RMS values. The following is the calculation formula of the average RMS:

Output loudness: With the Line out method, calculate the average RMS of the output audio; with the external loudspeaker method, use a standard sound level meter and record the loudness with A-weighting.
Sound quality: Modify the VQT POLQA test script to calculate the MOS score of the output audio under different input volumes. The following table shows the output audio MOS score under high, medium, and low input volume:
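The sound stability indicator can be sketched as below. The RMS of N samples is taken as sqrt((1/N) * sum of x^2); the exact "average RMS" formula from the original figure is not reproduced here, so averaging the per-frame RMS levels in dBFS is an assumption of this example, as are the function names and the 160-sample frame length.

```python
import math
import statistics

def rms(samples):
    """Root mean square of a block of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def rms_dbfs(samples):
    """RMS level in dB relative to full scale (1.0), floored to avoid log of zero."""
    return 20.0 * math.log10(max(rms(samples), 1e-12))

def average_rms_db(segment, frame_len=160):
    """Average of the per-frame RMS levels (dBFS) over one volume segment."""
    frames = [segment[i:i + frame_len]
              for i in range(0, len(segment) - frame_len + 1, frame_len)]
    return sum(rms_dbfs(f) for f in frames) / len(frames)

def stability_variance(segments, frame_len=160):
    """Variance of the average RMS across the output segments.

    segments holds the AGC output for each input volume level (e.g. high,
    medium, low). A value near 0 means the output loudness is stable.
    """
    means = [average_rms_db(s, frame_len) for s in segments]
    return statistics.pvariance(means)
```

The same `average_rms_db` value can serve as the Line out output loudness figure for a whole recording.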

AEC

Check whether the AEC algorithm suffers from echo leakage or voice suppression in single-talk and double-talk scenarios.

Test topology

【Single Talk】

One streaming end plays the single-talk voice material, and the other streaming end is placed in an open meeting room by default. Record the output of the streaming end via Line out and judge whether any echo leaks through.

【Double Talk】

Play the double-talk test material at both streaming ends at the same time, and record the output of the streaming end via Line out to determine whether there is echo leakage or voice suppression.

Test material

Index calculation

Echo leakage: Read the residual amount of human voice in the recorded audio file. Ideally the value is 0, meaning there is no echo leakage.
Vocal suppression: Evaluate this indicator in the double-talk scenario. The 3GPP TS 26.132 standard is used to evaluate clipping, and the final evaluation is based on type D (continuous clipping longer than 150 ms). The closer the value is to 0, the better the quality.
Convergence time: Record the start time of the test as t1 and the time when AEC convergence is complete and no echo leaks as t2; then convergence time = t2 - t1.
Human voice quality: Evaluate this indicator in the double-talk scenario. Modify the VQT POLQA test script to calculate the sound quality score of the human voice in the double-talk scenario.
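For the single-talk case, echo leakage and convergence time can be sketched as follows. The sketch assumes the near end is silent, so any energy in the recorded output above a threshold is treated as leaked echo, and it takes t1 as the start of the recording. The function names and the -60 dBFS leak threshold are hypothetical; this is not the clipping evaluation procedure of 3GPP TS 26.132.

```python
import math

def frame_levels_dbfs(samples, frame_len):
    """Level of each full frame in dBFS (floored to avoid log of zero)."""
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        p = sum(x * x for x in frame) / frame_len
        levels.append(10.0 * math.log10(max(p, 1e-12)))
    return levels

def echo_leak_report(output, sample_rate, frame_len=160, leak_threshold_db=-60.0):
    """Summarize leaked echo in a single-talk recording.

    convergence_time_s is t2 - t1 with t1 = 0 (start of the recording) and
    t2 = end of the last frame that still exceeds the leak threshold.
    """
    levels = frame_levels_dbfs(output, frame_len)
    leaking = [i for i, lv in enumerate(levels) if lv > leak_threshold_db]
    if not leaking:
        return {"leak_frames": 0, "convergence_time_s": 0.0,
                "peak_residual_dbfs": None}
    return {
        "leak_frames": len(leaking),
        "convergence_time_s": (leaking[-1] + 1) * frame_len / sample_rate,
        "peak_residual_dbfs": max(levels[i] for i in leaking),
    }
```

A report with zero leak frames corresponds to the ideal case described above, where the residual is 0 and no echo leaks.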

STOI

Short-Time Objective Intelligibility (STOI) is an objective method for measuring speech intelligibility that the academic community currently regards as accurate and reliable; its results reflect the intelligibility and naturalness of speech to a certain extent. Its limitation is that the audio must be downsampled to 16 kHz for calculation.

Test topology: Refer to ANS test topology.
Test material: ITU-T P.863 provides standard human voice material.
Index calculation: The following frame diagram shows the STOI calculation process. MATLAB and Python implementations of this algorithm are already available in the industry.

POLQA

ITU-T P.863 provides a test method that yields MOS scores and audio delay. It supports 8 kHz, 16 kHz, and 48 kHz tests; its limitation is that the equipment is expensive.

Test topology: Refer to ANS test topology.
Test material: ITU-T P.863 provides standard human voice material, plus the VQT built-in voice test material.
Index calculation: POLQA MOS score.

PESQ

ITU-T P.862 provides a test method that yields MOS scores. Its limitation is that it supports only 8 kHz and 16 kHz.

Test topology: Refer to ANS test topology.
Test material: ITU-T P.863 provides standard human voice material.
Index calculation: PESQ MOS score.

Subjective testing methods

Following the scoring rules and dimensions in "YD/T 2309 Audio Quality Subjective Test Method (ITU-R BS.1284)", conduct scoring tests with experts and ordinary users in different scenarios.

Scoring method

Evaluation dimension

Testing scenarios

The test materials are "Hvi Audition Disc" and "TUT-acoustic-scenes-2017-development".

This is the first article in the RTC audio test series. Later articles will introduce how Alibaba Cloud Video Cloud guarantees RTC voice quality from the dimensions of adaptation testing, QoS quality, and automation solutions. You are welcome to follow the public account "Video Cloud Technology".

