
On April 20, Agora announced a comprehensive upgrade of its real-time chorus solution, helping the well-known Chinese mini KTV brand "Mida" launch China's first complete real-time chorus solution that supports multiple device types, multi-person chorus, and high sound quality. This ends a long period in which China's online karaoke industry explored "real-time chorus" but never managed to bring it to production.

Before introducing Agora's complete real-time chorus solution, let's review the two common ways online karaoke apps implement chorus today, and the technical difficulties that true "real-time chorus" faces.

Anyone who has tried chorus in an online KTV app knows that almost all online choruses are implemented either as recorded chorus or as one-way chorus. Take lead singer A and user B as an example:

Recorded chorus: lead singer A sings along with the accompaniment and uploads the recording when finished; user B then picks the accompaniment that already contains A's vocals and sings over it; the chorus is completed indirectly once B's recording is done.

One-way chorus: lead singer A starts a chorus; the accompaniment is sent to A; A's vocals plus the accompaniment are sent to user B; B joins in and sings along.


The second approach appears to be real-time, but in terms of experience it is not a true chorus: user B and the audience can hear lead singer A, but A cannot hear B. In addition, if something goes wrong on A's side, B cannot continue. This approach also does not support choruses of more than two people.

What we want from true "real-time chorus" is to bring the offline karaoke-room experience online: all parties hear the accompaniment at the same time, sing together, and hear each other's voices in real time.

Real-time chorus faces two major technical difficulties: chorus synchronization and high sound quality

As early as 2018, Agora put forward the technical concept of real-time chorus, but it was not integrated and launched because the overall network infrastructure was not yet mature enough. Since then, Agora has spent years refining the technology for real-time chorus and has run repeated integration tests with "Mida", many Chinese online karaoke platforms, and smart TV manufacturers, finally launching a mature, complete real-time chorus solution with ultra-low latency that is ready for production. While refining it together with customers, Agora identified two core technical difficulties in this scenario:

01 Chorus sync

Synchronization here means keeping the two users' singing in sync with each other and with the accompaniment. Assume both singers are professionals, so inaccurate rhythm is not an issue at all. As described above, since the accompaniment is sent to both users at the same time, the key question is whether their singing stays in sync, and the main factor affecting that is latency.

Ignoring the accompaniment, suppose the end-to-end delay between users A and B is 100 ms. Following the sound as it travels, the following happens:

  • A sings first, and B hears A's singing 100 ms later;
  • B joins in after hearing A, and B's singing takes another 100 ms to reach A, so A always hears B 200 ms late;
  • In online KTV a word is typically sung in about 200-300 ms, so a 200 ms lag leaves the listener at least half a word behind, producing a noticeable sense of misalignment.
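The delay chain in the bullets above is simple arithmetic; the minimal Python sketch below reproduces it. The 100 ms one-way delay and the 200-300 ms word duration come from the text, while the function names are illustrative and not part of any Agora API.

```python
def sequential_chorus_lags(one_way_ms: float) -> tuple:
    """Lags in a sequential chorus where B sings along with what B hears from A.

    Returns (lag at which B hears A, lag at which A hears B), both measured
    against the moment A actually sang.
    """
    b_hears_a = one_way_ms      # A -> B: one network trip
    a_hears_b = 2 * one_way_ms  # A -> B, then B's voice back to A: two trips
    return b_hears_a, a_hears_b


def lag_in_words(lag_ms: float, word_ms: float) -> float:
    """Express a lag as a fraction of one sung word."""
    return lag_ms / word_ms


b_lag, a_lag = sequential_chorus_lags(100)
for word_ms in (200, 300):
    print(f"{word_ms} ms per word: A hears B {lag_in_words(a_lag, word_ms):.2f} words late")
```

With a 100 ms one-way delay, A hears B between roughly two thirds of a word and a full word late, which is why the text calls the offset "at least half a word".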

02 High sound quality in real-time chorus

People who sing share a common psychological need: they want others to praise their singing. Sound quality is therefore especially important in chorus scenarios. The main factors affecting real-time chorus sound quality are the audio sampling rate, the bit rate, and latency.

  • Sampling rate: the number of samples taken per second from a continuous signal to form a discrete signal. The higher the sampling rate, the closer the audio sounds to the original sound.
  • Bit rate: the amount of data (in bits) that the encoded (compressed) audio uses per second. The higher the bit rate, the more information is available to describe each sample, the more accurately the audio is represented, and the better the sound quality.
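To make the two definitions concrete, the minimal sketch below computes the bit rate of uncompressed PCM audio (sampling rate × bits per sample × channels) and how much an encoder has to compress it. The 16-bit depth and the 128 kbps encoded rate are hypothetical example values, not figures from Agora.

```python
def pcm_bitrate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bit rate of uncompressed PCM audio, in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def compression_ratio(sample_rate_hz: int, bits_per_sample: int,
                      channels: int, encoded_bps: int) -> float:
    """How many times smaller the encoded stream is than raw PCM."""
    return pcm_bitrate(sample_rate_hz, bits_per_sample, channels) / encoded_bps


for rate in (8_000, 16_000, 32_000, 48_000):
    kbps = pcm_bitrate(rate, 16, 1) // 1000
    print(f"{rate} Hz, 16-bit mono PCM: {kbps} kbps")

# Hypothetical: 48 kHz 16-bit stereo music encoded at 128 kbps.
print("compression ratio:", compression_ratio(48_000, 16, 2, 128_000))
```

A higher sampling rate raises the raw data rate linearly, which is why a codec and its target bit rate matter so much for high-quality stereo music.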

Assuming a stable network, higher sampling and bit rates give better sound quality, but they also mean more data for each stretch of audio, which may take longer to transmit. In other words, high sound quality can come at the cost of latency.
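The trade-off can be sketched with two formulas: the payload an encoder produces per frame, and the time needed to push that payload onto a link. This is a simplified model under stated assumptions: 20 ms frames, a hypothetical 1 Mbps uplink, and IP/UDP/RTP header overhead ignored.

```python
def frame_bytes(bitrate_bps: int, frame_ms: int) -> float:
    """Payload bytes the encoder produces for one frame of audio."""
    return bitrate_bps * frame_ms / 1000 / 8


def serialization_ms(payload_bytes: float, link_bps: int) -> float:
    """Time to put one payload onto a link of the given capacity (headers ignored)."""
    return payload_bytes * 8 * 1000 / link_bps


LINK_BPS = 1_000_000  # hypothetical 1 Mbps uplink
for bitrate in (32_000, 64_000, 128_000):
    size = frame_bytes(bitrate, 20)
    print(f"{bitrate // 1000} kbps -> {size:.0f} B per 20 ms frame, "
          f"{serialization_ms(size, LINK_BPS):.2f} ms on the wire")
```

Each step up in bit rate enlarges every packet, so on a slow or congested link the per-packet transmission time, and the risk of queuing, grows with it.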

Real-time chorus must also deal with latency during audio transmission, as well as adaptation to and compatibility with a range of hardware such as mobile phones, TVs, and KTV equipment. These difficulties have been a real headache for the karaoke platforms, social platforms, TV manufacturers, and others that want to launch real-time chorus.

Agora releases the industry's first complete real-time chorus solution

To address these difficulties, Agora launched the industry's first complete real-time chorus solution. It not only addresses ultra-low chorus latency, accompaniment synchronization, a flexible number of chorus participants, and guaranteed high sound quality, but also provides in-ear monitoring with under 50 ms latency, lyrics synchronization, singing voice beautification, a sound wave spectrum, and other features, forming a highly complete real-time chorus solution.

Agora's real-time chorus solution works roughly as follows:

  • The lead singer and every chorus participant load the BGM locally and start singing along with it at the same time;
  • Through transmission and scheduling over SD-RTN™, the lead singer and chorus participants hear each other's singing in real time and sing together, while the audience enjoys a "zero-delay" chorus from the singers.
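A minimal sketch of why this architecture helps, under the assumption that every end starts the same BGM at the same moment and sings against it: the lead then hears a chorus singer only one network trip late instead of two, and an audience member sees only the difference between the singers' path delays. The delay values and function names are hypothetical.

```python
from typing import Dict


def lead_hears_chorus_lag_ms(one_way_ms: float, realtime: bool) -> float:
    """Lag between the lead's own BGM and a chorus singer's voice, as heard by the lead.

    realtime=True: both ends play the BGM locally from the same start moment,
    so the remote voice carries only the one-way network delay.
    realtime=False: the chorus singer follows the lead's forwarded audio, so
    the voice comes back after a full round trip.
    """
    return one_way_ms if realtime else 2 * one_way_ms


def audience_skew_ms(delay_to_audience_ms: Dict[str, float]) -> float:
    """Misalignment between singers as received by one audience member.

    Every singer is aligned to the same BGM timeline at the source, so only
    the spread of the path delays shows up as skew.
    """
    delays = list(delay_to_audience_ms.values())
    return max(delays) - min(delays)


print("one-way chorus:", lead_hears_chorus_lag_ms(100, realtime=False), "ms")
print("real-time chorus:", lead_hears_chorus_lag_ms(100, realtime=True), "ms")
print("audience skew:", audience_skew_ms({"A": 80, "B": 95, "C": 90}), "ms")
```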


Figure: Architecture of Agora's real-time chorus solution

Agora's real-time chorus solution offers core advantages in six areas, built around ultra-low-latency processing, a high-quality sound experience, precise accompaniment synchronization, and a flexible number of chorus participants.

End-to-end ultra-low latency of 64 ms

In Agora's solution, the lead singer and chorus participants hear the accompaniment at the same time and can all hear the other singers in real time, which removes the waiting delay before singing. What remains for chorus latency is to reduce the end-to-end delay with which each singer's voice reaches the others.

While working on transmission latency, Agora found that in real-time chorus lower latency is not always better: blindly chasing lower latency can sacrifice sound quality and other parts of the experience. Long-term practice shows that 50 ms is the ideal latency for real-time chorus, but reaching 50 ms requires overcoming the following difficulties:


01 Audio latency at the capture and playback ends

Device-side latency includes the capture, pre-processing, and encoding on the sending side; the receiving, decoding, and post-processing on the playback side; and the on-device network delay incurred on both ends after encoding and before decoding.

Device-side latency depends mainly on hardware performance, the codecs used, and the amount of audio and video data. It can range from 30 to 200 ms, or even higher.
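One way to reason about these stages is as a latency budget whose parts must add up to the target. The sketch below sums per-stage delays and compares the total with the 50 ms goal from the text; every per-stage number is a hypothetical placeholder for illustration, not a measurement of any device or of Agora's SDK.

```python
from typing import Dict, Tuple

TARGET_MS = 50.0  # the ideal end-to-end latency named in the text


def latency_budget(stages_ms: Dict[str, float],
                   target_ms: float = TARGET_MS) -> Tuple[float, float]:
    """Return (total latency, amount over the target; negative means headroom)."""
    total = sum(stages_ms.values())
    return total, total - target_ms


# Hypothetical per-stage figures, for illustration only.
example = {
    "capture": 10, "preprocess": 3, "encode": 5, "network": 20,
    "jitter_buffer": 15, "decode": 2, "postprocess": 2, "playback": 10,
}

total, over = latency_budget(example)
largest = max(example, key=example.get)
print(f"total {total} ms, {over:+.0f} ms vs target; largest stage: {largest}")
for stage, ms in sorted(example.items(), key=lambda kv: kv[1], reverse=True):
    print(f"  {stage:<13} {ms:>3} ms")
```

Listing the stages this way makes it clear why shaving a few milliseconds from each device-side stage matters as much as improving the network itself.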

02 Network transmission latency

In real-time chorus, coping with poor networks and network jitter requires buffering on the capture device, the server, and the playback side. Whenever buffering kicks in, it adds delay, and with frequent stalls the delay gradually accumulates. Eliminating stalls and accumulated delay requires optimizing the network path end to end.
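The accumulation effect can be illustrated with a toy adaptive jitter buffer: when a packet arrives later than the current buffer allows, playback stalls and the buffer target grows. If the buffer never shrinks again, every stall permanently adds delay. This is a deliberately simplified, hypothetical policy for illustration; production jitter buffers (and whatever Agora uses) are far more sophisticated.

```python
from typing import List, Tuple


def jitter_buffer_trace(jitter_ms: List[float], start_ms: float = 20.0,
                        grow_ms: float = 10.0, shrink_ms: float = 0.0,
                        floor_ms: float = 20.0) -> Tuple[List[float], int]:
    """Simulate a toy adaptive jitter buffer.

    A packet whose jitter exceeds the current buffer target causes a stall,
    after which the target grows by grow_ms. With shrink_ms == 0 the added
    delay is never given back (it accumulates); otherwise the target decays
    toward floor_ms while packets arrive on time.
    Returns (buffer target after each packet, number of stalls).
    """
    target = start_ms
    trace: List[float] = []
    stalls = 0
    for j in jitter_ms:
        if j > target:
            stalls += 1
            target += grow_ms
        else:
            target = max(floor_ms, target - shrink_ms)
        trace.append(target)
    return trace, stalls


jitter = [5, 30, 5, 45, 5, 5]  # hypothetical per-packet jitter in ms
print("never shrinks:", jitter_buffer_trace(jitter))
print("shrinks 5 ms/packet:", jitter_buffer_trace(jitter, shrink_ms=5.0))
```

In the first run the buffer ends 20 ms larger than it started even though the network recovered; letting it shrink returns that delay once packets arrive on time again.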

Wang Qi, head of social and pan-entertainment products at Agora, said: "An end-to-end latency of 50 ms is the goal we have been pursuing. Agora's real-time chorus solution currently achieves 64 ms end-to-end. Behind this, Agora has optimized a series of hard problems, including device-side audio latency, weak-network transmission, and latency consumed by the audio engine. In real-time chorus, 64 ms comes very close to the ideal of 50 ms. At this latency, the real-time chorus experience is almost as if the offline experience had moved online, and the scenario becomes highly usable, giving users a truly stable, high-quality, ultra-low-latency real-time chorus experience."

High-quality singing in real time

Agora's solution also delivers a high-quality singing experience in real-time chorus. With its industry-leading audio engine, Agora extends support from low-bit-rate narrowband voice to high-quality stereo music, with sampling rates from 8 kHz (narrowband) to 48 kHz (full band). Agora's industry-leading 3A algorithms (acoustic echo cancellation, automatic noise suppression, and automatic gain control) effectively remove all kinds of noise without degrading sound quality.
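The range from 8 kHz to 48 kHz matters because, by the Nyquist theorem, a sampling rate can represent frequencies only up to half its value. The sketch below prints that upper limit for common rates; the band labels are conventional names whose exact boundaries vary between codecs, so treat them as an assumption.

```python
BAND_NAMES = {
    8_000: "narrowband",
    16_000: "wideband",
    32_000: "super-wideband",
    48_000: "fullband",
}


def max_audio_bandwidth_hz(sample_rate_hz: int) -> float:
    """Highest frequency a given sampling rate can represent (Nyquist limit)."""
    return sample_rate_hz / 2


for rate, name in BAND_NAMES.items():
    print(f"{rate / 1000:g} kHz ({name}): audio up to {max_audio_bandwidth_hz(rate) / 1000:g} kHz")
```

Human hearing extends to roughly 20 kHz, so only the 48 kHz setting covers the full audible range that music, unlike plain speech, actually uses.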

In addition, Agora pioneered a real-time singing voice beautification ("Bel Canto") feature. Building on low latency and high sound quality, it applies a chained, multi-module joint algorithm framework designed for singing to adjust different dimensions of the voice, such as pitch, timbre, rhythm, prosody, space, ambience, and even artistic style. This makes the singing sound better and fit the accompaniment more closely, while retaining the singer's own vocal character.

Multi-person real-time chorus

The one-way chorus approach supports only two people. In Agora's solution, each chorus end is independent and does not affect the others, so a chorus can include more than two people. If one chorus participant has a problem, it does not affect the experience of the other participants or the audience.
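The independence described above can be sketched as an "everyone but me" mix: each listener hears the sum of all other active singers, and a singer whose audio is missing is simply skipped. This is a hypothetical illustration of the idea with plain float samples; it is not how Agora's SD-RTN™ or SDK actually routes or mixes streams.

```python
from typing import Dict, List, Optional


def mix_for_listener(listener: str,
                     frames: Dict[str, Optional[List[float]]]) -> List[float]:
    """Mix one audio frame for `listener` from every *other* active singer.

    A singer whose frame is None (dropped, muted, or disconnected) is skipped,
    so one failing participant does not break anyone else's mix.
    """
    others = [f for name, f in frames.items() if name != listener and f is not None]
    if not others:
        return []
    length = min(len(f) for f in others)
    # Sum sample by sample and clip to [-1.0, 1.0] to avoid overflow.
    return [max(-1.0, min(1.0, sum(f[i] for f in others))) for i in range(length)]


frames = {"A": [0.1, 0.2], "B": [0.3, 0.4], "C": None}  # singer C has dropped out
print("A hears:", mix_for_listener("A", frames))
print("audience hears:", mix_for_listener("audience", frames))
```

Because each mix is computed per listener from whatever streams are present, adding a fourth singer or losing one changes only the inputs, not the logic.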

Precise accompaniment synchronization

For the best real-time chorus effect, each end sings along with its own locally played BGM. After the lead singer requests BGM playback, the lead singer's side waits for a delay matched to the chorus side, so that the accompaniment of all parties is precisely synchronized.
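A minimal sketch of that waiting step, assuming the one-way delay from the lead to each chorus end is known (for example, estimated as half the measured round-trip time): the lead waits for the slowest delivery of the "start" message, and each chorus end waits out the remainder after the message arrives, so everyone starts the BGM at the same moment. The function name and the optional safety margin are hypothetical.

```python
from typing import Dict


def bgm_start_waits(one_way_ms: Dict[str, float],
                    margin_ms: float = 0.0) -> Dict[str, float]:
    """How long each end waits before starting the BGM so all ends start together.

    The lead sends a 'start' message at t = 0; chorus end X receives it at
    t = one_way_ms[X]. The lead waits for the slowest delivery (plus a margin),
    and each chorus end waits the remainder after its message arrives.
    """
    start_at = max(one_way_ms.values(), default=0.0) + margin_ms
    waits = {"lead": start_at}
    for end, delay in one_way_ms.items():
        waits[end] = start_at - delay  # time left after the message arrives
    return waits


print(bgm_start_waits({"B": 40, "C": 60}))
print(bgm_start_waits({"B": 40, "C": 60}, margin_ms=10))
```

A margin absorbs estimation error in the one-way delays at the cost of a slightly later start; an alternative design schedules the start on a shared synchronized clock instead.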

In-ear monitoring under 50 ms

Whether in live singing or online karaoke, low-latency in-ear monitoring ("ear return") plays a key role in the singing experience. It lets users hear, through their headphones and in real time, the sound captured by the microphone together with the accompaniment, so they can tell whether they are drifting off pitch. This requires particularly low latency.

To this end, the Agora SDK provides low-latency karaoke in-ear monitoring through a unified interface. Through in-depth technical cooperation with phone manufacturers, it offers karaoke and live-streaming apps in-ear monitoring adapted to different phone brands and models. We reduced the 100-300 ms latency of traditional in-ear monitoring to under 50 ms and, combined with the overall real-time chorus solution, achieved in-ear monitoring with ultra-low latency, ultra-low noise, and excellent sound effects, improving the karaoke experience across the board.
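Much of in-ear monitoring latency comes from audio buffers: each buffer of N frames at sampling rate fs holds N / fs seconds of audio. The sketch below estimates round-trip monitoring latency from buffer settings; the buffer counts, sizes, and processing time are hypothetical and do not describe any particular phone or the Agora SDK's internals.

```python
def buffer_ms(sample_rate_hz: int, frames: int) -> float:
    """Duration of one audio buffer in milliseconds."""
    return frames * 1000 / sample_rate_hz


def ear_return_latency_ms(sample_rate_hz: int, frames_per_buffer: int,
                          input_buffers: int, output_buffers: int,
                          processing_ms: float) -> float:
    """Mic-to-headphone latency: queued input and output buffers plus processing."""
    per_buffer = buffer_ms(sample_rate_hz, frames_per_buffer)
    return (input_buffers + output_buffers) * per_buffer + processing_ms


# Hypothetical configurations at 48 kHz.
small = ear_return_latency_ms(48_000, 240, 2, 2, 3.0)
large = ear_return_latency_ms(48_000, 1024, 3, 3, 5.0)
print(f"small buffers: {small:.1f} ms, large buffers: {large:.1f} ms")
```

The large-buffer case lands in the 100-300 ms range the text attributes to traditional in-ear monitoring, while small, tightly managed buffers bring the total well under 50 ms.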

Lyrics synchronization + sound wave spectrum

Lyrics synchronization displays lyrics in sync with the audio on the playback end, aligned word by word, so developers do not need to implement any synchronization themselves. The sound wave spectrum helps singers correct their pitch as they go, and the system can also score singers based on how closely they follow the spectrum. Professional online karaoke apps already have mature lyrics synchronization, sound wave spectrum, and similar features, but for start-ups or developers who want to add online KTV to an existing app, the built-in lyrics synchronization and sound wave spectrum in Agora's real-time chorus solution save development costs while ensuring a good experience.
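Both features can be sketched in a few lines, assuming word-level lyric timings as (start_ms, end_ms, text) tuples and per-frame pitch estimates in Hz. The lookup finds the word under the current playback position with a binary search; the score is the share of voiced frames within ±50 cents of the reference pitch. The data format, tolerance, and scoring rule are hypothetical, not the scheme Agora's solution uses.

```python
import bisect
import math
from typing import List, Optional, Tuple

Word = Tuple[int, int, str]  # (start_ms, end_ms, text), sorted by start_ms


def current_word(words: List[Word], position_ms: int) -> Optional[str]:
    """Return the word being sung at position_ms, or None between words."""
    starts = [w[0] for w in words]
    i = bisect.bisect_right(starts, position_ms) - 1
    if i < 0:
        return None
    _, end, text = words[i]
    return text if position_ms < end else None


def cents(f_hz: float, ref_hz: float) -> float:
    """Pitch difference in cents (100 cents = one semitone)."""
    return 1200 * math.log2(f_hz / ref_hz)


def pitch_score(sung_hz: List[float], ref_hz: List[float], tol_cents: float = 50) -> float:
    """Percentage of voiced frames (both pitches > 0) sung within tol_cents."""
    pairs = [(s, r) for s, r in zip(sung_hz, ref_hz) if s > 0 and r > 0]
    if not pairs:
        return 0.0
    hits = sum(1 for s, r in pairs if abs(cents(s, r)) <= tol_cents)
    return 100.0 * hits / len(pairs)


words = [(0, 400, "Happy"), (400, 800, "birthday"), (900, 1300, "to")]
print(current_word(words, 450), current_word(words, 850))
print(pitch_score([440.0, 466.16, 0.0], [440.0, 440.0, 440.0]))
```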

Real-time chorus across online and offline scenarios brings businesses value on several fronts

Online social: growth in both users and revenue.
1. Attracting new users and improving activity and retention: as the newest way to play in online KTV, real-time chorus can serve as a fresh selling point for a product and draw new users to try it. At the same time, it solves the pain points of traditional recorded chorus and one-way chorus, which will also encourage existing users who enjoy chorus to take part, increasing user activity and retention.
2. More revenue opportunities: online karaoke and social platforms can build on real-time chorus to explore more commercial value and expand revenue.

Smart TV karaoke chorus, upgraded entertainment and interaction: after joint refinement with TV manufacturers, Agora's real-time chorus solution also supports TVs, so users can sing in real time with friends on the big screen of a smart TV at home. For TV manufacturers, real-time chorus also enriches the entertainment and interactive features of smart TVs.

Offline KTV chorus across locations, breaking the limits of space: real-time chorus can also connect traditional offline KTVs and mini KTVs in shopping malls, so friends in different places can sing together offline. This improves the consumer karaoke experience and drives innovation in offline KTV entertainment.


The head of technology at "Mida" said: "Until now, a production-ready real-time chorus solution had been missing from the industry. With its deep technical background and insight into innovative scenarios, Agora worked with Mida to refine the industry's first complete real-time chorus solution, and its low-distortion, ultra-low-latency technology will give users the best real-time audio experience. Mida will be the first to launch real-time chorus in its offline mini KTVs across the country. Real-time chorus will bring new vitality to the online and offline karaoke industry."

Open-source demo

Agora's real-time chorus demo is currently available on iOS. Click "Read the original text" and leave a message, and we will send you the download link for the real-time chorus demo.

In this article, we analyzed the technical difficulties of real-time chorus and introduced the architecture and core technical advantages of Agora's real-time chorus solution. For packet loss, jitter, and related challenges, see the related reading below for our earlier series of audio technology articles.

Related Reading

Detailed explanation of low latency and high sound quality: Codec articles

Detailed explanation of low latency and high sound quality: echo cancellation and noise reduction

Detailed explanation of low latency and high sound quality: packet loss, jitter and last mile optimization

Detailed explanation of low latency and high sound quality: Sound beautification and spatial sound effects


RTE Developer Community