Immersive Experience: NetEase Yunxin Online KTV

Online KTV Status

In recent years, a wave of new products such as Sing Duck, Whale Ming, Hui Sen, and Komori Sing has appeared. Generous incentives and heavy investment are common as platforms compete for users, and the rising usage figures and share prices across the board have brought investors considerable returns. According to data released at the end of April 2021 by Bida Consulting, a third-party data research firm, Quanmin K Ge (National K Song) led the industry by a wide margin in March with 135.438 million monthly active users, and Changba (Sing It) ranked second with 30.373 million. Kuwo K Song, Tian Lai K Song, and Sing Duck ranked third through fifth, each with more than 3 million monthly active users.

Online karaoke offers endless ways to play, but few platforms have attempted real-time online chorus. The main reason is that vendors' current technical capabilities still fall short of delivering a first-rate experience in the chorus scenario.

Immersive Difficulties and Chorus Dilemmas

To explore the online chorus scenario, we built an uncustomized chorus demo on top of the simplest RTC service and tested how it performed. The figure below gives a brief overview of the system.

Singer A plays the accompaniment locally, and the RTC engine mixes A's dry vocal with the accompaniment and sends the mix to singer B and the audience. Singer B hears A's singing and the accompaniment and sings along in time. However, this process has several obvious problems that need to be addressed:

The communication delay between singers is too large: A hears B's voice far behind A's own music.
The audience hears A's voice on the accompaniment beat, but B's voice lags behind the accompaniment.
Lyrics cannot be displayed in sync.
Neither singers nor non-singers feel the immersion of being in a KTV room.
The voices of the chorus participants interfere with one another.

Finding Countermeasures

To improve this scenario, we analyzed the problems found in the test above one by one and addressed each of them in our real-time chorus solution. The details are shown in the following table:

NetEase Yunxin's Real-Time Chorus Solution

Overall architecture

Reduce latency to the limit

Yunxin's WE-CAN transmission network optimizes transmission delay.

For users who are geographically close to each other, the WE-CAN 2.0 scheduling system intelligently assigns them to the same machine, the same data center, or the nearest possible data centers. Traffic then does not need to cross the public network at all and is handled entirely inside the data center, which gives the lowest possible latency.

For users on different carriers, intelligent route selection achieves 99.99% network stability with a delay of less than 75 ms.

On the client side, extensive optimization of the entire audio pipeline has reduced the average combined delay of capture, playback, and signal processing to 52 ms.

For any given network, this work achieves the lowest communication delay that the current network conditions allow. However, because some communication delay is physically unavoidable, we need other techniques to reduce its impact on the chorus experience.

Reduce the impact of latency on the experience by precisely syncing accompaniments

To solve the problem of each singer hearing the other with too much delay, we implemented an NTP-style clock correction based on precise synchronization with the server, which provides accurate timestamps for the entire business system. When the chorus starts, the two participating clients start playing the accompaniment at the same instant (with a synchronization accuracy of 10 ms), and both singers sing along to the accompaniment.
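The clock correction and synchronized start described above can be sketched as follows. This is a minimal illustration that assumes the standard NTP offset formula; the names `SyncSample`, `best_offset_ms`, and `local_start_time_ms` are hypothetical and are not taken from the Yunxin SDK, and the article does not say how the real system filters its samples.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class SyncSample:
    """One request/response exchange with the time server (all values in ms)."""
    t0: float  # client clock when the request was sent
    t1: float  # server clock when the request arrived
    t2: float  # server clock when the reply was sent
    t3: float  # client clock when the reply arrived


def clock_offset_ms(s: SyncSample) -> float:
    """Estimated (server clock - client clock), using the standard NTP formula."""
    return ((s.t1 - s.t0) + (s.t2 - s.t3)) / 2.0


def round_trip_ms(s: SyncSample) -> float:
    """Network round-trip time, excluding the server's processing time."""
    return (s.t3 - s.t0) - (s.t2 - s.t1)


def best_offset_ms(samples: Sequence[SyncSample]) -> float:
    """Assumed filter: trust the sample with the shortest round trip,
    since it is the least affected by queuing and path asymmetry."""
    return clock_offset_ms(min(samples, key=round_trip_ms))


def local_start_time_ms(server_start_ms: float, offset_ms: float) -> float:
    """Convert an agreed server-clock start time into this client's clock."""
    return server_start_ms - offset_ms
```

Each client would compute its own offset, receive the same server-clock start time, and begin playing the accompaniment when its local clock reaches `local_start_time_ms(...)`. Two clients whose clocks differ by several hundred milliseconds then start at the same real instant, up to the error of the offset estimate.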

Synchronization test results for accompaniment playback on multiple clients under normal network conditions:

With weak-network conditions introduced:

Unlike traditional chorus schemes, Yunxin's scheme takes the accompaniment as the reference for the chorus. Transmission delay is a physical fact and cannot be eliminated, but precisely synchronizing the start of the accompaniment minimizes its effect, which makes this a practical way to improve the experience for both singers. To verify this, we ran a subjective test on the existing solution (the end-to-end delay of the full audio link is about 100 ms on a normal network):

Even under a wide range of weak-network conditions, the user experience is still maintained.

Audience-side synchronization module

Because the singers in a chorus have different network conditions, their audio reaches each audience member with different delays. Yunxin uses relative timestamps to actively align the dry vocals of the singers. As a result, audience members and other non-singing users in the channel hear the same thing the singers hear, faithfully reproducing the KTV room.

Each singer's client stamps its dry vocal with a timestamp relative to the accompaniment and sends that timestamp along with the audio packets. The audience side uses a customized buffering algorithm and an active alignment strategy to align each singer's voice strictly with the accompaniment. With this design, the audience still hears the singers' vocals and rhythm faithfully even when network conditions between the singers and the audience are complex. A modified fast/slow playback strategy speeds up the alignment of multiple audio streams and shortens the convergence time, so that the accompaniment and the dry vocals can converge during the intro of the song. The same mechanism also supports video: it keeps the KTV video aligned with the singers and delivers the full KTV room experience.
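The article does not publish the fast/slow playback strategy itself. As a rough illustration of the idea, the sketch below assumes a simple proportional controller: the further a vocal stream has drifted from the accompaniment timeline, the more its playback speed is adjusted, up to a cap. The function names and every parameter value (`max_adjust`, `dead_zone_ms`, `full_scale_ms`, 20 ms frames) are assumptions chosen for the example.

```python
def playback_rate(drift_ms: float, max_adjust: float = 0.10,
                  dead_zone_ms: float = 5.0,
                  full_scale_ms: float = 200.0) -> float:
    """Return a playback speed factor for one vocal stream.

    drift_ms > 0: the stream is behind the accompaniment timeline, so speed up.
    drift_ms < 0: the stream is ahead, so slow down.
    Inside the dead zone the stream plays at normal speed.
    """
    if abs(drift_ms) <= dead_zone_ms:
        return 1.0
    # Scale the adjustment with the drift, capped at max_adjust.
    adjust = max_adjust * min(abs(drift_ms) / full_scale_ms, 1.0)
    return 1.0 + adjust if drift_ms > 0 else 1.0 - adjust


def frames_to_converge(initial_drift_ms: float, max_adjust: float = 0.10,
                       frame_ms: float = 20.0, dead_zone_ms: float = 5.0,
                       max_frames: int = 100_000) -> int:
    """Simulate playback and count frames until the drift is inside the dead zone."""
    drift = initial_drift_ms
    for n in range(max_frames):
        if abs(drift) <= dead_zone_ms:
            return n
        rate = playback_rate(drift, max_adjust=max_adjust,
                             dead_zone_ms=dead_zone_ms)
        # In frame_ms of wall time the stream consumes rate * frame_ms of audio,
        # so it gains (rate - 1) * frame_ms on the accompaniment timeline.
        drift -= (rate - 1.0) * frame_ms
    raise RuntimeError("drift did not converge")
```

Under these assumptions, raising `max_adjust` shortens convergence at the cost of a more audible speed change, which is the trade-off any such strategy has to balance if alignment is to finish within the intro.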

An accompaniment streaming solution that reduces network burden

In a real KTV app, choosing a song usually means downloading both the accompaniment and the lyrics, and every audience member has to download the same accompaniment file as well. This is especially unfriendly to audience members who join the channel partway through a song. To improve the audience experience, we designed an accompaniment auxiliary-stream scheme: the accompaniment is sent over an audio auxiliary stream that carries a special relative timestamp. When a receiver (an audience member or a non-singing host) gets the data, it feeds it into the synchronization module together with the timestamps of the dry vocals for precise synchronization.
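To make the receiver side concrete, here is a minimal sketch of a mixer that combines the accompaniment auxiliary stream with the dry vocals by matching relative timestamps. `SyncMixer` and `mix_frames` are hypothetical names, frames are assumed to be fixed-length lists of 16-bit PCM samples, and the sketch leaves out what a production buffer would also need, such as a timeout that substitutes silence for lost packets.

```python
from typing import Dict, Iterable, List, Optional

Frame = List[int]  # 16-bit PCM samples for one fixed-length frame


def mix_frames(frames: List[Frame]) -> Frame:
    """Sum equally long frames sample by sample, saturating at the int16 range."""
    return [max(-32768, min(32767, sum(samples))) for samples in zip(*frames)]


class SyncMixer:
    """Receiver-side mixer keyed by accompaniment-relative timestamps."""

    def __init__(self, stream_ids: Iterable[str]) -> None:
        # One buffer per stream: relative timestamp (ms) -> frame.
        self.buffers: Dict[str, Dict[int, Frame]] = {sid: {} for sid in stream_ids}

    def push(self, stream_id: str, rel_ts_ms: int, frame: Frame) -> None:
        """Store a frame under the timestamp carried in its packet."""
        self.buffers[stream_id][rel_ts_ms] = frame

    def pop_mixed(self, rel_ts_ms: int) -> Optional[Frame]:
        """Return the mixed frame for rel_ts_ms, or None until every stream has it."""
        if not all(rel_ts_ms in buf for buf in self.buffers.values()):
            return None
        return mix_frames([buf.pop(rel_ts_ms) for buf in self.buffers.values()])
```

Because frames are matched on the timestamp rather than on arrival time, a listener who joins mid-song can start mixing from the first timestamp for which all streams have delivered a frame, without downloading the accompaniment file.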

Playback offset estimation

In the harmony sections of a chorus, the other singer's dry vocal can sometimes throw off one's own rhythm, causing a singer to drift off the beat or sing the wrong rhythm. To counter this, we tuned the playback volume at both ends of the chorus so that the other singer can be heard clearly without disturbing one's own rhythm or singing.

In the KTV scenario, users always start singing after they hear the accompaniment, which introduces a delay that must be accounted for when the vocal is combined with the accompaniment, and playback delay differs from device to device. Yunxin combines a signal loopback algorithm with a hardware buffer calculation so that the two estimates verify each other and the delay calculation stays accurate. We have also adapted the solution to a large number of device models.
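The article does not describe the loopback algorithm in detail. One common way to measure such a delay, shown here purely as an assumed illustration, is to play a known probe signal, record it through the microphone, and find the lag that maximizes the cross-correlation between the two. The result can then be compared with the delay implied by the audio buffer configuration. All function names, the tolerance, and the buffer model are hypothetical.

```python
import random
from typing import List, Optional


def make_probe(n: int, seed: int = 0) -> List[float]:
    """Deterministic noise-like probe signal with values in [-1, 1]."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


def estimate_delay_samples(played: List[float], captured: List[float],
                           max_lag: int) -> int:
    """Return the lag (in samples) that maximizes the cross-correlation
    between the played probe and the captured loopback signal."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        n = min(len(played), len(captured) - lag)
        score = sum(played[i] * captured[i + lag] for i in range(n))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def buffer_delay_ms(buffer_frames: int, num_buffers: int,
                    sample_rate: int) -> float:
    """Delay implied by the audio buffer configuration (assumed simple model)."""
    return 1000.0 * buffer_frames * num_buffers / sample_rate


def cross_checked_delay_ms(loopback_ms: float, buffer_ms: float,
                           tolerance_ms: float = 10.0) -> Optional[float]:
    """Accept the loopback measurement only if both estimates agree."""
    if abs(loopback_ms - buffer_ms) <= tolerance_ms:
        return loopback_ms
    return None
```

A measurement that fails the cross-check could be repeated or replaced by a per-model default, which is one place where adapting to many device models would come into play.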

Future Outlook

The work presented in this article addresses most of the experience problems in online chorus. Going forward, NetEase Yunxin will continue to deepen its technology, follow industry trends, provide complete solutions and best-practice examples, and help customers improve quality and efficiency through technological innovation.

