At the 2021 Yunqi Conference video cloud forum "Industrial Video Innovation and Best Practices," a senior technical expert from Alibaba Cloud Intelligence delivered the keynote "AliRTC Opens the Era of 'Zero Processing' in Video Interaction," announcing Alibaba Cloud Video Cloud's next-generation real-time interactive solution, RTC "zero processing," and sharing Alibaba Cloud Video Cloud's exploration and practice around RTC products. The following is the content of the speech.
1. Interactive evolution and challenges
What has changed in video interactive products in the past few years?
We believe that RTC products have brought about two very important changes to the industry.
The first change came in 2014, when interaction was upgraded from text and graphics to audio and video.
In 2014, Internet entrepreneurs and RTC product suppliers explored the commercialization of video interaction, with education and entertainment becoming the main breakthrough directions. Global interactive teaching, show live-streaming co-hosting, and multi-person social interaction mostly completed the combination of business and technology at this point in time.
2017 was another landmark: RTC products helped leading Internet customers achieve disruptive growth, marking the maturity of both interactive video technology and online interactive business models.
In the following years, these models were replicated at different scales and in different scenarios. In 2018 and the years after, the market saw few new scenarios or new interactive innovations; instead, existing businesses were replicated across different content and different customer groups, and video interaction spread from head customers to more market segments.
The second important change occurred in 2020. The pandemic drove the full penetration of cloud video conferencing and pulled this process forward by at least five years.
We cannot call this market change a technological revolution. In fact, it brought no new demands on RTC products, nor new interactive scenarios or technologies. However, this large-scale penetration redefined the supplier landscape: for the first time, cloud vendors became an extremely important part of the market, splitting it from single conference vendors into cloud platforms plus conference terminal suppliers and giving customers more choices.
From 2018 to now, we have not seen a fundamental breakthrough in scenarios. Is this because our technology has hit a bottleneck?
With this question in mind, Alibaba Cloud conducted an in-depth technical evaluation of RTC scenario technology, trying to find out where the industry's technical level actually stands. Unlike evaluating an individual video technology, evaluating RTC is more complicated.
For example, video coding can be analyzed with objective metrics such as PSNR, SSIM, and VMAF, and visual algorithms such as video classification can be analyzed with ROC curves. Video RTC, however, involves many subjective perceptions and more complicated factors, and the industry currently has no unified evaluation standard for it.
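As a minimal sketch of the per-frame objective metrics mentioned above, the snippet below computes PSNR with NumPy and SSIM with scikit-image; the synthetic frames and the 8-bit data range are assumptions for illustration, not data from the evaluation described in the talk.

```python
# Minimal sketch: per-frame objective quality metrics (PSNR via NumPy,
# SSIM via scikit-image). The two frames are synthetic stand-ins for a
# reference frame and a decoded frame.
import numpy as np
from skimage.metrics import structural_similarity

def psnr(reference: np.ndarray, decoded: np.ndarray, data_range: float = 255.0) -> float:
    """Peak signal-to-noise ratio between two frames of identical shape."""
    mse = np.mean((reference.astype(np.float64) - decoded.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10((data_range ** 2) / mse)

# Synthetic 720p RGB frames: a reference and a slightly noisy "decoded" version.
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(720, 1280, 3), dtype=np.uint8)
dec = np.clip(ref.astype(np.int16) + rng.integers(-8, 9, ref.shape), 0, 255).astype(np.uint8)

print("PSNR:", psnr(ref, dec))
print("SSIM:", structural_similarity(ref, dec, channel_axis=-1, data_range=255))
```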
From these indicators we extracted six dimensions that affect user experience to characterize the performance and quality of RTC.
If you are interested in the evaluation, you can follow our "Video Cloud Technology" official account, which details how we conduct automated evaluation. During the evaluation, we create different network environments to test RTC performance in various aspects.
We evaluated several RTC products in the industry and found two characteristics.
First, RTC has an obvious technical threshold. For example, the green box represents a typical RTC capability self-developed by a smaller team with a small investment, and there is a significant gap from the leaders.
Second, a few relatively large suppliers, including Alibaba Cloud, sit on the outer circle: the red, blue, and yellow lines are all at a roughly consistent level, but none of them excels in any particular area. The homogeneity of technology is therefore quite serious, and everyone is basically at the same level.
Our current real-time video interaction focuses mainly on online and offline scenarios, but there may be broader application scenarios in the future, such as richer interactive scenarios, VR manipulation, and virtual reality.
This leads us to a question: has our technology reached a bottleneck that keeps us from meeting these broader future needs, and what is the reason behind it? Technology usually develops in steps; if you fail to break through, you stay stuck at one level.
2. "Zero processing" accelerates interactive upgrades
We want to analyze: what is the user experience like today, and what problems does our current technology have?
Comparing various RTC suppliers, we found an interesting point: a stall rate of about two thousandths (0.2%) is very difficult to eliminate. Even 50% or 60% packet loss can be handled well, but when network bandwidth is limited, this residual 0.2% of stalls is hard to remove.
We do have means of addressing such problems, for example narrowband HD technology, which solves them through complex computation, or non-standard screen coding technologies. In practice, however, it is difficult to apply these technologies widely.
The most fundamental reason is that end-side capability is limited. Everyone's phone is different: some phones are powerful enough to run complex algorithms, while others are not. At the same time, device fragmentation is severe, making it difficult to adapt to every terminal.
In applications, we hope to provide more interesting interactions, such as real-time generation of cartoon avatars. This can run on the end device, but only on a few very powerful devices.
The question is: can we break through the current application architecture?
We have gradually transformed an architecture that relies entirely on end-side capability into one in which the cloud and the end cooperate on video transmission and processing. Based on this idea, we propose cloud processing + end rendering: the cloud provides powerful processing capability, while the end is responsible only for rendering, so that only a small amount of end-side compute is needed to achieve a good processing effect and everyone gets the same experience on different phones.
This is the "zero processing" solution. On the end, only relatively simple video capture and transmission are required; the stream then reaches the cloud through our globally deployed GRTN network, where the GRTP real-time processing engine processes the video. The processed video is transmitted back to the end, which only needs to do simple presentation. This solves the problems of insufficient computing power and fragmentation just mentioned.
But there is no free lunch. With the above architecture, several problems are easy to spot.
First, whether our cloud can withstand processing at such a large scale.
Second, whether the cloud can bear the cost at such a scale.
Third, whether the cloud can continuously provide so many types of processing services.
Our confidence comes from several sources.
First, through Alibaba's accumulation over many years, we have built the largest cloud-based video processing cluster, so technically we are able to undertake ultra-large-scale processing.
Second, regarding cost.
The figure below shows an example of a business resource graph: the abscissa is time and the ordinate is resource usage. The black line is one kind of business and the red line is another. You can see that every business has long idle periods, and these idle periods leave a lot of resources for us to reuse. When we mix different businesses to run together over time, the resources can be fully used and costs drop significantly.
In addition to mixing over time, we can also reduce the overall cost by mixing in space and by running on heterogeneous hardware.
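A toy illustration of the time-based mixing idea is below; the hourly load curves are made-up numbers, not real business data. The point is that the capacity needed to host two complementary workloads together is far less than the sum of their separate peaks.

```python
# Toy illustration of time-based co-location ("mixed running") with made-up
# hourly load curves: compare the capacity needed when each workload is
# provisioned for its own peak vs. when both share one resource pool.
daytime_biz = [10, 10, 10, 20, 60, 80, 90, 85, 70, 40, 20, 10]   # busy during the day
evening_biz = [80, 90, 85, 60, 30, 15, 10, 10, 20, 50, 70, 85]   # busy in the evening

separate_capacity = max(daytime_biz) + max(evening_biz)                # 90 + 90 = 180
mixed_capacity = max(a + b for a, b in zip(daytime_biz, evening_biz))  # peak of the summed curve = 100

print(f"separate peaks: {separate_capacity}")
print(f"mixed peak:     {mixed_capacity}")
print(f"capacity saved: {1 - mixed_capacity / separate_capacity:.0%}")  # about 44% here
```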
Third, relying on the Alibaba Group as well as our own work, we have accumulated a large number of video processing algorithms, so we have the opportunity to keep providing rich algorithms and processing capabilities.
3. "Zero Processing" Practice Sharing
Next is the practice of Alibaba Cloud Video Cloud in zero processing.
The first scenario is using an MCU to free up end-side computing power.
Normally, in RTC live broadcasting, the live image the audience sees is delivered through the RTMP protocol. In this case the delay prevents the audience from participating in the interaction. To make the experience interactive, everyone would need to join the RTC network, with each end subscribing to multiple streams, which places a very heavy burden on end-side computing power and network traffic.
Instead, we merge the streams through a cloud MCU and feed the mixed stream back into the RTC session, so the audience can watch the live stream over RTC. This makes interaction very convenient and does not consume too many end-side resources. We call this the interactive low-latency mode, and it is already a mature product capability of ours.
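The back-of-the-envelope comparison below uses assumed bitrates and publisher counts (not figures from the talk) to show what each viewer must download and decode when subscribing to every publisher separately versus receiving one MCU-mixed stream.

```python
# Back-of-the-envelope comparison (assumed bitrates): per-viewer downlink and
# decode load when subscribing to every stream vs. receiving one mixed stream.
publishers = 8                 # assumed number of hosts in the RTC session
per_stream_kbps = 800          # assumed bitrate of each individual stream
mixed_stream_kbps = 1500       # assumed bitrate of the MCU-composited stream

mesh_downlink = publishers * per_stream_kbps   # subscribe to every stream
mcu_downlink = mixed_stream_kbps               # subscribe to one mixed stream

print(f"per-viewer downlink without MCU: {mesh_downlink} kbps, {publishers} decoders")
print(f"per-viewer downlink with MCU:    {mcu_downlink} kbps, 1 decoder")
```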
The second scenario: cloud content review.
This is an example of leveraging Alibaba's internal service capabilities. Through cooperation with Alibaba Group's security department, we route RTC streams to the security department's products over the intranet, reducing intermediate links and achieving low-cost, low-latency content review.
The third scenario: cloud special effects.
I believe everyone has seen this kind of scene. Using cloud processing, we implemented a virtual meeting room: through the MCU in the cloud, every participant is matted out and composited into the room, improving the experience of the video conference.
The real-time avatar shown above relies on the GRTN real-time transmission network to send the video stream to the cloud, where complex AI processing such as matting, voice changing, and cartoonization is performed. The terminal is only responsible for display, thus achieving end-side zero processing.
"zeros" as the next generation of real-time interactive solutions , pioneered in the cloud vendors to solve the problem of virtual interaction scenes interactive new era because of side force is limited and can not be considered to achieve the full use of one of the super-cloud Refined computing power and building real-time virtual scenes with cloud special effects are an important evolution to fully open a new world of immersive interaction.
AliRTC series content
Cloud RTC QoS: Weak-Network Optimization for Screen Sharing and Several Encoder-Related Optimizations
Cloud RTC QoS: Weak-Network Resilience with Variable-Resolution Coding
Cloud RTC QoS: Weak-Network Resilience with LTR and Its Hardware Decoding Support
"Video Cloud Technology" Your most noteworthy audio and video technology public account, pushes practical technical articles from the front line of Alibaba Cloud every week, and exchanges and exchanges with first-class engineers in the audio and video field. The official account backstage reply [Technology] You can join the Alibaba Cloud Video Cloud Product Technology Exchange Group, discuss audio and video technologies with industry leaders, and get more industry latest information.