Extreme real-time video communication under weak networks

1. What is extreme video communication under a weak network?

A weak network environment is one where the network performs poorly, for example over Wi-Fi, with cross-layer network routing, or under excessive network load, so that data is lost during transmission.

Generally speaking, network congestion has long been handled with forward error correction (FEC) or automatic repeat request (ARQ), and many researchers are still working on hybrid ARQ plus FEC. From the point of view of the video signal, however, another method is needed. When the network is badly degraded, for example when the access bandwidth is below 50 kbps, at 5 kbps or even lower, video can no longer be transmitted effectively with these methods alone.
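To make the FEC idea concrete, here is a minimal sketch of the simplest possible scheme: one XOR parity packet per group of data packets. It is only an illustration, not the scheme used in the talk. The function names are hypothetical, all packets in a group are assumed to have the same length, and the receiver is assumed to already know which packet is missing (for example from sequence numbers).

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings (assumed to be of equal length)."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets: List[bytes]) -> bytes:
    """Sender side: one parity packet = XOR of every data packet in the group."""
    return reduce(xor_bytes, packets)


def recover(received: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Receiver side: rebuild at most one lost packet (marked as None)."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        # A single XOR parity cannot repair two losses; this is where a
        # hybrid scheme would fall back to ARQ (retransmission).
        raise ValueError("more than one packet lost in this group")
    repaired = list(received)
    if missing:
        survivors = [p for p in received if p is not None]
        # parity XOR all surviving packets leaves exactly the missing packet.
        repaired[missing[0]] = reduce(xor_bytes, survivors, parity)
    return repaired
```

The sketch also shows why FEC alone breaks down on very weak links: the parity packet costs extra bandwidth, and it repairs only one loss per group.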

2. The architecture design and advantages of extreme communication

The advantage is that when we extend the earlier experience-based (empirical) models to a data-driven approach, we can bring measured performance in line with what the user actually perceives. The perceived quality of video communication can be improved by 12% to 100%, with a figure of about 20% also cited.

In fact, since AlphaGo, reinforcement-learning-based network flow control has been applied to encoding and transmission, and video is optimized by adaptively probing the available bandwidth and adjusting to its feedback. IP-based packet networks now dominate video transmission, which makes end-to-end network state such as throughput highly time-varying. Because different users compete for network resources, these network states also become unstable as users move.

After hearing about the end-to-end transmission process and how reinforcement learning is introduced, I think video encoding and network transmission can be considered together as one end-to-end process. The agent makes a decision about the coding parameters; after the video is decoded and played at the receiving end, a new state is produced. At the same time, the reward or penalty for that decision is fed back to the agent. Based on this reward signal, the agent continuously updates its neural network parameters so as to maximize the cumulative reward. In the end, video coding parameters can be adjusted adaptively just by observing and learning from the raw state of coding, network, and playback. This is a completely new design approach.
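The observe, act, reward, update loop described above can be sketched in a few lines. This is a toy under loud assumptions: it uses tabular Q-learning instead of the neural network mentioned in the talk, the bitrates, link capacities, reward function, and network model are all invented for illustration, and the state is reduced to a single observed capacity level.

```python
import random

BITRATES_KBPS = [50, 200, 800]    # hypothetical encoder bitrates (the actions)
CAPACITY_KBPS = [60, 250, 1000]   # hypothetical link capacities (the states)


def reward(action: int, state: int) -> float:
    """Toy reward: a higher bitrate scores higher, overshooting the link is punished."""
    if BITRATES_KBPS[action] > CAPACITY_KBPS[state]:
        return -1.0  # stands in for stalls and heavy loss at the receiver
    return (action + 1) / len(BITRATES_KBPS)


def train(steps=20000, alpha=0.1, gamma=0.5, epsilon=0.1, seed=0):
    """Tabular Q-learning: q[state][action] estimates the discounted cumulative reward."""
    rng = random.Random(seed)
    n_actions = len(BITRATES_KBPS)
    q = [[0.0] * n_actions for _ in CAPACITY_KBPS]
    state = 0
    for _ in range(steps):
        # Epsilon-greedy: usually exploit the best known action, sometimes explore.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: q[state][a])
        r = reward(action, state)
        # Toy network: capacity usually persists, occasionally jumps at random.
        if rng.random() < 0.9:
            next_state = state
        else:
            next_state = rng.randrange(len(CAPACITY_KBPS))
        # Feed the reward back and move the estimate toward the new target.
        target = r + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q


def greedy_policy(q):
    """Bitrate the trained agent picks for each capacity level."""
    return [BITRATES_KBPS[max(range(len(row)), key=lambda a: row[a])] for row in q]
```

After training, `greedy_policy(train())` picks, for each capacity level, the highest bitrate that still fits the link. A real system would replace the table with a neural network and the toy reward with measured playback quality.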

3. Intelligent video coding

Video carries a very large amount of data, so compression (encoding) has a great effect. Practical video encoding has been able to reduce this data largely thanks to Moore's Law. Roughly every eighteen months, silicon delivers another step up in performance, which lets us use many algorithms that we could not design for before, including advanced algorithms that used to be too expensive to run.
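A quick calculation shows how much work the encoder has to do. The sketch below computes the raw (uncompressed) bitrate of a video and the compression ratio needed to fit a target bitrate. The resolution, frame rate, and the 12 bits per pixel (typical of YUV 4:2:0 sampling) are assumptions chosen for illustration; the 50 kbps target is the weak-network figure from section 1.

```python
def raw_bitrate_kbps(width: int, height: int, fps: float, bits_per_pixel: int = 12) -> float:
    """Uncompressed video bitrate in kbps (1 kbps = 1000 bit/s)."""
    return width * height * bits_per_pixel * fps / 1000


def required_compression_ratio(width: int, height: int, fps: float,
                               target_kbps: float, bits_per_pixel: int = 12) -> float:
    """How many times the encoder must shrink the raw stream to hit target_kbps."""
    return raw_bitrate_kbps(width, height, fps, bits_per_pixel) / target_kbps


if __name__ == "__main__":
    # Assumed example: 640x360 at 15 frames per second over a 50 kbps link.
    print(raw_bitrate_kbps(640, 360, 15))                # 41472.0 kbps raw
    print(required_compression_ratio(640, 360, 15, 50))  # roughly 829x
```

Even this small, low-frame-rate picture needs to be compressed by a factor of roughly 800 to fit a 50 kbps link, which is why more capable coding algorithms matter so much under weak networks.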

Image Video Coding Technology Standard

Thinking further along this line, however, we run into bottlenecks. The bottleneck became apparent around 2015, when people found that as process technology kept advancing, for example to five nanometers and three nanometers, physical limits start to apply, including those from the theory of relativity.

The speed of an electron is at most the speed of light, so how high can performance go under that limit? Of course, more performance can be bought at the expense of power consumption, but that comes at a price. So by 2015, some of the world's top scientists suggested that this may be a turning point, and that a new way of controlling this cost is needed. (You can go and listen to the video if you don’t understand it)

4. Network adaptive transmission

In video conferencing, the RTP channel cannot provide a good QoS guarantee for video and audio data, which greatly affects how well video conferencing works in practice. One approach is to let the network learn by itself, combining it with reinforcement learning methods; this work dates from around 2018. At that time, the main target was real-time communication. Compared with VoD or live streaming, real-time communication brings additional challenges, which is why we call it "extreme": for example, we cannot use a large buffer. Delay on the user side depends on the user's network condition on the one hand, and on whether the user's hardware can keep up on the other. Some older phone models tend to overheat because decoding uses a lot of CPU, which causes the phone to freeze.

End-to-end delay accumulates gradually across these stages. Beauty filters need time for processing, transmission takes time, audio and video compression and muxing take time, video distribution takes time, and so on. When the network is slow or the server has problems, the delay may increase further. The "high concurrency" problem, which is common in the development of live-streaming systems, illustrates this well. Under normal circumstances, a live broadcast platform can provide users with stable and smooth service. However, during special events such as the 618 shopping festival or celebrity live broadcasts, traffic arrives at a hundred, a thousand, or even ten thousand times the usual scale, and the high concurrency problem appears. If concurrency was not considered when the platform was developed, the server may crash, viewing fails, and the user experience of the live broadcast suffers.
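The way delay accumulates stage by stage can be written down as a simple budget. In the sketch below the stage names follow the paragraph above, but every number, including the 400 ms budget, is a made-up placeholder rather than a measurement from the talk.

```python
# Hypothetical per-stage delays in milliseconds (placeholders, not measurements).
STAGE_DELAY_MS = {
    "beauty_filter": 30,
    "encode": 40,
    "network": 120,
    "distribution": 60,
    "decode_and_render": 35,
}


def end_to_end_delay_ms(stages: dict) -> int:
    """Total delay is the sum of every stage on the path."""
    return sum(stages.values())


def dominant_stage(stages: dict) -> str:
    """The stage that contributes the most delay, i.e. the first one to optimise."""
    return max(stages, key=stages.get)


def within_budget(stages: dict, budget_ms: int = 400) -> bool:
    """Check the total against an assumed interactive-latency budget."""
    return end_to_end_delay_ms(stages) <= budget_ms


if __name__ == "__main__":
    print(end_to_end_delay_ms(STAGE_DELAY_MS))  # 285
    print(dominant_stage(STAGE_DELAY_MS))       # network
    # Congestion that triples the network delay pushes the total over budget.
    congested = dict(STAGE_DELAY_MS, network=360)
    print(within_budget(congested))             # False
```

With these placeholder numbers the network stage dominates, so congestion there quickly uses up the whole budget.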

In short, after listening to Teacher Ma’s explanation, I feel that I have a deeper understanding of extreme communication. I learned a lot about research progress in audio and video transmission at home and abroad, as well as about key technologies such as coding. I also recommend watching the video when you have time.

RTE开发者社区
