Today I would like to share notes from the InfoQ open class "Extreme Real-Time Video Communication under Weak Networks", an exploration of the limits of real-time video communication. The speaker is Professor Ma Zhan of Nanjing University.
1. Subject background
First, some background on the topic. The accuracy and timeliness of information received by networked devices such as mobile phones and computers depend on real-time communication. Taking real-time video communication as an example, we can never guarantee that the network is stable at all times; weak network environments are a persistent reality, so coping with them plays an important role in improving transmission quality.
The official description is: weak network environments are a long-standing reality, especially at critical moments involving life, production, and even survival, where the communication network is often constrained by extreme physical conditions, such as maritime operations, emergency disaster relief, and high-concurrency scenarios. We therefore need to explore new theories and methods for effective analysis, accurate modeling, and accurate prediction, in order to achieve high-quality real-time video communication in extreme weak-network environments (e.g., extremely low bandwidth below 50 kbps, extremely unstable network jitter, extremely long delay).
Professor Ma first introduced his roughly seventeen years of research in video processing. He currently works on two fronts: information acquisition on the one hand, and on the other, applying technologies such as face recognition, traffic recognition, and intelligent transportation to video processing and reconstruction for people.
2. What is extreme video communication under a weak network?
What is a weak network?
A weak network is different from the regular Internet. Judged by its current limits, the regular Internet is already quite good: whether for live streaming or video on demand, and whether viewed from signal processing and video compression or from the network side, network equipment can already deliver HD, UHD, and beyond. However, in situations such as a large-scale mudslide, base stations become unusable; in maritime scenarios, only communication satellites are available. Yet we still need a real-time, timely, and accurate grasp of the situation on site. This is when studying an extreme video framework for weak networks becomes very important.
3. Architecture design and advantages of extreme communication
Three aspects
1. From the most basic, empirically engineered design, move toward being truly data-driven.
Use data-driven methods, similar to AlphaGo, which relies on reinforcement learning. Reinforcement learning can be used to control network bandwidth and to control complex parameters such as those of video codecs. These network and codec parameters are all numeric, so if we design them purely by experience, there will always be a bottleneck.
2. The second is to go beyond empirical design, from data-driven to intelligent.
Professor Ma described this as going from AlphaGo to AlphaZero. AlphaGo was bootstrapped from human games, whereas AlphaZero starts from an initial state with its own model and learns gradually on its own. Likewise, for end-to-end video communication, online learning can be used to learn the different states of the whole interconnected network, and then continuously provide an up-to-date model or decision, achieving personalized learning for each individual user.
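To make the online-learning idea concrete, here is a minimal sketch (my own illustration, not the system from the talk) of a per-user bandwidth model that is updated incrementally with each new network measurement via stochastic gradient descent; all names and parameters are assumptions for illustration.

```python
# A per-user linear bandwidth predictor updated online, one sample at a
# time, so the model tracks each user's own network as it drifts.

class OnlineBandwidthModel:
    def __init__(self, lr=1e-6):
        self.weight = 1.0   # scales the last observed throughput
        self.bias = 0.0     # offset in kbps
        self.lr = lr        # SGD learning rate (small: inputs are in kbps)

    def predict(self, last_throughput_kbps):
        return self.weight * last_throughput_kbps + self.bias

    def update(self, last_throughput_kbps, actual_kbps):
        # One stochastic-gradient step on the squared prediction error.
        error = self.predict(last_throughput_kbps) - actual_kbps
        self.weight -= self.lr * error * last_throughput_kbps
        self.bias -= self.lr * error

model = OnlineBandwidthModel()
# Feed a stream of (previous, actual) throughput samples, e.g. a link
# degrading from 500 kbps toward the 50 kbps weak-network regime.
samples = [(500, 400), (400, 300), (300, 200), (200, 100), (100, 50)]
for prev, actual in samples:
    model.update(prev, actual)
```

Because every update uses only the latest sample, the model keeps adapting as the user's network changes, which is the essence of the AlphaZero-style "learn as you go" framing above.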
3. Use a video-centric form of data communication. By combining the content of the video or image, the communicated information itself is grounded in the user's understanding, i.e., content understanding at the semantic level, truly moving from data to artificial intelligence. With such perception, even if the video loses a frame, or the image loses some pixels or even large blocks, the content can be recovered through compensation methods.
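The compensation idea can be sketched very simply. Below is a toy illustration (my own, not the authors' method) of two classic concealment fallbacks: when a whole frame is lost, repeat the previous decoded frame; when a block of pixels is lost, copy the co-located block from the previous frame. Frames here are plain nested lists standing in for luma pixels.

```python
def conceal_frame(prev_frame, frame_or_none):
    # Whole-frame loss: repeat the last good frame.
    if frame_or_none is None:
        return [row[:] for row in prev_frame]
    return frame_or_none

def conceal_block(prev_frame, frame, top, left, size):
    # Block loss: copy the co-located block from the previous frame.
    for r in range(top, top + size):
        for c in range(left, left + size):
            frame[r][c] = prev_frame[r][c]
    return frame

prev = [[10, 10, 20, 20],
        [10, 10, 20, 20],
        [30, 30, 40, 40],
        [30, 30, 40, 40]]

# Frame 2 arrives with its top-left 2x2 block corrupted (marked None).
cur = [[None, None, 21, 21],
       [None, None, 21, 21],
       [31, 31, 41, 41],
       [31, 31, 41, 41]]
cur = conceal_block(prev, cur, top=0, left=0, size=2)

# Frame 3 is lost entirely: fall back to the last good frame.
frame3 = conceal_frame(cur, None)
```

Real systems go much further (motion-compensated or learned, semantic-level recovery), but the principle is the same: exploit redundancy across frames instead of requiring retransmission.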
4. Intelligent video coding
In video signal processing, how do we draw inspiration from brain vision to build neural-network-based video compression, i.e., a video coding process that operates at lower bit rates?
Video compression is essentially a pipeline-like process: from pixels to a binary stream at the encoder, and from the binary stream back to pixels at the decoder; it is, in effect, a process of turning signals into information. Under this process there are new theories and methods that should be, and should continue to be, explored.
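The pixels-to-bitstream-to-pixels pipeline can be sketched in miniature. The following is a deliberately minimal illustration (mine, not the talk's), using coarse scalar quantization plus `zlib` as a stand-in for a real entropy coder; the quantization step `STEP` is an arbitrary choice for the example.

```python
import zlib

STEP = 16  # quantization step: coarser step -> fewer bits, more distortion

def encode(pixels):
    # Quantize 8-bit samples, then entropy-code the quantized indices.
    indices = bytes(p // STEP for p in pixels)
    return zlib.compress(indices)

def decode(bitstream):
    indices = zlib.decompress(bitstream)
    # Reconstruct each sample to the center of its quantization bin.
    return [i * STEP + STEP // 2 for i in indices]

pixels = [12, 14, 13, 200, 201, 199, 60, 61] * 8  # a flat, repetitive block
stream = encode(pixels)
recon = decode(stream)
# The stream is much smaller than the raw samples, at the cost of a
# bounded reconstruction error (at most STEP/2 per sample).
```

Real codecs replace each stage with far more powerful tools (prediction, transforms, learned analysis/synthesis networks), but the lossy-quantize-then-entropy-code structure is the same pipeline the paragraph above describes.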
This refers to the human visual system: from the retina, through what is called the optic nerve, to the lateral geniculate nucleus, and finally to the brain's primary visual cortex. This is a process of gradual extraction and perceptual understanding of information.
From another angle, it is proposed to take inspiration from biological vision, using the most basic information flow to model how the human eye images the perceived 3D world along what is called the visual pathway: signals are taken in at the retina, pass through different types of cells to the primary visual cortex, and then on to higher cortical areas, with each stage performing its own function. Besides theoretical exploration and stimulation experiments, there have also been many anatomical experiments on primates, which corroborate this picture of how such information is transmitted.
Technical challenge: complexity
For earlier video and image processing approaches, one major concern is complexity, which is also a crucial factor in realizing chip designs.
Solution
A new approach is proposed: can we combine this brain-vision-based mode with current traditional video compression? There are two main reasons, both about performance. Although current learned image compression has already surpassed the latest international standards, for video there is still a way to go; at the same time, billions of devices already exist. So the most effective approach is to apply simple transformations on these existing devices, reusing existing models and data so they can serve video processing.
5. Network-adaptive transmission
Video bit rate adaptation based on reinforcement learning
Problem description and difficulties
Network delay jitter causes the available bandwidth to change in real time. Existing algorithms are mainly optimizations and heuristics designed for the VoD (video-on-demand) setting; in real-time scenarios, future video information is unavailable and large buffers cannot be tolerated.
Solution ideas
1. Design an efficient and robust bit-rate adaptation algorithm that predicts bandwidth and dynamically adjusts the video encoding and sending bit rate
2. Build a real-time bit-rate adaptation framework that automatically learns the adaptation algorithm from historical video streaming experience
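The two ideas above can be sketched as follows (my own simplified illustration, not the authors' system): predict available bandwidth from recent throughput samples using a harmonic mean, which is robust to transient spikes, then pick the highest rung of a bitrate ladder that fits under the prediction with a safety margin. A learned (reinforcement-learning) policy would replace this hand-tuned rule; the ladder values and margin here are assumptions.

```python
LADDER_KBPS = [50, 150, 300, 600, 1200]  # available encoder bitrates

def predict_bandwidth(samples_kbps):
    # Harmonic mean down-weights transient throughput spikes.
    return len(samples_kbps) / sum(1.0 / s for s in samples_kbps)

def pick_bitrate(samples_kbps, margin=0.8):
    # Leave 20% headroom under the predicted bandwidth.
    budget = margin * predict_bandwidth(samples_kbps)
    fitting = [r for r in LADDER_KBPS if r <= budget]
    return fitting[-1] if fitting else LADDER_KBPS[0]

healthy = pick_bitrate([800, 750, 900])  # a healthy link: a high rung
weak = pick_bitrate([90, 60, 40])        # weak network: the 50 kbps floor
```

An RL agent learns this mapping from (throughput history, delay, loss) to bitrate directly from streaming experience, rather than relying on a fixed predictor and margin.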
This learning-based approach was then applied in a real real-time system, which was used to run distributed learning over the Internet on a cloud-gaming service. An offline-trained adaptive real-time streaming scheme was proposed: a large number of network traces were collected, including traces from Europe provided by other laboratories, and a network feedback signal standard was proposed, which has since gone through an evolution.
Evolution of video bit rate adaptation based on reinforcement learning
Problems
1. Samples are limited during offline training
2. The simulated environment may not match the actual environment
3. Performance loss caused by the limited generalization of the model
Solution ideas
1. Cluster and classify network states
2. Cluster and classify video content
3. Train separate offline models for each network condition and video class
4. Tune the model online to further cover environments not considered offline
6. End-to-end extreme video communication demonstration platform
Two demos were shown. The first works with a cloud game: the current state of the whole game is streamed in real time while the network conditions are perceived and adapted to.
DEMO-cloud game
The other, called Cloud Bank, delivers a remote desktop back to the user in the form of video.
DEMO-Remote Desktop
That is all the content of these sharing notes. The video of this talk can be viewed by clicking "here".