This article is part of the "Dev for Dev" column series. The author is Xia Xia, who leads back-end transport protocol work at Shengwang.
To meet the new requirements and challenges that real-time interactive applications place on network transmission, Shengwang began developing its own private transport layer protocol, Agora Universal Transport (AUT), in 2019. The design layers and decouples application-layer business requirements from transmission strategies in real-time interaction. AUT brings together a range of transmission control capabilities for heterogeneous networks, and between 2021 and 2022 it was gradually rolled out at scale across services, using a single transport protocol and framework to meet the different transmission needs of each service.
The material is split into two parts. The previous part, "The Implementation of AUT Self-developed Transport Layer Protocol AUT", introduced the origin of the AUT protocol and its application in real-time interactive business scenarios. This part distills the lessons learned from the protocol's evolution and rollout, and looks ahead to the future of AUT and of real-time interactive network transmission.
01 Lessons learned from developing AUT in-house
It took three years to go from the initial research and development of AUT to large-scale deployment. Designing and implementing a transport protocol is both an engineering challenge and an algorithmic one. At the same time, because it is a low-level, abstract technology, rolling it out in real-time interactive scenarios surfaces many concrete problems, and the process was far from easy. Here we review and organize our past work and offer some suggestions, so that the experience can benefit similar work in the future.
1. Algorithm design and iteration in transmission control
Real networks present many complex scenarios, so an algorithm change that helps in one scenario may cause regressions in others. AUT also serves several different services, so one service's changes must not affect another. Our main approach to algorithm design is therefore to identify each newly discovered problem precisely and then solve it in a targeted way that leaves other scenarios untouched. Changes are made configurable, rolled out through grayscale (gradual rollout) experiments, and judged by data-driven analysis of actual results.
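One common way to make an algorithm change configurable and expose it to only part of the traffic is to give every connection a stable bucket. The sketch below is only an illustration under that assumption: `in_experiment`, `choose_algorithm`, the feature name, and the connection ID format are all hypothetical and are not taken from AUT.

```python
import hashlib


def in_experiment(feature: str, connection_id: str, rollout_percent: int) -> bool:
    """Return True if this connection falls inside the rollout for `feature`.

    Hypothetical helper: hashing the feature name together with the connection
    ID gives each connection a stable bucket in [0, 100). The same connection
    always gets the same answer, and raising the percentage only adds
    connections, it never removes any.
    """
    digest = hashlib.sha256(f"{feature}:{connection_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < rollout_percent


def choose_algorithm(connection_id: str, rollout_percent: int) -> str:
    # "new_loss_recovery" is an illustrative feature name, not a real AUT option.
    if in_experiment("new_loss_recovery", connection_id, rollout_percent):
        return "new"
    return "baseline"
```

Because the bucket depends on the feature name as well as the connection, two experiments running at the same time select different subsets of connections, which keeps their results easier to compare against the baseline.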
AUT contains many weak-network countermeasure modules. If each module maintained its own state and logic independently, a great deal of redundant code would result. A typical case is that the data analyzed by one module is also needed by many others, while the moments at which each module makes its algorithmic decision are very different. After analyzing the problem, we split each algorithm module into an objective data statistics module and a subjective decision processing module, and use events to trigger the individual decision modules. This way the statistics are reused as much as possible and the focus of each algorithm becomes clear.
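The split between shared statistics and event-triggered decisions can be sketched roughly as follows. This is a minimal illustration, not AUT's code: the class names, event names (`"loss"`, `"rtt_sample"`), and thresholds are assumptions chosen for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TransportStats:
    """Objective statistics shared by every decision module (hypothetical fields)."""
    packets_sent: int = 0
    packets_lost: int = 0
    smoothed_rtt_ms: float = 0.0

    @property
    def loss_rate(self) -> float:
        return self.packets_lost / self.packets_sent if self.packets_sent else 0.0

    def on_ack(self, rtt_ms: float) -> None:
        self.packets_sent += 1
        if self.smoothed_rtt_ms == 0.0:
            self.smoothed_rtt_ms = rtt_ms
        else:
            # Moving average that gives the new sample a weight of 1/8.
            self.smoothed_rtt_ms += (rtt_ms - self.smoothed_rtt_ms) / 8

    def on_loss(self) -> None:
        self.packets_sent += 1
        self.packets_lost += 1


class EventBus:
    """Triggers each subscribed decision module when its event fires."""

    def __init__(self) -> None:
        self._handlers = defaultdict(list)

    def subscribe(self, event: str, handler) -> None:
        self._handlers[event].append(handler)

    def publish(self, event: str, stats: TransportStats) -> list:
        return [handler(stats) for handler in self._handlers[event]]


# Subjective decision modules: they read the shared statistics, keep no copy.
def fec_decision(stats: TransportStats) -> str:
    return "fec_on" if stats.loss_rate > 0.05 else "fec_off"  # illustrative threshold


def pacing_decision(stats: TransportStats) -> str:
    return "pace_slow" if stats.smoothed_rtt_ms > 200 else "pace_normal"
```

A caller would create one `TransportStats`, subscribe `fec_decision` to `"loss"` and `pacing_decision` to `"rtt_sample"`, and publish the matching event after updating the statistics. Both decisions then read the same numbers even though they run at different times.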
2. Transmission performs best only when it adapts to the application
Before AUT existed, transmission logic and application-layer logic were tightly coupled. Tight coupling is poorly portable, but it can deliver the best transmission results for that one service, because business information is easy to share between the layers. From the start of the AUT rollout, we asked how to separate the transport protocol and framework while still optimizing transmission together with each application layer. The approach we arrived at is deep coupling in mechanism and sufficient abstraction in engineering.
First, from the business side, the optimal handling of a given transmission mechanism must produce the same result as before. Some prior knowledge cannot be discarded after layering. For example, a whole frame in video transmission has to be processed together: if one piece of data in the middle of the frame is missing, the frame cannot be processed. Once the transport layer is separated out, the information that this data forms one block must not be lost, and the transport layer must decide how to send each packet in the block based on it. In terms of mechanism, then, the transport layer still needs to use as much application-layer information as possible to get the best final result.
At the same time, a transport layer that is independent in engineering terms should not need to know what a video frame is. This calls for abstraction: what the application layer sees as a video frame, the transport layer sees as a whole block of data. The transport layer's job is to optimize the transmission of a whole block, so the strategy for "block data" stands on its own and can be used in many scenarios. Video is only one of them; the same strategy applies to encrypted certificates, pictures, and other large pieces of information. That is the benefit of abstraction in engineering implementation.
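To make the "block data" abstraction concrete, here is a minimal sketch in which the transport side only knows about opaque blocks split into packets. The packet layout, the 1200-byte default size, and the names `Packet`, `split_block`, and `BlockReassembler` are assumptions for illustration and say nothing about AUT's actual wire format.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Packet:
    block_id: int   # which block this packet belongs to
    index: int      # position of the packet inside the block
    total: int      # number of packets in the block
    payload: bytes


def split_block(block_id: int, data: bytes, mtu: int = 1200) -> List[Packet]:
    """Split one opaque block (a video frame, a certificate, a picture) into packets."""
    chunks = [data[i:i + mtu] for i in range(0, len(data), mtu)] or [b""]
    return [Packet(block_id, i, len(chunks), chunk) for i, chunk in enumerate(chunks)]


class BlockReassembler:
    """Delivers a block only when every packet of that block has arrived."""

    def __init__(self) -> None:
        self._parts: Dict[int, Dict[int, bytes]] = {}
        self._totals: Dict[int, int] = {}

    def on_packet(self, pkt: Packet) -> Optional[bytes]:
        parts = self._parts.setdefault(pkt.block_id, {})
        self._totals[pkt.block_id] = pkt.total
        parts[pkt.index] = pkt.payload  # a duplicate simply overwrites itself
        if len(parts) < pkt.total:
            return None  # block still incomplete, nothing to hand to the application
        del self._parts[pkt.block_id]
        del self._totals[pkt.block_id]
        return b"".join(parts[i] for i in range(pkt.total))

    def missing(self, block_id: int) -> List[int]:
        """Packet indexes still outstanding for a pending block."""
        parts = self._parts.get(block_id, {})
        return [i for i in range(self._totals.get(block_id, 0)) if i not in parts]
```

Nothing in this code mentions video. The application decides what a block is, and the transport side can use `missing` to decide which packets of the block deserve retransmission or extra protection.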
3. Choose transmission strategies according to the scenario
Meeting the varied needs of the company's many services with a single framework is a hard problem. To solve it, we gradually developed the idea of scenario-based transmission: first clarify our own capabilities, then analyze and extract typical usage scenarios (network scenario + business requirements) as prior knowledge, and finally apply the appropriate capabilities to each usage scenario to meet its transmission requirements.
Typical scenarios include:
● Wireless transmission/wired transmission: wired networks generally fluctuate less, so many transmission strategies can be simplified to improve performance. In wireless transmission, channel contention makes jitter and network changes more frequent, so the strategy has to be adapted to the specific wireless network, for example by dynamically compensating the sending strategy according to network jitter.
● Real-time data transmission (RTC)/non-real-time data transmission (File/Report): real-time transmission puts more weight on the timeliness of data, so error recovery should be more aggressive, or losses should even be guarded against in advance. Non-real-time transmission should avoid being so aggressive that it has a large impact on the user's overall network usage.
● Single-connection high-traffic transmission (FPA)/multi-connection sparse-traffic transmission (S2S): a single high-traffic connection can do more aggregation work, while multiple connections with sparse traffic should avoid the overhead caused by idle connections.
● Long-lived continuous data transmission (Proxy)/short-lived request-response transmission (LBS Request): a long-lived connection can enable MTU probing, send some additional information, and use the connection's context to reduce protocol overhead. A short-lived request-response exchange can add extra delivery guarantees for application data, so that data loss does not keep the connection open for too long.
Different scenarios have different requirements for real-time behavior, reliability, and weak-network resilience. Because our transmission strategy is organized by scenario, and one scenario can map to several services that share its characteristics, an optimization made for a scenario can be reused rather than redone for each individual service.
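The mapping from scenario to strategy can be sketched as a small lookup. Everything here is a simplified assumption: the three scenario traits, the four strategy switches, and the service names in `SERVICE_SCENARIOS` are illustrative and do not reflect AUT's real configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    wireless: bool     # wireless vs. wired access
    realtime: bool     # real-time vs. non-real-time data
    long_lived: bool   # long-lived stream vs. short request-response


@dataclass(frozen=True)
class Profile:
    jitter_compensation: bool       # adapt sending to wireless jitter
    aggressive_recovery: bool       # recover losses early for timely data
    mtu_probing: bool               # worth doing only on long-lived connections
    extra_delivery_guarantee: bool  # protect short exchanges from a single loss


def profile_for(scenario: Scenario) -> Profile:
    """Derive the strategy from the scenario, never from the service name."""
    return Profile(
        jitter_compensation=scenario.wireless,
        aggressive_recovery=scenario.realtime,
        mtu_probing=scenario.long_lived,
        extra_delivery_guarantee=not scenario.long_lived,
    )


# Hypothetical services, each described only by the scenario it belongs to.
SERVICE_SCENARIOS = {
    "rtc_media": Scenario(wireless=True, realtime=True, long_lived=True),
    "file_upload": Scenario(wireless=True, realtime=False, long_lived=True),
    "report_upload": Scenario(wireless=True, realtime=False, long_lived=True),
    "lbs_request": Scenario(wireless=True, realtime=False, long_lived=False),
}
```

Since `file_upload` and `report_upload` describe the same scenario, they receive the same profile, so tuning that scenario once benefits both.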
4. Build a solid quality assurance system for the protocol
The evolution of a transport protocol depends on its quality assurance system. Only with a stable and effective quality assurance system can the protocol keep evolving efficiently. Without one, the correctness of a technical improvement cannot be verified and problems cannot be investigated.
1. Visualize the impact of internal logic
Troubleshooting a transport protocol has its own particular difficulty: many packets may be in flight, and sending or receiving any one of them may affect internal state and logic. There are also many internal modules and algorithm modules, the logic chain can be very long, and state interacts across modules over long periods. Breakpoints make it hard to trace such long-running relationships and their outcomes.
A better approach here is to provide a visualization tool. A general-purpose module takes the internal information dumped to the log and turns the various internal states into charts. This makes it much easier to trace how internal variables influence one another, and many problems become obvious at a glance.
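The first step of such a tool, turning dumped log lines into per-variable time series that any charting library can plot, might look like the sketch below. The `key=value` log format, the `ts` timestamp field, and the variable names are assumptions made for this example; the article does not describe AUT's real log format.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Matches numeric "name=value" pairs such as "cwnd=10" or "rtt_ms=42.5".
FIELD = re.compile(r"(\w+)=(-?\d+(?:\.\d+)?)")


def parse_state_log(lines: Iterable[str]) -> Dict[str, List[Tuple[float, float]]]:
    """Group dumped state into one (timestamp, value) series per variable."""
    series = defaultdict(list)
    for line in lines:
        fields = {key: float(value) for key, value in FIELD.findall(line)}
        ts = fields.pop("ts", None)
        if ts is None:
            continue  # lines without a timestamp carry no plottable state
        for name, value in fields.items():
            series[name].append((ts, value))
    return dict(series)


# Hypothetical dump: the congestion window drops as the RTT spikes.
SAMPLE_LOG = [
    "ts=0.00 cwnd=10 rtt_ms=40",
    "connection migrated",
    "ts=0.05 cwnd=20 rtt_ms=42.5",
    "ts=0.10 cwnd=10 rtt_ms=120",
]
```

Plotting the `cwnd` and `rtt_ms` series on a shared time axis shows the relationship between the two variables directly, which is hard to see when stepping through code.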
2. Reproduce/self-test various network scenarios
Network state during transmission is fleeting, so being able to simulate and reproduce various network states is very important. We reproduce network state with tools at two levels:
● System-level tools (TC and others) simulate more complex weak-network scenarios. Their simulation capability is strong, but they cost more time and resources.
● An internal simulation module (simulator) reproduces relatively well-defined weak-network scenarios at the unit-test level. Its simulation capability is weaker, but the overhead is low.
The tools at these two levels complement each other. They reproduce network state along different dimensions and provide a reliable basis for debugging AUT internals.
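A unit-test-level simulator of the second kind can be very small. The sketch below is an assumption about what such a module could look like, not AUT's simulator: `SimulatedLink` and its parameters are hypothetical, and it models only random loss, fixed delay, and uniform jitter.

```python
import heapq
import random
from typing import Any, List


class SimulatedLink:
    """In-process lossy, delayed link driven by a virtual clock (hypothetical)."""

    def __init__(self, loss_rate: float, delay_ms: float,
                 jitter_ms: float = 0.0, seed: int = 0) -> None:
        self.loss_rate = loss_rate
        self.delay_ms = delay_ms
        self.jitter_ms = jitter_ms
        self._rng = random.Random(seed)  # seeded so every test run is repeatable
        self._queue: list = []           # heap of (arrival_ms, order, payload)
        self._order = 0

    def send(self, now_ms: float, payload: Any) -> bool:
        """Queue a packet; return False if the link dropped it."""
        if self._rng.random() < self.loss_rate:
            return False
        arrival = now_ms + self.delay_ms + self._rng.uniform(0.0, self.jitter_ms)
        heapq.heappush(self._queue, (arrival, self._order, payload))
        self._order += 1
        return True

    def receive(self, now_ms: float) -> List[Any]:
        """Return every packet whose arrival time has been reached."""
        delivered = []
        while self._queue and self._queue[0][0] <= now_ms:
            delivered.append(heapq.heappop(self._queue)[2])
        return delivered
```

Because the clock is passed in by the test and the random source is seeded, a failing weak-network case can be replayed exactly, which system-level tools cannot easily offer.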
3. Ensure the robustness of the code
Because AUT is widely used within the company and is exposed to the public network, where it receives all kinds of unpredictable input, the robustness of the code also has to be guaranteed.
● Use automated fuzz testing to simulate normal and abnormal input across configurations, interfaces, and network packets to ensure stability. Fuzz testing is a very good tool for a transport protocol: by combining machine computing power with awareness of code paths, it can reach more comprehensive coverage than hand-written test cases.
● Cover extreme scenarios by stress testing over long durations and under extremely abnormal networks. Extreme scenarios are often the most typical corner cases. Many boundary conditions missed during design show up in extreme-scenario tests, so never let them slide.
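The idea behind fuzzing a packet parser can be shown with a toy mutation fuzzer. This is a deliberately simple sketch: the 7-byte header, `parse_packet`, `mutate`, and `fuzz` are all hypothetical, and unlike the coverage-guided fuzzing described above it does not look at code paths at all. The rule it checks is that malformed input may only be rejected with `ValueError`; any other exception counts as a robustness bug.

```python
import random
import struct

# Hypothetical header: version (1 byte), payload length (2), sequence number (4).
HEADER = struct.Struct("!BHI")


def parse_packet(data: bytes):
    """Parse one packet, rejecting malformed input with ValueError only."""
    if len(data) < HEADER.size:
        raise ValueError("truncated header")
    version, length, seq = HEADER.unpack_from(data)
    if version != 1:
        raise ValueError("unsupported version")
    payload = data[HEADER.size:]
    if len(payload) != length:
        raise ValueError("payload length mismatch")
    return seq, payload


def mutate(rng: random.Random, data: bytes) -> bytes:
    """Flip a bit, truncate, or append junk to an existing input."""
    buf = bytearray(data)
    choice = rng.randrange(3)
    if choice == 0 and buf:
        buf[rng.randrange(len(buf))] ^= 1 << rng.randrange(8)
    elif choice == 1 and buf:
        del buf[rng.randrange(len(buf)):]
    else:
        buf.extend(rng.getrandbits(8) for _ in range(rng.randint(1, 8)))
    return bytes(buf)


def fuzz(parser, seeds, iterations: int = 2000, seed: int = 0):
    """Run the seeds unchanged, then mutated copies; collect unexpected exceptions."""
    rng = random.Random(seed)
    corpus = list(seeds)
    crashes = []
    for i in range(iterations):
        data = corpus[i] if i < len(corpus) else mutate(rng, rng.choice(corpus))
        try:
            parser(data)
        except ValueError:
            pass  # a clean rejection is the expected outcome
        except Exception as exc:  # anything else is a robustness bug
            crashes.append((data, repr(exc)))
    return crashes
```

Calling `fuzz(parse_packet, [b"", HEADER.pack(1, 3, 7) + b"abc"])` should return an empty list, whereas a parser that indexes into the data without checking its length is caught on the empty seed.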
5. Technical evolution must be tied to real deployments
Transmission control is a very low-level technology, the cornerstone of many upper-layer applications. Because low-level research and development runs into so many problems, it is easy to fall into a vicious circle of finding problems to think about and spending a lot of time solving them. Some of those problems are forward-looking and practical, but many are detached from reality, or at least detached from the current stage. One consequence is that the technology's progress becomes completely disconnected from actual deployment. The technology evolves well yet struggles to produce real value in specific services, and effort may even go into a feature that no service needs, which wastes time and labor.
We had a similar experience during the evolution of AUT. Going forward, we will pay close attention to the actual value each technology produces in specific business scenarios and make sure that what we build is deployed in concrete application scenarios. Real application scenarios raise more practical problems, and those problems in turn drive better iteration of the technology. This produces real value and advances the technology at the same time, and avoids working behind closed doors.
02 Outlook for AUT Evolution
Deploying AUT across scenarios is by no means the end, but a new beginning. Many directions in networking and transmission remain for us to explore.
1. Global regional network data collection and analysis
After the transport layer is separated from each service, we can obtain cleaner network data about users in each region. Once collected, analyzed, and modeled, this network data can support the business in many ways, such as:
● Transmission control: use user network data to improve algorithms in transmission;
● User allocation: allow users to access the most suitable operator/edge;
● Network diagnosis: analyze user network problems according to the typical network model of the local/current operator;
● Experimental simulation: build a weak network environment that is closer to the user's real network.
2. Application of machine learning in AUT
In recent years, applications of machine learning to real-time audio and video transmission have appeared one after another, with many attempts at congestion control algorithms in particular. The following figure shows the experimental result of our internal machine learning algorithm. It can be seen that, compared with the experimental result from the paper, the bandwidth estimate of our internal algorithm is closer to the optimal value:
Machine learning algorithms depend heavily on their data sets. Once more thorough data collection is in place, we believe machine learning will play a greater role in network transmission.
3. Continuous evolution and implementation of multi-path transmission
Multi-path transmission is the general trend, and AUT is adding support for it in step; it is currently being rolled out gradually. The figure below shows a test we are running in the laboratory: the multipath version transmits over two interfaces, Wi-Fi and mobile data, while the singlepath version uses only Wi-Fi, and weak-network conditions are applied only to the Wi-Fi link.
The results show that the video latency of the multipath version is basically unaffected, while the latency of the singlepath version fluctuates with weak network conditions.
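The article does not say how AUT schedules packets across paths, so the following is only one plausible illustration of why a second path shields latency from a weak Wi-Fi link. `Path`, `pick_path`, and the 20% loss threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Path:
    name: str
    smoothed_rtt_ms: float
    loss_rate: float


def pick_path(paths: List[Path], max_loss: float = 0.2) -> Path:
    """Prefer the lowest-RTT path with acceptable loss.

    If every path exceeds the loss threshold, fall back to the least lossy one,
    so a single-path connection always keeps using the path it has.
    """
    usable = [p for p in paths if p.loss_rate <= max_loss]
    if usable:
        return min(usable, key=lambda p: p.smoothed_rtt_ms)
    return min(paths, key=lambda p: p.loss_rate)
```

With a healthy Wi-Fi link and a slower mobile link, this picks Wi-Fi. Once Wi-Fi degrades, the multipath sender moves to mobile data, while a sender that only has Wi-Fi has no alternative and inherits the weak network's latency.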
4. Generalize the weak-network countermeasure modules to public protocols
Every weak-network countermeasure module in AUT was designed for generality from the very beginning. The modules are sufficiently modular and their inputs are completely independent of the protocol itself, so they can easily be ported to other protocols.
For example, we already use AUT's congestion control module in our WebRTC service, where it addresses many problems of the native congestion control. Meanwhile, QUIC is being used in more and more scenarios since its standardization, and AUT's weak-network countermeasure modules will gradually be migrated to the company's internal QUIC protocol stack to strengthen QUIC's network analysis and weak-network resilience.
To sum up, the birth and iteration of AUT is a process of moving from business needs to applications. As an underlying protocol, AUT abstracts the common network transmission requirements out of Shengwang's complex business scenarios and, at the engineering level, makes the coupling logic between the underlying algorithms and the upper-layer applications more general. Thanks to its modular design, AUT can also be integrated more easily into widely used public protocols, which opens up room for imagination about its future development.
About Dev for Dev
"Dev for Dev" stands for Developer for Developer. The column is an interactive developer innovation and practice program jointly initiated by Shengwang and the RTC developer community.
Through technology sharing, exchange and debate, and project co-building in various forms, all from an engineer's perspective, it brings developers together, surfaces and delivers the most valuable technical content and projects, and gives technical creativity full rein.