
This article is part of the "Dev for Dev" column series. The author, Xia Xia, is in charge of the back-end transport protocol at Shengwang.

In response to the new requirements and challenges that real-time interactive applications pose for network transmission, Shengwang developed its own private transport layer protocol, Agora Universal Transport (AUT), in 2019 by layering and decoupling application-layer business requirements from transmission strategies in real-time interaction. AUT brings together a range of transmission control capabilities for heterogeneous networks. It was gradually rolled out at scale across services from 2021 to 2022, using a single transport protocol/framework to meet the differing transmission needs of each service.

The related content is divided into two parts. This article introduces the design and evolution of the AUT transport protocol in detail.

01 Complex transmission scenarios

For a company that provides a real-time interactive platform, transmission is the cornerstone of every interaction. As networks develop, interaction scenarios are becoming more diverse and the transmitted content more complex. Transmission for real-time interaction mainly faces the following needs and challenges:

Media data transfer

RTC is Shengwang's most important real-time interaction scenario. Real-time audio and video transmission is also the scenario most tightly coupled with upper-layer business logic: the transmission logic has to work closely with source coding at the media layer to achieve low-latency interaction.

● A reliable network channel is required to send and receive control messages.

● Multiple reliable real-time channels are required to meet the sending and receiving of multiple data streams (audio and video, etc.).

● When the bandwidth is limited, the priority management of the above streams needs to be solved (Control Channel > Audio > Video).

● In B2B scenarios, customers should be able to decide stream priorities and the transmission degradation policy independently and flexibly.

● Media-side policies need to be closely coordinated with current network conditions.

● In intra-network transmission, the quality of interconnection between regions and operators varies greatly, as do network buffering, delay, and packet loss.
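The priority requirement above (Control Channel > Audio > Video) can be illustrated with a minimal strict-priority scheduler. This is only a sketch under assumptions: the class name, the stream labels, and the per-tick byte budget are hypothetical and do not come from AUT itself.

```python
from collections import deque

# Priority order taken from the text: Control Channel > Audio > Video.
PRIORITY = ("control", "audio", "video")


class PriorityScheduler:
    """Hypothetical strict-priority scheduler with a per-tick byte budget."""

    def __init__(self):
        self.queues = {name: deque() for name in PRIORITY}

    def enqueue(self, stream, payload):
        self.queues[stream].append(payload)

    def drain(self, budget_bytes):
        """Return the (stream, payload) pairs that fit into this tick's budget."""
        sent = []
        for name in PRIORITY:  # highest priority first
            queue = self.queues[name]
            # Send from this queue until its head no longer fits the budget.
            while queue and len(queue[0]) <= budget_bytes:
                payload = queue.popleft()
                budget_bytes -= len(payload)
                sent.append((name, payload))
        return sent


def demo_priority():
    sched = PriorityScheduler()
    sched.enqueue("video", b"v" * 1000)
    sched.enqueue("audio", b"a" * 100)
    sched.enqueue("control", b"c" * 50)
    sched.enqueue("video", b"w" * 1000)
    # Only 1200 bytes may leave in this tick, so the second video packet waits.
    return [name for name, _ in sched.drain(1200)]


print(demo_priority())
```

When the budget shrinks, video is the first stream to be held back, while control and audio packets still leave in every tick as long as they fit.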

General data transfer

The FPA full-link acceleration products launched by Shengwang provide end-to-end link acceleration services for various applications (including web/game/audio and video, etc.) worldwide.

● Supports both end-to-end reliable transmission channels and unreliable transmission channels.

● As a general data acceleration channel, the data traffic varies widely.

● Low latency.

Low-latency reliable message delivery

RTM is a stable, reliable, ultra-low-latency, high-concurrency global signaling and messaging cloud service provided by Shengwang.

● The message size is small, the message frequency is high, and the overall traffic is relatively low.

● Low latency.

Low priority reliable data transfer

Report is Shengwang's internal data and event reporting service, and provides information for internal and external troubleshooting.

● Transmission priority and real-time requirements are low, and the traffic must avoid affecting other business data.

● Long-lived connections are maintained with the backend. The number of connections is huge, but most of them are idle most of the time.

Intranet transmission of various services

SD-RTN™ is Shengwang's internal global network, covering more than 200 countries and regions and carrying traffic for different services.

● On long fat links in cross-region transmission (that is, long fat networks, with a large bandwidth-delay product and a high packet loss rate), bandwidth ramp-up and packet loss recovery must be fast despite the large link delay.

● It is difficult to guarantee the network quality between different countries/operators, and packet loss/jitter is very common.

● Different degrees of QoS guarantee for different services/customers/scenarios.
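To make the "long fat link" requirement concrete, the sketch below computes the bandwidth-delay product (BDP) and the number of round trips a window-doubling ramp-up needs to fill the link. The 100 Mbps bandwidth and the 20 ms / 200 ms RTTs are the test conditions quoted later in this article. The initial window of ten 1200-byte packets and the doubling rule (textbook slow start) are assumptions for illustration, not AUT's actual behavior.

```python
def bdp_bytes(bandwidth_bps: int, rtt_ms: int) -> int:
    """Bandwidth-delay product: bytes that must be in flight to fill the link."""
    return bandwidth_bps * rtt_ms // 8000  # (bits/s * ms) -> bytes


def rtts_to_fill(bdp: int, initial_window_bytes: int) -> int:
    """Round trips needed if the window doubles every RTT (textbook slow start)."""
    window, rounds = initial_window_bytes, 0
    while window < bdp:
        window *= 2
        rounds += 1
    return rounds


INITIAL_WINDOW = 10 * 1200  # assumed: ten 1200-byte packets

for rtt_ms in (20, 200):
    bdp = bdp_bytes(100_000_000, rtt_ms)
    rounds = rtts_to_fill(bdp, INITIAL_WINDOW)
    print(f"RTT {rtt_ms} ms: BDP {bdp} bytes, "
          f"{rounds} RTTs = {rounds * rtt_ms} ms to ramp up")
```

Under these assumptions the 200 ms link needs 2,500,000 bytes in flight and about 1.6 seconds to ramp up, versus 250,000 bytes and about 0.1 seconds at 20 ms, which is why fast ramp-up and fast loss recovery matter most on cross-region paths.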

02 The solution: a self-developed transport layer protocol

Weaknesses of existing solutions

Transmission requirements differ from service to service. We first surveyed the products available in the industry, hoping to adopt a mature solution, but found that each falls short in some respect:

Solution | Advantages | Limitations
--- | --- | ---
RTP/RTCP | Customizable transmission strategy | No priority management; lacks multiplexing capability
TCP/RTMP | Good generality | Prone to head-of-line blocking and cannot multiplex; the transmission strategy cannot be customized, and there are few optimization options
SRT | Some weak-network countermeasures | Lacks independent congestion control; lacks multiplexing capability
QUIC | Multiplexing, priority management, and resistance to head-of-line blocking | Lacks transmission support for real-time unreliable data streams; lacks various weak-network countermeasures and network adaptation capabilities; the protocol is large and comprehensive, but too heavy for internal use

Looking at the business requirements, we found both commonalities and differences in how the various services transmit data. An existing solution can be adequate for a single service, but covering every business scenario with one transport framework is by no means easy. It requires in-depth understanding and abstraction of the various business requirements, which the existing products in the industry struggle to provide.

Ultimately, a self-developed transport layer protocol/framework became the lightest and most adaptable option, but also the most challenging one.

Design Goals of the AUT

According to the transmission requirements and characteristics of the existing business, we have sorted out and summarized several goals that the self-developed transmission protocol needs to achieve, as follows:

1. Provide multiplexed transmission channels that carry different data types at the same time, and provide effective description schemes for different types of data.

2. Provide sufficient extensibility to adapt to the transmission-related logic and inputs of various application layers, such as block data and scattered data. Real-time transmission is often tightly coupled with the application, so finer-grained control over the underlying transmission is also required.

3. Provide different levels of transmission quality assurance for different data, such as different priorities/reliability/redundancy.

4. Provide a flexible transmission control module interface that can be extended to implement different congestion control, packet loss detection, network detection, and other strategies.

5. Provide an underlying network interface that can support any virtual network.

6. Provide effective network quality analysis to the upper layer, which is convenient for the upper layer to apply different business logics for different network qualities.

7. Adapt to different network scenarios, including reordering, jitter, long fat links, and rate limiting in heterogeneous networks.
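Goal 4 describes a pluggable transmission control interface. The following sketch shows one way such an extension point could look. The interface name, its methods, and both controllers are hypothetical illustrations. AIMD is used only because it is a well-known textbook algorithm, not because AUT is known to use it.

```python
from abc import ABC, abstractmethod


class CongestionController(ABC):
    """Hypothetical plug-in interface; names are illustrative, not AUT's API."""

    @abstractmethod
    def on_ack(self, acked_bytes: int) -> None: ...

    @abstractmethod
    def on_loss(self) -> None: ...

    @abstractmethod
    def window(self) -> int: ...


class AimdController(CongestionController):
    """Textbook additive-increase / multiplicative-decrease, in bytes."""

    def __init__(self, mss: int = 1200, initial_packets: int = 10):
        self.mss = mss
        self.cwnd = mss * initial_packets

    def on_ack(self, acked_bytes: int) -> None:
        # Grow by roughly one MSS per window's worth of acknowledged data.
        self.cwnd += self.mss * acked_bytes // self.cwnd

    def on_loss(self) -> None:
        # Halve the window, but never go below two packets.
        self.cwnd = max(self.cwnd // 2, 2 * self.mss)

    def window(self) -> int:
        return self.cwnd


class FixedWindowController(CongestionController):
    """Trivial alternative strategy: a constant window that ignores feedback."""

    def __init__(self, window_bytes: int):
        self._window = window_bytes

    def on_ack(self, acked_bytes: int) -> None:
        pass

    def on_loss(self) -> None:
        pass

    def window(self) -> int:
        return self._window


def can_send(controller: CongestionController, in_flight: int, size: int) -> bool:
    """The sender only depends on the interface, not on a concrete algorithm."""
    return in_flight + size <= controller.window()
```

Because the sending logic calls only `on_ack`, `on_loss`, and `window`, a different strategy can be swapped in per connection or per service without touching the rest of the transport.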

03 Evolution of AUT

Architecture Evolution Process

● In the initial prototype verification stage, we used the simplest possible model to abstract the transport layer, and applied it only to unreliable transmission scenarios in audio and video media.

[Figure: AUT architecture in the prototype verification stage]

● After prototype verification, AUT evolved into a layered design of Session and Connection. Session is mainly responsible for Stream management, with different Streams fulfilling different business requirements. Connection is responsible for business-independent, pure transmission control, which decouples transmission control from business requirements. The interfaces for network sending and receiving were abstracted independently so that AUT can be overlaid on various networks. Security features such as encryption and authentication were introduced, along with network status detection capabilities such as MTU detection and padding.

● After the layered design was completed, the layering of AUT was refined further. Within Stream, finer-grained general-purpose and special-purpose modules gradually evolved: through abstract Writer/Reader implementations combined with flow controllers, caches, and similar components, each Stream performs its own channel coding and decoding, flow control, and retransmission control, so that the weak-network resilience of each Stream is differentiated to suit different service data. Connection manages multiple Paths as a whole, and each Path acts as an independent transmission control unit that does not affect the others, so transmission over multiple Paths at the same time is supported. In the transmission algorithms, deeper network status detection modules such as traffic model detection and packet loss type detection were introduced.

[Figure: layered AUT architecture]
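The layering described above can be sketched as plain data structures: a Session owns Streams, and a Connection owns Paths that each keep their own window. All field names and the path selection rule (most free window first) are assumptions made for illustration. The article does not specify how AUT chooses a Path.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Stream:
    stream_id: int
    reliable: bool          # e.g. control messages vs. real-time media
    bytes_written: int = 0


@dataclass
class Path:
    """One independent transmission-control unit with its own window."""
    path_id: int
    cwnd_bytes: int
    bytes_in_flight: int = 0

    def available(self) -> int:
        return max(self.cwnd_bytes - self.bytes_in_flight, 0)


@dataclass
class Connection:
    """Business-independent transmission control over one or more Paths."""
    paths: Dict[int, Path] = field(default_factory=dict)

    def add_path(self, path: Path) -> None:
        self.paths[path.path_id] = path

    def send(self, size: int) -> Optional[int]:
        """Send on the path with the most free window; None if nothing fits."""
        best = max(self.paths.values(), key=Path.available, default=None)
        if best is None or best.available() < size:
            return None
        best.bytes_in_flight += size
        return best.path_id


@dataclass
class Session:
    """Owns the Streams and hands their data to the Connection."""
    connection: Connection
    streams: Dict[int, Stream] = field(default_factory=dict)

    def open_stream(self, stream_id: int, reliable: bool) -> Stream:
        stream = Stream(stream_id, reliable)
        self.streams[stream_id] = stream
        return stream

    def write(self, stream_id: int, size: int) -> Optional[int]:
        path_id = self.connection.send(size)
        if path_id is not None:
            self.streams[stream_id].bytes_written += size
        return path_id
```

Each Path only ever updates its own `bytes_in_flight`, so congestion on one Path does not shrink the window of another, which mirrors the independence property stated in the text.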

Weak network confrontation ability

With the evolution of AUT, the dimensions of our network analysis and evaluation are gradually enriched. These network analysis capabilities are the basis of AUT's weak network confrontation capabilities:

● Out-of-order detection capability: Detect the out-of-order degree in the network, and make corresponding adjustments to the transmission algorithm under out-of-order conditions, such as adjusting the packet loss detection algorithm and the congestion control algorithm to adapt to the out-of-order scenario;

● Bandwidth detection capability: AUT has several bandwidth detection algorithms along different dimensions that complement each other under different detection requirements. Packet trains probe the upper bound of the bandwidth, detecting a higher available transmission bandwidth with very little probe traffic. Padding probes the actual bandwidth: when application data is insufficient, real traffic is injected into the network to verify its true capacity;

● Traffic model detection capability: Detect the actual network traffic model, such as the typical traffic policing/shaping and other traffic restriction policies in the public network, and adjust the sending control policy in a targeted manner;

● Packet loss type detection capability: Analyze the real packet loss pattern, output the current packet loss model (such as random loss or congestion loss), and apply dynamic compensation in other modules according to the packet loss type.
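As an illustration of the packet-train idea mentioned above, the sketch below uses the classic dispersion estimate: a burst of equally sized packets is sent back to back, and the receiver divides the bits that arrived after the first packet by the time the train was spread over. The function name and the microsecond timestamps are assumptions. AUT's actual probing algorithm is not described in the article and is likely more elaborate.

```python
def packet_train_estimate_bps(arrival_times_us, packet_size_bytes):
    """Estimate bottleneck bandwidth from the dispersion of a packet train.

    arrival_times_us: receive timestamps (microseconds) of packets that were
    sent back to back, in arrival order.
    """
    if len(arrival_times_us) < 2:
        raise ValueError("a train needs at least two packets")
    dispersion_us = arrival_times_us[-1] - arrival_times_us[0]
    if dispersion_us <= 0:
        raise ValueError("timestamps must be increasing")
    # The first packet only starts the clock, so it is not counted.
    bits = (len(arrival_times_us) - 1) * packet_size_bytes * 8
    return bits * 1_000_000 / dispersion_us


# Five 1250-byte packets arriving 1 ms apart: 4 * 10,000 bits over 4 ms.
estimate = packet_train_estimate_bps([0, 1000, 2000, 3000, 4000], 1250)
print(f"{estimate / 1e6:.1f} Mbps")
```

A short train like this costs only a few kilobytes of traffic, which is why it suits probing an upper bound, while padding-based probing spends real bandwidth to confirm that the capacity is actually usable.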

After the current state of the network is clarified, the various weak network confrontation capabilities within the AUT can "prescribe the right medicine":

● General channel coding capability at the Stream level: Stream implements a common block encoding and decoding framework internally, into which FEC block codes can be easily integrated. Besides differing in retransmission and flow control, different Streams can therefore also have different coding protection strength, which broadens the transmission extensibility offered to upper layers;

● Dynamic feedback control capability: AUT's feedback consists mainly of Ack packets, and effective Ack feedback ensures the accuracy of the internal algorithm logic. AckAck is enabled dynamically to combat Ack loss; different Ack Delay settings balance performance against delay; jitter and bursts on the Ack path are detected and compensated for in transmission control;

● Deeply optimized congestion control and packet loss detection algorithms: Typical congestion control and packet loss detection algorithms have been optimized and explored in depth across various scenarios. While preserving the characteristics of each algorithm, they are adapted to the various transmission control needs of real-time transmission.
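To show what a block code plugged into a Stream might do, here is a minimal sketch of the simplest FEC block code: one XOR parity packet per block, which can repair a single lost packet without retransmission. This is a generic textbook scheme chosen for brevity. The article does not say which codes AUT uses, and the function names are hypothetical.

```python
def xor_parity(packets):
    """Return the byte-wise XOR of equally sized packets."""
    size = len(packets[0])
    if any(len(p) != size for p in packets):
        raise ValueError("all packets in a block must have the same size")
    parity = bytearray(size)
    for packet in packets:
        for i, byte in enumerate(packet):
            parity[i] ^= byte
    return bytes(parity)


def recover_missing(received, parity):
    """Repair a block in which exactly one packet (marked None) was lost."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) != 1:
        raise ValueError("a single parity packet repairs exactly one loss")
    present = [p for p in received if p is not None]
    repaired = list(received)
    # XOR of the survivors and the parity equals the lost packet.
    repaired[missing[0]] = xor_parity(present + [parity])
    return repaired


block = [b"abcd", b"efgh", b"ijkl"]
parity = xor_parity(block)                     # sent alongside the block
damaged = [b"abcd", None, b"ijkl"]             # second packet lost in transit
print(recover_missing(damaged, parity) == block)
```

Here one parity packet per three data packets costs about 33% extra bandwidth. Varying the block size or the code per Stream is one way the protection strength could differ between, say, audio and video.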

Transmission effect comparison

We also compared AUT with other transport layer protocols in some real business scenarios. Some results are as follows:

[Figure: throughput comparison between AUT and lsquic]

Across high and low bandwidth limits (100 Mbps and 4 Mbps), random packet loss of 0-50% on the uplink alone or on the uplink and downlink simultaneously, and RTTs of 20 ms and 200 ms, AUT achieves throughput closer to the actual bandwidth than lsquic.

Overview of deployment scenarios

So far, AUT has gradually been deployed in the following Shengwang business scenarios:

RTC

The overall position of AUT in Shengwang's RTC service is shown in the figure below. AUT serves as the layer 4 transport protocol both on the Lastmile, where users access Shengwang edge nodes, and for the SD-RTN™-based intra-network transmission between Shengwang edge nodes.

[Figure: position of AUT in the Shengwang RTC service]

Using the AUT protocol on the Lastmile lets us restructure the cooperation between media and transmission as a whole. This brings faster network status tracking (bit rate ramp-up shortened from more than 20 seconds to about 3 seconds), a better audio-first experience (no stuttering), fewer freezes and lower delay, and better support for hardware encoding, which together greatly improve the user experience on weak networks.

Under different network conditions, some indicators of the video after using AUT are as follows:

[Figure: video metrics with AUT under different network conditions]

FPA

The position of AUT in FPA is similar to that in RTC. The data transmitted in FPA is no longer limited to audio and video, but thanks to AUT's multiplexing, transmission capability in the various scenarios has also improved greatly. We tested the transmission rates of FPA and the public network in different regions; the relevant data are as follows:

[Figure: transmission rates of FPA and the public network in different regions]

RTM

At the edge nodes used to access RTM, AUT replaces TCP, and both the arrival rate and the delay of RTM access over AUT are significantly better than over TCP under various weak-network conditions. We sent 1000 messages, recorded their arrival times, and compared AUT with TCP (as the representative public Internet transport protocol) in three cases. Compared with TCP, the average message arrival delay with AUT was 53% lower under a 100 Kbps rate limit, 67% lower under 20% packet loss, and 55% lower under rate limit plus packet loss.

[Figures: RTM message arrival results for AUT and TCP in the three test cases]

Report

We use AUT at the edge for both Report and RTC connections, and the Report connections run the LEDBAT algorithm. If the connections share a bottleneck, the Report connection competes for bandwidth with very low aggressiveness, ensuring minimal impact on other business data.

[Figure: bandwidth sharing between Report and other business traffic]
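The yielding behavior of LEDBAT comes from its delay-based window update, sketched below after the pseudocode in RFC 6817. This is a simplified illustration, not AUT's implementation: it omits the RFC's delay filters, base-delay history, and the cap on window growth, and the class and parameter names are hypothetical.

```python
class LedbatWindow:
    """Simplified LEDBAT-style window update (after RFC 6817)."""

    TARGET_S = 0.1         # RFC 6817 caps the target queuing delay at 100 ms
    GAIN = 1.0             # RFC 6817 requires GAIN <= 1
    MIN_CWND_PACKETS = 2

    def __init__(self, mss: int = 1200, initial_packets: int = 10):
        self.mss = mss
        self.cwnd = float(mss * initial_packets)
        self.base_delay = None  # lowest one-way delay seen so far

    def on_ack(self, one_way_delay_s: float, acked_bytes: int) -> float:
        if self.base_delay is None or one_way_delay_s < self.base_delay:
            self.base_delay = one_way_delay_s
        queuing_delay = one_way_delay_s - self.base_delay
        # Positive while the queue is below target, negative once it is above.
        off_target = (self.TARGET_S - queuing_delay) / self.TARGET_S
        self.cwnd += self.GAIN * off_target * acked_bytes * self.mss / self.cwnd
        self.cwnd = max(self.cwnd, float(self.MIN_CWND_PACKETS * self.mss))
        return self.cwnd


w = LedbatWindow()
print(w.on_ack(0.050, 1200))   # empty queue: the window grows
print(w.on_ack(0.300, 1200))   # 250 ms of queuing: the window shrinks
```

Because the window starts shrinking as soon as queuing delay exceeds the target, well before any packet is lost, a loss-based flow sharing the same bottleneck keeps most of the bandwidth, which matches the low-priority requirement for Report traffic.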

SD-RTN™

The AUT protocol is used in Shengwang's SD-RTN™ network, giving intra-network transmission the ability to provide differentiated QoS and further improving the transmission efficiency of business data. Using FEC on long fat links reduces the number of retransmissions, and reduces them further under weak network conditions. AUT can detect and adapt to traffic policing and traffic shaping in the public network, avoiding packet loss caused by over-sending, making network quality assessment clearer, and enabling better intra-network path planning. The highly efficient Metadata mechanism greatly simplifies the control-plane logic of the business.

For transmission within the RTC network, we selected 40 data centers for comparison. The results are as follows:

Metric | Before using AUT | After using AUT
--- | --- | ---
Share of time with arrival rate below target | 0.00709% | 0.00489%
Share of time with jitter below target | 0.00577% | 0.00564%

Transmission quality within the RTC network was already very good: the arrival rate and jitter metrics already met a four-nines standard. On that basis, the introduction of AUT reduces the non-compliant time further, improving the dedicated-line quality of SD-RTN™ once more.
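The arithmetic behind the four-nines statement can be checked directly from the figures in the table above. The helper names below are hypothetical, and `Decimal` is used only to avoid floating-point rounding noise in such small percentages.

```python
from decimal import Decimal


def compliant_share(non_compliant_percent: str) -> Decimal:
    """Percentage of time the metric met its target."""
    return Decimal(100) - Decimal(non_compliant_percent)


def relative_reduction(before: str, after: str) -> Decimal:
    """Relative drop in non-compliant time, in percent, to one decimal place."""
    b, a = Decimal(before), Decimal(after)
    return ((b - a) / b * 100).quantize(Decimal("0.1"))


for metric, before, after in (
    ("arrival rate", "0.00709", "0.00489"),
    ("jitter", "0.00577", "0.00564"),
):
    print(f"{metric}: {compliant_share(before)}% -> {compliant_share(after)}% "
          f"compliant, non-compliant time down {relative_reduction(before, after)}%")
```

Both metrics stay above 99.99% before and after. The relative reduction in non-compliant time works out to about 31.0% for arrival rate and about 2.3% for jitter.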

About Dev for Dev

Dev for Dev stands for Developer for Developer. The column is an interactive innovation and practice program for developers, jointly initiated by Shengwang and the RTC developer community.

Through technology sharing, exchange of ideas, and project co-building from an engineer's perspective, it brings developers together, surfaces and delivers the most valuable technical content and projects, and gives full play to the creativity of technology.

