Introduction: In 2020, the COVID-19 pandemic broke out and swept the world, dealing a heavy blow to the global economy, including China's, and profoundly affecting social life. Against this backdrop, with booming live-streaming e-commerce in the consumer market as the tipping point, the live streaming industry has once again surged. Amid the wave of digital transformation among Chinese enterprises, the enterprise live streaming service market, after ten years of development, has also entered a stage of rapid growth.
Text | Hong Shundi, Streaming Media Server R&D Engineer, NetEase Yunxin
1 Typical live streaming architecture
In a typical live streaming architecture, the left side is the push (publishing) client, which uses RTMP for the uplink; the right side is the pull (playback) client, which supports several protocols, the most common being RTMP, FLV, and HLS.
1.1 Advantages of existing architectures
This architecture makes good use of the capabilities of CDN and cloud vendors. Although the streaming protocols are not unified, RTMP/FLV/HLS are mature streaming media protocols that, after years of development, have been widely supported by CDN vendors. With the support of cloud capabilities, server-side concurrency and playback-side acceleration have improved greatly, and the live streaming industry has flourished.
2 The current state of low-latency live streaming
Stutter and latency are like the two ends of a scale in live streaming: the larger the delay, the less the stutter.
In general scenarios, the client uses a larger buffer, sacrificing delay for fluency. As the industry develops, some applications place increasingly stringent demands on delay, such as live sports broadcasts or teacher-student interaction in education, and in these scenarios the shortcomings of common live streaming protocols become apparent.
Generally, the delay of RTMP live streaming is 3-10 seconds; if the stream is cached and forwarded through layers of CDN, the delay often exceeds 10 seconds, and the delay of FLV and HLS is even higher. On the playback side, a large part of the delay comes from network transmission: before RTMP can carry media, the TCP three-way handshake and the RTMP c0/c1/c2 handshake introduce several RTTs of needless delay; and since RTMP/FLV/HLS are all transported over TCP, TCP congestion control cannot make full use of the bandwidth when the network is unstable, so to preserve fluency the client can only increase its buffer, which in turn increases the delay.
Having recognized the limitations of the existing live streaming protocols, the major vendors have also launched their own low-latency live streaming solutions, which do a better job of combating weak networks and speeding up first-frame rendering. However, most are based on private signaling protocols and private UDP media transport protocols, so the major cloud vendors are not mutually compatible, which limits large-scale adoption of low-latency live streaming.
3 Open-source practice of low-latency live streaming based on standard WebRTC
NetEase Yunxin has been exploring how to build an open low-latency live streaming solution, one that other cloud vendors could in the future implement just as conveniently as the existing RTMP/HLS protocols, driving lower latency across the whole live streaming industry. Implementing this scheme requires the following two things:
- An open signaling protocol: it needs to meet the media-negotiation requirements of most vendors while staying as concise as possible.
- An open media protocol: the media transport protocol needs to be universal across vendors, and the QoS capabilities built on top of it also need to be open, not private.
Based on the above requirements, we chose a mature solution from the RTC field: WebRTC. The figure below shows our current architecture in practice.
In the figure above, WE-CAN is NetEase Yunxin's globally accelerated RTC network; media transmission between servers relies on WE-CAN.
The edge media server pulls the stream from the CDN into a WE-CAN edge node, and the stream is then delivered to the client from a WE-CAN edge node.
3.1 Open source signaling protocol implementation
The signaling protocol uses HTTP+SDP: the client POSTs an SDP Offer.
{
    ...
    "pull_stream": "nertc://your.domain.com/live/testname",
    "sdp": "v=0\r\no=4611731400430051336 2 IN IP4 127.0.0.1\r\ns=-\r\nt=0 0\r\na=group:BUNDLE 0 1\r\n......",
    "type": "offer"
}
The media server then negotiates and returns an SDP Answer.
{
    ...
    "code": 200,
    "sdp": "v=0\r\no=4611731400430051336 10000 1 IN IP4 0.0.0.0\r\ns=-\r\nt=0 0\r\na=ice-lite\r\n......",
    "type": "answer"
    ...
}
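On the server side, this exchange reduces to a small request handler. The sketch below is illustrative only: `negotiate` is a hypothetical callback into the media server's WebRTC stack that turns an offer SDP into an ice-lite answer SDP, and is not part of the published protocol.

```javascript
// Minimal sketch of the signaling handler. Given the POSTed JSON body,
// it returns the JSON answer shown above. `negotiate` is an assumed
// hook into the media server's SDP negotiation logic.
function handleOffer(body, negotiate) {
  if (!body || body.type !== "offer" || !body.sdp || !body.pull_stream) {
    return { code: 400 }; // malformed negotiation request
  }
  return {
    code: 200,
    type: "answer",
    sdp: negotiate(body.sdp, body.pull_stream)
  };
}
```

A vendor would mount this behind a plain HTTP endpoint; no private signaling channel is needed.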
3.2 Standard WebRTC media protocol
After the client gets the SDP Answer, what follows is the standard WebRTC media flow: ICE, DTLS encrypted connection setup, then receiving RTP media.
The following is a basic web-side playback code demo:
self.pc = new RTCPeerConnection(null);
self.pc.addTransceiver("audio", { direction: "recvonly" });
self.pc.addTransceiver("video", { direction: "recvonly" });

var offer = await self.pc.createOffer();
await self.pc.setLocalDescription(offer);

var session = await new Promise(function (resolve, reject) {
    var data = {
        pull_stream: streamId,
        type: "offer",
        sdp: offer.sdp
    };
    $.ajax({
        type: "POST", url: apiUrl, data: JSON.stringify(data),
        contentType: 'application/json', dataType: 'json'
    }).done(function (data) {
        resolve(data);
    }).fail(function (xhr, status) {
        reject(new Error("signaling request failed: " + status));
    });
});

await self.pc.setRemoteDescription(
    new RTCSessionDescription({ type: 'answer', sdp: session.sdp })
);
3.3 Open Source Native Media Player
To make it easier for native clients to access WebRTC, we have also open-sourced a low-latency live streaming player that integrates standard WebRTC: We-Can-Player. Given a stream URL, it can pull and play the WebRTC stream.
3.4 Client architecture
As long as a vendor implements a similar protocol, this player can pull WebRTC streams with only slight modification. As the architecture shows, the interaction between the media server and the playback client is almost entirely standard WebRTC, with no private RTP extensions and no private QoS protocol. A CDN vendor does not even need its own RTC network: implementing a standard WebRTC gateway plus a simple HTTP server on its CDN edge nodes is enough to provide the same capability.
In order to optimize the live broadcast experience, we have also done a lot of optimization on the server side.
4 Optimize the live broadcast experience
4.1 First screen optimization
4.1.1 GOP cache first screen optimization
There are two key metrics in live streaming: first-frame time and fluency. Suppose the GOP on the publishing side is 5 seconds; in some cases the playback side will wait nearly 5 seconds before it receives the first I-frame and can render the first picture. This is unacceptable for live streaming.
The solution is a GOP cache on the media server: cache the media packets of the most recent 1-2 GOPs on the server side. When the media connection between the client and the media server succeeds, the packets in the GOP cache are sent first, followed by the current media data. After receiving the packets, the client aligns audio and video packets according to a certain strategy and then plays slightly faster to catch up to live.
In practice, pay attention to the size of the GOP cache, its coordination with the client's jitter-buffer size, the audio/video alignment within the GOP cache, and adaptation to the different GOP lengths used by different publishers.
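The caching logic above can be sketched in a few lines. This is a minimal illustration under assumptions of our own: each packet is an object with `isVideo`, `isKeyframe`, and `data` fields, and the class name `GopCache` is hypothetical, not taken from the actual server code.

```javascript
// Minimal GOP-cache sketch: keep the last 1-2 GOPs so a new subscriber
// can render a first frame immediately instead of waiting for the next
// I-frame. Packet shape ({ isVideo, isKeyframe, data }) is assumed.
class GopCache {
  constructor(maxGops = 2) {
    this.maxGops = maxGops; // cache the last 1-2 GOPs, as described above
    this.gops = [];         // each GOP is an array of packets
  }

  push(packet) {
    if (packet.isVideo && packet.isKeyframe) {
      // A new I-frame starts a new GOP; evict the oldest over the limit.
      this.gops.push([]);
      if (this.gops.length > this.maxGops) this.gops.shift();
    }
    if (this.gops.length === 0) return; // drop packets before the first I-frame
    this.gops[this.gops.length - 1].push(packet);
  }

  // Flush these to a newly connected subscriber before live packets;
  // the flush always starts with an I-frame by construction.
  packetsForNewSubscriber() {
    return this.gops.flat();
  }
}
```

A real implementation would also cache the audio packets interleaved with each GOP and carry decoding timestamps so the client can do the audio/video alignment mentioned above.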
4.1.2 Pacer Smooth Send
If the GOP configured on the publishing side is relatively large, then when the playback client's media connection succeeds, sending all the data of a GOP to the client in one burst may overflow the client's buffer and cause other problems. This is where the server's pacer comes in, smoothing the sending rate.
In practice, pay attention to matching the pacer's output frame rate with the client's frame rate.
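A minimal pacer can be sketched as a queue drained at one frame per frame interval, so a cached GOP is spread out instead of sent in one burst. All names here are illustrative, not from the actual server:

```javascript
// Pacer sketch: enqueue backlog frames, then call tick(nowMs) from a
// periodic timer; it returns the frames whose send time has arrived.
class Pacer {
  constructor(targetFps = 30) {
    this.intervalMs = 1000 / targetFps; // one frame per interval
    this.queue = [];
    this.lastSendMs = -Infinity;        // no frame sent yet
  }

  enqueue(frame) { this.queue.push(frame); }

  // Returns the frames due at time nowMs; the send clock advances by a
  // whole interval per frame, so bursts are smoothed even if ticks are
  // irregular.
  tick(nowMs) {
    const out = [];
    while (this.queue.length > 0 && nowMs - this.lastSendMs >= this.intervalMs) {
      out.push(this.queue.shift());
      this.lastSendMs =
        this.lastSendMs === -Infinity ? nowMs : this.lastSendMs + this.intervalMs;
    }
    return out;
  }
}
```

A production pacer would additionally drain slightly faster than real time while the GOP-cache backlog exists, so the client catches up to live instead of staying one GOP behind.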
4.2 Latency optimization
4.2.1 WE-CAN Network
As mentioned above, the reason the live streaming industry could flourish is that the cloud capabilities of CDN vendors played a major enabling role: CDNs speed up the back-to-source fetches of edge nodes, and edge nodes speed up client access.
As the architecture diagram above shows, to speed up back-to-source fetches, the back-to-source media server is placed as close as possible to the CDN's regional center node; to optimize client access, the edge streaming server should likewise be as close as possible to the playback client. The transmission from the back-to-source media server to the edge streaming server is therefore critical.
WE-CAN takes on this responsibility. It is an efficient global transmission network developed by Yunxin that can accelerate network transmission between any two media servers in the world. In a sense it plays the same accelerating role as a CDN, but where a CDN works by layer-by-layer caching, WE-CAN relies on path optimization.
4.2.2 Media server with full SFU architecture
Consider the interaction between two anchors: introducing an MCU inevitably introduces buffering, which increases both first-frame time and delay. Therefore the internal layout of the RTC network is based on a pure SFU architecture.
4.2.3 Full link delay monitoring
How do we monitor the delay introduced along the full streaming link? The media stream is forwarded hop by hop through the WE-CAN network, and if any hop introduces unnecessary delay, it hurts the final low-latency result. Our approach is to add an RTP header extension that records the millisecond NTP time at which the RTP packet arrives at each machine. The last media server before the client reports the per-hop forwarding cost along with the RTT between the edge server and the client, and strips this extension from the RTP packets that are finally sent to the client. Although the NTP clocks of the machines are not perfectly aligned, the global NTP time synchronization of the WE-CAN network keeps the difference within 1 ms, which is sufficient for monitoring purposes in engineering practice.
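The extension payload can be pictured as a growing list of arrival timestamps, one appended per hop. The sketch below illustrates the idea only: the 32-bit big-endian field width and the flat-list layout are assumptions of ours, not the actual wire format.

```javascript
// Each hop appends its arrival time (ms, relative to the synchronized NTP
// epoch) as a 32-bit big-endian value; the last server before the client
// parses the list to compute per-hop delay, then strips the extension.

// Append one hop's arrival time to the extension payload.
function appendHopTime(payload, arrivalMs) {
  const out = new Uint8Array(payload.length + 4);
  out.set(payload, 0);
  new DataView(out.buffer).setUint32(payload.length, arrivalMs >>> 0, false);
  return out;
}

// Parse all hop arrival times back out of the payload.
function parseHopTimes(payload) {
  const view = new DataView(payload.buffer, payload.byteOffset, payload.byteLength);
  const times = [];
  for (let off = 0; off + 4 <= payload.byteLength; off += 4) {
    times.push(view.getUint32(off, false));
  }
  return times;
}

// Per-hop delay is the difference between consecutive arrival times,
// meaningful because WE-CAN keeps clocks within ~1 ms of each other.
function perHopDelays(times) {
  return times.slice(1).map(function (t, i) { return t - times[i]; });
}
```

In a real deployment this payload would be carried as a standard RTP header extension element rather than a bare byte list, so that intermediaries that do not understand it can skip it.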
5 Effects and follow-up
In the first stage, network QoS supports only ARQ and Opus in-band FEC. WebRTC's native FEC is XOR-based and weak against consecutive packet loss, so FEC has not been enabled for now. Even so, this is a huge improvement over RTMP: under 50% packet loss, the delay is kept at about 2 seconds, and the first frame arrives in 200~400 ms.
Our future plans include adding more standard WebRTC QoS capabilities (including FEC), WebRTC support on the publishing side, and more. For the specific open-source releases, follow the [Intelligent Enterprise Technology +] official account, where we will continue to publish open-source updates and repository addresses.
About the author
Hong Shundi is a streaming media server R&D engineer at NetEase Yunxin, responsible for the development of NetEase Yunxin's streaming media servers.