1. Background
Real-Time Messaging Protocol (RTMP) is currently the dominant protocol for live streaming. It is a proprietary application-layer protocol designed by Adobe to carry audio and video data between Flash players and servers. RTMP is the basic push/pull streaming protocol that major cloud vendors build their live streaming services on. With the growth of the domestic live streaming industry and the arrival of the 5G era, a basic understanding of the RTMP protocol has become a baseline skill for programmers.
This article explains the basic ideas and core concepts of RTMP, supplemented by an analysis of the livego source code, to walk through the core knowledge points of the RTMP protocol together with you.
2. Features of RTMP Protocol
The main features of the RTMP protocol are multiplexing, chunking, and being an application-layer protocol. These features are described in detail below.
2.1 Multiplexing
Multiplexing means that the sender transmits multiple signals simultaneously over a single channel, and the receiver combines the signals carried on that channel back into independent, complete pieces of information, so that the communication line is used more efficiently.
In short, over one TCP connection, each Message to be delivered is split into one or more Chunks, and the Chunks belonging to the same Message form a Chunk Stream. The receiving end reassembles the Chunks in the Chunk Stream to restore the complete Message. That is the basic idea of multiplexing.
The figure above is a simple example. Suppose a 300-byte Message needs to be transmitted; we can split it into 3 Chunks, each consisting of a Chunk Header and Chunk Data. The Chunk Header carries basic information about the Chunk, such as the Chunk Stream Id and Message Type; Chunk Data is the original payload. In the figure the message is split as 128 + 128 + 44 = 300 bytes, so the whole Message can be transmitted.
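As a concrete illustration of this splitting, here is a minimal Go sketch. It is not livego code; the function name splitMessage is invented for this example, and real chunks would each carry a Chunk Header in addition to the payload slice:

```go
package main

import "fmt"

// splitMessage carves a message payload into pieces of at most
// chunkSize bytes, mimicking how RTMP carves a Message into Chunks.
// (Illustrative only: real chunks also carry fmt/csid header bytes.)
func splitMessage(payload []byte, chunkSize int) [][]byte {
	var chunks [][]byte
	for len(payload) > 0 {
		n := chunkSize
		if len(payload) < n {
			n = len(payload)
		}
		chunks = append(chunks, payload[:n])
		payload = payload[n:]
	}
	return chunks
}

func main() {
	msg := make([]byte, 300)
	for i, c := range splitMessage(msg, 128) {
		fmt.Printf("chunk %d: %d bytes\n", i, len(c)) // 128, 128, 44
	}
}
```

Running this on a 300-byte payload with the default 128-byte chunk size yields exactly the 128 + 128 + 44 split from the figure.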
The format of Chunk Header and Chunk Data will be introduced in detail later.
2.2 Chunking
The second major feature of the RTMP protocol is chunking. Compared with the RTSP protocol, chunking is a distinctive feature of RTMP. Unlike ordinary business application-layer protocols (such as RPC protocols), multimedia transmission mostly involves large audio and video packets. Sending a large packet over the transport in one piece can easily block the connection and prevent higher-priority information from getting through. Chunked transmission solves this problem. The specific chunk format is introduced below.
2.3 Application layer protocol
The last feature of RTMP is that it is an application-layer protocol. RTMP is implemented on top of the transport-layer protocol TCP by default, but the official RTMP document only gives a standard description of the data transmission format and some specific protocol formats; there is no complete official implementation. This has given rise to many other industry implementations, such as RTMP over UDP and other privately adapted variants, which leave more room for extension and make it easier to address problems in native RTMP such as live streaming latency.
3. RTMP Protocol Analysis
As an application-layer protocol, RTMP, like other private transport protocols (such as RPC protocols), has several concrete implementations, such as nginx-rtmp, livego, and srs. This article picks livego, an open-source live streaming server written in Go, to analyze the main flow at the source level and study how the core push/pull streaming flow of RTMP is implemented, helping you build an overall understanding of the protocol.
Before the source code analysis, let us build a basic understanding of the RTMP format by analogy with an RPC protocol. First, look at a relatively simple but practical RPC protocol format, as shown in the following figure:
We can see that this is a data format used during RPC calls. The fundamental purpose of such a format is to solve the TCP "packet sticking and splitting" problem, i.e. to delimit messages on a byte stream.
Here is a brief description of the RPC format in the figure. First, 2 bytes of MAGIC represent the magic number, marking the packet as one the peer can recognize; if the 2 received bytes are not 0xbabe, the packet is discarded directly. Next, sign occupies 1 byte: the lower 4 bits indicate the message type (request/response/heartbeat), and the upper 4 bits indicate the serialization type, such as json, hessian, protobuf, or kryo. The third field, status, occupies 1 byte and represents the status code. Then 8 bytes carry the requestId of the call; in practice the lower 48 bits (2^48 values) are usually enough. Finally, a fixed-length 4-byte body size field gives the length of the Body Content. With this layout, a complete RPC message can be parsed quickly.
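The parsing just described can be sketched in Go. This is an illustrative toy decoder for the header in the figure, with invented names (parseHeader, header); it assumes all multi-byte fields are big-endian:

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// header mirrors the toy RPC format described above:
// 2-byte magic, 1-byte sign, 1-byte status, 8-byte requestId,
// 4-byte body length, then the body itself.
type header struct {
	MsgType   byte // low 4 bits of sign: request/response/heartbeat
	Serialize byte // high 4 bits of sign: json/hessian/protobuf/...
	Status    byte
	RequestID uint64
	BodyLen   uint32
}

const magic = 0xbabe

func parseHeader(b []byte) (header, error) {
	var h header
	if len(b) < 16 {
		return h, errors.New("short packet")
	}
	if binary.BigEndian.Uint16(b[0:2]) != magic {
		return h, errors.New("bad magic, drop packet")
	}
	sign := b[2]
	h.MsgType = sign & 0x0f
	h.Serialize = sign >> 4
	h.Status = b[3]
	h.RequestID = binary.BigEndian.Uint64(b[4:12])
	h.BodyLen = binary.BigEndian.Uint32(b[12:16])
	return h, nil
}

func main() {
	pkt := []byte{0xba, 0xbe, 0x21, 0x00,
		0, 0, 0, 0, 0, 0, 0, 42, // requestId = 42
		0, 0, 0, 5} // body length = 5
	h, err := parseHeader(pkt)
	fmt.Println(h, err)
}
```

After the 16 header bytes are consumed, the next BodyLen bytes are the body, which is exactly how message boundaries are recovered from the stream.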
From this simple RPC protocol we can extract a very good idea: use bytes with maximum efficiency, i.e. transmit the most information with the smallest byte array. Even a single byte can carry a lot of information; after all, one byte has 256 possible values. If one byte on the network can convey a lot of useful information, we can get the maximum use out of extremely limited resources. The official RTMP specification dates from 2012. Although from today's point of view the RTMP protocol is very complicated, even a bit bloated, it already embodied these advanced ideas back then, and it remains an example worth learning from.
Even today, when the WebRTC protocol is everywhere, we can still see the shadow of RTMP in WebRTC's design and implementation. The RPC protocol above can be regarded as a simplified design sharing the same design concepts as RTMP.
3.1 Description of RTMP core concepts
Before analyzing the RTMP source code, let us explain several core concepts in the RTMP protocol, so that we have a basic macro-level understanding of the whole RTMP protocol stack. In the source code analysis below, we will also use packet captures to help analyze the underlying principles more intuitively.
First, just like the RPC format above, the entity actually transmitted by RTMP is the Chunk. A Chunk consists of two parts, the Chunk Header and the Chunk Body, as shown in the figure below.
3.1.1 Chunk Header
The Chunk Header differs from the RPC header described earlier mainly in that its length is not fixed. Why not? Adobe designed it this way to save transmission overhead. The example of splitting a 300-byte Message into 3 Chunks shows an obvious downside of multiplexing: every Chunk needs a Chunk Header to carry its basic information, which adds extra bytes to the transmission. Therefore, to keep the number of transmitted bytes to a minimum, the RTMP header is squeezed as small as possible to achieve the highest transmission efficiency.
First, let us study the Basic Header part of the Chunk Header. The length of the Basic Header is not fixed; it can be 1, 2, or 3 bytes, depending on the Chunk Stream Id (csid for short).
The range of csid supported by the RTMP protocol is 2~65599; 0 and 1 are reserved by the protocol and cannot be used for ordinary streams. The Basic Header contains at least 1 byte, and that first byte determines its total length, as shown in the figure below. The upper 2 bits of this byte are reserved for fmt; the value of fmt determines the format of the Message Header, which is discussed later. The lower 6 bits carry the csid value. When the lower 6 bits are 0, the real csid is too large to fit in 6 bits and one more byte is needed; when the lower 6 bits are 1, the real csid is too large to fit in 14 bits and two more bytes are needed. So the length of the Basic Header is not fixed; it depends entirely on the lower 6 bits of the first byte.
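The rules above can be sketched as a small Go decoder. This is an illustrative example (readBasicHeader is not livego's API), following the specification's 1-, 2-, and 3-byte forms, where the larger forms encode csid with a +64 offset:

```go
package main

import (
	"errors"
	"fmt"
)

// readBasicHeader decodes an RTMP chunk basic header from b.
// It returns the 2-bit fmt value, the csid, and the number of
// bytes consumed (1, 2 or 3).
func readBasicHeader(b []byte) (fmtv uint8, csid uint32, n int, err error) {
	if len(b) < 1 {
		return 0, 0, 0, errors.New("short buffer")
	}
	fmtv = b[0] >> 6
	switch low := uint32(b[0] & 0x3f); low {
	case 0: // 2-byte form: csid = second byte + 64 (range 64..319)
		if len(b) < 2 {
			return 0, 0, 0, errors.New("short buffer")
		}
		return fmtv, uint32(b[1]) + 64, 2, nil
	case 1: // 3-byte form: csid = third*256 + second + 64 (range 64..65599)
		if len(b) < 3 {
			return 0, 0, 0, errors.New("short buffer")
		}
		return fmtv, uint32(b[2])*256 + uint32(b[1]) + 64, 3, nil
	default: // 1-byte form: csid lives directly in the low 6 bits (2..63)
		return fmtv, low, 1, nil
	}
}

func main() {
	f, csid, n, _ := readBasicHeader([]byte{0x03})
	fmt.Println(f, csid, n) // 0 3 1
}
```

For example, the single byte 0x03 decodes to fmt 0 and csid 3, which is the common one-byte case described next.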
In practice, that many csids are rarely needed, which means that in the common case the Basic Header is one byte long and csid ranges from 2 to 63.
So far we have only covered the Basic Header, and the Basic Header is just one component of the Chunk Header. The authors of the RTMP protocol also designed the rest of the Chunk Header with a dynamic size, again to save transmission space. The length of the Chunk Message Header falls into four cases, determined by the value of fmt mentioned earlier.
The four formats of Message Header are shown in the figure below:
When fmt is 0, the Message Header occupies 11 bytes (note that these 11 bytes do not include the Basic Header). It consists of a 3-byte timestamp, a 3-byte message length, a 1-byte message type id, and a 4-byte message stream id.
Here, timestamp is the absolute timestamp, the time the message was sent; message length is the length of the message body; message type id indicates the message type, discussed in detail later; and message stream id uniquely identifies the message stream. Note that if the absolute timestamp of the message is greater than 0xFFFFFF, the time is too large to express in 3 bytes and the Extended Timestamp must be used. The extended timestamp is 4 bytes long and by default is placed between the Chunk Header and the Chunk Body.
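The extended-timestamp rule can be sketched as follows; encodeTimestamp is an invented helper for illustration, returning the 3-byte field plus the optional 4-byte extension:

```go
package main

import "fmt"

// encodeTimestamp returns the 3-byte timestamp field plus an optional
// 4-byte extended timestamp, following the rule described above:
// if the value reaches 0xFFFFFF, the 3-byte field is saturated to
// 0xFFFFFF and the real value goes into the extended timestamp.
func encodeTimestamp(ts uint32) (field [3]byte, extended []byte) {
	if ts >= 0xFFFFFF {
		field = [3]byte{0xFF, 0xFF, 0xFF}
		extended = []byte{byte(ts >> 24), byte(ts >> 16), byte(ts >> 8), byte(ts)}
		return
	}
	field = [3]byte{byte(ts >> 16), byte(ts >> 8), byte(ts)}
	return
}

func main() {
	f, ext := encodeTimestamp(0x01000000) // too big for 3 bytes
	fmt.Println(f, ext)
}
```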
When fmt is 1, the Message Header occupies 7 bytes. Compared with the 11-byte header, it omits the 4-byte message stream id: the chunk reuses the message stream id of the preceding chunk on the same chunk stream. This format is generally used for variable-length message structures.
When fmt is 2, the Message Header occupies only 3 bytes, containing just the timestamp. Compared with the previous format it also omits the message length; it is generally used for fixed-length messages (such as audio data) whose length does not change.
When fmt is 3, the Chunk Header contains no Message Header at all. Generally speaking, when a complete RTMP Message is split into chunks, the first Chunk uses fmt 0 and the subsequent Chunks use fmt 3. This way the first Chunk carries the complete header information, the headers of the following Chunks are as small as possible, the implementation is simpler, and the compression rate is better. Of course, once the first Message has been sent, the first Chunk of the second Message can be set to fmt 1 and the remaining Chunks of that Message to fmt 3, so that the two messages can be distinguished.
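The four cases can be summarized in code; this is a trivial illustrative mapping, not part of livego:

```go
package main

import "fmt"

// messageHeaderSize returns the Message Header length in bytes for a
// given chunk fmt value, as the four cases above specify.
func messageHeaderSize(fmtv uint8) int {
	switch fmtv {
	case 0:
		return 11 // timestamp(3) + msg length(3) + msg type id(1) + msg stream id(4)
	case 1:
		return 7 // timestamp(3) + msg length(3) + msg type id(1); stream id reused
	case 2:
		return 3 // timestamp(3) only; length and type reused
	default:
		return 0 // fmt 3: everything inherited from the previous chunk
	}
}

func main() {
	for f := uint8(0); f <= 3; f++ {
		fmt.Printf("fmt=%d -> %d bytes\n", f, messageHeaderSize(f))
	}
}
```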
3.1.2 Chunk Body
We just spent a lot of time on the Chunk Header; the Chunk Body can be described more briefly. Compared with the Chunk Header it is relatively simple, with no variable-length controls. The data inside it is the data with real business meaning, and its length defaults to 128 bytes (this can be changed by negotiating with the Set Chunk Size command). The data inside is generally organized in AMF format, or as FLV-format audio/video data (without the FLV TAG header). The composition of the AMF structure is shown below; the FLV format is not covered in depth here, and interested readers can consult the official FLV documentation.
3.1.3 AMF
AMF (Action Message Format) is a binary data serialization format similar to JSON and XML. Adobe Flash and the remote server can communicate data through AMF format data.
The concrete format of AMF is actually very similar to a Map data structure: on top of KV key-value pairs, a length field for the Value is added in the middle. The structure of AMF is basically as shown in the figure below. Sometimes the len field is empty; this is determined by the type. For example, when transmitting AMF data of the number type, the len field can be omitted, because a number field occupies 8 bytes by default, so both sides can safely ignore it.
As another example, when AMF transmits data of the 0x02 string type, len occupies 2 bytes by default, because 2 bytes are enough to express the maximum length of the following value. And so on. Sometimes neither len nor value is present: for example, when transmitting null (0x05), we need neither len nor value.
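As a sketch of the three cases just discussed (number, string, null), here are minimal AMF0 encoders in Go. These are illustrative helpers, not livego's encoder; the markers and layouts follow the AMF0 rules described above:

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// Minimal AMF0 value encoders.
// Markers: 0x00 number, 0x02 string, 0x05 null.

func encodeNumber(v float64) []byte {
	b := make([]byte, 9)
	b[0] = 0x00
	binary.BigEndian.PutUint64(b[1:], math.Float64bits(v))
	return b // value is always 8 bytes, so no len field is needed
}

func encodeString(s string) []byte {
	b := make([]byte, 3+len(s))
	b[0] = 0x02
	binary.BigEndian.PutUint16(b[1:3], uint16(len(s))) // 2-byte len
	copy(b[3:], s)
	return b
}

func encodeNull() []byte {
	return []byte{0x05} // marker only: no len, no value
}

func main() {
	fmt.Printf("% x\n", encodeString("live")) // 02 00 04 6c 69 76 65
}
```

The string output matches what a WireShark capture of an AMF0 string shows: the 0x02 marker, a 2-byte length, then the raw characters.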
The following table lists some commonly used AMF types. For more information, please refer to the official documentation.
We can capture packets through WireShark and actually experience the specific AMF0 format.
As shown in the figure above, this is a very typical capture of an AMF0 string. There are currently two main versions of AMF, namely AMF0 and AMF3. In current real-world usage, AMF0 still dominates. So what is the difference between AMF0 and AMF3, and when the client sends AMF Chunk Data to the server, how does the server know whether it is AMF0 or AMF3? RTMP uses the message type id in the Chunk Header to distinguish them: when the message is encoded with AMF0, the message type id is 20, and when it is encoded with AMF3, the message type id is 17.
3.1.4 Chunk & Message
First, the relationship between Chunk and Message in one sentence: a Message is composed of multiple Chunks, and Chunks with the same Chunk Stream id form a Chunk Stream, which the receiving end merges and parses into a complete Message. Compared with RPC messages, RTMP has many more message types. The RPC message types mentioned earlier are basically request, response, and heartbeat, but the RTMP protocol's message types are richer. RTMP messages fall into three main categories: protocol control messages, data messages, and command messages.
Protocol control messages: Message Type ID = 1~6, mainly used for control inside the protocol.
Data messages: Message Type ID = 8, 9, 18.
8: Audio data;
9: Video data;
18: Metadata, including audio/video metadata such as the codec parameters and the video width and height.
Command messages (Message Type ID = 20, 17): this type mainly includes NetConnection and NetStream messages; each of the two types covers multiple functions, and calls of these messages can be understood as remote function calls.
The overview diagram is as follows, with a detailed introduction later in the source code analysis chapter. The colored parts are the commonly used messages.
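The three categories can be captured as Go constants; the kind helper below is an invented illustration of the classification just described:

```go
package main

import "fmt"

// Commonly used RTMP Message Type IDs, as summarized above.
const (
	TypeSetChunkSize     = 1 // protocol control
	TypeAbort            = 2
	TypeAck              = 3
	TypeUserControl      = 4
	TypeWindowAckSize    = 5
	TypeSetPeerBandwidth = 6
	TypeAudio            = 8 // data messages
	TypeVideo            = 9
	TypeAMF3Command      = 17 // command message, AMF3-encoded
	TypeAMF0Data         = 18 // metadata
	TypeAMF0Command      = 20 // command message, AMF0-encoded
)

// kind classifies a message type id into the three categories above.
func kind(typeID int) string {
	switch {
	case typeID >= 1 && typeID <= 6:
		return "protocol control"
	case typeID == TypeAudio || typeID == TypeVideo || typeID == TypeAMF0Data:
		return "data"
	case typeID == TypeAMF0Command || typeID == TypeAMF3Command:
		return "command"
	default:
		return "other"
	}
}

func main() {
	fmt.Println(kind(20)) // command
}
```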
3.2 Core realization process
Learning a network protocol is a boring process. We will try to combine the original RTMP specification text with WireShark packet captures to describe the core flows of the RTMP protocol vividly, including handshake, connect, createStream, push, and pull. The environment for all packet captures in this section is: livego as the RTMP server (listening on port 1935), OBS as the push-streaming application, and VLC as the pull-streaming application.
When analyzing an application-layer protocol, we should first grasp the main flow. For an RTMP server, each push or pull is, at the code level, a network connection, and each connection must be processed accordingly. As the livego source code shows, there is a handleConn method which, as the name suggests, handles each connection. Following the main flow, it first performs the handshake; then the second core module parses the Chunk header and Chunk body according to the RTMP packet format, and finally performs specific processing based on the parsed Chunk header and Chunk body.
In the code block above there are two core methods: one is HandshakeServer, which handles the handshake logic; the other is the ReadMsg method, which handles reading the Chunk header and Chunk body information.
3.2.1 Part One-Handshake
Section 5.2.5 of the original protocol introduces the RTMP handshake process in detail. The diagram is as follows:
At first glance this process may seem a bit complicated, so let us first use WireShark to capture packets and look at the process as a whole.
WireShark's Info column interprets the meaning of each RTMP packet for us. As can be seen from the figure below, the handshake involves only 3 packets: packet 16 is the client sending C0 and C1 to the server, packet 18 is the server sending S0, S1, and S2 to the client, and packet 20 is the client sending C2 to the server. With that, the client and server have completed the handshake.
From the WireShark capture, the handshake process is quite concise, somewhat similar to the TCP three-way handshake. The actual capture therefore differs somewhat from the description in section 5.2.5 of the RTMP specification: the overall flow in practice is much simpler.
Now we can look back at the more complicated handshake flowchart above. In the figure, the client and server each go through four states: uninitialized, version sent, ack sent, and handshake done.
Uninitialized: the client and server have not communicated yet;
Version sent: C0 or S0 has been sent;
Ack sent: C2 or S2 has been sent;
Handshake done: S2 or C2 has been received.
The RTMP specification does not strictly fix the order of C0, C1, C2 and S0, S1, S2, but it does lay down the following rules:
The client must receive S1 from the server before sending C2;
The client must receive S2 from the server before sending other data;
The server must receive C0 from the client before sending S0 and S1;
The server must receive C1 from the client before sending S2;
The server must receive the C2 from the client before sending other data.
From the WireShark capture analysis, the entire handshake process does follow the above rules. The question now is: what exactly are the messages C0, C1, C2, S0, S1, and S2? Their data formats are clearly defined in the RTMP specification.
C0 and S0: 1 byte long. This message carries the RTMP version number, with a value range of 0~255; we only need to know that 3 is the value we use. If you are interested in the meaning of the other values, you can read the original specification.
C1 and S1: 1536 bytes long, composed of timestamp + zero + random data; the middle packets of the handshake process.
C2 and S2: 1536 bytes long, composed of timestamp + timestamp2 + random data echo; essentially the echoes of C1 and S1. In practice, implementations generally set S2 = C1 and C2 = S1.
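The C0/C1 layout can be sketched as follows; buildC0C1 is an invented helper showing how a client's first handshake bytes are laid out (version byte, then 4-byte timestamp + 4 zero bytes + 1528 random bytes), not livego's handshake code:

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
	"time"
)

// buildC0C1 builds the client's opening handshake bytes as described
// above: C0 is a single version byte (3); C1 is 1536 bytes laid out
// as 4-byte timestamp + 4 zero bytes + 1528 random bytes.
func buildC0C1() []byte {
	buf := make([]byte, 1+1536)
	buf[0] = 3 // C0: RTMP version 3
	c1 := buf[1:]
	binary.BigEndian.PutUint32(c1[0:4], uint32(time.Now().Unix())) // timestamp
	// c1[4:8] stays zero per the spec
	if _, err := rand.Read(c1[8:]); err != nil { // 1528 random bytes
		panic(err)
	}
	return buf
}

func main() {
	pkt := buildC0C1()
	fmt.Println(len(pkt), pkt[0]) // 1537 3
}
```

A client would write these 1537 bytes right after the TCP connection opens, then read S0/S1/S2 and echo S1 back as C2.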
Below we combine the livego source code to strengthen the understanding of the handshake process.
So far, the simplest handshake process is over. It can be seen that the entire handshake process is relatively clear, and the processing logic is relatively simple and easy to understand.
3.2.2 Part Two-Information Exchange
3.2.2.1 Parsing the Chunk information of RTMP protocol
After the handshake, the connection and other related steps begin. But before handling that information, we first need the right tooling: as the saying goes, to do good work one must first sharpen one's tools.
We must first parse the Chunk Header and Chunk Body according to the RTMP specification, converting the raw bytes from the network into information we can recognize, and then run the corresponding flows based on that information. This is the key core of the source code analysis, and it touches many knowledge points. You can read this part together with the theory above to help understand the core logic of ReadMsg.
The logic of the code block above is very clear: it reads from each conn connection, performs the corresponding decoding, obtains one Chunk at a time, and merges Chunks with the same Chunk Stream Id into the corresponding Chunk Stream; a complete Chunk Stream finally forms a Message.
This code corresponds to the chunk stream id knowledge introduced in the theory section, so read them together. Keep in mind that one conn connection transmits multiple Messages, such as the connect Message, the createStream Message, and so on. Each Message is a Chunk Stream, that is, multiple Chunks sharing the same csid, which is why the livego author uses a map for storage: the key is the csid and the value is the chunk stream. In this way, all the information sent to the RTMP server can be preserved.
The specific logic implementation of readChunk code is divided into the following parts:
1) Handling of the csid; for the theory, refer to the discussion above. This is effectively the processing of the Basic Header.
2) The Chunk Header is parsed according to the value of format; the theory has been introduced above, and there are specific comments below. There are two technical points to note: the first is the handling of the timestamp, and the second is the chunk.new(pool) line of code, which also deserves attention. The code comments explain both clearly.
3) Reading the Chunk Body. As mentioned in the theory above, when fmt is 0 the Chunk Header contains a message length field that governs the size of the message body. Using this field together with the chunk size, we can easily read the Chunk Body. The overall logic is as follows.
So far we have successfully parsed the Chunk Header and read the Chunk Body. Note that we have only read the Chunk Body; we have not yet parsed it according to the AMF format. The processing of the Chunk Body is detailed in the source code walkthrough below. For now, we have parsed a ChunkStream sent over a connection, and we can return to the main flow.
As just said, the handshake is complete and we have parsed the ChunkStream information, so next we run the corresponding flow based on the typeId of the ChunkStream and the AMF data in the Chunk Body. The idea can be understood like this: client A sends an xxxCmd command; the RTMP server parses the xxxCmd command from the typeId and the AMF information, and responds to that command.
The handleCmdMsg in the code block above is the essence of how the RTMP server processes client commands. We can see that livego supports both AMF3 and AMF0 (the difference was introduced above), and the code comments below are also fairly clear. It then parses the AMF-format data of the Chunk Body, and the parsed result is stored as a slice.
After parsing the typeId and AMF data, the next step is naturally to handle each client command.
3.2.2.2 Connection
Connect command processing flow: during the connect process, the client and server negotiate the window size, transmission chunk size, and bandwidth. The RTMP specification describes the connect process in detail, as shown in the following figure:
Similarly, here we use WireShark to capture and analyze packets:
It can be seen from the capture that the connect process completes with only 3 packets:
Packet 22: the client tells the server that it wants to set the chunk size to 4096;
Packet 24: the client tells the server that it wants to connect to the application called "live";
Packet 26: the server responds to the client's connect request, settles the window size, bandwidth, and chunk size, and returns "_result" to indicate a successful response. All of this is done in a single TCP packet.
So how do the client and server know what these packets mean? These are the rules laid down by the RTMP specification, which we can understand by reading the spec. Of course, we can also let WireShark analyze it for us quickly. The following is a detailed analysis of packet 22; we only need to focus on the RTMP protocol fields.
As can be seen from the figure, the RTMP Header contains the Format, Chunk Stream ID, Timestamp, Body size, Message Type ID, and Message Stream ID. The hexadecimal value of the Type ID is 0x01, meaning Set Chunk Size, which belongs to the Protocol Control Messages.
Section 5.4 of the RTMP specification stipulates that for protocol control messages, the Chunk Stream ID must be set to 2, the Message Stream ID must be set to 0, and the timestamp is ignored. From the information parsed by WireShark, packet 22 indeed complies with the RTMP specification.
Now let's take a look at the detailed analysis of package 24.
Packet 24 is also sent by the client. You can see it sets the Message Stream ID to 0 and the Message Type ID to 0x14 (20 in decimal), which means an AMF0 command. AMF0 commands belong to RTMP Command Messages. The RTMP specification does not mandate a particular Chunk Stream ID for the connect process, because it is the Message Type ID that really matters; the server responds according to the Message Type ID. The AMF0 command sent during connect carries Object-typed data that tells the server the application name to connect to, the playback address, and other information.
The following code is how livego processes the client connection request.
After receiving the client's request to connect to the application, the server must respond to the client, which is the content of packet 26 captured by WireShark. The details are shown in the figure below; you can see that the server does several things in one packet.
We can combine the livego source code to learn the process in depth.
3.2.2.3 createStream
After the connection is complete, a stream can be created. The stream creation process is relatively simple; only two packets are needed, as shown below:
Packet 32 is the client's createStream request, and packet 34 is the server's response. The following is the livego source code that handles the client's createStream request.
3.2.2.4 Push Stream
After creating the stream, pushing or pulling can begin. Section 7.3.1 of the RTMP specification provides a diagram of the push process, shown in the figure below. The connect and createStream flows have been covered in detail above, so we focus on the Publishing Content part.
Before pushing with livego, you need to obtain the channel key used for pushing. We can get the channelKey for the channel "movie" with the following command; the value of the data field in the response Content is the channelKey required for pushing.
$ curl http://localhost:8090/control/get?room=movie
StatusCode : 200
StatusDescription : OK
Content : {"status":200,"data":"rfBd56ti2SMtYvSgD5xAV0YU99zampta7Z7S575K
LkIZ9PYk"}
RawContent : HTTP/1.1 200 OK
Content-Length: 72
Content-Type: application/json
Date: Tue, 09 Feb 2021 09:19:34 GMT
{"status":200,"data":"rfBd56ti2SMtYvSgD5xAV0YU99zampta7Z7S575K
LkIZ9PYk"}
Forms : {}
Headers : {[Content-Length, 72], [Content-Type, application/json], [Date
, Tue, 09 Feb 2021 09:19:34 GMT]}
Images : {}
InputFields : {}
Links : {}
ParsedHtml : mshtml.HTMLDocumentClass
RawContentLength : 72
Use OBS to push to the channel named "movie" under the application named "live" on the livego server. The push URL is: rtmp://localhost:1935/live/rfBd56ti2SMtYvSgD5xAV0YU99zampta7Z7S575KLkIZ9PYk. As before, let us first look at the WireShark capture.
At the start of the push, the client sends a publish request, which is the content of packet 36. The request must include the channel key; in this packet it is "rfBd56ti2SMtYvSgD5xAV0YU99zampta7Z7S575KLkIZ9PYk".
The server first checks whether the channel key exists and whether the stream name is already in use; if the key does not exist or the name is in use, it rejects the client's push request. Since we generated the channel key before pushing, the client can use it legitimately, so in packet 38 the server responds with "NetStream.Publish.Start", telling the client it may start pushing. Before pushing audio and video data, the client must send the audio/video metadata to the server, which is what packet 40 does. Looking at the details of that packet in the figure below, we can see that a lot of metadata is sent, including key information such as video resolution, frame rate, audio sample rate, and audio channels.
After sending the audio/video metadata, the client can start sending actual audio and video data, and the server keeps receiving it until the client sends the FCUnpublish and deleteStream commands. The main logic of the TransStart() method in stream.go is to receive the audio/video data from the pushing client, cache the latest packet locally, and finally forward the audio/video data to each pulling client. Among them, the VirReader.Read() method in rtmp.go reads a single piece of audio/video data from the pushing client; the relevant code and comments are as follows.
Part of the source code for parsing the media header information is attached below.
Parsing the audio header
Parsing the video header
3.2.2.5 Pull Stream
As the pushing client keeps pushing, pulling clients can continuously pull audio and video data through the server. Section 7.2.2.1 of the RTMP specification describes the pull process in detail. The handshake, connect, and createStream flows have been described before, so we focus on the play command.
As before, we first capture packets with WireShark. The client tells the server through packet 640 that it wants to play the channel called "movie".
Why is it called "movie" here instead of "rfBd56ti2SMtYvSgD5xAV0YU99zampta7Z7S575KLkIZ9PYk" used when pushing. In fact, these two points to the same channel, but one is used to push the stream and the other is used to pull the stream. We can get it from the source code of livego Confirm this.
After the server receives the play request from the pulling client, it responds with "NetStream.Play.Reset", "NetStream.Play.Start", "NetStream.Play.PublishNotify", and the audio/video metadata. Once this is done, it can continuously send audio and video data to the pulling client. We can deepen our understanding of this flow through the livego source code.
The pushed data is read from a chan and then sent to the pulling client.
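That chan-based hand-off can be sketched as follows: the pusher's reader goroutine sends each packet into a channel, and a loop drains it and forwards to every puller. Type and function names are illustrative, not livego's; a real server would also drop or skip slow pullers rather than block on them.

```go
package main

import "fmt"

// Frame stands in for one audio or video packet.
type Frame []byte

// fanOut drains the pusher's channel and forwards each frame to every
// pull client, then closes the pullers' channels when the push ends.
func fanOut(in <-chan Frame, pullers []chan<- Frame) {
	for f := range in {
		for _, out := range pullers {
			// Buffered channels keep this sketch from deadlocking;
			// a real server would handle slow consumers here.
			out <- f
		}
	}
	for _, out := range pullers {
		close(out)
	}
}

func main() {
	in := make(chan Frame, 2)
	p1 := make(chan Frame, 2)
	p2 := make(chan Frame, 2)
	in <- Frame{0x17} // key frame
	in <- Frame{0x27} // inter frame
	close(in)
	fanOut(in, []chan<- Frame{p1, p2})
	fmt.Println(len(p1), len(p2))
}
```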
That is the main flow of RTMP. This article does not analyze the source code for specific delivery protocols such as FLV and HLS, or for format conversion. In other words, we have only described how the RTMP server receives audio and video packets from the pushing client and distributes them to pulling clients unchanged, without additional processing. In practice, the streaming products of the major cloud vendors also support delivery protocols such as HTTP-FLV and HLS, as well as recording and playback on demand, and livego supports these too.
Due to space constraints, they are not covered here; I hope to study and share how livego handles that logic in a separate article.
4. Outlook
At present, RTMP is the benchmark protocol for live broadcasting in China and is supported by all the major cloud vendors. Its features, such as multiplexing and chunking, are an important reason why vendors choose it. Moreover, because it is an application layer protocol, large cloud vendors such as Tencent, Alibaba, and Agora also modify the protocol's implementation details, for example to mix multiple audio and video streams or to record a single stream.
However, RTMP also has shortcomings, and high latency is one of the biggest. In practice, even in a fairly healthy network environment, RTMP latency runs 3~8s, quite different from the 1~3s theoretical value quoted by the major cloud vendors. So what problems does this latency cause? Consider the following scenarios:
- In online education, a student asks a question, but the teacher has moved on to the next point before seeing it.
- In e-commerce live streaming, a viewer asks about a product and the host appears to "ignore" the question.
- After sending a tip, the viewer waits a long time before hearing the host's spoken thanks.
- When you learn from other people's cheers that a goal has been scored, will you still watch the "live" broadcast?
Live streaming has now formed an industry chain: many hosts stream as a profession, and many stream over the same corporate network. When that network's outbound bandwidth is limited, the delay of RTMP and FLV becomes even worse. High-latency broadcasts hurt real-time interaction between viewers and the host, and also hold back latency-sensitive scenarios such as live e-commerce and live education.
The following is a conventional solution based on the RTMP protocol:
Depending on actual network conditions and push settings such as the key-frame interval and push bitrate, the delay is generally around 8 seconds, and it comes mainly from two sources:
- CDN link delay. This has two parts. One is network transmission delay: there are four segments of network transmission inside the CDN, and if each segment contributes about 20ms, the four together add roughly 80ms. In addition, using RTMP frames as the transmission unit means each node must receive a complete frame before it starts forwarding downstream; and to improve concurrency, CDNs apply certain packaging optimizations that add further delay. Under network jitter the delay becomes even less controllable: with a reliable transport protocol, any jitter blocks subsequent sending until the earlier packets are retransmitted.
- Player buffer. This is the main source of delay. Public network conditions vary widely, and jitter in any link (pushing, CDN transmission, or playback reception) affects the player. To absorb upstream jitter, the conventional player strategy is to keep a media buffer of about 6s.
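Putting those assumed numbers together gives a rough latency budget. The figures below are the article's assumptions (four CDN hops at ~20ms each plus a ~6s player buffer), not measurements; frame-granularity forwarding, packaging, and jitter add a further variable overhead on top.

```go
package main

import "fmt"

// totalDelayMs sums the fixed parts of the latency budget described
// above: per-hop CDN transmission delay plus the player's buffer.
// It deliberately ignores the variable frame/packaging/jitter costs.
func totalDelayMs(hops, perHopMs, playerBufferMs int) int {
	return hops*perHopMs + playerBufferMs
}

func main() {
	// 4 hops * 20ms + 6000ms player buffer.
	fmt.Println(totalDelayMs(4, 20, 6000))
}
```

The player buffer dwarfs the network component, which is why the next paragraph focuses on eliminating the player-side delay.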
From the above, it is clear that the biggest delay in live broadcasting is on the pulling side (the player buffer), so quickly eliminating delay at this stage is an urgent problem for the major cloud vendors. This is why they have since launched new products that remove the RTMP protocol's delay, such as Tencent Cloud's "fast" live streaming and Alibaba Cloud's ultra-low-latency RTS. These products all introduce WebRTC technology, which I hope to study together in a future article.
5. Reference materials
2. AMF0
3. AMF3
7. "Hand-tearing the RTMP protocol" series
Author: vivo Internet Server Team - Xiong Langyu