Live streaming has reached every household. Taking Taobao Live as an example, how can screens be merged or switched seamlessly, without viewers noticing, while fans and anchors interact? At the LVS2022 Shanghai station, Xiao Kai, head of Alibaba Cloud's GRTN core network technology, shared the operating mechanism and applications of the GRTN core network, along with practical optimization of the QOE network model in real business.
Alibaba Cloud Global Real-time Transport Network GRTN
GRTN is Alibaba Cloud's global real-time communication network. It is built on central cloud-native and edge cloud-native infrastructure and integrates their technologies. Drawing on the design ideas of SDN, it separates the control plane from the data plane (CD separation): control is centralized, while data-plane distribution sinks to more than 2,800 nodes in Alibaba Cloud's edge cloud. GRTN has scenario-based QOS capabilities, real-time communication within 400 milliseconds, and ultra-low latency, as well as full-link RTC and dynamic networking. It is an integrated solution that supports streaming-media features such as pushing video to the cloud and distributing it, and also offers distributed computing and distributed storage.
This sharing is divided into two parts around GRTN:
- GRTN's philosophy and the capabilities it offers.
- How QOE metrics are optimized in GRTN's commercial deployments.
The philosophy of GRTN
To put it simply, the current GRTN of Alibaba Cloud is based on more than 2,800 edge nodes covering the world. We use these nodes and network resources to form a communication-level SFU transmission network.
The nodes are interconnected by dedicated lines, including lines that solve cross-continent network problems. The whole system evolved from live streaming. In the past, most CDN live-streaming networks had a tree-like structure, whereas Alibaba Cloud's GRTN is a dynamic network that combines tree and mesh topologies. GRTN currently supports a screen-to-screen delay of about 100 milliseconds, which suits scenarios such as cloud gaming and cloud rendering.
GRTN provides content transmission and distribution. Any user can push media to a GRTN node over RTP, and that content can then be distributed from GRTN to anywhere in the world. GRTN also handles dynamic networking and nearest-node access.
GRTN business model
GRTN's current commercial offering is Alibaba Cloud RTS 1.0, which many customers already use.
Alibaba Cloud developed RTS 1.0 around 2018. Its core idea is to help customers connect to GRTN and reduce latency with only limited changes on their side.
Traditional FLV live streaming has a delay of about 5 seconds, and HLS more than 20 seconds. RTS modifies the push side or the playback side; most importantly, it replaces the playback-side protocol with RTP, which brings the delay down to about 1 second. This technology was fully rolled out in Taobao Live in 2019.
After RTS 1.0, Alibaba Cloud entered the era of RTS 2.0.
In the RTS 2.0 era, our expectation for real-time streaming is that RTC and live streaming are no longer distinguished, and that all services can be built on a full-link RTP protocol. The whole link uses communication-level transmission, which is the technical philosophy of GRTN. The current RTS 2.0 already has communication-level service capabilities.
Within China, the transmission delay of RTS 2.0 is around 100 milliseconds, that is, the transmission time between nodes. The rest of the delay budget can be spent on the encoding side or on the playback side for jitter buffering. This supports one-to-one video calls, multi-person conferences, and integrated live streaming.
How to make one-to-one communication on GRTN?
Alibaba Cloud GRTN's external services include two modes: one-to-one communication and multi-person conferences.
One-to-one communication
The first access path is Alibaba Cloud's SDK, which uses GRTN's private protocol. Browsers are also supported, so GRTN's ecosystem is fully open. A user can connect a browser to GRTN through standard SDP signaling, push media in, and then selectively pull media out through GRTN. Two clients can exchange audio, video, or custom messages with GRTN in single-PC or multi-PC (PeerConnection) mode, and GRTN carries the traffic with communication-level transmission. That is one-to-one communication.
This model is not limited to communication; it can also be used for cloud rendering and cloud gaming.
Multi-person conferences
On the basis of one-to-one communication, GRTN supports multi-person conferences, as shown in the figure above.
Generally, when there are many participants, selectively subscribing to the peers' audio and video is troublesome because it involves Audio Ranking. To run such multi-person conferences, many businesses have to route audio through a dedicated Ranking Server.
GRTN provides large-scale Audio Ranking, which means it can perform Audio Ranking for any client that consumes audio on GRTN. Whatever a participant subscribes to, GRTN ranks the audio within that participant's subscriptions, without involving a Ranking Server and without adding delay.
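As a rough illustration of per-subscriber Audio Ranking, the sketch below picks the loudest streams from only the streams one participant has subscribed to. The function name `rank_audio`, the `top_n` parameter, and the use of a single loudness value per stream are assumptions made for this example; the talk does not describe how GRTN actually scores audio.

```python
from typing import Dict, List, Set


def rank_audio(subscribed: Set[str], levels: Dict[str, float], top_n: int = 3) -> List[str]:
    """Return the loudest `top_n` streams among those this subscriber subscribed to.

    `levels` maps stream id -> current audio level (hypothetical 0.0-1.0 scale).
    Streams the subscriber has not subscribed to are never considered.
    """
    candidates = [(levels.get(stream, 0.0), stream) for stream in subscribed]
    # Loudest first; ties are broken by stream id so the order is deterministic.
    candidates.sort(key=lambda item: (-item[0], item[1]))
    return [stream for _, stream in candidates[:top_n]]


# Hypothetical audio levels for four conference participants.
levels = {"alice": 0.82, "bob": 0.10, "carol": 0.55, "dave": 0.91}

# This subscriber did not subscribe to "dave", so "dave" is not ranked for them.
print(rank_audio({"alice", "bob", "carol"}, levels, top_n=2))
```

Because the ranking runs per subscriber, two participants with different subscriptions can receive different ranked sets from the same audio levels.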
Another important capability of GRTN is stream switching. GRTN can replace the media delivered to any viewer, which is a core capability in cloud-side Lianmai (co-streaming) scenarios. A viewer watching through GRTN in a browser can have the picture switched by a stream-switching instruction without noticing anything.
With this stream-switching capability, GRTN can switch the media for all viewers of a given anchor in real time, from picture A to picture B or from anchor A to anchor B, and the audience does not notice the switch.
How to use stream switching to implement cloud-side Lianmai merging?
Consider a Lianmai session in which two anchors, A and B, are connected. Viewers are watching anchor A when the connection starts, and the picture they see becomes a combined picture of A and B in real time. This is easy to implement with client-side merging: anchor A composites the picture on the device, and what the audience sees changes accordingly. In some scenarios client-side merging is not possible, for example when the device is not powerful enough, and cloud-side merging is required.
As shown in the figure above, an anchor pushes a stream to GRTN and a viewer watches it. When the anchor starts a Lianmai session with a fan, the business side's merging server combines the two media streams into one.
At this point the picture has to be switched for every viewer. The capability we provide for this is the switch instruction, that is, the stream-switching capability described above. Once the switch instruction reaches GRTN, GRTN switches all of the anchor's viewers to the merged stream without them noticing.
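The effect of a switch instruction can be sketched as a change in a subscription table: every viewer of the anchor's original stream is re-pointed to the merged stream, while viewer connections stay as they are. The class `SwitchTable`, its methods, and the stream names are hypothetical and only illustrate the idea; they are not GRTN's actual interface.

```python
from collections import defaultdict
from typing import Dict, Set


class SwitchTable:
    """Toy model of which upstream stream each viewer is currently fed from."""

    def __init__(self) -> None:
        self._source_of: Dict[str, str] = {}
        self._viewers_of: Dict[str, Set[str]] = defaultdict(set)

    def subscribe(self, viewer: str, stream: str) -> None:
        # Drop any previous subscription before recording the new one.
        previous = self._source_of.get(viewer)
        if previous is not None:
            self._viewers_of[previous].discard(viewer)
        self._source_of[viewer] = stream
        self._viewers_of[stream].add(viewer)

    def switch_stream(self, old_stream: str, new_stream: str) -> int:
        """Re-point every viewer of `old_stream` to `new_stream`; return how many moved."""
        moved = self._viewers_of.pop(old_stream, set())
        for viewer in moved:
            self._source_of[viewer] = new_stream
        self._viewers_of[new_stream] |= moved
        return len(moved)

    def source_of(self, viewer: str) -> str:
        return self._source_of[viewer]


table = SwitchTable()
table.subscribe("viewer_1", "anchor_a")
table.subscribe("viewer_2", "anchor_a")
table.subscribe("viewer_3", "anchor_b")

# The merging server has produced "mix_ab"; one instruction moves all of anchor A's viewers.
moved = table.switch_stream("anchor_a", "mix_ab")
print(moved, table.source_of("viewer_1"), table.source_of("viewer_3"))
```

Only the mapping from viewer to source changes, which is why the switch can be invisible to the audience in this model.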
This capability is currently the basis for fully integrating live streaming and Lianmai for Taobao Live on GRTN.
It is also a general solution. As GRTN and the upcoming RTS 2.0 service are offered externally, this capability will be opened up directly.
At present, all Taobao Live traffic runs on GRTN, and the delay between audience and anchor in any live stream is generally within 1 second. This is a typical business scenario for GRTN on RTS 2.0 today.
QOE overview and optimization difficulties
Our QOE optimization is based on data from Alibaba Cloud's external customers. Why talk about QOE rather than QOS?
Because in working with customers we found that QOE is usually a set of metrics defined by the customer, such as penetration rate, viewing time, and business conversion rate. Good QOS metrics do not guarantee that these QOE metrics improve.
For example, in GRTN customer scenarios, our first-frame time, stall duration per 100 seconds, latency, and image quality lead across the board. (The QOS of RTS is better than FLV in every respect, so a comparison with HLS is hardly needed.)
However, across customers, QOE improves for some and has problems for others. After moving from traditional FLV to RTS and RTS 2.0, some customers have not adapted their clients well, or have not yet tuned the fit with their business scenario, and run into problems.
For example, in WebRTC communication the player's buffer can be managed very aggressively, but in live streaming the viewing experience may matter more than aggressive delay control, so it is more important to strike a balance.
Along the way, we found that customers sometimes get every QOS metric right and still spend a lot of time on QOE. So what method should be used to get QOE right?
This is where Alibaba Cloud continues to invest. Doing QOE well requires business input; without business input and business feedback, QOE work will go in the wrong direction. Alibaba Cloud has therefore kept investing in business-based, data-driven technology.
The most important point is client data. We believe the server is not in a position to judge QOE; only the client and the business can say whether their QOE has improved. GRTN's approach is therefore to obtain desensitized (anonymized) data from the business side first, and then work on QOE.
GRTN QOE Optimization Concept
One principle of GRTN's QOE optimization is imperceptible link switching.
Internally GRTN is an all-SFU network, so the upstream path can be switched at any time without the audience noticing. There are also strongly real-time primary and backup links. Many live streaming and communication scenarios call for high-assurance operation, that is, a strongly real-time dual-path guarantee: if the path between nodes has a problem, traffic switches immediately to another node path, and the audience does not notice at all.
There is also a mobility solution between GRTN nodes and clients. For example, if a node has network problems, or the client's network changes from WiFi to 4G, the mobility solution switches nodes almost instantly, and downstream consumers on GRTN are unaffected.
Another method of GRTN to optimize QOE is the programmable strategy.
Programmability is something Alibaba Cloud has built over the past year. Traditional QOS tuning, such as enabling BBR, GCC, or another congestion control algorithm, works by pushing down a batch of configurations, and those configurations are all switches.
GRTN, by contrast, can now execute programmable policy modules directly at the edge. CDNs have comparable programmability in the form of edge scripts; GRTN is similar but goes further. Policies can be pushed to nodes as code and executed there, directly controlling frame-sending and packet-sending logic, intervening in retransmission logic, and programming the behavior of GRTN toward each client. In other words, the system is configured through policies: code is delivered directly, without a software release or upgrade.
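To make the idea of "policy as code" concrete, here is a minimal sketch in which a node receives policy source text and starts using it without being redeployed. Everything in it is an assumption for illustration: the loader `load_policy`, the required function name `policy`, the statistic `loss_rate`, and the returned keys `max_retransmits` and `pacing_ms`. The talk does not say which language or sandbox GRTN uses, and a real system would need to isolate untrusted code rather than call `exec` directly.

```python
from typing import Any, Callable, Dict

PolicyFn = Callable[[Dict[str, float]], Dict[str, Any]]


def load_policy(source: str) -> PolicyFn:
    """Compile policy source text and return the function it defines.

    The source must define a callable named `policy` (a convention invented here).
    """
    namespace: Dict[str, Any] = {}
    exec(compile(source, "<policy>", "exec"), namespace)
    candidate = namespace.get("policy")
    if not callable(candidate):
        raise ValueError("policy source must define a callable named 'policy'")
    return candidate


# Hypothetical policy text as it might be pushed to an edge node.
POLICY_V1 = '''
def policy(stats):
    # Retransmit more when loss is high; otherwise send with tighter pacing.
    if stats["loss_rate"] > 0.05:
        return {"max_retransmits": 3, "pacing_ms": 10}
    return {"max_retransmits": 1, "pacing_ms": 5}
'''

active_policy = load_policy(POLICY_V1)
print(active_policy({"loss_rate": 0.08}))
print(active_policy({"loss_rate": 0.01}))
```

Swapping in a new policy is then just another call to `load_policy` with new text, which is the property that lets iterations happen without a software release.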
With more than 2,800 nodes, frequent software upgrades are impractical, but GRTN's programmability allows several policy iterations a day. Combined with client data, this closes the data loop: the client collects QOE data and feeds it back to GRTN, and the GRTN tuning engineers know how to optimize further.
The picture shows GRTN's randomized multi-scenario configuration, which is also built on the large volume of business data running online at Alibaba Cloud.
For example, Alibaba Cloud's online configuration management system issues sets of configurations, which is the basis for A/B testing. The system pushes n groups of configurations, targeted at a given domain name, to all edge nodes in the network in real time.
Suppose three groups of configurations are pushed for one domain name at the same time, each with a weight for randomization. For example, if Alibaba Cloud considers conf_1 a high-risk configuration (a new, risky feature), conf_1 is assigned 1% of the network's traffic for the A/B test. Once the configurations reach a node, every consumer that comes to GRTN for content triggers a weighted random selection: there is some probability of using conf_1 and some probability of using each of the other two.
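The weighted random selection described above can be sketched with the standard library. The 1% share for conf_1 comes from the text; the 49% and 50% shares for conf_2 and conf_3, and the function name `pick_config`, are assumptions made for this example.

```python
import random
from collections import Counter
from typing import Dict

# Weights per configuration group for one domain name.
# conf_1 = 1% is from the talk; the other two weights are made up.
CONFIG_WEIGHTS: Dict[str, int] = {"conf_1": 1, "conf_2": 49, "conf_3": 50}


def pick_config(weights: Dict[str, int], rng: random.Random) -> str:
    """Choose one configuration group for a new playback, proportionally to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[name] for name in names], k=1)[0]


# Simulate 100,000 playback requests arriving at a node.
rng = random.Random(2022)  # fixed seed so the simulation is repeatable
counts = Counter(pick_config(CONFIG_WEIGHTS, rng) for _ in range(100_000))
for name in CONFIG_WEIGHTS:
    print(name, counts[name])
```

With these weights, roughly one playback in a hundred lands on the high-risk group, which limits the blast radius of a bad configuration while still gathering data on it.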
After this first step, multiple groups of configurations run online at the same time. How do we collect the results?
The simple way is for the client to record our trace_id. In GRTN, a trace_id identifies one playback by a client, and any two playbacks have different IDs.
Another way is for the client to include a session ID in its request parameters, so that the client's session ID maps to a trace_id in GRTN. We also record which conf was used for that playback. Then, using the session ID, we can look up the QOE result for that playback in the client's instrumentation logs.
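The bookkeeping in this step can be pictured as a small registry kept on the server side: each playback gets a fresh trace_id, and the record ties it to the client's session ID and the configuration that was chosen. The names `PlaybackRecord` and `PlaybackRegistry` and the use of a random UUID as the trace_id are assumptions for illustration only.

```python
import uuid
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class PlaybackRecord:
    trace_id: str    # unique per playback, generated on the server side
    session_id: str  # supplied by the client in its request parameters
    conf_id: str     # configuration group used for this playback


class PlaybackRegistry:
    """Toy in-memory store mapping a client's session ID to its playback record."""

    def __init__(self) -> None:
        self._by_session: Dict[str, PlaybackRecord] = {}

    def start_playback(self, session_id: str, conf_id: str) -> PlaybackRecord:
        record = PlaybackRecord(uuid.uuid4().hex, session_id, conf_id)
        self._by_session[session_id] = record
        return record

    def lookup(self, session_id: str) -> PlaybackRecord:
        return self._by_session[session_id]


registry = PlaybackRegistry()
first = registry.start_playback(session_id="sess-001", conf_id="conf_2")
second = registry.start_playback(session_id="sess-002", conf_id="conf_1")

# Later, a client-side log line that carries "sess-002" can be tied back to conf_1.
print(registry.lookup("sess-002").conf_id, first.trace_id != second.trace_id)
```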
GRTN Horse Racing System
The next step is to associate the data. After a playback on GRTN ends, the player writes an instrumentation log. The core logged fields are the first-frame time, the rendering stall time per 100 seconds, and the playback duration.
From the business side's log, we know how long a given session ID played and what its metrics were. On the GRTN side, we know which trace_id was issued, how much buffer depth was allocated for that playback, and what its packet loss statistics were.
The customer's client logs are collected and sent to us alongside the server logs. GRTN's data analysis system joins the two on session ID and trace_id, producing for every playback the network conditions seen by the server together with the client's first-frame time, rendering stall time per 100 seconds, and playback duration. Joining the two data sets ties each client's experience to the server's behavior.
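The join between server logs and client logs can be sketched as a lookup keyed on the session ID. The field names below (`buffer_ms`, `loss_rate`, `first_frame_ms`, `stall_ms_per_100s`, `play_seconds`) are invented for this example; the talk names the metrics but not the log schema.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def join_logs(server_logs: List[Row], client_logs: List[Row]) -> List[Row]:
    """Merge server-side and client-side rows that share a session_id.

    Server rows without a matching client row are skipped, since no QOE
    result is available for them.
    """
    client_by_session = {row["session_id"]: row for row in client_logs}
    joined: List[Row] = []
    for server_row in server_logs:
        client_row = client_by_session.get(server_row["session_id"])
        if client_row is None:
            continue
        joined.append({**server_row, **client_row})
    return joined


# Hypothetical server-side rows: what the node knows about each playback.
server_logs = [
    {"session_id": "s1", "trace_id": "t-100", "conf_id": "conf_1", "buffer_ms": 300, "loss_rate": 0.02},
    {"session_id": "s2", "trace_id": "t-101", "conf_id": "conf_2", "buffer_ms": 500, "loss_rate": 0.00},
    {"session_id": "s3", "trace_id": "t-102", "conf_id": "conf_2", "buffer_ms": 500, "loss_rate": 0.01},
]
# Hypothetical client-side rows: what the player measured (s3 never reported).
client_logs = [
    {"session_id": "s1", "first_frame_ms": 420, "stall_ms_per_100s": 150, "play_seconds": 95},
    {"session_id": "s2", "first_frame_ms": 380, "stall_ms_per_100s": 0, "play_seconds": 240},
]

joined = join_logs(server_logs, client_logs)
for row in joined:
    print(row["trace_id"], row["conf_id"], row["loss_rate"], row["play_seconds"])
```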
After the association is done, the next step is to build a horse racing system.
For any configuration, as in all of Alibaba Cloud's customer tuning, we discuss the tuning with the customer in advance.
Taking a customer's online business as an example: conf_1 is a high-risk feature, conf_2 tunes parameters of an existing feature such as BBR, and conf_3 might enable GCC. The configurations are pushed to the nodes. After clients play, the client and server data are collected as in the previous two steps and uploaded to GRTN, where the A/B results are analyzed together.
At this point, R&D engineers can see clearly how each group of configurations performed and how they differ. They know how to tune further, which configurations to eliminate, and which good configurations to build on. This is the value of the horse racing system: it enables continuous iteration based on combined client-side and server-side data.
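A minimal sketch of the comparison step: group joined playback rows by configuration, average the metrics, and rank the groups by average playback time. The function name `race`, the field names, the sample numbers, and the choice to rank by playback time alone are assumptions for illustration; the talk does not describe the actual scoring formula.

```python
from collections import defaultdict
from statistics import mean
from typing import Any, Dict, List

Row = Dict[str, Any]


def race(rows: List[Row], min_samples: int = 1) -> List[Row]:
    """Aggregate per-configuration metrics and sort by average playback time, best first."""
    by_conf: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        by_conf[row["conf_id"]].append(row)

    results: List[Row] = []
    for conf_id, group in by_conf.items():
        if len(group) < min_samples:
            continue  # too little data to compare this group fairly
        results.append({
            "conf_id": conf_id,
            "samples": len(group),
            "avg_play_seconds": mean(r["play_seconds"] for r in group),
            "avg_first_frame_ms": mean(r["first_frame_ms"] for r in group),
        })
    results.sort(key=lambda r: r["avg_play_seconds"], reverse=True)
    return results


# Made-up joined rows (one per playback) for three configuration groups.
rows = [
    {"conf_id": "conf_1", "play_seconds": 120, "first_frame_ms": 400},
    {"conf_id": "conf_1", "play_seconds": 100, "first_frame_ms": 420},
    {"conf_id": "conf_2", "play_seconds": 90, "first_frame_ms": 380},
    {"conf_id": "conf_2", "play_seconds": 95, "first_frame_ms": 390},
    {"conf_id": "conf_3", "play_seconds": 60, "first_frame_ms": 500},
]

ranking = race(rows, min_samples=2)
for entry in ranking:
    print(entry["conf_id"], entry["samples"], entry["avg_play_seconds"])
```

The `min_samples` guard reflects the point that a group needs enough playbacks before its numbers are worth comparing; here conf_3 is excluded because it has a single row.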
The picture shows the horse racing system. As a whole, it has the node network of GRTN, and the data reported by the service client is connected with the log system of GRTN, so as to cooperate with each other.
GRTN QOE optimization case
This is an example of GRTN optimization: scoring in the horse racing system.
At the time we ran 4 groups of experiments. normal is the configuration used in daily operation, radial is a very aggressive configuration, and reference is the baseline that radial is compared against. As shown in the figure, the results are displayed along six dimensions and given an overall score according to our criteria.
More detailed results are in the table above. After the conf_id values mentioned earlier are assigned and the experiment runs, we obtain data such as the success rate and the instant-start rate. This is what GRTN's horse racing system currently shows.
The success rate and instant-start rate are QOS metrics, while the final average playback time is a QOE metric.
The radial group had the best data in our test, with an advantage of roughly 1 second in playback time. The data covers 24 hours and is on the order of hundreds of thousands of playbacks, which we consider enough to support an A/B conclusion. GRTN first built this system for the mobile shopping scenario, where business volume is large, so we first validated it on the full volume of online shopping traffic. It can now run directly on external customers' data, so that Alibaba Cloud's programmability, client data collection, and horse racing form a closed loop.
With the current method, optimizing a policy means pushing a set of configurations. For example, we push a set of configurations, run it through one evening peak, and have results the next day, which is a big advantage for iteration speed.
For example, in March of this year, while tuning playback duration for a customer, we analyzed client behavior, including test data, and found that the customer's audio and video synchronization might have problems. How could this be solved?
We believed that adjusting the server's frame-sending policy would help the client synchronize audio and video better. We programmed the policy and pushed it out, and the results the next day were very good: audience playback time for that configuration group increased, which is a QOE improvement.
That completed the first round of iteration and confirmed that this approach is right. The next step is to tune parameters further along the same path.
The above is compiled from the full text of the speech of LVS2022 "Global Real-time Transmission Network GRTN-QOE Optimization Practice".
"Video Cloud Technology", your most noteworthy public account of audio and video technology, pushes practical technical articles from the frontline of Alibaba Cloud every week, where you can communicate with first-class engineers in the audio and video field. Reply to [Technology] in the background of the official account, you can join the Alibaba Cloud video cloud product technology exchange group, discuss audio and video technology with industry leaders, and obtain more latest industry information.