A few years ago, many people were unfamiliar with online courses. With the spread of mobile devices and advances in audio and video technology, online education products are now flourishing. Serving millions of students online would not be possible without streaming media distribution technology. LiveVideoStackCon 2021 Beijing, the Audio and Video Technology Conference, invited Zhou Xiaotian, an R&D engineer at NetEase Youdao, to share how NetEase Youdao handles streaming media distribution in its online education business.
Text | Zhou Xiaotian
Compiled | LiveVideoStack
Hello everyone, I am from the R&D team of NetEase Youdao Excellent Course. Audio and video now attract wide attention across industries, "live broadcast +" has become a hot topic, and the major vendors have launched a series of audio and video services.
NetEase Youdao is an intelligent learning company whose mission is to help learners achieve "efficient learning". Relying on AI and other Internet technologies, it has built a series of well-loved learning products and services around learning scenarios. Besides online education platforms for various scenarios, these include market-leading software and hardware learning tools such as Youdao Dictionary and Youdao Dictionary Pen.
Among them, online education is an important business that grew out of the continued maturing of audio and video technology.
Audio and video technology covers a wide range of topics, has a long processing chain, and every point along it runs deep. Today's talk is therefore grounded in Youdao's online education business and focuses on the streaming media distribution server built by the Youdao team.
Today's content has three parts: an introduction to Youdao's online education business, the evolution of the distribution system architecture, and our thinking and practice on the hard problems of distribution.
1. Introduction to online education business
First, let us look at the forms that live online education takes, to understand the requirements and clarify what a media distribution server needs to consider.
Different class types correspond to different requirements. Around 2013, 1v1 courses and ordinary small classes appeared first; in essence they are educational products built on the RTC (real-time communication) model. Later, game and entertainment live streaming became familiar to everyone, while the best-known form of online learning at that stage was video on demand, such as NetEase Open Class. As audio and video technology matured and users' expectations of online education rose, live online courses developed rapidly. Live classes appeared around 2014 and received unprecedented attention after the pandemic.
A traditional large live class is a one-way push from the teacher. In an interactive large class, students can also interact with the teacher for a better class experience. Student mic-linking, the screen or whiteboard, the teacher's video, and interactive messages make up the main content of a lesson.
The interactive small class further improves interactivity and strengthens students' sense of participation, learning experience, and learning outcomes. The combination of audio and video, H5 interactive components, and flexible layout requirements also brings additional complexity.
To design a service for a business, you need to understand how the businesses differ before choosing the corresponding technology. Here is one way of thinking about it. Take an interactive large class as an example: a teacher and a student are mic-linked, and their exchange is then distributed to the other students. For streaming distribution, some considerations are listed on the right: What level of latency and smoothness is required? What scale must be supported? What media quality is required? How sensitive is this line of business to the cost of the solution?
Going further, different course forms can be compared side by side in this way, and more detailed requirements can be derived from their differences.
For example, compare the large live class with the interactive large class. For a session of size M, the large live class distributes one person's content to the other M-1 people, which can be done with CDN-based live video. Adding mic-linking (Lianmai) to the product turns it into an interactive large class, and the simplified model now has two parts: a small group that exchanges media in both directions, and one-way distribution to everyone else. How can both needs be met in one classroom? The simplest idea is to keep the original CDN distribution, exchange the mic-linked content over RTC, and then distribute the result through the original CDN system. However, this causes problems such as content latency and delay when a user switches between the two systems.
Compare the interactive large class with the (online plus offline) dual-teacher class. Although the models are similar, one "student end" in a dual-teacher class may stand for all the students in an offline classroom, which raises the cost of a failure in one-way distribution. Differences like this require the system to configure different strategies for different scenarios.
Beyond online education, the same side-by-side comparison can be used to analyze business lines in other scenarios, such as ordinary small classes and in-game team voice chat. Team voice chat looks similar to an ordinary small class that sends only voice, but it has stricter requirements on performance and network usage: it must take as little of the game's bandwidth as possible and keep CPU usage to a minimum so that the game has enough computing power. If the RTC interface built for small classes were used directly for games, it would guarantee call quality at the expense of the game. If a single system is expected to support multiple services, the business differences and design requirements must be identified early in system design.
From the analysis above, we can list the main requirements that the online education business places on a media distribution system. First, low-latency distribution and low-latency mic-linking. Second, large-scale distribution. Third, higher stability and availability than some entertainment scenarios require. Fourth, cost control. Finally, different students and classrooms attend class in different settings, so multi-terminal access must be supported.
2. Evolution of distribution architecture
When multiple business lines are rolled out at the same time, every course form, from 1v1 and small classes to large live classes, interactive large classes, and interactive small classes, affects how the distribution system evolves. One approach is to let the distribution architecture grow more complex and support more features as the business evolves. Youdao did not take this approach. Instead, it switched all services directly from CDN-based distribution to a real-time communication network (RTN), with no intermediate transitional architecture.
Below we briefly review some distribution architectures as background.
The tree structure of CDN-based live content distribution is very clear. The structure itself determines how data is routed, and it makes maintenance, risk control, and cost control easy. Once a user selects an edge node for access, the distribution route of the media data is already determined. It also has shortcomings: it supports only one-way distribution, and the protocol introduces a fixed delay.
To increase interactivity and reduce the latency of live streams deployed in CDN mode, two optimizations were made on top of the CDN architecture. First, the edge pull nodes support RTC access (labeled as RTN edge nodes in the figure), which removes the delay caused by the media encapsulation protocol, makes IM interaction feel more responsive, and improves resistance to weak networks. Second, to increase interactivity further, a bypass RTC system was added to support two-way mic-linking, and the mic-linked content is then relayed to the CDN network to complete the live broadcast. Some "low-latency CDN live streaming" products work on this principle.
As just mentioned, the bypass RTC system used for mic-linking has to relay its content to the CDN distribution network. Could this system take over large-scale distribution from the CDN as well? This leads to the pure RTN architecture. It no longer has a distinct tree-shaped distribution structure but uses a mesh topology to distribute everything. Any client that is pulling a one-way stream can switch to two-way communication at any time without first switching systems.
From the analysis above, we can roughly summarize the direction in which live streaming media distribution is evolving in the industry: the boundary between CDN and RTC networks for live audio and video is blurring, and the two are gradually merging. Live CDN vendors, starting from one-way large-scale distribution, have gradually added low-latency access and mic-linking. RTC products, in order to serve thousands or tens of thousands of people at the same time, have gradually made their distribution networks more complex. NetEase's WE-CAN distributed transmission network, Alibaba Cloud's GRTN streaming media bus, and other "X-RTN" networks are the results of this evolution.
The architectures just mentioned are mainly products of ToB vendors. In ToC service scenarios, there is also the architecture shown in the figure above, in which one media server bridges two distribution networks to provide service, especially when a self-developed system and a third-party system are connected at the same time. This structure brings new non-functional properties, but it also carries considerable risk. Youdao did not use a similar architecture as a transition; it replaced the original functions directly with the RTN distribution network.
This architecture can meet the needs of various scenarios and supports access from multiple kinds of push and pull streaming clients. For example, when students attend an open class, it is most convenient to watch directly in a WeChat mini program or a browser. Users who already have the course app and are taking a series of courses can join through the app for the best experience.
In the CDN architecture, the topology itself determines the data distribution route. The RTN mesh topology brings flexibility but also adds complexity. For example, routes cannot be read directly off the topology; an additional dispatch center is needed to compute and plan routes and to schedule the corresponding forwarding resources. This is why the dispatch center is so important in an RTN architecture.
The figure also contains a CDN bypass. Its main role is to absorb load for courses with unusually large bursts of access, which makes the system more elastic.
Youdao favors flexibility when designing the network node topology. On the one hand, the distribution nodes are not divided into tiers or levels; the topology is flat. On the other hand, the distribution characteristics of the network can be changed by configuring different attributes and roles on the nodes.
3. Thinking and practice of distribution difficulties
A streaming media distribution system has four main problems to solve: access, network connectivity, route establishment, and forwarding. Beyond these, I would also like to share our thinking on layered design and on channels.
The core idea in solving the access problem is "nearest" access, where the access point with the best network quality counts as the "nearest". (Different types of services may think about this differently: in Youdao's teaching scenario, we try to give every user the best possible experience, much like a greedy algorithm, whereas in other services the idea may be only to meet a minimum QoS limit.) The most intuitive method is to recommend access points based on IP and location. Historical data from network probes and connections to the different gateways is then used to refine the recommendations. Because prior knowledge gathered from online and offline statistics cannot cover every special case, Youdao also supports manual configuration. Manual hot configuration has proven very effective in some ToC scenarios.
The lower right corner shows a plot of the uplink packet loss rate of a large-class teacher. It shows regular packet loss averaging about 9%. This teacher had long broadcast from a fixed location with fixed equipment, technical support colleagues had checked the network early on, and the network had always been good. The teacher's location had not changed, the network had not changed, and the recommendation database had not changed much, so the algorithm gave the same recommendation every time. We presume that the sudden, regular packet loss appeared because the operator had identified and classified the traffic pattern and was restricting it by policy.
In a situation like this, modifying the algorithm does not help. With hot configuration, the configuration can be changed manually as soon as a problem is found and reported. The next time the teacher connects, the affected access node is avoided, which solves the packet loss problem.
We implement this through a "filter" mechanism: if all accessible nodes form a pool, the result that remains after filtering is the list recommended to the client for access. The calculation of the filtering rules is written into the system as an algorithm, and the parameters the algorithm uses are stored in the database as hot-updatable data.
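The filter idea can be sketched in a few lines of Python. This is only an illustration under assumptions, not Youdao's implementation: the `Node` fields, the ranking rule, and the `blocked_nodes` and `max_results` parameters are all hypothetical, and the `params` dictionary stands in for the hot-updatable data that the talk says is stored in a database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    isp: str
    region: str
    score: float  # hypothetical historical quality score, higher is better


def recommend(pool, user, params):
    """Filter the node pool and return an ordered list of access recommendations."""
    # Rule 1 (manual hot configuration): drop nodes blocked for this user.
    blocked = set(params.get("blocked_nodes", {}).get(user["id"], []))
    candidates = [n for n in pool if n.name not in blocked]

    # Rule 2 (prior knowledge): prefer same ISP, then same region, then score.
    def rank(node):
        return (node.isp == user["isp"], node.region == user["region"], node.score)

    candidates.sort(key=rank, reverse=True)
    return [n.name for n in candidates[: params.get("max_results", 3)]]


pool = [
    Node("bj-telecom-1", "telecom", "beijing", 0.92),
    Node("bj-unicom-1", "unicom", "beijing", 0.90),
    Node("sh-telecom-1", "telecom", "shanghai", 0.88),
]
teacher = {"id": "teacher-42", "isp": "telecom", "region": "beijing"}

# Parameters live outside the algorithm so they can be changed without a release.
params = {"max_results": 2, "blocked_nodes": {}}
before = recommend(pool, teacher, params)

# Hot update: the node showing regular packet loss is excluded for this teacher.
params["blocked_nodes"]["teacher-42"] = ["bj-telecom-1"]
after = recommend(pool, teacher, params)

print(before)
print(after)
```

The algorithm itself is unchanged between the two calls; only the data it reads changes, which is the point of keeping the rules in code and the parameters in hot-updatable storage.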
Access only solves the problem of entering the distribution network. What, then, is the topology of the distribution network itself? This is a matter of the connectivity design of the network nodes. Youdao's network has a flat topology, and each data center is one peer node in it. In theory, connections could be established between all nodes to form a full mesh. Such a network would be extremely flexible, any path could be planned, and actual route selection would depend entirely on the algorithm. Youdao does not take this approach.
We still bring in some human experience, for example by removing the connectivity between certain data centers so that the network becomes a non-full-mesh structure. This can be seen as manual pruning and organization. Beyond connectivity, route calculation needs weights, which means quantitatively describing the differences between node connections. This quantification is based on regular QoS probing. As with access selection, the algorithm may not handle every case or certain special cases well, so in addition to quantitative differences we also describe qualitative differences through configurable attributes, which makes the topology more flexible.
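As a rough sketch of how pruned connectivity and probe-based weights could fit together, the Python below builds a weighted graph from probe results. Everything specific here is an assumption made for illustration: the cost formula, its coefficients, the `cross_isp` attribute, and the node names are hypothetical and are not taken from the talk.

```python
def edge_weight(rtt_ms, loss_rate, jitter_ms, attrs=None):
    """Fold probe results into one cost, lower is better (illustrative formula)."""
    attrs = attrs or {}
    cost = rtt_ms + 10.0 * jitter_ms + 1000.0 * loss_rate
    if attrs.get("cross_isp"):
        cost *= 1.5  # qualitative difference expressed as a configurable penalty
    return cost


def build_graph(probes, disabled_links):
    """Build an undirected weighted graph, skipping manually pruned links."""
    graph = {}
    for (a, b), probe in probes.items():
        if frozenset((a, b)) in disabled_links:
            continue  # pruned by configuration, so the mesh is not full
        w = edge_weight(probe["rtt_ms"], probe["loss"], probe["jitter_ms"],
                        probe.get("attrs"))
        graph.setdefault(a, {})[b] = w
        graph.setdefault(b, {})[a] = w
    return graph


# Hypothetical results of a periodic QoS probe between data centers.
probes = {
    ("bj", "sh"): {"rtt_ms": 30, "loss": 0.01, "jitter_ms": 2},
    ("bj", "gz"): {"rtt_ms": 45, "loss": 0.0, "jitter_ms": 1,
                   "attrs": {"cross_isp": True}},
    ("sh", "gz"): {"rtt_ms": 25, "loss": 0.02, "jitter_ms": 3},
}
disabled_links = {frozenset(("sh", "gz"))}

graph = build_graph(probes, disabled_links)
print(graph)
```

Quantitative differences enter through the probe numbers, while qualitative differences and manual pruning enter through configuration, so either can be changed without touching the other.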
The reason for improving flexibility and supporting manual configuration is to meet the differing needs of different businesses. The price is added complexity. So perhaps there is no best architecture, only a more suitable one.
Once the access locations are determined (which fixes the start and end points of distribution) and the connectivity of the distribution network is established, the next problem is route planning, or scheduling. There are three points of practice and thinking to share here: single-route planning, multi-path distribution, and cost control. Planning a single route is the basis of data distribution. We calculate route weights from the network QoS measurements, which are probed and refreshed dynamically, together with the current node status and node configuration. Given this undirected weighted graph, a start point, and an end point, a shortest distribution route can be planned.
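The talk does not say which shortest-path algorithm is used. As one common choice, the sketch below applies Dijkstra's algorithm to a small undirected weighted graph; the node names and weights are made up for illustration.

```python
import heapq


def shortest_path(graph, src, dst):
    """Dijkstra's algorithm on a graph given as {node: {neighbor: weight}}."""
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == dst:
            break
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None, float("inf")
    # Walk back from the destination to rebuild the route.
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    path.reverse()
    return path, dist[dst]


# Hypothetical topology: two edge nodes and two relay nodes.
graph = {
    "edge-teacher": {"relay-1": 20, "relay-2": 50},
    "relay-1": {"edge-teacher": 20, "relay-2": 15, "edge-student": 60},
    "relay-2": {"edge-teacher": 50, "relay-1": 15, "edge-student": 20},
    "edge-student": {"relay-1": 60, "relay-2": 20},
}

path, cost = shortest_path(graph, "edge-teacher", "edge-student")
print(path, cost)
```

In a real dispatch center the weights would be refreshed as probes come in, so the same query can return a different route over time.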
With access solved, connectivity defined, and media distribution routes planned, it might seem that the distribution task can be completed. For Youdao's business requirements, however, this is not enough. To further protect the user experience, the distribution network must be made more resistant to jitter and packet loss, and multi-path distribution is one way to ensure this. Youdao's distribution network has three kinds of path: the main path, the alternate path, and the real-time path. The main path is used directly for service distribution. The alternate path is the backup of the main path; it is generated when the main path is planned and is switched to when the main path becomes abnormal. The real-time path is a redundant distribution path established in addition to the main path to provide stronger resistance to jitter and packet loss, which is valuable for critical tasks and large-scale distribution tasks.
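The difference between the alternate path (switch on failure) and the real-time path (send redundantly all the time) can be illustrated with a small Python sketch. The class and function names, the health flag, and the use of sequence numbers for deduplication are assumptions made for this example; the talk does not describe the actual mechanism.

```python
class PathSelector:
    """Alternate path: planned in advance, used only when the main path fails."""

    def __init__(self, main, alternate):
        self.main = main
        self.alternate = alternate
        self.active = main

    def report(self, main_healthy):
        self.active = self.main if main_healthy else self.alternate
        return self.active


def merge_redundant(main_packets, realtime_packets):
    """Real-time path: the same packets arrive over two paths; keep one copy each."""
    seen = set()
    merged = []
    for seq, payload in sorted(main_packets + realtime_packets):
        if seq not in seen:
            seen.add(seq)
            merged.append((seq, payload))
    return merged


selector = PathSelector(
    main=["edge-a", "relay-1", "edge-b"],
    alternate=["edge-a", "relay-2", "edge-b"],
)
active = selector.report(main_healthy=False)

# The main path lost packet 2 and the real-time path lost packet 4.
main_rx = [(1, "a"), (3, "c"), (4, "d"), (5, "e")]
realtime_rx = [(1, "a"), (2, "b"), (3, "c"), (5, "e")]
merged = merge_redundant(main_rx, realtime_rx)
print(active)
print(merged)
```

Redundant sending recovers losses without waiting for a switch, at the price of extra bandwidth, which is consistent with reserving it for critical or large-scale distribution tasks.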
Take the orange line in the figure as an example. The edges are three single-line data centers belonging to China Mobile, China Unicom, and China Telecom. In addition to the main path, a real-time path can be established between the China Unicom data centers at the two edges, which lowers the cost of the backup line when real-time backup is used.
After the control center finishes planning the data distribution path, the nodes along the route have to carry out the forwarding. This involves the design of a high-performance streaming media distribution server. The figure above shows the threading model of Youdao's forwarding server. Each protocol and port corresponds to its own thread, so that multi-core resources are used as fully as possible when the number of ports is limited.
Besides the IO threads, each bound to one protocol-port pair, there is a core thread that routes packets between the different access points. For example, one streaming user connects through port A1 of protocol A (say, UDP on port 3000), and another streaming user in the same session connects through port B1 of protocol B (say, TCP on port 4000). Under the IO thread model these two users cannot be assigned to the same thread, so data has to be forwarded across threads. The core thread forwards the contents of the receive queue to the queue of the appropriate IO thread according to the session's publish-subscribe relationships.
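The cross-thread forwarding described above can be sketched with Python's standard `queue` and `threading` modules. This is a simplified model under assumptions: the `CoreRouter` class and its methods are hypothetical, real IO threads and sockets are replaced by plain queues, and a production server would not be written in Python.

```python
import queue
import threading


class CoreRouter:
    """Core thread: moves packets from a shared receive queue to IO thread queues."""

    def __init__(self):
        self.rx = queue.Queue()  # packets handed over by the IO threads
        self.tx = {}             # (protocol, port) -> send queue of that IO thread
        self.subs = {}           # session -> publisher -> [(subscriber, io_key)]

    def register_io(self, io_key):
        self.tx[io_key] = queue.Queue()

    def subscribe(self, session, publisher, subscriber, io_key):
        self.subs.setdefault(session, {}).setdefault(publisher, []).append(
            (subscriber, io_key)
        )

    def run(self):
        while True:
            item = self.rx.get()
            if item is None:  # sentinel used to stop the thread
                break
            session, publisher, payload = item
            for subscriber, io_key in self.subs.get(session, {}).get(publisher, []):
                self.tx[io_key].put((subscriber, payload))


router = CoreRouter()
router.register_io(("udp", 3000))  # the teacher publishes over UDP port 3000
router.register_io(("tcp", 4000))  # a student subscribes over TCP port 4000
router.subscribe("class-1", "teacher", "student-a", ("tcp", 4000))

core = threading.Thread(target=router.run)
core.start()
router.rx.put(("class-1", "teacher", b"frame-1"))  # what the UDP IO thread would do
router.rx.put(None)
core.join()

delivered = router.tx[("tcp", 4000)].get_nowait()
print(delivered)
```

Because every packet that crosses protocol-port pairs passes through the single `rx` queue, the core thread is the natural place where forwarding load concentrates.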
The design of the threading model is also tied to business type and scale. At the time, the system load was dominated by large classes, where far fewer people push streams than pull them. If the business type changes, for example if classes get smaller and smaller and every member of a course pushes a stream while the total number of users on the server stays the same, the forwarding load on the core thread rises sharply compared with large classes. This is one of the challenges brought by the small-class business, and the architecture needs to be able to respond flexibly to such changes.
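A back-of-the-envelope calculation shows why the load grows. The sketch below assumes a worst case in which every published stream is forwarded to every other member of the class and every forward passes through the core thread; the class sizes are invented for illustration.

```python
def forwarding_load(total_users, class_size, publishers_per_class):
    """Packet copies the core thread makes per media tick (worst-case model)."""
    classes = total_users // class_size
    return classes * publishers_per_class * (class_size - 1)


# One large class: 1000 users, only the teacher publishes.
large = forwarding_load(total_users=1000, class_size=1000, publishers_per_class=1)

# Small classes: the same 1000 users in classes of 10, and everyone publishes.
small = forwarding_load(total_users=1000, class_size=10, publishers_per_class=10)

print(large, small, small / large)
```

Under these assumptions the same number of users produces about nine times as many forwards, even though no single class is large.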
In addition to the four key problems above, I would like to take this opportunity to discuss two further details: layered design and the concept of a channel.
Layered design is in effect an extension of the forwarding problem. After the server receives data from a connection, it distributes the data through the core thread. The logical structure can be understood as three layers: the link layer handles connections over different protocols; the routing layer is responsible for distributing and transferring data internally; and the session layer maintains the publish-subscribe relationships, directs the routing layer, and makes sure data is sent to the correct connection. This layering is used not only in the single-machine threading model but also across the entire distribution network.
When the business side integrates a real-time communication SDK, different ToB vendors define "channel" differently. A simple way to understand it is as an abstraction of real-time media transmission resources. For example, some vendors serve business scenarios where the main data is camera video of people's faces and screen sharing. Their SDK may provide only two channel resources, with the face channel supporting simultaneous push of a large and a small stream.
The picture above uses the interactive large class to introduce Youdao's thinking on "channel" design. The picture in the lower left corner shows what a typical interactive large class looks like from the teacher's side: the lead teacher is in the upper right corner, mic-linked with the students on the left. How can all the information on this screen be passed on to the other students? Youdao's real-time communication SDK provides several channel resources, such as Live, RTC, and Group. The number of channel resources exposed by the SDK can be defined and configured. Although the names differ, the underlying resources are of the same kind: one channel corresponds to the capability to distribute one stream of synchronized audio and video.
Take the scene just described: the teacher is on the left of the diagram and the students are on the right. Orange is the RTC channel, which carries the communication between teacher and students. The teacher then mixes the streams on the client, combining the mic-linked content and the course whiteboard into a single audio and video stream, and sends it to the other students in the class through the Live channel. Client-side mixing can be done, for example, by capturing the current screen content. In the interactive large class, everything the students need is in this picture as audio and video media, so a combination of two channels, one for mic-linking and one for live streaming, is enough to carry the whole business.
The reason the channels have different names rather than being exposed as an array of channel objects is to lower the barrier to client integration. For example, the Live channel conceptually emphasizes smoothness more than RTC does, which can correspond to a larger minimum video buffer to improve resistance to network jitter.
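One way to picture named channels that share the same underlying resource is as named profiles over a single type. The sketch below is purely illustrative: the `ChannelProfile` class, the buffer values, and the `channels_for` helper are hypothetical and do not reflect the real Youdao SDK. Only the channel names Live and RTC and the two-channel combination for the interactive large class come from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChannelProfile:
    name: str
    min_buffer_ms: int  # hypothetical default, chosen only for illustration


# Same kind of resource underneath, different defaults behind each name.
PROFILES = {
    "RTC": ChannelProfile("RTC", min_buffer_ms=50),    # favors low latency
    "Live": ChannelProfile("Live", min_buffer_ms=500), # favors smooth playback
}


def channels_for(role):
    """Channels a participant uses in an interactive large class."""
    plans = {
        "teacher": [("RTC", "publish+subscribe"), ("Live", "publish")],
        "mic_linked_student": [("RTC", "publish+subscribe")],
        "audience": [("Live", "subscribe")],
    }
    return plans[role]


for role in ("teacher", "mic_linked_student", "audience"):
    print(role, channels_for(role))
```

A business developer who picks "Live" gets jitter-resistant defaults without having to know which buffer parameter to tune, which is the lowered integration barrier that the naming is meant to provide.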
In practice we found that the way an SDK exposes channel resources can shape how the business side thinks: if there are only a "face channel" and a "screen channel", this may limit product thinking about new course forms.
4. Interactive small class as an example
I would like to take this opportunity to share Youdao's experiments with interactive small classes and to discuss two questions: what does "interaction" mean in a small class, and how should interactive lessons be recorded?
In a small class, several students and the teacher can stay mic-linked throughout, and any student can be brought on stage at any time to share and answer questions. On top of the basic audio, video, and whiteboard content, we also added some interactive elements, such as local media playback and a multiplayer real-time interactive chessboard. What impact do such interactive elements have?
In the interactive large class described earlier, the stream can be mixed on the client and then sent to the Live channel. This avoids the video delay and synchronization problems that a separate stream-mixing server would introduce, and it still carries all of the course content. In an interactive small class, however, if the teacher distributes content to the other students by screen capture in this way, the interactive elements lose their interactivity and the layout cannot be changed. A student who watches the recording later cannot take part and can only look on as other students interact. This is the first difficulty of interactive small classes: how should interactive elements be handled, how should they be recorded, and how can they be kept in sync during playback? In practice there are many pitfalls and challenges.
5. About self-development
Finally, I would like to discuss some questions about self-developed real-time communication systems.
Some of the content here is excerpted from ToB vendors' analyses of pain points. The problems that self-development runs into can be divided into the following points:
- Cost: Besides manpower, resource coverage, and the operations work of dynamic scaling up and down, there is also the opportunity cost. The first two are the more important. In addition, the bandwidth peaks of different services do not coincide, and sharing one set of infrastructure and bandwidth resources among them reduces resource and energy consumption.
- Risk: For example, as mentioned above, bridging two architectures with one MCU may introduce additional risk.
- Boundaries: For example, should special configurations be added to solve business problems, and how should the team draw the boundary of business requirements in self-development?
- Optimization threshold: Once everything described above is working, the business can run. Reducing costs further, however, requires a deeper technology stack, such as data-driven full-link transmission optimization and codec optimization, where both the difficulty and the manpower required may be higher.
But the advantages of self-development are also clear:
- Understanding of audio and video infrastructure: Audio and video have gradually become a kind of infrastructure. A team that uses audio and video capabilities only through a third-party SDK may not understand the difficulties of the technology in depth, may misjudge risks, and may miss potential opportunities.
- More atomic capabilities: Self-developed technology can be configured more flexibly per business line to fit complex business needs, and deeper interfaces can be exposed in a reasonable way, giving the business layer more flexibility.
- Support for product, R&D, and technical support: Audio and video technology is broad and complex. It is hard for client-side developers and technical support staff to troubleshoot business exceptions accurately and find root causes from event-tracking data alone. Relying on the self-development team to accumulate the problems met in the business, understand their deeper causes, and look for latent hazards is an effective approach. The self-development team can assist in product design, accelerate audio and video development, and help technical support determine the cause of user problems and uncover deeper hazards in the business. After all, even the fastest ticketing system may not be quicker than help from the colleague at the next desk.
- Cost control and business-oriented optimization: The lower in the stack the team can work, the more room there is to optimize for a specific business, and the more room to reduce cost while further improving the experience.
That is all for this talk. Thank you!