1. Overview
In 2019, the vivo live broadcast platform was established. In the initial stage, we ran joint-operation live broadcasts with leading live broadcast platforms and made a preliminary exploration of the market, products, and technology. Later, to enrich the content and forms of live broadcast, we began to explore independently, and then combined live broadcast with vivo's own business. So far, the live broadcast business has delivered several forms of live broadcast in succession, including pan-entertainment, interactive, and company event live broadcast. As the business plan proceeds, we expect to bring users a better live broadcast experience.
Today I would like to share the technical evolution of the vivo live broadcast platform over the past two years. I hope it gives you a basic understanding of live broadcast, and that readers who have just started developing live broadcast-related businesses can take some inspiration from it.
2. Introduction to business background
The vivo live broadcast platform currently supports three categories of live broadcast. The first is the classic pan-entertainment show live broadcast. The platform provides the standard functions of show live broadcast, such as interactive chat, PK, and gift giving. The pan-entertainment market started relatively early, so its functions and gameplay are rich and diverse: Lianmai (co-streaming) PK between anchors, Lianmai interaction between anchors and users, gift combos, event leaderboards, a draw-and-guess game, and other entertainment features. The platform continues to iterate on these classic functions to give users a better experience.
The second category is low-latency interactive live broadcast based on real-time audio and video technology. Its characteristics are strong interactivity and a strong sense of user participation, at the price of high maintenance costs.
The third category emerged against the background of the epidemic: live broadcast of company information and events, such as the company annual meeting, campus recruitment presentations, brand culture broadcasts, and epidemic prevention knowledge broadcasts. Its biggest feature is that it is flexible and changeable, with many forms of content. The live broadcast platform therefore needs to provide a one-stop live broadcast solution, so that the company can use live broadcast for better and more stable brand and cultural publicity.
First, let's look at the business architecture of the vivo live broadcast platform. The bottom layer is the company's full-link monitoring platform. It monitors data indicators for live broadcast traffic, the network, and the related live broadcast services, and it provides alerting, so that we can promptly respond to, locate, and solve problems that arise during distribution in each module.
On top of this monitoring, we iterate on the upper-level business logic. Live content is distributed mainly through the CDNs of cloud vendors and through our internal live server cluster: the former distributes traffic directly to consumer-facing (C-side) users, while the latter relays streams internally for related business processing. In terms of business capabilities, we have built up a set of basic capabilities: massive information storage, video processing, content identification, real-time security and compliance auditing of live video content, and synchronous and asynchronous processing of live events. Based on these capabilities we package scenario-based SDKs, such as push streaming, playback, and IM. Standard SDKs let us distribute and reuse live broadcast capabilities and make it easier for business teams to integrate the functions.
In terms of content production and external services, we have brought live broadcast capabilities to our own mobile apps, such as i video, vivo short video, i music, and the browser, to enrich the mobile experience of consumers. Beyond the vivo mobile apps, we also cooperate with third-party live broadcast platforms to distribute and broadcast some content.
3. Technical practice
3.1 Pan-entertainment show live broadcast
First, let's look at the most classic form, pan-entertainment show live broadcast. As shown in the figure below, it follows a standard live broadcast process based on the RTMP protocol.
The general process is as follows. First, the input source is captured, typically from the screen, camera, or microphone. Image processing is then applied, such as beautification, filters, and watermarks. Next, the video is encoded with standard H.264 and pushed over the RTMP protocol to the central data center, where the relevant events are processed, and the stream is finally distributed to edge nodes. The user terminal pulls the live stream from an edge node, then decodes and plays it on the device. On top of this standard viewing process, pan-entertainment show live broadcast also provides common functions such as interactive chat, mini games in the live broadcast room, PK, and gift giving.
This standard process touches many technical aspects of live broadcast. Next, I will describe some practical problems our team encountered while putting the live broadcast business into production.
We ran into four problems. The first technical difficulty was beautification in the broadcasting tool. There are many anchors, and each defines "beauty" differently: it is highly subjective and comes with many personalized needs. Requirements for the color temperature and saturation of the picture are relatively high, the face must not jitter or distort during the broadcast, and there are also requirements for the clarity and graininess of the live picture.
The second problem is message distribution, which is what we traditionally call IM (instant messaging) technology. Beyond the classic IM distribution scenarios such as group chat, private chat, and broadcast, we also have to consider that the membership of a live broadcast room's "group chat" is unstable: users frequently enter and leave rooms and switch between them. The message storms and traffic surges of the evening peak are further problems we need to handle.
The third problem is live broadcast delay, one of the classic problems of live broadcast. Many factors cause delay, such as terminal devices, transmission protocols, network bandwidth, and codecs.
The last problem is the cost of live broadcast. Anyone who has worked on live broadcast knows that RTMP and CDN are inseparable. CDN is a basic cloud service product, and its billing for this kind of streaming media distribution is relatively expensive, so we must control live broadcast costs while still guaranteeing product functions and user experience.
For the first technical difficulty, we made a number of optimizations on the client side. For beautification, we drew on the technology accumulated by the company's imaging team to standardize, generalize, and customize beautification, filters, stickers, beauty makeup, and style makeup.
We also ran a series of experiments in the push-streaming module, including cloud transcoding, super-resolution, and sharpening, so that each anchor's style appeals to its audience. For gift animation playback, after many experiments we chose MP4 for special-effect gift animations; our test data shows that, compared with SVGA, MP4 significantly reduces both memory usage and file size. Finally, for the important metric of opening a live broadcast room within a second (instant open), the core player team customized the player. With a shared player instance and pre-creation on swipe and on click, the instant-open metric reaches a first-class level in the industry.
The second problem is the processing of instant messages. Potential problems can be analyzed along two dimensions. The first is the user dimension. Active user behaviors include gift giving, gift combos, likes, and interactive chat, and a large number of users flood in when a broadcast starts. These scenarios produce a flood of IM messages and cause the server load to spike. Even if the IM module is componentized and isolated from other modules, a sudden burst of IM traffic still affects the distribution of other high-priority or system messages.
The second is the system dimension. Mass messaging brings the problem of read fan-out (read diffusion). In a live broadcast room, the data distributed over IM is structured data, and in some particularly complex business scenarios the structured packets are relatively large, so a burst of messages puts pressure on the egress bandwidth of the data center. A second factor is that the message processing capacity of mobile terminals is limited: if too many messages are distributed and some carry complex special effects, some low-end phone models cannot process them in time.
Under these constraints, we adopted three solutions. The first is to combine message push and pull: for peak business traffic, IM long-lived connections work together with HTTP short polling. The second is to compress messages with protobuf and apply tiered rate limiting: messages in a live broadcast room are graded along two dimensions, business and anchor, and the distribution frequency is limited according to the type of live broadcast room. The last is monitoring and degradation: we can monitor designated live broadcast rooms and automatically switch the message distribution method when monitoring detects a problem, which protects the message arrival rate.
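As an illustration of tiered rate limiting, the following is a minimal sketch using one token bucket per message type. The message types, the limits, and the class names are assumptions made for this example; the article does not describe the platform's actual grading rules or implementation.

```python
class TokenBucket:
    """Classic token bucket: refills at `rate` tokens per second up to `capacity`."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = now

    def allow(self, now):
        # Refill according to the time elapsed since the last call.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class RoomLimiter:
    """Per-room limiter. `limits` maps a message type to messages per second.

    A limit of None means "never throttled" (for example system messages).
    The type names used below are hypothetical.
    """

    def __init__(self, limits, now=0.0):
        self.limits = limits
        self.buckets = {
            kind: TokenBucket(rate, rate, now)
            for kind, rate in limits.items()
            if rate is not None
        }

    def allow(self, kind, now):
        if self.limits[kind] is None:
            return True
        return self.buckets[kind].allow(now)


# Hypothetical configuration: chat is capped at 2 messages per second.
limiter = RoomLimiter({"system": None, "chat": 2})
results = [limiter.allow("chat", 0.0) for _ in range(3)]
```

In this sketch a busier or lower-priority room type would simply receive a `limits` table with smaller values, while system messages bypass throttling entirely.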
The third problem is live broadcast delay. In dealing with it, our goal is to avoid unnecessary delay in specific live broadcast scenarios and reach a standard level of delay, rather than blindly pursuing the lowest possible latency. To solve the delay problem, we first analyze the links that generate delay and work out what we can optimize in each. The capture side is mainly affected by the caching strategy and data encoding; the network environment, transmission protocol, and physical distance also affect delay. To address these, we collect the anchor's phone model and actual network conditions and encode at a matching definition, which weakens the influence of the device and the network on the live broadcast.
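The idea of choosing an encoding definition from the device and the network can be sketched as a lookup over a bitrate ladder. The ladder values, the `headroom` factor, and the function name are hypothetical and only illustrate the selection logic, not the platform's real parameters.

```python
# Hypothetical encoding ladder, ordered from highest to lowest quality:
# (profile name, frame height in pixels, video bitrate in kbps)
LADDER = [
    ("1080p", 1080, 4000),
    ("720p", 720, 2000),
    ("540p", 540, 1200),
    ("360p", 360, 600),
]


def pick_profile(uplink_kbps, device_max_height, headroom=0.7):
    """Pick the best profile that the device and the uplink can both sustain.

    Only a fraction (`headroom`) of the measured uplink is budgeted for video,
    leaving room for audio, signaling, and bandwidth fluctuation.
    """
    budget = uplink_kbps * headroom
    for name, height, kbps in LADDER:
        if height <= device_max_height and kbps <= budget:
            return name
    # Nothing fits: fall back to the lowest profile rather than failing.
    return LADDER[-1][0]
```

For example, an anchor on a phone that supports 1080p but has only about 2 Mbps of uplink would be given the 540p profile under these assumed numbers.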
Secondly, it is widely known that about 80% of the delay is generated in the downstream CDN and playback link. To resist stalling, the playback side needs buffer adaptation, bit rate adaptation, forward error correction, packet loss retransmission, and so on. Because of the limitations of the protocol itself, we can only distribute viewing links with different protocols and different definitions according to the user terminal. For special scenarios we also introduce newer live broadcast protocols such as WebRTC, QUIC, and SRT.
The last problem is the cost of pan-entertainment live broadcast. As a business department, we pay constant attention to cost. The cost of live broadcast has two main components: storage and bandwidth. On the storage side, we first transcode the anchors' recordings to reduce the size of stored files. Second, we grade anchors by popularity and retain their recordings for different lengths of time. Finally, we clip the highlights and, in compliance with the relevant national laws and regulations, delete the original files, which reduces storage costs.
The second component is the CDN fee for the basic service. Cloud vendors bill by bandwidth peak or by traffic, so we also optimize our policies from the business side. For push notifications, we push in batches spread over different time periods, so that a large number of viewers do not flood into live broadcast rooms at the same moment and raise the bandwidth peak. The viewing side supports multiple definitions, and we distribute viewing links of different resolutions according to the traffic level of each time period. We also apply a restriction policy so that anchors do not all start broadcasting at the same time, which would create unnecessary peak bandwidth and extra traffic costs. Finally, we monitor, evaluate, and calculate the cost of each billing cycle, and adjust the next cycle based on actual usage so that we get a better result under tiered billing.
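To see why batched pushes lower the bandwidth peak, here is a small worked example. It assumes, purely for illustration, that every viewer brought in by a push watches for a fixed number of time slots at a fixed bitrate; the numbers and the function name are hypothetical.

```python
def peak_bandwidth_mbps(batches, watch_slots, bitrate_mbps):
    """Return the peak bandwidth for a push schedule.

    batches: list of (start_slot, viewers) pairs, one per push batch.
    watch_slots: how many consecutive slots each viewer keeps watching.
    bitrate_mbps: bitrate of the stream each viewer pulls.
    """
    last_slot = max(start for start, _ in batches) + watch_slots
    load = [0] * last_slot  # concurrent viewers per time slot
    for start, viewers in batches:
        for slot in range(start, start + watch_slots):
            load[slot] += viewers
    return max(load) * bitrate_mbps


# One push to 9,000 viewers versus three staggered pushes of 3,000 each.
single_push = peak_bandwidth_mbps([(0, 9000)], watch_slots=2, bitrate_mbps=2)
staggered = peak_bandwidth_mbps(
    [(0, 3000), (2, 3000), (4, 3000)], watch_slots=2, bitrate_mbps=2
)
```

Under these assumptions the total traffic is the same in both schedules, but the staggered schedule peaks at one third of the single push, which matters when the vendor bills by bandwidth peak.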
3.2 Interactive live broadcast
Having covered the solutions to several classic problems of pan-entertainment show live broadcast, let's turn to low-latency interactive live broadcast, which is currently in full swing. Its core features are ultra-low latency and strong interactivity. In business scenarios, the requirements for ordered, real-time distribution of all kinds of messages are particularly high. Interactive live broadcast is now common; the scenarios where it works best in the industry are e-commerce, education, Lianmai interactive entertainment, and government and enterprise live broadcasts.
The functions and technologies related to interactive live broadcast are mainly as shown above, such as multi-terminal information synchronization and media processing. Compared with pan-entertainment live broadcast, it also has stricter requirements for streaming media security auditing and for consistency across user terminals. A large part of the technology stack is based on real-time audio and video technology, combined with SEI (Supplemental Enhancement Information) to keep information in sync with the stream. Overall, this meets the needs of the business.
We encountered two main business pain points when implementing interactive live broadcast. The first is that the second-level delay of the long RTMP-based live broadcast link is hard to reduce, and multi-terminal media stream processing over RTMP, in scenarios such as Lianmai, stream mixing, and muting, is relatively complicated and does not lend itself to multi-terminal synchronization. The second is information management and control, which covers terminal consistency, streaming media security, and exception handling. In a real production environment there are many multi-terminal interactions, and they generate a large number of messages. In some business scenarios message distribution may be delayed, arrive out of order, or even be lost, which eventually leaves the business terminals in inconsistent states. Apart from delay, this is the other obvious problem of interactive live broadcast today.
Given the low-latency, highly interactive nature of interactive live broadcast, the current standard solution in the industry is RTC-related technology. RTC and RTMP+CDN are completely different in their technical characteristics. RTMP is tightly coupled with the CDN: it relies on the CDN's edge network so that users in many regions can fetch live content from a nearby node, which improves the success rate and the instant-open rate of pulling the live stream. However, it is also constrained by factors such as the network protocol, which introduce delay that the RTMP protocol stack cannot avoid. RTC has two notable technical features. First, the theoretical delay of UDP-based RTC is within 100 milliseconds. Second, with SFU and MCU communication architectures it is well suited to multi-terminal interaction. In multi-terminal scenarios, RTC also places relatively high demands on machines and networks, so we choose different technologies for different scenarios.
In developing interactive live broadcast, we encountered a series of problems in information management and control. The first is consistency management and control. The content information of the interactive live broadcast service is strongly tied to the streaming media information and must be delivered in real time and in a consistent order. We apply two compensating controls. First, the client reports information individually and in batches, and the server corrects the abnormal state of individual terminals. Second, we make reasonable use of SEI to carry a small amount of business information inside the video stream, for scenarios such as karaoke rooms and bands. This keeps information consistent and synchronized.
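One common way for a client to notice disordered or lost messages before reporting to the server is to number the messages. The sketch below assumes that each room message carries an increasing sequence number, which the article does not state; the class and method names are illustrative only.

```python
class OrderedInbox:
    """Buffers out-of-order messages and releases them in sequence order.

    Assumes (hypothetically) that the server stamps every message in a room
    with a sequence number starting at 1.
    """

    def __init__(self, next_seq=1):
        self.next_seq = next_seq
        self.pending = {}  # seq -> payload, messages received ahead of order

    def receive(self, seq, payload):
        """Return the payloads that can now be applied, in order."""
        if seq < self.next_seq or seq in self.pending:
            return []  # duplicate or already applied
        self.pending[seq] = payload
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready

    def missing(self):
        """Sequence numbers the client could report to the server as gaps."""
        if not self.pending:
            return []
        return [
            seq
            for seq in range(self.next_seq, max(self.pending))
            if seq not in self.pending
        ]
```

A client that sees a non-empty `missing()` for too long can include those numbers in its report, and the server can then resend the messages or push a corrected full state to that terminal.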
The second is security management and control. Under the relevant national regulations, whether live audio and video content violates the rules is a key object of review by the authorities. For show live broadcast, we currently review frames captured from the video stream at regular intervals, with a dual mechanism of machine review and human review. For interactive live broadcast, each terminal is reviewed independently so that the offending terminal can be identified precisely, which helps create a green and healthy live broadcast environment.
The last is exception management and control, which is also driven by the complexity of the business. Precisely because there are many interactive terminals, human, network, and device factors greatly increase the probability of anomalies. We therefore have to be ready to identify disconnections quickly. First, within the capabilities of the client, we make effective use of callbacks and reports, and run the exception recovery process after business verification. Second, we use a heartbeat detection mechanism: the client reports a heartbeat at regular intervals, and if the server misses more than a specified number of heartbeats, it marks the device as abnormal and runs the corresponding exception handling logic.
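The heartbeat mechanism can be sketched as follows. The interval, the missed-beat threshold, and the class name are assumptions chosen for illustration; timestamps are passed in explicitly so that the logic is easy to test.

```python
class HeartbeatMonitor:
    """Server-side view of client heartbeats.

    A terminal is considered abnormal once more than `max_missed` heartbeat
    intervals have passed since its last report.
    """

    def __init__(self, interval_s, max_missed):
        self.interval_s = interval_s
        self.max_missed = max_missed
        self.last_seen = {}  # terminal id -> time of last heartbeat

    def beat(self, terminal_id, now):
        """Record a heartbeat reported by a client."""
        self.last_seen[terminal_id] = now

    def abnormal(self, now):
        """Return the ids of terminals whose heartbeats have stopped."""
        limit = self.interval_s * self.max_missed
        return sorted(
            terminal
            for terminal, seen in self.last_seen.items()
            if now - seen > limit
        )


# Hypothetical settings: one heartbeat every 5 seconds, 3 misses tolerated.
monitor = HeartbeatMonitor(interval_s=5, max_missed=3)
monitor.beat("anchor-a", now=0)
monitor.beat("guest-b", now=0)
monitor.beat("anchor-a", now=10)
```

With these settings, `monitor.abnormal(now=16)` reports only `guest-b`, and the server would then run its exception handling for that terminal, for example removing it from the mixed stream.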
3.3 Live events
In this last part, we focus on how the live broadcast platform supports live broadcasts for the company's internal businesses. vivo holds many official live events, such as technology sharing sessions, campus recruitment talks, epidemic knowledge lectures, brand image broadcasts, and the official broadcasts of phone launch events.
Why focus on the company's live broadcasts here?
First, these are official events with great influence and a larger audience than other types of live broadcast, so we put more effort into development, operations, and live broadcast assurance than for other broadcasts. We have also distilled some technical lessons from them. First of all, we need to guarantee the stability and smoothness of the whole broadcast. We monitor and optimize the network and have built internal live servers in multiple regions to achieve traffic isolation and load balancing across regions. The other aspect is technical support for flexibility. To save development time, after supporting several large company broadcasts we systematically organized and summarized a set of general-purpose functional SDKs. Reusing SDKs that have already been proven in production reduces the probability of errors and improves the success rate of the whole broadcast.
Next, I will describe two interesting and practical cases from our work on the company's internal live broadcasts. In the first case, because of the epidemic, the company could only hold its annual meeting last year as an online live broadcast. vivo employees work in many offices across the country, so the problem we had to solve was how to let 10,000 employees in multiple office locations, at home and abroad, watch in high definition at the same time.
The root cause of this problem is that the egress bandwidth of each office area is limited. If employees watch over the company's wireless network, the egress bandwidth of each office area fills up, which degrades the viewing experience of some employees and can even affect everyone's daily work.
For bandwidth limits, there are generally two common solutions in the industry:
- Temporarily increase the egress bandwidth of network operators at various office locations;
- Reduce the pressure and cost of bandwidth by reducing the bit rate of live broadcast.
In fact, neither of these two schemes can fully guarantee a trouble-free broadcast, and both sacrifice the viewing experience of some users, so they are hard to accept. The solution we finally put into practice is intranet relay: intranet servers handle load balancing, and DNS resolves the viewing requests from each office location to the live server at that location. This solves the bandwidth problem and can even support high-definition 4K resolution. The approach has now been verified as feasible many times, and the practical details have been published on many authoritative technology websites, where they were well received by the community. If you are interested, you can look them up.
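The routing decision behind this scheme can be sketched as a subnet lookup, similar to what a split-horizon DNS view would do. The subnets and host names below are made up for illustration; the article does not give the real network layout.

```python
import ipaddress

# Hypothetical office subnets and the intranet relay host for each office.
OFFICE_RELAYS = {
    "10.1.0.0/16": "live-relay.office-a.example.internal",
    "10.2.0.0/16": "live-relay.office-b.example.internal",
}
# Hypothetical public fallback for viewers outside any office network.
FALLBACK_HOST = "live-cdn.example.com"


def resolve_live_host(client_ip):
    """Return the live server a viewer should pull the stream from.

    Viewers inside an office subnet are sent to that office's relay, so the
    office egress link carries one upstream pull instead of one per viewer.
    """
    address = ipaddress.ip_address(client_ip)
    for subnet, host in OFFICE_RELAYS.items():
        if address in ipaddress.ip_network(subnet):
            return host
    return FALLBACK_HOST
```

In a real deployment this mapping would live in the DNS configuration rather than in application code; the function only shows the decision that the resolver makes.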
The second case is that the company's regular press conferences and presentations need to be streamed to several third-party live broadcast platforms at the same time, such as Bilibili and Tencent live broadcast. The streaming device limits the number of push addresses, so it cannot push to many addresses at once. The previous solution was to coordinate with every live broadcast partner in a group chat and agree on a single push address. Each event involved a great deal of coordination, communication, and confirmation, which hurt everyone's efficiency, and configuration errors could easily leave some partners unable to broadcast normally.
Against this background, we made some adjustments. By integrating the internal live servers with cloud live servers, we built network disaster recovery: when the main line has a problem, the system automatically detects it and switches to another push-streaming cluster, which keeps the whole broadcast smooth and stable. We also built an operations platform in which the relevant addresses are configured in advance, and through the open platform partners can modify their own configuration when needed. This greatly reduces the chance of manual configuration errors and lets us support company live broadcasts more efficiently.
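A minimal sketch of the two ideas, failover between push-streaming clusters and fan-out to partner platforms from a pre-configured list, might look like this. The cluster names, partner names, and URLs are hypothetical, and the health information is assumed to come from the monitoring system.

```python
def choose_cluster(preferred_order, healthy):
    """Return the first healthy cluster, in order of preference."""
    for name in preferred_order:
        if name in healthy:
            return name
    raise RuntimeError("no healthy push-streaming cluster available")


def relay_targets(partners):
    """Return the push URLs of all enabled partner platforms.

    `partners` is the pre-configured list kept on the operations platform;
    each entry is a dict with hypothetical keys `name`, `push_url`, `enabled`.
    """
    return [p["push_url"] for p in partners if p["enabled"]]


# Hypothetical configuration for one event.
CLUSTERS = ["internal-main", "cloud-backup"]
PARTNERS = [
    {"name": "partner-a", "push_url": "rtmp://push.a.example.com/live/key-a", "enabled": True},
    {"name": "partner-b", "push_url": "rtmp://push.b.example.com/live/key-b", "enabled": False},
    {"name": "partner-c", "push_url": "rtmp://push.c.example.com/live/key-c", "enabled": True},
]

active = choose_cluster(CLUSTERS, healthy={"cloud-backup"})
targets = relay_targets(PARTNERS)
```

The streaming device then pushes a single stream to the active cluster, and the cluster relays it to every address in `targets`, which removes the device's limit on push addresses from the picture.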
4. Summary
The vivo live broadcast platform is still in the stage of preliminary construction and continuous exploration, but our direction and plan are clear. By continuously enriching the forms of consumer-facing live broadcast and introducing more of them, we aim to bring more value to vivo mobile phone users. At the same time, we keep accumulating the related technologies and eventually turn them into standard solutions, delivering content and technology to other departments of the company in the form of technical SDK services, internal live broadcast services, live short video services, and so on. This forms a virtuous circle in which the internal and external platforms feed each other.
Author: vivo Internet Server Team - Li Guolin