
In live video, several indicators define the viewing experience: stall rate, picture clarity, latency, and instant-open speed (often translated as "second opening": how quickly the first frame appears after a live room is opened). This article describes the optimizations Dewu Live has made to the instant-open experience.

Instant open for live streams differs a great deal from instant open for on-demand video, and it is harder. On-demand video can be pre-downloaded from the corresponding video list, whereas a live stream is dynamic and real-time, so there is nothing static to prepare and cache in advance. With caching no longer available, how do we improve the opening speed?

Measurement definition

Optimizing an indicator has to be backed by data. That means adding tracking points on the client and stating exactly how the metric is measured; only then does the tracking data have reference value. For a duration metric, the start time and the end time must be unambiguous, and their difference is the instant-open time we report.

Start time: when the page is selected (onPageSelected of ViewPager on Android)

End time: the player's first-frame callback

Instant-open time: end time - start time
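
The definition above can be sketched as a small record that holds the two timestamps. This is a minimal sketch, not Dewu's tracking code: the class name, field names, and millisecond units are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RoomOpenTrace:
    """Client-side timestamps (in ms) for one live-room open. Hypothetical type."""
    page_selected_ms: int                 # start: page selected
    first_frame_ms: Optional[int] = None  # end: player's first-frame callback

    def instant_open_ms(self) -> Optional[int]:
        """End time minus start time, or None if no first frame arrived."""
        if self.first_frame_ms is None:
            return None
        return self.first_frame_ms - self.page_selected_ms


trace = RoomOpenTrace(page_selected_ms=10_000)
trace.first_frame_ms = 10_640          # set from the first-frame callback
print(trace.instant_open_ms())         # 640
```

Keeping the missing-frame case as None rather than 0 avoids counting rooms the user left before the first frame as instant opens.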

Live streaming pipeline analysis

To improve the instant-open experience, we first need to lay out the live streaming pipeline.

Put simply, the production side comes first: the anchor pushes the stream to the streaming media server, which distributes it to CDN nodes everywhere. The user then pulls the stream through the player. The path is: anchor push -> streaming media server -> CDN edge node -> terminal device.

By end, the pipeline divides roughly into three parts: the push part, the streaming service part, and the pull part.

Note that the instant-open statistics measure only the pull part. The push side does not directly affect the pull-side optimization strategy, but it does indirectly affect the instant-open data. For example, under the same conditions, a stream pushed as H.265 opens faster than one pushed as H.264. To cover the whole pipeline, this article therefore also lists some push-side settings and optimizations that affect instant open.

Pull-stream optimization

HTTPDNS direct IP connection

Pulling a stream in the player can be roughly divided into the following steps:

DNS resolution -> connect to CDN node -> receive GOP packets -> decode the first frame -> render the first frame

Of these, DNS resolution takes the longest. To reduce the first-screen time, DNS resolution time must be reduced first.

The traditional DNS resolution process is simple and is commonly called the Local DNS approach. The app sends a domain name resolution request to the DNS server of the network operator it is on, and the operator's DNS server issues a recursive query to the CDN's GSLB system. From the IP address of that DNS server, the GSLB determines which operator and geographic location the query comes from, and then returns several suitable CDN edge node IPs.

HTTPDNS resolves domain names over HTTP instead of the existing UDP-based DNS protocol. The resolution request is sent directly to Alibaba Cloud's HTTPDNS server, bypassing the operator's Local DNS. This avoids the domain hijacking and inaccurate scheduling that Local DNS can cause.

The principle is actually simpler than Local DNS: the app calls the HTTPDNS API provided by the CDN directly over HTTP and obtains a set of suitable edge node IPs. Because there is one less hop in the middle, this step can be tens of milliseconds faster. When selecting an IP from the returned nodes, take care not to concentrate a large number of users on a few nodes, which would unbalance the node load.
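
The direct-IP step described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the example domain, and the IP list are hypothetical, no real HTTPDNS API is called, and the sketch assumes plain HTTP (HTTPS would additionally need certificate and SNI handling). Picking an IP at random is one simple way to avoid piling users onto a single node.

```python
import random
from urllib.parse import urlsplit, urlunsplit


def pick_edge_ip(candidate_ips, rng=random):
    """Pick one IP at random so clients spread across the returned nodes."""
    if not candidate_ips:
        raise ValueError("no IPs returned; fall back to Local DNS")
    return rng.choice(candidate_ips)


def to_direct_ip_request(stream_url, ip):
    """Swap the host for the IP and keep the domain in the Host header."""
    parts = urlsplit(stream_url)
    netloc = ip if parts.port is None else f"{ip}:{parts.port}"
    direct_url = urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )
    return direct_url, {"Host": parts.hostname}


# Hypothetical answer from an HTTPDNS lookup (documentation-range addresses).
ips = ["203.0.113.10", "203.0.113.11", "203.0.113.12"]
url, headers = to_direct_ip_request(
    "http://pull.example.com/live/room42.flv", pick_edge_ip(ips)
)
print(url, headers)
```

The Host header preserves the original domain so that the CDN node can still route the request to the right stream.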

Multi-player logic for swiping up and down

Swiping up and down is one of the main ways users switch between live rooms, so we added some handling to make rooms reached by swiping open faster.

[Animation: swiping up and down between live rooms]

A closer look at the animation above shows that while we are still dragging, the video of the next live room is already displayed. When the finger is released, if the user has not left the current live room, the current room's data is not reloaded.

From the user's point of view, the next live room's picture is visible the moment they swipe to it. This makes full use of the time between the start of the swipe and the finger leaving the screen to pull the stream.

The core logic is to use multiple players: one for the current live room, and one each for the rooms above and below. When the user drags the live room card, we determine from the scroll position and the current room's position whether the swipe is upward or downward, and let the corresponding live room Fragment start its player.

The data shows that after this optimization the instant-open rate for swipes is very high, with 65%+ of opens reaching 0 s.
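
The swipe handling described above can be sketched as below. This is a simplified model, not the production code: RoomPlayer and SwipePreloader are hypothetical names, and the (position, offset) arguments are assumed to follow ViewPager's onPageScrolled convention, where position is the index of the first visible page.

```python
class RoomPlayer:
    """Stand-in for a real player bound to one live room."""

    def __init__(self, room_id):
        self.room_id = room_id
        self.started = False

    def start(self):
        # Starting an already started player is a no-op, so releasing the
        # finger on the same room does not reload anything.
        self.started = True

    def stop(self):
        self.started = False


class SwipePreloader:
    def __init__(self, room_ids, current_index=0):
        self.players = [RoomPlayer(r) for r in room_ids]
        self.current = current_index
        self.players[self.current].start()

    def on_page_scrolled(self, position, offset):
        """Start the neighbouring room that is sliding into view."""
        if offset <= 0:
            return None  # page is settled, nothing is sliding in
        target = self.current - 1 if position < self.current else self.current + 1
        if 0 <= target < len(self.players):
            self.players[target].start()
            return target
        return None

    def on_page_selected(self, index):
        """Keep only the selected room playing."""
        for i, player in enumerate(self.players):
            if i != index:
                player.stop()
        self.current = index
        self.players[index].start()


pager = SwipePreloader(["room_a", "room_b", "room_c"], current_index=1)
pager.on_page_scrolled(1, 0.3)   # dragging toward room_c starts its player
pager.on_page_selected(2)        # finger released on room_c
```

A real implementation would also stop the preloaded neighbour when the drag is cancelled and the user stays in the current room.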

Lazy loading of UI modules

The UI layout of a live room is fairly complex. Instead of loading the whole layout at once, can we load it gradually?

Put simply, progressive loading means loading part by part: once the current frame has finished loading, the next part is loaded in the next frame.

The advantage of this design is that it relieves the pressure of inflating every View at the same time. We can load the core Views first and then load the others step by step. In our live room, for example, we load the video controls first and then the other layout controls in turn. Douyin's live rooms appear to load progressively in the same way: the video shows first, then the interactive UI.

At present, in our live room Fragment only the core parent layout and the player controls are loaded directly from XML; everything else is loaded progressively through the ViewStub mechanism.
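
The one-part-per-frame idea can be sketched independently of Android. In this sketch the ProgressiveLoader class, the module names, and the on_frame hook are all hypothetical; on Android the inflate callbacks would correspond to inflating ViewStubs, and on_frame to a per-frame callback.

```python
from collections import deque


class ProgressiveLoader:
    """Inflates one deferred UI module per frame instead of all at once."""

    def __init__(self):
        self._pending = deque()
        self.loaded = []

    def defer(self, name, inflate):
        """Queue a module; inflate is a zero-argument callable."""
        self._pending.append((name, inflate))

    def on_frame(self):
        """Call once per rendered frame; returns True while work remains."""
        if self._pending:
            name, inflate = self._pending.popleft()
            inflate()
            self.loaded.append(name)
        return bool(self._pending)


loader = ProgressiveLoader()
# Hypothetical non-core modules, queued after the player view is on screen.
for module in ["chat", "gift_panel", "product_card"]:
    loader.defer(module, lambda: None)

while loader.on_frame():
    pass
print(loader.loaded)  # ['chat', 'gift_panel', 'product_card']
```

Queue order doubles as priority: modules the user sees first should be deferred first.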

Because the instant-open time is only counted once the page UI has loaded, this optimization does not show up in the instant-open data, but it does make opening feel noticeably faster.

Gain: UI first-frame loading time improved by 90+%

Self-developed player first frame optimization

Live streaming places relatively high demands on the player experience. The overall playback pipeline is: CDN connection, media file download, file parsing, audio and video decoding, and audio and video rendering. Looking at first-frame time across this pipeline, most of the room for optimization lies in the first three steps, up to and including file parsing. In terms of strategy, instant open can be approached from two dimensions: buffer configuration and buffer management.

Buffer configuration: while keeping normal live playback stable, start playback as early as the pipeline allows.

Buffer management: while the live stream is playing, keep the player caught up with the current live point.

For CDN connection establishment, the self-developed player adds support for direct IP connection. When the IP for a server domain name is already known, the player connects to the CDN server directly by IP.

For media file download, the self-developed player adjusts the start-up buffer and sets no additional start-up conditions; it only requires that a sufficient length of video has been parsed.

For file parsing, the player starts playback as soon as a small amount of data has been downloaded.

Through testing across versions, we tuned the player's internal buffers, the buffer sizes, and the frame-chasing strategy at each step, and obtained a satisfactory instant-open result. It is currently on par with the third-party player.
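
The two buffer dimensions above can be illustrated with a small sketch. The source does not describe the player's actual thresholds or how it chases frames, so everything here is an assumption: the function names, the 100 ms start threshold, the 1500 ms buffer target, and the choice of mild speed-up as the catch-up mechanism.

```python
def should_start_playback(parsed_video_ms, min_start_ms=100):
    """Buffer configuration: start once a small amount of video is parsed.

    min_start_ms is a hypothetical threshold, not a value from the article.
    """
    return parsed_video_ms >= min_start_ms


def playback_rate(buffered_ms, target_ms=1500, max_rate=1.25):
    """Buffer management: play slightly faster when the buffer has grown.

    A buffer larger than target_ms means playback lags the live point.
    The rate rises linearly with the excess and is capped at max_rate.
    """
    if buffered_ms <= target_ms:
        return 1.0
    excess = (buffered_ms - target_ms) / target_ms
    return min(max_rate, 1.0 + 0.25 * excess)


print(should_start_playback(120))   # True
print(playback_rate(1000))          # 1.0
print(playback_rate(3000))          # 1.25
```

A low start threshold shortens the first frame but leaves less protection against stalls, which is why the two settings need to be tuned together.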

Push stream settings and optimization

Push streaming protocol selection

Currently we use RTMP to push the stream and HTTP-FLV to pull it.

RTMP was designed for streaming media and is widely used for pushing streams. Most CDN vendors also support it.

HTTP-FLV, like RTMP, streams over a long-lived connection, in this case HTTP, and is distributed by a dedicated streaming media server. It combines the advantages of both and can reuse existing HTTP distribution resources. Its real-time performance matches RTMP, it saves part of RTMP's protocol interaction time so the first-screen time is shorter, and it is more extensible.

HLS, the live streaming protocol proposed by Apple, holds an unshakable position on iOS, and Android supports it as well. However, its latency is unacceptable for real-time interactive live streaming, so we did not consider it.

Dynamic bit rate

The push side can choose among several picture qualities, and it sets the bit rate and resolution dynamically according to network conditions. The instant-open data must therefore report the bit rate and resolution so that results can be compared, for example the instant-open rate at 720P versus 1080P.

Picture quality             Resolution   H.264 bit rate   H.265 bit rate (30% lower than H.264)
Smooth (360P)               640*360      400 Kbps         280 Kbps
Standard Definition (480P)  854*480      600 Kbps         420 Kbps
HD (720P)                   1280*720     1000 Kbps        700 Kbps
Ultra-clear (1080P)         1920*1080    2000 Kbps        1400 Kbps
2K                          2560*1440    7000 Kbps        4900 Kbps
4K                          3840*2160    8000 Kbps        5600 Kbps
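
The H.265 column follows from the H.264 column by the 30% reduction stated in the table header. A minimal sketch of that arithmetic, using the table's own values (the dictionary and function names are illustrative):

```python
# H.264 transcoding bit rates from the table, in Kbps.
H264_KBPS = {
    "360P": 400,
    "480P": 600,
    "720P": 1000,
    "1080P": 2000,
    "2K": 7000,
    "4K": 8000,
}


def h265_kbps(quality):
    """H.265 bit rate: 30% below the H.264 rate for the same quality."""
    return H264_KBPS[quality] * 70 // 100


for quality in H264_KBPS:
    print(quality, H264_KBPS[quality], h265_kbps(quality))
```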

H.265

H.265 is designed to deliver higher-quality network video over limited bandwidth; it can play video of the same quality using only about half the bandwidth of H.264. Compared with H.264/AVC, H.265/HEVC offers more tools for reducing the bit rate. In terms of coding units, every macroblock (MB) in H.264 has a fixed size of 16x16 pixels, whereas the coding unit in H.265 can range from a minimum of 8x8 to a maximum of 64x64. H.265 intra prediction also supports 33 directions (H.264 supports only 8), and it provides better motion compensation and vector prediction.

Quality comparison tests show that at the same picture quality, video encoded with H.265 is about 39-44% smaller than video encoded with H.264. Even with the bit rate reduced by 51-74%, H.265 video can be similar to or better than H.264 video in quality, which is better than peak signal-to-noise ratio (PSNR) measurements would predict.

As with dynamic bit rate, we report whether the stream is H.264 or H.265 as a tracking parameter, and then compare the instant-open rates of H.264 and H.265 at the same bit rate and resolution.
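
The comparison described above amounts to grouping tracking events by codec and resolution. A minimal sketch follows; the event field names, the sample events, and the 1000 ms threshold for counting an open as "instant" are all assumptions made for illustration, not values from the article.

```python
from collections import defaultdict


def instant_open_rate_by_segment(events, threshold_ms=1000):
    """Share of opens at or under threshold_ms, per (codec, resolution)."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for event in events:
        key = (event["codec"], event["resolution"])
        totals[key] += 1
        if event["open_ms"] <= threshold_ms:
            hits[key] += 1
    return {key: hits[key] / totals[key] for key in totals}


# Made-up events, only to show the shape of the reported data.
events = [
    {"codec": "H264", "resolution": "720P", "open_ms": 800},
    {"codec": "H264", "resolution": "720P", "open_ms": 1400},
    {"codec": "H265", "resolution": "720P", "open_ms": 600},
    {"codec": "H265", "resolution": "720P", "open_ms": 900},
]
print(instant_open_rate_by_segment(events))
```

Comparing within the same resolution keeps the codec effect from being confused with the effect of picture quality.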

Streaming server optimization

Cache GOP

In addition to optimization on the client side, we can also optimize from the streaming server side.

The image frames in a live stream are divided into I frames, P frames, and B frames. Only I frames can be decoded independently, without relying on other frames. When the player receives an I frame it can render immediately. When it receives a P or B frame it has to wait for the frames they depend on and cannot decode and render right away, so the screen stays black during that time.

For this reason the server caches the GOP (in H.264 the GOP is closed and is a sequence of image frames starting with an I frame). This ensures that a player joining the live stream receives an I frame first and can render immediately, which improves first-screen loading.

This is a basic function of the streaming media server, so we do not actually need to do anything.
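
The server-side behaviour can be sketched as a cache that restarts at every I frame. This is a simplified illustration of the idea, not the code of any particular streaming server; the GopCache class and its method names are hypothetical.

```python
class GopCache:
    """Keeps the frames of the most recent GOP, starting at its I frame."""

    def __init__(self):
        self._frames = []

    def push(self, frame_type, payload):
        """Record an incoming frame; frame_type is "I", "P", or "B"."""
        if frame_type == "I":
            self._frames = []          # a new GOP starts here
        if self._frames or frame_type == "I":
            self._frames.append((frame_type, payload))
        # Frames that arrive before the first I frame are not cached,
        # because a new viewer could not decode them.

    def snapshot(self):
        """Frames to send first to a viewer who has just joined."""
        return list(self._frames)


cache = GopCache()
for frame in [("I", b"i1"), ("P", b"p1"), ("B", b"b1"), ("I", b"i2"), ("P", b"p2")]:
    cache.push(*frame)
print(cache.snapshot())  # [('I', b'i2'), ('P', b'p2')]
```

The trade-off is latency: a viewer who joins late in a GOP starts from an I frame that is already up to one GOP length old.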

Hybrid cloud (in progress)

About hybrid cloud: hybrid cloud combines public and private cloud, and has been a main model and direction of cloud computing in recent years. Private clouds mainly serve enterprise users. For security reasons, companies prefer to keep data in a private cloud while still wanting the computing resources of a public cloud. This has driven growing adoption of hybrid cloud, which mixes and matches public and private clouds to get the best of both and balances cost savings with security.

Different cloud platforms distribute their CDN nodes differently. For example, Alibaba Cloud might have more CDN nodes, and therefore higher speed, in Hangzhou, while Qiniu might have more in Shanghai (purely as an illustration). By integrating multiple cloud platforms, we can evaluate which channel is currently fastest for a given user and so improve connection speed. Beyond speed, this also puts a multi-cloud, multi-active architecture and disaster recovery in place. We are currently integrating with Alibaba Cloud, Wangsu, Jinshan, Qiniu, Huawei, and others.
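
Choosing the fastest channel can be reduced to picking the provider with the lowest measured connect time. The article does not say how the evaluation is done, so this is only a sketch under assumptions: the function name, the provider labels, and the latency figures are hypothetical, and None is used to mark a provider that could not be reached.

```python
def pick_fastest_cdn(connect_ms_by_provider):
    """Return the reachable provider with the lowest connect time, or None."""
    reachable = {
        provider: ms
        for provider, ms in connect_ms_by_provider.items()
        if ms is not None
    }
    if not reachable:
        return None
    return min(reachable, key=reachable.get)


# Hypothetical probe results in milliseconds; None means the probe failed.
probes = {"cdn_a": 48.0, "cdn_b": 35.5, "cdn_c": None}
print(pick_fastest_cdn(probes))  # cdn_b
```

Skipping unreachable providers is also what gives the multi-cloud setup its disaster recovery value: if one vendor fails, the next fastest is chosen.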

Optimization results

After one quarter of work, the overall instant-open rate of Dewu Live rose from about 60% to 85%+.

Outlook

  • QUIC (Quick UDP Internet Connections) is a next-generation transport protocol built on UDP and developed by Google. In 2018 the IETF decided that HTTP over QUIC would be standardized as HTTP/3. Compared with TCP, QUIC is better suited to data transmission on weak networks and in high-packet-loss scenarios.
  • Narrowband HD

    At the same quality, a video with slow scene motion needs a much lower bit rate than a video with intense scene motion. In addition, once the bit rate is already high, raising it further improves quality very little, so with the right bit rate the quality at a low bit rate is almost the same as at a high one. We therefore need to analyze the content complexity of the video: obtain scene information, measure the spatial and temporal complexity, derive the overall complexity of the video sequence, and finally determine the encoding scene. This is what scene division means.


Author: Zichen
