Header image

Image credit: https://marketing.chron.com/what-we-do/rich-media

Author: Liu Senmao

The significance of rich media content cold start

Rich media content (rich content for short) is a composite form of information carried by several media at once, such as video, sound, and text; it mainly includes short videos and live broadcasts. Compared with traditional single-carrier content such as articles, pictures, and music, rich content conveys the most information and best attracts consumers' attention in the era of the "attention economy". Because of its rich presentation (pictures, text, and sound together, combined with all kinds of interactive formats and special effects), it has also become the preferred means of expression for platform creators (influencers, artists, and so on).

Compared with "classic" content such as music and film and television, rich content, especially UGC-oriented rich content, has extremely high requirements for cold start, mainly for the following reasons:

  • The production threshold is low and the cycle is short. Short video producers and live broadcasters can typically produce content once or even several times a week. The plays, likes, comments, and other data from their most recent video or live broadcast are crucial feedback that helps them steer what they produce next. Only when new content is distributed well will producers be willing to stay on a platform; if they stay, they gradually grow and make the platform stickier.
  • Consumption favors freshness. The high production cadence makes content trends evolve quickly. New formats, special effects, and hot topics emerge almost endlessly, and consumers always chase the newest things. If one platform consistently recommends content a few days later than others, it will struggle to establish itself in users' minds.

For these reasons, recommendation systems that distribute rich content pay unprecedented attention to content cold start. It can even be said that the entire system is built around the cold start and rise of new content.

Detection of Cold Start Problems

To solve the problem in a targeted way, we first need to check whether a recommender system is systematically biased against new content. Here are two common methods:
(1) Calibration analysis along the time dimension. Calibration is an analysis technique commonly used in advertising recommendation. It checks whether the model score for a specific population or item group deviates systematically from the actual conversion rate. For the cold start problem, it can also be used to check whether the model's scores are systematically biased between new and old content.

The figure above shows the calibration analysis for new videos that we ran on Cloud Music short video recommendation: a cross-analysis of model score against actual conversion rate, split by new/old content and new/old users. Before correction, new content shows a serious systematic bias: its scores are consistently underestimated.
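As a rough illustration of this kind of check, the sketch below compares the mean model score with the actual conversion rate per segment. The record layout, the segment names, and the sample numbers are all assumptions made for illustration, not the team's actual pipeline or data.

```python
from collections import defaultdict

def calibration_by_segment(records):
    """Return {segment: (mean predicted score, actual conversion rate, ratio)}.

    records: iterable of (segment, predicted_score, converted) tuples, where
    converted is 1 if the impression led to a conversion and 0 otherwise.
    """
    sums = defaultdict(lambda: [0.0, 0, 0])  # [score sum, conversions, impressions]
    for segment, score, converted in records:
        acc = sums[segment]
        acc[0] += score
        acc[1] += converted
        acc[2] += 1
    report = {}
    for segment, (score_sum, conversions, n) in sums.items():
        mean_score = score_sum / n
        actual = conversions / n
        # A ratio below 1 means the model underestimates this segment.
        ratio = mean_score / actual if actual > 0 else float("inf")
        report[segment] = (mean_score, actual, ratio)
    return report

# Hypothetical impression log: new content is scored lower than it converts.
logs = (
    [("new_content", 0.05, 1)] * 2 + [("new_content", 0.05, 0)] * 8
    + [("old_content", 0.10, 1)] * 1 + [("old_content", 0.10, 0)] * 9
)
report = calibration_by_segment(logs)
for segment, (mean_score, actual, ratio) in report.items():
    print(f"{segment}: score={mean_score:.3f} actual={actual:.3f} ratio={ratio:.2f}")
```

A ratio far below 1 for new content while old content sits near 1 is the pattern described above; in practice the segments would also be crossed with new/old users.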

(2) Content life cycle curve. A cold start problem can also be found by monitoring the distribution life cycle of individual pieces of content. Specifically, if a piece of content maintains a high conversion rate from the early stages but its distribution volume stays low for a long time before it finally reaches effective mass distribution, the system probably has a cold start problem.

The figure above shows how the distribution volume (blue) and CTR (orange) of one piece of content change over time. As you can see, the organic distribution of this high-quality content climbed slowly until a dedicated cold-start strategy was built.
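One simple way to turn this observation into a monitor is to flag content whose CTR stays high while its daily distribution stays low for several consecutive days. The following is only a sketch: the function name, the data layout, and every threshold are hypothetical and would need tuning for a real system.

```python
def flag_slow_climbers(daily_stats, ctr_threshold, volume_threshold, min_days):
    """Return ids of content whose CTR stayed high while distribution stayed low.

    daily_stats: {content_id: [(impressions, clicks), ...]} ordered by day since release.
    A piece of content is flagged once it has min_days consecutive days with
    CTR >= ctr_threshold and impressions < volume_threshold.
    """
    flagged = []
    for content_id, days in daily_stats.items():
        streak = 0
        for impressions, clicks in days:
            ctr = clicks / impressions if impressions else 0.0
            if ctr >= ctr_threshold and impressions < volume_threshold:
                streak += 1
                if streak >= min_days:
                    flagged.append(content_id)
                    break
            else:
                streak = 0
    return flagged

# Hypothetical per-day (impressions, clicks) since release.
stats = {
    "video_a": [(200, 30), (250, 40), (300, 45), (5000, 700)],  # good CTR, starved for 3 days
    "video_b": [(200, 30), (4000, 600), (9000, 1300)],          # promoted quickly
    "video_c": [(200, 2), (150, 1), (100, 1)],                  # low CTR, low volume is expected
}
slow = flag_slow_climbers(stats, ctr_threshold=0.10, volume_threshold=1000, min_days=3)
print(slow)
```

If many items are flagged, that suggests the system as a whole is slow to promote good new content rather than one item being unlucky.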

Cold start solutions

Cold start methods are already well covered online, so classic methods (such as bandit strategies and tag-based recall of new content) will not be repeated here. This article focuses on two cold start techniques specific to rich content: the natural combination of cold start with the ascending channel, and content understanding based on cross-modal technology.

Ascending channel

The content ascending channel is a distribution strategy in the recommendation system that aims to surface as much high-quality content as possible: content is selected and promoted layer by layer until hits emerge.

The figure above shows the content ascending channel of one platform (the picture comes from the Internet). Taking it as an example: the recommendation system first selects content from the content pool for a first-stage distribution test with about 300 exposures, then filters by data thresholds. Content that meets the thresholds enters the second stage and receives more exposure, and so on, until it becomes a hit across the whole network.
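The stage-by-stage filtering can be sketched as a small loop. Only the first-stage quota of about 300 exposures comes from the example above; the later quotas, the CTR thresholds, and the `measure_ctr` callback are hypothetical placeholders, and a real channel would likely use several metrics rather than CTR alone.

```python
def run_ascending_channel(candidates, stages, measure_ctr):
    """Promote candidates stage by stage.

    stages: ordered list of (exposure_quota, min_ctr) tuples.
    measure_ctr(content_id, exposures) -> CTR observed after that many exposures.
    Returns (reached, hits): the highest stage each item entered, and the
    items that passed every stage.
    """
    reached = {content_id: 0 for content_id in candidates}
    survivors = list(candidates)
    for stage_index, (quota, min_ctr) in enumerate(stages, start=1):
        next_round = []
        for content_id in survivors:
            reached[content_id] = stage_index
            if measure_ctr(content_id, quota) >= min_ctr:
                next_round.append(content_id)
        survivors = next_round
    return reached, survivors

# Stage 1 uses ~300 exposures as in the example; other numbers are assumptions.
stages = [(300, 0.05), (3000, 0.08), (30000, 0.10)]
true_ctr = {"a": 0.12, "b": 0.06, "c": 0.02}

def observed_ctr(content_id, exposures):
    # Deterministic stand-in for serving `exposures` impressions and logging clicks.
    return true_ctr[content_id]

reached, hits = run_ascending_channel(["a", "b", "c"], stages, observed_ctr)
print(reached, hits)
```

Raising the threshold at each stage reflects the idea that larger exposure budgets should go only to content that keeps proving itself.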

For a rich-content recommendation system, as mentioned above, usually only the most recently released content is promoted, in order to keep the platform's content fresh. The cold start of new content and the ascending channel are therefore naturally combined.

Like Douyin, the NetEase Cloud Music short video business has built a content promotion channel of this kind and, drawing on the characteristics of NetEase Cloud Music, can also use popular songs to assist promotion. Here are some practical lessons:
(1) From early personalization to "breaking the circle" in the final stage. In the early audition stage of the ascending channel, the volume of content is huge and its quality uneven, so distribution should be as personalized as possible. In the later stages, the hits that have risen start to "break the circle" (appeal beyond their original audience), and the need for personalization decreases, so the content should be boldly recommended to more user groups.
(2) New content should be distributed to highly active users first, so that new content of uncertain quality does not drive away less active users whose attachment to the platform is not yet firm. In a feed-style consumption experience (whether a single-column full-screen feed or a double-column waterfall feed), the deeper an exposure position is, the more active the users who reach it tend to be. The algorithm therefore increases the share of new videos proportionally at deeper feed positions to achieve this.
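The depth-based allocation in point (2) could look roughly like the sketch below, where the share of slots given to new content grows linearly with feed position. The linear schedule, the parameter values, and the function names are assumptions for illustration; the article does not specify the actual schedule.

```python
def new_content_ratio(position, base_ratio, growth_per_slot, max_ratio):
    """Share of slots reserved for new content at a given feed depth (0-based)."""
    return min(max_ratio, base_ratio + growth_per_slot * position)

def build_feed(established, fresh, length,
               base_ratio=0.0, growth_per_slot=0.02, max_ratio=0.5):
    """Interleave fresh items into a feed so that their share rises with depth.

    An accumulator collects the per-slot ratio; each time it reaches 1.0,
    the next slot goes to a fresh item. This keeps the feed deterministic.
    """
    feed, owed = [], 0.0
    established, fresh = list(established), list(fresh)
    for position in range(length):
        owed += new_content_ratio(position, base_ratio, growth_per_slot, max_ratio)
        if owed >= 1.0 and fresh:
            feed.append(fresh.pop(0))
            owed -= 1.0
        elif established:
            feed.append(established.pop(0))
        elif fresh:
            feed.append(fresh.pop(0))
    return feed

feed = build_feed([f"old{i}" for i in range(20)], [f"new{i}" for i in range(5)], 20)
fresh_positions = [i for i, item in enumerate(feed) if item.startswith("new")]
print(fresh_positions)
```

With these settings, no new item appears in the shallow part of the feed, and the gaps between new items shrink as the user scrolls deeper.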

The ascending channel was first proposed by short video platforms such as Douyin and has since become an indispensable technical framework for major rich content platforms. To some extent, it reflects a platform's content values: innovation and encouragement of original work. The distribution efficiency of the ascending channel has also become key to the success or failure of cold start.

Cross-modal content understanding

If the ascending channel guarantees cold start traffic for new content, content understanding is the key to using that traffic efficiently and the main basis for distributing new content in a personalized way. Rich content places the highest demands on content understanding because it spans multiple modalities such as pictures, video, and text, and it has therefore become a proving ground for cross-modal technology.

The main role of cross-modal technology is to extract (represent) information from content in different modalities and to integrate it with information from the downstream recommendation system.

First: information extraction. The previously popular approach was to produce a content vector separately for each modality (for example, ResNet or Swin Transformer for images, and BERT for text). More recent techniques go a step further and integrate information across modalities at the information compression stage. For example, the popular CLIP framework builds paired image-text samples and trains the model with a matching loss so that it produces image and text vector representations in a shared space.

The figure above shows the matching loss in the CLIP framework: paired pictures and text are shuffled to construct positive and negative samples.
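To make the matching loss concrete, here is a minimal pure-Python sketch of a symmetric contrastive loss over a batch of paired vectors: pair i is the positive, and every other image-text combination in the batch is a negative. The tiny 2-dimensional vectors are made up, the temperature of 0.07 is an assumed fixed value (CLIP learns it during training), and a real implementation would use a tensor library and encoder networks.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def cross_entropy_row(logits, target_index):
    """Softmax cross-entropy for one row, computed stably via log-sum-exp."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[target_index]

def clip_matching_loss(image_vecs, text_vecs, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired image/text vectors."""
    images = [normalize(v) for v in image_vecs]
    texts = [normalize(v) for v in text_vecs]
    n = len(images)
    # logits[i][j] is the scaled cosine similarity of image i and text j.
    logits = [[dot(img, txt) / temperature for txt in texts] for img in images]
    image_to_text = sum(cross_entropy_row(logits[i], i) for i in range(n)) / n
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    text_to_image = sum(cross_entropy_row(columns[j], j) for j in range(n)) / n
    return (image_to_text + text_to_image) / 2

# Made-up embeddings: well-aligned pairs versus pairs matched to the wrong text.
aligned = clip_matching_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
mismatched = clip_matching_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned, mismatched)
```

The loss is close to zero when each image is most similar to its own text and large when the pairs are mismatched, which is what pushes the two encoders toward a shared vector space.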

Second: integration with downstream recommendation information. Behavioral data from the downstream recommender system plays a key supervisory role in cross-modal information integration. Here are two frameworks for information integration:

  • Information integration based on vector fitting: The core idea is to make the cross-modal content vector of a piece of content fit, as closely as possible, the vector representation computed from behavioral data. A representative example is CB2CF.
  • Information integration based on a twin-tower user preference model: The core idea is to predict a user's content preference with a twin-tower model whose content tower takes only the raw cross-modal vectors as input. The advantage over CB2CF is that the twin-tower structure lets user behavior data shape the content representation more directly, avoiding the information loss that occurs when CF first compresses behavior data into vectors.
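The vector-fitting idea can be illustrated with a toy example: learn a mapping from content vectors to behavior-based (CF) vectors on warm items, then apply it to a cold item that has no behavior data yet. This sketch uses a plain linear map trained with SGD as a stand-in; CB2CF itself uses a deeper model, and all vectors and hyperparameters below are made up for illustration.

```python
def apply_map(weights, x):
    """Multiply the weight matrix (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def fit_linear_map(content_vecs, cf_vecs, epochs=200, learning_rate=0.1):
    """Fit W so that W applied to a content vector approximates its CF vector.

    Minimizes squared error with plain per-sample gradient descent.
    """
    in_dim, out_dim = len(content_vecs[0]), len(cf_vecs[0])
    weights = [[0.0] * in_dim for _ in range(out_dim)]
    for _ in range(epochs):
        for x, y in zip(content_vecs, cf_vecs):
            pred = apply_map(weights, x)
            for r in range(out_dim):
                error = pred[r] - y[r]
                for c in range(in_dim):
                    weights[r][c] -= learning_rate * error * x[c]
    return weights

# Warm items: hypothetical content vectors paired with CF vectors from behavior data.
content = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
cf = [[1.0, 0.5], [0.0, 0.5], [-1.0, 0.0], [1.0, 1.0]]
weights = fit_linear_map(content, cf)

# Cold item: only its content vector is known, so its CF vector is predicted.
cold_item_cf = apply_map(weights, [0.0, 1.0, 1.0])
print(cold_item_cf)
```

The predicted vector can then be used in the same I2I recall that already works for warm items, which is how a new item gets personalized distribution before it has accumulated behavior data.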

On the cross-modal front at Cloud Music, we have experimented in several areas and achieved positive results:

  • I2I recall based on CB2CF improved the distribution efficiency of new content (short videos) by about 25%.
  • I2I recall based on the twin-tower model framework increased the click-through rate by more than 20% compared with CB2CF.
  • The CLIP-based cross-modal representation of images and text improved the accuracy of the corresponding recall source (measured by NDCG) by more than 15% in offline validation.
  • Combining cross-modal technology with users' long-term interests to optimize the video ascending channel even doubled the average click-through rate in some stages.
  • Applying cross-modal content understanding to video recommendation review saved about 40% of review manpower.

Efficiency of the twin-tower model framework compared with CB2CF in Cloud Music video recommendation. Blue is I2I recall based on the twin-tower model framework, and green is I2I recall based on CB2CF.

Summary

The significance of content cold start for a recommendation system is not limited to optimizing click-through rate; it reflects the platform's overall values for content distribution. How different cold-start traffic allocation strategies affect the platform's eventual ecosystem is also a direction worth studying in depth, and Cloud Music has achieved some meaningful results here. In addition, research on cross-modal technology is still in its infancy compared with downstream recommendation systems. There is still plenty of room to use technology to reduce the reliance on manual work when onboarding content (review + tagging + cold start).

This article is published by the NetEase Cloud Music technical team. Reprinting in any form without authorization is prohibited. We recruit for all kinds of technical positions year-round. If you are ready to change jobs and happen to like Cloud Music, join us at staff.musicrecruit@service.netease.com.
