Foreword

This article describes how BIGO, a live video streaming company with 400 million users worldwide, uses the Milvus vector search engine to deduplicate massive volumes of short videos. With Milvus, Likee, BIGO's short video product, keeps each search within 200 ms while maintaining a high recall rate. We also scale Milvus horizontally to raise vector query throughput and keep business queries efficient.

Business background

Since its founding in 2014, BIGO has launched a series of audio and video social and content products, such as BIGO LIVE and Likee, built on its audio and video processing technology, global real-time audio and video transmission technology, and artificial intelligence technology. As of the second quarter of 2020, Likee had 150 million monthly active users on mobile, and the system has to process a large number of user-uploaded videos every day. To recommend high-quality content to users, the system needs to filter duplicate and low-quality content out of this massive volume of videos.

Deduplication process

We use deep learning methods for deduplication.

First, we extract 15-20 frames from each uploaded video and convert each frame into a feature vector. We then search a base library of more than 700 million entries, look up the videos that correspond to the top-k vectors returned, and run a more detailed video similarity calculation on those candidates.
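As a rough illustration of the search step, the sketch below runs a brute-force top-k search over a tiny in-memory base library. It is not how Milvus works internally, and it is not BIGO's code: the cosine metric, the `top_k_search` helper, the `(video_id, vector)` layout, and the toy vectors are all assumptions made for the example.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_search(query, base_library, k):
    """Return the k (video_id, score) pairs most similar to `query`.

    `base_library` is a list of (video_id, frame_vector) pairs
    (a hypothetical layout chosen for this sketch).
    """
    scored = ((cosine_similarity(query, vec), vid) for vid, vec in base_library)
    best = heapq.nlargest(k, scored)  # highest scores first
    return [(vid, score) for score, vid in best]


# Toy base library: several frame vectors per video (made-up data).
base_library = [
    ("video_a", [1.0, 0.0, 0.0]),
    ("video_a", [0.9, 0.1, 0.0]),
    ("video_b", [0.0, 1.0, 0.0]),
    ("video_c", [0.0, 0.0, 1.0]),
]

new_frame = [1.0, 0.05, 0.0]  # feature vector of one frame of a new video
hits = top_k_search(new_frame, base_library, k=2)

# The videos behind the top-k vectors become candidates for detailed comparison.
candidate_videos = {vid for vid, _ in hits}
```

A linear scan like this costs one similarity computation per stored vector for every query, which is why a dedicated vector search engine becomes necessary at the scale described here.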

Vector similarity search at this scale has to handle billions of records, with a large volume of new data added every day, which places very high demands on the performance of the vector search system.

After a thorough analysis and comparison, we adopted Milvus, a distributed vector search engine, for vector similarity retrieval.

Overall structure

Next, we introduce the overall business architecture for short video deduplication with Milvus.

As shown in the figure below, newly uploaded videos on the Likee platform are written to Kafka in real time, consumed by kafka-consumer, and passed into the review process. For content that passes review, a deep learning model extracts video features, converting unstructured data (video) into structured data (feature vectors). The system then packages the feature vectors and sends a request to the video similarity review program.

Video deduplication business architecture

Feature extraction converts each video into multiple feature vectors. Milvus first builds an index over these vectors, the index is stored in Ceph, and Milvus query nodes then load it to serve searches. At the same time, depending on business needs, we also store each video ID and its corresponding feature vectors in TiDB or Pika.
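One possible way to bind each stored vector to its video, sketched below, is to pack the video ID and the frame index into a single integer vector ID, and to keep a separate map from video ID to feature vectors for the later scoring step. This encoding is an assumption for illustration and not a description of BIGO's actual schema: `FRAME_BITS`, `make_vector_id`, `split_vector_id`, `register_video`, and the dictionary standing in for TiDB or Pika are all hypothetical.

```python
FRAME_BITS = 5  # assumption: at most 32 frames per video (the article cites 15-20)
FRAME_MASK = (1 << FRAME_BITS) - 1


def make_vector_id(video_id, frame_index):
    """Pack an integer video ID and a frame index into one integer vector ID."""
    if not 0 <= frame_index <= FRAME_MASK:
        raise ValueError("frame_index out of range")
    return (video_id << FRAME_BITS) | frame_index


def split_vector_id(vector_id):
    """Recover (video_id, frame_index) from a packed vector ID."""
    return vector_id >> FRAME_BITS, vector_id & FRAME_MASK


# In-memory stand-in for the key-value store (TiDB or Pika in the article).
vector_store = {}


def register_video(video_id, frame_vectors):
    """Store the video's vectors and return the IDs to use in the vector index."""
    vector_store[video_id] = list(frame_vectors)
    return [make_vector_id(video_id, i) for i in range(len(frame_vectors))]


ids = register_video(1001, [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
```

With a scheme like this, a vector ID returned by a search can be mapped back to its video with `split_vector_id` without an extra lookup.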

Video Similarity Retrieval

As the process above shows, the core of this solution is similarity retrieval over a massive number of feature vectors.

The similarity-audit (similarity check) service in the figure above uses the batch search function of Milvus to run a similarity search for all the feature vectors of each new video, recalling the 100 most similar vectors for each feature vector (each recalled vector is bound to its corresponding video ID). Next, it deduplicates the video IDs recalled across these searches and queries the corresponding feature vectors from TiDB or Pika. Finally, it computes a video similarity score between each retrieved set of feature vectors and the feature vectors of the requested video, and returns the video ID with the highest score as the result. This completes the video similarity retrieval.

The complete process is shown in the following figure:

Business process of the similarity-audit similarity check
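The retrieval steps described in this section can be sketched roughly as follows. The search results are passed in as plain lists, and a dictionary stands in for TiDB or Pika. The article does not specify the scoring rule, so the one used here (the mean, over the new video's frames, of the best cosine match among a candidate's frames) is an assumption, as are the function names and the toy data.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def video_similarity(query_frames, candidate_frames):
    """Assumed video-level score: average best frame-to-frame match."""
    if not query_frames or not candidate_frames:
        return 0.0
    best = [max(cosine(q, c) for c in candidate_frames) for q in query_frames]
    return sum(best) / len(best)


def similarity_audit(query_frames, batch_hits, vector_store):
    """Return (video_id, score) of the most similar known video.

    batch_hits: one list per query frame, holding the video IDs bound to the
        vectors recalled for that frame (top 100 in the article).
    vector_store: mapping video_id -> list of frame vectors.
    Returns (None, -inf) when no candidate can be scored.
    """
    # Deduplicate the video IDs recalled across all per-frame searches.
    candidates = {vid for hits in batch_hits for vid in hits}

    best_id, best_score = None, float("-inf")
    for vid in sorted(candidates):  # sorted for deterministic tie-breaking
        frames = vector_store.get(vid)  # fetch the candidate's vectors
        if not frames:
            continue
        score = video_similarity(query_frames, frames)
        if score > best_score:
            best_id, best_score = vid, score
    return best_id, best_score


# Toy data (made up): "dup" matches the new video frame for frame.
vector_store = {
    "dup": [[1.0, 0.0], [0.0, 1.0]],
    "other": [[1.0, 1.0]],
}
query_frames = [[1.0, 0.0], [0.0, 1.0]]
batch_hits = [["dup", "other"], ["other", "dup"]]

best_id, best_score = similarity_audit(query_frames, batch_hits, vector_store)
```

In practice, the returned score would be compared against a threshold before a video is treated as a duplicate; the article does not state one, so none is shown.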

Summary and Outlook

That concludes our account of using Milvus for short video deduplication in the Likee business. As a high-performance, high-recall distributed vector search engine, Milvus has performed very well in Likee's short video deduplication and has greatly helped BIGO's business development.

BIGO hopes to work more closely with Milvus in the future, in areas such as reviewing or banning illegal content and personalized video recommendation, to advance the business of both parties. We also look forward to the Milvus community continuing to grow.


About Likee

With high-quality and diverse entertainment content, Likee has become a pioneer and benchmark among trend-setting short video social products worldwide.

  • In mid-2020, Likee's mobile monthly active users reached 150 million.
  • At the end of September 2019, Likee's mobile monthly active users reached 100.2 million. It ranked among the top five on Google Play's global download list, surpassing well-known apps such as Instagram and Snapchat, and was second only to Facebook in downloads.
  • In mid-2019, Likee's mobile monthly active users reached 80.7 million.
  • In 2017, BIGO founded the short video community Likee, which officially launched on the App Store in August of that year for overseas markets and won the Best Entertainment App of the Year in the Google App Market that same year.
  • Founded in Singapore in 2014 by David Li and Jason Hu, BIGO is an artificial intelligence technology company.

About the authors

Guo Xinyang, Head of BIGO Machine Learning Platform, Senior Staff Engineer

Han Baoyu, BIGO Machine Learning Platform Team, Engineer

About the editors

Ye Xiong, Zilliz Community Intern

Zang Peng, Zilliz Community Intern


With a vision to redefine data science, Zilliz is committed to building a global leader in open source technology innovation and unlocking the hidden value of unstructured data for enterprises through open source and cloud-native solutions.
Zilliz built the Milvus vector database to accelerate the development of a next-generation data platform. Milvus is a graduated project of the LF AI & Data Foundation. It can manage large volumes of unstructured data and is widely used in areas such as drug discovery, recommendation systems, and chatbots.

