About the Author
Li Yunmei is a data engineer at Zilliz and a graduate of the Computer Science Department of Huazhong University of Science and Technology. Since joining Zilliz, Li has focused on exploring solutions built around the open-source vector database Milvus and helping users build applications for their scenarios. Li follows natural language processing and search/recommendation systems closely, and enjoys reading every day.

"The amount of data is too big, why is the recall so slow?"

"Business data is updated too fast, dynamic updates can't keep up 💥"

"It's just too hard to deploy 🤮"

If you work on recommendation system engineering, you probably hear complaints like these all the time. I believe that after reading this article, you will come away with some new ideas and approaches.

Before diving into the project, let's briefly review what a recommendation system is. Simply put, a recommendation system decides, from a huge pool of information, what specific content to show each user based on that user's individual needs. A recommendation system is usually divided into two stages: recall and ranking. Recall is the first stage: based on user and item features, it quickly picks out, from a massive inventory, a subset of items the user may be interested in and hands them to the ranking stage. Ranking then scores and re-orders all of the recalled content, and the highest-scoring results are recommended to the user.

Want to implement a recommendation system efficiently? This article takes product recall as an example and shows how to deploy a stable, easy-to-use, industrial-grade recommendation system by combining the recall algorithm MIND, PaddleRec (the large-scale model library in the Baidu PaddlePaddle ecosystem), and the open-source vector database Milvus. Combining PaddleRec and Milvus makes development simpler and deployment more flexible; it also lets you verify model performance quickly and improve iteration efficiency, achieving fast recall without sacrificing system stability.

System structure

This project consists of four steps: data processing, model training, model testing, and the product recall example. In the overall product recall flow, the system first reads the item vectors out of the trained model and imports them into Milvus for storage. In the recall phase, the system feeds a user's historical click sequence into the MIND model to obtain four user vectors, each representing a different interest of the user. It then performs a similarity search against the item vectors in Milvus and retrieves the top_k most similar items for each interest vector. Finally, the four groups of items are sorted together by similarity to obtain the overall top_k items, which are the items we want to recall.
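To make the merge step of this flow concrete, here is a minimal sketch in Python; search_milvus is a hypothetical helper standing in for the actual Milvus query, and the names are illustrative rather than the project's code.

def recall_for_user(user_interest_vectors, search_milvus, top_k=50):
    # user_interest_vectors: the 4 interest vectors produced by MIND for one user
    # search_milvus(vec, top_k): returns a list of (item_id, similarity) pairs
    candidates = {}
    for vec in user_interest_vectors:
        for item_id, score in search_milvus(vec, top_k):
            # keep the best similarity if an item is hit by several interest vectors
            if item_id not in candidates or score > candidates[item_id]:
                candidates[item_id] = score
    # merge the per-interest results and keep the overall top_k items
    ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]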

Next, I will introduce the main components used in this project:

MIND

The full name of the MIND algorithm is Multi-Interest Network with Dynamic Routing for Recommendation at Tmall. It is a recall algorithm for recommendation developed by Alibaba's algorithm team.

Before MIND, most existing recommendation models used a single vector to represent a user's multiple interests, which cannot capture those interests well. MIND instead uses multiple interest vectors to represent the same user's interests in different directions.

The MIND algorithm proposes a multi-interest network with dynamic routing to handle users' different interests in the recall phase. Specifically, it designs a multi-interest extractor layer based on a capsule routing mechanism, which clusters historical behaviors and extracts the user's distinct interests.

The complete network structure of MIND is shown in the figure below. MIND takes user behaviors and user profile features as input and outputs vectors representing the user's interests. The item features from the input layer are first converted into embeddings by the embedding layer, and the embeddings of each item are then averaged by a pooling layer. Next, the user behavior embeddings are fed into the multi-interest extractor layer to generate interest capsules. Finally, the interest capsules are concatenated with the user profile embedding and passed through several ReLU layers to obtain the user representation vectors. In addition, a label-aware attention layer is introduced during training to guide the training process.

[Figure: the complete MIND network structure]
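To build intuition for the multi-interest extractor layer, the short sketch below is a simplified, NumPy-only version of behavior-to-interest (B2I) dynamic routing as described in the MIND paper; it is an illustration only and is not taken from the project's net.py (the softmax axis and other details vary between implementations).

import numpy as np

def squash(z, eps=1e-9):
    # Capsule non-linearity: keep the direction, map the norm into (0, 1).
    norm_sq = np.sum(z * z, axis=-1, keepdims=True)
    return (norm_sq / (1.0 + norm_sq)) * z / np.sqrt(norm_sq + eps)

def b2i_dynamic_routing(behavior_emb, bilinear_matrix, k_max=4, iters=3):
    # behavior_emb: [seq_len, d_in] embeddings of the user's clicked items
    # bilinear_matrix: [d_in, d_out] shared bilinear mapping matrix S
    seq_len = behavior_emb.shape[0]
    mapped = behavior_emb @ bilinear_matrix                    # [seq_len, d_out]
    logits = np.random.normal(size=(k_max, seq_len))           # routing logits, not trained
    for _ in range(iters):
        weights = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
        capsules = squash(weights @ mapped)                    # [k_max, d_out] interest capsules
        logits = logits + capsules @ mapped.T                  # update logits by agreement
    return capsules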

PaddleRec

PaddleRec is a large-scale search and recommendation model library from the Baidu PaddlePaddle ecosystem. Its goal is to provide a one-stop solution for building recommendation systems, so that AI practitioners in the search and recommendation field, especially AI application developers, can quickly and easily build a recommendation system around their own business.


As mentioned at the beginning, practitioners who build their own recommendation system around their business often run into problems such as poor usability and difficult deployment. PaddleRec addresses these pain points of the traditional approach as follows:

  • Ease of use: PaddleRec open-sources many classic industry models for recall, ranking, fusion, and multi-task learning, so you can quickly verify model performance and improve iteration efficiency. It also provides easy-to-use, high-performance distributed training, deeply optimized for large-scale sparse scenarios, with good horizontal scalability and speed-up ratios; users can quickly set up a training environment based on K8s.
  • Deployment support: PaddleRec provides ready-to-use online model deployment solutions that balance flexible development with high performance.

In addition, the PaddleRec project provides a variety of classic recommendation models. For details, see the GitHub repository: https://github.com/PaddlePaddle/PaddleRec

Milvus database

Milvus is an open-source vector database built on a cloud-native architecture that supports querying and managing vector data generated by machine learning models or neural networks. Milvus extends the capabilities of best-in-class approximate nearest neighbor (ANN) search libraries such as Faiss, NMSLIB, and Annoy, and offers on-demand scaling, unified stream and batch processing, and high availability. Milvus aims to simplify unstructured data management and provide a consistent user experience across deployment environments.

  • Built on high-performance columnar storage and vector indexes such as Faiss and HNSWlib, Milvus can query tens of millions of vectors with millisecond-level recall.
  • Thanks to its cloud-native design, Milvus scales horizontally with ease and can support storage and compute at any scale.
  • Milvus lets users focus on the semantics of unstructured data, without having to worry about complex issues such as data persistence and load balancing.
  • Milvus adopts an architecture that separates storage from compute.

These characteristics allow Milvus to handle the frequent data updates typical of recommendation systems, meet the recall-speed requirements of the recall phase, and maintain system stability.

This is also one of the reasons we chose Milvus for the large-scale vector similarity retrieval in the recall system described in this article, rather than using approximate nearest neighbor libraries such as Faiss or Annoy directly to store and retrieve the vectors.

In the Milvus open-source community you can find many more Milvus application scenarios: reverse image search, intelligent question answering, similar text retrieval, video retrieval, and more. If you have vector retrieval needs in an AI application, Milvus can help.
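For readers new to Milvus, the snippet below sketches the basic insert-and-search workflow with the pymilvus 2.x Collection API; the collection name, vector dimension, and index parameters are assumptions for illustration, and the AI Studio project may target a different Milvus/pymilvus version.

import numpy as np
from pymilvus import connections, Collection, CollectionSchema, FieldSchema, DataType

# Connect to a locally running Milvus instance (host and port are assumptions).
connections.connect("default", host="127.0.0.1", port="19530")

# Define a collection holding the item vectors (name and dim are illustrative).
fields = [
    FieldSchema(name="item_id", dtype=DataType.INT64, is_primary=True),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=64),
]
collection = Collection("mind_item_vectors", CollectionSchema(fields))

# Insert item vectors (random placeholders here; in practice they come from the trained model).
item_ids = list(range(1000))
item_vecs = np.random.random((1000, 64)).astype("float32").tolist()
collection.insert([item_ids, item_vecs])

# Build an index, load the collection, and search with one user interest vector.
collection.create_index(
    field_name="embedding",
    index_params={"index_type": "IVF_FLAT", "metric_type": "IP", "params": {"nlist": 1024}})
collection.load()

user_vec = np.random.random((1, 64)).astype("float32").tolist()
results = collection.search(
    data=user_vec, anns_field="embedding",
    param={"metric_type": "IP", "params": {"nprobe": 16}}, limit=50)
for hit in results[0]:
    print(hit.id, hit.distance)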

System implementation

The full implementation of the project has been published on Baidu AI Studio, where you can start the environment and run the project directly: https://aistudio.baidu.com/aistudio/projectdetail/2250360?contributionType=1&shared=1

The following sections cover the data, model implementation and training, and model testing, and show how to build a recall service with the trained model and Milvus.

Data introduction

The original dataset used in this article is the AmazonBook dataset provided by ComiRec.

This project directly uses the data download and processing scripts provided by PaddleRec. For details, see the AmazonBook dataset in the PaddleRec repository on GitHub: https://github.com/PaddlePaddle/PaddleRec/tree/release/2.1.0/datasets/AmazonBook

The format of the obtained training data set is as follows:

0,17978,0
0,901,1
0,97224,2
0,774,3
0,85757,4

Each column represents:

  • uid: user id
  • item_id: id of the item clicked by the user
  • time: click order (timestamp)

The format of the test data set is as follows:

user_id:487766 hist_item:17784 hist_item:126 hist_item:36 hist_item:124 hist_item:34 hist_item:1 hist_item:134 hist_item:6331 hist_item:141 hist_item:4336 hist_item:1373 eval_item:1062 eval_item:867 eval_item:62
user_id:487793 hist_item:153428 hist_item:132997 hist_item:155723 hist_item:66546 hist_item:335397 hist_item:1926 eval_item:1122 eval_item:10105
user_id:487820 hist_item:268524 hist_item:44318 hist_item:35153 hist_item:70847 eval_item:238318

Each column represents:

  • uid: user id
  • hist_item: ids of the items historically clicked by the user; multiple hist_item fields are ordered by the timestamps of the user's clicks
  • eval_item: the evaluation sequence for recall

Model implementation and training

In this step, PaddleRec will be used to implement a recall model in the recommendation system based on MIND, and the model will be trained using AmazonBook data.

Model input:

The script that reads the original training dataset in this project is /home/aistudio/recommend/model/mind/mind_reader.py

dygraph_model.py processes the data into model input with the following code. This step sorts each user's click records from the raw data by timestamp and combines them into a sequence. It then randomly selects an item_id from the sequence as target_item, takes up to maxlen items of the sequence before target_item as the model input hist_item (padding with 0 if the sequence is shorter), and uses seq_len to record the actual length of the hist_item sequence.

def create_feeds_train(self, batch_data):    
  hist_item = paddle.to_tensor(batch_data[0], dtype="int64")    
  target_item = paddle.to_tensor(batch_data[1], dtype="int64")    
  seq_len = paddle.to_tensor(batch_data[2], dtype="int64")    
  return [hist_item, target_item, seq_len]

Model network definition:

The network structure of the MIND model is defined in /home/aistudio/recommend/model/mind/net.py.
The key parts of net.py are excerpted below:

import paddle
import paddle.nn as nn
import paddle.nn.functional as F


class Mind_Capsual_Layer(nn.Layer):
    def __init__(self, input_units, output_units, iters, maxlen, k_max,
                 init_std, batch_size):
        super(Mind_Capsual_Layer, self).__init__()
        self.iters = iters
        self.input_units = input_units
        self.output_units = output_units
        self.maxlen = maxlen
        self.init_std = init_std
        self.k_max = k_max
        self.batch_size = batch_size
        # B2I routing logits, randomly initialized and kept fixed (not trained)
        self.routing_logits = self.create_parameter(
            shape=[1, self.k_max, self.maxlen],
            attr=paddle.ParamAttr(
                name="routing_logits", trainable=False),
            default_initializer=nn.initializer.Normal(
                mean=0.0, std=self.init_std))
        # bilinear mapping matrix S shared by all behavior embeddings
        self.bilinear_mapping_matrix = self.create_parameter(
            shape=[self.input_units, self.output_units],
            attr=paddle.ParamAttr(
                name="bilinear_mapping_matrix", trainable=True),
            default_initializer=nn.initializer.Normal(
                mean=0.0, std=self.init_std))


class MindLayer(nn.Layer):
    # __init__ and several helper methods are omitted here; see net.py for the full class

    def label_aware_attention(self, keys, query):
        # weight each interest capsule by its similarity to the target item embedding
        weight = paddle.sum(keys * query, axis=-1, keepdim=True)
        weight = paddle.pow(weight, self.pow_p)  # [B, k_max, 1]
        weight = F.softmax(weight, axis=1)
        output = paddle.sum(keys * weight, axis=1)
        return output, weight

    def forward(self, hist_item, seqlen, labels=None):
        hit_item_emb = self.item_emb(hist_item)  # [B, seqlen, embed_dim]
        user_cap, cap_weights, cap_mask = self.capsual_layer(hit_item_emb, seqlen)
        if not self.training:
            # at inference time, return the user's interest capsules directly
            return user_cap, cap_weights
        target_emb = self.item_emb(labels)
        user_emb, W = self.label_aware_attention(user_cap, target_emb)

        return self.sampled_softmax(
            user_emb, labels, self.item_emb.weight,
            self.embedding_bias), W, user_cap, cap_weights, cap_mask

The class Mind_Capsual_Layer defines the multi-interest extractor layer based on the capsule routing mechanism. The function label_aware_attention() implements the label-aware attention technique of the MIND algorithm. In the forward() function of the MindLayer class, the user's behavior features are modeled to produce the user representation vectors.

Model optimization:
This project uses the Adam optimizer. The implementation is in the script /home/aistudio/recommend/model/mind/dygraph_model.py, as follows:

def create_optimizer(self, dy_model, config):
    lr = config.get("hyper_parameters.optimizer.learning_rate", 0.001)
    optimizer = paddle.optimizer.Adam(
        learning_rate=lr, parameters=dy_model.parameters())
    return optimizer

In addition, all hyperparameters in PaddleRec live in config.yaml, so you only need to modify this one file to compare model results and quickly verify changes, which greatly improves iteration efficiency. When training, a poor result may be caused by underfitting or overfitting; one way to improve it is to increase the number of training epochs so the model trains more fully, which only requires changing the epochs parameter in config.yaml. You can also tune the model by changing the optimizer (optimizer.class) or the learning rate (learning_rate). Some of the parameters in config.yaml are shown below:

runner:
  use_gpu: True
  use_auc: False
  train_batch_size: 128
  epochs: 20
  print_interval: 10
  model_save_path: "output_model_mind"

# hyper parameters of user-defined network
hyper_parameters:
  # optimizer config
  optimizer:
    class: Adam
    learning_rate: 0.005

Model training:
The training script for this project is /home/aistudio/recommend/model/trainer.py. Run the following command to start training the model.

python -u trainer.py -m mind/config.yaml

Model testing:
This step uses the test dataset to evaluate the recall rate and other metrics of the model produced by training.

During testing, the item vectors saved during training are first read out of the model and imported into the Milvus database. Next, the script /home/aistudio/recommend/model/mind/mind_infer_reader.py is used to read the test dataset.

The saved model is then loaded, and the test dataset is fed into it to obtain each user's multi-interest vectors; the model returns four vectors per user. Finally, each user vector is searched against the item collection in Milvus to find the 50 most similar items, which are the items we want to recommend to the user.

Test the model by running the following command:

python -u infer.py -m mind/config.yaml -top_n 50

The model testing step reports several evaluation metrics, such as Recall@50, NDCG@50, and HitRate@50, which can be used to judge the model's quality. Since this project is only a demonstration and tutorial, the model is not trained sufficiently, so these values are not ideal. In a real business setting you should train for more epochs to ensure model quality. Correspondingly, more model checkpoints will be saved during training; it is generally recommended to keep the last few checkpoints for testing and then pick the best one based on the test results. You can also train with different optimizers and learning rates, training and testing multiple times, and select the best model for the production system.
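For reference, Recall@K and HitRate@K for a single user can be computed from the recalled items and the eval_item list roughly as follows (an illustrative sketch, not the code in infer.py):

def recall_at_k(recalled_items, eval_items, k=50):
    # fraction of the user's ground-truth items that appear in the top-k recall
    hits = len(set(recalled_items[:k]) & set(eval_items))
    return hits / max(len(eval_items), 1)

def hit_rate_at_k(recalled_items, eval_items, k=50):
    # 1.0 if at least one ground-truth item appears in the top-k recall, else 0.0
    return 1.0 if set(recalled_items[:k]) & set(eval_items) else 0.0

# e.g. eval_items taken from the test set, recalled_items from the Milvus search
print(recall_at_k([17784, 126, 36, 1062], [1062, 867, 62]))    # 0.333...
print(hit_rate_at_k([17784, 126, 36, 1062], [1062, 867, 62]))  # 1.0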

Recall Service

We use the above-trained model combined with the Milvus database to implement a recommendation recall service.

The recall service exposes its API through FastAPI. Once the service is started, you can call it directly from the terminal over HTTP.

Execute the following command to start the recall service:

uvicorn main:app
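For orientation, a stripped-down version of what the recall endpoint in main.py might look like is sketched below; the request model and the two stubbed helpers are assumptions for illustration, not the project's actual code.

from typing import List, Tuple
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class RecallRequest(BaseModel):
    top_k: int = 50
    hist_item: List[List[int]]   # one variable-length click sequence per user

# The two helpers below are stubs: in the real service they would call the
# trained MIND model and the Milvus collection respectively.
def infer_user_vectors(click_seq: List[int]) -> List[List[float]]:
    return [[0.0] * 64 for _ in range(4)]            # 4 interest vectors per user

def search_milvus(vector: List[float], top_k: int) -> List[Tuple[int, float]]:
    return []                                        # (item_id, score) pairs

@app.post("/rec/recall")
def recall(req: RecallRequest):
    results = []
    for seq in req.hist_item:
        scores = {}
        for vec in infer_user_vectors(seq):
            for item_id, score in search_milvus(vec, req.top_k):
                scores[item_id] = max(score, scores.get(item_id, score))
        results.append(sorted(scores, key=scores.get, reverse=True)[:req.top_k])
    return {"item_ids": results}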

The service provides a total of four interfaces:

  • Item vector import: after starting the service, run the following command; the service reads the item vectors saved in the model and imports them into a collection in the Milvus database.

    curl -X 'POST' \
      'http://127.0.0.1:8000/rec/insert_data' \
      -H 'accept: application/json' \
      -d ''
  • Recall: this interface provides the core recall service. Given a user's item click sequence, it recalls the next items the user is likely to click. You can recall items for multiple users in one batch: hist_item in the command below is a two-dimensional array in which each row is a sequence of items one user has clicked in the past (variable-length sequences are allowed). The returned result is likewise a two-dimensional array, where each row corresponds to one user in the input and contains the recalled item ids.

    curl -X 'POST' \
      'http://127.0.0.1:8000/rec/recall' \
      -H 'accept: application/json' \
      -H 'Content-Type: application/json' \
      -d '{
      "top_k": 50,
      "hist_item": [[43,23,65,675,3456,8654,123454,54367,234561],[675,3456,8654,123454,76543,1234,9769,5670,65443,123098,34219,234098]]
    }'
  • Query the total number of items: run the following command to get the total number of item vectors stored in the Milvus database.

    curl -X 'POST' \
      'http://127.0.0.1:8000/rec/count' \
      -H 'accept: application/json' \
      -d ''
  • Delete: This interface is used to delete the data saved in the Milvus database.

    curl -X 'POST' \
      'http://127.0.0.1:8000/qa/drop' \
      -H 'accept: application/json' \
      -d ''

If you start the recall service on your local server, you can also browse and try out its interfaces at 127.0.0.1:8000/docs, as shown in the figure:

[Figure: interactive API documentation page for the recall service]

Click an interface and fill in the parameters to try the corresponding service. For example, for recall, click rec/recall, then Try it out, enter the parameters in the Request body box, and click Execute to get the result.

Summary

We chose PaddleRec to implement the MIND algorithm because the training script trainer.py and the configuration file config.yaml it provides can also be used to train other models, which makes model training and deployment very simple. The PaddleRec project also provides implementations of many classic models in the recommendation field, including the MIND model used in this project, that we can draw on. With PaddleRec, we only need to focus on the algorithm itself when implementing and training a model, without paying too much attention to deployment.

The high performance of Milvus in vector similarity search meets the recall-speed requirements of a recommendation system, and its cloud-native characteristics also satisfy the system's high-availability and stability requirements. Beyond recommendation, Milvus is widely used in computer vision (reverse image search, searching video by image, etc.), natural language processing (intelligent question answering, similar text retrieval), and other fields. The concrete implementations of these projects are open-sourced on GitHub (https://github.com/milvus-io/bootcamp); users can refer to them to see how Milvus is used, which also makes it easier for newcomers to get started.

References

[1] Li C, Liu Z, Wu M, et al. Multi-interest network with dynamic routing for recommendation at Tmall[C]//Proceedings of the 28th ACM International Conference on Information and Knowledge Management. 2019: 2615-2623.

[2] Cen Y, Zhang J, Zou X, et al. Controllable multi-interest framework for recommendation[C]//Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2020: 2942-2951.

