The MIND multi-interest recall went through two stages, offline and real-time, before its final landing in production; this article records the intermediate steps in the hope that readers can gain something from them. Concretely: first, feasibility was proved experimentally with an offline recall method, with day-level gains of pvctr +0.36%, uvctr +0.36%, dpv +0.76%; on top of that version, the MIND online estimation method was then developed, whose final gains in the transaction waterfall scenario were dpv +3.61%, per-capita collection pv +2.39%, pvctr +1.95%, uvctr +0.77%. The offline recall works like an I2I recall: after the model is trained, the user embeddings and item embeddings are computed on a single machine, the inverted recall list for each userId is obtained through faiss, and the userId is then used as the trigger of this I2I-style recall. Online estimation instead takes the user's real-time behavior, calls the model on the neuron estimation service to compute the user embedding, and then queries faiss (built from the offline item embeddings stored via the C++ engine) to obtain the recalled products. The number of interests is currently set to 3; considering the timeout budget of the recall component, the queries for the three interests are executed concurrently.

1. Introduction to multi-interest recall

1.1 MIND Multi-Interest Recall

MIND multi-interest recall is a u2i recall that proposes Behavior-to-Interest (B2I) dynamic routing for adaptively aggregating a user's behaviors into interest representation vectors. Specifically, the input of the multi-interest model is the user's behavior sequence, using the cspu_ids of the user's clicks, favorites, and shares in the waterfall scene, and the output is the user's multiple interest vectors (the number is configurable and determined by the output dimension of the network). In general, the interest vectors differ mainly along category/brand.

Having multiple interest vectors per user is a highlight of this recall: when users visit the waterfall scene they obviously do not hold a single interest, but rather focus on several different ones, which may be shoes, clothing, cosmetics, and so on. A single interest vector therefore often struggles to cover everything the user wants to see.

The recall results of a single interest vector are often limited to one category. Such a recall strategy scales poorly, easily narrows what gets pushed, and makes the head effect in the candidate pool more and more pronounced.

1.2 Some other multi-interest recall methods:

(1) Recall with both the user's long-term and short-term interests together with clicked items, so that all of the user's past interests are captured. This is also a strategy of the current online i2i.

(2) Train item2vec-style embeddings from clicked/favorited/shared items to obtain each item's embedding, then build a faiss index and take the top-n neighbors to cover all of the user's interest points (see the sketch below).
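A minimal sketch of approach (2), assuming gensim's Word2Vec for item2vec and faiss for the nearest-neighbor index; the data and parameters here are toy illustrations, not the production setup:

    # Treat each user's click/favorite/share sequence as a "sentence" of cspu_ids,
    # train skip-gram item2vec over it, then index the vectors with faiss.
    import numpy as np
    import faiss
    from gensim.models import Word2Vec

    sequences = [["cspu_1", "cspu_2", "cspu_3"],
                 ["cspu_2", "cspu_4"],
                 ["cspu_1", "cspu_3", "cspu_4"]]

    model = Word2Vec(sentences=sequences, vector_size=32, window=5,
                     min_count=1, sg=1, epochs=10)

    item_ids = model.wv.index_to_key
    item_vecs = np.stack([model.wv[i] for i in item_ids]).astype("float32")

    index = faiss.IndexFlatIP(item_vecs.shape[1])   # inner-product similarity
    index.add(item_vecs)

    def recall_for_user(seq, topn=3):
        # The union of top-n neighbors of every interacted item approximates
        # the user's full set of interest points.
        results = set()
        for cspu in seq:
            query = model.wv[cspu].reshape(1, -1).astype("float32")
            _, idx = index.search(query, topn)
            results.update(item_ids[j] for j in idx[0])
        return results

    print(recall_for_user(["cspu_1", "cspu_2"]))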

2. Multi-interest recall model

The following figure is the overall network block diagram of MIND recall

For the model, the input of each sample can be represented as a tuple $(\mathcal{I}_u, \mathcal{F}_i)$, where $\mathcal{I}_u$ denotes the set of items user $u$ has interacted with, i.e., the user's historical behavior sequence, and $\mathcal{F}_i$ denotes the features of the target item $i$, such as its brand_id and category_id.

  • Multiple vectors expressing different aspects of the user's interests are obtained through the Multi-Interest Extractor Layer;
  • A multi-interest network (MIND) with dynamic routing is proposed, which adaptively aggregates the user's historical behaviors into user representation vectors to handle the user's different interests;
  • A label-aware attention mechanism is developed for learning user representations with multiple interest vectors.

Specifically, a multi-interest extraction layer based on the capsule network mechanism is designed, which is well suited to clustering historical behaviors and extracting different interests. The core idea of the capsule network is that "the output is the result of some kind of clustering of the input".

There is an advantage here: if all the information related to a user's interests is compressed into a single representation vector, that vector becomes a bottleneck for expressing the user's diverse interests. When candidates are recalled in the recall stage, all the information about the user's different interests gets mixed together, and the relevance of the recalled items drops significantly. Therefore, multiple vectors are used to express the user's different interests: the user's historical behaviors are grouped into multiple interest capsules, with the expectation that related items belonging to the same capsule jointly express one specific aspect of the user's interest.
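To make the multi-interest extractor concrete, here is a minimal numpy sketch of B2I dynamic routing in the spirit of the MIND paper. It is a simplified illustration (one shared bilinear mapping matrix S, a fixed number of interests and routing iterations), not the production model:

    import numpy as np

    def squash(v, axis=-1):
        # Capsule non-linearity: keeps the direction, squashes the norm into [0, 1).
        sq_norm = np.sum(v ** 2, axis=axis, keepdims=True)
        return (sq_norm / (1.0 + sq_norm)) * v / np.sqrt(sq_norm + 1e-9)

    def b2i_dynamic_routing(behavior_emb, num_interests=3, iters=3, seed=0):
        """behavior_emb: (seq_len, d) embeddings of one user's behaviors.
        Returns (num_interests, d) interest capsules."""
        rng = np.random.default_rng(seed)
        seq_len, d = behavior_emb.shape
        S = rng.normal(scale=0.1, size=(d, d))   # shared bilinear mapping matrix
        u_hat = behavior_emb @ S                 # (seq_len, d)
        # B2I routing initializes the routing logits randomly (zeros would make
        # all interest capsules identical under a shared S).
        b = rng.normal(size=(num_interests, seq_len))
        for _ in range(iters):
            w = np.exp(b) / np.exp(b).sum(axis=0, keepdims=True)  # per-behavior softmax
            interests = squash(w @ u_hat)        # (num_interests, d) capsules
            b = b + interests @ u_hat.T          # update logits by agreement
        return interests

    behaviors = np.random.default_rng(1).normal(size=(8, 32))  # 8 behaviors, d = 32
    print(b2i_dynamic_routing(behaviors).shape)  # (3, 32)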

Through the multi-interest extraction layer, multiple interest capsules are built from the user behavior embeddings. During training, a label-aware attention layer lets the label item select which interest capsules to use. Specifically, for each label item we calculate the similarity between each interest capsule and the label item embedding, and use this compatibility to determine the weight of each interest capsule when composing the user representation vector for that target item. This is almost the same as the attention in DIN, except that the meanings of key and value differ: the target item is the query, and the interest capsules serve as both keys and values.

$$\vec{v}_u = \mathrm{Attention}\left(\vec{e}_i, \mathrm{V}_u, \mathrm{V}_u\right) = \mathrm{V}_u \,\mathrm{softmax}\left(\mathrm{pow}\left(\mathrm{V}_u^{\mathrm{T}} \vec{e}_i, p\right)\right)$$

where pow denotes element-wise exponentiation and $p$ is a tunable parameter that adjusts the attention distribution. When $p$ approaches 0, each interest capsule receives the same attention. When $p$ is greater than 1, capsules with larger dot products receive increasingly more weight as $p$ grows. In the limit, as $p$ approaches infinity, the attention mechanism becomes hard attention, focusing only on the largest value and ignoring the others.
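A small numpy sketch of this attention, and of how $p$ reshapes the weight distribution (toy numbers, only for illustration):

    import numpy as np

    def label_aware_attention(interests, target_emb, p=2.0):
        """interests: (K, d) interest capsules; target_emb: (d,) label item embedding.
        Returns the user vector used against this label item during training."""
        scores = interests @ target_emb          # (K,) compatibility per capsule
        logits = np.power(scores, p)             # element-wise pow(., p)
        weights = np.exp(logits - logits.max())
        weights /= weights.sum()                 # softmax over capsules
        return weights @ interests               # (d,) weighted user representation

    # Effect of p on the attention distribution, with toy compatibilities:
    scores = np.array([1.2, 2.0, 3.1])
    for p in (0.01, 1.0, 8.0):
        logits = np.power(scores, p)
        w = np.exp(logits - logits.max())
        print(p, np.round(w / w.sum(), 3))  # p near 0: uniform; large p: hard attention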

2.1 User embedding calculation part

The core task of MIND is to learn a function that maps from user behavior sequences to multi-interest vector representations, which are defined as:

$$\mathrm{V}_u = f_{\mathrm{user}}(\mathcal{I}_u) = \left(\vec{v}_u^1, \ldots, \vec{v}_u^K\right) \in \mathbb{R}^{d \times K}$$

where $\vec{v}_u^k \in \mathbb{R}^d$ is the $k$-th interest vector of user $u$, $d$ is the embedding dimension (set to 32 in this network, as in DSSM), and $K$ is the number of interest vectors. When $K = 1$, this degenerates to a single user embedding, as in other models such as YouTube DNN.

2.2 Item embedding calculation part

The embedding function of the target item is:

$$\vec{e}_i = f_{\mathrm{item}}(\mathcal{F}_i)$$

where $f_{\mathrm{item}}$ represents an Embedding & Pooling layer.
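$f_{\mathrm{item}}$ can be pictured as follows: each id feature of the target item is embedded and the results are pooled into a single vector. In the current version only cspu_id is used; brand_id and category_id below stand in for the planned extension, so the feature names and sizes are illustrative:

    import tensorflow as tf

    # Toy Embedding & Pooling layer: one embedding table per id feature,
    # average-pooled into a single item vector e_i.
    vocab_sizes = {"cspu_id": 1000, "brand_id": 100, "category_id": 50}
    dim = 32
    tables = {name: tf.Variable(tf.random.normal([v, dim]))
              for name, v in vocab_sizes.items()}

    def f_item(features):
        embs = [tf.nn.embedding_lookup(tables[name], tf.constant([idx]))
                for name, idx in features.items()]
        return tf.reduce_mean(tf.concat(embs, axis=0), axis=0)   # pooling -> (dim,)

    e_i = f_item({"cspu_id": 7, "brand_id": 3, "category_id": 12})
    print(e_i.shape)  # (32,)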

2.3 Loss function
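Following the MIND paper cited in the references, once label-aware attention produces the user vector $\vec{v}_u$, the probability that user $u$ interacts with target item $i$ is modeled as a softmax over the item pool $\mathcal{I}$, and training maximizes the log-likelihood over the training set $\mathcal{D}$:

$$\Pr(i \mid u) = \frac{\exp\left(\vec{v}_u^{\mathrm{T}} \vec{e}_i\right)}{\sum_{j \in \mathcal{I}} \exp\left(\vec{v}_u^{\mathrm{T}} \vec{e}_j\right)}, \qquad L = \sum_{(u, i) \in \mathcal{D}} \log \Pr(i \mid u)$$

Since the sum over all items is intractable, sampled softmax is used in practice, which is exactly what the negative sampling in the next section feeds.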

2.4 The way of negative sampling

Negative sampling uniformly samples several cspu_ids as negative samples. The cspu_ids here are drawn from the samples of the same period, all of which carry click behavior.
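A minimal sketch of this uniform sampling; clicked_cspu_ids stands for the click-bearing cspu_ids of the same period and is an illustrative name, not the production table:

    import numpy as np

    def sample_negatives(clicked_cspu_ids, positive_id, num_neg=10, seed=None):
        # Uniformly sample negatives from the click-bearing pool, excluding
        # the positive item itself.
        rng = np.random.default_rng(seed)
        pool = [c for c in clicked_cspu_ids if c != positive_id]
        return rng.choice(pool, size=num_neg, replace=False).tolist()

    pool = [f"cspu_{k}" for k in range(100)]
    print(sample_negatives(pool, positive_id="cspu_7", num_neg=5, seed=42))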

3. Implementation and deployment of multi-interest recall

The multi-interest recall implementation specifically involves odps, the C++ engine, dpp, and the neuron estimation service.

3.1 odps part

Among them, odps mainly handles the construction of training samples, model training, and offline monitoring of the model. The samples have two versions: the first version uses the algorithm-side user behavior table; the second version, considering the reporting delay of the first, uses the dumped click/favorite/share sequences instead. At present only cspu_id is used as a feature; the raw attributes of the product (for example, Taobao also uses brand_id and category_id) have not been used yet and will be added in subsequent iterations.

Samples are constructed with a sliding window over the past three days of data, then filtered through the pushable pool. In addition, because the model looks up the embedding of cspu_id with embedding_lookup, a mapping table from cspu_id to hash_id is also built separately during sample construction. The model is then trained.
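To illustrate why the mapping table is needed: embedding_lookup indexes rows of the embedding matrix by contiguous integer ids, so raw cspu_ids must first be mapped to hash_ids. A toy TensorFlow sketch with illustrative table contents:

    import tensorflow as tf

    # Toy cspu_id -> hash_id mapping; in production it is built on odps and
    # also shipped to the C++ engine for online use.
    cspu_to_hash = {"cspu_101": 0, "cspu_205": 1, "cspu_333": 2}

    vocab_size, dim = len(cspu_to_hash), 32
    emb_table = tf.Variable(tf.random.normal([vocab_size, dim]))

    # One user's behavior sequence, mapped to hash_ids before the lookup.
    seq = ["cspu_333", "cspu_101"]
    hash_ids = tf.constant([cspu_to_hash[c] for c in seq])

    seq_emb = tf.nn.embedding_lookup(emb_table, hash_ids)   # (2, 32)
    print(seq_emb.shape)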

3.2 C++ engine part

The C++ engine stores the cspu_id-to-hash_id mapping table, and the vector cluster stores the item embedding table.

In the online flow, the user's clklist profile is first obtained; based on the cspu_ids stored in clklist, the upstream requests the mapping table served by the C++ engine to obtain the hash_ids, and then constructs the features for neuron. Only the hash_id sequence and its length are needed here; once they are sent to neuron, the user embedding can be estimated.

After obtaining the user embedding, dpp constructs three concurrent requests to faiss to obtain the recall result for each interest, then concatenates and deduplicates them before passing the result to the subsequent layers.
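The concurrency pattern can be sketched in Python as follows (the real implementation lives in dpp and the vector cluster; index size, topn, and the thread pool here are illustrative): one ANN query per interest vector, executed in parallel to stay within the recall component's timeout, then merged and deduplicated:

    import numpy as np
    import faiss
    from concurrent.futures import ThreadPoolExecutor

    d, n_items, topn = 32, 1000, 50
    index = faiss.IndexFlatIP(d)
    index.add(np.random.default_rng(0).normal(size=(n_items, d)).astype("float32"))

    def recall_one_interest(interest_vec):
        _, ids = index.search(interest_vec.reshape(1, -1).astype("float32"), topn)
        return ids[0].tolist()

    user_interests = np.random.default_rng(1).normal(size=(3, d))  # 3 interest vectors

    # One concurrent request per interest, mirroring the three dpp requests.
    with ThreadPoolExecutor(max_workers=3) as pool:
        per_interest = list(pool.map(recall_one_interest, user_interests))

    # Concatenate and deduplicate while keeping first-seen order.
    merged = list(dict.fromkeys(i for ids in per_interest for i in ids))
    print(len(merged))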

3.2.1 Some pitfalls

The first is that the generation time of the embeddings is uncertain and there is no synchronization mechanism in between, so an inconsistency between the neuron model and the faiss embeddings will degrade the effect.

The second is that the embedding-building task executes serially on the cluster, so if many more u2i recalls are added in the future, the task will get stuck.

3.3 Neuron estimation part

The neuron estimation part needs a service to be built. Specifically, a neuron_script must be written for dpp to call neuron; neuron performs feature processing according to this script, builds the model feed, serves the model through a tf-serving service, and finally returns the user vectors to dpp.
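As a rough illustration of the serving hop, this is how a TensorFlow Serving model exposing a user-embedding signature could be queried over REST. The endpoint, model name, and feature names (hash_id_seq, seq_len) are hypothetical; in production dpp calls neuron, which runs neuron_script for feature processing rather than issuing direct HTTP requests:

    import json
    import requests

    # Hypothetical tf-serving endpoint; the feed mirrors the features built
    # by neuron_script (hash_id sequence plus its length).
    url = "http://localhost:8501/v1/models/mind_user_emb:predict"
    payload = {
        "signature_name": "serving_default",
        "instances": [{
            "hash_id_seq": [42, 7, 19],   # behavior sequence as hash_ids
            "seq_len": 3,                 # sequence length feature
        }],
    }
    resp = requests.post(url, data=json.dumps(payload), timeout=1.0)
    user_interests = resp.json()["predictions"][0]  # K interest vectors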

4. Stability of the multi-interest recall model

An online model must not only be effective but also run stably, so it requires additional offline monitoring with a corresponding blocking mechanism, plus online monitoring of the model service. The offline monitoring and blocking mechanism works at the day level: if the offline indicators of a newly trained model (including auc, pcoc, and loss) do not meet expectations, the model is blocked in time and an alarm is issued, acting as pre-blocking to prevent it from hurting the online effect of this recall channel. Online monitoring of the model service watches the recall vacancy rate, recall qps, the recall counts along the recall funnel, and the channel's pvctr, uvctr, and similar indicators, configured as post-monitoring: if there is a large fluctuation, an alarm is issued, followed by a quick rollback and corresponding handling.

The offline monitoring of the model is implemented on odps: the loss is dumped during model training, and then rules are applied for long-term and short-term monitoring.

4.1 Monitoring of model offline indicators

Long-term monitoring takes the 90% quantile of the loss over the past 30 days and 1.1 times its maximum. If the loss is below the 90% quantile, training may be somewhat over-fitted and the interests not divergent enough, so the model is blocked. If the loss exceeds 1.1 times the maximum, the model may be under-trained, the recall results may be too divergent with some bad cases, so it is likewise blocked.

Short-term monitoring mainly computes the mean and standard deviation of the loss over the past 14 days; the loss should fall within (mean - 3*std, mean + 3*std), otherwise the model is blocked.

The monitoring SQL:

    -- Short-term loss check: flag 1 (block) if yesterday's loss falls outside
    -- (mean - 3 * std, mean + 3 * std) computed over the previous 14 days.
    SELECT  CUR_LOSS, MIN_LOSS, MAX_LOSS,
            IF(CUR_LOSS < MIN_LOSS OR CUR_LOSS > MAX_LOSS, 1, 0) AS IS_BLOCKED
    FROM (
        (   -- yesterday's training loss
            SELECT loss AS CUR_LOSS, 1 AS rn1
            FROM deal_pai_model_mind_recall_check
            WHERE ds = regexp_replace(substr(date_sub(FROM_UNIXTIME(UNIX_TIMESTAMP()), 1), 1, 10), '-', '')
        ) a
        LEFT JOIN
        (   -- lower bound: mean - 3 * std over the past 14 days
            SELECT (avg(loss) - 3 * STDDEV(loss)) AS MIN_LOSS, 1 AS rn2
            FROM deal_pai_model_mind_recall_check
            WHERE ds < regexp_replace(substr(date_sub(FROM_UNIXTIME(UNIX_TIMESTAMP()), 1), 1, 10), '-', '')
              AND ds > regexp_replace(substr(date_sub(FROM_UNIXTIME(UNIX_TIMESTAMP()), 14), 1, 10), '-', '')
        ) b
        ON a.rn1 = b.rn2
        LEFT JOIN
        (   -- upper bound: mean + 3 * std over the past 14 days
            SELECT (avg(loss) + 3 * STDDEV(loss)) AS MAX_LOSS, 1 AS rn3
            FROM deal_pai_model_mind_recall_check
            WHERE ds < regexp_replace(substr(date_sub(FROM_UNIXTIME(UNIX_TIMESTAMP()), 1), 1, 10), '-', '')
              AND ds > regexp_replace(substr(date_sub(FROM_UNIXTIME(UNIX_TIMESTAMP()), 14), 1, 10), '-', '')
        ) c
        ON a.rn1 = c.rn3
    );

4.2 Model blocking mechanism

The model update is realized with odps's shell deployment script (the same mechanism as the fine-ranking model update).

neuron periodically executes the deployment script on the jump server to check whether a model file has been generated under the path. Two paths are therefore constructed: one is the fixed production path, and the other is the model file path read by neuron. If the model is blocked, it is simply not pushed to neuron's model file path.

Blocking uses the data quality feature of DataWorks: once a rule is violated, the flow is blocked. The blocking flow chart is as follows:

Official configuration document

When everything is normal, the flow proceeds as usual; if an anomaly occurs, the downstream nodes are blocked, so the online model is not updated and model quality is guaranteed.

5. Summary of multi-interest recall model

The core of MIND multi-interest recall is the capsule network, originally applied to images and here adapted to recommendation: the dynamic routing algorithm captures the user's multiple interests, and those interests recall different products to cover what the user wants to see. Online prediction continuously ingests the user's latest behaviors and therefore captures the user's subsequent interest points more accurately, which is why online prediction performs much better than the offline version.

6. Post-optimization points

1. Add the raw attributes of cspu_id to the user's click/share/favorite behavior sequence, which helps cold start and provides more product information for generating item embeddings.

2. Engineering-side optimization: move the concurrent indexing into the C++ engine, so the number of recalled interests can be increased.

3. Negative sampling optimization: adopt a DSSM-like negative-sample construction instead of uniform sampling over all clicked samples.

References

https://arxiv.org/pdf/1904.08030.pdf

https://mp.weixin.qq.com/s/unM36UYmTHo_ddvUXT00Qg

Text /QC
