1. Overview

The category TAB feed is the product recommendation feed on every TAB of the Dewu app purchase page other than the "Recommended" TAB, such as "Shoes", "Bags", and so on. When a user enters a category TAB, the problem can be simplified to feed recommendation over the <userId, tabId, itemId> triplet. The biggest difference between the category TAB scenario and other "open" recommendation scenarios is that recommendation happens under a constraint (the category), which makes it somewhat similar to search: the category TAB expresses the user's category intent. At our current stage of iteration we focus mainly on binary modeling of <userId, itemId>; the <userId, tabId> pair (the correlation between user behavior and the TAB) and the <tabId, itemId> pair (the correlation between the TAB and the item) are what we will address in later differentiated modeling. The progress described below covers the iteration of the more general product recommendation model, for which we use a multi-objective ranking model as the fine-ranking strategy.

2. Model

2.1 Base ESMM

Among multi-objective learning paradigms, we choose ESMM as the paradigm for our fine-ranking model. We will not elaborate on ESMM here; for details, please refer to the paper. From the paper, the architecture of ESMM is as follows:
[Figure: ESMM architecture]

The baseline model in production replaces the MLP in the figure above with the structure of DeepFM, adding the FM component to learn cross-feature information. Overall, however, the model is still relatively shallow and does little to extract a richer representation of the user. For this reason, we upgraded the model structure.
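For reference, a minimal sketch of the ESMM learning paradigm (a simplification, not the production model; tower structures and shapes are assumptions): the CTR and CVR towers share the same input, and both losses are defined over the full impression space via pCTR and pCTCVR = pCTR × pCVR.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ESMM(nn.Module):
    """Minimal ESMM sketch: a CTR tower and a CVR tower over a shared input."""
    def __init__(self, input_dim, hidden_dim=128):
        super().__init__()
        def tower():
            return nn.Sequential(
                nn.Linear(input_dim, hidden_dim), nn.ReLU(),
                nn.Linear(hidden_dim, 1))
        self.ctr_tower = tower()   # predicts pCTR
        self.cvr_tower = tower()   # predicts pCVR (supervised only through pCTCVR)

    def forward(self, x):
        p_ctr = torch.sigmoid(self.ctr_tower(x)).squeeze(-1)
        p_cvr = torch.sigmoid(self.cvr_tower(x)).squeeze(-1)
        p_ctcvr = p_ctr * p_cvr    # pCTCVR = pCTR * pCVR
        return p_ctr, p_ctcvr

def esmm_loss(p_ctr, p_ctcvr, click, conversion):
    # click / conversion are float labels; both losses are defined over all
    # impressions, which is how ESMM avoids the CVR sample-selection bias.
    return (F.binary_cross_entropy(p_ctr, click)
            + F.binary_cross_entropy(p_ctcvr, click * conversion))
```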

2.2 Overall structure of the model

Our current modeling is over <userId, itemId>. From the perspective of sample representation, the item side is relatively dense and stable: with large sample volumes, most of its information can be expressed by an id embedding. The user side, by contrast, is relatively sparse, and describing a user requires a large number of generalization features. Introducing and modeling the user's behavior sequence can greatly sharpen the distinction between samples and thereby improve the model's classification performance. We therefore introduce a user modeling module to characterize user interest.

At the level of the overall structure, we do not change the ESMM learning paradigm; we only improve the structure that produces the CTR logits and the CVR logits. The overall model structure is shown in the figure:
[Figure: overall model structure]

The structure of the Deep Interest Transformer is as follows:
[Figure: Deep Interest Transformer structure]

Overall, the CTR task and the CVR task each have their own main net and bias net. The user vector learned by the user-interest modeling module is shared by the two tasks: the Deep Interest Transformer acts as an information-extraction module, and the extracted vector is concatenated with other representations, such as cross features, to form the shared bottom-layer information.
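A schematic sketch of this wiring, under assumed feature names and dimensions (not the production code): the shared user interest vector is concatenated with other representations and fed to each task's main net, and each task's bias net contributes an additive logit.

```python
import torch
import torch.nn as nn

class TwoTowerWithBias(nn.Module):
    """Sketch of the overall wiring: the shared Deep Interest Transformer output
    is concatenated with other representations (cross features, item embedding,
    ...) and fed to task-specific main nets; each task's bias net adds a logit."""
    def __init__(self, shared_dim, user_feat_dim, hidden=128):
        super().__init__()
        # shared_dim = dim(user interest vector) + dim(other representations)
        def mlp(in_dim):
            return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))
        self.ctr_main, self.cvr_main = mlp(shared_dim), mlp(shared_dim)
        self.ctr_bias, self.cvr_bias = mlp(user_feat_dim), mlp(user_feat_dim)

    def forward(self, user_interest_vec, other_feats, user_feats):
        shared = torch.cat([user_interest_vec, other_feats], dim=-1)
        ctr_logit = self.ctr_main(shared) + self.ctr_bias(user_feats)
        cvr_logit = self.cvr_main(shared) + self.cvr_bias(user_feats)
        return torch.sigmoid(ctr_logit), torch.sigmoid(cvr_logit)
```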

2.2.1 User behavior sequence modeling

We assume that a user's past behavior is related to the item they click next, or shares an important attribute with it, such as the same category, brand, or series. To break down how the correlation between the user behavior sequence and the recommended item affects recommendation efficiency, we use the item's level-3 category as the analysis dimension, as follows:

We analyzed users' behavior logs in the TAB feed and plotted how the relationship between the target item's level-3 category and the level-3 categories of the items in the user's behavior sequence affects recommendation efficiency, as shown in the following figure:
[Figure: pvCTR vs. number of sequence items sharing the target item's level-3 category]

The abscissa is the number of items in the user's behavior sequence that share the recommended item's level-3 category; the ordinate is pvCTR. The more related the categories in the user's behavior sequence are to the recommended item, the higher the pvCTR. Of course, intuitively, a longer click sequence (higher user activity) also means a higher click-through rate, so we also analyzed the relationship between sequence length and pvCTR, as shown in the following figure:
[Figure: pvCTR vs. behavior-sequence length]

Clearly, the bias from user activity also affects pvCTR, but by comparison its trend (slope) is small, while the trend driven by category correlation is much more pronounced.
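This kind of analysis is straightforward to reproduce; below is a hedged sketch with pandas, assuming a hypothetical impression log with columns seq_same_cat3_cnt (how many sequence items share the target item's level-3 category), seq_len, and click.

```python
import pandas as pd

# Hypothetical exposure log: one row per impression (path and columns are assumptions).
log = pd.read_parquet("tab_feed_impressions.parquet")

# pvCTR bucketed by how many sequence items share the target item's level-3 category.
pvctr_by_overlap = log.groupby("seq_same_cat3_cnt")["click"].mean()

# Control for activity bias: pvCTR bucketed by total behavior-sequence length.
pvctr_by_len = log.groupby("seq_len")["click"].mean()

print(pvctr_by_overlap, pvctr_by_len, sep="\n")
```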

As for how to model the correlation between the item being scored and the user behavior sequence, attention is an obvious and reliable choice: self-attention models the correlation among the items within the user behavior sequence, while target-attention models the correlation between the candidate item and the user behavior sequence.

The Deep Interest Transformer in the model is the structure that learns the user representation. We select the user's behavior sequences, real-time purchases, "buy now" clicks, favorites, and item clicks, together with purchases, "buy now" clicks, favorites, and item clicks within the last 7 days, and merge them into a single user behavior sequence of length 120. Sequences that are too long are truncated, and sequences that are too short are padded with a default value.
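A minimal sketch of this sequence construction, with hypothetical helper names and PAD_ID = 0 as assumptions:

```python
PAD_ID = 0      # default fill value for short sequences (assumption)
MAX_LEN = 120

def build_behavior_sequence(realtime_events, offline_7d_events, max_len=MAX_LEN):
    """Merge real-time and 7-day behaviors (purchase, buy-now click, favorite,
    item click) into one item-id sequence, newest first, of fixed length.
    Both inputs are assumed to be lists of (timestamp, item_id) pairs."""
    events = sorted(realtime_events + offline_7d_events, reverse=True)  # newest first
    item_ids = [item_id for _, item_id in events][:max_len]             # truncate the excess
    mask = [1] * len(item_ids) + [0] * (max_len - len(item_ids))        # 1 = real, 0 = padding
    item_ids += [PAD_ID] * (max_len - len(item_ids))                    # pad the shortfall
    return item_ids, mask
```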

The Deep Interest Transformer first applies multi-head self-attention to the user's behavior sequence to learn the correlations among elements of the sequence, i.e. it encodes the user sequence. The item to be scored, which we call the target item, drives the decoding: the target item's embedding serves as Q in the attention, and the encoded representation of the user sequence serves as K and V. This is target-attention: for each item being scored, we compute its correlation with the elements of the user behavior sequence, so different target items activate different elements of the sequence and produce different user interest vectors, which in the end effectively differentiates the scores of different target items.
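A hedged sketch of this encode/decode structure in PyTorch (dimensions, head counts, and the omission of feed-forward and normalization layers are simplifying assumptions, not the production configuration):

```python
import torch
import torch.nn as nn

class DeepInterestTransformer(nn.Module):
    """Sketch: self-attention encodes the behavior sequence; target-attention
    uses the target item as Q over the encoded sequence as K and V."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.target_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, seq_emb, target_emb, pad_mask):
        # seq_emb:    [B, 120, dim] behavior-sequence embeddings
        # target_emb: [B, dim] embedding of the target (candidate) item
        # pad_mask:   [B, 120] bool, True at padded positions so they are ignored
        enc, _ = self.self_attn(seq_emb, seq_emb, seq_emb,
                                key_padding_mask=pad_mask)       # encode
        q = target_emb.unsqueeze(1)                              # [B, 1, dim]
        user_vec, _ = self.target_attn(q, enc, enc,
                                       key_padding_mask=pad_mask)  # decode (target-attention)
        return user_vec.squeeze(1)                               # [B, dim] user interest vector
```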

2.2.2 Bias network

Recommender systems contain all kinds of bias. For the TAB feed, we analyzed the user's own bias along several dimensions, such as gender, device, and registration region.

The comparison charts are omitted; they show that different user groups exhibit different biases. For example, male users have higher click-through rates on liquor and watch categories, while female users have higher click-through rates on beauty and women's clothing; Android users have higher click-through rates than iOS users on digital/electronics categories; and across four representative regions in China (Shanghai, Guangdong, Sichuan, Beijing), categories such as liquor, fashion, and accessories show obvious regional bias.

To model this bias explicitly, we add a bias net to each task. Each bias net is a separate network whose output logits are added to those of the main network. At this stage we only use user features as input to the bias net, so this separate network models the user's own bias: for example, some users prefer to browse without clicking while others have a high click-through rate (i.e. differences in activity), and so on. Using a separate bias net to model this user-specific bias, which has nothing to do with the recommendation result itself, is more effective than simply adding the bias features to the main network's input.

More generally, bias nets are not limited to user embeddings as input. The recommendation system has many biases, and if they are simply added as features to the main network's input, the result is not as good as learning them with a dedicated bias net. A bias net can model not only the user's position bias but also the user's bias toward time features; more generally, the user vector can be concatenated with the representations of various bias features and fed into the bias net for learning.
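As an illustration, a hedged sketch of such a generalized bias net, with position and hour-of-day chosen as example bias features (the actual feature set is not specified above):

```python
import torch
import torch.nn as nn

class BiasNet(nn.Module):
    """Sketch: user representation concatenated with embeddings of bias features
    (position, hour of day, ...), producing a logit added to the main net's logit."""
    def __init__(self, user_dim, n_positions=100, n_hours=24, emb_dim=8, hidden=64):
        super().__init__()
        self.pos_emb = nn.Embedding(n_positions, emb_dim)
        self.hour_emb = nn.Embedding(n_hours, emb_dim)
        self.mlp = nn.Sequential(
            nn.Linear(user_dim + 2 * emb_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, user_vec, position, hour):
        x = torch.cat([user_vec, self.pos_emb(position), self.hour_emb(hour)], dim=-1)
        return self.mlp(x)   # bias logit

# Usage sketch:
# logit = main_net(shared_inputs) + bias_net(user_vec, position, hour)
# p_ctr = torch.sigmoid(logit)
```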

In summary, the model changes above constitute our second version of the multi-objective model for the category TAB feed. It has been tested online for an extended period, achieved good online revenue, and has been fully rolled out.

3. Modeling of long-term user behavior sequences

3.1 Long-term interest

In the versions iterated above, the user behaviors we use are limited to the item sequence in the user's real-time profile and the item sequence from the user's offline profile within 7 days. Analyzing the sample sequences with 120 as the maximum sequence length, the effective item-sequence length averages only 61 and its median is only 65, as shown in the following table:
[Table: effective behavior-sequence length statistics, before and after long-sequence backfilling]

In other words, a large portion of each sequence is filled with invalid default values, which, once masked out in attention, greatly weakens the expression of the user's interest. Effectively extending the user's real behavior length therefore enriches the user's behavioral features and allows some inactive users to be recommended based on their long-term behavior. In fact, even for users with rich recent behavior, their long-term purchases, favorites, and other behaviors are still helpful to the current recommendation.

In the analysis above, we used the level-3 category as a bridge to show how the correlation between the user behavior sequence and the candidate item affects pvCTR. To check whether the same trend holds for users' long-term sequences, we analyzed the long-term sequence after excluding behaviors within 7 days, as shown in the following figure:
[Figure: pvCTR vs. category overlap with the user's long-term behavior sequence]

It can be seen that the recommended item's category is still correlated with the user's long-term behavior: the more related the categories in the user's long-term sequence are to the candidate item's category, the higher the candidate item's pvCTR.

So, intuitively, we introduce the user's long-term behavior without restricting its time span: from the user's 160 most recent long-term behavior items, we backfill the sequence constructed earlier with de-duplication, and keep the most recent 120 behaviors in chronological order. As the table above shows, after backfilling with the long sequence, the user's effective behavior length has a median of 120 and an average of 101, which greatly enriches the user's feature expression.
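A minimal sketch of this backfilling step, assuming both sequences are lists of item ids ordered from newest to oldest:

```python
MAX_LEN = 120

def backfill_with_long_term(short_seq, long_term_seq, max_len=MAX_LEN):
    """Backfill the short-term sequence with the user's long-term behaviors
    (most recent ~160 items), de-duplicating item ids and keeping at most
    `max_len` of the most recent behaviors."""
    merged, seen = [], set()
    for item_id in short_seq + long_term_seq:   # short-term items take priority
        if item_id not in seen:
            seen.add(item_id)
            merged.append(item_id)
    return merged[:max_len]                      # truncate to the 120 most recent
```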

In offline evaluation, CTR AUC improved by +0.3% and CVR AUC by +0.1%.

3.2 Long-term and short-term interest modeling

In the versions above, our modeling approach fuses all of the user's behaviors into one large sequence to generate the user interest vector. In fact, behaviors over different time spans reflect different interests, and we want the model to treat the user's behaviors over different time spans separately, so as to describe user interest at different granularities.

Moreover, from the analysis of clicked items, the categories in the user's short-term behavior are more correlated with the clicked item's category, while the categories in long-term behavior are less so, as shown below:
[Figure: category overlap between clicked items and the N most recent behavior items]

Clearly, the category of the item the user clicks overlaps most with the 10 most recent behavior items, and the category correlation falls off noticeably by around the 50th most recent item.

To account for the different impact of short-term and long-term behavior sequences on candidate items, we split the user's behavior into a short-window sequence and a long-window sequence, divided along the real-time profile and the offline profile, and consider both long- and short-term interest in user interest modeling. Experimentally, we tried the following two modeling approaches.

Long-term and short-term interests are modeled separately

We model the short-term and long-term interest user vectors, Sv and Lv respectively, then concat [Sv, Lv] to obtain the user interest vector Uv, which is learned by each upper-layer task's own network, as shown in the figure:
[Figure: concat fusion of long- and short-term interest vectors]
i.e. Uv = concat([Sv, Lv])

Long-term and short-term interests are integrated through the gate network

The short-term and long-term interest user vectors, Sv and Lv respectively, are fused through a gate network, as shown in the figure:
[Figure: gate-network fusion of long- and short-term interest vectors]
That is, Uv = aSv + (1-a)Lv

The input to the gate net is the user feature vector together with the user's long- and short-term interest vectors learned by attention; it passes through an MLP with a sigmoid activation, and the resulting fusion gate combines the short-term and long-term vectors into a new user interest representation Uv. As for which features to feed the gate network, we refer to the structure of MMoE and believe that it is the user's own features that can distinguish the user's long-term and short-term interests.
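A hedged sketch of the gate fusion (layer sizes and exact gate inputs are assumptions):

```python
import torch
import torch.nn as nn

class LongShortGate(nn.Module):
    """Sketch: a = sigmoid(MLP(user features, Sv, Lv)) weights the short-term
    vector Sv against the long-term vector Lv, i.e. Uv = a*Sv + (1-a)*Lv."""
    def __init__(self, user_feat_dim, interest_dim, hidden=64):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(user_feat_dim + 2 * interest_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid())

    def forward(self, user_feats, sv, lv):
        a = self.gate(torch.cat([user_feats, sv, lv], dim=-1))   # [B, 1]
        return a * sv + (1.0 - a) * lv                           # fused user interest Uv
```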

Summing up the two long/short-term interest modeling approaches: the offline AUC gain is about +0.1%, on top of the long-term sequence modeling described above, and it has been launched in the category TAB scenario. The two approaches end up roughly equivalent in effect. In the first approach, the long- and short-term interests are concatenated and left to the upper-layer tasks to learn on their own, which also adds task-specific network parameters to each upper-layer task; the second approach depends on what the gate network learns.

4. Outlook

As mentioned above, the category TAB recommendation scenario is really feed recommendation over the <userId, tabId, itemId> triplet. Our current work focuses on modeling the <userId, itemId> pair, mainly on the user side, to improve generalization. However, the information of the TAB itself is also a factor to consider in this scenario: in different TABs, some users prefer to click while others prefer to browse. How to model the differences between TABs will be a follow-up direction.

At the same time, the correlation between the item and the TAB itself is also worth considering. A TAB is similar to a search category term in that it carries category intent, and items have strong or weak correlation with a TAB, analogous to the strong/weak relevance binning used in search. We believe that items strongly related to the TAB that also hit the user's interest are more likely to be clicked and converted; of course, this remains to be verified by further analysis.

Text / Wu Lifu, Dewu Tech official account

