Exploration and practice of rough ranking optimization of Meituan search

Rough sorting is an important module of the search and promotion system in the industry. In the exploration and practice of optimizing the effect of rough sorting, the Meituan search ranking team optimized rough sorting from two aspects of fine sorting linkage and effect performance joint optimization based on actual business scenarios, which improved the effect of rough sorting. This article introduces the iterative route of Meituan search for rough sorting, the rough sorting optimization work based on knowledge distillation and automatic neural network selection, and hopes to bring some inspiration or help to students engaged in related work.

1 Introduction

As we all know, in large-scale industrial applications such as search, recommendation, and advertising, in order to balance performance and effect, the ranking system generally adopts a cascaded architecture [1,2], as shown in Figure 1 below. Taking the Meituan search and sorting system as an example, the entire sorting is divided into coarse sorting, fine sorting, rearranging and shuffling. Sent to the fine layer.

图1 排序漏斗

Looking at the coarse ranking module from the perspective of Meituan's search and ranking full link, the current coarse ranking layer optimization has the following challenges:

Sample selection bias : Under the cascade sorting system, the rough sorting is far from the final result display link, resulting in a large difference between the offline training sample space of the rough sorting model and the sample space to be predicted, and there is a serious sample selection bias.
Coarse sorting and fine sorting linkage : Coarse sorting is between recall and fine sorting. Coarse sorting needs more information to obtain and use subsequent links to improve the effect.
Performance constraints : The candidate set for online coarse ranking prediction is much higher than that of the fine ranking model. However, the actual entire search system has strict performance requirements, which leads to the need to focus on the prediction performance for coarse ranking.

This article will focus on the above challenges to share the relevant exploration and practice of the optimization of the rough ranking layer of Meituan search. Among them, we will solve the problem of sample selection bias in the linkage problem of fine ranking. This article is mainly divided into three parts: the first part will briefly introduce the evolution route of the rough ranking layer of Meituan search and sorting; the second part will introduce the relevant exploration and practice of rough ranking optimization. The combination of row and rough row to optimize the effect of rough row, the second work is to consider the performance of rough row and the effect of trade-off rough row optimization, all related work has been launched in full, and the effect is remarkable; the last part is the summary and outlook, I hope these The content is helpful and inspiring for everyone.

2. Rough rehearsal of the evolution route

The evolution of the rough sorting technology of Meituan Search is divided into the following stages:

2016: Linear weighting based on information such as relevance, quality, conversion rate, etc. This method is simple but the feature expression ability is weak, the weight is manually determined, and there is a lot of room for improvement in the sorting effect.
2017: Using a simple LR model based on machine learning for Pointwise Predictive Ranking.
2018: Adopt the double-tower model based on vector inner product, input query words, user and context features and merchant features respectively on both sides, after deep network calculation, output user & query word vector and merchant vector respectively, and then pass the inner product Calculate the estimated scores for sorting. This method can save the business vector calculation in advance, so the online prediction is fast, but the crossover ability of the information on both sides is limited.
2019: In order to solve the problem that the twin-tower model cannot model cross-features well, the output of the twin-tower model is used as a feature to fuse with other cross-features through the GBDT tree model.
2020 to present: Due to the improvement of computing power, began to explore the NN end-to-end coarse sorting model and continue to iterate the NN model.

At this stage, two-tower models are commonly used in the industry, such as Tencent [3] and iQiyi [4]; interactive NN models, such as Alibaba [1,2]. The following mainly introduces the relevant optimization work of Meituan Search in the process of upgrading coarse ranking to NN model, mainly including two parts: coarse ranking effect optimization and effect & performance joint optimization.

3. Rough row optimization practice

With a lot of effect optimization work [5,6] in the Meituan search fine-ranked NN model, we also began to explore the optimization of the coarse-ranked NN model. Considering that rough sorting has strict performance constraints, it is not applicable to directly reuse the work of fine sorting optimization to rough sorting. The following will introduce the optimization of the fine-arrangement linkage effect of migrating the sorting ability of the fine-arrangement to the coarse-arrangement, and the effect and performance trade-off optimization of the automatic search based on the neural network structure.

3.1 Optimization of fine-arrangement linkage effect

The rough sorting model is limited by the scoring performance constraints, which will lead to a simpler model structure than the fine sorting model, and the number of features is much less than that of the fine sorting, so the sorting effect is worse than that of the fine sorting. In order to make up for the loss of effect caused by the simple structure and fewer features of the rough sorting model, we try the knowledge distillation method [7] to optimize the rough sorting by linking the fine sorting.

Knowledge distillation is a common method in the industry to simplify the model structure and minimize the effect loss. It adopts a Teacher-Student paradigm: the model with complex structure and strong learning ability is used as the Teacher model, and the model with simpler structure is used as the Student model. To assist the training of the Student model, so as to transfer the "knowledge" of the Teacher model to the Student model to improve the effect of the Student model. Figure 2 shows the schematic diagram of the refined row distillation and the rough row. The following will introduce the practical experience of these distillation schemes in the rough ranking of Meituan search.

图2 精排蒸馏粗排示意图

3.1.1 Refinement result list distillation

Rough sorting is the pre-module for fine sorting. Its goal is to initially screen out candidate sets with better quality for fine sorting. From the selection of training samples, except for items with regular user behaviors (click, order, payment) As positive samples, in addition to exposing items that do not behave as negative samples, you can also introduce some positive and negative samples constructed from the sorting results of the fine sorting model, which can not only alleviate the sample selection bias of the rough sorting model to a certain extent, but also reduce the fine sorting The sorting ability of the row is migrated to the coarse row. The following will introduce the practical experience of using the fine sorting results to distill the rough sorting model in the Meituan search scenario.

Strategy 1 : On the basis of the positive and negative samples fed back by users, randomly select a small number of unexposed samples that are later in the fine-ranked order as supplements to the coarsely-ranked negative samples, as shown in Figure 3. For this change, offline Recall@150 (see appendix for indicator explanation) +5PP, online CTR +0.1%.

图3 补充排序结果靠后负例

Strategy 2 : Random sampling is performed directly in the fine-sorted set to obtain training samples, and the fine-sorted position is used as a label to construct a pair for training, as shown in Figure 4 below. Compared with strategy 1 Recall@150 +2PP offline effect, online CTR +0.06%.

图4 排序靠前靠后构成 pair 对样本

Strategy 3 : Based on the sample set selection of strategy 2, the label is constructed by binning the precise sorting positions, and then the pair is constructed according to the binning label for training. Compared with strategy 2 Recall@150 +3PP offline effect, online CTR +0.1%.

3.1.2 Refinement Prediction Fractional Distillation

The previous use of sorting result distillation is a relatively rough way of using refined sorting information. On this basis, we further add prediction score distillation [8], hoping that the score output of the coarse sorting model and the score distribution of the fine sorting model are as aligned as possible, as follows Figure 5 shows:

图5 精排预测分数构造辅助损失

In the specific implementation, we use a two-stage distillation paradigm to distill the coarse sorting model based on the pre-trained fine sorting model. The distillation Loss uses the minimum square error of the output of the coarse sorting model and the output of the fine sorting model, and adds a parameter Lambda to control the effect of distillation Loss on the final Loss, as shown in formula (1). Using the method of fractional distillation, offline effect Recall@150 +5PP, online effect CTR +0.05%.

3.1.3 Feature Representation Distillation

In the industry, it has been verified that it is an effective way to improve the effect of the model by using knowledge distillation to guide the rough sorting representation modeling [7]. However, directly using the traditional method to distill the representation has the following defects: First, it is impossible to distill coarse sorting and fine sorting. The sorting relationship between the rows, and as mentioned above, the sorting results distillation in our scenario, both offline and online are effective; the second is the traditional knowledge distillation scheme that uses KL divergence as a representation metric, and the representation Each dimension is treated independently and cannot effectively distill highly related and structured information [9]. In the Meituan search scenario, the data is highly structured, so it is possible to use traditional knowledge distillation strategies for representation distillation. This structured knowledge cannot be captured well.

图6 对比学习精排信息迁移

On the basis of the above formula (1), supplementary comparative learning to represent distillation Loss, offline effect Recall@150 +14PP, online CTR +0.15%. Details of related work can be found in our paper [10] (under submission).

3.2 Joint optimization of effect performance

As mentioned above, the candidate set of coarse ranking for online prediction is relatively large. Considering the constraints of the whole link performance of the system, the coarse ranking needs to consider the prediction efficiency. The work mentioned above is optimized based on the paradigm of simple DNN + distillation, but there are the following two problems:

Currently limited by online performance, only simple features are used, and richer cross-features are not introduced, resulting in further improvement of the model effect.
Distillation with a fixed coarse-row model structure loses the distillation effect, resulting in suboptimal solutions [11].

According to our practical experience, it is not enough to directly introduce cross features in the coarse row layer to meet the online delay requirements. Therefore, in order to solve the above problems, we have explored and practiced a coarse-ranking modeling scheme based on neural network architecture search, which simultaneously optimizes the effect and performance of the coarse-ranking model, and selects the best feature combination and model that meets the coarse-ranking delay requirements. The structure, the overall architecture diagram is shown in Figure 7 below:

图7 基于 NAS 的特征和模型结构选择

Below we briefly introduce the two key technical points of neural network architecture search (NAS) and introduction efficiency modeling:

图8 模型延时计算图

Through the modeling of neural network architecture search to jointly optimize the effect and prediction performance of the rough ranking model, offline Recall@150 +11PP, and the online index CTR +0.12% without increasing the final online delay; for details, please refer to [13], accepted by KDD 2022.

4. Summary

Beginning in 2020, we have implemented a large number of engineering performance optimizations to implement the MLP model of the rough sorting layer. In 2021, we will continue to iterate the rough sorting model based on the MLP model to improve the rough sorting effect. First of all, we draw on the distillation scheme commonly used in the industry to link fine sorting to optimize rough sorting, and conduct a large number of experiments from three levels of fine sorting result distillation, fine sorting predicted fractional distillation, and feature representation distillation, without increasing the online delay. Next, improve the effect of rough row model.

Secondly, considering that the traditional distillation method cannot handle the feature structure information in the sorting scene well, we have developed a set of fine-arrangement information transfer and coarse-arrangement scheme based on contrastive learning.

Finally, we further consider that the rough ranking optimization is essentially a trade-off of effect and performance, adopt the idea of multi-objective modeling to optimize the effect and performance at the same time, and implement the automatic search technology of neural network architecture to solve the problem, so that the model can automatically select efficiency and performance. The best-performing feature set and model structure. In the future, we will continue to iterate the coarse layer technology from the following aspects:

Coarse sorting and multi-objective modeling : The current rough sorting is essentially a single-objective model, and we are currently trying to apply the multi-objective modeling of the fine sorting layer to the rough sorting.
System-wide dynamic computing power distribution with linkage between coarse ranking : Coarse ranking can control the computing power of recall and the computing power of fine ranking. For different scenarios, the computing power required by the model is different, so dynamic computing power allocation can be performed without reducing the line. In the case of the above effect, the system computing power consumption can be reduced. At present, we have achieved certain online effects in this regard.

5. Appendix

The traditional offline ranking indicators are mostly based on NDCG, MAP, and AUC indicators. For coarse ranking, its essence is more inclined to recall tasks targeting set selection. Therefore, traditional ranking indicators are not conducive to measuring the iteration of the coarse ranking model. The effect is good or bad. We refer to the Recall indicator in [6] as a measure of the effect of the rough sorting line, that is, taking the fine sorting result as the ground truth, to measure the alignment degree of the rough sorting and the fine sorting result TopK. The specific definition of the Recall indicator is as follows:

The physical meaning of this formula is to measure the coincidence of the top K in the rough sorting and the K in the fine sorting, which is more in line with the essence of the coarse sorting set selection.

6. About the author

Xiao Jiang, Suo Gui, Li Xiang, Cao Yue, Pei Hao, Xiao Yao, Da Yao, Chen Sheng, Yun Sen, Li Qian, etc. are all from the Meituan Platform/Search Recommendation Algorithm Department.

7. References

[1] Wang Z, Zhao L, Jiang B, et al. Cold: Towards the next generation of pre-ranking system[J]. arXiv preprint arXiv:2007.16122, 2020.
[2] Ma X, Wang P, Zhao H, et al. Towards a Better Tradeoff between Effectiveness and Efficiency in Pre-Ranking: A Learnable Feature Selection based Approach[C]//Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2021: 2036-2040.
[3] https://mp.weixin.qq.com/s/Jfuc6x-Qt0rya5dbCR2uCA
[4] https://mp.weixin.qq.com/s/RwWuZBSaoVXVmZpnyg7FHg
[5] https://tech.meituan.com/2020/04/16/transformer-in-meituan.html.
[6] https://tech.meituan.com/2021/07/08/multi-business-modeling.html.
[7] Tang, Jiaxi, and Ke Wang. "Ranking distillation: Learning compact ranking models with high performance for recommender system." Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2018.
[8] Hinton, Geoffrey, Oriol Vinyals, and Jeff Dean. "Distilling the knowledge in a neural network." arXiv preprint arXiv:1503.02531 (2015).
[9] Chen L, Wang D, Gan Z, et al. Wasserstein contrastive representation distillation[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2021: 16296-16305.
[10] https://arxiv.org/abs/2207.03073
[11] Liu Y, Jia X, Tan M, et al. Search to distill: Pearls are everywhere but not the eyes[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 7539-7548 .
[12] Cai H, Zhu L, Han S. Proxylessnas: Direct neural architecture search on target task and hardware[J]. arXiv preprint arXiv:1812.00332, 2018.
[13] https://arxiv.org/abs/2205.09394

Job Offers

The Search Recommendation Algorithm Department/Basic Algorithm Group is the core team responsible for the research and development of Meituan. Its mission is to build a world-class search engine, relying on Deep Learning (deep learning), NLP (natural language processing), Knowledge Graph (knowledge graph) and other technologies , process the massive user, merchant, and commodity data of Meituan, continuously deepen the understanding of users, scenarios, queries and services, efficiently support various forms of life service searches, and solve the multi-service mix, relevance, and personalization of search results. and other issues, to give users the ultimate search experience. The Search and Recommendation Algorithm Department has been recruiting search and recommendation algorithm experts for a long time. Interested students can send their resumes to: tech@meituan.com (mail subject: Meituan Platform/Search and Recommendation Algorithm Department).

Read more collections of technical articles from the Meituan technical team

| Reply keywords such as [2021 stock], [2020 stock], [2019 stock], [2018 stock], [2017 stock] in the public account menu bar dialog box, you can view the collection of technical articles by the Meituan technical team over the years.

| This article is produced by Meituan technical team, and the copyright belongs to Meituan. Welcome to reprint or use the content of this article for non-commercial purposes such as sharing and communication, please indicate "The content is reproduced from the Meituan technical team". This article may not be reproduced or used commercially without permission. For any commercial activities, please send an email to tech@meituan.com to apply for authorization.

Exploration and practice of rough ranking optimization of Meituan search

1 Introduction

2. Rough rehearsal of the evolution route

3. Rough row optimization practice

3.1 Optimization of fine-arrangement linkage effect

3.1.1 Refinement result list distillation

3.1.2 Refinement Prediction Fractional Distillation

3.1.3 Feature Representation Distillation

3.2 Joint optimization of effect performance

4. Summary

5. Appendix

6. About the author

7. References

Job Offers

美团技术团队

引用和评论

MTGR：美团外卖生成式推荐Scaling Law落地实践

大模型中的Token究竟是什么？从原理到作用深度解析

深度探索 DeepSeek 微调：LoRA 与全参数微调实战指南

DeepSeek行业应用实践报告100+份汇总解读|附PDF下载

功率器件热设计基础（九）——功率半导体模块的热扩散

2025增长新前沿——AI人工智能拐点重塑人类潜力 400+份报告汇总解读 | 附PDF下载

DeepSeek的开源之路:一文读懂从V1-R1的技术发展,见证从开源新秀到推理革命的领跑者