On-device intelligence refers to running artificial intelligence (AI) applications directly on mobile devices. This article describes a practical scheme for deploying large deep learning models on the device side for search re-ranking in the Dianping search scenario, including on-device feature engineering, the model iteration process, and the concrete deployment optimizations. We hope it helps or inspires readers working on related problems.
1 Introduction
With the rapid development of information technologies such as big data and artificial intelligence, cloud computing alone can no longer meet the data-privacy and low-latency requirements of certain scenarios. Drawing on the ideas of edge computing, deploying AI capabilities on terminal devices has gradually entered the public eye, and the concept of "on-device intelligence" was born. Compared with traditional cloud computing, deploying and running AI modules on smartphones and other terminals has the following advantages. First, keeping data local eases cloud storage pressure and helps protect user privacy. Second, keeping computation local alleviates cloud computing overload. Finally, on-device intelligence reduces the communication cost of requests to the cloud, and can make better use of on-device user interactions to provide a more real-time, personalized service experience.
Major technology companies at home and abroad have been at the forefront of applying on-device intelligence. Google proposed the concept of an on-device recommendation Android app that recommends content based on user interests; Apple's Face ID and the Siri assistant are also well-known representatives. Alibaba, Kuaishou, ByteDance and others have likewise implemented on-device intelligence in their own scenarios and released corresponding on-device inference frameworks. For example, Kuaishou launched short-video special-effect shooting, intelligent object recognition and similar features. There are also practices in search and recommendation: Mobile Taobao's "Guess You Like" deployed an intelligent recommendation system on the device and achieved significant gains (EdgeRec [1]: Double Eleven IPV +10%+, GMV +5%+), and Kuaishou's swipe-feed recommendation applied an on-device re-ranking scheme that increased app usage duration by 1%+.
Search is an important channel through which the Dianping App connects users and merchants, and more and more users search for the services they want in different scenarios. Understanding the user's search intent and ranking the results the user most wants is the most important task of a search engine. To further improve personalized ranking and the user experience, the Search Technology Center explored deploying a deep personalization model on the device. This article shares our practical experience with on-device intelligent re-ranking in the Dianping App, in three parts: the first analyzes the problems on-device re-ranking solves and the overall process; the second describes the exploration and practice of the re-ranking algorithm; the third covers the architecture design and deployment optimization of the on-device re-ranking system, followed by a summary and outlook.
2 Evolving the ranking system: why on-device re-ranking is needed
2.1 Pain points of cloud-only ranking
Let's walk through the full front-end and back-end execution flow of a complete search. As shown in Figure 1, after the user initiates a retrieval request from the search entry on the phone, the cloud server executes query understanding, multi-channel recall, model ranking, display-information merging and so on, and finally returns the results to the client for rendering.
Due to the queries-per-second (QPS) limits of the whole system, as well as the cost of front-end/back-end communication and payload transmission, a paged request mechanism is usually adopted. This structure, in which the client requests a page and the cloud retrieves, ranks and returns the final display list, has two problems for Dianping's LBS scenarios and recommendation-like search products:
① Delayed updates to the in-page ranking
The paged request mechanism causes ranking results to be updated late: before the next page request, no user action can affect the ordering of results within the current page. Taking the Dianping search result page as an example, one request returns 25 results to the client and each screen shows about 3~4 of them, so the user must scroll roughly 6~8 screens before a new page request reaches the cloud for the next page of results (on the Food Channel list page, over 20% of searches view more than one page of results). The cloud ranking system can neither perceive changes in the user's interest in time nor adjust the order of results already delivered to the client.
② Delayed perception of real-time feedback signals
Generally, real-time feedback signals pass through stream-processing platforms such as Storm or Flink, where the log stream is computed in mini-batches and then written to a KV feature store for the search ranking models. Because the feedback data must be parsed and processed, this pipeline typically introduces minute-level feature latency, and the delay grows as the feedback data becomes more complex. Yet real-time feedback reflects the user's immediate preferences and is highly valuable for ranking optimization.
2.2 On-device re-ranking: process and advantages
To remove the decision delay imposed by paged ranking and to model changes in users' real-time interests more promptly, we built a re-ranking system on the client side, giving the client deep-model inference capabilities. This solution has the following advantages:
- Supports in-page re-ranking with real-time decisions based on user feedback: no longer limited by the cloud's paged update mechanism, the client gains real-time capabilities such as local re-ranking and intelligent refresh.
- Real-time perception of user preferences with no delay: feedback signals need not pass through the cloud computing platform, eliminating perception latency.
- Better protection of user privacy: data privacy has drawn increasing attention in the big-data era, and the Dianping App is actively complying with regulators' personal-information-protection requirements. Ranking on the device keeps the relevant data on the client, better protecting user privacy.
On-device intelligent re-ranking achieved remarkable results after launching on Dianping search and the Food Channel page: the click-through rate of search traffic rose by 25 BP (basis points), the Food Channel page CTR rose by 43 BP, and the average number of clicks per query increased by 0.29%.
3 Exploration and practice of the on-device re-ranking algorithm
Re-ranking has seen plenty of research and production practice in search and recommendation. Its core problem is to generate an arrangement of the Top-K results from N candidates. In the on-device setting specifically, our job is to generate an arrangement of the candidate merchant context based on the user's feedback on the previously shown results, so that the overall click-through rate of the list page is optimized. Below we describe in detail our exploration of feature engineering, real-time feedback-sequence modeling, and model structure for the on-device re-ranking scenario.
3.1 Feature Engineering
Feature engineering on the device follows essentially the same approach as the cloud ranking system: the basic and cross features of the User/Item/Query/Contextual dimensions can be quickly reused on the device. Of course, we must consider transmission and storage optimization, as well as keeping the device-side and cloud-side feature systems consistent so that development and deployment are seamless across device and cloud; this is detailed in the later chapter on architecture and deployment optimization. In addition, the device has unique real-time feedback signals, including fine-grained interaction behaviors; these features are precisely the key signals behind the real-time decision advantages analyzed above.
Specifically, the feature system of the on-device re-ranking model is shown in Table 1 and mainly covers:
- Basic features: typical user/merchant/Query/Context-side features, and two-sided cross features.
- Bias features: mainly the ranking position returned by the backend, plus visual biases such as card sizes on the device.
- The user's real-time feedback features, an essential part of the on-device feature system, including:
  - the sequence of direct user interactions (exposures, clicks, etc.);
  - behavior-related features, such as clicking into the merchant detail page, dwell time, and other interactions.
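As a concrete illustration, the feedback feature groups above might be assembled on the client roughly as follows. This is a minimal sketch; all event types and field names (`shop_id`, `detail_stay`, `ms`, etc.) are hypothetical, not the production schema.

```python
# Sketch: split raw client events into exposure/click sequences plus
# behavior-level signals such as dwell time on the detail page.
def build_realtime_features(events):
    exposures, clicks, dwell_ms = [], [], {}
    for e in events:
        if e["type"] == "exposure":
            exposures.append(e["shop_id"])
        elif e["type"] == "click":
            clicks.append(e["shop_id"])
        elif e["type"] == "detail_stay":
            dwell_ms[e["shop_id"]] = e["ms"]
    return {"exposure_seq": exposures, "click_seq": clicks, "dwell_ms": dwell_ms}

events = [
    {"type": "exposure", "shop_id": 101},
    {"type": "click", "shop_id": 101},
    {"type": "detail_stay", "shop_id": 101, "ms": 5300},
    {"type": "exposure", "shop_id": 102},
]
feats = build_realtime_features(events)
print(feats["exposure_seq"])  # [101, 102]
```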
3.2 Modeling of User Feedback Behavior Sequence
The industry offers many schemes for modeling user feedback sequences, such as the well-known DIN (Deep Interest Network [10]), DIEN (Deep Interest Evolution Network [11]) and the Transformer-based BST (Behavior Sequence Transformer [12]). In the on-device ranking scenario, how the user feedback behavior sequence is applied greatly affects the algorithm's effectiveness, so we explored this area as well.
Introducing Deep Feedback Networks
When optimizing the fine-ranking model in the cloud, we generally consider only the user's explicit "positive feedback" behaviors toward merchants (clicks, orders, etc.); the implicit "negative feedback" of unclicked exposures is rarely introduced, because long- and short-term histories contain a great many such exposures, which are very noisy compared with click signals. On the device, however, this real-time exposure "negative feedback" matters: for instance, after merchants of the same brand are exposed several times in a row, the click-through rate of that brand's merchants shows a significant downward trend.
Since unclicked-exposure negative feedback makes up a large share of the real-time feedback sequence, modeling it together in one sequence lets it dominate the sparse positive signals. For Taobao's homepage feed, Alibaba proposed an adversarial approach to mine the relationship between exposure and click sequences, identifying which behaviors in the current exposure sequence are genuine negative feedback and which are more closely related to clicks. The WeChat team proposed the Deep Feedback Network (DFN) [4], which denoises and corrects bias to some extent by modeling the interaction between positive and negative feedback signals.
First, following DFN's optimization idea, we split the feedback sequence into positive and negative feedback sequences and use a Transformer to perform cross-attention between the two. Concretely, taking the exposure and click sequences as an example, the exposure sequence serves as Query and the click sequence as Key and Value, yielding the attention of exposures over clicks; swapping the roles yields the attention of clicks over exposures. Because positive feedback is sparse, when only negative sequences are present the attention degenerates into near-uniform, irrelevant noise weights. We therefore follow the practice of [7] and append an all-zero vector to the negative feedback sequence to absorb this potential noise. The model structure is shown in Figure 4:
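A minimal numpy sketch of this cross-attention follows. The dimensions, random inputs, and the exact placement of the all-zero padding vector are illustrative assumptions, not the production model.

```python
import numpy as np

def attention(q, k, v):
    """Scaled dot-product attention; q: (m, d), k/v: (n, d)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

d = 8
clicks = np.random.randn(2, d)     # sparse positive-feedback sequence
exposures = np.random.randn(6, d)  # unclicked-exposure (negative) sequence
# Append an all-zero vector to the negative sequence so the softmax has a
# "null" slot to absorb noise (one possible placement, following the text).
exposures_pad = np.vstack([exposures, np.zeros((1, d))])

expo_to_click = attention(exposures, clicks, clicks)          # exposure as Q
click_to_expo = attention(clicks, exposures_pad, exposures_pad)  # click as Q
print(expo_to_click.shape, click_to_expo.shape)  # (6, 8) (2, 8)
```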
Improve the signal-to-noise ratio of negative feedback signals
After this first version launched on the Food Channel list page, it achieved a steady +0.1% over the baseline, but this fell short of the offline gains and of our expectations. Ablation analysis showed the main cause to be heavy noise in the negative feedback signal, rooted in the fact that a click on an exposed merchant may occur after the moment features are collected. To improve the signal-to-noise ratio, we therefore impose a minimum exposure duration on negative signals: merchants exposed for a long time without being clicked are more likely genuine negative feedback. As shown in Figure 5, a longer dwell yields a more stable feedback signal and a better online effect.
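The dwell-time filter can be sketched as follows; the threshold value and field names are illustrative, and the production rule may differ.

```python
def filter_negative_feedback(exposures, clicked_ids, min_dwell_ms=2000):
    """Keep only exposures that stayed on screen long enough and were not
    clicked; short exposures may simply not have been seen or decided yet."""
    return [e["shop_id"] for e in exposures
            if e["on_screen_ms"] >= min_dwell_ms and e["shop_id"] not in clicked_ids]

exposures = [
    {"shop_id": 1, "on_screen_ms": 4500},
    {"shop_id": 2, "on_screen_ms": 300},   # scrolled past too fast: ambiguous
    {"shop_id": 3, "on_screen_ms": 2600},  # clicked, so not negative
]
print(filter_negative_feedback(exposures, clicked_ids={3}))  # [1]
```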
Cross modeling of positive and negative feedback sequences from multiple perspectives
Continuing to iterate on the first positive/negative-sequence model, we noticed that increasing the number of heads in the Transformer's Multi-Head Attention brought no incremental benefit: the metrics showed no significant change versus a single head. Our analysis suggested that the multi-head representations produced by random initialization amount largely to a pure increase in parameter count.
Moreover, in the Dianping search scenario the merchant list under a Query is already highly relevant overall, and the results within a page are even more homogeneous; the differences lie mainly in fine-grained attributes such as price, distance, environment and taste. We therefore designed a multi-view cross-modeling method for positive and negative feedback sequences, the Multi-View Feedback Attention Network (MVFAN), to strengthen the interaction between exposure and click behaviors along these more perceptible dimensions. The network structure is shown in Figure 6:
The user behavior sequence is split by feedback type into positive (click) feedback and untouched negative feedback. Besides the shop ID itself, each sequence is supplemented with more generalized attribute information (category, price, etc.) and context-related features (latitude/longitude, distance). After embedding, these are summed to form the final representations of the positive and negative sequences. A multi-layer Transformer then encodes them, and signals of several dimensions are fed in as decoding queries to activate the user's preferences along different merchant dimensions:
- The ID of the merchant to be ranked serves as Q to activate the real-time feedback behaviors, expressing the user's latent, diverse interests.
- The merchant's finer-grained attribute information serves as Q to activate attention weights, improving the expression of user interest along these explicitly perceptible merchant dimensions.
- Signals of the current search context serve as Q to activate attention weights, enhancing the adaptivity of the real-time feedback representation to different contexts.
Formally, $Q = [x_s, x_c, \ldots, x_d] \in \Re^{K \times d_{model}}$ stacks these query signals, and $K = V = x_s \oplus x_c \oplus \cdots \oplus x_d$ denotes the element-wise sum of the embeddings of the various feedback sub-sequences (shop_id/category/distance/position, etc.). With these as the Transformer's input, the multi-view attention structure is expressed by the following formulas:
$$MultiHead(Q, K, V) = Concat(head_1, head_2, \ldots, head_h)W^O$$

$$head_i = Attention(Q_i W_i^Q, K W_i^K, V W_i^V)$$

$$Attention(Q_i, K, V) = softmax\left(\frac{Q_i K^T}{\sqrt{d_k}}\right)V$$
Ablation experiments show that this Transformer activation, which explicitly uses multiple merchant and context features as queries, is more effective than randomly initialized Multi-Head Attention.
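The multi-view activation above can be sketched in numpy as follows. View names, dimensions, and random inputs are placeholders; only the Q-stacking and the K = V summation pattern follow the text.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
d_model, n = 8, 5
# Per-view embeddings of the feedback sequence (shop_id / category /
# distance / position); K = V is their element-wise sum.
views = {name: rng.standard_normal((n, d_model))
         for name in ["shop_id", "category", "distance", "position"]}
K = V = sum(views.values())

# Q stacks one query vector per view: the candidate shop ID, its attribute
# embeddings, and the search-context signal (random placeholders here).
Q = rng.standard_normal((len(views), d_model))

attn = softmax(Q @ K.T / np.sqrt(d_model)) @ V
print(attn.shape)  # (4, 8): one activated representation per view
```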
Match&Aggregate sequence features
For real-time feedback features on the device, besides the various Attention-based sequence models, there is also an interest-extraction method based on explicit crossing. As shown in Figure 7, in contrast to Attention's "soft" weights computed from embedding inner products, this can be understood as a "hard" attention, with extraction forms including Hit (whether the candidate appears in the sequence), Frequency (how many times), Step (how long ago), and so on. Beyond crossing on a single variable, several variables can be combined to increase the granularity and discriminative power of the behavior description.
These cross features, built from prior knowledge, avoid to some extent the noise that "soft" Attention introduces, and are more interpretable. For example, when a user searches "hot pot" and, instead of choosing a nearby merchant, clicks a historically preferred merchant near their usual residence, there is a clear signal that the user decided in advance. Adding explicit strong cross features (e.g., the distance between the candidate merchant and the merchants just clicked) captures this intent well, ranking merchants that are farther away but better match the user's intent higher. In the Dianping search scenario we introduced a large number of such prior cross features with remarkable results.
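The Hit/Frequency/Step extraction can be sketched as follows; the exact feature definitions are assumptions based on the description above.

```python
def match_aggregate(candidate_id, click_seq):
    """'Hard'-attention cross features between a candidate shop and the
    real-time click sequence: Hit / Frequency / Step."""
    hits = [i for i, sid in enumerate(click_seq) if sid == candidate_id]
    return {
        "hit": int(bool(hits)),    # did the candidate appear at all
        "frequency": len(hits),    # how many times it was clicked
        # positions since the most recent hit; -1 when never clicked
        "step": len(click_seq) - 1 - hits[-1] if hits else -1,
    }

print(match_aggregate(7, [3, 7, 5, 7, 9]))
# {'hit': 1, 'frequency': 2, 'step': 1}
```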
3.3 Rearranged model design
There is also much industry work on re-ranking, including multi-objective MMR (Maximal Marginal Relevance) [8] based on greedy optimization, context-aware list-wise models [2,3], and reinforcement-learning schemes [9]. In the on-device search re-ranking scenario we adopt a context-aware list-wise model, generating the Top-K result by modeling the interactions among the contexts of the Top-N items produced by the fine-ranking model. The overall model structure is shown in Figure 8; it mainly comprises device-cloud joint training, which brings richer cloud-side interaction representations onto the device, and Transformer-based context modeling, each introduced below.
Device-cloud joint training
Generally, cloud re-ranking models reuse the fine-ranking layer's features and add the fine-ranking output's position or score on top. After long-term iteration, the Dianping search fine-ranking model has accumulated a huge number of basic and scene-specific features and models multiple joint objectives including clicks, visits and orders. Directly reusing these high-dimensional features and multi-objective outputs on the device would incur enormous computation, storage and transmission overhead. Yet using only the cloud model's position or predicted score would inevitably lose much of the cross-expressiveness between device and cloud features, and iterating the models on both sides independently would carry a large maintenance cost.
We therefore adopt device-cloud joint training to bring the cloud's abundant feature-cross signals and multi-objective high-order representations onto the device. As shown in Figure 9, after the cloud model converges, it is plugged into the on-device re-ranking task for continued fine-tuning. Note that:
- Because the search fine-ranking layer uses list-wise LambdaLoss, the model's predicted score indicates only relative order, not the range of the merchant's click-through rate, and cannot be used as a globally meaningful absolute value. We therefore take only the output of the network's last layer.
- Taking only the last layer's dense output largely forfeits the ability to cross cloud features with on-device features; hence, feature selection is used to pick the head cloud features and bring them onto the device as well.
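A toy sketch of how the joint-training inputs described above might be assembled: the converged cloud model is reduced here to a frozen placeholder dense layer, and all shapes, weights, and the choice of "head" cloud features are illustrative only.

```python
import numpy as np

def cloud_dense_output(cloud_features):
    """Stand-in for the converged cloud model's last hidden layer; on the
    device only this dense vector (not the full cloud feature set) is used."""
    W = np.full((64, 16), 0.01)  # frozen, pretrained weights (placeholder)
    return np.maximum(cloud_features @ W, 0.0)  # ReLU

cloud_feats = np.ones((1, 64))       # full cloud-side feature vector
device_feats = np.ones((1, 12))      # on-device features (e.g. feedback encodings)
selected_cloud = np.ones((1, 4))     # head cloud features kept via feature selection

# The on-device re-ranking model fine-tunes on this concatenated input.
x = np.concatenate([cloud_dense_output(cloud_feats), device_feats, selected_cloud],
                   axis=1)
print(x.shape)  # (1, 32)
```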
Reordered Merchant Context Modeling
The merchant-context re-ranking structure follows PRM [3], with some adjustments for the on-device application scenario; the specific structure is shown in Figure 10:
It mainly consists of the following parts:
- Merchant feature vector X: the output of a fully connected mapping over the aforementioned features (User/Shop single-side and cross statistical features, feedback-sequence encodings, and the fused cloud outputs). Since this output already contains position information, the subsequent Transformer input needs no positional encoding.
- The input layer is processed by Query Dynamic Partition into the context merchant sequence of each Query unit, which is then fed to the Transformer layer for encoding.
- Transformer encoding layer: encodes the merchant context via Multi-Head Self-Attention.
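The Query Dynamic Partition step might look like this simplified sketch; the field names are hypothetical.

```python
def query_dynamic_partition(items):
    """Group the candidate list into per-Query context units before the
    Transformer encodes each unit's merchant context."""
    groups = {}
    for it in items:
        groups.setdefault(it["query_id"], []).append(it["shop_id"])
    return groups

items = [
    {"query_id": "q1", "shop_id": 11},
    {"query_id": "q1", "shop_id": 12},
    {"query_id": "q2", "shop_id": 21},
]
print(query_dynamic_partition(items))  # {'q1': [11, 12], 'q2': [21]}
```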
Optimization objective
In the search scenario we still focus on the success rate of the user's search (whether a click occurs). Unlike recommendation and advertising, which usually estimate per-item click-through rate with a global loss, search cares more about the quality of the results at the top of the page, so ordering the head positions correctly takes priority. Therefore, when modeling the objective of improving the user's search click-through rate, we use list-wise LambdaLoss and introduce the DeltaNDCG term into the gradient update to strengthen the influence of head positions. For the detailed derivation and implementation, see our earlier article on knowledge-graph-based deep-learning ranking for Dianping search.
$$C = \frac{1}{2}(1 - S_{ij})\sigma(s_i - s_j) + \log(1 + e^{-\sigma(s_i - s_j)})$$

$$\lambda_{ij} = \frac{\partial C(s_i - s_j)}{\partial s_i} = \frac{-\sigma}{1 + e^{\sigma(s_i - s_j)}}|\Delta_{NDCG}|$$
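A direct reading of the gradient formula above, with sigma and the DeltaNDCG values chosen only for illustration:

```python
import math

def lambda_ij(s_i, s_j, delta_ndcg, sigma=1.0):
    """Pairwise LambdaRank gradient: the RankNet derivative scaled by
    |DeltaNDCG| to emphasize swaps that hurt head positions most."""
    return -sigma / (1.0 + math.exp(sigma * (s_i - s_j))) * abs(delta_ndcg)

# A head-position pair (large |DeltaNDCG|) receives a stronger gradient
# than a tail pair with the same score gap.
head = lambda_ij(0.2, 0.8, delta_ndcg=0.5)
tail = lambda_ij(0.2, 0.8, delta_ndcg=0.05)
print(abs(head) > abs(tail))  # True
```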
3.4 Multi-scene application effects
Combining the above feature and model optimizations, the offline experiment metrics are compared in Table 2:
AB experiments for on-device intelligent re-ranking were launched on the main Dianping search and the Food Channel list page, and the core business metric QV_CTR improved significantly from an already high baseline. As shown in the upper half of Figure 11, QV_CTR rose by 0.25% on the main search list page and 0.43% on the Food Channel list page, stable and positive on each platform. Furthermore, the per-position click-through-rate curves in the lower half show that on-device re-ranking alleviates, to a degree, the click decay caused by fixed paged requests, with especially notable improvement on the later screens.
4 System Architecture and Deployment Optimization
Unlike in the cloud, where models of hundreds of GB or even TB can be deployed via distributed sharded loading across machines, device-side resources are tight. Although the compute and storage capabilities of terminal devices have improved markedly and can support inference with deep models of a certain scale, storage on the device remains very limited; after all, an entire app is no more than a few hundred MB.
Therefore, beyond the effect/performance trade-offs in feature selection and trigger-decision control discussed above, we further optimized model deployment and compression, and evaluated metrics such as power consumption in detail. In addition, to iterate on-device models more efficiently, including further mining the user's real-time interest features, we developed a "seamless" model training and serving framework consistent with the cloud pipeline. These are introduced in turn below.
4.1 System Architecture
The overall architecture of the on-device intelligent re-ranking system, including its joint deployment with the cloud search ranking system, is shown in Figure 12. Three modules support the implementation of the on-device re-ranking system:
- The intelligent trigger module schedules the on-device intelligence module according to various business-defined trigger events; for example, a user's click on a merchant triggers local re-ranking.
- The on-device re-ranking service module constructs the feature data and calls the on-device inference engine to run the re-ranking model and produce scores. Within it:
  - The feature-processing part is a general feature-operator service designed by the Search Technology Center for search/recommendation/advertising algorithm scenarios; it supports building features from both client-side and cloud-side data with lightweight, simple expressions.
  - The on-device inference engine is a unified model-management framework from the terminal R&D center; it supports deploying various lightweight inference engines on the device and dynamic distribution and control of models.
- The native re-ranking processing logic mainly re-inserts the re-ranked results into the list and handles refresh control.
4.2 Optimization of large-scale deep model deployment on the terminal
Sparse Embedding and Dense network split deployment
Since on-device compute resources are limited and a complete large-parameter model cannot be stored, the most intuitive idea is to split the offline-trained model parameters into a Dense network and a large-scale ID-feature Embedding Table for deployment:
- The main Dense network, together with the input-layer structures for the smaller Query/Contextual features and basic Shop attributes, is converted to MNN format, hosted on the Meituan resource-management platform, pulled once when the client starts, and stored locally on the client.
- The large-scale ID-feature Embedding Table (about 80% of the network's parameters) is served from a TF-Serving service in the cloud. When the client initiates a search request, the Serving service looks up the embeddings for the merchants on the current page and returns them along with the merchant result list; the client concatenates them with the locally built features and feeds the result to the inference engine for scoring and re-ranking.
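The split-serving flow can be sketched as follows. The lookup-then-concatenate pattern follows the text; the table sizes and feature widths are placeholders, and the dict lookup stands in for, rather than uses, the TF-Serving service.

```python
import numpy as np

def serve_embeddings(shop_ids, table):
    """Cloud side: look up the large ID-embedding rows for the current
    page's merchants and ship them down with the result list."""
    return np.stack([table[sid] for sid in shop_ids])

rng = np.random.default_rng(1)
embedding_table = {sid: rng.standard_normal(16) for sid in range(100)}

page_shops = [3, 41, 7]
id_emb = serve_embeddings(page_shops, embedding_table)   # returned by the cloud
local_feats = rng.standard_normal((len(page_shops), 6))  # built on the device
model_input = np.concatenate([id_emb, local_feats], axis=1)
print(model_input.shape)  # (3, 22): fed to the on-device inference engine
```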
Model compression
After the splitting above, the model size can be kept within 10MB. To further reduce the model's footprint on the phone, as well as its power and performance impact, we adopted the compression scheme provided by the Meituan Vision Intelligence Department. Existing neural-network compression techniques often ignore the specific smart device to be targeted, so compressed models may fail to adapt to particular hardware and align outputs poorly; the compression tool deployed on the platform performs better on the on-device inference framework.
As the test data below shows, after compression the model shrinks to under 1MB, with accuracy loss within 1e-5. We analyzed power consumption with Sysdiagnose: with inference enabled, we repeated the action sequence of searching "Hotpot/Wujiaochang" from the home page, entering the search list page to trigger the first re-ranking inference, sliding the list to trigger it again, and exiting the page (test duration 10 minutes, one pass every 20 seconds); the relevant energy metrics showed no significant change.
4.3 Terminal Intelligent Model Training Prediction Platform
Unlike cloud ranking experiments, which are supported by a mature, complete training and serving platform that makes feature and model launches convenient and efficient, the early client-side experiments suffered serious iteration-efficiency problems. The model launch process was cumbersome: splitting, converting and verifying the model structure relied on many manual steps and hand-offs across multiple internal platforms. Feature iteration was also inefficient: the client team had to co-develop matching feature-processing logic, with significant logic-consistency risk and implementation differences between platforms.
To address this, Meituan's front-end and back-end teams jointly developed a feature-processing framework adapted to the client and connected it with the cloud-side feature framework (Augur), laying a solid foundation for further algorithm iteration experiments. The Search Technology Center team will later introduce a one-stop model training and serving platform for on-device intelligent algorithm applications, so stay tuned.
5 Summary and Outlook
On-device intelligent re-ranking is Dianping Search's exploration in the direction of edge computing, and it has achieved remarkable gains on core metrics. By leveraging on-device compute, users' real-time interests and preferences are captured more efficiently, compensating for the delays in cloud decision-making and in obtaining user feedback; the order of not-yet-exposed candidates is adjusted in time, ranking merchants that better match user intent higher and thus delivering a better search experience. Meanwhile, we upgraded the end-to-end training, deployment and serving framework, laying a good foundation for further rapid iteration experiments.
The Dianping Search Technology Center team will continue to apply on-device intelligence technology in more business scenarios. Future directions for exploration and optimization include:
- Iterating an end-cloud federated intelligent search ranking model based on federated learning, on the premise of ensuring data privacy, security, and legal compliance.
- Modeling more accurate and diverse trigger-control strategies. The control strategy of the on-device real-time intent-perception decision module is currently relatively simple; in the future we will consider combining query context, user feedback signals, and other features to output more flexible trigger signals, and request the cloud for more candidate results that match the user's current intent.
- Continuing to optimize the re-ranking model, including the real-time feedback sequence-modeling algorithm, and exploring more robust encodings of implicit negative-feedback signals.
- Exploring richer and more flexible on-device application scenarios, such as per-user model customization, to achieve the ultimate personalized experience of "a thousand users, a thousand models".
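The trigger-control direction above can be made concrete with a minimal sketch: combine a few session signals (the signal names, thresholds, and rate limit here are hypothetical, not the team's actual strategy) into a yes/no decision on whether to request fresh candidates from the cloud.

```python
from dataclasses import dataclass

@dataclass
class SessionSignals:
    query_changed: bool       # user issued a new or refined query
    skipped_in_a_row: int     # consecutive results scrolled past without a click
    seconds_since_request: float  # time since the last cloud request

def should_rerequest(s: SessionSignals,
                     skip_threshold: int = 5,
                     min_interval_s: float = 3.0) -> bool:
    """Fire a cloud re-request when intent has likely drifted, rate-limited.

    Hypothetical rule: an explicit query change always triggers; a long run of
    skips (implicit negative feedback) also triggers; both are suppressed
    within the rate-limit window to avoid hammering the cloud.
    """
    if s.seconds_since_request < min_interval_s:
        return False
    if s.query_changed:
        return True
    return s.skipped_in_a_row >= skip_threshold
```

A learned version of this module would replace the hand-set thresholds with a small on-device model over the same kinds of signals, which is the direction the outlook describes.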
About the Author
Zhu Sheng, Liu Zhe, Tang Biao, Jia Wei, Kai Yuan, Yang Le, Hong Chen, Manman, Hua Lin, Xiaofeng, and Zhang Gong are from the Meituan/Dianping Division, Search Technology Center.
Yi Ran and Zhu Min are from the Meituan Platform, Search and NLP Department, Engineering R&D Center.
References
[1] Yu Gong, Ziwen Jiang, et al. "EdgeRec: Recommender System on Edge in Mobile Taobao" arXiv preprint arXiv:2005.08416 (2020).
[2] Qingyao Ai, Keping Bi, et al. "Learning a Deep Listwise Context Model for Ranking Refinement" arXiv preprint arXiv:1804.05936 (2018).
[3] Changhua Pei, Yi Zhang, et al. "Personalized Re-ranking for Recommendation" arXiv preprint arXiv:1904.06813 (2019).
[4] Ruobing Xie, Cheng Ling, et al. "Deep Feedback Network for Recommendation" (IJCAI-2020).
[5] Feiyi, Zhu Sheng, et al. "The Practice of Knowledge-Graph-Based Deep Learning Ranking in Dianping Search." Meituan technical team blog.
[6] Xiao Yao, Jia Qi, et al. "The Practice of Transformer in Meituan Search Ranking." Meituan technical team blog.
[7] Qingyao Ai, Daniel N Hill, et al. "A zero attention model for personalized product search" arXiv preprint arXiv:1908.11322 (2019).
[8] Teo CH, Nassif H, et al. "Adaptive, Personalized Diversity for Visual Discovery" (RecSys-2016).
[9] Eugene Ie, Vihan Jain, et al. "SLATEQ - A Tractable Decomposition for Reinforcement Learning with Recommendation Sets" (IJCAI-19).
[10] Zhou, Guorui, et al. "Deep interest network for click-through rate prediction." (SIGKDD-2018).
[11] Zhou, Guorui, et al. "Deep interest evolution network for click-through rate prediction." (AAAI-2019).
[12] Chen, Qiwei, et al. "Behavior Sequence Transformer for E-commerce Recommendation in Alibaba." arXiv preprint arXiv:1905.06874 (2019).
Job Offers
The Search Technology Center of the Meituan/Dianping Division is committed to building a first-class search system and search experience, meeting the diverse search needs of Dianping users and supporting the search needs of every business on the Dianping app. Interested candidates are welcome to send resumes to: edp.itu.zhaopin@meituan.com .
| This article is produced by Meituan technical team, and the copyright belongs to Meituan. Welcome to reprint or use the content of this article for non-commercial purposes such as sharing and communication, please indicate "The content is reproduced from the Meituan technical team". This article may not be reproduced or used commercially without permission. For any commercial activities, please send an email to tech@meituan.com to apply for authorization.