Exploration and Application of Takeaway Package Matching

This is the third article in the Takeaway Food Knowledge Graph series. From a technical perspective, it introduces the solutions behind takeaway package matching, including the iteration of offline and real-time package matching, the package quality evaluation scheme, and the business applications of package matching.

1. Background

Meituan Waimai has been working to make it easier for users to order takeaway food they are satisfied with. This article introduces the package matching technology for food merchants and its application in practice. When ordering takeaway, users generally weigh factors such as their preference for individual dishes and how dishes combine, so choosing a merchant and its dishes takes a long time. With package matching, we automatically assemble high-quality packages from a merchant's candidate dishes, which eases the user's "difficulty in choosing" and improves decision-making efficiency.

2. Business goals and challenges

2.1 Business goals

The Meituan Waimai app currently has many package matching applications, such as "Today's Package Recommendation", "Full Reduction Artifact", and "Package Matching Recommendation". Because takeaway merchants currently lack the ability and willingness to assemble packages themselves, the underlying supply of packages covers few business scenarios and merchants, and cannot meet the needs of package-related recommendation and ranking applications. The business goal of takeaway package matching is therefore to generate candidate package combinations for food merchants and to provide a richer package supply to package-related applications.

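Figure 1. Application examples: "Package Recommendation", "Full Reduction Artifact package recommendation", and "dish detail page package matching"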

We analyzed the package-related applications from a business perspective. For services such as "Today's Package Recommendation" and "Full Reduction Artifact", the matching conditions are relatively weak and can be obtained offline. We classify these as recommendation services; they need high package coverage across merchants so that merchants' recommendations can be exposed. For services such as the dish detail page and the full reduction add-on purchase, the matching conditions are stronger and more real-time: on the detail page the user specifies a dish to match, and in the full reduction add-on scenario the conditions are the dishes the user has selected plus a specific price range. We classify these as matching services; they need packages that cover each real-time scenario so that the package module and tab can be exposed. The goals of the package matching algorithm are therefore: ① improve the coverage of package combinations, providing downstream package-related applications with high scenario coverage and sufficiently diverse combinations; ② guarantee the quality of the matched packages.

2.2 Business challenges

Product matching also has many applications in e-commerce, such as Taobao's shopping cart matching, clothing matching, and cosmetics matching. Shopping cart matching recommends items based on the user's cart and purchased products; for example, after a user buys a toothbrush, toothpaste can be recommended. This approach makes related recommendations mainly from purchase behavior, and its goal is not to form a complete combination. Matching takeaway dishes, however, must consider whether the whole combination is reasonable, not merely whether the items are related. For example, many orders contain combinations such as "fried pork + tomato and egg soup + rice" and "fish-flavored shredded pork + tomato and egg soup + rice", so "tomato and egg soup" and "rice" are strongly related, yet "tomato and egg soup + rice" on its own does not make a good set meal.

Clothing matching and cosmetics matching are combination-oriented recommendations. Solutions to such problems fall roughly into two categories. In the first, matching patterns are used to prune the model's selection process; the patterns can be given a priori, either manually or by a model. Papers 4 and 5 in the references adopt this idea. Its characteristic is that the matching effect is guaranteed by the pruning strategy plus a quality evaluation model. In the second, matching patterns are learned end to end in the network parameters. Paper 6 in the references and our offline package matching use this idea. Its characteristic is that the matching effect depends more on the end-to-end model, and the matching model is correspondingly more complex.

Compared with product matching in e-commerce scenarios, food matching faces unique business challenges:

The business scenarios and matching conditions for package matching are diverse, so the solution must serve many businesses and many kinds of matching conditions.
Food items are non-standard products, and different merchants sell different items, so matching patterns vary from merchant to merchant. For example, the portion, taste, ingredients, and price of Kung Pao Chicken differ between merchants, so the set meals built around Kung Pao Chicken differ as well.
Algorithmic matching inevitably produces some low-quality results, and the non-standard nature of the items makes matching quality harder to measure. Low-quality combinations include: a. combinations containing items that are not suitable for separate sale or are not food, such as gifts, pots, and tableware; b. combinations that do not follow conventional matching patterns, such as two drinks, or a drink plus steamed buns.

To this end, our solution is:

To handle diverse business scenarios and matching conditions, we built an algorithmic matching framework that combines offline and real-time matching. For recommendation services, we pre-generate package candidates offline and then rank them in a personalized way within the business scenario. Offline matching follows an iteration from rules to models. Rule-based matching relies on the item representations in the knowledge graph: through high-frequency aggregation plus rule-based generalization it produces relatively high-quality packages and ensures coverage of top merchants. Model-based matching maintains matching quality while further increasing package coverage through the model's generalization. For real-time matching services, the algorithm assembles packages in real time according to the business's matching conditions, further improving package coverage in each real-time scenario.
To handle non-standard food items, we introduced the takeaway food knowledge graph to describe dishes along multiple dimensions. From it we extract rich representations of a dish, such as its standard dish name, category, taste, ingredients, and cooking method, reducing the impact of non-standard items.
To ensure package quality, we developed a package quality evaluation model.

Overall, we have explored and iterated on non-standard item representations, merchant representations, package matching models, and package quality evaluation, forming the package matching framework shown in Figure 2 below.

Figure 2. Package matching framework

3. Package matching model

3.1 Package matching model based on knowledge graph label induction

One problem we face is that takeaway items are non-standard: dish data quality is poor and attributes are missing. To address this, we drew on information sources such as merchant menus, recipes, and item descriptions, and applied methods such as information extraction, relation recognition, and knowledge fusion to build a knowledge graph centered on dishes, which represents each dish along multiple dimensions such as category, taste, cooking method, and efficacy.

Figure 3. Takeaway food knowledge graph

A merchant's historically high-selling packages can generally be considered high-quality packages. However, merchants with medium or low sales have few such packages, which makes it hard to support applications such as personalized package recommendation. Relying on the semantic representation of dishes in the food knowledge graph, we first tried matching packages by direct induction and deduction over the graph. For example, from high-frequency orders we can induce that {hot dish} + {rice} + {soup} is a common set meal pattern, and then deduce for a given merchant a concrete package such as "scrambled eggs with tomato + tomato and egg soup + rice".

Graph induction and deduction is a process of high-frequency aggregation followed by generalization through matching templates. Using order aggregation together with same-brand, same-label, and same-dish template generalization, we produced high-quality packages and significantly improved the merchant coverage of packages. The problem with matching templates is the trade-off between matching quality and generalization. A strictly constrained template guarantees matching quality but generalizes poorly, so package coverage is low. If each item in a template is described by only one or a few labels, the template over-generalizes and accuracy cannot be guaranteed. For this reason, we introduced a model-based package matching method.
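The induction and deduction steps can be illustrated with a minimal sketch. The category labels, the dish names, and the helper names `induce_templates` and `deduce_packages` are hypothetical, and each dish is described by a single category label, which is exactly the coarse setting that tends to over-generalize.

```python
from collections import Counter
from itertools import product

# Hypothetical category labels, standing in for a knowledge graph lookup.
CATEGORY = {
    "fried pork": "hot dish",
    "fish-flavored shredded pork": "hot dish",
    "scrambled eggs with tomato": "hot dish",
    "tomato and egg soup": "soup",
    "rice": "rice",
}

def induce_templates(orders, min_count=2):
    """Induction: aggregate orders into category templates, keep frequent ones."""
    counts = Counter(tuple(sorted(CATEGORY[d] for d in order)) for order in orders)
    return [template for template, c in counts.items() if c >= min_count]

def deduce_packages(menu, template):
    """Deduction: fill every category slot of a template from a merchant's menu."""
    slots = [[d for d in menu if CATEGORY[d] == cat] for cat in template]
    return [combo for combo in product(*slots) if len(set(combo)) == len(combo)]

orders = [
    ["fried pork", "tomato and egg soup", "rice"],
    ["fish-flavored shredded pork", "tomato and egg soup", "rice"],
    ["tomato and egg soup", "rice"],
]
templates = induce_templates(orders)

# A merchant that never sold this combination still gets a package.
menu = ["scrambled eggs with tomato", "tomato and egg soup", "rice"]
packages = deduce_packages(menu, templates[0])
```

Raising `min_count` or describing each slot with more labels makes the templates stricter, which trades coverage for quality as described above.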

3.2 Package matching model based on Encoder-Decoder

Matching a package by hand is also a process from information encoding to information output: the user browses the merchant's menu (encoding), forms an overall picture of the merchant and its items, and then assembles a package based on that picture. The Encoder-Decoder framework fits this process well. The Encoder is analogous to the user browsing the menu and learns the menu's semantic information, and the Decoder is responsible for assembling the package. Encoder-Decoder is a deep learning framework widely used in text summarization, machine translation, dialogue generation, and similar applications. It learns the mapping from the Encoder's input data to the Decoder's output data through encoding (feature extraction) and decoding (target fitting). Common encoders include CNN, RNN, and Transformer structures, and decoders use similar structures.

3.2.1 Package matching model based on LSTM

Package generation extracts multiple subsets of dishes from the set of all candidate dishes of a merchant, forming packages that users can easily browse and order directly. The inputs are mainly the merchant's candidate dish information (name, labels, price, sales volume, and so on), combined with constraints such as the package price range and the number of meals, and information such as user preferences. Initially we used LSTM as the network for both the Encoder and the Decoder. We extract the semantic representation of each dish from the knowledge graph and feed it into the Encoder's RNN. The encoding process resembles a user browsing the merchant's candidate dishes: the Encoder takes the dish name, dish labels, and business attributes (price, sales volume, and so on) as input, and extracts features of the non-standard dishes through the LSTM. As shown in Figure 4 below, each dish name passes through an Embedding layer and a CNN + Pooling layer to produce a name vector, which is concatenated with the embeddings of the dish labels and category and with continuous features such as price and sales volume. The result serves as the input at each step of the Encoder RNN.

Figure 4. Encoder network structure

A Decoder usually decodes against a fixed vocabulary of characters or words as its candidate set, and at each step outputs a probability distribution over that candidate set. For package matching, however, the Decoder's candidate set is the merchant's dish list given as input to the Encoder, not a fixed-size external dish vocabulary. The Pointer Network is an effective framework for modeling this problem. It extends Seq2seq and mainly addresses the case where the candidate set is not fixed. This architecture has been successfully applied to extractive text summarization and to combinatorial optimization problems such as the traveling salesman problem and the convex hull problem.

Concretely, at each decoding step the Decoder estimates a probability distribution over the merchant's dish list for the next dish. At step n (n >= 1), this distribution gives the probability that each dish, or the end position, is selected given that n-1 dishes have already been selected. If the probability of the end position is relatively large, the model tends to treat the n-1 selected dishes as a complete package. During decoding we use the Beam Search algorithm to generate the Top N results and ensure diverse combinations.

Figure 5. Encoder-Decoder network structure
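The decoding loop described above (a distribution over the merchant's own dishes plus an end position, explored with Beam Search) can be sketched as follows. This is a simplified illustration, not the trained model: `toy_score` is a hypothetical hand-written stand-in for the Decoder's pointer scores, and the menu and categories are made up.

```python
import math

END = "<end>"  # stands for the end position

# Hypothetical menu and categories, for illustration only.
CATEGORY = {
    "Kung Pao Chicken": "hot dish",
    "rice": "staple food",
    "fried rice": "staple food",
    "tomato and egg soup": "soup",
}
MENU = list(CATEGORY)

def toy_score(selected, candidate):
    """Stand-in for the decoder's pointer scores (not a trained model)."""
    covered = {CATEGORY[d] for d in selected}
    if candidate == END:
        return 2.0 if len(covered) >= 2 else -2.0
    return -1.0 if CATEGORY[candidate] in covered else 1.0

def step_distribution(selected, menu, score):
    """Softmax over the not-yet-selected dishes plus the end position."""
    candidates = [d for d in menu if d not in selected] + [END]
    logits = [score(selected, c) for c in candidates]
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return {c: e / total for c, e in zip(candidates, exps)}

def beam_search(menu, score, beam_width=3, max_len=3):
    """Return up to beam_width finished packages with their log-probabilities."""
    beams = [((), 0.0)]
    finished = []
    for _ in range(max_len + 1):
        expanded = []
        for selected, logp in beams:
            for cand, p in step_distribution(selected, menu, score).items():
                if cand == END:
                    if selected:  # an empty package is not allowed
                        finished.append((selected, logp + math.log(p)))
                elif len(selected) < max_len:
                    expanded.append((selected + (cand,), logp + math.log(p)))
        beams = sorted(expanded, key=lambda b: b[1], reverse=True)[:beam_width]
        if not beams:
            break
    return sorted(finished, key=lambda b: b[1], reverse=True)[:beam_width]

results = beam_search(MENU, toy_score)
```

Because the candidate set is rebuilt from the menu at every step, the same code works for merchants with different numbers of dishes, which is the property the Pointer Network provides.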

3.2.2 Optimization of package matching model

Learning objectives of the package matching model

To handle the fact that matching patterns vary from merchant to merchant, the model learns a merchant's matching characteristics by fitting the merchant's historical orders. The mainstream way to train is Teacher Forcing on real orders, so that the dishes predicted by the model match the dishes in the real order one by one. Teacher Forcing pushes the predicted dish probabilities toward a 0-1 distribution, but real choices are usually personalized and diverse. For example, after the Decoder has output "Kung Pao Chicken", the staple food chosen in the next step could be either "rice" or "fried rice".

To this end, we collect statistics on the matching patterns in the merchant's historical orders and compute the probability distribution of dish selection. The Decoder uses this distribution as the training target, computes the MSE loss against the estimated distribution, and training minimizes that loss. Another problem with Teacher Forcing is that it is hard to bring in external knowledge, such as matching quality and click and purchase behavior on packages, to guide training. We therefore tried reinforcement learning as an improvement. At step T of decoding, we sample a complete package candidate by Monte Carlo sampling, compute its matching quality score as the reward, and train the model with the MSE loss combined with this score.
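The difference between a one-hot Teacher Forcing target and a statistical target can be shown with a small sketch. It assumes, purely for illustration, that each historical order is stored as an ordered list of dishes; the function names and the data are hypothetical.

```python
from collections import Counter

def next_dish_distribution(orders, prefix, menu):
    """Empirical distribution of the dish chosen right after `prefix`."""
    n = len(prefix)
    counts = Counter(
        order[n] for order in orders
        if len(order) > n and list(order[:n]) == list(prefix)
    )
    total = sum(counts.values())
    return [counts[d] / total if total else 0.0 for d in menu]

def mse_loss(predicted, target):
    """Mean squared error between two distributions over the same menu."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)

menu = ["Kung Pao Chicken", "rice", "fried rice"]
orders = [
    ["Kung Pao Chicken", "rice"],
    ["Kung Pao Chicken", "rice"],
    ["Kung Pao Chicken", "rice"],
    ["Kung Pao Chicken", "fried rice"],
]

# Soft target after "Kung Pao Chicken": rice 0.75, fried rice 0.25.
soft_target = next_dish_distribution(orders, ["Kung Pao Chicken"], menu)
one_hot_target = [0.0, 1.0, 0.0]   # what a single real order would supply

predicted = [0.0, 0.7, 0.3]        # a made-up model output
loss_soft = mse_loss(predicted, soft_target)
loss_one_hot = mse_loss(predicted, one_hot_target)
```

With the soft target, a prediction that splits probability between "rice" and "fried rice" is penalized far less than under the one-hot target, so the model is not pushed toward a 0-1 distribution.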

Package matching constraints

Package matching faces a variety of business constraints. For the "Full Reduction Artifact", the package must fall within a given full reduction price range. For the "smart assistant", matching must respect the filter conditions the user selects, for example "the staple food is rice" and "the price is less than 30 yuan". We use a pruning strategy to ensure that matching satisfies the constraints. Taking the price range constraint of the "Full Reduction Artifact" as an example, when the Decoder generates candidate dishes at a step, it filters out dishes whose price exceeds the remaining budget. As shown in Figure 6 below, for a merchant's dishes A, B, C, D, and E, the Decoder prunes the next round of candidates against the remaining price range of "within 15 yuan" and removes dishes C and D, which exceed it.

Figure 6. Price constraint in package matching
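The price pruning step can be sketched as a mask applied to the Decoder's single-step distribution. The prices and probabilities below are made up to mirror the A to E example, and `prune_by_budget` is a hypothetical helper name.

```python
def prune_by_budget(step_probs, prices, remaining_budget):
    """Drop dishes priced above the remaining budget, then renormalize."""
    kept = {d: p for d, p in step_probs.items() if prices[d] <= remaining_budget}
    total = sum(kept.values())
    if total == 0:
        return {}  # nothing affordable: decoding has to stop here
    return {d: p / total for d, p in kept.items()}

# Hypothetical prices in yuan; C and D exceed the remaining 15 yuan.
prices = {"A": 8, "B": 12, "C": 18, "D": 25, "E": 15}
step_probs = {"A": 0.2, "B": 0.2, "C": 0.2, "D": 0.2, "E": 0.2}

pruned = prune_by_budget(step_probs, prices, remaining_budget=15)
```

Other constraints, such as "the staple food is rice", could be expressed the same way by swapping the price test for a different predicate.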

Package matching model based on the Attention network

Extracting features of a merchant's dishes with an LSTM faces two problems. First, the dishes on a merchant's menu are unordered, whereas an RNN models its input as a sequence. Second, there may be long-distance semantic dependencies between dishes. For example, whether the menu contains dishes such as "rice" or "steamed buns" affects how "Kung Pao Chicken" is matched.

To better represent the unordered menu and the dependencies between dishes, we tried an Encoder-Decoder model based on the Attention structure. The Encoder uses a hierarchical Attention structure to extract the semantic information of dishes, with Attention at the single-dish level at the bottom and Attention between dishes above it. At the single-dish level, we apply Multi-Head Attention over the words of the dish name to obtain a semantic vector of the name, and likewise over the dish labels to obtain a semantic vector of the labels. For the transaction attributes of the dish, we use a multi-layer fully connected network to extract a semantic vector of the transaction features.

Finally, the name vector, label vector, and transaction feature vector are concatenated and passed through a fully connected layer and layer normalization to obtain the dish semantic vector. For the Attention layer between dishes, we apply multi-layer Multi-Head Attention over the merchant's list of dish semantic vectors to obtain menu-level semantic vectors of the dishes. The Decoder also uses Multi-Head Attention. Its inputs include user preference information, the decoder inputs from earlier steps, price constraints, and other contextual information, and at each step it outputs the probability distribution over the dishes in the merchant's menu. During decoding we apply Multi-Head Attention between the user preference information and the menu-level semantic vectors, so that the user's dining preferences are taken into account during matching.

Figure 7. Attention-based package matching network
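Why Attention suits an unordered menu can be seen in a stripped-down sketch. It uses a single head with no learned projections and no positional encoding, which is a simplification of the Multi-Head Attention described above, and the dish vectors are made up. Reordering the menu only reorders the outputs, and every dish can attend directly to every other dish regardless of distance.

```python
import math

def softmax(xs):
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Single-head scaled dot-product self-attention without projections."""
    dim = len(vectors[0])
    outputs = []
    for query in vectors:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        outputs.append([
            sum(w * value[j] for w, value in zip(weights, vectors))
            for j in range(dim)
        ])
    return outputs

# Made-up dish semantic vectors for a three-dish menu.
menu_vectors = [[1.0, 0.0, 0.5], [0.2, 0.9, 0.1], [0.4, 0.4, 0.8]]

encoded = self_attention(menu_vectors)
encoded_reversed = self_attention(menu_vectors[::-1])
```

Here `encoded_reversed` is `encoded` in reverse order (up to floating-point rounding), whereas an RNN would generally produce different states for the two orderings.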

3.2.3 Analysis of Package Matching Model

We assume that a merchant's high-quality combinations are reflected in order sales. One evaluation method is therefore to measure how well the packages output by the model cover the merchant's real high-selling packages. Offline and online evaluations show that the model can fit merchants' high-selling packages. In manual evaluation, we mixed algorithm-generated packages with real orders and asked human evaluators to tell them apart, and they could not distinguish the model's packages from real orders. The model also generalizes well and significantly improves package coverage for merchants and for specific business scenarios.

We analyzed the dish representation vectors output by the model to understand its matching patterns. We used t-SNE to reduce the dimensionality of the vectors and visualize their clusters. In the resulting plot, "staple food", "main dish", and "snack" dishes each cluster together, which shows that the model has learned category-level semantic attributes such as "staple food", "dish", and "snack", and draws on these semantics when matching set meals. The table below lists the Top N most similar dishes for two example dishes.

Staple food: Top N dishes similar to "Wonton"	Similarity	Dishes: Top N dishes similar to "Braised Pork"	Similarity
Chicken Soup Wonton	0.981	Cucumber with beef	0.975
Peas Hot and Sour Powder	0.979	Fresh Mushroom Beef	0.977
Pork wonton	0.975	Maojia Braised Pork	0.980
Beef Noodles in Clear Soup	0.975	Chinese cabbage sausage	0.973
Skin belly fat intestine noodles	0.974	Mixed small intestine	0.976
Fried udon noodles with seafood	0.974	Pork head meat	0.981
Scallion Pork Pot Sticker	0.973	Braised small potatoes	0.975
Pea rice noodles	0.971	Mix beef	0.980
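A Top N similarity lookup of this kind can be sketched as below. The article does not state which similarity measure was used, so cosine similarity is an assumption here, and the three-dimensional vectors are made up rather than taken from the model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top_n_similar(query, vectors, n=2):
    """Rank all other dishes by similarity to the query dish."""
    scored = [
        (name, cosine(vectors[query], vec))
        for name, vec in vectors.items() if name != query
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:n]

# Made-up dish vectors; real ones would come from the encoder.
vectors = {
    "Wonton": [0.9, 0.1, 0.0],
    "Chicken Soup Wonton": [0.8, 0.2, 0.1],
    "Beef Noodles in Clear Soup": [0.7, 0.3, 0.0],
    "Braised Pork": [0.1, 0.9, 0.2],
}
neighbours = top_n_similar("Wonton", vectors)
```

In this toy data the two staple dishes rank above "Braised Pork" for the query "Wonton", mirroring the category clustering seen in the table.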

3.3 Real-time package matching model

Generating package candidates offline meets the needs of recommendation services, but it is still insufficient for some matching scenarios. For example, offline packages currently cover only a small share of dishes, so for applications such as the dish detail page the matching module can be exposed for only part of the page views (PV).

One solution is to increase dish coverage through more offline matching, but its storage cost is relatively high. We therefore adopted real-time package matching. The difficulty of real-time generation is that it must ensure package quality and satisfy various matching conditions while, most importantly, responding in real time. Initially we applied the offline matching model to online real-time matching and found performance bottlenecks, so we simplified the offline model. The idea is to reduce the selection of dishes to the selection of dish categories, that is, to simplify matching relationships at the dish level into matching relationships at the dish category level, which shrinks the overall solution space. As shown in Figure 8 below, the process is as follows:

Matching template mining: mine the merchant's high-selling category-level matching relationships, that is, matching templates such as "hot dish + staple food", from the merchant's historical orders.
Search and pruning: select dishes according to the dish categories in the matching template. In the example above, first select a "hot dish" and then a "staple food". During selection, prune the search according to the user's real-time requirements, such as mandatory dishes, a specified price, or a specified staple food type.
Screening and evaluation: after matching is complete, evaluate the quality of the candidate results. For performance reasons, a tree model is used for quality evaluation, and the Top N results are selected.

Figure 8. Real-time and offline package matching
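The three real-time steps can be sketched end to end. All data and function names are hypothetical, and the final ranking uses total sales as a placeholder where the article uses a tree model for quality evaluation.

```python
from collections import Counter

# Hypothetical merchant data, for illustration only.
CATEGORY = {
    "Kung Pao Chicken": "hot dish", "fried pork": "hot dish",
    "rice": "staple food", "fried rice": "staple food", "cola": "drink",
}
PRICE = {"Kung Pao Chicken": 22, "fried pork": 26, "rice": 2, "fried rice": 12, "cola": 4}
SALES = {"Kung Pao Chicken": 80, "fried pork": 50, "rice": 100, "fried rice": 40, "cola": 30}
MENU = list(CATEGORY)

def mine_templates(orders, top_k=1):
    """Step 1: most frequent category-level combinations in historical orders."""
    counts = Counter(tuple(sorted({CATEGORY[d] for d in order})) for order in orders)
    return [template for template, _ in counts.most_common(top_k)]

def search(template, menu, max_price, must_have=None, chosen=(), total=0):
    """Step 2: pick one dish per category, pruning branches over budget."""
    if total > max_price:
        return []  # prune: the partial package already exceeds the budget
    if len(chosen) == len(template):
        return [chosen]
    category = template[len(chosen)]
    options = [d for d in menu if CATEGORY[d] == category]
    if must_have is not None and CATEGORY[must_have] == category:
        options = [must_have]  # the user fixed the dish for this slot
    found = []
    for dish in options:
        found += search(template, menu, max_price, must_have,
                        chosen + (dish,), total + PRICE[dish])
    return found

def screen(candidates, n=1):
    """Step 3: rank candidates; total sales stands in for the tree model."""
    return sorted(candidates, key=lambda pkg: sum(SALES[d] for d in pkg), reverse=True)[:n]

orders = [
    ["Kung Pao Chicken", "rice"], ["fried pork", "rice"],
    ["Kung Pao Chicken", "fried rice"], ["Kung Pao Chicken", "cola"],
]
templates = mine_templates(orders)
within_30 = search(templates[0], MENU, max_price=30, must_have="Kung Pao Chicken")
within_40 = search(templates[0], MENU, max_price=40, must_have="Kung Pao Chicken")
best = screen(within_40)
```

Searching over a two-slot category template keeps the candidate space small, which is the point of moving from dish-level to category-level matching.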

4. Package quality evaluation

High-selling orders also contain low-quality packages, and model generalization is not perfectly accurate, so the matching model can easily generate poor combinations. As shown on the right side of Figure 9 below, the last two packages generated by the model are relatively unreasonable. To further protect the user experience, we built a package matching quality model that evaluates package quality in a unified way. The model treats package quality as a classification problem. Because a set meal consists of multiple dishes, we build a representation of each dish from its name, labels, and other information, use Global-Attention to weigh the importance of the dishes, and add global features such as the total number of items and the total number of servings to represent the overall combination. The model structure is shown in Figure 9 below:

Figure 9. Package quality classification

We grade package quality into four ordered classes: extremely poor < poor < medium < good. Correspondingly, the model has four output values, each representing the probability that the corresponding bit is 1: "extremely poor" is encoded as "1,0,0,0", "poor" as "1,1,0,0", "medium" as "1,1,1,0", and "good" as "1,1,1,1". The model uses a Pair Hinge Loss to avoid cases where an earlier node is 0 and a later node is 1, which preserves the accuracy of the model. The matching quality score of a package is the average of the four output nodes, which makes the predicted value more reliable. The model structure is otherwise roughly the same as an ordinary classification model, and the objective function is as follows:

Figure 10. Package quality classification
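The ordinal encoding and the averaged score can be sketched directly from the description. The exact objective appears only in the figure, so `pair_hinge_penalty` below is an assumed form of the pairwise hinge idea, not the article's formula: it penalizes any later output node that exceeds the node before it.

```python
GRADES = ["extremely poor", "poor", "medium", "good"]

def encode(grade):
    """Ordinal target: grade k sets the first k+1 bits to 1."""
    k = GRADES.index(grade)
    return [1 if i <= k else 0 for i in range(len(GRADES))]

def quality_score(outputs):
    """Matching quality score: the average of the four output nodes."""
    return sum(outputs) / len(outputs)

def pair_hinge_penalty(outputs, margin=0.0):
    """Assumed form: penalize a later node that exceeds the node before it."""
    return sum(
        max(0.0, margin + outputs[i + 1] - outputs[i])
        for i in range(len(outputs) - 1)
    )

consistent = [0.9, 0.8, 0.3, 0.1]     # non-increasing, as the encoding expects
inconsistent = [0.9, 0.2, 0.7, 0.1]   # third node exceeds the second

penalty_ok = pair_hinge_penalty(consistent)
penalty_bad = pair_hinge_penalty(inconsistent)
```

With this encoding a better grade always has a higher averaged score, so the score can be used directly to rank candidate packages.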

When building the package quality model, negative examples came mainly from bad cases reported by users and from packages produced by deliberately constructed, unreasonable versions of the matching model. The problem is that both sources are biased and lack diversity, and the ratio of negative to positive samples is hard to tune.

To this end, we introduce a pre-training task that learns the matching patterns of historical orders, bringing more prior knowledge of matching into the package quality model. The pre-training process is shown in Figure 11 below. We randomly mask one dish in an order's combination and train a Transformer model to restore the masked dish. Some restorations yield reasonable, sub-optimal packages: for example, if "Kung Pao Chicken" is masked in "Kung Pao Chicken + rice + cola" and the generator produces "fish-flavored shredded pork", then "fish-flavored shredded pork + rice + cola" can be regarded as a sub-optimal package. To account for this, we add a discriminator to the final loss function that predicts the similarity between the generated dish and the target dish. The pre-trained parameters are then used to initialize the package matching quality grading model, which is fine-tuned on a small amount of manually annotated data.

Figure 11. Package quality classification
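Building the masked training examples for this pre-training task might look like the following sketch. The `[MASK]` token name and the helper `make_masked_example` are assumptions, and the Transformer and discriminator themselves are not shown.

```python
import random

MASK = "[MASK]"  # assumed placeholder token

def make_masked_example(order, rng):
    """Hide one randomly chosen dish; return (masked order, position, target)."""
    position = rng.randrange(len(order))
    masked = list(order)
    target = masked[position]
    masked[position] = MASK
    return masked, position, target

rng = random.Random(0)  # fixed seed so the sketch is repeatable
order = ["Kung Pao Chicken", "rice", "cola"]
masked, position, target = make_masked_example(order, rng)

# Putting the target back restores the original order.
restored = list(masked)
restored[position] = target
```

Since every historical order yields training examples this way, the pre-training data is far larger and more varied than the manually collected negative examples.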

5. Package matching applications and future prospects

Meituan Waimai has built a variety of products around set meals. "Today's Package Recommendation" helps users who do not know what to eat or who are slow to decide. The "Full Reduction Artifact" and "single-item matching recommendation" on the merchant page help users who find it hard to reach a discount threshold or to put together a combination. To serve package matching across these business scenarios, the algorithm has been continuously optimized for coverage, matching quality, and matching diversity, providing important technical and data support for the business. Offline package matching is used for services such as the "Full Reduction Artifact" and "Today's Package Recommendation", and has significantly increased merchant coverage of packages. Real-time package matching is used for services such as "dish detail page package matching" and has delivered good business gains.

In follow-up work, on the one hand, we will continue to improve the dish knowledge graph: refining the description of non-standard dishes, further improving the accuracy and coverage of the data by introducing multi-modal data such as images, and building a scenario knowledge graph to better describe user demand and supply. On the other hand, we will explore scenario-based package matching, where we have done little work so far. Users want different packages in different scenarios: hot pot packages in cold weather, congee set meals on the Laba Festival, and local specialty set meals when visiting another place. Next, we will explore matching packages for solar terms, festivals, groups of people, and so on, to better meet users' personalized and scenario-based dining needs.

Figure 12. Package matching applications

6. References

[1] Vinyals, Oriol, Meire Fortunato, and Navdeep Jaitly. "Pointer networks." Advances in Neural Information Processing Systems. 2015.
[2] See, Abigail, Peter J. Liu, and Christopher D. Manning. "Get to the point: Summarization with pointer-generator networks." arXiv preprint arXiv:1704.04368 (2017).
[3] Gong, Jingjing, et al. "End-to-end neural sentence ordering using pointer network." arXiv preprint arXiv:1611.04953 (2016).
[4] Han, Xintong, et al. "Learning fashion compatibility with bidirectional LSTMs." Proceedings of the 25th ACM International Conference on Multimedia. 2017.
[5] Alashkar, Taleb, et al. "Examples-Rules Guided Deep Neural Network for Makeup Recommendation." AAAI. 2017.
[6] Chen, Wen, et al. "POG: Personalized outfit generation for fashion recommendation at Alibaba iFashion." Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2019.
[7] Rush, Alexander M., Sumit Chopra, and Jason Weston. "A neural attention model for abstractive sentence summarization." arXiv preprint arXiv:1509.00685 (2015).
[8] Paulus, Romain, Caiming Xiong, and Richard Socher. "A deep reinforced model for abstractive summarization." arXiv preprint arXiv:1705.04304 (2017).

7. About the author

Ruiyu, Wen Bin, Yang Lin, and Mao Di are all from the Meituan takeaway technical team.


This article is produced by the Meituan technical team, and the copyright belongs to Meituan. You are welcome to reprint or use the content of this article for non-commercial purposes such as sharing and communication, provided you indicate that "the content is reproduced from the Meituan technical team". This article may not be reproduced or used commercially without permission. For any commercial use, please send an email to tech@meituan.com to apply for authorization.
