Search relevance measures the degree of relevance between a Query and a Doc, and is a core component of a search engine. This article describes the Dianping search team's technical exploration and practice in relevance calculation: improving a pre-trained model's performance on the relevance problem through a multi-similarity-matrix model structure, multi-stage training, and other methods, and solving the performance problem of online prediction for an interaction-based model. We hope it brings some inspiration or help to readers engaged in related work.
1. Background
Dianping search is one of the core entrances of the Dianping App. Users search to find stores that meet their life-service needs in different scenarios. The long-term goal of search is to continuously optimize the search experience and improve users' search satisfaction. This requires us to understand users' search intent, accurately measure the degree of relevance between search terms and merchants, display relevant merchants as much as possible, and rank more relevant merchants higher. Therefore, relevance calculation between search terms and merchants is an important part of Dianping search.
The relevance problems faced by the Dianping search scenario are complex and diverse. Users' search terms are quite varied, covering merchant names, dishes, addresses, categories, and complex combinations of them. At the same time, merchants carry many types of information, including merchant name, address, group-deal information, dish information, and various other facility and label fields. The matching patterns between a Query and a merchant are therefore extremely complicated, and it is easy for various relevance problems to arise. Specifically, they can be divided into the following types:
- Text mismatch: In order to retrieve and expose more merchants, a Query may be split into finer-grained terms for retrieval, which may cause the Query to incorrectly match different fields of a merchant. As shown in Figure 1(a), a user searching for "oyster hot pot" is looking for a hot pot with oysters in the soup base, but "oyster" and "hot pot" separately match two different dishes of the merchant.
- Semantic offset: The Query matches the merchant literally, but the merchant's main intent is semantically irrelevant to the Query, such as "milk tea" matching "brown sugar pearl milk tea bag", as shown in Figure 1(b).
- Category offset: The Query matches the merchant literally and is semantically related, but the merchant's main category does not match the user's need. For example, when a user searches for "fruit", a KTV merchant that offers a "fruit platter" is obviously not relevant to the user's need.
Relevance methods based on literal matching cannot effectively handle the above problems. To resolve the various irrelevant results in the search list that do not meet the user's intent, the deep semantic relevance between search terms and merchants must be described more accurately. Based on the MT-BERT pre-trained model trained on Meituan's massive business corpus, this paper optimizes the deep semantic relevance model between Query and merchant (POI, corresponding to Doc in a general search engine) in the Dianping search scenario, and applies the Query-POI relevance information in each stage of the search pipeline.
This paper introduces Dianping search relevance from four aspects: existing search relevance techniques, the Dianping search relevance calculation scheme, practical applications, and summary and outlook. The relevance calculation chapter describes how we address the three main challenges: constructing the merchant-side input information, adapting the model to Dianping search relevance calculation, and optimizing the model's online performance. The practical application chapter presents the offline and online effects of the Dianping search relevance model.
2. Existing Search Relevance Techniques
The purpose of search relevance is to calculate the degree of relevance between a Query and the returned Doc, that is, to determine whether the Doc's content meets the need expressed by the user's Query, corresponding to the semantic matching task in NLP. In the Dianping search scenario, search relevance is the degree of relevance between the user's Query and the merchant POI.
Text matching methods: Early text matching only considered the literal matching degree between Query and Doc, computing relevance through term-based matching features such as TF-IDF and BM25. Literal-matching relevance is efficient to compute online, but term-based keyword matching generalizes poorly, lacks semantic and word-order information, and cannot handle synonymy or polysemy, so missing matches and incorrect matches are common.
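As a reference for the term-based baseline mentioned above, here is a minimal BM25 scoring sketch in Python; the tokenization, toy corpus, and parameter values are illustrative assumptions, not a production implementation:

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, doc_freq, n_docs, avg_len, k1=1.2, b=0.75):
    """Score one document against a query with classic BM25."""
    tf = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        if term not in tf:
            continue  # term-based matching: unseen terms contribute nothing
        # Robertson-Sparck Jones IDF with +1 smoothing
        idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
        denom = tf[term] + k1 * (1 - b + b * len(doc_terms) / avg_len)
        score += idf * tf[term] * (k1 + 1) / denom
    return score

# Toy example: "oyster hot pot" scored against merchants' dish texts split into terms
docs = [["oyster", "omelette"], ["hot", "pot", "set"], ["oyster", "hot", "pot"]]
df = Counter(t for d in docs for t in set(d))
avg = sum(len(d) for d in docs) / len(docs)
for d in docs:
    print(d, round(bm25_score(["oyster", "hot", "pot"], d, df, len(docs), avg), 3))
```

Because the score depends only on exact term overlap, synonymous expressions receive no credit, and split matches scattered across unrelated fields can still score well, which is exactly the weakness described above.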
Traditional semantic matching models: To make up for the shortcomings of literal matching, semantic matching models were proposed to better capture the semantic relevance between Query and Doc. Traditional semantic matching mainly includes matching in a latent space, where both Query and Doc are mapped into vectors in the same space and the vector distance or similarity is used as the matching score, such as Partial Least Squares (PLS) [1]; and matching based on translation models, which map the Doc into the Query space for matching or compute the probability of the Doc being translated into the Query [2].
With the development of deep learning and pre-trained models, deep semantic matching models are now widely used in industry. In terms of implementation, they are divided into representation-based methods and interaction-based methods. Pre-trained models, as an effective approach in natural language processing, are also widely used in semantic matching tasks.
Representation-based deep semantic matching models: Representation-based methods learn semantic vector representations of Query and Doc separately and then compute a similarity between the two vectors. Microsoft's DSSM model [3] proposed the classic two-tower text matching structure: two independent networks construct the vector representations of Query and Doc, and cosine similarity measures the degree of relevance between the two vectors. NRM [4] from Microsoft Bing search targets the Doc representation problem: in addition to the basic Doc title and content, it also considers other multi-source information (each type of information is called a field), such as external links and the Queries clicked by users. A Doc contains multiple fields, each field contains multiple instances, and each instance corresponds to a piece of text such as a Query term. The model first learns instance vectors, aggregates the instance vectors of a field into a field vector, and then aggregates the field vectors into the final Doc vector. Sentence-BERT [5] introduces the pre-trained model BERT into the encoding layer of the Query and Doc towers, obtains the sentence vectors of the two towers with different pooling strategies, and lets Query and Doc interact through dot product, concatenation, and so on.
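The two-tower idea can be illustrated with a short NumPy sketch; the encoder here (mean-pooled random embeddings) is only a stand-in for the learned towers of DSSM or Sentence-BERT:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {"wulin": 0, "square": 1, "milk": 2, "tea": 3, "juice": 4, "shop": 5}
emb = rng.normal(size=(len(vocab), 8))   # toy embedding table standing in for a learned encoder

def encode(text):
    """Representation-based tower: embed tokens and mean-pool into one vector."""
    ids = [vocab[t] for t in text.split() if t in vocab]
    return emb[ids].mean(axis=0)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Doc vectors can be pre-computed offline and cached; only the query tower runs online.
doc_vecs = {d: encode(d) for d in ["milk tea shop", "juice shop"]}
q_vec = encode("wulin square milk tea")
for doc, v in doc_vecs.items():
    print(doc, round(cosine(q_vec, v), 3))
```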
The early Dianping search relevance model drew on the ideas of NRM and Sentence-BERT and adopted the representation-based multi-field relevance model structure shown in Figure 2(a). The representation-based method can compute POI vectors in advance and store them in a cache, so that only the Query vector and the interaction between the Query vector and the POI vector need to be computed online, which makes online serving fast.
Interaction-based deep semantic matching models: Interaction-based methods do not directly learn semantic representation vectors of Query and Doc; instead, they let Query and Doc interact at the input stage to establish basic matching signals, and then fuse these signals into a matching score. ESIM [6] was a classic model widely used in industry before pre-trained models were introduced: Query and Doc are first encoded into initial vectors, an attention mechanism performs interactive weighting which is concatenated with the initial vectors, and finally a relevance score is obtained through classification.
When the pre-trained model BERT is used for interactive computation, Query and Doc are usually concatenated as the input of BERT's sentence-pair (inter-sentence relation) task, and the final relevance score is obtained through an MLP network [7], as shown in Figure 2(b). CEDR [8] splits the Query and Doc vectors obtained from BERT's sentence-pair task and further computes cosine similarity matrices between Query and Doc. The Meituan search team [9] applied the interaction-based method to the Meituan search relevance model, introduced merchant category information for pre-training, and added an entity recognition task for multi-task learning. The Meituan in-store search advertising team [10] proposed distilling an interaction-based model into a representation-based model, realizing virtual interaction in the two-tower model and increasing the interaction between Query and POI while maintaining performance.
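For contrast with the two-tower approach, the sketch below shows the interaction-based (cross-encoder) pattern using the open-source Hugging Face transformers library; the public "bert-base-chinese" checkpoint stands in for the in-house MT-BERT, and a fine-tuned relevance head is assumed rather than provided:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Public checkpoint used purely as a placeholder for the fine-tuned MT-BERT relevance model.
name = "bert-base-chinese"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)
model.eval()

def relevance_score(query: str, poi_text: str) -> float:
    """Concatenate Query and POI text as a sentence pair; the classification head yields a score."""
    inputs = tokenizer(query, poi_text, truncation=True, max_length=128, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()  # probability of the "relevant" class

print(relevance_score("Wulin Square internet-famous milk tea",
                      "milk tea juice | Oli Oli milk tea | internet-famous shop | Wulin Square"))
```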
3. Dianping Search Relevance Calculation
The representation-based model focuses on expressing the global features of the POI and lacks matching information between the online Query and the POI. The interaction-based method can make up for this deficiency, enhance the interaction between Query and POI, and improve the expressive power of the model. Given the strong performance of pre-trained models on text semantic matching tasks, Dianping search relevance calculation adopted an interaction-based scheme built on the Meituan pre-trained model MT-BERT [11]. However, applying an interaction-based, pre-trained BERT to the relevance task in the Dianping search scenario still poses several challenges:
- How to better construct the POI-side model input: Constructing the Doc-side model input is an important part of a relevance model. In general web search engines, the Doc's page title is extremely informative for judging relevance, but in the Dianping search scenario POI information has many fields and complex content, and there is no single field that plays the role of a "page title": each merchant is described by various structured information such as merchant name, category, address, group deals, and merchant labels. The large amount of multi-source merchant information cannot all be fed into the model when computing the relevance score, yet using only basic information such as merchant name and category fails to achieve satisfactory results due to lack of information. Therefore, how to construct an informative POI-side model input is the first problem we need to solve.
- How to optimize the model to better fit Dianping search relevance calculation: The text in the Dianping search scenario differs somewhat from the corpus of a general pre-trained model. For example, two words that are synonyms in general usage can name two completely different barbecue merchant brands in the Dianping search context. At the same time, the relevance judgment logic between Query and POI is not exactly the same as the semantic matching task in general NLP scenarios: the matching patterns between Query and POI are very complex, and when the Query matches different fields of a POI, the relevance judgment can differ. For example, when the Query "fruit" matches the "fruit shop" merchant category, the relevance is high, but when it hits the "fruit platter" label of a KTV, the relevance is weak. Therefore, compared with the general interaction-based BERT sentence-pair matching task, relevance calculation also needs to pay attention to the specific matching between Query and POI. The main challenge is how to optimize the model to adapt to the Dianping search scenario, handle the complex and diverse relevance judgment logic, and resolve as many irrelevant results as possible.
- How to solve the online performance bottleneck of the pre-trained relevance model: The representation-based model computes quickly but has limited expressive power; the interaction-based model can enhance the interaction between Query and POI and improve effectiveness, but it faces a serious performance bottleneck when used online. Therefore, when the 12-layer interaction-based BERT model is deployed online, how to guarantee the performance of the entire computation pipeline while preserving the model's effectiveness, so that it runs stably and efficiently, is the last hurdle for applying relevance calculation in production.
After continuous exploration and experimentation, we constructed a POI text summary suitable for the Dianping search scenario from the complex multi-source information on the POI side, modified the model structure according to the characteristics of relevance calculation, and finally, by optimizing the computation process and introducing a cache, reduced the latency of real-time model inference and of the overall application pipeline, meeting the performance requirements of real-time online BERT computation.
3.1 How to better construct the POI-side model input information
When judging the degree of relevance between a Query and a POI, more than a dozen POI-side fields are involved in the calculation, and some fields contain a lot of content (for example, a merchant may have hundreds of recommended dishes), so a suitable way is needed to extract and organize the POI-side information and feed it into the relevance model. For general search engines (such as Baidu) or common vertical search engines (such as Taobao), the Doc's page title or product title is rich in information and is usually the main Doc-side input in the relevance judgment process.
As shown in Figure 3(a), in a general search engine the key information of a website and whether it is relevant to the Query can be seen at a glance from the title of the search result. In the Dianping search result in Figure 3(b), however, the merchant name field alone does not provide sufficient merchant information; multiple fields, such as merchant category (milk tea and juice), user-recommended dishes (Oli Oli milk tea), label (internet-famous shop), and address (Wulin Square), must be combined to judge whether the merchant is relevant to the Query "Wulin Square internet-famous milk tea".
Label extraction is a common way to extract topic information in industry, so we first tried to construct the POI-side model input through merchant tags: representative merchant keywords are extracted from the merchant's reviews, basic information, dishes, and the merchant's top search terms, and used as merchant labels. When used online, the extracted merchant labels, together with the merchant name and basic category information, serve as the POI-side model input and interact with the Query. However, merchant tags do not cover merchant information comprehensively enough. For example, when a user searches for the dish "egg custard", a nearby Korean restaurant does sell egg custard, but the store's signature dishes and top clicked terms are unrelated to "egg custard", so the correlation between the store's extracted label words and "egg custard" is low. The model then judges the store as irrelevant, which harms the user experience.
To obtain the most comprehensive POI representation, one option is to concatenate all merchant fields into the model input without extracting keywords, but this makes the model input very long, which seriously hurts online performance, and the large amount of redundant information also hurts model effectiveness.
In order to construct a more informative POI-side input, we propose a POI matching-field summary extraction method: based on how the online Query matches the POI, the matched POI field texts are extracted in real time, and a matching-field summary is constructed as the POI-side model input. The extraction process is shown in Figure 4. Based on several text similarity features, we extract the text fields that are most relevant to the Query and most informative, and fuse the field type information to construct the matching-field summary. When used online, the extracted POI matching-field summary, merchant name, and basic category information together form the POI-side model input.
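A minimal sketch of the matching-field summary idea is shown below; the field names, the character-overlap similarity, and the length budget are simplifying assumptions, and the production system fuses more text-similarity features:

```python
def char_overlap(query: str, text: str) -> float:
    """Crude text-similarity feature: fraction of query characters that appear in the field text."""
    if not query:
        return 0.0
    return sum(1 for ch in query if ch in text) / len(query)

def build_matching_summary(query: str, poi_fields: dict, budget: int = 60) -> str:
    """Pick the POI field texts most similar to the query, keep the field type, respect a length budget."""
    scored = []
    for field_type, values in poi_fields.items():
        for value in values:
            scored.append((char_overlap(query, value), field_type, value))
    scored.sort(reverse=True)                      # most relevant and informative fields first
    summary, used = [], 0
    for score, field_type, value in scored:
        if score == 0 or used + len(value) > budget:
            continue
        summary.append(f"[{field_type}] {value}")  # fuse field type information into the summary
        used += len(value)
    return " ".join(summary)

poi = {
    "dish": ["Oli Oli milk tea", "brown sugar pearl milk tea", "cheese beef burger"],
    "tag": ["internet-famous shop"],
    "address": ["Wulin Square, Hangzhou"],
}
print(build_matching_summary("Wulin Square internet-famous milk tea", poi))
```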
After determining the POI-side model input, we use the BERT sentence-pair task: MT-BERT encodes the Query and the POI-side matching-field summary, and the pooled sentence vector is then used to compute the relevance score. After adopting the POI matching-field summary to construct the POI-side input, and together with sample iteration, the model's effectiveness improved greatly compared with the label-based method.
3.2 How to optimize the model to better fit Dianping search relevance calculation
Making the model better fit the Dianping search relevance task has two aspects: the distribution of text in the Dianping search scenario differs somewhat from the corpus used to pre-train MT-BERT, and the pre-trained model's sentence-pair task also differs slightly from the Query-POI relevance task, so the model structure needs to be adapted. After continuous exploration, we adopted a two-stage training scheme based on domain data, combined with training sample construction, to make the pre-trained model fit the Dianping search relevance task; and we proposed a deep interaction relevance model based on multiple similarity matrices to strengthen the interaction between Query and POI, improve the model's ability to express complex Query and POI information, and optimize the relevance calculation.
3.2.1 Two-stage training based on domain data
To effectively utilize the massive user click data and make the pre-trained MT-BERT fit the Dianping search relevance task better, we drew on the ideas of Baidu search relevance [12] and introduced a multi-stage training approach: the first stage performs continual domain-adaptive pre-training with user click and negatively sampled data, and the second stage fine-tunes with manually labeled data. The model structure is shown in Figure 5 below:
First-stage training based on click data
The direct reason for introducing click data in the first training stage is that the Dianping search scenario has some unique problems. For example, two words that are near-synonyms in general usage can name two completely different barbecue brands in the Dianping search context, so click data can help the model learn knowledge specific to the search scenario. However, using click samples directly for relevance judgment introduces a lot of noise: a user may click a merchant simply because it ranks high, and may skip a merchant simply because it is far away, rather than because of relevance. We therefore introduce a variety of features and rules to improve the accuracy of automatic labeling of training samples.
When constructing samples, candidates are filtered by features such as whether the merchant was clicked, the click position, and the distance between the most-clicked merchant and the user. Query-POI pairs whose exposure click-through rate exceeds a certain threshold are treated as positive examples, with different thresholds for different merchant types. For negative examples, the Skip-Above sampling strategy takes merchants that rank above a clicked merchant and whose click-through rate is below the threshold as negative samples. In addition, random negative sampling can supplement the training set with easy negatives, but it also introduces noise, so we use hand-designed rules to denoise the training data: when the category intent of the Query closely matches the category taxonomy of the POI, or the Query highly matches the POI name, the pair is removed from the negative samples. A simplified sketch of this construction is shown below.
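The sketch below illustrates the sample construction logic; the threshold values, the feature names, and the category-match rule are illustrative assumptions rather than the exact production rules:

```python
from dataclasses import dataclass

@dataclass
class Impression:
    query: str
    poi_name: str
    poi_category: str
    rank: int          # position in the result list
    ctr: float         # exposure click-through rate of this Query-POI pair
    clicked: bool

POS_CTR, NEG_CTR = 0.30, 0.05   # assumed thresholds; tuned per merchant type in practice

def label_samples(impressions, query_category):
    """Positive: high-CTR pairs. Negative: skip-above pairs ranked before the last click, then rule-denoised."""
    impressions = sorted(impressions, key=lambda x: x.rank)
    last_click = max((i.rank for i in impressions if i.clicked), default=-1)
    samples = []
    for imp in impressions:
        if imp.ctr >= POS_CTR:
            samples.append((imp.query, imp.poi_name, 1))
        elif imp.rank < last_click and imp.ctr <= NEG_CTR:
            # rule-based denoising: drop "negatives" whose category or name actually matches the query
            if query_category == imp.poi_category or imp.query in imp.poi_name:
                continue
            samples.append((imp.query, imp.poi_name, 0))
    return samples

imps = [
    Impression("egg custard", "Happy BBQ", "barbecue", 1, 0.02, False),
    Impression("egg custard", "Seoul Kitchen", "korean food", 2, 0.35, True),
]
print(label_samples(imps, query_category="home-style dishes"))
```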
Second-stage training based on manually labeled data
After the first training stage, considering that noise in the click data cannot be completely removed and that the relevance task has its own characteristics, a second training stage based on manually annotated samples is needed to correct the model. In addition to randomly sampling part of the data for manual labeling, in order to improve the model as much as possible, we produce a large number of high-value samples for annotation through hard example mining and contrastive sample augmentation. The details are as follows:
1) Hard example mining
- Mining specific sample types: features based on the Query, the POI, and how they match are designed to characterize BadCase types, and samples of specific BadCase types are automatically selected from the candidate data set for annotation.
- Samples that users clicked but the old online model judged irrelevant: this mines hard cases where the current online model predicts incorrectly and cases with close semantics that it cannot distinguish.
- Margin sampling: samples with high uncertainty are mined through margin sampling, for example samples whose model prediction scores lie near the decision threshold (see the sketch after this list).
- Samples that are hard for the model or annotators: the current model predicts on the training set, and samples whose model predictions conflict with the annotated labels, as well as sample types on which annotators disagree, are re-annotated.
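A minimal sketch of the margin-sampling step mentioned above; the threshold, margin width, and candidate pool are assumptions for illustration:

```python
def margin_sampling(candidates, threshold=0.5, margin=0.1, top_k=100):
    """Select unlabeled samples whose predicted relevance score lies near the decision threshold."""
    uncertain = [(abs(score - threshold), query, poi) for query, poi, score in candidates]
    uncertain.sort()                       # smallest distance to the threshold = most uncertain
    return [(q, p) for dist, q, p in uncertain[:top_k] if dist <= margin]

pool = [("foie gras burger", "beef burger shop", 0.52),
        ("durian cake", "durian mille-crepe shop", 0.91),
        ("fruit", "KTV with fruit platter", 0.47)]
print(margin_sampling(pool))   # the two borderline cases are sent for manual labeling
```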
2) Contrastive sample augmentation: Drawing on the idea of contrastive learning, we generate contrastive samples for some highly matched samples for data augmentation and label them manually to ensure label accuracy. By contrasting the differences between samples, the model focuses on the information that really matters and generalizes better over synonyms, leading to better results.
- For cross-dish matching, a relevance problem that dish terms are prone to (for example, a search for "foie gras burger" matches a merchant that sells "beef burger" and "foie gras sushi"), each sub-component of the dish term is matched against the recommended-dish field separately, producing a large number of contrastive samples and strengthening the model's ability to identify cross-dish matches.
- For the problem of a dish term hitting only the prefix of a recommended dish, cases where the dish term fully matches a recommended dish (a search for "durian cake" matching a merchant that sells "durian cake") are transformed by keeping only the prefix of the search term, constructing contrastive samples that match only the recommended-dish prefix (a search for "durian" against a merchant selling "durian cake"), so that the model can better distinguish prefix matches on recommended dishes.
Taking cross-dish matching as an example, as shown in Figure 6 above, both cases split the Query and match it against multiple recommended dishes of the merchant, yet the Query "durian cake" is relevant to the recommended dishes "durian mille-crepe, Black Forest cake", while the Query "foie gras burger" is not relevant to "sizzling foie gras, cheese beef burger". To enhance the model's ability to recognize such highly matched cases with opposite labels, we constructed the two contrastive samples "durian cake"-"durian mille-crepe" and "foie gras burger"-"sizzling foie gras", removing the text that matches the Query literally but does not help the judgment, so that the model learns the key information that really determines relevance and generalizes better over synonyms such as "cake" and "mille-crepe". Other types of hard examples can use the same augmentation method to improve results; a simplified sketch of this pair construction is shown below.
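The contrastive augmentation for cross-dish matching can be sketched as follows; the component splitting and the pairing rule are simplified assumptions about the construction described above, and the generated pairs would still be labeled manually:

```python
def contrastive_samples(query, query_components, recommended_dishes):
    """Pair the full query with each single dish that shares a component with it,
    so only the decisive dish text remains and other matched-but-unhelpful text is stripped."""
    samples = []
    for dish in recommended_dishes:
        if any(comp in dish for comp in query_components):
            samples.append((query, dish))   # to be labeled manually as relevant / irrelevant
    return samples

print(contrastive_samples("foie gras burger", ["foie gras", "burger"],
                          ["sizzling foie gras", "cheese beef burger"]))
# -> [('foie gras burger', 'sizzling foie gras'), ('foie gras burger', 'cheese beef burger')]
```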
3.2.2 Deep Interaction Model Based on Multiple Similarity Matrices
The BERT sentence-pair task is a general NLP task for judging the relation between two sentences, while the relevance task computes the degree of relevance between a Query and a POI. During computation, the sentence-pair task models not only the interaction between Query and POI but also the interactions within the Query and within the POI themselves, whereas relevance calculation cares more about the cross interaction between Query and POI. In addition, during model iteration we found that some types of hard BadCases place higher demands on the model's expressive power, such as cases where the texts match highly but are irrelevant. Therefore, to further improve the model on complex Query-POI relevance, we modified the BERT sentence-pair task in the second training stage and proposed a deep interaction model based on multiple similarity matrices: the similarity matrices let Query and POI interact deeply, and an indicator matrix is introduced to better handle hard BadCases. The model structure is shown in Figure 7 below:
Inspired by CEDR [8], we split the MT-BERT-encoded Query and POI vectors and explicitly compute the deep interaction between the two parts. Splitting Query and POI for deep interaction, on the one hand, dedicates part of the model to learning Query-POI relevance and, on the other hand, the added parameters improve the model's fitting ability.
Referring to the MatchPyramid model [13], the deep interaction relevance model computes four different Query-Doc similarity matrices, Indicator, dot product, cosine distance, and Euclidean distance, fuses them, and performs attention weighting with the output of the POI part. The Indicator matrix describes whether a Query token and a POI token are identical, and is computed as follows:
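In its standard MatchPyramid-style form (assumed here, since only the description of the matrix is given above), the entry for the i-th Query token and the j-th POI token can be written as:

$$\mathrm{Indicator}_{ij} = \begin{cases} 1, & q_i = p_j \\ 0, & q_i \neq p_j \end{cases}$$

where $q_i$ denotes the i-th token of the Query and $p_j$ the j-th token of the POI-side input.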
The Indicator matrix describes the exact-match relationship between Query and POI tokens. It was introduced because of a difficulty in judging Query-POI relevance: sometimes, even when the texts match highly, the two are not relevant. The interaction-based BERT structure tends to judge Query-POI pairs with a high degree of text match as relevant, but in the Dianping search scenario some hard cases go the other way. For example, "bean juice" and "mung bean juice" match highly yet are not related, while "Maokong" matches "Cat's Sky City" only in scattered fragments yet they are related, because the former is an abbreviation of the latter. Feeding the different text-matching situations to the model directly through the Indicator matrix lets the model explicitly see matching patterns such as "containment" and "discontinuous match", which improves its discrimination on hard cases without affecting the performance on the majority of normal cases.
The deep interaction relevance model based on multiple similarity matrices splits Query and POI and computes similarity matrices between them, which is equivalent to letting the model interact Query and POI explicitly and makes it more suitable for the relevance task. The multiple similarity matrices increase the model's capacity for characterizing Query-POI relevance, while the Indicator matrix is designed specifically for the complex text-matching situations of the relevance task, making the model more accurate in judging irrelevant results. A sketch of how these matrices can be computed follows.
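A NumPy sketch of the four similarity matrices; the token embeddings are random placeholders for the split MT-BERT outputs, and the fusion/attention step that follows in the real model is omitted:

```python
import numpy as np

rng = np.random.default_rng(1)
q_tokens, p_tokens = ["durian", "cake"], ["durian", "mille", "crepe"]
Q = rng.normal(size=(len(q_tokens), 16))   # placeholder for Query token vectors split from MT-BERT
P = rng.normal(size=(len(p_tokens), 16))   # placeholder for POI token vectors split from MT-BERT

indicator = np.array([[1.0 if qt == pt else 0.0 for pt in p_tokens] for qt in q_tokens])
dot = Q @ P.T
cos = dot / (np.linalg.norm(Q, axis=1, keepdims=True) * np.linalg.norm(P, axis=1))
euclid = np.linalg.norm(Q[:, None, :] - P[None, :, :], axis=-1)

# Stack into a (4, |Q|, |P|) tensor; the model fuses these channels and attends over the POI part.
sim = np.stack([indicator, dot, cos, euclid])
print(sim.shape)   # (4, 2, 3)
```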
3.3 How to solve the online performance bottleneck of the pre-trained relevance model
When relevance calculation is deployed online, existing schemes usually distill the model into a two-tower structure [10, 14] to guarantee online efficiency, but this more or less hurts model effectiveness. To preserve effectiveness, Dianping search uses the 12-layer interaction-based BERT pre-trained relevance model online, and hundreds of POIs under each Query must be predicted by this 12-layer model. To guarantee online computing efficiency, we optimized from two angles, the model's real-time inference process and the application pipeline, by introducing a caching mechanism, accelerating model prediction, adding a pre-positioned golden-rule layer, and parallelizing relevance calculation with core ranking. These optimizations resolve the performance bottleneck of deploying the relevance model online, allowing the 12-layer interaction-based BERT model to run stably and efficiently and to support relevance calculation between a Query and hundreds of merchants.
3.3.1 Performance optimization of the relevance model computation process
The online computation process of the Dianping search relevance model is shown in Figure 8. The performance of real-time model inference is optimized through a caching mechanism and TF-Serving model prediction acceleration.
To use computing resources effectively, a caching mechanism is introduced in the online deployment: relevance scores of high-frequency Queries are written to the cache. Subsequent calls read the cache first; on a cache hit the score is returned directly, and on a miss the score is computed online in real time. The caching mechanism greatly saves computing resources and effectively relieves the pressure of online computation; a minimal sketch is shown below.
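A minimal sketch of the cache-first scoring path; the cache backend, key format, and eviction/TTL handling are simplified assumptions (production would use a distributed cache):

```python
relevance_cache = {}   # stands in for a distributed cache keyed by (query, poi_id)

def cached_relevance(query: str, poi_id: str, model_fn) -> float:
    """Read the cache first; on a miss, call the online model and write the score back."""
    key = (query, poi_id)
    if key in relevance_cache:
        return relevance_cache[key]          # cache hit: no model inference
    score = model_fn(query, poi_id)          # cache miss: real-time BERT inference via TF-Serving
    relevance_cache[key] = score
    return score

# usage with a dummy model function
print(cached_relevance("hot pot", "poi_42", lambda q, p: 0.87))
print(cached_relevance("hot pot", "poi_42", lambda q, p: 0.87))  # second call hits the cache
```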
For a Query that misses the cache, the Query is processed into the Query-side model input, the matching-field summary of each POI is obtained through the process described in Figure 4 and processed into the POI-side input format, and the online relevance model is then called to output the relevance score. The relevance model is deployed on TF-Serving, and during prediction it is accelerated with the ART model optimization framework of the Meituan machine learning platform (improved on the basis of Faster-Transformer [15]), which greatly improves prediction speed while preserving accuracy.
3.3.2 Application pipeline performance optimization
The application of the relevance model in the search pipeline is shown in Figure 9 above. Performance in the overall search pipeline is optimized by introducing a pre-positioned golden-rule layer and parallelizing the relevance calculation with the core ranking layer.
To further speed up the relevance call path, we introduced a pre-positioned golden-rule layer to divert Queries: for some Queries the relevance score is output directly by rules, relieving the inference pressure on the model. The golden-rule layer judges Query and POI with text-matching features; for example, if the search term is exactly the same as the merchant name, a "relevant" judgment is output directly by the golden rule without calling the relevance model.
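A sketch of how the golden-rule layer can short-circuit the model call; the exact-name rule is the one mentioned above, and the fallback model call is a placeholder:

```python
def relevance_with_golden_rule(query: str, poi_name: str, model_fn) -> float:
    """Rule layer first: trivially relevant cases skip BERT inference entirely."""
    if query.strip() == poi_name.strip():
        return 1.0                      # exact match with the merchant name -> "relevant" by rule
    return model_fn(query, poi_name)    # otherwise fall back to the online relevance model

print(relevance_with_golden_rule("Happy BBQ", "Happy BBQ", lambda q, p: 0.5))       # 1.0, no model call
print(relevance_with_golden_rule("oyster hot pot", "Happy BBQ", lambda q, p: 0.12))
```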
In the overall computation pipeline, relevance calculation runs concurrently with the core ranking layer, so that it adds no latency to the overall search pipeline. At the application layer, relevance calculation is used in multiple stages of the search pipeline, such as recall and ranking. To reduce the proportion of irrelevant merchants on the first screen of the search list, we feed the relevance score into the LTR multi-objective fusion ranking of list pages, and in the multi-channel recall fusion strategy the relevance model's results are used so that only relevant merchants from the supplementary recall channels are merged into the list.
4. Practical application
4.1 Offline effect
To accurately reflect the offline effect of model iterations, we built a Benchmark through multiple rounds of manual annotation. Since the main goal of current online use is to reduce the BadCase metric, that is, to accurately identify irrelevant merchants, we use the precision, recall, and F1 of the negative (irrelevant) class as evaluation metrics. The gains from two-stage training, sample construction, and model iteration are shown in Table 1 below:
The initial method (Base) adopts the BERT sentence-pair classification task with the Query concatenated with the POI matching-field summary: the Query-side input is the user's original Query, and the POI side concatenates the merchant name, merchant category, and matching-field summary text. After introducing the first-stage training based on click data, the negative-class F1 improves by 1.84% over Base. By introducing contrastive samples and hard-example samples, iterating the training samples, and matching them with the second-stage model input structure, the negative-class F1 improves significantly, by 10.35% over Base. After introducing the deep interaction method based on multiple similarity matrices, the negative-class F1 improves by 11.14% over Base. On the Benchmark the model's overall metrics also reach a high level, with an AUC of 0.96 and an F1 of 0.97.
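As a reference for the negative-class metrics reported above, they can be computed with scikit-learn as below; the labels shown are toy values, with 0 denoting "irrelevant" so that precision/recall/F1 are measured on the negative class:

```python
from sklearn.metrics import precision_recall_fscore_support

y_true = [1, 0, 0, 1, 0, 1, 0]   # 0 = irrelevant (negative class), 1 = relevant
y_pred = [1, 0, 1, 1, 0, 1, 0]

p, r, f1, _ = precision_recall_fscore_support(y_true, y_pred, pos_label=0, average="binary")
print(f"negative-class precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```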
4.2 Online effects
To effectively measure users' search satisfaction, Dianping search samples actual online traffic every day and labels it manually, using the BadCase rate on the first screen of the list page as the core metric for evaluating the relevance model. After the relevance model was launched, the monthly average BadCase rate of Dianping search dropped significantly by 2.9 pp (percentage points, absolute) compared with before launch, and the BadCase rate stabilized near the low point in the following weeks. At the same time, the NDCG of the search list page steadily increased by 2 pp. This shows that the relevance model can effectively identify irrelevant merchants and significantly reduce the proportion of irrelevant results on the first screen, thereby improving the user's search experience.
Figure 10 below lists some online BadCases solved by the model; the subtitle of each example is its Query, the left side is the experimental group with the relevance model applied, and the right side is the control group. In Figure 10(a), for the search term "Pei Jie", the relevance model judges the merchant "Pei Jie Famous Product", whose core word contains "Pei Jie", as relevant, and also judges as relevant the high-quality target merchant "Pei Jie's old hot pot" that the user may have intended but mistyped; meanwhile, by introducing the address field identifier, merchants that merely contain "Pei Jie" in their address are judged irrelevant. In Figure 10(b), the user Query "grapefruit Japanese-food buffet" is looking for a Japanese buffet restaurant named "Grapefruit"; the Japanese buffet restaurant "Zhuruo Tuna", which sells grapefruit-related products and was matched through term splitting, is correctly judged irrelevant and ranked lower, ensuring that the merchants shown at the top better match the user's main need.
5. Summary and Outlook
This paper introduced the technical solution and practical application of the Dianping search relevance model. To better construct the merchant-side model input, we introduced a method that extracts a summary of the merchant's matched fields in real time as the model input. To optimize the model to better fit Dianping search relevance calculation, we adopted a two-stage training scheme based on click data and manually annotated data to effectively utilize Dianping's massive user click data, and proposed a deep interaction structure based on multiple similarity matrices according to the characteristics of relevance calculation, further improving the relevance model. To relieve the online computing pressure of the relevance model, we introduced a caching mechanism and TF-Serving prediction acceleration in the online deployment, added a golden-rule layer to divert Queries, and parallelized relevance calculation with the core ranking layer, thus meeting the performance requirements of online real-time BERT computation. By applying the relevance model in each stage of the search pipeline, the proportion of irrelevance problems was significantly reduced and the user's search experience was effectively improved.
At present, the Dianping search relevance model still has room for improvement in both model performance and online application. On the model side, we will explore introducing more domain prior knowledge, such as multi-task learning to identify entity types in the Query and incorporating external knowledge into the model input. On the application side, the relevance score will be refined into more levels to meet users' needs for fine-grained store search, and we will also try to apply the relevance capability to non-merchant modules to optimize the search experience of the whole search list.
6. References
- [1] Rosipal R, Krämer N. Overview and recent advances in partial least squares[C]//International Statistical and Optimization Perspectives Workshop "Subspace, Latent Structure and Feature Selection". Springer, Berlin, Heidelberg, 2005: 34-51.
- [2] Gao J, He X, Nie J Y. Clickthrough-based translation models for web search: from word models to phrase models[C]//Proceedings of the 19th ACM International Conference on Information and Knowledge Management. 2010: 1139-1148.
- [3] Huang PS, He X, Gao J, et al. Learning deep structured semantic models for web search using clickthrough data[C]//Proceedings of the 22nd ACM international conference on Information & Knowledge Management. 2013: 2333-2338.
- [4] Zamani, H., Mitra, B., Song, X., Craswell, N., & Tiwary, S. (2018, February). Neural ranking models with multiple document fields. In Proceedings of the eleventh ACM international conference on web search and data mining (WSDM) (pp. 700-708).
- [5] Reimers N, Gurevych I. Sentence-bert: Sentence embeddings using siamese bert-networks[J]. arXiv preprint arXiv:1908.10084, 2019.
- [6] Chen Q, Zhu X, Ling Z H, et al. Enhanced LSTM for Natural Language Inference[C]//Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2017: 1657-1668.
- [7] Nogueira R, Yang W, Cho K, et al. Multi-stage document ranking with bert[J]. arXiv preprint arXiv:1910.14424, 2019.
- [8] MacAvaney S, Yates A, Cohan A, et al. CEDR: Contextualized embeddings for document ranking[C]//Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. 2019: 1101-1104.
- [9] Li Yong, Jia Hao, et al. Exploration and practice of BERT in the core ranking of Meituan search.
- [10] Shao Wen, Yang Yang, et al. The application of pre-training technology in Meituan in-store search advertising.
- [11] Yang Yang, Jia Hao, et al. The exploration and practice of Meituan BERT.
- [12] Zou L, Zhang S, Cai H, et al. Pre-trained language model based ranking in Baidu search[C]//Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 2021: 4014-4022.
- [13] Pang L, Lan Y, Guo J, et al. Text matching as image recognition[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2016, 30(1).
- [14] Relevance exploration of deep semantic search of Alibaba Entertainment. https://mp.weixin.qq.com/s/1aNd3dxwjCKUJACSq1uF-Q.
- [15] Faster Transformer. https://github.com/NVIDIA/DeepLearningExamples/tree/master/FasterTransformer.
7. About the authors
Xiaoya, Shen Yuan, Zhu Di, Tang Biao, Zhang Gong, and others, all from the Search Technology Center of the Meituan/Dianping Division.
- This article is co-authored.
Job Offers
The Search Technology Center of the Meituan/Dianping Division is committed to building a first-class search system and search experience, meeting the diverse search needs of Dianping users and supporting the search needs of all business lines on the Dianping App. Interested candidates are welcome to send their resumes to: edp.itu.zhaopin@meituan.com.
This article was produced by the Meituan technical team, and the copyright belongs to Meituan. You are welcome to reprint or use the content of this article for non-commercial purposes such as sharing and communication; please credit "Content reproduced from the Meituan technical team". This article may not be reproduced or used commercially without permission. For any commercial use, please email tech@meituan.com to apply for authorization.