Knowledge-based Question Answering (KBQA) takes a natural language question, performs semantic understanding and analysis on it, and then queries and reasons over a knowledge base to obtain the answer. Across the pre-sales, in-sales, and after-sales stages of Meituan's platform services, users raise a large number of consultation questions. Built on a question answering system, we help merchants answer user questions more efficiently and resolve them faster through automatic intelligent replies or recommended replies. Drawing on the concrete practice of KBQA in Meituan's scenarios and a paper published at EMNLP 2021, this article introduces the overall design of the KBQA system, breakthroughs on key difficulties, and our exploration of end-to-end question answering. We hope it can be helpful or inspiring to readers working on related research.
1 Background and challenges
A Question Answering (QA) system is a research direction that attracts much attention and has broad prospects in artificial intelligence and natural language processing. It is an advanced form of information retrieval that answers questions posed in natural language with accurate, concise natural language. The main driver of this research is people's demand for fast and accurate access to information, so QA is widely used in various industry business scenarios. Across the pre-sales, in-sales, and after-sales stages of Meituan's platform services, users have many questions for merchants. We therefore use a question answering system with automatic intelligent replies or recommended replies to help merchants answer user questions more efficiently and resolve user problems faster.
For different types of questions, Meituan's intelligent question answering system includes multiple solution channels:
- PairQA: using information retrieval, return the answer to the already-answered community question that is most similar to the current question.
- DocQA: based on reading comprehension, extract answer spans from unstructured merchant information and user reviews.
- KBQA (Knowledge-based Question Answering): based on knowledge graph question answering, infer the answer from structured merchant and product information.
This article mainly shares the practice and exploration in the implementation of KBQA technology.
Among user questions, a large share are information consultations about the basic information and policies of products, merchants, scenic spots, hotels, and so on. KBQA can make effective use of the information on product and merchant detail pages to answer such questions. After the user enters a question, the KBQA system analyzes and understands the query with machine learning algorithms, queries and reasons over the structured information in the knowledge base, and finally returns an accurate answer to the user. Compared with PairQA and DocQA, KBQA's answers mostly come from merchant data, which is more trustworthy; it can also perform multi-hop queries and constraint filtering, handling the complex questions seen online better.
In actual application, the KBQA system faces many challenges, such as:
- Multiple domains: the Meituan platform covers many business scenarios, including hotels, travel, food, and more than ten other types of life services, and user intent differs across them. For "how much is breakfast", a restaurant should answer the per-capita price, while a hotel should answer the price details of its in-house restaurants.
- Constrained questions: a user's question usually carries conditions. For "Do students get a discount at the Forbidden City?", we need to filter the discount policies associated with the Forbidden City rather than return all of them.
- Multi-hop questions: the question involves a path through multiple nodes in the knowledge graph. For "What time does the swimming pool of XX Hotel open?", we need to find the hotel, then the swimming pool, then the business hours in the graph.
The following sections describe in detail how we designed a high-accuracy, low-latency KBQA system that uses scene, context, and other information to accurately understand queries, capture user intent, and address the above challenges.
2 Solution
The input of the KBQA system is a user Query and the output is an answer; the overall architecture is shown in Figure 1 below. The top layer is the application layer, with entry points such as dialogue and search. After receiving the user's Query, the KBQA online service computes the result through the query understanding and recall-and-ranking modules, and finally returns the answer text. Besides the online service, the construction and storage of the knowledge graph are also very important. Users ask not only about merchants' basic information but also about opinions and facilities, such as whether a scenic spot is fun or whether parking at a hotel is convenient. For such questions, which have no official data supply, we built an information and opinion extraction pipeline that mines valuable information from unstructured merchant introductions and UGC reviews, improving the satisfaction of user consultations. We introduce it in detail below.
For the KBQA model, there are currently two mainstream solutions, as shown in Figure 2 below:
- Semantic parsing based: perform in-depth syntactic analysis of the question, compose the analysis results into an executable logical expression (such as SPARQL), and query the answer directly from the graph database.
- Information retrieval based: first parse out the main entity of the question, then query the triples associated with it from the KG to form subgraph paths (also called multi-hop subgraphs), then encode and rank the question against each subgraph path, and return the highest-scoring path as the answer.
The semantic parsing approach is more interpretable, but it requires annotating a large number of natural-language-to-logical-form pairs. The information retrieval approach leans toward end-to-end solutions and performs better on complex questions and with fewer samples, but if the subgraph is too large, computation slows down significantly.
Therefore, combining the advantages of both, we adopt a hybrid plan. As shown in Figure 3 below, the overall process has four major steps. Take "Are there student tickets for the Forbidden City on weekends?" as an example:
- Query understanding: input the original Query and output the understanding result. The Query is syntactically analyzed, recognizing that the main entity is "Forbidden City", the business domain is "Tourism", and the question type is one-hop.
- Relation recognition: input the Query, domain, syntactic analysis result, and candidate relations, and output a score for each candidate. In this module we use dependency analysis to emphasize the core of the Query, recall related relations in the tourism domain, match and rank them, and recognize the relation in the Query as "tickets".
- Subgraph recall: input the main entity and relation parsed by the first two modules, and output the matching subgraph (multiple triples) from the graph. For the example above, all subgraphs in the travel business data whose main entity is "Forbidden City" and whose relation is "tickets" are recalled.
- Answer ranking: input the Query and subgraph candidates, and output a score for each candidate; if the Top-1 score passes a threshold, it is output as the answer. Based on the syntactic analysis result, the constraint is recognized as "student ticket"; the Query-Answer pairs are then ranked under this condition and the answers that meet it are output.
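The four steps above can be sketched as a simple pipeline. This is a minimal illustration with hard-coded toy data and naive stand-ins for the real models (the actual system uses sequence labeling, dependency parsing, and BERT-based matchers); the knowledge base, scores, and scoring rule here are all assumptions.

```python
# Minimal sketch of the four-stage KBQA pipeline; all bodies are
# illustrative stand-ins, not the production models.

def query_understanding(query):
    # In practice: sequence labeling + dependency parsing.
    # Here we hard-code the running example from the text.
    return {"entity": "Forbidden City", "domain": "Tourism", "hops": 1}

def relation_recognition(query, parsed):
    # In practice: a matching model scores candidate relations per domain.
    candidates = {"tickets": 0.92, "business hours": 0.31}
    return max(candidates, key=candidates.get)

def subgraph_recall(entity, relation, kb):
    # Recall all triples (entity, relation, *) from the knowledge base.
    return [t for t in kb if t[0] == entity and t[1] == relation]

def answer_ranking(query, triples):
    # In practice: constraint-aware ranking; here, prefer triples whose
    # object mentions a constraint word found in the query.
    def score(t):
        return sum(w in t[2] for w in query.lower().split())
    return max(triples, key=score) if triples else None

kb = [
    ("Forbidden City", "tickets", "adult ticket: 60 yuan"),
    ("Forbidden City", "tickets", "student ticket: 20 yuan"),
]
query = "Are there student tickets for the Forbidden City weekend?"
parsed = query_understanding(query)
rel = relation_recognition(query, parsed)
best = answer_ranking(query, subgraph_recall(parsed["entity"], rel, kb))
print(best[2])  # student ticket: 20 yuan
```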
The following will introduce our construction and exploration of key modules.
2.1 Query understanding
Query understanding is the first core module of KBQA, responsible for fine-grained semantic understanding of the various components of sentences. The two most important modules are:
- Entity recognition and entity linking, output meaningful business-related entities and types in question sentences, such as business name, project, facility, crowd, time, etc.
- Dependency analysis: take the word segmentation and part-of-speech recognition results as input to identify the main entity of the question, the questioned information, constraints, etc.
Entity recognition is an important step of syntactic analysis. We first identify entities based on the sequence labeling model, and then link to the nodes in the database. For this module, we mainly made the following optimizations:
- In order to improve the recognition ability of OOV (Out-of-Vocabulary) words, we have carried out knowledge injection into the sequence labeling model of entity recognition, and used the known prior knowledge to assist the discovery of new knowledge.
- Taking into account the problem of entity nesting, our entity recognition module will output both coarse-grained and fine-grained results to ensure that the subsequent modules have a full understanding of Query.
- In the long Query scenario of question and answer, use context information to link entities to obtain node id.
Finally, the module will output the types of each important component in the sentence, as shown in Figure 4 below:
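A toy sketch of how a sequence-labeling result can be decoded into typed spans, producing the coarse-grained output mentioned above. The tag inventory (`MERCHANT`, `FACILITY`) and the example sentence are assumptions for illustration, not the production schema.

```python
# Decode BIO tags into (text, type) spans, as an entity recognition
# module's post-processing step might.

def decode_bio(tokens, tags):
    """Decode BIO tags into a list of (span_text, entity_type)."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes last span
        if tag.startswith("B-") or tag == "O":
            if start is not None:
                spans.append((" ".join(tokens[start:i]), etype))
                start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans

tokens = ["XX", "Hotel", "swimming", "pool"]
tags = ["B-MERCHANT", "I-MERCHANT", "B-FACILITY", "I-FACILITY"]
print(decode_bio(tokens, tags))
# [('XX Hotel', 'MERCHANT'), ('swimming pool', 'FACILITY')]
```

For nested entities, the same decoder can simply be run twice, once on the coarse-grained tag sequence and once on the fine-grained one.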
Dependency analysis is a kind of syntactic analysis whose purpose is to identify the asymmetric dominance relations between words in a sentence, represented as directed arcs in the output. For the KBQA task we defined five relations, as shown in Figure 5 below:
There are two mainstream approaches to dependency parsing: transition-based and graph-based. Transition-based parsing models the construction of the dependency tree as a sequence of actions; the model predicts the action at each step (shift, left_arc, right_arc), continuously pushing unprocessed nodes onto a stack and assigning relations, until a syntax tree is formed. Graph-based parsing instead searches for a maximum spanning tree in the graph, i.e., the globally optimal solution for the whole sentence's dependencies. Since the graph-based method searches globally and is more accurate, we adopted the classic model from "Deep Biaffine Attention for Neural Dependency Parsing"; its structure is shown in Figure 6 below:
The model first encodes the concatenated word and part-of-speech vectors with a BiLSTM, then passes them through two MLP heads to obtain the h(arc-head) and h(arc-dep) vectors, removing redundant information. The vectors at all time steps are stacked into H(arc-head) and H(arc-dep); a unit (bias) column is appended to H(arc-dep), and an affine transformation with the intermediate matrix U(arc) yields the dep-head score matrix S(arc), from which the head of each word is found.
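The biaffine arc-scoring step can be written out numerically. This sketch uses random vectors in place of the BiLSTM/MLP outputs and arbitrary dimensions; only the score computation S = H(arc-dep)' U H(arc-head)^T mirrors the model described above.

```python
import numpy as np

# Toy biaffine arc scorer: S[i, j] scores word j as the head of word i.
rng = np.random.default_rng(0)
n, d = 5, 8                       # 5 words, hidden size 8

H_dep = rng.normal(size=(n, d))   # stand-in for MLP(arc-dep) outputs
H_head = rng.normal(size=(n, d))  # stand-in for MLP(arc-head) outputs

# Append a column of ones to H(arc-dep) -- the "unit vector" in the
# text -- so U can also encode a head-only prior term.
H_dep1 = np.concatenate([H_dep, np.ones((n, 1))], axis=1)

U = rng.normal(size=(d + 1, d))   # intermediate matrix U(arc)
S = H_dep1 @ U @ H_head.T         # score matrix S(arc), shape (n, n)

heads = S.argmax(axis=1)          # predicted head index for each word
print(S.shape, heads.shape)       # (5, 5) (5,)
```

A real parser would additionally enforce tree constraints (e.g. via maximum spanning tree decoding) rather than taking a row-wise argmax.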
With the results of dependency analysis, we can better identify relationships and complex issues. The specific feature usage methods will be introduced below.
2.2 Relation recognition
Relation recognition is another core module in KBQA. The purpose is to identify the relationship (Predicate) that the user Query asks, so as to jointly determine the unique subgraph with the main entity (Subject) and get the answer (Object).
In practice, since the number of edge relations in the graph keeps growing, we model relation recognition as a text matching task: input the user Query, Query features, and a candidate relation, and output a matching score. To solve the multi-domain problem mentioned at the beginning, we add domain information to the input features, storing domain-related knowledge in the domain representation so the model can judge better. To improve the understanding of complex Queries, we also incorporate syntactic information into the input so the model can better understand constraints and multi-hop questions.
With the emergence of large-scale pre-trained language models, models such as BERT have achieved SOTA results on matching tasks. The methods commonly used in industry fall mainly into two categories:
- Representation-based: also known as the "two-tower model". Its main idea is to convert the two pieces of text into semantic vectors and then compute their similarity in vector space, focusing on the construction of the semantic representation layer.
- Interaction-based: this method focuses on learning the alignment between phrases in the two sentences, comparing the aligned pieces, and finally aggregating the aligned information into the prediction layer. Since an interaction-based model can use the alignment information between the two texts, its accuracy is higher, so we use an interaction-based model for the matching problem in this project.
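An interaction-based (cross-encoder) matcher packs both texts into one sequence so attention can align tokens across them. This sketch only builds that input string; the special-token slots for domain and syntax features (`[unused1]`, `[unused2]`) are assumptions for illustration, not the production format.

```python
# Build a cross-encoder input for Query vs. candidate relation matching.
# Extra feature slots (domain, syntax) are hypothetical markers.

def build_match_input(query, relation, domain=None, syntax=None):
    parts = ["[CLS]", query]
    if domain:
        parts += ["[unused1]", domain]   # domain feature slot (assumed)
    if syntax:
        parts += ["[unused2]", syntax]   # e.g. dependency-marked core words
    parts += ["[SEP]", relation, "[SEP]"]
    return " ".join(parts)

s = build_match_input(
    "Are there student tickets for the Forbidden City weekend?",
    relation="tickets",
    domain="Tourism",
)
print(s)
```

The resulting string would be tokenized and fed to BERT, whose `[CLS]` output is scored by a small classification head.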
In order to make full use of BERT's semantic modeling capabilities, while considering the online delay requirements of actual business, we have made the following three optimizations in reasoning acceleration, data enhancement, and knowledge enhancement:
- Layer pruning: each layer of BERT learns different knowledge; layers near the input learn more general syntactic knowledge, while layers near the output learn more task-related knowledge. Referring to DistilBERT, we prune layers at equal intervals (skip-style), keeping only the 3 layers that work best for the task. Compared with keeping the first three layers, F1-score improved by 4%; experiments also showed that different pruning strategies can differ in effect by up to 7%.
- Pre-fine-tuning on in-domain data: after pruning, with limited training data, the 3-layer model's quality dropped sharply. From our understanding of the business, we found that the data from Meituan's "Ask Everyone" module closely matches online data. We cleaned this data, took question titles and their related questions as positive examples, randomly sampled sentence pairs with literal similarity between 0.5 and 0.8 as negative examples, and generated a large number of weakly supervised text pairs. After pre-fine-tuning, the 3-layer model's accuracy improved by more than 4%, even exceeding the 12-layer model.
- Knowledge enhancement: because user expressions are diverse, accurately identifying intent requires deep semantics combined with grammatical information. To further improve quality and fix certain bad cases, we added domain and syntactic information to the input, incorporating explicit prior knowledge into BERT; under the attention mechanism, the syntactic dependency tree structure helps the model accurately capture word-to-word dependencies. Verified on business data and five large public datasets, accuracy improved by 1.5% on average over the BERT Base model.
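The skip-style pruning above can be contrasted with naively keeping the first k layers. This is a minimal sketch of the layer-index selection only (which indices actually work best is empirical, as the text notes); the 0-based indexing is an assumption.

```python
# Select which of n encoder layers to keep when pruning to k layers.

def skip_prune(n_layers, keep):
    """Keep `keep` layers spaced at equal intervals (skip-style)."""
    step = n_layers / keep
    return [round(i * step) for i in range(keep)]

def first_k(n_layers, keep):
    """Naive baseline: keep the first `keep` layers."""
    return list(range(keep))

print(skip_prune(12, 3))  # [0, 4, 8]
print(first_k(12, 3))     # [0, 1, 2]
```

In a framework like Hugging Face Transformers, the chosen indices would then be used to rebuild the model's encoder layer list before fine-tuning.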
After the above series of iterations, the speed and accuracy of the model have been greatly improved.
2.3 Understanding of complex issues
In real scenarios, most questions fall into the following four categories (green marks the answer node), as shown in Figure 8 below:
The number of hops in a question is determined by the number of entities involved. One-hop questions usually involve only a merchant's basic information, such as its address, phone number, business hours, or policies, and can be answered by a single SPO (triple) in the knowledge graph. Two-hop questions mainly ask about facilities and services within a merchant, such as which floor the hotel gym is on, what time breakfast starts, or the price of the airport shuttle; they require finding the merchant -> main entity (facility/service/product, etc.) path and then the basic-information triple of that entity, i.e., the two triples SPX and XPO. Constrained questions impose constraints on the main entity or answer node, usually time, crowd, or an attributive.
Here are some improvements we have made for different types of complex problems.
2.3.1 Problem with constraints
Through the mining of online logs, we divide the constraints into the following categories, as shown in Figure 9 below:
Answering a constrained question involves two key steps: constraint recognition and answer ranking.
Through the dependency analysis module in the KBQA system, we can identify the constraints the user imposes on entities or relations. However, constraints are numerous and their types differ across nodes, so when constructing the database query we first ensure recall: we try to recall all candidate nodes under the entity-relation path, and then score and rank the answers against the constraints in the final ranking module.
To improve efficiency, we first optimized the knowledge storage layer. For storing composite attribute values, Freebase proposed the Compound Value Type (CVT), as shown in Figure 10 below, to solve the storage and querying of this kind of composite structured data. For example, the business hours of Happy Valley differ across sessions; such a composite attribute value can be carried in CVT form.
However, CVT storage increases query complexity and consumes database storage. Take the "Happy Valley business hours CVT" in the figure as an example:
- This information is usually stored as paired CVTs, and one CVT involves 3 stored triples.
- For "What time does the Happy Valley summer night session start?", the query involves four hops: <entity -> business hours CVT>, <business hours CVT -> season = summer>, <business hours CVT -> session = night>, and <business hours CVT -> time>. Even for fast graph databases such as Nebula, a query of more than three hops generally takes tens of milliseconds, which is slow for actual online use.
- Once an attribute name or value is expressed differently from the agreed form, an extra round of synonym merging is needed to ensure the query matches and no recall is lost.
To solve these problems, we use a structured Key-Value form to carry attribute information. The Key holds the constraint information of the answer, such as crowd, time, and any other information that can constrain the attribute value, while the Value is the answer to be retrieved. For the example above, we form Keys from all possible constraint dimensions, as shown in Figure 11 below:
Then, to handle the large number of constraint values, during the actual query, when no exact match is found we match the Keys of the attribute values against the constraint information in the question and compute relevance; the Value of the most relevant Key is the answer. The Key can therefore be represented in several forms:
- String form: compute relevance to the constraint text with text-similarity methods.
- Text Embedding: embed the textual form of the Key and compute its similarity to the constraint information; with adequate training data this works better than the string form.
- Other embedding algorithms: such as Graph Embedding over virtual nodes, jointly training the constraint text with the corresponding virtual node, etc.
This form of storage is equivalent to storing only one triple, i.e., <entity -> business hours KV>, and the query process is compressed into one hop plus text-matching ranking. Text matching based on a semantic model can to some extent resolve mismatches caused by different textual expressions, and after optimizing the semantic model, matching time can be compressed to around ten milliseconds.
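The one-hop-plus-matching flow can be sketched with a toy KV store. Token overlap here is a stand-in for the semantic matcher described above, and the business-hours data is illustrative, not real.

```python
# Key-Value storage for a composite attribute: Keys encode constraint
# dimensions, Values are the answers. Data is illustrative.

kv_business_hours = {
    "summer day session": "9:00-18:00",
    "summer night session": "19:00-22:00",
    "winter day session": "9:30-17:00",
}

def match_key(constraint, kv):
    """Pick the Key most relevant to the constraint text.

    Token overlap stands in for the semantic similarity model.
    """
    def overlap(key):
        return len(set(constraint.split()) & set(key.split()))
    best = max(kv, key=overlap)
    return best, kv[best]

key, value = match_key("summer night", kv_business_hours)
print(key, "->", value)  # summer night session -> 19:00-22:00
```

Compared with the CVT layout, the four-hop graph traversal collapses into a single KV lookup followed by this ranking step.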
With the constraint handling optimized, the upstream modules first identify entities, relations, and constraints to form the constraint text, which is then matched against the Key candidates of the recalled subgraph to obtain the final answer.
2.3.2 Multi-hop problem
Multi-hop questions are a type of question naturally suited to KBQA. When users ask about facilities, services, products, and other entities within a merchant, we only need to find the merchant in the graph, then the entity under the merchant, and then its basic information. With an FAQ-style approach, a standard question would have to be configured for every such question, e.g. "location of the gym", "location of the swimming pool", and so on. In KBQA we can compress such questions well: no matter which entity's location is asked, the edge relation is always "location"; only the starting entity differs.
In the KBQA system, we first rely on the dependency analysis module to identify the dependency relationship between sentence components, and then use the relationship recognition module to determine the number of hops and the relationship in which the sentence is asked. The specific process is shown in Figure 12:
With the entity types from entity recognition, we can substitute placeholders for the important components of the sentence, compressing the number of candidate relation configurations and improving the accuracy of relation recognition. After fully understanding the sentence, the system queries the subgraph by main entity, relation, and hop count, and feeds it to the answer ranking module for finer-grained constraint recognition and scoring.
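The two-hop lookup (the SPX and XPO triples from Section 2.3) can be sketched directly over a triple list. The toy graph below is illustrative data, not from the real knowledge base.

```python
# Two-hop lookup: merchant -> facility -> attribute.

triples = [
    ("XX Hotel", "facility", "swimming pool"),
    ("swimming pool", "business hours", "7:00-21:00"),
    ("XX Hotel", "address", "1 Example Road"),
]

def one_hop(subject, predicate):
    """All objects o with (subject, predicate, o) in the graph."""
    return [o for s, p, o in triples if s == subject and p == predicate]

def two_hop(subject, p1, p2):
    """Follow p1 from subject, then p2 from each intermediate node."""
    return [o2 for o1 in one_hop(subject, p1) for o2 in one_hop(o1, p2)]

print(two_hop("XX Hotel", "facility", "business hours"))  # ['7:00-21:00']
```

In production this traversal runs in the graph database; the point is that only the starting entity and the hop relations vary across questions.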
2.4 Opinion Q&A
Beyond the basic information queries above, users also ask opinion questions, such as "Is parking at Disney convenient?" or "Is XX Hotel soundproof?". For subjective opinion questions, FAQ or reading comprehension techniques can surface relevant user reviews, but they usually return only one or a few reviews, which may be too subjective to summarize the crowd's opinion. We therefore proposed an opinion question answering scheme that gives the number of positive and negative supporters of an opinion and, for interpretability, also shows review evidence for most opinions. The actual display in the App is shown in Figure 13 below:
To mine user opinions automatically and in batches, we broke the problem into a two-step solution: opinion discovery and evidence mining, as shown in Figure 14 below.
The first step, opinion discovery, mines diverse opinions from user reviews. We use a sequence-labeling model to extract entities and opinion expressions from sentences, and a dependency analysis model to judge the entity-opinion relations.
The second step, after a certain number of opinions have been mined, digs deeper into the review evidence, as shown in Figure 15 below. Although the source of some opinions can be found during opinion discovery, many user reviews express opinions implicitly. For "whether pets are allowed", a user may not state it directly but write "the dogs had great fun here". This requires deep semantic understanding of review sentences in order to summarize opinions from them. Initially we used a classification model: input a user review, encode the sentence, and let a classification head per opinion judge its polarity. However, as the number of automatically mined opinions grew, to reduce the labeling cost of the classification task we converted it into a matching task: input an opinion tag (Tag) and a user review, and judge the degree to which the review supports the opinion. Finally, to optimize speed, we pruned the 12-layer Transformer; quality dropped only 0.8% while speed tripled, enabling large-batch offline opinion mining.
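The "N support / M oppose, with evidence" output can be sketched as an aggregation over per-review polarity judgments. The keyword-based polarity function below is a crude stand-in for the Tag-review matching model described above, and the reviews are invented examples.

```python
# Aggregate mined opinions into support/oppose counts with evidence.

def polarity(tag, comment):
    """Stand-in for the Transformer matcher: +1 support, -1 oppose,
    None if the review is not about the tag."""
    negations = ("not", "hard to", "can't")
    if tag not in comment:
        return None
    return -1 if any(n in comment for n in negations) else 1

def aggregate(tag, comments):
    pos = [c for c in comments if polarity(tag, c) == 1]
    neg = [c for c in comments if polarity(tag, c) == -1]
    return {"support": len(pos), "oppose": len(neg),
            "evidence": (pos + neg)[:3]}

comments = [
    "parking is easy, big lot at the entrance",
    "parking was hard to find on the weekend",
    "great food",
]
print(aggregate("parking", comments))
```

The real matcher also handles implicit mentions ("the dogs had great fun here" supporting "pets allowed"), which keyword rules cannot.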
2.5 Exploration of end-to-end solutions
Above, we designed different solutions for complex questions such as multi-hop and constrained ones. Although they solve the problems to a certain extent, they also increase system complexity. Building on the pre-training ideas of the relation recognition module, we explored more general, end-to-end solutions and published the paper "Large-Scale Relation Learning for Question Answering over Knowledge Bases with Pre-trained Language Models" at this year's EMNLP.
For KBQA, much academic research focuses on graph learning methods, hoping to represent subgraphs better through graph learning, but it ignores the semantics of the graph nodes themselves. Meanwhile, BERT-class pre-trained models are trained on unstructured text and have never seen the structured data of a graph. We hope to eliminate this inconsistency through task-related data, and thus propose three pre-training tasks, as shown in Figure 16 below:
- Relation Extraction: based on large-scale open-source relation extraction datasets, we generate a large number of one-hop ([CLS]s[SEP]h, r, t[SEP]) and two-hop ([CLS]s1, s2[SEP]h1, r1, t1 (h2), r2, t2[SEP]) text-pair training examples, letting the model learn the correspondence between natural language and structured text.
- Relation Matching: to let the model better capture relational semantics, we generate a large number of text pairs from the relation extraction data; texts expressing the same relation are positive examples for each other, otherwise negative.
- Relation Reasoning: to give the model some knowledge-reasoning ability, we assume (h, r, t) is missing from the graph and use other indirect relations to reason whether (h, r, t) holds. The input format is: [CLS]h, r, t[SEP]p1[SEP]...pn[SEP].
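The three input formats can be built as plain strings in BERT text-pair style. This sketch follows the separator conventions described above; the example sentence and triple are illustrative, not taken from the paper's datasets.

```python
# Build pre-training inputs for the three tasks described above.

def re_one_hop(sentence, h, r, t):
    """Relation Extraction: sentence paired with its (h, r, t) triple."""
    return f"[CLS]{sentence}[SEP]{h}, {r}, {t}[SEP]"

def rm_pair(sent_a, sent_b):
    """Relation Matching: positive if both express the same relation."""
    return f"[CLS]{sent_a}[SEP]{sent_b}[SEP]"

def rr_input(h, r, t, paths):
    """Relation Reasoning: judge (h, r, t) from indirect paths p1..pn."""
    return f"[CLS]{h}, {r}, {t}[SEP]" + "[SEP]".join(paths) + "[SEP]"

s = re_one_hop("The Forbidden City is in Beijing",
               "Forbidden City", "located in", "Beijing")
print(s)
```

Each string would then be tokenized and fed to BERT with the corresponding task head, teaching the model to bridge natural language and structured triples.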
After pre-training on these tasks, the BERT model's ability to reason over Queries and structured text improves significantly, and it performs better when the KB is incomplete, as shown in Figure 17 below:
3 Application practice
After more than a year of construction, the KBQA service has been connected to Meituan's travel, hotel, and in-store comprehensive (local services) businesses, helping merchants answer user questions promptly and improving user satisfaction and conversion rates.
3.1 Ask the hotel
Hotels are an essential need for traveling users, but some small and medium-sized merchants have not opened a human customer-service channel and cannot answer user questions promptly. To let users quickly find information on the detail page, the intelligent assistant automatically responds on behalf of hotel merchants who have not activated the customer-service function, increasing the order conversion rate. Users can ask about various kinds of information on the hotel and room pages, as shown in Figure 18 below:
3.2 Ticket promotion
Ticket promotion is dedicated to helping tourism merchants with their core ticket-selling business. During peak hours at scenic spots, buying tickets online is more convenient than queuing, but many users keep the habit of buying offline. Meituan improves ticket-purchase convenience for merchants and users through QR codes and simple interactions. Meanwhile, the "Smart Ticket Assistant" built into the ticket purchase page solves users' problems during purchase and helps them buy suitable tickets faster, as shown in Figure 19 below:
3.3 Merchant recommendation response
Beyond travel scenarios, users browsing other local-service merchants also have many questions, such as "Does the barber shop require an appointment?" or "What is the latest closing time?", which can be asked via the merchant's customer service. However, merchants have limited manpower and are inevitably overwhelmed at peak times. To reduce users' waiting time, our Q&A service provides merchants with a reply recommendation function, as shown in Figure 20 below. Here KBQA is mainly responsible for answering information questions about merchants and group-buy deals.
4 Summary and outlook
KBQA is both a popular research direction and a complex system, involving many algorithms such as entity recognition, syntactic analysis, and relation recognition. It must pursue overall accuracy while also controlling latency, posing big challenges for both algorithms and engineering. After more than a year of technical exploration, our team has not only landed multiple applications within Meituan, but also won first place on leaderboard A, second place on leaderboard B, and the Technology Innovation Award in the 2020 CCKS KBQA evaluation. We also released some Meituan data and co-hosted the 2021 CCKS KBQA evaluation with Peking University.
Returning to the technology itself: although our KBQA can already solve most head questions, long-tail and complex questions remain the bigger challenge, and many cutting-edge techniques are worth exploring. We hope to explore the following directions:
- Unsupervised domain transfer: KBQA covers multiple business scenarios such as Meituan hotel, travel, and in-store comprehensive, spanning more than a dozen sub-domains. We hope to improve the model's few-shot and zero-shot capabilities and reduce the labor cost of data labeling.
- Knowledge enhancement: in relation recognition, the model attending to irrelevant words instead of core words causes serious interference. We will study how to inject prior knowledge into the pre-trained language model to guide and revise the attention process and improve model performance.
- More types of complex questions: besides the constrained and multi-hop questions above, users also ask comparative and multi-relation questions. In the future we will optimize the graph construction and query understanding modules to solve users' long-tail questions.
- End-to-end KBQA: for both industry and academia, KBQA is a complex pipeline. How to use pre-trained models and their internal knowledge to simplify the overall pipeline, or even solve it end to end, is a direction we need to keep exploring.
Students who are interested in KBQA are also welcome to join our team and explore more possibilities of KBQA together! Resume delivery address: wangsirui@meituan.com.
About the Author
Ru Mei, Liang Di, Si Rui, Hong Zhi, Ming Yang, and Wu Wei are all from the Knowledge Graph Group of the NLP Center of the Search and NLP Department.
| This article is produced by the Meituan technical team, and the copyright belongs to Meituan. Welcome to reprint or use the content of this article for non-commercial purposes such as sharing and communication, please indicate "the content is reproduced from the Meituan technical team". This article may not be reproduced or used commercially without permission. For any commercial activity, please send an email to tech@meituan.com to apply for authorization.