ACL is the premier international conference in computational linguistics and natural language processing. According to the Google Scholar metrics for computational linguistics, ACL ranks first in impact, and it is a CCF-A recommended conference. The Meituan technical team had 7 papers (6 long papers and 1 short paper) accepted at ACL 2021. These papers present the team's cutting-edge exploration and applications in natural language processing tasks such as event extraction, entity recognition, intent recognition, new slot discovery, unsupervised sentence representation, semantic parsing, and document retrieval.
The Annual Meeting of the Association for Computational Linguistics (ACL 2021) will be held in Bangkok, Thailand from August 1 to 6, 2021, as a virtual online conference. The conference is organized by the Association for Computational Linguistics and is held once a year. The theme of ACL this year is "NLP for Social Good". According to official statistics, the conference received a total of 3350 valid submissions and accepted 710 main conference papers (acceptance rate 21.3%) and 493 Findings papers (acceptance rate 14.9%).
The Meituan technical team had 7 papers (6 long papers and 1 short paper) accepted at ACL 2021. These papers reflect Meituan's accumulated technical experience and applications in natural language processing tasks such as event extraction, entity recognition, intent recognition, new slot discovery, unsupervised sentence representation, semantic parsing, and document retrieval.
For event extraction, we explicitly exploit the semantic-level argument role information of surrounding entities and propose a bidirectional entity-level recurrent decoder (BERD) that generates the argument role sequence entity by entity. For entity recognition, we are the first to propose the concept of slot transferability, together with a method for computing it. By comparing the transferability between each target slot and the source task slots, we find suitable source slots for each target slot and build the slot filling model for that target slot using only the training data of those source slots. For intent recognition, we propose an intent feature learning method based on supervised contrastive learning, which improves the discrimination between intents by maximizing inter-class distance and minimizing intra-class variance. For new slot discovery, we define the Novel Slot Detection (NSD) task for the first time. Unlike traditional slot recognition, NSD tries to discover new slots in real dialogue data based on the existing in-domain slot annotations, so that the capabilities of the dialogue system can be continuously improved.
In addition, to address the "collapse" phenomenon of BERT's native sentence representations, we propose ConSERT, a sentence representation transfer method based on contrastive learning. It fine-tunes on an unsupervised corpus from the target domain so that the sentence representations produced by the model better fit the data distribution of downstream tasks. We also propose a new unsupervised semantic parsing method, Synchronous Semantic Decoding (SSD), which combines paraphrasing with grammar-constrained decoding to bridge the semantic gap and the structural gap at the same time. Finally, for document retrieval, we improve the document encoding to strengthen its semantic representation ability, which improves both retrieval quality and retrieval efficiency.
Next, we introduce these 7 papers in more detail, hoping they will be helpful or inspiring to those working on related research. You are welcome to leave a message in the comment area at the end of the article to discuss with us.
01 Capturing Event Argument Interaction via A Bi-Directional Entity-Level Recurrent Decoder
| Paper Download
| Paper authors: Xi Xiangyu, Ye Wei (Peking University), Zhang Tong (Peking University), Zhang Shikun (Peking University), Wang Quanxiu (RICHAI), Jiang Huixing, Wuwei
| Paper Type: Main Conference Long Paper (Oral)
Event extraction is an important and challenging task in information extraction. It aims to extract structured events from unstructured text and is widely used in automatic summarization, question answering, information retrieval, knowledge graph construction, and other areas. Event argument extraction extracts the descriptive information of a specific event (called argument information), such as event participants and event attributes, and is an important and very difficult subtask of event extraction. Most argument extraction methods model the task as argument role classification over entity-event pairs, and train and test separately on each entity in a sentence's entity set, ignoring the potential interactions among candidate arguments. The methods that do use argument interaction information do not make full use of the semantic-level argument role information of surrounding entities, and they also ignore the distribution pattern of multiple arguments within a specific event.
To address these problems in event argument detection, this paper proposes to explicitly exploit the semantic-level argument role information of surrounding entities. To this end, we first model argument detection as an entity-level decoding problem: given a sentence and a known event, the argument detection model generates a sequence of argument roles. Unlike traditional word-level Seq2Seq models, we propose a bidirectional entity-level recurrent decoder (BERD) that generates the argument role sequence entity by entity. Specifically, we design an entity-level recurrent decoding unit that uses both the current instance information and the surrounding argument information. We use a forward decoder and a backward decoder, which predict the current entity from left to right and from right to left respectively, each using the argument information on its left or right side during one-way decoding. Finally, after decoding in both directions, a classifier combines the features of the two directional decoders to make the final prediction, so that argument information on both sides is used at the same time.
We conducted experiments on the public ACE 2005 dataset and compared the method with a variety of existing models, including the latest argument interaction methods. The results show that it outperforms existing argument interaction methods, and that the improvement is more pronounced for events with a large number of entities.
02 Slot Transferability for Cross-domain Slot Filling
| Paper Download
| Paper authors: Lu Hengtong (Beijing University of Posts and Telecommunications), Han Zhuoxin (Beijing University of Posts and Telecommunications), Yuan Caixia (Beijing University of Posts and Telecommunications), Wang Xiaojie (Beijing University of Posts and Telecommunications), Lei Shuyu, Jiang Huixing, Wuwei
| Paper Type: Findings of ACL 2021, Long Paper
Slot filling aims to identify task-related slot information in user utterances and is a key component of task-oriented dialogue systems. When a task (or domain) has plenty of training data, existing slot filling models achieve good recognition performance. For a new task, however, there is often little or no slot-annotated data. How to use the annotated data of one or more existing tasks (source tasks) to train a slot filling model for a new task (the target task) is therefore of great significance for quickly extending task-oriented dialogue systems to new applications.
Existing research on this problem falls mainly into two categories. The first establishes an implicit semantic alignment between the slot representations of the source task and those of the target task, so that a model trained on source task data can be applied directly to the target task. These methods let word representations interact in some way with content that carries slot information, such as slot descriptions and slot value examples, to obtain slot-aware word representations, and then perform "BIO"-based slot tagging. The second adopts a two-stage strategy that treats all slot values as entities: it first trains a general entity recognition model on source task data to identify all candidate slot values in the target task, and then classifies each candidate into a target task slot by comparing its similarity with the representation of the target slot information.
Existing work mostly focuses on building cross-task transfer models that exploit the associations between source and target tasks, and the data of all source tasks is generally used to build the model. In practice, however, not all source task data is valuable for slot recognition in the target task, and the value of different source task data for a specific target task may vary greatly. For example, flight booking and train ticket booking are highly similar, so slot filling training data from the former helps the latter. Flight booking and weather query, by contrast, are quite different: the training data of the former offers little or no reference value for the latter and may even act as interference.
Furthermore, even when the source and target tasks are very similar, the training data of a given source slot does not help every slot of the target task. For example, the training data of the departure time slot in flight booking may help fill the departure time slot in train ticket booking, but it does not help the train type slot and may instead act as interference. We therefore want to find, for each slot in the target task, one or more source task slots that can provide effective transfer information, and to build the cross-task transfer model from the training data of those slots only, so that source task data is used more effectively.
To this end, we first propose the concept of slot transferability and a method for computing it. Based on the computed transferability, we then propose a method for selecting, for each target slot, the source task slots that provide effective transfer information. By comparing the transferability between the target slot and each source task slot, we choose the source slots for each target slot and build its slot filling model using only the training data of those source slots. Specifically, the transferability between two slots combines the similarity of their slot value representation distributions with the similarity of their slot value context representation distributions. The source task slots are then ranked by their transferability to the target slot. We first train a slot filling model on the training data of the most transferable slot and measure its performance on the validation set of the target slot. We then add source task slots one at a time in ranking order, retraining on the enlarged training data and recording the validation performance each time. Finally, the source task slot at the best-performing point, together with all source task slots with higher transferability than it, is selected as the set of source slots, and the target slot filling model is built from them.
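The selection procedure described above can be sketched as a simple greedy loop. This is a minimal illustration, not the authors' implementation: `select_source_slots`, the `transferability` dictionary, and the `evaluate` callback (which stands in for training a slot filling model and scoring it on the target slot's validation set) are all hypothetical names.

```python
from typing import Callable, Dict, List, Sequence


def select_source_slots(
    transferability: Dict[str, float],
    evaluate: Callable[[Sequence[str]], float],
) -> List[str]:
    """Pick source slots for one target slot.

    transferability: hypothetical map from source slot name to its
        transferability score with respect to the target slot.
    evaluate: hypothetical callback that trains a slot filling model on the
        data of the given source slots and returns its score on the target
        slot's validation set.
    """
    # Rank source slots from most to least transferable.
    ranked = sorted(transferability, key=transferability.get, reverse=True)

    best_score = float("-inf")
    best_size = 0
    # Grow the training pool one slot at a time, following the ranking.
    for size in range(1, len(ranked) + 1):
        score = evaluate(ranked[:size])
        if score > best_score:
            best_score, best_size = score, size

    # Keep the slot at the best-performing point and every slot ranked above it.
    return ranked[:best_size]


if __name__ == "__main__":
    scores = {"departure_time": 0.9, "city": 0.6, "weather": 0.1}
    # Toy validation scores standing in for real training runs.
    toy_f1 = {1: 0.70, 2: 0.78, 3: 0.64}
    print(select_source_slots(scores, lambda slots: toy_f1[len(slots)]))
```

In this toy run, adding the third, least transferable slot lowers the validation score, so only the first two slots are kept.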
A slot filling model recognizes slot values from the slot value itself and from its context. When computing the transferability between slots, we therefore first measure the similarity of the slot value representation distributions and of the context representation distributions. Borrowing the way the F-measure fuses precision and recall, we then fuse the two similarities. Finally, we use Tanh to normalize the fused value to the range 0 to 1 and subtract it from 1, so that a larger value intuitively means higher transferability. The transferability between slots is computed with the following formula:
$sim(p_v(s_a),p_v(s_b))$ and $sim(p_c(s_a),p_c(s_b))$ denote the similarity between slot $a$ and slot $b$ in terms of their slot value representation distributions and context representation distributions, respectively. We use Maximum Mean Discrepancy (MMD) to measure the similarity between distributions.
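As a rough illustration of the computation described above, the sketch below measures MMD between two sets of representation vectors, fuses the value-level and context-level results in the style of the F-measure, and maps the result through Tanh before subtracting from 1. The exact formula is not reproduced in this text, so the fusion form (with a hypothetical weight `beta`), the RBF kernel, and its `gamma` are assumptions, as are all function names.

```python
import numpy as np


def rbf_kernel(x: np.ndarray, y: np.ndarray, gamma: float) -> np.ndarray:
    """Gaussian kernel matrix between the rows of x and the rows of y."""
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def mmd(x: np.ndarray, y: np.ndarray, gamma: float = 1.0) -> float:
    """Biased estimate of squared MMD; 0 when both samples are identical."""
    value = (
        rbf_kernel(x, x, gamma).mean()
        + rbf_kernel(y, y, gamma).mean()
        - 2.0 * rbf_kernel(x, y, gamma).mean()
    )
    return float(max(value, 0.0))


def transferability(values_a, values_b, contexts_a, contexts_b, beta=1.0):
    """Assumed reading: 1 - tanh(F-style fusion of the two MMD values)."""
    d_value = mmd(values_a, values_b)
    d_context = mmd(contexts_a, contexts_b)
    denom = beta**2 * d_value + d_context
    # F-measure-style fusion; defined as 0 when both discrepancies are 0.
    fused = 0.0 if denom == 0.0 else (1 + beta**2) * d_value * d_context / denom
    # tanh maps the non-negative fused value into [0, 1); subtracting from 1
    # makes larger results mean higher transferability.
    return 1.0 - float(np.tanh(fused))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    slot_a = rng.normal(size=(20, 4))
    slot_b = slot_a + 3.0  # a clearly shifted distribution
    print(transferability(slot_a, slot_a, slot_a, slot_a))
    print(transferability(slot_a, slot_b, slot_a, slot_b))
```

Because MMD is a discrepancy (smaller means more similar), the final subtraction from 1 is what turns it into a score where identical distributions get the highest value.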
We do not propose a new model; instead, our source slot selection method can be combined with any existing model. Experiments with multiple existing models and datasets show that our method brings consistent performance improvements to slot filling models on the target task (the ALL column reports the original performance of the existing model, and the STM1 column reports the performance of the model trained on the data selected by our method).
03 Modeling Discriminative Representations for Out-of-Domain Detection with Supervised Contrastive Learning
| Paper Download
| Paper authors: Zeng Zhiyuan (Beijing University of Posts and Telecommunications), He Keqing, Yan Yuanmeng (Beijing University of Posts and Telecommunications), Liu Zijun (Beijing University of Posts and Telecommunications), Wu Yanan (Beijing University of Posts and Telecommunications), Xu Hong (Beijing University of Posts and Telecommunications), Jiang Huixing, Xu Weiran (Beijing University of Posts and Telecommunications)
| Paper Type: Main Conference Short Paper (Poster)
In real task-oriented dialogue systems, out-of-domain (OOD) intent detection is a key component: it identifies abnormal queries entered by users and gives a rejection response. Compared with traditional intent recognition, OOD intent detection faces a sparse semantic space and a lack of labeled data. Existing OOD intent detection methods fall into two categories. Supervised OOD intent detection assumes that labeled OOD intent data is available during training. It detects well, but it relies on a large amount of labeled OOD data, which is not feasible in practice. Unsupervised OOD intent detection uses only in-domain intent data to identify out-of-domain samples. Since it cannot use prior knowledge from labeled OOD samples, it is more challenging. This paper therefore focuses on unsupervised OOD intent detection.
A core problem in unsupervised OOD intent detection is how to learn discriminative semantic representations from in-domain intent data. We want the representations of samples from the same intent class to be close to each other and those of samples from different intent classes to be far apart. Based on this, we propose an intent feature learning method based on supervised contrastive learning, which makes the features more discriminative by maximizing inter-class distance and minimizing intra-class variance.
Specifically, we use a BiLSTM/BERT context encoder to obtain in-domain intent representations, and apply two objective functions to them: the traditional classification cross-entropy loss and a supervised contrastive learning (SCL) loss. Supervised contrastive learning builds on contrastive learning and addresses the limitation that the original formulation has only one positive sample per anchor: it treats samples of the same class as positives and samples of different classes as negatives, and maximizes the correlation between positives. In addition, to increase the diversity of sample representations, we use an adversarial-attack-based data augmentation method, which adds noise in the hidden space to achieve an effect similar to traditional augmentations such as character replacement, insertion, deletion, and back-translation. The model structure is as follows:
We evaluate the model on two public datasets. The experimental results show that the proposed method effectively improves the performance of unsupervised OOD intent detection, as shown in the following table.
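To make the supervised contrastive objective in this section more concrete, here is a minimal NumPy sketch of a SupCon-style loss over one batch of intent representations. It is an illustration under assumptions rather than the paper's training code: the function name, the `temperature` value, and the use of cosine similarity are choices made for this example, and the cross-entropy term and adversarial augmentation are left out.

```python
import numpy as np


def supervised_contrastive_loss(features, labels, temperature=0.1):
    """SupCon-style loss: same-label samples are positives, others negatives.

    The batch must contain at least one pair of samples sharing a label.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)

    # Cosine similarities between all pairs, scaled by the temperature.
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    logits = z @ z.T / temperature

    n = len(labels)
    not_self = ~np.eye(n, dtype=bool)

    # Log-softmax over every other sample in the batch (self excluded).
    logits = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(logits) * not_self
    log_prob = logits - np.log(exp.sum(axis=1, keepdims=True))

    # Positives of an anchor: other samples with the same intent label.
    positives = (labels[:, None] == labels[None, :]) & not_self
    pos_counts = positives.sum(axis=1)
    has_pos = pos_counts > 0

    # Average negative log-probability of the positives, per anchor.
    per_anchor = -(log_prob * positives).sum(axis=1)[has_pos] / pos_counts[has_pos]
    return float(per_anchor.mean())


if __name__ == "__main__":
    feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
    print(supervised_contrastive_loss(feats, np.array([0, 0, 1, 1])))
    print(supervised_contrastive_loss(feats, np.array([0, 1, 0, 1])))
```

The loss is small when samples with the same label already sit close together and large when same-label samples are far apart, which is the pressure that pulls classes into compact, well-separated clusters.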
04 Novel Slot Detection: A Benchmark for Discovering Unknown Slot Types in the Task-Oriented Dialogue System
| Paper Download
| Paper authors: Wu Yanan (Beijing University of Posts and Telecommunications), Zeng Zhiyuan (Beijing University of Posts and Telecommunications), He Keqing, Xu Hong (Beijing University of Posts and Telecommunications), Yan Yuanmeng (Beijing University of Posts and Telecommunications), Jiang Huixing, Xu Weiran (Beijing University of Posts and Telecommunications)
| Paper Type: Main Conference Long Paper (Oral)
Slot filling is an important module in dialogue systems, responsible for identifying the key information in user input. Existing slot filling models can only recognize predefined slot types, but practical applications contain a large number of out-of-domain entity types. These unrecognized entity types are critical to optimizing the dialogue system.
In this paper, we define the Novel Slot Detection (NSD) task for the first time. Unlike traditional slot recognition, NSD tries to discover new slots in real dialogue data based on the existing in-domain slot annotations, so that the capabilities of the dialogue system can be continuously improved, as shown in the following figure:
Compared with the existing OOV recognition task and the out-of-domain intent detection task, NSD differs significantly. On the one hand, OOV recognition targets new slot values that did not appear in the training set, but the entity types of those slot values are fixed. NSD must handle the OOV problem and, more seriously, lacks prior knowledge of the unknown entity types, so it can rely only on in-domain slot information to reason about out-of-domain entities. On the other hand, out-of-domain intent detection only needs to identify sentence-level intent information, whereas NSD must cope with the contextual influence between in-domain and out-of-domain entities, as well as the interference of non-entity words with new slots. Overall, NSD is very different from traditional slot filling, OOV recognition, and out-of-domain intent detection, and it faces more challenges. It offers a direction worth considering and studying for the future development of dialogue systems.
Based on the public slot filling datasets ATIS and Snips, we constructed two NSD datasets, ATIS-NSD and Snips-NSD. Specifically, we randomly select some of the slot types in the training set as out-of-domain types and keep the remaining types as in-domain types. When a sentence contains both out-of-domain and in-domain types, we delete the entire sample. This avoids the bias introduced by the O tag and ensures that out-of-domain entity information appears only in the test set, which is closer to real scenarios. We also propose a series of baseline models for the NSD task. The overall framework is shown in the figure below. The model consists of two stages:
- Training stage: Based on the in-domain slot annotations, we train a BERT-based sequence labeling model (multi-class or binary) to obtain entity representations.
- Test stage: We first use the trained sequence labeling model to predict in-domain entity types. At the same time, based on the obtained entity representations, we use the MSP or GDA algorithm to predict whether a word belongs to a novel slot, that is, an out-of-domain type. Finally, the two outputs are combined to produce the final output.
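The test stage with MSP (maximum softmax probability) can be sketched roughly as follows: a token whose highest softmax probability falls below a threshold is marked as a novel slot, and otherwise it keeps the in-domain prediction. This is a minimal sketch under assumptions; the function names, the `"NS"` label, and the threshold value are hypothetical, and GDA is not shown.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def tag_with_msp(logits, label_names, threshold=0.5, novel_label="NS"):
    """Combine in-domain predictions with MSP-based novel slot detection.

    logits: per-token scores from a sequence labeling model, shape
        (num_tokens, num_labels). The model itself is not part of this sketch.
    """
    probs = softmax(np.asarray(logits, dtype=float))
    best = probs.argmax(axis=-1)
    confidence = probs.max(axis=-1)
    # Low confidence on every known label is treated as evidence of a novel slot.
    return [
        novel_label if conf < threshold else label_names[idx]
        for idx, conf in zip(best, confidence)
    ]


if __name__ == "__main__":
    labels = ["O", "B-city", "I-city"]
    token_logits = [[5.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.3, 0.2, 0.1]]
    print(tag_with_msp(token_logits, labels))
```

In practice the threshold would be tuned on held-out data, since it directly trades off missed novel slots against false alarms on in-domain tokens.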
We use entity recognition F1 as the evaluation metric, including Span-F1 and Token-F1, which differ in whether entity boundaries are considered. The experimental results are as follows:
Through extensive experiments and analysis, we explore the challenges of novel slot detection: 1. confusion between non-entity words and new entities; 2. insufficient context information; 3. dependencies between slots; 4. open vocabulary slots.
05 ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer
| Paper Download
| Paper authors: Yan Yuanmeng, Li Rumei, Wang Sirui, Zhang Fuzheng, Wuwei, Xu Weiran (Beijing University of Posts and Telecommunications)
| Paper Type: Main Conference Long Paper (Poster)
Sentence representation learning occupies an important position in natural language processing (NLP), and the success of many NLP tasks depends on high-quality sentence representation vectors. In tasks such as Semantic Textual Similarity and Dense Text Retrieval in particular, the model measures the semantic relevance of two sentences by computing the similarity of their embeddings in the representation space, and this determines their matching score. Although BERT-based models perform well on many NLP tasks (through supervised fine-tuning), the sentence vectors derived directly from BERT (by averaging all word vectors, without fine-tuning) are of low quality. They are not even comparable to GloVe's results, so they hardly reflect the semantic similarity of two sentences.
To address this "collapse" phenomenon of BERT's native sentence representations, this paper proposes ConSERT, a sentence representation transfer method based on contrastive learning. By fine-tuning on an unsupervised corpus from the target domain, the sentence representations produced by the model better fit the data distribution of downstream tasks. The paper also proposes four data augmentation methods for NLP tasks: adversarial attack, token shuffling, cutoff, and dropout. Experimental results on the semantic textual similarity (STS) task show that, under the same settings, ConSERT achieves a relative improvement of 8% over the previous SOTA (BERT-Flow), and it still shows strong performance gains in few-shot scenarios.
In the unsupervised experiments, we fine-tune a pre-trained BERT directly on unlabeled STS data. The results show that, under exactly the same settings, our method substantially outperforms the previous SOTA, BERT-Flow, achieving a relative performance improvement of 8%.
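The sketch below illustrates, under simplifying assumptions, three of the augmentations named above applied to a token embedding matrix, plus an NT-Xent-style contrastive loss that pulls two views of the same sentence together. It is not the ConSERT implementation: the function names, rates, and temperature are hypothetical, the BERT encoder that would sit between the augmentations and the loss is omitted, and the adversarial attack is skipped because it needs gradients.

```python
import numpy as np


def token_shuffle(emb: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Randomly reorder the token rows of a (num_tokens, dim) matrix."""
    return emb[rng.permutation(emb.shape[0])]


def token_cutoff(emb: np.ndarray, rng: np.random.Generator, rate: float = 0.15) -> np.ndarray:
    """Zero out whole token rows with probability `rate`."""
    keep = rng.random(emb.shape[0]) >= rate
    return emb * keep[:, None]


def dropout(emb: np.ndarray, rng: np.random.Generator, rate: float = 0.1) -> np.ndarray:
    """Zero out individual elements and rescale the rest."""
    keep = rng.random(emb.shape) >= rate
    return emb * keep / (1.0 - rate)


def nt_xent(view_a: np.ndarray, view_b: np.ndarray, temperature: float = 0.1) -> float:
    """Contrastive loss over two views; row i of each view is the same sentence."""
    z = np.concatenate([view_a, view_b])
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)  # a sample is never its own candidate

    n = len(view_a)
    # The positive for row i is the other view of the same sentence.
    targets = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])

    sim = sim - sim.max(axis=1, keepdims=True)
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(2 * n), targets].mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tokens = rng.normal(size=(6, 8))
    print(token_shuffle(tokens, rng).shape, token_cutoff(tokens, rng).shape)
    sentences = rng.normal(size=(4, 8))
    noisy = sentences + 0.01 * rng.normal(size=sentences.shape)
    print(nt_xent(sentences, noisy), nt_xent(sentences, sentences[::-1]))
```

The loss is low when the two views of each sentence are closer to each other than to other sentences in the batch, which is the property the fine-tuning step optimizes for.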
06 From Paraphrasing to Semantic Parsing: Unsupervised Semantic Parsing via Synchronous Semantic Decoding
| Paper Download
| Paper authors: Wu Shan (Institute of Software, Chinese Academy of Sciences), Chen Bo (Institute of Software, Chinese Academy of Sciences), Xin Chunlei (Institute of Software, Chinese Academy of Sciences), Han Xianpei (Institute of Software, Chinese Academy of Sciences), Sun Le (Institute of Software, Chinese Academy of Sciences), Zhang Weipeng, Chen Jiansheng, Yang Fan, Cai Xunliang
| Paper Type: Main Conference Long Paper
Semantic parsing is one of the core tasks in natural language processing. Its goal is to convert natural language into a computer language so that computers can truly understand natural language. A major challenge for semantic parsing is the lack of annotated data: neural methods mostly rely on supervised data, and annotating data for semantic parsing is very time-consuming and laborious. Learning semantic parsing models without supervision has therefore become an important and challenging problem. The challenge is that, without annotated data, semantic parsing must bridge both the semantic gap and the structural gap between natural language and semantic representations. Previous methods generally used paraphrasing as a reranking or rewriting step to reduce the semantic gap. In contrast, we propose a new unsupervised semantic parsing method, Synchronous Semantic Decoding (SSD), which combines paraphrasing with grammar-constrained decoding to address the semantic gap and the structural gap at the same time.
The core idea of synchronous semantic decoding is to turn semantic parsing into a paraphrasing problem: we paraphrase a sentence into a canonical utterance and parse out its semantic representation at the same time. Canonical utterances and logical forms are in one-to-one correspondence. To ensure that valid canonical utterances and semantic representations are generated, both are decoded under the constraints of a synchronous grammar.
We use the paraphrase model to decode under the constraints of the synchronous grammar, and use the text generation model to score canonical utterances in order to find the one with the highest score (as noted above, the search space is also restricted by the grammar). The paper presents two algorithms: Rule-Level Inference, which uses grammar rules as the search unit, and Word-Level Inference, which uses words as the search unit.
We use GPT-2 and T5 to train sequence-to-sequence paraphrase models on paraphrase datasets; after that, the synchronous semantic decoding algorithm alone is enough to complete the semantic parsing task. To reduce the influence of style bias on the generation of canonical utterances, we propose adaptive pre-training and sentence reranking methods.
We conducted experiments on three datasets: Overnight (λ-DCS), GEO (FunQL), and GEOGranno. The data covers different domains and semantic representations. The experimental results show that our model achieves the best results on each dataset without using supervised semantic parsing data.
07 Improving Document Representations by Generating Pseudo Query Embeddings for Dense Retrieval
| Paper Download
| Paper authors: Tang Hongyin, Sun Xingwu, Jin Beihong (Institute of Software, Chinese Academy of Sciences), Wang Jingang, Zhang Fuzheng, Wuwei
| Paper Type: Main Conference Long Paper (Oral)
The goal of document retrieval is to retrieve, from a massive text collection, the texts that are semantically similar to a given query. In real applications the document collection is very large, so to improve efficiency, retrieval is generally divided into two stages: initial retrieval and fine ranking. In the initial retrieval stage, the model uses efficient retrieval methods to select a set of candidate documents, which serve as input to the subsequent fine ranking stage. In the fine ranking stage, the model uses a high-precision ranking method to rank the candidate documents and produce the final retrieval result.
With the development and application of pre-trained models, much work began to feed the query and the document into a pre-trained model together for encoding and to output a matching score. However, because pre-trained models are computationally expensive, running one computation for every query-document pair takes a long time, so this approach is usually used only in the fine ranking stage. To speed up retrieval, some work uses pre-trained models to encode documents and queries separately, encoding the documents in the collection into vectors before any query arrives. At query time, only the query needs to be encoded and compared with the document encodings by similarity, which reduces the time cost. Because this approach encodes documents and queries as dense vectors, this type of retrieval is also called "Dense Retrieval".
A basic dense retrieval method encodes each document and each query into a single vector. However, because a document contains much more information, this easily causes information loss. To address this, some work has begun to improve the vector representations of queries and documents. Existing improvements can be roughly divided into three types, as shown in the following figure:
Our work improves the encoding of documents in order to strengthen its semantic representation ability. We believe the main bottleneck of dense retrieval is that, at encoding time, the document encoder does not know which parts of the document may be queried. During encoding, different pieces of information are likely to interfere with one another, so that information is altered or lost. Therefore, when encoding documents, we construct multiple "Pseudo Query Embeddings" for each document, where each pseudo query embedding corresponds to a piece of information the document may be asked about.
Specifically, we use a clustering algorithm to cluster the token vectors encoded by BERT and keep the top-k cluster vectors for each document. These vectors capture the salient semantics of multiple token vectors in the document. Because we keep multiple pseudo query embeddings per document, similarity computation may become less efficient, so we use an Argmax operation instead of Softmax to make it more efficient. Experiments on multiple large-scale document retrieval datasets show that our method improves both retrieval quality and retrieval efficiency.
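A minimal sketch of the idea in this section, assuming plain k-means as the clustering algorithm and a dot product as the similarity: token vectors of one document are clustered into k centroids that act as pseudo query embeddings, and a query is scored against the document by its best-matching centroid (the Argmax-style aggregation). The function names, `k`, and the iteration count are hypothetical, and random vectors stand in for BERT token outputs.

```python
import numpy as np


def pseudo_query_embeddings(token_vecs: np.ndarray, k: int, iters: int = 10, seed: int = 0) -> np.ndarray:
    """Cluster a document's token vectors; the k centroids act as pseudo queries."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct token vectors.
    centroids = token_vecs[rng.choice(len(token_vecs), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign every token to its nearest centroid (squared Euclidean distance).
        dists = ((token_vecs[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        assign = dists.argmin(axis=1)
        # Move each non-empty cluster's centroid to the mean of its members.
        for j in range(k):
            members = token_vecs[assign == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return centroids


def score(query_vec: np.ndarray, pseudo_queries: np.ndarray) -> float:
    """Score a document by its best-matching pseudo query embedding."""
    return float((pseudo_queries @ query_vec).max())


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    doc_tokens = rng.normal(size=(30, 5))  # stand-in for encoded token vectors
    doc_repr = pseudo_query_embeddings(doc_tokens, k=4)
    query = rng.normal(size=5)
    print(doc_repr.shape, score(query, doc_repr))
```

Taking the maximum means each document is represented by k independent vectors at search time, so every centroid can be indexed as an ordinary dense vector and looked up with standard nearest-neighbor search.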
Closing Remarks
The above papers are research work done by the Meituan technical team in cooperation with universities and research institutions in event extraction, entity recognition, intent recognition, new slot discovery, unsupervised sentence representation, semantic parsing, and document retrieval. The papers reflect specific problems we encountered and solved in real work scenarios. We hope they are helpful or inspiring to everyone.
Meituan's research cooperation program is committed to building a bridge and platform for cooperation between Meituan's departments and universities, research institutions, and think tanks. Relying on Meituan's rich business scenarios, data resources, and real industrial problems, and in a spirit of open innovation, it works with partners in fields such as artificial intelligence, big data, the Internet of Things, autonomous driving, operations research and optimization, the digital economy, and public affairs to jointly explore cutting-edge technologies and macro issues of industry focus, to promote industry-university-research cooperation, exchange, and the transfer of research results, and to foster outstanding talent. Looking ahead, we hope to cooperate with teachers and students from more universities and research institutes. You are welcome to contact us (meituan.oi@meituan.com).
| This article is produced by the Meituan technical team, and the copyright belongs to Meituan. You are welcome to reprint or use the content of this article for non-commercial purposes such as sharing and communication; please indicate "the content is reproduced from the Meituan technical team". This article may not be reproduced or used commercially without permission. For any commercial use, please send an email to tech@meituan.com to apply for authorization.