The common-sense concept graph is a knowledge graph built around common-sense concepts, that is, the entities and the relationships between them, with a focus on Meituan's business scenarios. This article introduces the schema of Meituan's common-sense concept graph, the challenges encountered during its construction, the algorithm practices used in the construction process, and finally some current applications of the graph in business.
1. Introduction
In natural language processing, we often think about how to make machines understand natural language well. When we humans read a piece of text, we usually connect it with related information already stored in our brains, and only then do we understand it. Take the sentence "He doesn't like eating apples, but he likes ice cream." When people read it, they recall what they know: apples are sweet and a bit crispy; ice cream is even sweeter, soft and cold, and relieves the summer heat; children prefer sweets and ice cream. Combining this knowledge, a reader can infer several reasons why "he" prefers ice cream. However, much natural language understanding work still stays at the level of surface information: current models behave more like Bayesian probability estimators, searching known training text for the most likely match to the given conditions.
Understanding text the way humans do is the ultimate goal of natural language processing. Therefore, more and more research introduces additional knowledge to help machines understand natural language text. Plain text only expresses external objective facts, while knowledge is the induction and summarization of those facts on top of the text. Adding supplementary knowledge to natural language processing therefore makes natural language understanding better.
Establishing a knowledge system is a direct way to help machines understand natural language more accurately. The knowledge graph was proposed around this idea, in the hope that by giving machines explicit knowledge, they can reason and understand like humans. In 2012, Google formally proposed the concept of the Knowledge Graph, originally to optimize the results returned by its search engine and improve users' search quality and experience.
2. Introduction to the common-sense concept graph
The common-sense concept graph establishes relationships between concepts to help machines understand natural language text. At the same time, our common-sense concept graph focuses on Meituan scenarios, helping to improve the effectiveness of search, recommendation, and feed streams at Meituan.
Based on the needs of understanding, three dimensions of understanding ability are required:
- What it is : establish a relationship system that explains what the core concept is. For example, in "washing machine maintenance", what is "maintenance" and what is "washing machine".
- What kind : the attributes of some aspect of the core concept, i.e., a refinement of the core concept. In "restaurant with terrace", "parent-child amusement park", and "fruit mille crêpe cake", the modifiers "with terrace", "parent-child", and "fruit" each describe one aspect of the core concept, so the correspondence between core concepts, attributes, and attribute values needs to be established.
- How to fulfill : bridge the gap between search concepts and supply concepts. For example, "reading", "shopping", and "taking the baby out" have no clearly corresponding supply concepts, so an association network between search concepts and supply concepts is established to solve this kind of problem.
In summary, the graph covers the taxonomy structure for "what it is", the concept attribute relationships for "what kind", and the concept fulfillment relationships for "how to fulfill". At the same time, POIs (Point of Interest), SPUs (Standard Product Unit), and deal listings are instances in the Meituan scenario and need to be linked to concepts in the graph.
Starting from this construction goal, the overall work of building the common-sense concept graph is broken down into three types of nodes and four types of relationships, described below.
2.1 Three types of nodes in the graph
Taxonomy node : understanding a concept in the graph requires a reasonable knowledge system, and a predefined Taxonomy system serves as the basis for understanding. The predefined nodes fall into two categories: the first can appear as core categories in the Meituan scenario, such as food, item, and place; the second appears as qualifiers of core categories, such as color, method, and style. Both types of nodes help with understanding in search, recommendation, and other scenarios. The currently predefined Taxonomy nodes are shown in the following figure:
Atomic concept node : the smallest semantic unit in the graph, i.e., the smallest-granularity word with independent semantics, such as internet celebrity, dog cafe, face, and hydration. Every defined atomic concept needs to be linked to a defined Taxonomy node.
Compound concept node : a concept node composed of an atomic concept and corresponding attributes, such as facial hydration and facial moisturizing. A compound concept needs to be linked to its core-word concept through a hypernym-hyponym relationship.
2.2 Four types of relationships
Synonym/hypernym relationship : semantic synonymy or hypernym-hyponym (is-a) relationships, such as facial hydration-syn-facial moisturizing. The predefined Taxonomy system also forms hypernym-hyponym relationships, so it is merged into this relationship type.
Concept attribute relationship : the typical CPV (Concept-Property-Value) relationship, which describes and defines concepts along various attribute dimensions, such as hot pot-taste-not spicy and hot pot-specification-single person. Examples are as follows:
There are two types of concept attribute relationships.
Predefined concept attributes: Currently we predefine typical concept attributes as follows:
Open concept attributes: in addition to the predefined public attributes, we also mine some concept-specific attribute words from text as a supplement, such as posture, theme, comfort, and word of mouth.
Concept fulfillment relationship : the relationship between a demand concept and the supply concepts that can fulfill it, such as spring outing-place-botanical garden and stress relief-activity-boxing.
The concept fulfillment relationship takes the "event" as its core and defines the types of supply concepts that can meet user needs, such as "place", "item", "crowd", "time", and "efficacy". Take the event "whitening" as an example: as a user need, "whitening" can be fulfilled by different supply concepts, such as beauty salons and water-light injections. The currently defined types of fulfillment relations are shown in the figure below:
POI/SPU-concept relationship : POIs are instances in the Meituan scenario, and the instance-concept relationship is the last mile of the knowledge graph, where its business value can be realized most directly. In search, recommendation, and other business scenarios, the ultimate goal is to display POIs that meet user needs, so establishing the POI/SPU-concept relationship is an important part of the common-sense concept graph for the Meituan scenario and one of its most valuable data assets.
3. Common-sense concept graph construction
The overall framework of graph construction is shown in the figure below:
3.1 Concept mining
All relationships in the common-sense concept graph are built around concepts, so concept mining is the first step of construction. Corresponding mining methods are adopted for the two concept types, atomic and compound.
3.1.1 Atomic concept mining
Candidates for atomic concepts come from the smallest fragments obtained by segmenting text such as queries, UGC (User Generated Content), and deal listings. A candidate is judged to be an atomic concept if it satisfies three criteria: popularity, meaningfulness, and completeness.
- Popularity : a concept should be a word with high popularity in a given corpus, measured mainly by frequency features. For example, the search volume of the term "tablebook kill" is very low and its frequency in the UGC corpus is also very low, so it does not meet the popularity requirement.
- Meaningfulness : a concept should be a meaningful word, measured mainly by semantic features. For example, "Amao" and "Agou" usually just express a generic name without any other actual meaning.
- Completeness : a concept should be a complete word, measured mainly by the independent-search ratio (the search volume of the word as a standalone query divided by the total search volume of queries containing the word). For example, "children's set" is a wrong segmentation candidate: it has a high frequency in UGC, but its independent-search ratio is low.
Based on the above characteristics, an XGBoost classification model is trained, with training data from manual annotation and automatic rule-based construction, to judge whether an atomic concept candidate is reasonable.
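As a rough illustration of this setup, the sketch below trains a binary classifier on hand-crafted candidate features; the feature names, the toy data, and the acceptance threshold are assumptions for illustration, not the production configuration.

```python
import numpy as np
import xgboost as xgb

# Each candidate is described by features reflecting the three criteria:
# [query_frequency, ugc_frequency, semantic_score, independent_search_ratio]
X_train = np.array([
    [12000, 8500, 0.92, 0.61],   # e.g. "dog cafe": frequent, meaningful, complete
    [30,    15,   0.35, 0.02],   # e.g. a bad segmentation fragment
])
y_train = np.array([1, 0])       # labels from annotators or rule-based construction

model = xgb.XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
model.fit(X_train, y_train)

def is_atomic_concept(features, threshold=0.5):
    """Accept a candidate as an atomic concept if its predicted probability is high."""
    prob = model.predict_proba(np.array([features]))[0, 1]
    return prob >= threshold
```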
3.1.2 Compound Concept Mining
Compound concept candidates come from combinations of atomic concepts. Because combination is involved, judging compound concepts is more complicated than judging atomic concepts: a compound concept must be semantically complete and at the same time have a certain level of recognition within the Meituan platform. Given the nature of the problem, a Wide&Deep model structure is adopted: the Deep side is responsible for semantic judgment, and the Wide side introduces in-platform information.
The model structure has the following two characteristics, which allow a more accurate judgment of whether a compound concept is reasonable (a minimal sketch follows the list):
- Wide&Deep model structure : combines discrete features with a deep model to judge whether the compound concept is reasonable.
- Graph Embedding features : introduce collocation information between phrases, for example, "food" can be combined with "crowd", "cooking method", "quality", and so on.
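The sketch below illustrates this structure under simplifying assumptions: the Wide side is a linear layer over discrete in-platform statistics, and the Deep side mean-pools token embeddings and concatenates a graph embedding of the phrase collocation. Dimensions and feature choices are placeholders, not the production model.

```python
import torch
import torch.nn as nn

class WideAndDeep(nn.Module):
    def __init__(self, n_wide_features, vocab_size, emb_dim=128):
        super().__init__()
        # Wide side: linear model over discrete in-platform statistics
        self.wide = nn.Linear(n_wide_features, 1)
        # Deep side: semantic judgment over token embeddings + graph embedding
        self.token_emb = nn.Embedding(vocab_size, emb_dim)
        self.deep = nn.Sequential(
            nn.Linear(emb_dim * 2, 256), nn.ReLU(),
            nn.Linear(256, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, wide_x, token_ids, graph_emb):
        # Mean-pooled token embeddings stand in for a full text encoder
        text_vec = self.token_emb(token_ids).mean(dim=1)
        deep_in = torch.cat([text_vec, graph_emb], dim=-1)
        logit = self.wide(wide_x) + self.deep(deep_in)
        return torch.sigmoid(logit)   # probability that the compound concept is valid

model = WideAndDeep(n_wide_features=8, vocab_size=30000)
prob = model(torch.randn(2, 8),                    # discrete/statistical features
             torch.randint(0, 30000, (2, 6)),      # token ids of the candidate phrase
             torch.randn(2, 128))                  # graph embedding of the collocation
```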
3.2 Concept hypernym-hyponym relationship mining
After acquiring concepts, we still need to understand "what" a concept is. On the one hand, this is done through the manually defined Taxonomy knowledge system; on the other hand, through hypernym-hyponym relationships between concepts themselves.
3.2.1 Concept-Taxonomy hypernym relationship
The concept-Taxonomy relationship explains what a concept is through a manually defined knowledge system. Since Taxonomy types are manually defined, the problem can be transformed into a classification problem. At the same time, a concept may have multiple types in the Taxonomy system; for example, "lime fish" is both an "animal" and a "foodstuff". The problem is therefore finally handled as an Entity Typing task: the concept and its corresponding context are used as model input, and all Taxonomy categories are placed in the same space for judgment. The specific model structure is shown in the following figure:
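A minimal sketch of such an Entity Typing setup is shown below: the concept and its context are encoded together, and a sigmoid head scores every Taxonomy type independently so that one concept can receive several types. The model name, type list, and input format are assumptions for illustration.

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

TAXONOMY_TYPES = ["food", "item", "place", "animal", "color", "method"]  # illustrative subset

class ConceptTyper(nn.Module):
    def __init__(self, model_name="bert-base-chinese", n_types=len(TAXONOMY_TYPES)):
        super().__init__()
        self.encoder = BertModel.from_pretrained(model_name)
        self.classifier = nn.Linear(self.encoder.config.hidden_size, n_types)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        return torch.sigmoid(self.classifier(out.pooler_output))  # one probability per type

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
# Concept and context are packed into one input; the multi-label output allows
# e.g. both "animal" and "food" to fire for the same concept.
enc = tokenizer("lime fish", "the lime fish here is very fresh", return_tensors="pt")
probs = ConceptTyper()(enc["input_ids"], enc["attention_mask"])
```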
3.2.2 Concept-concept hypernym relationship
The Taxonomy system explains what a concept is through manually defined types, but manually defined types are always limited; if a hypernym is not among the predefined types, that relationship cannot be captured. For example, the concept-Taxonomy relationship can tell us that "Western musical instruments", "musical instruments", and "erhu" are all "items", but it cannot capture the hypernym-hyponym relationships between "Western musical instruments" and "musical instruments", or between "erhu" and "musical instruments". To address this, the following two methods are currently used to mine hypernym-hyponym relationships between concepts:
Lexical-rule-based method : mainly solves hypernym-hyponym relationships between atomic concepts and compound concepts, mining candidate pairs from lexical inclusion relationships (such as Western musical instruments-musical instruments).
Context-based method : lexical rules can handle pairs with a lexical inclusion relationship, but for pairs without one, such as "erhu-musical instrument", the relationship must first be discovered: candidate pairs like "erhu-musical instrument" are extracted and then classified to decide whether they form a reasonable hypernym-hyponym pair. When people explain an object, they usually mention its type; for example, an explanation of "erhu" might say "the erhu is a traditional musical instrument". From such explanatory text, a candidate pair like "erhu-musical instrument" can be extracted, and at the same time the text provides evidence for judging whether the pair is reasonable. Hypernym-hyponym relationship mining is therefore divided into two parts: candidate relation description extraction and hypernym-hyponym relation classification:
- Candidate relation description extraction : two concepts belonging to the same Taxonomy type is a necessary condition for a hypernym-hyponym pair; for example, "erhu" and "musical instrument" both belong to "items" in the Taxonomy system, according to the concept-Taxonomy results. For each concept whose hypernyms are to be mined, candidate concepts with a consistent Taxonomy type are collected to form candidate pairs, and sentences in which a candidate pair co-occurs are then selected as candidate relation description sentences for classification.
- Hypernym-hyponym relation classification : given a candidate relation description sentence, the model judges from the context whether the hypernym-hyponym relation is reasonable. Special marker tokens are inserted at the start and end positions of the two concepts in the text, and the vectors at the two start markers (taken from the BERT output) are concatenated as the representation of the relation between the two concepts; this representation is then classified. The detailed model structure is shown in the following figure:
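The sketch below shows this classification step under assumptions about the marker tokens and model name: markers are placed around the two concepts, and the encoder outputs at the two start markers are concatenated and classified.

```python
import torch
import torch.nn as nn
from transformers import BertModel

class HypernymPairClassifier(nn.Module):
    def __init__(self, model_name="bert-base-chinese"):
        super().__init__()
        self.encoder = BertModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        self.head = nn.Linear(hidden * 2, 2)   # reasonable / unreasonable pair

    def forward(self, input_ids, attention_mask, start1_idx, start2_idx):
        h = self.encoder(input_ids=input_ids,
                         attention_mask=attention_mask).last_hidden_state
        batch = torch.arange(h.size(0))
        # Concatenate the vectors at the two concepts' start-marker positions
        pair_repr = torch.cat([h[batch, start1_idx], h[batch, start2_idx]], dim=-1)
        return self.head(pair_repr)

# Example input (markers are assumed and would be added to the tokenizer vocabulary):
# "[E1] erhu [/E1] is a traditional [E2] musical instrument [/E2]"
# start1_idx / start2_idx are the token positions of [E1] and [E2].
```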
For training data construction, sentences that explicitly express a hypernym-hyponym relation are very sparse; a large number of co-occurrence sentences do not clearly indicate whether the candidate pair has such a relation, so building training data by distant supervision from existing hypernym-hyponym relations is not feasible. We therefore train the model directly on a manually labeled training set. Since the number of manual annotations is limited, on the order of thousands, Google's semi-supervised learning algorithm UDA (Unsupervised Data Augmentation) is combined to improve the model, and the final precision reaches 90%+. Detailed metrics are shown in Table 1:
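The following is a minimal sketch of the UDA idea as applied here (model, data, and augmentation are placeholders): a supervised cross-entropy term on the small labeled set is combined with a consistency term that pushes predictions on an unlabeled sentence and on its augmented version (for example via back-translation) to agree.

```python
import torch
import torch.nn.functional as F

def uda_loss(model, labeled_x, labels, unlabeled_x, augmented_x, lam=1.0):
    sup = F.cross_entropy(model(labeled_x), labels)
    with torch.no_grad():
        target = F.softmax(model(unlabeled_x), dim=-1)      # prediction on the original
    pred = F.log_softmax(model(augmented_x), dim=-1)        # prediction on the augmentation
    unsup = F.kl_div(pred, target, reduction="batchmean")   # consistency term
    return sup + lam * unsup

model = torch.nn.Linear(16, 2)                              # stand-in for the relation classifier
loss = uda_loss(model,
                torch.randn(4, 16), torch.randint(0, 2, (4,)),   # small labeled batch
                torch.randn(8, 16), torch.randn(8, 16))          # unlabeled + augmented batch
loss.backward()
```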
3.3 Concept attribute relationship mining
The attributes of a concept can be divided into public attributes and open attributes according to whether they are universal. Public attributes are manually defined attributes that most concepts have, such as price, style, and quality. Open attributes are attributes found only in certain specific concepts; for example, "hair transplant", "eyelash beauty", and "script kill" have the open attributes "density", "curl", and "logic", respectively. There are far more open attributes than public attributes. The two kinds of attribute relationships are mined in the following two ways.
3.3.1 Mining public attribute relationships based on compound concepts
Due to the universality of public attributes, the Value and Concept of a public attribute relationship (CPV) usually appear together as a compound concept, such as cheap shopping mall, Japanese cuisine, and HD red movie. We transform this relationship mining task into dependency analysis plus fine-grained NER (refer to "Exploration and Practice of NER Technology in Search"): dependency analysis identifies the core entity and modifiers in the compound concept, and fine-grained NER determines the specific attribute values. For example, given the compound concept "HD red movie", dependency analysis identifies "movie" as the core concept, with "red" and "HD" as its modifiers, and fine-grained NER predicts the attribute values "style (red)" and "quality (HD)".
Dependency analysis and fine-grained NER carry information useful to each other. For "graduation doll", for instance, the entity types "time (graduation)" and "product (doll)", together with the dependency information that "doll" is the core word, can reinforce each other during training, so the two tasks are learned jointly. However, because the degree of association between the two tasks is unclear and there is considerable noise, Meta-LSTM is used to move from Feature-Level to Function-Level joint learning, changing hard parameter sharing into dynamic sharing and reducing the impact of noise between the two tasks.
The overall architecture of the model is as follows:
At present, the overall accuracy of concept modification relations is around 85%.
3.3.2 Mining specific attribute relationships based on open attribute words
Open Attribute Words and Attribute Value Mining
Open attribute relationships require mining the attributes and attribute values unique to different concepts, and the difficulty lies in identifying the open attributes and their values. By observing the data, we found that some common attribute values (for example: good, bad, high, low, more, less) usually appear right next to attributes (for example: good environment, high temperature, high foot traffic). We therefore adopt a template-based Bootstrapping method to automatically mine attributes and attribute values from user reviews. The mining process is as follows:
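Below is a toy illustration of one Bootstrapping pass on made-up data: generic attribute values serve as seeds, the words they attach to are harvested as candidate attributes, and the new attributes can then be used to mine new values in the next round.

```python
from collections import Counter

seed_values = {"good", "high", "low"}                 # generic attribute values as seeds
review_tokens = [
    ["environment", "good", "and", "temperature", "high"],
    ["water", "temperature", "low", "but", "service", "good"],
]

candidate_attrs = Counter()
for tokens in review_tokens:
    for i in range(1, len(tokens)):
        if tokens[i] in seed_values:                  # template: <attribute> <value>
            candidate_attrs[tokens[i - 1]] += 1

# Frequent candidates become new attributes; in the next iteration they are
# used in turn to harvest new attribute values (the Bootstrapping loop).
new_attributes = {w for w, c in candidate_attrs.items() if c >= 1}
print(new_attributes)       # e.g. {'environment', 'temperature', 'service'}
```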
After mining the open attribute words and attribute values, open attribute relation mining is split into mining "concept-attribute" pairs and mining "concept-attribute-attribute value" triples.
Concept-attribute mining
The mining of "concept-attribute" pairs determines whether a concept has a given attribute. The mining steps are as follows (a simplified sketch appears after the list):
- Based on the co-occurrence of concepts and attributes in UGC, a TF-IDF variant is used to mine each concept's typical attributes as candidates.
- Each candidate concept-attribute pair is turned into a simple natural-language sentence, a language model judges the fluency of the sentence, and pairs with high fluency are retained.
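A simplified sketch of these two steps is shown below; the co-occurrence counts are invented, and the fluency filter is only indicated in a comment since it would require a separate language model.

```python
import math

# cooccur[concept][attribute] = number of UGC sentences mentioning both
cooccur = {
    "swimming pool": {"water temperature": 120, "decoration": 15},
    "gym":           {"water temperature": 3,   "decoration": 40},
}

def tfidf_like(concept, attribute):
    tf = cooccur[concept][attribute] / sum(cooccur[concept].values())
    df = sum(1 for c in cooccur if attribute in cooccur[c])
    idf = math.log(len(cooccur) / df) + 1.0   # +1 keeps attributes shared by all concepts alive
    return tf * idf

candidates = sorted(
    ((c, a, tfidf_like(c, a)) for c in cooccur for a in cooccur[c]),
    key=lambda x: -x[2],
)
# Step 2 would wrap each retained (concept, attribute) pair into a simple sentence
# such as "the water temperature of the swimming pool" and keep only the pairs
# whose sentence is judged fluent by a language model.
```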
Concept-attribute-attribute value mining
After obtaining the "concept-attribute" pairs, the corresponding attribute values are mined as follows (a sketch of the last step appears after the list):
- Seed mining : mine seed triples from UGC based on co-occurrence features and language models.
- Template mining : use the seed triples to extract suitable templates from UGC (for example, "Whether the water temperature is appropriate is an important criterion for choosing a swimming pool.").
- Relation generation : fill the templates with seed triples and train a masked language model for relation generation.
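The relation-generation step can be pictured with an off-the-shelf fill-mask pipeline, as sketched below; the model name and the English template are illustrative stand-ins for the in-house masked language model and the mined Chinese templates.

```python
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# A mined template filled with the concept ("swimming pool") and the attribute
# ("water temperature"); the masked slot is the attribute value to generate.
template = "the water temperature of the swimming pool is [MASK] ."
for pred in fill_mask(template, top_k=5):
    print(pred["token_str"], round(pred["score"], 3))   # candidate value and its probability
```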
At present, the accuracy of concept attribute relationships in the open field is around 80%.
3.4 Concept fulfillment relationship mining
The concept fulfillment relationship links the concepts users search for with the concepts Meituan can supply. For example, when a user searches for "outing", the real intent is to find a "place suitable for an outing", so the platform responds with concepts such as "country park" and "botanical garden". This relation had to be mined from scratch, so different mining algorithms were designed for the different stages of the work, which can be divided into three phases: ① initial seed mining; ② mid-term mining with a deep discriminative model; ③ later-stage relation completion. The details are as follows.
3.4.1 Mining seed data based on co-occurrence features
To solve the cold-start problem in relation extraction, the industry usually adopts Bootstrapping, which automatically expands data from a corpus through a small number of manually chosen seeds and templates. However, Bootstrapping is not only limited by template quality but also has natural drawbacks in the Meituan scenario: the main corpus is user reviews, whose expression is highly colloquial and diverse, so it is difficult to design universally effective templates. We therefore abandoned the template-based approach and instead built a ternary (triplet) contrastive learning network based on co-occurrence and category features between entities, automatically mining potential relational information between entities from unstructured text.
Specifically, we observed that the distribution of entities in user reviews differs considerably across merchant categories. For example, UGC under the food category often mentions "gathering", "ordering dishes", and "restaurant"; UGC under the fitness category often mentions "weight loss", "personal trainer", and "gym"; while generic entities such as "decoration" and "lobby" appear under every category. We therefore built a triplet contrastive learning network that pulls user reviews of the same category closer together and pushes reviews of different categories apart. Similar to pre-trained word vector systems such as Word2Vec, the word embedding layer obtained through this contrastive strategy naturally contains rich relational information. At prediction time, for any search concept, a batch of high-quality seed data can be obtained by computing its semantic similarity to all fulfillment concepts, supplemented by statistical features from the search business.
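A minimal sketch of such a triplet setup is given below; the bag-of-words encoder and the token ids are placeholders, the point being only that two reviews from the same category form the anchor-positive pair while a review from another category is the negative.

```python
import torch
import torch.nn as nn

class ReviewEncoder(nn.Module):
    def __init__(self, vocab_size=30000, emb_dim=128):
        super().__init__()
        self.emb = nn.EmbeddingBag(vocab_size, emb_dim)   # simple bag-of-words encoder

    def forward(self, token_ids, offsets):
        return self.emb(token_ids, offsets)

encoder = ReviewEncoder()
triplet_loss = nn.TripletMarginLoss(margin=1.0)

# anchor/positive: two reviews under the "food" category; negative: a "fitness" review
anchor   = encoder(torch.tensor([1, 5, 9]),  torch.tensor([0]))
positive = encoder(torch.tensor([2, 5, 11]), torch.tensor([0]))
negative = encoder(torch.tensor([7, 8, 20]), torch.tensor([0]))

triplet_loss(anchor, positive, negative).backward()
# After training, the word embedding layer is reused: similarity between a
# search concept and candidate fulfillment concepts yields the seed data.
```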
3.4.2 Training a deep model based on seed data
Pre-trained language models have made great progress in NLP in the past two years, and fine-tuning downstream tasks on top of large-scale pre-trained models has become standard practice. Therefore, in the mid-term of relation mining we use a BERT-based relation discriminative model (refer to the "BERT Exploration and Practice" article), leveraging the large amount of linguistic knowledge learned during BERT pre-training to help the relation extraction task.
The model structure is shown in the figure below. First, candidate entity pairs are obtained from co-occurrence features, and user reviews containing each candidate pair are recalled. Then, following the entity-marking method of the MTB paper, special symbols are inserted at the start and end positions of the two entities; after BERT encoding, the vectors of the two start symbols are concatenated as the relation representation. Finally, this representation is fed into a Softmax layer to decide whether a relation holds between the entities.
3.4.3 Relation completion based on the existing graph structure
Through the above two stages, a graph of concept fulfillment relations has been built from unstructured text. However, due to the limitations of the semantic model, many triples are still missing from the current graph. To further enrich the concept graph and complete the missing relations, we apply the TransE algorithm and graph neural network techniques from knowledge graph link prediction to complete the existing graph.
To make full use of the structural information of the known graph, we use a relational graph attention network (RGAT, Relational Graph Attention Network) to model the graph structure. RGAT uses a relational attention mechanism to overcome the inability of traditional GCN and GAT to model edge types, and is better suited to heterogeneous networks such as the concept graph. After obtaining dense entity embeddings with RGAT, we use TransE as the loss function. TransE treats r in a triple (h, r, t) as a translation vector from h to t, and enforces h + r ≈ t. This method is widely used in knowledge graph completion and shows strong robustness and scalability.
The details are shown in the figure below. In RGAT, the features of each node at each layer are obtained by a weighted combination of the mean of its neighbor-node features and the mean of its adjacent-edge features, where the relational attention mechanism assigns different weights to different nodes and edges. After obtaining the node and edge features of the last layer, we train with TransE as the objective, minimizing ||h + r - t|| for every triple (h, r, t) in the training set. At prediction time, for each head entity and relation, every node in the graph is used as a candidate tail entity, the distance is computed, and the closest candidates are taken as the final tail entities.
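The completion objective can be pictured with a TransE-only sketch (the RGAT encoder is left out here; plain embedding tables stand in for the learned node and edge features):

```python
import torch
import torch.nn as nn

n_entities, n_relations, dim = 1000, 10, 100
ent = nn.Embedding(n_entities, dim)
rel = nn.Embedding(n_relations, dim)

def score(h, r, t):
    # TransE: a valid triple should satisfy h + r ≈ t, i.e. a small distance
    return torch.norm(ent(h) + rel(r) - ent(t), p=2, dim=-1)

margin_loss = nn.MarginRankingLoss(margin=1.0)
h, r, t = torch.tensor([3]), torch.tensor([1]), torch.tensor([42])
t_neg = torch.tensor([777])                       # corrupted (negative) tail entity
loss = margin_loss(score(h, r, t_neg), score(h, r, t), torch.ones(1))
loss.backward()

# Prediction: for a given head entity and relation, score every node in the graph
# as a candidate tail and keep the closest ones as completed triples.
```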
At present, the overall accuracy of the fulfillment relationship is about 90%.
3.5 POI/SPU-concept relationship construction
Associating graph concepts with Meituan instances uses multiple dimensions of information, such as POI/SPU names, categories, and user reviews. The difficulty is how to extract information relevant to a graph concept from this diverse data. We therefore use synonyms to recall all clauses semantically related to the concept, and then use a discriminative model to judge how relevant the concept is to each clause. The specific process is as follows (a toy version appears after the list):
- Synonym clustering : for the concept to be tagged, obtain its various expressions from the synonym data in the graph.
- Candidate clause generation : based on the synonym cluster, recall candidate clauses from multiple sources such as merchant names, deal names, and user reviews.
- Discriminative model : use the concept-text relevance model (shown in the figure below) to judge whether the concept matches each clause.
- Tagging result : adjust the threshold to obtain the final judgment.
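A toy version of this pipeline is sketched below; the synonym table, the clause splitting, and the relevance function (which stands in for the concept-text discriminant model) are all illustrative.

```python
synonyms = {"facial hydration": ["facial hydration", "facial moisturizing"]}

def recall_clauses(concept, texts):
    """Recall clauses that mention the concept or any of its synonyms."""
    terms = synonyms.get(concept, [concept])
    clauses = [c.strip() for text in texts for c in text.split(",")]
    return [c for c in clauses if any(t in c for t in terms)]

def relevance(concept, clause):
    # Placeholder for the concept-text discriminant model (a BERT cross-encoder
    # in the real system); here: does any synonym appear verbatim in the clause?
    return 1.0 if any(t in clause for t in synonyms.get(concept, [concept])) else 0.0

poi_texts = ["great facial moisturizing treatment, friendly staff"]
clauses = recall_clauses("facial hydration", poi_texts)
tagged = any(relevance("facial hydration", c) >= 0.5 for c in clauses)   # threshold step
```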
4. Application Practice
4.1 Category-word graph construction for the in-store comprehensive business
Meituan's in-store comprehensive business covers a wide range of knowledge domains, including parent-child, education, medical aesthetics, and leisure and entertainment, each containing many smaller sub-domains. Building knowledge graphs for these domains can therefore help with search, recall, filtering, and recommendation.
The common-sense concept graph contains not only common-sense concept data but also Meituan scenario data, as well as accumulated basic algorithm capabilities. These capabilities can be reused to help build the category-word graph.
With the help of the common-sense graph, missing category-word data is supplemented and a reasonable category-word graph is constructed, which improves search recall through query rewriting, POI tagging, and other methods. In the education domain, the graph has grown from 1,000+ nodes initially to 2,000+, and synonyms have grown from roughly one thousand to 20,000+, with good results.
The construction process of the category-word graph is shown in the following figure:
4.2 Review search suggestions (SUG)
Search suggestion (SUG) recommendation in review search helps shorten the time users need to complete a search while guiding their cognition, improving search efficiency. SUG recommendation therefore focuses on two goals: ① help enrich users' cognition, extending it from POI and category search toward natural-text search; ② refine user search needs: when a user searches for a general category term, SUG helps narrow the need down.
The common-sense concept graph contains a wealth of concepts together with their attributes and attribute values, so corresponding refined queries can be generated from a relatively general query. For example, for "cake", the taste attribute yields strawberry cake and cheesecake, and the specification attribute yields 6-inch cake, pocket cake, and so on.
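As a toy illustration, refined suggestions can be generated directly from a concept's attribute values in the graph (the triples below are made up for the example):

```python
cpv = {
    "cake": {
        "taste": ["strawberry", "cheese"],
        "specification": ["6-inch", "pocket"],
    },
}

def refine_query(query):
    """Expand a general query into refined suggestions via its attribute values."""
    return [f"{value} {query}"
            for values in cpv.get(query, {}).values()
            for value in values]

print(refine_query("cake"))
# ['strawberry cake', 'cheese cake', '6-inch cake', 'pocket cake']
```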
An example of the generated search-guidance queries is shown below:
4.3 Medical aesthetics content tagging in the in-store comprehensive business
When browsing medical aesthetics content, users are usually interested in one specific service, so the product provides different service labels to help users filter content precisely and meet their needs. However, when labels were associated with medical aesthetics content, there were many association errors, and after filtering users often saw content that did not match their needs. Improving tagging accuracy helps users focus on what they actually want.
With the graph's concept-POI tagging capability and its concept-UGC tagging relationships, the accuracy of label-content association is improved. Tagging through the graph has significantly improved both accuracy and recall.
- Accuracy : with the concept-content tagging algorithm, accuracy increased from 51% (keyword matching) to 91%.
- Recall : with concept synonym mining, recall increased from 77% to 91%.
5. Summary and Prospects
We have given a detailed introduction to the construction of the common-sense concept graph and its use in Meituan scenarios. The graph contains three types of nodes and four types of relationships defined according to business needs, and we introduced the concept mining algorithms and the mining algorithms for each relationship type.
At present, our common-sense concept graph has 2 million+ concepts and 3 million+ relationships between concepts, covering hypernym-hyponym, synonym, attribute, and fulfillment relationships (POI-concept relationships not included). The overall accuracy of the relationships is around 90%, and the algorithms are still being optimized to improve accuracy while expanding coverage. We will continue to improve the common-sense concept graph, aiming for both precision and completeness.
References
- [1] Onoe Y, Durrett G. Interpretable entity representations through large-scale typing[J]. arXiv preprint arXiv:2005.00147, 2020.
- [2] Bosselut A, Rashkin H, Sap M, et al. Comet: Commonsense transformers for automatic knowledge graph construction[J]. arXiv preprint arXiv:1906.05317, 2019.
- [3] Soares L B, FitzGerald N, Ling J, et al. Matching the blanks: Distributional similarity for relation learning[J]. arXiv preprint arXiv:1906.03158, 2019.
- [4] Peng H, Gao T, Han X, et al. Learning from context or names? an empirical study on neural relation extraction[J]. arXiv preprint arXiv:2010.01923, 2020.
- [5] Jiang, Zhengbao, et al. "How can we know what language models know?." Transactions of the Association for Computational Linguistics 8 (2020): 423-438.
- [6] Li X L, Liang P. Prefix-Tuning: Optimizing Continuous Prompts for Generation[J]. arXiv preprint arXiv:2101.00190, 2021.
- [7] Malaviya, Chaitanya, et al. "Commonsense knowledge base completion with structural and semantic context." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 34. No. 03. 2020.
- [8] Li Hanyu, Qian Li, Zhou Pengfei. "Sentiment analysis and mining for product review text." Information Science 35.1 (2017): 51-55.
- [9] Yan Bo, Zhang Ye, Su Hongyi, et al. A clustering method of product attributes based on user reviews.
- [10] Wang, Chengyu, Xiaofeng He, and Aoying Zhou. "Open relation extraction for chinese noun phrases." IEEE Transactions on Knowledge and Data Engineering (2019).
- [11] Li, Feng-Lin, et al. "AliMeKG: Domain Knowledge Graph Construction and Application in E-commerce." Proceedings of the 29th ACM International Conference on Information & Knowledge Management. 2020.
- [12] Yang, Yaosheng, et al. "Distantly supervised ner with partial annotation learning and reinforcement learning." Proceedings of the 27th International Conference on Computational Linguistics. 2018.
- [13] Luo X, Liu L, Yang Y, et al. AliCoCo: Alibaba e-commerce cognitive concept net[C]//Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. 2020: 313-327.
- [14] Devlin J, Chang M W, Lee K, et al. Bert: Pre-training of deep bidirectional transformers for language understanding[J]. arXiv preprint arXiv:1810.04805, 2018.
- [15] Cheng H T, Koc L, Harmsen J, et al. Wide & deep learning for recommender systems[C]//Proceedings of the 1st workshop on deep learning for recommender systems. 2016: 7-10.
- [16] Liu J, Shang J, Wang C, et al. Mining quality phrases from massive text corpora[C]//Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. 2015: 1729-1744.
- [17] Shen J, Wu Z, Lei D, et al. Hiexpan: Task-guided taxonomy construction by hierarchical tree expansion[C]//Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2018: 2180-2189.
- [18] Huang J, Xie Y, Meng Y, et al. Corel: Seed-guided topical taxonomy construction by concept learning and relation transferring[C]//Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2020: 1928-1936.
- [19] Liu B, Guo W, Niu D, et al. A user-centered concept mining system for query and document understanding at tencent[C]//Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2019: 1831-1841.
- [20] Choi E, Levy O, Choi Y, et al. Ultra-fine entity typing[J]. arXiv preprint arXiv:1807.04905, 2018.
- [21] Xie Q, Dai Z, Hovy E, et al. Unsupervised data augmentation for consistency training[J]. arXiv preprint arXiv:1904.12848, 2019.
- [22] Mao X, Wang W, Xu H, et al. Relational Reflection Entity Alignment[C]//Proceedings of the 29th ACM International Conference on Information & Knowledge Management. 2020: 1095-1104.
- [23] Chen J, Qiu X, Liu P, et al. Meta multi-task learning for sequence modeling[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2018, 32(1).
About the Authors
Zong Yu, Junjie, Hui Min, Fu Bao, Xu Jun, Xie Rui, Wu Wei, et al., all from the NLP Center of the Meituan Search and NLP Department.
Job Offers
The Meituan Search and NLP Department / NLP Center is the core team responsible for R&D of Meituan's artificial intelligence technologies. Its mission is to build world-class core technologies and service capabilities for natural language processing. Relying on NLP (Natural Language Processing), Deep Learning, Knowledge Graph, and other technologies, it processes Meituan's massive text data and provides intelligent text semantic understanding services for Meituan's various businesses.
The NLP Center is recruiting natural language processing algorithm experts and machine learning algorithm experts on an ongoing basis. Interested candidates can send their resumes to wangzongyu02@meituan.com.
This article is produced by the Meituan technical team, and the copyright belongs to Meituan. You are welcome to reprint or use the content of this article for non-commercial purposes such as sharing and communication; please indicate "the content is reproduced from the Meituan technical team". This article may not be reproduced or used commercially without permission. For any commercial use, please email tech@meituan.com to apply for authorization.