This article was first published on the Nebula Graph Community official account.

Zhongke Brain's Knowledge Graph Platform: Construction and Business Practice

"To support the varied needs of complex urban scenarios, the Zhongke Brain knowledge graph team designed and developed an integrated platform that covers visual ontology design, data mapping, data extraction, data writing, and graph data exploration. This article shares their business background, technology selection, platform construction, and more."

01 Background introduction

As a city-level digital asset operator, Zhongke Brain needs to store many types of data efficiently and, at the same time, faces the problem of how to make full use of that data. Traditional NoSQL and SQL databases cannot fully meet these storage and usage requirements. A knowledge graph built on a graph database can solve these problems to a certain extent. The Knowledge Graph Component (KBU) is the core component of the Brainbank City Brain product.

Figure: Brainbank architecture

Generally speaking, Zhongke Brain's needs for knowledge graphs fall into the following three areas:

  1. Government affairs knowledge graph: incorporates policies and regulations, documents, procedures, organizational structures, and similar information into a knowledge graph. So far, graphs for handling matters in public security household administration and e-government have been built, with separate graphs for different functional departments, which improves service efficiency and quality.
  2. Asset and equipment management graph: builds a knowledge graph of the city's many public facilities, real estate, and Internet of Things devices, so that management and operations and maintenance are linked.
  3. Event logic graph (affair graph): builds a graph of major urban events, emergencies, and clustered complaint events, including each event's time, location, subject, and popularity; discovers the correlations and evolution patterns between events; and provides decision support.

In practice, the knowledge graphs of different sectors are not completely isolated. They are integrated according to application requirements so as to make full use of the relational links of graphs, connect the elements of the city ontology, enable linkage between them, and support relational storage and mining of data.

02 Graph database selection

In scenarios where data is highly structured and consistent, a traditional relational database is generally the choice; in scenarios where data has a large number of potential relationships, graph storage and the knowledge graph technology built on it are a reasonable choice.
Our survey found that the data model of graph databases is also simpler and more expressive than that of relational databases or other NoSQL databases. Graph databases are widely used in social networking, financial risk control, personalized recommendation, network security, and other fields.
We mainly considered the following points when selecting a graph database:
1) Complete functionality and strong performance; 2) An open source project that supports flexible secondary development; 3) Safe and reliable, with domestically developed products preferred.

Zhongke Brain ran some performance and feature comparisons early on, and also referred to the related evaluations by Meituan and Tencent. In the test results, Nebula Graph outperformed competing products in data import, real-time writing, and multi-hop query performance. In addition, the Nebula Graph community is active and responds to issues quickly, so the team finally chose Nebula Graph as the foundation of the graph database platform.

Figure: Tencent Cloud Security benchmark

03 Knowledge Graph Construction Platform

Knowledge graph construction includes business rule formulation, ontology construction, knowledge extraction, knowledge fusion, data storage, and other steps, and often requires business experts, engineers, algorithm specialists, project managers, and others to work together. Integrating these steps and dividing the work well can greatly speed up the delivery of a knowledge graph. At present, no open source product meets this need. To support the varied needs of complex urban scenarios, we designed and developed an integrated platform that covers visual ontology design, data mapping, data extraction (structured and unstructured), data writing, and graph data exploration. The platform structure is shown in the following figure.

Figure: Knowledge graph platform structure

  • Project management

The knowledge graph platform treats the knowledge graph of each domain as a project, and each project builds and manages its knowledge independently through the whole process. A project includes ontology design, data mapping, and data extraction, which are carried out step by step, so that each stage is handled by the people with the right expertise. The platform standardizes internal knowledge graph construction and cross-department collaboration, reduces communication costs and data security issues between the people working on different stages, and greatly improves efficiency.

Figure: Platform

  • Ontology design

Building a knowledge graph is not purely technical work. In the ontology design stage, business work may account for more than half of the effort. Business experts often do not understand knowledge schema design, so the usual process is that they annotate knowledge in non-standard ways, which causes a lot of rework. There are also collaboration problems among different experts, and between experts and technical staff. In response to these pain points, the construction platform draws on open source projects to support online ontology design, and it supports file import and export in several formats (OWL, RDF, RDFS) for good compatibility. In our tests, more than 90% of the resources in OpenKG could be imported directly. With visual construction, drawing the diagram is building the graph.

Figure: Design

  • Data extraction

Once the ontology is built, the platform supports mapping structured, relational-style data such as Excel and CSV files to the ontology and then writing the result as graph data. For graph extraction from unstructured data, the platform has built-in model services for triple extraction. There are two kinds of built-in models. The first is trained on open source datasets, such as Baidu DuIE 2.0, and covers general-purpose extraction. The second is designed around our own business. We designed an event and key information extraction model for the citizen hotline, mined the relationships between different citizen hotline messages from a graph perspective, and designed a joint extraction model for event extraction. Compared with the pipeline model, the joint extraction model is significantly more efficient and more accurate.
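The structured-data mapping step can be pictured with a small sketch. The mapping format, column names, and ontology terms below (Device, located_in, and so on) are hypothetical illustrations and not the platform's actual configuration; the sketch only shows the general idea of turning CSV rows into vertex and edge records that could later be written to a graph database.

```python
import csv
import io

# Hypothetical mapping from CSV columns to ontology terms (not from the article).
MAPPING = {
    "entity_type": "Device",                 # ontology class assigned to each row
    "id_column": "device_id",                # column that identifies the vertex
    "properties": {"name": "name", "kind": "type"},  # CSV column -> property name
    "edges": [                               # column holding a target id -> edge type
        {"column": "area_id", "edge_type": "located_in"},
    ],
}


def rows_to_graph(csv_text, mapping):
    """Turn CSV rows into vertex and edge records according to the mapping."""
    vertices, edges = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        vid = row[mapping["id_column"]].strip()
        if not vid:
            continue  # skip rows without an identifier
        props = {prop: row[col].strip() for col, prop in mapping["properties"].items()}
        vertices.append({"type": mapping["entity_type"], "id": vid, "props": props})
        for rule in mapping["edges"]:
            target = row[rule["column"]].strip()
            if target:  # only emit an edge when the target id is present
                edges.append({"type": rule["edge_type"], "src": vid, "dst": target})
    return vertices, edges


# Made-up sample data for illustration.
SAMPLE = """device_id,name,kind,area_id
d1,Fire lane sensor,sensor,a1
d2,Street lamp,lamp,
"""

vertices, edges = rows_to_graph(SAMPLE, MAPPING)
print(vertices)
print(edges)
```

Keeping the mapping as data rather than code is one way to let business experts adjust it without touching the import logic.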

Figure: Joint extraction (from the paper)

  • Graph exploration

The results of structured data import and unstructured data extraction are written to the Nebula Graph database. Graph exploration makes it easy to query and display the written knowledge, and vertex and edge information can be searched directly through the knowledge search box. This makes knowledge retrieval, exploration, and aggregation simpler for builders. Product features:

  1. Knowledge display: to give an intuitive view of the graph, the graph exploration stage adds an automatic subgraph display function, similar to Neo4j's MATCH (n) RETURN n LIMIT 25. It uses a simple algorithm to find the center vertex of the graph, and then uses the number of hops from that center to control the size of the subgraph, which keeps the display from exploding.
  2. Knowledge search: supports fuzzy matching of vertices and edges to better support knowledge discovery and recommendation.
  3. Knowledge calculation: built-in lightweight graph algorithms can compute vertex in-degree and out-degree, centrality, communities, similar vertices, and so on.
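The automatic subgraph display can be sketched roughly as follows. The article only says that "a simple algorithm" finds the center, so choosing the highest-degree vertex is an assumption made here, and the function names and the max_hops and max_nodes parameters are hypothetical. The sketch uses only the Python standard library rather than NetworkX or Nebula Graph queries.

```python
from collections import deque


def build_adjacency(edges):
    """Build undirected adjacency sets from (src, dst) pairs."""
    adj = {}
    for src, dst in edges:
        adj.setdefault(src, set()).add(dst)
        adj.setdefault(dst, set()).add(src)
    return adj


def pick_center(adj):
    """Assumption: use the highest-degree vertex as the center (ties go to the smallest name)."""
    return max(sorted(adj), key=lambda v: len(adj[v]))


def bounded_subgraph(adj, center, max_hops=2, max_nodes=25):
    """Breadth-first expansion from the center, capped by hop count and vertex count.

    Returns a dict mapping each displayed vertex to its hop distance from the center.
    """
    seen = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nbr in sorted(adj[node]):
            if nbr not in seen:
                if len(seen) >= max_nodes:
                    return seen  # stop before the display explodes
                seen[nbr] = seen[node] + 1
                queue.append(nbr)
    return seen


# Made-up toy graph for illustration.
EDGES = [("a", "b"), ("a", "c"), ("a", "d"), ("d", "e"), ("e", "f")]
adj = build_adjacency(EDGES)
center = pick_center(adj)
view = bounded_subgraph(adj, center, max_hops=2)
print(center, view)
```

The vertex cap plays the same role as LIMIT 25 in the Neo4j query, while the hop limit keeps the view centered on the most connected part of the graph.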

Figure: Graph exploration

To serve our own product applications, we developed a series of APIs on top of the underlying interfaces of Nebula Graph, Elasticsearch, and NetworkX. In the future, we also intend to contribute our API implementation to open source.

04 Business applications

  • Smart Q&A

A domain knowledge graph is built around public security household registration knowledge, and a knowledge base question answering (KBQA) system is designed on top of it to support multi-entity and multi-hop matching and reasoning. Based on the Brainbank knowledge graph component and the Brainbank spatiotemporal component, spatial and non-spatial data are combined to support spatial reasoning. For example, a citizen may ask, "Where are the institutions that can handle visa applications for going abroad?" By combining knowledge graph semantic question answering with GIS, the system accurately returns the locations and their attributes, makes knowledge and maps mutually accessible and interoperable, and makes urban services more convenient.
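A multi-hop lookup of this kind can be sketched over a small in-memory triple store. Everything here is a hypothetical illustration: the entity names, the relation names handled_by and located_at, and the assumption that the question has already been parsed into a start entity plus a relation path. A real KBQA system would also need entity linking, relation matching, and a query against the graph database.

```python
def follow_path(triples, start, relations):
    """Follow a fixed sequence of relations from a start entity.

    Returns the entities reached after the last hop, in first-seen order.
    """
    index = {}
    for head, rel, tail in triples:
        index.setdefault((head, rel), []).append(tail)

    frontier = [start]
    for rel in relations:
        next_frontier = []
        for entity in frontier:
            for tail in index.get((entity, rel), []):
                if tail not in next_frontier:  # keep each entity once
                    next_frontier.append(tail)
        frontier = next_frontier
    return frontier


# Made-up triples for illustration; none of these come from the article.
TRIPLES = [
    ("visa application", "handled_by", "Office A"),
    ("visa application", "handled_by", "Office B"),
    ("household registration", "handled_by", "Office C"),
    ("Office A", "located_at", "Location A"),
    ("Office B", "located_at", "Location B"),
    ("Office C", "located_at", "Location C"),
]

# "Where are the institutions that can handle visa applications?"
# is treated here as the two-hop path: matter -> institution -> location.
answers = follow_path(TRIPLES, "visa application", ["handled_by", "located_at"])
print(answers)
```

In the combined setup the article describes, the final hop would return coordinates or GIS features rather than plain strings.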

Figure: Question answering

  • Knowledge Guidance and Decision Making

The urban knowledge graph covers concepts such as Device, Thing, Manage, Event, Field, and Rule, and essentially forms a knowledge base for the city's various domains, which is used to handle urban service and urban governance issues. For example, when a fire lane is occupied, the fire lane sensor (Device) records the relevant information, the service (Service) records the occupant information, raises an alarm, and sends information such as the license plate of the occupying vehicle to the case manager. The illegal parking incident can then be handled quickly based on the address area (Area), regulations (Rule), and other information. The related research on construction and application was accepted at CCKS 2021.

Figure: Ontology

  • Knowledge process recommendation

In the City Brain's personalized recommendation, service resources are integrated and customized around "me" as the center. User behavior, habits, and environmental information are analyzed through knowledge graphs, and methods such as graph embedding, graph path analysis, and community detection are used to intelligently push information that users care about and that is highly relevant, so that services are provided proactively.
Citizens handling a matter automatically receive personalized recommendations, such as experience from similar cases, optional paths, footprints, and related information. For city managers, commonsense knowledge graphs are used in case dispatching and similar case recommendation, combined with domain knowledge graphs to analyze historical information and case behavior. Using knowledge fusion, subgraph spaces, knowledge reasoning, and other methods, the system provides more accurate analysis and classification for case dispatching and recommendation, and infers the appropriate dispatching method and similarity relationships, which improves the City Brain's level of smart service and its efficiency.
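As a rough illustration of similar case recommendation, the sketch below ranks cases by the Jaccard similarity of the graph entities they are linked to. This is a deliberately simple stand-in and not the graph embedding, path analysis, or community detection methods the article mentions; the case identifiers and linked entities are made up.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|, or 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_cases(case_links, target, top_k=2):
    """Rank other cases by how many linked entities they share with the target case."""
    target_links = case_links[target]
    scored = [
        (jaccard(target_links, links), case)
        for case, links in case_links.items()
        if case != target
    ]
    # Highest similarity first; ties are broken by case id for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(case, round(score, 3)) for score, case in scored[:top_k]]


# Hypothetical cases and the graph entities each one is connected to.
CASES = {
    "case-1": {"illegal parking", "fire lane", "Area-A"},
    "case-2": {"illegal parking", "fire lane", "Area-B"},
    "case-3": {"noise complaint", "Area-A"},
    "case-4": {"street lamp fault", "Area-C"},
}

recommendations = similar_cases(CASES, "case-1")
print(recommendations)
```

Neighbor overlap of this kind is cheap to compute and easy to explain to case handlers, which is why it is often used as a baseline before moving to embedding-based similarity.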

Figure: Knowledge recommendation

05 Cooperation & Future

At present, the company's Brainbank and the Nebula Graph graph database have completed interoperability test certification, and our engineers actively participate in open source community projects and have passed the NGCP expert certification. In the future, we will continue to support domestic databases and actively contribute code to the community.

As platform construction continues, we plan to build in graph algorithms such as graph embedding, graph learning, and GNNs, and to optimize the performance of large-scale graph algorithms, so as to deliver a platform that integrates construction and application and supports deep mining and intelligent use of digital assets.

The above is what the Zhongke Brain knowledge graph development team shared about building their knowledge graph platform and applying it in business.



