1
头图
This article was first published on Nebula Graph Community public number

Nebula Graph 在企查查的应用

background

is an enterprise credit query tool under Qichacha Technology Co., Ltd., which aims to provide users with quick query of corporate industrial and commercial information, court judgment information, affiliated company information, legal proceedings, information on dishonesty, and information on persons subject to execution. , intellectual property information, company news, corporate annual reports and other services.

In order to better display the legal proceedings, risk information, equity information, directors, supervisors, and high-level legal information between enterprises, we extract structured/unstructured enterprise data to construct an enterprise knowledge map to provide users with authentic and reliable services.

Nebula Graph 在企查查的应用

Graph database selection

In the beginning, we used Neo4j HA cluster as storage side. With the continuous expansion of data and business scale, we need a distributed database with good read and write . After several investigations, I chose among Dgraph, Nebula, Galaxybase, and HugeGraph, and finally chose Nebula Graph .

Regarding the selection dimension , we relatively focus on community activity, the difficulty of data acquisition, and the most basic reading and writing, subgraph query performance and other aspects.

Because the specific evaluation is not as detailed as the articles shared by other users in the community, it will not be expanded here. Here is a link to the previous evaluation of Meituan: Portal .

Introduction to Nebula Graph

What is Nebula Graph

Nebula Graph is an open source, distributed, and easily scalable native graph database, capable of carrying extremely large datasets with hundreds of billions of points and trillions of edges, and providing millisecond-level queries.

Nebula Graph 在企查查的应用

The features of graph database are based on Nebula Graph written in C++, which can provide millisecond-level query. Among many databases, Nebula Graph has shown excellent performance in the field of graph data services. The larger the data scale, the greater the advantages of Nebula Graph.

Nebula Graph adopts a shared-nothing architecture, which supports expansion and contraction without non-stop database services.

Nebula Graph has opened up more and more native tools, such as Nebula Studio , Nebula Console , Nebula Exchange , etc. More tools can be viewed in the overview of ecological tools.

In addition, Nebula Graph also has the ability to integrate with Spark, Flink, HBase and other products, which greatly enhances its competitiveness in this era full of challenges and opportunities.

The above content comes from the Nebula Graph document site

Nebula Graph Architecture

Nebula Graph consists of three services: Graph service, Meta service and Storage service. It is an architecture that separates storage and computing.

Each service has executable binaries and corresponding processes that users can use to deploy a Nebula Graph cluster on one or more machines.

The figure below shows the classic architecture of a Nebula Graph cluster.

Nebula Graph 在企查查的应用

The above content comes from the Nebula Graph document site

Process Optimization

Data Import of Nebula Graph

In the early days of our exposure to Nebula Graph, the surrounding tools were not perfect. We import Nebula Graph data regardless of full or increments of are using Hive table to push to Kafka, and consuming Kafka to write Nebula Graph in . Later, as more and more data and business switched to Nebula Graph, it was found that there are three major problems in the current method:

  1. The time for full import has increased to an unacceptable level
  2. Incremental data consumption due to multiple partitions in Kafka, some timing cannot be guaranteed
  3. How to reduce the deviation of the overall data after the incremental data enters the Nebula Graph after a long time

In response to the above problems, we make different optimization schemes according to the business characteristics of each Space, due to the different real-time requirements.

  1. After trying the Nebula Spark Connector and Nebula Importer, considering the convenience of maintenance and migration, the method of hive table -> csv -> nebula server -> importer is used to import the full amount, and the overall time-consuming time is also greatly improved.
  2. After splitting, the test merges the increments from multiple partitions into one partition for consumption. This will not affect the real-time performance, and the real-time error can be guaranteed within 1min, which is acceptable.
  3. Set the TTL attribute for different fields, import full correction data regularly, and supplement the data during the full import period to avoid data overwriting.

Service failure discovery

At present, we have upgraded to v2.6.1. There are some bugs in the original v1 and v2.0.1. It is often easy to cause OOM to cause graph down. At that time, in order to reduce the impact of graph down as much as possible, relevant scripts were made to monitor the process, and if there was a downtime, it would be restarted immediately.

In addition, in order to improve the availability of the overall service, the CPU , memory, hard disk, TCP , etc. of the cluster nodes are monitored and alarmed accordingly. Some indicators of the service layer of Nebula Graph are also connected to Grafana . The more important alarm indicators are as follows:

nebula_metad_heartbeat_latency_us_avg_60 > 200000

nebula_graphd_num_slow_queries_rate_60 > 60
nebula_graphd_slow_query_latency_us_avg_60 >400000   
nebula_graphd_slow_query_latency_us_p95_60 > 900000

nebula_graphd_num_query_errors_rate_60 > 10        

nebula_storaged_add_edges_latency_us_avg_60 > 90000       
nebula_storaged_add_vertices_latency_us_avg_60 > 90000      
nebula_storaged_delete_edges_latency_us_avg_60 > 90000      
nebula_storaged_delete_vertices_latency_us_avg_60 > 90000    
nebula_storaged_get_neighbors_latency_us_avg_60 > 200000

At the same time, the interface of the application layer has done slow query and traffic monitoring alarm:

Nebula Graph 在企查查的应用

Later, Nebula officially launched Nebula Dashboard for monitoring indicators in various aspects, but it seems that there is no alarm for the time being (the enterprise version has an alarm function).

After the previous introduction, the basic framework process of Nebula Graph on our side is as follows:

Nebula Graph 在企查查的应用

Nebula Graph's classic business in Qichacha

  • subgraph query
    Business requirements: Display the corporate relationship within two hops of a company/person (for example: Director, Supervisor, Gao Fa), and visually display it in the form of a graph.
    For more and more detailed information, you can visit the official website of Qichacha. There are more and more comprehensive enterprise/individual relationship maps, risk maps, and shareholder relationship penetration. Portal

Nebula Graph 在企查查的应用

  • find a relationship
    Business requirements: Find the relationship between any two or more entities (company or person), including but not limited to directors, supervisors, high law, holdings, historical directors, supervisors, high law, and historical holdings.
    Unfortunately, Nebula Graph's current official path query, after several rounds of cross-comparison tests, found that the performance loss after the cut was large, and it was temporarily unable to meet business needs. We have not yet switched from Neo4j to Nebula Graph. After many communications, we raised issue . Currently looking forward to the official optimization in enhancement list , we will continue to follow up.

Nebula Graph 在企查查的应用

Outlook

Nebula currently provides super reading and writing ability and rich ecology, as well as excellent community activity, official support, etc. However, the complex query expression ability, path finding and node filtering still need to be strengthened. We hope that the community will become stronger and stronger, and related functions will be improved as soon as possible. We also conveniently switch to Nebula Graph without having to maintain two sets of databases.

Exchange graph database technology? To join the Nebula exchange group, please fill in your Nebula business card first at , and the Nebula assistant will pull you into the group~~

Follow the public number


NebulaGraph
169 声望684 粉丝

NebulaGraph:一个开源的分布式图数据库。欢迎来 GitHub 交流:[链接]