OPPO Graph Database Platform: Construction and Business Applications

This article was first published on the OPPO Digital Intelligence Technology Official Account, WeChat ID: OPPO_tech

1. What is a graph database

A graph database stores and queries data as a graph. Unlike in other databases, relationships take first place in a graph database, so an application does not need foreign keys or out-of-band processing (such as MapReduce) to infer connections between data. Compared with relational databases and other NoSQL databases, the graph data model is also simpler and more expressive.

Graph databases are widely used in social networks, knowledge graphs, financial risk control, personalized recommendations, network security and other fields.

2. Graph database survey

2.1. Research background

As business data such as knowledge graphs kept growing, the existing graph database, JanusGraph, struggled to keep up, and its import time could no longer meet business requirements. Finding a better-performing open source property graph database therefore became urgent.

The new graph database should meet the following requirements:

  • Supports a large-scale graph with 1 billion nodes, 10 billion edges, and 17 billion properties
  • Full import time does not exceed 10 hours
  • The average response time of second-degree (two-hop) queries does not exceed 50 ms, and QPS can reach 5000+
  • Is an open source, distributed property graph database

2.2. Research process

The first step was to collect common open source distributed property graph databases, as shown in the following table:

[Table: candidate open source distributed property graph databases; image not reproduced here]

In the second step, based on graph database test reports published by Meituan, LightGraph, TigerGraph, GalaxyBase, and others, the performance of these graph databases was ranked as follows:

  • Import: Nebula Graph > HugeGraph > JanusGraph > ArangoDB > OrientDB
  • Query: Nebula Graph > HugeGraph > JanusGraph > ArangoDB > OrientDB

Nebula Graph performs best in both import and query.

In the third step, in order to verify the performance of Nebula Graph, a performance comparison test was performed on Nebula Graph and JanusGraph. The test results are as follows:

[Figure: performance comparison of Nebula Graph and JanusGraph; image not reproduced here]

In the figure above, JanusGraph's performance is normalized to 1. Nebula Graph's import performance is an order of magnitude higher than JanusGraph's, and its query performance is 4-7 times JanusGraph's. The gap widens further as concurrency increases: from 20 threads upward, JanusGraph returned errors on third-degree neighbor queries, while Nebula Graph returned none.

Nebula Graph completes a full import of 1 billion nodes and 10 billion edges in only 10 hours, which meets the requirement. SST import is currently being investigated and could greatly increase the import speed.

In a second-degree neighbor query stress test with 120 threads, Nebula Graph reached a final QPS of 6000+, which is a little better than a single machine. The success rate is close to five nines, and the response time is relatively stable, with an average of 18.81 ms, a p95 of 38 ms, and a p99 of only 115.6 ms, which meets the requirements.

2.3. Research conclusion

Nebula Graph's import performance, response time, and stability all meet the requirements. It supports data partitioning, its distributed version is free and open source, and it is used by many companies. It also has comprehensive documentation (including Chinese documentation) and an active community, which makes it an ideal choice among open source graph databases.

3. Introduction to Nebula Graph

[Figure from the Nebula Graph documentation site; image not reproduced here]

Nebula Graph is an open source, distributed, and easily scalable native graph database. It can host very large data sets with hundreds of billions of vertices and trillions of edges while serving queries with millisecond-level latency.

Nebula Graph is written in C++ and designed around the characteristics of graph data. It adopts a shared-nothing architecture, supports scaling out and in without stopping the database service, and provides many native tools, such as Nebula Graph Studio, Nebula Console, and Nebula Exchange, which greatly lower the barrier to using a graph database.

[Figure from the Nebula Graph documentation site; image not reproduced here]

Nebula Graph is composed of three services: the Graph service, the Meta service, and the Storage service. It is an architecture that separates storage from computing.

The Meta service is responsible for metadata management, such as schema operations, cluster management, and user permission management. The service is provided by the nebula-metad process. In a production environment, it is recommended to deploy 3 nebula-metad processes in the Nebula Graph cluster, each on a different machine, to ensure high availability. All nebula-metad processes form a cluster based on the Raft protocol, in which one process is the leader and the others are followers.

The Graph service is mainly responsible for processing query requests in four major steps: parsing the query statement, validating the statement, generating an execution plan, and executing according to that plan. The service is provided by the nebula-graphd process, and multiple instances can be deployed.

The Storage service is responsible for storing data. The service is provided by the nebula-storaged process. All nebula-storaged processes form a cluster based on the Raft protocol. Data is stored in partitions on the nebula-storaged processes. Each partition has several replicas: one is the leader, and the others are followers of that partition.

4. Construction of graph database platform

When we used JanusGraph, we ran into problems such as slow imports, slow queries, OOM under high concurrency (the JanusGraph thread pool uses unbounded queues), and full GC (business Gremlin statements contained literal values, which made the metaspace grow continuously). After switching to Nebula Graph, these problems were basically solved.

[Figure: graph database management interface; image not reproduced here]

JanusGraph does not have an easy-to-use management interface. As shown in the figure above, we therefore developed our own, covering multi-graph management, schema management, graph visualization, graph import, and permission management.

Nebula Graph Studio provides functions such as multi-graph management, Schema management, graph visualization, graph import, etc., which saves a lot of development work and lowers the barrier to use.

[Figure: architecture of the graph database platform; image not reproduced here]

The structure of the entire graph database platform is shown in the figure above. Based on Nebula Graph and its official tools, we developed functions such as full import, incremental import, graph export, backup/restore, and a query service (graph retrieval).

The official import tool requires an import configuration file. To make it easier for business teams, we designed a schema configuration form, so the business only needs to fill in the form. At import time, the platform automatically creates the graph space, creates its schema, generates the import configuration file, imports the data, balances the data, balances the leaders, creates indexes, and runs compaction tasks. The import still uses batch writes for now. SST import will be investigated in the future and could further improve import performance.
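As a rough illustration of the form-driven step, the sketch below turns a schema form into the schema statements that would run before an import. The `SchemaForm` fields and the `build_schema_statements` helper are hypothetical names, not the platform's actual code. The generated statements follow the `CREATE SPACE` / `CREATE TAG` / `CREATE EDGE` style of nGQL and should be checked against the Nebula Graph version in use.

```python
from dataclasses import dataclass, field


@dataclass
class SchemaForm:
    """Hypothetical schema form filled in by a business team."""
    space: str
    partition_num: int = 100
    replica_factor: int = 3
    tags: dict = field(default_factory=dict)   # tag name -> {property: type}
    edges: dict = field(default_factory=dict)  # edge type -> {property: type}


def _props(props):
    # Render "name string, age int" from {"name": "string", "age": "int"}.
    return ", ".join(f"{name} {ptype}" for name, ptype in props.items())


def build_schema_statements(form):
    """Turn the form into the schema statements run before an import."""
    stmts = [
        f"CREATE SPACE IF NOT EXISTS {form.space}"
        f"(partition_num={form.partition_num}, replica_factor={form.replica_factor});",
        f"USE {form.space};",
    ]
    for tag, props in form.tags.items():
        stmts.append(f"CREATE TAG IF NOT EXISTS {tag}({_props(props)});")
    for edge, props in form.edges.items():
        stmts.append(f"CREATE EDGE IF NOT EXISTS {edge}({_props(props)});")
    return stmts
```

The same form could also drive generation of the import configuration file, so the business never edits that file by hand.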

The official import tool uses an asynchronous client, which makes the import rate extremely difficult to control. If the rate is set too high, requests easily pile up in the graph database and affect the stability of the cluster. If it is set too low, the import cannot reach its best speed and runs too slowly. We modified the source code of the official import tool and replaced the asynchronous client with a synchronous one, which balances performance and stability.

There is no official export tool, so we developed one based on the official nebula-spark. In addition to exporting data, it can export the schema configuration and index configuration, which makes business data migration easier.

To support data rollback, we developed a function that quickly backs up and restores the data of a specified graph space. However, this function cannot back up the graph space's metadata. A full import deletes and rebuilds the graph space, and because the metadata changes, earlier backups become useless. In the future, we will try to have the full import clear only the data without deleting the graph space, to avoid this problem.
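The effect of switching to a synchronous client can be sketched as follows. This is a minimal illustration, not the modified import tool itself: `write_batch` is a hypothetical blocking call that returns only after the graph database has acknowledged the batch. Because each worker waits for its own write, the number of in-flight requests can never exceed the number of workers, so the database itself paces the import.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_DONE = object()  # sentinel marking the end of the batch stream


def import_batches(batches, write_batch, workers=4):
    """Write batches with at most `workers` requests in flight.

    `write_batch` is a hypothetical blocking call: it returns only after
    the graph database has acknowledged the batch.
    """
    it = iter(batches)
    lock = threading.Lock()
    written = 0

    def worker():
        nonlocal written
        while True:
            with lock:
                batch = next(it, _DONE)
            if batch is _DONE:
                return
            write_batch(batch)  # blocks until acknowledged
            with lock:
                written += 1

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(worker) for _ in range(workers)]
        for future in futures:
            future.result()  # re-raise any write error
    return written
```

With this shape, the only tuning knob is `workers`: a slow cluster automatically slows the import instead of accumulating a backlog.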

The knowledge graph business has many edge types, and a single query often needs to cover dozens or even hundreds of them. In practice, each edge type only needs to return its Top 10 results (sorted by rank). This is very difficult to express in nGQL: you can only query all the data for these edges, or the Top N across all edges combined. The former has performance problems, and the latter often returns data for only some of the edge types, which does not meet the need. In response, we classified the edge types. For edge types with a small amount of data, a single statement queries all the data. For edge types with a large amount of data, multiple threads query the Top 10 of each edge type in parallel, which works around the limitation to some extent.
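The workaround can be sketched roughly as follows. Both `query_all` and `query_top_n` are hypothetical wrappers around the graph client, and how edge types are split into small and large is left to the caller. The sketch only shows the fan-out pattern: one statement for the low-volume edge types, and parallel Top N queries for the high-volume ones.

```python
from concurrent.futures import ThreadPoolExecutor


def top_n_per_edge(small_edges, large_edges, query_all, query_top_n, n=10, workers=8):
    """Return {edge_type: rows} with at most `n` rows per edge type.

    query_all(edge_types)     -> {edge_type: rows sorted by rank}, one statement
    query_top_n(edge_type, n) -> first n rows of a single edge type
    Both callables are hypothetical wrappers around the graph client.
    """
    results = {}
    if small_edges:
        # Low-volume edge types: one statement, trimmed on the client side.
        for edge, rows in query_all(small_edges).items():
            results[edge] = rows[:n]
    if large_edges:
        # High-volume edge types: one Top N query per type, run in parallel.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            fetched = pool.map(lambda edge: query_top_n(edge, n), large_edges)
            for edge, rows in zip(large_edges, fetched):
                results[edge] = rows
    return results
```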

To ensure high availability, we deployed the service across two data centers. So that upper-level businesses do not notice a data center switchover, we built a query service (graph retrieval) on top of the graph database. The business calls the query service directly, and the query service selects a suitable graph database cluster according to cluster status. In addition, to shield upper-level businesses from changes and version upgrades in the underlying graph database, the query service manages all business query statements. When a version upgrade makes a query statement incompatible, only the statement inside the query service needs to be adjusted, so the upper-level business is not affected. The query service also caches query results, which greatly improves graph query throughput.

Of course, we still ran into some problems, such as rank ordering failing because of endianness (big-endian versus little-endian) issues, and query results returning only the edge type ID. For reasons of space, we will not list them all here. These problems were worked around or resolved with the help of the Nebula Graph community.
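A minimal sketch of such a query layer is shown below, under several assumptions: each cluster is represented by a hypothetical `(name, is_healthy, execute)` tuple, statements are stored as plain string templates keyed by a business query name, and results are cached with a fixed time-to-live. The real service is certainly more involved; this only illustrates how cluster selection, centrally managed statements, and caching fit together.

```python
import time


class QueryService:
    """Sketch of a query layer in front of two graph database clusters."""

    def __init__(self, clusters, statements, ttl_seconds=60.0, clock=time.monotonic):
        # clusters: ordered list of (name, is_healthy, execute) tuples (hypothetical).
        self.clusters = clusters
        self.statements = statements  # business query name -> statement template
        self.ttl = ttl_seconds
        self.clock = clock
        self.cache = {}  # (query name, params) -> (expiry time, result)

    def query(self, name, **params):
        key = (name, tuple(sorted(params.items())))
        hit = self.cache.get(key)
        if hit is not None and hit[0] > self.clock():
            return hit[1]  # served from cache, no cluster is touched
        # The business only knows the query name; the statement text lives here,
        # so it can be changed after a database upgrade without touching callers.
        statement = self.statements[name].format(**params)
        for _name, is_healthy, execute in self.clusters:
            if is_healthy():
                result = execute(statement)
                self.cache[key] = (self.clock() + self.ttl, result)
                return result
        raise RuntimeError("no healthy graph database cluster available")
```

Callers would use something like `service.query("spouse", vid=1)`, where `"spouse"` maps to a template such as `"GO FROM {vid} OVER spouse"`. Plain string formatting is used only to keep the sketch short; a production service would validate or escape parameters.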

*Note: The Nebula Graph issues mentioned above are only for the V1.2.0 version, and many issues have been fixed in subsequent versions.

5. Business applications

5.1. Knowledge graph and intelligent question answering

Before the knowledge graph was introduced, Xiaobu Assistant only supported document-based question answering (DBQA). DBQA works on unstructured text and is suited to explanatory and narrative questions such as Why and How, but its accuracy and coverage on factual questions are not high.

With the knowledge graph, Xiaobu Assistant supports knowledge-base question answering (KBQA), and the accuracy and coverage of factual questions such as What and When have improved greatly. For example: Who is xxx's wife? What is the weight of xxx Ultraman? What is the area of Beijing?

In addition to factual question answering, Xiaobu Assistant can use the reasoning ability of the graph to answer more complex questions. For example: What is the relationship between xxx and xxx? What is the first phone released by OPPO? Which movies do xxx and xxx star in together? Which celebrities born in xx are Geminis?

A knowledge graph contains large-scale semi-structured data with many relationships between entities. A relational database cannot meet its storage and query requirements, while a graph database can handle both large-scale graph storage and multi-hop queries.

5.2. Content tagging

In some recommendation scenarios, you need to understand the content of video, audio, or text and attach content-related tags to it. In short video recommendation, for example, understanding what a video is about helps recommend it accurately to users.

For film and television videos, the actors, productions, and roles are built into a film and television entertainment knowledge graph. When a new film or television short video is released, the actors can be identified through face recognition on the video, and the roles can be identified from the title or subtitles. The graph is then used to quickly infer the corresponding production and tag the video's content, which improves recommendation quality.

5.3. Data lineage

A data warehouse often runs many ETL jobs over many data tables and tasks. How to observe the upstream and downstream relationships between tables and tasks intuitively has become an urgent problem.

Handling multi-level relationship queries in a relational database is troublesome: the development workload is large, and query performance is very poor. A graph database greatly reduces the development workload and quickly finds the upstream and downstream relationships of a table, which makes it convenient to observe data lineage visually.

5.4. Service architecture topology
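To make the multi-level query concrete, the sketch below computes everything downstream of a table with a breadth-first traversal over (upstream, downstream) pairs. It is an in-memory illustration only, with hypothetical table and job names; in a graph database the same result comes from a single multi-hop traversal, whereas a relational database needs one join per level or a recursive query.

```python
from collections import deque


def downstream(edges, start, max_hops=None):
    """Return {node: hop count} for everything reachable downstream of `start`.

    `edges` is a list of (upstream, downstream) pairs, e.g. table -> job -> table.
    """
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)

    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if max_hops is not None and hops[node] >= max_hops:
            continue  # do not expand beyond the requested depth
        for nxt in adjacency.get(node, []):
            if nxt not in hops:  # first visit gives the shortest hop count
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    del hops[start]
    return hops


# Hypothetical lineage: source table -> cleaning job -> detail table -> ...
lineage = [
    ("ods.orders", "job_clean"),
    ("job_clean", "dwd.orders"),
    ("dwd.orders", "job_agg"),
    ("job_agg", "ads.report"),
]
print(downstream(lineage, "ods.orders", max_hops=2))
```

Reversing each pair before calling `downstream` gives the upstream lineage of a table in the same way.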

In service resource management, business resources are divided into multiple levels, and each level has its own servers, services, and managers. With a relational database, displaying multi-level resources requires troublesome queries that perform poorly. Instead, the relationships between resources, managers, servers, and business levels can be stored in a graph database. Displaying them then takes a single query statement, and the query is still very fast.

6. Summary

By putting businesses such as the knowledge graph into practice, we completed the transition from JanusGraph to Nebula Graph. Import performance improved by an order of magnitude, and query performance and concurrency improved by 3-6 times. Nebula Graph is also more stable than JanusGraph. Along the way we ran into many problems and received a lot of help from the Nebula Graph community. Many thanks to the community for its support!

Graph databases have developed rapidly in recent years. Neo4j raised US$325 million in the first half of this year, breaking the financing record for databases. A report released by Gartner stated: "By 2023, graph technology will promote the rapid decision-making of 30% of global enterprises. The annual growth rate of graph technology applications exceeds 100%." With the spread of 5G and the Internet of Things, graph databases will become infrastructure for handling relationships.

About the Author

Qirong, OPPO senior back-end engineer, mainly engaged in graph database, graph computing and related fields.

If you find any errors or omissions in this article, please open an issue on GitHub (https://github.com/vesoft-inc/nebula) or post in the suggestions and feedback category of the official forum (https://discuss.nebula-graph.com.cn/). To discuss graph database technology, join the Nebula exchange group by filling in your contact card, and the Nebula assistant will add you to the group.
