Abstract: If you are new to graph databases, the name may mislead you. A graph database is not a database that stores pictures and images; it is a database that stores data organized as a graph structure. So what is a graph?
This article is shared from the HUAWEI CLOUD Community post "Graph Database?"; original author: Hello_TT.
In recent years, one kind of database has been widely mentioned and used in big data processing: the graph database. So what exactly is a graph database?
If you are new to graph databases, the name may mislead you. A graph database is not a database that stores pictures and images; it is a database that stores data organized as a graph structure. So what is a graph?
What is a graph
Let's get to know it through the following example.
At the end of the Eastern Han Dynasty, the combined forces of Sun Quan and Liu Bei defeated Cao's army by attacking enemy ships with fire in the area of Chibi.
If we abstract the relationships between the camps, treating each camp as a point and each relationship between camps as an edge, we can visualize them with the following diagram:
The diagram above is a visual display of what we call a graph here.
We call a data structure that stores entities and the relationships between them a graph. A graph is composed of points and edges. A point is an entity, such as a camp in the example above. The relationship between two entities is represented by a directed or undirected edge, such as the alliance between Liu Bei and Sun Quan. This general structure can model all kinds of real-world scenarios, from transportation systems to organizational structure management, and from process design to social networks.
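As a minimal sketch of "points and edges", the Chibi example can be written as a vertex set and an edge list and then turned into an adjacency map. The relationship names ("alliance", "defeated at Chibi") and the helper `build_adjacency` are illustrative assumptions, not part of any graph database API.

```python
# Points (vertices) are the camps from the Chibi example.
vertices = {"Sun Quan", "Liu Bei", "Cao Cao"}

# Each edge is (source, target, relationship, directed).
# The relationship names are illustrative assumptions.
edges = [
    ("Sun Quan", "Liu Bei", "alliance", False),          # undirected edge
    ("Sun Quan", "Cao Cao", "defeated at Chibi", True),  # directed edge
    ("Liu Bei", "Cao Cao", "defeated at Chibi", True),   # directed edge
]


def build_adjacency(vertices, edges):
    """Return {vertex: [(neighbor, relationship), ...]}."""
    adjacency = {v: [] for v in vertices}
    for source, target, relationship, directed in edges:
        adjacency[source].append((target, relationship))
        if not directed:
            # An undirected edge can be followed from both ends.
            adjacency[target].append((source, relationship))
    return adjacency


adjacency = build_adjacency(vertices, edges)
```

The alliance appears in the neighbor lists of both Sun Quan and Liu Bei, while the directed edges appear only on their source vertex.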
What is a graph database
Once you know the concept of a graph, you can understand what a graph database is. Put simply, a graph database is a tool for storing and processing graph-structured data.
Unlike traditional relational databases, which store data in two-dimensional tables, graph databases are conventionally classified as a type of NoSQL (Not Only SQL) database, which means that graph databases are non-relational databases.
A typical graph database provides at least three functions: graph storage, graph query, and graph analysis.
Why use a graph database
Then why do we use a graph database? Let's use the example from the end of the Eastern Han Dynasty to explain the advantages of graph databases over relational databases.
Assume that a relational database holds three tables: Table 1, a list of characters of the late Eastern Han Dynasty; Table 2, a list of battles of the late Eastern Han Dynasty; and Table 3, a list of the characters who took part in each battle.
When we want to know "Who was the defender in the Battle of Fancheng?", the query is generally fast, because the answer can be read directly from Table 2. When we want to know "Which wars did the Liu Bei Group launch?", the answer is also in Table 2, but we may need to scan the entire table, so the query becomes noticeably less efficient. And for a query such as "Which of the wars launched by the Liu Bei Group did Guan Yu fight in?", let's look at what the relational database has to do:
A. First, find the character ID for Guan Yu in the character list (Table 1).
B. Then, use the battle participation list (Table 3) to find the battles he took part in.
C. Finally, use the battle list (Table 2) to keep only those battles in which the attacking party was the Liu Bei Group.
We will find that this query is too cumbersome.
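The three steps can be sketched with Python's built-in `sqlite3` module. The table names, column names, and sample rows below are assumptions made up for illustration; the article does not give the actual schema.

```python
import sqlite3

# In-memory database with three hypothetical tables (names and rows are assumptions).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE characters (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE battles (id INTEGER PRIMARY KEY, name TEXT, attacker TEXT, defender TEXT);
CREATE TABLE participation (character_id INTEGER, battle_id INTEGER);

INSERT INTO characters VALUES (1, 'Guan Yu'), (2, 'Cao Ren');
INSERT INTO battles VALUES
    (1, 'Battle of Fancheng', 'Liu Bei Group', 'Cao Cao Group'),
    (2, 'Battle of Chibi', 'Cao Cao Group', 'Sun-Liu alliance');
INSERT INTO participation VALUES (1, 1), (1, 2), (2, 1);
""")

# Steps A, B, and C become two JOINs plus a filter on the attacker.
rows = conn.execute(
    """
    SELECT b.name
    FROM characters AS c
    JOIN participation AS p ON p.character_id = c.id   -- A -> B
    JOIN battles AS b ON b.id = p.battle_id            -- B -> C
    WHERE c.name = ? AND b.attacker = ?
    ORDER BY b.name
    """,
    ("Guan Yu", "Liu Bei Group"),
).fetchall()

battle_names = [name for (name,) in rows]
```

Even this small question needs two JOINs; every additional hop in a relationship chain adds another one.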
If we instead transform the tables above into the following relationship graph, it becomes clear at a glance who is related to whom, and how.
Perhaps you have not yet appreciated the real power of graph databases. Let's look at a query performance comparison in the most classic scenario, the social network.
In the book "Neo4j in Action", the authors ran a test: in a social network of 1 million people, each with about 50 friends, find friends of friends up to a maximum depth of 5. The experimental results were as follows:
The results show that at depth 2 the two databases perform about the same, and both are very fast. At depth 3, the relational database takes about half a minute to complete the query, while the graph database still finishes within 1 second. At depth 4, the relational database takes nearly half an hour to return results, while the graph database takes less than 2 seconds. At depth 5, the relational database fails to respond for a long time, while the graph database still returns results within seconds, showing very good performance.
Based on this, we can understand why a graph database is used from the following aspects:
- Relational databases are not good at handling relationships between data, while graph databases handle them flexibly and with high performance
We cannot deny that relational databases have been the main force in the database field since the 1980s. Today, with the rapid development of social networking, the Internet of Things, finance, e-commerce, and other fields, the resulting data is growing exponentially, and traditional relational databases perform poorly on data with complex relationships. This is because relational databases implement references between tables through foreign key constraints. Querying the relationships between entities requires JOIN operations, and JOIN operations are usually very time-consuming.
The original design motivation of graph databases is to describe the relationships between entities better. The biggest difference between graph databases and relational databases is index-free adjacency. Each node in the graph data model maintains its own relationships to its neighboring nodes, which means that the cost of a traversal depends not on the overall size of the graph but only on the number of neighbors of the nodes it visits. This lets a graph database maintain good performance when handling large numbers of complex relationships.
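The idea behind index-free adjacency can be sketched in plain Python: if every node keeps direct references to its neighbors, a friends-of-friends query only touches the nodes it actually reaches. The function name `friends_within_depth` and the sample network are hypothetical; this is an illustration of the principle, not how any particular graph database is implemented.

```python
def friends_within_depth(adjacency, start, max_depth):
    """Return everyone reachable from `start` in 1..max_depth hops.

    Work done is proportional to the neighbor lists that are visited,
    not to the total number of nodes in the graph.
    """
    visited = {start}
    frontier = [start]
    for _ in range(max_depth):
        next_frontier = []
        for person in frontier:
            # Direct lookup of this node's neighbors: no table scan, no JOIN.
            for friend in adjacency.get(person, ()):
                if friend not in visited:
                    visited.add(friend)
                    next_frontier.append(friend)
        frontier = next_frontier
    visited.discard(start)
    return visited


# Hypothetical sample network: a chain A - B - D - E with C attached to A.
friends = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B", "E"],
    "E": ["D"],
}
```

Calling `friends_within_depth(friends, "A", 2)` expands two levels outward from A and never looks at E, which is three hops away.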
In addition, the structure of a graph makes it easy to extend. We do not have to consider every detail at the start of model design, because new nodes, new relationships, new attributes, and even new labels can easily be added later without breaking existing queries and application functions.
- The relationship between data is becoming more and more important
When we ask why graph databases are so important, we are really asking why the relationships between data are so important. Just as everyone knows the value of interpersonal relationships, the value of data also lies in the relationships within it.
For example, live-stream selling has recently become very popular. If a streamer has millions of followers on Weibo, that data is of little value while it sits unused. But once the streamer's followers are connected to the customers who might shop in his live-streaming room, the data immediately shows great commercial value.
- Use graphs to express many things in the real world more directly, more intuitively, and easier to understand
The real world contains all kinds of relationships. A relational database can only flatten them into rows and columns of tabular data, whereas a graph database models these relationships directly with a graph model, which is more intuitive and vivid.
In addition, most graph databases now provide visual graph display, making query and analysis very intuitive.
- Professional graph analysis algorithms provide solutions for actual scenarios
The graph database originated from graph theory. With the help of professional graph analysis algorithms, it can provide suitable solutions for actual scenarios.
How a graph database stores, queries, and analyzes graphs
Graph storage
How the graph database stores graphs is crucial to the efficiency of query and analysis. The graph database uses graph models to manipulate graph data. The so-called graph model refers to the way the graph database describes and organizes graph data.
At present, the graph model chosen by mainstream graph databases is the property graph (also translated as attribute graph). A property graph is composed of points, edges, labels, and properties. Let's look at a specific instance of a property graph.
The property graph above helps us understand some related concepts:
1) Labels can be set on points, such as person, war, etc. Points with the same label are considered to belong to the same group or set, so Liu Bei and Cao Cao belong to the same group;
2) Labels can also be set on edges, such as relation, etc.;
3) A point can have many properties, such as style name, year, etc. Property values are expressed as key-value pairs, for example: Liu Bei's style name is Xuande;
4) Edges can also have properties, such as army, etc.;
5) Edges are allowed to have a direction. For example, the edge between Liu Bei and the Battle of Hanzhong points from Liu Bei to the Battle of Hanzhong;
6) Metadata describes the property information of points and edges. Metadata consists of several labels, and each label consists of several properties.
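The concepts above can be sketched as two small Python data classes. The class names `Vertex` and `Edge`, the field names, and the value of the `army` property are assumptions for illustration; real graph databases define their own storage formats.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: str
    label: str                                       # e.g. "person" or "war"
    properties: dict = field(default_factory=dict)   # key-value pairs


@dataclass
class Edge:
    source: str                                      # direction: source -> target
    target: str
    label: str                                       # e.g. "relation"
    properties: dict = field(default_factory=dict)


# Points with labels and properties.
liu_bei = Vertex("Liu Bei", "person", {"style_name": "Xuande"})
cao_cao = Vertex("Cao Cao", "person")
hanzhong = Vertex("Battle of Hanzhong", "war")

# A directed, labeled edge with a property (the "army" value is made up).
launched = Edge(liu_bei.id, hanzhong.id, "relation", {"army": "Liu Bei Group"})

# Points that share a label form one group.
people = [v.id for v in (liu_bei, cao_cao, hanzhong) if v.label == "person"]
```

Here `people` contains Liu Bei and Cao Cao, matching point 1) above, and `launched` runs from Liu Bei to the Battle of Hanzhong, matching point 5).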
Graph query
If we want to know where Liu Bei's birthplace is, what is the relationship between Liu Bei and Cao Cao, who is the initiator of the Battle of Hanzhong, etc., these all belong to the category of graph query.
We know that SQL is the query language of relational databases, but graph databases do not reuse SQL as their query language. In essence, a graph database deals with high-dimensional data, while SQL is designed for two-dimensional table structures and is not good at relationship queries and operations. A dedicated graph query language expresses such queries more efficiently than SQL.
The current mainstream graph query languages include Gremlin and Cypher.
Graph analysis
Graph analysis refers to a technique for mining graph information through various graph algorithms.
The core graph algorithms can be divided into three categories: path search, centrality analysis, and community discovery.
Path search explores the direct or indirect connections that nodes in the graph establish through edges. For example, in the figure below, path search finds the following path: Sun Ce-[Husband and Wife]-Daqiao-[Sisters]-Xiaoqiao-[Husband and Wife]-Zhou Yu, from which we know that Sun Ce and Zhou Yu are brothers-in-law. Path search algorithms are widely used in scenarios such as logistics distribution and social relationship analysis.
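A minimal sketch of path search is breadth-first search over an undirected relation list, which returns a path with the fewest hops. The function name `shortest_path` is hypothetical, and the extra Sun Ce-Sun Quan edge is added only to give the search a dead end to skip.

```python
from collections import deque

# (person, person, relationship); edges are treated as undirected.
relations = [
    ("Sun Ce", "Daqiao", "Husband and Wife"),
    ("Daqiao", "Xiaoqiao", "Sisters"),
    ("Xiaoqiao", "Zhou Yu", "Husband and Wife"),
    ("Sun Ce", "Sun Quan", "Brothers"),  # extra edge, not on the target path
]


def shortest_path(relations, start, goal):
    """Breadth-first search; returns [node, label, node, ...] or None."""
    adjacency = {}
    for a, b, label in relations:
        adjacency.setdefault(a, []).append((b, label))
        adjacency.setdefault(b, []).append((a, label))

    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]  # the last element of a path is always a node
        if node == goal:
            return path
        for neighbor, label in adjacency.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [label, neighbor])
    return None


path = shortest_path(relations, "Sun Ce", "Zhou Yu")
```

The returned list alternates people and relationship labels, so it reads the same way as the path written in the text.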
Centrality analysis measures the importance and influence of a particular node in the graph. For example, in the figure above, Sun Quan is intuitively an important figure because he has the largest number of directly connected edges. Centrality analysis algorithms are generally used in scenarios such as web page ranking, opinion leader mining, and modeling the spread of influenza.
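The intuition of "most directly connected edges" corresponds to degree centrality, the simplest centrality measure. The sketch below counts edge endpoints per node; the edge list is a made-up sample, not the figure from the article, and scenarios such as web page ranking typically rely on more elaborate measures.

```python
from collections import Counter


def degree_centrality(edges):
    """Count how many edges touch each node (undirected degree)."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree


# Hypothetical sample edges around Sun Quan.
edges = [
    ("Sun Quan", "Sun Ce"),
    ("Sun Quan", "Zhou Yu"),
    ("Sun Quan", "Lu Su"),
    ("Sun Ce", "Daqiao"),
    ("Zhou Yu", "Xiaoqiao"),
]

degree = degree_centrality(edges)
most_connected = max(degree, key=degree.get)
```

In this sample Sun Quan touches three edges, more than any other node, so he is selected as `most_connected`.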
The purpose of community discovery is to find the more closely connected group structures in a graph. If more characters and relationships from the Three Kingdoms are added to the figure above, community discovery algorithms such as Louvain can easily find that these characters belong to three camps, as shown in the figure below.
Community discovery algorithms can be used in scenarios such as criminal group mining.
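Louvain itself is too long to show here, but the quantity it tries to maximize, modularity, is easy to sketch. The code below is not Louvain; it only scores a given partition, and it shows that grouping the characters into three camps scores higher than putting everyone in one group. The edge list and camp assignments are a made-up sample.

```python
def modularity(edges, community_of):
    """Modularity Q = sum over communities of (L_c / m) - (d_c / 2m)^2,

    where m is the number of edges, L_c the number of edges inside
    community c, and d_c the total degree of the nodes in c.
    """
    m = len(edges)
    internal = {}
    degree_sum = {}
    for a, b in edges:
        ca, cb = community_of[a], community_of[b]
        degree_sum[ca] = degree_sum.get(ca, 0) + 1
        degree_sum[cb] = degree_sum.get(cb, 0) + 1
        if ca == cb:
            internal[ca] = internal.get(ca, 0) + 1
    return sum(
        internal.get(c, 0) / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Hypothetical sample: three tightly knit camps joined by two cross-camp edges.
edges = [
    ("Cao Cao", "Xiahou Dun"), ("Cao Cao", "Cao Ren"), ("Xiahou Dun", "Cao Ren"),
    ("Liu Bei", "Guan Yu"), ("Liu Bei", "Zhang Fei"), ("Guan Yu", "Zhang Fei"),
    ("Sun Quan", "Zhou Yu"), ("Sun Quan", "Lu Su"), ("Zhou Yu", "Lu Su"),
    ("Liu Bei", "Sun Quan"), ("Cao Cao", "Liu Bei"),
]
three_camps = {
    "Cao Cao": "Wei", "Xiahou Dun": "Wei", "Cao Ren": "Wei",
    "Liu Bei": "Shu", "Guan Yu": "Shu", "Zhang Fei": "Shu",
    "Sun Quan": "Wu", "Zhou Yu": "Wu", "Lu Su": "Wu",
}
one_group = {name: "all" for name in three_camps}

q_camps = modularity(edges, three_camps)
q_single = modularity(edges, one_group)
```

A community discovery algorithm searches over partitions for a high score of this kind; here the three-camp partition clearly beats the single-group one.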
What graph databases are used for
After introducing the main functions of the graph database, let's take a look at the application scenarios of the graph database. The application areas that graph databases are good at include:
Social networking: Facebook and Twitter use graph databases for social relationship management and friend recommendation
Take the familiar friend recommendation feature as an example: it can be implemented by recommending friends of friends.
Xu Shu and Sima Hui recommending Zhuge Liang to Liu Bei can be vividly illustrated by the following picture
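Friend-of-friend recommendation can be sketched as counting mutual friends. The function name `recommend_friends` and the friendship map are assumptions based on the Liu Bei example; a production system would weigh many more signals.

```python
from collections import Counter


def recommend_friends(friends, person):
    """Rank people who are not yet friends by number of mutual friends."""
    direct = friends.get(person, set())
    counts = Counter()
    for friend in direct:
        for candidate in friends.get(friend, set()):
            # Skip the person themselves and people they already know.
            if candidate != person and candidate not in direct:
                counts[candidate] += 1
    return counts.most_common()


# Hypothetical friendship map built from the example.
friends = {
    "Liu Bei": {"Xu Shu", "Sima Hui"},
    "Xu Shu": {"Liu Bei", "Zhuge Liang"},
    "Sima Hui": {"Liu Bei", "Zhuge Liang"},
    "Zhuge Liang": {"Xu Shu", "Sima Hui"},
}

recommendations = recommend_friends(friends, "Liu Bei")
```

Zhuge Liang is reached through both Xu Shu and Sima Hui, so he is recommended to Liu Bei with two mutual friends.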
E-commerce field: Huawei Mall uses it to realize real-time product recommendation
By analyzing the favorite products of the target user and other users, find other users who are similar, and recommend the products purchased by these users to the target user.
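The recommendation idea described above can be sketched with Jaccard similarity between users' product sets. This is one possible similarity measure chosen for illustration; the article does not say which method Huawei Mall uses, and the user names and products are made up.

```python
def jaccard(a, b):
    """Similarity of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_products(purchases, target, top_k=1):
    """Recommend products bought by the top_k most similar other users."""
    owned = purchases[target]
    others = [user for user in purchases if user != target]
    similar = sorted(
        others,
        key=lambda user: jaccard(owned, purchases[user]),
        reverse=True,
    )[:top_k]
    recommended = set()
    for user in similar:
        # Only suggest products the target does not already have.
        recommended |= purchases[user] - owned
    return recommended


# Hypothetical purchase history.
purchases = {
    "user_a": {"phone", "earbuds"},
    "user_b": {"phone", "earbuds", "watch"},
    "user_c": {"laptop"},
}

suggestions = recommend_products(purchases, "user_a")
```

Here user_b shares two of three products with user_a and is the most similar user, so the watch is suggested.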
Financial field: Industrial and Commercial Bank of China and JPMorgan Chase use it for risk control management
At present, the financial sector has an urgent demand for graph databases. Taking loans as an example, graph databases can play a huge role in the entire loan cycle.
Public security field: police use it to review suspicious relationships and uncover criminal gangs
At the end of the Eastern Han Dynasty, Cao Cao tried to assassinate Dong Zhuo, Diao Chan sowed discord between Dong Zhuo and his adopted son Lu Bu, and Lu Bu killed Dong Zhuo, yet Dong Zhuo did not know that Wang Yun was one of the main plotters behind these incidents, as shown in the following figure. The same can happen in reality: the real culprit behind the scenes may have no direct relationship with the target case, only an indirect one.
What kinds of scenarios are suitable for a graph database
You can judge whether your problem requires a graph database based on the following points:
If many-to-many relationships appear frequently in your problem, a graph database is recommended;
If the relationships between the data in your problem are very important, a graph database is the preferred choice;
If you need to handle relationships across large-scale data sets, a graph database is the preferred choice.
Graph database products
Nowadays, graph database products are flourishing. Neo4j, the representative of the established graph databases, still has many fans, but because of its own shortcomings the number of challengers is increasing. Among China's domestic graph databases, Huawei Cloud's Graph Engine Service (GES) is emerging as one of the best.
GES user interface