Neo4j is an open-source graph database, and Py2neo provides an interface for accessing Neo4j from Python. This article shows how to query the nodes and relationships in a graph with Py2neo's NodeMatcher and RelationshipMatcher, and how to query by executing Cypher statements.

This article uses a version of Py2neo from 2021.1 onward. For details, see the manual:
The Py2neo Handbook

Connect to Neo4j database

This article uses several data types, which are imported here:

import numpy as np
import pandas as pd
from py2neo import Node,Relationship,Graph,Path,Subgraph
from py2neo import NodeMatcher,RelationshipMatcher

Configure the access address, user name and password of the Neo4j database

neo4j_url = 'http://localhost:7474/'
user = 'neo4j'
pwd = 'admin'

Before Py2neo 2021.1, connect to the database like this:

graph = Graph(neo4j_url, username=user, password=pwd)

From Py2neo 2021.1 on, connect like this:

graph = Graph(neo4j_url, auth=(user, pwd))
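
The version difference can also be handled in code. The following is a minimal sketch, assuming the cutoff stated above (username/password before 2021.1, auth from 2021.1 on). graph_kwargs is a hypothetical helper, not part of Py2neo; it only builds the keyword arguments, so it runs without a database. The commented usage further assumes that py2neo.__version__ is available.

```python
import re


def graph_kwargs(py2neo_version, user, pwd):
    """Build Graph() keyword arguments for a given Py2neo version string.

    Hypothetical helper: follows the article's rule that 2021.1 and later
    use auth=(user, pwd), while earlier versions use username/password.
    """
    # Take the leading "major.minor" digits, e.g. "2021.2.3" -> (2021, 2).
    match = re.match(r"(\d+)\.(\d+)", py2neo_version)
    if match is None:
        raise ValueError("unrecognized version: {!r}".format(py2neo_version))
    major_minor = (int(match.group(1)), int(match.group(2)))
    if major_minor >= (2021, 1):
        return {"auth": (user, pwd)}
    return {"username": user, "password": pwd}


print(graph_kwargs("2021.2.3", "neo4j", "admin"))  # {'auth': ('neo4j', 'admin')}

# Usage (needs Py2neo and a running Neo4j server):
#   import py2neo
#   from py2neo import Graph
#   graph = Graph(neo4j_url, **graph_kwargs(py2neo.__version__, user, pwd))
```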

Take the following graph as an example:

  • The graph contains some Person nodes, each with name, age, and work attributes;
  • Among them, the "Zhao Zhao" (赵赵) node has multiple labels: in addition to Person, it also has the Teacher label;
  • Person nodes are connected to each other by relationships such as colleague (同事), neighbor (邻居), student (学生), and teacher (老师);
  • The graph also contains some Location nodes, connected to each other by a containment (包含) relationship;
  • A "visit" (到访) relationship connects Person nodes to Location nodes, and it has two attributes: date and stay_hours.

1. Query the types of nodes and relationships in the graph through graph.schema

Use graph.schema.node_labels to see the node labels, and graph.schema.relationship_types to see the relationship types. Both return a frozenset, a set whose elements cannot be added or removed.

>>>graph.schema.node_labels 
frozenset({'Location', 'Person', 'Teacher'})
>>>graph.schema.relationship_types
frozenset({'到访', '包含', '同事', '学生', '老师', '邻居'})
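
Because the return value is a frozenset, calls such as add fail, but it can be copied into a sorted list or a regular set. A small sketch, using a literal frozenset in place of graph.schema.node_labels so that it runs without a database; the 'Company' label is made up for the illustration:

```python
# Stand-in for graph.schema.node_labels (values taken from the output above).
node_labels = frozenset({'Location', 'Person', 'Teacher'})

# A frozenset has no add/remove methods, so it cannot be modified in place.
try:
    node_labels.add('Company')
    modified = True
except AttributeError:
    modified = False

# Membership tests work as with any set.
has_teacher = 'Teacher' in node_labels

# Copy into a sorted list when a mutable, ordered collection is needed.
sorted_labels = sorted(node_labels)

print(modified)       # False
print(has_teacher)    # True
print(sorted_labels)  # ['Location', 'Person', 'Teacher']
```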

2. Use NodeMatcher to query nodes

First create a NodeMatcher object. Use match to specify the label of the nodes to match, and use where to add filter conditions (there are two ways to write them). Note that a successful match returns a NodeMatch object rather than a Node. To get Node objects, call first to take the first node that meets the conditions, or convert the result to a list of nodes.

>>>node_matcher = NodeMatcher(graph)
>>>node = node_matcher.match("Person").where(age=20).first()
>>>node
Node('Person', age=20, name='李李', work='宇宙电子厂')
>>>nodes = list(node_matcher.match("Person").where(age=35))
>>>nodes
[Node('Person', age=35, name='王王', work='宇宙电子厂')]

There are two ways to write where conditions. The first is to pass the attributes and values to match as key=value, such as where(age=20) above. This form matches only exact values; it cannot compare values by size. Writing the following raises an error:

node = node_matcher.match("Person").where(age>20).first() # error

To filter by comparing values, or to do fuzzy string matching, write the conditional expression as a string and pass it to where. Inside the string, _ refers to the matched node. Of the following two examples, the first matches a Person node whose work attribute matches the pattern '月亮.*' (a name beginning with 月亮, "Moon"), and the second matches the Person nodes whose age is greater than 20.

>>>node = node_matcher.match("Person").where("_.work =~ '月亮.*'").first()
>>>node
Node('Person', 'Teacher', age=45, name='赵赵', work='月亮中学')
>>>nodes = list(node_matcher.match("Person").where("_.age > 20"))
>>>nodes
[Node('Person', age=35, name='王王', work='宇宙电子厂'),
 Node('Person', age=30, name='张张', work='宇宙电子厂'),
 Node('Person', 'Teacher', age=45, name='赵赵', work='月亮中学')]
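
String conditions like these can be assembled programmatically. The sketch below is only an illustration: age_between and find_people_by_age are hypothetical helpers, and node_matcher is assumed to be a NodeMatcher like the one created above. The first helper only builds the condition string, so it can be checked without a database.

```python
def age_between(low, high):
    """Return a where() condition string for low <= age <= high.

    Hypothetical helper; int() keeps arbitrary text out of the condition.
    """
    return "_.age >= {} AND _.age <= {}".format(int(low), int(high))


def find_people_by_age(node_matcher, low, high):
    """Return the Person nodes in the age range as a list.

    node_matcher is assumed to be a py2neo NodeMatcher.
    """
    return list(node_matcher.match("Person").where(age_between(low, high)))


print(age_between(20, 35))  # _.age >= 20 AND _.age <= 35

# Usage (needs a running Neo4j server):
#   nodes = find_people_by_age(NodeMatcher(graph), 20, 35)
```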

Once the result returned by NodeMatcher is converted to a Node or a list of Nodes, accessing attributes is simple. For example, to access the name attribute of the first node in the last result above:

>>>nodes[0]['name']
'王王'
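
Because a Node can be read like a dictionary, a list of nodes converts easily into plain records. A minimal sketch, assuming each item supports dict(item) as a Py2neo Node does; nodes_to_records is a hypothetical helper, and plain dicts stand in for nodes so that the example runs without a database.

```python
def nodes_to_records(nodes):
    """Copy the attributes of each node into a plain dict."""
    return [dict(node) for node in nodes]


# Plain dicts standing in for the Node objects returned above.
sample_nodes = [
    {'age': 35, 'name': '王王', 'work': '宇宙电子厂'},
    {'age': 30, 'name': '张张', 'work': '宇宙电子厂'},
]

records = nodes_to_records(sample_nodes)
names = [record['name'] for record in records]
print(names)  # ['王王', '张张']

# With pandas installed, the records can be loaded into a table:
#   import pandas as pd
#   df = pd.DataFrame(records)
```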

3. Use RelationshipMatcher to query relationships

The match method of RelationshipMatcher takes three kinds of parameters:

  • The first parameter is a sequence or set of nodes; None means any node. In a sequence, the order gives the direction (start node first, then end node), while a set matches in either direction;
  • The second parameter, r_type, is the relationship type; None means any type;
  • Any further parameters are the attributes to match, written as key=value.

The match method returns a RelationshipMatch object, which must be converted to a Relationship with first, or to a list of relationships.
For example, to query the relationships that start from the "Li Li" (李李) node, first query the node, then query its relationships. Passing r_type=None accepts any relationship type. The result includes visit (到访) and colleague (同事) relationships.

>>>relationship_matcher = RelationshipMatcher(graph)
>>>node1 = node_matcher.match("Person").where(name='李李').first()
>>>relationship = list(relationship_matcher.match([node1], r_type=None))
>>>relationship
[到访(Node('Person', age=20, name='李李', work='宇宙电子厂'), Node('Location', name='禄口机场'), date='2021/7/16', stay_hours=1),
 同事(Node('Person', age=20, name='李李', work='宇宙电子厂'), Node('Person', age=30, name='张张', work='宇宙电子厂')),
 同事(Node('Person', age=20, name='李李', work='宇宙电子厂'), Node('Person', age=35, name='王王', work='宇宙电子厂'))]

Example 2: query the relationship between the "Li Li" (李李) and "Zhang Zhang" (张张) nodes. The order of the nodes in the first parameter indicates the direction of the relationship to match: the first node is the start node and the second is the end node. Therefore, although the colleague relationship between the two nodes is bidirectional in the graph, the query returns only the relationship from the "Zhang Zhang" node to the "Li Li" node.

>>>node1 = node_matcher.match("Person").where(name='李李').first()
>>>node2 = node_matcher.match("Person").where(name='张张').first()
>>>relationship = list(relationship_matcher.match((node2,node1), r_type=None))
>>>relationship
[同事(Node('Person', age=30, name='张张', work='宇宙电子厂'), Node('Person', age=20, name='李李', work='宇宙电子厂'))]
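
As far as the Py2neo documentation describes it, the first parameter may be a sequence or a set: a sequence fixes the direction, while a set matches in either direction. The sketch below wraps that choice. node_argument and relationships_between are hypothetical helpers, and relationship_matcher is assumed to be a RelationshipMatcher.

```python
def node_argument(node_a, node_b, directed=True):
    """Build the first argument for RelationshipMatcher.match.

    A tuple means "from node_a to node_b"; a set means "in either direction".
    """
    if directed:
        return (node_a, node_b)
    return {node_a, node_b}


def relationships_between(relationship_matcher, node_a, node_b, directed=True):
    """Return the relationships between two nodes as a list.

    relationship_matcher is assumed to be a py2neo RelationshipMatcher.
    """
    nodes = node_argument(node_a, node_b, directed)
    return list(relationship_matcher.match(nodes, r_type=None))


# Strings stand in for Node objects to show the shape of the argument.
print(node_argument("zhang", "li"))  # ('zhang', 'li')
print(node_argument("zhang", "li", directed=False) == {"li", "zhang"})  # True

# Usage (needs a running Neo4j server):
#   both_ways = relationships_between(relationship_matcher, node1, node2, directed=False)
```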

Example 3: to query all relationships of a given type, pass None as the first parameter and set the second parameter, r_type, to the relationship type. The following queries every colleague (同事) relationship in the graph.

>>>relationship = list(relationship_matcher.match(None, r_type='同事'))
>>>relationship
[同事(Node('Person', age=20, name='李李', work='宇宙电子厂'), Node('Person', age=30, name='张张', work='宇宙电子厂')),
 同事(Node('Person', age=20, name='李李', work='宇宙电子厂'), Node('Person', age=35, name='王王', work='宇宙电子厂')),
 同事(Node('Person', age=35, name='王王', work='宇宙电子厂'), Node('Person', age=20, name='李李', work='宇宙电子厂')),
 同事(Node('Person', age=30, name='张张', work='宇宙电子厂'), Node('Person', age=20, name='李李', work='宇宙电子厂'))]

Example 4: filter by attribute value when querying relationships. Write the attribute as key=value and pass it as the third parameter of match. The following queries the visit (到访) relationships whose stay_hours attribute is 1.

>>>relationship = list(relationship_matcher.match(None, r_type='到访', stay_hours=1))
>>>relationship
[到访(Node('Person', age=20, name='李李', work='宇宙电子厂'), Node('Location', name='禄口机场'), date='2021/7/16', stay_hours=1)]

Although the Py2neo manual does not mention it, the result of RelationshipMatcher's match can also be chained with the where method to filter relationships by attribute value. The example above can also be written as follows, with the same effect.

relationship = list(relationship_matcher.match(None, r_type='到访').where(stay_hours=1))

Similarly, you can pass a string expression to where to filter relationships by comparing values. For example, to select all visit relationships with stay_hours >= 1:

>>>relationship = list(relationship_matcher.match(None, r_type='到访').where("_.stay_hours>=1"))
>>>relationship
[到访(Node('Person', age=20, name='李李', work='宇宙电子厂'), Node('Location', name='禄口机场'), date='2021/7/16', stay_hours=1),
 到访(Node('Person', age=20, name='刘刘', work='地球电子商务公司'), Node('Location', name='禄口机场'), date='2021/7/19', stay_hours=4)]

To access the parts of a returned result: a relationship holds a pair of nodes, start_node and end_node, as well as its type. Its attributes behave like a dictionary, so you can use the get method to read an attribute value.
Get the start and end nodes of the relationship:

>>>print(relationship[0].start_node['name'])
>>>print(relationship[0].end_node['name'])
李李
禄口机场

Get the relationship type as a text string:

>>>print(relationship[0])
>>>print(type(relationship[0]).__name__)
(李李)-[:到访 {date: '2021/7/16', stay_hours: 1}]->(禄口机场)
到访

Get the attributes and values of the relationship:

>>>print(relationship[0].keys())
>>>print(relationship[0].values())
>>>print(relationship[0].get('date'))
dict_keys(['date', 'stay_hours'])
dict_values(['2021/7/16', 1])
2021/7/16
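
These accessors can be combined to flatten a relationship into a single record. This is a sketch under stated assumptions: relationship_to_record is a hypothetical helper, and the small Visit class is only a stand-in that mimics the attributes used above (start_node, end_node, dictionary-style attributes) so that the code runs without a database.

```python
def relationship_to_record(rel):
    """Flatten a relationship into a dict of start, type, end and attributes."""
    record = {
        'start': rel.start_node['name'],
        'type': type(rel).__name__,
        'end': rel.end_node['name'],
    }
    # Relationship attributes behave like a dictionary.
    record.update(dict(rel))
    return record


class Visit(dict):
    """Stand-in for a py2neo relationship of type Visit (illustration only)."""

    def __init__(self, start_node, end_node, **attributes):
        super().__init__(**attributes)
        self.start_node = start_node
        self.end_node = end_node


rel = Visit({'name': '李李'}, {'name': '禄口机场'}, date='2021/7/16', stay_hours=1)
record = relationship_to_record(rel)
print(record['start'], record['type'], record['end'])  # 李李 Visit 禄口机场
print(record['stay_hours'])  # 1
```

With real query results, the same helper could be applied to each item of the relationship list, for example [relationship_to_record(r) for r in relationship].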

4. Query by executing Cypher statements

The matching conditions that NodeMatcher and RelationshipMatcher can express are relatively simple; more complex queries still need to be written in Cypher. Py2neo supports executing Cypher statements directly: write the query as a Cypher statement and run it with the graph.run method. The result can be converted to a pandas.DataFrame or pandas.Series object, so it connects easily with other data analysis tools.

For example, query the Person nodes whose work attribute is '宇宙电子厂' ("Universe Electronics Factory"). In Cypher, WHERE is followed by a conditional expression, and AS renames a returned attribute; to return several attributes, write xxx AS x, yyy AS y. Calling to_data_frame() on the result of graph.run converts the returned data to a pandas DataFrame, in which the names given with AS become the column names.

cypher_ = "MATCH (n:Person) \
WHERE n.work='宇宙电子厂' \
RETURN n.name AS name, n.age AS age "

df = graph.run(cypher_).to_data_frame() # pd.DataFrame
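
Values can also be passed to Cypher as parameters instead of being written into the statement text. The sketch below assumes that graph.run accepts keyword parameters, referenced as $name inside the statement; PEOPLE_BY_WORK and people_by_work are hypothetical names. A triple-quoted string avoids the backslash line continuations.

```python
# Hypothetical constant: the query above, with the work value as a parameter.
PEOPLE_BY_WORK = """
MATCH (n:Person)
WHERE n.work = $work
RETURN n.name AS name, n.age AS age
"""


def people_by_work(graph, work):
    """Run the query and return the result as a pandas DataFrame.

    graph is assumed to be a connected py2neo Graph.
    """
    return graph.run(PEOPLE_BY_WORK, work=work).to_data_frame()


# Usage (needs a running Neo4j server):
#   df = people_by_work(graph, '宇宙电子厂')
```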


Example 2: query which nodes are related to a known node, and by what kind of relationship. In a Cypher relationship pattern, < or > indicates the direction. Return type(r) here; if you return r directly, the result will be a null value.

>>>cypher_ = "MATCH (n:Person)-[r]->(m:Person) \
WHERE n.name='李李' \
RETURN type(r) AS type,m.name AS name"
>>>df = graph.run(cypher_).to_data_frame() # pd.DataFrame


Example 3: Cypher can also query paths. Because the number of paths returned is not known in advance, it is best to convert the result to a pandas.Series first, and then iterate over the paths to visit the nodes and relationships of each one.
The query here finds the relationship paths from the "Zhao Zhao" (赵赵) node to the "Wang Wang" (王王) node, where every relationship is a colleague (同事) or neighbor (邻居) relationship and the path is at most 4 relationships long.

>>>cypher_ = "MATCH path=(m:Person)-[:同事|邻居*1..4]->(n:Person) \
WHERE m.name='赵赵' AND n.name='王王' \
RETURN path"
>>>s = graph.run(cypher_).to_series()
>>>print(len(s))
>>>s[0]
1
Path(Node('Person', 'Teacher', age=45, name='赵赵', work='月亮中学'),
邻居(Node('Person', 'Teacher', age=45, name='赵赵', work='月亮中学'), 
Node('Person', age=30, name='张张', work='宇宙电子厂')), 
同事(Node('Person', age=30, name='张张', work='宇宙电子厂'), 
Node('Person', age=20, name='李李', work='宇宙电子厂')), 
同事(Node('Person', age=20, name='李李', work='宇宙电子厂'), 
Node('Person', age=35, name='王王', work='宇宙电子厂')))

Only one relationship path is found here. The output above also shows that Path is a relatively complex structure. The nodes and relationships of a Path are available through its nodes and relationships attributes, stored in the order in which they appear along the path. The following sample code prints each path directly and also builds a text description of it.

for path in s:
    # Print the path directly
    print(path)
    # Get the nodes and relationships on the path
    nodes = path.nodes
    relationships = path.relationships
    # Build the path text by hand
    path_text = ""
    for n, r in zip(nodes, relationships):
        # Append one node and one relationship type each time
        path_text += "{} - {} - ".format(n['name'], type(r).__name__)
    # Don't forget the last node
    path_text += nodes[-1]['name'] + '\n'
    print(path_text)

Running this code gives the following result. The first line is the path printed directly, and the second line is the text built by hand.

(赵赵)-[:邻居 {}]->(张张)-[:同事 {}]->(李李)-[:同事 {}]->(王王)
赵赵 - 邻居 - 张张 - 同事 - 李李 - 同事 - 王王
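
The text-building step can be separated into a small function that works on plain lists, which makes it easy to check without a database. A minimal sketch; path_to_text is a hypothetical helper, and the sample names and types are copied from the output above.

```python
def path_to_text(node_names, rel_types):
    """Interleave node names and relationship types into one line of text.

    A path with k relationships has k + 1 nodes.
    """
    if len(node_names) != len(rel_types) + 1:
        raise ValueError("expected one more node than relationships")
    parts = [node_names[0]]
    for rel_type, name in zip(rel_types, node_names[1:]):
        parts.extend([rel_type, name])
    return " - ".join(parts)


text = path_to_text(['赵赵', '张张', '李李', '王王'], ['邻居', '同事', '同事'])
print(text)  # 赵赵 - 邻居 - 张张 - 同事 - 李李 - 同事 - 王王

# Usage with a Path from the query above:
#   path_to_text([n['name'] for n in path.nodes],
#                [type(r).__name__ for r in path.relationships])
```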

Summary

When using Py2neo to query the nodes, relationships and paths in Neo4j, queries with simple conditions can be implemented through NodeMatcher and RelationshipMatcher. For more complex queries, you can write Cypher statements to query, and the query results can be converted into pandas DataFrame or Series data types and combined with other data analysis tools.

My Python version:

>>> import sys
>>> print(sys.version)
3.7.6 (default, Jan  8 2020, 13:42:34) 
[Clang 4.0.1 (tags/RELEASE_401/final)]
