Neo4j is an open source graph database. You can use Py2neo to access Neo4j in Python. This article introduces how to use Py2neo to access Neo4j and create nodes and relationships in batches.
Py2neo provides a method to directly execute Cypher statements, and also provides a series of data structures such as Node, Relationship, and Path, which can be used flexibly in different scenarios.
This article uses Py2neo 2021.1 or later. For reference, see the manual:
The Py2neo Handbook
Install Py2neo
Install Py2neo with pip:
pip install py2neo
Check what version of Py2neo is installed:
pip show py2neo
Name: py2neo
Version: 2021.1.5
Summary: Python client library and toolkit for Neo4j
Home-page: https://py2neo.org/
Connect to Neo4j database
This article uses several data types, so import them first:
import numpy as np
import pandas as pd
from py2neo import Node,Relationship,Graph,Path,Subgraph
Configure the access address, user name and password of the Neo4j database
neo4j_url = 'http://localhost:7474/'
user = 'neo4j'
pwd = 'admin'
The way to access the database before 2021.1 is:
graph = Graph(neo4j_url, username=user, password=pwd)
From 2021.1 on, the call looks like this (the two forms are not compatible with each other):
graph = Graph(neo4j_url, auth=(user, pwd))
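Before going further, it can help to confirm that the connection actually works. The following is a minimal sketch, not part of the original article: the helper name neo4j_is_reachable is hypothetical, and it assumes graph is a Py2neo Graph whose run method returns a cursor with an evaluate method.

```python
def neo4j_is_reachable(graph):
    """Return True if a trivial Cypher query round-trips successfully.

    `graph` is assumed to be a py2neo Graph (any object with a compatible
    run() method works). The helper name is hypothetical.
    """
    try:
        # run() returns a cursor; evaluate() yields the first value
        # of the first record, which should be 1 for this query.
        return graph.run("RETURN 1").evaluate() == 1
    except Exception:
        # Wrong URL, wrong credentials, or server down all end up here.
        return False
```

Calling neo4j_is_reachable(graph) right after constructing the Graph object surfaces a wrong address or password early, rather than in the middle of a batch import.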
1. Use graph.run to execute Cypher statements to create nodes
If you are familiar with Cypher statements, you can use graph.run to execute Cypher statements to implement operations such as creating nodes, as shown below:
cypher_ = "CREATE (:Person {name:'王王', age:35, work:'宇宙电子厂'}),\
(:Person {name:'李李', age:20, work:'宇宙电子厂'})"
graph.run(cypher_)
This creates two nodes labeled Person in Neo4j. The first has the name "王王" (Wang Wang), age 35, and work "宇宙电子厂" (Universe Electronics Factory). The second has the name "李李" (Li Li), age 20, and the same work value.
Similarly, relationships can be created by calling graph.run to execute Cypher statements.
cypher_ = "MATCH (from:Person{name:'王王'}),\
(to:Person{name:'李李'}) MERGE (from)-[r:同事]->(to)"
graph.run(cypher_)
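The statements above embed the values directly in the Cypher text. When the values come from variables, they can instead be passed as Cypher parameters. The sketch below assumes graph is a Py2neo Graph and relies on graph.run accepting a parameter dictionary as its second argument; the helper names create_person and connect_colleagues are hypothetical.

```python
def create_person(graph, name, age, work):
    """Create one Person node, passing the values as Cypher parameters."""
    cypher = "CREATE (p:Person {name: $name, age: $age, work: $work})"
    # The values travel separately from the query text, so quotes inside
    # a name cannot break or alter the statement.
    return graph.run(cypher, {"name": name, "age": age, "work": work})


def connect_colleagues(graph, name_a, name_b):
    """Link two existing Person nodes with a 同事 (colleague) relationship."""
    cypher = (
        "MATCH (a:Person {name: $a}), (b:Person {name: $b}) "
        "MERGE (a)-[:同事]->(b)"
    )
    return graph.run(cypher, {"a": name_a, "b": name_b})
```

For example, create_person(graph, "王王", 35, "宇宙电子厂") followed by connect_colleagues(graph, "王王", "李李") has the same effect as the literal statements above, provided both nodes exist.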
Neo4j now contains two Person nodes connected by a 同事 (colleague) relationship.
2. Use the Node data structure to create a node
Py2neo also provides the graph.create method for creating nodes and relationships:
node = Node("Person", name="李李", age=20, work="宇宙电子厂")
graph.create(node)
The effect is the same as executing the Cypher statement: a Person node is created in Neo4j.
Note that both creation methods, if executed repeatedly, create duplicate nodes in Neo4j. The duplicates have exactly the same name, age, and work attributes, but different ids in Neo4j.
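One way to avoid such duplicates is to use Cypher's MERGE instead of CREATE. The sketch below is an illustration rather than the article's own method. It assumes that name alone identifies a person, which is a simplification, and that graph is a Py2neo Graph; the helper name merge_person is hypothetical.

```python
def merge_person(graph, name, age, work):
    """Create the Person node only if no node with this name exists yet.

    Assumption: `name` uniquely identifies a person. MERGE matches on the
    name, then SET updates the remaining properties either way.
    """
    cypher = (
        "MERGE (p:Person {name: $name}) "
        "SET p.age = $age, p.work = $work"
    )
    return graph.run(cypher, {"name": name, "age": age, "work": work})
```

Calling merge_person(graph, "李李", 20, "宇宙电子厂") twice leaves a single node for that name, whereas the two creation methods above would leave two.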
3. Use Node, Relationship, and Subgraph data structures to create nodes and relationships
The two methods above create one node or one relationship at a time. Py2neo also provides a way to create nodes and relationships in batches, which performs better. Taking the graph in the following figure as an example, we use the Node, Relationship, and Subgraph data structures provided by Py2neo to create nodes and relationships in Neo4j.
First create some Node objects with the label Person. The first parameter is the label, and the attributes are passed as key=value keyword arguments. If a node has multiple labels, use Node.add_label("label_text") to add them.
node1 = Node("Person", name="王王", age=35, work="宇宙电子厂")
node2 = Node("Person", name="李李", age=20, work="宇宙电子厂")
node3 = Node("Person", name="张张", age=30, work="宇宙电子厂")
node4 = Node("Person", name="赵赵", age=45, work="月亮中学")
node4.add_label("Teacher")
node5 = Node("Person", name="刘刘", age=20, work="地球电子商务公司")
Create some more nodes labeled as Location
node6 = Node("Location", name="南京")
node7 = Node("Location", name="江宁区")
node8 = Node("Location", name="禄口机场")
Establish some relationships between Person nodes. Relationships in Neo4j are directed, so the first parameter of Relationship is the start node, the second parameter is the relationship type, and the third parameter is the end node. The colleague (同事) and neighbor (邻居) relationships are created in both directions here, while the student (学生) and teacher (老师) relationships are each one-way.
relation1 = Relationship(node1, "同事", node2)
relation2 = Relationship(node2, "同事", node1)
relation3 = Relationship(node2, "同事", node3)
relation4 = Relationship(node3, "同事", node2)
relation5 = Relationship(node3, "邻居", node4)
relation6 = Relationship(node4, "邻居", node3)
relation7 = Relationship(node4, "学生", node5)
relation8 = Relationship(node5, "老师", node4)
Create some relationships between Location and Location nodes, and the containment relationship between regions is one-way.
relation9 = Relationship(node6, "包含", node7)
relation10 = Relationship(node7, "包含", node8)
Create the relationships between Person nodes and Location nodes. The "visit" (到访) relationship has attributes: date is the date of the visit, and stay_hours is the length of the stay. You can store the attributes in a key: value dictionary and unpack it into the Relationship constructor:
properties1={'date':'2021-7-16','stay_hours':1}
relation11 = Relationship(node2, "到访", node8, **properties1)
properties2={'date':'2021-7-19','stay_hours':4}
relation12 = Relationship(node5, "到访", node8, **properties2)
Then combine all the above nodes and relationships into Subgraph
node_ls = [node1, node2, node3, node4,
node5, node6, node7, node8]
relation_ls = [relation1, relation2, relation3, relation4,
relation5, relation6, relation7, relation8,
relation9, relation10, relation11, relation12]
subgraph = Subgraph(node_ls, relation_ls)
Finally, submit these nodes and relationships in one batch through a transaction. Here tx.create only stages the nodes and relationships; nothing is created until graph.commit(tx) sends everything to Neo4j at once.
tx = graph.begin()
tx.create(subgraph)
graph.commit(tx)
Running tx.create again on the same, already-bound Node and Relationship objects does not create duplicate nodes or relationships. Rebuilding the objects with Node(...) and Relationship(...) and submitting them again would create duplicates, because the new objects are not yet bound to anything in the database. The manual describes this behavior:
create(subgraph): Create remote nodes and relationships that correspond to those in a local subgraph. Any entities in subgraph that are already bound to remote entities will remain unchanged, those which are not will become bound to their newly-created counterparts.
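For readers who prefer plain Cypher, a batch can also be sketched with UNWIND, which expands a list parameter into one row per element. This is an alternative technique, not the Subgraph approach described above. The sketch assumes graph is a Py2neo Graph; the helper name create_people_unwind and the batch_size default are hypothetical.

```python
def create_people_unwind(graph, rows, batch_size=1000):
    """Create Person nodes from a list of property dicts.

    One query is sent per batch of `batch_size` rows instead of one
    query per node. Returns the number of rows sent.
    """
    # UNWIND turns the $rows list into one row per dict;
    # SET p += row copies every key of the dict onto the new node.
    cypher = "UNWIND $rows AS row CREATE (p:Person) SET p += row"
    sent = 0
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        graph.run(cypher, {"rows": chunk})
        sent += len(chunk)
    return sent
```

A call such as create_people_unwind(graph, [{"name": "王王", "age": 35}, {"name": "李李", "age": 20}]) sends both nodes in a single query. Unlike the Subgraph approach, it does not give back bound Node objects on the Python side.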
Performance comparison
A simple experiment gives a rough comparison of the time cost of creating nodes one by one versus in a batch. Starting from an empty Neo4j database, 10,000 nodes are created with each method. Each node has two randomly generated attributes, name and age. The %%time cell magic of Jupyter Notebook measures the time overhead.
import random
N = 10000
Create nodes one by one
%%time
for i in range(N):
    random_name = "P"+str(round(random.random()*N*2))
    random_age = round(random.random()*15)
    node = Node("Person", name=random_name, age=random_age)
    graph.create(node)
CPU times: user 50.3 s, sys: 4.19 s, total: 54.5 s
Wall time: 5min 16s
Create nodes in batch
%%time
node_ls = []
for i in range(N):
    random_name = "P"+str(round(random.random()*N*2))
    random_age = round(random.random()*15)
    node = Node("Person", name=random_name, age=random_age)
    node_ls.append(node)
subgraph = Subgraph(node_ls, [])
tx = graph.begin()
tx.create(subgraph)
graph.commit(tx)
CPU times: user 448 ms, sys: 75.5 ms, total: 523 ms
Wall time: 1.46 s
The experiment also found that, when only nodes are created, the time overhead of the batch method grows almost linearly with the number of nodes. Submitting 100,000 nodes in a single transaction took about 4.5 seconds.
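Outside Jupyter, where the %%time magic is not available, the same kind of wall-clock measurement can be sketched with the standard library. The context manager name wall_time is hypothetical, and the summation below is only a stand-in for the node-creation code.

```python
import time
from contextlib import contextmanager


@contextmanager
def wall_time(label, results):
    """Store the elapsed wall-clock seconds of the block under results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the timed block raises.
        results[label] = time.perf_counter() - start


timings = {}
with wall_time("batch", timings):
    total = sum(range(100000))  # stand-in for the node-creation code

print("batch took %.4f s" % timings["batch"])
```

Wrapping the one-by-one loop and the batch submission in two separate with blocks gives wall times comparable to the "Wall time" figures reported above, though not the CPU breakdown.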
Summary
When using Py2neo to build a graph, use the batch creation method whenever possible. First create the Node and Relationship objects, then combine them into a Subgraph, and finally use a transaction to submit everything at once.
The next article will introduce how to use Py2neo to query nodes, relationships and paths.
My python version
>>> import sys
>>> print(sys.version)
3.7.6 (default, Jan 8 2020, 13:42:34)
[Clang 4.0.1 (tags/RELEASE_401/final)]