Abstract: Gremlin is one of the most widely used query languages for graph databases. Because Gremlin is Turing complete, it can express very complex queries. How do we write such a query for a complex problem, and how do we make sense of an existing one? This article walks through the debugging of a complex query step by step.
This article is shared from Huawei Cloud Community, "Complex Gremlin Query Debugging Method", original author: Uncle_Tom.
1. Introduction to Gremlin
Gremlin is the graph traversal language of the Apache TinkerPop framework. It is a functional data-flow language that lets users express complex property graph traversals or queries concisely. Each Gremlin traversal consists of a series of steps (which can be nested), and each step performs an atomic operation on the data stream.
Gremlin describes walks through a property graph. A graph traversal is performed in two parts.
1.1. Traversal Source (TraversalSource)
Start node selection. Every traversal begins by selecting a set of nodes in the database, which serve as the starting points for walking the graph. A traversal in Gremlin starts from a TraversalSource. GraphTraversalSource provides two methods for starting a traversal.
- GraphTraversalSource.V(Object... ids): start from the vertices of the graph (all vertices if no id is given).
- GraphTraversalSource.E(Object... ids): start from the edges of the graph (all edges if no id is given).
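As a small illustration of the two start steps, the sketch below runs them with and without ids. It assumes a Gremlin Console session (where TinkerFactory and the step names are pre-imported) and uses TinkerPop's built-in "modern" toy graph rather than any data from this article; the variable names are only illustrative.

```groovy
// Assumption: run inside the Gremlin Console, which pre-imports TinkerFactory.
g = TinkerFactory.createModern().traversal()   // TinkerPop's "modern" toy graph

allVertices = g.V().count().next()             // no ids: start from every vertex
allEdges    = g.E().count().next()             // no ids: start from every edge
markoName   = g.V(1).values('name').next()     // one id: start from vertex 1 only
someNames   = g.V(1, 4).values('name').toList() // several ids: start from vertices 1 and 4
```

In the modern graph both counts are 6 and vertex 1 is marko, so these values give a quick sanity check that a traversal source is wired up as expected.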
1.2. Graph Traversal (GraphTraversal)
Walking the graph. Starting from the nodes selected in the previous step, the traversal proceeds along the edges of the graph to adjacent nodes, guided by the properties and types of the nodes and edges. The ultimate goal of a traversal is to determine all the nodes it can reach. You can think of a graph traversal as the description of a subgraph, which must be executed to return nodes.
The return type of V() and E() is GraphTraversal. GraphTraversal offers many methods that themselves return a GraphTraversal, so calls can be chained (function composition). Each method of GraphTraversal is called a step, and each step modulates the result of the previous step in one of five general ways.
- map: transform the incoming traverser's object into another object (S → E).
- flatMap: transform the incoming traverser's object into an iterator of other objects (S → E*).
- filter: allow or disallow the traverser from proceeding to the next step (S → S ∪ ∅).
- sideEffect: let the traverser pass unchanged while producing some computational side effect along the way (S ↬ S).
- branch: split the traverser and send each part to an arbitrary location in the traversal (S → {S1 → E*, …, Sn → E*} → E*).
Almost every step in GraphTraversal extends MapStep, FlatMapStep, FilterStep, SideEffectStep, or BranchStep.
Example: find the people marko knows (using TinkerPop's "modern" toy graph):
gremlin> g.V().has('name','marko').out('knows').values('name')
==>vadas
==>josh
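To make the five kinds of step more concrete, here is a minimal sketch with one traversal per kind. It assumes a Gremlin Console session on a TinkerPop 3.x version where store() is still available (as in the rest of this article), and it uses the "modern" toy graph; the side-effect key 'seen' and the variable names are made up for illustration.

```groovy
// Assumption: Gremlin Console, TinkerPop 3.x with store() available.
g = TinkerFactory.createModern().traversal()

// map: each incoming object becomes exactly one outgoing object
mapped = g.V().has('name','marko').map(values('name')).toList()

// flatMap: each incoming object becomes zero or more outgoing objects
flat = g.V().has('name','marko').flatMap(out('knows')).values('name').toList()

// filter: an object either passes through or is dropped
filtered = g.V().filter(has('age', gt(30))).values('name').toList()

// sideEffect: vertices flow on unchanged while their names are collected in 'seen'
seen = g.V().hasLabel('person').store('seen').by('name').cap('seen').next()

// branch: each traverser is routed to one of two sub-traversals
branched = g.V().hasLabel('person').
             choose(has('age', gt(30)), constant('over 30'), constant('30 or under')).
             toList()
```

Evaluating the five variables one at a time in the console is itself a small instance of the debugging approach described later: each line isolates a single kind of step so its effect can be read directly.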
1.3. Gremlin is Turing Complete
This means that, in principle, any computable problem can be expressed in Gremlin.
The rest of this article gives guidance and a method for debugging and writing complex Gremlin queries.
2. Debugging of complex Gremlin queries
A complex Gremlin query is built by combining simple queries. Debugging one can therefore be broken into the following three steps, repeated iteratively until every part of the statement has been verified. The same method also works for writing complex Gremlin queries.
2.1. Iterative debugging steps
- Split the query into analysis steps, breaking the large problem into small ones, and verify them one at a time;
- Output the result of each sub-step so that it is clear exactly what the step produces;
- Deduce the expected result and compare it with the actual output. Widen or narrow the analysis step accordingly, then return to step 1 and continue until every result is understood.
Note: This method draws on the analysis and example in Stephen Mallette's gremlins-anatomy.
2.2. Use cases
2.2.1. Graph structure
gremlin> graph = TinkerGraph.open()
==>tinkergraph[vertices:0 edges:0]
gremlin> g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
gremlin> g.addV().property('name','alice').as('a').
           addV().property('name','bobby').as('b').
           addV().property('name','cindy').as('c').
           addV().property('name','david').as('d').
           addV().property('name','eliza').as('e').
           addE('rates').from('a').to('b').property('tag','ruby').property('value',9).
           addE('rates').from('b').to('c').property('tag','ruby').property('value',8).
           addE('rates').from('c').to('d').property('tag','ruby').property('value',7).
           addE('rates').from('d').to('e').property('tag','ruby').property('value',6).
           addE('rates').from('e').to('a').property('tag','java').property('value',10).
           iterate()
gremlin> graph
==>tinkergraph[vertices:5 edges:5]
2.2.2. Query statement
gremlin> g.V().has('name','alice').as('v').
           repeat(outE().as('e').inV().as('v')).
             until(has('name','alice')).
           store('a').
             by('name').
           store('a').
             by(select(all, 'v').unfold().values('name').fold()).
           store('a').
             by(select(all, 'e').unfold().
                store('x').
                  by(union(values('value'), select('x').count(local)).fold()).
                cap('x').
                store('a').by(unfold().limit(local, 1).fold()).unfold().
                sack(assign).by(constant(1d)).
                sack(div).by(union(constant(1d),tail(local, 1)).sum()).
                sack(mult).by(limit(local, 1)).
                sack().sum()).
           cap('a')
==>[alice,[alice,bobby,cindy,david,eliza,alice],[9,8,7,6,10],18.833333333333332]
That is long and complicated enough to cause a headache.
Let's take it apart and verify the results step by step.
2.3. Debugging process
2.3.1. Split the query
Following the execution order, split the statement into small queries, as shown in the figure in the original article:
- Run the first part of the query:
gremlin> g.V().has('name','alice').as('v').
......1> repeat(outE().as('e').inV().as('v')).
......2> until(has('name','alice'))
==>v[0]
2.3.2. Clarify the results
Here, valueMap() is appended to print the properties of the resulting node.
gremlin> g.V().has('name','alice').as('v').
......1> repeat(outE().as('e').inV().as('v')).
......2> until(has('name','alice')).valueMap()
==>[name:[alice]]
2.3.3. Validate the hypothesis
From the semantics of the statement executed so far, deduce how the query proceeds: starting at alice, it follows outgoing edges until it arrives back at alice.
Use path() to verify this deduction:
gremlin> g.V().has('name','alice').as('v').
......1> repeat(outE().as('e').inV().as('v')).
......2> until(has('name','alice')).path().next()
==>v[0]
==>e[10][0-rates->2]
==>v[2]
==>e[11][2-rates->4]
==>v[4]
==>e[12][4-rates->6]
==>v[6]
==>e[13][6-rates->8]
==>v[8]
==>e[14][8-rates->0]
==>v[0]
- If the output matches the deduction, extend the query with the next statement and go back to step 1;
- If the output does not match, or you do not understand it, narrow the scope: cut the query back to the step before this one and go back to step 1;
- Repeat until you fully understand the entire query.
gremlin> g.V().has('name','alice').as('v').
......1> repeat(outE().as('e').inV().as('v')).
......2> until(has('name','alice')).
......3> store('a').by('name')
==>v[0]
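The ==>v[0] above shows only the traverser, not what store('a') collected. As a sketch of how the intermediate state could be made visible, the block below rebuilds the example graph and adds two small variations: cap('a') to emit the side effect, and path() with by() modulators to print names and ratings instead of element ids. It assumes a Gremlin Console session on a TinkerPop 3.x version where store() is available; the variable names collected and hops are only illustrative.

```groovy
// Assumption: Gremlin Console (TinkerGraph and step names pre-imported).
g = TinkerGraph.open().traversal()
g.addV().property('name','alice').as('a').
  addV().property('name','bobby').as('b').
  addV().property('name','cindy').as('c').
  addV().property('name','david').as('d').
  addV().property('name','eliza').as('e').
  addE('rates').from('a').to('b').property('tag','ruby').property('value',9).
  addE('rates').from('b').to('c').property('tag','ruby').property('value',8).
  addE('rates').from('c').to('d').property('tag','ruby').property('value',7).
  addE('rates').from('d').to('e').property('tag','ruby').property('value',6).
  addE('rates').from('e').to('a').property('tag','java').property('value',10).
  iterate()

// Variation 1: emit the side effect 'a' instead of the traverser.
collected = g.V().has('name','alice').as('v').
              repeat(outE().as('e').inV().as('v')).
                until(has('name','alice')).
              store('a').by('name').
              cap('a').next()

// Variation 2: a readable path. The by() modulators are applied in turn,
// so vertices show their 'name' and edges show their 'value'.
hops = g.V().has('name','alice').as('v').
         repeat(outE().as('e').inV().as('v')).
           until(has('name','alice')).
         path().by('name').by('value').next()
```

Here collected holds the single name alice, which matches the first element of the final result, and hops alternates names and ratings around the cycle from alice back to alice.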
The remaining steps can be peeled back layer by layer in the same way; the full walk-through is omitted here.
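As a target to check against while peeling the innermost by(), here is a hedged reading of the last element of the final result. It assumes that the inner store('x') pairs each edge's value with the number of entries already in 'x' (that is, a zero-based position i), and that the sack steps then compute value / (1 + i) for each pair before summing. The symbol v_i for the i-th edge value is introduced here only for illustration.

```latex
\[
\text{score}
  = \sum_{i=0}^{4} \frac{v_i}{1+i}
  = \frac{9}{1} + \frac{8}{2} + \frac{7}{3} + \frac{6}{4} + \frac{10}{5}
  = 9 + 4 + 2.333\ldots + 1.5 + 2
  \approx 18.83
\]
```

This agrees with the 18.833333333333332 printed in the query result, which supports the reading but does not replace verifying each step in the console.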
3. Summary
- During analysis, split the query statement and understand it one piece at a time, widening the scope like a funnel as each piece becomes clear;
- To inspect the result of each step, use valueMap(), path(), select(), as(), cap(), and similar steps to output and verify it;
- When a step's result is unclear or differs from what you expected, narrow the query and use the step just before it as the output point for inspection and verification;
- Once the output of an earlier part is well understood, use inject() to feed that output in directly and continue verifying the steps that follow;
- Pay attention to how the final step of a query affects the overall output.
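The inject() technique mentioned above can be sketched as follows. The block feeds a hand-written list of [value, position] pairs, which is what the inner store('x') step of the example query appears to collect, straight into the sack steps copied from that query. It assumes a Gremlin Console session; the variable names pairs and score, and the pair values themselves, are assumptions made for this illustration rather than output captured from the original query.

```groovy
// Assumption: Gremlin Console (TinkerGraph, Operator and Scope values pre-imported).
g = TinkerGraph.open().traversal()   // an empty graph is enough; the data is injected

// Hypothetical upstream output: [rating value, zero-based position] for each edge.
pairs = [[9,0],[8,1],[7,2],[6,3],[10,4]]

score = g.inject(pairs).unfold().                               // one traverser per pair
          sack(assign).by(constant(1d)).                        // sack = 1.0
          sack(div).by(union(constant(1d), tail(local, 1)).sum()). // sack = 1 / (1 + position)
          sack(mult).by(limit(local, 1)).                       // sack = value / (1 + position)
          sack().sum().next()                                   // add up all sacks
```

The result is approximately 18.83, in line with the last element of the full query's output. Isolating the tail of a query this way keeps the repeat() and store() logic out of the picture while the arithmetic is being checked.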
4. Reference
- Introduction to Gremlin
- Gremlin’s Anatomy
- TinkerPop Documentation
- Stephen Mallette gremlins-anatomy
- Practical Gremlin - Why Graph?