This article was first published on the Nebula Graph Community public account
In graph theory, betweenness reflects the role and influence of a node in the whole network. This article describes how to compute Betweenness Centrality on top of the Nebula Graph database.
1. Introduction to the algorithm
Centrality measures how central a node is within the whole network graph. Common variants include degree centrality, closeness centrality, and betweenness centrality. Degree centrality characterizes the popularity of a node by its degree (that is, the number of edges incident to it). Closeness centrality characterizes how close a node is to all other nodes by summing the lengths of the shortest paths from that node to every other node in the graph.
Betweenness centrality measures how often a vertex appears on the shortest paths between other pairs of vertices, and thereby characterizes the importance of the node.
The betweenness centrality of a node is defined as the sum, over all pairs of other nodes, of the ratio of the number of shortest paths between the pair that pass through the node to the total number of shortest paths between the pair.
Computing the betweenness centrality of the nodes in a graph falls into two cases: weighted graphs and unweighted graphs. The two differ only in how shortest paths are found. For unweighted graphs, BFS (breadth-first search) is used; for weighted graphs, Dijkstra's algorithm is used.
The algorithms described below are all for undirected graphs.
2. Application scenarios
Betweenness reflects the role and influence of nodes in the entire network, and is mainly used to measure the degree to which a vertex assumes the role of a "bridge" in a graph or network. Node C in the graph is an important bridge node.
Centrality can be used to identify intermediary entities in anti-fraud scenarios in financial risk control. It can also be used in the pharmaceutical field to identify specific disease-control genes and thereby improve drug targeting.
3. Betweenness centrality formula
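The betweenness centrality of a node v is calculated as follows:
$$C_B(v) = \sum_{s \neq v \neq t \in V} \frac{\sigma_{st}(v)}{\sigma_{st}} \qquad \text{(Formula 1)}$$
where
$\sigma_{st}(v)$: the number of shortest paths from s to t passing through node v;
$\sigma_{st}$: the number of all shortest paths from node s to node t;
s and t are any pair of nodes belonging to the node set V.
For convenience of calculation, the betweenness contribution of each pair of vertices is defined as:
$$\delta_{st}(v) = \frac{\sigma_{st}(v)}{\sigma_{st}} \qquad \text{(Formula 2)}$$
Substituting Formula 2 into Formula 1 gives:
$$C_B(v) = \sum_{s \neq v \neq t \in V} \delta_{st}(v) \qquad \text{(Formula 3)}$$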
4. Solving ideas
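To find the betweenness centrality of node v, that is, to calculate $\delta_{st}(v)$ for every pair s and t, we first need to know whether node v is on a shortest path from s to t.
(1) To determine whether node v is on a shortest path from s to t, use the following condition, where $d(x, y)$ denotes the shortest path length between two nodes x and y.
When v is on a shortest path from s to t, we have
$$d(s, t) = d(s, v) + d(v, t) \qquad \text{(Formula 4)}$$
In addition, the choice of a shortest path from s to v and the choice of a shortest path from v to t are independent of each other. By the multiplication principle of combinatorics, the number of shortest paths from s to t that pass through v is the product of the number of shortest paths from s to v and the number of shortest paths from v to t.
So we have the following formula:
$$\sigma_{st}(v) = \begin{cases} \sigma_{sv} \cdot \sigma_{vt} & \text{if } d(s, t) = d(s, v) + d(v, t) \\ 0 & \text{otherwise} \end{cases} \qquad \text{(Formula 5)}$$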
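Formulas 4 and 5 are enough to compute betweenness centrality directly from the definition, without any further optimization. The following is a minimal Python sketch under stated assumptions: the edge list `EDGES` is the seven-edge undirected graph from the query output in section 6, treated as unweighted, and the helper names (`build_adjacency`, `bfs_counts`, `betweenness_by_definition`) are illustrative, not part of Nebula Graph or nebula-algorithm. Each unordered pair (s, t) is counted once.

```python
from collections import deque
from fractions import Fraction
from itertools import combinations

# Assumed example graph: the edges listed in section 6, treated as
# undirected and unweighted.
EDGES = [(1, 2), (1, 4), (1, 5), (2, 3), (2, 5), (3, 4), (4, 5)]


def build_adjacency(edges):
    """Build an undirected adjacency dict {node: set(neighbors)}."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def bfs_counts(adj, s):
    """Return (dist, sigma): shortest path length d(s, x) and the number
    of shortest paths sigma_sx for every node x reachable from s."""
    dist = {s: 0}
    sigma = {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:               # w discovered for the first time
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:      # u precedes w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma


def betweenness_by_definition(adj):
    """Sum sigma_st(v) / sigma_st over all unordered pairs (s, t)."""
    info = {s: bfs_counts(adj, s) for s in adj}
    bc = {v: Fraction(0) for v in adj}
    for s, t in combinations(sorted(adj), 2):
        dist_s, sigma_s = info[s]
        if t not in dist_s:                 # s and t are not connected
            continue
        for v in adj:
            if v == s or v == t:
                continue
            dist_v, sigma_v = info[v]
            if v not in dist_s or t not in dist_v:
                continue
            # Formula 4: v lies on a shortest path from s to t.
            if dist_s[v] + dist_v[t] == dist_s[t]:
                # Formula 5: sigma_st(v) = sigma_sv * sigma_vt.
                bc[v] += Fraction(sigma_s[v] * sigma_v[t], sigma_s[t])
    return bc


if __name__ == "__main__":
    for node, value in sorted(betweenness_by_definition(build_adjacency(EDGES)).items()):
        print(node, value)
```

This brute-force version checks every node against every pair, so it is only practical for small graphs; the steps that follow derive the recurrence that avoids this cost.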
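(2) From the formula above, we can get:
The number of shortest paths from node s to node t that pass through w is $\sigma_{st}(w)$. If node v is a predecessor of w on the shortest paths from s, then the number of shortest paths from s to t that pass through both v and w is:
$$\sigma_{st}(v, w) = \frac{\sigma_{sv}}{\sigma_{sw}} \cdot \sigma_{st}(w) \qquad \text{(Formula 6)}$$
Dividing by $\sigma_{st}$ gives the corresponding ratio $\delta_{st}(v, w) = \sigma_{st}(v, w) / \sigma_{st}$. There are two cases: $t = w$ and $t \neq w$.
(i) When $t = w$:
$$\delta_{sw}(v, w) = \frac{\sigma_{sv}}{\sigma_{sw}} \qquad \text{(Formula 7)}$$
(ii) When $t \neq w$:
$$\delta_{st}(v, w) = \frac{\sigma_{sv}}{\sigma_{sw}} \cdot \frac{\sigma_{st}(w)}{\sigma_{st}} \qquad \text{(Formula 8)}$$
(3) Adding up the two cases over all targets t gives the dependency of s on v, $\delta_{s\bullet}(v) = \sum_{t} \delta_{st}(v)$, that is, the accumulated share of the shortest paths from s to all other vertices that pass through v:
$$\delta_{s\bullet}(v) = \sum_{w:\, v \in P_s(w)} \frac{\sigma_{sv}}{\sigma_{sw}} \cdot \left(1 + \delta_{s\bullet}(w)\right) \qquad \text{(Formula 9)}$$
where $v \in P_s(w)$ means that v is a predecessor of w on a shortest path from s to w.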
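The step from the two cases to the recurrence can be written out explicitly. The derivation below is a sketch following the argument in the referenced paper; it assumes the notation $\delta_{st}(v, w)$ for the share of shortest paths from s to t that pass through both v and its successor w, $P_s(w)$ for the set of predecessors of w on shortest paths from s, and $\delta_{s\bullet}(v) = \sum_{t} \delta_{st}(v)$.

$$
\begin{aligned}
\delta_{s\bullet}(v) &= \sum_{t} \delta_{st}(v) = \sum_{t} \sum_{w:\, v \in P_s(w)} \delta_{st}(v, w) \\
&= \sum_{w:\, v \in P_s(w)} \left( \delta_{sw}(v, w) + \sum_{t \neq w} \delta_{st}(v, w) \right) \\
&= \sum_{w:\, v \in P_s(w)} \left( \frac{\sigma_{sv}}{\sigma_{sw}} + \sum_{t \neq w} \frac{\sigma_{sv}}{\sigma_{sw}} \cdot \frac{\sigma_{st}(w)}{\sigma_{st}} \right) \\
&= \sum_{w:\, v \in P_s(w)} \frac{\sigma_{sv}}{\sigma_{sw}} \cdot \left( 1 + \delta_{s\bullet}(w) \right)
\end{aligned}
$$

The last line uses $\sum_{t \neq w} \sigma_{st}(w) / \sigma_{st} = \delta_{s\bullet}(w)$, which is why the dependency of s on v can be accumulated from the dependencies of its successors, working backwards from the nodes farthest from s.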
(4) Based on the formula for $\delta_{s\bullet}(v)$ above, the paper gives the algorithm flow for solving betweenness centrality on unweighted graphs.
The unweighted graph implementation follows that flow.
For weighted graphs, the shortest-path search must be changed to Dijkstra's algorithm, that is, the code in the first while loop of that flow is replaced.
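The two-phase flow for unweighted graphs can be sketched in Python as follows. This is a minimal illustration of the algorithm from the referenced paper, not the nebula-algorithm Scala implementation: the adjacency-dict input and the name `brandes_unweighted` are assumptions made for this example, and the final halving assumes an undirected graph, where every pair is visited once from each endpoint.

```python
from collections import deque


def brandes_unweighted(adj):
    """Betweenness centrality for an undirected, unweighted graph.

    adj: dict mapping each node to an iterable of its neighbors.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Phase 1 (the first while loop): BFS from s, recording distances,
        # shortest-path counts (sigma) and predecessors.
        stack = []                              # nodes in order of discovery
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s] = 1
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:                 # w found for the first time
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:      # v precedes w on a shortest path
                    sigma[w] += sigma[v]
                    preds[w].append(v)

        # Phase 2: accumulate dependencies, farthest nodes first,
        # using delta(v) += sigma_sv / sigma_sw * (1 + delta(w)).
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]

    # Undirected graph: each (s, t) pair was counted from both ends.
    return {v: value / 2.0 for v, value in bc.items()}


if __name__ == "__main__":
    # Assumed example: the edges from section 6, treated as unweighted.
    graph = {1: [2, 4, 5], 2: [1, 3, 5], 3: [2, 4], 4: [1, 3, 5], 5: [1, 2, 4]}
    for node, value in sorted(brandes_unweighted(graph).items()):
        print("vid:", node, "BC:", value)
```

Each source node costs one BFS plus one pass over the recorded predecessors, so the whole computation is O(VE) on an unweighted graph, compared with checking every node against every pair.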
Betweenness Centrality computation for both weighted and unweighted graphs is implemented on top of Nebula Graph; see https://github.com/vesoft-inc/nebula-algorithm/blob/master/nebula-algorithm/src/main/scala/com/vesoft/nebula/algorithm/lib/BetweennessCentralityAlgo.scala.
5. Calculation example
First, read the graph data from Nebula Graph; you can specify which edge data to read.
Second, build a topology graph from the Nebula Graph edge data and run the centrality calculation.
As an example, take the graph read from Nebula Graph as an unweighted graph with the edges 1-2, 1-4, 1-5, 2-3, 2-5, 3-4, and 4-5:
Calculate the BC of node 1:
| Node pair whose shortest paths pass through node 1 | Total number of shortest paths between the pair | Number of shortest paths through node 1 |
|---|---|---|
| 2-4 | 3 (2-3-4, 2-5-4, 2-1-4) | 1 |
BC of node 1: 1/3
Calculate the BC of node 2:
| Node pair whose shortest paths pass through node 2 | Total number of shortest paths between the pair | Number of shortest paths through node 2 |
|---|---|---|
| 1-3 | 2 (1-2-3, 1-4-3) | 1 |
| 3-5 | 2 (3-2-5, 3-4-5) | 1 |
BC of node 2: 1/2 + 1/2 = 1
Calculate the BC of node 3:
| Node pair whose shortest paths pass through node 3 | Total number of shortest paths between the pair | Number of shortest paths through node 3 |
|---|---|---|
| 2-4 | 3 (2-3-4, 2-5-4, 2-1-4) | 1 |
BC of node 3: 1/3
Calculate the BC of node 4:
| Node pair whose shortest paths pass through node 4 | Total number of shortest paths between the pair | Number of shortest paths through node 4 |
|---|---|---|
| 1-3 | 2 (1-4-3, 1-2-3) | 1 |
| 3-5 | 2 (3-4-5, 3-2-5) | 1 |
BC of node 4: 1/2 + 1/2 = 1
Calculate the BC of node 5:
| Node pair whose shortest paths pass through node 5 | Total number of shortest paths between the pair | Number of shortest paths through node 5 |
|---|---|---|
| 2-4 | 3 (2-3-4, 2-5-4, 2-1-4) | 1 |
BC of node 5: 1/3
So the BC value for each node is:
1: 1/3
2: 1
3: 1/3
4: 1
5: 1/3
6. Example of Algorithm Results
Data: read the edge data from the Nebula Graph space test, and use srcId, dstId, and rank as the (source, destination, weight) triplet of each edge in the topology graph.
(root@nebula) [test]> match (v:node) -[e:relation] -> () return e
+------------------------------------+
| e |
+------------------------------------+
| [:relation "3"->"4" @1 {col: "f"}] |
+------------------------------------+
| [:relation "2"->"3" @2 {col: "d"}] |
+------------------------------------+
| [:relation "2"->"5" @4 {col: "e"}] |
+------------------------------------+
| [:relation "4"->"5" @2 {col: "g"}] |
+------------------------------------+
| [:relation "1"->"5" @1 {col: "a"}] |
+------------------------------------+
| [:relation "1"->"2" @3 {col: "b"}] |
+------------------------------------+
| [:relation "1"->"4" @5 {col: "c"}] |
+------------------------------------+
Read the Nebula Graph edge data, treat the graph as unweighted, and run the BC algorithm. The output is as follows:
vid: 4 BC: 1.0
vid: 1 BC: 0.3333333333333333
vid: 3 BC: 0.3333333333333333
vid: 5 BC: 0.3333333333333333
vid: 2 BC: 1.0
Read the Nebula Graph edge data, use the edge weights, and run the BC algorithm. The output is as follows:
vid: 4 BC: 2.0
vid: 1 BC: 0.5
vid: 3 BC: 1.0
vid: 5 BC: 2.0
vid: 2 BC: 0.0
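The weighted result can be cross-checked with a small Python sketch in which the BFS phase is replaced by Dijkstra's algorithm. It assumes the edges are undirected, the rank of each edge is a positive weight, and the triplets are copied from the query output above; `brandes_weighted` is an illustrative name, not the nebula-algorithm API.

```python
import heapq

# (source, destination, weight) triplets taken from the query output above.
EDGES = [("3", "4", 1), ("2", "3", 2), ("2", "5", 4), ("4", "5", 2),
         ("1", "5", 1), ("1", "2", 3), ("1", "4", 5)]


def brandes_weighted(edges):
    """Betweenness centrality for an undirected graph with positive weights."""
    adj = {}
    for a, b, weight in edges:
        adj.setdefault(a, []).append((b, weight))
        adj.setdefault(b, []).append((a, weight))

    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Phase 1: Dijkstra from s, tracking shortest-path counts (sigma)
        # and predecessors along shortest paths.
        order = []                      # nodes in non-decreasing distance from s
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        best = {s: 0}                   # best known distance per node
        done = set()
        heap = [(0, s)]
        while heap:
            d, v = heapq.heappop(heap)
            if v in done:
                continue                # stale heap entry
            done.add(v)
            order.append(v)
            for w, weight in adj[v]:
                nd = d + weight
                if w not in best or nd < best[w]:
                    # Strictly shorter path found: reset counts for w.
                    best[w] = nd
                    sigma[w] = sigma[v]
                    preds[w] = [v]
                    heapq.heappush(heap, (nd, w))
                elif nd == best[w]:
                    # Another shortest path of equal length reaches w via v.
                    sigma[w] += sigma[v]
                    preds[w].append(v)

        # Phase 2: accumulate dependencies, farthest nodes first.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]

    # Undirected graph: each (s, t) pair was counted from both ends.
    return {v: value / 2.0 for v, value in bc.items()}


if __name__ == "__main__":
    for vid, value in sorted(brandes_weighted(EDGES).items()):
        print("vid:", vid, "BC:", value)
```

Only the first phase differs from the unweighted case; the dependency accumulation is unchanged. Under the assumptions above, the values should agree with the weighted output listed in this section, for example 0.5 for node 1, where the pair 2-5 has two shortest paths of length 4 (the direct edge and 2-1-5).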
7. References
- Paper "A Faster Algorithm for Betweenness Centrality"
- The source code of Python's NetworkX implementation of betweenness centrality: https://github.com/networkx/networkx/blob/master/networkx/algorithms/centrality
If you find any errors or omissions in this article, please raise an issue on GitHub (https://github.com/vesoft-inc/nebula) or post in the Suggestions and Feedback category of the official forum (https://discuss.nebula-graph.com.cn/) 👏. To discuss graph database technology with the community, fill in your Nebula business card and the Nebula assistant will add you to the Nebula exchange group.