Dong Ge's Graph Theory Series, Part 5: Kruskal's Minimum Spanning Tree Algorithm

After reading this article, you will not only learn the algorithm's routine but also be able to solve the following LeetCode problems:

261. Graph Valid Tree (medium)
1135. Connecting Cities With Minimum Cost (medium)
1584. Min Cost to Connect All Points (medium)

-----------

The best-known graph algorithms are probably Dijkstra's shortest path algorithm, cycle detection and topological sorting, bipartite graph checking, and today's topic, the minimum spanning tree algorithm.

The main minimum spanning tree algorithms are Prim's algorithm and Kruskal's algorithm. Both are greedy, but their implementations differ considerably. This article covers Kruskal's algorithm; Prim's algorithm will get an article of its own.

Kruskal's algorithm is actually easy to understand and remember; the key is being familiar with the Union-Find algorithm. If you are not, I recommend reading the earlier article on the Union-Find algorithm first.

Let's start with the definition of a minimum spanning tree.

What is a minimum spanning tree

First, the fundamental difference between a "tree" and a "graph": a tree contains no cycles, while a graph may contain cycles.

If a connected graph has no cycles, it can be stretched out to look like a tree. Formally, a tree is an "acyclic connected graph".

So what is a "spanning tree" of a graph? The name says it: a tree found within the graph that includes every node of the graph. Formally, a spanning tree is an "acyclic connected subgraph" that contains all vertices of the graph.

Clearly, a graph can have many different spanning trees. For example, in the figure below, the red edges form two different spanning trees:

In a weighted graph every edge has a weight, so every spanning tree has a total weight. In the figure above, for example, the spanning tree on the right clearly has a smaller total weight than the one on the left.

The definition of a minimum spanning tree then follows naturally: among all possible spanning trees, one with the smallest total weight is called a "minimum spanning tree".
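To make the definition concrete, here is a minimal brute-force sketch (not from the original article) that computes the minimum spanning tree weight straight from the definition: it enumerates every subset of V - 1 edges, keeps the subsets that form a tree, and takes the smallest total weight. The class name BruteForceMst, the {u, v, weight} edge format, and the -1 return value for a graph with no spanning tree are illustrative assumptions. Because it examines up to 2^E subsets, it is only meant for tiny hand-made graphs, for example to cross-check a faster algorithm.

```java
// Brute-force minimum spanning tree weight, straight from the definition.
// Hypothetical helper for tiny graphs only: it examines up to 2^E edge subsets
// and assumes fewer than 31 edges (so 1 << m fits in an int) and non-negative weights.
class BruteForceMst {
    // n nodes numbered 0..n-1; edges[i] = {u, v, weight}.
    // Returns the smallest spanning-tree weight, or -1 if no spanning tree exists.
    static int minSpanningTreeWeight(int n, int[][] edges) {
        int m = edges.length;
        int best = -1;
        for (int mask = 0; mask < (1 << m); mask++) {
            // a tree on n nodes has exactly n - 1 edges
            if (Integer.bitCount(mask) != n - 1) continue;
            int[] parent = new int[n];
            for (int i = 0; i < n; i++) parent[i] = i;
            boolean acyclic = true;
            int weight = 0;
            for (int i = 0; i < m && acyclic; i++) {
                if ((mask & (1 << i)) == 0) continue; // edge i is not in this subset
                int ru = root(parent, edges[i][0]);
                int rv = root(parent, edges[i][1]);
                if (ru == rv) {
                    acyclic = false;                  // edge i would close a cycle
                } else {
                    parent[ru] = rv;                  // merge the two components
                    weight += edges[i][2];
                }
            }
            // n - 1 edges without a cycle always connect all n nodes,
            // so every surviving subset is a spanning tree
            if (acyclic && (best == -1 || weight < best)) best = weight;
        }
        return best;
    }

    // Follow parent pointers up to the component's root.
    private static int root(int[] parent, int x) {
        while (parent[x] != x) x = parent[x];
        return x;
    }
}
```

For a hypothetical 4-node graph with edges {0,1,1}, {1,2,2}, {2,3,3}, {0,3,4}, {0,2,5}, the three lightest edges already form a tree, so the result is 6.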

PS: Minimum spanning tree algorithms are generally applied to undirected weighted graphs, so in real-world scenarios the edge weights usually represent scalar quantities such as cost or distance.

Before talking about the Kruskal algorithm, we need to review the Union-Find algorithm.

Union-Find algorithm

As just mentioned, a spanning tree of a graph is an "acyclic connected subgraph" containing all of its vertices, and a minimum spanning tree is the spanning tree with the smallest total weight.

Since this is a question of connectivity, longtime readers will probably think of the Union-Find algorithm, which handles connected components in a graph efficiently.

The earlier article explaining the Union-Find algorithm in detail covers how it works; it mainly relies on a size array (union by size) and path compression to make connectivity checks efficient.

Readers unfamiliar with Union-Find can read that article first. To save space, this article simply gives the Union-Find implementation:

class UF {
    // number of connected components
    private int count;
    // parent pointers; each tree is one connected component
    private int[] parent;
    // the "weight" (node count) of each tree
    private int[] size;

    // n is the number of nodes in the graph
    public UF(int n) {
        this.count = n;
        parent = new int[n];
        size = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i;
            size[i] = 1;
        }
    }
    
    // connect node p and node q
    public void union(int p, int q) {
        int rootP = find(p);
        int rootQ = find(q);
        if (rootP == rootQ)
            return;
        
        // attach the smaller tree under the larger one to keep trees balanced
        if (size[rootP] > size[rootQ]) {
            parent[rootQ] = rootP;
            size[rootP] += size[rootQ];
        } else {
            parent[rootP] = rootQ;
            size[rootQ] += size[rootP];
        }
        // two connected components merge into one
        count--;
    }

    // check whether node p and node q are connected
    public boolean connected(int p, int q) {
        int rootP = find(p);
        int rootQ = find(q);
        return rootP == rootQ;
    }

    // return the root of the connected component containing x
    private int find(int x) {
        while (parent[x] != x) {
            // path compression
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }

    // return the number of connected components in the graph
    public int count() {
        return count;
    }
}
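The following sketch repeats the UF class above in compact form so it compiles on its own, and adds a hypothetical helper, countsAfterUnions, that records count() after each union. It highlights one behavior the rest of the article relies on: calling union on two nodes that are already connected changes nothing, so count() stays the same.

```java
// Compact copy of the UF class (union by size + path compression),
// plus a hypothetical driver that records count() after each union call.
class UFDemo {
    static class UF {
        private int count;           // number of connected components
        private final int[] parent;  // parent[x] is x's parent; a root points to itself
        private final int[] size;    // size[r] = number of nodes in the tree rooted at r

        UF(int n) {
            count = n;
            parent = new int[n];
            size = new int[n];
            for (int i = 0; i < n; i++) { parent[i] = i; size[i] = 1; }
        }

        void union(int p, int q) {
            int rootP = find(p), rootQ = find(q);
            if (rootP == rootQ) return;  // already connected: nothing changes
            if (size[rootP] > size[rootQ]) { parent[rootQ] = rootP; size[rootP] += size[rootQ]; }
            else { parent[rootP] = rootQ; size[rootQ] += size[rootP]; }
            count--;
        }

        boolean connected(int p, int q) { return find(p) == find(q); }

        private int find(int x) {
            while (parent[x] != x) {
                parent[x] = parent[parent[x]];  // path compression: point x at its grandparent
                x = parent[x];
            }
            return x;
        }

        int count() { return count; }
    }

    // Hypothetical helper: applies each {p, q} union in order
    // and returns the count() observed after every call.
    static int[] countsAfterUnions(UF uf, int[][] pairs) {
        int[] counts = new int[pairs.length];
        for (int i = 0; i < pairs.length; i++) {
            uf.union(pairs[i][0], pairs[i][1]);
            counts[i] = uf.count();
        }
        return counts;
    }
}
```

On 5 nodes with unions {0,1}, {1,2}, {3,4}, {0,2}, the recorded counts are 4, 3, 2, 2: the last union joins two nodes that are already in the same component, which is exactly the situation in which an edge would create a cycle.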

An earlier article on Union-Find applications introduced several scenarios where the algorithm is useful. In Kruskal's algorithm, its main job is to guarantee that the result is a valid spanning tree.

While building a minimum spanning tree, you must first make sure that what you build is actually a tree (contains no cycles), and Union-Find does exactly that.

How does it do this? Let's look at LeetCode problem 261, "Graph Valid Tree". Here is the problem:

You are given n nodes numbered from 0 to n - 1 and a list of undirected edges edges (each edge is a pair of nodes). Determine whether the structure formed by these edges is a tree.

The function signature is as follows:

boolean validTree(int n, int[][] edges);

For example, given the input:

n = 5
edges = [[0,1], [0,2], [0,3], [1,4]]

These edges form a tree, so the algorithm should return true:

But if the input is:

n = 5
edges = [[0,1],[1,2],[2,3],[1,3],[1,4]]

the result is not a tree, because it contains the cycle 1-2-3:

Let's think about this problem: under what circumstances does adding an edge turn a tree into a graph (that is, create a cycle)?

Clearly, adding an edge like this creates a cycle:

However, adding an edge like this does not:

The rule can be summarized as follows:

When adding an edge, if its two nodes are already in the same connected component, the edge creates a cycle; conversely, if its two nodes are in different connected components, adding the edge does not create a cycle.

Checking whether two nodes are connected (in the same connected component) is exactly what Union-Find excels at, so the solution to this problem is:

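// Check whether the given edges form a tree
boolean validTree(int n, int[][] edges) {
    // initialize n nodes, 0...n-1
    UF uf = new UF(n);
    // go through every edge and connect its two endpoints
    for (int[] edge : edges) {
        int u = edge[0];
        int v = edge[1];
        // if the two nodes are already in the same connected component, this edge creates a cycle
        if (uf.connected(u, v)) {
            return false;
        }
        // this edge does not create a cycle, so it can be part of the tree
        uf.union(u, v);
    }
    // make sure exactly one tree was formed, i.e. there is only one connected component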
    return uf.count() == 1;
}

class UF {
    // see the implementation above
}
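Because UF is only referenced by a placeholder above, here is a self-contained sketch of validTree that can be run directly on the two example inputs. For brevity it assumes a simplified Union-Find: a bare parent array with path halving but no size array, so it does not balance trees the way the UF class does. The class name GraphValidTree is illustrative.

```java
// Self-contained sketch of validTree with a simplified Union-Find
// (parent array + path halving, no union by size).
class GraphValidTree {
    static boolean validTree(int n, int[][] edges) {
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i; // every node starts as its own component
        int components = n;
        for (int[] edge : edges) {
            int ru = find(parent, edge[0]);
            int rv = find(parent, edge[1]);
            if (ru == rv) return false; // already connected: this edge would create a cycle
            parent[ru] = rv;            // merge the two components
            components--;
        }
        // no cycle found; it is a tree only if everything ended up in one component
        return components == 1;
    }

    private static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]]; // path halving: skip to the grandparent
            x = parent[x];
        }
        return x;
    }
}
```

With the article's inputs, the first edge list returns true and the second returns false (edge [1,3] closes the cycle 1-2-3). An input such as n = 4, edges = [[0,1],[2,3]] also returns false: it has no cycle but leaves two connected components, which is why the final count check is needed.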

If you understand the idea behind this solution, mastering Kruskal's algorithm will be easy.

Kruskal algorithm

A minimum spanning tree is a set of edges from the graph (we'll call this set mst, short for minimum spanning tree). These edges must:

1. Include all nodes in the graph.

2. Form a tree (that is, contain no cycle).

3. Have the smallest possible total weight.

Thanks to the groundwork laid by the previous problem, the first two conditions are easy to satisfy with Union-Find. The key is the third: how to make sure the resulting spanning tree has the smallest total weight.

This is where the greedy idea comes in:

Sort all edges by weight in ascending order and go through them starting from the lightest. If an edge does not form a cycle with the edges already in mst, it is part of the minimum spanning tree, so add it to mst; otherwise it is not part of the minimum spanning tree, so skip it.
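The greedy rule above can be sketched as a generic function that returns the chosen edges rather than just their total weight. This is a minimal illustration under a few assumptions of our own: nodes are numbered 0..n-1, each edge is a {u, v, weight} array, Union-Find is reduced to a parent array with path halving, and the loop stops early once n - 1 edges are chosen (a small optimization the article's solutions do not use). The class name KruskalSketch is hypothetical, and the function returns null when the graph is disconnected and no spanning tree exists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Generic Kruskal sketch: returns the MST edges themselves.
// Assumptions: nodes 0..n-1, edges as {u, v, weight}, simplified Union-Find.
class KruskalSketch {
    static List<int[]> kruskal(int n, int[][] edges) {
        int[][] sorted = edges.clone();                           // shallow copy: caller's array order is left alone
        Arrays.sort(sorted, Comparator.comparingInt(e -> e[2])); // greedy: lightest edges first
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;
        List<int[]> mst = new ArrayList<>();
        for (int[] e : sorted) {
            if (mst.size() == n - 1) break;   // optional early exit: a spanning tree has n - 1 edges
            int ru = find(parent, e[0]);
            int rv = find(parent, e[1]);
            if (ru == rv) continue;           // would form a cycle with the edges already in mst
            parent[ru] = rv;                  // merge components and keep the edge
            mst.add(e);
        }
        // fewer than n - 1 edges means some node was never reached
        return mst.size() == n - 1 ? mst : null;
    }

    private static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];    // path halving
            x = parent[x];
        }
        return x;
    }
}
```

Sorting with Comparator.comparingInt sidesteps the integer overflow that a subtraction comparator such as a[2] - b[2] could hit with extreme weights; for small non-negative weights like those in the problems below, either works.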

When the traversal finishes (and assuming the graph is connected), the edges in mst form a minimum spanning tree. Let's look at two problems that use Kruskal's algorithm.

The first is LeetCode problem 1135, "Connecting Cities With Minimum Cost", a standard minimum spanning tree problem:

Each city corresponds to a node in the graph, the cost of connecting two cities corresponds to the edge weight, and the minimum cost to connect all cities is the total weight of the minimum spanning tree.

int minimumCost(int n, int[][] connections) {
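    // cities are numbered 1...n, so allocate n + 1 slots
    UF uf = new UF(n + 1);
    // sort all edges by weight in ascending order
    Arrays.sort(connections, (a, b) -> (a[2] - b[2]));
    // total weight of the minimum spanning tree
    int mst = 0;
    for (int[] edge : connections) {
        int u = edge[0];
        int v = edge[1];
        int weight = edge[2];
        // if this edge would create a cycle, it cannot join mst
        if (uf.connected(u, v)) {
            continue;
        }
        // this edge does not create a cycle, so it belongs to the minimum spanning tree
        mst += weight;
        uf.union(u, v);
    }
    // make sure all nodes are connected
    // normally uf.count() == 1 would mean every node is connected,
    // but node 0 is never used, so it occupies an extra connected component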
    return uf.count() == 2 ? mst : -1;
}

class UF {
    // see the implementation above
}
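To run minimumCost without the rest of the article, the sketch below inlines a minimal Union-Find (a parent array with path halving, no size array) and tracks the component count itself. It keeps the article's handling of 1-based city numbers: slot 0 is allocated but never used, so a fully connected result leaves 2 components. The class name ConnectingCities and the small test inputs are illustrative.

```java
import java.util.Arrays;

// Self-contained sketch of minimumCost with a simplified Union-Find.
class ConnectingCities {
    static int minimumCost(int n, int[][] connections) {
        // cities are numbered 1..n, so allocate n + 1 slots; slot 0 stays unused
        int[] parent = new int[n + 1];
        for (int i = 0; i <= n; i++) parent[i] = i;
        int components = n + 1;
        // sort edges by cost in place; Integer.compare avoids subtraction overflow
        Arrays.sort(connections, (a, b) -> Integer.compare(a[2], b[2]));
        int mst = 0;
        for (int[] edge : connections) {
            int ru = find(parent, edge[0]);
            int rv = find(parent, edge[1]);
            if (ru == rv) continue;   // would create a cycle: skip
            parent[ru] = rv;          // this edge joins the minimum spanning tree
            components--;
            mst += edge[2];
        }
        // all cities connected = one component for the cities + the unused slot 0
        return components == 2 ? mst : -1;
    }

    private static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]]; // path halving
            x = parent[x];
        }
        return x;
    }
}
```

For n = 3 with connections {{1,2,5},{1,3,6},{2,3,1}}, the edges of cost 1 and 5 are taken and the cost-6 edge is skipped, giving 6; for n = 4 with only {{1,2,3},{3,4,4}}, two groups of cities stay disconnected and the result is -1.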

That solves the problem. The overall approach is very close to the previous one: you can think of Kruskal's algorithm as the tree-validation logic plus sorting the edges by weight.

Next, let's look at LeetCode problem 1584, "Min Cost to Connect All Points":

Take the example given in the problem:

points = [[0,0],[2,2],[3,10],[5,2],[7,0]]

The algorithm should return 20, connecting the points as follows:

This, too, is a standard minimum spanning tree problem: each point is a node in an undirected weighted graph, the edge weight is the Manhattan distance between the two points, and the minimum cost to connect all points is the total weight of the minimum spanning tree.

So the approach is to first generate all edges and their weights, then run Kruskal's algorithm on those edges:

int minCostConnectPoints(int[][] points) {
    int n = points.length;
    // generate all edges and their weights
    List<int[]> edges = new ArrayList<>();
    for (int i = 0; i < n; i++) {
        for (int j = i + 1; j < n; j++) {
            int xi = points[i][0], yi = points[i][1];
            int xj = points[j][0], yj = points[j][1];
            // represent each point by its index in points
            edges.add(new int[] {
                i, j, Math.abs(xi - xj) + Math.abs(yi - yj)
            });
        }
    }
    // sort edges by weight in ascending order
    Collections.sort(edges, (a, b) -> {
        return a[2] - b[2];
    });
    // run Kruskal's algorithm
    int mst = 0;
    UF uf = new UF(n);
    for (int[] edge : edges) {
        int u = edge[0];
        int v = edge[1];
        int weight = edge[2];
        // if this edge would create a cycle, it cannot join mst
        if (uf.connected(u, v)) {
            continue;
        }
        // this edge does not create a cycle, so it belongs to the minimum spanning tree
        mst += weight;
        uf.union(u, v);
    }
    return mst;
}

This problem adds a small twist: each point is a pair of coordinates, so a weighted edge would naturally be a 5-tuple (two coordinate pairs plus a weight), which is awkward to feed into Union-Find. Instead, we represent each point by its index in the points array, so the earlier Kruskal logic can be reused as is.
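Here is a self-contained sketch of minCostConnectPoints for running the example directly. As with the other sketches, the minimal Union-Find (a parent array with path halving) and the class name MinCostPoints are our own simplifications. Note that n points produce n(n - 1)/2 candidate edges, so building and sorting them dominates the cost.

```java
import java.util.ArrayList;
import java.util.List;

// Self-contained sketch of minCostConnectPoints with a simplified Union-Find.
class MinCostPoints {
    static int minCostConnectPoints(int[][] points) {
        int n = points.length;
        // build all n(n-1)/2 edges; a point is identified by its index in points
        List<int[]> edges = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                int dist = Math.abs(points[i][0] - points[j][0])
                         + Math.abs(points[i][1] - points[j][1]); // Manhattan distance
                edges.add(new int[] {i, j, dist});
            }
        }
        edges.sort((a, b) -> Integer.compare(a[2], b[2]));
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;
        int mst = 0;
        for (int[] e : edges) {
            int ru = find(parent, e[0]);
            int rv = find(parent, e[1]);
            if (ru == rv) continue;  // would create a cycle
            parent[ru] = rv;
            mst += e[2];
        }
        // the graph is complete, so it is always connected; no count check needed
        return mst;
    }

    private static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]]; // path halving
            x = parent[x];
        }
        return x;
    }
}
```

On the example points [[0,0],[2,2],[3,10],[5,2],[7,0]], Kruskal picks edges of length 3, 4, 4 and 9, for a total of 20.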

Having worked through these three problems, you should now have a good grasp of Kruskal's algorithm. The main difficulty lies in using Union-Find to keep the chosen edges a valid tree, combined with the greedy idea of processing edges in sorted order, so that the resulting spanning tree has the smallest total weight.

Finally, let's analyze the complexity of Kruskal's algorithm.

Suppose the graph has V nodes and E edges. Storing all the edges takes O(E) space and the Union-Find structure takes O(V), so the total space complexity of Kruskal's algorithm is O(V + E).

Most of the time is spent sorting, which takes O(E log E). With union by size and path compression, each Union-Find operation takes amortized O(α(V)) time, where α is the inverse Ackermann function; it grows so slowly that it can be treated as O(1). The for loop therefore costs O(E), and the total time complexity is O(E log E).
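Because a simple undirected graph (no parallel edges) has at most V(V - 1)/2 edges, the sorting bound can also be stated in terms of V. A short derivation, under that no-parallel-edges assumption:

```latex
E \le \frac{V(V-1)}{2} < V^2
\quad\Longrightarrow\quad
\log E < 2\log V
\quad\Longrightarrow\quad
O(E \log E) = O(E \log V)
```

For problem 1584 in particular, n points yield E = n(n - 1)/2 edges, so the solution above runs in O(n^2 log n) time and uses O(n^2) space.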

That concludes this article. A simple proof of this greedy strategy, along with Prim's minimum spanning tree algorithm, is left for a follow-up article.

＿＿＿＿＿＿＿＿＿＿＿＿＿

For more high-quality algorithm articles, click on my avatar. I'll walk you through LeetCode step by step, dedicated to making algorithms clear! My algorithm tutorial has earned 90k stars; likes are welcome!


labuladong
