Kruskal’s Algorithm Explained

Imagine you are a network engineer tasked with connecting multiple cities with fiber cables. Your company wants to minimize costs, but every city must remain connected. How do you decide the optimal way to connect them? This is where graph algorithms like Kruskal’s Algorithm step in.

Kruskal’s Algorithm is a greedy approach to solve the Minimum Spanning Tree (MST) problem in weighted graphs. It ensures that the total cost of connecting all nodes is the minimum possible, without forming unnecessary loops.

Understanding Graphs

Graphs are mathematical structures used to model relationships. They consist of:

Vertices (nodes): Points representing entities (e.g., cities, computers).
Edges (connections): Links between vertices (e.g., roads, cables).
Weights: Costs associated with edges (e.g., distance, expense, time).

Graphs can be connected (all nodes are reachable) or disconnected (some nodes isolated). For Kruskal’s Algorithm, the graph should ideally be connected.

What is a Minimum Spanning Tree (MST)?

A spanning tree connects all vertices with exactly (V-1) edges, where V = number of vertices.
A minimum spanning tree minimizes the total edge weights while keeping all nodes connected.

Real-world example: Suppose we want to lay roads between villages. An MST ensures all villages are connected with the minimum total road length.

Introduction to Kruskal’s Algorithm

Introduced by Joseph Kruskal in 1956, this algorithm builds the MST by always picking the lowest-weight edge available, provided it doesn’t form a cycle. That’s why it is a greedy algorithm — it takes the locally optimal choice (smallest edge) at each step.

Core Idea of Kruskal’s Algorithm

The main principle is simple:

Start with all nodes separate.
Add edges one by one in increasing weight order.
Use Union-Find (Disjoint Set Union) to avoid cycles.
Stop once we have (V-1) edges in the MST.

Detailed Steps of Kruskal’s Algorithm

Sort all edges in non-decreasing order of weight.
Initialize an empty MST.
For each edge in sorted order:
- If it connects two different sets (no cycle), include it in MST.
- Otherwise, discard it.
Repeat until MST has (V-1) edges.

Step-by-Step Example

Consider a graph with 5 nodes and 7 edges:

Edges: (A-B, 2), (A-C, 3), (B-C, 1), (B-D, 4), (C-D, 5), (C-E, 6), (D-E, 7).

Step 1: Sort → (B-C,1), (A-B,2), (A-C,3), (B-D,4), (C-D,5), (C-E,6), (D-E,7).
Step 2: Pick (B-C,1).
Step 3: Pick (A-B,2).
Step 4: Skip (A-C,3) (forms cycle).
Step 5: Pick (B-D,4).
Step 6: Pick (C-E,6).

Final MST edges: (B-C,1), (A-B,2), (B-D,4), (C-E,6).
Total cost = 13.

Pseudocode

Kruskal(Graph):
   MST = {}
   Sort all edges by weight
   For edge (u, v) in sorted order:
       If find(u) != find(v):
           Add (u, v) to MST
           Union(u, v)
   Return MST

Kruskal’s Algorithm in Programming Languages

C Implementation

#include <stdio.h>
#include <stdlib.h>

struct Edge {
    int u, v, w;
};

struct Edge edges[100];
int parent[100];

int find(int x) {
    if (parent[x] != x)
        parent[x] = find(parent[x]);
    return parent[x];
}

void unionSet(int x, int y) {
    parent[find(x)] = find(y);
}

int compare(const void* a, const void* b) {
    return ((struct Edge*)a)->w - ((struct Edge*)b)->w;
}

int main() {
    int V = 5, E = 7;
    struct Edge result[100];

    // Example graph
    edges[0] = (struct Edge){0,1,2};
    edges[1] = (struct Edge){0,2,3};
    edges[2] = (struct Edge){1,2,1};
    edges[3] = (struct Edge){1,3,4};
    edges[4] = (struct Edge){2,3,5};
    edges[5] = (struct Edge){2,4,6};
    edges[6] = (struct Edge){3,4,7};

    for (int i = 0; i < V; i++) parent[i] = i;

    qsort(edges, E, sizeof(struct Edge), compare);

    int count = 0, cost = 0;
    for (int i = 0; i < E && count < V-1; i++) {
        if (find(edges[i].u) != find(edges[i].v)) {
            result[count++] = edges[i];
            cost += edges[i].w;
            unionSet(edges[i].u, edges[i].v);
        }
    }

    printf("Edges in MST:\n");
    for (int i = 0; i < count; i++)
        printf("%d - %d (%d)\n", result[i].u, result[i].v, result[i].w);
    printf("Total cost = %d\n", cost);
}

C++ Implementation

#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;

struct Edge {
    int u, v, w;
};

int parent[100];

int find(int x) {
    if (parent[x] != x) parent[x] = find(parent[x]);
    return parent[x];
}

void unionSet(int x, int y) {
    parent[find(x)] = find(y);
}

int main() {
    int V = 5, E = 7;
    vector<Edge> edges = {
        {0,1,2}, {0,2,3}, {1,2,1}, {1,3,4}, {2,3,5}, {2,4,6}, {3,4,7}
    };

    for (int i = 0; i < V; i++) parent[i] = i;

    sort(edges.begin(), edges.end(), [](Edge a, Edge b) {
        return a.w < b.w;
    });

    int cost = 0;
    cout << "Edges in MST:\n";
    for (auto e : edges) {
        if (find(e.u) != find(e.v)) {
            cout << e.u << " - " << e.v << " (" << e.w << ")\n";
            cost += e.w;
            unionSet(e.u, e.v);
        }
    }
    cout << "Total cost = " << cost << endl;
}

Java Implementation

import java.util.*;

class Edge implements Comparable<Edge> {
    int u, v, w;
    Edge(int u, int v, int w) { this.u = u; this.v = v; this.w = w; }
    public int compareTo(Edge e) { return this.w - e.w; }
}

class Kruskal {
    static int[] parent;

    static int find(int x) {
        if (parent[x] != x)
            parent[x] = find(parent[x]);
        return parent[x];
    }

    static void unionSet(int x, int y) {
        parent[find(x)] = find(y);
    }

    public static void main(String[] args) {
        int V = 5, E = 7;
        parent = new int[V];
        for (int i = 0; i < V; i++) parent[i] = i;

        List<Edge> edges = Arrays.asList(
            new Edge(0,1,2), new Edge(0,2,3), new Edge(1,2,1),
            new Edge(1,3,4), new Edge(2,3,5), new Edge(2,4,6), new Edge(3,4,7)
        );

        Collections.sort(edges);

        int cost = 0;
        System.out.println("Edges in MST:");
        for (Edge e : edges) {
            if (find(e.u) != find(e.v)) {
                System.out.println(e.u + " - " + e.v + " (" + e.w + ")");
                cost += e.w;
                unionSet(e.u, e.v);
            }
        }
        System.out.println("Total cost = " + cost);
    }
}

Python Implementation

class DisjointSet:
    def __init__(self, n):
        self.parent = [i for i in range(n)]
    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]
    def union(self, x, y):
        self.parent[self.find(x)] = self.find(y)

def kruskal(V, edges):
    edges.sort(key=lambda x: x[2])
    ds = DisjointSet(V)
    mst, cost = [], 0
    for u, v, w in edges:
        if ds.find(u) != ds.find(v):
            ds.union(u, v)
            mst.append((u, v, w))
            cost += w
    return mst, cost

edges = [(0,1,2),(0,2,3),(1,2,1),(1,3,4),(2,3,5),(2,4,6),(3,4,7)]
mst, cost = kruskal(5, edges)
print("Edges in MST:", mst)
print("Total cost =", cost)

Time Complexity

Sorting edges → $O(E log E)$ .
Union-Find operations → $O(α(V)) ≈ O(1)$ .
Overall: $O(E log E)$ .

Space Complexity

Storage for edges and Union-Find arrays → $O(V + E)$ .

Advantages

The algorithm is straightforward just sort edges and keep adding the smallest ones without forming cycles.
Its simplicity makes it ideal for learning MST algorithms.
Sorting can be parallelized, and Union-Find operations are efficient.
Kruskal’s Algorithm focuses directly on edges, making it suitable for problems where connections matter more than vertices.

Disadvantages

Sorting all edges takes O(E log E) time, which can be expensive for very dense graphs (graphs with many edges).
Cycle detection in Kruskal depends on the Union-Find data structure.
When the graph has too many edges (close to the maximum possible), sorting becomes inefficient.
Doesn’t Work Well on Disconnected Graphs, If the graph is disconnected, Kruskal’s produces a Minimum Spanning Forest, not a single tree.

Kruskal’s vs Prim’s

Kruskal’s: Works edge by edge, great for sparse graphs.
Prim’s: Works node by node, efficient for dense graphs.

Applications

Designing computer networks
- Suppose an internet provider wants to connect multiple cities with fiber optic cables. Each connection (edge) has a cost depending on the distance or terrain. The company must ensure every city is connected while minimizing the total cable cost.
- Working: By building an MST, Kruskal ensures the provider connects all cities at the lowest possible cost without creating unnecessary loops (extra cables).
- Example: A telecom company laying undersea cables between islands in Southeast Asia could use Kruskal’s Algorithm to determine the cheapest connections.
Railway and road construction
- Governments often need to connect cities with roads or railways. Constructing a road between two cities has a cost (distance, resources, or land acquisition).
- Working: It chooses the cheapest routes to connect all cities without building extra roads that would waste money.
- Example: Indian Railways expanding new tracks between cities could apply Kruskal’s Algorithm to minimize construction costs.
Social Network Analysis
- In social networks (like Facebook or LinkedIn), we might want to cluster users into groups based on common connections or similarities.
- Working: By using MST in clustering algorithms, it groups similar people together while keeping the total “connection strength” minimal.
- Example: In machine learning, Kruskal’s Algorithm is used in hierarchical clustering for big data.

Kruskal’s Algorithm is one of the most efficient methods to find an MST. Its greedy nature, combined with Union-Find, makes it fast and reliable across multiple domains. Whether implemented in C, C++, Java, or Python, the logic remains the same: always pick the cheapest edge without forming cycles.

FAQs

1. What is Kruskal’s Algorithm used for?
It’s used to find the minimum spanning tree of a graph.

2. Why is Kruskal’s Algorithm greedy?
Because it always picks the smallest edge at each step.

3. Can Kruskal’s Algorithm work on disconnected graphs?
It creates a minimum spanning forest in disconnected graphs.

4. What is the difference between Kruskal’s and Prim’s?
Kruskal’s builds MST edge by edge, while Prim’s grows from a starting node.

5. Is it used in real life?
Yes, in network design, transportation, and clustering problems.