In some applications, a graph $G$ is augmented by associating a weight
or cost with each edge; such a graph is called a
**weighted graph**. For
example, if a graph represents a network of roads, the weight of an
edge might be the length of the road between its two endpoints, or the
amount of time required to travel from one endpoint to the other, or
the cost to bury cable along the road from one end to the other. In
such cases, instead of being interested in just any spanning tree, we
may be interested in a
**least cost spanning tree**, that is, a spanning
tree such that the sum of the costs of the edges of the tree is as
small as possible. For example, this would be the least expensive way
to connect a set of towns by a communication network, burying the
cable in such a way as to minimize the total cost of laying the cable.

This problem is one that can be solved by a **greedy
algorithm**. Roughly speaking, a greedy algorithm is one that makes
choices that are optimal in the short run. Typically this strategy
does not result in an optimal solution in the long run, but in this
case this approach works.

Definition 5.6.1 A weighted graph is a graph $G$ together with a cost function $c\colon E(G)\to \R^{>0}$. If $H$ is a subgraph of $G$, the cost of $H$ is $c(H)=\sum_{e\in E(H)} c(e)$. $\square$

**The Jarník Algorithm.**
Given a weighted connected graph $G$, we
construct a minimum cost spanning tree $T$ as follows.
Choose any vertex $v_0$ in $G$ and
include it in $T$. If
vertices $S=\{v_0, v_1,\ldots,v_k\}$ have been chosen, choose an edge
with one endpoint in $S$ and one endpoint not in $S$ and with smallest
weight among all such edges. Let $v_{k+1}$ be the endpoint of this
edge not in $S$, and add it and the associated edge to $T$. Continue
until all vertices of $G$ are in $T$.

This algorithm was discovered by Vojtěch Jarník in 1930, and rediscovered independently by Robert C. Prim in 1957 and Edsger Dijkstra in 1959. It is often called Prim's Algorithm.

The algorithm proceeds by constructing a sequence of trees $T_1, T_2,\ldots,T_{n-1}$, with $T_{n-1}$ a spanning tree for $G$. At each step, the algorithm adds an edge that will make $c(T_{i+1})$ as small as possible among all trees that consist of $T_i$ plus one edge. This is the best choice in the short run, but it is not obvious that in the long run, that is, by the time $T_{n-1}$ is constructed, that this will turn out to have been the best choice.

**Proof.**
Suppose $G$ is connected on $n$ vertices.
Let $T$ be the spanning tree produced by the algorithm, and
$T_m$ a minimum cost spanning tree. We prove that $c(T)=c(T_m)$.

Let $e_1, e_2,\ldots,e_{n-1}$ be the edges of $T$ in the order in which they were added to $T$; one endpoint of $e_i$ is $v_i$, the other is in $\{v_0,\ldots,v_{i-1}\}$. We form a sequence of trees $T_m=T_0, T_1,\ldots, T_{n-1}=T$ such that for each $i$, $c(T_i)=c(T_{i+1})$, and we conclude that $c(T_m)=c(T)$.

If $e_1$ is in $T_0$, let $T_1=T_0$. Otherwise, add edge $e_1$ to $T_0$. This creates a cycle containing $e_1$ and another edge incident at $v_0$, say $f_1$. Remove $f_1$ to form $T_1$. Since the algorithm added edge $e_1$, $c(e_1)\le c(f_1)$. If $c(e_1)< c(f_1)$, then $c(T_1)< c(T_0)=c(T_m)$, a contradiction, so $c(e_1)=c(f_1)$ and $c(T_1)=c(T_0)$.

Suppose we have constructed tree $T_i$. If $e_{i+1}$ is in $T_i$, let $T_{i+1}=T_i$. Otherwise, add edge $e_{i+1}$ to $T_i$. This creates a cycle, one of whose edges, call it $f_{i+1}$, is not in $e_1, e_2,\ldots,e_i$ and has exactly one endpoint in $\{v_0,\ldots,v_i\}$. Remove $f_{i+1}$ to create $T_{i+1}$. Since the algorithm added $e_{i+1}$, $c(e_{i+1})\le c(f_{i+1})$. If $c(e_{i+1})< c(f_{i+1})$, then $c(T_{i+1})< c(T_i)=c(T_m)$, a contradiction, so $c(e_{i+1})=c(f_{i+1})$ and $c(T_{i+1})=c(T_i)$. $\qed$

## Exercises 5.6

**Ex 5.6.1**
Kruskal's Algorithm
is also a greedy algorithm that produces
a minimum cost spanning tree for a connected graph $G$.
Begin by choosing an edge in $G$ of
smallest cost. Assuming that edges $e_1, e_2,\ldots,e_i$ have been
chosen, pick an edge $e_{i+1}$ that does not form a cycle together
with $e_1, e_2,\ldots,e_i$ and that has smallest cost among all such
edges. The edges $e_1, e_2,\ldots,e_{n-1}$ form a spanning tree for
$G$. Prove that this spanning tree has minimum cost.

**Ex 5.6.2**
Prove that if the edge costs of $G$ are distinct, there is exactly one
minimum cost spanning tree. Give an example of a graph $G$ with more
than one minimum cost spanning tree.

**Ex 5.6.3**
In both the Jarník and Kruskal algorithms, it may be that two or
more edges can be added at any particular step, and some method is
required to choose one over the other. For the graph below, use both
algorithms to find a minimum cost spanning tree. Using the labels
$e_i$ on the graph, at each stage pick the edge $e_i$ that the
algorithm specifies and that has the lowest possible $i$ among all
edges available. For the Jarník algorithm, use the designated
$v_0$ as the starting vertex. For each algorithm, list the edges in
the order in which they are added.
The edge weights $e_1,e_2,\ldots,e_{10}$
are $6,7,8,2,3,2,4,6,1,1$, shown in red.