Trees and Spanning Trees
Trees are the connected graphs with no wasted edges. They carry information from one vertex to another along unique routes, which makes them central to search algorithms, hierarchical data, electrical networks, phylogenies, and proofs by induction. In Wilson's development, trees appear after paths and cycles because they are exactly the graphs in which cycles have been eliminated without losing connectedness.
Spanning trees connect tree theory back to arbitrary connected graphs. A connected graph may have many cycles, but deleting cycle edges carefully leaves a minimal connected backbone. That backbone is often the object an algorithm actually needs.
Definitions
A tree is a connected graph with no cycles. A forest is a graph with no cycles; each component of a forest is a tree. A leaf is a vertex of degree .
A spanning tree of a connected graph is a subgraph such that and is a tree. A rooted tree is a tree with one distinguished vertex called the root. Once rooted, every non-root vertex has a parent, and vertices farther from the root are children, descendants, or leaves depending on their position.
A branch or pendant subtree is a connected piece attached to the rest of the tree through a single vertex or edge. In algorithmic settings, rooted trees encode search order, recursion, heaps, expression syntax, and decision processes.
Key results
For a finite graph with vertices, the following are equivalent:
- is a tree.
- is connected and has edges.
- is acyclic and has edges.
- Any two vertices of are joined by a unique path.
- is minimally connected: deleting any edge disconnects it.
- is maximally acyclic: adding any missing edge creates exactly one cycle.
Proof sketch: if a connected graph has a cycle, deleting one edge of that cycle keeps it connected. Repeating this until no cycles remain gives a connected acyclic graph with at most the original number of edges. Acyclic connected graphs satisfy by induction on , deleting a leaf at each step.
Leaf lemma. Every tree with at least two vertices has at least two leaves.
Proof sketch: take a longest path in the tree. If either endpoint had degree greater than , the path could be extended, contradicting maximality.
Spanning tree existence. Every connected graph has a spanning tree.
Repeatedly delete an edge from a cycle. Connectivity is preserved, and the process stops with a connected acyclic spanning subgraph.
Induction on trees. Many tree proofs use the leaf lemma. Remove a leaf and its incident edge. The remaining graph is still a tree on vertices. Prove the desired statement for the smaller tree, then add back and check what changes. This explains why formulas involving trees often have clean terms: each induction step removes exactly one vertex and one edge.
Center of a tree. Repeatedly delete all current leaves from a finite tree. The process ends with either one vertex or one edge. This remaining object is the center of the tree. It is useful for understanding tree symmetry and for choosing good roots. A path with an odd number of vertices has one central vertex; a path with an even number of vertices has one central edge.
Spanning trees as backbones. In a connected graph with cycles, a spanning tree is not just a smaller graph. It preserves reachability while discarding redundancy. Different algorithms produce different spanning trees depending on their priority: DFS trees reveal depth and backtracking structure, BFS trees preserve shortest unweighted distances from a root, and minimum spanning trees minimize weighted construction cost.
Counting components in forests. If a forest has vertices, edges, and connected components, then
Equivalently,
This generalizes the tree formula , which is the special case . The proof is componentwise: if the components have vertices, then they have edges in total.
Fundamental cycles and cuts. If is a spanning tree of , adding one edge creates a unique cycle, called the fundamental cycle of with respect to . Conversely, deleting one tree edge from splits the vertices into two parts; the original graph edges crossing that split form a fundamental cut. These two ideas are central in matroids, network design, and proofs about spanning trees.
Tree metrics. Distances in a tree behave especially well because the path between two vertices is unique. This uniqueness lets us define lowest common ancestors in rooted trees, compute distances by depths, and reason about medians of three vertices. Many algorithms on trees are faster than their general-graph versions precisely because no alternative cycle route has to be considered.
Visual
| View | Meaning | Typical use |
|---|---|---|
| Unrooted tree | Only adjacency matters | chemical trees, spanning trees |
| Rooted tree | One vertex is the root | search, parsing, recursion |
| Ordered rooted tree | Children have left-to-right order | syntax trees, heaps |
| Spanning tree | Includes all vertices of a graph | network backbone |
Worked example 1: Verify that a graph is a tree
Problem. Let have vertices and edges
Show that is a tree.
Method 1: connected and edge count.
- The graph has vertices.
- The graph has edges.
- Since , it remains to check connectedness.
- From we reach and .
- From we reach and .
- From we reach , and from we reach .
Every vertex is reachable from , so the graph is connected. A connected graph with edges is a tree.
Method 2: unique paths.
There is one visible route from the central vertex to each branch. For example, the path from to is
Any alternate route would create a cycle, but none of the listed edges joins branches sideways.
Checked answer. is a tree.
Worked example 2: Extract a spanning tree
Problem. Let have vertices and edges
Find a spanning tree and justify it.
Method.
- The graph has vertices, so a spanning tree must have edges.
- Start with all edges.
- The edges form the cycle . Delete .
- The edges form the cycle . Delete .
- The remaining edges are
This still has edges, one too many for a tree. Check for a cycle:
uses . Delete .
The selected spanning tree is
It is the path
Checked answer. It has all vertices, exactly edges, and is connected; therefore it is a spanning tree.
This example also shows why the deletion method is flexible. We could have deleted instead of from the first cycle and still found a spanning tree later. The final tree is not unique; what matters is that every cycle deletion preserves connectedness until no cycles remain.
Code
A depth-first search naturally returns a spanning tree of the component containing the start vertex.
def dfs_spanning_tree(adj, start):
seen = {start}
tree_edges = []
stack = [start]
while stack:
u = stack.pop()
for v in sorted(adj[u], reverse=True):
if v not in seen:
seen.add(v)
tree_edges.append((u, v))
stack.append(v)
return seen, tree_edges
adj = {
1: {2, 3},
2: {1, 3, 5},
3: {1, 2, 4},
4: {3, 5, 6},
5: {2, 4, 6},
6: {4, 5},
}
seen, tree = dfs_spanning_tree(adj, 1)
print(seen)
print(tree)
print(len(tree) == len(adj) - 1)
The DFS tree depends on the neighbor order. If the sorted order changes, the selected tree edges may change while still forming a valid spanning tree. A BFS version would produce a spanning tree whose root-to-vertex paths are shortest in the original unweighted graph; DFS makes no such shortest-path promise.
A useful way to check any tree argument is to ask which of the equivalent characterizations is doing the work. If the proof uses connectedness and edges, say so. If it uses acyclicity and edges, say so. If it uses unique paths, say so. Tree theory is clean because these conditions coincide, but a proof should still name the condition it actually uses.
Common pitfalls
- Saying "acyclic" when "tree" is intended. A disconnected acyclic graph is a forest, not a tree.
- Forgetting that a spanning tree must include every vertex of the original graph.
- Assuming a graph with edges is automatically a tree. It also must be connected, or equivalently acyclic.
- Counting isolated vertices incorrectly in forests. Each isolated vertex is a tree component by itself.
- Thinking a spanning tree is unique. Most connected graphs with cycles have several spanning trees.
- Confusing rooted-tree parent-child relationships with properties of the unrooted tree. The root is extra structure.