Skip to main content

Trees

Trees are connected graphs with no cycles. They are minimal connected structures: removing any edge disconnects them, and adding any edge creates a cycle. This makes them useful for hierarchy, searching, parsing, routing, spanning networks, and optimization.

Tree arguments are often simpler than general graph arguments because there is exactly one simple path between any two vertices. That uniqueness supports recursive definitions, induction proofs, traversal algorithms, and greedy optimization methods such as minimum spanning trees.

Definitions

A tree is a connected undirected graph with no simple circuits. A graph with no simple circuits but possibly several components is a forest. Each connected component of a forest is a tree.

A rooted tree has one distinguished vertex called the root. In a rooted tree, parent, child, sibling, ancestor, descendant, leaf, and internal vertex describe relative positions. The level of a vertex is its distance from the root. The height is the maximum level.

An ordered rooted tree gives an order to the children of each vertex. A binary tree is a rooted tree where each vertex has at most two children, usually called left and right. A full binary tree is one in which each internal vertex has exactly two children.

A spanning tree of a connected graph GG is a subgraph that includes every vertex of GG and is a tree. For a weighted connected graph, a minimum spanning tree is a spanning tree whose total edge weight is as small as possible.

Key results

For a tree with nn vertices, the number of edges is n1n-1.

Proof by induction: the one-vertex tree has 00 edges. Every finite tree with at least two vertices has a leaf. Remove a leaf and its incident edge; the remaining graph is still a tree with one fewer vertex. By induction it has n2n-2 edges, so the original has n1n-1.

Equivalent characterizations for a finite simple graph GG with nn vertices:

  • GG is a tree.
  • GG is connected and has n1n-1 edges.
  • GG has no cycles and has n1n-1 edges.
  • Between every pair of vertices there is a unique simple path.
  • GG is minimally connected: removing any edge disconnects it.
  • GG is maximally acyclic: adding any edge creates a cycle.

Every connected graph has a spanning tree. Repeatedly remove edges that belong to cycles; connectivity is preserved, and the process stops when no cycles remain.

Kruskal's and Prim's algorithms find minimum spanning trees in weighted connected graphs. Their correctness follows from the cut property: for any cut, a lightest edge crossing the cut is safe to add to some minimum spanning tree.

In a full binary tree with ii internal vertices, the number of leaves is i+1i+1. Counting edges gives one proof: the tree has i+i+\ell vertices and therefore i+1i+\ell-1 edges. It also has 2i2i child edges because each internal vertex has two children. Hence 2i=i+12i=i+\ell-1, so =i+1\ell=i+1.

Visual

TraversalVisit order ruleCommon use
preorderroot, then children/subtreescopying trees, prefix expressions
inorderleft, root, right for binary treessorted order in binary search trees
postorderchildren/subtrees, then rootdeleting trees, postfix expressions
level orderbreadth-first by levelshortest root distance, heaps
DFS spanning treedepth-first explorationcomponents, cycle analysis
MST algorithmsadd safe edgesminimum-cost connected network

Worked example 1: Use tree edge counts

Problem. A tree has 1818 vertices. How many edges does it have? If the sum of degrees of 1717 vertices is 3131, what is the degree of the remaining vertex?

Method.

  1. A tree with nn vertices has n1n-1 edges.
  2. With n=18n=18:
E=181=17.|E|=18-1=17.
  1. By the handshaking theorem, the total degree sum is
2E=34.2|E|=34.
  1. If 1717 vertices have degree sum 3131, the remaining degree is
3431=3.34-31=3.

Checked answer. The tree has 1717 edges, and the remaining vertex has degree 33. The answer uses both the tree edge formula and the graph handshaking theorem.

Worked example 2: Run Kruskal's algorithm

Problem. Find a minimum spanning tree for vertices A,B,C,DA,B,C,D with weighted edges:

AB:1,AC:4,AD:3,BC:2,BD:5,CD:6.AB:1,\quad AC:4,\quad AD:3,\quad BC:2,\quad BD:5,\quad CD:6.

Method.

  1. Sort edges by weight:
AB(1), BC(2), AD(3), AC(4), BD(5), CD(6).AB(1),\ BC(2),\ AD(3),\ AC(4),\ BD(5),\ CD(6).
  1. Add ABAB; it creates no cycle.
  2. Add BCBC; it connects CC to the component {A,B}\{A,B\} and creates no cycle.
  3. Add ADAD; it connects DD to the component and creates no cycle.
  4. Now all 44 vertices are connected with 33 edges, so the spanning tree is complete.
  5. Its total weight is
1+2+3=6.1+2+3=6.

Checked answer. One MST is {AB,BC,AD}\{AB,BC,AD\} with total weight 66. Edges AC,BD,CDAC,BD,CD are skipped because a spanning tree on 44 vertices needs exactly 33 edges, and adding any further edge would create a cycle.

Code

def preorder(tree, root):
yield root
for child in tree.get(root, []):
yield from preorder(tree, child)

def kruskal(vertices, edges):
parent = {v: v for v in vertices}

def find(x):
while parent[x] != x:
parent[x] = parent[parent[x]]
x = parent[x]
return x

mst = []
for w, u, v in sorted(edges):
ru, rv = find(u), find(v)
if ru != rv:
parent[ru] = rv
mst.append((u, v, w))
return mst

tree = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"]}
edges = [(1, "A", "B"), (4, "A", "C"), (3, "A", "D"), (2, "B", "C"), (5, "B", "D"), (6, "C", "D")]
print(list(preorder(tree, "A")))
print(kruskal({"A", "B", "C", "D"}, edges))

The traversal function is recursive because a rooted tree is recursively made of a root and its subtrees. Kruskal's algorithm uses disjoint-set representatives to avoid cycles.

Common pitfalls

  • Forgetting that a tree must be connected and acyclic. A forest is acyclic but may be disconnected.
  • Using n1n-1 edges as the only tree test without checking connectedness or acyclicity.
  • Assuming a rooted tree has an inherent root. Rooting is extra structure.
  • Confusing height with number of vertices.
  • Thinking every spanning tree is minimum. Minimum spanning trees depend on edge weights.
  • Adding an edge in Kruskal's algorithm just because it is light, without checking whether it creates a cycle.

Tree proofs often become easy after choosing the right characterization. If a graph is connected and has n1n-1 edges, it is a tree; if it is acyclic and has n1n-1 edges, it is also a tree. If every pair of vertices has a unique simple path, it is a tree. Switching between these forms can turn a difficult cycle argument into a simple edge-count argument.

Leaves are the engine behind many induction proofs on trees. Every finite tree with at least two vertices has at least two leaves. Removing a leaf preserves the tree property for the remaining graph, while adding a leaf to a smaller tree preserves connectedness and acyclicity. This is why induction on the number of vertices works so naturally for trees.

Rooted-tree terminology depends on the chosen root. The same unrooted tree can produce different parent-child relationships if a different root is selected. Ancestor, descendant, level, and height are therefore not properties of the unrooted graph alone. In algorithms, the root is often determined by a search start vertex or by a data-structure invariant.

For binary search trees, shape matters as much as stored keys. The inorder traversal property gives sorted order for any binary search tree, but efficiency depends on height. A path-shaped tree with nn vertices has height n1n-1 and gives linear search time. A balanced tree has height O(logn)O(\log n) and supports efficient operations.

For minimum spanning trees, equal edge weights can produce multiple valid MSTs. Kruskal's algorithm may choose different safe edges depending on tie-breaking, but the total weight remains minimum. Do not assume uniqueness unless the problem gives a condition, such as all edge weights being distinct.

For spanning tree problems, check the result has exactly n1n-1 edges and is connected. These two facts together prove the result is a tree. If an alleged spanning tree has too many edges, it contains a cycle. If it has too few edges, it cannot connect all vertices. This quick edge-count check catches many algorithm traces.

In rooted trees, traversal order is a convention with consequences. Preorder lists a parent before descendants, postorder lists descendants before the parent, and inorder is special to binary trees. When reconstructing a tree from traversals, state which traversal conventions are being used; otherwise the same sequence can be interpreted differently.

Connections