Counting Trees and Pruefer Sequences
Tree counting asks how many different tree shapes or labelled trees are possible. The labelled problem has a remarkably clean answer: there are labelled trees on vertex set . The usual proof uses Pruefer sequences, which convert a labelled tree into a sequence of labels and back again.
This is one of the best examples of a bijective proof in graph theory. Rather than deriving Cayley's formula by algebra alone, we build a reversible encoding. The encoding also reveals how degrees are distributed in a labelled tree, because the number of times a label appears in the Pruefer code is directly tied to the degree of that vertex.
Definitions
A labelled tree on is a tree whose vertices are those exact labels. Two labelled trees are different if their edge sets are different, even if the unlabelled shape is the same.
The Pruefer code of a labelled tree with vertices is a sequence of length constructed as follows:
- Find the leaf with smallest label.
- Record its unique neighbor.
- Delete that leaf.
- Repeat until only two vertices remain.
The resulting sequence lies in , the set of all sequences of length with entries from .
To decode a Pruefer sequence, repeatedly connect the smallest label not appearing in the current sequence to the first sequence entry, delete that first entry, and continue until two labels remain. Connect the final two labels.
Key results
Cayley's formula. The number of labelled trees on vertices is
Proof sketch: the Pruefer encoding maps every labelled tree on to a sequence in . The decoding algorithm maps every such sequence back to a labelled tree. These two procedures are inverse to each other, so the number of labelled trees equals the number of sequences, namely .
Degree formula from Pruefer codes. If a vertex label appears times in the Pruefer sequence of a tree, then
Reason: a vertex is recorded once for each incident leaf removed before the vertex itself becomes a leaf or survives to the last pair. A label that never appears has degree and is a leaf at some stage.
Spanning tree count for complete graphs. Since every labelled tree on is a spanning tree of , Cayley's formula also says
where denotes the number of spanning trees of .
Degree sequences of labelled trees. A sequence of positive integers is the degree sequence of a labelled tree on only if
For labelled trees this condition is also compatible with Pruefer codes in a precise counting sense: if for every and the sum is , then label must appear times in the code. The number of labelled trees with that exact degree sequence is therefore
This formula is often the quickest way to count labelled trees with prescribed leaves or hubs.
Relation to the matrix-tree theorem. Pruefer sequences count spanning trees of bijectively. The matrix-tree theorem counts spanning trees of any finite graph by a determinant. The two methods agree on , but they answer different kinds of questions: Pruefer codes are explicit and constructive for complete graphs, while Laplacian cofactors handle missing edges and weighted variants.
Unlabelled contrast. Counting unlabelled trees is much harder because symmetries collapse many labelled trees into the same shape. For example, all stars on vertices have the same unlabelled shape, but there are different labelled stars depending on the center. Cayley's formula is therefore a labelled result; it should not be used for chemical isomer counts or rooted shape counts without additional symmetry analysis.
Visual
Here is a labelled tree whose Pruefer code is computed in the first worked example.
| Step | Smallest leaf removed | Neighbor recorded | Remaining code prefix |
|---|---|---|---|
| 1 | |||
| 2 | |||
| 3 | |||
| 4 |
Worked example 1: Encode a labelled tree
Problem. Find the Pruefer code of the tree with edges
Method.
The vertices are , so the code has length .
- Initial leaves are . The smallest is . Its neighbor is , so record and delete .
- Remaining leaves are . The smallest is . Its neighbor is , so record and delete .
- Remaining leaves are . The smallest is . Its neighbor is , so record and delete .
- The remaining graph is the path . Leaves are and . The smallest is . Its neighbor is , so record and delete .
- Only and remain, so stop.
The Pruefer code is
Check by degrees. In the code, label appears three times, so . Label appears once, so . Labels do not appear, so they have degree . This matches the original tree.
The same check also verifies the total degree:
This is a useful guard against encoding errors. If the multiplicities in a proposed code produce degrees whose sum is not , then either the code length or the vertex set has been misread.
Worked example 2: Decode a Pruefer sequence
Problem. Decode the Pruefer sequence
on vertex set .
Method.
Start with labels and sequence .
- The labels not appearing in the sequence are . The smallest is . Connect to the first sequence entry , giving edge . Delete the first . New sequence: .
- Remaining labels are . Labels not appearing in are . The smallest is . Connect to , giving edge . Delete . New sequence: .
- Remaining labels are . Labels not appearing in are . The smallest is . Connect to , giving edge . Delete . New sequence: .
- Remaining labels are . Labels not appearing in are . The smallest is . Connect to , giving edge . Delete . New sequence: empty.
- The two remaining labels are and . Connect them, giving edge .
The decoded tree has edges
Check. There are vertices and edges. The construction connects a new leaf at each step and finishes with one edge, so the result is connected and acyclic.
The degree formula predicts the same shape. In , label appears twice, labels and appear once, and labels appear zero times. Thus the degrees should be
The decoded edge list gives exactly those degrees.
Code
from collections import Counter
import heapq
def decode_pruefer(code):
n = len(code) + 2
degree = {i: 1 for i in range(1, n + 1)}
for x in code:
degree[x] += 1
leaves = [v for v in degree if degree[v] == 1]
heapq.heapify(leaves)
edges = []
for x in code:
leaf = heapq.heappop(leaves)
edges.append((leaf, x))
degree[leaf] -= 1
degree[x] -= 1
if degree[x] == 1:
heapq.heappush(leaves, x)
u = heapq.heappop(leaves)
v = heapq.heappop(leaves)
edges.append((u, v))
return edges
code = [2, 5, 2, 4]
print(decode_pruefer(code))
print(Counter(code))
The decoder uses a min-heap so that the smallest currently available leaf can be found efficiently. This matches the standard definition of the Pruefer code. If a different leaf-selection rule is used for encoding, the decoder must use the corresponding inverse rule; otherwise the sequence will decode to a different labelled tree.
For Pruefer-sequence exercises, keep a running table of current leaves, the recorded neighbor, and the remaining sequence. This prevents two common errors: deleting the wrong leaf and forgetting that a vertex can become a leaf only after all but one of its incident edges have been removed. The degree formula is an independent check after the encoding or decoding is complete.
Common pitfalls
- Removing the largest leaf instead of the smallest leaf when using the standard Pruefer encoding. A different deterministic rule would give a different code.
- Recording the deleted leaf instead of its neighbor. The Pruefer code records neighbors.
- Stopping when three vertices remain. The code stops when two vertices remain.
- Forgetting that the code length is , not .
- Applying Cayley's formula to unlabelled trees. counts labelled trees.
- Misreading the degree formula. A label appearing times has degree , not .