Skip to main content

Counting Trees and Pruefer Sequences

Tree counting asks how many different tree shapes or labelled trees are possible. The labelled problem has a remarkably clean answer: there are nn2n^{n-2} labelled trees on vertex set {1,,n}\{1,\dots,n\}. The usual proof uses Pruefer sequences, which convert a labelled tree into a sequence of labels and back again.

This is one of the best examples of a bijective proof in graph theory. Rather than deriving Cayley's formula by algebra alone, we build a reversible encoding. The encoding also reveals how degrees are distributed in a labelled tree, because the number of times a label appears in the Pruefer code is directly tied to the degree of that vertex.

Definitions

A labelled tree on [n]={1,2,,n}[n]=\{1,2,\dots,n\} is a tree whose vertices are those exact labels. Two labelled trees are different if their edge sets are different, even if the unlabelled shape is the same.

The Pruefer code of a labelled tree with n2n\ge 2 vertices is a sequence of length n2n-2 constructed as follows:

  1. Find the leaf with smallest label.
  2. Record its unique neighbor.
  3. Delete that leaf.
  4. Repeat until only two vertices remain.

The resulting sequence lies in [n]n2[n]^{n-2}, the set of all sequences of length n2n-2 with entries from [n][n].

To decode a Pruefer sequence, repeatedly connect the smallest label not appearing in the current sequence to the first sequence entry, delete that first entry, and continue until two labels remain. Connect the final two labels.

Key results

Cayley's formula. The number of labelled trees on nn vertices is

nn2.n^{n-2}.

Proof sketch: the Pruefer encoding maps every labelled tree on [n][n] to a sequence in [n]n2[n]^{n-2}. The decoding algorithm maps every such sequence back to a labelled tree. These two procedures are inverse to each other, so the number of labelled trees equals the number of sequences, namely nn2n^{n-2}.

Degree formula from Pruefer codes. If a vertex label ii appears aia_i times in the Pruefer sequence of a tree, then

deg(i)=ai+1.\deg(i)=a_i+1.

Reason: a vertex is recorded once for each incident leaf removed before the vertex itself becomes a leaf or survives to the last pair. A label that never appears has degree 11 and is a leaf at some stage.

Spanning tree count for complete graphs. Since every labelled tree on [n][n] is a spanning tree of KnK_n, Cayley's formula also says

τ(Kn)=nn2,\tau(K_n)=n^{n-2},

where τ(G)\tau(G) denotes the number of spanning trees of GG.

Degree sequences of labelled trees. A sequence (d1,,dn)(d_1,\dots,d_n) of positive integers is the degree sequence of a labelled tree on [n][n] only if

d1++dn=2(n1).d_1+\cdots+d_n=2(n-1).

For labelled trees this condition is also compatible with Pruefer codes in a precise counting sense: if di1d_i\ge 1 for every ii and the sum is 2n22n-2, then label ii must appear di1d_i-1 times in the code. The number of labelled trees with that exact degree sequence is therefore

(n2)!(d11)!(d21)!(dn1)!.\frac{(n-2)!}{(d_1-1)!(d_2-1)!\cdots(d_n-1)!}.

This formula is often the quickest way to count labelled trees with prescribed leaves or hubs.

Relation to the matrix-tree theorem. Pruefer sequences count spanning trees of KnK_n bijectively. The matrix-tree theorem counts spanning trees of any finite graph by a determinant. The two methods agree on KnK_n, but they answer different kinds of questions: Pruefer codes are explicit and constructive for complete graphs, while Laplacian cofactors handle missing edges and weighted variants.

Unlabelled contrast. Counting unlabelled trees is much harder because symmetries collapse many labelled trees into the same shape. For example, all stars on nn vertices have the same unlabelled shape, but there are nn different labelled stars depending on the center. Cayley's formula is therefore a labelled result; it should not be used for chemical isomer counts or rooted shape counts without additional symmetry analysis.

Visual

Here is a labelled tree whose Pruefer code is computed in the first worked example.

StepSmallest leaf removedNeighbor recordedRemaining code prefix
11144(4)(4)
22244(4,4)(4,4)
33344(4,4,4)(4,4,4)
44455(4,4,4,5)(4,4,4,5)

Worked example 1: Encode a labelled tree

Problem. Find the Pruefer code of the tree with edges

14, 24, 34, 45, 56.14,\ 24,\ 34,\ 45,\ 56.

Method.

The vertices are {1,2,3,4,5,6}\{1,2,3,4,5,6\}, so the code has length 62=46-2=4.

  1. Initial leaves are 1,2,3,61,2,3,6. The smallest is 11. Its neighbor is 44, so record 44 and delete 11.
  2. Remaining leaves are 2,3,62,3,6. The smallest is 22. Its neighbor is 44, so record 44 and delete 22.
  3. Remaining leaves are 3,63,6. The smallest is 33. Its neighbor is 44, so record 44 and delete 33.
  4. The remaining graph is the path 4564-5-6. Leaves are 44 and 66. The smallest is 44. Its neighbor is 55, so record 55 and delete 44.
  5. Only 55 and 66 remain, so stop.

The Pruefer code is

(4,4,4,5).(4,4,4,5).

Check by degrees. In the code, label 44 appears three times, so deg(4)=4\deg(4)=4. Label 55 appears once, so deg(5)=2\deg(5)=2. Labels 1,2,3,61,2,3,6 do not appear, so they have degree 11. This matches the original tree.

The same check also verifies the total degree:

4+2+1+1+1+1=10=2(61).4+2+1+1+1+1=10=2(6-1).

This is a useful guard against encoding errors. If the multiplicities in a proposed code produce degrees whose sum is not 2n22n-2, then either the code length or the vertex set has been misread.

Worked example 2: Decode a Pruefer sequence

Problem. Decode the Pruefer sequence

(2,5,2,4)(2,5,2,4)

on vertex set {1,2,3,4,5,6}\{1,2,3,4,5,6\}.

Method.

Start with labels {1,2,3,4,5,6}\{1,2,3,4,5,6\} and sequence (2,5,2,4)(2,5,2,4).

  1. The labels not appearing in the sequence are {1,3,6}\{1,3,6\}. The smallest is 11. Connect 11 to the first sequence entry 22, giving edge 1212. Delete the first 22. New sequence: (5,2,4)(5,2,4).
  2. Remaining labels are {2,3,4,5,6}\{2,3,4,5,6\}. Labels not appearing in (5,2,4)(5,2,4) are {3,6}\{3,6\}. The smallest is 33. Connect 33 to 55, giving edge 3535. Delete 55. New sequence: (2,4)(2,4).
  3. Remaining labels are {2,4,5,6}\{2,4,5,6\}. Labels not appearing in (2,4)(2,4) are {5,6}\{5,6\}. The smallest is 55. Connect 55 to 22, giving edge 2525. Delete 22. New sequence: (4)(4).
  4. Remaining labels are {2,4,6}\{2,4,6\}. Labels not appearing in (4)(4) are {2,6}\{2,6\}. The smallest is 22. Connect 22 to 44, giving edge 2424. Delete 44. New sequence: empty.
  5. The two remaining labels are 44 and 66. Connect them, giving edge 4646.

The decoded tree has edges

12, 35, 25, 24, 46.12,\ 35,\ 25,\ 24,\ 46.

Check. There are 66 vertices and 55 edges. The construction connects a new leaf at each step and finishes with one edge, so the result is connected and acyclic.

The degree formula predicts the same shape. In (2,5,2,4)(2,5,2,4), label 22 appears twice, labels 44 and 55 appear once, and labels 1,3,61,3,6 appear zero times. Thus the degrees should be

deg(2)=3,deg(4)=2,deg(5)=2,deg(1)=deg(3)=deg(6)=1.\deg(2)=3,\quad \deg(4)=2,\quad \deg(5)=2,\quad \deg(1)=\deg(3)=\deg(6)=1.

The decoded edge list gives exactly those degrees.

Code

from collections import Counter
import heapq

def decode_pruefer(code):
n = len(code) + 2
degree = {i: 1 for i in range(1, n + 1)}
for x in code:
degree[x] += 1

leaves = [v for v in degree if degree[v] == 1]
heapq.heapify(leaves)
edges = []

for x in code:
leaf = heapq.heappop(leaves)
edges.append((leaf, x))
degree[leaf] -= 1
degree[x] -= 1
if degree[x] == 1:
heapq.heappush(leaves, x)

u = heapq.heappop(leaves)
v = heapq.heappop(leaves)
edges.append((u, v))
return edges

code = [2, 5, 2, 4]
print(decode_pruefer(code))
print(Counter(code))

The decoder uses a min-heap so that the smallest currently available leaf can be found efficiently. This matches the standard definition of the Pruefer code. If a different leaf-selection rule is used for encoding, the decoder must use the corresponding inverse rule; otherwise the sequence will decode to a different labelled tree.

For Pruefer-sequence exercises, keep a running table of current leaves, the recorded neighbor, and the remaining sequence. This prevents two common errors: deleting the wrong leaf and forgetting that a vertex can become a leaf only after all but one of its incident edges have been removed. The degree formula is an independent check after the encoding or decoding is complete.

Common pitfalls

  • Removing the largest leaf instead of the smallest leaf when using the standard Pruefer encoding. A different deterministic rule would give a different code.
  • Recording the deleted leaf instead of its neighbor. The Pruefer code records neighbors.
  • Stopping when three vertices remain. The code stops when two vertices remain.
  • Forgetting that the code length is n2n-2, not n1n-1.
  • Applying Cayley's formula to unlabelled trees. nn2n^{n-2} counts labelled trees.
  • Misreading the degree formula. A label appearing aa times has degree a+1a+1, not aa.

Connections