Counting and Combinatorics
Probability begins with counting because many first models have finitely many equally likely outcomes. If every outcome in a sample space is equally likely, then probabilities reduce to ratios: favorable outcomes divided by total outcomes. The hard part is usually not the division; it is counting the numerator and denominator without double-counting, under-counting, or accidentally treating identical objects as distinct.

Figure: Pascal's triangle organizes binomial coefficients, combinations, and recurrence patterns. Image: Wikimedia Commons, Hersfold, public domain.
The first MIT 18.440 lectures use permutations, combinations, Pascal's triangle, multinomial coefficients, and partition-style problems as the basic language for finite probability. These methods remain useful even after the course moves to random variables, because binomial, multinomial, hypergeometric, geometric, and many occupancy models are all organized by the same counting principles.
Definitions
A permutation of distinct objects is an ordering of all objects. There are
such orderings. More generally, the number of injective assignments of labeled positions from distinct objects is
A combination is an unordered selection. The number of -element subsets of an -element set is the binomial coefficient
The factor removes the orderings of the selected objects, and the factor removes the orderings of the unselected objects if one starts from all permutations.
A multinomial coefficient counts ways to divide distinct objects into labeled groups of sizes where :
Equivalently, it counts words of length containing copies of one letter, copies of another, and so on.
An integer partition writes an integer as a sum of positive integers where order is ignored, such as . Integer partitions differ from multinomial counting because the parts themselves are not labeled by positions like "breakfast", "lunch", and "dinner".
Key results
The multiplication principle says that if a procedure has choices at stage 1, choices at stage 2 no matter what happened at stage 1, and so on through choices at stage , then the total number of outcomes is
The phrase "no matter what happened earlier" is important. In many problems the number of later choices depends on earlier choices; then the multiplication principle must be applied after breaking the problem into cases.
The addition principle says that if a set of outcomes is split into disjoint cases with sizes , then the total number is . When cases overlap, inclusion-exclusion is needed; see probability axioms and inclusion-exclusion.
The binomial theorem is
Proof sketch: expanding means choosing either or from each of factors. To obtain , choose which factors contribute . There are such choices.
The Pascal identity is
Proof sketch: count -subsets of according to whether they contain . If they contain , choose the other elements from the first . If they do not, choose all elements from the first .
The multinomial theorem is
It follows from the same expansion idea: every term records how many times each symbol was chosen from the factors.
A useful discipline in counting problems is to write down what the objects actually are before writing a formula. For example, "choose three people" and "choose a first, second, and third person" use the same underlying set of people but count different mathematical objects. The first object is a subset; the second is an ordered tuple with no repeated entries. Many mistakes come from silently changing the object halfway through the solution.
Another recurring method is to count the same set in two ways. Pascal's identity is one example, but the method is broader. If the same collection of objects can be described by first choosing a committee and then a chair, or by first choosing a chair and then the rest of the committee, the two resulting formulas must agree. Such identities are not just algebraic coincidences; they are evidence that the counting interpretation is correct.
The convention is also forced by the formulas. There is exactly one way to arrange zero objects: do nothing. With this convention, , , and the binomial theorem works at the edges of Pascal's triangle. Similarly, multinomial coefficients allow some , meaning one labeled group receives no elements.
Counting with replacement should be separated from counting without replacement. If independent selections are made from possibilities and repeats are allowed, the count is . If repeats are forbidden and order matters, the count is . If repeats are forbidden and order does not matter, the count is . These three formulas answer different questions, even though the English wording may sound similar.
Finally, symmetry can simplify probability only after the equally likely outcomes have been identified. In a shuffled deck, all card orders are equally likely, but not all descriptions of hands are equally likely. For instance, "one pair", "two pair", and "full house" are not equally likely poker categories. The safe method is to define a uniform sample space first, usually all ordered shuffles or all unordered hands, and then count the favorable subset inside that space.
Visual
| Pattern | Typical question | Count |
|---|---|---|
| Order all distinct objects | Assign hats to people | |
| Order of distinct objects | Award gold, silver, bronze among people | |
| Select of distinct objects | Choose a poker hand by ranks only | |
| Split into labeled sizes | Choose 3 breakfast, 2 lunch, 3 dinner items from 8 | |
| Repeated-letter word | Arrange AAABBCCC |
Worked example 1: arranging a repeated-letter word
Problem: How many length-8 strings can be made from the multiset containing three A's, two B's, and three C's?
Method:
- First pretend all letters are distinct. Label them . There are arrangements.
- For any final visible string, the three A labels can be permuted in ways without changing the visible string.
- The two B labels can be permuted in ways.
- The three C labels can be permuted in ways.
- Therefore every visible string was counted times by the labeled count.
Thus
Checked answer: the same number is obtained by sequentially choosing positions:
The two methods agree because the first chooses all labels and divides out internal symmetries, while the second chooses positions for each letter type.
Worked example 2: coefficient in a multinomial expansion
Problem: Find the coefficient of in .
Method:
- The term has exponents , and they add to , so it can occur in the expansion.
- In expanding the six factors, choose which two factors contribute , which three contribute , and which one contributes .
- The number of such choices is the multinomial coefficient
- Compute:
Checked answer: choose the positions first, then positions:
So the coefficient of is .
Code
from math import factorial, comb
def multinomial(*parts):
n = sum(parts)
out = factorial(n)
for p in parts:
out //= factorial(p)
return out
print("AAABBCCC strings:", multinomial(3, 2, 3))
print("Coefficient of x^2 y^3 z in (x+y+z)^6:", multinomial(2, 3, 1))
# Pascal identity check for a row
n = 10
for k in range(1, n):
assert comb(n, k) == comb(n - 1, k - 1) + comb(n - 1, k)
print("Pascal identity verified for row", n)
Common pitfalls
- Treating ordered and unordered selections as the same problem. A committee of three people and a president/secretary/treasurer assignment have different counts.
- Applying the multiplication principle when the number of later choices depends on previous choices. If dependencies matter, split into cases or track the state carefully.
- Dividing by a symmetry factor that is not constant. Division works cleanly only when every final object is counted the same number of times.
- Forgetting that groups may be labeled. Three foods for breakfast and two for lunch is different from merely splitting foods into an unlabeled pile of size three and an unlabeled pile of size two.
- Using when repetitions are allowed. The usual binomial coefficient selects distinct objects without replacement.