Skip to main content

Markov Chains

A Markov chain is a model for random movement among states where the next state depends on the current state, not on the full past. This is a simple assumption, but it leads to a rich theory for queues, genetics, board games, web ranking, weather models, text generation, inventory systems, and random walks.

A finite-state machine diagram shows states and transitions for a simple process.

Figure: A finite-state machine represents computation as transitions among discrete states. Image: Wikimedia Commons, Macguy314, reworked by Perhelion, public domain.

This page introduces finite discrete-time Markov chains. The central objects are states, transition probabilities, transition matrices, multi-step probabilities, and stationary distributions. The emphasis is on computation and interpretation rather than advanced convergence theory.

Definitions

A discrete-time stochastic process is a sequence of random variables

X0,X1,X2,X_0,X_1,X_2,\ldots

indexed by time. A process has the Markov property if

P(Xn+1=jXn=i,Xn1=in1,,X0=i0)=P(Xn+1=jXn=i).P(X_{n+1}=j\mid X_n=i,X_{n-1}=i_{n-1},\ldots,X_0=i_0) =P(X_{n+1}=j\mid X_n=i).

The possible values are called states. For a finite chain with states 1,,m1,\ldots,m, the transition probability from state ii to state jj is

pij=P(Xn+1=jXn=i).p_{ij}=P(X_{n+1}=j\mid X_n=i).

The transition matrix is

P=(p11p12p1mp21p22p2mpm1pm2pmm).P= \begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1m}\\ p_{21} & p_{22} & \cdots & p_{2m}\\ \vdots & \vdots & \ddots & \vdots\\ p_{m1} & p_{m2} & \cdots & p_{mm} \end{pmatrix}.

Each row sums to 11:

j=1mpij=1.\sum_{j=1}^m p_{ij}=1.

An initial distribution is a row vector

π(0)=(P(X0=1),,P(X0=m)).\pi^{(0)}=(P(X_0=1),\ldots,P(X_0=m)).

The distribution after nn steps is

π(n)=π(0)Pn.\pi^{(n)}=\pi^{(0)}P^n.

A distribution π\pi is stationary if

π=πP\pi=\pi P

and

iπi=1.\sum_i \pi_i=1.

Key results

Chapman-Kolmogorov equation. The nn-step transition matrix is PnP^n. Its (i,j)(i,j) entry gives

P(Xn=jX0=i).P(X_n=j\mid X_0=i).

Matrix multiplication works because a path from ii to jj in two steps must pass through some intermediate state kk:

Pij2=kpikpkj.P^2_{ij}=\sum_k p_{ik}p_{kj}.

Stationary distribution. A stationary distribution is unchanged by one transition. If the chain is irreducible and aperiodic on a finite state space, then the distribution often converges to the unique stationary distribution regardless of the initial state.

Absorbing states. A state ii is absorbing if

pii=1.p_{ii}=1.

Once entered, it cannot be left. Absorbing chains are useful for modeling completion, ruin, failure, or success.

Detailed balance. A distribution π\pi satisfies detailed balance if

πipij=πjpji\pi_i p_{ij}=\pi_j p_{ji}

for all states i,ji,j. Detailed balance implies stationarity.

Expected long-run proportions. Under standard finite irreducible aperiodic conditions, πj\pi_j is the long-run fraction of time spent in state jj.

Several classification terms help decide what long-run behavior to expect. A state jj is reachable from state ii if there is some nn such that (Pn)ij>0(P^n)_{ij}\gt 0. Two states communicate if each is reachable from the other. A finite chain is irreducible if all states communicate, meaning the state space is one connected class under possible transitions.

A state has period dd if returns to that state can occur only at multiples of dd, and dd is the greatest common divisor of possible return times. A state with period 11 is aperiodic. Periodicity can cause oscillation: a chain may have a stationary distribution but fail to settle toward it from every starting state.

A state is recurrent if the chain returns to it with probability 11 after leaving, and transient if there is a positive probability of never returning. In a finite irreducible chain, all states are recurrent. These ideas explain why the existence of a stationary distribution is not the whole story; the structure of communication and return times controls convergence.

Markov modeling often succeeds or fails based on state design. If tomorrow's demand depends on today's inventory and yesterday's unfilled orders, then a state containing only today's inventory is not Markov. Expanding the state to include the missing memory can restore the Markov property.

Absorbing-chain calculations use the same matrix ideas with a different goal. Instead of long-run proportions among recurring states, one may ask for the probability of eventual absorption in a success state or the expected time until absorption. Games that end, machines that eventually fail, and random walks with barriers can often be written this way. The transition matrix is partitioned into transient and absorbing states, and powers of the matrix show probability mass gradually moving into absorbing states.

Simulation is often the simplest first check for a Markov model. By generating a long path and counting state frequencies or transition frequencies, one can compare empirical behavior with stationary calculations. Simulation does not prove the theorem, but it catches row-order mistakes and impossible transitions quickly.

Markov chains can be time-homogeneous or time-inhomogeneous. This page assumes the same transition matrix PP is used at every step. If transition probabilities change with time, write P0,P1,P2,P_0,P_1,P_2,\ldots instead, and the nn-step update becomes a product of different matrices. Time-inhomogeneous chains can model seasons, learning systems, or policies that change over time, but stationary-distribution ideas need more care.

Continuous-time Markov chains use rates rather than one-step probabilities. They are important in queues and reliability, but their generator matrices are beyond this introductory page.

For finite chains, drawing the directed graph of possible transitions is often the fastest way to spot closed classes, absorbing states, and unreachable states before doing matrix algebra.

Visual

The Markov-chain diagram separates the state graph from the matrix contract that drives computation. Directed transition arrows give the one-step probabilities, while the matrix subgraph shows row normalization, distribution updates, and powers of PP. The long-run branch makes convergence depend on communication classes and periodicity, not just on solving π=πP\pi=\pi P.

ConceptFormulaMeaning
one-step transitionpijp_{ij}chance next state is jj from ii
transition matrixPPall one-step probabilities
nn-step transitionPnP^nprobabilities after nn steps
next distributionπ(1)=π(0)P\pi^{(1)}=\pi^{(0)}Pupdate probabilities
stationary distributionπ=πP\pi=\pi Punchanged by transitions

Worked example 1: weather chain after several days

Problem. A simple weather chain has states Sunny and Rainy with transition matrix

P=(0.70.30.40.6),P= \begin{pmatrix} 0.7 & 0.3\\ 0.4 & 0.6 \end{pmatrix},

where rows are today's weather and columns are tomorrow's weather. If today is Sunny, find the probability it is Rainy two days from now.

Method.

  1. Today's distribution is
π(0)=(1,0).\pi^{(0)}=(1,0).
  1. Two days from now:
π(2)=π(0)P2.\pi^{(2)}=\pi^{(0)}P^2.
  1. Compute P2P^2:
P2=(0.70.30.40.6)(0.70.30.40.6).P^2= \begin{pmatrix} 0.7 & 0.3\\ 0.4 & 0.6 \end{pmatrix} \begin{pmatrix} 0.7 & 0.3\\ 0.4 & 0.6 \end{pmatrix}.
  1. The Sunny to Rainy entry is
(P2)SR=0.7(0.3)+0.3(0.6)=0.21+0.18=0.39.(P^2)_{SR}=0.7(0.3)+0.3(0.6)=0.21+0.18=0.39.
  1. Therefore
π(2)=(1,0)P2=(0.61,0.39).\pi^{(2)}=(1,0)P^2=(0.61,0.39).

Checked answer. The probability of Rainy two days from now, starting Sunny, is 0.390.39.

Worked example 2: stationary distribution of the weather chain

Problem. Find the stationary distribution for the same chain:

P=(0.70.30.40.6).P= \begin{pmatrix} 0.7 & 0.3\\ 0.4 & 0.6 \end{pmatrix}.

Method.

  1. Let
π=(s,r),\pi=(s,r),

where ss is long-run Sunny probability and rr is long-run Rainy probability.

  1. Stationarity requires
(s,r)(0.70.30.40.6)=(s,r).(s,r) \begin{pmatrix} 0.7 & 0.3\\ 0.4 & 0.6 \end{pmatrix} =(s,r).
  1. This gives equations:
0.7s+0.4r=s,0.7s+0.4r=s, 0.3s+0.6r=r.0.3s+0.6r=r.

The two equations are redundant with s+r=1s+r=1.

  1. Use the first equation:
0.7s+0.4r=s0.7s+0.4r=s

so

0.4r=0.3s.0.4r=0.3s.
  1. Thus
r=0.30.4s=0.75s.r=\frac{0.3}{0.4}s=0.75s.
  1. Use s+r=1s+r=1:
s+0.75s=1,s+0.75s=1, 1.75s=1,1.75s=1, s=47.s=\frac{4}{7}.
  1. Then
r=147=37.r=1-\frac{4}{7}=\frac{3}{7}.

Checked answer. The stationary distribution is

π=(47,37)(0.5714,0.4286).\pi=\left(\frac{4}{7},\frac{3}{7}\right)\approx(0.5714,0.4286).

Code

import numpy as np

P = np.array([
[0.7, 0.3],
[0.4, 0.6],
])

pi0 = np.array([1.0, 0.0])
pi2 = pi0 @ np.linalg.matrix_power(P, 2)
print("distribution after two days:", pi2)

# Stationary distribution: solve (P.T - I) pi = 0 with sum(pi)=1.
A = P.T - np.eye(2)
A[-1, :] = 1.0
b = np.array([0.0, 1.0])
stationary = np.linalg.solve(A, b)
print("stationary:", stationary)

# Simulate a long path.
rng = np.random.default_rng(4)
state = 0
counts = np.zeros(2)
for _ in range(100_000):
counts[state] += 1
state = rng.choice([0, 1], p=P[state])
print("simulated proportions:", counts / counts.sum())

Common pitfalls

  • Multiplying distributions on the wrong side. This page uses row vectors, so updates are πP\pi P.
  • Forgetting that rows of the transition matrix must sum to 11.
  • Confusing pijp_{ij} with pjip_{ji}. Direction matters.
  • Assuming a stationary distribution automatically means convergence from every initial state. Periodicity or reducibility can block convergence.
  • Treating the Markov property as saying the past is irrelevant in every sense. The past can matter through the current state.
  • Using a Markov chain when the state definition omits important memory. Add state variables if the next transition depends on more history.

Connections