Skip to main content

Probability and Density Matrices

Statistical mechanics begins by replacing an impossible microscopic description with a precise probabilistic one. A gas with 102310^{23} particles has a well-defined Hamiltonian, but the detailed microstate is neither measurable nor useful for thermodynamic questions. What matters is the probability distribution over microstates and the way macroscopic observables become sharply predictable when many weakly correlated degrees of freedom are combined.

Schwabl starts here because the later ensembles are not guesses about ignorance. They are distributions or density matrices constrained by macroscopic data and by the microscopic equations of motion. The same language also ties classical phase-space densities to quantum statistical operators, so this page is the common vocabulary for the rest of the section.

Definitions

A random variable XX has probability density w(x)w(x) when w(x)dxw(x)\,dx is the probability that XX lies in [x,x+dx][x,x+dx]. Normalization and expectation are

w(x)dx=1,F(X)=F(x)w(x)dx.\int_{-\infty}^{\infty} w(x)\,dx = 1, \qquad \langle F(X)\rangle = \int_{-\infty}^{\infty} F(x) w(x)\,dx .

The moments and variance are

μn=Xn,σ2=(XX)2=X2X2.\mu_n=\langle X^n\rangle,\qquad \sigma^2=\langle (X-\langle X\rangle)^2\rangle =\langle X^2\rangle-\langle X\rangle^2 .

The characteristic function is the Fourier transform of the density,

ϕ(k)=eikX=eikxw(x)dx,w(x)=dk2πeikxϕ(k).\phi(k)=\langle e^{-ikX}\rangle=\int e^{-ikx}w(x)\,dx , \qquad w(x)=\int {dk\over 2\pi}e^{ikx}\phi(k).

For many variables XiX_i, the covariance matrix is

Kij=(XiXi)(XjXj).K_{ij}=\langle (X_i-\langle X_i\rangle)(X_j-\langle X_j\rangle)\rangle .

In quantum mechanics, a pure state ψ\vert \psi\rangle defines the density operator

ρ=ψψ,A=TrρA.\rho=|\psi\rangle\langle\psi|,\qquad \langle A\rangle=\mathrm{Tr}\,\rho A .

A mixed ensemble with states ψi\vert \psi_i\rangle and probabilities pip_i has

ρ=ipiψiψi,ipi=1,Trρ=1.\rho=\sum_i p_i |\psi_i\rangle\langle\psi_i|, \qquad \sum_i p_i=1, \qquad \mathrm{Tr}\,\rho=1.

Pure states obey ρ2=ρ\rho^2=\rho and Trρ2=1\mathrm{Tr}\,\rho^2=1. A genuine mixed state has Trρ2<1\mathrm{Tr}\,\rho^2\lt 1. This distinction becomes central when we use Z=TreβHZ=\mathrm{Tr}\,e^{-\beta H} rather than a single wavefunction to describe equilibrium.

Key results

The first structural result is the central limit theorem in its statistical-mechanical form. Let X1,,XNX_1,\ldots,X_N be independent identically distributed random variables with mean μ\mu and variance σ2\sigma^2, and let

Y=i=1NXi.Y=\sum_{i=1}^N X_i .

Then for large NN, the distribution of YY approaches a Gaussian with

Y=Nμ,Var(Y)=Nσ2,Var(Y)Y=σμN.\langle Y\rangle=N\mu,\qquad \mathrm{Var}(Y)=N\sigma^2,\qquad {\sqrt{\mathrm{Var}(Y)}\over \langle Y\rangle} = {\sigma\over \mu\sqrt{N}} .

This is why thermodynamic quantities are reproducible. Energy, magnetization, and particle number have fluctuations of order N\sqrt{N} while their means are usually of order NN, so relative fluctuations vanish in the thermodynamic limit.

The cumulant generating form of the characteristic function makes this transparent:

ϕ(k)=exp[n=1(ik)nn!Cn],\phi(k)=\exp\left[\sum_{n=1}^{\infty}{(-ik)^n\over n!} C_n\right],

where C1=μC_1=\mu, C2=σ2C_2=\sigma^2, and higher cumulants describe non-Gaussian corrections. For YY, the characteristic function is ϕY(k)=ϕ(k)N\phi_Y(k)=\phi(k)^N. After centering and scaling by N\sqrt{N}, the second cumulant survives and higher cumulants are suppressed by powers of N1/2N^{-1/2}.

For density matrices, the dynamical counterpart is the von Neumann equation,

ρt=i[H,ρ].{\partial \rho\over \partial t}=-{i\over \hbar}[H,\rho].

It is the quantum analogue of Liouville flow. A stationary equilibrium density matrix must commute with the Hamiltonian, [H,ρ]=0[H,\rho]=0, which is why equilibrium ensembles can be written as functions of conserved operators such as HH and NN.

Another key result is entropy as a functional of a distribution or density matrix:

S=kBipilnpiorS=kBTrρlnρ.S=-k_B\sum_i p_i\ln p_i \quad\text{or}\quad S=-k_B\mathrm{Tr}\,\rho\ln\rho .

Schwabl later derives the usual ensembles by maximizing this entropy subject to macroscopic constraints, or equivalently by considering weakly coupled subsystems of a microcanonical total system.

Two refinements are worth keeping in mind throughout the subject. First, independence is an idealization; many statistical-mechanical variables are weakly correlated rather than strictly independent. The central-limit conclusion remains useful when correlations decay fast enough that the system can be divided into many almost independent correlation volumes. If the correlation length ξ\xi is finite in dd dimensions, a macroscopic sample of volume VV contains roughly V/ξdV/\xi^d independent blocks, so relative fluctuations are suppressed by the square root of that number. This estimate fails near a critical point, where ξ\xi grows and fluctuations become macroscopic.

Second, entropy maximization is not a substitute for dynamics. It selects the least biased stationary distribution compatible with known constraints, but the list of constraints must come from conserved quantities and physical preparation. For an ordinary isolated gas, energy, volume, and particle number are enough. For an integrable or nearly integrable system, there may be many additional conserved quantities, and a simple microcanonical or canonical ensemble may not describe all relaxation phenomena. Schwabl's later discussion of irreversibility returns to this distinction between formal probability assignments and actual approach to equilibrium.

In quantum notation, diagonalizing ρ\rho shows why the von Neumann entropy has the same form as the classical expression. If

ρ=ipiii,\rho=\sum_i p_i |i\rangle\langle i|,

then

S=kBipilnpi.S=-k_B\sum_i p_i\ln p_i.

The basis-independent trace formula matters because the same mixed state can be represented by different convex decompositions into nonorthogonal pure states. The eigenvalues of ρ\rho, not a particular story about preparation, determine entropy and equilibrium averages.

Visual

ObjectClassical probabilityQuantum statisticsTypical use
State descriptionw(x)w(x) or ρ(q,p)\rho(q,p)density operator ρ\rhoencode incomplete microscopic information
Normalizationw(x)dx=1\int w(x)\,dx=1Trρ=1\mathrm{Tr}\,\rho=1total probability
Observable averageA(x)w(x)dx\int A(x)w(x)\,dxTrρA\mathrm{Tr}\,\rho Amacroscopic prediction
Pure state testdelta-like idealizationρ2=ρ\rho^2=\rhoone microscopic state
Mixed state testbroad distributionTrρ2<1\mathrm{Tr}\,\rho^2\lt 1statistical ensemble
Equilibrium conditionstationary under flow[H,ρ]=0[H,\rho]=0time-independent ensemble

Worked example 1: Central-limit scaling of energy fluctuations

Problem: A model solid has N=1020N=10^{20} independent modes. Each mode has mean energy μ=4.0×1021J\mu=4.0\times 10^{-21}\,\mathrm{J} and standard deviation σ=3.0×1021J\sigma=3.0\times 10^{-21}\,\mathrm{J}. Estimate the mean total energy, the standard deviation, and the relative fluctuation.

Method:

  1. The total energy is E=iXiE=\sum_i X_i.
  2. The mean is additive:
E=Nμ=1020(4.0×1021J)=0.40J.\langle E\rangle=N\mu =10^{20}(4.0\times 10^{-21}\,\mathrm{J}) =0.40\,\mathrm{J}.
  1. The variance is additive for independent modes:
Var(E)=Nσ2.\mathrm{Var}(E)=N\sigma^2.

Thus

ΔE=Nσ=1010(3.0×1021J)=3.0×1011J.\Delta E=\sqrt{N}\sigma =10^{10}(3.0\times 10^{-21}\,\mathrm{J}) =3.0\times 10^{-11}\,\mathrm{J}.
  1. The relative fluctuation is
ΔEE=3.0×10110.40=7.5×1011.{\Delta E\over \langle E\rangle} ={3.0\times 10^{-11}\over 0.40} =7.5\times 10^{-11}.

Checked answer: even though the microscopic mode energy is random, the macroscopic energy is effectively sharp. The suppression factor is 1/N1/\sqrt{N}.

Worked example 2: Density matrix for a spin mixture

Problem: A spin-1/21/2 ensemble contains 70%70\% z\vert \uparrow_z\rangle and 30%30\% z\vert \downarrow_z\rangle. Compute ρ\rho, Sz\langle S_z\rangle, and Trρ2\mathrm{Tr}\,\rho^2.

Method:

  1. In the basis {z,z}\{\vert \uparrow_z\rangle,\vert \downarrow_z\rangle\},
ρ=0.7+0.3=(0.7000.3).\rho=0.7|\uparrow\rangle\langle\uparrow| +0.3|\downarrow\rangle\langle\downarrow| = \begin{pmatrix} 0.7 & 0\\ 0 & 0.3 \end{pmatrix}.
  1. The operator SzS_z is
Sz=2(1001).S_z={\hbar\over 2} \begin{pmatrix} 1 & 0\\ 0 & -1 \end{pmatrix}.
  1. The expectation value is
Sz=TrρSz=2(0.70.3)=0.2.\langle S_z\rangle=\mathrm{Tr}\,\rho S_z ={\hbar\over 2}(0.7-0.3) =0.2\hbar .
  1. The purity is
Trρ2=0.72+0.32=0.58.\mathrm{Tr}\,\rho^2 =0.7^2+0.3^2 =0.58.

Checked answer: Trρ2<1\mathrm{Tr}\,\rho^2\lt 1, so this is mixed. It is not the pure spinor 0.7+0.3\sqrt{0.7}\vert \uparrow\rangle+\sqrt{0.3}\vert \downarrow\rangle, which would also contain off-diagonal coherence.

Code

import numpy as np

rng = np.random.default_rng(7)
N = 200_000
trials = 4_000

# Exponential variables are visibly non-Gaussian, but their sums become Gaussian.
samples = rng.exponential(scale=1.0, size=(trials, N // 1000))
sums = samples.sum(axis=1)

mean = sums.mean()
std = sums.std(ddof=1)
relative = std / mean

rho = np.diag([0.7, 0.3])
sz = 0.5 * np.array([[1.0, 0.0], [0.0, -1.0]])

print("sum mean", mean)
print("sum std", std)
print("relative fluctuation", relative)
print("<Sz>/hbar", np.trace(rho @ sz))
print("purity", np.trace(rho @ rho))

Common pitfalls

  • Confusing a classical probability mixture with a coherent quantum superposition. A mixture has probabilities; a superposition has amplitudes and phase-sensitive off-diagonal density-matrix elements.
  • Forgetting that fluctuations add as variances, not as standard deviations, when variables are independent.
  • Treating the central limit theorem as requiring Gaussian microscopic variables. It requires finite variance and weak enough dependence, not Gaussian inputs.
  • Using ρ2=ρ\rho^2=\rho for every density matrix. That identity is special to pure states.
  • Dropping normalization factors in probability densities; without normalization, averages and entropies are meaningless.

Connections