Common Discrete Distributions

Discrete distributions model counts, categories, and waiting times measured in whole trials. They appear whenever the outcome is a finite choice, a number of successes, a number of failures before success, or a count of rare events in a fixed region. Many examples in introductory statistics, including those in Lane et al.'s probability chapter, use binomial, Poisson, multinomial, and hypergeometric models.

The main skill is not memorizing formulas in isolation. It is matching the story to the assumptions: fixed number of independent trials, sampling with or without replacement, waiting until a success, or counting events that occur at an average rate. A wrong distribution can produce a polished but wrong answer.

Figure: Binomial probability mass function. Image: Wikimedia Commons, Tayste, public domain.

Definitions

A Bernoulli random variable records one success/failure trial:

$$X\sim \operatorname{Bernoulli}(p),\quad P(X=1)=p,\quad P(X=0)=1-p.$$

A Binomial random variable counts successes in $n$ independent Bernoulli trials with common success probability $p$:

$$X\sim \operatorname{Binomial}(n,p),\qquad P(X=k)=\binom{n}{k}p^k(1-p)^{n-k},\quad k=0,1,\ldots,n.$$

A Geometric random variable counts the trial number of the first success:

$$X\sim \operatorname{Geometric}(p),\qquad P(X=k)=(1-p)^{k-1}p,\quad k=1,2,\ldots.$$

Some books define geometric as the number of failures before the first success, with support $0,1,2,\ldots$. Always check the convention.

A Negative binomial random variable counts the trial number of the $r$-th success:

$$P(X=k)=\binom{k-1}{r-1}p^r(1-p)^{k-r},\quad k=r,r+1,\ldots.$$
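The negative binomial formula can be checked against the geometric case, since setting $r=1$ should recover the first-success distribution. The following is a minimal sketch using only the standard library; the function name `negbin_trials_pmf` is an illustrative choice, not something from a particular package.

```python
from math import comb


def negbin_trials_pmf(k: int, r: int, p: float) -> float:
    """P(the r-th success occurs on trial k), with support k = r, r+1, ..."""
    if k < r:
        return 0.0
    # The last trial is a success; the other r-1 successes fall among k-1 trials.
    return comb(k - 1, r - 1) * p ** r * (1 - p) ** (k - r)


p = 0.25
# With r = 1 the formula reduces to the geometric pmf (1-p)^(k-1) p.
print("r=1, k=4:", negbin_trials_pmf(4, 1, p), "geometric:", (1 - p) ** 3 * p)
# Third success on trial 5 means exactly 2 failures before the third success.
print("r=3, k=5:", negbin_trials_pmf(5, 3, p))
```

Under the failures-before-success convention, the second call corresponds to 2 failures before 3 successes, which is why the two parameterizations differ only by a shift of $r$ in the support.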

A Poisson random variable counts events occurring in a fixed interval when events happen independently at average rate $\lambda$:

$$X\sim \operatorname{Poisson}(\lambda),\qquad P(X=k)=e^{-\lambda}\frac{\lambda^k}{k!},\quad k=0,1,2,\ldots.$$

A Hypergeometric random variable counts successes in $n$ draws without replacement from a population of size $N$ containing $K$ successes:

$$P(X=k)=\frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}.$$

A Multinomial random vector counts outcomes in $m$ categories across $n$ independent trials:

$$P(X_1=x_1,\ldots,X_m=x_m)=\frac{n!}{x_1!\cdots x_m!}\,p_1^{x_1}\cdots p_m^{x_m},$$

where $\sum_i x_i=n$ and $\sum_i p_i=1$.
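To make the multinomial formula concrete, here is a small sketch that evaluates it directly. The function name `multinomial_pmf` and the example probabilities are illustrative assumptions; the number of trials is taken to be the sum of the counts.

```python
from math import factorial, prod


def multinomial_pmf(counts, probs):
    """Probability of observing the given category counts in sum(counts) trials."""
    n = sum(counts)
    # Build n! / (x_1! ... x_m!) with exact integer arithmetic.
    coefficient = factorial(n)
    for x in counts:
        coefficient //= factorial(x)
    return coefficient * prod(p ** x for p, x in zip(probs, counts))


# Three categories, six trials.
print(multinomial_pmf((3, 2, 1), (0.5, 0.3, 0.2)))
# With two categories the formula reduces to the binomial pmf.
print(multinomial_pmf((2, 3), (0.4, 0.6)))
```

The two-category call is a quick sanity check: it should agree with $\binom{5}{2}(0.4)^2(0.6)^3$.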

Key results

| Distribution | Support | Mean | Variance | Typical use |
| --- | --- | --- | --- | --- |
| Bernoulli$(p)$ | $0,1$ | $p$ | $p(1-p)$ | one yes/no trial |
| Binomial$(n,p)$ | $0,\ldots,n$ | $np$ | $np(1-p)$ | successes in fixed independent trials |
| Geometric$(p)$ | $1,2,\ldots$ | $1/p$ | $(1-p)/p^2$ | waiting time to first success |
| NegBin$(r,p)$ | $r,r+1,\ldots$ | $r/p$ | $r(1-p)/p^2$ | waiting time to $r$ successes |
| Poisson$(\lambda)$ | $0,1,\ldots$ | $\lambda$ | $\lambda$ | counts at a rate |
| Hypergeometric$(N,K,n)$ | valid counts | $nK/N$ | $n(K/N)(1-K/N)\frac{N-n}{N-1}$ | without-replacement sampling |
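The mean and variance entries can be spot-checked by summing each probability mass function directly. This is a rough sketch with arbitrarily chosen parameter values; the helper `mean_and_variance` is a hypothetical name, and the infinite supports are truncated at points where the remaining tail is negligible for these parameters.

```python
from math import comb, exp, factorial


def mean_and_variance(pmf, support):
    """Return (mean, variance) by summing a pmf over the given support."""
    mean = sum(k * pmf(k) for k in support)
    second_moment = sum(k * k * pmf(k) for k in support)
    return mean, second_moment - mean ** 2


n, p, lam = 10, 0.3, 2.0
N, K, draws = 30, 6, 5

binomial = mean_and_variance(
    lambda k: comb(n, k) * p ** k * (1 - p) ** (n - k), range(n + 1))
# Truncated supports: the omitted tail mass is far below floating-point precision.
geometric = mean_and_variance(
    lambda k: (1 - p) ** (k - 1) * p, range(1, 400))
poisson = mean_and_variance(
    lambda k: exp(-lam) * lam ** k / factorial(k), range(60))
hypergeometric = mean_and_variance(
    lambda k: comb(K, k) * comb(N - K, draws - k) / comb(N, draws),
    range(draws + 1))

print("binomial:", binomial, "expected:", (n * p, n * p * (1 - p)))
print("geometric:", geometric, "expected:", (1 / p, (1 - p) / p ** 2))
print("poisson:", poisson, "expected:", (lam, lam))
print("hypergeometric:", hypergeometric)
```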

Binomial as a sum. If $X_1,\ldots,X_n$ are independent Bernoulli$(p)$ variables, then

$$X=X_1+\cdots+X_n\sim \operatorname{Binomial}(n,p).$$
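One way to see the sum result without simulation is to convolve the Bernoulli probability mass function with itself $n$ times, since the distribution of a sum of independent variables is the convolution of their distributions. The sketch below does this for small, arbitrarily chosen $n$ and $p$; `convolve` is a hand-written helper, not a library call.

```python
from math import comb


def convolve(a, b):
    """Distribution of the sum of two independent variables with pmfs a and b."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


n, p = 5, 0.3
bernoulli = [1 - p, p]          # P(X=0), P(X=1)

total = [1.0]                   # pmf of the empty sum: always 0
for _ in range(n):
    total = convolve(total, bernoulli)

binomial = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
for k in range(n + 1):
    print(k, round(total[k], 6), round(binomial[k], 6))
```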

Poisson approximation to binomial. If $n$ is large, $p$ is small, and $\lambda=np$ is moderate, then

$$\operatorname{Binomial}(n,p)\approx \operatorname{Poisson}(\lambda).$$

This is useful for rare-event counts.
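The quality of the approximation can be explored numerically by holding $\lambda=np$ fixed while $n$ grows. The sketch below, with an arbitrary choice of $\lambda=2$, records the largest pointwise gap between the two probability mass functions over $k=0,\ldots,10$.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    return exp(-lam) * lam ** k / factorial(k)


lam = 2.0
gaps = []
for n in (10, 100, 1000):
    p = lam / n  # keep np fixed so p shrinks as n grows
    gap = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(11))
    gaps.append(gap)
    print(f"n={n:5d}  p={p:.4f}  largest pmf gap={gap:.6f}")
```

The gap shrinks as $p$ gets smaller, which matches the condition that the approximation suits rare events.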

Hypergeometric versus binomial. The binomial assumes independent trials, which fits sampling with replacement or a very large population. The hypergeometric accounts for dependence created by sampling without replacement.

Memorylessness of the geometric distribution. For $X\sim\operatorname{Geometric}(p)$,

$$P(X>s+t\mid X>s)=P(X>t).$$

After $s$ failures, the remaining waiting-time distribution is unchanged because independent trials restart the same success probability.
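The property follows from the tail probability $P(X>k)=(1-p)^k$, which is the chance that the first $k$ trials all fail: the ratio $(1-p)^{s+t}/(1-p)^s$ equals $(1-p)^t$. A brief numerical check, with arbitrarily chosen values of $p$, $s$, and $t$:

```python
def geometric_tail(k: int, p: float) -> float:
    """P(X > k) for the trial-number convention: the first k trials all fail."""
    return (1 - p) ** k


p = 0.2
checks = []
for s, t in [(1, 1), (3, 4), (10, 2)]:
    conditional = geometric_tail(s + t, p) / geometric_tail(s, p)
    unconditional = geometric_tail(t, p)
    checks.append((conditional, unconditional))
    print(f"s={s}, t={t}: conditional={conditional:.6f}, unconditional={unconditional:.6f}")
```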

The assumptions behind each distribution are part of the model. A binomial distribution needs a fixed number of trials, two outcome classes, constant success probability, and independence. A Poisson distribution needs a rate interpretation and is most natural when events in disjoint intervals are approximately independent. A hypergeometric distribution deliberately models dependence between draws, because the composition of the remaining population changes after each draw. Checking these assumptions is usually more important than recognizing the formula.

There are also parameterization traps. Negative binomial distributions may count trials until the $r$-th success or failures before the $r$-th success. Geometric distributions have the same convention issue. Software libraries differ, so read the documentation and verify the support by calculating a simple probability such as the probability of success on the first trial.
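The verification step described above can be sketched as a small probe: evaluate an unfamiliar geometric pmf at $0$ and $1$ and see which value equals $p$. The two `library_*_pmf` functions are hypothetical stand-ins for functions from real packages, and `classify_geometric_convention` is an illustrative helper name.

```python
from math import isclose


def classify_geometric_convention(pmf, p=0.25):
    """Guess a geometric pmf's convention from its values at 0 and 1."""
    at_zero, at_one = pmf(0, p), pmf(1, p)
    if isclose(at_zero, 0.0, abs_tol=1e-12) and isclose(at_one, p):
        return "trials"      # support starts at 1: trial number of first success
    if isclose(at_zero, p):
        return "failures"    # support starts at 0: failures before first success
    return "unrecognized"


# Hypothetical library functions, one per convention.
def library_a_pmf(k, p):
    return (1 - p) ** (k - 1) * p if k >= 1 else 0.0


def library_b_pmf(k, p):
    return (1 - p) ** k * p if k >= 0 else 0.0


print("library A:", classify_geometric_convention(library_a_pmf))
print("library B:", classify_geometric_convention(library_b_pmf))
```

The same idea carries over to the negative binomial: probing the smallest supported value shows whether the support starts at $r$ or at $0$.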

Approximation is another modeling decision. A hypergeometric distribution can be approximated by a binomial distribution when the sample is small compared with the population, because removing a few items barely changes the success probability. A binomial distribution can be approximated by a Poisson distribution when successes are rare and $np$ is moderate. These approximations are useful, but the exact distribution should remain clear.
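The hypergeometric-to-binomial approximation can be illustrated by fixing the sample size and the population's success fraction while letting the population grow. This is a sketch with arbitrary values (a sample of 5 and a success fraction of one fifth), not a general rule for when the approximation is acceptable.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def binomial_pmf(k, n, p):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


n = 5
gaps = []
for N in (30, 300, 3000):
    K = N // 5               # one fifth of the population counts as a success
    p = K / N
    gap = max(abs(hypergeom_pmf(k, N, K, n) - binomial_pmf(k, n, p))
              for k in range(n + 1))
    gaps.append(gap)
    print(f"N={N:5d}  sample fraction={n / N:.4f}  largest pmf gap={gap:.6f}")
```

As the sample becomes a smaller share of the population, the two models give nearly the same probabilities.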

Visual

| Model cue | Distribution to try | Red flag |
| --- | --- | --- |
| "exactly $k$ successes in $n$ trials" | Binomial | probabilities change across trials |
| "draw $n$ from a finite population" | Hypergeometric | replacement is actually used |
| "first success occurs on trial $k$" | Geometric | trials are not independent |
| "third success occurs on trial $k$" | Negative binomial | $r$ successes not specified |
| "calls per hour" or "defects per meter" | Poisson | events cluster strongly |

Worked example 1: binomial and Poisson approximation

Problem. A manufacturing process produces a defective part with probability $0.01$, independently from part to part. In a batch of $200$ parts, find the probability of exactly $3$ defective parts using the binomial model, then approximate it with a Poisson distribution.

Method.

  1. Let $X$ be the number of defective parts. A fixed number of independent parts is inspected, each with the same defect probability. Thus
$$X\sim \operatorname{Binomial}(200,0.01).$$
  2. The exact probability is
$$P(X=3)=\binom{200}{3}(0.01)^3(0.99)^{197}.$$
  3. Compute the combination:
$$\binom{200}{3}=\frac{200\cdot199\cdot198}{3\cdot2\cdot1}=1313400.$$
  4. Substitute:
$$P(X=3)=1313400(0.000001)(0.99)^{197}.$$

Since $(0.99)^{197}\approx 0.13808$,

$$P(X=3)\approx 1313400(0.000001)(0.13808)\approx 0.1814.$$
  5. For the Poisson approximation, use $\lambda=np=200(0.01)=2$:
$$P(Y=3)=e^{-2}\frac{2^3}{3!}=e^{-2}\frac{8}{6}\approx 0.1804.$$

Checked answer. The exact binomial probability is about $0.1814$, and the Poisson approximation is about $0.1804$. The approximation is close because $p$ is small and $n$ is large.

Worked example 2: hypergeometric sampling

Problem. A lot contains $30$ components, of which $6$ are faulty. An inspector samples $5$ components without replacement. What is the probability that exactly $2$ sampled components are faulty?

Method.

  1. The sample is without replacement from a finite population, so use a hypergeometric model:
$$N=30,\quad K=6,\quad n=5,\quad k=2.$$
  2. Count favorable samples:

    • choose $2$ faulty components from $6$;
    • choose $3$ good components from $24$.

    The favorable count is

$$\binom{6}{2}\binom{24}{3}.$$
  3. Count all possible samples:
$$\binom{30}{5}.$$
  4. Form the probability:
$$P(X=2)=\frac{\binom{6}{2}\binom{24}{3}}{\binom{30}{5}}.$$
  5. Compute:
$$\binom{6}{2}=15,\quad \binom{24}{3}=2024,\quad \binom{30}{5}=142506.$$

Therefore

$$P(X=2)=\frac{15\cdot 2024}{142506}=\frac{30360}{142506}\approx 0.2130.$$
  6. Check reasonableness. The expected number is
$$n\frac{K}{N}=5\cdot\frac{6}{30}=1.$$

Exactly $2$ faulty components is above the mean but not extreme.

Checked answer. The probability is approximately $0.2130$.
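The reasonableness check can be taken a step further by tabulating the whole distribution for this lot. The sketch below uses exact fractions so that the total probability and the mean come out without rounding; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

N, K, n = 30, 6, 5  # lot size, faulty components, sample size

# Exact pmf for every possible number of faulty components in the sample.
pmf = {
    k: Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))
    for k in range(n + 1)
}

total = sum(pmf.values())
mean = sum(k * prob for k, prob in pmf.items())

for k, prob in pmf.items():
    print(f"P(X={k}) = {float(prob):.4f}")
print("total probability:", total)
print("mean:", mean)
```

The probabilities sum to one and the mean equals $nK/N$, so the value at $k=2$ sits one unit above the center of the distribution.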

Code

from math import comb
from scipy.stats import binom, poisson, hypergeom, nbinom, geom

# Example 1: binomial and Poisson approximation.
n, p, k = 200, 0.01, 3
exact = binom.pmf(k, n, p)
approx = poisson.pmf(k, n * p)
print("binomial exact:", exact)
print("poisson approximation:", approx)

# Example 2: hypergeometric.
N, K, draws, observed = 30, 6, 5, 2
manual = comb(K, observed) * comb(N - K, draws - observed) / comb(N, draws)
# scipy's argument order is (k, population size, successes in population, draws).
library = hypergeom.pmf(observed, N, K, draws)
print("hypergeometric manual:", manual)
print("hypergeometric scipy:", library)

# Geometric and negative binomial conventions in scipy:
# geom counts trial number of first success; nbinom counts failures before r successes.
print("P(first success on trial 4):", geom.pmf(4, 0.25))
print("P(2 failures before 3 successes):", nbinom.pmf(2, 3, 0.25))

Common pitfalls

  • Using binomial for sampling without replacement from a small population. Use hypergeometric unless replacement or approximate independence is justified.
  • Mixing geometric conventions. Some formulas count the trial of first success; others count failures before first success.
  • Forgetting that Poisson mean and variance are both $\lambda$. If data are much more variable, a Poisson model may be too restrictive.
  • Treating multinomial category counts as independent. The counts sum to $n$, so increasing one count forces others down.
  • Using a probability such as $p=0.01$ as if it were a rate $\lambda$. A probability is unitless and bounded by $1$; a rate depends on interval length.
  • Rounding early in factorial-heavy calculations. Use exact combinations or software for large counts.
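The dependence between multinomial counts can be made explicit by enumerating every outcome of a small example and computing the covariance of two category counts. The parameters below ($n=4$ and three categories) are arbitrary; for the multinomial, the covariance of two distinct counts is $-np_ip_j$, so the result should be negative.

```python
from math import factorial

n = 4
probs = (0.5, 0.3, 0.2)


def trinomial_pmf(x1, x2, x3):
    """Multinomial pmf for three categories with counts summing to n."""
    coefficient = factorial(n) // (factorial(x1) * factorial(x2) * factorial(x3))
    return coefficient * probs[0] ** x1 * probs[1] ** x2 * probs[2] ** x3


# Every way to split n trials across three categories.
outcomes = [(x1, x2, n - x1 - x2)
            for x1 in range(n + 1)
            for x2 in range(n + 1 - x1)]

mean1 = sum(x1 * trinomial_pmf(x1, x2, x3) for x1, x2, x3 in outcomes)
mean2 = sum(x2 * trinomial_pmf(x1, x2, x3) for x1, x2, x3 in outcomes)
mean_product = sum(x1 * x2 * trinomial_pmf(x1, x2, x3) for x1, x2, x3 in outcomes)
covariance = mean_product - mean1 * mean2

print("Cov(X1, X2) by enumeration:", covariance)
print("-n * p1 * p2:", -n * probs[0] * probs[1])
```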

Connections