
Random Variables and Probability Distributions

A random variable turns uncertain outcomes into numbers. Once outcomes are numeric, we can describe their long-run behavior with a probability distribution, compute expected values, and model real processes such as successes in repeated trials, arrivals in a time interval, or draws from a finite collection. The Lane text introduces the binomial, Poisson, multinomial, and hypergeometric distributions as probability models that later support inference.

The art is matching assumptions to context. A binomial model is natural for a fixed number of independent success/failure trials with constant success probability. A Poisson model is natural for counts of rare events in a fixed interval when events occur independently at a stable rate. A hypergeometric model is natural for sampling without replacement. When the model assumptions fail, the formula may still produce a number, but the number no longer answers the intended question.

Definitions

A random variable X assigns a numerical value to each outcome of a random process. A discrete random variable has countably many possible values, such as 0, 1, 2, and so on. A continuous random variable has possible values over intervals, such as time, length, or measurement error.

A probability mass function for a discrete random variable gives

P(X = x)

for each possible value x. The probabilities must be nonnegative and sum to 1. A cumulative distribution function is

F(x) = P(X \le x).

For continuous variables, probabilities are areas under a density curve rather than heights at individual points. For a continuous random variable, P(X = a) = 0 for any exact value a, even though intervals can have positive probability.
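This distinction is easy to check numerically. The sketch below, assuming SciPy is available, uses a standard normal density purely as an illustration: an interval probability is a difference of CDF values, and a zero-width interval has probability zero.

```python
from scipy.stats import norm

# Interval probability: area under the standard normal density on [-1, 1]
interval = norm.cdf(1.0) - norm.cdf(-1.0)

# "Probability of an exact value" as a zero-width interval: the area vanishes
point = norm.cdf(0.5) - norm.cdf(0.5)

print("P(-1 <= X <= 1):", round(interval, 4))  # about 0.6827
print("P(X = 0.5):", point)                    # exactly 0.0
```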

The expected value or mean of a discrete random variable is

E(X) = \sum_x x \, P(X = x).

The variance is

\mathrm{Var}(X) = E[(X - \mu)^2],

where \mu = E(X). The standard deviation is \sqrt{\mathrm{Var}(X)}.
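These definitions translate directly into code. The sketch below uses a small made-up probability mass function (the values are illustrative, not from the text) and computes the mean, variance, and standard deviation from the formulas above:

```python
# Hypothetical pmf for a discrete random variable X; probabilities sum to 1
pmf = {0: 0.2, 1: 0.5, 2: 0.3}

mean = sum(x * p for x, p in pmf.items())               # E(X) = sum of x * P(X = x)
var = sum((x - mean) ** 2 * p for x, p in pmf.items())  # Var(X) = E[(X - mu)^2]
sd = var ** 0.5

print("mean:", round(mean, 4))      # 0(0.2) + 1(0.5) + 2(0.3) = 1.1
print("variance:", round(var, 4))   # 0.49
print("sd:", round(sd, 4))          # 0.7
```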

A Bernoulli trial has two outcomes, usually called success and failure, with success probability p. If X is 1 for success and 0 for failure, then X has a Bernoulli distribution with E(X) = p and \mathrm{Var}(X) = p(1-p).

If X counts successes in n independent Bernoulli trials with constant success probability p, then X has a binomial distribution:

P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, 1, \dots, n.
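The formula can be checked against a library implementation. This sketch, with arbitrary illustrative values n = 10, p = 0.3, k = 4 and assuming SciPy is available, computes the pmf by hand and compares it with scipy.stats.binom:

```python
from math import comb
from scipy.stats import binom

n, p, k = 10, 0.3, 4  # illustrative values

manual = comb(n, k) * p**k * (1 - p)**(n - k)  # binomial pmf from the formula
library = binom.pmf(k, n, p)                   # same quantity via SciPy

print("manual:", round(manual, 6))   # about 0.200121
print("scipy :", round(library, 6))  # matches the hand computation
```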

A Poisson distribution with rate \lambda models counts in a fixed interval:

P(X = k) = e^{-\lambda} \frac{\lambda^k}{k!}, \quad k = 0, 1, 2, \dots.

For a Poisson random variable, E(X) = \lambda and \mathrm{Var}(X) = \lambda.
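Both facts can be verified numerically by summing the pmf over enough terms that the truncated tail is negligible. A sketch with an illustrative rate of 4:

```python
from math import exp, factorial

lam = 4.0  # illustrative rate

# Poisson pmf values for k = 0, 1, ..., 99; the tail beyond is negligible
pmf = [exp(-lam) * lam**k / factorial(k) for k in range(100)]

mean = sum(k * p for k, p in enumerate(pmf))
var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))

print("total probability:", round(sum(pmf), 6))        # ~1.0
print("mean:", round(mean, 6), "var:", round(var, 6))  # both ~4.0
```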

Key results

The binomial distribution has mean and variance

E(X) = np, \quad \mathrm{Var}(X) = np(1-p).

These results match the idea that X is the sum of n independent Bernoulli variables. Expected values add, and independent variances add. If X = X_1 + \cdots + X_n and each X_i is Bernoulli(p), then

E(X) = E(X_1) + \cdots + E(X_n) = np.
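The same additivity can be seen by simulation. This sketch (standard library only; the replicate count is arbitrary) repeatedly draws 20 Bernoulli(0.12) indicators, sums each replicate, and compares the average total with np:

```python
import random

random.seed(0)  # reproducible illustration
n, p, reps = 20, 0.12, 50_000

# Each replicate is a sum of n independent Bernoulli(p) indicators
totals = [sum(random.random() < p for _ in range(n)) for _ in range(reps)]

sim_mean = sum(totals) / reps
print("simulated mean:", round(sim_mean, 3), "vs np =", n * p)  # both near 2.4
```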

The hypergeometric distribution models the number of successes in n draws without replacement from a population of N objects containing K successes:

P(X = k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}.

This distribution differs from the binomial because the draws are dependent: after one success is drawn, fewer successes remain.

The multinomial distribution generalizes the binomial to more than two categories. If n independent trials fall into c categories with probabilities p_1, \dots, p_c, then the counts X_1, \dots, X_c have probability

P(X_1 = x_1, \dots, X_c = x_c) = \frac{n!}{x_1! \cdots x_c!} p_1^{x_1} \cdots p_c^{x_c},

where x_1 + \cdots + x_c = n.
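SciPy exposes this distribution directly. A sketch with made-up survey numbers (10 respondents, three options with probabilities 0.5, 0.3, 0.2; the counts are illustrative):

```python
from scipy.stats import multinomial

n = 10                   # trials
probs = [0.5, 0.3, 0.2]  # category probabilities, summing to 1
counts = [5, 3, 2]       # observed counts, summing to n

prob = multinomial.pmf(counts, n=n, p=probs)
# Matches 10!/(5! 3! 2!) * 0.5^5 * 0.3^3 * 0.2^2 = 0.08505
print("P(X1=5, X2=3, X3=2):", round(float(prob), 5))
```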

The Poisson distribution can approximate a binomial distribution when n is large, p is small, and \lambda = np is moderate. This approximation is useful for rare-event counts, but it should not hide the assumptions: events should occur independently and at a stable average rate across the interval.
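The quality of the approximation is easy to inspect. This sketch (illustrative values n = 1000 and p = 0.004, so \lambda = np = 4, assuming SciPy is available) prints both pmfs side by side for small counts:

```python
from scipy.stats import binom, poisson

n, p = 1000, 0.004  # large n, small p
lam = n * p         # lambda = np = 4.0

for k in range(7):
    b = binom.pmf(k, n, p)    # exact binomial probability
    po = poisson.pmf(k, lam)  # Poisson approximation
    print(f"k={k}: binomial={b:.5f}  poisson={po:.5f}")
```

The two columns agree to roughly three decimal places here; the agreement degrades as p grows or as the independence and constant-rate assumptions fail.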

Distribution choice is also a modeling claim. If a call center receives more calls during lunch than at midnight, a single Poisson rate for the whole day may be too crude even if the total count is a nonnegative integer. If survey responses are clustered by classroom, a binomial model that treats every student response as independent may underestimate variability. If a quality inspector samples a large warehouse without replacement but the sample is tiny relative to the warehouse, a binomial approximation may be acceptable. The formulas become useful only after the data-generating process has been described clearly enough to defend the assumptions.

Visual

| Distribution | Random variable | Parameters | Mean | Variance | Typical setting |
| --- | --- | --- | --- | --- | --- |
| Bernoulli | one success/failure trial | p | p | p(1-p) | one yes/no outcome |
| Binomial | successes in fixed n trials | n, p | np | np(1-p) | independent repeated trials |
| Poisson | events in interval | λ | λ | λ | arrivals, rare counts |
| Hypergeometric | successes without replacement | N, K, n | nK/N | n(K/N)(1-K/N)(N-n)/(N-1) | finite sampling |
| Multinomial | counts in several categories | n, p_1, …, p_c | np_j for category j | np_j(1-p_j) for category j | survey choices |

Worked example 1: Binomial probability

Problem: A website experiment has a historical conversion probability of p = 0.12. Suppose 20 independent visitors see a page. Let X be the number who convert. Find P(X = 3), P(X \le 1), and the mean and standard deviation.

Method:

  1. Identify the model. There are n = 20 fixed trials, each visitor either converts or does not, and the problem assumes independent visitors with constant p = 0.12. Thus X \sim \mathrm{Binomial}(20, 0.12).
  2. Compute P(X = 3):
P(X = 3) = \binom{20}{3}(0.12)^3(0.88)^{17}.
  3. Evaluate the combination:
\binom{20}{3} = \frac{20!}{3!\,17!} = 1140.
  4. Substitute:
P(X = 3) = 1140(0.001728)(0.1138) \approx 0.224.
  5. Compute P(X \le 1):
P(X \le 1) = P(X = 0) + P(X = 1).
  6. Calculate the terms:
P(X = 0) = (0.88)^{20} \approx 0.0776, \quad P(X = 1) = \binom{20}{1}(0.12)(0.88)^{19} \approx 0.2115.
  7. Add:
P(X \le 1) \approx 0.0776 + 0.2115 = 0.2891.
  8. Mean and standard deviation:
E(X) = np = 20(0.12) = 2.4, \quad \sigma = \sqrt{np(1-p)} = \sqrt{20(0.12)(0.88)} \approx 1.45.

Answer: P(X = 3) \approx 0.224, P(X \le 1) \approx 0.289, the expected number of conversions is 2.4, and the standard deviation is about 1.45.

Checked answer: A value of 3 conversions is near the expected value 2.4, so a probability around 0.22 is plausible. Zero or one conversion is below average but not rare with only 20 visitors.

Worked example 2: Hypergeometric versus binomial

Problem: A shipment contains 50 devices, 6 of which are defective. An inspector selects 5 devices without replacement. What is the probability that exactly 2 are defective? Why is a binomial model not exact?

Method:

  1. Identify the parameters: N = 50 total devices, K = 6 defectives, n = 5 draws, and k = 2 defective draws.
  2. Use the hypergeometric formula:
P(X = 2) = \frac{\binom{6}{2}\binom{44}{3}}{\binom{50}{5}}.
  3. Compute the pieces:
\binom{6}{2} = 15, \quad \binom{44}{3} = \frac{44 \cdot 43 \cdot 42}{3 \cdot 2 \cdot 1} = 13244, \quad \binom{50}{5} = 2118760.
  4. Substitute:
P(X = 2) = \frac{15(13244)}{2118760} = \frac{198660}{2118760} \approx 0.0938.

Answer: The exact probability is about 0.094. A binomial model with p = 6/50 = 0.12 would treat each draw as independent with constant defect probability. That is not exact because after a defective device is drawn, only 5 defectives remain among 49 devices; the probability changes.

Checked answer: The numerator counts ways to choose 2 defectives and 3 nondefectives. The denominator counts all possible sets of 5 devices, so the ratio matches the sampling mechanism.

Code

from scipy.stats import binom, hypergeom, poisson

# Binomial conversion example
n, p = 20, 0.12
print("P(X=3):", binom.pmf(3, n, p))
print("P(X<=1):", binom.cdf(1, n, p))
print("mean:", binom.mean(n, p), "sd:", binom.std(n, p))

# Hypergeometric inspection example
# SciPy's hypergeom takes (k, M, n, N): population size M, successes in the
# population n, and sample size N -- a different convention from the text.
N, K, draws = 50, 6, 5
print("P(exactly 2 defective):", hypergeom.pmf(2, N, K, draws))

# Poisson rare-event example: average 4 calls per hour
print("P(6 calls in an hour):", poisson.pmf(6, mu=4))

SciPy uses pmf for discrete probability mass functions and cdf for cumulative probabilities. Naming the parameters in comments is useful because different distributions use different conventions.

Common pitfalls

  • Using a binomial model for sampling without replacement from a small finite population.
  • Forgetting that "at most 1" means P(X = 0) + P(X = 1).
  • Treating expected value as the most likely exact outcome. The expectation can be non-integer.
  • Using the Poisson distribution for counts whose rate changes sharply over time or space.
  • Ignoring dependence among trials, such as multiple purchases by the same customer.
  • Rounding intermediate probabilities too aggressively.

Connections