Bernoulli, Binomial, Geometric, and Negative Binomial Laws
Repeated independent trials are the simplest laboratory for random variables. A single trial with success probability $p$ gives a Bernoulli random variable. Counting successes in a fixed number of trials gives a binomial random variable. Waiting for the first success gives a geometric random variable. Waiting for the $r$th success gives a negative binomial random variable.
These distributions appear early in MIT 18.440 because they connect counting, independence, expectation, variance, and recursion. They also teach a useful modeling distinction: sometimes the number of trials is fixed and the number of successes is random; sometimes the number of successes is fixed and the waiting time is random.
Figure: Binomial probability mass function for repeated Bernoulli trials. Image: Wikimedia Commons, Tayste, public domain.
Definitions
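A Bernoulli random variable $X$ with parameter $p$ takes values $1$ and $0$ with
$$P(X=1)=p, \qquad P(X=0)=1-p.$$
A binomial random variable $X$ with parameters $(n,p)$ counts successes in $n$ independent Bernoulli trials:
$$P(X=k)=\binom{n}{k}p^k(1-p)^{n-k}, \qquad k=0,1,\dots,n.$$
A geometric random variable $X$ with parameter $p$ is the trial number of the first success:
$$P(X=k)=(1-p)^{k-1}p, \qquad k=1,2,\dots$$
This convention counts the successful trial itself. Some books use the number of failures before the first success; then the support is $\{0,1,2,\dots\}$.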
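The two geometric conventions differ only by a shift of one. The sketch below illustrates this with exact fractions; the function names `geom_trials_pmf` and `geom_failures_pmf` and the value $p=1/3$ are hypothetical choices for illustration, not part of the course material.

```python
from fractions import Fraction

def geom_trials_pmf(p, k):
    """P(X = k): the first success occurs on trial k, for k = 1, 2, ..."""
    return (1 - p) ** (k - 1) * p if k >= 1 else 0

def geom_failures_pmf(p, j):
    """P(Y = j): j failures occur before the first success, for j = 0, 1, ..."""
    return (1 - p) ** j * p if j >= 0 else 0

p = Fraction(1, 3)  # illustrative value (an assumption)
# The conventions are related by Y = X - 1, so the PMFs agree after a shift.
print(all(geom_failures_pmf(p, j) == geom_trials_pmf(p, j + 1) for j in range(20)))
```

Under the failures convention the mean drops by one as well, from $1/p$ to $(1-p)/p$, while the variance is unchanged.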
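A negative binomial random variable $X$ with parameters $(r,p)$ is the trial number of the $r$th success:
$$P(X=k)=\binom{k-1}{r-1}p^r(1-p)^{k-r}, \qquad k=r,r+1,\dots$$
The factor $\binom{k-1}{r-1}$ chooses the locations of the first $r-1$ successes among the first $k-1$ trials; the $k$th trial must be a success.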
Key results
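For Bernoulli $X$ with parameter $p$,
$$E[X]=p, \qquad \operatorname{Var}(X)=p(1-p).$$
This follows because $X^2=X$, so $E[X^2]=p$ and $\operatorname{Var}(X)=p-p^2$.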
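For $X\sim\operatorname{Binomial}(n,p)$,
$$E[X]=np, \qquad \operatorname{Var}(X)=np(1-p).$$
Proof sketch: write $X=I_1+\cdots+I_n$, where the $I_j$ are independent Bernoulli indicators with parameter $p$. Linearity gives $E[X]=\sum_{j=1}^n E[I_j]=np$. Independence gives
$$\operatorname{Var}(X)=\sum_{j=1}^n \operatorname{Var}(I_j)=np(1-p).$$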
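As a sanity check on the binomial mean $np$ and variance $np(1-p)$, the following sketch computes both moments directly from the PMF with exact fractions. The helper name `binomial_moments` and the parameter values $n=10$, $p=3/10$ are hypothetical, chosen only for illustration.

```python
from fractions import Fraction
from math import comb

def binomial_moments(n, p):
    """Return (mean, variance) of Binomial(n, p), summed directly from the PMF."""
    p = Fraction(p)
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(k * w for k, w in enumerate(pmf))
    second_moment = sum(k * k * w for k, w in enumerate(pmf))
    return mean, second_moment - mean**2

n, p = 10, Fraction(3, 10)  # illustrative values (an assumption)
mean, var = binomial_moments(n, p)
# Compare with the closed forms np and np(1 - p).
print(mean == n * p, var == n * p * (1 - p))
```

Exact rational arithmetic is used so that the comparison is an equality rather than a floating-point approximation.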
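For $X\sim\operatorname{Geometric}(p)$,
$$E[X]=\frac{1}{p}, \qquad \operatorname{Var}(X)=\frac{1-p}{p^2}.$$
A short expectation proof uses conditioning on the first trial. Let $\mu=E[X]$. With probability $p$, $X=1$. With probability $1-p$, one trial is used and the remaining waiting time has the same distribution as $X$. Thus
$$\mu=p\cdot 1+(1-p)(1+\mu),$$
so $\mu=1/p$.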
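The first-step recursion $\mu = p\cdot 1 + (1-p)(1+\mu)$ can be checked in two ways: by substituting $\mu = 1/p$ exactly, and by summing a truncated version of the series $\sum_k k(1-p)^{k-1}p$. This is a minimal sketch; the function name `geometric_mean_truncated`, the cutoff of 2000 terms, and the value $p = 1/5$ are assumptions made for illustration.

```python
from fractions import Fraction

def geometric_mean_truncated(p, terms=2000):
    """Approximate E[X] = sum_k k (1-p)^(k-1) p by truncating the series."""
    q = 1 - p
    return sum(k * q ** (k - 1) * p for k in range(1, terms + 1))

p = Fraction(1, 5)  # illustrative value (an assumption)
mu = 1 / p          # candidate solution of the recursion
# The recursion mu = p*1 + (1-p)*(1+mu) holds exactly for mu = 1/p.
print(mu == p * 1 + (1 - p) * (1 + mu))
# The truncated series in floating point agrees with 1/p to high accuracy.
print(abs(geometric_mean_truncated(0.2) - 5.0) < 1e-9)
```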
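For $X\sim\operatorname{NegativeBinomial}(r,p)$,
$$E[X]=\frac{r}{p}, \qquad \operatorname{Var}(X)=\frac{r(1-p)}{p^2}.$$
This is because the waiting time for $r$ successes is a sum of $r$ independent geometric waiting times, one for each successive success.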
The geometric distribution has the memoryless property:
$$P(X>m+n \mid X>m)=P(X>n), \qquad m,n\ge 0.$$
In words, after $m$ failures, the remaining waiting time has the same distribution as the original waiting time.
The binomial and negative binomial distributions are complementary views of the same repeated-trial process. In the binomial model, time is fixed and the number of successes is random. In the negative binomial model, the number of successes is fixed and the time required is random. Many word problems can be solved only after identifying which of these two quantities is being held fixed.
For the binomial distribution, the coefficient $\binom{n}{k}$ counts locations of the $k$ successes. It does not count different probabilities; every sequence with $k$ successes and $n-k$ failures has probability $p^k(1-p)^{n-k}$. The coefficient appears because there are $\binom{n}{k}$ such sequences. This is the direct link back to the binomial theorem:
$$\sum_{k=0}^{n}\binom{n}{k}p^k(1-p)^{n-k}=\bigl(p+(1-p)\bigr)^n=1.$$
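The counting argument can be illustrated by brute force for a small case: enumerate every success/failure sequence, group by the number of successes, and add up the probabilities. The values $n=6$ and $p=1/3$ below are assumptions chosen for illustration; an unfair coin is used so that the equal-probability claim is not hidden by symmetry.

```python
from fractions import Fraction
from itertools import product
from math import comb

n, p = 6, Fraction(1, 3)  # illustrative values (an assumption)

counts = [0] * (n + 1)    # counts[k] = number of sequences with k successes
total = Fraction(0)       # total probability over all 2^n sequences
for seq in product((0, 1), repeat=n):
    k = sum(seq)
    counts[k] += 1
    # Every sequence with k successes has the same probability.
    total += p**k * (1 - p) ** (n - k)

print(counts == [comb(n, k) for k in range(n + 1)])
print(total == 1)
```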
For the geometric distribution, the tail probability is especially simple:
$$P(X>k)=(1-p)^k.$$
This says the first $k$ trials all failed. The tail form is often easier than summing the PMF, and it makes memorylessness immediate:
$$P(X>m+n \mid X>m)=\frac{(1-p)^{m+n}}{(1-p)^m}=(1-p)^n=P(X>n).$$
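A small exact check of the geometric tail formula and of memorylessness is sketched below. The tail is computed as one minus the partial sum of the PMF, so the check does not assume the closed form it verifies. The helper name `geometric_tail` and the values $p=1/4$, $m=3$, $n=5$ are hypothetical.

```python
from fractions import Fraction

def geometric_tail(p, k):
    """P(X > k) for the trial-counting geometric law, computed as 1 - CDF."""
    q = 1 - p
    return 1 - sum(q ** (j - 1) * p for j in range(1, k + 1))

p = Fraction(1, 4)  # illustrative value (an assumption)
m, n = 3, 5
# Tail formula: P(X > k) = (1 - p)^k.
print(geometric_tail(p, m) == (1 - p) ** m)
# Memorylessness: P(X > m + n | X > m) = P(X > n).
conditional = geometric_tail(p, m + n) / geometric_tail(p, m)
print(conditional == geometric_tail(p, n))
```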
Negative binomial waiting times can be decomposed as
$$X=T_1+T_2+\cdots+T_r,$$
where the $T_i$ are independent geometric waiting times for successive successes. After each success, the future sequence of trials has the same distribution as a fresh sequence. This decomposition gives the expectation and variance quickly, and it also explains why the negative binomial is a discrete analogue of the gamma distribution, which is a sum of exponential waiting times.
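The decomposition into geometric waiting times can be checked numerically by convolving geometric PMFs and comparing the result with the negative binomial PMF. This is a sketch under stated assumptions: the helper names `geometric_pmf` and `convolve`, the truncation point `kmax = 12`, and the values $r=3$, $p=1/5$ are chosen for illustration.

```python
from fractions import Fraction
from math import comb

def geometric_pmf(p, kmax):
    """List of P(T = k) for k = 0..kmax; index 0 carries zero mass."""
    return [Fraction(0)] + [(1 - p) ** (k - 1) * p for k in range(1, kmax + 1)]

def convolve(a, b):
    """PMF of the sum of two independent variables, for indices 0..len(a)-1."""
    return [sum(a[i] * b[k - i] for i in range(k + 1)) for k in range(len(a))]

p, r, kmax = Fraction(1, 5), 3, 12  # illustrative values (an assumption)
geom = geometric_pmf(p, kmax)
total = geom
for _ in range(r - 1):
    total = convolve(total, geom)   # add one more geometric waiting time

negbin = [comb(k - 1, r - 1) * p**r * (1 - p) ** (k - r) if k >= r else 0
          for k in range(kmax + 1)]
print(total == negbin)
```

Truncating at `kmax` loses nothing for the indices shown, because the probability that a sum equals $k$ only involves summands of size at most $k$.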
The parameter $p$ should be tied to a clearly defined success event. A "success" might mean heads, a defective item, a goal, or a baby crying in a particular minute. Changing the success definition changes $p$ and may change whether trials are plausibly independent.
Visual
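| Distribution | Random quantity | Support | PMF | Mean | Variance |
|---|---|---|---|---|---|
| Bernoulli | one success indicator | $\{0,1\}$ | $p^x(1-p)^{1-x}$ | $p$ | $p(1-p)$ |
| Binomial | successes in $n$ trials | $\{0,1,\dots,n\}$ | $\binom{n}{k}p^k(1-p)^{n-k}$ | $np$ | $np(1-p)$ |
| Geometric | trial of first success | $\{1,2,\dots\}$ | $(1-p)^{k-1}p$ | $1/p$ | $(1-p)/p^2$ |
| Negative binomial | trial of $r$th success | $\{r,r+1,\dots\}$ | $\binom{k-1}{r-1}p^r(1-p)^{k-r}$ | $r/p$ | $r(1-p)/p^2$ |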
Read the table by first identifying the random quantity. "How many successes occur in ten trials?" points to the binomial row because the trial count is fixed. "How long until the first success?" points to the geometric row because the stopping time is random. "How long until the fifth success?" points to the negative binomial row. This classification step is often more important than the algebra, because the formulas can look similar but answer different questions.
Worked example 1: six fair coin tosses
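Problem: Toss a fair coin $6$ times. Let $X$ be the number of heads. Compute $P(X=4)$, $E[X]$, and $\operatorname{Var}(X)$.
Method:
- The number of heads in $6$ independent tosses is binomial with $n=6$ and $p=1/2$.
- The probability of exactly $4$ heads is
  $$P(X=4)=\binom{6}{4}\left(\tfrac{1}{2}\right)^4\left(\tfrac{1}{2}\right)^2.$$
- Compute:
  $$\binom{6}{4}=15, \qquad \left(\tfrac{1}{2}\right)^6=\tfrac{1}{64}.$$
  Thus
  $$P(X=4)=\frac{15}{64}\approx 0.2344.$$
- The expectation is
  $$E[X]=np=6\cdot\tfrac{1}{2}=3.$$
- The variance is
  $$\operatorname{Var}(X)=np(1-p)=6\cdot\tfrac{1}{2}\cdot\tfrac{1}{2}=1.5.$$
Checked answer: $P(X=4)=15/64\approx 0.234$, and the mean $3$ is exactly the symmetry center of the distribution.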
Worked example 2: waiting for the third success
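Problem: Independent trials have success probability $p=0.2$. What is the probability that the third success occurs on trial $10$? What is the expected trial number of the third success?
Method:
- Let $X$ be the trial number of the third success. Then $X$ is negative binomial with $r=3$ and $p=0.2$.
- If the third success occurs on trial $10$, then among the first $9$ trials there must be exactly $2$ successes, and trial $10$ itself must be a success.
- Therefore
  $$P(X=10)=\binom{9}{2}(0.2)^3(0.8)^7.$$
- Compute step by step:
  $$\binom{9}{2}=36, \qquad (0.2)^3=0.008, \qquad (0.8)^7=0.2097152.$$
- Hence
  $$P(X=10)=36\cdot 0.008\cdot 0.2097152\approx 0.0604.$$
- The expectation is
  $$E[X]=\frac{r}{p}=\frac{3}{0.2}=15.$$
Checked answer: trial $10$ is earlier than the mean waiting time $15$, but it has nonnegligible probability because several successes can occur before the typical time.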
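A seeded Monte Carlo simulation gives an independent check of this example. The sketch below runs the trials directly rather than using the negative binomial formula; the helper name `trial_of_rth_success`, the seed, and the sample size of 20000 are arbitrary assumptions, and the estimates should only be expected to land near $0.0604$ and $15$, not to match them exactly.

```python
import random

def trial_of_rth_success(r, p, rng):
    """Run Bernoulli(p) trials until the r-th success; return the trial number."""
    successes = 0
    trials = 0
    while successes < r:
        trials += 1
        if rng.random() < p:
            successes += 1
    return trials

rng = random.Random(18440)  # fixed seed, chosen arbitrarily for reproducibility
samples = [trial_of_rth_success(3, 0.2, rng) for _ in range(20000)]
freq_10 = sum(1 for x in samples if x == 10) / len(samples)
mean = sum(samples) / len(samples)
print("estimated P(X = 10):", freq_10)
print("estimated E[X]:", mean)
```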
Code
    from math import comb

    def binomial_pmf(n, p, k):
        """P(X = k) for X ~ Binomial(n, p)."""
        return comb(n, k) * (p ** k) * ((1 - p) ** (n - k))

    def negative_binomial_trial_pmf(r, p, k):
        """P(the r-th success occurs on trial k)."""
        if k < r:
            return 0.0  # the r-th success needs at least r trials
        q = 1 - p
        return comb(k - 1, r - 1) * (p ** r) * (q ** (k - r))

    print("P(Binomial(6, .5)=4):", binomial_pmf(6, 0.5, 4))
    print("P(third success on trial 10):", negative_binomial_trial_pmf(3, 0.2, 10))
    print("Expected third success trial:", 3 / 0.2)

    # Check that the first 1000 geometric probabilities nearly sum to 1.
    p = 0.2
    geom_sum = sum(((1 - p) ** (k - 1)) * p for k in range(1, 1001))
    print("truncated geometric mass:", geom_sum)
Common pitfalls
- Mixing the two geometric conventions. Always check whether $X$ counts trials until success or failures before success.
- Treating the final success in a negative binomial problem as optional. The last trial must be a success.
- Forgetting independence in the binomial model. The formula assumes independent repeated trials with the same $p$.
- Using binomial when the stopping rule is "continue until success". Fixed-trial and waiting-time questions have different distributions.
- Assuming the memoryless property applies to all waiting-time distributions. In this discrete setting it is special to the geometric law.