Probability Basics

Probability supplies the language for uncertainty in statistical inference. A sample result is rarely interpreted as a fixed fact about a population; instead, it is interpreted against a probability model for what could have happened under repeated sampling or repeated trials. The Lane text introduces probability before sampling distributions because confidence intervals, p-values, power, and regression inference all depend on probability statements.

The practical goal is not to memorize gambling examples. It is to learn how events combine, how conditional information changes probabilities, and how independence differs from mutual exclusivity. These ideas appear whenever a researcher asks whether a medical test is reliable, whether two traits are associated, whether a result is surprising under a null hypothesis, or whether base rates change the interpretation of evidence.

Definitions

A random experiment is a process whose outcome is uncertain but whose possible results can be described. The sample space S is the set of all possible outcomes. An event is a subset of the sample space. If A is an event, then P(A) is its probability and must satisfy

0 \le P(A) \le 1.

The complement of A, written A^c, is the event that A does not occur. The complement rule is

P(A^c)=1-P(A).

The union A\cup B is the event that A or B or both occur. The intersection A\cap B is the event that both occur. The general addition rule is

P(A\cup B)=P(A)+P(B)-P(A\cap B).

Events are mutually exclusive if they cannot occur together, so P(A\cap B)=0. In that special case,

P(A\cup B)=P(A)+P(B).
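
The addition rule can be checked by brute-force enumeration. The sketch below, which assumes two fair dice as an illustrative sample space, computes the union probability directly and again with the rule; the two values agree.

from itertools import product

# Enumerate the 36 equally likely outcomes of two fair dice (illustrative example).
outcomes = set(product(range(1, 7), repeat=2))
A = {o for o in outcomes if o[0] % 2 == 0}   # first die is even
B = {o for o in outcomes if sum(o) == 7}     # total is 7

def prob(event):
    return len(event) / len(outcomes)

print(prob(A | B))                         # direct count of the union: 21/36
print(prob(A) + prob(B) - prob(A & B))     # addition rule gives the same value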

The conditional probability of A given B is

P(A\mid B)=\frac{P(A\cap B)}{P(B)},\quad P(B)>0.

This is not usually equal to P(B\mid A). Conditional probabilities reverse the reference group: P(A\mid B) is computed among cases where B occurred, while P(B\mid A) is computed among cases where A occurred.
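
A quick numeric sketch, using hypothetical counts, makes the reversal concrete: the same joint count divided by different denominators gives different conditional probabilities.

# Hypothetical counts for illustration: out of 200 cases,
# B occurs in 100, A occurs in 60, and both occur in 30.
n_b, n_a, n_both = 100, 60, 30

print(n_both / n_b)   # P(A | B) = 30/100 = 0.30
print(n_both / n_a)   # P(B | A) = 30/60  = 0.50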

Events A and B are independent if knowing that one occurred does not change the probability of the other:

P(A\mid B)=P(A)

or equivalently

P(A\cap B)=P(A)P(B).

Independence is not the same as mutual exclusivity. If two events that each have positive probability are mutually exclusive, the occurrence of one makes the other impossible, so they are dependent.
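
Both ideas can be tested with the product criterion. The sketch below, assuming a single fair die as an illustrative example, shows that an even roll and a roll of three are mutually exclusive and fail the check, while an even roll and a roll of at most two satisfy it.

outcomes = set(range(1, 7))   # one fair die (illustrative)
A = {2, 4, 6}                 # even roll
B = {3}                       # a three: mutually exclusive with A
C = {1, 2}                    # at most two

def prob(event):
    return len(event) / len(outcomes)

print(prob(A & B), prob(A) * prob(B))   # 0.0 vs 0.083...: exclusive, therefore dependent
print(prob(A & C), prob(A) * prob(C))   # 0.166... vs 0.166...: consistent with independence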

Key results

Bayes' theorem reverses conditional probabilities:

P(A\mid B)=\frac{P(B\mid A)P(A)}{P(B)}.

When A and A^c split the sample space, the denominator can be expanded:

P(A\mid B)=\frac{P(B\mid A)P(A)}{P(B\mid A)P(A)+P(B\mid A^c)P(A^c)}.

This form is essential for diagnostic testing and statistical literacy. Even after a positive test, a rare condition can remain unlikely if the false-positive rate is not very small relative to the base rate.

Counting rules support probability calculations with equally likely outcomes. If a task has m choices followed by n choices, the multiplication rule gives mn ordered outcomes. The number of permutations of n distinct objects taken r at a time is

P(n,r)=\frac{n!}{(n-r)!},

where order matters. The number of combinations is

\binom{n}{r}=\frac{n!}{r!(n-r)!},

where order does not matter.
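
Python's math module provides these counts directly (math.perm and math.comb, available in Python 3.8+). The sketch below uses 10 objects taken 3 at a time as an arbitrary example.

import math

print(math.perm(10, 3))    # 720 ordered arrangements: 10!/(10-3)!
print(math.comb(10, 3))    # 120 unordered selections: 10!/(3! * 7!)
print(math.perm(10, 3) // math.factorial(3))   # combinations = permutations / r!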

A probability model must match the mechanism. Drawing cards without replacement creates dependence between draws. Tossing a fair coin repeatedly is often modeled as independent trials. Sampling people from a finite population without replacement is not exactly independent, although independence may be a useful approximation when the population is much larger than the sample.
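
A card-drawing sketch illustrates the dependence: the probability that the second card is an ace shifts depending on what the first draw removed.

# Two draws from a standard 52-card deck without replacement.
p_ace_first = 4 / 52
p_ace_second_given_ace_first = 3 / 51     # one ace already gone
p_ace_second_given_other_first = 4 / 51   # all four aces remain

print(p_ace_first)                      # about 0.0769
print(p_ace_second_given_ace_first)     # about 0.0588, not equal to 0.0769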

When solving applied probability problems, translate words into events before computing. Phrases such as "at least one," "exactly two," "given that," and "either" have mathematical meanings. "At least one" is often easiest through the complement rule; "given that" changes the denominator; "either" usually calls for the addition rule; and "exactly" often requires counting the favorable arrangements. Writing the event first prevents many arithmetic errors because the formula then follows the structure of the event rather than the surface wording of the story.
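
For example, the complement shortcut for "at least one": assuming four independent rolls of a fair die, the chance of at least one six is one minus the chance of no sixes.

# "At least one six in four rolls," assuming independent rolls of a fair die.
p_no_six_single_roll = 5 / 6
p_at_least_one_six = 1 - p_no_six_single_roll ** 4
print(p_at_least_one_six)   # about 0.518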

Visual

| Idea | Formula | Warning |
| --- | --- | --- |
| Complement | P(A^c)=1-P(A) | Only covers "not A" |
| Addition | P(A\cup B)=P(A)+P(B)-P(A\cap B) | Do not double-count overlap |
| Conditional | P(A\mid B)=P(A\cap B)/P(B) | Denominator is the given group |
| Independence | P(A\cap B)=P(A)P(B) | Must be justified, not assumed |
| Bayes | P(A\mid B)=P(B\mid A)P(A)/P(B) | Base rates matter |
| Combination | \binom{n}{r} | Use only when order does not matter |

Worked example 1: Conditional probability from a two-way table

Problem: A survey of 200 students records whether they commute and whether they work at least 10 hours per week.

| | Work 10+ hours | Work under 10 hours | Total |
| --- | --- | --- | --- |
| Commute | 54 | 36 | 90 |
| Do not commute | 26 | 84 | 110 |
| Total | 80 | 120 | 200 |

Find P(\text{commute}), P(\text{work}), P(\text{commute and work}), P(\text{work}\mid\text{commute}), and decide whether commuting and working appear independent.

Method:

  1. Convert counts to probabilities by dividing by 200:
P(\text{commute})=\frac{90}{200}=0.45.
  2. Work probability:
P(\text{work})=\frac{80}{200}=0.40.
  3. Joint probability:
P(\text{commute and work})=\frac{54}{200}=0.27.
  4. Conditional probability among commuters:
P(\text{work}\mid\text{commute})=\frac{54}{90}=0.60.
  5. Check independence. If independent, the joint probability should equal the product of the marginal probabilities:
P(\text{commute})P(\text{work})=0.45(0.40)=0.18.

Answer: The observed joint probability is 0.27, not 0.18, and the conditional probability of working among commuters is 0.60, not the overall work probability 0.40. In this sample, commuting and working appear associated rather than independent.

Checked answer: The table totals are consistent: 54+36+26+84=200. The conditional denominator is 90 because the phrase "given commute" restricts attention to commuters.

Worked example 2: Bayes' theorem with a medical test

Problem: A condition affects 2% of a population. A screening test has sensitivity 95%, meaning P(+\mid D)=0.95, and specificity 90%, meaning P(-\mid D^c)=0.90. If a randomly chosen person tests positive, what is the probability they have the condition?

Method:

  1. State the base rate:
P(D)=0.02,\quad P(D^c)=0.98.
  2. Translate specificity into the false-positive probability:
P(+\mid D^c)=1-P(-\mid D^c)=1-0.90=0.10.
  3. Compute the overall positive probability using the law of total probability:
\begin{aligned} P(+) &= P(+\mid D)P(D)+P(+\mid D^c)P(D^c) \\ &=0.95(0.02)+0.10(0.98) \\ &=0.019+0.098 \\ &=0.117. \end{aligned}
  4. Apply Bayes' theorem:
P(D\mid +)=\frac{0.95(0.02)}{0.117}=\frac{0.019}{0.117}\approx 0.162.

Answer: The probability of having the condition after a positive test is about 16.2%. This may seem low because the test is fairly sensitive, but the condition is rare and the false-positive rate applies to the much larger disease-free group.

Checked answer: In 10,000 people, about 200 have the condition and 9,800 do not. The test finds about 190 true positives and 980 false positives. Among 190 + 980 = 1,170 positives, only 190/1,170 ≈ 0.162 have the condition.
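
The same frequency argument can be written as a few lines of arithmetic; this is only a restatement of the check above.

# Natural-frequency check: 10,000 people, 2% prevalence, 95% sensitivity, 10% false-positive rate.
n = 10_000
with_condition = n * 0.02                    # 200 people
without_condition = n * 0.98                 # 9,800 people
true_positives = with_condition * 0.95       # about 190
false_positives = without_condition * 0.10   # about 980
print(true_positives / (true_positives + false_positives))   # about 0.162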

Code

def bayes_positive(prevalence, sensitivity, specificity):
    # P(condition | positive) from prevalence, sensitivity, and specificity.
    p_d = prevalence
    p_not_d = 1 - prevalence
    p_pos_d = sensitivity                    # P(+ | D)
    p_pos_not_d = 1 - specificity            # P(+ | D^c), the false-positive rate
    p_pos = p_pos_d * p_d + p_pos_not_d * p_not_d   # total probability of a positive test
    return (p_pos_d * p_d) / p_pos

posterior = bayes_positive(prevalence=0.02, sensitivity=0.95, specificity=0.90)
print(f"P(condition | positive) = {posterior:.3f}")

# Two-way table check from worked example 1
commute_work = 54
commuters = 90
workers = 80
total = 200
print("P(work | commute):", commute_work / commuters)
print("P(work):", workers / total)

The function exposes the three inputs that drive diagnostic interpretation: prevalence, sensitivity, and specificity. Changing prevalence often changes the answer more dramatically than intuition expects.
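
One way to see the effect, assuming the bayes_positive function defined above, is to sweep the prevalence while holding sensitivity and specificity fixed:

# Posterior probability after a positive test at several prevalences,
# with sensitivity 0.95 and specificity 0.90 held fixed.
for prev in (0.001, 0.01, 0.02, 0.10, 0.30):
    print(prev, round(bayes_positive(prev, sensitivity=0.95, specificity=0.90), 3))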

Common pitfalls

  • Confusing P(A\mid B) with P(B\mid A).
  • Treating mutually exclusive events as independent. Except for impossible events, mutual exclusivity creates dependence.
  • Forgetting to subtract the overlap in the addition rule.
  • Ignoring base rates when interpreting tests, alarms, and screening tools.
  • Assuming sampling without replacement is independent in small populations.
  • Using combinations when order matters, or permutations when it does not.

Connections