Probability Basics
Probability supplies the language for uncertainty in statistical inference. A sample result is rarely interpreted as a fixed fact about a population; instead, it is interpreted against a probability model for what could have happened under repeated sampling or repeated trials. The Lane text introduces probability before sampling distributions because confidence intervals, p-values, power, and regression inference all depend on probability statements.
The practical goal is not to memorize gambling examples. It is to learn how events combine, how conditional information changes probabilities, and how independence differs from mutual exclusivity. These ideas appear whenever a researcher asks whether a medical test is reliable, whether two traits are associated, whether a result is surprising under a null hypothesis, or whether base rates change the interpretation of evidence.
Definitions
A random experiment is a process with an uncertain outcome but describable possible results. The sample space $S$ is the set of all possible outcomes. An event is a subset of the sample space. If $A$ is an event, then $P(A)$ is its probability and must satisfy $0 \le P(A) \le 1$, with $P(S) = 1$.
The complement of $A$, written $A^c$, is the event that $A$ does not occur. The complement rule is $P(A^c) = 1 - P(A)$.
The union $A \cup B$ is the event that $A$ or $B$ or both occur. The intersection $A \cap B$ is the event that both occur. The general addition rule is $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
Events are mutually exclusive if they cannot occur together, so $P(A \cap B) = 0$. In that special case, $P(A \cup B) = P(A) + P(B)$.
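As a minimal sketch of the addition rule, assume a standard 52-card deck and let one event be "heart" and the other "face card" (jack, queen, or king); the counts below follow from that assumption:

```python
from fractions import Fraction

# Draw one card from a standard deck: A = heart, B = face card (J, Q, K).
p_heart = Fraction(13, 52)
p_face = Fraction(12, 52)
p_both = Fraction(3, 52)   # jack, queen, king of hearts overlap both events

# General addition rule: subtract the overlap so it is not counted twice.
p_union = p_heart + p_face - p_both
print(p_union)  # 11/26
```

Using `Fraction` keeps the arithmetic exact, which makes it easy to see that adding $13/52 + 12/52$ without subtracting the overlap would double-count the three face-card hearts.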
The conditional probability of $A$ given $B$ is $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$, defined when $P(B) > 0$.
This is not usually equal to $P(B \mid A)$. Conditional probabilities reverse the reference group: $P(A \mid B)$ is computed among cases where $B$ occurred, while $P(B \mid A)$ is computed among cases where $A$ occurred.
Events $A$ and $B$ are independent if knowing that one occurred does not change the probability of the other: $P(A \mid B) = P(A)$, or equivalently $P(A \cap B) = P(A)\,P(B)$.
Independence is not the same as mutual exclusivity. If two nonempty events are mutually exclusive, the occurrence of one makes the other impossible, so they are dependent.
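The distinction can be checked by brute-force enumeration. This sketch uses two fair dice, with illustrative events chosen so that one pair is independent and another pair is mutually exclusive but dependent:

```python
from fractions import Fraction
from itertools import product

# Enumerate the 36 equally likely outcomes of two fair dice.
space = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event (a predicate) under equally likely outcomes."""
    return Fraction(sum(1 for o in space if event(o)), len(space))

A = lambda o: o[0] % 2 == 0      # first die is even
B = lambda o: o[0] + o[1] == 7   # sum is 7
C = lambda o: o[0] == o[1]       # doubles

# A and B are independent: P(A and B) equals P(A)P(B) exactly.
print(prob(lambda o: A(o) and B(o)) == prob(A) * prob(B))  # True

# B and C are mutually exclusive (a sum of 7 cannot come from doubles),
# yet dependent: P(B and C) = 0 while P(B)P(C) > 0.
print(prob(lambda o: B(o) and C(o)))  # 0
print(prob(B) * prob(C))              # 1/36
```

Enumerating the sample space sidesteps formula errors entirely, which makes it a useful sanity check for small problems.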
Key results
Bayes' theorem reverses conditional probabilities: $P(A \mid B) = \dfrac{P(B \mid A)\,P(A)}{P(B)}$.
When $A$ and $A^c$ split the sample space, the denominator can be expanded: $P(B) = P(B \mid A)\,P(A) + P(B \mid A^c)\,P(A^c)$.
This form is essential for diagnostic testing and statistical literacy. A rare condition can have a low probability even after a positive test if the false-positive rate is not very small relative to the base rate.
Counting rules support equally likely probability calculations. If a task has $m$ choices followed by $n$ choices, the multiplication rule gives $m \times n$ ordered outcomes. The number of permutations of $n$ distinct objects taken $r$ at a time is $_nP_r = \dfrac{n!}{(n-r)!}$, where order matters. The number of combinations is $_nC_r = \dbinom{n}{r} = \dfrac{n!}{r!\,(n-r)!}$, where order does not matter.
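Both counting rules are available in the Python standard library; the values $n = 10$, $r = 3$ below are arbitrary illustrations:

```python
import math

# Permutations: ordered selections of r objects from n distinct objects.
print(math.perm(10, 3))  # 10 * 9 * 8 = 720

# Combinations: unordered selections of r objects from n.
print(math.comb(10, 3))  # 720 / 3! = 120

# The relationship between them: nCr = nPr / r!
assert math.comb(10, 3) == math.perm(10, 3) // math.factorial(3)
```

Dividing the permutation count by $r!$ collapses the $r!$ orderings of each selection into one, which is exactly why combinations apply only when order does not matter.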
A probability model must match the mechanism. Drawing cards without replacement creates dependence between draws. Tossing a fair coin repeatedly is often modeled as independent trials. Sampling people from a finite population without replacement is not exactly independent, although independence may be a useful approximation when the population is much larger than the sample.
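The dependence created by drawing without replacement can be made concrete with a standard deck; the ace counts below assume 4 aces in 52 cards:

```python
from fractions import Fraction

# Drawing two cards without replacement: the second draw depends on the first.
p_first_ace = Fraction(4, 52)
p_second_ace_given_first_ace = Fraction(3, 51)  # one ace already removed
p_second_ace_unconditional = Fraction(4, 52)    # by symmetry over positions

# Dependence: the conditional probability differs from the unconditional one.
print(p_second_ace_given_first_ace)             # 1/17
print(p_second_ace_unconditional)               # 1/13
print(p_first_ace * p_second_ace_given_first_ace)  # P(both aces) = 1/221
```

With replacement, the second draw would again have probability $4/52$ regardless of the first, and the trials would be independent.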
When solving applied probability problems, translate words into events before computing. Phrases such as "at least one," "exactly two," "given that," and "either" have mathematical meanings. "At least one" is often easiest through the complement rule; "given that" changes the denominator; "either" usually calls for the addition rule; and "exactly" often requires counting the favorable arrangements. Writing the event first prevents many arithmetic errors because the formula then follows the structure of the event rather than the surface wording of the story.
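The complement shortcut for "at least one" can be sketched with a standard question: the probability of at least one six in four rolls of a fair die (the four-roll setup is an illustrative assumption):

```python
from fractions import Fraction

# "At least one six in four rolls" via the complement rule:
# P(at least one six) = 1 - P(no sixes in any of the four rolls).
p_no_six_each_roll = Fraction(5, 6)
p_at_least_one_six = 1 - p_no_six_each_roll ** 4
print(p_at_least_one_six)          # 671/1296
print(float(p_at_least_one_six))   # about 0.518
```

Computing the event directly would require summing over "exactly one six," "exactly two," and so on; the complement reduces it to a single product because the rolls are modeled as independent.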
Visual
| Idea | Formula | Warning |
|---|---|---|
| Complement | $P(A^c) = 1 - P(A)$ | Only covers "not A" |
| Addition | $P(A \cup B) = P(A) + P(B) - P(A \cap B)$ | Do not double-count overlap |
| Conditional | $P(A \mid B) = P(A \cap B) / P(B)$ | Denominator is the given group |
| Independence | $P(A \cap B) = P(A)\,P(B)$ | Must be justified, not assumed |
| Bayes | $P(A \mid B) = P(B \mid A)\,P(A) / P(B)$ | Base rates matter |
| Combination | $_nC_r = \dfrac{n!}{r!\,(n-r)!}$ | Use only when order does not matter |
Worked example 1: Conditional probability from a two-way table
Problem: A survey of 200 students records whether they commute and whether they work at least 10 hours per week.
| | Work 10+ hours | Work under 10 hours | Total |
|---|---|---|---|
| Commute | 54 | 36 | 90 |
| Do not commute | 26 | 84 | 110 |
| Total | 80 | 120 | 200 |
Find $P(\text{work})$, $P(\text{commute})$, $P(\text{commute} \cap \text{work})$, and $P(\text{work} \mid \text{commute})$, and decide whether commuting and working appear independent.
Method:
- Convert counts to probabilities by dividing by 200: $P(\text{commute}) = 90/200 = 0.45$.
- Work probability: $P(\text{work}) = 80/200 = 0.40$.
- Joint probability: $P(\text{commute} \cap \text{work}) = 54/200 = 0.27$.
- Conditional probability among commuters: $P(\text{work} \mid \text{commute}) = 54/90 = 0.60$.
- Check independence. If independent, the joint probability should equal the product of marginal probabilities: $P(\text{commute}) \times P(\text{work}) = 0.45 \times 0.40 = 0.18 \ne 0.27$.
Answer: The observed joint probability is 0.27, not 0.18, and the conditional probability of working among commuters is 0.60, not the overall work probability 0.40. In this sample, commuting and working appear associated rather than independent.
Checked answer: The table totals are consistent: $54 + 36 = 90$, $26 + 84 = 110$, and $90 + 110 = 200$. The conditional denominator is 90 because the phrase "given commute" restricts attention to commuters.
Worked example 2: Bayes' theorem with a medical test
Problem: A condition affects 2% of a population. A screening test has sensitivity 95%, meaning $P(+ \mid D) = 0.95$, and specificity 90%, meaning $P(- \mid D^c) = 0.90$, where $D$ is the event of having the condition. If a randomly chosen person tests positive, what is the probability they have the condition?
Method:
- State the base rate: $P(D) = 0.02$, so $P(D^c) = 0.98$.
- Translate specificity into false-positive probability: $P(+ \mid D^c) = 1 - 0.90 = 0.10$.
- Compute the positive probability using total probability: $P(+) = 0.95 \times 0.02 + 0.10 \times 0.98 = 0.019 + 0.098 = 0.117$.
- Apply Bayes' theorem: $P(D \mid +) = \dfrac{0.019}{0.117} \approx 0.162$.
Answer: The probability of having the condition after a positive test is about 16.2%. This may seem low because the test is fairly sensitive, but the condition is rare and the false-positive rate applies to the much larger disease-free group.
Checked answer: In 10,000 people, about 200 have the condition and 9,800 do not. The test finds about $0.95 \times 200 = 190$ true positives and $0.10 \times 9{,}800 = 980$ false positives. Among positives, only $190 / (190 + 980) = 190/1170 \approx 0.162$ have the condition.
Code
```python
def bayes_positive(prevalence, sensitivity, specificity):
    """Posterior probability of the condition given a positive test."""
    p_d = prevalence
    p_not_d = 1 - prevalence
    p_pos_d = sensitivity
    p_pos_not_d = 1 - specificity                    # false-positive rate
    p_pos = p_pos_d * p_d + p_pos_not_d * p_not_d    # total probability
    return (p_pos_d * p_d) / p_pos

posterior = bayes_positive(prevalence=0.02, sensitivity=0.95, specificity=0.90)
print(f"P(condition | positive) = {posterior:.3f}")

# Two-way table check from worked example 1
commute_work = 54
commuters = 90
workers = 80
total = 200
print("P(work | commute):", commute_work / commuters)
print("P(work):", workers / total)
```
The function exposes the three inputs that drive diagnostic interpretation: prevalence, sensitivity, and specificity. Changing prevalence often changes the answer more dramatically than intuition expects.
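To make that base-rate effect visible, the same total-probability calculation can be repeated across a range of hypothetical prevalence values (the list below is illustrative, with the test's sensitivity and specificity held fixed at the worked-example values):

```python
# Base-rate sensitivity sketch: same test (sensitivity 0.95, specificity 0.90),
# posterior recomputed for several assumed prevalence values.
sensitivity, specificity = 0.95, 0.90

for prevalence in [0.001, 0.01, 0.02, 0.10, 0.30]:
    # Total probability of a positive result, then Bayes' theorem.
    p_pos = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    posterior = sensitivity * prevalence / p_pos
    print(f"prevalence {prevalence:.3f} -> P(condition | positive) = {posterior:.3f}")
```

At a prevalence of 0.1%, the posterior falls below 1% even with this fairly accurate test, while at 30% it exceeds 80%: the identical test result carries very different meaning in different populations.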
Common pitfalls
- Confusing $P(A \mid B)$ with $P(B \mid A)$.
- Treating mutually exclusive events as independent. Except for impossible events, mutual exclusivity creates dependence.
- Forgetting to subtract the overlap in the addition rule.
- Ignoring base rates when interpreting tests, alarms, and screening tools.
- Assuming sampling without replacement is independent in small populations.
- Using combinations when order matters, or permutations when it does not.