Algorithms and Complexity
An algorithm is a finite, precise procedure for solving a class of problems. Discrete mathematics studies algorithms at two levels: correctness and cost. Correctness asks whether the algorithm always returns the intended answer. Complexity asks how resource use grows as the input size grows.
The subject is not just about writing code. It is about describing a procedure independently of a programming language, proving that the procedure terminates with the right result, and estimating the amount of work needed for large inputs. This is why algorithms connect proof techniques, functions, sums, recurrence relations, number theory, graphs, and computation theory.
Definitions
An algorithm has specified input, specified output, definiteness of each step, correctness, finiteness, and effectiveness. Pseudocode abstracts away machine details while keeping the logical structure. A step is effective when it can be carried out exactly in a finite amount of time by the intended model of computation.
The input size is the parameter used to measure growth. For a list, it may be the number of elements n. For an integer N, it is often the number of bits, about log2 N. This distinction matters: trial division up to sqrt(N) is roughly sqrt(N) arithmetic divisions as a function of the integer value, but that is exponential in the number of input bits.
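As a rough numeric check (a sketch, not part of the text's argument), the worst-case division count of trial division can be tabulated against the bit length:

```python
from math import isqrt, log2

# Worst case: trial division up to sqrt(N) performs about sqrt(N) divisions.
# Against the bit length b = log2(N), that is about 2^(b/2) divisions.
for bits in (16, 32, 48, 64):
    N = 1 << bits             # a representative N with the given bit length
    divisions = isqrt(N)      # worst-case division count, about sqrt(N)
    print(f"{bits} bits -> ~{divisions} divisions (about 2^{log2(divisions):.0f})")
```

Each extra pair of input bits roughly doubles the work, which is why the count is exponential in the input size.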
Asymptotic notation compares functions for large input:
- f(n) is O(g(n)) if |f(n)| <= C|g(n)| for some constant C and all sufficiently large n.
- f(n) is Ω(g(n)) if |f(n)| >= c|g(n)| for some constant c > 0 and all sufficiently large n.
- f(n) is Θ(g(n)) if both f(n) is O(g(n)) and f(n) is Ω(g(n)).
Worst-case complexity bounds the maximum cost over all inputs of size n. Best-case complexity bounds the minimum. Average-case complexity uses a probability distribution over inputs. Space complexity measures memory use rather than time.
A loop invariant is a statement true before and after each loop iteration. It is the standard tool for proving iterative algorithms correct. A recursive algorithm is usually proved correct by induction on the input size or on the recursive structure.
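A loop invariant can also be checked at runtime during development. This sketch uses a hypothetical running-total example, with the invariant asserted on each iteration:

```python
def list_sum(xs):
    total = 0
    # Invariant: at the top of iteration i, total == sum(xs[:i]).
    for i, x in enumerate(xs):
        assert total == sum(xs[:i])   # runtime check of the invariant
        total += x
    # The invariant at loop exit (i == len(xs)) gives correctness.
    return total

print(list_sum([3, 1, 4, 1, 5]))   # 14
```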
Key results
If f(n) = a_k n^k + a_{k-1} n^{k-1} + ... + a_1 n + a_0 with a_k != 0, then f(n) = Θ(n^k). The leading term dominates because each lower power is eventually bounded by a constant multiple of n^k.
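A quick numeric illustration, using a hypothetical polynomial: the ratio f(n)/n^k approaches the leading coefficient, so the lower-order terms are eventually negligible.

```python
# Hypothetical polynomial: f(n) = 5n^3 + 40n^2 + 7n + 2, which is Theta(n^3).
def f(n):
    return 5 * n**3 + 40 * n**2 + 7 * n + 2

for n in (10, 100, 1000, 10000):
    print(n, f(n) / n**3)   # the ratio tends to the leading coefficient 5
```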
For a simple nested loop

```
for i = 1 to n:
    for j = 1 to n:
        do constant work
```

the number of constant-work operations is n^2, so the time is Θ(n^2). If the inner loop instead runs from i to n, the count is

n + (n - 1) + ... + 1 = n(n + 1)/2 = Θ(n^2).
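Both counts can be verified by instrumenting the loops directly, a useful habit before trusting a closed form:

```python
def count_square(n):
    # Inner loop runs n times for each of the n outer iterations.
    ops = 0
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            ops += 1
    return ops

def count_triangular(n):
    # Inner loop runs from i to n, so it shrinks with the outer index.
    ops = 0
    for i in range(1, n + 1):
        for j in range(i, n + 1):
            ops += 1
    return ops

n = 50
print(count_square(n), n * n)                 # both 2500
print(count_triangular(n), n * (n + 1) // 2)  # both 1275
```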
Binary search on a sorted list of n elements takes O(log n) comparisons. Each comparison discards about half the remaining candidates. After k comparisons, at most n/2^k (rounded up) candidates remain. The search ends once 2^k >= n, so about log2 n comparisons suffice.
Merge sort has recurrence

T(n) = 2T(n/2) + Θ(n).

For n a power of two, the recursion tree has log2 n nontrivial levels, and each level costs Θ(n) for merging. Thus T(n) = Θ(n log n). This style of reasoning prepares for recurrence relations and the master theorem.
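The recurrence can be unrolled numerically. This sketch takes T(1) = 0 and a merge cost of exactly n per level (an assumption; any Θ(n) merge cost gives the same growth):

```python
def T(n):
    # T(1) = 0; T(n) = 2*T(n // 2) + n, for n a power of two.
    if n == 1:
        return 0
    return 2 * T(n // 2) + n

# Closed form for n = 2^k: T(n) = n * k = n * log2(n).
for k in range(1, 6):
    n = 2 ** k
    print(n, T(n), n * k)   # the last two columns agree
```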
Polynomial time is usually treated as feasible in theoretical computer science, while exponential time often becomes infeasible quickly. This is a classification, not a stopwatch. Constants, memory locality, implementation details, and input distribution still matter in real programs.
Visual
| Growth | Example source | Typical description |
|---|---|---|
| O(1) | array index, fixed arithmetic | constant time |
| O(log n) | binary search, repeated halving | logarithmic |
| O(n) | scan a list | linear |
| O(n log n) | merge sort | near linear |
| O(n^2) | all unordered pairs | quadratic |
| O(2^n) | all subsets | exponential |
| O(n!) | all permutations | factorial |
Worked example 1: Count comparisons in binary search
Problem. A sorted list has 1000 elements. What is the maximum number of comparisons binary search needs to decide whether a target is present?
Method.
- After k comparisons, binary search leaves at most 1000/2^k (rounded up) candidates.
- We need this number to be at most 1: 1000/2^k <= 1.
- Rearranging gives 2^k >= 1000.
- Compare powers of two: 2^9 = 512 < 1000 <= 1024 = 2^10.
- Therefore k = 10 comparisons are enough in the worst case. Nine are not always enough, because 2^9 = 512 < 1000.
Checked answer. The worst-case number of comparisons is ceil(log2 1000) = 10. This count is independent of the target value; it depends on the repeated-halving structure.
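The count can be confirmed empirically by instrumenting the search and taking the maximum over many targets, present and absent (a sketch; the counter treats each three-way comparison as one comparison):

```python
def binary_search_count(xs, target):
    # Returns the number of three-way comparisons the search makes.
    lo, hi, comparisons = 0, len(xs) - 1, 0
    while lo <= hi:
        comparisons += 1          # one three-way comparison per iteration
        mid = (lo + hi) // 2
        if xs[mid] == target:
            return comparisons
        if xs[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return comparisons

xs = list(range(1000))
worst = max(binary_search_count(xs, t) for t in range(-1, 1001))
print(worst)   # 10
```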
Worked example 2: Analyze a brute-force closest-pair algorithm
Problem. Given n points in the plane, analyze the brute-force algorithm that computes the squared distance between every pair and returns a closest pair.
Method.
- The unordered pairs of distinct points correspond to two-element subsets of an n-element set.
- The number of pairs is C(n, 2) = n(n - 1)/2.
- For each pair of points (x1, y1) and (x2, y2), compute the squared distance (x1 - x2)^2 + (y1 - y2)^2.
- Squared distance is enough because squaring is increasing for nonnegative distances, so the pair with smallest squared distance also has smallest distance.
- Each pair costs constant arithmetic operations if coordinate arithmetic is treated as constant time.
- Keeping the smallest distance seen so far costs one comparison per pair after the first.
Checked answer. The algorithm performs n(n - 1)/2 distance computations, so the time is Θ(n^2) under the usual unit-cost model. It uses O(1) extra space beyond the input if it stores only the current best pair and current best squared distance.
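The pair count matches the two-element subsets enumerated by the standard library:

```python
from itertools import combinations

n = 10
pairs = list(combinations(range(n), 2))   # all two-element subsets
print(len(pairs), n * (n - 1) // 2)       # both 45
```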
Code
```python
def binary_search(xs, target):
    lo, hi = 0, len(xs) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if xs[mid] == target:
            return mid
        if xs[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return None

def closest_pair(points):
    best = None
    best_dist = None
    for i in range(len(points)):
        x1, y1 = points[i]
        for j in range(i + 1, len(points)):
            x2, y2 = points[j]
            dist = (x1 - x2) ** 2 + (y1 - y2) ** 2
            if best_dist is None or dist < best_dist:
                best_dist = dist
                best = (points[i], points[j])
    return best, best_dist

print(binary_search([2, 5, 8, 13, 21, 34], 21))        # 4
print(closest_pair([(0, 0), (3, 4), (1, 1), (8, 8)]))  # (((0, 0), (1, 1)), 2)
```
The binary-search loop invariant is that if the target is present, it is always within `xs[lo:hi+1]`. The closest-pair loop invariant is that after each completed pair check, `best` stores a pair with minimum squared distance among all pairs checked so far.
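The binary-search invariant can be asserted directly during development. This variant of the search above checks it on every iteration (the `target in xs` precomputation exists only for the check and would not appear in production code):

```python
def binary_search_checked(xs, target):
    lo, hi = 0, len(xs) - 1
    present = target in xs   # for the invariant check only
    while lo <= hi:
        # Invariant: if target is present in xs, it lies in xs[lo:hi+1].
        assert (not present) or target in xs[lo:hi + 1]
        mid = (lo + hi) // 2
        if xs[mid] == target:
            return mid
        if xs[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return None

print(binary_search_checked([2, 5, 8, 13, 21, 34], 13))   # 3
```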
Common pitfalls
- Measuring integer algorithms by the value of the integer when the input size is the number of bits.
- Ignoring loop bounds that depend on the outer loop index.
- Dropping lower-order terms correctly but dropping dominant terms accidentally.
- Calling a big-O bound an exact runtime. Big-O is an upper-bound class, not an equality of operation counts.
- Forgetting to prove correctness. A fast procedure is not an algorithmic solution until it is shown to return the right answer.
- Assuming average case without specifying an input distribution.
Correctness proofs usually have two parts: partial correctness and termination. Partial correctness says that if the algorithm terminates, its output is correct. Termination says it actually stops on every valid input. For loops, a variant such as a shrinking interval or decreasing counter often proves termination. For recursion, the input size or another well-founded measure should decrease.
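A termination variant can likewise be asserted. In this sketch of Euclid's algorithm, the second argument is a nonnegative integer that strictly decreases, so the loop must stop:

```python
def gcd(a, b):
    # Variant: b is a nonnegative integer that strictly decreases each pass.
    while b != 0:
        variant_before = b
        a, b = b, a % b
        assert 0 <= b < variant_before   # well-founded decrease => termination
    return a

print(gcd(252, 198))   # 18
```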
Big-O notation suppresses constants and lower-order terms, but it does not justify ignoring the input model. Sorting integers and multiplying two -bit integers both involve a parameter called , but the primitive operations being counted may differ. State whether the analysis counts comparisons, arithmetic operations, bit operations, memory cells, or graph edge scans.
Worst-case analysis is deliberately pessimistic. It gives a guarantee for every input of size . Average-case analysis can be more predictive, but only after specifying a distribution over inputs. Amortized analysis is different again: it bounds the average cost per operation over a worst-case sequence of operations, without assuming random inputs.
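A standard amortized example (not from the text) is appending to a doubling array: counting element copies over a worst-case sequence of n appends shows fewer than 2n copies in total, so each append is O(1) amortized.

```python
def total_copies(n):
    # Simulate n appends to an array that doubles its capacity when full.
    capacity, size, copies = 1, 0, 0
    for _ in range(n):
        if size == capacity:   # full: allocate double and copy everything
            copies += size
            capacity *= 2
        size += 1
    return copies

n = 1024
print(total_copies(n), "<", 2 * n)   # 1023 < 2048
```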
Lower bounds are as important as upper bounds. Showing an algorithm runs in O(n log n) does not prove that no faster algorithm exists. A lower bound such as Ω(n log n) for comparison sorting explains why merge sort is asymptotically optimal in that model. Always ask whether a bound describes an algorithm or the inherent difficulty of the problem.
When comparing algorithms, keep the problem fixed. Linear search and binary search both solve search, but binary search assumes sorted input. Sorting first and then using binary search changes the task cost if only one lookup is needed. Complexity comparisons are meaningful only when input assumptions and required outputs match.
For loops, derive counts from bounds rather than from visual nesting alone. Two nested loops are not automatically : the inner loop might halve a variable, stop early, or depend on sparse graph degree. Summation notation is the bridge between pseudocode and asymptotic classification.
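For example, two nested loops where the inner variable halves give about n log n operations, not n^2. Summing the per-iteration counts confirms it:

```python
def count_halving(n):
    # Inner loop halves j, so it runs about log2(n) + 1 times per outer pass.
    ops = 0
    for i in range(1, n + 1):
        j = n
        while j >= 1:
            ops += 1
            j //= 2
    return ops

n = 1024
print(count_halving(n), n * 11)   # both 11264: n * (log2(n) + 1)
```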
Connections
- Functions, sequences, and sums supplies the summations used in operation counts.
- Proof techniques supplies loop invariants, contradiction, and induction-style correctness proofs.
- Recurrence relations solves divide-and-conquer running times.
- Number theory basics gives algorithms such as Euclid's algorithm and trial division.
- Graph paths, connectivity, and shortest paths studies BFS, DFS, and shortest-path algorithms.