Sorting Algorithms
Sorting rearranges records into order by key. It is one of the best ways to compare algorithmic design techniques because many algorithms solve the same problem with different tradeoffs: selection sort is simple, insertion sort adapts to nearly sorted input, quicksort partitions, merge sort divides and combines, heap sort uses a priority structure, and radix sort escapes comparison sorting by processing digits.
The source textbook has a full sorting chapter after graphs. It covers internal sorting methods such as insertion, quick, merge, heap, and radix sort, and also discusses external sorting. This page focuses on the standard in-memory algorithms from a C data-structures course, while keeping the connection to arrays, heaps, and searching visible.
Definitions
A sorting problem takes a sequence of n records r_1, r_2, ..., r_n with keys k_1, k_2, ..., k_n and rearranges the records into an order r_{s(1)}, r_{s(2)}, ..., r_{s(n)} so that k_{s(1)} <= k_{s(2)} <= ... <= k_{s(n)}.
Important properties:
- Stable sort: records with equal keys keep their original relative order.
- In-place sort: uses only O(1) or otherwise small auxiliary memory beyond the input array, ignoring the recursion stack in some definitions.
- Adaptive sort: runs faster on inputs that are already nearly sorted.
- Comparison sort: learns order only by comparing keys. Any comparison sort has a worst-case lower bound of Ω(n log n) comparisons.
- Distribution sort: uses key structure, such as digits or ranges, rather than only comparisons.
Core algorithms:
- Selection sort: repeatedly select the minimum remaining item and place it next.
- Insertion sort: grow a sorted prefix by inserting the next item into its correct position.
- Bubble sort: repeatedly swap adjacent out-of-order pairs.
- Quicksort: partition around a pivot, then recursively sort the sides.
- Merge sort: recursively sort halves, then merge two sorted sequences.
- Heap sort: build a heap, then repeatedly remove the maximum.
- Radix sort: sort by digits, often using stable counting sort for each digit.
Key results
Selection sort always performs Θ(n²) comparisons, regardless of input order. It performs relatively few swaps (at most n - 1), so it can be useful when swaps are extremely expensive, but it is rarely the best general-purpose sort.
Insertion sort has O(n²) worst-case time but O(n) best-case time when the array is already sorted. It is simple, stable, and excellent for small arrays or nearly sorted data.
Quicksort has O(n log n) average time and excellent cache behavior, but O(n²) worst-case time if pivots are consistently poor. Randomized pivots or median-of-three pivots reduce that risk.
Merge sort guarantees O(n log n) time and is stable when implemented carefully, but array merge sort needs O(n) extra memory.
Heap sort guarantees O(n log n) time and can sort in place, but it is usually not stable and often has less friendly cache behavior than quicksort.
Radix sort can run in O(d(n + b)) time, where d is the number of digits or passes and b is the base. It is not a comparison sort, so the comparison lower bound does not apply.
| Algorithm | Best | Average | Worst | Extra space | Stable? |
|---|---|---|---|---|---|
| Selection | O(n²) | O(n²) | O(n²) | O(1) | no |
| Insertion | O(n) | O(n²) | O(n²) | O(1) | yes |
| Bubble | O(n) with early stop | O(n²) | O(n²) | O(1) | yes |
| Quicksort | O(n log n) | O(n log n) | O(n²) | O(log n) average stack | no (typical) |
| Merge sort | O(n log n) | O(n log n) | O(n log n) | O(n) | yes |
| Heap sort | O(n log n) | O(n log n) | O(n log n) | O(1) | no |
| Radix sort | O(d(n + b)) | O(d(n + b)) | O(d(n + b)) | O(n + b) | yes, if digit sort stable |
The right sorting algorithm depends on record size and movement cost. If records are large, it may be better to sort an array of pointers or indices rather than move entire records repeatedly. Stability matters when sorting by multiple keys: for example, sorting student records by first name and then using a stable sort by last name preserves first-name order among equal last names. If stability is not required, quicksort or heap sort may be simpler.
Sorting also changes the cost of later operations. A sorted array supports binary search in O(log n) time and makes duplicate grouping easy. But maintaining sorted order during frequent insertions can be expensive because array insertion shifts elements. This is why dictionaries with frequent updates often use search trees or hash tables instead of repeatedly sorting arrays.
For C implementations, comparison functions must impose a consistent total order. A qsort comparator should return a negative, zero, or positive value according to the ordering of the two records; returning only `a - b` can overflow for large integers. Careful comparator design is part of the data-structure contract.
For nearly sorted arrays, insertion sort is often used as a finishing pass inside more complex sorts because its constant factors are small.
Visual
Quicksort partition sketch:
```
before: [9, 3, 7, 1, 6, 2, 8]    pivot = 6
after:  [3, 1, 2,   6,   7, 9, 8]
         < pivot  pivot  > pivot
```
Worked example 1: insertion sort trace
Problem: Sort [8, 3, 5, 2, 9] using insertion sort.
Method: maintain a sorted prefix. At step i, insert a[i] into a[0..i-1].
- Start with sorted prefix `[8]`.
- Insert `3` into `[8]`: compare `8 > 3`, shift `8` right, place `3` at index `0`. Array: `[3, 8, 5, 2, 9]`.
- Insert `5` into `[3, 8]`: compare `8 > 5`, shift `8` right; compare `3 <= 5`, stop; place `5` after `3`. Array: `[3, 5, 8, 2, 9]`.
- Insert `2` into `[3, 5, 8]`: shift `8`, shift `5`, shift `3`, place `2` at index `0`. Array: `[2, 3, 5, 8, 9]`.
- Insert `9` into `[2, 3, 5, 8]`: `8 <= 9`, no shift. Array remains `[2, 3, 5, 8, 9]`.
Checked answer: final array is sorted. The trace also shows adaptiveness: inserting 9 into an already-correct position took one comparison and no shifts.
Worked example 2: merge step
Problem: Merge sorted arrays L = [2, 5, 8] and R = [1, 3, 9].
Method: keep one pointer into each array and repeatedly copy the smaller current item.
- Compare `2` and `1`. Copy `1` from `R`. Output: `[1]`.
- Compare `2` and `3`. Copy `2` from `L`. Output: `[1, 2]`.
- Compare `5` and `3`. Copy `3` from `R`. Output: `[1, 2, 3]`.
- Compare `5` and `9`. Copy `5` from `L`. Output: `[1, 2, 3, 5]`.
- Compare `8` and `9`. Copy `8` from `L`. Output: `[1, 2, 3, 5, 8]`. `L` is exhausted; copy the remaining `9` from `R`.
Checked answer: merged output is [1, 2, 3, 5, 8, 9]. Each step chose the smallest not-yet-copied item among the two sorted prefixes, so the result is sorted.
Code
This C program implements quicksort using Lomuto partitioning for a compact runnable example. Production quicksorts often use better pivot selection and switch to insertion sort for small partitions.
```c
#include <stdio.h>
#include <stdlib.h>

static void swap(int *a, int *b) {
    int tmp = *a;
    *a = *b;
    *b = tmp;
}

/* Lomuto partition: pivot is a[hi]; on return the pivot sits at its
   final sorted index, smaller keys to its left, larger to its right. */
static int partition(int a[], int lo, int hi) {
    int pivot = a[hi];
    int i = lo;
    for (int j = lo; j < hi; ++j) {
        if (a[j] <= pivot) {
            swap(&a[i], &a[j]);
            i++;
        }
    }
    swap(&a[i], &a[hi]);  /* move the pivot into position i */
    return i;
}

static void quicksort(int a[], int lo, int hi) {
    if (lo >= hi) return;  /* 0 or 1 elements: already sorted */
    int p = partition(a, lo, hi);
    quicksort(a, lo, p - 1);
    quicksort(a, p + 1, hi);
}

static void print_array(const int a[], int n) {
    for (int i = 0; i < n; ++i) {
        printf("%d%s", a[i], i + 1 == n ? "\n" : " ");
    }
}

int main(void) {
    int a[] = {9, 3, 7, 1, 6, 2, 8};
    int n = (int)(sizeof(a) / sizeof(a[0]));
    quicksort(a, 0, n - 1);
    print_array(a, n);
    return EXIT_SUCCESS;
}
```
Common pitfalls
- Calling every algorithm "the same." Stability, memory, worst-case behavior, and constant factors matter.
- Writing a quicksort partition that fails to move the pivot into its final position, which breaks the recursion bounds.
- Forgetting that merge sort needs auxiliary storage for array merging.
- Using bubble sort by default. It is mainly pedagogical; insertion sort is usually better for simple small-array sorting.
- Assuming radix sort applies to arbitrary comparison keys. It needs key structure and a stable digit-level sort.
- Breaking stability by swapping equal keys across each other.