Orthogonality, Gram-Schmidt, and QR
Orthogonal bases are computationally valuable because coordinates become dot products and projections separate cleanly. The Gram-Schmidt process converts an independent set into an orthogonal or orthonormal set with the same span. In matrix form, this becomes QR decomposition, one of the central tools for least squares and numerical linear algebra.
The main idea is to remove from each new vector the parts already explained by previous directions. What remains is perpendicular to the earlier directions. Repeating this produces a basis with the same span but much cleaner geometry.
Definitions
A set q_1, ..., q_k is orthonormal if
q_i · q_j = 0 for i ≠ j, and q_i · q_i = 1 for every i.
The orthogonal projection of a vector v onto a subspace W with orthonormal basis q_1, ..., q_k is
proj_W(v) = (v · q_1) q_1 + (v · q_2) q_2 + ... + (v · q_k) q_k.
A QR decomposition of an m × n matrix A with independent columns is
A = QR,
where Q (m × n) has orthonormal columns and R (n × n) is upper triangular with positive diagonal entries in the standard construction.
Classical Gram-Schmidt starts with independent vectors v_1, ..., v_n and produces orthogonal vectors u_1, ..., u_n:
u_1 = v_1, and
u_k = v_k − Σ_{j<k} ((v_k · u_j)/(u_j · u_j)) u_j.
Then q_k = u_k/‖u_k‖ gives an orthonormal set.
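The recursion above is short to implement. Here is a minimal NumPy sketch; the function name gram_schmidt and the test matrix are our own choices, and it assumes the input columns are independent:

```python
import numpy as np

def gram_schmidt(V):
    """Classical Gram-Schmidt on the columns of V.

    Returns Q with orthonormal columns and the same column span.
    Assumes the columns of V are linearly independent.
    """
    n = V.shape[1]
    U = np.zeros_like(V, dtype=float)
    for k in range(n):
        u = V[:, k].astype(float)
        # Subtract the projection onto every earlier direction, not just the last one.
        for j in range(k):
            u = u - (V[:, k] @ U[:, j]) / (U[:, j] @ U[:, j]) * U[:, j]
        U[:, k] = u
    # Normalize each orthogonal vector to get an orthonormal set.
    return U / np.linalg.norm(U, axis=0)

V = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
Q = gram_schmidt(V)
print(Q.T @ Q)  # close to the 2x2 identity
```

The double loop mirrors the sum in the formula: the inner loop accumulates the projections onto all earlier u_j before the new direction is stored.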
Key results
Gram-Schmidt preserves span at each step:
span{u_1, ..., u_k} = span{v_1, ..., v_k} for every k.
The reason is that u_k is obtained from v_k by subtracting a vector in the span of the previous u_j's, and the operation can be reversed.
The vectors produced by Gram-Schmidt are orthogonal. When constructing u_k, the projection terms remove exactly the components in the earlier directions. For j < k,
u_j · u_k = u_j · v_k − ((v_k · u_j)/(u_j · u_j)) (u_j · u_j) = 0,
because the earlier u's are orthogonal to each other, so only the j-th projection term survives the dot product with u_j.
If A has independent columns v_1, ..., v_n, applying Gram-Schmidt to those columns gives
A = QR with Q = [q_1 ... q_n].
The entries of R are dot products and norms:
r_jk = q_j · v_k for j < k, and r_kk = ‖u_k‖.
Because QᵀQ = I, QR simplifies least squares. Instead of solving the normal equations AᵀA x = Aᵀb, one can solve
Rx = Qᵀb.
This is often more numerically stable.
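A small experiment illustrates the point. This sketch compares the two routes on a matrix with nearly dependent columns; the matrix and the eps value are invented for illustration, and the exact least-squares solution here is (1, 1):

```python
import numpy as np

# A tall matrix whose columns are nearly dependent, plus a consistent target.
eps = 1e-7
A = np.array([[1.0, 1.0],
              [eps, 0.0],
              [0.0, eps]])
b = np.array([2.0, eps, eps])  # chosen so that x = (1, 1) solves Ax = b exactly

# Route 1: normal equations A^T A x = A^T b (squares the condition number).
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Route 2: QR, then the triangular system R x = Q^T b.
Q, R = np.linalg.qr(A, mode="reduced")
x_qr = np.linalg.solve(R, Q.T @ b)

print("normal equations:", x_normal)
print("QR:", x_qr)  # the QR route tracks the exact solution (1, 1) closely
```

The normal-equations matrix AᵀA has condition number roughly the square of A's, so its digits of accuracy erode twice as fast; the QR route works with A's conditioning directly.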
The upper triangular shape of R records the order in which new directions are explained by previous orthonormal directions. If A = QR and A has columns v_1, ..., v_n, then the equation A = QR says each original column can be reconstructed from the orthonormal columns:
v_k = r_1k q_1 + r_2k q_2 + ... + r_kk q_k.
No later q_j appears in this expression, which is why R has zeros below the diagonal. The diagonal entry r_kk is the length of the new component that remains after subtracting earlier projections. If that length is zero, the original columns were dependent and the standard full-column QR construction breaks down.
For hand computation, classical Gram-Schmidt is transparent because each projection is visible. For numerical computation, however, classical Gram-Schmidt can lose orthogonality when vectors are nearly dependent. Modified Gram-Schmidt rearranges the same mathematical operations to reduce error accumulation. Householder QR uses orthogonal reflections to zero out entire subcolumns and is the usual dense-matrix workhorse. These algorithms produce the same kind of factorization, but they differ in stability and implementation cost.
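Modified Gram-Schmidt can be sketched as follows; the only change from the classical loop is that each new direction is removed from all remaining columns as soon as it is formed, so later subtractions work with already-cleaned vectors. The function name modified_gram_schmidt and the test matrix are our own:

```python
import numpy as np

def modified_gram_schmidt(V):
    """Modified Gram-Schmidt on the columns of V.

    Same operations as the classical version in exact arithmetic,
    but each subtraction uses already-updated columns, which limits
    the accumulation of rounding error.
    """
    V = V.astype(float)  # work on a copy
    m, n = V.shape
    Q = np.zeros((m, n))
    for k in range(n):
        # Normalize the current, already partially orthogonalized column.
        Q[:, k] = V[:, k] / np.linalg.norm(V[:, k])
        # Immediately remove this direction from every remaining column.
        for j in range(k + 1, n):
            V[:, j] -= (Q[:, k] @ V[:, j]) * Q[:, k]
    return Q

A = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
Q = modified_gram_schmidt(A)
print(Q)  # columns (1,1,0)/sqrt(2) and (1,-1,2)/sqrt(6)
```

For well-conditioned inputs like this one, both variants return the same Q; the difference appears when columns are nearly dependent.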
The QR factorization also explains why orthonormal columns are ideal for least squares. If has orthonormal columns, then
The second term is independent of , so the best choice is . For a general , the vector plays the role of those orthonormal coordinates. This is why QR is more than a neat decomposition: it converts a geometric projection problem into a triangular system.
When the columns of A are not independent, QR still exists in variants with column pivoting or reduced rank. In that setting, the diagonal entries of R help reveal numerical rank. A very small diagonal entry means a new column adds little independent information beyond the previous columns. This diagnostic role is one reason QR appears in data fitting and numerical rank estimation.
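A small illustration of this diagnostic, using an invented matrix whose third column nearly equals the sum of the first two. Note that np.linalg.qr does not pivot; for serious rank estimation a pivoted QR such as scipy.linalg.qr with pivoting=True is the more robust tool:

```python
import numpy as np

# Build three columns where the third is almost a combination of the first two.
rng = np.random.default_rng(0)
B = rng.standard_normal((6, 2))
third = B[:, 0] + B[:, 1] + 1e-10 * rng.standard_normal(6)
A = np.column_stack([B, third])

Q, R = np.linalg.qr(A, mode="reduced")
print(np.abs(np.diag(R)))  # the third diagonal entry is tiny
```

The first two diagonal entries are of ordinary size, while the third collapses toward the size of the perturbation, flagging the near dependence.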
Visual
ASCII projection removal:
          v2
           *
          /|
         / |
        /  |  u2 = v2 - proj_u1(v2)
       /   |
      *----*----------------> u1
      0    proj_u1(v2)
Worked example 1: Apply Gram-Schmidt in R2
Problem: orthonormalize v_1 = (3, 1) and v_2 = (2, 2).
Step 1: set u_1 = v_1 = (3, 1).
Normalize: q_1 = u_1/‖u_1‖ = (3, 1)/√10.
Step 2: subtract the projection of v_2 onto u_1:
proj_{u_1}(v_2) = ((v_2 · u_1)/(u_1 · u_1)) u_1 = (8/10)(3, 1) = (12/5, 4/5).
Thus u_2 = v_2 − proj_{u_1}(v_2) = (2 − 12/5, 2 − 4/5) = (−2/5, 6/5).
Step 3: normalize u_2.
‖u_2‖ = √(4/25 + 36/25) = (2√10)/5, so q_2 = (−2/5, 6/5) · 5/(2√10) = (−1, 3)/√10.
Checked answer: q_1 · q_2 = (−3 + 3)/10 = 0, and both vectors have norm 1.
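A hand computation like this is easy to confirm numerically. A quick check using the vectors v1 = (3, 1) and v2 = (2, 2):

```python
import numpy as np

v1 = np.array([3.0, 1.0])
v2 = np.array([2.0, 2.0])

u1 = v1
u2 = v2 - (v2 @ u1) / (u1 @ u1) * u1   # subtract the projection onto u1

q1 = u1 / np.linalg.norm(u1)
q2 = u2 / np.linalg.norm(u2)

print(q1 @ q2)                                  # near 0: orthogonality
print(np.linalg.norm(q1), np.linalg.norm(q2))   # both near 1
```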
Worked example 2: Build QR and solve a least-squares system
Problem: let
A = [ 1  1 ]
    [ 1  0 ]
    [ 0  1 ],   b = (2, 1, 1).
Find the QR decomposition by Gram-Schmidt and solve the least-squares problem for this A and b.
Step 1: columns are v_1 = (1, 1, 0) and v_2 = (1, 0, 1).
Normalize v_1: u_1 = v_1, ‖u_1‖ = √2, so q_1 = (1, 1, 0)/√2.
Step 2: compute the component of v_2 along q_1:
(v_2 · q_1) q_1 = (1/√2) q_1 = (1/2, 1/2, 0).
Remove this component: u_2 = v_2 − (1/2, 1/2, 0) = (1/2, −1/2, 1).
Then ‖u_2‖ = √(1/4 + 1/4 + 1) = √6/2, so q_2 = (1, −1, 2)/√6 and
R = [ √2   1/√2 ]
    [ 0    √6/2 ].
Step 3: solve Rx = Qᵀb. The right side entries are
q_1 · b = 3/√2 and q_2 · b = 3/√6.
The triangular system is
√2 x_1 + (1/√2) x_2 = 3/√2
(√6/2) x_2 = 3/√6.
The second equation gives x_2 = (2/√6)(3/√6) = 1. The first gives
√2 x_1 = 3/√2 − 1/√2 = √2.
Multiply by 1/√2: x_1 = 1.
Checked answer: x = (1, 1), and indeed Ax = (2, 1, 1) = b exactly, so the residual is zero.
Code
import numpy as np

A = np.array([[1, 1],
              [1, 0],
              [0, 1]], dtype=float)
b = np.array([2, 1, 1], dtype=float)

# Reduced QR, then the triangular system R x = Q^T b.
Q, R = np.linalg.qr(A, mode="reduced")
x = np.linalg.solve(R, Q.T @ b)

print(Q)
print(R)
print(x)
print(np.allclose(A, Q @ R))
NumPy uses stable QR algorithms rather than the classical hand version. The mathematical factorization is the same: orthonormal columns in Q and an upper triangular matrix R.
Common pitfalls
- Forgetting to subtract all previous projections, not only the immediately previous one.
- Normalizing before subtracting projections without adjusting formulas consistently.
- Treating an orthogonal basis as automatically orthonormal.
- Using Gram-Schmidt on a dependent list without handling the zero vector that appears.
- Assuming classical Gram-Schmidt is the most stable numerical method. Modified Gram-Schmidt or Householder QR is preferred in serious computation.
- Mixing the full and reduced QR shapes.
A good hand-check for Gram-Schmidt is to verify two properties at every stage: the new vector is orthogonal to all earlier vectors, and the span has not changed. Orthogonality checks the projection arithmetic. Span preservation checks that no direction was accidentally discarded. If a nonzero vector becomes zero during the process, the original list was dependent and cannot produce the requested number of orthonormal vectors.
For QR, shape awareness prevents many mistakes. If A is m × n with independent columns and m ≥ n, the reduced Q is m × n and R is n × n. The full Q is m × m, but the additional columns are not needed for most least-squares computations. Both conventions are valid, so always identify which one is being used.
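NumPy exposes both conventions through the mode argument of np.linalg.qr, which makes the shape difference easy to see directly:

```python
import numpy as np

A = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])  # m = 3, n = 2

Qr, Rr = np.linalg.qr(A, mode="reduced")   # Qr: 3x2, Rr: 2x2
Qf, Rf = np.linalg.qr(A, mode="complete")  # Qf: 3x3, Rf: 3x2

print(Qr.shape, Rr.shape)  # (3, 2) (2, 2)
print(Qf.shape, Rf.shape)  # (3, 3) (3, 2)
print(np.allclose(A, Qr @ Rr), np.allclose(A, Qf @ Rf))  # True True
```

Both reconstruct A; the full version simply pads Q out to a square orthogonal matrix and pads R with zero rows.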
The diagonal entries of R have geometric meaning. Each r_kk is the length of the component of v_k that remains after removing projections onto the earlier vectors. Large values mean the column contributes a substantial new direction. Tiny values mean the column is nearly dependent on earlier columns, which can be a warning sign in data fitting.
When QR is used for least squares, the residual r = b − Ax is still the central geometric object. QR does not change the goal; it provides a stable way to compute the projection onto the column space. After solving Rx = Qᵀb, it is still useful to compute r = b − Ax and check that Qᵀr is close to zero.
The order of the input vectors affects the intermediate Gram-Schmidt vectors. Different orders can produce different orthonormal bases for the same subspace. This is not a contradiction, because bases are not unique. The subspace is the invariant object; the particular orthonormal basis depends on the construction choices.
In exact arithmetic, the formula using projections onto the u_j and the formula using dot products with the normalized q_j are equivalent. In computation, the normalized version is often easier to organize because the coefficient of q_j is simply v_k · q_j. Keeping vectors normalized also makes it easier to detect loss of orthogonality.
QR also gives a clean determinant fact for square matrices. If A = QR and Q is orthogonal, then det Q = ±1, so
|det A| = |det R|.
Since R is triangular, its determinant is the product of its diagonal entries. This is another example of a factorization turning a hard matrix question into an easier triangular one.
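A quick numerical confirmation on a small invented matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])
Q, R = np.linalg.qr(A)

# |det A| = |det R| = product of |r_kk|, since |det Q| = 1.
print(abs(np.linalg.det(A)))             # 5.0 (det A = 2*3 - 1*1)
print(np.prod(np.abs(np.diag(R))))       # 5.0
```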
A compact verification for QR is to check both QᵀQ = I and A = QR numerically. The first condition confirms orthonormal columns; the second confirms that the original matrix was reconstructed. One without the other is not enough.
Together they verify the geometry and the factorization.
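Both checks together take only two lines:

```python
import numpy as np

A = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
Q, R = np.linalg.qr(A, mode="reduced")

orthonormal = np.allclose(Q.T @ Q, np.eye(Q.shape[1]))  # verifies the geometry
reconstructs = np.allclose(Q @ R, A)                    # verifies the factorization
print(orthonormal and reconstructs)  # True
```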