Orthogonality, Gram-Schmidt, and QR

Orthogonal bases are computationally valuable because coordinates become dot products and projections separate cleanly. The Gram-Schmidt process converts an independent set into an orthogonal or orthonormal set with the same span. In matrix form, this becomes QR decomposition, one of the central tools for least squares and numerical linear algebra.

The main idea is to remove from each new vector the parts already explained by previous directions. What remains is perpendicular to the earlier directions. Repeating this produces a basis with the same span but much cleaner geometry.

Definitions

A set $\{\mathbf{q}_1,\ldots,\mathbf{q}_k\}$ is orthonormal if

$$\mathbf{q}_i\cdot\mathbf{q}_j=\begin{cases}1,&i=j,\\0,&i\neq j.\end{cases}$$

The orthogonal projection of $\mathbf{v}$ onto a subspace $W$ with orthonormal basis $\mathbf{q}_1,\ldots,\mathbf{q}_k$ is

$$\operatorname{proj}_W(\mathbf{v})=(\mathbf{v}\cdot\mathbf{q}_1)\mathbf{q}_1+\cdots+(\mathbf{v}\cdot\mathbf{q}_k)\mathbf{q}_k.$$
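In matrix form, stacking the orthonormal basis as the columns of a matrix $Q$ turns this formula into $\operatorname{proj}_W(\mathbf{v})=QQ^T\mathbf{v}$. A minimal NumPy sketch; the particular basis and vector are illustrative choices, not from the text:

import numpy as np

# Orthonormal basis of a plane W in R^3, stored as the columns of Q.
Q = np.column_stack([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0]])
v = np.array([3.0, 4.0, 5.0])

# proj_W(v) = (v.q1) q1 + (v.q2) q2, which is Q (Q^T v) in matrix form.
p = Q @ (Q.T @ v)
print(p)            # [3. 4. 0.]
print((v - p) @ Q)  # the leftover v - p is orthogonal to W: [0. 0.]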

A QR decomposition of an $m\times n$ matrix $A$ with independent columns is

$$A=QR,$$

where $Q$ has orthonormal columns and $R$ is upper triangular with positive diagonal entries in the standard construction.

Classical Gram-Schmidt starts with independent vectors $\mathbf{v}_1,\ldots,\mathbf{v}_k$ and produces orthogonal vectors $\mathbf{u}_1,\ldots,\mathbf{u}_k$:

$$\mathbf{u}_1=\mathbf{v}_1,$$

and

$$\mathbf{u}_j=\mathbf{v}_j-\sum_{i=1}^{j-1}\operatorname{proj}_{\mathbf{u}_i}(\mathbf{v}_j).$$

Then $\mathbf{q}_j=\mathbf{u}_j/\|\mathbf{u}_j\|$ gives an orthonormal set.
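The recipe translates directly into code. A minimal sketch of classical Gram-Schmidt; the function name and list-based interface are illustrative choices:

import numpy as np

def classical_gram_schmidt(vectors):
    # u_j starts as v_j, then loses its projection onto every earlier u_i.
    us = []
    for v in vectors:
        v = np.asarray(v, dtype=float)
        u = v.copy()
        for w in us:
            u -= (np.dot(v, w) / np.dot(w, w)) * w
        us.append(u)
    # Normalizing each u gives the orthonormal q's.
    return [u / np.linalg.norm(u) for u in us]

qs = classical_gram_schmidt([[1, 1], [1, 0]])
print(qs[0] @ qs[1])   # 0 up to rounding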

Key results

Gram-Schmidt preserves span at each step:

$$\operatorname{span}\{\mathbf{v}_1,\ldots,\mathbf{v}_j\}=\operatorname{span}\{\mathbf{u}_1,\ldots,\mathbf{u}_j\}.$$

The reason is that $\mathbf{u}_j$ is obtained from $\mathbf{v}_j$ by subtracting a vector in the span of the previous $\mathbf{u}$'s, and the operation can be reversed.

The vectors produced by Gram-Schmidt are orthogonal. When constructing $\mathbf{u}_j$, the projection terms remove exactly the components in the earlier directions. For $i<j$,

$$\mathbf{u}_j\cdot\mathbf{u}_i=0.$$

If $A$ has independent columns $\mathbf{a}_1,\ldots,\mathbf{a}_n$, applying Gram-Schmidt to those columns gives

$$A=QR.$$

The entries of $R$ are dot products and norms:

$$r_{ij}=\mathbf{q}_i\cdot\mathbf{a}_j\quad(i\leq j),\qquad r_{jj}=\|\mathbf{u}_j\|.$$
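In code, this means $R$ can be read off all at once as $Q^TA$ once orthonormal columns are available, and the entries below the diagonal come out as zeros up to rounding. A small sketch, using NumPy's own QR for the orthonormal columns:

import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])

Q, _ = np.linalg.qr(A, mode="reduced")

# r_ij = q_i . a_j is the (i, j) entry of Q^T A; it is upper triangular.
R = Q.T @ A
print(np.round(R, 12))
print(np.allclose(A, Q @ R))   # True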

Because $Q^TQ=I$, QR simplifies least squares. Instead of solving the normal equations $A^TA\hat{\mathbf{x}}=A^T\mathbf{b}$, one can solve

$$R\hat{\mathbf{x}}=Q^T\mathbf{b}.$$

This is often more numerically stable.
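The stability gap is visible on nearly dependent columns. A sketch using the standard Läuchli-style example; the specific matrix and eps are illustrative choices. Forming $A^TA$ squares the condition number and, in float64, rounds it to an exactly singular matrix, while QR works on $A$ directly:

import numpy as np

eps = 1e-8
A = np.array([[1.0, 1.0],
              [eps, 0.0],
              [0.0, eps]])
b = np.array([2.0, eps, eps])   # exact least-squares solution is (1, 1)

G = A.T @ A                     # 1 + eps**2 rounds to 1, so G is singular
print(np.linalg.cond(A))        # about 1.4e8
print(np.linalg.cond(G))        # inf: the normal equations are unusable

Q, R = np.linalg.qr(A, mode="reduced")
print(np.linalg.solve(R, Q.T @ b))   # close to [1. 1.]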

The upper triangular shape of $R$ records the order in which new directions are explained by previous orthonormal directions. If $A=[\mathbf{a}_1\ \cdots\ \mathbf{a}_n]$ and $Q=[\mathbf{q}_1\ \cdots\ \mathbf{q}_n]$, then the equation $A=QR$ says each original column can be reconstructed from the orthonormal columns:

$$\mathbf{a}_j=r_{1j}\mathbf{q}_1+r_{2j}\mathbf{q}_2+\cdots+r_{jj}\mathbf{q}_j.$$

No later $\mathbf{q}$ appears in this expression, which is why $R$ has zeros below the diagonal. The diagonal entry $r_{jj}$ is the length of the new component that remains after subtracting earlier projections. If that length is zero, the original columns were dependent and the standard full-column QR construction breaks down.

For hand computation, classical Gram-Schmidt is transparent because each projection is visible. For numerical computation, however, classical Gram-Schmidt can lose orthogonality when vectors are nearly dependent. Modified Gram-Schmidt rearranges the same mathematical operations to reduce error accumulation. Householder QR uses orthogonal reflections to zero out entire subcolumns and is the usual dense-matrix workhorse. These algorithms produce the same kind of factorization, but they differ in stability and implementation cost.
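A minimal sketch of modified Gram-Schmidt (names illustrative): instead of projecting the original vector against all earlier directions at once, each new $\mathbf{q}_j$ is removed from all remaining columns as soon as it is computed, which limits how rounding errors feed back into later steps.

import numpy as np

def modified_gram_schmidt(A):
    A = np.array(A, dtype=float)
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    V = A.copy()                      # working copies of the columns
    for j in range(n):
        R[j, j] = np.linalg.norm(V[:, j])
        Q[:, j] = V[:, j] / R[j, j]
        for k in range(j + 1, n):
            # Remove the q_j component from every later column now,
            # rather than waiting until column k is processed.
            R[j, k] = Q[:, j] @ V[:, k]
            V[:, k] -= R[j, k] * Q[:, j]
    return Q, R

Q, R = modified_gram_schmidt([[1, 1], [1, 0], [0, 1]])
print(np.allclose(Q.T @ Q, np.eye(2)))   # True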

The QR factorization also explains why orthonormal columns are ideal for least squares. If $Q$ has orthonormal columns, then

$$\|Q\mathbf{x}-\mathbf{b}\|^2=\|\mathbf{x}-Q^T\mathbf{b}\|^2+\|\mathbf{b}-QQ^T\mathbf{b}\|^2.$$

The second term is independent of $\mathbf{x}$, so the best choice is $\mathbf{x}=Q^T\mathbf{b}$. For a general $A=QR$, the vector $R\hat{\mathbf{x}}$ plays the role of those orthonormal coordinates. This is why QR is more than a neat decomposition: it converts a geometric projection problem into a triangular system.

When the columns of $A$ are not independent, QR still exists in variants with column pivoting or reduced rank. In that setting, the diagonal entries of $R$ help reveal numerical rank. A very small diagonal entry means a new column adds little independent information beyond the previous columns. This diagnostic role is one reason QR appears in data fitting and numerical rank estimation.
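A sketch of rank diagnosis with pivoted QR, assuming SciPy is available; the test matrix is an illustrative choice:

import numpy as np
from scipy.linalg import qr

# The third column is numerically the sum of the first two.
A = np.array([[1.0, 1.0, 2.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0 + 1e-10]])

Q, R, piv = qr(A, pivoting=True)   # A[:, piv] = Q @ R
print(np.abs(np.diag(R)))          # last diagonal entry is tiny:
                                   # numerical rank 2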

Visual

ASCII projection removal:

      v2
      *
     /|
    / |
   /  |  u2 = v2 - proj_u1(v2)
  /   |
 /    |
*-----*--------------> u1
0     proj_u1(v2)

Worked example 1: Apply Gram-Schmidt in $\mathbb{R}^2$

Problem: orthonormalize

$$\mathbf{v}_1=\begin{bmatrix}1\\1\end{bmatrix},\qquad\mathbf{v}_2=\begin{bmatrix}1\\0\end{bmatrix}.$$

Step 1: set

$$\mathbf{u}_1=\mathbf{v}_1=\begin{bmatrix}1\\1\end{bmatrix}.$$

Normalize:

$$\|\mathbf{u}_1\|=\sqrt{1^2+1^2}=\sqrt2,\qquad\mathbf{q}_1=\frac{1}{\sqrt2}\begin{bmatrix}1\\1\end{bmatrix}.$$

Step 2: subtract the projection of $\mathbf{v}_2$ onto $\mathbf{u}_1$:

$$\operatorname{proj}_{\mathbf{u}_1}(\mathbf{v}_2)=\frac{\mathbf{v}_2\cdot\mathbf{u}_1}{\mathbf{u}_1\cdot\mathbf{u}_1}\mathbf{u}_1=\frac12\begin{bmatrix}1\\1\end{bmatrix}=\begin{bmatrix}1/2\\1/2\end{bmatrix}.$$

Thus

$$\mathbf{u}_2=\begin{bmatrix}1\\0\end{bmatrix}-\begin{bmatrix}1/2\\1/2\end{bmatrix}=\begin{bmatrix}1/2\\-1/2\end{bmatrix}.$$

Step 3: normalize $\mathbf{u}_2$.

$$\|\mathbf{u}_2\|=\sqrt{1/4+1/4}=\frac{1}{\sqrt2},$$

so

$$\mathbf{q}_2=\begin{bmatrix}1/\sqrt2\\-1/\sqrt2\end{bmatrix}.$$

Checked answer: $\mathbf{q}_1\cdot\mathbf{q}_2=1/2-1/2=0$, and both vectors have norm $1$.
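The same computation takes a few lines in NumPy, mirroring the steps above:

import numpy as np

v1 = np.array([1.0, 1.0])
v2 = np.array([1.0, 0.0])

u1 = v1
u2 = v2 - (v2 @ u1) / (u1 @ u1) * u1   # remove the projection onto u1
q1 = u1 / np.linalg.norm(u1)
q2 = u2 / np.linalg.norm(u2)

print(q1, q2)    # [0.707 0.707] and [ 0.707 -0.707]
print(q1 @ q2)   # 0 up to rounding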

Worked example 2: Build QR and solve a least-squares system

Problem: let

$$A=\begin{bmatrix}1&1\\1&0\\0&1\end{bmatrix},\qquad\mathbf{b}=\begin{bmatrix}2\\1\\1\end{bmatrix}.$$

Find the QR decomposition by Gram-Schmidt and solve the least-squares problem.

Step 1: columns are

$$\mathbf{a}_1=\begin{bmatrix}1\\1\\0\end{bmatrix},\qquad\mathbf{a}_2=\begin{bmatrix}1\\0\\1\end{bmatrix}.$$

Normalize $\mathbf{a}_1$:

$$r_{11}=\|\mathbf{a}_1\|=\sqrt2,\qquad\mathbf{q}_1=\frac{1}{\sqrt2}\begin{bmatrix}1\\1\\0\end{bmatrix}.$$

Step 2: compute

$$r_{12}=\mathbf{q}_1\cdot\mathbf{a}_2=\frac{1}{\sqrt2}.$$

Remove this component:

$$\mathbf{u}_2=\mathbf{a}_2-r_{12}\mathbf{q}_1=\begin{bmatrix}1\\0\\1\end{bmatrix}-\frac12\begin{bmatrix}1\\1\\0\end{bmatrix}=\begin{bmatrix}1/2\\-1/2\\1\end{bmatrix}.$$

Then

$$r_{22}=\|\mathbf{u}_2\|=\sqrt{\frac14+\frac14+1}=\sqrt{\frac32}.$$

Step 3: solve $R\hat{\mathbf{x}}=Q^T\mathbf{b}$. The right side entries are

$$\mathbf{q}_1\cdot\mathbf{b}=\frac{3}{\sqrt2},\qquad\mathbf{q}_2\cdot\mathbf{b}=\frac{\mathbf{u}_2\cdot\mathbf{b}}{\|\mathbf{u}_2\|}=\frac{3/2}{\sqrt{3/2}}=\sqrt{\frac32}.$$

The triangular system is

$$\begin{bmatrix}\sqrt2&1/\sqrt2\\0&\sqrt{3/2}\end{bmatrix}\begin{bmatrix}x_1\\x_2\end{bmatrix}=\begin{bmatrix}3/\sqrt2\\\sqrt{3/2}\end{bmatrix}.$$

The second equation gives $x_2=1$. The first gives

$$\sqrt2\,x_1+\frac{1}{\sqrt2}=\frac{3}{\sqrt2}.$$

Multiply by $\sqrt2$:

$$2x_1+1=3\quad\Longrightarrow\quad x_1=1.$$

Checked answer: $\hat{\mathbf{x}}=\begin{bmatrix}1&1\end{bmatrix}^T$. As a check, $A\hat{\mathbf{x}}=\mathbf{b}$ exactly: here $\mathbf{b}$ happens to lie in the column space of $A$, so the residual is zero.

Code

import numpy as np

A = np.array([[1, 1],
              [1, 0],
              [0, 1]], dtype=float)
b = np.array([2, 1, 1], dtype=float)

# Reduced QR: Q is 3x2 with orthonormal columns, R is 2x2 upper triangular.
Q, R = np.linalg.qr(A, mode="reduced")

# Least squares via the triangular system R x = Q^T b.
x = np.linalg.solve(R, Q.T @ b)

print(Q)
print(R)
print(x)                        # [1. 1.]
print(np.allclose(A, Q @ R))    # True

NumPy uses stable QR algorithms rather than the classical hand version, and its sign conventions can differ: a column of $Q$ and the corresponding row of $R$ may both be negated relative to the hand computation without changing the product. The mathematical factorization is the same: orthonormal columns in $Q$ and an upper triangular matrix $R$.

Common pitfalls

  • Forgetting to subtract all previous projections, not only the immediately previous one.
  • Normalizing before subtracting projections without adjusting formulas consistently.
  • Treating an orthogonal basis as automatically orthonormal.
  • Using Gram-Schmidt on a dependent list without handling the zero vector that appears.
  • Assuming classical Gram-Schmidt is the most stable numerical method. Modified Gram-Schmidt or Householder QR is preferred in serious computation.
  • Mixing the full and reduced QR shapes.

A good hand-check for Gram-Schmidt is to verify two properties at every stage: the new vector is orthogonal to all earlier vectors, and the span has not changed. Orthogonality checks the projection arithmetic. Span preservation checks that no direction was accidentally discarded. If a nonzero vector becomes zero during the process, the original list was dependent and cannot produce the requested number of orthonormal vectors.

For QR, shape awareness prevents many mistakes. If $A$ is $m\times n$ with independent columns and $m\geq n$, the reduced $Q$ is $m\times n$ and $R$ is $n\times n$. The full $Q$ is $m\times m$, but the additional columns are not needed for most least-squares computations. Both conventions are valid, so always identify which one is being used.
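A quick way to keep the two conventions straight is to print the shapes NumPy returns for each mode (the random test matrix is an illustrative choice):

import numpy as np

A = np.random.default_rng(1).standard_normal((5, 3))

Qr, Rr = np.linalg.qr(A, mode="reduced")
Qf, Rf = np.linalg.qr(A, mode="complete")

print(Qr.shape, Rr.shape)   # (5, 3) (3, 3)  -- reduced
print(Qf.shape, Rf.shape)   # (5, 5) (5, 3)  -- full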

The diagonal entries of $R$ have geometric meaning. Each $r_{jj}$ is the length of the component of $\mathbf{a}_j$ that remains after removing projections onto the earlier $\mathbf{q}$ vectors. Large values mean the column contributes a substantial new direction. Tiny values mean the column is nearly dependent on earlier columns, which can be a warning sign in data fitting.

When QR is used for least squares, the residual is still the central geometric object. QR does not change the goal; it provides a stable way to compute the projection onto the column space. After solving $R\hat{\mathbf{x}}=Q^T\mathbf{b}$, it is still useful to compute $\mathbf{r}=\mathbf{b}-A\hat{\mathbf{x}}$ and check that $A^T\mathbf{r}$ is close to zero.
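A minimal sketch of that residual check, reusing the matrix from the worked example:

import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])
b = np.array([2.0, 1.0, 1.0])

Q, R = np.linalg.qr(A, mode="reduced")
x = np.linalg.solve(R, Q.T @ b)

# The residual should be orthogonal to every column of A.
r = b - A @ x
print(A.T @ r)   # near zero (here b is in col(A), so r itself is ~0)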

The order of the input vectors affects the intermediate Gram-Schmidt vectors. Different orders can produce different orthonormal bases for the same subspace. This is not a contradiction, because bases are not unique. The subspace is the invariant object; the particular orthonormal basis depends on the construction choices.

In exact arithmetic, the formula using projections onto $\mathbf{u}_i$ and the formula using dot products with normalized $\mathbf{q}_i$ are equivalent. In computation, the normalized version is often easier to organize because the coefficient of $\mathbf{q}_i$ is simply $\mathbf{v}\cdot\mathbf{q}_i$. Keeping vectors normalized also makes it easier to detect loss of orthogonality.

QR also gives a clean determinant fact for square matrices. If $A=QR$ and $Q$ is orthogonal, then $|\det(Q)|=1$, so

$$|\det(A)|=|\det(R)|.$$

Since $R$ is triangular, its determinant is the product of its diagonal entries. This is another example of a factorization turning a hard matrix question into an easier triangular one.
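A quick numerical illustration (the matrix is an arbitrary example):

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
Q, R = np.linalg.qr(A)

print(abs(np.linalg.det(A)))      # 5.0
print(abs(np.prod(np.diag(R))))   # 5.0, the product of |r_jj|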

A compact verification for QR is to check both $Q^TQ=I$ and $QR=A$. The first condition confirms orthonormal columns; the second confirms that the original matrix was reconstructed. One without the other is not enough.

Together they verify the geometry and the factorization.
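Both checks are one line each in NumPy (the random test matrix is an illustrative choice):

import numpy as np

A = np.random.default_rng(0).standard_normal((5, 3))
Q, R = np.linalg.qr(A, mode="reduced")

print(np.allclose(Q.T @ Q, np.eye(3)))   # orthonormal columns
print(np.allclose(Q @ R, A))             # reconstructs A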

Connections