Singular Value Decomposition
The singular value decomposition, or SVD, is a universal matrix factorization. Unlike diagonalization, it applies to every real matrix, square or rectangular. It separates a linear map into orthogonal input directions, nonnegative stretch factors, and orthogonal output directions. This makes it central for rank, least squares, conditioning, compression, and data analysis.
The geometric picture is especially useful: a matrix first rotates or reflects the input space, then stretches along coordinate axes by singular values, then rotates or reflects into the output space. Small singular values mark directions that are nearly lost; zero singular values mark directions that are completely collapsed.
Definitions
For an $m \times n$ matrix $A$, a singular value decomposition is
$$A = U \Sigma V^T,$$
where $U$ is an $m \times m$ orthogonal matrix, $V$ is an $n \times n$ orthogonal matrix, and $\Sigma$ is an $m \times n$ diagonal-shaped matrix with nonnegative entries $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_{\min(m,n)} \ge 0$.
The numbers $\sigma_i$ are the singular values of $A$. They are the square roots of the eigenvalues of $A^T A$:
$$\sigma_i = \sqrt{\lambda_i(A^T A)}.$$
If $A$ has rank $r$, its reduced SVD is
$$A = U_r \Sigma_r V_r^T,$$
using only the $r$ positive singular values and the corresponding singular vectors.
The singular value expansion is
$$A = \sum_{i=1}^{r} \sigma_i u_i v_i^T.$$
Each term $\sigma_i u_i v_i^T$ is a rank-one matrix.
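As a quick numerical check of this expansion, the sketch below rebuilds a matrix from its rank-one singular terms with NumPy; the small example matrix is an arbitrary assumption.

import numpy as np

# Arbitrary small example matrix; any real matrix would do.
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)  # reduced SVD

# Sum of rank-one terms sigma_i * u_i * v_i^T reconstructs A.
expansion = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s)))
print(np.allclose(A, expansion))  # True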
Key results
Every real matrix has an SVD. This is stronger than diagonalization, which requires a square matrix and enough eigenvectors. The SVD exists because $A^T A$ is symmetric and positive semidefinite, so it has an orthonormal eigenbasis.
The rank of $A$ equals the number of positive singular values. The null space of $A$ is spanned by the right singular vectors corresponding to zero singular values.
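A minimal sketch of reading rank and a null-space basis off the singular values; the example matrix and the tolerance are illustrative assumptions.

import numpy as np

A = np.array([[2.0, 4.0],
              [1.0, 2.0],
              [0.0, 0.0]])  # rank 1: second column is twice the first

U, s, Vt = np.linalg.svd(A)
tol = max(A.shape) * np.finfo(float).eps * s[0]  # illustrative tolerance
r = int(np.sum(s > tol))
print("rank:", r)                        # 1

null_basis = Vt[r:, :].T                 # right singular vectors for zero singular values
print(np.allclose(A @ null_basis, 0))    # True: these directions are collapsed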
The matrix norm induced by the Euclidean vector norm is the largest singular value:
$$\|A\|_2 = \sigma_1.$$
For an invertible square matrix, the 2-norm condition number is
$$\kappa_2(A) = \frac{\sigma_1}{\sigma_n}.$$
Small singular values signal directions where the map nearly collapses, making inverse problems sensitive.
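Both facts are easy to confirm numerically; the random matrix below is an assumed example.

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))

s = np.linalg.svd(A, compute_uv=False)                  # singular values, descending
print(np.isclose(np.linalg.norm(A, 2), s[0]))           # ||A||_2 = sigma_1
print(np.isclose(np.linalg.cond(A, 2), s[0] / s[-1]))   # kappa_2(A) = sigma_1 / sigma_n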
The best rank-$k$ approximation theorem says that if
$$A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T,$$
then $A_k$ is the closest rank-$k$ matrix to $A$ in both spectral norm and Frobenius norm. This is the mathematical basis of low-rank compression.
The Moore-Penrose pseudoinverse can be written from the SVD:
$$A^+ = V \Sigma^+ U^T,$$
where $\Sigma^+$ reciprocates the positive singular values and transposes the diagonal-shaped matrix. It gives least-squares solutions, including minimum-norm solutions for rank-deficient systems.
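A sketch of building the pseudoinverse from the SVD for a rank-deficient matrix; the matrix, the right-hand side, and the cutoff are illustrative assumptions.

import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0],
              [1.0, 1.0]])   # rank 1

U, s, Vt = np.linalg.svd(A, full_matrices=False)
cutoff = 1e-12 * s[0]                          # treat smaller singular values as zero
s_inv = np.where(s > cutoff, 1.0 / s, 0.0)     # reciprocate only the positive singular values
A_pinv = Vt.T @ np.diag(s_inv) @ U.T           # A^+ = V Sigma^+ U^T

print(np.allclose(A_pinv, np.linalg.pinv(A)))  # matches NumPy's pinv, which also drops tiny values

b = np.array([0.0, 3.0, 0.0])
print(A_pinv @ b)                              # minimum-norm least-squares solution, about [0.5, 0.5]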
The SVD is closely related to the four fundamental subspaces. The right singular vectors associated with positive singular values form an orthonormal basis for the row space of $A$. The right singular vectors associated with zero singular values form an orthonormal basis for the null space. The left singular vectors associated with positive singular values form an orthonormal basis for the column space. The remaining left singular vectors span the left null space.
This structure makes the SVD a complete coordinate description of what $A$ does. If
$$A v_i = \sigma_i u_i,$$
then each right singular direction $v_i$ is sent to the corresponding left singular direction $u_i$ and scaled by $\sigma_i$. If $\sigma_i = 0$, that input direction is collapsed to zero. No other standard factorization gives such a direct geometric account for every rectangular matrix.
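The same structure can be read off NumPy's output. Here is a sketch, with an assumed rank-one matrix, that extracts bases for the four subspaces and checks one stretching relation.

import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])   # rank 1

U, s, Vt = np.linalg.svd(A)              # full SVD: U is 3 x 3, Vt is 2 x 2
r = int(np.sum(s > 1e-12))

row_space = Vt[:r, :].T                  # basis for the row space
null_space = Vt[r:, :].T                 # basis for the null space
col_space = U[:, :r]                     # basis for the column space
left_null = U[:, r:]                     # basis for the left null space

print(np.allclose(A @ Vt[0], s[0] * U[:, 0]))   # A v_1 = sigma_1 u_1
print(np.allclose(A @ null_space, 0))           # null-space directions are collapsed
print(np.allclose(left_null.T @ A, 0))          # left null space is orthogonal to the columns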
The SVD also explains why solving linear systems can be sensitive. Suppose $A$ is invertible with a small singular value $\sigma_n$. Inverting divides by the singular values, so components of the data in directions corresponding to small singular values are amplified by $1/\sigma_n$. If measurement noise has any component in those directions, the computed solution can change dramatically. Regularization methods often work by damping or discarding the effect of tiny singular values.
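A tiny demonstration of this amplification, using an assumed diagonal matrix with one small singular value:

import numpy as np

A = np.diag([1.0, 1e-8])          # sigma_1 = 1, sigma_2 = 1e-8
b = np.array([1.0, 1.0])

x = np.linalg.solve(A, b)
x_noisy = np.linalg.solve(A, b + np.array([0.0, 1e-6]))   # perturb the data slightly

print(np.linalg.norm(x_noisy - x))   # about 100: the 1e-6 perturbation is divided by sigma_2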
For data matrices, singular values measure how much variation is captured by each rank-one term. Keeping the largest few singular values gives a low-dimensional summary. In image compression, for example, a grayscale image can be treated as a matrix. A rank-$k$ SVD approximation stores only $k$ singular values and $k$ pairs of singular vectors, often preserving broad visual structure while discarding fine detail.
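A sketch of the storage count for a rank-$k$ approximation; the synthetic matrix stands in for a grayscale image, and the value of k is an arbitrary assumption.

import numpy as np

rng = np.random.default_rng(1)
img = rng.random((200, 300))     # stand-in for a 200 x 300 grayscale image
k = 20

U, s, Vt = np.linalg.svd(img, full_matrices=False)
img_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]            # best rank-k approximation

print(img.size)                                          # 60000 numbers stored directly
print(k * (img.shape[0] + img.shape[1] + 1))             # 10020 numbers for the rank-k form
print(np.isclose(np.linalg.norm(img - img_k, 2), s[k]))  # spectral error is the next singular value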
Although the SVD is powerful, it is not always the cheapest tool. QR is usually preferred for ordinary full-rank least squares, and LU is usually preferred for square solves. The SVD is the tool to reach for when rank, conditioning, compression, or near-dependence is central.
Visual
ASCII block form:
A         =   U         Sigma      V^T
(m x n)       (m x m)   (m x n)    (n x n)

Sigma (example with two positive singular values):
[s1  0  0]
[ 0 s2  0]
[ 0  0  0]
| Factorization | Applies to | Core diagonal values | Orthogonal factors? |
|---|---|---|---|
| Eigendecomposition | some square matrices | eigenvalues | not always |
| Orthogonal diagonalization | real symmetric matrices | eigenvalues | yes |
| SVD | every real matrix | singular values | yes |
Worked example 1: Compute an SVD by hand for a diagonal matrix
Problem: find an SVD of
$$A = \begin{bmatrix} 3 & 0 \\ 0 & 2 \\ 0 & 0 \end{bmatrix}.$$
Step 1: compute $A^T A = \begin{bmatrix} 9 & 0 \\ 0 & 4 \end{bmatrix}$.
Step 2: the eigenvalues of $A^T A$ are $9$ and $4$, so the singular values are $\sigma_1 = 3$ and $\sigma_2 = 2$.
Step 3: the right singular vectors can be the standard basis: $v_1 = (1, 0)^T$ and $v_2 = (0, 1)^T$, so $V = I_2$.
Step 4: compute the left singular vectors by $u_i = A v_i / \sigma_i$, giving $u_1 = (1, 0, 0)^T$ and $u_2 = (0, 1, 0)^T$.
Complete $U$ with $u_3 = (0, 0, 1)^T$. Then $U = I_3$, $V = I_2$, and $\Sigma$ is the $3 \times 2$ matrix with diagonal entries $3$ and $2$. Checked answer: $U \Sigma V^T = A$.
Worked example 2: Low-rank approximation from singular values
Problem: suppose
$$A = \sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T$$
with $\sigma_1 > \sigma_2 > 0$, where all singular vectors are unit and mutually orthogonal in their respective spaces. Find the best rank-one approximation and its error in spectral norm.
Step 1: keep only the largest singular term: $A_1 = \sigma_1 u_1 v_1^T$.
Step 2: subtract: $A - A_1 = \sigma_2 u_2 v_2^T$.
Step 3: the spectral norm of a rank-one singular term $\sigma u v^T$ with unit vectors is $\sigma$. Therefore $\|A - A_1\|_2 = \sigma_2$.
Checked answer: the best rank-one approximation is $A_1 = \sigma_1 u_1 v_1^T$, and the spectral-norm error is the next singular value, $\sigma_2$.
Code
import numpy as np

# Matrix from worked example 1.
A = np.array([[3, 0],
              [0, 2],
              [0, 0]], dtype=float)

# Full SVD: U is 3 x 3, s holds the singular values, Vt is 2 x 2.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Rebuild the 3 x 2 diagonal-shaped Sigma from the singular values.
Sigma = np.zeros_like(A, dtype=float)
Sigma[:len(s), :len(s)] = np.diag(s)

print(s)                                  # [3. 2.]
print(np.allclose(A, U @ Sigma @ Vt))     # True: A = U Sigma V^T

# Best rank-one approximation and its spectral-norm error.
A1 = s[0] * np.outer(U[:, 0], Vt[0, :])
print(A1)
print(np.linalg.norm(A - A1, 2))          # equals the second singular value, 2.0
The final norm equals the second singular value for this example, illustrating the best rank-one approximation error.
Common pitfalls
- Confusing singular values with eigenvalues. Singular values are always nonnegative and exist for rectangular matrices.
- Forgetting that $U$ and $V$ may have different sizes: in the full SVD, $U$ is $m \times m$ and $V$ is $n \times n$.
- Assuming $A^T A$ and $A A^T$ have the same dimensions. They share the same positive eigenvalues, but their sizes differ.
- Dropping small singular values without considering the scale and purpose of the problem.
- Treating the SVD as unique. Singular vectors can change sign, and repeated singular values allow rotations within singular subspaces.
- Using full SVD when reduced SVD is enough for a rank or least-squares computation (see the sketch after this list).
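On the last point, here is a minimal comparison of the shapes returned by full and reduced (thin) SVD; the matrix size is an arbitrary assumption.

import numpy as np

A = np.random.default_rng(2).standard_normal((1000, 50))

U_full, s_full, Vt_full = np.linalg.svd(A, full_matrices=True)    # U_full is 1000 x 1000
U_thin, s_thin, Vt_thin = np.linalg.svd(A, full_matrices=False)   # U_thin is 1000 x 50

print(U_full.shape, U_thin.shape)   # (1000, 1000) (1000, 50)
print(np.allclose(s_full, s_thin))  # same singular values either way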
A good SVD interpretation starts with the equations $A v_i = \sigma_i u_i$. They say exactly what happens to each orthonormal input direction. If $\sigma_i$ is large, that direction is stretched strongly. If $\sigma_i$ is small, that direction is nearly collapsed. If $\sigma_i = 0$, that direction lies in the null space. This direction-by-direction description is often clearer than the full matrix product.
When using the SVD for rank decisions, numerical tolerance matters. In exact algebra, a singular value is either zero or positive. In floating-point computation, a tiny computed singular value may be effectively zero relative to the scale of the matrix. Numerical rank is therefore a judgment based on tolerance, units, noise, and the purpose of the computation.
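A sketch of how the tolerance changes the decision; the nearly rank-one matrix and the relative cutoff are illustrative assumptions.

import numpy as np

# The second column differs from twice the first by roughly measurement-sized noise.
A = np.array([[1.0, 2.0 + 1e-10],
              [2.0, 4.0 - 1e-10]])

s = np.linalg.svd(A, compute_uv=False)
print(s)                              # one large singular value, one tiny one

print(int(np.sum(s > 0)))             # exact-arithmetic view: rank 2
print(int(np.sum(s > 1e-8 * s[0])))   # noise-aware relative tolerance: numerical rank 1
print(np.linalg.matrix_rank(A))       # default tolerance is near machine precision, so this is 2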
Low-rank approximation should be interpreted through the discarded singular values. Keeping $k$ terms preserves the $k$ strongest modes of the matrix. The spectral-norm error is the next singular value $\sigma_{k+1}$, while the Frobenius-norm error is $\sqrt{\sigma_{k+1}^2 + \cdots + \sigma_r^2}$, the square root of the sum of squares of all discarded singular values. This gives a quantitative way to decide how many terms are enough.
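Both error formulas are easy to verify numerically; the random matrix and the choice of k below are assumptions for illustration.

import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((8, 6))
k = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print(np.isclose(np.linalg.norm(A - A_k, 2), s[k]))        # spectral error = next singular value
print(np.isclose(np.linalg.norm(A - A_k, 'fro'),
                 np.sqrt(np.sum(s[k:] ** 2))))             # Frobenius error = root sum of squares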
For least squares, the SVD is especially valuable when columns are nearly dependent. Tiny singular values indicate directions in parameter space that change the prediction very little. Dividing by those values in an inverse problem amplifies noise. Truncated SVD and other regularization methods reduce this amplification by limiting the influence of unstable directions.
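A sketch of truncated-SVD least squares on an assumed ill-conditioned problem; the data, the noise level, and the cutoff k are all modeling assumptions, and the truncation trades resolving individual coefficients for stability.

import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(0.0, 1.0, 50)
X = np.column_stack([x, x + 1e-8 * rng.standard_normal(50)])   # two nearly dependent columns
y = X @ np.array([1.0, 1.0]) + 1e-3 * rng.standard_normal(50)  # noisy observations

U, s, Vt = np.linalg.svd(X, full_matrices=False)
print(s)                                          # one singular value is tiny

k = 1                                             # keep only the dominant direction
coef = Vt[:k, :].T @ ((U[:, :k].T @ y) / s[:k])
print(coef)                                       # stable estimate, close to [1, 1]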
The relationship between $A^T A$ and the SVD should be read carefully. If $A = U \Sigma V^T$, then
$$A^T A = V \Sigma^T \Sigma V^T.$$
Thus the right singular vectors are eigenvectors of $A^T A$, and the eigenvalues are the squared singular values. Similarly,
$$A A^T = U \Sigma \Sigma^T U^T,$$
so the left singular vectors are eigenvectors of $A A^T$. The positive eigenvalues match, but the matrices have different sizes when $A$ is rectangular.
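A short numerical confirmation, with an assumed random rectangular matrix:

import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((5, 3))

s = np.linalg.svd(A, compute_uv=False)
eig_AtA = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]   # 3 eigenvalues, descending
eig_AAt = np.sort(np.linalg.eigvalsh(A @ A.T))[::-1]   # 5 eigenvalues, descending

print(np.allclose(eig_AtA, s ** 2))       # eigenvalues of A^T A are the squared singular values
print(np.allclose(eig_AAt[:3], s ** 2))   # A A^T shares the positive ones
print(np.allclose(eig_AAt[3:], 0.0))      # and pads with zeros because it is 5 x 5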
Sign ambiguity is normal. If $u_i$ and $v_i$ are both multiplied by $-1$, the rank-one product $\sigma_i u_i v_i^T$ is unchanged. For repeated singular values, even more freedom exists: singular vectors can rotate within the repeated singular subspace. Therefore software output may differ from hand output while representing the same SVD.