Singular Value Decomposition

The singular value decomposition, or SVD, is a universal matrix factorization. Unlike diagonalization, it applies to every real matrix, square or rectangular. It separates a linear map into orthogonal input directions, nonnegative stretch factors, and orthogonal output directions. This makes it central for rank, least squares, conditioning, compression, and data analysis.

The geometric picture is especially useful: a matrix first rotates or reflects the input space, then stretches along coordinate axes by singular values, then rotates or reflects into the output space. Small singular values mark directions that are nearly lost; zero singular values mark directions that are completely collapsed.

Definitions

For an m\times n matrix A, a singular value decomposition is

A = U\Sigma V^T,

where U is an m\times m orthogonal matrix, V is an n\times n orthogonal matrix, and \Sigma is an m\times n diagonal-shaped matrix with nonnegative entries

\sigma_1 \geq \sigma_2 \geq \cdots \geq 0.

The numbers \sigma_i are the singular values of A. They are the square roots of the eigenvalues of A^TA:

A^TA\mathbf{v}_i = \sigma_i^2\mathbf{v}_i.

If A has rank r, its reduced SVD is

A = U_r\Sigma_r V_r^T,

using only the positive singular values.
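As a quick numerical sketch, NumPy's `full_matrices=False` option returns the thin factors directly (here the example matrix has full column rank, so no further truncation is needed):

```python
import numpy as np

# Thin SVD: U has min(m, n) columns instead of m.
A = np.array([[3.0, 0.0],
              [0.0, 2.0],
              [0.0, 0.0]])

Ur, s, Vrt = np.linalg.svd(A, full_matrices=False)
print(Ur.shape, s.shape, Vrt.shape)           # (3, 2) (2,) (2, 2)
print(np.allclose(A, Ur @ np.diag(s) @ Vrt))  # True
```

For a rank-deficient matrix, one would additionally keep only the r columns corresponding to positive singular values to obtain U_r, \Sigma_r, V_r.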

The singular value expansion is

A = \sigma_1\mathbf{u}_1\mathbf{v}_1^T + \cdots + \sigma_r\mathbf{u}_r\mathbf{v}_r^T.

Each term is a rank-one matrix.
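The expansion can be checked numerically: summing the rank-one terms rebuilds the matrix. A small sketch with a random matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Sum of rank-one terms sigma_i * u_i v_i^T reconstructs A.
expansion = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s)))
print(np.allclose(A, expansion))  # True
```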

Key results

Every real matrix has an SVD. This is stronger than diagonalization, which requires a square matrix and enough eigenvectors. The SVD exists because A^TA is symmetric and positive semidefinite, so it has an orthonormal eigenbasis.

The rank of A equals the number of positive singular values. The null space of A is spanned by the right singular vectors corresponding to zero singular values.
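A small sketch of the rank statement, using a matrix built to have rank one (the tolerance 1e-10 is an illustrative choice):

```python
import numpy as np

# A rank-one matrix: every column is a multiple of the first.
A = np.outer([1.0, 2.0, 3.0], [4.0, 5.0])

s = np.linalg.svd(A, compute_uv=False)
print(s)                          # one positive value, one near zero
print(int(np.sum(s > 1e-10)))     # 1: count of positive singular values
print(np.linalg.matrix_rank(A))   # 1
```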

The matrix norm induced by the Euclidean vector norm is the largest singular value:

\|A\|_2 = \sigma_1.

For an invertible square matrix, the 2-norm condition number is

\kappa_2(A) = \frac{\sigma_1}{\sigma_n}.

Small singular values signal directions where the map nearly collapses, making inverse problems sensitive.
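The ratio \sigma_1/\sigma_n can be computed directly from the singular values and compared with NumPy's built-in condition number; a sketch with a nearly singular matrix:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])  # nearly singular

s = np.linalg.svd(A, compute_uv=False)
kappa = s[0] / s[-1]

print(kappa)                                     # large: ill-conditioned
print(np.isclose(kappa, np.linalg.cond(A, 2)))   # True
```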

The best rank-k approximation theorem says that if

A_k = \sigma_1\mathbf{u}_1\mathbf{v}_1^T + \cdots + \sigma_k\mathbf{u}_k\mathbf{v}_k^T,

then A_k is the closest rank-k matrix to A in both the spectral norm and the Frobenius norm. This is the mathematical basis of low-rank compression.

The Moore-Penrose pseudoinverse can be written from the SVD:

A^+ = V\Sigma^+U^T,

where \Sigma^+ reciprocates the positive singular values and transposes the diagonal-shaped matrix. It gives least-squares solutions, including minimum-norm solutions for rank-deficient systems.
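The pseudoinverse formula can be assembled by hand from the SVD factors and compared with `np.linalg.pinv`; a sketch (the cutoff 1e-12 for "positive" singular values is an assumption):

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [0.0, 2.0],
              [0.0, 0.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Reciprocate positive singular values; leave (near-)zeros at zero.
s_inv = np.array([1.0 / x if x > 1e-12 else 0.0 for x in s])
A_pinv = Vt.T @ np.diag(s_inv) @ U.T

print(np.allclose(A_pinv, np.linalg.pinv(A)))  # True

# Least-squares solution of A x = b via the pseudoinverse.
b = np.array([3.0, 4.0, 5.0])
print(A_pinv @ b)  # [1. 2.]
```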

The SVD is closely related to the four fundamental subspaces. The right singular vectors associated with positive singular values form an orthonormal basis for the row space of A. The right singular vectors associated with zero singular values form an orthonormal basis for the null space. The left singular vectors associated with positive singular values form an orthonormal basis for the column space. The remaining left singular vectors span the left null space.

This structure makes the SVD a complete coordinate description of what A does. If

A\mathbf{v}_i = \sigma_i\mathbf{u}_i,

then each right singular direction \mathbf{v}_i is sent to the corresponding left singular direction \mathbf{u}_i and scaled by \sigma_i. If \sigma_i = 0, that input direction is collapsed to zero. No other standard factorization gives such a direct geometric account for every rectangular matrix.
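The direction-by-direction relations can be verified numerically; a sketch using a matrix constructed to have rank 2 inside R^{4\times 3}, so the third right singular vector lies in the null space:

```python
import numpy as np

rng = np.random.default_rng(1)
# Product of 4x2 and 2x3 factors: rank 2, so the third singular value is ~0.
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 3))

U, s, Vt = np.linalg.svd(A)

# A v_i = sigma_i u_i for every singular pair.
for i in range(3):
    print(np.allclose(A @ Vt[i], s[i] * U[:, i]))  # True each time

# The right singular vector for the zero singular value spans the null space.
print(np.allclose(A @ Vt[2], 0))  # True
```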

The SVD also explains why solving linear systems can be sensitive. Suppose A is invertible with small singular value \sigma_n. Inverting A divides by singular values, so components of the data in directions corresponding to small singular values are amplified by 1/\sigma_n. If measurement noise has any component in those directions, the computed solution can change dramatically. Regularization methods often work by damping or discarding the effect of tiny singular values.

For data matrices, singular values measure how much variation is captured by each rank-one term. Keeping the largest few singular values gives a low-dimensional summary. In image compression, for example, a grayscale image can be treated as a matrix. A rank-k SVD approximation stores only k singular values and k pairs of singular vectors, often preserving broad visual structure while discarding fine detail.
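The storage saving is easy to quantify: a rank-k approximation needs k(m + n + 1) numbers instead of mn. A sketch using a synthetic "image" (a random low-rank matrix plus small noise, standing in for real pixel data):

```python
import numpy as np

rng = np.random.default_rng(2)
# Stand-in for a grayscale image: low-rank structure plus small noise.
image = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 40))
image += 0.01 * rng.standard_normal(image.shape)

U, s, Vt = np.linalg.svd(image, full_matrices=False)

k = 3
image_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

m, n = image.shape
print(k * (m + n + 1), "numbers instead of", m * n)

# Spectral-norm error of the truncation is the next singular value.
print(np.isclose(np.linalg.norm(image - image_k, 2), s[k]))  # True
```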

Although the SVD is powerful, it is not always the cheapest tool. QR is usually preferred for ordinary full-rank least squares, and LU is usually preferred for square solves. The SVD is the tool to reach for when rank, conditioning, compression, or near-dependence is central.

Visual

ASCII block form:

  A    =    U     Sigma    V^T
m x n     m x m   m x n   n x n

Sigma is diagonal-shaped, for example:

[s1  0  0]
[ 0 s2  0]
[ 0  0  0]

Factorization                Applies to               Core diagonal values   Orthogonal factors?
Eigendecomposition           some square matrices     eigenvalues            not always
Orthogonal diagonalization   real symmetric matrices  eigenvalues            yes
SVD                          every real matrix        singular values        yes

Worked example 1: Compute an SVD by hand for a diagonal matrix

Problem: find an SVD of

A = \begin{bmatrix} 3&0\\ 0&2\\ 0&0 \end{bmatrix}.

Step 1: compute A^TA.

A^TA = \begin{bmatrix} 3&0&0\\ 0&2&0 \end{bmatrix} \begin{bmatrix} 3&0\\ 0&2\\ 0&0 \end{bmatrix} = \begin{bmatrix} 9&0\\ 0&4 \end{bmatrix}.

Step 2: the eigenvalues of A^TA are 9 and 4, so the singular values are

\sigma_1 = 3, \qquad \sigma_2 = 2.

Step 3: the right singular vectors can be the standard basis:

\mathbf{v}_1 = \begin{bmatrix}1\\0\end{bmatrix}, \qquad \mathbf{v}_2 = \begin{bmatrix}0\\1\end{bmatrix}.

Step 4: compute the left singular vectors by \mathbf{u}_i = A\mathbf{v}_i/\sigma_i.

\mathbf{u}_1 = \frac{1}{3}\begin{bmatrix}3\\0\\0\end{bmatrix} = \begin{bmatrix}1\\0\\0\end{bmatrix}, \qquad \mathbf{u}_2 = \frac{1}{2}\begin{bmatrix}0\\2\\0\end{bmatrix} = \begin{bmatrix}0\\1\\0\end{bmatrix}.

Complete U with \mathbf{u}_3 = \begin{bmatrix}0&0&1\end{bmatrix}^T. Then U = I_3, V = I_2, and \Sigma = A. Checked answer: A = U\Sigma V^T.

Worked example 2: Low-rank approximation from singular values

Problem: suppose

A = \sigma_1\mathbf{u}_1\mathbf{v}_1^T + \sigma_2\mathbf{u}_2\mathbf{v}_2^T

with \sigma_1 = 10 and \sigma_2 = 1, where all singular vectors are unit and mutually orthogonal in their respective spaces. Find the best rank-one approximation and its error in the spectral norm.

Step 1: keep only the largest singular term:

A_1 = 10\mathbf{u}_1\mathbf{v}_1^T.

Step 2: subtract:

A - A_1 = \mathbf{u}_2\mathbf{v}_2^T.

Step 3: the spectral norm of a rank-one singular term \sigma\mathbf{u}\mathbf{v}^T with unit vectors is \sigma. Therefore

\|A - A_1\|_2 = 1.

Checked answer: the best rank-one approximation is 10\mathbf{u}_1\mathbf{v}_1^T, and the spectral-norm error is the next singular value, 1.

Code

import numpy as np

A = np.array([[3, 0],
              [0, 2],
              [0, 0]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=True)
Sigma = np.zeros_like(A, dtype=float)
Sigma[:len(s), :len(s)] = np.diag(s)

print(s)
print(np.allclose(A, U @ Sigma @ Vt))

A1 = s[0] * np.outer(U[:, 0], Vt[0, :])
print(A1)
print(np.linalg.norm(A - A1, 2))

The final norm equals the second singular value for this example, illustrating the best rank-one approximation error.

Common pitfalls

  • Confusing singular values with eigenvalues. Singular values are always nonnegative and exist for rectangular matrices.
  • Forgetting that U and V may have different sizes.
  • Assuming A^TA and AA^T have the same dimensions. They share the same positive eigenvalues, but their sizes differ.
  • Dropping small singular values without considering the scale and purpose of the problem.
  • Treating the SVD as unique. Singular vectors can change sign, and repeated singular values allow rotations within singular subspaces.
  • Using full SVD when reduced SVD is enough for a rank or least-squares computation.

A good SVD interpretation starts with the equations A\mathbf{v}_i = \sigma_i\mathbf{u}_i. They say exactly what happens to each orthonormal input direction. If \sigma_i is large, that direction is stretched strongly. If \sigma_i is small, that direction is nearly collapsed. If \sigma_i = 0, that direction lies in the null space. This direction-by-direction description is often clearer than the full matrix product.

When using the SVD for rank decisions, numerical tolerance matters. In exact algebra, a singular value is either zero or positive. In floating-point computation, a value such as 10^{-14} may be effectively zero relative to the scale of the matrix. Numerical rank is therefore a judgment based on tolerance, units, noise, and the purpose of the computation.
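A sketch of how the tolerance changes the answer, using a rank-one matrix perturbed at an assumed "noise" scale of about 10^{-10}:

```python
import numpy as np

# Exactly rank one, then perturbed by a small "noise" term.
A = np.outer([1.0, 2.0], [3.0, 4.0])
A[1, 1] += 1e-10

s = np.linalg.svd(A, compute_uv=False)
print(s)  # one O(1) value, one ~1e-11

# The default tolerance is near machine precision, so the tiny value counts...
print(np.linalg.matrix_rank(A))             # 2
# ...but a tolerance reflecting the noise level treats it as zero.
print(np.linalg.matrix_rank(A, tol=1e-8))   # 1
```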

Low-rank approximation should be interpreted through the discarded singular values. Keeping k terms preserves the strongest k modes of the matrix. The spectral-norm error is the next singular value, while the Frobenius-norm error is the square root of the sum of squares of all discarded singular values. This gives a quantitative way to decide how many terms are enough.
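Both error formulas can be verified numerically; a sketch with a random matrix and k = 2:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 5))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Spectral-norm error: the next singular value.
print(np.isclose(np.linalg.norm(A - A_k, 2), s[k]))          # True
# Frobenius-norm error: root sum of squares of discarded values.
print(np.isclose(np.linalg.norm(A - A_k, 'fro'),
                 np.sqrt(np.sum(s[k:] ** 2))))               # True
```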

For least squares, the SVD is especially valuable when columns are nearly dependent. Tiny singular values indicate directions in parameter space that change the prediction very little. Dividing by those values in an inverse problem amplifies noise. Truncated SVD and other regularization methods reduce this amplification by limiting the influence of unstable directions.
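A sketch of this amplification and of truncated-SVD regularization, using a design matrix with two nearly dependent columns (the dependence scale 1e-6, noise level 0.01, and truncation threshold are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
# Two nearly dependent columns: the second almost equals the first.
x = rng.standard_normal((50, 1))
A = np.hstack([x, x + 1e-6 * rng.standard_normal((50, 1))])
b = A @ np.array([1.0, 1.0]) + 0.01 * rng.standard_normal(50)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(s)  # one tiny singular value

# Full pseudoinverse solution divides by the tiny value and blows up.
x_full = Vt.T @ np.diag(1.0 / s) @ (U.T @ b)

# Truncated SVD: discard terms with singular values below a threshold.
keep = s > 1e-3 * s[0]
x_trunc = Vt[keep].T @ np.diag(1.0 / s[keep]) @ (U[:, keep].T @ b)

print(np.linalg.norm(x_full), np.linalg.norm(x_trunc))  # huge vs moderate
```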

The relationship between A^TA and the SVD should be read carefully. If A = U\Sigma V^T, then

A^TA = V\Sigma^T\Sigma V^T.

Thus the right singular vectors are eigenvectors of A^TA, and the eigenvalues are the squared singular values. Similarly,

AA^T = U\Sigma\Sigma^TU^T,

so the left singular vectors are eigenvectors of AA^T. The positive eigenvalues match, but the matrices have different sizes when A is rectangular.
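A sketch confirming that the eigenvalues of both Gram matrices are the squared singular values, with the larger matrix carrying one extra zero eigenvalue:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 3))
s = np.linalg.svd(A, compute_uv=False)

# Eigenvalues of A^T A (3x3) and A A^T (4x4), sorted descending.
eig_n = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
eig_m = np.sort(np.linalg.eigvalsh(A @ A.T))[::-1]

print(np.allclose(eig_n, s ** 2))       # True: squared singular values
print(np.allclose(eig_m[:3], s ** 2))   # True: same positive eigenvalues
print(np.isclose(eig_m[3], 0))          # True: the extra eigenvalue is zero
```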

Sign ambiguity is normal. If \mathbf{u}_i and \mathbf{v}_i are both multiplied by -1, the rank-one product \mathbf{u}_i\mathbf{v}_i^T is unchanged. For repeated singular values, even more freedom exists: singular vectors can rotate within the repeated singular subspace. Therefore software output may differ from hand output while representing the same SVD.

Connections