Conjugate Gradient and Iterative Refinement
Conjugate gradient is an iterative method designed for symmetric positive definite linear systems. It improves on stationary iterations by choosing search directions that are conjugate with respect to the matrix $A$, which prevents the method from repeatedly correcting the same error component. For large sparse SPD systems, CG is often the first serious method to try.
Iterative refinement is a different idea: compute a solution, form the residual, solve for a correction, and update the solution. It can improve the accuracy of a direct solve when the residual is computed accurately enough and the original problem is not too ill-conditioned.
Definitions
For an SPD matrix $A \in \mathbb{R}^{n \times n}$, solving $Ax = b$ is equivalent to minimizing the quadratic function
$$f(x) = \tfrac{1}{2} x^\top A x - b^\top x.$$
The residual is $r = b - Ax$, which equals the negative gradient $-\nabla f(x)$ of this quadratic.
CG starts from an initial guess $x_0$, sets $r_0 = b - A x_0$, and chooses the first search direction $p_0 = r_0$. The basic recurrences are
$$\alpha_k = \frac{r_k^\top r_k}{p_k^\top A p_k}, \qquad x_{k+1} = x_k + \alpha_k p_k, \qquad r_{k+1} = r_k - \alpha_k A p_k,$$
$$\beta_k = \frac{r_{k+1}^\top r_{k+1}}{r_k^\top r_k}, \qquad p_{k+1} = r_{k+1} + \beta_k p_k.$$
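As a minimal sketch, here is one recurrence step in NumPy for the same 2 by 2 system used in the worked examples below; the variable names mirror the formulas, and this is illustration rather than a full solver.

```python
import numpy as np

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])

x = np.zeros(2)                  # x0 = 0
r = b - A @ x                    # r0 = b = (1, 2)
p = r.copy()                     # p0 = r0
alpha = (r @ r) / (p @ (A @ p))  # alpha0 = 5/20 = 0.25
x = x + alpha * p                # x1 = (0.25, 0.5)
r = r - alpha * (A @ p)          # r1 = (-0.5, 0.25)
print(x, r)
```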
Iterative refinement computes $r = b - A\hat{x}$, solves $Ad = r$, and replaces $\hat{x}$ by $\hat{x} + d$.
Key results
In exact arithmetic, CG terminates in at most $n$ steps for an $n \times n$ SPD matrix. In floating-point arithmetic it is used as an iterative method, and convergence depends strongly on the eigenvalue distribution of $A$. A standard bound is
$$\|x_k - x_\ast\|_A \le 2 \left( \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} \right)^{k} \|x_0 - x_\ast\|_A, \qquad \kappa = \frac{\lambda_{\max}(A)}{\lambda_{\min}(A)}.$$
The $A$-norm is defined by $\|v\|_A = \sqrt{v^\top A v}$. The bound shows why preconditioning matters: reducing the effective condition number can dramatically reduce the iteration count.
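The bound is easy to evaluate numerically. A small check for the 2 by 2 matrix used in the examples below; `rho` is our name for the parenthesized per-iteration factor.

```python
import numpy as np

A = np.array([[4.0, 1.0], [1.0, 3.0]])
eigs = np.linalg.eigvalsh(A)       # eigenvalues of the symmetric matrix A
kappa = eigs.max() / eigs.min()    # condition number for SPD A
rho = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)  # per-step factor in the bound
print(kappa, rho)                  # kappa ~ 1.94, rho ~ 0.16: very fast convergence
```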
CG search directions satisfy $p_i^\top A p_j = 0$ for $i \ne j$ in exact arithmetic, and the residuals are mutually orthogonal, $r_i^\top r_j = 0$ for $i \ne j$. These properties explain the method's efficiency. They also explain why roundoff can eventually slow progress; finite precision gradually erodes exact orthogonality.
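Both identities can be checked numerically with the quantities from worked example 1 below; the inner products should vanish to roundoff.

```python
import numpy as np

A = np.array([[4.0, 1.0], [1.0, 3.0]])
# Quantities from the two-step run in worked example 1.
r0 = np.array([1.0, 2.0])
p0 = r0.copy()
r1 = np.array([-0.5, 0.25])
p1 = r1 + 0.0625 * p0      # p1 = (-0.4375, 0.375)
print(p0 @ (A @ p1))       # A-conjugacy: p0^T A p1 = 0
print(r0 @ r1)             # orthogonality: r0^T r1 = 0
```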
Iterative refinement improves a computed solution when the correction equation is solved accurately enough relative to the conditioning. It is especially useful when a factorization is computed once, so each correction costs only a residual evaluation and two cheap triangular solves.
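A minimal sketch of that pattern, assuming SciPy is available: `refine_with_lu` is our name, while `scipy.linalg.lu_factor` and `lu_solve` are standard SciPy routines.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def refine_with_lu(A, b, x, steps=2):
    lu, piv = lu_factor(A)              # factor once: O(n^3)
    for _ in range(steps):
        r = b - A @ x                   # residual; extra precision here helps
        x = x + lu_solve((lu, piv), r)  # correction: two triangular solves, O(n^2)
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(refine_with_lu(A, b, np.array([0.09, 0.64])))
```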
A reliable way to use these results is to keep the analysis tied to the actual numerical question rather than to the formula alone. For conjugate gradient and iterative refinement, the input record should include the SPD structure of $A$, the residual norm, the preconditioner choice, and the precision at which corrections are computed. Without that record, two computations that look similar on paper may have different numerical meanings: the same formula can be a safe production tool at one scale and a fragile experiment at another. This is why the examples on this page show the intermediate arithmetic; the goal is not only to reach a number, but to expose what assumptions made that number meaningful.
The next record is the verification record. Useful diagnostics for this topic include residual norm, A-norm error estimates, and improvement after refinement. A diagnostic should be chosen before the computation is trusted, not after a pleasing answer appears. When an exact answer is unavailable, compare two independent approximations, refine the mesh or tolerance, check a residual, or test the method on a neighboring problem with known behavior. If several diagnostics disagree, treat the disagreement as information about conditioning, stability, or implementation rather than as a nuisance to be averaged away.
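As one concrete instance, the sketch below records the relative residual before and after a single refinement correction for the example treated later on this page; the helper name `rel_residual` is ours.

```python
import numpy as np

def rel_residual(A, b, x):
    """Relative residual norm ||b - A x|| / ||b||."""
    return np.linalg.norm(b - A @ x) / np.linalg.norm(b)

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = np.array([0.09, 0.64])             # approximate solution from worked example 2
print(rel_residual(A, b, x))           # before refinement: about 4.5e-3
x = x + np.linalg.solve(A, b - A @ x)  # one refinement correction
print(rel_residual(A, b, x))           # after: drops to roundoff level
```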
The cost record matters as well. In this topic the dominant costs are usually sparse matrix-vector products, preconditioner solves, and residual recomputation. Numerical analysis is full of methods that are mathematically attractive but computationally mismatched to the problem size. A dense factorization may be acceptable for a classroom matrix and impossible for a PDE grid. A high-order rule may use fewer steps but more expensive stages. A guaranteed method may take many iterations but provide a bound that a faster method cannot. The right comparison is therefore cost to reach a verified tolerance, not order or elegance in isolation.
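One way to make the comparison concrete is to count matrix-vector products needed to reach a verified tolerance. A sketch on a 1D Laplacian test matrix; the test problem and the `10 * n` iteration cap are our choices.

```python
import numpy as np

n = 200
# Tridiagonal 1D Laplacian: a standard SPD test matrix (dense here for brevity).
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)

for tol in (1e-6, 1e-10):
    x = np.zeros(n)
    r = b - A @ x
    p = r.copy()
    rs = float(r @ r)
    matvecs = 0
    while np.sqrt(rs) > tol * np.linalg.norm(b) and matvecs < 10 * n:
        Ap = A @ p
        matvecs += 1
        alpha = rs / float(p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = float(r @ r)
        p = r + (rs_new / rs) * p
        rs = rs_new
    print(tol, matvecs)  # cost to a verified tolerance, not elegance in isolation
```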
Finally, every method here has a recognizable failure mode: loss of conjugacy, poor conditioning, and refinement residuals computed too inaccurately. These failures are not edge cases to memorize; they are signals that the hypotheses behind the result have been violated or that a different numerical model is needed. A good implementation makes such failures visible through exceptions, warnings, residual reports, or conservative stopping rules. A good hand solution does the same thing in prose by naming the assumption being used and checking it at the point where it matters.
For study purposes, the most useful habit is to separate four layers: the continuous mathematical problem, the discrete approximation, the algebraic or iterative algorithm used to compute it, and the diagnostic used to judge the result. Many mistakes come from mixing these layers. A small algebraic residual may not mean a small modeling error. A small step-to-step change may not mean the discrete equations are solved. A high-order truncation formula may not help when the data are noisy or the arithmetic is unstable. Keeping the layers separate makes the results on this page portable to larger examples.
Visual
| Method | Matrix class | Main cost | Strength | Limitation |
|---|---|---|---|---|
| CG | SPD | sparse matrix-vector products | fast for large sparse SPD | needs preconditioning for hard spectra |
| Jacobi/GS | special convergence cases | sparse sweeps | simple smoothers | slow alone |
| Iterative refinement | already factored systems | residual plus correction solves | improves direct solution | limited by conditioning and residual precision |
| Preconditioned CG | SPD with preconditioner | solve with $M$ plus matvec | reduces effective condition number | preconditioner design is problem-specific |
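To make the last row of the table concrete, here is a minimal sketch of preconditioned CG with a Jacobi (diagonal) preconditioner; the function name `pcg` and the preconditioner choice are ours.

```python
import numpy as np

def pcg(A, b, tol=1e-10, max_iter=None):
    """CG preconditioned with M = diag(A), so M^{-1} is applied elementwise."""
    n = len(b)
    max_iter = n if max_iter is None else max_iter
    Minv = 1.0 / np.diag(A)
    x = np.zeros(n)
    r = b - A @ x
    z = Minv * r                  # preconditioned residual z = M^{-1} r
    p = z.copy()
    rz_old = float(r @ z)
    for k in range(1, max_iter + 1):
        Ap = A @ p
        alpha = rz_old / float(p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            return x, k
        z = Minv * r
        rz_new = float(r @ z)
        p = z + (rz_new / rz_old) * p
        rz_old = rz_new
    return x, max_iter

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(pcg(A, b))
```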
Worked example 1: two CG steps solve a 2 by 2 system
Problem. Use CG from $x_0 = (0, 0)^\top$ for
$$A = \begin{pmatrix} 4 & 1 \\ 1 & 3 \end{pmatrix}, \qquad b = \begin{pmatrix} 1 \\ 2 \end{pmatrix}.$$
Method. Start with $r_0 = b - Ax_0 = (1, 2)^\top$ and $p_0 = r_0$.
- Compute $Ap_0 = (6, 7)^\top$ and $\alpha_0 = \dfrac{r_0^\top r_0}{p_0^\top A p_0} = \dfrac{5}{20} = 0.25$.
- Update $x_1 = x_0 + \alpha_0 p_0 = (0.25, 0.5)^\top$.
- Compute $r_1 = r_0 - \alpha_0 A p_0 = (-0.5, 0.25)^\top$ and $\beta_0 = \dfrac{r_1^\top r_1}{r_0^\top r_0} = \dfrac{0.3125}{5} = 0.0625$.
- The next direction is $p_1 = r_1 + \beta_0 p_0 = (-0.4375, 0.375)^\top$.
- The second step gives $\alpha_1 = \dfrac{r_1^\top r_1}{p_1^\top A p_1} = \dfrac{0.3125}{0.859375} = \dfrac{4}{11}$, so $x_2 = x_1 + \alpha_1 p_1 = (1/11, 7/11)^\top$.
Checked answer. Direct solution gives $x = (1/11, 7/11)^\top \approx (0.0909, 0.6364)^\top$, matching the CG result after two steps in exact arithmetic.
Worked example 2: one iterative refinement correction
Problem. For the same matrix and right-hand side, suppose the computed solution is
$$\hat{x} = (0.09, 0.64)^\top.$$
Compute one refinement correction.
Method. Form $r = b - A\hat{x}$.
- Multiply: $A\hat{x} = (4(0.09) + 0.64,\ 0.09 + 3(0.64))^\top = (1.00, 2.01)^\top$.
- Residual: $r = (1 - 1.00,\ 2 - 2.01)^\top = (0, -0.01)^\top$.
- Solve $Ad = r$. Since $A^{-1} = \frac{1}{11}\begin{pmatrix} 3 & -1 \\ -1 & 4 \end{pmatrix}$, we get $d = \frac{1}{11}(0.01, -0.04)^\top \approx (0.00091, -0.00364)^\top$.
- Correct: $x = \hat{x} + d = (0.09 + 0.01/11,\ 0.64 - 0.04/11)^\top = (1/11, 7/11)^\top$.
Checked answer. One correction recovers the exact solution $x = (1/11, 7/11)^\top \approx (0.090909, 0.636364)^\top$ to the displayed precision.
Code
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10, max_iter=None):
    """Plain CG for an SPD matrix A; returns (x, iterations used)."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    n = len(b)
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float)
    max_iter = n if max_iter is None else max_iter
    r = b - A @ x                        # initial residual r0
    p = r.copy()                         # first search direction p0 = r0
    rs_old = float(r @ r)
    for k in range(1, max_iter + 1):
        Ap = A @ p
        alpha = rs_old / float(p @ Ap)   # step length along p
        x = x + alpha * p
        r = r - alpha * Ap               # recurrence keeps r = b - A x
        rs_new = float(r @ r)
        if rs_new**0.5 < tol:            # stop on residual norm, not iterate change
            return x, k
        p = r + (rs_new / rs_old) * p    # new direction, A-conjugate to earlier ones
        rs_old = rs_new
    return x, max_iter

def iterative_refinement(A, b, x, steps=1):
    """Refine an approximate solution x of A x = b.

    Demo version: np.linalg.solve refactors A on every step; in practice
    one factors A once and reuses the factors for each correction."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    x = np.asarray(x, dtype=float).copy()
    for _ in range(steps):
        r = b - A @ x              # residual of the current approximation
        d = np.linalg.solve(A, r)  # correction: solve A d = r
        x = x + d
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(conjugate_gradient(A, b, max_iter=2))                # reproduces worked example 1
print(iterative_refinement(A, b, np.array([0.09, 0.64])))  # reproduces worked example 2
Common pitfalls
- Using CG on a nonsymmetric or indefinite matrix. The standard method requires SPD structure; a simple guard is sketched after this list.
- Monitoring only iterate changes instead of residual norms.
- Expecting the exact $n$-step termination property in floating-point arithmetic.
- Ignoring preconditioning for ill-conditioned SPD systems.
- Performing refinement with a residual computed at the same low precision that caused the original error.
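As a guard against the first pitfall, one can check symmetry and attempt a Cholesky factorization before trusting plain CG; a failed Cholesky signals that the matrix is not positive definite. The helper name `assert_spd` is ours, not a library routine.

```python
import numpy as np

def assert_spd(A, sym_tol=1e-12):
    """Raise if A is visibly nonsymmetric or not positive definite."""
    A = np.asarray(A, dtype=float)
    if not np.allclose(A, A.T, atol=sym_tol * np.abs(A).max()):
        raise ValueError("matrix is not symmetric; plain CG does not apply")
    try:
        np.linalg.cholesky(A)  # succeeds iff A is (numerically) positive definite
    except np.linalg.LinAlgError:
        raise ValueError("matrix is not positive definite; plain CG does not apply")

assert_spd(np.array([[4.0, 1.0], [1.0, 3.0]]))  # passes silently
```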