Conjugate Gradient and Iterative Refinement

Conjugate gradient (CG) is an iterative method designed for symmetric positive definite (SPD) linear systems. It improves on stationary iterations by choosing search directions that are conjugate with respect to the matrix A, which prevents the method from repeatedly correcting the same error component. For large sparse SPD systems, CG is often the first serious method to try.

Iterative refinement is a different idea: compute a solution, form the residual, solve for a correction, and update the solution. It can improve the accuracy of a direct solve when the residual is computed accurately enough and the original problem is not too ill-conditioned.

Definitions

For an SPD matrix A, solving Ax = b is equivalent to minimizing the quadratic function

\phi(x) = \tfrac{1}{2} x^T A x - b^T x.
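To see the equivalence concretely: the gradient of \phi is Ax - b, so the minimizer of \phi is exactly the solution of Ax = b. A minimal numerical check, using the same 2 by 2 SPD matrix as the worked examples below:

```python
import numpy as np

A = np.array([[4.0, 1.0], [1.0, 3.0]])   # SPD matrix from the worked examples
b = np.array([1.0, 2.0])

def phi(x):
    # phi(x) = 0.5 x^T A x - b^T x
    return 0.5 * x @ A @ x - b @ x

x_star = np.linalg.solve(A, b)   # exact solution of Ax = b
grad = A @ x_star - b            # gradient of phi at x_star
print(np.allclose(grad, 0.0))    # True: the minimizer satisfies Ax = b

# SPD structure makes x_star a strict minimum: phi grows in every direction
for d in (np.array([0.1, 0.0]), np.array([0.0, 0.1]), np.array([-0.1, 0.1])):
    assert phi(x_star + d) > phi(x_star)
```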

The residual is

r^{(k)} = b - A x^{(k)}.

CG starts from x^{(0)}, sets r^{(0)} = b - A x^{(0)}, and chooses the first search direction p^{(0)} = r^{(0)}. The basic recurrences are

\alpha_k = \frac{(r^{(k)})^T r^{(k)}}{(p^{(k)})^T A p^{(k)}}, \qquad x^{(k+1)} = x^{(k)} + \alpha_k p^{(k)}, \qquad r^{(k+1)} = r^{(k)} - \alpha_k A p^{(k)},

\beta_k = \frac{(r^{(k+1)})^T r^{(k+1)}}{(r^{(k)})^T r^{(k)}}, \qquad p^{(k+1)} = r^{(k+1)} + \beta_k p^{(k)}.

Iterative refinement computes r = b - A\hat{x}, solves Ad = r, and replaces \hat{x} by \hat{x} + d.

Key results

In exact arithmetic, CG terminates in at most n steps for an n \times n SPD matrix. In floating-point arithmetic it is used as an iterative method, and convergence depends strongly on the eigenvalue distribution of A. A standard bound is

\|x^{(k)} - x\|_A \le 2 \left( \frac{\sqrt{\kappa(A)} - 1}{\sqrt{\kappa(A)} + 1} \right)^k \|x^{(0)} - x\|_A.

The A-norm is defined by \|v\|_A = \sqrt{v^T A v}. The bound shows why preconditioning matters: reducing the effective condition number can dramatically reduce the iteration count.
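The bound can be evaluated directly to build intuition; a small sketch (the function name cg_bound and the sample condition numbers are illustrative choices):

```python
import numpy as np

def cg_bound(kappa, k):
    # Bound factor 2 * ((sqrt(kappa) - 1) / (sqrt(kappa) + 1))^k on the
    # relative A-norm error after k CG steps
    r = (np.sqrt(kappa) - 1.0) / (np.sqrt(kappa) + 1.0)
    return 2.0 * r**k

# A well-conditioned system converges far faster than an ill-conditioned one
for kappa in (10.0, 1e4):
    print(kappa, [f"{cg_bound(kappa, k):.2e}" for k in (10, 50, 200)])
```

For kappa = 10^4 the ratio (sqrt(kappa) - 1)/(sqrt(kappa) + 1) is about 0.98, so hundreds of iterations may be needed; a preconditioner that lowers the effective kappa shrinks this factor directly.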

CG search directions satisfy p_i^T A p_j = 0 for i \ne j in exact arithmetic, and residuals are mutually orthogonal. These properties explain the method's efficiency. They also explain why roundoff can eventually slow progress; finite precision gradually erodes exact orthogonality.
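These orthogonality properties can be verified numerically on the 2 by 2 system used in worked example 1 below; a minimal sketch:

```python
import numpy as np

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])

# One CG step from x0 = 0, producing the first two directions and residuals
r0 = b.copy()
p0 = r0.copy()
alpha0 = (r0 @ r0) / (p0 @ A @ p0)
r1 = r0 - alpha0 * (A @ p0)
beta0 = (r1 @ r1) / (r0 @ r0)
p1 = r1 + beta0 * p0

print(p0 @ A @ p1)   # ~0: A-conjugacy of successive directions
print(r0 @ r1)       # ~0: orthogonality of successive residuals
```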

Iterative refinement improves a computed solution when the correction equation is solved accurately enough relative to the conditioning. It is especially useful when factorization is done once and corrections are cheap triangular solves.
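A sketch of that factor-once pattern, assuming SciPy is available for the LU factorization (scipy.linalg.lu_factor and lu_solve); after the one-time factorization, each correction costs only triangular solves:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve  # assumes SciPy is installed

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])

lu_piv = lu_factor(A)            # factor once
x = lu_solve(lu_piv, b)          # initial direct solve
for _ in range(2):
    r = b - A @ x                # residual (ideally computed in higher precision)
    d = lu_solve(lu_piv, r)      # cheap correction solve, reusing the factorization
    x = x + d
print(x)
```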

A reliable way to use these results is to keep the analysis tied to the actual numerical question rather than to the formula alone. For conjugate gradient and iterative refinement, the input record should include SPD structure, residual norm, preconditioner choice, and correction precision. Without that record, two computations that look similar on paper may have different numerical meanings. The same formula can be a safe production tool in one scaling and a fragile experiment in another. This is why the examples on this page show the intermediate arithmetic: the goal is not only to reach a number, but to expose what assumptions made that number meaningful.

The next record is the verification record. Useful diagnostics for this topic include residual norm, A-norm error estimates, and improvement after refinement. A diagnostic should be chosen before the computation is trusted, not after a pleasing answer appears. When an exact answer is unavailable, compare two independent approximations, refine the mesh or tolerance, check a residual, or test the method on a neighboring problem with known behavior. If several diagnostics disagree, treat the disagreement as information about conditioning, stability, or implementation rather than as a nuisance to be averaged away.

The cost record matters as well. In this topic the dominant costs are usually sparse matrix-vector products, preconditioner solves, and residual recomputation. Numerical analysis is full of methods that are mathematically attractive but computationally mismatched to the problem size. A dense factorization may be acceptable for a classroom matrix and impossible for a PDE grid. A high-order rule may use fewer steps but more expensive stages. A guaranteed method may take many iterations but provide a bound that a faster method cannot. The right comparison is therefore cost to reach a verified tolerance, not order or elegance in isolation.

Finally, every method here has a recognizable failure mode: loss of conjugacy, poor conditioning, and refinement residuals computed too inaccurately. These failures are not edge cases to memorize; they are signals that the hypotheses behind the result have been violated or that a different numerical model is needed. A good implementation makes such failures visible through exceptions, warnings, residual reports, or conservative stopping rules. A good hand solution does the same thing in prose by naming the assumption being used and checking it at the point where it matters.

For study purposes, the most useful habit is to separate four layers: the continuous mathematical problem, the discrete approximation, the algebraic or iterative algorithm used to compute it, and the diagnostic used to judge the result. Many mistakes come from mixing these layers. A small algebraic residual may not mean a small modeling error. A small step-to-step change may not mean the discrete equations are solved. A high-order truncation formula may not help when the data are noisy or the arithmetic is unstable. Keeping the layers separate makes the results on this page portable to larger examples.

Visual

| Method | Matrix class | Main cost | Strength | Limitation |
| --- | --- | --- | --- | --- |
| CG | SPD | sparse matrix-vector products | fast for large sparse SPD | needs preconditioning for hard spectra |
| Jacobi/GS | special convergence cases | sparse sweeps | simple smoothers | slow alone |
| Iterative refinement | already factored systems | residual plus correction solve | improves direct solution | limited by conditioning and residual precision |
| Preconditioned CG | SPD with preconditioner | solve with M plus matvec | reduces effective condition number | preconditioner design is problem-specific |
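The preconditioned CG row can be made concrete with a Jacobi preconditioner, M = diag(A); the function name pcg_jacobi is an illustrative choice, not a library routine:

```python
import numpy as np

def pcg_jacobi(A, b, tol=1e-10, max_iter=200):
    """CG preconditioned with M = diag(A) (Jacobi), a common first preconditioner."""
    x = np.zeros_like(b)
    r = b - A @ x
    Minv = 1.0 / np.diag(A)          # diagonal of an SPD matrix is positive
    z = Minv * r                     # preconditioned residual z = M^{-1} r
    p = z.copy()
    rz = r @ z
    for k in range(1, max_iter + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            return x, k
        z = Minv * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x, k = pcg_jacobi(A, b)
print(x, k)
```

Replacing Minv with the solve for a stronger preconditioner (incomplete Cholesky, multigrid) changes only the z updates; the rest of the iteration is unchanged.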

Worked example 1: two CG steps solve a 2 by 2 system

Problem. Use CG from x^{(0)} = (0, 0)^T for

A = \begin{bmatrix} 4 & 1 \\ 1 & 3 \end{bmatrix}, \qquad b = \begin{bmatrix} 1 \\ 2 \end{bmatrix}.

Method. Start with r^{(0)} = b and p^{(0)} = b.

  1. Compute
A p^{(0)} = \begin{bmatrix} 6 \\ 7 \end{bmatrix}, \qquad \alpha_0 = \frac{b^T b}{b^T A b} = \frac{5}{20} = 0.25.
  2. Update
x^{(1)} = \begin{bmatrix} 0.25 \\ 0.5 \end{bmatrix}, \qquad r^{(1)} = b - 0.25 \begin{bmatrix} 6 \\ 7 \end{bmatrix} = \begin{bmatrix} -0.5 \\ 0.25 \end{bmatrix}.
  3. Compute
\beta_0 = \frac{(r^{(1)})^T r^{(1)}}{(r^{(0)})^T r^{(0)}} = \frac{0.3125}{5} = 0.0625.
  4. The next direction is
p^{(1)} = r^{(1)} + 0.0625\, p^{(0)} = \begin{bmatrix} -0.4375 \\ 0.375 \end{bmatrix}.
  5. The second step gives x^{(2)} = (0.090909, 0.636364)^T.

Checked answer. Direct solution gives (1/11, 7/11)^T \approx (0.090909, 0.636364)^T, matching the CG result after two steps in exact arithmetic.

Worked example 2: one iterative refinement correction

Problem. For the same matrix and right-hand side, suppose

\hat{x} = \begin{bmatrix} 0.09 \\ 0.64 \end{bmatrix}.

Compute one refinement correction.

Method. Form r = b - A\hat{x}.

  1. Multiply:
A\hat{x} = \begin{bmatrix} 4(0.09) + 0.64 \\ 0.09 + 3(0.64) \end{bmatrix} = \begin{bmatrix} 1.00 \\ 2.01 \end{bmatrix}.
  2. Residual:
r = \begin{bmatrix} 1 \\ 2 \end{bmatrix} - \begin{bmatrix} 1.00 \\ 2.01 \end{bmatrix} = \begin{bmatrix} 0 \\ -0.01 \end{bmatrix}.
  3. Solve Ad = r. Since A^{-1} = \frac{1}{11} \begin{bmatrix} 3 & -1 \\ -1 & 4 \end{bmatrix},
d = \frac{1}{11} \begin{bmatrix} 0.01 \\ -0.04 \end{bmatrix} = \begin{bmatrix} 0.000909 \\ -0.003636 \end{bmatrix}.
  4. Correct:
\hat{x} + d = \begin{bmatrix} 0.090909 \\ 0.636364 \end{bmatrix}.

Checked answer. One correction recovers the exact solution to the displayed precision.

Code

import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10, max_iter=None):
    """Plain CG for an SPD system Ax = b; returns (x, iterations used)."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    n = len(b)
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float)
    max_iter = n if max_iter is None else max_iter
    r = b - A @ x
    p = r.copy()
    rs_old = float(r @ r)
    for k in range(1, max_iter + 1):
        Ap = A @ p
        alpha = rs_old / float(p @ Ap)   # step length along the search direction
        x = x + alpha * p
        r = r - alpha * Ap               # recurrence avoids recomputing b - A @ x
        rs_new = float(r @ r)
        if rs_new**0.5 < tol:            # stop on the residual norm, not iterate change
            return x, k
        p = r + (rs_new / rs_old) * p    # new direction, A-conjugate to the previous ones
        rs_old = rs_new
    return x, max_iter

def iterative_refinement(A, b, x, steps=1):
    """Refine an approximate solution x of Ax = b by residual correction."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    x = np.asarray(x, dtype=float).copy()
    for _ in range(steps):
        r = b - A @ x              # residual of the current approximation
        d = np.linalg.solve(A, r)  # correction solve (reuse a factorization in practice)
        x = x + d
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(conjugate_gradient(A, b, max_iter=2))
print(iterative_refinement(A, b, np.array([0.09, 0.64])))

Common pitfalls

  • Using CG on a nonsymmetric or indefinite matrix. The standard method requires SPD structure.
  • Monitoring only iterate changes instead of residual norms.
  • Expecting the exact n-step termination property in floating-point arithmetic.
  • Ignoring preconditioning for ill-conditioned SPD systems.
  • Performing refinement with a residual computed at the same low precision that caused the original error.
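The first pitfall can be guarded against cheaply: check symmetry and attempt a Cholesky factorization before running CG. A sketch (check_spd is an illustrative helper, not a library function):

```python
import numpy as np

def check_spd(A, tol=1e-12):
    """Cheap sanity check before calling CG: symmetric, and Cholesky succeeds."""
    A = np.asarray(A, dtype=float)
    if not np.allclose(A, A.T, atol=tol):
        return False                      # not symmetric
    try:
        np.linalg.cholesky(A)             # raises for indefinite/semidefinite A
        return True
    except np.linalg.LinAlgError:
        return False

print(check_spd([[4.0, 1.0], [1.0, 3.0]]))   # True: the worked-example matrix is SPD
print(check_spd([[1.0, 2.0], [3.0, 4.0]]))   # False: nonsymmetric
```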

Connections