
Certified Defenses and Randomized Smoothing

Empirical defenses try strong attacks and report whether the model survives. Certified defenses make a different kind of claim: within a specified perturbation set, no adversarial example exists for a particular input. This distinction is essential. A failed PGD attack is evidence; a valid certificate is a proof under assumptions.

The field contains many certification approaches, including interval bound propagation, linear relaxations such as CROWN-style methods, convex relaxations, exact verification for small networks, and randomized smoothing. Randomized smoothing is especially important because it scales to large classifiers and gives clean $\ell_2$ certificates by wrapping a base classifier with Gaussian noise.

Definitions

Let $h(x)$ be a classifier. A pointwise robustness certificate at $(x,y)$ with radius $r$ under norm $p$ proves:

$$\forall x' \text{ such that } \|x'-x\|_p \le r,\quad h(x') = y.$$

A certificate is sound if every certified point is truly robust under the stated assumptions. It may be conservative: some robust points may not be certified.

A certified accuracy curve reports the fraction of test examples that are both correctly classified and certified at radius $r$:

$$\mathrm{CertAcc}(r) = \frac{1}{n}\sum_{i=1}^n \mathbf{1}\left[h(x_i)=y_i \ \text{and certificate radius at } x_i \ge r\right].$$
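This definition translates directly into code. A minimal sketch (function and variable names are illustrative, not from any particular library):

```python
from typing import Sequence

def certified_accuracy(radii: Sequence[float], correct: Sequence[bool], r: float) -> float:
    """Fraction of examples that are correctly classified AND certified at radius >= r.

    radii[i] is the certified radius for example i (0.0 if no certificate),
    and correct[i] records whether the prediction matched the label.
    """
    hits = sum(1 for rad, ok in zip(radii, correct) if ok and rad >= r)
    return hits / len(radii)

# Sweeping r over a grid traces out the certified accuracy curve.
radii = [0.05, 0.30, 0.00, 0.50]
correct = [True, True, False, True]
curve = {r: certified_accuracy(radii, correct, r) for r in (0.0, 0.1, 0.25, 0.5)}
```

Note that a misclassified point contributes zero at every radius, so the curve at $r=0$ equals clean accuracy restricted to certified points.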

Randomized smoothing builds a smoothed classifier $g$ from a base classifier $f$:

$$g(x) = \arg\max_c \Pr_{\eta \sim \mathcal{N}(0,\sigma^2 I)}\big(f(x+\eta)=c\big).$$

If class $A$ has sufficiently higher probability than every other class under Gaussian noise, then $g$ is certifiably robust in $\ell_2$. A common form of the radius is:

$$R = \frac{\sigma}{2} \left( \Phi^{-1}(p_A) - \Phi^{-1}(p_B) \right),$$

where $p_A$ is a lower confidence bound on the top-class probability, $p_B$ is an upper confidence bound on the runner-up probability, and $\Phi^{-1}$ is the inverse standard normal CDF.
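The radius formula is short enough to implement directly. A sketch using the standard library's `statistics.NormalDist` for $\Phi^{-1}$ (the function name is hypothetical):

```python
from statistics import NormalDist

def smoothing_radius(p_a: float, p_b: float, sigma: float) -> float:
    """Certified l2 radius R = (sigma/2) * (Phi^{-1}(p_a) - Phi^{-1}(p_b)).

    p_a: lower confidence bound on the top-class probability.
    p_b: upper confidence bound on the runner-up probability.
    Returns 0.0 when the bounds do not separate the two classes.
    """
    if p_a <= p_b:
        return 0.0
    phi_inv = NormalDist().inv_cdf
    return 0.5 * sigma * (phi_inv(p_a) - phi_inv(p_b))
```

The early return encodes the abstention logic: when the confidence bounds overlap, no positive radius can be certified.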

Interval Bound Propagation (IBP) propagates lower and upper activation bounds through the network. Linear bound propagation methods propagate affine upper and lower relaxations. Convex relaxation methods replace the nonconvex neural network verification problem with a tractable relaxation that upper-bounds worst-case loss or lower-bounds margins.

Key results

Certification is threat-model-specific. A certificate for $\ell_2$ radius $0.5$ does not imply robustness to $\ell_\infty$ radius $8/255$, patches, rotations, corruptions, or semantic edits. The certificate is only as broad as the perturbation set and assumptions used by the verifier.

For classifiers, many verifiers reason about margins. Let $z_k(x)$ be the logit for class $k$. To certify class $y$, it is enough to prove:

$$z_y(x') - z_j(x') > 0 \quad \text{for all } j \ne y \text{ and all } x' \in B_p(x,\epsilon).$$

Equivalently, prove a lower bound on every margin:

$$\underline{m}_{y,j} \le \min_{x' \in B_p(x,\epsilon)} \left(z_y(x') - z_j(x')\right),$$

and show $\underline{m}_{y,j} > 0$ for all $j \ne y$. Bound-propagation and relaxation methods differ in how tightly and efficiently they compute these lower bounds.
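Given interval bounds on the logits, the margin check itself is a few lines. This is the loose-but-sound combination used in plain IBP; tighter methods bound the margin difference jointly rather than from separate logit intervals (function name is illustrative):

```python
def certify_margins(logit_lo, logit_hi, y):
    """Soundly lower-bound every margin z_y - z_j from per-logit interval bounds.

    m_lower[j] = logit_lo[y] - logit_hi[j] is a valid (if loose) lower bound
    on the worst-case margin. The point is certified when all margins are
    provably positive.
    """
    margins = [logit_lo[y] - logit_hi[j] for j in range(len(logit_hi)) if j != y]
    return min(margins), all(m > 0 for m in margins)

# Example: class 0's lower bound exceeds every competitor's upper bound.
m_min, certified = certify_margins([2.0, -1.0, 0.3], [2.5, 0.6, 1.1], y=0)
```

If `certified` is `False`, the honest report is "not certified," not "not robust": the bound may simply be too loose at this point.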

Randomized smoothing gives a probabilistic certificate with statistical confidence. Since $p_A$ and $p_B$ are estimated by sampling noisy inputs, the implementation must report sample counts, confidence level, abstention rule, noise level $\sigma$, and base classifier training procedure. If the top class is not sufficiently likely, the smoothed classifier abstains rather than certifying.
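One simple way to turn vote counts into a lower confidence bound is a Hoeffding bound. This is not what standard certification procedures use in practice (they typically use exact binomial Clopper-Pearson intervals, which are tighter); it is chosen here only because it is dependency-free and easy to verify:

```python
import math

def hoeffding_lower(count: int, n: int, alpha: float) -> float:
    """One-sided lower confidence bound on a Bernoulli mean.

    With probability >= 1 - alpha over the n noise draws, the true class
    probability is at least the returned value. Looser than an exact
    binomial interval, but sound.
    """
    return count / n - math.sqrt(math.log(1.0 / alpha) / (2.0 * n))

# 930 top-class votes out of 1000 noisy samples, 99.9% confidence.
p_a = hoeffding_lower(930, 1000, alpha=0.001)
```

The gap between the plug-in estimate $0.93$ and the bound (about $0.87$ here) is exactly the statistical price of certifying with finite samples; more samples shrink it.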

Certified training often optimizes a surrogate upper bound on worst-case loss. For IBP-style training, the model is trained so interval bounds remain tight enough to prove positive margins. A common practical pattern is to warm up the perturbation radius or the weight on certified loss, because training from the full robust objective can be unstable.
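The warm-up pattern is usually just a schedule on the radius (or the certified-loss weight). A minimal sketch, with the linear ramp and all names being illustrative choices rather than a fixed recipe:

```python
def eps_schedule(step: int, warmup_steps: int, eps_target: float) -> float:
    """Linearly ramp the certified-training radius from 0 to eps_target.

    Early in training the bounds are loose, so a small radius keeps the
    certified loss informative; the radius reaches its target only after
    warmup_steps optimizer steps.
    """
    return eps_target * min(1.0, step / warmup_steps)
```

The same shape works for the mixing weight between clean and certified loss; some schedules ramp both, and some use smoother (e.g., cosine) ramps.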

The central tradeoff is tightness versus scalability. Exact verification is tight but limited to small networks or small inputs. Loose bounds scale but certify smaller radii. Randomized smoothing scales well and gives strong $\ell_2$ certificates, but it requires many noisy samples at inference and is naturally tied to Gaussian noise and $\ell_2$ geometry.

Certification also has a workflow distinction between prediction and certification. A smoothed classifier may use a modest number of samples to predict a label, then many more samples to certify that label with a chosen confidence level. A bound-propagation verifier may first run the ordinary network for a prediction, then run a separate bound computation to prove all competing logits stay below the predicted logit. These extra steps affect latency and should be included in the system claim. A certificate that is too slow for deployment may still be scientifically useful, but it is not the same as a cheap runtime defense.

Certificates can be abstaining by design. If the verifier cannot prove the margin or the smoothing vote is not decisive, the honest result is "not certified," not "not robust." This conservatism is why certified accuracy is usually lower than empirical robust accuracy.

Visual

| Method | Certificate norm or set | Main object bounded | Strength | Limitation |
| --- | --- | --- | --- | --- |
| Randomized smoothing | Usually $\ell_2$ | Class probabilities under Gaussian noise | Scales to large models | Sampling cost, abstentions, $\ell_2$ focus |
| IBP | Commonly $\ell_\infty$ | Activation intervals and logits | Fast and trainable | Bounds can be loose |
| CROWN-style bounds | $\ell_p$ variants depending on method | Linear relaxations of the network | Tighter than simple intervals | More complex and costly |
| Convex relaxations | Norm-bounded sets | Relaxed worst-case margins | Sound and often tighter | Scalability limits |
| Exact verification | Specified finite network/input setting | Exact satisfiability or optimization | Strongest proof | Usually small-scale |

Worked example 1: Certified accuracy from radii

Problem: A test set has five examples. Their certified radii are:

$$0.10,\quad 0.25,\quad 0.00,\quad 0.40,\quad 0.18.$$

The third example is misclassified, so its radius is reported as $0$. Compute certified accuracy at radius $r=0.20$.

  1. Certified accuracy at $r=0.20$ counts examples with radius at least $0.20$.

  2. Check each radius:

  • $0.10 < 0.20$: not counted
  • $0.25 \ge 0.20$: counted
  • $0.00 < 0.20$: not counted
  • $0.40 \ge 0.20$: counted
  • $0.18 < 0.20$: not counted
  3. Count:
$$\text{certified examples}=2.$$
  4. Divide by the test size:
$$\mathrm{CertAcc}(0.20)=\frac{2}{5}=0.40.$$

Checked answer: the certified accuracy at radius $0.20$ is $40\%$.
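The arithmetic above can be checked in two lines (the misclassified example already carries radius 0, so no separate correctness mask is needed here):

```python
radii = [0.10, 0.25, 0.00, 0.40, 0.18]
# Count examples whose certified radius reaches the target radius 0.20.
cert_acc = sum(rad >= 0.20 for rad in radii) / len(radii)  # 2 / 5 = 0.4
```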

Worked example 2: Randomized smoothing radius

Problem: A smoothed classifier uses $\sigma=0.50$. Suppose the top class has lower confidence bound $p_A=0.90$ and the runner-up has upper confidence bound $p_B=0.05$. Use approximate values $\Phi^{-1}(0.90)=1.282$ and $\Phi^{-1}(0.05)=-1.645$. Compute the certified $\ell_2$ radius.

  1. Use the smoothing radius formula:
$$R=\frac{\sigma}{2}\left(\Phi^{-1}(p_A)-\Phi^{-1}(p_B)\right).$$
  2. Substitute:
$$R=\frac{0.50}{2}\big(1.282-(-1.645)\big).$$
  3. Compute the probability-margin term:
$$1.282+1.645=2.927.$$
  4. Compute $\sigma/2$:
$$\frac{0.50}{2}=0.25.$$
  5. Multiply:
$$R=0.25 \times 2.927=0.73175.$$

Checked answer: the certified $\ell_2$ radius is approximately $0.732$. If the statistical confidence bounds were weaker, the radius would shrink even with the same observed top class.
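The same number falls out of the standard library's exact inverse normal CDF, confirming that the rounded table values above introduce only negligible error:

```python
from statistics import NormalDist

phi_inv = NormalDist().inv_cdf
sigma, p_a, p_b = 0.50, 0.90, 0.05
# R = (sigma / 2) * (Phi^{-1}(p_A) - Phi^{-1}(p_B))
R = (sigma / 2) * (phi_inv(p_a) - phi_inv(p_b))  # approximately 0.7316
```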

Code

```python
import torch

def certify_from_counts(count_top, count_runner_up, n, sigma, normal_icdf):
    # Simplified plug-in sketch. Production smoothing uses confidence bounds,
    # abstention rules, and careful binomial intervals.
    p_a = count_top / n
    p_b = count_runner_up / n
    if p_a <= p_b:
        return 0.0
    radius = 0.5 * sigma * (normal_icdf(torch.tensor(p_a)) - normal_icdf(torch.tensor(p_b)))
    return float(torch.clamp(radius, min=0.0))

@torch.no_grad()
def smoothed_predict(model, x, sigma=0.25, samples=128):
    # Vote over `samples` Gaussian perturbations of each input in the batch.
    counts = None
    for _ in range(samples):
        logits = model((x + sigma * torch.randn_like(x)).clamp(0.0, 1.0))
        preds = logits.argmax(dim=1)
        if counts is None:
            counts = torch.zeros(x.shape[0], logits.shape[1], device=x.device)
        counts.scatter_add_(1, preds[:, None], torch.ones_like(preds[:, None], dtype=counts.dtype))
    return counts.argmax(dim=1), counts
```

@torch.no_grad()
def smoothed_predict(model, x, sigma=0.25, samples=128):
counts = None
for _ in range(samples):
logits = model((x + sigma * torch.randn_like(x)).clamp(0.0, 1.0))
preds = logits.argmax(dim=1)
if counts is None:
counts = torch.zeros(x.shape[0], logits.shape[1], device=x.device)
counts.scatter_add_(1, preds[:, None], torch.ones_like(preds[:, None], dtype=counts.dtype))
return counts.argmax(dim=1), counts

The code illustrates noisy voting, not a production certificate. Correct randomized-smoothing certification requires confidence intervals for $p_A$ and $p_B$, a fixed confidence level, and an abstain decision when the top class is not statistically dominant.
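A sketch that adds the missing pieces, under two stated simplifications: a Hoeffding bound stands in for the exact binomial intervals used in practice, and the runner-up probability is bounded by $p_B \le 1 - p_A$, in which case the radius simplifies to $\sigma\,\Phi^{-1}(p_A)$:

```python
import math
from statistics import NormalDist

ABSTAIN = -1

def certify(counts: dict, n: int, sigma: float, alpha: float = 0.001):
    """Return (class, certified l2 radius) or (ABSTAIN, 0.0) from vote counts.

    counts maps class index -> number of noisy-sample votes; n is the total
    sample count. The Hoeffding lower bound is a loose stand-in for a
    Clopper-Pearson interval.
    """
    top = max(counts, key=counts.get)
    p_a = counts[top] / n - math.sqrt(math.log(1.0 / alpha) / (2.0 * n))
    if p_a <= 0.5:
        # Top class not statistically dominant at this confidence: abstain.
        return ABSTAIN, 0.0
    # With p_B <= 1 - p_A, the radius collapses to sigma * Phi^{-1}(p_A).
    return top, sigma * NormalDist().inv_cdf(p_a)
```

Decisive votes certify a positive radius; near-ties abstain rather than emitting an unsupported certificate.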

Common pitfalls

  • Calling a defense certified when it has only been tested against attacks.
  • Reporting certified accuracy without the radius, norm, confidence level, verifier, and abstention policy.
  • Comparing empirical robust accuracy with certified accuracy as if they measure the same thing.
  • Forgetting that randomized smoothing certificates are usually $\ell_2$ certificates, not universal robustness claims.
  • Using plug-in class probabilities without statistical confidence bounds.
  • Certifying the wrong model, for example the base classifier instead of the smoothed classifier or a preprocessing-free version.
  • Ignoring the clean-accuracy cost and inference-time sampling cost of certification.

Further reading

  • Cohen, Rosenfeld, and Kolter, "Certified Adversarial Robustness via Randomized Smoothing."
  • Gowal et al., work on interval bound propagation for certified robustness.
  • Zhang et al., CROWN and related linear bound propagation methods.
  • Wong and Kolter, work on provable defenses via convex outer adversarial polytopes.
  • Salman et al., work combining adversarial training and randomized smoothing.