Certified Defenses and Randomized Smoothing
Empirical defenses try strong attacks and report whether the model survives. Certified defenses make a different kind of claim: within a specified perturbation set, no adversarial example exists for a particular input. This distinction is essential. A failed PGD attack is evidence; a valid certificate is a proof under assumptions.
The field contains many certification approaches, including interval bound propagation, linear relaxations such as CROWN-style methods, convex relaxations, exact verification for small networks, and randomized smoothing. Randomized smoothing is especially important because it scales to large classifiers and gives clean certificates by wrapping a base classifier with Gaussian noise.
Definitions
Let $f: \mathbb{R}^d \to \{1, \dots, K\}$ be a classifier. A pointwise robustness certificate at an input $x$ with radius $\epsilon$ under a norm $\|\cdot\|_p$ proves:

$$f(x + \delta) = f(x) \quad \text{for all } \|\delta\|_p \le \epsilon.$$
A certificate is sound if every certified point is truly robust under the stated assumptions. It may be conservative: some robust points may not be certified.
A certified accuracy curve reports the fraction of test examples that are both correctly classified and certified at radius $r$:

$$\mathrm{CertAcc}(r) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\big[\, f(x_i) = y_i \ \text{and}\ R(x_i) \ge r \,\big],$$

where $R(x_i)$ is the certified radius reported for example $x_i$.
Randomized smoothing builds a smoothed classifier $g$ from a base classifier $f$:

$$g(x) = \arg\max_{c}\ \mathbb{P}_{\eta \sim \mathcal{N}(0, \sigma^2 I)}\big[\, f(x + \eta) = c \,\big].$$
If the top class $c_A$ has sufficiently higher probability than every other class under Gaussian noise, then $g$ is certifiably robust in an $\ell_2$ ball around $x$. A common form of the radius is:

$$R = \frac{\sigma}{2}\Big(\Phi^{-1}(\underline{p_A}) - \Phi^{-1}(\overline{p_B})\Big),$$

where $\underline{p_A}$ is a lower confidence bound on the top-class probability, $\overline{p_B}$ is an upper confidence bound on the runner-up probability, and $\Phi^{-1}$ is the inverse standard normal CDF.
Interval Bound Propagation (IBP) propagates lower and upper activation bounds through the network. Linear bound propagation methods propagate affine upper and lower relaxations. Convex relaxation methods replace the nonconvex neural network verification problem with a tractable relaxation that upper-bounds worst-case loss or lower-bounds margins.
Key results
Certification is threat-model-specific. A certificate for an $\ell_2$ radius does not imply robustness at an $\ell_\infty$ radius, or against patches, rotations, corruptions, or semantic edits. The certificate is only as broad as the perturbation set and assumptions used by the verifier.
For classifiers, many verifiers reason about margins. Let $z_c(x)$ be the logit for class $c$. To certify class $y$ at input $x$, it is enough to prove:

$$z_y(x + \delta) > z_c(x + \delta) \quad \text{for all } c \ne y \text{ and all } \|\delta\|_p \le \epsilon.$$

Equivalently, prove a lower bound on every margin:

$$\underline{m}_c \;\le\; \min_{\|\delta\|_p \le \epsilon} \big( z_y(x + \delta) - z_c(x + \delta) \big),$$

and show $\underline{m}_c > 0$ for all $c \ne y$. Bound-propagation and relaxation methods differ in how tightly and efficiently they compute these lower bounds.
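To make the margin view concrete, below is a minimal interval-bound-propagation sketch for a fully connected ReLU network under an $\ell_\infty$ ball. The function names, the `layers` list of `(weight, bias)` pairs, and the loose final margin bound are illustrative assumptions, not a production verifier; tighter methods such as CROWN would replace the interval step with linear relaxations.

```python
import torch

def ibp_affine(lower, upper, weight, bias):
    # Propagate elementwise interval bounds through an affine layer x -> x @ W^T + b.
    # Positive weights keep bounds aligned; negative weights swap them (standard IBP step).
    w_pos = weight.clamp(min=0.0)
    w_neg = weight.clamp(max=0.0)
    new_lower = lower @ w_pos.T + upper @ w_neg.T + bias
    new_upper = upper @ w_pos.T + lower @ w_neg.T + bias
    return new_lower, new_upper

def ibp_margin_lower_bounds(x, eps, layers, target):
    # Start from the l_inf ball [x - eps, x + eps] and push intervals through the network.
    lower, upper = x - eps, x + eps
    for i, (weight, bias) in enumerate(layers):
        lower, upper = ibp_affine(lower, upper, weight, bias)
        if i < len(layers) - 1:
            lower, upper = lower.clamp(min=0.0), upper.clamp(min=0.0)  # ReLU is monotone
    # Sound but loose margin bound: lowest possible target logit minus highest competing logit.
    margins = lower[:, target:target + 1] - upper
    margins[:, target] = float("inf")  # the target-vs-target margin imposes no constraint
    return margins

# The label `target` is certified at radius eps if margins.min() > 0.
```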
Randomized smoothing gives a probabilistic certificate with statistical confidence. Since $\underline{p_A}$ and $\overline{p_B}$ are estimated by sampling noisy inputs, the implementation must report sample counts, confidence level, abstention rule, noise level $\sigma$, and base classifier training procedure. If the top class is not sufficiently likely, the smoothed classifier abstains rather than certifying.
Certified training often optimizes a surrogate upper bound on worst-case loss. For IBP-style training, the model is trained so interval bounds remain tight enough to prove positive margins. A common practical pattern is to warm up the perturbation radius or the weight on certified loss, because training from the full robust objective can be unstable.
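A hedged sketch of that warm-up pattern is shown below; the linear ramp, the function name, and the loss-mixing rule are illustrative assumptions rather than a specific paper's recipe.

```python
def certified_warmup(step, warmup_steps, eps_target, kappa_target=0.5):
    # Linearly ramp the training perturbation radius and the weight on the
    # certified (worst-case) loss term from 0 up to their target values.
    ramp = min(1.0, step / max(1, warmup_steps))
    return ramp * eps_target, ramp * kappa_target

# Sketch of one mixed objective per step:
#   eps, kappa = certified_warmup(step, warmup_steps=10_000, eps_target=0.4)
#   loss = (1 - kappa) * clean_loss + kappa * certified_loss_at(eps)
```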
The central tradeoff is tightness versus scalability. Exact verification is tight but limited to small networks or small inputs. Loose bounds scale but certify smaller radii. Randomized smoothing scales well and gives strong certificates, but it requires many noisy samples at inference and is naturally tied to Gaussian noise and geometry.
There is also a workflow distinction between prediction and certification. A smoothed classifier may use a modest number of samples to predict a label, then many more samples to certify that label with a chosen confidence level. A bound-propagation verifier may first run the ordinary network for a prediction, then run a separate bound computation to prove all competing logits stay below the predicted logit. These extra steps affect latency and should be included in the system claim. A certificate that is too slow for deployment may still be scientifically useful, but it is not the same as a cheap runtime defense.
Certificates can be abstaining by design. If the verifier cannot prove the margin or the smoothing vote is not decisive, the honest result is "not certified," not "not robust." This conservatism is why certified accuracy is usually lower than empirical robust accuracy.
Visual
| Method | Certificate norm or set | Main object bounded | Strength | Limitation |
|---|---|---|---|---|
| Randomized smoothing | Usually $\ell_2$ | Class probabilities under Gaussian noise | Scales to large models | Sampling cost, abstentions, $\ell_2$ focus |
| IBP | Commonly $\ell_\infty$ | Activation intervals and logits | Fast and trainable | Bounds can be loose |
| CROWN-style bounds | $\ell_p$ variants depending on the method | Linear relaxations of the network | Tighter than simple intervals | More complex and costly |
| Convex relaxations | Norm-bounded sets | Relaxed worst-case margins | Sound and often tighter | Scalability limits |
| Exact verification | Specified finite network/input setting | Exact satisfiability or optimization | Strongest proof | Usually small-scale |
Worked example 1: Certified accuracy from radii
Problem: A test set has five examples. Their certified radii are:

$$R_1 = 0.10, \quad R_2 = 0.30, \quad R_3 = 0, \quad R_4 = 0.50, \quad R_5 = 0.25.$$

The third example is misclassified, so its radius is reported as $0$. Compute certified accuracy at radius $r = 0.25$.
- Certified accuracy at $r$ counts examples that are correctly classified with radius at least $r$.
- Check each radius: $0.10 < 0.25$, $0.30 \ge 0.25$, $0 < 0.25$, $0.50 \ge 0.25$, $0.25 \ge 0.25$.
- Count: $3$ examples are certified at $r = 0.25$.
- Divide by the test size: $3 / 5 = 0.6$.
Checked answer: the certified accuracy at radius $0.25$ is $0.6$.
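The same count can be rechecked in a few lines of Python, using the radii from this example:

```python
radii = [0.10, 0.30, 0.0, 0.50, 0.25]  # 0.0 marks the misclassified example
r = 0.25
print(sum(radius >= r for radius in radii) / len(radii))  # 0.6
```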
Worked example 2: Randomized smoothing radius
Problem: A smoothed classifier uses $\sigma = 0.5$. Suppose the top class has lower confidence bound $\underline{p_A} = 0.8$ and the runner-up has upper confidence bound $\overline{p_B} = 0.2$. Use approximate values $\Phi^{-1}(0.8) \approx 0.84$ and $\Phi^{-1}(0.2) \approx -0.84$. Compute the certified radius.
- Use the smoothing radius formula: $R = \frac{\sigma}{2}\big(\Phi^{-1}(\underline{p_A}) - \Phi^{-1}(\overline{p_B})\big)$.
- Substitute: $R = \frac{0.5}{2}\big(\Phi^{-1}(0.8) - \Phi^{-1}(0.2)\big)$.
- Compute the probability-margin term: $\Phi^{-1}(0.8) - \Phi^{-1}(0.2) \approx 0.84 - (-0.84) = 1.68$.
- Compute $\frac{\sigma}{2}$: $\frac{0.5}{2} = 0.25$.
- Multiply: $0.25 \times 1.68 = 0.42$.
Checked answer: the certified radius is approximately $0.42$. If the statistical confidence bounds were weaker, the radius would shrink even with the same observed top class.
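The same computation with exact standard-normal quantiles, using Python's built-in `statistics.NormalDist`, confirms the rounded answer:

```python
from statistics import NormalDist

sigma, p_a_lower, p_b_upper = 0.5, 0.8, 0.2
phi_inv = NormalDist().inv_cdf
print(0.5 * sigma * (phi_inv(p_a_lower) - phi_inv(p_b_upper)))  # ~0.4208
```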
Code
```python
import torch


def certify_from_counts(count_top, count_runner_up, n, sigma, normal_icdf):
    # Simplified plug-in sketch. Production smoothing uses confidence bounds,
    # abstention rules, and careful binomial intervals.
    p_a = count_top / n
    p_b = count_runner_up / n
    if p_a <= p_b:
        return 0.0
    # R = (sigma / 2) * (Phi^{-1}(p_a) - Phi^{-1}(p_b))
    radius = 0.5 * sigma * (normal_icdf(torch.tensor(p_a)) - normal_icdf(torch.tensor(p_b)))
    return float(torch.clamp(radius, min=0.0))


@torch.no_grad()
def smoothed_predict(model, x, sigma=0.25, samples=128):
    # Majority vote of the base classifier over Gaussian-perturbed copies of x.
    counts = None
    for _ in range(samples):
        logits = model((x + sigma * torch.randn_like(x)).clamp(0.0, 1.0))
        preds = logits.argmax(dim=1)
        if counts is None:
            counts = torch.zeros(x.shape[0], logits.shape[1], device=x.device)
        counts.scatter_add_(1, preds[:, None], torch.ones_like(preds[:, None], dtype=counts.dtype))
    return counts.argmax(dim=1), counts
```
The code illustrates noisy voting, not a production certificate. Correct randomized-smoothing certification requires confidence intervals for and , a fixed confidence level, and an abstain decision when the top class is not statistically dominant.
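For completeness, here is a hedged sketch of the certify step in the predict-then-certify workflow described above. It reuses `smoothed_predict`, takes a one-sided Clopper-Pearson lower bound via SciPy's beta distribution, and applies the common simplification $\overline{p_B} = 1 - \underline{p_A}$; the sample sizes, confidence level, and abstention rule are illustrative choices, not the exact procedure from any particular paper.

```python
from scipy.stats import beta, norm

def certify(model, x, sigma=0.25, n0=100, n=10_000, alpha=0.001):
    # Stage 1: guess the top class with a small number of noisy samples.
    _, counts0 = smoothed_predict(model, x, sigma=sigma, samples=n0)
    c_hat = int(counts0.argmax(dim=1))                 # assumes a single input (batch size 1)
    # Stage 2: draw many more samples and bound the top-class probability.
    _, counts = smoothed_predict(model, x, sigma=sigma, samples=n)
    k = int(counts[0, c_hat])
    # One-sided Clopper-Pearson lower confidence bound on p_A at level 1 - alpha.
    p_a_lower = beta.ppf(alpha, k, n - k + 1) if k > 0 else 0.0
    if p_a_lower <= 0.5:
        return "abstain", 0.0                          # top class not statistically dominant
    radius = sigma * norm.ppf(p_a_lower)               # radius formula with p_B_upper = 1 - p_A_lower
    return c_hat, float(radius)
```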
Common pitfalls
- Calling a defense certified when it has only been tested against attacks.
- Reporting certified accuracy without the radius, norm, confidence level, verifier, and abstention policy.
- Comparing empirical robust accuracy with certified accuracy as if they measure the same thing.
- Forgetting that randomized smoothing certificates are usually $\ell_2$ certificates, not universal robustness claims.
- Using plug-in class probabilities without statistical confidence bounds.
- Certifying the wrong model, for example the base classifier instead of the smoothed classifier or a preprocessing-free version.
- Ignoring the clean-accuracy cost and inference-time sampling cost of certification.
Connections
- Mathematical formulation defines pointwise robustness and robust risk.
- Adversarial training gives empirical robustness methods that can be combined with certified training.
- Gradient masking and obfuscation contrasts certificates with attacks that merely fail.
- Evaluation and benchmarks explains certified accuracy curves and benchmark reporting.
- Robustness-accuracy tradeoff discusses the cost of robustness objectives.
Further reading
- Cohen, Rosenfeld, and Kolter, "Certified Adversarial Robustness via Randomized Smoothing."
- Gowal et al., work on interval bound propagation for certified robustness.
- Zhang et al., CROWN and related linear bound propagation methods.
- Wong and Kolter, work on provable defenses via convex outer adversarial polytopes.
- Salman et al., work combining adversarial training and randomized smoothing.