
One Pixel Attack

The One Pixel Attack asks how much damage a classifier can suffer when the attacker changes only one pixel. Instead of relying on gradients, it uses differential evolution, a population-based black-box optimizer, to search over pixel location and color values.

The attack is not the strongest robustness evaluation for modern models, but it is conceptually useful. It exposes a different axis of vulnerability: sparse, low-dimensional changes. It also shows how evolutionary search can attack models when gradients are unavailable or inconvenient.

Threat model

The attack is a black-box evasion attack with a strict sparsity constraint:

\|\delta\|_0 \le 1.

For an RGB image, a candidate perturbation usually includes the pixel coordinates and replacement channel values:

(u, v, r, g, b).

The attacker queries model scores or probabilities to evaluate candidate fitness. The attack can be targeted, trying to maximize a chosen target probability, or untargeted, trying to reduce the true-class probability or cause any misclassification. The primary budget is the number of model queries consumed by the population search.
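Because the budget is counted in queries, it helps to count them explicitly. A minimal sketch of a query-counting wrapper, assuming a PyTorch model that maps an image batch to class scores (the CountingModel name is illustrative, not from the paper):

import torch

class CountingModel:
    """Wraps a model and counts every forward pass as one query."""

    def __init__(self, model):
        self.model = model
        self.queries = 0

    @torch.no_grad()
    def __call__(self, x):
        self.queries += 1
        return self.model(x)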

Method

Differential evolution maintains a population of candidate pixel edits. For each candidate vector a, it creates a mutant from other population members, crosses it with the current candidate, evaluates the resulting image, and keeps whichever candidate has better fitness.
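A minimal sketch of one such update, assuming the population is a NumPy array of shape (N, 5), bounds is an array of (low, high) pairs per coordinate, and fitness is a higher-is-better callable (the names and defaults here are illustrative, not the original implementation):

import numpy as np

def de_step(population, fitness_values, i, fitness, bounds,
            mutation_factor=0.5, crossover_prob=0.7, rng=None):
    """One DE/rand/1/bin update for candidate i; returns (survivor, survivor_fitness)."""
    rng = rng or np.random.default_rng()

    # Pick three distinct population members other than i.
    others = [j for j in range(len(population)) if j != i]
    a, b, c = population[rng.choice(others, size=3, replace=False)]

    # Mutation: base vector plus a scaled difference of two others.
    mutant = a + mutation_factor * (b - c)

    # Crossover: mix mutant and current candidate coordinate-wise.
    mask = rng.random(mutant.shape) < crossover_prob
    mask[rng.integers(mutant.shape[0])] = True   # keep at least one mutant coordinate
    trial = np.where(mask, mutant, population[i])

    # Clip to the valid (row, col, r, g, b) ranges.
    trial = np.clip(trial, bounds[:, 0], bounds[:, 1])

    # Selection: keep whichever candidate has better (higher) fitness.
    trial_fitness = fitness(trial)
    if trial_fitness > fitness_values[i]:
        return trial, trial_fitness
    return population[i], fitness_values[i]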

A targeted fitness can be:

F(a) = p_t(x_a),

where x_a is the image after applying the pixel edit and p_t is the target-class probability. An untargeted fitness can be:

F(a) = -p_y(x_a),

or a logit margin that encourages any class other than y to win.
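As a concrete sketch, an untargeted margin fitness can be computed directly from the model's logits; higher values mean the true class is closer to losing top-1 (this helper is illustrative, not from the original paper):

import torch

@torch.no_grad()
def untargeted_margin_fitness(model, x_edit, true_class):
    """Margin fitness: best competing logit minus the true-class logit."""
    logits = model(x_edit)[0]
    competitor = logits.clone()
    competitor[true_class] = float('-inf')  # exclude the true class
    return (competitor.max() - logits[true_class]).item()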

For one pixel on a W \times H RGB image:

a = (u, v, c_1, c_2, c_3),

with integer coordinates and bounded color channels. The search space is low-dimensional but nonconvex and discrete in coordinates, which makes evolutionary search a natural fit.

Visual

Constraint | Example attack | Search variable | Typical message
Dense \ell_\infty | FGSM/PGD | All pixels, small changes | Many tiny changes add up
Sparse \ell_1 | EAD | Fewer larger changes | Metric choice matters
One pixel \ell_0 | One Pixel Attack | One coordinate plus color | Very sparse changes can fool some models
Patch area | Adversarial Patch | Local visible region | Physical visibility can be acceptable

Worked example 1: Candidate dimensionality

Problem: A one-pixel attack is run on a 32 \times 32 RGB image. A candidate encodes integer row, integer column, and three channel values. What is the dimension of one candidate vector? How many possible pixel locations exist?

  1. Candidate fields: (\text{row}, \text{column}, R, G, B).
  2. Number of fields: 5.
  3. Pixel locations: 32 \cdot 32 = 1024.

Checked answer: the search vector is 5-dimensional, with 1,024 possible spatial locations before considering color values.
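The same counts in a couple of lines of Python (purely illustrative):

height, width = 32, 32
candidate = ("row", "column", "R", "G", "B")

print(len(candidate))    # 5 search dimensions
print(height * width)    # 1024 possible pixel locations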

Worked example 2: Targeted success calculation

Problem: A targeted one-pixel attack tries target class "frog." Before the edit, class probabilities are:

p_{\mathrm{frog}} = 0.04, \qquad p_{\mathrm{cat}} = 0.80.

After an edit, probabilities are:

p_{\mathrm{frog}} = 0.51, \qquad p_{\mathrm{cat}} = 0.20.

Assuming "frog" is now the top class, is the targeted attack successful?

  1. Targeted success requires: \arg\max_k p_k(x') = \mathrm{frog}.
  2. The target probability after the edit is: 0.51.
  3. The original class probability after the edit is: 0.20.
  4. If all other classes are below 0.51, the target is top-1.

Checked answer: yes, under the stated assumption the one-pixel edit is a targeted success. The confidence assigned to the target is 51\%.
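A minimal sketch of that success check, assuming a probability vector over classes (the class ordering and values are illustrative):

import torch

probs = torch.tensor([0.51, 0.20, 0.29])  # frog, cat, all remaining mass (illustrative)
target = 0                                 # index of "frog"

targeted_success = probs.argmax().item() == target
target_confidence = probs[target].item()

print(targeted_success, target_confidence)  # expected: True, approximately 0.51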

Implementation

import torch

@torch.no_grad()
def apply_one_pixel(x, candidate):
    """Apply a (row, col, r, g, b) candidate to a 1x3xHxW image tensor in [0, 1]."""
    x_edit = x.clone()
    row, col, r, g, b = candidate
    row = int(row)
    col = int(col)
    x_edit[:, 0, row, col] = float(r)
    x_edit[:, 1, row, col] = float(g)
    x_edit[:, 2, row, col] = float(b)
    return x_edit.clamp(0.0, 1.0)

@torch.no_grad()
def targeted_fitness(model, x, candidate, target):
    """Fitness of a candidate: probability the model assigns to the target class."""
    x_edit = apply_one_pixel(x, candidate)
    probs = model(x_edit).softmax(dim=1)
    return probs[0, target].item()

A real differential evolution loop maintains a population, performs mutation and crossover, clips candidate coordinates and colors, and counts every model evaluation as a query.
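As a hedged sketch, SciPy's differential_evolution can drive that loop over (row, col, r, g, b), reusing the helpers above; the hyperparameters below are illustrative and not the paper's exact settings:

import numpy as np
from scipy.optimize import differential_evolution

def one_pixel_attack(model, x, target, height, width, maxiter=30, popsize=10, seed=0):
    """Search for a single-pixel edit that maximizes the target-class probability."""
    # Search bounds: spatial coordinates plus colors in [0, 1].
    bounds = [(0, height - 1), (0, width - 1), (0.0, 1.0), (0.0, 1.0), (0.0, 1.0)]

    # SciPy minimizes, so negate the targeted fitness.
    def objective(candidate):
        return -targeted_fitness(model, x, candidate, target)

    result = differential_evolution(
        objective,
        bounds,
        maxiter=maxiter,
        popsize=popsize,
        recombination=0.7,
        mutation=(0.5, 1.0),
        seed=seed,
        polish=False,  # skip the gradient-based polish step; keep the search black-box
    )
    return apply_one_pixel(x, result.x), -result.fun

The result's nfev attribute reports how many objective evaluations, and therefore model queries, the search consumed, which is the number that should be reported alongside success rates.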

Original paper results

Su, Vargas, and Sakurai reported that changing one pixel could fool a substantial fraction of images under their black-box differential-evolution setup. The abstract reports targeted perturbability for 67.97\% of natural images in the Kaggle CIFAR-10 test dataset and 16.04\% of ImageNet test images, with average confidences of 74.03\% and 22.91\% respectively for at least one target class.

These numbers should be read in the original setting: target models, datasets, score access, and evolutionary search budget matter. The result is a sparse-attack stress test, not a universal claim about all modern robust models.

Common pitfalls / when this attack is used today

  • Comparing one-pixel success to \ell_\infty robustness without noting the different threat sets.
  • Omitting query counts for differential evolution.
  • Assuming score access when the real API exposes only labels.
  • Treating target confidence as success when another class is still top-1.
  • Ignoring preprocessing; one pixel before resizing may affect several pixels after resizing.
  • Using the attack today as a sparse black-box demonstration and robustness sanity check, not as a complete benchmark.

The one-pixel constraint is easy to state but subtle in a real pipeline. If the model resizes, crops, normalizes, or compresses the image before classification, one input pixel may be blurred into multiple model pixels or removed entirely. A faithful threat model should specify where the one-pixel edit is applied: before resizing, after resizing, in raw sensor space, or in the model's tensor input. Otherwise two experiments with the same phrase can mean different attacks.
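A small sketch of that effect, assuming the raw image is resized before it reaches the model (the resolutions and coordinates here are illustrative):

import torch
import torch.nn.functional as F

raw = torch.rand(1, 3, 256, 256)           # raw image (illustrative resolution)
edited = raw.clone()
edited[:, :, 100, 100] = 1.0               # a single-pixel edit in raw space

# Bilinear resize to the model's input resolution.
small_raw = F.interpolate(raw, size=(32, 32), mode='bilinear', align_corners=False)
small_edited = F.interpolate(edited, size=(32, 32), mode='bilinear', align_corners=False)

# Count model-input pixels affected by the one raw-space edit; depending on the
# resize, this can be zero, one, or several positions.
changed = (small_raw != small_edited).any(dim=1).sum().item()
print(changed)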

Differential evolution is stochastic. Success rates depend on population size, generations, mutation factor, crossover probability, target selection, and query budget. A single lucky or unlucky run is not enough. Reports should average over images and random seeds, include failures, and state whether the attack is targeted to every possible class or only to at least one target class.

The attack also highlights the difference between confidence and correctness. A candidate that raises the target probability from 1%1\% to 30%30\% may look impressive, but it is not a targeted success unless the target becomes the predicted class. Conversely, an untargeted attack may succeed with low confidence if it barely crosses the boundary. The success condition must be separate from the confidence statistic.

As a robustness benchmark, One Pixel Attack is intentionally narrow. A defense against one-pixel edits may still fail against two pixels, a patch, or an \ell_\infty perturbation. A model vulnerable to one-pixel edits may still perform well under conventional PGD budgets. The point is not to replace standard attacks; it is to explore sparse vulnerability and black-box search in an extreme low-dimensional setting.

Modern variants often change more than one pixel, use evolutionary strategies over patches, or combine sparse constraints with saliency maps. The original attack remains a useful teaching example because students can see the whole candidate vector and query loop. It makes black-box optimization concrete: propose an edit, query the model, keep better candidates, and count the cost.

A compact One Pixel Attack reporting checklist is:

Field | What to write down
Pixel space | Raw image, resized image, normalized tensor, or sensor coordinates
Channels | Grayscale, RGB, or another feature layout
Goal | Targeted to all classes, one target, or untargeted
Optimizer | Population size, generations, mutation, crossover, and seeds
Queries | Average and maximum model evaluations per image
Success | Top-1 success and confidence, with failed targets included
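One way to keep those fields together in code is a small record type; the field names below simply mirror the checklist and are not from any library:

from dataclasses import dataclass

@dataclass
class OnePixelReport:
    """Record of one attack run, mirroring the reporting checklist."""
    pixel_space: str        # e.g. "raw image" or "model input tensor"
    channels: str           # e.g. "RGB"
    goal: str               # "targeted (all)", "targeted (one)", or "untargeted"
    optimizer: dict         # population size, generations, mutation, crossover, seed
    queries: int            # model evaluations used for this image
    success: bool           # top-1 success under the stated goal
    confidence: float       # probability assigned to the relevant class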

For reproduction, distinguish "at least one target class succeeds" from "all target classes succeed." The former is much easier: the optimizer only needs to find one class region reachable by a one-pixel edit. The latter is a much stronger targeted claim. The original paper reports target-class perturbability in a specific experimental setup, so later comparisons should match that target protocol before comparing percentages.
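A sketch of the two aggregation rules over a per-image, per-target success table (the boolean matrix is illustrative):

import numpy as np

# success[i, t] is True if image i was pushed to target class t (illustrative data).
success = np.array([
    [True,  False, False],
    [True,  True,  True ],
    [False, False, False],
])

at_least_one = success.any(axis=1).mean()   # fraction of images with >= 1 successful target
all_targets = success.all(axis=1).mean()    # fraction of images where every target succeeds

print(at_least_one, all_targets)  # 0.666..., 0.333...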

The attack is also useful for explaining why \ell_0 and \ell_\infty measure different things. A one-pixel edit can have a large \ell_\infty value but a tiny \ell_0 count. A dense FGSM perturbation can have a small \ell_\infty value but change every pixel. Neither is "smaller" without naming the metric and application.
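A sketch of the two norms on a perturbation tensor (purely illustrative):

import torch

x = torch.rand(3, 32, 32)

# One-pixel edit: possibly large per-value change, tiny l0 count.
x_one = x.clone()
x_one[:, 5, 7] = 1.0

# Dense FGSM-style edit: tiny per-value change, nearly every value touched.
x_dense = (x + 0.01 * torch.sign(torch.randn_like(x))).clamp(0, 1)

def l0(delta):
    return (delta != 0).sum().item()      # number of changed values

def linf(delta):
    return delta.abs().max().item()       # largest single change

print(l0(x_one - x), linf(x_one - x))      # few values, possibly large magnitude
print(l0(x_dense - x), linf(x_dense - x))  # many values, magnitude around 0.01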

A final interpretation point is that sparse attacks can look surprising because humans often focus on global visual similarity. A single pixel may be invisible in a large image, but it can still alter early convolutional features or interact with preprocessing. That does not mean one pixel is always a realistic threat; it means the model's sensitivity should be measured under multiple perturbation geometries.

For reproduction, beware of image display. A changed pixel may be hard to see after browser scaling or antialiasing, and screenshots may hide the actual tensor value. If examples are shown visually, include coordinates, channel values, and the model-input resolution. The scientific object is the tensor edit, not the rendered thumbnail.

Further reading

  • Su, Vargas, and Sakurai, "One Pixel Attack for Fooling Deep Neural Networks."
  • Chen et al., "EAD: Elastic-Net Attacks to Deep Neural Networks via Adversarial Examples."
  • Papernot et al., "The Limitations of Deep Learning in Adversarial Settings."