
ZOO

ZOO, short for Zeroth Order Optimization, is a score-based black-box attack that estimates gradients from model outputs instead of backpropagation. It was designed to attack deep neural networks without training a substitute model, using only queries to obtain confidence scores or logits.

The attack is a direct black-box analogue of optimization-based white-box attacks such as C&W. If the attacker can evaluate a loss value at nearby inputs, finite differences can approximate the input gradient coordinate by coordinate or through stochastic variants.

Threat model

ZOO assumes score-query black-box access. The attacker cannot inspect weights or compute exact gradients, but can submit inputs and receive enough output information to define an attack loss:

J(x) = \mathrm{attack\_loss}(f(x), y, t).

The attack can be targeted or untargeted and usually searches for low-distortion adversarial examples under valid input bounds:

x' \in [0,1]^d.

Because each loss evaluation costs a target query, the query budget is central. ZOO is stronger than transfer-only attacks in knowledge of the target outputs, but weaker than white-box attacks in direct gradient access.
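For concreteness, one common choice of attack loss is a C&W-style margin computed from the returned scores. The sketch below is illustrative, not the paper's exact formulation (ZOO applies the margin to log-probabilities); the function name and the `kappa` default are assumptions:

```python
import numpy as np

def attack_loss(scores, target, kappa=0.0):
    # Targeted C&W-style margin: push the target class's score above the
    # best competing class by at least kappa. A value of zero (or below)
    # means the targeted misclassification has been achieved.
    # `scores` is the score vector the API returned for one input.
    other = np.max(np.delete(scores, target))
    return max(other - scores[target], -kappa)
```

Minimizing this quantity with estimated gradients is the optimization loop that the rest of the section describes.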

Method

The core finite-difference estimate for coordinate i is:

\frac{\partial J}{\partial x_i} \approx \frac{J(x+he_i)-J(x-he_i)}{2h}.

This uses two queries per coordinate. For an image with dimension d, one full central-difference gradient costs 2d queries. ZOO reduces cost using techniques such as coordinate sampling, dimension reduction, hierarchical attack strategies, and optimization details inspired by C&W.

A simplified black-box gradient descent step for a targeted objective is:

x^{t+1} = \Pi_{[0,1]^d}\left(x^t - \alpha \hat{\nabla}J(x^t)\right),

where \hat{\nabla}J is a finite-difference estimate. For an untargeted objective, the sign of the step changes depending on whether J is defined as a loss to maximize or a success penalty to minimize.
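The update above can be written down directly; a minimal sketch assuming NumPy arrays and an already-computed gradient estimate (`zoo_step` is an illustrative name):

```python
import numpy as np

def zoo_step(x, grad_est, alpha=0.01):
    # Descend against the estimated gradient, then project back into the
    # valid box [0, 1]^d. For an objective to maximize, flip the step sign.
    return np.clip(x - alpha * grad_est, 0.0, 1.0)
```

The `np.clip` call is the projection \Pi onto the box constraint; without it, iterates can drift outside the valid input range.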

The attack is zeroth order because it uses function values, not derivatives. It is still optimization-based: the loss, confidence margin, distortion penalty, step schedule, and coordinate choices matter.

Visual

| Method | Target feedback | Gradient source | Query cost issue |
| --- | --- | --- | --- |
| C&W | White-box logits | Backpropagation | Optimizer iterations |
| ZOO | Black-box scores/logits | Finite differences | Queries scale with dimension |
| NES/SPSA | Black-box scalar loss | Random directions | Noisy estimates |
| Boundary Attack | Labels only | No gradient estimate | Many accept/reject queries |

Worked example 1: Coordinate query count

Problem: A 28×28 grayscale image has 784 input coordinates. ZOO uses central finite differences for every coordinate once. How many queries are needed for one full gradient estimate?

  1. Dimension:
d = 28 \cdot 28 = 784.
  2. Central differences use two queries per coordinate:
x + he_i, \qquad x - he_i.
  3. Total queries:
Q = 2d = 2(784) = 1568.

Checked answer: one full coordinate gradient costs 1,568 model queries, before any optimizer iterations or success checks.
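The count is easy to verify in code:

```python
d = 28 * 28          # input coordinates of a 28x28 grayscale image
queries = 2 * d      # two queries per coordinate for central differences
print(queries)       # 1568
```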

Worked example 2: Finite-difference derivative

Problem: Suppose a black-box loss returns:

J(x+he_i) = 1.24, \qquad J(x-he_i) = 1.16,

with h = 0.01. Estimate \partial J/\partial x_i.

  1. Difference in loss values:
1.24 - 1.16 = 0.08.
  2. Denominator:
2h = 0.02.
  3. Estimate:
\frac{0.08}{0.02} = 4.

Checked answer: the finite-difference estimate for coordinate i is 4. If the attack maximizes J, it should increase this coordinate; if it minimizes J, it should decrease it.
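The same arithmetic in code, up to floating-point rounding:

```python
h = 0.01
loss_plus, loss_minus = 1.24, 1.16
estimate = (loss_plus - loss_minus) / (2 * h)  # ~4.0
```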

Implementation

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def coordinate_fd_grad(model, x, y, coords, h=1e-3):
    # Central-difference gradient estimate over the selected coordinates.
    # Each coordinate costs two model queries; unsampled coordinates stay 0.
    grad = torch.zeros_like(x)
    flat_grad = grad.view(grad.size(0), -1)
    flat_x = x.view(x.size(0), -1)

    for idx in coords:
        xp = flat_x.clone()
        xm = flat_x.clone()
        # Clamping keeps queries inside the valid box, but it also shrinks
        # the effective step near the boundary, biasing the estimate there.
        xp[:, idx] = (xp[:, idx] + h).clamp(0, 1)
        xm[:, idx] = (xm[:, idx] - h).clamp(0, 1)
        loss_p = F.cross_entropy(model(xp.view_as(x)), y, reduction="none")
        loss_m = F.cross_entropy(model(xm.view_as(x)), y, reduction="none")
        flat_grad[:, idx] = (loss_p - loss_m) / (2 * h)

    return grad
```

The snippet assumes the API returns logits so cross-entropy can be computed. If the API returns rounded probabilities or top-k scores, the loss and estimator need to match the actual feedback.
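For example, if the API returns probabilities rounded to a few decimals, the objective has to be built from that rounded feedback. A hedged sketch (the function name and the `decimals` parameter are assumptions used to simulate such an API, not part of the original setup):

```python
import torch

@torch.no_grad()
def rounded_prob_loss(model, x, y, decimals=2):
    # Simulate an API that returns probabilities rounded to `decimals`
    # places, and compute the attack loss from that rounded feedback only.
    probs = torch.softmax(model(x), dim=1)
    probs = torch.round(probs * 10**decimals) / 10**decimals
    # Untargeted loss: negative log of the (rounded) true-class probability,
    # clamped so log(0) cannot occur after rounding.
    return -torch.log(probs.gather(1, y.view(-1, 1)).clamp_min(1e-6)).squeeze(1)
```

Plugging a loss like this into the finite-difference estimator keeps the estimator honest about what the interface actually exposes.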

Original paper results

Chen et al. introduced ZOO at AISec 2017 and attacked deep neural networks without training substitute models. The paper showed that zeroth-order optimization could produce adversarial examples on standard image datasets and compete with white-box optimization attacks in settings where sufficient score queries were available.

The conservative headline is that score access can substitute for gradients at high query cost. ZOO helped make query accounting a first-class part of black-box robustness evaluation.

Common pitfalls / when this attack is used today

  • Ignoring query counts and reporting only final success.
  • Assuming the API exposes logits when it exposes only labels or rounded scores.
  • Estimating all coordinates of a high-resolution image without considering cost.
  • Using a finite-difference step hh too small for numerical precision or score rounding.
  • Calling a surrogate-gradient attack "ZOO" when no target scores are queried.
  • Using ZOO today as a reference point for score-based black-box optimization and query-cost intuition.

ZOO is most informative when the score interface is realistic. Some APIs return logits, some return calibrated probabilities, some return rounded percentages, some return only the top few classes, and some add noise or rate limits. A finite-difference estimate that works with full-precision logits may become unstable with rounded probabilities. If a paper uses ZOO, it should describe exactly what scalar objective is computed from the returned output.

The finite-difference step hh is a numerical hyperparameter. If hh is too large, the estimate no longer approximates the local derivative. If hh is too small, floating-point precision, input quantization, compression, or output rounding can dominate the difference. In image systems with 8-bit inputs, a perturbation smaller than one pixel level may be meaningless after quantization. This is one reason query attacks often use carefully tuned estimators rather than textbook finite differences.
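A tiny simulation of this failure mode, using an assumed quadratic stand-in for the loss surface:

```python
def fd_estimate(f, x, h):
    # Central finite difference of a scalar function at a point.
    return (f(x + h) - f(x - h)) / (2 * h)

def rounded(f, decimals):
    # Simulate an API that rounds the returned score.
    return lambda x: round(f(x), decimals)

f = lambda x: 0.5 * x * x                       # true derivative at x=1 is 1.0
exact = fd_estimate(f, 1.0, 1e-3)               # close to 1.0
coarse = fd_estimate(rounded(f, 2), 1.0, 1e-3)  # 0.0: rounding eats the signal
```

With two-decimal rounding, both probe values collapse to 0.50 and the difference vanishes; a usable estimate would need a larger h or a different estimator.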

Coordinate selection is the main scalability challenge. A full gradient for a 224×224×3 image would require more than 300,000 central-difference queries. ZOO-style attacks therefore use coordinate sampling, importance heuristics, resizing to lower-dimensional spaces, or hierarchical refinement. Those choices define the actual attack. A report that says "we used ZOO" but omits the coordinate policy leaves out much of the method.
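The arithmetic behind that scaling claim, with 32×32×3 as one illustrative resized attack space:

```python
d_full = 224 * 224 * 3        # 150,528 coordinates at full resolution
full_queries = 2 * d_full     # 301,056 queries for one central-difference gradient

d_small = 32 * 32 * 3         # optimize in a resized space instead
small_queries = 2 * d_small   # 6,144 queries per full estimate
```

Resizing buys a roughly 49x reduction here, at the cost of only being able to express lower-resolution perturbations that must be upsampled before querying.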

Score-based access is stronger than label-only access. If a defense hides probabilities, ZOO may no longer apply, but that does not prove robustness. It changes the attacker to a decision-based threat model. Conversely, if a commercial API exposes confidence scores, a score-based attack may be much more practical than transfer-only attacks. The defense claim should match the deployed interface, not a convenient abstraction.

In modern benchmarks, ZOO is less common than Square Attack or NES/SPSA-style estimators for large images, but it remains a foundational black-box method. It teaches the central tradeoff: gradients are not magical privileged information; they can be approximated from queries, but the price is query complexity.

A compact ZOO reporting checklist is:

| Field | What to write down |
| --- | --- |
| Output access | Logits, probabilities, rounded scores, or top-k scores |
| Objective | Exact scalar loss computed from scores |
| Estimator | Central difference, coordinate sampling, or another zeroth-order rule |
| Query budget | Per-example limit and total queries consumed |
| Dimension strategy | Full resolution, resized variables, hierarchical refinement, or coordinate subset |
| Distortion | Norm, box constraint, and success-distortion tradeoff |

For reproduction, query accounting should include every model call used for gradient estimates, line searches, confidence checks, and early stopping. It is easy to undercount by reporting optimizer iterations rather than actual API calls. If one iteration estimates 100 coordinates with central differences, that is at least 200 score queries before any extra checks.
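One way to make this accounting hard to get wrong is to count at the call site; a minimal sketch (the class name is illustrative):

```python
class QueryCounter:
    """Wrap a black-box score function so every model call is counted,
    including gradient-estimate probes, line searches, and success checks."""

    def __init__(self, score_fn):
        self.score_fn = score_fn
        self.queries = 0

    def __call__(self, x):
        self.queries += 1
        return self.score_fn(x)
```

If the attack only ever sees the wrapper, the reported count is the true number of API calls rather than a reconstruction from optimizer iterations.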

ZOO is also a useful way to reason about defenses that rely on hiding gradients. If the defense leaves a smooth score surface exposed, finite differences can often recover enough directional information. If the defense rounds or hides scores, the threat model changes, but a determined attacker may switch to decision-based search. Robustness should therefore be stated relative to the available output channel.

A final interpretation point is that ZOO turns model evaluation into an API-security problem. The same classifier can be easier or harder to attack depending on whether it returns logits, calibrated scores, rounded probabilities, or only labels. That does not change the mathematical decision boundary, but it changes the practical cost of discovering it. Query limits, monitoring, and output restrictions are therefore part of the operational threat model.

For readers comparing ZOO with transfer attacks, the key difference is feedback. Transfer attacks spend effort before querying the target; ZOO spends queries to learn target-specific directions. If target queries are cheap and scores are rich, ZOO-style attacks can be powerful. If target queries are expensive or labels only, other black-box families become more relevant.

The attack is also a reminder that API design affects measurable robustness. Returning fewer decimals or fewer classes may raise attack cost, but it is not a substitute for model robustness. Treat output restriction as one layer of risk reduction, not as a proof that nearby adversarial examples are absent.

If a paper evaluates both ZOO and white-box C&W, the comparison should make the query-gradient tradeoff explicit. Similar distortion with far more queries means the black-box interface is costly but still exploitable. Much worse distortion may reflect query limits rather than a genuinely safer model.

Further reading

  • Chen et al., "ZOO: Zeroth Order Optimization Based Black-box Attacks to Deep Neural Networks without Training Substitute Models."
  • Carlini and Wagner, "Towards Evaluating the Robustness of Neural Networks."
  • Andriushchenko et al., "Square Attack."