Quantum Machine Learning
Quantum machine learning studies whether quantum circuits can improve learning, inference, optimization, or data analysis. The honest view is mixed: quantum kernels, parametrized circuits, QAOA-style optimization, and fault-tolerant quantum linear-algebra subroutines are mathematically rich, but broad practical advantage over strong classical machine learning and deep learning baselines is not established.
Nielsen and Chuang do not cover QML as a separate topic. This page keeps the wiki's modern QML treatment and uses N&C-style notation from Chapters 2, 8, 11, and 12: density operators $\rho$, channels $\mathcal{E}$, POVMs, trace distance, fidelity, von Neumann entropy, and quantum information-processing resource accounting.
Definitions
A parametrized quantum circuit is a unitary family $U(x, \theta)$ depending on input data $x$ and trainable parameters $\theta$. A common supervised model prepares
$$|\psi(x, \theta)\rangle = U(x, \theta)\,|0\rangle^{\otimes n}$$
and predicts from an expectation value
$$f(x, \theta) = \langle \psi(x, \theta)|\, M \,|\psi(x, \theta)\rangle$$
for some observable $M$. In N&C density-operator notation, this becomes
$$f(x, \theta) = \mathrm{Tr}\!\left[M\, \rho(x, \theta)\right], \qquad \rho(x, \theta) = U(x, \theta)\,|0\rangle\langle 0|^{\otimes n}\, U(x, \theta)^\dagger.$$
With noise, the prediction is better written as
$$f(x, \theta) = \mathrm{Tr}\!\left[M\, \mathcal{E}\big(\rho(x, \theta)\big)\right],$$
where $\mathcal{E}$ is a quantum operation. This notation matters: many QML claims change substantially when the ideal pure state is replaced by the noisy state actually measured on hardware.
A variational quantum classifier combines a feature map $x \mapsto |\phi(x)\rangle$, a trainable ansatz $W(\theta)$, measurement, and a classical loss. Training is hybrid: a quantum device estimates expectation values, while a classical optimizer updates $\theta$.
A quantum kernel maps data to quantum states $x \mapsto |\phi(x)\rangle$ and defines
$$k(x, x') = \left|\langle \phi(x) | \phi(x') \rangle\right|^2.$$
Equivalently, using density operators,
$$k(x, x') = \mathrm{Tr}\!\left[\rho(x)\,\rho(x')\right]$$
for pure feature states $\rho(x) = |\phi(x)\rangle\langle \phi(x)|$. A classical kernel method such as an SVM or kernel ridge regressor can then use the estimated kernel matrix.
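As a minimal statevector sketch of this construction, the Gram matrix can be computed directly; the one-qubit $R_y$ feature map and the function names here are illustrative choices, not from any library.

```python
import numpy as np

def feature_state(x):
    # One-qubit feature map |phi(x)> = Ry(x)|0> (an illustrative choice).
    return np.array([np.cos(x / 2), np.sin(x / 2)], dtype=complex)

def quantum_kernel(x1, x2):
    # k(x1, x2) = |<phi(x1)|phi(x2)>|^2, the squared state overlap.
    return abs(np.vdot(feature_state(x1), feature_state(x2))) ** 2

xs = np.linspace(0.0, np.pi, 5)
K = np.array([[quantum_kernel(a, b) for b in xs] for a in xs])
# K is symmetric with unit diagonal and can feed a classical SVM or
# kernel ridge regressor in place of a classical kernel matrix.
```

On hardware the overlaps would be estimated from measurement statistics rather than computed exactly, so the Gram matrix would carry shot noise.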
QAOA, the quantum approximate optimization algorithm, alternates cost and mixer unitaries. For a combinatorial objective encoded as Hamiltonian $H_C$ and a mixer $H_B = \sum_j X_j$, a depth-$p$ QAOA state is
$$|\gamma, \beta\rangle = e^{-i\beta_p H_B} e^{-i\gamma_p H_C} \cdots e^{-i\beta_1 H_B} e^{-i\gamma_1 H_C}\, |+\rangle^{\otimes n}.$$
QAOA is not machine learning by itself, but it sits near QML because it uses parametrized circuits, measurement estimates, and classical optimization.
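The alternating structure can be simulated directly for a toy instance. The sketch below, with assumed illustrative choices, runs depth-1 QAOA on a single two-qubit MaxCut edge, cost $H_C = (I - Z \otimes Z)/2$ and mixer $H_B = X_0 + X_1$; for this one-edge problem, depth 1 already reaches the optimum at $(\gamma, \beta) = (\pi/2, \pi/8)$.

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

# MaxCut cost for a single edge (0,1): H_C = (I - Z0 Z1)/2, mixer H_B = X0 + X1.
HC = (np.eye(4) - np.kron(Z, Z)) / 2
HB = np.kron(X, I2) + np.kron(I2, X)

def expm_herm(H, t):
    # e^{-i t H} for a Hermitian H, via its eigendecomposition.
    w, V = np.linalg.eigh(H)
    return V @ np.diag(np.exp(-1j * t * w)) @ V.conj().T

def qaoa_expectation(gamma, beta):
    # Depth-1 QAOA: apply cost then mixer unitary to |+>|+>, measure <H_C>.
    plus = np.full(4, 0.5, dtype=complex)
    psi = expm_herm(HB, beta) @ expm_herm(HC, gamma) @ plus
    return float(np.real(psi.conj() @ HC @ psi))
```

At $\gamma = \beta = 0$ the state is still $|{+}{+}\rangle$ and the expectation is $1/2$; at $(\pi/2, \pi/8)$ it reaches the maximum cut value $1$.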
A POVM measurement is a collection of positive operators $\{E_m\}$ with $\sum_m E_m = I$. In classification, one can interpret
$$p(m \mid x) = \mathrm{Tr}\!\left[E_m\, \rho(x, \theta)\right]$$
as the model's predicted class probabilities.
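A small sketch of this interpretation, using the two-outcome POVM built from the $Z$ projectors (an illustrative choice) on a state tilted toward $|0\rangle$:

```python
import numpy as np

I = np.eye(2)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

# Two-outcome POVM from the Z projectors: E0 = |0><0|, E1 = |1><1|.
E0 = (I + Z) / 2
E1 = (I - Z) / 2

def class_probs(rho):
    # p(m) = Tr(E_m rho), read as predicted class probabilities.
    return (float(np.real(np.trace(E0 @ rho))),
            float(np.real(np.trace(E1 @ rho))))

# A state tilted toward |0>: Ry(pi/3)|0>.
theta = np.pi / 3
psi = np.array([np.cos(theta / 2), np.sin(theta / 2)], dtype=complex)
rho = np.outer(psi, psi.conj())
p0, p1 = class_probs(rho)   # probabilities sum to 1 because E0 + E1 = I
```

Here $p(0) = \cos^2(\pi/6) = 3/4$, so the classifier would favor class 0.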
A barren plateau is a training regime where gradients concentrate near zero, often exponentially in the number of qubits $n$ for global costs or sufficiently random deep ansatzes:
$$\mathrm{Var}_{\theta}\!\left[\frac{\partial f}{\partial \theta_k}\right] \in O\!\left(b^{-n}\right) \quad \text{for some } b > 1$$
in common idealized settings.
The von Neumann entropy
$$S(\rho) = -\,\mathrm{Tr}\!\left[\rho \log \rho\right]$$
is not a training loss by default, but it is the N&C language for mixedness, compression, and information flow. It becomes relevant when evaluating noisy encodings, learned quantum channels, privacy leakage, or information bottleneck analogues.
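As a sketch of how mixedness shows up numerically, the entropy can be computed from eigenvalues; the depolarized encoding below is an assumed example.

```python
import numpy as np

def von_neumann_entropy(rho):
    # S(rho) = -Tr(rho log2 rho), computed from the eigenvalue spectrum.
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]          # 0 log 0 = 0 by convention
    return float(-np.sum(w * np.log2(w)))

I = np.eye(2)
pure = np.array([[1, 0], [0, 0]], dtype=complex)   # |0><0|, entropy 0
p = 0.3
noisy = (1 - p) * pure + p * I / 2                 # depolarized encoding
# Entropy grows from 0 (pure) toward 1 bit (maximally mixed, I/2)
# as the depolarizing strength p increases.
```

A noisy encoding with entropy close to one bit per qubit carries little usable information about the input.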
Key results
The parameter-shift rule gives exact gradients for many gates. If a parameter $\theta_k$ appears in
$$U(\theta_k) = e^{-i \theta_k G / 2},$$
where $G$ has eigenvalues $\pm 1$, then for an expectation-value component $f(\theta)$,
$$\frac{\partial f}{\partial \theta_k} = \frac{1}{2}\left[f\!\left(\theta_k + \frac{\pi}{2}\right) - f\!\left(\theta_k - \frac{\pi}{2}\right)\right]$$
with other parameters held fixed. The identity is exact in the circuit model, but estimating the two shifted values on hardware introduces shot noise.
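A minimal numerical check of this exactness: for the one-qubit expectation $f(\theta) = \langle 0|R_y(\theta)^\dagger Z R_y(\theta)|0\rangle = \cos\theta$, the shifted difference reproduces the analytic derivative to machine precision, while a finite difference only approximates it.

```python
import numpy as np

def f(theta):
    # One-qubit expectation <0| Ry(theta)^dag Z Ry(theta) |0> = cos(theta).
    return np.cos(theta)

theta = 0.9
# Parameter-shift rule: exact, not a finite-difference approximation.
shift = 0.5 * (f(theta + np.pi / 2) - f(theta - np.pi / 2))
# Central finite difference, for comparison.
fd = (f(theta + 1e-6) - f(theta - 1e-6)) / 2e-6
```

Here `shift` equals $-\sin\theta$ exactly, independent of any step size.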
Quantum kernels are valid positive semidefinite kernels because they are Hilbert-space inner products. For any coefficients $c_1, \dots, c_m$ and data points $x_1, \dots, x_m$,
$$\sum_{i,j} c_i c_j\, k(x_i, x_j) \ge 0.$$
For the squared-overlap kernel, positive semidefiniteness follows by viewing $\rho(x)$ as the feature vector in Hilbert-Schmidt space, where $\mathrm{Tr}[\rho(x)\rho(x')]$ is the inner product. A possible advantage requires both a feature map whose kernel is hard to estimate classically and a learning task that benefits from that kernel. Either condition alone is insufficient.
Noisy QML should be expressed with channels. If the intended model is
$$f(x, \theta) = \mathrm{Tr}\!\left[M\, U_L \cdots U_1\, \rho_0\, U_1^\dagger \cdots U_L^\dagger\right]$$
but the actual hardware implements
$$\tilde{f}(x, \theta) = \mathrm{Tr}\!\left[M\, \big(\mathcal{E}_L \circ \mathcal{U}_L \circ \cdots \circ \mathcal{E}_1 \circ \mathcal{U}_1\big)(\rho_0)\right],$$
where $\mathcal{U}_l(\rho) = U_l \rho\, U_l^\dagger$ and each $\mathcal{E}_l$ is a noise channel, then the learned function is not the ideal circuit plus small after-the-fact noise; it is a noisy quantum operation interleaved with the computation. N&C's Chapter 8 operator-sum language is the right notation for this.
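The operator-sum language is concrete to implement. The sketch below applies the standard Kraus decomposition of the single-qubit depolarizing channel and checks the completeness relation $\sum_k K_k^\dagger K_k = I$.

```python
import numpy as np

I = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def depolarizing_kraus(p):
    # Operator-sum (Kraus) form of the depolarizing channel, N&C Ch. 8:
    # E(rho) = (1 - 3p/4) rho + (p/4)(X rho X + Y rho Y + Z rho Z).
    return [np.sqrt(1 - 3 * p / 4) * I,
            np.sqrt(p / 4) * X,
            np.sqrt(p / 4) * Y,
            np.sqrt(p / 4) * Z]

def apply_channel(kraus, rho):
    # E(rho) = sum_k K_k rho K_k^dagger
    return sum(K @ rho @ K.conj().T for K in kraus)

rho = np.array([[1, 0], [0, 0]], dtype=complex)   # |0><0|
out = apply_channel(depolarizing_kraus(0.2), rho)  # = 0.8 rho + 0.2 I/2
```

Interleaving one such `apply_channel` call after every unitary layer gives exactly the composition $\mathcal{E}_L \circ \mathcal{U}_L \circ \cdots$ above.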
The trace distance and fidelity supply evaluation tools beyond accuracy. For two states $\rho, \sigma$, the trace distance $D(\rho, \sigma) = \frac{1}{2}\mathrm{Tr}\,|\rho - \sigma|$ measures distinguishability, while the fidelity $F(\rho, \sigma) = \mathrm{Tr}\sqrt{\sqrt{\rho}\,\sigma\sqrt{\rho}}$ measures overlap. A QML embedding that maps nearby classical examples to nearly indistinguishable states may be hard to classify; an embedding that maps every training example to nearly orthogonal states may overfit and be expensive to estimate.
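Both quantities are easy to evaluate for small embedded states; the sketch below uses the pure states $|0\rangle$ and $|+\rangle$ as an assumed example, where the fidelity simplifies to $|\langle\psi|\phi\rangle|$ and $D = \sqrt{1 - F^2}$.

```python
import numpy as np

def trace_distance(rho, sigma):
    # D(rho, sigma) = (1/2) Tr|rho - sigma|, from eigenvalues of the difference.
    w = np.linalg.eigvalsh(rho - sigma)
    return float(0.5 * np.sum(np.abs(w)))

def fidelity_pure(psi, phi):
    # For pure states the fidelity reduces to |<psi|phi>|.
    return float(abs(np.vdot(psi, phi)))

def dm(psi):
    return np.outer(psi, psi.conj())

psi = np.array([1, 0], dtype=complex)               # |0>
phi = np.array([1, 1], dtype=complex) / np.sqrt(2)  # |+>
D = trace_distance(dm(psi), dm(phi))
F = fidelity_pure(psi, phi)
# For pure states D = sqrt(1 - F^2): high fidelity means low distinguishability.
```

Identical embedded states give $D = 0$ and $F = 1$; orthogonal ones give $D = 1$ and $F = 0$.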
Generalization is a statistical question, not a quantum slogan. A high-dimensional Hilbert space can make training data separable, but useful learning requires an inductive bias aligned with the data distribution. The same discipline used in classical learning still applies: train/test splits, hyperparameter control, baseline strength, sample complexity, and uncertainty estimates.
NISQ QML and fault-tolerant QML should be separated. NISQ QML uses shallow circuits and accepts device noise as part of the training environment. Fault-tolerant QML could use deeper subroutines such as amplitude estimation, phase estimation, block-encoding, Hamiltonian simulation, or HHL-like linear algebra. Those subroutines are closer to quantum algorithms and require the logical qubits supplied by quantum error correction.
Visual
The diagram separates three QML workflows that often get blurred together. VQE and QAOA are variational loops with explicit ansatz layers, measurements, noise, and optimizer feedback, while the kernel circuit estimates state overlaps by applying one feature map followed by the inverse of another. The labeled shapes show where classical tensors enter, where shot data leaves the device, and where parameters return through the dotted feedback arrows.
| QML approach | Quantum object | N&C notation that clarifies it | Main risk |
|---|---|---|---|
| Variational classifier | States $\rho(x, \theta)$ and observables $M$ | $\mathrm{Tr}[M\rho]$, POVMs | Noise, barren plateaus, weak baselines |
| Quantum kernel | State overlaps | Fidelity and Hilbert-Schmidt inner product | Kernel may be classically easy or uninformative |
| QAOA-style optimizer | Alternating unitaries | Hamiltonian evolution and expectation values | Depth, landscape, and shot cost |
| Noisy training | Interleaved channels | Kraus maps and process tomography | Learned model differs from ideal circuit |
| Fault-tolerant QML | Algorithmic subroutines | Phase estimation, amplitude estimation, entropy | Input/output assumptions dominate |
Worked example 1: Parameter-shift gradient for one qubit
Problem. Let
$$f(\theta) = \langle 0 |\, R_y(\theta)^\dagger\, Z\, R_y(\theta)\, | 0 \rangle, \qquad R_y(\theta) = e^{-i\theta Y/2}.$$
Compute $f(\theta)$ and verify the parameter-shift gradient at $\theta = \pi/2$.
Method.
- Apply the rotation: $R_y(\theta)\,|0\rangle = \cos(\theta/2)\,|0\rangle + \sin(\theta/2)\,|1\rangle$.
- The expectation is probability of $|0\rangle$ minus probability of $|1\rangle$: $f(\theta) = \cos^2(\theta/2) - \sin^2(\theta/2) = \cos\theta$.
- Differentiate analytically: $f'(\theta) = -\sin\theta$.
At $\theta = \pi/2$, $f'(\pi/2) = -1$.
- Apply the parameter-shift rule: $f'(\theta) = \frac{1}{2}\left[f(\theta + \pi/2) - f(\theta - \pi/2)\right]$.
- Evaluate the two shifted terms: $f(\pi) = \cos\pi = -1$ and $f(0) = \cos 0 = 1$.
- Subtract and divide by $2$: $\frac{1}{2}(-1 - 1) = -1$.
Answer. The parameter-shift estimate equals the analytic derivative, $-1$. The checked condition is that $R_y(\theta) = e^{-i\theta Y/2}$ has a Pauli generator $Y$ with the required two-eigenvalue spectrum $\{+1, -1\}$.
Worked example 2: A two-point quantum kernel
Problem. Use the one-qubit feature map
$$|\phi(x)\rangle = R_y(x)\,|0\rangle = \cos(x/2)\,|0\rangle + \sin(x/2)\,|1\rangle$$
and compute the kernel value $k(x_1, x_2)$ for $x_1 = 0$ and $x_2 = \pi/2$.
Method.
- For $x_1 = 0$, $|\phi(0)\rangle = |0\rangle$.
- For $x_2 = \pi/2$, $|\phi(\pi/2)\rangle = \cos(\pi/4)\,|0\rangle + \sin(\pi/4)\,|1\rangle = \frac{1}{\sqrt{2}}\big(|0\rangle + |1\rangle\big)$.
- Compute the inner product: $\langle \phi(0) | \phi(\pi/2) \rangle = \frac{1}{\sqrt{2}}$.
- Square the magnitude: $k(0, \pi/2) = \left|\frac{1}{\sqrt{2}}\right|^2 = \frac{1}{2}$.
- Check with density operators. Since $\rho(0) = |0\rangle\langle 0|$ and $\rho(\pi/2) = \frac{1}{2}(I + X)$, $\mathrm{Tr}\!\left[\rho(0)\,\rho(\pi/2)\right] = \frac{1}{2}$.
Answer. The kernel value is $k(0, \pi/2) = \frac{1}{2}$. The states are neither identical nor orthogonal, so the kernel lies strictly between $0$ and $1$.
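The overlap and density-operator forms of this kernel agree numerically; the sketch below uses the example inputs $x_1 = 0$ and $x_2 = \pi/2$ with the $R_y$ feature map as assumptions.

```python
import numpy as np

def phi(x):
    # Feature map |phi(x)> = Ry(x)|0>.
    return np.array([np.cos(x / 2), np.sin(x / 2)], dtype=complex)

# Overlap form: k = |<phi(x1)|phi(x2)>|^2.
k = abs(np.vdot(phi(0.0), phi(np.pi / 2))) ** 2

# Density-operator form: k = Tr[rho(x1) rho(x2)].
rho1 = np.outer(phi(0.0), phi(0.0).conj())
rho2 = np.outer(phi(np.pi / 2), phi(np.pi / 2).conj())
k_dm = float(np.real(np.trace(rho1 @ rho2)))
```

Both evaluate to $1/2$, matching the hand calculation.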
Code
This NumPy example trains a one-qubit variational classifier with parameter-shift gradients and evaluates the noisy prediction using a simple depolarizing channel in density-matrix notation. The gradient applies the parameter-shift rule to each prediction and then the chain rule for the squared loss; shifting the parameter inside the loss itself would be wrong, because the loss is not a single expectation value of the required form.
import numpy as np

I = np.eye(2)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
ZERO = np.array([[1.0], [0.0]], dtype=complex)
RHO0 = ZERO @ ZERO.conj().T

def ry(theta):
    c = np.cos(theta / 2)
    s = np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def depolarize(rho, p):
    # Depolarizing channel: E(rho) = (1 - p) rho + p I/2.
    return (1 - p) * rho + p * I / 2

def prediction(x, theta, noise=0.0):
    # f(x, theta) = Tr[Z E(Ry(theta) Ry(x) |0><0| Ry(x)^dag Ry(theta)^dag)].
    u = ry(theta) @ ry(x)
    rho = depolarize(u @ RHO0 @ u.conj().T, noise)
    return float(np.real(np.trace(Z @ rho)))

def loss(xs, ys, theta, noise=0.0):
    # Mean squared error between predictions and labels.
    return np.mean([(prediction(x, theta, noise) - y) ** 2
                    for x, y in zip(xs, ys)])

def gradient(xs, ys, theta, noise=0.0):
    # Chain rule: dL/dtheta = mean of 2 (f - y) df/dtheta, with df/dtheta
    # from the parameter-shift rule applied to each prediction.
    grads = []
    for x, y in zip(xs, ys):
        df = 0.5 * (prediction(x, theta + np.pi / 2, noise)
                    - prediction(x, theta - np.pi / 2, noise))
        grads.append(2.0 * (prediction(x, theta, noise) - y) * df)
    return float(np.mean(grads))

xs = np.linspace(-1.0, 1.0, 9)
ys = np.where(xs >= 0, 1.0, -1.0)
theta = 0.0
for _ in range(50):
    theta -= 0.2 * gradient(xs, ys, theta, noise=0.03)
print(f"theta={theta:.3f} noisy_loss={loss(xs, ys, theta, noise=0.03):.3f}")
for x in xs:
    print(f"x={x:+.2f} prediction={prediction(x, theta, noise=0.03):+.3f}")
Common pitfalls
- Assuming QML means automatic speedup. A quantum model must beat classical baselines under the same data, tuning, and resource accounting.
- Ignoring data encoding. Loading a large classical dataset into amplitudes can cost more than the intended speedup saves.
- Treating an ideal circuit as the implemented model. Hardware realizes noisy channels, not exact unitaries.
- Reporting training accuracy without generalization. A circuit can fit a small dataset without providing useful inductive bias.
- Overusing the phrase "quantum neural network." The circuit architecture, loss, observables, and data map matter more than the analogy.
- Neglecting shot noise. Gradients and losses estimated from finite measurements have variance.
- Choosing overly expressive ansatzes. Random deep circuits can suffer barren plateaus and become trainability failures.
- Treating separability as generalization. A feature map that separates the training set may still fail on unseen data.
- Comparing against weak classical baselines. Kernel methods, tensor networks, randomized features, and modern neural networks are serious competitors.
- Hiding optimizer cost. Many QML experiments spend substantial classical time on tuning, restarts, and learning-rate choices.
Connections
- Quantum algorithms supplies phase estimation, amplitude amplification, HHL-style linear algebra, and oracle models.
- Quantum hardware determines circuit depth, noise channels, measurement budget, and connectivity.
- Quantum error correction separates NISQ QML from future fault-tolerant QML.
- Machine learning provides kernels, generalization, optimization, model selection, and baseline discipline.
- Deep learning is the natural benchmark for claims involving high-dimensional data and learned representations.
- Linear algebra supplies Hilbert spaces, kernels, eigensystems, matrix conditioning, and tensor products.
- Quantum communication connects when QML models process distributed quantum states or learned channels.
- Quantum mechanics supplies density operators, observables, measurement, and open-system language.
Further reading
- Michael A. Nielsen and Isaac L. Chuang, Quantum Computation and Quantum Information, Chapters 2, 8, 11, and 12 for the notation used here.
- Maria Schuld and Francesco Petruccione, Supervised Learning with Quantum Computers.
- Jacob Biamonte and collaborators, review on quantum machine learning.
- Edward Farhi, Jeffrey Goldstone, and Sam Gutmann, QAOA.
- Jarrod McClean and collaborators, barren plateaus in quantum neural network training landscapes.
- Maria Schuld, Ryan Sweke, and Johannes Meyer, effect of data encoding on expressive power.
- Vojtech Havlicek and collaborators, supervised learning with quantum-enhanced feature spaces.
- John Preskill, writing on NISQ computing and near-term quantum devices.