Physics
Fundamental limits of quantum error mitigation
R. Takagi, S. Endo, et al.
This research by Ryuji Takagi, Suguru Endo, Shintaro Minagawa, and Mile Gu establishes fundamental limits on quantum error mitigation, quantifying how far such techniques can suppress the effects of computational errors. The study shows that, for local depolarizing noise in layered circuits, the sampling overhead of error mitigation scales exponentially with circuit depth, and it establishes a universal benchmark for future error mitigation strategies.
~3 min • Beginner • English
Introduction
Noisy intermediate-scale quantum (NISQ) devices promise advantages with tens to hundreds of qubits, but unavoidable noise in gates accumulates and threatens practical utility. While quantum error correction can, in principle, suppress errors indefinitely, it requires adaptive operations often unavailable on NISQ hardware. This has spurred non-adaptive quantum error mitigation (QEM) techniques, such as zero-noise extrapolation, probabilistic error cancellation, and virtual distillation, that use repeated runs of noisy devices and classical post-processing to suppress error effects. Despite many protocol-specific studies, a fundamental question remains: what are the ultimate limits of QEM, irrespective of the specific algorithm? Motivated by an analogy to Carnot's theorem in thermodynamics, the authors seek universal performance bounds that any non-adaptive QEM protocol must obey. They formalize QEM, introduce a universal performance metric (the maximum estimator spread, which governs the sampling overhead for a target accuracy), and derive lower bounds on this spread from how noise degrades state distinguishability. They then apply these bounds to layered circuits with local depolarizing noise and to local dephasing noise, revealing exponential sampling overhead with depth and demonstrating the optimality of probabilistic error cancellation for local dephasing noise.
Literature Review
The paper surveys key non-adaptive QEM approaches proposed for NISQ devices: zero-noise extrapolation (including Richardson extrapolation and variants), probabilistic error cancellation (a stochastic reversal of noise using quasi-probability representations), and virtual distillation (using multiple copies and coherent operations to suppress mixedness). It references applications in variational quantum algorithms and quantum chemistry, and notes prior observations of exponential sampling overhead in particular methods. It also situates the work within broader concepts of state distinguishability under restricted measurements and data-processing inequalities, highlighting the lack of universal, method-agnostic performance limits prior to this work.
Methodology
The authors formalize QEM protocols without adaptive quantum operations. For an ideal computation producing a state ρ and an observable A (rescaled to satisfy −1/2 ≤ A ≤ 1/2), only distorted states εi(ψ) are accessible. A QEM strategy collects N such distorted states and applies a non-adaptive physical process P that outputs a classical estimator EA of Tr(Aρ). They define (Q, K)-error mitigation: the N = KQ distorted states are partitioned into K groups of size Q; for each group k, a (possibly entangling) local POVM Mk acts on its Q inputs to yield an outcome ℓk, and a deterministic classical function e(ℓ1,…,ℓK) produces a single sample of EA per round.

Performance is characterized by (i) the bias bA = ⟨EA⟩ − Tr(Aρ), with worst case bmax taken over all observables A in [−1/2, 1/2] and all states of interest, and (ii) the maximum estimator spread Δemax, the worst-case range of EA over all A and measurement outcomes. Via Hoeffding's inequality, the number of rounds M needed for additive error δ and failure probability ε scales as M ≥ [log(2/ε)]/(2δ²) × (Δemax)², so Δemax directly quantifies the sampling overhead.

They derive fundamental lower bounds on Δemax by relating QEM to quantum channels and invoking data-processing inequalities for distinguishability, using the trace distance Dtr and a local distinguishability measure DLM (optimized over product POVMs consistent with the (Q, K) structure). The key result (Theorem 1) lower-bounds Δemax, for any (Q, K)-protocol with maximum bias bmax, in terms of how much the available noise channels reduce the distinguishability between pairs of states under the allowed local measurements; a corollary translates this into a bound on the sampling cost M for a target accuracy given bmax. A refined analysis covers practically relevant constraints (e.g., limited coherent interaction size Q), and for layered circuits with local depolarizing noise acting layer by layer, the bounds imply that Δemax grows exponentially with circuit depth L, establishing an unavoidable exponential sampling overhead for general QEM protocols in this setting (Theorem 3). The framework encompasses standard QEM methods: probabilistic error cancellation as (1,1), Richardson extrapolation as (1,R+1), and R-copy virtual distillation as (R,1). The Methods section gives a precise channel-based definition, sketches the proof linking estimator spread to DLM via data processing, and discusses measurable proxies (e.g., subfidelity via SWAP-like tests) for evaluating the bounds on hardware.
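As a rough illustration of how the spread feeds into sampling cost, the short sketch below evaluates the Hoeffding-style relation M ≥ [log(2/ε)]/(2δ²) × (Δemax)² numerically; the function name and the example numbers are illustrative choices, not taken from the paper.

```python
import math

def required_rounds(delta_e_max: float, delta: float, epsilon: float) -> int:
    """Smallest integer M satisfying the Hoeffding-style lower bound
    M >= log(2/epsilon) / (2 * delta**2) * delta_e_max**2, where
    delta_e_max is the maximum estimator spread, delta the target additive
    accuracy beyond the bias b_max, and epsilon the failure probability."""
    return math.ceil(math.log(2.0 / epsilon) / (2.0 * delta**2) * delta_e_max**2)

# Example: a spread of 10 with delta = 0.01 and epsilon = 0.05 already
# demands on the order of 1.8 million rounds.
print(required_rounds(delta_e_max=10.0, delta=0.01, epsilon=0.05))
```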
Key Findings
- Universal lower bounds: For any non-adaptive (Q, K)-error mitigation protocol with worst-case bias bmax, the maximum estimator spread Δemax is lower-bounded by quantities determined by noise-induced reductions in state distinguishability (via local distinguishability DLM aligned with the (Q, K) constraint). This yields protocol-agnostic limits that no QEM method can surpass.
- Sampling cost relation: Hoeffding-based bounds imply the number of rounds M required to achieve additive error (bmax + δ) with failure probability ≤ ε scales as M ≥ [log(2/ε)]/(2δ²) × (Δemax)². Thus, increased bias tolerance can trade off with reduced sampling cost, and greater noise-induced indistinguishability forces larger sampling overhead.
- Exponential overhead with depth: For layered circuits with local depolarizing noise, Δemax (hence sampling overhead) grows exponentially with circuit depth L for general QEM protocols (Theorem 3), confirming that exponential overhead observed in specific methods reflects a fundamental limitation.
- Optimality of probabilistic error cancellation (PEC) for dephasing: In the setting of local dephasing noise on an arbitrary number of qubits and (1,1)-protocols with unbiased estimators (bmax = 0 given known noise), PEC's estimator spread equals its quasi-probability cost γ and attains the fundamental lower bound from Theorem 1, proving PEC optimal among unbiased (1,1) strategies for this noise model (a numerical sketch of γ and its growth with depth appears after this list). For global depolarizing noise, PEC is shown to be near-optimal: its spread matches the lower bound up to small O(ε) differences.
- Benchmarking other methods: For virtual distillation ((Q,1)) and Richardson extrapolation ((1,R+1)), the computed estimator spreads (for specific observables and states such as GHZ states) track the lower bounds closely in low-noise regimes; at higher noise levels they can deviate substantially from the bounds, consistent with the generality, and hence looseness, of the universal bounds.
- Practical evaluation: The paper outlines constant-depth destructive SWAP-like measurements for estimating subfidelity-related quantities (Tr(ρσ), Tr(ρσρσ)) so that the bounds can be evaluated and tightened experimentally (see the numerical cross-check below).
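As a concrete illustration of the dephasing and depth-scaling findings above, the sketch below uses the standard quasi-probability inversion of a single-qubit dephasing channel E(ρ) = (1−p)ρ + pZρZ, whose one-norm cost is γ = 1/(1−2p), and shows how the resulting PEC spread γ^(nL) grows exponentially with depth when every qubit in every layer is corrected independently. The decomposition is the textbook construction rather than the paper's exact notation, and the function names and parameters are illustrative.

```python
def gamma_dephasing(p: float) -> float:
    """Quasi-probability cost of inverting single-qubit dephasing
    E(rho) = (1-p) rho + p Z rho Z for p < 1/2.  The inverse map is
    E^{-1}(rho) = (1-p)/(1-2p) * rho - p/(1-2p) * Z rho Z,
    so the one-norm of the quasi-probabilities is gamma = 1/(1-2p)."""
    assert 0.0 <= p < 0.5
    return 1.0 / (1.0 - 2.0 * p)

def pec_spread(p: float, n_qubits: int, depth: int) -> float:
    """PEC estimator spread when each of the n_qubits * depth noise
    locations is inverted independently: gamma ** (n_qubits * depth).
    This is the exponential-in-depth sampling overhead in miniature."""
    return gamma_dephasing(p) ** (n_qubits * depth)

# 5 qubits, 1% dephasing per location: the spread (and hence the number of
# rounds, which scales with its square) blows up as the depth grows.
for depth in (10, 40, 160):
    print(depth, round(pec_spread(p=0.01, n_qubits=5, depth=depth), 2))
```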
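To make the practical-evaluation bullet concrete, here is a minimal NumPy cross-check of the two overlaps that such SWAP-like circuits are meant to estimate on hardware, computed directly from density matrices; the toy states are illustrative and not from the paper.

```python
import numpy as np

def overlap_terms(rho: np.ndarray, sigma: np.ndarray):
    """Return Tr(rho sigma) and Tr(rho sigma rho sigma), the subfidelity-
    related overlaps that constant-depth destructive SWAP-like circuits
    estimate; here they are evaluated classically as a sanity check."""
    rs = rho @ sigma
    return float(np.real(np.trace(rs))), float(np.real(np.trace(rs @ rs)))

# Toy example: the pure state |+><+| versus its dephased counterpart.
plus = np.full((2, 2), 0.5, dtype=complex)        # |+><+|
Z = np.diag([1.0, -1.0]).astype(complex)
p = 0.1
sigma = (1 - p) * plus + p * (Z @ plus @ Z)       # dephasing of strength p
print(overlap_terms(plus, sigma))                 # approximately (0.9, 0.81)
```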
Discussion
The bounds provide a universal metric, the maximum estimator spread, to assess QEM feasibility and resource needs. They formalize a trade-off between systematic bias and sampling overhead and connect mitigation cost to noise-induced loss of distinguishability under the accessible measurements. The results show that exponential growth of the sampling overhead with circuit depth under local depolarizing noise is intrinsic to general non-adaptive QEM, and that PEC is already optimal for local dephasing within unbiased (1,1)-protocols. Comparing a specific protocol's spread to the fundamental bound offers a benchmark to gauge its optimality and identify room for improvement. The work focuses on the number of rounds M for a fixed per-round sample size N; fully characterizing methods that leverage large N (e.g., highly non-linear post-processing) calls for further study of how the error scales with N or K. The framework and its distinguishability-based bounds also hint at connections to non-Markovian dynamics (where information backflow can affect distinguishability) and to quantum error correction, potentially illuminating the transition from mitigation to correction and informing hybrid suppression strategies.
Conclusion
This work introduces a general, non-adaptive framework for quantum error mitigation and establishes fundamental, protocol-independent lower bounds on the maximum estimator spread—and thus on sampling overhead—based on noise-induced reductions in state distinguishability under constrained measurements. Two key consequences are proven: (i) exponential scaling of sampling overhead with circuit depth for layered circuits with local depolarizing noise, and (ii) optimality of probabilistic error cancellation among unbiased (1,1)-protocols for local dephasing noise across any number of qubits. The framework serves as a universal benchmark to assess current and future QEM methods, clarifying what performance is physically unattainable and where improvements are possible. Future research directions include: tightening bounds and identifying more cases of optimality; analyzing full scaling with the number of processed distorted outputs per round (N or K); exploring implications for non-Markovian noise; and bridging insights between QEM and quantum error correction to characterize the mitigation–correction transition.
Limitations
- Worst-case metric: The bounds use maximum estimator spread Δemax, a worst-case measure; if the estimator variance is smaller in practice, actual sampling costs can be lower. However, predicting variance a priori is generally difficult.
- Focus on rounds M: Results primarily bound the number of rounds M for fixed per-round sample size N = KQ. Protocols that require large N and highly non-linear post-processing (e.g., exponential extrapolation, subspace expansion) may exhibit gaps between the bound on M and true total sampling cost.
- Non-adaptive constraint: The framework excludes adaptive quantum operations (i.e., standard error correction). While effective noise channels can incorporate some non-adaptive circuit modifications, conclusions do not apply to adaptive schemes.
- General bounds can be loose: Especially in high-noise regimes, observed estimator spreads for specific protocols (e.g., virtual distillation, extrapolation) may deviate significantly from the universal lower bounds, reflecting the bounds' broad generality.
- Model assumptions: Observables are rescaled to lie within [−1/2, 1/2], and the unbiasedness/bias bounds rely on prior knowledge of the noise in some analyses (e.g., PEC). Layered local depolarizing/dephasing models underpin certain explicit consequences (e.g., exponential depth scaling, PEC optimality).