logo
Loading...
Polylogarithmic-depth controlled-NOT gates without ancilla qubits

Computer Science

Polylogarithmic-depth controlled-NOT gates without ancilla qubits

B. Claudon, J. Zylberman, et al.

Discover groundbreaking circuits for *n*-control-NOT gates developed by Baptiste Claudon, Julien Zylberman, César Feniou, Fabrice Debbasch, Alberto Peruzzo, and Jean-Philip Piquemal. With impressive depth efficiency and potential for exponential speedup, these innovations could revolutionize several quantum algorithms, enhancing their performance remarkably.... show more
Introduction

The paper addresses the longstanding challenge of efficiently decomposing multi-controlled NOT (MCX) gates, which are key primitives in many quantum algorithms including qubitisation within the quantum singular value transformation, Hamiltonian simulation, quantum search and phase estimation. Prior approaches typically achieve linear or quadratic depth, often requiring ancilla qubits or accepting approximations. The authors focus on optimizing circuit depth (a proxy for algorithm runtime) while considering circuit size and ancilla counts, distinguishing between zeroed ancillae (initialised in |0⟩) and borrowed ancillae (in arbitrary states but restored after use). The central research question is whether C^(n)(X) can be decomposed with polylogarithmic depth, ideally with minimal ancilla overhead, and how such decompositions trade off ancilla availability versus depth. The study proposes three constructions that achieve polylogarithmic depth scaling, including an exact borrowed-ancilla method, an ancilla-free approximate method, and an exact adjustable-depth method leveraging a tunable number of zeroed ancillae, thereby promising significant reductions in depth for numerous quantum algorithms.

Literature Review

Foundational linear-depth constructions for MCX date to Barenco et al. (1995), which used ancillae or approximations, with the first exact ancilla-free method exhibiting quadratic depth. Subsequent works achieved exact linear depth and quadratic size. In 2015, Gidney gave an exact, ancilla-free linear-size method but with a large depth constant; nonetheless, it has been used in later applications. A 2017 result demonstrated logarithmic depth for MCX at the cost of using as many zeroed ancillae as controls, motivating trade-offs between ancilla count and depth, including computational approaches suggesting each zeroed ancilla reduces depth. Reviews of MCX cost and target-register strategies have been provided. The authors specifically focus on n > 2 controls, comparing metrics of depth, size (in single-qubit and CNOT basis), and ancilla counts, and emphasize the practical availability of borrowed ancillae compared to zeroed ancillae in many computations.

Methodology

The authors introduce three circuit identities to implement C^(n)(X):

  1. Polylogarithmic-depth exact circuit using one borrowed ancilla (Proposition 1): The construction first develops a zeroed-ancilla version that reduces an n-controlled operation into five layers whose depths match those of quadratically smaller operations. The control register R of n qubits is partitioned with p = ceil(√n) into subregisters R_0 of size up to 2p and R_i (i = 1..p) of size at most p, with a further split of R_0 and appropriate positioning of control qubits between subregisters. Using a single zeroed ancilla a, the circuit prepares a conditional flag when R_0 is active, and implements blocks C_{R_i→t} that flip the target conditioned on a and matching activation states across subregisters, ensuring correctness and uncomputation. The zeroed ancilla is then transformed into a borrowed ancilla via a layered construction U = C_b C_{b-1} ... C_0 C U_0, which guarantees that the ancilla is restored for all activation patterns while implementing the desired C^(n)(X). For recursion, each block (e.g., C^p(X), C^{p+1}(X), C^b(X)) is implemented with its own locally borrowed ancilla since disjoint supports provide enough borrowable qubits. The depth satisfies the recurrence D_n = 2 D_{2p} + 4 D_p + 2 D_{b+1} + 4 with p = floor(√n) and p − 2 ≤ b ≤ p. Letting D(k) = D_{2^k}, the Master Theorem yields D(k) ≤ 8 D(k/2) + 4, giving D_n ∈ O(log(n)^3); the size scales as O(n log n). Corollary constructions show how to control arbitrary unitaries: with one zeroed ancilla, C^(n)(U) decomposes into two C^(n)(X) and one C(U) using a square-root trick, yielding depth O(S + log n) and size O(S + n log n) for a size-S unitary U, and controlled SU(2) single-qubit gates C^(n)(W) achieve depth O(log(n)^3) and size O(n log n) without ancilla (via standard SU(2) identities).
  2. Polylogarithmic-depth ancilla-free approximate circuit (Proposition 2): To control a single-qubit U without ancillae, the method recursively decomposes a C^(n)(U) into (n−1)-controlled unitaries using a square-root ladder (each step requiring controlled roots V_k of U). Applying the recursion k times gives 2^k one-controlled roots and 2^k MCX gates. To avoid linear depth, the (n−k)-controlled V_k gate is neglected, incurring error bounded by ||C^{(n−k)}(V_k) − C^{(n−k)}(I)|| ≤ π / 2^k. Choosing k = ceil(log(π/ε)) yields spectral-norm error ε. Each resulting MCX (with j ≤ n−1 controls) can borrow an unaffected qubit and be implemented with the polylog-depth method (Proposition 1). This gives depth O(log(n)^3 log(1/ε)) and size O(n log n log(1/ε)) without ancilla qubits.
  3. Adjustable-depth exact circuit with m zeroed ancillae (Proposition 3): For 2 ≤ m ≤ n (even m and n divisible by m/2 for simplicity), divide the n control qubits into m/2 balanced subregisters. Step 1: in parallel, compute the activation of each subregister into m/2 ancilla qubits using C^{n/(m/2)}(X), each implemented with depth O(log(n/(m/2))^3) by Proposition 1 using one of the remaining m/2 zeroed ancillae. Step 2: apply a C^{m/2}(X) to the target, implemented in depth O(log(m/2)) using m/2 ancillae via a logarithmic-depth method. Step 3: uncompute to reset ancillae. In general, with arbitrary n and m, the first-stage operations are performed in parallel with depth equal to the maximum among them; the second stage has depth bounded by 16 ceil(log(m/2)) + 12. Overall depth is O(log(n/(m/2))^3 + log(m/2)). This provides a tunable trade-off between ancilla count and depth.
Key Findings
  • Proposition 1: Exact implementation of C^(n)(X) with one borrowed ancilla achieves depth O(log(n)^3) and size O(n log n). Numerical fits indicate depth ≈ 43 log(n)^3 − 1287 with one borrowed ancilla (Table 2), and ≈ 27 log(n)^3 − 808 with one zeroed ancilla (Table 3).
  • Corollary 1: For a unitary U of size S, C^(n)(U) with one zeroed ancilla has depth O(S + log n) and size O(S + n log n).
  • Corollary 2: For W ∈ SU(2), C^(n)(W) can be implemented without ancilla with depth O(log(n)^3) and size O(n log n).
  • Proposition 2: Ancilla-free approximate control of a single-qubit U achieves depth O(log(n)^3 log(1/ε)) and size O(n log n log(1/ε)); Table 1 reports a fitted depth of [log(π/ε)](86 log(n)^3 − 2564) for the implementation considered, outperforming prior ancilla-free methods for n ≥ 10^5 controls.
  • Proposition 3: Exact adjustable-depth C^(n)(X) with 2 ≤ m ≤ n zeroed ancillae has depth O(log(n/(m/2))^3 + log(m/2)), decreasing as m increases. Table 3 shows this approach reaches depths competitive with the best prior methods using far fewer zeroed ancillae; when m = n, it matches the logarithmic-depth method of He et al.
  • Comparative performance: Against Barenco’s and Gidney’s methods, the proposed constructions yield superpolynomial (exponential) depth speedups asymptotically. With one borrowed ancilla, advantage becomes evident for n ≳ 10^2; without ancilla (approximate), Proposition 2 is shallower for n ≳ 10^5. The controlled-SU(2) depth via these techniques can have leading term 76 log(n)^3 through optimized use of the borrowed-ancilla MCX building blocks.
  • Algorithmic impact: In QSVT and qubitisation oracles, PREPARE depth improves from O(s n) to O(s log(n)^3); SELECT depth reduces to O(s log(log(s))^3 C) from O(s log(s)^2 C), suggesting broad benefits across Hamiltonian simulation, QPE, search, and state preparation.
Discussion

The results directly address the core objective of reducing the depth of multi-controlled operations. By introducing a recursive partitioning and borrowed-ancilla technique, Proposition 1 achieves polylogarithmic depth while maintaining exactness and using only one borrowed ancilla, making it practical in scenarios where unaffected qubits can serve as temporary workspace. Proposition 2 shows that ancilla-free control of single-qubit unitaries can be achieved with polylogarithmic depth in both n and 1/ε by strategically truncating the control-root decomposition, aligning approximation error with hardware error budgets. Proposition 3 offers a flexible, exact construction that interpolates between polylogarithmic and logarithmic depth as more zeroed ancillae are available, enabling adaptation to hardware resource constraints. These advances translate into significant depth reductions in key algorithmic oracles in the QSVT framework, notably improving PREPARE and SELECT, and thereby the overall depth of Hamiltonian simulation and QPE pipelines. The paper demonstrates both asymptotic advantages and concrete depth fits, highlighting regimes where the methods surpass state-of-the-art constructions. The findings suggest that prioritizing depth even at the cost of modest size overhead can be advantageous in fault-tolerant settings where parallelism is critical.

Conclusion

The paper introduces three new decompositions for multi-controlled NOT gates: (1) an exact polylogarithmic-depth method using one borrowed ancilla, (2) an ancilla-free approximate method with polylogarithmic depth in both the number of controls and the inverse error, and (3) an exact adjustable-depth method that exploits m zeroed ancillae to reduce depth to O(log(n/(m/2))^3 + log(m/2)). These constructions yield superpolynomial depth improvements over prior leading methods and deliver practical benefits in algorithmic primitives such as PREPARE and SELECT within QSVT, thereby impacting Hamiltonian simulation, QPE, search, and state preparation. Future work includes exploring broader oracle design strategies inspired by these circuit identities, optimizing T-count and non-Clifford resource overheads, and integrating the methods into full-stack compiler toolchains to exploit hardware-specific parallelism and connectivity constraints.

Limitations
  • Increased circuit size: While depth decreases superpolynomially, the total gate count increases by a polylogarithmic factor, potentially elevating non-Clifford (e.g., T, Toffoli) counts. In fault-tolerant implementations, magic state distillation costs can dominate and may limit practical speedups.
  • Approximation trade-offs: The ancilla-free method relies on truncating high-order controlled roots, introducing a controlled spectral-norm error ε. Selecting ε consistent with hardware noise is essential; overly stringent ε may negate depth benefits.
  • Hardware precision and scalability: For very large n (e.g., n ≥ 10^6), achieving errors below 2^(−n) is likely infeasible even with error correction, and in some applications it may be preferable to omit extremely small-phase controlled operations. Conversely, omitting MCX in sparse state preparation can significantly affect correctness, so applicability is problem dependent.
  • Ancilla availability and layout: The borrowed-ancilla approach assumes the presence of unaffected qubits for temporary use and benefits from disjoint supports; limited connectivity or layout constraints may reduce parallelism and increase constant factors.
Listen, Learn & Level Up
Over 10,000 hours of research content in 25+ fields, available in 12+ languages.
No more digging through PDFs, just hit play and absorb the world's latest research in your language, on your time.
listen to research audio papers with researchbunny