
Computer Science
Quantum compiling by deep reinforcement learning
L. Moro, M. G. A. Paris, et al.
Lorenzo Moro, Matteo G. A. Paris, Marcello Restelli, and Enrico Prati present an approach to quantum compiling based on deep reinforcement learning (DRL). The method learns to approximate single-qubit unitaries efficiently and drastically reduces execution time, potentially enabling real-time applications.
Introduction
Quantum computation relies on unitary matrices acting on qubits. However, practical gate-model quantum computers offer limited transformations due to architectural constraints. Computation is achieved through quantum gate circuits, i.e., ordered sequences of unitary operators. While the Solovay-Kitaev theorem guarantees that any computation can be approximated within arbitrary tolerance using a finite gate set, finding an optimal sequence is challenging; this is the problem known as quantum compiling. Quantum compilers balance sequence length (ideally short), precompilation time (algorithm preparation time), and execution time (sequence generation time). Previous work, often based on the Solovay-Kitaev theorem, yielded algorithms with polylogarithmic scaling of length and execution time (O(log<sup>c</sup>(1/ε)), where ε is the accuracy and c is between 3 and 4). For example, the Dawson-Nielsen (DNSK) algorithm achieves length O(log<sup>3.97</sup>(1/ε)) and time O(log<sup>2.71</sup>(1/ε)). While optimizations are possible by choosing specific gate sets, they often increase precompilation time. Hybrid approaches combining planning algorithms and deep neural networks offer improvements but may suffer from suboptimal scaling of execution time at high accuracy. No algorithm can use fewer than O(log(1/ε)) gates. Existing compilers, with their high precompilation and execution times, are impractical for online operations.

Deep reinforcement learning (DRL), which uses deep neural networks to learn optimal policies for decision-making problems, offers an alternative approach. Effective in high-dimensional control and with limited prior knowledge, DRL has shown promise in quantum system control, including qubit control and initialization using continuous and digital pulse sequences, optimizing quantum computation speed and fidelity, and quantum gate control. This work leverages DRL to approximate single-qubit unitary operators as circuits built from an arbitrary initial set of elementary quantum gates. The training strategy generates a uniform distribution of single-qubit unitary matrices as training targets, allowing the agent to learn a policy without explicit instructions on how to approximate them.
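To make the compiling problem concrete, the following minimal NumPy sketch (an illustration, not code from the paper) composes gates drawn from a finite base into a circuit and measures how far the result is from a target unitary. The ±π/128 rotation base and the fidelity-based distance are assumptions chosen to match the setup described in the Methodology below.

```python
import numpy as np

# Pauli matrices and a generic single-qubit rotation exp(-i*theta/2 * sigma).
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def rotation(sigma, theta):
    return np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * sigma

# Example gate base B: +/- pi/128 rotations about x, y and z ("small rotations").
theta = np.pi / 128
B = [rotation(s, sign * theta) for s in (X, Y, Z) for sign in (+1, -1)]

def distance(U, V):
    # Illustrative metric: 1 - average gate fidelity between two unitaries
    # (the paper's exact distance measure may differ).
    d = 2
    return 1 - (abs(np.trace(U.conj().T @ V)) ** 2 + d) / (d * (d + 1))

def circuit_unitary(actions):
    # A circuit is an ordered list of indices into B; its unitary is the product.
    U_n = I2
    for a in actions:
        U_n = B[a] @ U_n
    return U_n

# Quantum compiling: find a short sequence of actions such that
# distance(circuit_unitary(actions), U_target) < eps.
```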
Literature Review
Existing quantum compiling algorithms, predominantly rooted in the Solovay-Kitaev theorem, struggle to balance sequence length, precompilation time, and execution speed. While algorithms such as Dawson-Nielsen achieve polylogarithmic scaling of length and execution time with respect to accuracy, they are computationally expensive, making real-time application challenging. Attempts to improve performance by selecting specialized gate sets often trade off against increased precompilation time. Hybrid approaches that incorporate planning algorithms and neural networks show potential but often encounter scalability issues in execution time, particularly at higher accuracy levels. This research gap highlights the need for an alternative approach that offers both efficiency and speed suitable for real-time quantum computation. Deep reinforcement learning (DRL) emerges as a promising candidate due to its capacity to handle high-dimensional spaces and learn optimal control strategies without exhaustive prior knowledge. The successful application of DRL in related fields such as quantum system control and optimization further justifies its exploration for quantum compiling.
Methodology
The proposed method frames quantum compiling as a deep reinforcement learning problem. The environment is a quantum circuit initialized as the identity at the beginning of each episode. The agent incrementally builds the circuit by selecting gates from a predefined universal gate set B at each time step, so the agent's actions correspond to gates in B. The observation at time step n is a vector of the real and imaginary parts of the matrix O<sub>n</sub>, where U = U<sub>n</sub>O<sub>n</sub> (U is the target unitary matrix and U<sub>n</sub> the circuit built so far). This observation provides the information needed to generate an approximating sequence. Two reward functions are designed: a dense reward for quasi-continuous gate sets (small rotations) and a sparse reward for discrete gate sets. The dense reward is r(S<sub>n</sub>, a<sub>n</sub>) = (L - n) + 1 if d(U<sub>n</sub>, U<sub>t</sub>) < ε, and -d(U<sub>n</sub>, U<sub>t</sub>)/L otherwise, where L is the maximum episode length, a<sub>n</sub> is the action, S<sub>n</sub> is the environment state, and d(U<sub>n</sub>, U<sub>t</sub>) is the distance between the target and the approximation at time n. The sparse reward is binary: 0 if d(U<sub>n</sub>, U<sub>t</sub>) < ε, and -1/L otherwise. Deep Q-Learning (DQL) and Proximal Policy Optimization (PPO) are used: DQL for the sparse reward, which requires an off-policy method, and PPO for its robustness. For the small-rotation experiments, the target U<sub>t</sub> is generated as a composition of 87 small rotations around the x, y, and z axes by π/128, and the DQL agent uses the dense reward to learn to approximate this target. In the Haar random unitary matrix experiments, PPO is used with Haar unitary matrices as targets to train the agent to generalize to a wider range of unitary transformations. For the HRC base of gates, the sparse reward is used in conjunction with DQL and Hindsight Experience Replay (HER) to guide the agent in the high-dimensional unitary space.
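As a sketch of how such an environment could be coded (an illustration under assumed values for ε and L, not the authors' implementation; `base` is a list of gate matrices such as `B` above, and `distance` a metric such as the fidelity-based one above):

```python
import numpy as np

class CompilerEnv:
    """Sketch of the compiling environment: start from the identity, append one
    gate from the base per step, stop once d(U_n, U_t) < eps or after L steps."""

    def __init__(self, base, target, distance, eps=1e-2, L=300, dense=True):
        self.base, self.target, self.distance = base, target, distance
        self.eps, self.L, self.dense = eps, L, dense

    def reset(self):
        self.n = 0
        self.U_n = np.eye(2, dtype=complex)   # circuit built so far
        return self._obs()

    def _obs(self):
        # Observation: real and imaginary parts of O_n, where U_t = U_n O_n,
        # i.e. O_n = U_n^dagger U_t (the part still missing to reach the target).
        O_n = self.U_n.conj().T @ self.target
        return np.concatenate([O_n.real.ravel(), O_n.imag.ravel()])

    def step(self, action):
        self.n += 1
        self.U_n = self.base[action] @ self.U_n
        dist = self.distance(self.U_n, self.target)
        solved = dist < self.eps
        if self.dense:                          # dense reward (quasi-continuous base)
            reward = (self.L - self.n) + 1 if solved else -dist / self.L
        else:                                   # sparse reward (discrete base, e.g. HRC)
            reward = 0.0 if solved else -1.0 / self.L
        done = solved or self.n >= self.L
        return self._obs(), reward, done, {"distance": dist}
```

A DQL or PPO agent would then map the 8-dimensional observation to one of len(base) actions; at test time the trained greedy policy emits the approximating sequence one gate per step, which is what keeps the execution time proportional to the sequence length.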
Key Findings
The DRL agents successfully learn to approximate single-qubit unitary transformations. Using six small rotations (π/128) around the Bloch sphere axes, the DQL agent achieves an average gate fidelity (AGF) of approximately 0.9999, with execution time scaling empirically as O(log<sup>1.25</sup>(1/(1 - ε))). Using Haar random unitary matrices as training targets and the PPO algorithm, the agent achieves AGF ≈ 0.99. For the HRC efficiently universal gate set, the DQL agent with HER achieves AGF ≈ 0.99, generating shorter circuits than in the rotations case. The length distribution of solutions shows that the HRC base yields significantly shorter sequences than the small-rotations base. The performance of the trained agents is evaluated on a validation set of 10<sup>6</sup> Haar unitary matrices; both algorithms achieve a success rate exceeding 95%. The empirically determined scaling of sequence length with tolerance for the small-rotation experiment is approximately O(log<sup>1.25</sup>(1/ε)). The execution time of the DRL compiler is fast (5.4·10<sup>-4</sup> s per time step on a single CPU core), potentially suitable for real-time applications. The method was also tested on two-qubit gates, showing consistent performance.
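The Haar-random targets used for training and validation can be sampled with the standard QR-based construction; the sketch below is illustrative (with a far smaller set than the paper's 10<sup>6</sup> validation matrices), and the function and variable names are my own.

```python
import numpy as np

def haar_unitary(dim=2, rng=None):
    """Sample a Haar-random unitary via QR decomposition of a complex Gaussian matrix."""
    rng = np.random.default_rng() if rng is None else rng
    G = (rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim))) / np.sqrt(2)
    Q, R = np.linalg.qr(G)
    # Normalise the phases of R's diagonal so the distribution is exactly Haar.
    phases = np.diagonal(R) / np.abs(np.diagonal(R))
    return Q * phases   # equivalent to Q @ diag(phases)

# Illustrative validation set of single-qubit targets.
rng = np.random.default_rng(0)
validation_targets = [haar_unitary(2, rng) for _ in range(1000)]
```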
Discussion
The results demonstrate the effectiveness of DRL as a flexible quantum compiler applicable to various gate bases, unlike existing methods, which are often limited to specific gate sets. The DRL approach avoids the need to design tailored algorithms for each base, providing a general strategy that is learned once and then applied to approximate any single-qubit unitary matrix within the specified tolerance. This generalizability is a significant advantage over traditional algorithms. The fast execution time, which scales proportionally to the sequence length, highlights the potential for real-time quantum compilation, a crucial aspect for many quantum computing applications. The ability to reach high accuracy (AGF ≈ 0.9999) with relatively short sequences is a strong indicator of the method's efficiency. The extension of this methodology to two-qubit gates further supports its broad applicability and scalability to larger systems.
Conclusion
This paper introduces a novel deep reinforcement learning approach to quantum compiling, successfully demonstrating its ability to approximate single-qubit unitary transformations with high fidelity and efficiency. The method's flexibility, adaptability to different gate bases, and fast execution time suggest its suitability for real-time quantum computation. Future research should focus on scaling the approach to larger numbers of qubits and integrating hardware-specific constraints into the DRL environment.
Limitations
The DRL method does not guarantee finding a solution for every unitary target, although success rates in the experiments were high. The performance of the DRL compiler is highly dependent on the choice of reward function and the hyperparameters. While the method has shown good scalability to two qubits, further research is needed to determine its performance and limitations on larger qubit systems and more complex quantum algorithms.