Variational Monte Carlo with large patched transformers

K. Sprague and S. Czischek

Discover groundbreaking advancements in simulating qubit systems with transformer models as a wavefunction ansatz! This research by Kyle Sprague and Stefanie Czischek introduces large patched transformers that outperform traditional methods, opening the door to studying large Rydberg atom arrays and their fascinating phase transitions.

Introduction
The simulation of quantum many-body systems is a computationally challenging problem. Artificial neural networks (ANNs) have emerged as powerful tools for approximating wavefunctions, offering a variational approach to finding ground states. Various ANN architectures have been explored, including restricted Boltzmann machines, recurrent neural networks (RNNs), and PixelCNNs. Autoregressive networks such as RNNs and PixelCNNs are particularly promising because they encode wavefunctions efficiently and permit direct sampling, but they struggle with long-range correlations, especially in higher-dimensional systems. Transformer (TF) models, known for their ability to capture long-range dependencies in sequence data, present a potential solution. This paper investigates TFs as a wavefunction ansatz for variational ground-state searches, focusing on two-dimensional Rydberg atom arrays: experimentally controllable systems suited to quantum computation and simulation, which allow benchmarking against established quantum Monte Carlo (QMC) methods. The authors introduce a novel architecture, Large Patched Transformers (LPTFs), inspired by vision transformers, which improves the computational efficiency of TFs while maintaining accuracy by processing patches of qubits instead of individual qubits.
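For context, autoregressive ansätze such as these factorize the Born probability of a basis configuration σ = (σ₁, …, σ_N) into a product of conditionals, which is what allows exact, Markov-chain-free sampling from the network. In standard notation (assumed here, not copied from the paper):

p_\theta(\sigma) = \prod_{i=1}^{N} p_\theta(\sigma_i \mid \sigma_1, \ldots, \sigma_{i-1}), \qquad \psi_\theta(\sigma) = \sqrt{p_\theta(\sigma)},

where the real, non-negative square-root form suffices because the Rydberg Hamiltonian considered here is stoquastic, so its ground state can be chosen with non-negative amplitudes.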
Literature Review
The paper reviews existing literature on using ANNs for quantum many-body simulations. It highlights the successes and limitations of different architectures, including restricted Boltzmann machines, RNNs, and PixelCNNs. The limitations of RNNs and PixelCNNs in capturing long-range correlations are discussed, motivating the exploration of TF models as an alternative. Previous works utilizing TFs for quantum simulations are mentioned, emphasizing their potential to overcome the limitations of RNNs and PixelCNNs. The authors also draw inspiration from the success of vision transformers in image processing, which process patches of pixels to improve efficiency, leading to the development of the LPTF architecture.
Methodology
The study uses two-dimensional Rydberg atom arrays as the model system, governed by a Rydberg Hamiltonian with detuning, Rabi-oscillation, and van der Waals interaction terms (written out below). Both RNNs and TFs are implemented as autoregressive models whose outputs define probability distributions equal to the squared wavefunction amplitudes. The RNN processes qubits sequentially, while the TF uses a masked self-attention mechanism to capture all-to-all dependencies among previously generated qubits. Performance is benchmarked against QMC simulations based on the stochastic series expansion approach.

To address the computational cost of TFs, the authors introduce patched versions of both RNNs and TFs, in which the network processes patches of qubits instead of individual qubits; this shortens the sequence length and improves computational efficiency. The LPTF architecture then combines the two: a patched TF processes large patches, and its output is fed as the initial hidden state to a patched RNN that generates the contents of each large patch as smaller sub-patches. This hierarchical approach handles large patches efficiently while maintaining accuracy (see the sketch below).

All models (RNN, TF, patched RNN, patched TF, and LPTF) are trained by minimizing the energy expectation value with a variational Monte Carlo approach. Energy expectation values and variances computed from samples generated by the trained networks are compared to QMC results, and computational runtimes per training iteration and per generated sample are compared across models and system sizes. Different patch sizes and shapes are explored to optimize the LPTF, and the staggered magnetization serves as an order parameter to assess how accurately the models capture the phase transitions of the Rydberg system.
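In a common convention (the notation here is assumed rather than quoted from the paper), the Rydberg Hamiltonian for N atoms at positions r_i reads

H = -\frac{\Omega}{2} \sum_{i=1}^{N} \hat{\sigma}^x_i - \delta \sum_{i=1}^{N} \hat{n}_i + \sum_{i<j} \frac{\Omega R_b^6}{|\mathbf{r}_i - \mathbf{r}_j|^6} \hat{n}_i \hat{n}_j,

with Rabi frequency Ω, detuning δ, blockade radius R_b, and occupation operator n̂_i = (1 + σ̂^z_i)/2. The variational energy minimized during training is estimated from network samples through the local energy,

E_\theta = \mathbb{E}_{\sigma \sim p_\theta}\left[E_{\mathrm{loc}}(\sigma)\right], \qquad E_{\mathrm{loc}}(\sigma) = \sum_{\sigma'} H_{\sigma\sigma'} \, \frac{\psi_\theta(\sigma')}{\psi_\theta(\sigma)}.

The hierarchical hand-off in the LPTF can be illustrated with a minimal, hypothetical PyTorch sketch (all names, layer sizes, and details below are illustrative assumptions, not the authors' implementation): a causally masked transformer encoder summarizes all previously generated patches, and its output seeds the hidden state of a small recurrent cell that then emits the qubits of the next patch.

# Minimal, hypothetical sketch of the LPTF hand-off (PyTorch). All names,
# layer sizes, and design details are illustrative assumptions, not the
# authors' implementation.
import torch
import torch.nn as nn

class LPTFSketch(nn.Module):
    def __init__(self, patch_size=16, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.patch_size = patch_size
        self.embed = nn.Linear(patch_size, d_model)  # embed one flattened patch
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.tf = nn.TransformerEncoder(layer, n_layers)
        self.rnn = nn.GRUCell(1, d_model)            # one qubit per RNN step
        self.out = nn.Linear(d_model, 1)             # Bernoulli logit per qubit

    @torch.no_grad()
    def sample(self, n_patches, batch=32):
        device = next(self.parameters()).device
        # Start from an all-zero "begin" patch; generate patches autoregressively.
        patches = [torch.zeros(batch, self.patch_size, device=device)]
        for _ in range(n_patches):
            seq = self.embed(torch.stack(patches, dim=1))   # (batch, t, d_model)
            sz = seq.size(1)
            causal = torch.triu(torch.full((sz, sz), float("-inf"),
                                           device=device), diagonal=1)
            h = self.tf(seq, mask=causal)[:, -1]  # TF context seeds the RNN state
            qubit = torch.zeros(batch, 1, device=device)
            bits = []
            for _ in range(self.patch_size):      # RNN fills the patch qubit by qubit
                h = self.rnn(qubit, h)
                qubit = torch.bernoulli(torch.sigmoid(self.out(h)))
                bits.append(qubit)
            patches.append(torch.cat(bits, dim=1))
        return torch.cat(patches[1:], dim=1)      # (batch, n_patches * patch_size)

model = LPTFSketch()
print(model.sample(n_patches=4).shape)            # torch.Size([32, 64])

For brevity the recurrent cell here emits single qubits, whereas the paper's LPTF uses a patched RNN that emits small sub-patches; the structural point is the same: the TF runs over a short sequence of large patches, while the cheaper RNN handles the fine-grained degrees of freedom inside each patch.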
Key Findings
The study reveals that TFs achieve higher accuracies than RNNs in representing ground states of Rydberg atom arrays, particularly for larger system sizes, but at significantly higher computational cost. Patching both architectures reduces runtime substantially while maintaining or improving accuracy, and the LPTF architecture, combining a patched TF with a patched RNN, reduces the cost further and enables scaling to larger systems. LPTFs achieve accuracies below the QMC uncertainties on ground-state energies, even with significantly fewer samples and shorter runtimes.

The computational runtime per training iteration drops rapidly for small patches and converges to a steady value for larger patches, where the mini-batch size becomes the dominant factor. Accuracy eventually degrades with increasing patch size, since the RNN subnetwork has limited representational power for large patches and each iteration must absorb more information; patch sizes around 8×8 atoms offer a good compromise between speed and accuracy. The LPTF performs consistently across the different phases of matter and accurately captures the phase transition, with deviations from QMC results remaining low even near the transition, where QMC uncertainties grow due to long autocorrelation times.
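For reference, the checkerboard order parameter referred to above is commonly defined as a staggered magnetization of the form (definition assumed from standard usage for square-lattice Rydberg arrays, not quoted from the paper)

\sigma^{\mathrm{stag}} = \left\langle \left| \frac{1}{N} \sum_{i} (-1)^{x_i + y_i} \left( \hat{n}_i - \frac{1}{2} \right) \right| \right\rangle,

which vanishes in the disordered phase and approaches 1/2 deep in the checkerboard-ordered phase, making deviations from QMC directly comparable across the transition.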
Discussion
The findings demonstrate the superior performance of TFs over RNNs for representing strongly correlated quantum systems, but highlight the importance of addressing their computational cost. The introduction of patched networks and the LPTF architecture effectively addresses this limitation, enabling efficient simulations of large quantum many-body systems. The results surpass the accuracy of state-of-the-art QMC methods for the considered system sizes, while simultaneously improving computational efficiency. The consistent performance of the LPTF across different phases of matter and its ability to accurately capture phase transitions emphasize its potential as a versatile tool for studying diverse quantum phenomena. The ability to achieve high accuracy with fewer samples is crucial for scaling up simulations to larger and more complex systems. The observation that the accuracy of the LPTF begins to degrade at very large patch sizes may motivate exploration of different RNN components for further improvement.
Conclusion
This work demonstrates the effectiveness of transformer models, especially the novel LPTF architecture, for simulating quantum many-body systems. The LPTF architecture significantly improves the computational efficiency of TFs while maintaining high accuracy, surpassing existing QMC methods for the system sizes considered. Future research could explore larger network models, higher embedding dimensions, and more sophisticated methods for handling complex-valued wavefunctions to further improve performance. Data-based initialization could also enhance the LPTF's performance. The LPTF's success suggests it can become a valuable tool for studying a wide range of quantum systems and phenomena, particularly for investigating large-scale behavior and exploring complex phase diagrams.
Limitations
The study focuses on two-dimensional Rydberg atom arrays and a specific Rydberg Hamiltonian. While the LPTF architecture is potentially applicable to other quantum systems, further investigations are needed to assess its generalizability. The accuracy of the LPTF decreases at very large patch sizes, suggesting a limitation in the current RNN component's ability to efficiently handle the increased amount of information. Future work may involve investigating different RNN subnetwork configurations, such as those with enhanced memory capabilities, to address this limitation.