Empowering deep neural quantum states through efficient optimization

A. Chen and M. Heyl

Ao Chen and Markus Heyl introduce a minimum-step stochastic-reconfiguration optimization algorithm that makes deep neural quantum states trainable at scale. The approach tackles complex quantum systems, reaching near-machine precision on benchmarks and providing evidence for gapless quantum-spin-liquid phases.
Introduction

The paper addresses the long-standing problem of accurately determining ground states of strongly interacting quantum many-body systems, particularly in two dimensions where exponential Hilbert-space growth and frustration pose severe computational challenges. Traditional approaches face distinct obstacles: exact diagonalization suffers from the curse of dimensionality, quantum Monte Carlo methods encounter the sign problem, and tensor-network techniques are limited by entanglement growth and contraction complexity. Neural quantum states (NQS) have emerged as a promising alternative by encoding wavefunctions in neural networks, showing progress on quantum spin liquids (QSLs). Yet, optimization remains a bottleneck: due to rugged loss landscapes, stochastic reconfiguration (SR)—a quantum analogue of natural gradient—is usually necessary but scales poorly with the number of parameters, impeding the training of deep, large-scale architectures. Consequently, most prior NQS studies have used shallow models. The authors propose overcoming this optimization barrier to unlock the expressive power of deep NQS and to resolve outstanding physics questions, such as whether QSL phases in prototypical frustrated J1–J2 Heisenberg models are gapped or gapless.

Literature Review

The paper surveys computational approaches for quantum many-body problems: exact diagonalization (limited by Hilbert-space size), quantum Monte Carlo (sign problem), and tensor networks (entanglement and contraction costs). Neural quantum states have been successfully applied to frustrated magnets and QSLs using architectures such as RBMs, shallow CNNs, RNNs, and group CNNs. However, standard SR requires constructing and inverting the N_θ × N_θ quantum metric matrix, an O(N_θ^2 N_s + N_θ^3) cost that restricts models to ~10^3 parameters and shallow depth. Prior attempts to scale optimization include iterative solvers, approximate optimizers, and leveraging supercomputers, yet SR's cost remains the key limitation. The nature of QSL phases in the square- and triangular-lattice J1–J2 Heisenberg models is debated, with both gapped and gapless scenarios reported. This context motivates developing an efficient optimizer that lets deep NQS attain higher accuracy and revisit gap assessments.

Methodology

The authors introduce Minimum-step Stochastic Reconfiguration (MinSR), a reformulation of SR tailored to the regime of many parameters (N_θ) and comparatively few Monte Carlo samples (N_s). In variational Monte Carlo with NQS, imaginary-time evolution is approximated by minimizing the quantum distance between the updated variational state and the exactly imaginary-time-evolved state. With samples, this distance takes the form d = ||O δθ − ε||, leading to the linear system O δθ = ε, where O is the N_s × N_θ matrix of (centered) logarithmic derivatives. Traditional SR solves the normal equations as δθ = S^−1 O^† ε with the N_θ × N_θ quantum metric S = O^† O, which is expensive to construct and invert for large N_θ. MinSR instead compresses the same information into the N_s × N_s neural tangent kernel T = O O^†, yielding the mathematically equivalent solution δθ = O^† T^−1 ε. This reduces the complexity to O(N_s^2 N_θ + N_s^3), i.e., linear in N_θ for fixed N_s, enabling deep networks. Numerical stability is ensured via pseudo-inverse regularization of T with relative/absolute cutoffs (soft thresholding) and double-precision arithmetic.

Architectures and training: two deep ResNet variants are employed. ResNet1 (real-valued convolutions with layer normalization and ReLU in each residual block, plus a special final activation enabling a sign structure) performs well for transfer learning across system sizes. ResNet2 (similar blocks without normalization) supports both real and complex outputs; complex outputs are handled via appropriate activations and momentum projection to target specific symmetry/momentum sectors. For non-holomorphic networks, MinSR is adapted by separating real and imaginary parts to construct an equivalent real linear system.

Physics setup: benchmarking is performed on the spin-1/2 J1–J2 Heisenberg model on square lattices at J2/J1 = 0 and 0.5 (the maximally frustrated regime), and on the triangular lattice at J2/J1 = 0.125. Physical sign structures (the Marshall sign on the square lattice, the 120° order on the triangular lattice) are optionally imposed; symmetry projection (point group C4 or D3, spin inversion, and translational symmetry via the CNN) projects onto the desired sectors. To further improve accuracy, a single Lanczos step mixes the optimized variational state with one orthogonal state, and zero-variance extrapolation estimates ground-state energies by extrapolating E versus the energy variance σ^2 to σ^2 = 0. Excited states for gap estimates are targeted by enforcing momentum and spin quantum numbers. Implementation details include diagonalizing T for the pseudo-inverse, soft cutoff parameters (e.g., r_pinv ≈ 10^−12), and double precision to ensure stable MinSR updates.
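To make the core update concrete, here is a minimal NumPy sketch of a single MinSR step, assuming O and ε have already been estimated from Monte Carlo samples. The hard eigenvalue cutoff below stands in for the paper's soft thresholding, and names such as minsr_update are illustrative rather than the authors' implementation:

```python
import numpy as np

def minsr_update(O, eps, r_pinv=1e-12, a_pinv=0.0):
    """One MinSR parameter update: delta_theta = O^dag T^{-1} eps.

    O   : (N_s, N_theta) matrix of centered log-derivatives of ln psi
    eps : (N_s,) centered residuals encoding the imaginary-time step
    """
    # Neural tangent kernel (N_s x N_s) instead of the N_theta x N_theta metric S = O^dag O.
    T = O @ O.conj().T                        # cost O(N_s^2 N_theta)
    # Regularized pseudo-inverse of T; hard-cutoff variant of the paper's soft thresholding.
    w, V = np.linalg.eigh(T)                  # T is Hermitian positive semidefinite
    keep = w > (r_pinv * w.max() + a_pinv)    # relative + absolute cutoff
    inv_w = np.zeros_like(w)
    inv_w[keep] = 1.0 / w[keep]
    T_pinv_eps = V @ (inv_w * (V.conj().T @ eps))
    # Map back to parameter space; this step is linear in N_theta.
    return O.conj().T @ T_pinv_eps
```

In practice the returned δθ is applied with a learning rate (imaginary-time step), and for non-holomorphic complex networks O and ε would first be split into real and imaginary parts as described above.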

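On the architecture side, a residual block in the spirit of ResNet1 (real convolutions, layer normalization, ReLU) might look as follows in Flax; the width, kernel size, and block ordering are assumptions for illustration, not the authors' exact design. Circular padding encodes the periodic lattice and hence the translational symmetry mentioned above:

```python
import flax.linen as nn

class ResidualBlock(nn.Module):
    """Illustrative residual block: LayerNorm -> ReLU -> periodic Conv, twice."""
    channels: int = 32  # hypothetical width

    @nn.compact
    def __call__(self, x):  # x: (Lx, Ly, channels) lattice features
        y = nn.LayerNorm()(x)
        y = nn.relu(y)
        y = nn.Conv(self.channels, (3, 3), padding="CIRCULAR")(y)  # periodic boundary
        y = nn.LayerNorm()(y)
        y = nn.relu(y)
        y = nn.Conv(self.channels, (3, 3), padding="CIRCULAR")(y)
        return x + y  # skip connection keeps deep stacks trainable
```

The skip connection is what allows stacking tens of such blocks (the paper reports up to 64 layers) without the vanishing-gradient issues that plague plain deep CNNs.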
Key Findings
  • Optimization efficiency: MinSR yields an optimization cost linear in N_θ (plus an N_s^3 term), enabling training of deep NQS with up to 64 layers and over 10^6 parameters; the paper notes that up to ~10^9 parameters are feasible in principle. It matches SR accuracy while drastically reducing time cost in the regime N_s ≪ N_θ.
  • Precision benchmarks: On a 6×6 square lattice, deep NQS trained with MinSR reproduces wavefunction amplitudes near machine precision compared to exact diagonalization, outperforming shallow SR-trained networks.
  • Non-frustrated square lattice (J2/J1 = 0), 10×10: achieved variational energy per site E/N = −0.67155260(3), better than prior variational references. Against the stochastic-series-expansion reference E_GS/N = −0.67155267(5), the relative error is extremely small, surpassing previous benchmarks (e.g., RBM, shallow CNN, RBM+Lanczos).
  • Frustrated square lattice (J2/J1 = 0.5), 10×10: deep ResNets trained with MinSR achieve the best-to-date variational energy E/N = −0.4976921(4). Zero-variance extrapolation estimates E_GS/N = −0.497715(9), improving upon all reported methods (including GCNN, PP+RBM, CNN+LS). The error metric ε_d is reduced by ~4× versus the previous best.
  • Larger square lattice, 16×16 at J2/J1 = 0.5: Achieved E/N = −0.4967163(8), outperforming prior variational results, with ε_d lower by 2.5×10^−5 relative to the previous best.
  • Energy gaps and QSL nature: Using MinSR-trained NQS with a Lanczos step and zero-variance extrapolation, finite-size gap data were obtained and extrapolated to the thermodynamic limit (see the fitting sketch after this list):
    ◦ Square lattice at J2/J1 = 0.5, S = 1, k = (π, π): the quadratic fit Δ = a + b/L + c/L^2 gives Δ = 0.00(3), and Δ × L remains constant with L, indicating a gapless QSL.
    ◦ Triangular lattice at J2/J1 = 0.125, S = 1, k = (4π/3, 0): the linear fit Δ = a + b/L yields Δ = −0.05(6) ≈ 0 in the thermodynamic limit, and Δ × L approaches a finite constant, consistent with a gapless Dirac spin liquid.
  Overall, MinSR-enabled deep NQS substantially improve variational energies for large, frustrated 2D systems and provide strong numerical evidence for gapless QSL phases.
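The thermodynamic-limit gaps quoted above come from simple least-squares fits in 1/L. A minimal SciPy sketch of the quadratic fit, where the system sizes, gaps, and error bars are hypothetical stand-ins and not the paper's data:

```python
import numpy as np
from scipy.optimize import curve_fit

def gap_quadratic(inv_L, a, b, c):
    # finite-size form used for the square lattice: Delta(L) = a + b/L + c/L^2
    return a + b * inv_L + c * inv_L**2

# Placeholder inputs; substitute the measured finite-size gaps and uncertainties.
L = np.array([6.0, 8.0, 10.0, 12.0, 16.0])
gaps = np.array([0.40, 0.30, 0.24, 0.20, 0.15])  # illustrative values only
sigma = np.full_like(gaps, 0.01)                 # illustrative error bars

popt, pcov = curve_fit(gap_quadratic, 1.0 / L, gaps, sigma=sigma, absolute_sigma=True)
a, a_err = popt[0], np.sqrt(pcov[0, 0])
print(f"thermodynamic-limit gap: {a:.3f} +/- {a_err:.3f}")  # a consistent with 0 signals gapless
```

The triangular-lattice case is the same fit with the c/L^2 term dropped, matching the linear form Δ = a + b/L used to avoid overfitting with fewer system sizes.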
Discussion

By reformulating SR through the neural tangent kernel, MinSR preserves the geometry of the variational manifold while drastically reducing computational cost, unlocking deep, large-parameter NQS training. The improved expressive power translates into state-of-the-art variational energies on challenging frustrated spin models and enables precise finite-size scaling of excitation gaps. The gap extrapolations on square and triangular lattices support gapless QSL phases, informing a long-standing debate where conflicting conclusions existed. Beyond NQS, MinSR is a general VMC optimizer and can be applied to other ansätze (e.g., tensor networks), potentially enhancing their expressivity while keeping optimization tractable. The method’s principle—efficient natural-gradient-like updates via a compressed metric—may further benefit machine learning tasks if an appropriate optimization geometry can be defined.

Conclusion

The work introduces Minimum-step Stochastic Reconfiguration (MinSR), an efficient and accurate optimization algorithm for variational Monte Carlo that scales linearly with the number of network parameters and is equivalent to SR in the relevant regime. MinSR enables training deep NQS (up to 64 layers and >10^6 parameters) and achieves record-low variational energies on frustrated J1–J2 models for 10×10 and 16×16 lattices. With enhanced accuracy, the authors provide strong numerical evidence for gapless quantum spin liquids on both square (J2/J1 = 0.5) and triangular (J2/J1 = 0.125) lattices via gap extrapolation. Future directions include: extending MinSR-enabled deep variational wavefunctions to fermionic systems (e.g., Hubbard model) and ab initio quantum chemistry; integrating MinSR with tensor networks and other ansätze; designing architectures to further enhance accuracy at lower cost; and exploring MinSR-inspired natural-gradient methods in broader machine learning domains such as reinforcement learning.

Limitations
  • The approach is variational and depends on training quality, sampling statistics, and regularization of the pseudo-inverse; residual variational bias is mitigated via Lanczos steps and zero-variance extrapolation.
  • For stronger frustration (e.g., triangular lattice), variational errors are larger and necessitate careful extrapolations; linear fits were used to avoid overfitting with limited system sizes.
  • Sign structures (e.g., Marshall sign on square, 120° on triangular) and symmetry projections are employed as physical priors; while generality is argued to be preserved (sign networks can learn these), results may still benefit from model-specific inputs.
  • The MinSR efficiency gain assumes N_s ≪ N_θ; performance may vary outside this regime.
  • Only one Lanczos step is used; additional steps could further reduce bias but increase complexity.
  • Stability relies on pseudo-inverse cutoff choices and double-precision arithmetic; tuning may be required across problems.