Empowering deep neural quantum states through efficient optimization
A. Chen and M. Heyl
The study of quantum many-body systems is hindered by exponential Hilbert-space growth and method-specific obstacles such as exact diagonalization’s scaling limits, the sign problem in quantum Monte Carlo, and entanglement/matrix-contraction costs in tensor networks. Frustrated two-dimensional magnets, notably the spin-1/2 Heisenberg J1–J2 model on square and triangular lattices, are central testbeds due to proposed quantum spin liquid (QSL) phases whose gap structure remains debated. Neural quantum states (NQS) have emerged as a promising alternative by representing wavefunctions with neural networks. Yet, optimization remains a bottleneck: stochastic reconfiguration (SR)—a quantum analogue of natural gradient descent—provides robust training on rugged loss landscapes but scales poorly and has impeded the use of deep, large-parameter networks. This work addresses that optimization barrier by introducing a reformulation of SR tailored to deep-learning regimes, enabling training of deep NQS at unprecedented scales and using them to probe QSL gaps in frustrated models.
Prior studies established numerical challenges for 2D frustrated magnets and evidence for QSL phases, with conflicting reports on whether such QSLs are gapped or gapless in square and triangular lattices. Foundational NQS works demonstrated neural-network wavefunctions (e.g., RBM, CNN, autoregressive and recurrent models), while subsequent efforts targeted improved optimization via iterative solvers, approximate natural gradients, Lanczos augmentation, and large-scale HPC. SR/natural gradient methods supply a geometric metric in VMC but were computationally prohibitive for deep networks. Tensor networks and Gutzwiller/paired-product ansätze provided complementary benchmarks but also face scaling issues, especially with periodic boundaries and frustration. The present work builds on SR geometry, neural tangent kernel ideas, and variational acceleration techniques (Lanczos, zero-variance extrapolation), consolidating them into an efficient, scalable optimizer for deep NQS.
Variational Monte Carlo framework: For spin-1/2 systems with basis |σ⟩ = |σ1,…,σN⟩, an NQS with parameters θ outputs amplitudes Ψθ(σ), defining |Ψθ⟩ = Σσ Ψθ(σ)|σ⟩. The parameters are optimized to minimize E = ⟨Ψθ|H|Ψθ⟩/⟨Ψθ|Ψθ⟩ by approximating imaginary-time evolution.

SR reformulation (MinSR): Each step minimizes the quantum distance between the updated variational state |Ψθ+δθ⟩ and the imaginary-time evolved state e^{−δτH}|Ψθ⟩. With Ns Monte Carlo samples this becomes a least-squares system O δθ = ε, where O is the Ns×Nθ matrix of centered log-derivatives Oα = ∂ log Ψθ(σ)/∂θα and ε is the vector of centered local-energy residuals. Traditional SR solves the normal equations S δθ = O†ε with the quantum geometric tensor S = O†O, but forming and inverting this Nθ×Nθ matrix costs O(Nθ² Ns + Nθ³). MinSR instead introduces the Ns×Ns neural tangent kernel T = O O† and computes the minimum-norm least-squares solution δθ = O† T^{−1} ε. This solution is mathematically equivalent to SR (in the pseudo-inverse sense) but reduces the cost to O(Nθ Ns² + Ns³), i.e., linear in Nθ in the deep-learning regime Nθ ≫ Ns.

Numerics and stabilization: T is Hermitian and is diagonalized in double precision. A pseudo-inverse with a soft cutoff is used: eigenvalues λi below r_pinv·λmax + a_pinv (typically r_pinv = 10^−12, a_pinv = 0) are regularized, and the soft threshold avoids abrupt changes during training. For non-holomorphic networks, or real-parameter networks with complex output, a real-augmented system is formed by stacking real and imaginary parts, recovering an equivalent MinSR/SR formulation.

Network architectures: Two deep ResNets are used. ResNet1: each residual block applies LayerNorm, ReLU, and a convolution; the outputs pass through a custom activation f that admits sign structures (e.g., cosh-like with a negative branch), and the wavefunction amplitude is a product over sites with a rescaling factor t to prevent overflow. ResNet2: residual blocks without normalization; the final activation is sinh+1 for real wavefunctions or exp(x1 + i x2) for complex ones, again with rescaling to avoid overflow. The wavefunction is obtained by summing last-layer channels v_{c,i} with a momentum projection, ψ = Σ_i e^{i q·r_i} Σ_c v_{c,i}, enabling nonzero-momentum sectors for excited states.

Physical priors and symmetries: On square lattices the Marshall sign rule (MSR) is applied (exact at J2/J1 = 0, approximate near 0.5); on triangular lattices a 120°-order sign pattern is used. Spatial point-group symmetries (C4 for square, D3 for triangular) and spin inversion are enforced by projecting the trained network, ψ_sym = (1/|G|) Σ_g ω_g^* T_g ψ_net; translation symmetry is already built into the CNNs.

Accuracy enhancement: A single Lanczos step constructs |ψ1⟩ ∝ (Ĥ − E0)|ψ0⟩, which is orthogonal to |ψ0⟩, and minimizes the energy over |ψα⟩ = |ψ0⟩ + α|ψ1⟩; the optimal α follows in closed form from the moments μ3, μ4 estimated by Monte Carlo. Zero-variance extrapolation uses the empirical linear relation between variational energy and energy variance, E − E_exact ∝ σ², to infer the σ² → 0 limit, supported by approximately size- and sector-independent slopes.

Benchmark models: Heisenberg J1–J2 model, H = J1 Σ⟨i,j⟩ Si·Sj + J2 Σ⟨⟨i,j⟩⟩ Si·Sj, at J2/J1 = 0 (non-frustrated) and 0.5 (maximally frustrated). Excited-state gaps are computed in specified spin/momentum sectors and extrapolated versus 1/L.
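To make the linear-algebra core concrete, here is a minimal NumPy sketch of one MinSR update. The centering/scaling convention for O and ε, the Tikhonov-like form of the soft cutoff, and the helper name `minsr_update` are illustrative assumptions, not the authors' exact implementation:

```python
import numpy as np

def minsr_update(O, eps, r_pinv=1e-12, a_pinv=0.0):
    """Minimum-norm SR (MinSR) update: delta_theta = O^dagger T^+ eps.

    O   : (Ns, Ntheta) complex array of centered log-derivatives,
          O[s, a] ~ (d log Psi(sigma_s)/d theta_a - sample mean) / sqrt(Ns)
    eps : (Ns,) complex array of centered local-energy residuals
          (assumed to already carry the -delta_tau factor)
    """
    T = O @ O.conj().T                    # neural tangent kernel, Ns x Ns
    lam, U = np.linalg.eigh(T)            # Hermitian eigendecomposition, float64
    cutoff = r_pinv * lam[-1] + a_pinv    # relative + absolute threshold
    inv_lam = lam / (lam**2 + cutoff**2)  # soft pseudo-inverse of eigenvalues
    Tinv_eps = U @ (inv_lam * (U.conj().T @ eps))
    return O.conj().T @ Tinv_eps          # (Ntheta,) parameter update
```

The design point to notice: all O(Ns³) work happens on the Ns×Ns kernel T rather than the Nθ×Nθ metric S, which is exactly what makes the update linear in Nθ for Nθ ≫ Ns.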
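The symmetry projection ψ_sym = (1/|G|) Σ_g ω_g^* T_g ψ_net can likewise be sketched generically. The stand-in network, the C4 example, and the convention of applying g directly to the configuration are illustrative assumptions:

```python
import numpy as np

def project_symmetry(psi_net, sigma, ops, chars):
    """psi_sym(sigma) = (1/|G|) * sum_g conj(omega_g) * psi_net(g(sigma))."""
    return sum(w.conjugate() * psi_net(g(sigma))
               for g, w in zip(ops, chars)) / len(ops)

# Example: C4 rotations of a square-lattice spin configuration,
# projected onto the trivial representation (omega_g = 1).
c4_ops = [lambda s, k=k: np.rot90(s, k) for k in range(4)]
c4_chars = [1.0 + 0j] * 4

psi_net = lambda s: np.exp(0.1 * s.sum())   # hypothetical stand-in network
sigma = np.random.choice([-1, 1], size=(6, 6))
print(project_symmetry(psi_net, sigma, c4_ops, c4_chars))
```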
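Zero-variance extrapolation reduces to a linear fit of energy against energy variance; a minimal sketch with illustrative numbers (not values from the paper):

```python
import numpy as np

def zero_variance_estimate(energies, variances):
    """Fit E = E_gs + k * sigma^2 and return the sigma^2 -> 0 intercept."""
    k, e_gs = np.polyfit(np.asarray(variances), np.asarray(energies), 1)
    return e_gs

# Hypothetical inputs: energy/variance pairs of the raw NQS and its
# Lanczos-improved state.
print(zero_variance_estimate([-0.49745, -0.49769], [4.0e-4, 1.5e-4]))
```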
- Computational efficiency: MinSR is mathematically equivalent to SR but reduces the dominant cost from cubic scaling in the parameter count Nθ to linear scaling in Nθ for Nθ ≫ Ns, enabling training of deep NQS with up to 64 layers and ~10^6 parameters.
- Precision on small lattices: On a 6×6 square lattice, deep NQS trained with MinSR reproduce exact-diagonalization wavefunction amplitudes to a relative accuracy comparable to TF32 machine precision in the non-frustrated case and BF16 in the frustrated case, outperforming shallow SR-trained networks.
- Non-frustrated 10×10 Heisenberg (J2/J1=0): Achieved variational energy per site E/N = −0.67155260(3), surpassing prior references and matching a new SSE benchmark EGS/N = −0.67155267(5). Relative errors reach ~10^−7 with increasing parameters, clearly outperforming RBM and shallow CNN baselines.
- Frustrated 10×10 J1–J2 at J2/J1=0.5: Progressive improvement with network size yields a best variational energy E/N = −0.4976921(4). Zero-variance extrapolation estimates Egs/N = −0.497715(9). Compared to the previous best, the residual ε_gs is about 4× smaller, indicating substantially improved accuracy. These results outperform competing approaches, including tensor networks, GWF+Lanczos, PP+RBM, and prior CNN/RBM variants.
- Large 16×16 lattice at J2/J1=0.5: Achieved best reported variational energy E/N = −0.4967163(8), improving upon previous records by ~2.5×10^−5.
- Energy gaps and QSL nature (a minimal fitting sketch follows this list):
  - Square lattice (maximally frustrated, S=1, k=(π,π)): Finite-size extrapolation with Δ = a + b/L + c/L^2 gives Δ(∞) = 0.00(3); the near-constant behavior of Δ×L versus 1/L further supports a gapless phase.
  - Triangular lattice (J2/J1=0.125, S=1, k=(4π/3,0)): A linear fit Δ = a + b/L gives Δ(∞) = −0.05(6) ≈ 0; the Δ×L trend is consistent with a vanishing gap and Dirac-spin-liquid scaling (Δ ∝ 1/L).
  Together these results provide strong numerical evidence for gapless QSLs in both models.
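A minimal sketch of the finite-size extrapolation described above, using synthetic gap values (not the paper's data) that mimic a gapless Δ ∝ 1/L spectrum:

```python
import numpy as np

# Synthetic gap data, generated to mimic Delta ~ b/L + c/L^2 (illustrative only).
L = np.array([6.0, 8.0, 10.0, 12.0, 16.0])
gaps = 1.5 / L + 0.6 / L**2

x = 1.0 / L
c, b, a = np.polyfit(x, gaps, 2)   # fit Delta = a + b/L + c/L^2
print(f"Delta(inf) = {a:.4f}")      # ~0 for a gapless spectrum

# Cross-check used on the square lattice: Delta * L should approach a
# constant as 1/L -> 0 if the gap closes as 1/L.
print(gaps * L)
```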
The work directly tackles the optimization bottleneck that prevented deep, expressive NQS from realizing their full potential. By reformulating SR into MinSR, the authors preserve the geometric advantages of natural-gradient-like updates while achieving dramatic computational savings, enabling networks with orders of magnitude more parameters and layers. This capability translates into state-of-the-art variational energies for challenging 2D frustrated systems and allows probing subtle physical questions, notably the gap structure of QSL phases. The gap extrapolations on square and triangular lattices strongly support gapless QSL behavior, aligning with several prior indications and providing improved precision over previous studies. Beyond physics, the MinSR principle—least-squares minimum-norm updates in an appropriate geometric space—suggests broader utility for variational optimization where sampling-based gradients and natural metrics are available.
This paper introduces MinSR, a minimum-step stochastic reconfiguration algorithm that is mathematically equivalent to SR but far more efficient in deep-learning regimes. MinSR enables training of deep NQS with up to 64 layers and ~10^6 parameters, achieving near machine-precision fidelity on small lattices and record-low variational energies on large, frustrated systems. The method yields strong evidence for gapless QSL phases on both square and triangular lattices via careful finite-size gap extrapolations. Future work includes applying MinSR-empowered wavefunctions to fermionic problems (e.g., Hubbard model) and ab initio quantum chemistry, integrating MinSR with other variational ansätze such as tensor networks to enhance expressivity, and exploring MinSR-like natural gradient methods in general machine learning domains (e.g., reinforcement learning).
- Numerical stabilization relies on pseudo-inverse regularization with eigenvalue cutoffs; results can depend subtly on cutoff choices and double-precision arithmetic.
- Some physical priors are imposed (e.g., Marshall sign rule on square lattices, 120° sign structure on triangular lattices), which, while generalizable via sign networks, introduce model-specific bias.
- Variational errors are larger on triangular lattices; gap extrapolations use linear fits and, in some cases, empirical assumptions (e.g., slope reuse across sizes) to mitigate uncertainties.
- Results are variational upper bounds and rely on zero-variance extrapolation, which assumes approximately consistent error states across training attempts.
- The demonstrated efficiency gains apply most strongly in the regime Nθ ≫ Ns typical of deep learning; performance for other regimes may be less pronounced.