Searching for spin glass ground states through deep reinforcement learning

Physics


C. Fan, M. Shen, et al.

Discover groundbreaking insights into spin glasses with DIRAC, a deep reinforcement learning framework for finding ground states of disordered magnets and tackling related hard optimization problems. This innovative approach, developed by Changjun Fan, Mutian Shen, Zohar Nussinov, Zhong Liu, Yizhou Sun, and Yang-Yu Liu, offers remarkable scalability and accuracy, deepening our understanding of low-temperature spin glass phases.

Introduction
The paper addresses the long-standing problem of efficiently finding ground states of Ising spin glasses, particularly the Edwards-Anderson (EA) model on D-dimensional hypercubic lattices. The Hamiltonian H = -Σ_{<ij>} J_{ij} σ_i σ_j, with quenched disorder and frustration, makes global optimization difficult. Determining ground states is key to understanding glassy behavior, stiffness exponents, and ergodicity breaking, and it is closely related to numerous NP-hard combinatorial optimization problems via their Ising formulations. Exact methods work only for very small systems (or for special planar cases via minimum-weight perfect matching), and for general large instances no method achieves both high accuracy and efficiency. Heuristic Monte Carlo methods (simulated annealing, population annealing, parallel tempering) are widely used but can be slow or become trapped in local minima. Recent reinforcement learning (RL) successes in combinatorial optimization, together with prior RL-enhanced annealing for small 2D spin glasses, motivate a new approach that can scale and directly compute ground states in higher dimensions. The study proposes DIRAC, a deep RL framework that learns long-sighted spin-flip policies and leverages gauge transformations to improve the search.
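To make the model concrete, the following minimal sketch (not from the paper) evaluates the EA energy per spin of a configuration on a hypercubic lattice with periodic boundaries and Gaussian couplings; the array layout and the `ea_energy` helper are illustrative assumptions consistent with the Hamiltonian above.

```python
import numpy as np

def ea_energy(spins, couplings):
    """Energy H = -sum_<ij> J_ij s_i s_j of an Edwards-Anderson configuration.

    spins:     array of shape (L,)*D with entries +/-1
    couplings: dict mapping axis -> array of the same shape, holding J_ij for the
               bond between site x and x + e_axis (periodic boundary conditions)
    """
    energy = 0.0
    for axis, J in couplings.items():
        neighbor = np.roll(spins, shift=-1, axis=axis)  # s_{x + e_axis} with PBC
        energy -= np.sum(J * spins * neighbor)
    return energy

# Example: a random 2D EA instance with Gaussian couplings
rng = np.random.default_rng(0)
L, D = 10, 2
spins = rng.choice([-1, 1], size=(L,) * D)
couplings = {axis: rng.normal(size=(L,) * D) for axis in range(D)}
print(ea_energy(spins, couplings) / spins.size)  # energy per spin
```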
Literature Review
Prior work includes exact branch-and-bound approaches that are effective only for very small systems, and exact polynomial-time solutions for certain planar 2D cases via a mapping to minimum-weight perfect matching. Heuristic thermal methods such as simulated annealing (SA), population annealing, and parallel tempering (PT) are the standard tools in statistical physics for spin glasses. RL has been applied to various combinatorial problems (minimum vertex cover, independent set, network dismantling, TSP, vehicle routing) and, for spin systems, to optimize temperature schedules in SA for small 2D cases; however, RL-enhanced SA failed to find ground states in large 3D and higher-dimensional systems. The paper builds on advances in graph neural networks for representation learning, noting that traditional node-centric GNNs may underperform when edge weights are crucial. It also leverages the gauge transformation (GT) from Ising physics, which connects different spin configurations while preserving the energy, a tool not exploited in previous optimization algorithms.
Methodology
DIRAC formulates ground-state search as a Markov decision process. The state s encodes the current spin configuration {σ_i} and the couplings {J_{ij}}; an action a flips a spin i; the reward r(s, a, s') is the energy decrease caused by the flip, r = 2 Σ_j J_{ij} σ_i σ_j over nearest neighbors j evaluated with the flipped value of σ_i, i.e., E(s) − E(s'), so that maximizing cumulative reward minimizes the final energy. The objective is to learn a policy π_θ(a|s) maximizing the expected cumulative reward over a finite-horizon episode in which each spin is flipped exactly once (taking the configuration from all-up to all-down); the best encountered configuration (lowest energy) is returned.
Encoding: The Spin Glass Neural Network (SGNN) encoder represents states and actions on the lattice as a graph. Each layer performs (i) edge-centric updates (edge embeddings updated from their adjacent nodes) and (ii) node-centric updates (node embeddings updated from their adjacent edges), each followed by a ReLU non-linearity. Repeating K message-passing iterations (K = 5 in the experiments) yields node embeddings (action embeddings) and a graph-level state embedding (the sum over node embeddings). This combined edge-and-node updating is crucial for capturing the coupling strengths; purely node-centric GNNs underperform.
Decoding: A multilayer perceptron (MLP) takes the concatenated state and action embeddings [z_s, z_i] and outputs a scalar Q(s, a; Θ) predicting the long-term cumulative reward of flipping spin i.
Training: Offline, DIRAC self-trains on randomly generated small EA instances with Gaussian couplings. For each episode on an instance of size N, the agent starts from all spins up, flips each spin once (choosing greedily by Q during data collection), and ends at all spins down. Transitions (s_t, a_t, r_t, s_{t+n}) are stored in an experience replay buffer, and the n-step Q-learning loss is minimized by mini-batch gradient descent with a periodically updated target network. Training runs for 10^6 episodes, and the best model is selected by the validation approximation ratio (e_DIRAC / e_Greedy).
Online application: DIRAC^1 starts from all-up and at each step greedily flips a small top fraction (e.g., 1%) of the spins with the highest Q-values until every spin has been flipped once; the lowest-energy configuration along the trajectory is returned. To overcome the limitations of a single fixed initialization and one-pass greediness, gauge transformation (GT) is used to construct DIRAC^m: DIRAC^1 is run for m rounds, each time gauge-transforming the lowest-energy configuration from the previous round to all-up before reapplying DIRAC^1 and transforming back, stopping when the energy no longer decreases. Gauge randomization runs multiple searches from different initial configurations generated by random GTs and keeps the best result. DIRAC can also be plugged into annealing methods by replacing or mixing the standard energy-based Metropolis-Hastings accept/reject (EMH) with a DIRAC-based Metropolis-Hastings (DMH) move: at each iteration the algorithm chooses between an EMH step and a DIRAC^1+GT move, and at local minima it applies temperature-dependent random perturbations. This yields the DIRAC-SA and DIRAC-PT variants. For runtime, batch node selection speeds up application with minimal accuracy loss, and reported timings use CPUs only, for fairness (GPUs are used only for training).
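As a rough illustration of the MDP reward and the one-pass search described above, the sketch below continues the earlier example (reusing `ea_energy` and the coupling layout). Here `flip_reward` returns the energy drop of a single spin flip, and `one_pass_episode` flips every spin exactly once ranked by a supplied `scores` array that stands in for the learned Q-values; the real DIRAC re-evaluates Q after every step and flips spins in small batches, so this is only a schematic.

```python
import numpy as np  # continues the previous sketch (reuses ea_energy and the coupling layout)

def flip_reward(spins, couplings, site):
    """Reward for flipping `site` (a tuple index): the energy decrease E(s) - E(s')."""
    D, shape = spins.ndim, spins.shape
    local = 0.0  # sum_j J_ij * sigma_i * sigma_j over the 2*D lattice neighbours of `site`
    for axis in range(D):
        for sign in (+1, -1):
            nbr = list(site)
            nbr[axis] = (nbr[axis] + sign) % shape[axis]       # periodic neighbour
            bond_site = site if sign == +1 else tuple(nbr)     # bond is stored at the lower site
            local += couplings[axis][bond_site] * spins[site] * spins[tuple(nbr)]
    return -2.0 * local  # positive exactly when the flip lowers the energy

def one_pass_episode(spins, couplings, scores):
    """One-pass search in the spirit of DIRAC^1: flip every spin exactly once,
    highest score first, and keep the lowest-energy configuration seen."""
    s = spins.copy()
    energy = ea_energy(s, couplings)
    best, best_e = s.copy(), energy
    order = np.unravel_index(np.argsort(-scores, axis=None), scores.shape)
    for site in zip(*order):                          # fixed ranking; DIRAC re-scores each step
        energy -= flip_reward(s, couplings, site)     # E' = E - reward
        s[site] *= -1
        if energy < best_e:
            best, best_e = s.copy(), energy
    return best, best_e
```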
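The encoder–decoder can be pictured with the following PyTorch sketch. It mirrors the edge-then-node update pattern, the K repeated message-passing steps, the sum-pooled state embedding, and the Q(s, a) decoder described above, but the layer widths, the specific concatenations, and the initial features (e.g., spins lifted to node features and couplings to edge features) are assumptions for illustration, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class SGNNLayer(nn.Module):
    """One edge-then-node message-passing step: edge embeddings are refreshed from
    their two endpoint nodes, then node embeddings from the sum of incident edges."""
    def __init__(self, dim):
        super().__init__()
        self.edge_mlp = nn.Linear(3 * dim, dim)   # input: [edge, node_u, node_v]
        self.node_mlp = nn.Linear(2 * dim, dim)   # input: [node, aggregated incident edges]

    def forward(self, h_node, h_edge, edge_index):
        u, v = edge_index                          # (2, num_edges) endpoint indices
        h_edge = torch.relu(self.edge_mlp(
            torch.cat([h_edge, h_node[u], h_node[v]], dim=-1)))
        agg = torch.zeros_like(h_node)
        agg.index_add_(0, u, h_edge)               # each edge sends its message
        agg.index_add_(0, v, h_edge)               # to both of its endpoints
        h_node = torch.relu(self.node_mlp(torch.cat([h_node, agg], dim=-1)))
        return h_node, h_edge

class QNet(nn.Module):
    """K message-passing steps, then an MLP decoder on concatenated [state, action] embeddings."""
    def __init__(self, dim=64, K=5):
        super().__init__()
        self.layers = nn.ModuleList([SGNNLayer(dim) for _ in range(K)])
        self.decoder = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, h_node, h_edge, edge_index):
        # h_node / h_edge: initial embeddings, e.g. spins and couplings lifted to `dim` features
        for layer in self.layers:
            h_node, h_edge = layer(h_node, h_edge, edge_index)
        z_state = h_node.sum(dim=0, keepdim=True).expand_as(h_node)   # graph-level state embedding
        return self.decoder(torch.cat([z_state, h_node], dim=-1)).squeeze(-1)  # Q(s, i) per spin
```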
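Finally, the gauge-transformation step and the DIRAC^m loop can be sketched as follows, again building on the helpers above. With ε_i = σ_i, the mapping σ_i → ε_i σ_i and J_ij → ε_i ε_j J_ij leaves every configuration's energy unchanged and sends the current configuration to all-up, which is what lets the one-pass search be restarted without changing the physics. Here `q_fn` is a hypothetical stand-in for the trained SGNN+MLP scorer, and the stopping rule is the energy-no-longer-decreases criterion from the text.

```python
def gauge_transform(spins, couplings):
    """Gauge so the given configuration becomes all-up: with eps_i = sigma_i,
    sigma_i -> eps_i * sigma_i and J_ij -> eps_i * eps_j * J_ij preserve all energies."""
    eps = spins.copy()
    gauged = {axis: J * eps * np.roll(eps, shift=-1, axis=axis)
              for axis, J in couplings.items()}
    return np.ones_like(spins), gauged, eps

def dirac_m(spins, couplings, q_fn, max_rounds=10):
    """DIRAC^m-style refinement (sketch): gauge the best configuration to all-up,
    rerun the one-pass search, map back, and repeat while the energy keeps dropping."""
    best = spins.copy()
    best_e = ea_energy(best, couplings)
    for _ in range(max_rounds):
        up, gauged_J, eps = gauge_transform(best, couplings)
        cand, _ = one_pass_episode(up, gauged_J, q_fn(up, gauged_J))
        cand = cand * eps                      # undo the gauge: sigma_i = eps_i * sigma'_i
        e = ea_energy(cand, couplings)
        if e >= best_e:                        # stop when the energy no longer decreases
            break
        best, best_e = cand, e
    return best, best_e
```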
Key Findings
- Ground-state finding probability (small systems with exact verification): DIRAC variants achieve a higher P0 with far fewer initial configurations n_initial than Greedy, SA, and PT. For example, for D=2, L=10, PT requires n_initial ≈ 322,800 to reach P0 = 1.0, whereas DIRAC^m needs only 600; on some instances a single gauge randomization suffices.
- Scaling with system size: DIRAC's advantage in required n_initial persists across sizes in 2D, 3D, and 4D.
- Energy minimization on larger systems (where exact ground states cannot be certified): DIRAC-SA attains the lowest disorder-averaged energy per spin e0 across the 2D, 3D, and 4D test cases. For D=3, L=10, a literature PT study reported e0 = −1.6981 using n_initial = 3.2×10^7, while DIRAC-SA achieved e0 = −1.6906 using n_initial = 2×10^4 (fewer than one-thousandth the initial configurations used in the literature). Across nine systems, DIRAC-SA improves over SA by 0.79% in energy on average and DIRAC-PT improves over PT by 2.01%; the improvements are statistically significant (Wilcoxon signed-rank test, p < 10^−4 in most cases and p < 0.05 in all cases).
- Long-sighted behavior: step-by-step comparisons show that DIRAC^1 often accepts short-term energy increases early on to achieve lower long-term energy, unlike the short-sighted energy-drop Greedy strategy.
- Extreme-case demonstration (anti-ferromagnetic Ising model on a 20×20 lattice with periodic boundary conditions): DIRAC reaches the exact checkerboard ground state in 200 steps (flipping half the spins) without errors; Greedy gets stuck after 191 steps in a high-energy local minimum; SA and PT eventually reach the ground state but require 21,333 and 10,808 steps, respectively.
- Efficiency: DIRAC's application phase is computationally efficient and scalable. After ~2.5 h of training on 3D instances, DIRAC^m takes 417 s on average to solve a random D=3, L=20 instance with n_initial = 10, while PT requires ~3 h (n_initial = 5,360) to reach comparable energy; the total DIRAC time (training plus application), ~2.62 h, is still lower than PT's. Testing times are reported on CPUs only; the GPU is used only for training. Batch flipping (e.g., the top 1% by Q) accelerates runtime with little accuracy loss.
Discussion
The findings demonstrate that a deep RL approach with appropriate graph-based representations and Q-learning can learn long-sighted spin-flip policies that outperform traditional energy-based heuristics in both ground-state discovery and energy minimization. Leveraging gauge transformations allows DIRAC to revisit earlier decisions across iterations, diversify initializations without changing physics, and systematically reduce frustration, leading to improved solutions. DIRAC’s plug-in design further enhances established Monte Carlo methods (SA, PT), indicating that RL-guided moves can complement exploration–exploitation dynamics in thermal annealing, especially at low temperatures where purely energy-based criteria behave greedily. Broadly, the results suggest a practical route to tackle large, frustrated systems and to probe low-temperature spin-glass phases more effectively, with implications for related NP-hard optimization problems that admit Ising formulations.
Conclusion
The study introduces DIRAC, a deep reinforcement learning framework for computing ground states of Edwards-Anderson spin glasses. Trained solely on small random instances, DIRAC generalizes to much larger systems and outperforms state-of-the-art heuristics in accuracy and runtime. The incorporation of gauge transformation is key, enabling iterative refinement (DIRAC^m), gauge randomization, and effective integration with annealing-based methods (DIRAC-SA, DIRAC-PT). Beyond spin glasses, the approach connects physics and AI, offering a general strategy for NP-hard problems with Ising formulations and pointing to extensions to higher-order (k-body) interactions via hypergraph encodings. Future work includes designing stronger encoders via advanced graph representation learning, improving RL training efficiency and hyperparameter tuning, extending to general Ising models with k-body terms, and further exploiting gauge-based strategies.
Limitations
- DIRAC requires offline training (per dimension) before application; while the cost is amortized over many instances, this training overhead is absent in traditional heuristics.
- The base one-pass strategy (DIRAC^1) cannot revisit earlier decisions and relies on a fixed initialization; iterative DIRAC^m with gauge transformations mitigates this but introduces additional rounds and a hyperparameter (m).
- Reported performance comparisons for SA/PT depend on the chosen annealing schedules and the number of initial configurations; different settings can shift the baselines (though DIRAC can still be plugged in to improve them).
- For very large systems, exact ground states are unknown; evaluations rely on disorder-averaged best-found energies, which may not reflect absolute optimality.
- Extension to general Ising formulations with k-body interactions is nontrivial and requires hypergraph-based modeling; the current encoder is tailored to pairwise EA interactions.
- Some architectural hyperparameters (e.g., the number of message-passing steps K and the embedding dimension d) were tuned informally; more systematic searches may yield further gains.