Computer Science
Solving Boltzmann optimization problems with deep learning
F. Knoll, J. Daly, et al.
The paper addresses the challenge of designing and optimizing Ising-based, non–von Neumann logic circuits that operate near thermodynamic energy limits but are fundamentally nondeterministic. Traditional CMOS scaling is reaching physical limits, motivating alternative computing paradigms such as Ising machines. In Ising systems, spins take values in {−1, +1} and interact via a Hamiltonian; system behavior is governed by a Boltzmann (Gibbs) distribution. The reverse Ising problem considered here seeks Hamiltonian parameters (local fields and pairwise couplings) such that specified inputs (fixed spins) lead to correct outputs (variable spins) with high probability. The objective is reframed from energy minimization to maximizing the Boltzmann probability of desired states across input patterns while suppressing the probabilities of incorrect outputs. A key obstacle is the computational intractability of evaluating partition functions and handling nonconvex, nondifferentiable objectives over exponentially many states, which impedes gradient-based optimization and limits scalability. The authors propose a machine learning approach that learns to predict the optimal Boltzmann objective efficiently, enabling exploration of larger Ising circuits.
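To make the intractability concrete: the Boltzmann distribution over spin states can be evaluated exactly only for very small N, because the partition function sums over all 2^N configurations. A minimal brute-force sketch (the field and coupling values are arbitrary placeholders, with β = 1 as in the paper):

```python
import itertools

import numpy as np

def energy(s, h, J):
    # Ising Hamiltonian: H(s) = sum_i h_i s_i + sum_{i<j} J_ij s_i s_j
    return h @ s + s @ np.triu(J, 1) @ s

def boltzmann_probs(h, J, beta=1.0):
    # Exact Boltzmann distribution; feasible only for small N (2^N states).
    N = len(h)
    states = np.array(list(itertools.product([-1, 1], repeat=N)))
    E = np.array([energy(s, h, J) for s in states])
    w = np.exp(-beta * (E - E.min()))  # shift energies for numerical stability
    return states, w / w.sum()         # the denominator is the partition function

# Toy 3-spin system with arbitrary coefficients
h = np.array([0.5, -0.2, 0.1])
J = np.zeros((3, 3))
J[0, 1], J[1, 2] = -1.0, 0.4
states, probs = boltzmann_probs(h, J)
```

Doubling N squares the number of states, which is why the paper replaces direct evaluation with a learned surrogate.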
The work situates itself at the intersection of beyond-CMOS computing and statistical-mechanics models of computation. Prior studies document the end of Moore's law scaling and the need for non–von Neumann architectures, with Ising-based hardware explored for energy-efficient computation and probabilistic processing-in-memory. The Potts model (which generalizes the Ising model to q ≥ 2 states) has broad applications in physics and biology; surveys cover Gibbs measures and inverse problems. Computing Boltzmann probabilities and partition functions is generally intractable, and the classic Ising ground-state problem is NP-complete. Numerous approximation methods exist (e.g., MCMC-based, symmetric-function approximations, quantum approximations of partition functions). For model design, the literature includes inverse Ising approaches and recent practical Ising circuit designs with minimal auxiliary spins. On the ML side, random forests (RFs) are well established for regression and classification, and the DJINN framework initializes deep neural networks from decision trees, improving training efficiency and the capacity to model nonlinear functions. This paper builds on these strands by casting Boltzmann probability optimization as a supervised learning task and leveraging RFs and DJINN-initialized DNNs to predict optimal probabilities for design exploration.
Problem formulation: The system has N spins partitioned into fixed inputs (n), variable outputs (m), and auxiliary spins (a), with N = n + m + a. The Hamiltonian is linear in spins with local fields and pairwise couplings. The reverse Ising problem seeks coefficients so that, for specified desired states s = (u, v, a) over multiple inputs u, the desired outputs v have lower energy (and thus higher Boltzmann probability) than any incorrect outputs t ≠ v. Because auxiliary spins can vary per desired state, the constraints become nonlinear and nonconvex, and the number of inequality constraints and auxiliary configurations grows rapidly (doubly exponentially in a and the number of desired states).
Objective and transformation: The target is not just energy ordering but maximizing the probability mass on desired outputs. The authors define an objective p(a(1),…,a(l)) that depends on the auxiliary arrays and the Hamiltonian coefficients, aiming to minimize the maximum probability of undesired states across all desired inputs. Direct optimization is numerically unstable and nondifferentiable. To render the problem tractable for gradient-based solvers, they: (1) use a log transform on probabilities (minimizing the maximum log-probability of undesired states), which amplifies informative differences when probabilities are near one; (2) replace the nondifferentiable max with a smooth log-sum-exp (softmax) approximation parameterized by a large scale factor; and (3) apply standard log-sum-exp and vectorization techniques for numerical stability. This yields a continuously differentiable surrogate objective f(y), and the final probability estimate is computed via p(a) = 1 − exp(min_y f(y)). They fix temperature (βT = 1) throughout.
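The smoothing step can be illustrated with a short sketch: the nondifferentiable max is replaced by a scaled log-sum-exp, computed stably with scipy.special.logsumexp. The scale value below is an arbitrary stand-in for the paper's large scale factor:

```python
import numpy as np
from scipy.special import logsumexp

def smooth_max(x, scale=50.0):
    # Differentiable surrogate for max(x): (1/c) * log(sum_i exp(c * x_i)).
    # It overestimates the true max by at most log(len(x)) / scale.
    return logsumexp(scale * np.asarray(x)) / scale

x = np.array([-2.3, -0.7, -1.1])  # e.g. log-probabilities of undesired states
approx = smooth_max(x)
```

As the scale factor grows, the surrogate tightens toward the true max while remaining smooth, which is what makes gradient-based solvers applicable to the min–max objective.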
Training data generation: For each auxiliary array a, a Sequential Least Squares Programming (SLSQP) solver (SciPy) minimizes the smooth objective f(y) over the Hamiltonian coefficients y (local fields h and couplings J) within empirically chosen dynamic ranges, providing broad probability coverage. Targets p(a) are then computed using the transformed expression. Four problem instances corresponding to small logic-multiplier circuits are studied: (1) N=9, n=4, a=1; (2) N=11, n=5, a=1; (3) N=14, n=6, a=2; (4) N=15, n=6, a=3. For Problems 1–2 the auxiliary space is exhausted; for Problems 3–4, 10,000 samples are drawn. To accelerate SLSQP in high dimensions, an explicit gradient implementation (NumPy) is provided; the default numerical gradient is also benchmarked.
Models: Supervised regression maps auxiliary arrays a ∈ {±1}^k to p(a) ∈ [0, 1]. Two model families are trained:
- Random forest regression: Ensembles of 100 decision trees with problem-specific maximum depths (up to 27), using the discrete auxiliary spin entries as split features. RFs provide strong baseline accuracy but are piecewise-constant predictors.
- Deep neural networks via DJINN: RFs with only 3 trees and max depth 10 initialize DNN architectures that then train to capture nonlinearities beyond the RF. An example DNN for Problem 1 grows to layers of sizes [16, 18, 22, 30, 46, 78, 142, 270, 526, 1038, 2062, 4101, 8122, 15504, 24273, 27897, 1], yet overall parameter counts remain below those implied by the full 100-tree RFs.
Evaluation: Mean squared error (MSE) on held-out test sets is reported for both RF and DJINN across all four problems. Performance benchmarks compare wall-clock times to compute the min–max Boltzmann probability for ensembles of 100 auxiliary arrays using SLSQP (with approximate and explicit gradients), RF, and DJINN (GPU).
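A minimal sketch of the label-generation-plus-regression pipeline described above. The quadratic f and the weight vector w below are hypothetical placeholders for the paper's smoothed Boltzmann objective; only the pipeline shape (SLSQP labels, then a random forest regressor over auxiliary arrays) mirrors the paper:

```python
import itertools

import numpy as np
from scipy.optimize import minimize
from sklearn.ensemble import RandomForestRegressor

k = 4                                # auxiliary spins per array (toy size)
w = np.array([0.8, -0.5, 0.3, 1.1])  # arbitrary weights coupling a to f
rng = np.random.default_rng(0)

def f(y, a):
    # Placeholder smooth objective; the paper's f(y) is the log-sum-exp
    # surrogate of the max log-probability of undesired states.
    return np.sum(y ** 2) - (w @ a) * y[0]

def label(a, dim=4, bound=2.0):
    # SLSQP over coefficients y within a fixed dynamic range, then the
    # paper's transformation p(a) = 1 - exp(min_y f(y)).
    res = minimize(f, x0=rng.uniform(-1, 1, dim), args=(a,),
                   method="SLSQP", bounds=[(-bound, bound)] * dim)
    return max(0.0, 1.0 - np.exp(res.fun))  # guard tiny numerical overshoot

# Exhaust the auxiliary space {-1, +1}^k and fit a random forest regressor.
A = np.array(list(itertools.product([-1, 1], repeat=k)))
labels = np.array([label(a) for a in A])
rf = RandomForestRegressor(n_estimators=100, max_depth=10, random_state=0)
rf.fit(A, labels)
preds = rf.predict(A)
```

Once fitted, the regressor replaces the expensive inner SLSQP solve: new auxiliary arrays are scored by a single forest evaluation instead of a full optimization run.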
- Accuracy (MSE on test sets):
  • Random forest regressor: Problem 1: 0.0001018; Problem 2: 0.000473; Problem 3: 0.019715; Problem 4: 0.021173.
  • DJINN DNN: Problem 1: 0.000262; Problem 2: 0.000910; Problem 3: 0.0206542; Problem 4: 0.0203343.
  RFs and DJINN achieve comparable accuracy, with MSE ≈ 0.02 for the larger problems and ≲ 10^-3 for the smaller ones.
- Performance (Problem 4, 100 evaluations):
  • SLSQP (approximate gradient): 4 days 5:01:24.4 total; ~60.6 minutes per value.
  • SLSQP (explicit gradient): 5:08:53.7 total; ~3.09 minutes per value.
  • Random forest regressor: 31.987 ms for all 100 evaluations (~0.32 ms per value) after 258 s of training.
  • DJINN DNN: 28.5 ms for all 100 evaluations (~0.285 ms per value) after 251 s of training.
  These results show orders-of-magnitude speedups for the ML models over direct optimization while maintaining low prediction error. The explicit gradient significantly accelerates SLSQP versus numerical gradients but remains much slower than ML inference. The ML approach thus enables rapid, scalable evaluation of Ising design parameters for larger circuits.
By transforming the Boltzmann probability optimization into a smooth, differentiable surrogate and learning a mapping from auxiliary spins to the optimized probability, the authors effectively bypass the intractable partition function computation and nondifferentiability that hinder traditional methods. The trained RF and DJINN models provide accurate predictions of optimal Boltzmann probabilities, enabling fast exploration of the design parameter space. This directly addresses the research objective of optimizing Ising system parameters so that desired outputs occur with high probability on nondeterministic hardware. The substantial runtime reductions open the door to studying larger spin configurations and more complex Ising circuits than are feasible with solver-based approaches, supporting the design of better ground-state solutions for the reverse Ising problem and, ultimately, more energy-efficient non–von Neumann computing architectures.
The paper introduces a novel framework that (1) recasts a Boltzmann probability optimization for reverse Ising design into a supervised learning problem via stable transformations, (2) generates high-quality training data using SLSQP with explicit gradients, and (3) trains random forest and DJINN-based deep neural networks to predict optimized probabilities accurately and efficiently. Experiments on four Ising multiplier configurations demonstrate low MSE and dramatic speedups over state-of-the-art solvers. This capability enables exploration of larger and more complex Ising circuits and suggests that deep learning surrogates can guide the design of Ising-based computing hardware with fewer spins and reduced error. Future work could expand to larger N, broader circuit topologies, alternative temperature regimes, and integration with end-to-end hardware-in-the-loop optimization.
- Scope: Empirical evaluation covers four specific multiplier-like Ising circuit configurations and fixed temperature (βT=1); generalization to other architectures, temperatures, or noise models is not demonstrated within the paper.
- Data generation cost: Training relies on solver-generated labels; although the explicit gradient accelerates SLSQP, generating large datasets for bigger systems may remain expensive.
- Objective approximation: The optimization uses log and softmax (log-sum-exp) approximations to handle nondifferentiability and numerical stability, introducing surrogate bias relative to the exact min–max Boltzmann objective.
- Model dependence: Learned surrogates are trained on specified dynamic ranges of Hamiltonian coefficients; extrapolation performance outside these ranges is not evaluated.
- Accuracy–performance tradeoff: While MSE is low (≈0.02 for the larger problems), predictions are approximations and may require verification against the exact objective for safety-critical designs.
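The surrogate bias introduced by the smooth-max step is at least controllable: for x ∈ ℝ^n and scale factor c > 0, the standard log-sum-exp bounds give

\max_i x_i \;\le\; \frac{1}{c}\log\sum_{i=1}^{n} e^{c x_i} \;\le\; \max_i x_i + \frac{\log n}{c},

so the gap decays as O((log n)/c) and can be driven down by the large scale factor the authors use, at the cost of worsening numerical conditioning.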