Noise-injected analog Ising machines enable ultrafast statistical sampling and machine learning

F. Böhm, D. Alonso-Urquijo, et al.

Research by Fabian Böhm, Diego Alonso-Urquijo, Guy Verschaffelt, and Guy Van der Sande shows how analog Ising machines can perform ultrafast statistical sampling for neural network training. By injecting noise, the authors sample Boltzmann distributions accurately and significantly outpace traditional software methods.
Introduction

The study addresses a central bottleneck in applying analog Ising machines to machine learning: efficient Boltzmann sampling at controllable temperatures. Analog Ising machines naturally minimize the Ising Hamiltonian at very low effective temperatures and thus excel at combinatorial optimization, but training stochastic generative neural networks (e.g., Boltzmann machines) requires accurate sampling from Boltzmann distributions at arbitrary temperatures. Existing approaches based on trapping in local minima or complex temperature control are inaccurate, cumbersome, and slow compared to software-based Markov chain Monte Carlo (MCMC). The authors propose injecting broadband noise of controlled variance into analog Ising machines to emulate thermal equilibrium at a tunable temperature, enabling continuous, fast, and accurate Boltzmann sampling suitable for machine learning and other applications.

Literature Review

The paper situates analog Ising machines within broader efforts to find efficient alternatives to von Neumann computing for neural network training and optimization, referencing quantum annealers and hybrid analog-digital approaches. Prior analog Ising machines (opto-electronic, optical parametric oscillators, electronic oscillators) have solved large optimization problems but lacked efficient sampling. Previous sampling strategies relied on discontinuous operation (initialization, convergence, sampling) and on exploiting local minima, and they suffer from inaccuracy, complex temperature estimation and control, and substantial overhead. The literature also notes that contrastive divergence is a common approximate training method for restricted Boltzmann machines (RBMs) but can introduce biases compared to full Boltzmann sampling. This work builds on and generalizes the role of noise in analog systems, proposing controlled noise injection as a universal, accurate, and high-speed sampling mechanism.

Methodology

Experimental platform: a time-multiplexed opto-electronic Ising machine combining an analog nonlinear optical system (DFB laser at 1.55 μm, lithium niobate Mach-Zehnder modulator with 13 GHz bandwidth, 150 MHz photodiode) with an FPGA for coupling. Spins are represented by analog amplitudes x_m exhibiting bistability; binary spins are σ_m = sign(x_m). Time-multiplexing generates N spins sequentially; the FPGA demultiplexes, performs matrix–vector multiplication (implementing J), adds biases b_m, and creates the feedback signal f_m[k] = α x_m[k] + β(∑_n J_{mn} x_n[k] + x_0 + b_m), rescaled by the saturation amplitude x_sat = 0.7 V. Gaussian white noise of standard deviation δ is injected via an analog noise source (AWG + 300 MHz amplifier), digitized by an ADC, and summed on the FPGA; comparable results were verified with an FPGA pseudo-random source. Typical operating parameters for demonstrations include α ∈ [0.5, 1.2], β ∈ [0.1, 0.75], and δ up to ~1 V.

Sampling tasks: (1) Boltzmann sampling on a 2D antiferromagnetic (J = -1) square lattice with N = 100; compare energy distributions from continuous noise-induced sampling (one sample per iteration; 5000 samples) to software Metropolis-Hastings (MCMC). (2) Discontinuous sampling baseline: 500 independent runs of 100 iterations each, with high gain α = 1.5 and small noise δ = 0.1 V, sampling the final states. Accuracy is quantified by the Kullback–Leibler divergence D_KL. (3) Temperature calibration using a 4-spin antiferromagnetic ring with three degenerate energy levels; run 1000 iterations at each δ and compare measured level occupations to analytical Boltzmann probabilities to establish T ∝ δ^2 and extract the slope as a function of the coupling β.

Neural network experiments: Map RBMs to an Ising Hamiltonian (visible and hidden spins; couplings J ↔ w; biases mapped to Ising biases). Single-neuron activation probabilities: measure σ versus bias at multiple noise levels and fit a logistic activation at each T.
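The analytical reference used in the temperature calibration (sampling task 3) can be reproduced by brute-force enumeration. The sketch below is illustrative rather than the authors' code: it assumes the Hamiltonian H = -∑ J σ_i σ_{i+1} summed over the ring bonds with k_B = 1, and the function name `ring_level_probabilities` is hypothetical.

```python
import itertools
import math

def ring_level_probabilities(temperature, n_spins=4, coupling=-1.0):
    """Exact Boltzmann occupation of each energy level of an Ising ring.

    Assumption: H = -sum over ring bonds of J * s_i * s_{i+1}, with k_B = 1.
    """
    weights = {}
    for spins in itertools.product((-1, 1), repeat=n_spins):
        # Sum the bond products around the closed ring.
        bond_sum = sum(spins[i] * spins[(i + 1) % n_spins] for i in range(n_spins))
        energy = -coupling * bond_sum
        # Accumulate the Boltzmann weight of every state sharing this energy.
        weights[energy] = weights.get(energy, 0.0) + math.exp(-energy / temperature)
    partition = sum(weights.values())
    return {e: w / partition for e, w in sorted(weights.items())}

probs = ring_level_probabilities(temperature=2.0)
for energy, p in probs.items():
    print(f"E = {energy:+.0f}: p = {p:.4f}")
```

With J = -1 the 16 states fall into three levels (E = -4, 0, +4), matching the three degenerate levels named above. Comparing measured level occupations against these probabilities at each δ is what would let T be read off as a function of δ^2.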
Small RBM (16 hidden, 16 visible): measure neuron activation probabilities across a random weight/bias network and compare to MCMC (Metropolis-Hastings).

Unsupervised training task: RBM with 100 hidden and 64 visible neurons trained on 8×8 grayscale handwritten digits (digits dataset, augmented by shifts to 7188 training and 1797 test samples). Training uses minibatches of 100 images and learning rate ε = 0.2; activation probabilities are approximated by 1000 continuous samples per iteration at fixed α = 0.5, β = 0.75, δ = 0.6 V. A logistic regression layer is trained on the RBM features after each unsupervised step. Pseudolikelihood C and classification accuracy η are evaluated over iterations and compared to MCMC-based training (Metropolis-Hastings with 20,000 iterations per step) and to contrastive divergence.

Scalability studies (simulation): simulate spatially multiplexed, time-continuous analog Ising machines to assess scaling to N up to 8192 on sparse random antiferromagnetic graphs (each node has degree 8). The simulations use a clipped nonlinearity model approximating the experimental system, with fixed α = 0.3, β = 0.2 (unless varied) and δ tuned to match T = 2. Reference distributions are obtained by MCMC with 20 million steps, sampling every 100 steps. D_KL is evaluated as a function of N and of temperature, and repeated-run variability is compared for MCMC.

Sampling rate estimation for analog systems with bandwidth B: compute the autocorrelation time τ_cor of the energy E(t), defined as the lag at which the autocorrelation C(τ) decays to 1/e, to infer the independent sample rate 1/τ_cor for B ∈ {100 MHz, 1 GHz, 10 GHz}. Compare iteration counts Z and CPU runtime between MCMC and forward-Euler simulations of the Ising machine for generating independent samples.

Generality across gain–dissipative systems: simulate three nonlinearities (clipped, i.e. the opto-electronic model; polynomial; and sigmoid) on a 2D Ising model (N = 100) across temperatures, and for RBM training. Integration step widths are h = 0.1 (sampling) or h = 1 (RBM training); parameters are α ≈ 0.8–0.9, β ≈ 0.1, with δ tuned per model.
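The software baseline and the accuracy metric used throughout the methodology can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes periodic boundaries, H = -J ∑ σ_i σ_j over nearest-neighbour bonds, k_B = 1, and a histogram-based D_KL whose direction and `eps` regularization are choices made here. All function names are hypothetical, and since no hardware samples are available, two independent MCMC runs stand in for the two distributions being compared.

```python
import numpy as np

def lattice_energy(spins, coupling=-1.0):
    """Energy of a periodic 2D lattice, H = -J * sum over nearest-neighbour bonds."""
    right = np.roll(spins, -1, axis=1)
    down = np.roll(spins, -1, axis=0)
    return -coupling * float(np.sum(spins * (right + down)))

def metropolis_energies(side=10, temperature=2.0, sweeps=2000, coupling=-1.0, seed=0):
    """Single-spin-flip Metropolis sampling; returns the energy after each sweep."""
    rng = np.random.default_rng(seed)
    spins = rng.choice([-1, 1], size=(side, side))
    energy = lattice_energy(spins, coupling)
    energies = np.empty(sweeps)
    for sweep in range(sweeps):
        for _ in range(side * side):
            i, j = rng.integers(0, side, size=2)
            neighbours = (spins[(i + 1) % side, j] + spins[(i - 1) % side, j]
                          + spins[i, (j + 1) % side] + spins[i, (j - 1) % side])
            delta = 2.0 * coupling * spins[i, j] * neighbours  # energy change if flipped
            # Metropolis rule: always accept downhill moves, uphill with exp(-delta/T).
            if delta <= 0 or rng.random() < np.exp(-delta / temperature):
                spins[i, j] = -spins[i, j]
                energy += delta
        energies[sweep] = energy
    return energies

def kl_divergence(samples_p, samples_q, bin_edges, eps=1e-12):
    """Histogram estimate of D_KL(P || Q); eps guards against empty Q bins."""
    p, _ = np.histogram(samples_p, bins=bin_edges)
    q, _ = np.histogram(samples_q, bins=bin_edges)
    p = p / p.sum()
    q = q / q.sum()
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / (q[mask] + eps))))

# Stand-in comparison: two independent MCMC runs on the N = 100 lattice at T = 2.
bin_edges = np.arange(-202, 206, 4)  # one bin per allowed energy of the 10x10 lattice
reference = metropolis_energies(sweeps=500, seed=0)[100:]  # discard burn-in sweeps
candidate = metropolis_energies(sweeps=500, seed=1)[100:]
print("D_KL between the two runs:", kl_divergence(candidate, reference, bin_edges))
```

In an actual comparison, `candidate` would hold energies computed from the Ising machine's sampled spin states, and `reference` would come from a much longer MCMC run so that histogram noise does not dominate D_KL.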

Key Findings
  • Noise-induced continuous sampling accurately approximates Boltzmann energy distributions. For N=100 antiferromagnetic 2D lattice at T=2, the Ising machine distribution closely matches MCMC with D_KL = 0.04, whereas discontinuous sampling yields D_KL = 0.16.
  • Temperature control: For a 4-spin antiferromagnetic ring, measured occupation probabilities across noise variances match analytical Boltzmann distributions assuming a linear T ∝ δ^2 relationship; D_KL < 0.01 across the full temperature range. The slope of T vs δ^2 depends linearly on coupling strength β and enables a priori temperature setting by noise power.
  • Neuron activations: Single-spin activation probabilities follow logistic functions with temperature controlled by δ; multi-neuron RBM activations (16 hidden, 16 visible) at T=1 agree well with MCMC.
  • RBM training on digits (100 hidden, 64 visible): Pseudolikelihood and accuracy curves for Ising machine-based sampling track MCMC-based training. Reported maxima: C_IM = 17.6, C_MCMC = 19.9; classification accuracies η_IM = 0.943, η_MCMC = 0.938; contrastive divergence achieves η_CD = 0.948 and C_CD = 19.9. The RBM plus logistic regression improves accuracy over a logistic regression baseline (0.77 to 0.943).
  • Scalability and accuracy: Across sparse random graphs with N up to 8192 at T=2, average D_KL remains well below 0.1 for N ≥ 128 (for comparison, D_KL ≈ 0.09 at N=64 and D_KL ≈ 0.02 at N=8192). Some small-N instances show elevated D_KL due to analog mapping errors affecting ground-state occupancy. Temperature sweeps show a robust linear mapping between δ^2 and T for both N=64 and N=8192. At low T, accuracy deteriorates due to trapping in local minima; at higher T, D_KL can reach 0.01. For N=8192, Ising machine sampling shows lower D_KL than MCMC for 1 ≤ T ≤ 3.
  • Speed estimates (simulation): Independent sample rates scale linearly with analog bandwidth and are largely independent of N due to parallel noise injection and efficient mixing. For N=8192 at T=2: 1/τ_cor ≈ 4.9 GS/s (10 GHz), 667 MS/s (1 GHz), 19 MS/s (100 MHz). MCMC requires ~30,000 steps for independent samples at N=8192; even at 10 ps/step on parallel FPGAs, this is ~6× slower than a 100 MHz analog Ising machine, and ~1000× slower than a 10 GHz machine.
  • Software simulations of analog Ising machines require far fewer iterations than MCMC; for large N, CPU runtimes are up to 300× shorter than MCMC for generating independent samples.
  • Generality: Polynomial, clipped, and sigmoid gain–dissipative models reproduce MCMC energy–temperature curves (including at T_crit ≈ 2.27 in 2D Ising). Linear T–δ^2 holds across models. RBM training performance is similar across models, with maximum accuracies ~0.938–0.953.
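The speed comparison in the findings above follows from simple arithmetic on the quoted figures. The sketch below only re-derives the ratios from the numbers reported in this summary (30,000 MCMC steps per independent sample, an assumed 10 ps per step, and the simulated analog sample rates); the variable names are illustrative.

```python
# Figures quoted in the findings for N = 8192 at T = 2.
mcmc_steps_per_sample = 30_000   # MCMC steps needed per independent sample
mcmc_step_time_s = 10e-12        # assumed 10 ps per step on parallel FPGAs

# Independent-sample rate of MCMC in samples per second.
mcmc_rate = 1.0 / (mcmc_steps_per_sample * mcmc_step_time_s)

# Simulated independent-sample rates 1/tau_cor of the analog machine, by bandwidth.
analog_rates = {"100 MHz": 19e6, "1 GHz": 667e6, "10 GHz": 4.9e9}

speedups = {bandwidth: rate / mcmc_rate for bandwidth, rate in analog_rates.items()}

print(f"MCMC: {mcmc_rate / 1e6:.2f} MS/s")
for bandwidth, factor in speedups.items():
    print(f"{bandwidth}: {factor:.1f}x faster than MCMC")
```

Each independent MCMC sample takes 30,000 × 10 ps = 300 ns, so the ratio at 100 MHz comes out just under 6 and the ratio at 10 GHz is on the order of 1000, in line with the factors quoted above.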
Discussion

Injecting controlled broadband noise into analog Ising machines enables continuous Boltzmann sampling with accurate control of the effective temperature via a simple, empirically linear T–δ^2 relation. This approach overcomes key drawbacks of discontinuous sampling (initialization/equilibration overhead and difficult temperature control), allowing samples to be drawn at rates near the analog bandwidth and with accuracy comparable to MCMC. The method generalizes across different analog implementations (opto-electronic, electronic, optical parametric oscillators) modeled by distinct nonlinearities, underscoring its universality. In machine learning, replacing software-based MCMC with noise-induced analog sampling closes an efficiency gap for training Boltzmann machines, as shown experimentally on an RBM for digit recognition, where accuracy and pseudolikelihood are comparable to those of digital baselines. Simulation studies suggest that sampling accuracy scales favorably to thousands of spins and that analog machines can generate independent samples orders of magnitude faster than MCMC, with accuracy competitive with or surpassing MCMC at moderate temperatures for large problems. Beyond ML, ultrafast Boltzmann sampling positions analog Ising machines as powerful samplers for domains such as finance and drug discovery, and as accelerators for other metaheuristics (e.g., parallel tempering).

Conclusion

The work introduces noise-induced sampling as a universal method to enable ultrafast, accurate Boltzmann sampling on analog Ising machines by injecting controlled analog noise. Experimentally, a time-multiplexed opto-electronic Ising machine achieves accurate energy distributions and effectively trains RBMs with performance comparable to software-based approaches. Simulations on spatially multiplexed systems demonstrate scalability to thousands of spins, a simple linear mapping between noise power and temperature, and sampling rates in the GS/s range limited primarily by analog bandwidth, offering orders-of-magnitude speedups over MCMC. The approach generalizes across different gain–dissipative nonlinearities and supports practical ML training and other sampling-intensive applications. Future work could explore higher-bandwidth fully analog realizations, integration with large-scale optical/electronic platforms, improved low-temperature sampling strategies to mitigate trapping, and application to broader probabilistic models and scientific domains.

Limitations
  • Accuracy degrades at low temperatures due to trapping in local energy minima, similar to MCMC, leading to variability across runs and elevated D_KL.
  • For small problem sizes, analog mapping imperfections (e.g., inhomogeneous spin amplitudes and clipping) can reduce ground-state occupancy and sampling accuracy.
  • The time-multiplexed experimental system has internal voltage and bandwidth constraints (e.g., clipping at ±0.7 V, limited photodiode bandwidth), which can affect dynamics and speed compared to fully analog, spatially multiplexed systems.
  • Temperature control requires calibration of the linear T–δ^2 slope for each system and parameter set (e.g., dependence on β), though it remains simple compared to discontinuous methods.
  • Reported speedups for analog hardware are based on simulations and bandwidth assumptions; realizing GS/s sampling rates depends on implementing high-bandwidth, low-latency analog components and scalable coupling hardware.