Neural sampling machine with stochastic synapse allows brain-like learning and inference

S. Dutta, G. Detorakis, et al.

Discover the groundbreaking Neural Sampling Machine (NSM) designed to leverage stochastic synaptic connections for approximate Bayesian inference, achieving an impressive 98.25% accuracy on MNIST image classification. This innovative hardware is the result of collaborative research by Sourav Dutta, Georgios Detorakis, Abhishek Khanna, Benjamin Grisafe, Emre Neftci, and Suman Datta.

Introduction
The study addresses how to realize brain-inspired probabilistic neural networks that can learn continually from noisy data and perform real-time inference with calibrated confidence on compact, low-power hardware. It proposes a hardware implementation of Neural Sampling Machines (NSMs), which leverage multiplicative synaptic stochasticity to enable approximate Bayesian inference, regularization, and online learning. The central hypothesis is that a hybrid synapse, comprising an FeFET-based analog weight cell in series with a stochastic selector, can physically realize the Bernoulli ("blank-out") multiplicative noise required by NSMs. The work situates itself within neuromorphic compute-in-memory (CIM) crossbar architectures and argues for the importance of always-on synaptic stochasticity, which provides autonomous weight normalization, mitigation of internal covariate shift, and Bayesian inference, all desirable properties for efficient continual-learning systems.
Literature Review
The paper builds on several strands of prior work: (1) emerging devices and materials for neuromorphic computing, including analog multi-bit synapses and bio-inspired neuron circuits; (2) biological variability and unreliable synaptic transmission (with release probabilities often in the range of ~10–50%), which inspire multiplicative synaptic noise models; (3) regularization techniques such as Dropout and DropConnect that introduce stochasticity during training, contrasted with the NSM's always-on stochasticity that enables Bayesian inference; (4) approximate Bayesian inference via Monte Carlo sampling and its relation to energy-efficient computation and communication; (5) compute-in-memory crossbar arrays using eNVMs (e.g., FeFETs, PCM, RRAM) to reduce data movement; and (6) device-level stochastic selectors (e.g., Ag/HfO2, OTS, MIEC, IMT oxides like VO2, NbOx) that provide threshold switching behavior. The work extends inherent weight normalization concepts reported in stochastic neural networks and relates them to weight/batch normalization techniques in deep learning, highlighting their online, continual-learning advantages.
Methodology
Theory: The NSM uses binary threshold neurons z_i = sgn(u_i) with pre-activation u_i = Σ_j (ξ_ij + α_i) w_ij z_j + b_i, where multiplicative Bernoulli synaptic noise ξ_ij ~ Bernoulli(p) is applied to the weights. For Bernoulli noise the firing probability becomes P(z_i = 1 | z) = 1/2 [1 + erf((p + α_i) Σ_j w_ij z_j / √(2 p (1 − p) Σ_j w_ij²))] = 1/2 [1 + erf(β (w_i · z) / ||w_i||)] = 1/2 [1 + erf(v_i · z)], where β = (p + α_i)/√(2 p (1 − p)) and v_i = β w_i / ||w_i||. This induces inherent weight normalization (decoupling the magnitude and direction of each weight vector), akin to weight/batch normalization, and yields gradients with respect to β and w that stabilize training and mitigate internal covariate shift.

Hardware architecture: The NSM is implemented in a CIM crossbar where each synapse is a series pair of (1) an FeFET analog weight cell (conductance G encodes w) and (2) a two-terminal stochastic selector providing Bernoulli-like ON/OFF sampling. The crossbar performs row-wise writes and column-wise current summation (I_out = G·V_in). During reads/inference, the input voltage V_in is chosen to lie within the selector's threshold-variability window so that, stochastically, the selector either turns ON (ξ = 1) or remains OFF (ξ = 0), effectively blanking out a subset of weights on each forward pass.

FeFET analog weight cell: A 500 nm × 500 nm FeFET (28 nm HKMG technology) is used. Write: ±V_write pulses are applied to the gate (WL), with the BL and SL at 0 V. Read: V_read = 1 V is applied at the WL and V_in at the BL with the SL grounded, giving I_out = G·V_in. For the proof of concept, an amplitude-modulation scheme is used: write pulses increasing from 2.8 V to 4.0 V with 1 μs width traverse the device between the low-resistance state (LRS) and high-resistance state (HRS) with gradual potentiation/depression. Conductance updates for potentiation and depression are fitted with ΔG = α + β(1 − e^{−(V_in − V_0)/γ}), where α, β, γ, and V_0 are fit parameters (distinct from the network quantities above) extracted across 10 devices to capture device-to-device and cycle-to-cycle variability.

Stochastic selector (Ag/HfO2): The fabricated stack is Ag/TiN/HfO2/Pt with 3 nm TiN and 4 nm HfO2. Mechanism: under bias, Ag+ migration through the HfO2 forms a conductive filament (ON); the filament ruptures when the field is reduced (OFF). DC I–V sweeps show abrupt threshold switching with substantial cycle-to-cycle variation in the threshold voltage V_T (denoted V_r in the model below). Long-pulse characterization (10 ms rise/fall, 10 ms width) quantifies the V_T distribution; prior work shows that ~28 ns switching is possible at higher trigger voltages. Measurements comprise 2000 cycles per device and statistics across 17 devices to assess variability. Stochastic readout is achieved by choosing V_in within the V_T variation window, enabling Bernoulli-like sampling of the FeFET conductance (demonstrated for both LRS and HRS).

Selector stochastic modeling: The selector threshold dynamics V_r are modeled as an Ornstein–Uhlenbeck (OU) process, dV_r = θ(μ − V_r) dt + σ dW. The parameters (μ, θ, σ) are calibrated to the experimental V_T data of the 17 devices by linear regression on the discretized (Euler–Maruyama) form of the process. The model reproduces the cycle-to-cycle variance, overall distribution, and autocorrelation of V_r.

Network and training: The architecture is 784 (input) – 300 – 300 – 300 (hidden) – 10 (softmax). Three cases are compared: (1) a deterministic MLP, (2) a theoretical NSM with ideal weights and Bernoulli synapses, and (3) a simulated hardware NSM using the FeFET weight model and OU-driven stochastic selectors.
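The per-read behavior of the hybrid synapse can be illustrated with a short simulation sketch. The Python/NumPy snippet below is a minimal illustration rather than the authors' implementation: the OU parameters (MU, THETA, SIGMA, DT), the layer shape, the read voltage, and the signed-weight encoding are placeholder assumptions. It models one stochastic crossbar read as an OU threshold update, a Bernoulli-like mask obtained by comparing V_in against each selector's instantaneous threshold, and a masked column-current summation followed by the sign nonlinearity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ornstein-Uhlenbeck model of the selector threshold V_r (Euler-Maruyama step).
# MU, THETA, SIGMA, DT are placeholder values, not the paper's fitted parameters.
MU, THETA, SIGMA, DT = 0.35, 1.0, 0.05, 1.0

def ou_step(v_r):
    """One discretized OU update: dV_r = THETA*(MU - V_r)*DT + SIGMA*dW."""
    dW = rng.normal(0.0, np.sqrt(DT), size=v_r.shape)
    return v_r + THETA * (MU - v_r) * DT + SIGMA * dW

# Hypothetical crossbar: one 784 x 300 layer of the network.
n_in, n_out = 784, 300
G = rng.uniform(1e-6, 1e-4, size=(n_in, n_out))     # FeFET conductances encoding |w_ij|
sign = rng.choice([-1.0, 1.0], size=(n_in, n_out))  # assumed signed-weight encoding
v_r = np.full((n_in, n_out), MU)                     # per-crosspoint selector threshold state

def stochastic_read(z_in, v_in=0.35):
    """One stochastic forward pass: blank-out read followed by z_i = sgn(u_i).

    v_in is assumed to lie inside the V_r variability window, so each selector
    turns ON (xi = 1) only when v_in exceeds its instantaneous threshold,
    realizing a Bernoulli-like multiplicative mask at every crosspoint.
    """
    global v_r
    v_r = ou_step(v_r)                     # thresholds drift from cycle to cycle
    xi = (v_in > v_r).astype(float)        # Bernoulli-like ON/OFF sample per synapse
    u = (xi * G * sign).T @ (z_in * v_in)  # column current summation, I_out = G * V_in
    return np.where(u >= 0, 1.0, -1.0)     # binary threshold neurons

z = rng.choice([-1.0, 1.0], size=n_in)
print(stochastic_read(z)[:10])
```

Each call draws a fresh blank-out mask, which corresponds to the always-on stochasticity used both during training and for ensemble readout at inference.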
Training uses backpropagation with a cross-entropy loss and an adapted Adam optimizer (learning rate 0.0003, betas 0.9/0.999), a batch size of 100, and 200 epochs (the full 60,000 MNIST training samples per epoch). The learning rate decays linearly after epoch 100 as 0.0003 × min{1, 2 − epoch/100}, i.e., it is constant for the first 100 epochs and then decreases toward zero by epoch 200. Every two epochs, test accuracy is evaluated on the 10,000 MNIST test samples using an ensemble of 100 stochastic forward passes. During training, backward updates use the derivative of the NSM activation (Eq. 4) together with the FeFET conductance-update model. During inference, the always-on stochasticity is preserved: on each iteration, every selector's V_r is sampled from the OU process and a Boolean mask is formed (e.g., by comparison against the mean threshold) to realize the blank-out synapses. Bayesian inference tests: to assess uncertainty estimation, MNIST digits are rotated (e.g., by up to 60°–90°) and 100 stochastic forward passes are performed per image, recording the softmax inputs/outputs and the prediction entropy H = −Σ p log p. Device fabrication and calibration details are also provided: the Ag/HfO2 selector process flow (4 nm ALD HfO2 at 120 °C, 3 nm TiN, 150 nm Ag top electrode) and OU parameter estimation via least squares on the discretized dynamics.
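To make the uncertainty readout concrete, the sketch below shows one way the Monte Carlo evaluation could be implemented under the same assumptions as the previous snippet; `stochastic_forward` is a hypothetical callable that returns softmax-layer inputs with a fresh synapse sample on every call, and the small epsilon inside the logarithm is added only for numerical safety.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def predictive_entropy(stochastic_forward, x, n_samples=100):
    """Ensemble n_samples stochastic forward passes and compute H = -sum(p * log p).

    stochastic_forward(x) is assumed to return the softmax-layer inputs (logits)
    with a fresh Bernoulli synapse sample drawn on every call, as in the
    always-on NSM readout described above.
    """
    probs = np.stack([softmax(stochastic_forward(x)) for _ in range(n_samples)])
    p_mean = probs.mean(axis=0)                      # Monte Carlo estimate of p(y|x)
    entropy = -np.sum(p_mean * np.log(p_mean + 1e-12))
    return int(p_mean.argmax()), float(entropy)      # predicted class, uncertainty

# Usage (hypothetical): low entropy on upright digits, higher entropy on
# strongly rotated ones that the network misclassifies.
# pred, H = predictive_entropy(nsm_forward, rotated_digit)
```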
Key Findings
- Hybrid stochastic synapse: Demonstrated an in silico series combination of an FeFET analog weight and an Ag/HfO2 stochastic selector that realizes Bernoulli "blank-out" multiplicative synaptic noise; the measured switching probabilities of the selector match a Bernoulli distribution.
- Selector stochasticity: Large variability windows in the threshold voltage V_T are observed over 2000 cycles and across 17 devices; the OU-process model accurately reproduces the V_T distributions and autocorrelation.
- FeFET analog synapse: Achieved gradual, multilevel conductance updates using 2.8–4.0 V, 1 μs pulses; potentiation/depression are modeled with a closed-form fit capturing device variability.
- Image classification: The simulated hardware NSM attains high MNIST accuracy (reported 98.25%), comparable to a regularized deterministic MLP; the theoretical NSM model slightly outperforms the MLP.
- Self-normalization and stability: The NSM exhibits inherent weight normalization, yielding narrower weight distributions and more stable activations (reduced internal covariate shift) than an unregularized MLP, with stability comparable to an MLP using explicit regularization.
- Bayesian inference and uncertainty: With always-on stochastic synapses, the NSM performs Monte Carlo-style inference. For rotated digits, prediction entropy is near zero on correct classifications and increases substantially on misclassifications, reflecting calibrated uncertainty; conventional MLPs show no such uncertainty (entropy ~0).
- System integration: The hybrid approach is compatible with CIM crossbars, enabling selective stochastic reads at each crosspoint (I_out = G·V_in with Bernoulli masking via the selector's ON/OFF state).
Discussion
The results validate that device-level stochasticity can be harnessed to instantiate NSMs that perform probabilistic inference and continual learning behaviors on hardware. The selector’s Bernoulli-like threshold variability enforces always-on multiplicative noise, which theoretically induces weight normalization and mitigates internal covariate shift. Empirically, the simulated hardware-NSM maintains tighter weight and activation distributions and achieves competitive MNIST accuracy versus deterministic MLPs while adding the capability to quantify predictive uncertainty through ensemble sampling at inference. The OU-based selector model bridges device physics and network-level behavior, enabling scalable simulation and design. Overall, the work demonstrates a practical path to probabilistic neuromorphic hardware leveraging CIM crossbars and hybrid synapses, with potential gains in energy efficiency, robustness to noise, and online learning capability.
Conclusion
This work introduces a hardware-compatible Neural Sampling Machine leveraging hybrid stochastic synapses formed by FeFET analog weight cells and Ag/HfO2 threshold selectors. The approach realizes Bernoulli multiplicative synaptic noise that confers inherent weight normalization, reduces internal covariate shift, and enables Bayesian inference with uncertainty estimates. Network-level simulations with experimentally grounded device models achieve high MNIST accuracy (~98.25%), validate calibrated uncertainty under distribution shift (rotated digits), and confirm the match between measured and modeled selector stochasticity. Future directions include: (1) exploring alternative device stacks for both weights (e.g., PCM, RRAM) and selectors (e.g., OTS, MIEC, IMT oxides like VO2 and NbOx) to improve endurance and dynamics; (2) addressing on-chip training challenges for NSMs (bidirectional/symmetric connections, activation derivatives, and norm computations) via approximations such as feedback alignment, local loss functions, surrogate/straight-through gradient estimators, or enforcing constant-norm constraints; and (3) scaling to larger datasets and architectures and experimentally integrating full crossbar prototypes to quantify energy/latency benefits and reliability over time.
Limitations
- Device non-idealities: FeFETs exhibit limited dynamic range, nonlinearity, asymmetry between potentiation and depression, and device-to-device/cycle-to-cycle variability that constrain performance versus ideal NSMs.
- Selector endurance and variability: While Ag/HfO2 selectors show endurance >10^8 cycles, suitable for inference, on-chip training may require higher-endurance IMT-based selectors (VO2, NbOx). Threshold distributions necessitate careful V_in selection within the variability windows.
- Training on-chip: Implementing full backpropagation locally on crossbars is challenging due to requirements for bidirectional/symmetric connections, activation derivatives, and computing weight norms (||w||) per neuron; approximations may be needed and can introduce accuracy trade-offs.
- Uncertainty estimation cost: Bayesian-style inference relies on multiple stochastic forward passes (e.g., 100), incurring additional latency/energy during inference compared with single-pass deterministic models.
- Scope: The primary demonstrations are simulations calibrated with device measurements; a fully integrated large-scale hardware prototype and comprehensive energy/throughput measurements are not presented.