Powering AI at the edge: A robust, memristor-based binarized neural network with near-memory computing and miniaturized solar cell

F. Jebali, A. Majumdar, et al.

Discover the groundbreaking work by Fadi Jebali and colleagues on memristor-based neural networks that can power AI autonomously using energy harvesters. Their innovative approach utilizes a miniature solar cell to enable digital near-memory computing that adapts to varying lighting conditions. This research promises energy-efficient solutions for intelligent sensors in various applications.

Introduction
The study addresses the challenge of powering AI at the extreme edge where energy is scarce and unstable. Conventional AI workloads are energy intensive and often relegated to cloud or fog due to power constraints. Memristor-based systems promise drastic energy reductions, but many rely on analog in-memory computing that needs tightly regulated supplies—ill-suited to energy harvesters whose output fluctuates. The research question is whether a robust, calibration-free binarized neural network (BNN) implemented with memristors and digital near-memory computing can operate directly from an unregulated energy harvester, maintaining functionality and acceptable accuracy as power conditions vary. The purpose is to demonstrate a memristor-based BNN that is resilient to supply fluctuations and can be directly powered by a miniature wide-bandgap solar cell, enabling self-powered edge AI. The significance lies in enabling intelligent, battery-less sensors with robust on-device inference under variable energy availability.
Literature Review
Prior work has established memristor-based and other emerging-memory compute-in-memory approaches that significantly reduce AI energy consumption, including analog crossbar-based accelerators. However, analog implementations typically require precise supply regulation and calibration to mitigate device variability and circuit sensitivities. Differential RRAM-based BNNs and logic-in-memory strategies have been proposed to enhance robustness. Energy harvesting for IoT/edge devices is well studied, but integrating harvesters directly with AI processors commonly requires power management units that add area, complexity, and losses. The present work builds on differential 2T2R memristor schemes and XNOR-augmented precharge sense amplifiers to improve resilience without error-correcting codes or calibration, aiming to tolerate the supply fluctuations characteristic of harvesters; it contrasts this approach with less robust analog schemes that require calibration.
Methodology
Hardware design and fabrication: The team fabricated a hybrid CMOS/memristor BNN system in a low-power 130 nm process with five metal layers. Hafnium-oxide memristors (TiN/HfOx/Ti/TiN stacks with 10 nm HfOx, 10 nm Ti, and ~300 nm diameter) replace vias between metal layers M4 and M5. The system comprises four arrays of 8,192 memristors each (32,768 in total), configurable either as two layers with 116 inputs and 64 outputs or as a single 116-input, 128-output layer. A smaller die with one 8,192-memristor module and flexible periphery access was also fabricated. The arrays implement a 2T2R scheme: each synaptic weight is stored non-volatilely in two memristors programmed complementarily, one in the low-resistance state (LRS) and one in the high-resistance state (HRS), with neuron thresholds stored in dedicated rows.

Circuit architecture: The design uses digital near-memory computing. Custom XNOR-augmented precharge sense amplifiers (XPCSA) read the two complementary memristors simultaneously and compute an XNOR with the binary input, providing differential robustness to device variability and supply fluctuations. Population-count units (integer digital popcount) and decrementing neuron-threshold registers are replicated per output neuron and placed next to the arrays to minimize data movement. Only the final binarized outputs (the sign of each register) are transmitted off-array.

Power and high-voltage handling: Forming and programming require elevated voltages (up to ~4.5 V). The periphery therefore includes level shifters built with thick-oxide devices to sustain these voltages during forming and programming. After programming, the high-voltage pads can be tied to the nominal digital VDD (~1.2 V). A power management unit and a finite state machine control forming, programming, and inference sequencing.

Operation and pipeline: Inference is pipelined (details in Supplementary Note 3). Thresholds are first read into the local neuron registers. Inputs are then applied sequentially, and the XPCSA outputs feed the popcount/decrement logic. After all inputs have been processed, the sign of each threshold register yields the neuron activation, as sketched below.
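To make this dataflow concrete, the following Python sketch behaviorally models one binarized layer as it is described above: an XNOR of each binary input with its complementary 2T2R weight, per-neuron popcount logic that decrements a local threshold register, and a final sign read-out. The {0, 1} encoding, the sign convention, and all function and variable names are illustrative assumptions, not the chip's actual RTL.

import numpy as np

def bnn_layer_inference(inputs, weights, thresholds):
    """Behavioral sketch of one binarized layer (illustrative, not the RTL).

    inputs:     1D array of {0, 1} binary activations, one per input line
    weights:    2D array of {0, 1}, shape (n_inputs, n_neurons); each bit
                stands for one complementary 2T2R pair (encoding assumed)
    thresholds: 1D array of integer neuron thresholds, one per output neuron
    """
    n_inputs, n_neurons = weights.shape
    # Step 1: thresholds are first loaded into local decrementing registers.
    registers = np.asarray(thresholds, dtype=int).copy()
    # Step 2: inputs are applied sequentially; each XPCSA computes the XNOR
    # of the input bit with the stored weight bit, and the per-neuron
    # popcount/decrement logic updates the register.
    for i in range(n_inputs):
        xnor_bits = ~(inputs[i] ^ weights[i, :]) & 1  # XNOR of input and weight
        registers -= xnor_bits                        # decrement on each match
    # Step 3: only the sign of each register leaves the array as the
    # binarized neuron output (the exact sign convention is an assumption).
    return (registers < 0).astype(int)

With the 116-input, 64-neuron configuration, this loop amounts to 116 sequential XNOR-and-decrement steps per array, mirroring the sequential input application described in the pipeline paragraph above.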
Programming and forming: Before use, all memristors are formed sequentially under on-chip control (example settings: VDDC 4.5 V, VDDR 2.7 V, and VDD 1.2 V for 10 µs). Programming to HRS uses VDDC 2.7 V and VDDR 4.5 V; programming to LRS uses VDDC = VDDR = 2.7 V; VDD remains at 1.2 V. Pulses last ~6 µs, and each 2T2R bit-cell is always programmed complementarily.

Measurement setups: For lab-supply tests, the packaged IC sits on a custom PCB that interfaces with a microcontroller (STM32F746), an arbitrary waveform generator, and an oscilloscope; Python scripts handle vectorization and I/O. Inputs and weights are prepared off-chip and streamed to the chip; outputs are read back and compared with RTL simulations and expected results. Power consumption is measured by inserting a current amplifier on VDD and capturing the inference current waveforms.

Solar cell harvester: A miniature AlGaAs/InGaP heterostructure wide-bandgap (1.73 eV) solar cell (~5×5 mm²), grown by molecular beam epitaxy as a multilayer stack, was fabricated and characterized. Its open-circuit voltage aligns with the 1.2 V nominal CMOS supply under high illumination. Current-voltage curves were taken under AM1.5G illumination and under a variable halogen lamp. For harvester-powered operation, the IC's power pads (VDD and the medium/high-voltage rails, shorted appropriately after programming) were connected directly to the solar cell without any power regulation or conversion. Illumination was swept to emulate equivalent solar powers down to 0.08 suns, and inference was executed as in the lab-supply setup.

Neural network mapping and simulation: The hardware arrays are 128×64 (116×64 effective). Arbitrary BNN layers are partitioned across multiple binary arrays, and outputs are aggregated by majority voting across array blocks. Binarization follows standard BNN practice (binary weights and activations except in the input and final layers). For MNIST, a fully connected network with hidden layers of 1102 and 64 neurons was used; for CIFAR-10, a VGG-like binarized CNN was trained (Conv/BN/MaxPool stacks leading to FC(1102-1102-10)). Networks were trained without hardware errors; during inference simulation, experimentally measured error probabilities as a function of neuron preactivation and illumination were then injected to model hardware behavior (a simplified sketch of this step is given below). Accuracy was reported for the baseline (no bit errors) and for various equivalent solar powers.

Energy analysis: Total inference energy was measured experimentally as a function of VDD and frequency; the per-block energy breakdown was obtained from EDA-based simulations (Eldo for the array with extracted parasitics, Cadence Voltus for the digital blocks using VCD activity). A clock-gated variant and an optimized read sequence were also analyzed, and projections were made for a 28 nm FDSOI process using redesigned arrays and scaling of the digital energy.
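Referring to the error-injection step in the mapping-and-simulation paragraph above, the following minimal Python sketch shows one way preactivation-dependent bit-error probabilities could be injected into a software inference pass. The error model, the 20% peak flip probability, and all names are illustrative assumptions, not the authors' measured data or code.

import numpy as np

rng = np.random.default_rng(0)

def inject_preactivation_errors(preactivations, outputs, error_prob_fn):
    """Hypothetical sketch of the error-injection step used in simulation.
    Each binary neuron output is flipped with a probability that depends on
    the magnitude of its preactivation (and, in the paper, on illumination).
    error_prob_fn is an assumed callable returning a flip probability."""
    flip_prob = np.array([error_prob_fn(lam) for lam in preactivations])
    flips = rng.random(len(outputs)) < flip_prob
    return np.where(flips, 1 - outputs, outputs)

# Illustrative-only error model (not measured data): errors concentrate at
# small |lambda| and vanish for |lambda| > 5, mirroring the reported trend.
def example_error_prob(lam):
    return 0.2 * max(0.0, 1.0 - abs(lam) / 5.0)

noisy_outputs = inject_preactivation_errors(
    preactivations=np.array([-0.5, 2.0, 7.0]),
    outputs=np.array([0, 1, 1]),
    error_prob_fn=example_error_prob)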
Key Findings
- Functional robustness across VDD and frequency without calibration: outputs match RTL at VDD = 1.2 V and 66 MHz, and the chip remains functional down to ~0.7 V at reduced frequency. Residual errors appear mainly at lower VDD and for near-threshold neuron preactivations.
- Minimum measured inference energy: 45 nJ at VDD = 0.7 V and 10 MHz. Energy scales approximately with VDD² and is largely frequency-independent at fixed VDD, indicating capacitive dominance and negligible short-circuit currents.
- Energy breakdown (simulated): control circuitry ~72.3%; neuron registers 16.0% (no clock gating in the test chip); clock distribution 5.2%; multiplication (memristor read + XNOR) 4.9%; accumulation (popcount) 1.6%; total MAC-related ~6.5%.
- Accuracy versus supply and preactivation: no errors above ~1.0 V; occasional errors at 0.9 V and 66 MHz; errors concentrate where the neuron preactivation magnitude |λ| is small and vanish when |λ| > 5. At 0.9 V there are more errors at 66 MHz than at 33 MHz, consistent with weakly programmed memristors dominating.
- Harvester-powered operation: the chip is directly powered by a 1.73 eV wide-bandgap AlGaAs/InGaP solar cell without regulation. Under 8 suns of equivalent illumination, performance mirrors the 1.2 V bench supply. Operation remains functional down to 0.08 suns, entering an approximate-computing mode in which high-|λ| neurons stay correct while low-|λ| neurons become error-prone.
- Task-level impact (simulated with injected hardware error rates): MNIST FC BNN baseline 97.2% versus 97.1% (8 suns), 96.9% (0.8 and 0.36 suns), and 96.5% (0.08 suns), a drop of only 0.7 percentage points at 0.08 suns. CIFAR-10 CNN baseline 86.6% versus 83.6% (8 suns), 78.2-78.3% (0.8-0.36 suns), and 73.4% (0.08 suns). Misclassifications at low illumination cluster near class boundaries or on atypical samples (t-SNE analysis).
- Comparative robustness: simulations indicate that analog in-memory designs (even with complementary programming) are less robust under variability and low VDD and require supply-dependent calibration, unlike the differential digital XPCSA approach.
- Energy efficiency: the measured design (excluding FSM energy) achieves ~2.9 TOPS/W at 0.7 V and 10 MHz. With clock gating, an optimized read sequence, and clock/register energy excluded, the estimate rises to 22.5 TOPS/W; a clock-gated design projected to 28 nm FDSOI reaches ~397 TOPS/W.
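To illustrate the quadratic supply dependence noted above, the measured 45 nJ figure at 0.7 V can be extrapolated with a simple E ∝ VDD² model. The function and the 1.2 V estimate below are back-of-the-envelope assumptions derived only from that scaling, not values reported by the authors.

def scaled_inference_energy(vdd, e_ref=45e-9, v_ref=0.7):
    """Illustrative-only extrapolation (an assumption, not a reported
    measurement): energy per inference taken to scale with VDD^2 from the
    measured 45 nJ reference point at 0.7 V."""
    return e_ref * (vdd / v_ref) ** 2

# At the nominal 1.2 V supply this simple model predicts roughly
# 45 nJ * (1.2 / 0.7)^2, i.e. about 132 nJ per inference.
print(f"{scaled_inference_energy(1.2) * 1e9:.0f} nJ")  # -> 132 nJ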
Discussion
The findings demonstrate that a memristor-based BNN using differential 2T2R weights and XNOR-in-sense amplification can operate reliably under variable and unregulated power directly from a miniaturized solar cell. This addresses the central question of achieving self-powered edge AI without calibration or regulation overhead. The differential sensing reduces sensitivity to device variability and supply fluctuations, enabling error-free operation at nominal VDD and graceful degradation as VDD/illumination decreases. Because BNNs are intrinsically tolerant to weight errors, errors that do occur at low power primarily affect neurons with near-threshold preactivations and translate into misclassifications on atypical or boundary cases, yielding a self-adaptive approximate computing behavior rather than catastrophic failure. Compared with analog in-memory computing, which typically necessitates precise calibration and stable supplies, the presented digital near-memory approach is markedly more robust to the variability and supply instability associated with energy harvesters. Memristors confer additional advantages over SRAM under low-voltage/unstable supply, including non-volatility (no data loss on power interruption) and strong immunity to read disturb with PCSA reads. The results are relevant for intelligent sensors and edge devices that must operate intermittently on harvested energy. The architecture achieves low inference energy, maintains MNIST-level performance even under very low illumination, and remains functional on more complex tasks (CIFAR-10) albeit with reduced accuracy—consistent with the targeted approximate behavior under energy scarcity.
Conclusion
This work introduces a robust, calibration-free memristor-based binarized neural network that operates via digital near-memory computing and can be directly powered by a miniature wide-bandgap solar cell. Key contributions include: a 32,768-memristor 2T2R design with XNOR-augmented precharge sense amplifiers; demonstration of reliable operation across supply voltages and frequencies; a minimum measured inference energy of 45 nJ; direct harvester-powered operation down to 0.08 suns with graceful accuracy degradation; and task-level validation showing strong resilience on MNIST and acceptable degradation on CIFAR-10. Energy-efficiency analyses and scaling projections suggest substantial gains with clock gating, optimized reads, and advanced CMOS nodes. Future directions include: integrating on-chip data buffering, convolution support, and serial I/O for full multi-layer autonomy; adopting lower-threshold thick-oxide options to enable operation below 0.7 V; exploring other harvesters (e.g., thermoelectrics), possibly with on-chip charge pumps; improving memristor programming robustness; and co-optimizing array size and mapping strategies to further enhance accuracy and efficiency.
Limitations
- Forming and programming require high voltages (up to ~4.5 V) and thick-oxide devices; although these are not needed during inference, they complicate the periphery design.
- Operation below ~0.7 V becomes inaccurate because of the thick-oxide device thresholds in the chosen process; alternative processes could extend the operating range.
- Residual errors arise from memristor variability and weak programming, and especially affect neurons with small preactivation magnitudes at low VDD or low illumination.
- The test chip lacks clock gating and includes always-on neuron registers, which inflates energy; the current design also relies on an off-chip MCU for multi-layer sequencing and lacks on-chip convolution and input buffering.
- The solar cell used was not fully optimized (no anti-reflection coating, 8.7% efficiency), and specific indoor spectra were approximated with a halogen lamp for variable illumination.
- Majority-vote mapping of large layers onto 128×64 arrays introduces moderate accuracy degradation relative to software baselines.
- No explicit error-correcting codes are employed; robustness relies on differential sensing and the intrinsic error tolerance of BNNs.