
Engineering and Technology

Hybrid architecture based on two-dimensional memristor crossbar array and CMOS integrated circuit for edge computing

P. Kumar, K. Zhu, et al.

Discover a revolutionary hybrid architecture for edge computing that effectively merges a 2D memristor crossbar array with CMOS circuitry to implement the extreme learning machine algorithm. This exciting research conducted by Pratik Kumar, Kaichen Zhu, Xu Gao, Sui-Dong Wang, Mario Lanza, and Chetan Singh Thakur showcases impressive performance in tackling complex audio, image, and non-linear classification tasks using real-time datasets.

Introduction
The introduction motivates the need for energy-efficient, massively parallel hardware for AI/ML at the edge, highlighting the limitations of traditional von Neumann CMOS systems: memory and communication bottlenecks, device non-linearities, and scaling challenges. Analog CMOS can exploit device physics for performance but suffers from mismatch and non-linearity, which are usually treated as design problems; the authors instead propose to harness these effects. Conventional charge-based memories (RAM, Flash) degrade at smaller technology nodes, while emerging resistive switching (RS) memories enable in-memory computing that overcomes the von Neumann bottleneck. Transition metal oxide (TMO) RRAM is maturing, yet 2D-material RRAMs have shown RS down to monolayers, potentially offering better speed and power through different RS mechanisms; the authors note 2D RRAM state currents spanning sub-nA to mA, exceeding reports for TMO RRAM. They propose a hybrid architecture integrating a CMOS encoder with a 2D h-BN memristor crossbar decoder, implementing a local receptive field extreme learning machine (LRF-ELM) for edge computing. The design leverages CMOS non-linearity and memristor multi-state conductance to classify audio, image, and other datasets.
Literature Review
The paper situates the work within prior efforts on: (1) edge AI hardware scaling challenges and energy constraints; (2) analog CMOS neuromorphic designs that face device mismatch but can be made tolerant to it by design; (3) the memory wall and the scaling issues of charge-based memories; (4) emerging RS memories, including commercial progress in TMO RRAM and growing demonstrations of 2D-material RRAMs with monolayer operation, distinct RS mechanisms, and potential speed/power advantages; (5) prior neuromorphic systems that leverage mismatch, memtransistors, and 2D-material synaptic devices; and (6) local receptive field extreme learning machines (LRF-ELM), in which only the output weights are trained. The authors emphasize that most 2D-material circuit demonstrations used non-scalable exfoliation and targeted simple logic functions, whereas this work uses scalable CVD h-BN in crossbars with many stable conductance states, combined with CMOS, for a full LRF-ELM implementation.
Methodology
Architecture: The hybrid system comprises a CMOS Encoder chip and a Memristor Decoder chip. The CMOS Encoder implements the LRF-ELM input-to-hidden transformation using analog Gaussian neurons, a row-select encode unit, a bias generator, and control logic. The Decoder comprises a row-select decode network, a 2D-material memristor crossbar array storing the output weights, and a mixed-signal interface built from differential pair integrator (DPI) synapses.

Algorithm: The network uses local receptive field ELM (LRF-ELM). Input-to-hidden weights are random and fixed, arising from device mismatch, and the hidden-node non-linearity is Gaussian. Only the output weights W are trained, via a modified quantization-aware stochastic gradient descent (SGD) that accounts for memristor quantization (26 states) and device variability by sampling from Gaussian distributions whose state-dependent means and variances are derived from measured device statistics. During each iteration, intermediate weights are mapped to the closest memristor state and then perturbed by the corresponding variability (a minimal software sketch of this step appears at the end of this section).

CMOS Encoder: Implemented in 180 nm CMOS (post-layout simulated). Each hidden node is a 9D Gaussian cell realized by cascading 1D Gaussian NP cells (alternating NMOS/PMOS subthreshold circuits), exploiting the subthreshold sech^2 characteristic, which approximates a Gaussian. Inputs are normalized to the effective non-linearity range (~0.8–1.6 V), and the bias voltage Vbias tunes the current amplitude and non-linearity. Local receptive fields use 3×3 windows with stride 2; for example, a 31×51 input matrix yields 15×25 = 375 receptive-field windows. Row-select encoding uses transmission-gate cascode analog multiplexers, with selection controlled by a flip-flop scan chain.

Memristor Decoder: A crossbar of Au/h-BN/Au memristors stores the output weights as quantized conductances (26 stable states). Hidden-node voltages are time-multiplexed through synchronized row-select encode/decode units into the crossbar. Column currents are integrated by subthreshold log-domain DPI synapses, accumulating over a scan of all hidden nodes; the outputs are thresholded to produce class decisions. The DPI dynamics are governed by charging/discharging equations with parameters chosen to avoid saturation within a scan cycle.

Device Fabrication and Characterization: Crossbar arrays (e.g., 10×10) are fabricated using wafer-scale methods: Au bottom electrodes on SiO2/Si by photolithography and e-beam evaporation; ~6 nm CVD h-BN transferred via wet transfer (FeCl3 etch, PMMA scaffold); and Au top electrodes aligned to form 5 µm × 5 µm cross-points. SEM is used for morphology; electrical tests use a Keysight B1500 with WGFMU. Devices exhibit non-volatile bipolar RS with low HRS/LRS read currents (~2 pA HRS, ~10 nA LRS at low read bias), controllable potentiation even at sub-µA currents, and more than 26 stable conductance states starting at ~10 nS.

Datasets and Evaluation: Audio classification uses ESC-50 subsets and the Free-Spoken-Digit Dataset (FSDD); audio features are extracted offline via a CAR-IHC cochlear model (cochleograms of size 31×51 for ESC-50 and 47×51 for FSDD). Image classification uses the Semeion digit dataset (images scaled to 12×13). Lower-dimensional datasets (AReM activities; UCI Breast Cancer, Ionosphere, Haberman) use a standard fully connected ELM. The hidden-node count is set to ~10× the number of input features. Weights are trained offline with quantization-aware SGD, with conductance values mapped to the 26-level memristor states. CMOS encoder power per hidden node is measured from simulations at a 0.95 V supply.
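To make the quantization-aware update concrete, below is a minimal Python sketch of one training step under stated assumptions: `state_means` and `state_stds` are hypothetical placeholders for the measured 26-state device statistics, the least-squares loss and straight-through-style gradient are illustrative choices, and sign handling for negative weights is omitted. This is a sketch of the idea, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the measured device statistics: 26 stable
# conductance states (log-spaced from ~10 nS here) and a state-dependent
# spread; the paper derives both from device measurements.
state_means = np.logspace(-8, -5, 26)   # siemens (placeholder values)
state_stds = 0.05 * state_means         # placeholder variability

def quantize_to_device(w):
    """Map each weight to the nearest memristor state, then perturb it by
    that state's variability (Gaussian), mimicking the modified
    quantization-aware SGD. Sign handling (e.g., differential columns
    for negative weights) is omitted for brevity."""
    idx = np.abs(w[..., None] - state_means).argmin(axis=-1)
    return rng.normal(state_means[idx], state_stds[idx])

def train_step(W, H, T, lr=1e-2):
    """One SGD step on the ELM output weights W.
    H: hidden activations (batch x hidden); T: targets (batch x classes).
    The forward pass sees quantized, perturbed weights; the gradient
    updates the floating-point copy (straight-through style)."""
    Y = H @ quantize_to_device(W)
    grad = H.T @ (Y - T) / len(H)
    return W - lr * grad
```

Because the forward pass sees the weights as the hardware would store them, the trained solution should remain valid after the weights are deployed to the crossbar.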
Key Findings
- Fabricated Au/h-BN/Au memristor crossbar arrays (e.g., 10×10) using scalable CVD h-BN (~6 nm), with >96% of devices exhibiting analog bipolar resistive switching.
- Stable multi-level operation with more than 26 distinct conductance states, starting at ~10 nS; controllable potentiation in the sub-µA regime with smooth trends.
- Low variability: example distributions over 16 devices show a coefficient of variation (Cv) of 6.2% for VSET and 12.4% for VRESET; cumulative analyses show among the lowest cycle-to-cycle and device-to-device variability in the literature.
- Read currents as low as ~2 pA (HRS) and ~10 nA (LRS) at low read bias; state currents span orders of magnitude (sub-nA to mA reported for 2D RRAM).
- Quantization-aware training incorporating device state statistics enables accurate mapping to 26 levels with modest accuracy degradation.
- Audio classification (ESC-50 subsets, 3750 hidden nodes):
  • Insects: train 93%; test 90% (floating), 90% (quantized)
  • Crackling Fire: train 93%; test 92% (floating), 92% (quantized)
  • Footstep: train 86%; test 83% (floating), 84% (quantized)
  • Washing Machine: train 93%; test 92% (floating), 90% (quantized)
  • Saw chain: train 95%; test 94% (floating), 92% (quantized)
- Speaker recognition (FSDD, 5750 hidden nodes): train 86.00%; test 84.50% (floating), 82.25% (quantized); an additional test metric of 80.00% is reported.
- Image classification (Semeion, 360 hidden nodes): train 92.00% (floating), 90.93% (quantized); test 90.66% (floating), 89.33% (quantized).
- Low-dimensional datasets (fully connected ELM):
  • AReM: Walking train 92/92 (floating/quantized), test 92/89; Bending train 91/90, test 92/89; Lying train 81/78, test 82/80; Sitting train 80/75, test 75/72.
  • Breast Cancer (100 hidden nodes): train 98/93; test 92/89.
  • Ionosphere (100 hidden nodes): train 95/95; test 89/87.
  • Haberman (100 hidden nodes): train 85/84; test 75/70.
- CMOS encoder power: ~7.8 µW per computation per hidden node at a 0.95 V supply.
- Architectural advantages: tolerance to device variability (device mismatch serves as the random first-layer weights), reduced memory requirements for first-layer weights, time-multiplexed MAC via the crossbar (see the sketch after this list), and independent optimization of the CMOS and memristor chips.
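The time-multiplexed MAC mentioned above can be mimicked in a few lines of Python. The sketch below is an idealized illustration, not a device model: the leaky integrator stands in for the log-domain DPI synapse, `tau` and the random `v` and `G` values are hypothetical, and the final thresholding is modeled as an argmax.

```python
import numpy as np

def decoder_scan(v_hidden, G, tau=50.0, dt=1.0):
    """Idealized decoder data path: at each scan step the synchronized
    row-select units apply one hidden-node voltage to its crossbar row;
    Ohm's law gives the per-column currents (I = V * G), and a leaky
    integrator stands in for the log-domain DPI synapse."""
    out = np.zeros(G.shape[1])
    for i, v in enumerate(v_hidden):        # time-multiplexed row scan
        i_col = v * G[i]                    # column currents for this row
        out += (dt / tau) * (i_col - out)   # leaky accumulation, no saturation
    return int(out.argmax())                # thresholding -> class decision

# Toy usage: 3750 hidden nodes and 5 output classes, mirroring the scale
# of the ESC-50 experiments (values are random and purely illustrative).
rng = np.random.default_rng(1)
v = rng.uniform(0.0, 1.0, 3750)
G = rng.uniform(1e-8, 1e-5, (3750, 5))
print(decoder_scan(v, G))
```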
Discussion
The hybrid CMOS–memristor LRF-ELM framework addresses edge AI constraints by combining analog CMOS non-linear encoding with in-memory MAC using 2D h-BN memristors. By exploiting inherent device mismatch for random input-to-hidden weights, the system avoids storing first-layer weights, reducing memory demands and improving robustness to variability. The memristor crossbar’s multi-state, low-variability conductance enables compact storage of trained output weights and efficient MAC operations. Quantization-aware training aligned to measured device statistics mitigates performance loss from conductance discretization and variability. Demonstrations across audio, image, and other datasets show competitive accuracies with modest degradation under 26-level quantization, while simulated CMOS encoder power per hidden node remains in the microwatt regime, supporting suitability for low-power edge devices. The separation of encoder (CMOS) and decoder (memristor) allows independent optimization and potential substitution of other emerging non-volatile memories.
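As a rough picture of the analog non-linear encoding discussed above: in subthreshold, a differential pair's output current follows a sech^2 of its input offset, which closely tracks a Gaussian bump, and cascading one 1D cell per input dimension yields the 9D Gaussian response (modeled below as a product). The snippet only illustrates this functional shape; the centers `mu` (fixed by device mismatch on silicon) and the width `s` are hypothetical values.

```python
import numpy as np

def gaussian_cell_1d(v, mu, s):
    """Subthreshold differential-pair transfer, proportional to
    sech^2((v - mu)/s), which closely tracks exp(-((v - mu)/s)**2)
    near its peak."""
    return 1.0 / np.cosh((v - mu) / s) ** 2

def hidden_node(x, mu, s=0.2):
    """9D Gaussian hidden node, modeled here as the product of 1D cells
    (the chip realizes this by cascading NMOS/PMOS Gaussian NP cells).
    On silicon, mu is set by random device mismatch, supplying the
    random, untrained input-to-hidden weights that ELM requires."""
    return float(np.prod(gaussian_cell_1d(np.asarray(x), np.asarray(mu), s)))

# Toy usage: one 3x3 receptive field (9 inputs) in the ~0.8-1.6 V range.
rng = np.random.default_rng(2)
x = rng.uniform(0.8, 1.6, 9)     # normalized input voltages
mu = rng.uniform(0.8, 1.6, 9)    # mismatch-derived centers (hypothetical)
print(hidden_node(x, mu))
```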
Conclusion
This work demonstrates a scalable hybrid edge-computing architecture that integrates a CMOS analog encoder implementing LRF-ELM with a 2D h-BN memristor crossbar decoder. Key contributions include: (1) fabrication of CVD h-BN crossbar arrays in which >96% of devices exhibit analog bipolar RS, with >26 stable conductance states and low variability; (2) an analog 9D Gaussian CMOS encoder that leverages device mismatch for random features and achieves low power; (3) a quantization-aware training procedure that incorporates measured memristor state statistics to map output weights effectively; and (4) successful classification on diverse datasets with limited accuracy loss after quantization. The modular architecture suits edge scenarios and can be extended by replacing the memory technology in the decoder without changing the encoder. Future directions include fabrication and silicon measurement of the CMOS encoder, full chip-level integration, larger crossbar sizes and higher array integration density, on-chip or near-memory training, and extension to other non-volatile memory technologies (e.g., PCM, oxide RRAM).
Limitations
- The CMOS encoder was designed and validated via post-layout simulations in 180 nm technology; fabricated-silicon measurements for the encoder are not reported.
- Training is performed off-chip; on-chip training or online adaptation is not demonstrated.
- While quantization-aware training reduces degradation, mapping to 26 conductance levels still incurs some accuracy loss relative to floating-point precision.
- Device variability is low but non-zero and must be accounted for; robustness at crossbar scales beyond the demonstrated arrays is not detailed in this work.