Energy-efficient memcapacitor devices for neuromorphic computing

K. Demasius, A. Kirschen, et al.

Kai-Uwe Demasius, Aron Kirschen, and Stuart Parkin present memcapacitive devices aimed at energy-efficient, data-intensive computing. Their simulations project up to 29,600 tera-operations per second per watt for a representative neural-network workload with energy recovery, and a small crossbar array demonstrates neural network training experimentally.

Introduction

Neuromorphic computing maps artificial neural networks onto hardware to perform massively parallel multiply-accumulate (MAC) operations efficiently. Prior resistive (memristive) approaches have advanced the field but suffer from static power dissipation. Memcapacitive devices, which store state in capacitance via electric-field coupling rather than conductance, promise lower static power. This study investigates a memcapacitive device based on charge shielding that targets energy-efficient, parallel MAC operations with high precision and CMOS compatibility. The research aims to demonstrate device operation and crossbar-based training, and to assess scalability and energy efficiency through experiments and simulations.

Literature Review

Earlier neuromorphic hardware has used oxide-based memristors, phase-change memory, spintronic devices, and ferroelectric devices (tunnel junctions and FeFETs), achieving up to ~100 TOPS W−1 energy efficiency. Memcapacitors have been proposed theoretically and demonstrated in a few implementations, including variable plate distance (MEMS), metal-insulator transition in series with dielectrics, oxygen vacancy front modulation in memristors, and MOS capacitors with memory. These often face parasitic resistive components at small plate distances or limited lateral scalability due to large plate distances, with similar issues for varying surface area or dielectric constant approaches. The present work positions charge-shielding memcapacitors as a scalable alternative with high dynamic range and low power. Related sensing/sense-amplifier literature shows sub-10 aF resolution using charge-based capacitive measurements and lock-in techniques, relevant for readout sensitivity in neuromorphic arrays.

Methodology

- Device concept: A memcapacitive cell with a top gate electrode, a shielding layer (SL) including lateral p+ and n+ regions (forming p+nn+), and a back-side readout electrode separated by dielectrics. Charge shielding in the SL modulates coupling between gate and readout, encoding weights in capacitance. Symmetric response to positive/negative voltages is enabled by carrier reservoirs (p+/n+), important for undistorted weight updates.
- Memory mechanism: Ferroelectric-assisted charge trapping in the top dielectric (tunneling oxide ~2.5 nm) to promote stable trapping and low detrapping. During readout, the SL is grounded and the device is excited with an AC gate signal; writing uses a voltage differential between WL (gate) and SL.
- Fabrication (micrometre-scale): Devices on SOI with n+-handle, 3.5 μm epitaxial layer, 190 nm buried oxide, 88 nm device silicon. Ion implantation (B, P), interface oxide formation, HfZrO2 deposition with TiN cap by ALD and anneal (600 °C), patterning for contacts, Al metallization, SL etching, BL isolation via 7 μm deep trenches refilled with SU-8, and SU-8 interlayer dielectric for WLs.
- Single-device characterization: CV measurements with AC excitation (e.g., 100 mV, 1 kHz) and DC bias sweep; demonstration of capacitive coupling window modulation via pin junction bias. Programming via pulse number, pulse height, or pulse length modulation to achieve gradual LTP/LTD behavior; readout uses AC gate with bias to position operating window.
- Crossbar system: A 26×6 array (156 cells), differential weight encoding (two cells per weight, positive minus negative). Inputs encoded as AC sine periods with sign via 180° phase shift; switched-capacitor scheme with a global clock implements signed four-quadrant multiplication and accumulation at BL integrators.
- Training algorithm: Manhattan update rule (sign-based coarse-grained update). Error signal applied to SL; input to WL; XNOR-like combination determines update sign while limiting disturb to 1/3 of write/erase voltage. Dataset: 5×5 binary images of letters M, P, I with one flipped pixel per sample (78 total), split into train/test.
- Simulations: TCAD (Synopsys) of a 90 nm gate-length device including drift-diffusion, SRH recombination, mobility models; exploration of CV behavior, dynamic range, memory-window shifts, and scaling (including high-κ dielectrics). SPICE (LTspice) model of arrays with parasitics to assess RC delays, energy per operation, and energy recovery. Noise analysis (kTC) and sensitivity estimates for required period averaging. Energy recovery analysis based on adiabatic power clocks with ~95% recovery for harmonic signals.
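The crossbar and training bullets above can be illustrated with a small software sketch. It is not the authors' implementation: it is a minimal, idealized model assuming that the 26 rows correspond to 25 pixels plus one bias input, that the 6 columns are three output neurons with two cells each, and that the 78 samples are each clean letter plus its 25 one-pixel-flipped variants. The letter bitmaps, the step size, and all function names are hypothetical, and device effects (nonlinearity, noise, the 1/3-voltage disturb scheme) are not modelled.

```python
import random

# Hypothetical 5x5 bitmaps; the source does not give the exact pixel patterns.
LETTERS = {
    "M": ["10001", "11011", "10101", "10001", "10001"],
    "P": ["11110", "10001", "11110", "10000", "10000"],
    "I": ["01110", "00100", "00100", "00100", "01110"],
}
N_IN, N_OUT = 26, 3  # assumed: 25 pixels + 1 bias row; 3 neurons x 2 cells = 6 columns


def encode(rows):
    """Map a bitmap to 25 signed inputs (+1/-1, i.e. 0/180 degree phase) plus a bias."""
    return [1 if ch == "1" else -1 for row in rows for ch in row] + [1]


def make_dataset():
    """Each clean letter plus its 25 one-pixel-flipped variants (3 * 26 = 78)."""
    data = []
    for label, rows in enumerate(LETTERS.values()):
        base = encode(rows)
        data.append((base, label))
        for i in range(25):
            noisy = list(base)
            noisy[i] = -noisy[i]
            data.append((noisy, label))
    return data


def mac(x, c_pos, c_neg):
    """Signed MAC: each weight is the difference of two non-negative cell values."""
    return [sum(x[i] * (c_pos[i][j] - c_neg[i][j]) for i in range(N_IN))
            for j in range(N_OUT)]


def clip(v):
    """Keep a normalized cell value inside its programmable range [0, 1]."""
    return min(1.0, max(0.0, v))


def train(data, epochs=10, step=0.02, seed=0):
    """Perceptron training with the Manhattan rule: fixed-size, sign-only updates."""
    rng = random.Random(seed)
    data = list(data)  # work on a copy so the caller's list is not reordered
    c_pos = [[0.5] * N_OUT for _ in range(N_IN)]
    c_neg = [[0.5] * N_OUT for _ in range(N_IN)]
    history = []  # misclassifications counted during each epoch
    for _ in range(epochs):
        rng.shuffle(data)
        wrong = 0
        for x, label in data:
            y = mac(x, c_pos, c_neg)
            wrong += y.index(max(y)) != label
            for j in range(N_OUT):
                target = 1 if j == label else -1
                out = 1 if y[j] > 0 else -1
                if out == target:
                    continue  # no error on this neuron, so no update
                for i in range(N_IN):
                    s = x[i] * target  # sign of input times sign of error
                    c_pos[i][j] = clip(c_pos[i][j] + s * step)
                    c_neg[i][j] = clip(c_neg[i][j] - s * step)
        history.append(wrong)
    return c_pos, c_neg, history


if __name__ == "__main__":
    dataset = make_dataset()
    _, _, history = train(dataset)
    print(len(dataset), history)
```

Only the sign of the product of input and error enters the update, which mirrors the XNOR-like combination described above; the magnitude of every update is the same fixed step.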

Key Findings

- Experimental device operation: CV curves exhibit a transmissive window during depletion and strong shielding during inversion/accumulation; curves resemble sigmoid derivatives. With memory dielectric, the coupling window shifts with a ~2.7 V memory window, confirming charge trapping.
- Programmability and dynamic range: Gradual LTP/LTD achieved via pulse number/height/length modulation; differential readout currents confirm distinct analog states. Measured capacitive dynamic range for a micrometre-scale device is ~1:1,478 between written and erased states.
- Crossbar MAC linearity: Four-quadrant multiplication realized with phase-encoded inputs and differential weights; measured outputs are highly linear versus input periods and accumulation across cells.
- Training demonstration: On the 26×6 array, a perceptron trained with the Manhattan rule on 5×5 letter images (M, P, I) rapidly reduces misclassifications to near zero after one epoch and remains low across 10 epochs. Neuron activations separate classes clearly, with letter I most robust.
- TCAD results and precision: For a 90 nm device (no memory dielectric initially), the max/min capacitance ratio is ~1:90 (ranges 1:60–1:90 depending on oxide thickness and gate length). Incorporating memory shifts the CV enabling on/off control with AC readout bias. High-κ dielectrics enable similar capacitance at smaller gate lengths (down to ~45 nm). A dynamic range of 1:60–1:90 supports 6–8 bits precision.
- Readout timing and encoding: Approximately 142 AC periods are needed to encode 7–8-bit inputs with phase for sign; period averaging boosts SNR and reduces kTC noise.
- Noise and sensitivity: For a ~6.65 aF device, kTC noise ~25 mV (room temp) drops to ~2.2 mV with 142-period averaging, yielding ~7-bit precision. Capacitive readout approaches are at least 8× more energy-efficient than resistive readouts for equivalent distinguishable levels.
- Energy and performance (SPICE worst-case, 95% energy recovery): Reactive energy per cell per MAC (142 periods) ~5 fJ; active energy ~0.015–0.040 fJ per cell depending on array size. Worst-case energy efficiency η ≈ 3,452.6–3,782.2 TOPS W−1 across array sizes (100×100 to 2,500×2,500). Without recovery, ~198–199 TOPS W−1.
- Application scenario (simulation): For MNIST perceptron-like workload, projected energy efficiency is ~29,600 TOPS W−1 with energy recovery; without recovery ~1,702 TOPS W−1.
- Experimental-to-simulation scaling consistency: Measured reactive energy per period on micrometre devices, scaled by geometry and oxide thickness differences, aligns with simulated per-cell energies (~5.84 fJ corrected vs 5 fJ simulated).
- Scalability: Devices potentially scalable laterally to ~45 nm with high-κ dielectrics; short-channel effects become more important at smaller gate lengths, but design mitigations preserve sufficient dynamic range.
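The worst-case efficiency figures above can be approximately reproduced from the quoted per-cell energies. The sketch below is a back-of-the-envelope check, not the authors' SPICE model. It assumes that each cell MAC counts as one operation, that only the reactive energy is subject to the 95% recovery, and that the active energy is fully dissipated. These assumptions are not stated in the summary; they are chosen because they land close to the quoted ranges. The function name `tops_per_watt` is hypothetical.

```python
def tops_per_watt(reactive_fj, active_fj, recovery):
    """Tera-operations per joule for one cell MAC, counting one MAC as one operation.

    reactive_fj: reactive energy per cell per MAC in femtojoules
    active_fj:   active (dissipated) energy per cell per MAC in femtojoules
    recovery:    fraction of the reactive energy that is recovered (0.0 to 1.0)
    """
    dissipated_fj = reactive_fj * (1.0 - recovery) + active_fj
    ops_per_joule = 1.0 / (dissipated_fj * 1e-15)
    return ops_per_joule / 1e12


REACTIVE_FJ = 5.0  # quoted reactive energy per cell per MAC (142 periods)

if __name__ == "__main__":
    for active_fj in (0.015, 0.040):  # quoted range of active energy per cell
        with_recovery = tops_per_watt(REACTIVE_FJ, active_fj, 0.95)
        without_recovery = tops_per_watt(REACTIVE_FJ, active_fj, 0.0)
        print(f"active {active_fj} fJ: "
              f"{with_recovery:.0f} TOPS/W with recovery, "
              f"{without_recovery:.0f} TOPS/W without")
```

Under these assumptions the estimate falls at roughly 3,450 to 3,770 TOPS W−1 with recovery and 198 to 199 TOPS W−1 without, within rounding of the quoted values. The small residual difference presumably reflects the rounded energy figures given in the summary.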

Discussion

The work demonstrates that memcapacitive devices exploiting charge shielding can perform analog MAC operations with low static power and enable substantial energy recovery during readout, in contrast to inherently dissipative resistive paths in memristors. The switched-capacitor, phase-encoded scheme yields linear multiplication and accumulation and supports differential weights, directly addressing the need for accurate signed MACs in neuromorphic systems. Experimental results validate device programmability and array-level learning, while simulations indicate that with realistic parasitics, adiabatic power clocks, and period averaging, 6–8-bit precision and efficiencies of several thousand TOPS W−1 are feasible, substantially exceeding many memristive approaches. The analysis connects device physics (shielding efficiency, Debye screening nonlinearity) to system-level metrics (SNR, kTC noise, RC delay, energy per MAC), and supports lateral scaling to sub-100 nm gates using high-κ dielectrics. Collectively, the findings substantiate the potential of memcapacitive crossbars as energy-efficient analog compute fabrics compatible with CMOS and amenable to reversible/adiabatic techniques.
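The link between kTC noise and period averaging mentioned here can be checked with the standard thermal-noise expression sqrt(kT/C). The sketch below assumes a temperature of 300 K and that averaging N uncorrelated periods reduces the noise by a factor of sqrt(N). The source's exact averaging model is not given in this summary, so this is an approximation rather than the authors' calculation; the function names are hypothetical.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)


def ktc_noise_volts(capacitance_f, temperature_k=300.0):
    """RMS thermal (kTC) noise voltage on a capacitor: sqrt(k*T/C)."""
    return math.sqrt(K_B * temperature_k / capacitance_f)


def averaged_noise_volts(noise_v, n_periods):
    """Noise after averaging n_periods uncorrelated samples (assumed 1/sqrt(N))."""
    return noise_v / math.sqrt(n_periods)


if __name__ == "__main__":
    single = ktc_noise_volts(6.65e-18)          # ~6.65 aF device from the summary
    averaged = averaged_noise_volts(single, 142)  # 142-period averaging
    print(f"single period: {single * 1e3:.1f} mV")
    print(f"142 periods:   {averaged * 1e3:.2f} mV")
```

For a 6.65 aF device this gives about 25 mV for a single period, matching the quoted value, and about 2.1 mV after 142 periods, close to the ~2.2 mV quoted under Key Findings.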

Conclusion

A memcapacitive device architecture using a shielding layer between electrodes achieves high dynamic ratios (~1,480 at micrometre scale; ~90 in 90 nm simulations) and enables linear, signed MAC operations in crossbar arrays. A 26×6 array trained a simple image classifier, confirming functionality. Circuit and noise analyses indicate 6–8-bit precision with period-encoded readout and show that capacitive readout can be substantially more energy-efficient than resistive readout, especially with adiabatic energy recovery. Simulations project energy efficiencies from ~1,000 to 10,000 TOPS W−1 (up to ~29,600 TOPS W−1 in a representative workload) and support scalability to ~45 nm with high-κ dielectrics. The approach is CMOS-compatible and suggests a pathway to combine reversible and neuromorphic computing for highly energy-efficient AI hardware. Future work includes nanometre-scale fabrication, optimization of memory materials for analog storage, larger arrays, and integration of high-Q energy-recovery clocking.

Limitations

- Demonstrations are on micrometre-scale devices and a modest 26×6 array; large-scale nanometre implementations are supported by simulations but not yet fabricated.
- Energy-efficiency figures rely on SPICE models and assume ~95% energy recovery via adiabatic power clocks; practical recovery may be limited by resistive losses and inductor quality factors.
- Charge-trapping memories can be slow (ms) and energy-demanding for writes; while suitable for inference, training speed/energy may be constrained. Ferroelectric analog storage can show abrupt switching at small scales due to grain-size limits.
- Short-channel effects, quantum confinement, and band-to-band tunneling could impact scaling; device variability, endurance, and retention are referenced to supplementary data but not exhaustively characterized here.
- Sense amplifier sensitivity and noise management rely on multi-period averaging (e.g., 142 periods), which affects latency and throughput.
- The training task (letters M, P, I, 5×5) is simple; generalization to complex datasets and deeper networks remains to be shown.
