Parallel synaptic design of ferroelectric tunnel junctions for neuromorphic computing

T. Moon, H. J. Lee, et al.

Discover groundbreaking advancements in neuromorphic edge-computing with a novel synaptic design featuring ferroelectric tunnel junctions. This research, conducted by Taehwan Moon and colleagues, showcases enhanced linearity and minimal variability, promising significant improvements in synaptic performance and pattern recognition accuracy.

Introduction
The study addresses a key challenge in memristive neuromorphic hardware: achieving linear, symmetric, and reliable analogue weight updates suitable for in-memory vector–matrix multiplications. Conventional two-terminal resistive memories exhibit significant cycle-to-cycle and device-to-device variability due to stochastic filamentary ion migration, degrading training and inference accuracy. Ferroelectric tunnel junctions (FTJs), with non-filamentary, polarization-based switching and inherently higher resistance, are compelling synaptic candidates but suffer from intrinsic nonlinearity and asymmetry in weight updates due to coercive-voltage-driven switching kinetics. Prior mitigation by incremental step pulses (ISPs) only partially linearizes updates. This work proposes a parallel FTJ synapse architecture with controlled voltage offsets across devices, enabling averaged switching responses that markedly improve linearity and reduce variability, thus enhancing training efficacy and energy-efficient edge AI computing.
Literature Review
The paper situates FTJs among memristive synapse technologies (resistive switching, spintronic, phase change, ferroelectric devices), noting FTJs’ advantages in endurance, resistance levels, and non-filamentary operation. However, FTJs exhibit nonlinear/asymmetric programming due to abrupt switching around the coercive voltage (Vc). ISPs have been used to mitigate nonlinearity but remain limited. A 2T-1FeFET approach separated MSBs/LSBs to improve linearity but at large area overhead and still required ISPs for MSBs. Superlattice/stack engineering to distribute Vc improved linearity at the cost of higher programming voltage. The proposed parallel FTJ approach aims to achieve low nonlinearity without area penalty (via future 3D stacking) or increased voltage, leveraging FTJs’ low variability.
Methodology
Device fabrication: FTJs were built on highly doped p-type 8-inch Si wafers. After an HF dip, a 50 nm TiN bottom electrode was sputtered, and an ozone pretreatment then formed a ~1 nm interfacial layer (IL). A 4 nm (Hf,Zr)O2 ferroelectric (FE) film was deposited by thermal ALD at 300°C using TEMA-Hf, TEMA-Zr, and ozone. A 50 nm Mo top electrode was sputtered. A rapid thermal anneal in N2 at 500°C for 1 min crystallized the FE phase. Top electrodes (areas 400–10,000 μm2) were patterned by dry etching.
Measurements: Electrical tests used an 8-inch semi-auto probe station with a Keithley SCS4200 (pulse and SMU). PUND (10 kHz) measured FE hysteresis; DC I–V characterized conductance change and reliability. A custom pulse-write/DC-read protocol evaluated long-term potentiation/depression (LTPD). Wake-up: 1000 cycles of ±2 V, 100 kHz pulses were applied prior to FE measurements.
Proposed synapse architecture: Multiple FTJs are connected in parallel to a single access transistor. The plate lines (PLs) of the individual FTJs are biased at different constant offsets, so the same source-line incremental step pulse (ISP) produces a distinct effective voltage drop across each FTJ. During programming, the word line (WL) enables the access transistor, ISPs are applied to the source line, and the PLs hold staggered biases separated by equal gaps ΔV. During read, the input (read) voltage is applied to the source line with all PLs grounded, so the currents of the parallel FTJs sum to give the total conductance (the synaptic weight). Vertically stacked 3D integration (conceptual) can preserve the footprint while increasing the number of parallel FTJs.
Device-level characterization: Cross-sectional TEM confirmed a well-crystallized FE layer with flat interfaces. PUND showed sharp switching current peaks near +0.8 V and −0.5 V; the integrated P–V loop yielded 2Pr ≈ 35 μC cm−2. DC I–V curves for various device areas showed overlapping current densities, confirming area-scalable current and a tunneling conduction mechanism.
Endurance and retention: Endurance was tested by applying 2 V, 500 kHz square pulses, with DC I–V sweeps recorded between stress intervals; devices endured up to 10^8 cycles while maintaining the tunneling electroresistance (TER). Retention was measured at a 0.2 V read voltage after programming the low-resistance state (LRS, +2 V), the high-resistance state (HRS, −2 V), and an intermediate resistance state (IRS, +0.8 V applied to the HRS) at room temperature for 30,000 s, with retention extrapolated to >10 years.
Variability assessment: Cycle-to-cycle (C2C) variability was assessed over 100 DC I–V cycles on a single device; device-to-device (D2D) variability was assessed over 100 devices across the nearest six dies on an 8-inch wafer. The cumulative probability of the on/off currents at 0.2 V provided σ/μ.
Synaptic behavior and modeling: LTPD of a single FTJ was characterized under ISPs (32 steps, pulse width 500 ns) across four initial amplitude ranges with corresponding PL offsets {0, −0.15, −0.30, −0.45 V}; conductance was read at 0.2 V after each pulse. The LTPD of the proposed synapse was computed by linear summation/averaging of the four FTJ responses. LTPD curves were fitted by exponential models: G_pot = B_pot(1 − e^(−P/A_pot)) + G_min and G_dep = −B_dep(1 − e^((P−P_max)/A_dep)) + G_max, where P is the pulse number and P_max its maximum; the nonlinearity label α was derived from A. RMS fitting errors were used to evaluate model fidelity.
Neural network simulation: A CNN was simulated for MNIST classification: one convolutional layer with four 3×3 filters, a 2×2 max pooling layer, and a fully connected network (676×50×10) with ReLU and softmax. Training used the Manhattan update rule: the sign of each backpropagated weight update selects potentiation or depression, which is then applied according to the device LTPD characteristics. The accuracy of floating-point software (97.26%) served as the reference. Device energy metrics were also computed; the average programming energy and maximum read power densities were reported.
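The exponential LTPD models and the sign-based Manhattan rule described above can be sketched together in a few lines of Python. This is a minimal illustration, not the authors' simulation code. The conductance limits, the values of A_pot and A_dep, the normalization of B (chosen so each curve runs exactly from G_min to G_max over P_max pulses), and the choice of one pulse per update are all assumptions made for the example, and the function names are hypothetical.

```python
import math

P_MAX = 32                  # number of ISP steps, from the text
G_MIN, G_MAX = 1.0, 5.0     # arbitrary units (assumed); ratio 5 echoes the reported dynamic range
A_POT, A_DEP = 20.0, 8.0    # hypothetical nonlinearity parameters, not fitted values


def _b(a):
    """Assumed normalization: the curve spans G_MIN..G_MAX over P_MAX pulses."""
    return (G_MAX - G_MIN) / (1.0 - math.exp(-P_MAX / a))


def g_pot(p, a=A_POT):
    """G_pot = B_pot(1 - e^(-P/A_pot)) + G_min."""
    return _b(a) * (1.0 - math.exp(-p / a)) + G_MIN


def g_dep(p, a=A_DEP):
    """G_dep = -B_dep(1 - e^((P - P_max)/A_dep)) + G_max."""
    return -_b(a) * (1.0 - math.exp((p - P_MAX) / a)) + G_MAX


def pulse_index_pot(g, a=A_POT):
    """Invert g_pot: equivalent pulse number for conductance g."""
    g = min(max(g, G_MIN), G_MAX)
    return -a * math.log(1.0 - (g - G_MIN) / _b(a))


def pulse_index_dep(g, a=A_DEP):
    """Invert g_dep: equivalent pulse number for conductance g."""
    g = min(max(g, G_MIN), G_MAX)
    return P_MAX + a * math.log(1.0 - (G_MAX - g) / _b(a))


def manhattan_step(g, delta_w):
    """Apply one pulse whose direction is set only by the sign of delta_w."""
    if delta_w > 0:   # potentiate: move one pulse up the potentiation curve
        return g_pot(min(pulse_index_pot(g) + 1.0, P_MAX))
    if delta_w < 0:   # depress: move one pulse down the depression curve
        return g_dep(max(pulse_index_dep(g) - 1.0, 0.0))
    return g          # zero update: leave the weight unchanged


g = 3.0
print(manhattan_step(g, +0.7), manhattan_step(g, -0.2))
```

Because only the sign of the update is used, the size of each conductance change is set entirely by the device curve. A strongly nonlinear or asymmetric curve therefore translates directly into uneven weight steps during training.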
Key Findings
- Parallel FTJ synapse markedly improves linearity of weight updates by averaging switching across devices biased with staggered plate-line offsets.
- Nonlinearity (α) improvement: from approximately −3.25/−2.51 (single-FTJ synapse, potentiation/depression) to −0.18/−1.14 (four parallel FTJs). Additional single-FTJ ranges showed higher nonlinearity and larger RMS fitting errors compared to the proposed design.
- Variability (relative standard deviation): low wafer-level variability demonstrated, with headline σ/μ of C2C = 0.036 and D2D = 0.032. Detailed distributions at 0.2 V: C2C HRS 0.039, LRS 0.036; D2D HRS 0.022, LRS 0.032. Wafer mapping showed TER ≈ 10 across most of the 8-inch wafer with a mild vertical gradient due to ALD thickness non-uniformity.
- Endurance and retention: endurance up to 1×10^8 cycles without breakdown, with TER largely maintained. Room-temperature retention for HRS/LRS/IRS stable over 30,000 s, extrapolated >10 years (noting faster retention loss at elevated temperatures and modest narrowing of the memory window).
- Ferroelectric properties: PUND peaks near +0.8 V and −0.5 V; switchable polarization 2Pr ≈ 35 μC cm−2. DC I–V hysteresis consistent with FE switching; area-independent current density across 400–10,000 μm2 devices confirming tunneling conduction and favorable scaling of currents for large arrays.
- Synaptic dynamic range: conceptual parallel FTJ synapse exhibits dynamic range Gmax/Gmin ≈ 5 (about half of TER ≈ 10), due to averaging.
- Neural network performance: MNIST accuracy reached 96.84% using the proposed synapse model, close to software limit of 97.26%. A single FTJ (range 1) yielded 89.48% due to nonlinear/asymmetric updates.
- Energy metrics: average programming energy ≈ 130.1 fJ μm−2 per pulse; maximum read power ≈ 146 fW μm−2.
- Operating voltages: ISP ranges up to ~±1.55 V for LTPD; FE switching observed around |Vc| ≈ 0.5–0.8 V depending on frequency.
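To put the area-normalized energy metrics above in per-device terms, the short sketch below scales them to the two extreme electrode areas used in the study. It assumes that energy and power scale linearly with electrode area, which the area-independent current density suggests but the summary does not state outright. The function names are hypothetical.

```python
PROGRAM_ENERGY_FJ_PER_UM2 = 130.1   # average programming energy per pulse, from the text
READ_POWER_FW_PER_UM2 = 146.0       # maximum read power density, from the text


def program_energy_pj(area_um2):
    """Per-pulse programming energy in pJ, assuming linear scaling with area."""
    return PROGRAM_ENERGY_FJ_PER_UM2 * area_um2 / 1000.0


def read_power_pw(area_um2):
    """Maximum read power in pW, assuming linear scaling with area."""
    return READ_POWER_FW_PER_UM2 * area_um2 / 1000.0


for area in (400.0, 10_000.0):      # smallest and largest electrode areas in the study
    print(f"{area:>8.0f} um^2: {program_energy_pj(area):.2f} pJ/pulse, "
          f"{read_power_pw(area):.1f} pW read")
```

Under this assumption, the smallest (400 μm2) device would need about 52 pJ per programming pulse and dissipate about 58 pW during read.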
Discussion
The work demonstrates that intrinsic ferroelectric switching nonlinearity in FTJs can be effectively mitigated at the synapse level by parallelizing multiple FTJs and applying ISPs with controlled plate-line offsets. Each FTJ experiences a different segment of the switching curve relative to Vc; summing their currents produces a more linear aggregate response, improving training efficacy without complex per-cell control or increased programming voltage. The extremely low C2C and D2D variability of the underlying HfO2-based FTJs, confirmed across an 8-inch wafer, underpins the reliability of the proposed design in large arrays. Area-independent current density and high resistance levels alleviate current and energy concerns in large-scale neuromorphic accelerators. Although the dynamic range is modest (≈5), the improved linearity and symmetry of the weight updates bring MNIST accuracy (96.84%) close to the floating-point baseline (97.26%), significantly outperforming single-FTJ programming (89.48%). Endurance (10^8 cycles) and projected retention (>10 years) further support practical deployment. The approach is compatible with 3D stacking (future vertical integration), suggesting minimal footprint overhead while enabling more parallel elements per synapse for further linearity tuning.
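The averaging argument can be illustrated with a toy model. The sketch below is not the measured device behavior. It assumes a logistic switching curve, an illustrative switching voltage and width, a linear source-line ramp, the sign convention V_FTJ = V_SL − V_PL, and a conductance proportional to the switched fraction. Only the plate-line offsets and the 32-step count come from the text.

```python
import math

V_C_TOY = 1.25       # illustrative switching voltage under short pulses (assumed, not measured)
WIDTH = 0.05         # illustrative sharpness of the switching curve, in volts (assumed)
PL_OFFSETS = [0.0, -0.15, -0.30, -0.45]   # plate-line offsets from the text, in volts
N_STEPS = 32         # ISP steps, from the text
V_START, V_STOP = 0.60, 1.45              # assumed source-line ISP ramp, in volts


def switched_fraction(v_eff):
    """Toy logistic switching curve: fraction of polarization switched."""
    return 1.0 / (1.0 + math.exp(-(v_eff - V_C_TOY) / WIDTH))


def isp_ramp():
    """Source-line pulse amplitudes, increasing in equal steps."""
    step = (V_STOP - V_START) / (N_STEPS - 1)
    return [V_START + i * step for i in range(N_STEPS)]


def ltp_curve(pl_offsets):
    """Average response of parallel devices; each sees V_SL - V_PL (assumed sign)."""
    return [
        sum(switched_fraction(v_sl - v_pl) for v_pl in pl_offsets) / len(pl_offsets)
        for v_sl in isp_ramp()
    ]


def max_deviation_from_line(curve):
    """Largest gap between the normalized curve and an ideal straight ramp."""
    lo, hi = curve[0], curve[-1]
    n = len(curve)
    return max(abs((g - lo) / (hi - lo) - i / (n - 1)) for i, g in enumerate(curve))


single = max_deviation_from_line(ltp_curve([0.0]))
parallel = max_deviation_from_line(ltp_curve(PL_OFFSETS))
print(f"single FTJ: {single:.2f}, four parallel FTJs: {parallel:.2f}")
```

In this toy setup the single device switches abruptly near the end of the ramp, while the four staggered devices switch one after another, so their average stays much closer to a straight line. The same averaging narrows the usable conductance range, consistent with the reduced dynamic range noted above.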
Conclusion
A general-purpose synaptic design leveraging multiple parallel FTJs with incremental pulsing and staggered plate-line biases substantially reduces the intrinsic nonlinearity of ferroelectric synapses. Wafer-scale FTJs with very low variability, reliable endurance, low-voltage operation, and favorable current scaling were demonstrated. A four-FTJ synapse achieved nonlinearity as low as −0.18/−1.14 and delivered 96.84% MNIST accuracy, close to software limits. The concept is extendable to other ferroelectric memories and aligns with 3D integration trends (target footprint ~5F^2). Future work could realize full 3D stacked implementations, optimize conductance/barrier engineering to expand dynamic range and absolute conductance, and validate large-scale array operation and system-level energy/throughput gains.
Limitations
- Dynamic range reduced to ~5 due to averaging across parallel FTJs.
- Absolute conductance not optimized; very low conductance can increase ADC burden and reduce speed; improvements may require barrier engineering, thickness scaling, or electrode workfunction tuning.
- Retention degradation may accelerate at elevated temperatures; slight narrowing of the memory window over time observed.
- C2C drift noted under repeated DC stress (wake-up and possible stress-induced leakage current effects).
- Wafer-level TER gradient due to ALD thickness non-uniformity; improved process control needed.
- The 3D stacked synapse was proposed conceptually; full 3D hardware demonstration remains future work.
- Neural network results are based on device-informed simulations rather than measurements from a complete array.