Memristive tonotopic mapping with volatile resistive switching memory devices

A. Milozzi, S. Ricci, et al.

This study by Alessandro Milozzi, Saverio Ricci, and Daniele Ielmini investigates volatile RRAM devices and demonstrates their potential for energy-efficient, high-density neuromorphic systems that process temporal signals, with an application to speech recognition.

Introduction

Perceiving information from the surrounding environment is crucial for animals, which must detect external stimuli and react to them. Light, sound, gravity, touch, and chemicals are converted into encoded spiking signals by dedicated sensory apparatus and then interpreted by the brain. Because of its high energy efficiency and intrinsic error tolerance, the human brain inspires novel paradigms for achieving better computational performance. In this framework, the auditory system has gained strong attention due to its remarkable features: sound can reach our ears from all possible directions and can be perceived continuously. Unlike retinotopic or somatosensory maps, which follow the spatial arrangement of the sensors, the auditory system processes sound using an internal representation of its physical features. Auditory processing deals with mechanical vibrations, which are temporal signals. The frequency of natural sounds spans from tens of Hz to tens of kHz (about three orders of magnitude). To classify these signals, a spatial representation of this broad temporal range is needed. The cochlea solves this by realizing a tonotopic map, which maps different frequency components to logarithmically spaced positions along the cochlear channel. Emulating such spatiotemporal processing with simple, scalable hardware remains an open challenge for neuromorphic computing.
Resistive switching memory (RRAM) devices have attracted strong interest for implementing artificial neurons and synapses in high-density, energy-efficient neural networks. State-of-the-art systems mainly rely on spatial coding and, because RRAMs are often used as static first-order memristors, introduce temporal dynamics via auxiliary CMOS circuitry and sophisticated pulse encoding, which compromises area, energy, and biological plausibility.
To enable device-level computation over time and frequency, innovative memristive materials and methodologies are needed that exploit intrinsic device dynamics, including stochasticity and volatility, to capture temporal features over broad scales. By leveraging the dynamic, stochastic response of volatile memristors, this work demonstrates device-level spatial mapping of temporal spike signals on a logarithmic scale. Volatility lets the devices relax spontaneously and become ready for new computations, which forms the basis for replicating audio processing functions of the human brain.

Literature Review

The work builds on established neuroscience of cochlear tonotopy and hair-cell transduction, where frequencies from ~20 Hz to 20 kHz are mapped logarithmically along the cochlea and detected at specific spatial locations. Prior neuromorphic hardware often employs RRAM as static, non-volatile weights for in-memory computing accelerators, with temporal processing delegated to CMOS circuits. Research on memristive devices has demonstrated synaptic-like behaviors (e.g., STDP, Hebbian learning) and dynamical memristors, but demonstrations of complete spatiotemporal primitives analogous to cochlear processing remain limited. The authors position volatile, stochastic Ag/HfOx RRAM as enabling intrinsic temporal computation (burst-based stochastic integration, rate sensitivity) and logarithmic mapping, addressing the gap between broad temporal feature ranges in biology and limited linear scales in prior device-level implementations.

Methodology

Devices and circuits: The study uses volatile Ag/HfOx RRAM in a 1T1R (one-transistor/one-resistor) configuration. The switching layer is ~5 nm of HfOx between an Ag top electrode (TE, ~100 nm) and a graphitic carbon bottom electrode (BE, 70 nm pillar) connected to a Si-based CMOS transistor. The device is initially in the high-resistance state (HRS); applying a positive bias forms a conductive filament (CF) of Ag, which sets the device to the low-resistance state (LRS). When the bias is removed, the CF dissolves spontaneously (volatility) and the device returns to the HRS without an explicit RESET, which enables unipolar operation. A selector transistor limits the current (compliance Ic) and supports readout.
Pulsed switching experiments: Trains of identical voltage spikes with pulse width Tpulse = 2.5 µs were applied within a fixed time window Twindow = 25 ms. Frequencies spanned 20 Hz to 20 kHz (log-spaced). The switching time (ton) is the time elapsed until the device turns ON within the train; the number of spikes to SET (Nset) depends on spike amplitude and frequency. The switching probability is defined as Pswitch = Ntrains_ON / Ntrains, the fraction of applied trains that turn the device ON, for a given amplitude V, frequency f, and window Twindow. Device-to-device and cycle-to-cycle variations of Vset are modeled as normally distributed; Vhold defines the onset of CF dissolution.
Frequency sensing circuit: Parallel 1T1R cells share a common BE and gate bias, which ensures similar ON currents through transistor saturation. Each device's TE receives a spike train at the same frequency but with an amplitude that decreases from device to device (e.g., V1 = 2 V, V2 = 1.5 V, V3 = 1 V). Cells with higher VTE are more sensitive, reaching a higher Pswitch at lower f. The summed BE current indicates how many devices have switched ON; distinct current steps correspond to devices switching one after another.
Memristive tonotopic map (MTM): The MTM extends the parallel array by adding XOR gates between adjacent device outputs to detect the boundary between the last ON device and the first OFF device. With N cells arranged by decreasing VTE, device i+1 responds to higher frequencies than device i. The XOR outputs yield frequency-selective responses with peaks at specific bands across the 20 Hz–20 kHz range. Redundancy (multiple parallel devices per frequency channel) averages out stochasticity, akin to hair-cell redundancy.
Speech recognition pipeline: Raw audio is converted to spike trains via analog-to-spike (A2S) conversion, which uses three amplitude thresholds per channel to capture different pressure (intensity) levels and generates the input spike trains for the MTM. The XOR outputs form a compact feature vector that is fed to a small feed-forward neural network (FFNN) for classification. For interpretability tests, an MTM with n = 3 channels and parallelization N = 20 per channel was used; the FFNN was trained and evaluated on repeated utterances of four words.
Fabrication and characterization: Devices were co-integrated with standard CMOS. Depositions were performed by e-beam evaporation at ~3 × 10^−6 mbar without vacuum break. Electrical characterization was conducted on a probe station using rhodium-plated tungsten probes. Quasi-static I–V curves were measured with an Agilent HP4156C. Dynamic pulsing used an Aim-TTi TGA12104, with acquisition on a Tektronix MSO58 oscilloscope.
Measurement protocol and simulations: In switching-probability measurements, pulse amplitude and frequency were randomized per cycle to avoid correlations, and all combinations over the selected sets were tested. A probabilistic device model, calibrated to the experiments (including device-to-device variability), was implemented in MATLAB R2022b and used for large-scale simulations of frequency sensing, MTM responses, and speech recognition. Neural network training and inference were performed using MATLAB's Statistics and Machine Learning Toolbox. Code and data are available on request.
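
The frequency sensing circuit and the XOR boundary detection described above can be illustrated with a toy model. The sketch below is not the authors' calibrated MATLAB model: it assumes that each spike switches a device independently with probability P(Vset ≤ VTE), that Vset is normally distributed with hypothetical parameters `vset_mean` and `vset_sigma`, and that the first spike arrives at t = 0. All function names are illustrative. Only Twindow = 25 ms and the example amplitudes (2 V, 1.5 V, 1 V) come from the text.

```python
import math
import random

T_WINDOW = 25e-3  # observation window in seconds (value from the text)


def per_spike_probability(v_te, vset_mean=2.5, vset_sigma=0.5):
    """P(Vset <= v_te) for a normally distributed Vset.

    vset_mean and vset_sigma are hypothetical, not fitted to the devices.
    """
    z = (v_te - vset_mean) / (vset_sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def spikes_in_window(freq_hz, t_window=T_WINDOW):
    """Number of spikes in the window, assuming the first spike is at t = 0."""
    return int(math.floor(freq_hz * t_window)) + 1


def switching_probability(v_te, freq_hz):
    """Chance that at least one spike of the train turns the device ON."""
    p = per_spike_probability(v_te)
    return 1.0 - (1.0 - p) ** spikes_in_window(freq_hz)


def sample_array(voltages, freq_hz, rng):
    """Draw one ON/OFF (1/0) state per cell for a single spike train."""
    return [int(rng.random() < switching_probability(v, freq_hz))
            for v in voltages]


def xor_boundary(states):
    """XOR adjacent outputs: a 1 marks an ON/OFF transition between neighbours."""
    return [a ^ b for a, b in zip(states, states[1:])]


if __name__ == "__main__":
    rng = random.Random(0)          # fixed seed for repeatable draws
    voltages = [2.0, 1.5, 1.0]      # decreasing VTE, as in the text
    for f in (20, 200, 2000, 20000):
        probs = [round(switching_probability(v, f), 3) for v in voltages]
        states = sample_array(voltages, f, rng)
        print(f, probs, states, xor_boundary(states))
```

In this toy model the switching probability rises with both amplitude and frequency, so more cells turn ON as f increases, and the XOR stage marks where the ON cells end. A single stochastic draw can be non-monotonic (for example ON, OFF, ON), which is one way to see why the text averages over redundant devices per channel.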

Key Findings
  • Volatile Ag/HfOx 1T1R devices exhibit clear threshold (Vset) and hold (Vhold) behavior with stochastic cycle-to-cycle Vset variations; OFF currents are below pA resolution, with HRS in tens of TΩ.
  • Under spike trains (Tpulse = 2.5 µs, Twindow = 25 ms), the number of spikes to switch (Nset) and the switching time (ton) both decrease with increasing spike amplitude and frequency. In one test, for example, a device switched ON after Nset = 38 spikes, reaching G = 8 µS; higher amplitudes reduce ton.
  • Switching probability Pswitch increases with both spike frequency and amplitude; heat maps show monotonic trends across 20 Hz to 20 kHz and 1–2 V ranges, with frequencies spaced logarithmically over three orders of magnitude.
  • Frequency-sensing circuit (parallel devices with descending VTE) demonstrates that the number of ON devices within 25 ms increases with input frequency. Histograms of ON-device counts shift right with higher f. The average ON-device count increases linearly with log10(f), enabling a logarithmic-to-linear tonotopic mapping akin to the cochlea.
  • MTM with XOR boundary detection yields frequency-selective channels: each XOR output peaks at a characteristic frequency, covering the 20 Hz–20 kHz range. Simulations (model calibrated to experiments) confirm distinct, tunable band responses; redundancy (e.g., 500 devices per frequency in a 30-cell MTM) enhances robustness.
  • A musical example (Beethoven Symphony No. 9 Finale, ~260–390 Hz) shows the active XOR channel following the score’s changing notes in simulation.
  • Speech recognition: Using an MTM with n = 3 channels, parallelization N = 20, A2S with three thresholds per channel, and a small FFNN classifier, four words ("yes", "no", "up", "down") spoken by one person with 20 repeats each were processed. Using 50 MTM-output examples for training and 50 for inference per word, the system achieved 96.5% accuracy (confusion matrix reported), with distinct XOR activity patterns reflecting differing spectro-temporal content.
  • Energy estimation per spike through an ON device: E ≈ VT × Ic × Tpulse = 1 V × 16 µA × 2.5 µs ≈ 40 pJ under measurement-friendly Ic; devices can operate with Ic down to ~10 nA, implying orders-of-magnitude lower energy. Volatility enables spontaneous reset without explicit energy for discharge, supporting efficient asynchronous operation and parallelization.
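
The energy estimate in the last bullet is a product of three numbers, so it is easy to check. The sketch below reproduces the 40 pJ figure and evaluates the same expression at Ic = 10 nA. Keeping the voltage at 1 V and the pulse width at 2.5 µs for the low-current case is an assumption made here for illustration, and the function name is hypothetical.

```python
def energy_per_spike(voltage_v, current_a, pulse_width_s):
    """Energy of one rectangular spike through an ON device, in joules."""
    return voltage_v * current_a * pulse_width_s


T_PULSE = 2.5e-6  # pulse width in seconds (value from the text)

# Measurement-friendly compliance current quoted in the text: 16 uA at 1 V.
e_meas = energy_per_spike(1.0, 16e-6, T_PULSE)

# Low-current case: 10 nA, ASSUMING the same 1 V and 2.5 us pulse.
e_low = energy_per_spike(1.0, 10e-9, T_PULSE)

print(f"{e_meas * 1e12:.1f} pJ")  # 40.0 pJ
print(f"{e_low * 1e15:.1f} fJ")   # 25.0 fJ
```

Under these assumptions the per-spike energy drops by a factor of 1600, consistent with the "orders-of-magnitude" statement above.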

Discussion

The results show that volatile, stochastic memristors can implement core cochlear-like spatiotemporal primitives at the device and small-circuit level. By exploiting intrinsic switching probabilities over spike bursts, the system performs logarithmic integration and maps a broad frequency range (three decades) into a linear spatial representation (device index), mirroring cochlear tonotopy. This device-level computation reduces reliance on bulky capacitive integrators and complex CMOS temporal coding, improving energy efficiency and biological plausibility. The volatility provides automatic return to a ground state (all devices OFF), simplifying control and enabling asynchronous, event-driven operation with unipolar programming. The demonstrated MTM yields interpretable, frequency-selective outputs (XOR channels) that can be directly consumed by simple classifiers, bridging neuromorphic front-ends and downstream learning. Energy and scalability considerations are favorable: ultra-high HRS minimizes standby power, ON-current can be minimized to nA levels, and unipolar operation eases selector and driver design. The approach generalizes beyond audio to other modalities featuring logarithmically distributed temporal features (e.g., tactile, event-based vision). The work thus expands the set of neuromorphic primitives realizable with memristive devices and suggests paths toward dense, low-power, explainable hardware for temporal pattern recognition.
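
The mapping of three frequency decades onto a linear device index can be written down in idealized form. The sketch below is a deterministic illustration, not the stochastic device behavior: it assumes equal-width bands on a log10 axis between 20 Hz and 20 kHz, with n = 3 channels as in the speech experiment. The function name and the clamping at the upper edge are illustrative choices.

```python
import math

F_MIN, F_MAX = 20.0, 20_000.0  # frequency range from the text, in Hz


def ideal_channel(freq_hz, n_channels=3):
    """Index of the equal-width log-frequency band that contains freq_hz."""
    if not F_MIN <= freq_hz <= F_MAX:
        raise ValueError("frequency outside the 20 Hz to 20 kHz range")
    # Normalized position on the log axis: 0 at F_MIN, 1 at F_MAX.
    position = math.log10(freq_hz / F_MIN) / math.log10(F_MAX / F_MIN)
    # Clamp so that F_MAX itself falls in the last channel.
    return min(int(position * n_channels), n_channels - 1)


for f in (20, 100, 1000, 10_000, 20_000):
    print(f, ideal_channel(f))
```

With three channels over three decades, each channel spans roughly one decade, so equal steps in device index correspond to equal ratios in frequency.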

Conclusion

This work introduces neuromorphic circuits based on volatile Ag/HfOx RRAM that perform cochlea-inspired spatiotemporal processing at the device level. The authors demonstrate stochastic spike-burst integration, frequency-dependent switching probability, and a memristive tonotopic map that linearly encodes logarithmically spaced frequencies and supports interpretable, frequency-selective channels. The MTM enables a compact speech-recognition pipeline, achieving 96.5% accuracy on a four-word task with minimal circuitry. Volatility affords automatic reset and unipolar operation, promising low energy and simplified design. Future directions include scaling MTM size and redundancy to handle richer vocabularies and phoneme-level tasks, extending to other sensory domains, integrating more channels and parallel MTMs, and exploring advanced selectors and lower-current operation for further energy and area gains.

Limitations

The study focuses on proof-of-concept demonstrations with limited-scale arrays and simulations; large-scale hardware MTMs and extensive benchmarks are not presented. Speech recognition is demonstrated on a small dataset (four words, single speaker, 20 repetitions), so generalization across speakers and vocabularies is untested. XOR-based readout and classifier are simulated; full end-to-end on-chip integration is not shown. Energy estimates use measurement-friendly currents; ultra-low-current operation, while feasible, is not experimentally characterized here in full system context. Device stochasticity is mitigated via averaging/parallelization in simulations, but hardware-level redundancy at large scales remains to be validated.
