An artificial sensory neuron with visual-haptic fusion

Engineering and Technology

C. Wan, P. Cai, et al.

Explore the groundbreaking work of Changjin Wan, Pingqiang Cai, Xintong Guo, Ming Wang, Naoji Matsuhisa, Le Yang, Zhisheng Lv, Yifei Luo, Xian Jun Loh, and Xiaodong Chen as they unveil the bimodal artificial sensory neuron (BASE) for visual-haptic fusion. This innovative design integrates optic and pressure data to control myotubes and robotic hands, enhancing pattern recognition in cyborg and neuromorphic systems.
Introduction

The study addresses how to implement supramodal sensory fusion—specifically visual-haptic integration—at the level of an artificial sensory neuron to improve perception, action control, and recognition. Motivated by biological systems that combine multiple sensory cues in distributed neural circuits (e.g., inferior parietal cortex) to enhance reliability and precision, the authors propose a bimodal artificial sensory neuron (BASE) that emulates these processes. The purpose is to collect optical and tactile stimuli, transmit them via an ionic pathway, and integrate them into postsynaptic-like currents for downstream actuation and computation. This approach aims to overcome limitations of centralized, sequential digital processing by leveraging neuromorphic, event-driven, and parallel integration akin to biological sensory neurons.

Literature Review

Prior work achieved artificial sensory neurons/synapses with single modalities (haptic or visual) for pattern recognition and actuation, but lacked supramodal fusion to increase reliability and accuracy. Biological evidence shows vision and touch are integrated to provide spatial ability and improved object perception. Synaptic transistors have been identified as suitable for multimodal fusion owing to parallel gating via ions in electrolytes. Related neuromorphic devices replicated synaptic plasticity and recognition tasks but generally remained unimodal or relied on conventional digital post-processing. The current work builds on these by integrating visual and haptic sensing into a single neuromorphic pipeline that outputs fused, time-dependent EPSC-like signals for actuation and recognition.

Methodology

Device architecture (BASE). The bimodal artificial sensory neuron integrates four components:
  • Photodetector (visual receptor): a perovskite-based stack Zn2SnO4/PEA2MA2Pb3I10/PTAA/Au on PET/ITO; the 2D perovskite PEA2MA2Pb3I10 provides ambient stability.
  • Pressure sensor (haptic receptor): microstructured PDMS (recessed pyramids) with a top CNT-coated PDMS layer forming a pressure-dependent resistive pathway; increased pressure enlarges the contact area and lowers the resistance.
  • Ionic cable: LiCl-doped PVA hydrogel acting as an ionic transmission line analogous to an axon, bonded to the electrodes via instant tough bonding.
  • Synaptic transistor: an electrolyte-gated transistor (PVA gate dielectric) whose gate receives the ionic signals; exponential ion relaxation produces a slow, EPSC-like decay.

Interconnects and circuit. CNT electrodes fabricated via a printing–filtration–transfer process on PDMS interconnect the components. Three inputs are used: VY (photodetector), VH (pressure sensor), and a global inhibitory input (VIH) to reduce energy consumption; outputs VD and VS connect to the synaptic transistor, with VDS = 0.5 V applied to measure the EPSC. The CNT electrode properties (sheet resistance versus density) and the low interfacial impedance with the hydrogel were characterized, comparing hydrogel impedance for CNT versus Au contacts.

Sensory channel characterization. For the visual channel, I–V curves under dark and illuminated conditions quantify the resistance decrease with light intensity; EPSC peaks were recorded versus light intensity (1 s pulses) and versus duration at fixed intensity. For the haptic channel, I–V curves were measured across pressures, and EPSC peaks versus pressure (1 s pulses) and versus duration at fixed pressure. Both channels produce comparable EPSC ranges; the pressure sensor shows better linearity and a faster response.

Biohybrid neuromuscular junction (BNJ) for actuation. Interdigital Au electrodes are coated with polypyrrole (PPy) via galvanostatic electrochemical polymerization to reduce the electrode–cell impedance.
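The EPSC behavior described above — a peak that grows with stimulus intensity and duration, followed by a slow relaxation — can be sketched as a simple leaky integrator. This is a minimal illustration, not the authors' device model: the time constant, gain, and linear charging are assumptions standing in for the device's ion-relaxation dynamics.

```python
import math

def epsc_trace(stim, dt=0.01, tau=0.5, gain=1.0):
    """Minimal EPSC sketch: each stimulus sample adds charge that then
    relaxes exponentially (standing in for ion relaxation in the PVA
    electrolyte). tau and gain are illustrative assumptions."""
    decay = math.exp(-dt / tau)
    out, y = [], 0.0
    for s in stim:
        y = y * decay + gain * s * dt  # charge in, exponential leak out
        out.append(y)
    return out

# A 1 s pulse (amplitude proxies stimulus intensity) followed by 1 s rest:
pulse = [1.0] * 100 + [0.0] * 100    # dt = 0.01 s
trace = epsc_trace(pulse)
```

In this toy model the peak sits at the end of the pulse and scales with both amplitude and duration, mirroring the intensity- and duration-dependent EPSC peaks reported for both sensory channels.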
C2C12 myoblasts are seeded after fibronectin functionalization and differentiated for 5 days to form myotubes. Electrical stimulation (converted from the BASE EPSC) is applied; particle image velocimetry (PIV) measures myotube motion, with regions of interest defined for quantification.

Visual–haptic synchrony test for actuation. A visual stimulus (~4.8 mW cm^-2, ~250 ms; VY = +1.0 V) and a haptic stimulus (~2 kPa, ~350 ms; VH = −1.0 V) are applied with a varying inter-stimulus interval ΔT = TH − TV. Because the EPSC decays over time, the second stimulus modulates the resultant EPSC amplitude. Defining ΔEPSC% = (ΔEPSC − ΔEPSCY)/ΔEPSCY, where ΔEPSCY is the EPSC from the visual stimulus alone, a 20% threshold classifies events as synchronous (ΔEPSC% > 20%) or asynchronous (≤20%). ΔEPSC is linearly converted to a 0–1.0 V BNJ drive; migration is observed only if the output exceeds 0.4 V.

Robotic hand control. A modified BASE patch on a robotic hand uses visual feedback (an LED on the ball; z-axis YES/NO) and haptic feedback (touch at half-close; y-axis YES/NO). Exploration uses ΔT = 100 ms. The fused EPSC response distinguishes three cases (V = YES/H = NO; V = NO/H = YES; V = YES/H = YES) to decide the open/close action; unimodal feedback provides only one-dimensional information and can mislead the decision.

Pattern recognition with BASE matrices. 10×10 pixel arrays were built for optic-only, pressure-only, and bimodal visual–haptic (VH) fusion; each VH pixel contains one photodetector and one pressure sensor. In each row, n consecutive VH units connect to a synaptic transistor via a common ionic cable, implementing convolution-like integration with distance-dependent weights w(m) that decrease with distance. The measured joint EPSC is slightly less than the arithmetic sum of the individual inputs. The kernel size n defines how many pixels feed one transistor (n-VH); a larger n performs spatial integration (data reduction) akin to feature extraction.
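The synchrony classification and actuation thresholds above follow directly from the stated definitions, and can be expressed compactly. The normalization by a maximum EPSC (`epsc_max`) in the linear 0–1.0 V conversion is an assumption about how the mapping is scaled:

```python
def classify_synchrony(epsc_fused, epsc_visual_only, threshold=0.20):
    """ΔEPSC% = (ΔEPSC − ΔEPSC_Y) / ΔEPSC_Y, with ΔEPSC_Y the
    visual-only response; synchronous if ΔEPSC% exceeds 20%."""
    delta_pct = (epsc_fused - epsc_visual_only) / epsc_visual_only
    return delta_pct > threshold, delta_pct

def bnj_drive(epsc, epsc_max):
    """Linear conversion of the EPSC into a 0–1.0 V drive for the BNJ
    (epsc_max is an assumed scaling reference); myotube migration is
    observed only above 0.4 V."""
    v = min(max(epsc / epsc_max, 0.0), 1.0)
    return v, v > 0.4
```

For example, a fused EPSC 50% above the visual-only response classifies as synchronous and, once converted, can cross the 0.4 V actuation threshold, whereas a weakly facilitated (asynchronous) response does not.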
Data acquisition. Multi-transparency alphabetic patterns are placed on the arrays for ~1.5 s with the light on (~4.8 mW cm^-2), then removed with the light off; outputs are normalized.

Perceptron setup. The sensing data feed a two-layer perceptron in MATLAB (10 hidden nodes; 18 output nodes for 18 labels representing six letters × three transparencies). 720 patterns were constructed by adding five random noise pixels to each base pattern; training and testing follow the Supplementary protocols.

Fabrication details (from Methods). Photodetector fabrication includes ITO/PET patterning (chemical etch), Zn2SnO4 nanoparticle spin coating and annealing, hot-cast perovskite (1.2 M) spin coating, PTAA spin coating, and Au evaporation. Microstructured PDMS films were molded from silicon masters; CNTs were sprayed via an ultrasonic humidifier to form the conductive layers. For the synaptic transistors, ITO electrodes and the channel were sputtered, and the PVA gate dielectric (in CaCl2 solution) was cast and dried. CNT patterning used printed nylon filter masks and hot-press transfer to PDMS. The PVA hydrogel was prepared by dissolving PVA, freeze–thaw cycling, and LiCl immersion for high ionic conductivity. For the BNJ, interdigital Au electrodes were PPy-electroplated in a PTSA-doped pyrrole solution, followed by C2C12 culture and differentiation.
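The convolution-like n-VH integration described for the matrices can be sketched as a per-row weighted sum. The geometric weight profile and the non-overlapping grouping below are illustrative assumptions; the paper states only that the weights w(m) decrease with distance along the shared ionic cable:

```python
def fuse_row(row_epsc, n=5, decay=0.7):
    """Sketch of n-VH integration: n consecutive pixel EPSCs feed one
    synaptic transistor through a shared ionic cable, with weight
    w(m) = decay**m decreasing with distance m from the transistor.
    The decay value and non-overlapping grouping are assumptions."""
    outputs = []
    for start in range(0, len(row_epsc), n):
        chunk = row_epsc[start:start + n]
        outputs.append(sum(x * decay ** m for m, x in enumerate(chunk)))
    return outputs

row = [1.0] * 10        # one 10-pixel row under a uniform stimulus
fused = fuse_row(row)   # 10 pixel values reduced to 2 fused outputs (n = 5)
```

With n = 5 each 10-pixel row collapses to two transistor outputs, which is the data-reduction (feature-extraction) effect the kernel size controls; n = 1 recovers full spatial resolution.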

Key Findings
  • Conductive interconnects: CNT electrodes achieved a minimum sheet resistance of 7.0 Ω sq^-1 at a density of 33 µg cm^-2; the interfacial impedance of the PVA hydrogel with CNT contacts is nearly three orders of magnitude lower than with Au contacts (for f < 1 kHz), enabling efficient ionic–electronic transduction.
  • Power: Introducing a global inhibitory input (VIH) can reduce BASE energy consumption to <0.05 µW (Supplementary Fig. 7).
  • Sensory channels: Both photodetector and pressure sensor produce EPSCs whose peak amplitudes scale with stimulus intensity and duration; pressure sensor exhibits better linearity and faster response, but output ranges are similar, facilitating balanced fusion.
  • Synaptic transistor fusion: Measured joint EPSC from simultaneous inputs is slightly less than the arithmetic sum of individual EPSCs; input weights decay with distance along the ionic cable, enabling convolution-like spatial filtering.
  • Biohybrid actuator coupling: PPy coating reduced BNJ electrode impedance from ~1 MΩ to ~3 kΩ at 1 Hz, improving coupling. Myotube motion increases with stimulation voltage (0.2, 0.6, 1.0 V); perceptible migration requires >0.4 V at BNJ input.
  • Visual–haptic synchrony for actuation: With visual (~4.8 mW cm^-2, ~250 ms; VY=+1.0 V) and haptic (~2 kPa, ~350 ms; VH=−1.0 V) stimuli, ΔEPSC% versus ΔT shows significant facilitation for short intervals; using a 20% threshold separates synchronous (>20%) vs asynchronous (≤20%) events, which respectively do or do not activate myotubes (due to the 0.4 V actuation threshold after linear conversion).
  • Robotic control: BASE fusion disambiguates spatial YES/NO along two axes and reduces mis-grasps compared to unimodal control, using an exploration protocol with ΔT=100 ms to infer ball position and command hand opening/closing.
  • Pattern recognition: Only VH fusion maps preserve both shape and transparency; optic-only maps lose transparency detail, and pressure-only maps lack transparency information. Despite spatial integration (kernel n = 5, retaining ~60% of the original data), the fusion recognition rate is ~66%, slightly exceeding optic-only unimodal recognition at full spatial resolution (n = 1, ~65%). This demonstrates improved robustness from multimodal fusion even at reduced data size.

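The robotic-control finding above amounts to combining two binary spatial cues before acting. A minimal sketch follows; the paper specifies only that the three fused V/H cases drive the open/close decision, so the exact action assignment here is a hypothetical illustration:

```python
def decide_action(visual_yes, haptic_yes):
    """Hypothetical mapping from fused visual (z-axis) and haptic (y-axis)
    cues to a hand action; the specific assignment is an illustrative
    assumption, not the paper's stated policy."""
    if visual_yes and haptic_yes:
        return "close"   # ball confirmed on both axes: grasp
    return "open"        # unconfirmed on at least one axis: keep exploring
```

The point of the fusion is visible even in this toy form: a unimodal controller sees only one of the two booleans and can close on a false positive, whereas the fused decision requires agreement on both axes.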
Discussion

The work demonstrates that neuromorphic fusion of visual and haptic cues at the device level enhances perception for action and recognition, addressing the need for supramodal integration beyond unimodal artificial sensory systems. By combining a perovskite photodetector and a pressure sensor with ionic transmission and a synaptic transistor, the BASE produces time-dependent, nonlinear EPSC signals that encode bimodal information similarly to biological neurons. This fused signal reliably triggers actuation in a biohybrid neuromuscular junction when stimuli are temporally synchronized within a tolerance window, mimicking human perception of synchrony and eye–hand coordination. For robotics, the fused response provides two-dimensional spatial inference absent in unimodal sensing, improving decision-making for grasping. In recognition tasks, the VH fusion matrix, acting as a convolution-like feature extractor via distance-weighted ionic integration, yields higher recognition rates than unimodal matrices, even when data are spatially downsampled, highlighting efficiency and robustness gains. Overall, executing sensory fusion at the neuronal device level supports distributed, event-driven processing with potential advantages in power efficiency and fault tolerance compared to centralized digital pipelines.

Conclusion

This study introduces a bimodal artificial sensory neuron (BASE) that fuses visual and haptic inputs through an ionic/electronic hybrid pathway and a synaptic transistor, producing EPSC-like outputs for downstream actuation and computation. The system successfully: (1) controls skeletal myotube contraction based on temporal synchrony of multimodal stimuli; (2) guides a robotic hand using fused two-dimensional spatial information; and (3) enables robust recognition of multi-transparency alphabetic patterns with improved accuracy over unimodal approaches, even with reduced data size via convolution-like integration. These results underscore the promise of neuronal-level sensory fusion for building biologically inspired, power-efficient, and scalable perceptual systems relevant to neurorobotics, cyborg interfaces, and neuromorphic AI. Future directions include integrating more sensory modalities, scaling to larger arrays and fully on-device learning, optimizing device materials and architectures for lower power and higher speed, and embedding the fusion hardware into closed-loop autonomous systems.

Limitations

The paper does not include a dedicated limitations section. Noted constraints include: (1) the pattern recognition evaluation uses sensing data fed to a MATLAB-based perceptron rather than fully on-device training/inference; (2) robotic demonstrations focus on simplified YES/NO positional inference and grasp decisions in a constrained setup; (3) the fusion addresses two modalities (vision and touch) only; and (4) recognition rates (~66% for fusion at kernel n=5) indicate room for improvement toward higher accuracy. Scaling, long-term stability, and real-world environmental variability are not extensively characterized within the main text.
