A framework for the general design and computation of hybrid neural networks

R. Zhao, Z. Yang, et al.

This paper introduces a general framework for hybrid neural networks (HNNs) that merges the capabilities of spiking neural networks and artificial neural networks. Hybrid units serve as reconfigurable interfaces that integrate the two paradigms' different information flows, demonstrated through case studies on sensing, modulation, and reasoning networks. The work was carried out by a team including Rong Zhao, Zheyu Yang, and colleagues.

Introduction
The paper addresses how to systematically integrate artificial neural networks (ANNs) and spiking neural networks (SNNs) to approach artificial general intelligence (AGI) capabilities. While ANNs and SNNs offer complementary strengths, their fundamental differences in coding schemes, synchronization, and neuronal dynamics hinder direct integration. Existing hybrid approaches are often task-specific and limited. The authors propose a general-purpose framework that decouples ANNs and SNNs and connects them via hybrid units (HUs) to transform and modulate heterogeneous information flows across spatial and temporal scales. The goal is to retain the key advantages of both paradigms while enabling flexible, scalable hybrid model design applicable across tasks and domains.
Literature Review
The paper situates the work within efforts to combine computer-science-oriented (ANNs) and neuroscience-oriented (SNNs) models, noting supporting neuromorphic platforms (e.g., Loihi, SpiNNaker) and hybrid hardware/software stacks. Prior hybrid modeling attempts focused on specific features (e.g., efficiency, biological plausibility) for particular tasks, lacking a general framework. The authors reference work on SNN/ANN differences in coding and dynamics, and neuromorphic systems that motivate cross-paradigm integration.
Methodology
Framework: The core concept is the hybrid unit (HU), a reconfigurable interface that transforms information between an ANN's synchronous real-valued signals and an SNN's asynchronous spikes. An HU comprises four steps: a windowing operation W to synchronize time scales, a kernel H to extract spatiotemporal features and form an intermediate representation, a nonlinear transformation F to realize complex mappings, and an optional discretization Q to match the target domain's representation. Formally, Y = HU[X] = Q(F(H(W(X)))). The components are parameterizable and support both manual design (using prior knowledge for deterministic or simple mappings) and automatic learning (for non-deterministic, complex, or unknown mappings). Three learning configurations are described: joint training with the connected networks, independent training with separate objectives, and end-to-end training with the complete model. Theoretical notes argue universal approximation capability under mild conditions.

Comparative neuron models: The paper contrasts ANN neurons (a linear transform followed by a differentiable nonlinearity) with SNN neurons (membrane potential dynamics and threshold-triggered spikes), highlighting their different signal forms and coding strategies.

Demonstration 1 — Hybrid Sensing Network (HSN): The architecture adopts dual pathways: an ANN "what" pathway for static features from APS frames and an SNN "where" pathway for dynamic features from DVS events. The model predicts next-frame features SF(t+Δt) by combining current static features SF(t) with HU-transformed dynamic features from the SNN, then performs object tracking by template matching. Learnable HUs, jointly trained with the frontend networks, fuse the features. The SNN dynamics use an iterative LIF model trained via spatiotemporal backpropagation (BPTT).

Demonstration 2 — Hybrid Modulation Network (HMN): A hierarchical system for meta-continual learning (MCL). An ANN backbone extracts task-level information and, via learnable HUs, produces modulation signals that adjust the thresholds of SNN branch neurons, controlling their excitability. The backbone/HU training objective aligns modulation signals with task similarity (defined for permuted N-MNIST via the Hamming distance between permutation indices), using cosine-similarity constraints and sparsity. After training, sample-specific modulation signals are averaged and thresholded to form task-specific signals that modulate the SNN during sequential training.

Demonstration 3 — Hybrid Reasoning Network (HRN): A multimodal reasoning model for VQA (CLEVRER). ANN-based visual (Mask R-CNN plus PropNet for dynamics) and language (sequence-generation) parsers provide features and instructions. An SNN-based symbolic analyzer with integrate-and-fire neurons represents objects, attributes, and functional operations; it constructs a working-memory graph by combining prior knowledge (long-term memory) and perceived relations via Hebbian learning and one-shot plasticity. Designable HUs convert static visual facts and language instructions into spike stimuli; learnable HUs detect dynamic events (e.g., collisions) from object trajectories using a 1D U-Net with MLP heads trained with cross-entropy and MSE losses. Reasoning proceeds by spiking dynamics executing basic logical operations encoded in the network graph.
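The four-step HU composition Y = Q(F(H(W(X)))) can be sketched in code. This is a minimal illustration, not the paper's implementation: the component choices here (a mean-firing-rate kernel for H, a tanh layer for F, a uniform rounding quantizer for Q) and all parameter values are assumptions for demonstration.

```python
# Sketch of a hybrid unit Y = Q(F(H(W(X)))) converting SNN spikes
# into ANN-compatible real-valued features. Component choices are
# illustrative, not the paper's actual configuration.
import numpy as np

def windowing(spikes, window):
    """W: group an asynchronous spike train of shape (T, n) into frames of `window` steps."""
    T, n = spikes.shape
    T_trim = (T // window) * window          # drop an incomplete trailing window
    return spikes[:T_trim].reshape(-1, window, n)

def kernel(frames):
    """H: extract a spatiotemporal feature -- here, the per-window firing rate."""
    return frames.mean(axis=1)

def nonlinear(rates, w, b):
    """F: learnable nonlinear mapping to the target representation."""
    return np.tanh(rates @ w + b)

def quantize(y, levels=None):
    """Q: optional discretization to match the target domain (identity if None)."""
    if levels is None:
        return y
    return np.round(y * (levels - 1)) / (levels - 1)

def hybrid_unit(spikes, w, b, window=10, levels=None):
    """Y = Q(F(H(W(X)))): transform a spike train into synchronous features."""
    return quantize(nonlinear(kernel(windowing(spikes, window)), w, b), levels)

rng = np.random.default_rng(0)
spikes = (rng.random((100, 4)) < 0.2).astype(float)  # toy spike train, 4 neurons
w = rng.standard_normal((4, 3))
b = np.zeros(3)
y = hybrid_unit(spikes, w, b, window=10)
print(y.shape)  # (10, 3): one real-valued feature vector per window
```

In a learnable configuration, `w` and `b` would be trained jointly with the connected networks or end-to-end, as described above; a designable HU would instead fix all components from prior knowledge.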
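The iterative LIF dynamics that the SNN components use can also be sketched briefly. This is a generic discrete-time LIF neuron under assumed decay, threshold, and reset choices, not the paper's exact model; note that the threshold `v_th` is the quantity HMN's modulation signals adjust to control excitability.

```python
# Minimal iterative LIF neuron: leak, integrate, fire, hard reset.
# Decay constant, threshold, and input drive are illustrative values.
import numpy as np

def lif_step(v, x, decay=0.9, v_th=1.0):
    """One discrete time step of a leaky integrate-and-fire neuron."""
    v = decay * v + x                   # leaky integration of input
    spike = (v >= v_th).astype(float)   # threshold-triggered spike
    v = v * (1.0 - spike)               # hard reset after firing
    return v, spike

v = np.zeros(3)
inputs = np.full((5, 3), 0.4)  # constant drive to 3 neurons over 5 steps
spike_train = []
for x in inputs:
    v, s = lif_step(v, x)
    spike_train.append(s)
counts = np.stack(spike_train).sum(axis=0)
print(counts)  # spike count per neuron over the 5 steps
```

Training such a model with BPTT additionally requires a surrogate gradient for the non-differentiable threshold function, which this forward-only sketch omits.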
Key Findings
- General framework: HUs provide a flexible, learnable interface for heterogeneous information transformation and modulation, enabling decoupled yet cooperative ANN-SNN hybrid architectures.
- HSN (tracking): On real-world streaming evaluation (Tianjic chips), HSN achieved 0.679 mIoU, more than double a real-time ANN baseline (0.33 mIoU) and approaching the ideal offline ANN accuracy (0.85 mIoU). HSN ran at 5952 FPS with a power efficiency of 130 μJ per inference, approximately 11× faster and 2× more power-efficient than pure ANN trackers. It reduces computational redundancy by recomputing only dynamic features.
- HMN (meta-continual learning on permuted N-MNIST): Modulation signals clustered by task similarity (t-SNE), and the branch SNN exhibited task-specific activation patterns that reduce interference across dissimilar tasks while reusing parameters for similar tasks. After learning 40 tasks, HMN outperformed single SNNs and regularization-based methods (e.g., EWC, SI, context-dependent gating), and it can be combined with EWC for further gains.
- HRN (CLEVRER VQA): Achieved validation accuracies of 91.65% (descriptive), 95.27% (explanatory), 85.96% (predictive), and 78.81% (counterfactual), competitive with the state of the art. Reasoning latency remained nearly constant as the number of objects and events increased, evidencing strong parallelism. HRN was robust to anomalous frontend detections (e.g., relaxed collision thresholds), outperforming NS-DR and NS-Guess by leveraging prior-knowledge-constrained spiking reasoning that narrows the answer space.
Discussion
The framework addresses the challenge of integrating ANNs and SNNs by decoupling their operations and using HUs to manage hybrid information flows in transmission and modulation across differing time scales and representations. The three demonstrations highlight how this resolves task-specific bottlenecks: HSN combines ANN precision with SNN efficiency for high-speed, high-accuracy tracking under streaming constraints; HMN leverages hybrid modulation to navigate catastrophic forgetting via task-aware parameter control; HRN uses ANN perception with SNN symbolic reasoning to obtain interpretable, robust, and scalable reasoning. The approach aligns with biological systems that integrate multi-scale, multimodal signals and suggests HNNs as prototypes for both neuromorphic applications and computational neuroscience. The versatility and learnability of HUs pave a path toward cross-paradigm systems capable of complex, real-world tasks relevant to AGI.
Conclusion
The paper introduces a general framework for hybrid neural networks that uses hybrid units to flexibly and learnably transform and modulate information between ANNs and SNNs. This decoupled design enables scalable, heterogeneous architectures suited to diverse tasks. Three case studies demonstrate improved efficiency and accuracy in streaming visual tracking (HSN), enhanced continual learning through task-driven modulation (HMN), and interpretable, parallel, and robust multimodal reasoning (HRN). The work suggests future directions including incorporating heterogeneous dynamics and connectivity within nominally homogeneous subnetworks, refining HU learning strategies, integrating additional state-of-the-art perception models, and deploying on neuromorphic hardware for large-scale, energy-efficient applications.
Limitations
- In the demonstrations, the ANN and SNN components are largely homogeneous within their paradigms and do not yet exploit cross-paradigm heterogeneous dynamics and connectivity; integrating such heterogeneity is deferred to future work.
- The task-similarity measure used to train HMN's backbone and HUs is problem-specific (e.g., the Hamming distance of permutation indices for permuted N-MNIST); defining a general task-similarity measure remains an open problem.
- Some HU configurations rely on supervised labels (e.g., collision events in CLEVRER) and prior knowledge; performance may depend on the availability and quality of such supervision and priors.