
Multi-compartment Neuron and Population Encoding improved Spiking Neural Network for Deep Distributional Reinforcement Learning

Y. Sun, Y. Zeng, et al.

This summary covers a brain-inspired spiking neural network for deep distributional reinforcement learning that models the multi-compartment structure of biological neurons. The work, by Yinqian Sun, Yi Zeng, Feifei Zhao, and Zhuoya Zhao, reports Atari results that match or exceed an ANN-based FQF baseline on most games and clearly exceed an ANN-SNN conversion approach.

Introduction

Spiking neural networks (SNNs) model biological neuron dynamics and offer advantages in energy efficiency, robustness, and biological plausibility, with applications across vision, speech, decision-making, and robotics. However, most SNNs rely on simplified point-neuron models such as LIF that ignore structural properties of biological neurons, potentially limiting computational and learning capacity. Neurobiological evidence shows dendrites play unique roles in integrating synaptic inputs, with pyramidal neurons exhibiting segregated computing compartments (apical and basal dendrites, soma). Prior multi-compartment models have largely been studied on simple tasks (e.g., MNIST). This work explores whether biologically inspired multi-compartment neurons and population coding can improve SNNs on complex decision-making tasks, specifically deep distributional reinforcement learning (DRL). The authors propose integrating a three-compartment neuron model (apical dendrite, basal dendrite, soma) with a population-based spike encoding of quantile fractions to create a multi-compartment spiking FQF (MCS-FQF) for Atari games, directly trained with surrogate gradients. The goal is to enhance information representation and integration in SNNs to match or surpass ANN-based baselines in DRL.

Literature Review

The paper reviews neuron models from simple LIF point neurons to biophysically detailed Izhikevich and Hodgkin–Huxley models, noting most do not incorporate structural dendritic compartments. Neuroscience studies show dendritic integration increases neuronal processing capacity; pyramidal neurons can be modeled as multi-layer networks and exhibit dendritic spikes and compartmentalized computation (Poirazi et al., Kampa and Stuart, Smith et al., Gidon et al.). Learning frameworks leveraging dendrites include dendritic prediction and target-based learning (Urbanczik and Senn; Sacramento et al.; Lansdell et al.; Capone et al.). Multi-compartment models have been implemented on neuromorphic hardware (Shrestha et al.; Kopsick et al.). In RL with SNNs, prior works include DQN-to-SNN conversion for Atari (Tan et al.), knowledge distillation (Zhang et al.), direct training with STBP (Liu et al.; Chen et al.), normalization to mitigate spike vanishing (Sun et al.), and population-coded SNNs for continuous control (Tang et al.). Distributional RL more closely reflects brain decision-making (Lowet et al.), and Fully Parameterized Quantile Function (FQF) is a strong ANN baseline (Yang et al.). There remains a gap in applying multi-compartment SNNs to complex DRL tasks and in effectively representing continuous variables (quantile fractions) within SNNs, motivating the proposed approach.

Methodology

The authors propose MCS-FQF, a multi-compartment spiking Fully Parameterized Quantile Function model for distributional RL.

Architecture:

(1) A spiking convolutional neural network (SCNN) encodes Atari image observations into spike trains (three conv layers with LIF neurons), producing state spike embeddings Os.

(2) Quantile fraction proposal: N fractions τ are defined with τ_0=0, τ_N=1 and τ_i computed from softmax logits φ derived from Os via a learnable projection Wf, where probabilities p_k=softmax(φ) and τ_i=∑_{k=0}^{i-1} p_k.

(3) Population encoding for quantile fractions: Each fraction τ_i is encoded into spikes using a population of M neurons with Gaussian receptive fields; neuron j's firing rate for τ_i is r_ij = φ_j exp(- (τ_i - μ_j)^2 / (2σ_j^2)), with μ_j=j/N and σ_j=C for all j. Spikes are sampled from a Poisson process over a time window T, yielding S_{τ_i}.

(4) Multi-compartment neuron (MCN): A three-compartment neuron integrates basal dendrite input (state spikes) and apical dendrite input (population-coded fraction spikes). Basal and apical compartment potentials Vb and Va follow first-order dynamics with learnable synapses wb and wa. The somatic potential u integrates Vb and Va with conductances gB, gA and leak gL: τL du/dt = -u + (gB/gL)(Vb - u) + (gA/gL)(Va - u). A derived theorem shows u(t) is a spatiotemporal integration of dendritic potentials. MCN fires spikes Sm(t) when u crosses threshold Vth.

(5) A two-layer fully connected SNN (LIF neurons) maps MCN spike outputs to quantile value estimates F^{-1}_w(τ_i). The action-value is Q(s,a)=∑_{i=0}^{N-1} (τ_{i+1}-τ_i) F^{-1}_w(τ̂_i) with τ̂_i=(τ_i+τ_{i+1})/2.

Training: Direct end-to-end training with Spatio-Temporal Backpropagation (STBP) using surrogate gradients for spike non-differentiability: ∂o_t/∂u_t = 2τL / (4 + (π τL u_t)^2). Losses include Huber quantile regression loss for quantile values and Wasserstein loss for optimizing fraction proposals Wf, with analytic gradients for τ and Wf.
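
The fraction proposal, the population encoding, and the action-value sum described above can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the authors' code. The function names, the peak rate of 1.0, the per-step Bernoulli approximation of Poisson sampling, and the random logits and constant quantile values used in the demo are all assumptions. The summary gives the centres as μ_j=j/N; here the centres are simply spread evenly over [0, 1], which is also an assumption, and whether τ_i or the midpoints τ̂_i are fed to the encoder is left open (the demo uses the midpoints).

```python
import numpy as np

def propose_fractions(logits):
    """tau_0 = 0 and tau_i = sum_{k<i} softmax(logits)_k, so tau_N = 1."""
    z = logits - np.max(logits)            # shift for numerical stability
    p = np.exp(z) / np.sum(np.exp(z))      # softmax probabilities p_k
    tau = np.concatenate(([0.0], np.cumsum(p)))
    tau[-1] = 1.0                          # guard against rounding error
    return tau

def population_rates(fractions, mu, sigma=0.05, peak=1.0):
    """Gaussian receptive fields: r_ij = peak * exp(-(f_i - mu_j)^2 / (2 sigma^2)).

    `peak` stands in for the amplitude term in the text (assumed constant here).
    """
    diff = fractions[:, None] - mu[None, :]          # shape (n_fractions, M)
    return peak * np.exp(-diff ** 2 / (2.0 * sigma ** 2))

def sample_spikes(rates, T=8, rng=None):
    """Bernoulli-per-step approximation of Poisson spiking over T steps (assumption)."""
    rng = np.random.default_rng(0) if rng is None else rng
    return (rng.random((T,) + rates.shape) < rates).astype(np.float32)

def q_value(tau, quantile_values):
    """Q = sum_i (tau_{i+1} - tau_i) * F^{-1}(tau_hat_i)."""
    return float(np.sum((tau[1:] - tau[:-1]) * quantile_values))

# Demo with the sizes quoted in the summary: N=32 fractions, M=64 neurons, T=8 steps.
N, M, T = 32, 64, 8
rng = np.random.default_rng(0)
tau = propose_fractions(rng.normal(size=N))   # random logits, illustration only
tau_hat = 0.5 * (tau[:-1] + tau[1:])          # midpoints tau_hat_i
mu = np.linspace(0.0, 1.0, M)                 # assumed receptive-field centres
rates = population_rates(tau_hat, mu, sigma=0.05)
spikes = sample_spikes(rates, T=T, rng=rng)   # shape (T, N, M)
q = q_value(tau, np.full(N, 3.0))             # constant quantile values, illustration only
```

Because the interval widths τ_{i+1}-τ_i sum to 1, a constant quantile function returns that constant as the action value, which is a quick sanity check on the weighted sum.
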
Synaptic gradients for MCN basal and apical weights are derived via backprop through time using the surrogate gradients and compartment dynamics.

Experimental setup: 512 MCNs for integration; final FC SNN with 512 hidden neurons outputs N=32 quantiles per action; simulation window T=8; trained for 20M frames; Adam for Huber loss (lr_a=1e-4), RMSprop for Wasserstein loss (lr_f=2.5e-9). Hyperparameters include τA=τB=2.0, τL=2.0, gA=gB=gL=1.0, Vth=1.0, Vreset=0.0, M=64, receptive constant C=0.05.
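
To make the somatic equation and the surrogate gradient concrete, the following NumPy sketch steps a single MCN forward with Euler integration using the hyperparameters listed above. It is a sketch under stated assumptions, not the authors' implementation. The summary only says the dendritic potentials follow "first-order dynamics", so the leaky form τ dV/dt = -V + input used here is an assumption, as are the hard reset to Vreset, the step size dt, the function names, and the constant input currents. The summary writes the surrogate gradient as a function of u_t and does not say whether the argument is offset by the threshold, so the function below takes a generic argument x.

```python
import numpy as np

def surrogate_grad(x, alpha=2.0):
    """Surrogate derivative from the text: 2*alpha / (4 + (pi*alpha*x)^2)."""
    return 2.0 * alpha / (4.0 + (np.pi * alpha * x) ** 2)

def steady_state_u(v_b, v_a, g_b=1.0, g_a=1.0, g_l=1.0):
    """Fixed point of the soma equation for constant dendritic potentials.

    Setting du/dt = 0 gives u* = (gB*Vb + gA*Va) / (gL + gB + gA).
    """
    return (g_b * v_b + g_a * v_a) / (g_l + g_b + g_a)

def simulate_mcn(i_basal, i_apical, tau_b=2.0, tau_a=2.0, tau_l=2.0,
                 g_b=1.0, g_a=1.0, g_l=1.0, v_th=1.0, v_reset=0.0, dt=1.0):
    """Euler simulation of one three-compartment neuron.

    i_basal / i_apical are already-weighted synaptic inputs per time step.
    Returns the somatic potential before reset and the spike train.
    """
    v_b = v_a = u = 0.0
    us, spikes = [], []
    for ib, ia in zip(i_basal, i_apical):
        v_b += dt / tau_b * (-v_b + ib)      # assumed leaky basal dynamics
        v_a += dt / tau_a * (-v_a + ia)      # assumed leaky apical dynamics
        u += dt / tau_l * (-u + g_b / g_l * (v_b - u) + g_a / g_l * (v_a - u))
        fired = u >= v_th
        us.append(u)
        spikes.append(1.0 if fired else 0.0)
        if fired:
            u = v_reset                      # assumed hard reset
    return np.array(us), np.array(spikes)

T = 8
basal = np.full(T, 4.0)                      # constant inputs, illustration only
# Positive apical input pushes the soma over threshold ...
_, spikes_facilitated = simulate_mcn(basal, np.full(T, 2.0))
# ... while negative apical input holds it below threshold.
_, spikes_inhibited = simulate_mcn(basal, np.full(T, -2.0))

# With firing disabled and a small step, u settles at the analytic fixed point.
us_free, _ = simulate_mcn(np.full(2000, 6.0), np.zeros(2000), v_th=np.inf, dt=0.1)
```

With gA=gB=gL=1.0 the fixed point reduces to u* = (Vb + Va)/3, so under these assumptions the soma only reaches Vth=1.0 when the two dendritic potentials sum to at least 3. This gives one simple way to see how the apical compartment can either facilitate or suppress somatic spiking for the same basal drive.
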

Key Findings

- Across 19 Atari games, MCS-FQF achieves comparable or better performance than ANN-based FQF and significantly outperforms an ANN-SNN conversion-based Spiking-FQF (converted with T_c=256) on most games, with faster and more stable learning.
- Representative scores (mean over 10 trials):
  • VideoPinball: MCS-FQF 606,765.7 ± 250,377.9 vs FQF 357,333.9 ± 282,594.5 and ANN-SNN 244,185.7 ± 203,054.5.
  • Hero: MCS-FQF 36,912.5 ± 1,091.4 vs FQF 25,709.5 ± 16,881.6 and ANN-SNN 16,851.7 ± 1,389.7.
  • Kangaroo: MCS-FQF 15,080.0 ± 325.0 vs FQF 11,520.0 ± 1,647.3 and ANN-SNN 7,488.6 ± 1,248.4.
  • Krull: MCS-FQF 11,276.0 ± 599.5 vs FQF 10,207.0 ± 773.0 and ANN-SNN 6,490.2 ± 820.2.
  • Enduro: MCS-FQF 4,156.4 ± 874.7 vs FQF 3,421.2 ± 1,274.0 and ANN-SNN 2,019.0 ± 591.3.
- On some simpler tasks with few actions, MCS-FQF is slightly below FQF but still above ANN-SNN conversion, e.g., Asteroids (MCS-FQF 1,666.0 vs FQF 2,292.0), BeamRider (15,302.2 vs 18,192.0), KungFuMaster (34,820.0 vs 44,460.0).
- Ablation studies:
  • Replacing MCN with two groups of LI neurons (S-FQF-POP) degrades performance versus MCS-FQF, indicating the importance of compartmental integration.
  • Using cosine embedding for fractions (S-FQF) further degrades performance and slows learning compared to population encoding, highlighting the effectiveness of the proposed population spike encoding for continuous fraction representation.
- Analysis of MCN activity in MsPacman shows apical dendrites can inhibit or facilitate somatic spiking depending on their potential relative to basal input, supporting improved, distinguishable spiking representations and better information integration.

Discussion

The proposed MCS-FQF leverages biologically inspired mechanisms to address limitations of point-neuron SNNs in complex DRL. The MCN integrates state and fraction information at spatiotemporal scales, enabling richer computations than pointwise operations in FQF or simple combinations in converted SNNs. Population encoding maps continuous quantile fractions into a high-dimensional spiking space that SNNs can effectively process, outperforming cosine embeddings. Spiking activity analyses reveal cooperative and inhibitory interactions between apical and basal dendrites that modulate somatic firing, enhancing representational richness. These factors explain MCS-FQF’s improved stability and performance over ANN-SNN conversion and its competitive or superior results compared to ANN FQF across diverse Atari tasks.

Conclusion

The paper introduces a biologically inspired multi-compartment neuron model and a Gaussian receptive-field population encoding method to realize an SNN-based fully parameterized quantile function (MCS-FQF) for deep distributional reinforcement learning. Applied to 19 Atari games, MCS-FQF achieves faster, more stable learning and outperforms or matches the ANN-based FQF baseline while significantly surpassing an ANN-SNN conversion approach. Ablation studies confirm that both the MCN and the population encoding are critical to performance gains. The work demonstrates that incorporating neural structure and encoding principles can enhance SNNs for complex decision-making tasks, and provides an end-to-end surrogate-gradient training framework for multi-compartment SNNs in distributional RL.

Limitations

The MCN model includes dendritic-to-somatic influence but omits bidirectional soma–dendrite interactions to avoid feedback loops that can cause oscillations and destabilize backpropagation. Future work aims to incorporate such interactions with suitable training methods for cyclic computation graphs. Additionally, all population neurons share the same receptive field width; optimizing heterogeneous receptive fields and activity patterns to better represent spiking information remains an open problem.
