Computer Science
High-performance deep spiking neural networks with 0.3 spikes per neuron
A. Stanojevic, S. Woźniak, et al.
This study by Ana Stanojevic and colleagues examines the training of time-to-first-spike (TTFS) networks, tackling the challenges inherent in biologically inspired spiking neural networks (SNNs). It identifies a parameterization that lets SNNs match the performance of conventional artificial neural networks while emitting only a fraction of a spike per neuron on average.
Introduction
The study addresses why deep time-to-first-spike (TTFS) spiking neural networks (SNNs) have been difficult to train to match artificial neural networks (ANNs), despite exact ANN→SNN mappings. The authors hypothesize that training instabilities stem from a vanishing-or-exploding gradient problem specific to TTFS learning dynamics and that an appropriate parameterization can ensure stable gradients and equivalence of SNN and ReLU ANN training trajectories. The work is motivated by the need for energy-efficient inference with sparse, binary spikes, enabling deployment on low-power neuromorphic hardware while closing the accuracy gap with ANNs on standard vision benchmarks.
Literature Review
Prior SNN training relied on biologically inspired plasticity rules or on surrogate gradients/pseudo-derivatives, which enabled deeper training but often at the cost of many spikes or approximate gradients. Conversion approaches map ANNs to SNNs via rate or temporal coding; many are approximate, though an exact mapping from ReLU networks to TTFS networks has been shown. Exact gradient-based TTFS training historically scaled poorly beyond shallow networks unless ad-hoc approximations were introduced. Hardware constraints (quantization, noise, sparsity) further complicate training; common pipelines convert pre-trained ANNs and then fine-tune, but robust, scalable TTFS gradient-descent methods capable of deep training were lacking. This work builds on and extends the exact mapping theory to a more general TTFS model and analyzes the Jacobians to pinpoint gradient pathologies and their remedy.
Methodology
- Model: A feed-forward TTFS SNN with N hidden layers (fully connected or convolutional). Inputs in [0,1] are encoded to spike times; each neuron emits at most one spike in [t_min^(n), t_max^(n)). Membrane potential has two regimes: initial slope A^(n) until t_min^(n), then fixed slope B^(n) thereafter; threshold θ^(n)=θ−D^(n). A long refractory period ensures single-spike behavior.
- Reverse mapping to ReLU: For arbitrary B^(n), an exact reverse mapping defines equivalent ReLU weights and biases so that firing times correspond to ReLU activations x_i^(n)=(t_i^(n)−t_min^(n))/t_c for spiking neurons and 0 otherwise (a minimal sketch of this mapping follows the list). The non-spiking output layer integrates up to t_read and uses the same softmax cross-entropy loss as the ReLU network.
- Learning dynamics analysis: Using exact backpropagation through spike times, the gradient factorizes into layer-wise Jacobians dt^(k)/dt^(k−1). Each Jacobian involves a binary mask M^(n−1) of active neurons and a scaling by 1/B^(n), so its eigenvalues are determined by B^(n) as well as W^(n). Standard ANN initializations can therefore yield vanishing or exploding gradients in TTFS networks (eigenvalues pushed outside the unit circle, or shrunk toward zero, depending on B^(n)).
- Two remedies at initialization:
1) α1-model with a smart initialization (initialize in ReLU space then forward-map), stabilizing eigenvalues.
2) B1-model (identity mapping) with A^(n)=0 and B^(n)=1, so the SNN-to-ReLU parameter mapping reduces to the identity; standard deep-learning initialization then yields Jacobian eigenvalues within the unit circle.
- Equivalence of training trajectories: For the B1-model, the reverse map is the identity, so TTFS gradient updates equal ReLU updates and gradient descent follows identical trajectories (a numerical illustration of the Jacobian scaling appears after this list). In the α1-model, the nonlinear mapping induces step-dependent, neuron-pair-specific distortions that cannot be corrected by a scalar learning rate.
- Practical training: Implement the B1-model (identity mapping) with exact gradients and adaptive t_max updates to keep spikes away from boundaries. Train from scratch on MNIST/fMNIST and fine-tune converted VGG16 SNNs on CIFAR10/100 and PLACES365. For sparsity, apply L1 regularization during pretraining/fine-tuning. For hardware constraints, perform fine-tuning under simulated spike-time jitter, time quantization (discrete steps per layer), weight quantization (down to 4 bits) and reduced latency (shrinking [t_min,t_max)).
- Architectures and details: FC2 (one hidden layer, 340 units), FC16, LeNet5 (a 5-layer ConvNet), and VGG16 variants. Training uses Adam with exponential learning-rate decay (typical initial LR 5e-4; fine-tuning LRs 1e-4 for CIFAR10/100 and 1e-7 for PLACES365) and batch size 8; results for deeper networks are averaged over 16 trials (an optimizer/regularization sketch follows this list). Where pre-trained models contain batch normalization, the exact mapping fuses it into neighboring layers.
- Hardware-energy rationale: SNN energy is dominated by spike transmission and capacitor charging, so fewer spikes per neuron directly reduce transmission costs. Experiments target low spike counts (SNN Sparsity < 0.3 spikes/neuron).
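To make the spike-time coding concrete, below is a minimal Python/NumPy sketch of the two-regime threshold crossing and the reverse mapping x = (t − t_min)/t_c described above. The window constants, threshold value, and function names are illustrative assumptions rather than values from the paper, and the dynamics are simplified to the linear threshold-crossing step only.

```python
import numpy as np

# Hypothetical layer constants for illustration only (not values from the paper).
T_MIN, T_MAX, T_C = 0.0, 1.0, 1.0   # layer window [t_min, t_max) and time constant t_c
THETA = 1.0                          # firing threshold theta^(n)

def spike_time_two_slope(u_at_tmin, B=1.0, t_min=T_MIN, t_max=T_MAX, theta=THETA):
    """Spike time when the potential equals u_at_tmin at t_min and then rises with
    constant slope B until it crosses the threshold (the second regime described above).
    Crossings outside [t_min, t_max), or a non-positive slope, count as 'no spike' here."""
    if B <= 0:
        return np.inf
    t = t_min + (theta - u_at_tmin) / B    # solve theta = u_at_tmin + B * (t - t_min)
    return t if t_min <= t < t_max else np.inf

def spike_times_to_relu(t, t_min=T_MIN, t_c=T_C):
    """Reverse mapping from the summary: x = (t - t_min) / t_c for spiking neurons,
    0 for neurons that never spike (encoded as np.inf)."""
    t = np.asarray(t, dtype=float)
    return np.where(np.isfinite(t), (t - t_min) / t_c, 0.0)

def relu_to_spike_times(x, t_min=T_MIN, t_c=T_C):
    """Inverse encoding: positive activations map to spike times inside the window;
    zero activations are represented as 'no spike' (this is why adaptive t_max updates
    are needed to keep large activations away from the window boundary)."""
    x = np.asarray(x, dtype=float)
    return np.where(x > 0, t_min + t_c * x, np.inf)

# Round trip: ReLU activations -> spike times -> ReLU activations.
x = np.array([0.0, 0.25, 0.8])
assert np.allclose(spike_times_to_relu(relu_to_spike_times(x)), x)
```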
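The Jacobian argument behind the B1-model can also be illustrated numerically. Taking the summary's description at face value, the backpropagation factor of a TTFS layer behaves like the masked weight matrix scaled by 1/B^(n); the toy experiment below (layer width, initialization scale, and the exact Jacobian form are assumptions for illustration) shows how B rescales the spectral radius, and why B = 1 reproduces the ReLU backpropagation factor.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 512  # assumed layer width for the toy experiment

# He-style weight matrix and a binary mask of "active" neurons
# (roughly half active, as for ReLU at a symmetric initialization).
W = rng.normal(0.0, np.sqrt(2.0 / n), size=(n, n))
M = np.diag((rng.random(n) < 0.5).astype(float))

relu_jacobian = M @ W  # backprop factor of a ReLU layer at this operating point

for B in (0.25, 1.0, 4.0):
    J = relu_jacobian / B  # assumed TTFS layer Jacobian: masked weights scaled by 1/B^(n)
    radius = np.max(np.abs(np.linalg.eigvals(J)))
    print(f"B = {B:>4}: spectral radius ~ {radius:.2f}")

# With B = 1 the TTFS Jacobian coincides with the ReLU one, so exact spike-time
# backpropagation yields the same updates as ReLU backprop; B far from 1 rescales
# every layer and compounds across depth (vanishing or exploding gradients).
```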
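Since B1-model training is update-for-update equivalent to training the corresponding ReLU network, the optimization recipe can be sketched in that equivalent ReLU space. The PyTorch sketch below is a hedged illustration: the two-layer classifier, the decay factor gamma, the L1 strength, and the choice to penalize hidden activations are assumptions; only the optimizer, initial learning rate, loss, and batch-size figures come from the summary.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the paper's architectures (FC2/FC16/LeNet5/VGG16):
# a single 340-unit hidden layer, mirroring the FC2 description above.
hidden = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 340), nn.ReLU())
head = nn.Linear(340, 10)

criterion = nn.CrossEntropyLoss()  # softmax cross-entropy, as in the ReLU-equivalent output layer
optimizer = torch.optim.Adam(list(hidden.parameters()) + list(head.parameters()), lr=5e-4)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.97)  # decay factor assumed
l1_strength = 1e-5  # assumed; the summary only states that L1 regularization is applied

def train_epoch(loader):
    """One epoch over a DataLoader yielding (images, labels); batch size 8 in the reported runs."""
    hidden.train()
    head.train()
    for images, labels in loader:
        optimizer.zero_grad()
        h = hidden(images)
        # Penalizing hidden activations pushes them toward zero, i.e. toward neurons that
        # never spike after mapping back to the TTFS network (one plausible reading of the L1 term).
        loss = criterion(head(h), labels) + l1_strength * h.abs().mean()
        loss.backward()
        optimizer.step()
    scheduler.step()  # exponential learning-rate decay once per epoch
```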
Key Findings
- Theoretical: Identified a severe vanishing/exploding gradient issue in TTFS SNNs due to Jacobian scaling by 1/B^(n). Setting B^(n)=1 (B1-model) yields stable eigenvalues at initialization and ensures exact equivalence of SNN and ReLU training trajectories.
- Training equivalence: In 8-layer MNIST experiments, the B1-model training curve matches the ReLU curve, reaching 100% training accuracy in under 100 epochs; the α1-model deviates and requires a much smaller learning rate and far longer training to converge.
- From-scratch results (MNIST/fMNIST):
• MNIST FC2: ReLU 98.30%, SNN (ours) 98.30%.
• MNIST FC16: ReLU 98.43±0.07%, SNN (ours) 98.43±0.07%.
• MNIST VGG16: ReLU 99.57±0.01%, SNN (ours) 99.58±0.01%.
• fMNIST FC2: ReLU 90.14%, SNN (ours) 90.14%.
• fMNIST LeNet5: ReLU 90.91±0.17%, SNN (ours) 90.94±0.25%.
These match or surpass prior SNN baselines of similar size.
- Large-scale fine-tuning (VGG16):
• CIFAR10: pretrain/convert 93.59%; after fine-tuning 93.69±0.02% with SNN Sparsity 0.38. With L1: 93.28±0.02% and Sparsity 0.20 (≈0.2 spikes/neuron).
• CIFAR100: pretrain/convert 70.48%; after fine-tuning 72.23±0.06% (Sparsity 0.38). With L1: 72.20±0.04% (Sparsity 0.24). Prior SNN work reported 65.98% at Sparsity 0.28.
• PLACES365: pretrain/convert 52.69%; after fine-tuning 53.86±0.02% (Sparsity 0.54). With L1: 48.88±0.06% (Sparsity 0.27).
- Hardware-constraint fine-tuning (CIFAR10, VGG16; a toy quantizer sketch follows these results):
• Spike-time jitter: fine-tuning under simulated jitter recovers accuracy, and moderate jitter can even slightly improve performance.
• Time quantization: with as few as 16 time steps per layer, fine-tuned SNN achieves >90% test accuracy.
• Weight quantization: quantization-aware fine-tuning yields >90% accuracy with 4-bit weights.
• Reduced latency: shrinking [t_min,t_max) without fine-tuning degrades accuracy; with fine-tuning, >90% accuracy is recovered with up to 4× latency reduction.
- Spiking sparsity: Achieved <0.3 spikes/neuron on large datasets; as low as 0.2 spikes/neuron on CIFAR10 with L1, implying significant energy savings.
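To make the hardware constraints above concrete, here is a toy sketch of the kind of uniform quantizers they imply; simple rounding is an assumption, since the paper's exact quantization scheme is not given in this summary.

```python
import numpy as np

def quantize_spike_times(t, t_min, t_max, steps=16):
    """Snap spike times in [t_min, t_max) onto a grid of `steps` discrete values per layer."""
    idx = np.clip(np.round((t - t_min) / (t_max - t_min) * steps), 0, steps - 1)
    return t_min + idx * (t_max - t_min) / steps

def quantize_weights(w, bits=4):
    """Symmetric uniform quantization of weights to `bits` bits (15 levels for 4 bits)."""
    levels = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / levels
    return np.round(w / scale) * scale

# Latency reduction corresponds to shrinking the window [t_min, t_max) (up to 4x in the
# experiments) before fine-tuning, while jitter is simulated by adding noise to spike times.
```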
Discussion
The analysis proves that TTFS SNN training difficulties arise from layer-wise Jacobian scaling, causing gradient explosion/vanishing. Enforcing a constant slope at threshold (B^(n)=1) eliminates this instability and, crucially, makes TTFS gradient updates identical to ReLU updates, ensuring identical learning trajectories and final performance. Empirically, this closes the ANN–SNN performance gap across datasets and architectures while preserving and further enhancing spiking sparsity for energy efficiency. The framework also supports hardware-in-the-loop fine-tuning, enabling robustness to practical constraints such as timing noise, coarse temporal resolution, low weight precision, and low-latency operation, with minimal accuracy loss. The results suggest TTFS SNNs can serve as drop-in, energy-efficient counterparts to ReLU networks for deployment, aligning with neuroscientific observations that early spike timing carries substantial information.
Conclusion
The work provides a principled method to train deep TTFS SNNs that match ReLU networks exactly in both training dynamics and accuracy, using an identity mapping (B1-model) with constant-slope threshold crossings and exact spike-time gradients. It demonstrates state-of-the-art SNN performance on MNIST and fMNIST, closes the gap on CIFAR10/100 and PLACES365, and achieves high sparsity (<0.3 spikes/neuron) with robustness under hardware constraints. Future directions include extending the approach to architectures with skip connections (ResNets), incorporating batch normalization directly during SNN training (rather than fusing it), handling temporal or sequential data (e.g., video) possibly with recurrent networks, addressing fully asynchronous SNN training instabilities, and enabling on-chip or hardware-in-the-loop continual learning on neuromorphic platforms.
Limitations
- Current theory and experiments focus on feed-forward architectures without skip connections; extensions to ResNets are left open.
- Batch normalization is not trained natively in SNNs; it is fused from pre-trained ReLU models before fine-tuning.
- Experiments emphasize static image tasks; handling temporal streams and recurrent processing remains future work.
- Training very deep SNNs can exhibit numerical instabilities; results for deeper nets are averaged over multiple trials.
- Hardware jitter was explored mainly as local random jitter; random layer-wise phase shifts were not tested and may require a modified coding scheme (e.g., relative timing).
- The approach assumes per-layer timing windows with some sequencing between layers; fully asynchronous operation and its training stability are not fully addressed.