Causal-aware reliability assessment of single-channel EEG for transformer-based sleep staging

Medicine and Health

Y. Hu, X. Yang, et al.

Single-channel EEG sleep staging promises practical home monitoring but faces reliability gaps versus clinical PSG. This study proposes a transformer-based sleep staging model and a causal-inspired analysis of EEG channel selection. Experiments by Yongkang Hu, Xiangbo Yang, Yunhan Xu, and Jingpeng Sun show that electrodes over the central brain region yield significantly higher accuracy, macro-F1, and consistency than frontal or occipital placements, informing the design of more robust wearable systems.
Introduction

The study addresses the need for reliable, automated sleep staging suitable for wearable, home-based monitoring using single-channel EEG. Manual PSG scoring is labor-intensive, time-consuming, and subject to inter-rater variability, motivating automated approaches. While deep learning methods have advanced sleep staging, many rely on multi-modal or multi-channel data less suitable for portable systems. Single-channel EEG methods exist but often lack systematic justification for channel selection and do not assess how performance varies across channels or sleep stages. This work proposes a transformer-based single-channel model and investigates, from a causal-inspired perspective, how EEG electrode location influences staging reliability. It further examines stage-wise variability and explores augmenting EEG with EOG to improve performance and interpretability for deployable wearable systems.

Literature Review

Two major lines of automatic sleep staging are reviewed: (1) traditional machine learning that depends on hand-crafted temporal/frequency features and classifiers (SVM, kNN, random forest, decision trees), which struggle with generalization; and (2) deep learning methods that learn representations end-to-end and have achieved strong performance. Examples include 3DSleepNet (3D CNNs capturing spatial-spectral-temporal dynamics), XSleepNet (multi-view sequence-to-sequence over raw and time-frequency modalities), and interpretable Bi-LSTM approaches with signal decomposition. For single-channel EEG, notable models include DeepSleepNet (CNN-based), U-Time (fully convolutional, U-Net inspired), TinySleepNet (CNN + RNN), and SleepTransformer (transformer-based single-channel). Despite promising results, prior single-channel studies select practical channels (e.g., Fpz-Cz, Pz-Cz, C3-A2, C4-A1) without systematic justification and seldom analyze channel-dependent performance variations or stage-specific limitations—gaps this paper targets.

Methodology

Problem formulation: Single-channel EEG sleep staging is treated as sequential multi-class classification over 30-second epochs labeled WAKE, N1, N2, N3, and REM per the AASM standard. Given an input X in R^{T×C} (where C covers EEG and optionally EOG), a transformer-based network F maps each epoch to a class label.

Models: Two architectures are proposed.

  • SingleSleep: operates on raw single-channel EEG.
  • SingleSleepPlus: integrates single-lead EEG and EOG for cross-modal fusion. Components:
  1. Multi-scale 1D-CNN feature extractor: Captures local patterns (sleep spindles, K-complexes, slow waves) and broader context via parallel convolutional branches with varied kernel sizes. Following Pradeepkumar et al. (2022), three branches are used: (i) one 1D-CNN with kernel 50; (ii) two 1D-CNNs with kernels 25 and 2; (iii) three 1D-CNNs with kernels 5, 5, and 2. Each convolution is followed by LeakyReLU and batch normalization. Branch outputs are concatenated along the embedding dimension, then passed through a 1×1 convolution, LeakyReLU, and batch normalization. Features are computed over non-overlapping 0.5 s windows at 200 Hz, mapping a single-channel sequence of length T to a feature sequence of length T/(0.5×fs) with embedding size E (e.g., a 30 s epoch of 6000 samples yields 60 feature vectors).
  2. Transformer encoder: For each modality, a trainable CLS token is prepended to the sequence; sinusoidal or learned positional encodings are added. Multi-head self-attention computes interactions via Q=XW_Q, K=XW_K, V=XW_V, with attention Softmax(QK^T/√d_q)V, followed by standard transformer blocks.
  3. Cross-modal fusion (SingleSleepPlus only): The CLS outputs from each modality's transformer are fused via a self-attention-based fusion block to exchange class-level information. Fused class tokens are combined with modality features and fed to a feed-forward classification head.

Dataset: Experiments use the healthy subset ISRUC-S3 of ISRUC-Sleep (Khalighi et al., 2016), comprising 10 healthy subjects aged 30–58, sampled at 200 Hz. Six EEG channels (F3-A2, C3-A2, O1-A2, F4-A1, C4-A1, O2-A1) and two EOG channels (LOC-A2, ROC-A1) were considered. Annotations follow AASM (WAKE, N1, N2, N3, REM) and were produced by two experts. Raw EEG is used without time-frequency transformations; no data augmentation is applied.

Evaluation metrics: Overall accuracy (ACC), macro-averaged F1 (MF1), sensitivity (per-class recall, averaged), specificity (per-class specificity, averaged), and per-stage F1.

Training setup: Adam optimizer (lr=0.001, β1=0.9, β2=0.999), batch size 32, and categorical cross-entropy with class weights [WAKE=1, N1=2, N2=1, N3=1, REM=2] to mitigate class imbalance. The transformer and fusion modules use 8 attention heads and 128 hidden units. Implemented in PyTorch; trained on an NVIDIA RTX 3090 (24 GB).
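The encoder's attention step can be sketched in NumPy as follows. This is a minimal single-head illustration, not the paper's trained model: the embedding size, random weights, and random CLS token are placeholders (the paper uses 8 heads and 128 hidden units), but the computation Softmax(QK^T/√d_q)V and the CLS-prepended sequence layout match the description above.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_Q, W_K, W_V):
    # Scaled dot-product self-attention: Softmax(Q K^T / sqrt(d_q)) V.
    Q, K, V = X @ W_Q, X @ W_K, X @ W_V
    d_q = Q.shape[-1]
    A = softmax(Q @ K.T / np.sqrt(d_q))
    return A @ V, A

rng = np.random.default_rng(0)
E = 16                           # illustrative embedding size (paper: 128)
T = 60                           # feature length of one 30 s epoch: 6000 samples / 100-sample windows
cls = rng.normal(size=(1, E))    # trainable CLS token (random stand-in here)
feats = rng.normal(size=(T, E))  # CNN feature sequence for one epoch
X = np.vstack([cls, feats])      # prepend CLS token -> shape (T+1, E)
W_Q, W_K, W_V = [rng.normal(size=(E, E)) for _ in range(3)]
out, attn = self_attention(X, W_Q, W_K, W_V)
```

Each row of `attn` is a probability distribution over the T+1 positions, so the CLS row (row 0) summarizes the whole epoch for classification.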
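The reported ACC and MF1 metrics follow standard definitions; a small self-contained sketch (with toy labels, not the paper's data) shows how they are computed over the five AASM stages:

```python
import numpy as np

STAGES = ["WAKE", "N1", "N2", "N3", "REM"]

def accuracy_macro_f1(y_true, y_pred, n_classes=5):
    # Overall accuracy and macro-averaged F1 over the five AASM stages.
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    acc = float((y_true == y_pred).mean())
    f1s = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    # Macro-averaging weights each stage equally, so rare stages (N1, REM) count fully.
    return acc, float(np.mean(f1s))

# Toy example: 8 epochs, stage ids indexing STAGES (not the paper's data).
y_true = [0, 1, 2, 2, 3, 4, 2, 1]
y_pred = [0, 2, 2, 2, 3, 4, 2, 1]
acc, mf1 = accuracy_macro_f1(y_true, y_pred)
```

Macro-averaging is what makes MF1 sensitive to the rare N1 stage, which plain accuracy can mask.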
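The class-weighted cross-entropy in the training setup can be sketched as below. This is a hand-rolled NumPy stand-in for illustration, not the paper's PyTorch implementation; the logits are random placeholders, while the [1, 2, 1, 1, 2] weighting mirrors the scheme above (up-weighting the rarer N1 and REM stages) and the weighted-mean reduction matches the common convention of dividing by the sum of sample weights.

```python
import numpy as np

# Stage order WAKE, N1, N2, N3, REM; N1 and REM are up-weighted 2x.
CLASS_WEIGHTS = np.array([1.0, 2.0, 1.0, 1.0, 2.0])

def weighted_cross_entropy(logits, labels, weights=CLASS_WEIGHTS):
    # logits: (batch, 5) raw scores; labels: (batch,) integer stage ids.
    z = logits - logits.max(axis=1, keepdims=True)    # stabilize softmax
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(labels)), labels]  # per-epoch negative log-likelihood
    w = weights[labels]                               # weight each epoch by its true stage
    return float((w * nll).sum() / w.sum())           # weighted mean over the batch

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 5))                      # placeholder network outputs
labels = np.array([0, 1, 4, 2])                       # WAKE, N1, REM, N2
loss = weighted_cross_entropy(logits, labels)
```

Misclassifying an N1 or REM epoch thus costs twice as much as misclassifying WAKE, N2, or N3, nudging the model toward the stages it otherwise under-predicts.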
Key Findings
  • Channel-dependent reliability: Central electrodes (C3-A2, C4-A1) provided the best single-channel performance on ISRUC-S3. C3-A2 achieved Acc 77.16%, MF1 71.61%, Sens 74.14%, Spec 94.43%; C4-A1 achieved Acc 76.79%, MF1 69.98%, Sens 72.76%, Spec 94.26%.
  • Frontal vs occipital: Frontal (F3-A2, F4-A1) outperformed occipital (O1-A2, O2-A1) overall. Occipital channels were the poorest overall but contributed stronger REM identification.
  • Stage-wise patterns by channel: Frontal channels best identified N1; central channels excelled at N2 and N3; occipital channels were relatively better for REM.
  • EOG augmentation benefits (SingleSleepPlus): Adding EOG to EEG improved performance, especially for REM and N1. Using C4-A1 with LOC-A2 yielded the best overall results: Acc 81.50%, MF1 77.15%, Spec 95.32%, with notable gains for N1 (class F1 81.55%) and strong REM performance (F1 70.20%). The highest sensitivity observed was 78.55%, with F3-A2 & LOC-A2. Some pairs achieved the best per-class F1: e.g., F3-A2 & ROC-A1 for N2 (F1 90.33%) and F3-A2 & LOC-A2 for REM (F1 73.64%).
  • Case example: For one subject (ISRUC-S3, subject 8), C3-A2 alone yielded Acc 76.29% and MF1 66.01%; adding LOC-A2 raised Acc to 84.43% and MF1 to 79.97%. Overall, electrode choice causally impacts reliability, with central leads most reliable for single-channel staging and EOG providing critical complementary cues for REM and N1.
Discussion

The work addresses whether single-channel EEG can reliably support automated sleep staging for wearables and how electrode selection affects reliability. Results demonstrate that central electrodes (C3/C4) offer the most reliable overall performance, aligning with EEG physiology for N2/N3 features (spindles, slow waves). Frontal channels favor N1 detection, while occipital channels better capture REM-related features, underscoring that optimal channel choice depends on the target stage. Augmenting EEG with EOG significantly improves recognition of REM and N1 stages—stages where ocular activity is diagnostically informative—highlighting the value of multimodal fusion in mitigating single-channel limitations. These findings provide causal-inspired insights into spatial determinants of classification reliability, guiding practical channel selection and sensor design for wearable sleep monitoring and enabling more interpretable, stage-aware deployment strategies.

Conclusion

This paper proposes SingleSleep (single-channel EEG) and SingleSleepPlus (EEG+EOG) transformer-based models and conducts a causal-aware analysis of how electrode selection influences sleep staging reliability. Empirical results on ISRUC-S3 show central EEG channels achieve the best single-channel accuracy and macro-F1, while stage-specific performance varies by region (frontal: N1; central: N2/N3; occipital: REM). Incorporating EOG markedly boosts performance, particularly for REM and N1, indicating that multimodal cues alleviate key single-channel weaknesses. Future work will expand analyses to more channels to build a more systematic and comprehensive reliability assessment for wearable applications.

Limitations
  • Single-channel EEG alone shows reduced reliability for REM and N1 compared to when EOG is included, indicating modality limitations for these stages.
  • The channel analysis is limited to six EEG channels and two EOG channels from the ISRUC-S3 healthy subset; broader channel coverage and diverse populations remain to be studied.
  • No data augmentation or time-frequency feature engineering was used; while ensuring reproducibility, this may limit performance under certain conditions.