Medicine and Health
Age estimation from sleep studies using deep learning predicts life expectancy
A. Brink-Kjær, E. B. Leary, et al.
This study, by a team including Andreas Brink-Kjær and Eileen B. Leary, uses deep neural networks to estimate age from polysomnograms and relates the estimation error to mortality risk. The ensemble model reached a mean absolute error of 5.8 years on the validation set (8.16 years on the test set), and a larger age estimate error was associated with higher mortality and shorter estimated life expectancy.
Introduction
The study addresses two questions: whether deep learning applied to full-night polysomnography (PSG) can estimate chronological age, and whether the age estimate error (AEE; predicted age minus chronological age) serves as a biomarker of mortality risk beyond traditional sleep metrics. PSGs are the gold standard for sleep assessment but are typically reduced to manually scored summary metrics such as sleep stages and apnea indices, which are time-consuming to produce and may miss richer physiological information. Aging substantially alters sleep architecture (shorter, more fragmented sleep; reduced slow-wave sleep; fewer spindles; relatively preserved REM), and several sleep features have been linked to increased morbidity and mortality even after adjusting for age. The authors hypothesize that modeling age directly from raw PSG signals with deep learning captures latent physiological signatures of aging and health that predict all-cause and cardiovascular mortality, beyond conventional PSG-derived measures.
Literature Review
Prior work demonstrates associations of sleep-disordered breathing (e.g., AHI, hypoxic burden), arousal burden, reduced sleep efficiency, decreased slow-wave sleep, and reduced REM sleep with adverse outcomes including cardiovascular disease, cognitive impairment, and mortality. Traditional manual scoring is limited and variable. Recent deep learning approaches for PSG have primarily emulated human sleep staging rules, potentially missing additional information. A previous study modeled EEG-based sleep-stage features to estimate a brain aging index in which greater age error was associated with higher mortality, but it relied on hand-crafted features and EEG only. This study extends the literature by using multi-signal PSG deep learning to model age as a proxy for health and mortality risk, interpreting learned features, and testing associations of AEE with lifestyle factors and mortality outcomes.
Methodology
Data: Combined 13,332 PSGs spanning seven cohorts: STAGES, Stanford Sleep Cohort (SSC), Wisconsin Sleep Cohort (WSC), Sleep Heart Health Study (SHHS), MrOS Sleep Study (MrOS), Cleveland Family Study (CFS), and HomePAP. Mortality analyses used SHHS, MrOS, and WSC (n = 9386 with deaths = 3045 for all-cause; n = 9188 with cardiovascular deaths = 976). Participants aged ~20–90; those >89 were coded as 89 and excluded from test evaluation.
Design and splits: PSGs excluded if age unknown, split-night CPAP, <3 h sleep, or >1 missing signal. Data split into training (n = 2500; stratified to uniform age distribution), validation (n = 200), primary test (n = 10,699), and a repeated-visit test subset (n = 547). HomePAP (n = 190) served as an external-like test set (unseen cohort/technical setup). CPAP users and those with certain neurological disorders were excluded from test sets.
Signals and preprocessing: Included central EEG (C3-M2, C4-M1), EOG, chin EMG, ECG, respiratory airflow/nasal pressure, thoracic/abdominal belts, SpO2. Signals resampled to 128 Hz (SpO2 linearly), filtered with IIR filters to standardize spectral content (e.g., EEG band-pass 0.3–45 Hz; airflow/respiratory band-pass 0.1–15 Hz), and amplitude-normalized (2nd–95th percentiles scaled to the range −1 to 1; SpO2 to 0–100%). Artifact handling and derivation selection standardized across sites.
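The percentile-based amplitude normalization can be read as a linear rescaling that maps a lower and an upper percentile of each signal onto −1 and 1. The sketch below illustrates that reading only; the function names, the interpolation rule, and the absence of clipping are assumptions, not details confirmed by the study.

```python
def percentile(sorted_values, q):
    """Linearly interpolated percentile (q in 0..100) of a pre-sorted list."""
    position = (len(sorted_values) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def normalize_amplitude(signal, q_low=2.0, q_high=95.0):
    """Rescale so the q_low percentile maps to -1 and the q_high percentile to 1.

    Assumption: values outside the percentile range are left unclipped.
    """
    ordered = sorted(signal)
    p_low = percentile(ordered, q_low)
    p_high = percentile(ordered, q_high)
    if p_high == p_low:
        # Flat signal: no amplitude information to rescale.
        return [0.0] * len(signal)
    scale = 2.0 / (p_high - p_low)
    return [(x - p_low) * scale - 1.0 for x in signal]


# Toy signal 0..100, so the 2nd and 95th percentiles are exactly 2 and 95.
normalized = normalize_amplitude([float(v) for v in range(101)])
```

Using percentiles rather than the minimum and maximum makes the scaling less sensitive to isolated artifacts, which is one plausible motivation for this kind of normalization.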
Model architecture: Deep neural networks process 5-min epochs (Phase 1) via channel mixing, CNN with inverted residual bottlenecks, bi-directional LSTM, additive attention, and layer normalization to produce age estimates; Phase 2 aggregates the last-layer activations across the night via bi-LSTM and attention to output a final age estimate. Multiple models trained on different input combinations: (a) Central EEG; (b) EEG+EOG+EMG; (c) ECG; (d) respiratory signals; and an ensemble model (e) averaging predictions from (a)–(d).
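Two of the building blocks named above, additive attention pooling and prediction averaging, can be sketched in a few lines. This is a minimal illustration using the common additive-attention form (score = v · tanh(W h + b), softmax over time); the parameter shapes, function names, and example numbers are hypothetical and do not reproduce the authors' network.

```python
import math


def additive_attention_pool(hidden, W, b, v):
    """Pool a sequence of hidden vectors into one vector with additive attention.

    hidden: list of T vectors (each of length D)
    W: A x D weight matrix, b: length-A bias, v: length-A scoring vector
    Returns (pooled vector of length D, attention weights of length T).
    """
    scores = []
    for h in hidden:
        projected = [
            math.tanh(sum(w_ij * h_j for w_ij, h_j in zip(row, h)) + b_i)
            for row, b_i in zip(W, b)
        ]
        scores.append(sum(v_i * p_i for v_i, p_i in zip(v, projected)))
    # Numerically stable softmax over the time dimension.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden)) for d in range(dim)]
    return pooled, weights


def ensemble_age(predictions):
    """Average the age estimates of several single-input models."""
    return sum(predictions) / len(predictions)


# Hypothetical per-model estimates for one recording, models (a) to (d).
ensemble_estimate = ensemble_age([60.0, 62.0, 70.0, 64.0])
```

With all-zero attention parameters every time step receives the same weight, so the pooling reduces to a plain average; trained parameters let the model weight some segments of the night more than others.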
Optimization: Two-phase training to reduce compute and increase data exposure; Huber-like loss blending L2 (<5 years error) and L1 (≥5 years), scaled so 25-year error equals loss 1; Adam optimizer (β1=0.9, β2=0.999), weight decay excluding biases; hyperparameters tuned on validation set.
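The loss described above can be illustrated with the standard Huber form, using a 5-year threshold and dividing by the value at 25 years so that a 25-year error gives a loss of 1. The exact functional form used in the study is not given here, so the standard Huber definition and the names `delta` and `unit_error` are assumptions.

```python
def scaled_huber_loss(error_years, delta=5.0, unit_error=25.0):
    """Huber loss, quadratic below `delta` and linear above, scaled to 1 at `unit_error`."""

    def huber(e):
        a = abs(e)
        if a < delta:
            return 0.5 * a * a            # L2 regime for small errors
        return delta * (a - 0.5 * delta)  # L1 regime for large errors

    return huber(error_years) / huber(unit_error)


# A 25-year error maps to a loss of exactly 1 by construction.
loss_at_25 = scaled_huber_loss(25.0)
```

The linear regime limits the influence of recordings with very large age errors on the gradient, while the quadratic regime keeps the loss smooth near zero.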
Baselines and evaluation: Linear regression baseline using basic sleep measures (AHI, arousal index, TST, WASO, N1%, N2%, N3%, REM%). Performance measured by mean absolute error (MAE) and Pearson correlation, stratified by 5-year age bins to mitigate non-uniform age distribution.
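The age-stratified evaluation can be sketched as follows: compute the MAE within each 5-year bin of chronological age, then report the mean and standard deviation across bins so that heavily populated age ranges do not dominate. The function name, the use of the sample standard deviation, and the toy ages are assumptions made for illustration.

```python
import statistics
from collections import defaultdict


def binned_mae(true_ages, predicted_ages, bin_width=5):
    """Return per-bin MAE plus the mean and SD of MAE across age bins."""
    errors_by_bin = defaultdict(list)
    for true_age, predicted_age in zip(true_ages, predicted_ages):
        bin_start = int(true_age // bin_width) * bin_width
        errors_by_bin[bin_start].append(abs(predicted_age - true_age))
    per_bin = {
        bin_start: statistics.fmean(errors)
        for bin_start, errors in sorted(errors_by_bin.items())
    }
    values = list(per_bin.values())
    mean_mae = statistics.fmean(values)
    # Assumption: sample SD across bins; the summary does not say which SD was used.
    sd_mae = statistics.stdev(values) if len(values) > 1 else 0.0
    return per_bin, mean_mae, sd_mae


# Toy data: five recordings falling into the 20-24, 40-44, and 60-64 bins.
per_bin, mean_mae, sd_mae = binned_mae([21, 23, 41, 44, 62], [25, 21, 47, 40, 62])
```

Because each bin contributes equally, a bin with few recordings has the same influence on the reported figure as a bin with thousands.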
Interpretability: Gradient SHAP computed relevance attribution scores over input samples. Relevance aligned to manual events (sleep-stage transitions, arousals, apneas/hypopneas) in CFS, MrOS, SHHS to validate learned patterns.
Mortality analysis: Cox proportional hazards models relating AEE (per 10-year increase) to all-cause and cardiovascular mortality. Models adjusted in stages: Model 1 (demographics, lifestyle, medications, cohort), Model 2 (Model 1 + sleep metrics and comorbidities including WASO, N2%, REM%, arousal index, AHI, SpO2<80% time, ESS, hypertension, CHF, MI, stroke, T2D), Model 3 referenced in text for survival curve generation. Proportional hazards assumption checked via scaled Schoenfeld residuals. Life expectancy change estimated by extending survival curves (Weibull fit) and comparing the areas under the curves for AEE at −10 vs +10 years, at ages 40, 60, and 80, with covariates held at their means/medians.
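Two pieces of arithmetic sit behind this analysis: rescaling a hazard ratio reported per 10 years of AEE to another AEE difference, and reading life expectancy as the area under a survival curve. The sketch below assumes a log-linear Cox term and a Weibull baseline scaled by a constant hazard ratio; the Weibull scale and shape values are hypothetical placeholders, not fitted values from the study.

```python
import math


def rescale_hazard_ratio(hr_per_10_years, delta_years):
    """Hazard ratio for an AEE difference of `delta_years`, assuming log-linearity."""
    return hr_per_10_years ** (delta_years / 10.0)


def weibull_survival(t, scale, shape, hazard_ratio=1.0):
    """Weibull survival S(t) = exp(-HR * (t / scale) ** shape) under proportional hazards."""
    return math.exp(-hazard_ratio * (t / scale) ** shape)


def expected_remaining_years(scale, shape, hazard_ratio=1.0, horizon=200.0, step=0.01):
    """Area under the survival curve from 0 to `horizon` (trapezoidal rule)."""
    n_steps = max(1, round(horizon / step))
    area = 0.0
    previous = weibull_survival(0.0, scale, shape, hazard_ratio)
    for i in range(1, n_steps + 1):
        current = weibull_survival(i * step, scale, shape, hazard_ratio)
        area += 0.5 * (previous + current) * step
        previous = current
    return area


# HR between AEE = +10 and AEE = -10 (a 20-year difference), from a per-10-year HR of 1.29.
hr_20 = rescale_hazard_ratio(1.29, 20.0)

# Hypothetical baseline curve (scale=30, shape=2), taken as the AEE = -10 group.
years_low_aee = expected_remaining_years(scale=30.0, shape=2.0)
years_high_aee = expected_remaining_years(scale=30.0, shape=2.0, hazard_ratio=hr_20)
difference = years_high_aee - years_low_aee  # negative: shorter expected survival
```

The difference between the two areas is the kind of quantity reported as a change in life expectancy; the actual figures depend on the fitted curves and covariate values, which this sketch does not reproduce.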
Night-to-night variability: STAGES subset (n = 42) with two nights evaluated; differences assessed for significance.
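One common way to assess such a two-night difference is a paired t-test on the per-subject differences. The summary does not name the test that was used, so the paired t-test here is an assumption; the sketch computes the t statistic from the summary values reported under Key Findings (mean difference −1.17 years, SD 5.71, n = 42).

```python
import math


def paired_t_statistic(mean_difference, sd_difference, n_pairs):
    """t statistic and degrees of freedom for a paired t-test from summary statistics."""
    standard_error = sd_difference / math.sqrt(n_pairs)
    return mean_difference / standard_error, n_pairs - 1


# Summary values reported for the STAGES two-night subset.
t_value, degrees_of_freedom = paired_t_statistic(-1.17, 5.71, 42)
# t_value is about -1.33 with 41 degrees of freedom.
```

A t statistic of this size is well short of conventional significance thresholds, in line with the reported conclusion that there was no significant systematic shift between nights.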
Key Findings
- Age estimation performance (MAE in years, mean ± SD across 5-year age bins):
  - Validation set (n = 200): Ensemble (e) 5.8 ± 1.16; Central EEG (a) 6.52 ± 2.48; EEG+EOG+EMG (b) 6.81 ± 1.84; ECG (c) 10.4 ± 2.23; Respiratory (d) 8.09 ± 1.89. Basic sleep measures: 14.9 ± 6.53.
  - Test set (n = 10,509): Ensemble (e) 8.16 ± 3.75; Central EEG (a) 7.65 ± 2.70; EEG+EOG+EMG (b) 8.62 ± 2.92; ECG (c) 13.9 ± 6.74; Respiratory (d) 13.7 ± 6.05. Basic sleep measures: 12.5 ± 4.06.
  - Training set (n = 2500): Ensemble (e) 6.11 ± 1.84; Central EEG (a) 5.43 ± 2.26; EEG+EOG+EMG (b) 5.35 ± 0.96; ECG (c) 9.11 ± 1.89; Respiratory (d) 8.87 ± 2.10. Basic sleep measures: 14.9 ± 6.08.
- The ensemble generalized well to HomePAP (external-like test), while Central EEG generalized best within that cohort.
- Night-to-night variability (STAGES, n = 42): AEE 5.93 years (night 1) vs 7.31 (night 2); difference −1.17 ± 5.71 years (p = 0.19), indicating no significant systematic shift across nights.
- Interpretability: Gradient SHAP indicated higher AEE relevance around arousals and transitions to lighter sleep/wake; slow-wave oscillations decreased AEE. Respiratory model relevance increased near apnea events; ECG model relevance suggested arrhythmia contributions.
- Mortality associations (per 10-year increase in AEE, combined SHHS+MrOS+WSC):
  - Ensemble (e): All-cause HR 1.29 (95% CI: 1.20–1.39); Cardiovascular HR 1.40 (1.21–1.62) after multivariable adjustment (Cox Model 3).
  - Other inputs (Model 3, all-cause / cardiovascular): Central EEG (a) 1.11 (1.06–1.16) / 1.17 (1.07–1.28); EEG+EOG+EMG (b) 1.14 (1.08–1.20) / 1.15 (1.04–1.28); ECG (c) 1.07 (1.03–1.11) / 1.11 (1.04–1.19); Respiratory (d) 1.09 (1.03–1.15) / 1.07 (0.96–1.19).
  - Sensitivity: Excluding participants with hypertension attenuated the associations, which remained significant: all-cause HR 1.25 (1.11–1.40), cardiovascular HR 1.21 (1.03–1.60). Restricting to participants without sleep apnea (AHI ≤ 5): all-cause HR 1.22 (1.10–1.37), cardiovascular HR 1.24 (1.10–1.54).
  - Cohort-specific: Associations significant in SHHS and MrOS; WSC yielded non-significant effects, likely due to fewer deaths (n = 98).
- Life expectancy impact (AEE +10 vs −10): estimated life expectancy was shorter by 12.6 years at age 40 (CI: 8.9–16.2), by 8.7 years at age 60 (6.1–11.4), and by 6.0 years at age 80 (4.2–7.8).
- Clinical correlates: Higher AEE associated with hypertension prevalence; T2D associated with higher AEE in EEG-based models; cardiovascular comorbidities associated with higher AEE in ECG model; respiratory model AEE associated with sex and BMI. No significant AEE associations with stroke, COPD, or benzodiazepine use.
Discussion
Deep learning models leveraging full PSG signals can estimate chronological age with substantially lower error than models based on basic sleep metrics. The derived age estimate error (AEE) captures latent physiological deviations in sleep microarchitecture and cardiopulmonary signals, reflected in increased relevance around arousals, apneas, and arrhythmias. Importantly, higher AEE predicts elevated all-cause and cardiovascular mortality, independent of demographics, sleep architecture, sleep-disordered breathing severity, hypoxemia burden, daytime sleepiness, and cardiometabolic comorbidities. Sensitivity analyses excluding individuals with hypertension or sleep apnea showed attenuated but persistent associations, suggesting AEE contains prognostic information beyond these conditions. Survival modeling indicates that a 20-year swing in AEE (−10 to +10) is associated with clinically meaningful reductions in life expectancy, particularly at younger baseline ages. These findings support AEE as a candidate biomarker of sleep-related physiological aging and overall health risk that is not captured by conventional PSG summaries. Model interpretability aligns with known age-related sleep changes (e.g., fragmentation), lending face validity to the learned features.
Conclusion
This work demonstrates that deep neural networks applied to multi-signal PSG can estimate age with a mean absolute error of several years and that the resulting age estimate error (AEE) predicts all-cause and cardiovascular mortality beyond standard PSG metrics and clinical covariates. The ensemble model achieved MAEs of 5.8 years on the validation set and 8.16 years on the test set and generalized to an external-like cohort. Gradient SHAP analyses suggest the models leverage physiologically meaningful patterns (arousals, apneas, arrhythmias). An AEE of +10 versus −10 years translates into sizable differences in estimated life expectancy. AEE may serve as an intuitive sleep-based health marker in clinical and research settings to communicate risk and potentially monitor interventions. Future work should validate performance and prognostic value in completely unseen cohorts and technical environments, reduce model bias at age extremes, assess longitudinal trajectories with multi-night recordings, extend to pediatric populations, and develop models directly predicting specific morbidities and mortality while accounting for AEE.
Limitations
- Multi-cohort heterogeneity (equipment, electrode placement, recording environments, scoring practices) may introduce technical noise and limit comparability.
- Single-night PSG per subject limits assessment of night-to-night variability; first-night effects may influence AEE.
- Bias at age extremes (regression to the mean), with underestimation in older subjects and overestimation in younger subjects; potential survival bias in older cohorts.
- Mortality analyses included participants used in model development (though test-restricted analyses were performed); WSC had limited death events reducing power.
- Gradient SHAP interpretability assumes independent, linear attributions and choice of baseline affects interpretation; may not fully capture complex network behavior.
- Model not validated for children due to limited pediatric data.
- Basic covariate and comorbidity data required for fully adjusted models may be inconsistently available across cohorts.