
Machine learning explains response variability of deep brain stimulation on Parkinson's disease quality of life
E. Ferrea, F. Negahbani, et al.
This study by Enrico Ferrea, Farzin Negahbani, Idil Cebi, Daniel Weiss, and Alireza Gharabaghi uses explainable machine learning to identify key predictors of quality-of-life change in Parkinson's disease patients after deep brain stimulation. Preoperative quality of life and intraoperative subthalamic beta activity emerge as the strongest predictors of outcome.
Introduction
The study addresses why quality-of-life (QoL) outcomes after subthalamic nucleus (STN) deep brain stimulation (DBS) in Parkinson’s disease (PD) vary widely, despite consistent motor benefits. QoL is multidimensional (physical, mental, social), and the determinants of postoperative change are complex. Prior work often identified preoperative QoL as a predictor of postoperative improvement, but findings are inconsistent and have rarely integrated neuroimaging and neurophysiology. The authors hypothesized that combining demographics, patient-reported outcomes, imaging-based electrode localization, and intraoperative electrophysiology with explainable machine learning would clarify which baseline and treatment-related factors most strongly predict PDQ-39 change after STN DBS, thereby informing patient selection, counseling, surgical targeting, and personalized therapy.
Literature Review
Previous studies have shown substantial heterogeneity in QoL change after STN DBS, with up to half of patients not achieving clinically meaningful improvement. Baseline QoL has been the most consistent predictor of postoperative QoL gains (greater preoperative burden predicts greater improvement), though some studies reported the opposite trend, implying additional determinants. Surgical factors such as the exact position of electrode contacts within the STN and stimulation-induced neural responses also influence outcomes. Research integrating patient-reported outcomes with neuroimaging and neurophysiology to explain QoL variability is sparse, and most AI/digital approaches emphasize motor symptoms rather than QoL. Recent imaging work suggests differing ‘sweet spots’ for QoL versus motor improvement within or near the STN (more ventral/anterior/medial positions in some reports, posterior/superior regions near the upper STN border in others), so the optimal QoL target remains unresolved. Beta-band activity (13–35 Hz) in the basal ganglia is a robust PD biomarker for motor symptoms, and intraoperative beta oscillations correlate with motor outcomes; their relationship with non-motor outcomes and QoL has been less explored. This study builds on these findings by integrating multimodal predictors and employing explainable ML to clarify their contributions to QoL change.
Methodology
Design and cohort: Retrospective analysis of 63 consecutive PD patients undergoing bilateral STN DBS with segmented leads (Abbott 6170). The primary outcome was PDQ-39, obtained preoperatively (medication ON) and postoperatively (medication/stimulation ON) at a mean follow-up of 20.6 ± 15.23 months. Motor outcomes (MDS-UPDRS-III) and levodopa equivalent daily dose (LEDD) were also recorded. Ethics approval (781/2015B02) was granted by the University Hospital Tübingen Ethics Committee; data were collected as part of standard clinical care.
Electrophysiology: Along the implantation trajectories, local field potentials (LFPs) were recorded continuously from macrocontacts of the DBS leads using an online mapping approach. For each patient, 9.94 ± 3.87 recording depths were sampled (total 1190 sites). Signals were re-referenced to the uppermost contact. The power spectral density (PSD) was parameterized by removing the 1/f aperiodic component to isolate oscillatory peaks. Band powers were computed in theta (3–7 Hz), alpha (8–12 Hz), lower beta (13–20 Hz), and upper beta (21–35 Hz). Recordings were anatomically annotated (inside/outside STN) using atlas-based labeling.
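The band-power step can be illustrated with a minimal sketch. It assumes the aperiodic component is approximated by a straight line in log-log space and that band power is the mean log-power remaining above that line; the summary does not say which parameterization tool the authors used, so `fit_aperiodic`, `band_powers`, the 3–100 Hz fit range, and the synthetic spectrum are all illustrative assumptions, not the study's pipeline.

```python
import math

# Band edges (Hz) as listed in the text.
BANDS = {
    "theta": (3, 7),
    "alpha": (8, 12),
    "lower_beta": (13, 20),
    "upper_beta": (21, 35),
}

def fit_aperiodic(freqs, psd):
    """Least-squares line through (log10 f, log10 PSD); returns (slope, intercept)."""
    xs = [math.log10(f) for f in freqs]
    ys = [math.log10(p) for p in psd]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def band_powers(freqs, psd, bands=BANDS):
    """Mean log10 power relative to the aperiodic fit within each band."""
    slope, intercept = fit_aperiodic(freqs, psd)
    residual = [
        math.log10(p) - (intercept + slope * math.log10(f))
        for f, p in zip(freqs, psd)
    ]
    out = {}
    for name, (lo, hi) in bands.items():
        values = [r for f, r in zip(freqs, residual) if lo <= f <= hi]
        out[name] = sum(values) / len(values)
    return out

# Synthetic spectrum (not study data): 1/f background plus a bump centred at 28 Hz.
freqs = list(range(3, 101))
psd = [(1.0 / f) * (1.0 + 3.0 * math.exp(-((f - 28) ** 2) / 18.0)) for f in freqs]
powers = band_powers(freqs, psd)
print(max(powers, key=powers.get))  # band with the largest peak above the 1/f fit
```

The fit range matters in this simplified version: if the line is fitted only over a narrow range around the peak, the peak itself tilts the fit and distorts the residuals in the other bands.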
Imaging and localization: Preoperative MRI (1.5T) and postoperative CT were co-registered and normalized to MNI ICBM 2009b space via Lead-DBS v2.6, with ANTs/SPM for registrations, brain-shift correction, and manual refinement. Electrode positions (x,y,z in MNI) for therapeutically active contacts and along trajectories were derived. Features were assigned to electrode locations and categorized with the DISTAL atlas.
Features: A multimodal set of up to 20 features included demographics (age, sex, disease duration, time since surgery), preoperative PDQ-39, LEDD ratio (post/pre), electrophysiology (band powers from left/right STN), and electrode location (x,y,z for left/right active contacts). PDQ-39 change was quantified as normalized difference: (pre − post)/(pre + post), bounded in [−1, 1] (higher indicates improvement). Alternative models also tested absolute PDQ-39 change, distance to optimal target, and stimulation amplitude.
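The normalized change score defined above can be written as a short function. This is a minimal sketch; the function name and the handling of the zero-denominator case are assumptions, and the example inputs are the reported group means, so the result illustrates the formula rather than reproducing any per-patient value.

```python
def normalized_pdq39_change(pre: float, post: float) -> float:
    """Normalized PDQ-39 change: (pre - post) / (pre + post).

    Positive values mean the score dropped after surgery (improvement);
    for non-negative scores the result lies in [-1, 1].
    """
    total = pre + post
    if total == 0:
        # Assumption: no burden before or after is treated as "no change".
        return 0.0
    return (pre - post) / total

# Reported group means, used here only as illustrative inputs.
print(round(normalized_pdq39_change(45.75, 41.08), 3))
```

Dividing by the sum keeps the score bounded, so a patient moving from 10 to 5 and a patient moving from 60 to 30 receive the same normalized value even though the absolute changes differ.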
Modeling: An XGBoost regressor predicted PDQ-39 change. Inputs were z-scored. A nested leave-one-out cross-validation (LOOCV) framework was used: each LOOCV training set underwent 6-fold CV hyperparameter optimization via Hyperopt (Bayesian search; up to 300 samples per parameter over defined ranges) to select n_estimators, min_child_weight, max_depth, gamma, subsample, and colsample_bytree based on validation MSE. Variants considered electrophysiological inputs from (i) along the implantation trajectory, (ii) averaged at the final lead position, and (iii) at therapeutically active contacts; hemisphere-averaged and most/least affected hemisphere groupings were also tested.
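The nested validation scheme can be sketched as follows. To stay dependency-free, this example swaps in a one-feature ridge regressor and a small grid of penalties in place of XGBoost and Hyperopt; only the loop structure (outer leave-one-out, inner 6-fold selection on the training set, Pearson r and MSE on the held-out predictions) mirrors the description. All function names and the synthetic data are illustrative assumptions.

```python
import math

def fit_ridge(xs, ys, lam):
    """One-feature ridge regression (stand-in model); returns a predict(x) function."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    beta = sxy / (sxx + lam)
    return lambda x: my + beta * (x - mx)

def inner_cv_mse(xs, ys, lam, k):
    """Validation MSE of k-fold CV on the training set for one penalty value."""
    errors = []
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        valid = [i for i in range(len(xs)) if i % k == fold]
        predict = fit_ridge([xs[i] for i in train], [ys[i] for i in train], lam)
        errors += [(predict(xs[i]) - ys[i]) ** 2 for i in valid]
    return sum(errors) / len(errors)

def nested_loocv(xs, ys, lambdas, k=6):
    """Outer leave-one-out loop; the inner CV never sees the held-out case."""
    preds = []
    for held_out in range(len(xs)):
        tr_x = xs[:held_out] + xs[held_out + 1:]
        tr_y = ys[:held_out] + ys[held_out + 1:]
        best = min(lambdas, key=lambda lam: inner_cv_mse(tr_x, tr_y, lam, k))
        preds.append(fit_ridge(tr_x, tr_y, best)(xs[held_out]))
    return preds

def pearson_r(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# Synthetic data (not study data): a linear trend with a small deterministic wobble.
xs = [i / 10 for i in range(20)]
ys = [0.5 * x + 0.05 * math.sin(3 * i) for i, x in enumerate(xs)]
preds = nested_loocv(xs, ys, lambdas=[0.01, 0.1, 1.0, 10.0])
print(f"r={pearson_r(preds, ys):.2f}, MSE={mse(preds, ys):.4f}")
```

Because hyperparameters are re-selected inside every outer fold, the held-out patient influences neither the model fit nor the tuning, which is the point of nesting the two loops.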
Explainability and statistics: SHAP was used with XGBoost to estimate per-feature contributions. Mean absolute SHAP values ranked feature importance; beeswarm and decision plots visualized directionality. ANOVA on cross-validated mean SHAP distributions (12-feature model) assessed feature effects (11 df, F=720.99, p<0.001), followed by Tukey’s HSD for multiple comparisons. Feature ablation retrained models after removing specific features to assess impact on Pearson r. A control analysis replaced left STN upper beta with baseline UPDRS-III. An SVM with linear kernel (C=1000) derived thresholds separating positive vs negative SHAP contributions for actionable cutoffs. Model performance was evaluated by Pearson correlation between predicted and actual normalized PDQ-39 change, with p-values (alpha 0.05) and MSE.
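Ranking features by mean absolute SHAP value reduces to a column-wise average of magnitudes. The sketch below assumes the SHAP values have already been computed and are available as a patients-by-features table; the feature names and numbers are hypothetical and serve only to show the calculation.

```python
def rank_by_mean_abs_shap(shap_rows, feature_names):
    """Return (feature, mean |SHAP|) pairs sorted from most to least important."""
    n = len(shap_rows)
    scores = [
        sum(abs(row[j]) for row in shap_rows) / n
        for j in range(len(feature_names))
    ]
    return sorted(zip(feature_names, scores), key=lambda pair: pair[1], reverse=True)

# Hypothetical SHAP values (rows = patients, columns = features); not study data.
features = ["preop_pdq39", "left_upper_beta", "age"]
shap_rows = [
    [0.20, -0.10, 0.02],
    [-0.15, 0.12, -0.01],
    [0.25, 0.08, 0.03],
]
ranking = rank_by_mean_abs_shap(shap_rows, features)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Taking absolute values before averaging is what separates importance from direction: a feature that pushes predictions up for some patients and down for others still ranks high, and its direction has to be read from the signed per-patient values, as in the beeswarm plots.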
Key Findings
- Clinical outcomes: Preoperatively, levodopa (OFF to ON) improved MDS-UPDRS-III motor scores by 49% (40.05 ± 14.70 to 21.35 ± 12.07; T=6.81; Bonferroni-adjusted p=2.73e-09). Postoperatively (medication ON/stimulation ON), motor scores improved by 55% relative to preoperative OFF (40.05 ± 14.70 to 18.04 ± 11.73; T=8.10; Bonferroni-adjusted p=5.64e-12). LEDD decreased by 32% (983.48 ± 413.85 to 666.33 ± 391.11; t=7.11; p=1.37e-09). Mean PDQ-39 improved by 4.67 points, a change that was not statistically significant (45.75 ± 26.51 to 41.08 ± 27.50; t=1.35; p=0.183); 54% of patients improved, 3% were unchanged, and 43% worsened. Minimal clinically important change thresholds were −4.72 (improvement) and +4.22 (worsening).
- Baseline PDQ-39 vs change: Preoperative PDQ-39 correlated with normalized PDQ-39 change (p=3.15e-03; r≈0.36; MSE=0.872), indicating higher baseline burden predicts greater improvement and lower burden predicts deterioration.
- Model performance with multimodal features: Best-performing configuration using electrophysiology along the implantation trajectory (per hemisphere) achieved p=4.59e-05, r=0.49, MSE=0.093. Averaging features at the final lead position: p=4.28e-03, r=0.36, MSE=0.109. Using therapeutically active contact: p=4.39e-03, r=0.35, MSE=0.11. Averaging both hemispheres along trajectory: p=1.28e-03, r=0.40, MSE=0.106. Grouping by most/least affected hemisphere: p=6.54e-03, r=0.34, MSE=0.11. Alternative model with distance to optimal target and stimulation amplitude: p=2.61e-04, r=0.44, MSE=0.10. Predicting absolute (not normalized) PDQ-39 change remained significant (p=2.97e-04; r=0.44; MSE=0.82). After removing 8 least important features (20→12), performance improved to r=0.55 (p=3.49e-06; MSE=0.086).
- Feature importance and directionality (SHAP): Preoperative PDQ-39 and left STN upper beta power (21–35 Hz) were the top predictors (significantly greater SHAP magnitude than other features; ANOVA F=720.99, p<0.001; Tukey HSD p<0.001). Higher left STN upper beta and higher baseline PDQ-39 drove improvements. Greater LEDD reduction (lower LEDD ratio), younger age, and shorter time since surgery contributed positively. Among electrode location features, only the z (depth) coordinate of active contacts influenced outcomes.
- Actionable thresholds (SVM on SHAP): Improvement associated with PDQ-39 baseline >31.5 points (100% accuracy), left STN upper beta PSD >0.15 V^2/Hz (98.4%), LEDD ratio <0.68 (i.e., >32% reduction; 98.4%), age <71.5 years (100%). A time-since-surgery threshold around 15.5 months also separated contributions (98.4%).
- Electrode position effect: Active contacts above z = −7 (MNI) were linked to QoL improvement, while positions below −7 were linked to deterioration. Mean active contact locations: left (x=−12.53 ± 1.24; y=12.76 ± 1.58; z=−6.57 ± 1.40), right (x=11.59 ± 1.34; y=12.34 ± 1.49; z=−6.49 ± 1.70).
- Hemispheric specificity: Electrophysiological information contributed robustly across models, with hemispheric differences important; left STN upper beta power was particularly predictive of improvement.
- Ablation controls: Removing preoperative PDQ-39 or left STN upper beta reduced r from 0.55 to 0.36 and 0.37, respectively. Removing LEDD ratio had minor impact (r=0.52). Replacing upper beta with baseline UPDRS-III reduced performance to r=0.37; UPDRS-III ranked low in importance, suggesting upper beta is not merely a surrogate of motor severity. Removing age (r=0.50), time since surgery (r=0.53), or electrode depths (r=0.52) produced smaller decrements; removing other electrophysiological bands did not reduce performance, supporting specificity of upper beta.
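The minimal clinically important change thresholds listed under clinical outcomes can be applied to an individual patient as in the sketch below. It assumes change is computed as post minus pre, so that a negative value means lower PDQ-39 burden; the function name is illustrative, and the summary does not state whether the reported 54%/3%/43% split was derived with these thresholds.

```python
MCID_IMPROVEMENT = -4.72  # reported threshold for meaningful improvement
MCID_WORSENING = 4.22     # reported threshold for meaningful worsening

def classify_pdq39_change(pre: float, post: float) -> str:
    """Label a PDQ-39 change against the reported MCID thresholds."""
    delta = post - pre  # assumption: negative delta = lower burden = better QoL
    if delta <= MCID_IMPROVEMENT:
        return "improved"
    if delta >= MCID_WORSENING:
        return "worsened"
    return "unchanged"

# Reported group means, used here only as illustrative inputs.
print(classify_pdq39_change(45.75, 41.08))
```

Under this convention the group-mean change of −4.67 points falls just short of the −4.72 improvement threshold, which is consistent with the non-significant average effect despite large changes in individual patients.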
Discussion
The study demonstrates that integrating patient-reported outcomes with intraoperative neurophysiology and precise electrode localization enables explainable predictions of QoL change after STN DBS. Baseline PDQ-39 reliably predicts direction and magnitude of postoperative change, clarifying that high preoperative burden favors improvement, whereas low burden is associated with deterioration at approximately 20-month follow-up. Crucially, left STN upper beta activity emerged as the strongest neurophysiological predictor of improvement, outperforming demographics, electrode position, and medication reduction, and retaining predictive value independent of baseline motor severity. The z-axis (depth) of active contacts further modulated outcomes, with QoL improvements associated with positions above z = −7 (MNI), pointing to the relevance of superior STN regions for QoL. These findings highlight hemispheric asymmetries and support physiologically informed, personalized approaches to DBS targeting and programming. The explainable ML framework (SHAP) provided interpretable, actionable thresholds (e.g., PDQ-39 >31.5; left upper beta >0.15 V^2/Hz; LEDD ratio <0.68; age <71.5), facilitating patient counseling, surgical planning, and postoperative management. The results align with, and help reconcile, heterogeneous literature on QoL predictors by quantifying the relative, independent contributions of baseline QoL, neurophysiology, and electrode placement.
Conclusion
This work shows that explainable machine learning applied to multimodal data can elucidate variability in QoL outcomes after STN DBS for PD. Preoperative PDQ-39 and left STN upper beta power are the dominant predictors of postoperative QoL change, with electrode depth relative to z = −7 (MNI) further modulating outcomes. The approach yields actionable thresholds that can support patient selection, counseling, targeting within the STN, and individualized stimulation strategies. Future research should validate these findings in larger, multi-center cohorts; incorporate richer non-motor assessments; leverage directional and sensing-enabled DBS to refine biomarker-guided targeting/programming; and explore longitudinal, adaptive paradigms that track and optimize QoL-related neural signatures.
Limitations
- The single-center cohort with a modest sample size (n=63) limits generalizability and precludes more sophisticated modeling and clustering.
- Follow-up intervals varied (mean ~20.6 months), which may influence QoL trajectories and model estimates.
- QoL is multifactorial; non-motor domains were not exhaustively characterized beyond PDQ-39, potentially omitting relevant predictors.
- Imaging resolutions varied across patients; despite careful normalization and manual refinement, localization inaccuracies may persist.
- The observational design cannot establish causality; thresholds and feature effects require prospective validation.
- Contradictory findings in the literature regarding QoL ‘sweet spots’ suggest that anatomical effects may depend on additional, unmeasured factors.