
Neuroimaging and natural language processing-based classification of suicidal thoughts in major depressive disorder
D. Y. Lee, G. Byeon, et al.
This study developed a machine learning model that classifies suicidal thoughts in patients with major depressive disorder by integrating natural language processing of psychiatric clinical notes with structural brain MRI features. The combined model achieved an AUROC of 0.810 on the internal test set and 0.742 in external validation. The work was conducted by Dong Yun Lee, Gihwan Byeon, Narae Kim, Sang Joon Son, Rae Woong Park, and Bumhee Park.
Introduction
Suicide is a leading cause of mortality globally, with rising rates observed over recent decades. Psychiatric illnesses—particularly depression—are the most significant risk factors. Emerging evidence suggests that specific symptom dimensions (e.g., anhedonia, psychological pain, psychotic experiences) can better predict suicidal behaviors than diagnostic labels alone, highlighting the need to assess detailed psychopathology. Psychiatric clinical notes capture such symptom details but are typically qualitative and difficult to quantify. Natural language processing (NLP) enables quantitative extraction of symptom patterns from unstructured text. In parallel, brain MRI is widely used clinically, but qualitative assessment often misses subtle structural abnormalities in mood disorders. Quantitative neuroimaging approaches assessing regional volumes and structural network patterns have shown associations with mental illness. The study aims to develop and validate a multimodal machine learning model that integrates NLP-derived symptom topics from initial psychiatric notes with structural MRI source-based morphometry features to classify suicidal thoughts at first-episode depression diagnosis, and to test its generalizability across institutions.
Methodology
Study design and ethics: Retrospective, cross-sectional modeling study approved by the Ajou University Hospital IRB (AJOUIRB-DB-2022-335); data were de-identified and consent was waived.
Data sources and cohorts: Development cohort from Ajou University School of Medicine (AUSOM), South Korea (2010–2022); external validation cohort from Kangwon National University Hospital (KNUH), South Korea (2015–2022). Data included socio-demographics, diagnoses, observations, visits, procedures, medications, clinical notes, and brain MRI, standardized to OMOP-CDM v5.3.1.
Inclusion criteria: New depressive episode, with the first diagnosis defining the index date; ≥1 year of prior observation; brain MRI within 1 month of the index date. Patients with bipolar disorder, schizophrenia, or dementia were excluded. Final N=210 (AUSOM n=152; KNUH n=58).
Outcome: Suicidal thoughts at index, defined as psychiatrist-documented ideation, plan, or attempt in the initial interview note.
NLP feature extraction (topic modeling): The chief complaint section of the initial notes (written in English) was extracted via regex; preprocessing included stemming, normalization, and stop-word removal, after which a bag-of-words representation was constructed. Latent Dirichlet Allocation (LDA) was applied; the number of topics was set to 5 based on four model-selection metrics (Griffiths2004 and Deveaud2014, which are maximized; CaoJuan2009 and Arun2010, which are minimized). For each patient, five topic probabilities were derived.
MRI feature extraction (source-based morphometry, SBM): Structural T1-weighted MRI acquired on 1.5T or 3T scanners at AUSOM (GE Signa HDx 1.5T; GE Discovery MR750w 3T) and at KNUH for external validation (Siemens Magnetom Avanto 1.5T; Philips Achieva dStream 3T). Images were visually inspected by neuroradiologists, who found no gross abnormalities or artifacts. Voxel-based morphometry (VBM) preprocessing used SPM12 DARTEL: gray matter segmentation, study-specific template creation, DARTEL normalization, modulation, and 6 mm FWHM Gaussian smoothing.
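As a small illustration of the smoothing step, the sketch below converts the 6 mm FWHM into a Gaussian standard deviation using the standard relation sigma = FWHM / (2 * sqrt(2 * ln 2)) and builds a normalized 1D kernel. The 1.5 mm voxel size and the 3-sigma truncation radius are assumptions made only for this example; the summary does not state them, and the study itself relied on SPM12 for the actual 3D smoothing.

```python
import math


def fwhm_to_sigma(fwhm_mm: float) -> float:
    """Convert a Gaussian full width at half maximum (mm) to its standard deviation (mm)."""
    return fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def gaussian_kernel_1d(fwhm_mm: float, voxel_size_mm: float, radius_sigmas: float = 3.0) -> list:
    """Build a normalized 1D Gaussian kernel sampled on the voxel grid.

    voxel_size_mm and radius_sigmas are illustrative assumptions,
    not values reported in the study.
    """
    sigma_vox = fwhm_to_sigma(fwhm_mm) / voxel_size_mm
    radius = int(math.ceil(radius_sigmas * sigma_vox))
    weights = [math.exp(-0.5 * (k / sigma_vox) ** 2) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


sigma_mm = fwhm_to_sigma(6.0)
kernel = gaussian_kernel_1d(6.0, voxel_size_mm=1.5)  # 1.5 mm voxels: assumed
print(f"sigma = {sigma_mm:.3f} mm")  # prints "sigma = 2.548 mm"
print(f"kernel taps = {len(kernel)}")
```

Because a 3D Gaussian is separable, applying such a 1D kernel along each axis in turn is equivalent to a full 3D convolution, which is why a 1D kernel is enough to convey the idea.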
Cross-sectional independent component analysis (ICA) was applied via FastICA with ICASSO (100 runs with random initializations). Laplace PCA was used for dimensionality reduction; ICASSO hierarchical clustering assessed reliability, and components with stability >0.8 were retained. Group-level IC maps were z-scored and thresholded at z>3 for visualization; selected maps were interpreted as structural networks. Five IC maps corresponding to putative large-scale networks were selected: visual, default mode (DMN), auditory (AN), cortical midline (CMN), and sensorimotor (SMN). Per-subject loading weights on each IC served as MRI features.
Statistical analyses and modeling: Baseline comparisons between suicidal and non-suicidal groups used t-tests and chi-square tests. Pearson correlations assessed multicollinearity between MRI and text features, with r>0.7 as the threshold for concern (none observed; all r<0.4). Three XGBoost classifiers were trained: Text-only (five topic probabilities), MRI-only (five IC loading weights), and Combined (all ten features). The AUSOM cohort was split 75% train / 25% test, stratified by outcome. Fivefold cross-validation on the training set with grid search tuned depth, learning rate, and number of trees using AUROC as the objective; the model was then refit on the full training set with the best hyperparameters. Performance on the held-out test set was evaluated via AUROC, accuracy, sensitivity, specificity, and F1 score, with the optimal cutoff chosen by the maximal Youden index.
External validation: KNUH data were processed by projecting AUSOM-trained LDA topic-word distributions to compute KNUH patient topic probabilities, and by projecting AUSOM-derived IC maps onto KNUH preprocessed MRI to obtain loading weights. AUSOM-trained models were evaluated on KNUH features with the same metrics.
Sensitivity analysis: Age effects were removed from MRI features (per Supplementary Method) in both cohorts, and models were re-evaluated.
Model comparison and interpretation: AUROCs were compared using DeLong's test across models and between internal and external validations.
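The cutoff selection mentioned above can be sketched as follows: for each candidate threshold, compute sensitivity and specificity, and keep the threshold that maximizes the Youden index J = sensitivity + specificity - 1. This is a minimal illustration, not the authors' R code; the labels and scores are made-up toy values, and the function names are hypothetical.

```python
def sensitivity_specificity(y_true, scores, threshold):
    """Treat score >= threshold as a positive call; return (sensitivity, specificity).

    Assumes both classes are present in y_true.
    """
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def youden_cutoff(y_true, scores):
    """Return (threshold, J) for the observed score that maximizes the Youden index."""
    best_threshold, best_j = None, float("-inf")
    for threshold in sorted(set(scores)):
        sens, spec = sensitivity_specificity(y_true, scores, threshold)
        j = sens + spec - 1.0
        if j > best_j:
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


# Toy data (made up): 1 = suicidal thoughts documented, 0 = not documented.
toy_labels = [0, 0, 0, 0, 1, 0, 1, 1]
toy_scores = [0.10, 0.20, 0.30, 0.40, 0.45, 0.60, 0.70, 0.90]

cutoff, j_stat = youden_cutoff(toy_labels, toy_scores)
sens, spec = sensitivity_specificity(toy_labels, toy_scores, cutoff)
print(f"cutoff={cutoff}, J={j_stat:.2f}, sensitivity={sens:.2f}, specificity={spec:.2f}")
```

On the toy data, the chosen cutoff of 0.45 captures all three positives while misclassifying one of five negatives. The Youden index weights sensitivity and specificity equally, which may or may not match clinical priorities in a screening setting.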
SHAP values were computed to interpret feature contributions; beeswarm plots summarized per-feature impact.
Software: R 4.1.0 with OHDSI and open-source packages for all non-imaging analyses; MATLAB-based custom software for MRI processing.
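To convey what a SHAP value represents, the sketch below computes exact Shapley values by enumerating feature coalitions for a tiny hand-written scoring function. This is not the tree-based algorithm that SHAP libraries use for XGBoost models, and it replaces SHAP's background-data expectation with a single baseline vector. The scoring function, its coefficients, and the feature values are made up for illustration and are not estimates from the study.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition take their baseline value. Cost grows as 2^n,
    so this is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi


def toy_model(features):
    """Hypothetical score with one interaction term; coefficients are made up."""
    anxiety_topic, dmn_loading, somatic_topic = features
    return (0.5 * anxiety_topic + 0.3 * dmn_loading
            - 0.4 * somatic_topic + 0.2 * anxiety_topic * dmn_loading)


patient = [1.0, 1.0, 1.0]
reference = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, patient, reference)
print([round(p, 3) for p in phi])
```

The contributions sum to the difference between the patient's score and the reference score, and the interaction term is split evenly between the two features involved. A beeswarm plot simply displays such per-patient contributions for every feature across the whole sample.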
Key Findings
Cohorts: Total N=210 (AUSOM n=152 development; KNUH n=58 external validation); overall 60% female; mean age 55.2 years. Suicidal thoughts prevalence: AUSOM 36/152 (23.7%); KNUH 18/58 (31.0%). AUSOM baseline differences: substance use disorder was more frequent in the suicidal group (13.9% vs 1.7%, p=0.01); other demographic, medical, and psychiatric histories did not differ significantly.
Feature characteristics: The five NLP topics were neurovegetative (Topic1), anxiety (Topic2), psychotic (Topic3), insomnia (Topic4), and somatic (Topic5). Topic2 (anxiety) probability was significantly higher in the suicidal group; Topic5 (somatic) was significantly lower. MRI features comprised five SBM networks (visual, DMN, AN, CMN, SMN); there were no statistically significant group differences in network weights, though overall proportions differed, and the largest mean differences were observed in DMN (precuneus) and CMN (ACC) loading weights. Correlations: all MRI–text feature pairwise r<0.4 (no multicollinearity).
Internal validation (AUSOM test set): AUROCs were Text-only 0.748 (95% CI 0.544–0.951), MRI-only 0.738 (0.546–0.929), and Combined 0.810 (0.624–0.996). DeLong comparisons: Combined vs Text p=0.037 (significant improvement); Combined vs MRI p=0.304 (ns); Text vs MRI p=0.474 (ns). Other metrics (Combined): accuracy 0.833, sensitivity 0.900, specificity 0.800, F1 0.783.
External validation (KNUH): AUROCs were Combined 0.742 (0.577–0.907), Text-only 0.706 (0.543–0.868), and MRI-only 0.667 (0.518–0.816). Internal vs external AUROC differences were not significant (Combined p=0.296; Text p=0.377; MRI p=0.285), supporting generalizability.
Sensitivity analysis (age effects removed from MRI features): Internal AUROCs were MRI-only 0.708 and Combined 0.838; external AUROCs were MRI-only 0.650 and Combined 0.742 (similar to the original analysis).
Model interpretation (SHAP): Higher Topic2 (anxiety), higher DMN loading, higher CMN loading, higher Topic3 (psychotic), and higher Topic4 (insomnia) contributed most strongly toward a prediction of suicidal thoughts. Lower Topic5 (somatic), lower Topic1 (neurovegetative), and lower AN, visual, and SMN loadings were also associated with a higher predicted probability of suicidal thoughts.
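For readers less familiar with the AUROC values reported above, the metric can be read as the probability that a randomly chosen patient with suicidal thoughts receives a higher model score than a randomly chosen patient without. The sketch below computes it directly from that definition on made-up toy data; it is an illustration only, and neither the function name nor the data comes from the study.

```python
def auroc_pairwise(y_true, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Assumes both classes are present.
    """
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Toy data (made up): three positives and five negatives.
example_labels = [0, 0, 0, 0, 1, 0, 1, 1]
example_scores = [0.10, 0.20, 0.30, 0.40, 0.45, 0.60, 0.70, 0.90]

example_auroc = auroc_pairwise(example_labels, example_scores)
print(f"AUROC = {example_auroc:.3f}")  # 14 of 15 pairs ranked correctly
```

With only a few positives in a test set, each misranked pair shifts the AUROC noticeably, which is consistent with the wide confidence intervals reported for the internal test set.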
Discussion
Integrating NLP-derived symptom dimensions and structural MRI network loadings improved classification of suicidal thoughts at first-episode depression diagnosis compared with text alone, with consistent performance in external validation, underscoring the value of multimodal biomarkers. Clinically, higher anxiety/agitation and lower somatic emphasis in initial complaints were associated with suicidal thoughts, aligning with literature linking anxiety to suicidality. The lower somatic topic association may reflect cohort characteristics and cultural factors in Korean clinical settings, where somatic presentations can predominate among depressed patients without suicidality. Neuroanatomically, although group differences were not statistically significant across MRI features, greater loading in DMN (notably precuneus) and CMN (ACC) showed notable associations with suicidality. This is consistent with prior evidence implicating self-referential networks (precuneus/DMN) and cortical midline structures (ACC/CMN) in suicidality and rumination, though literature is mixed regarding ACC volume directionality. The model’s robustness across two geographically distinct hospitals and after accounting for age effects supports general applicability. The findings suggest that combining objective structural MRI patterns with nuanced symptom profiles extracted from routine notes can aid early clinical screening for suicidal thoughts and potentially assist non-psychiatrist clinicians in risk stratification.
Conclusion
A multimodal machine learning model combining NLP-derived clinical symptom topics and structural MRI source-based morphometry improved classification of suicidal thoughts in patients with first-episode depression over NLP alone and performed consistently in external validation. Anxiety/agitation-focused complaints and higher loading in DMN/CMN networks were key contributors. These results support personalized, multidimensional assessment strategies for suicidality screening in clinical practice. Future work should include larger, longitudinal cohorts; incorporate additional imaging modalities and harmonization techniques; integrate standardized psychometric scales; and differentiate suicidal ideation from attempts to refine risk stratification.
Limitations
- Cross-sectional design using first-visit data prevents assessment of longitudinal outcomes.
- Modest sample size may limit model training and generalizability.
- MRI features derived only from T1-weighted structural images; other sequences/modalities were not included.
- Data combined across 1.5T and 3T scanners; although prior work suggests comparable volume patterns, scanner harmonization was not performed.
- Frontotemporal and subcortical networks, known to relate to suicide, did not pass reliability thresholds and were not key features, possibly due to limited sample size.
- Psychological scales (e.g., HAM-D, somatic symptom scores) were not incorporated due to incomplete availability, reducing potential feature richness.
- Suicidal ideation and attempts were combined into a single outcome label, potentially obscuring distinctions between levels of suicidality.