Introduction
Suicide is a significant global public health concern, and depression is one of its major risk factors. Accurately classifying suicidal risk in depressed patients is therefore crucial for personalized intervention. This study aimed to develop a machine learning model that leverages both unstructured clinical data (psychiatric notes) and neuroimaging data (brain MRI) to predict suicidal thoughts in individuals with major depressive disorder. Previous epidemiological studies consistently identify underlying psychiatric illness, particularly depression, as a leading risk factor for suicide. However, suicidal behavior is often more strongly associated with specific symptom dimensions than with a diagnosis alone. Symptoms such as anhedonia, psychological pain, and psychotic experiences have been independently linked to suicidal behavior. Medical records contain valuable information on these symptoms, but because this information is recorded qualitatively, it is difficult to analyze quantitatively. Natural language processing (NLP) addresses this problem by extracting quantitative features from qualitative records. Similarly, brain MRI is difficult to interpret by qualitative inspection in mood disorders, but quantitative techniques such as voxel-based morphometry (VBM) and source-based morphometry (SBM) can identify structural brain patterns associated with mental illness and suicidal risk. This study combined the NLP and neuroimaging approaches to build a comprehensive predictive model.
Literature Review
The introduction section extensively reviews the existing literature on suicide epidemiology, risk factors (with particular emphasis on specific symptoms within depressive disorders), the limitations of relying solely on qualitative data from psychiatric notes, and the potential of NLP and neuroimaging techniques (VBM, SBM) for assessing and predicting suicidal behavior. Several studies are cited to support the independent predictive value of symptoms such as anhedonia, psychological pain, and psychotic experiences, highlighting the need for detailed symptom assessment beyond broad diagnostic categories. The limitations of traditional qualitative analysis of medical records are discussed, which motivates the use of NLP for quantitative analysis of unstructured text. The use of MRI, together with quantitative approaches such as VBM and SBM that address the difficulty of interpreting brain images in psychiatric disorders, is also reviewed.
Methodology
This study employed a retrospective cohort design. Data were collected from two psychiatric outpatient clinics: Ajou University School of Medicine (AUSOM) and Kangwon National University Hospital (KNUH). The AUSOM data (152 patients with a new depressive episode) were used for model development and internal validation, while the KNUH data (58 patients) served as an external validation set. The primary outcome was the presence or absence of suicidal thoughts at the index date, defined as the date of first diagnosis of depressive disorder. Clinical notes were processed with NLP, specifically Latent Dirichlet Allocation (LDA), to extract topic probabilities representing five psychiatric symptom domains (neurovegetative, anxiety, psychotic, insomnia, and somatic symptoms); the optimal number of topics was determined by assessing perplexity scores. T1-weighted brain MR images were preprocessed using VBM and analyzed with SBM, which applies independent component analysis (ICA), to identify five independent components representing brain networks: the visual network, default mode network (DMN), auditory network (AN), cortical midline network (CMN), and sensorimotor network (SMN). An XGBoost algorithm was used to develop three classification models: one using only NLP features, one using only MRI features, and a combined model incorporating both. Model performance was evaluated on the internal and external validation sets using the area under the receiver operating characteristic curve (AUROC), accuracy, sensitivity, specificity, and F1-score, and SHapley Additive exPlanations (SHAP) values were used for model interpretation. Group comparisons (suicidal vs. non-suicidal) were performed using independent-samples t-tests and chi-square tests.
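The perplexity criterion for choosing the number of LDA topics can be sketched as follows. This is a minimal, self-contained illustration that assumes each note has already been tokenized into integer word ids. It fits LDA by collapsed Gibbs sampling, which is a swapped-in estimator: the source does not say which LDA implementation was used, and libraries such as scikit-learn's `LatentDirichletAllocation` use variational inference and report a bound-based perplexity instead. All function names, the hyperparameters (`alpha`, `beta`, `n_iter`), and the toy corpus are hypothetical. Perplexity is computed as exp(-log-likelihood / number of tokens), with p(w | d) = sum over k of theta[d][k] * phi[k][w].

```python
import math
import random


def gibbs_lda(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling (hypothetical helper, not the study's code).

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (theta, phi): per-document topic proportions and per-topic word
    distributions, each as a list of probability rows.
    """
    rng = random.Random(seed)
    ndk = [[0] * n_topics for _ in docs]               # tokens of topic k in document d
    nkw = [[0] * vocab_size for _ in range(n_topics)]  # tokens of word w assigned to topic k
    nk = [0] * n_topics                                # total tokens assigned to topic k
    z = []                                             # current topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(z_d)

    topics = range(n_topics)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + vocab_size * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    theta = [
        [(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in topics]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(nkw[t][w] + beta) / (nk[t] + vocab_size * beta) for w in range(vocab_size)]
        for t in topics
    ]
    return theta, phi


def perplexity(docs, theta, phi):
    """exp(-log-likelihood per token), with p(w | d) = sum_k theta[d][k] * phi[k][w]."""
    log_lik, n_tokens = 0.0, 0
    for d, doc in enumerate(docs):
        for w in doc:
            p = sum(theta[d][k] * phi[k][w] for k in range(len(phi)))
            log_lik += math.log(p)
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)


def perplexity_by_n_topics(docs, vocab_size, candidates, **fit_kwargs):
    """Fit one model per candidate topic count and return {n_topics: perplexity}."""
    scores = {}
    for n_topics in candidates:
        theta, phi = gibbs_lda(docs, n_topics, vocab_size, **fit_kwargs)
        scores[n_topics] = perplexity(docs, theta, phi)
    return scores


# Toy corpus of word ids (not study data): two loosely separated vocabularies.
docs = [[0, 1, 0, 2], [1, 0, 2, 2], [3, 4, 3, 5], [4, 5, 5, 3]]
for n_topics, score in perplexity_by_n_topics(docs, 6, [1, 2, 3], n_iter=100).items():
    print(n_topics, round(score, 3))
```

Perplexity computed on the same documents used for fitting usually keeps falling as topics are added, so in practice candidates would be compared on held-out notes or by looking for the point where improvements level off; the source states only that perplexity scores were assessed.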
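The evaluation metrics listed above can be computed from a model's predicted probabilities as in the following sketch. It assumes labels coded 1 for suicidal thoughts present and 0 for absent, and a 0.5 decision threshold for the threshold-dependent metrics; the source does not state the threshold. The function names are hypothetical, and the labels and scores in the usage lines are toy values, not study data. AUROC uses the rank (Mann-Whitney) formulation, which does not depend on any threshold.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, with 1 = suicidal thoughts present."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def _safe_div(num, den):
    """Division that returns 0.0 instead of raising when the denominator is zero."""
    return num / den if den else 0.0


def threshold_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, sensitivity, specificity, and F1 after thresholding predicted probabilities."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = _safe_div(tp, tp + fn)  # recall on the positive (suicidal) class
    specificity = _safe_div(tn, tn + fp)  # recall on the negative class
    precision = _safe_div(tp, tp + fp)
    return {
        "accuracy": _safe_div(tp + tn, len(y_true)),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": _safe_div(2 * precision * sensitivity, precision + sensitivity),
    }


def auroc(y_true, y_score):
    """P(score of a random positive > score of a random negative), counting ties as 1/2."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels and predicted probabilities (not study data).
y_true = [1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.4, 0.35, 0.1, 0.8, 0.6]
print(auroc(y_true, y_score))  # 8 of the 9 positive/negative pairs are ordered correctly
print(threshold_metrics(y_true, y_score))
```

`sklearn.metrics.roc_auc_score` and `sklearn.metrics.f1_score` compute the same quantities; the pure-Python version only makes the definitions explicit.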
Key Findings
The study included 152 patients from AUSOM for model development and 58 patients from KNUH for external validation. In the AUSOM dataset, 36 of 152 patients (23.7%) reported suicidal thoughts; in the KNUH dataset, the proportion was higher, at 18 of 58 (31.0%). NLP analysis revealed that the anxiety and somatic symptom topics were significantly more common in the suicidal group. No significant between-group differences were found in individual brain network measures, although differences were observed in overall network proportions. The combined NLP and MRI model demonstrated the highest predictive performance, with an AUROC of 0.810 (95% CI: 0.624–0.996) in internal validation and 0.742 (95% CI: 0.577–0.907) in external validation. This exceeded the performance of the NLP-only model (AUROC 0.748 internal, 0.706 external) and the MRI-only model (AUROC 0.738 internal, 0.667 external). SHAP analysis indicated that higher values of the anxiety-related topics, the DMN and CMN loading weights, and the psychotic and insomnia-related topics were strong predictors of suicidal thoughts. Conversely, lower values of the somatic and neurovegetative symptom topics and of certain brain network loadings (auditory, visual, and SMN) were associated with a higher risk of suicidal thoughts.
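The between-group comparisons behind the reported differences in topic probabilities can be sketched with two small functions: a pooled-variance (Student's) independent-samples t statistic for continuous measures, and a Pearson chi-square test for a 2×2 table of categorical counts. This is a minimal illustration with toy inputs; the function names are hypothetical, the source does not say whether a continuity correction was applied, and the t-test p-value is omitted because it requires the t distribution, which Python's standard library does not provide (`scipy.stats.ttest_ind` computes it).

```python
import math
import statistics


def pooled_t_test(a, b):
    """Student's independent-samples t statistic with pooled variance; returns (t, df)."""
    n_a, n_b = len(a), len(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)  # sample variances (n - 1)
    pooled = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    se = math.sqrt(pooled * (1 / n_a + 1 / n_b))
    t = (statistics.mean(a) - statistics.mean(b)) / se
    return t, n_a + n_b - 2


def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2x2 table; returns (chi2, p)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, chi2 = Z**2, so P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Toy inputs (not study data).
t, df = pooled_t_test([0.31, 0.42, 0.28, 0.39], [0.22, 0.25, 0.30, 0.19, 0.27])
chi2, p = chi_square_2x2([[20, 10], [10, 20]])
print(f"t = {t:.3f} (df = {df}); chi2 = {chi2:.3f}, p = {p:.4f}")
```

`scipy.stats.chi2_contingency` applies Yates' continuity correction to 2×2 tables by default, so its chi-square values will generally be smaller than the uncorrected statistic computed here.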
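SHAP values attribute a single prediction to its input features using Shapley values from cooperative game theory, and the sign of each attribution is what supports statements such as "lower somatic symptom values were associated with a higher risk". The sketch below computes exact Shapley values by brute-force enumeration against one baseline input. This is a simplification: the shap library's explainers (including its tree-specific algorithm for models such as XGBoost) define the value of "missing" features as an expectation over a background distribution rather than a single reference point. The risk model, its weights, the feature names, and the input values are all hypothetical. Because the toy model puts a negative weight on the somatic feature, a patient whose somatic value is below the baseline receives a positive contribution toward risk.

```python
import math
from itertools import combinations


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to a single baseline input.

    The value of a feature subset S is model(z), where z takes x's values on S
    and baseline values elsewhere. Cost grows as 2**len(x) model evaluations.
    """
    n = len(x)

    def value(subset):
        z = [x[j] if j in subset else baseline[j] for j in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for subsets S that exclude i.
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk_model(z):
    """Hypothetical logistic risk model; the weights are invented for illustration."""
    anxiety_topic, dmn_loading, somatic_topic = z
    logit = -1.0 + 2.0 * anxiety_topic + 1.0 * dmn_loading - 1.5 * somatic_topic
    return 1.0 / (1.0 + math.exp(-logit))


x = [0.6, 0.5, 0.1]         # patient to explain (toy values)
baseline = [0.2, 0.0, 0.3]  # reference input, e.g. cohort means (toy values)
phi = shapley_values(toy_risk_model, x, baseline)
# Efficiency property: the contributions sum to model(x) - model(baseline).
print(phi, sum(phi), toy_risk_model(x) - toy_risk_model(baseline))
```

Enumeration needs 2^(n-1) subsets per feature, so it stays practical only for small feature sets; if the combined model used just the five topic probabilities and five component loadings, that would be 512 subsets per feature.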
Discussion
The findings support the hypothesis that integrating neuroimaging and NLP data improves the classification of suicidal thoughts in patients with depression. The higher performance of the combined model relative to the single-modality models highlights the value of a multimodal approach. The relatively consistent performance across the internal and external validation sets suggests that the model may generalize to different clinical settings and populations. The observed associations of anxiety, other symptom clusters, and brain network patterns with suicidal thoughts provide insight into the neurobiological and clinical correlates of suicidal ideation. These findings align with previous research demonstrating the importance of anxiety in suicide risk and the involvement of the DMN and CMN in self-referential processes implicated in suicidal ideation. The model's potential clinical utility lies in its capacity to quantify the likelihood of suicidal thoughts, which could assist clinicians in risk assessment and intervention.
Conclusion
This study demonstrates that a multimodal model integrating NLP analysis of clinical notes and SBM analysis of brain MRI significantly improves the classification of suicidal thoughts in patients with depression. The model's robust performance across internal and external validation suggests clinical applicability. Future research should focus on validating the model in larger, more diverse populations and exploring the longitudinal predictive validity of the model to refine risk stratification and treatment strategies.
Limitations
The study's limitations include the use of cross-sectional data, which prevents assessment of longitudinal trajectories. The sample size, although comparable to that of some existing brain imaging studies, is modest, and a larger sample would improve model generalizability. The use of both 1.5T and 3T MRI scanners introduces potential heterogeneity, although efforts were made to address this. Some potentially relevant brain networks (frontotemporal and subcortical) and psychological scales were not included, partly because of sample size limitations. Future research should address these limitations to further improve the model and its clinical utility.