Application of machine learning in predicting aggressive behaviors from hospitalized patients with schizophrenia

N. Cheng, M. Guo, et al.

This study develops machine learning models to predict aggressive behaviors in hospitalized patients with schizophrenia. The team, which includes Nuo Cheng and Meihao Guo, compared four algorithms and found that Random Forest performed best (test-set AUC 0.955). The results point to psychosocial and illness-course predictors that could inform clinical risk assessment and patient care.

Introduction

Schizophrenia is a severe psychiatric disorder marked by impairments in perception, emotion, cognition, and behavior. Aggressive behaviors are common among hospitalized patients with schizophrenia, with meta-analytic prevalence estimates in China ranging from 15.3% to 53.2%. Such behaviors threaten the safety of patients, staff, and others, and increase the use of restraints and the healthcare burden. Accurate risk assessment and early warning are therefore crucial. Machine learning (ML) methods have shown promise in psychiatry for prediction tasks, including treatment response and suicide risk in schizophrenia. Prior work has also leveraged connectivity analyses and ML (e.g., Random Forest, Lasso) for diagnosis and biomarker discovery. This study aimed to apply four ML algorithms, namely Multi-Layer Perceptron (MLP), Lasso regression, Support Vector Machine (SVM), and Random Forest (RF), to predict aggressive behaviors in hospitalized patients with schizophrenia, and to evaluate their predictive performance and clinical utility.

Literature Review

The authors note substantial prior research on aggression prevalence in schizophrenia and risk factors including cognitive impairment, prior aggression, social support, and treatment adherence. In predictive modeling, previous studies have applied various ML algorithms to related tasks. Wang et al. (2020) used demographic, clinical, and sociocultural variables in 275 patients; Random Forest performed marginally better than other algorithms. Yu et al. (2022) reported better performance of Neural Networks in 397 male patients, while in a smaller sample of 57 male patients, SVM performed best. Studies in offender populations in Zurich identified Gradient Boosting and Boosted Classification Trees as top performers. A hybrid model combining LASSO and SVM achieved high AUC (0.95) in a small male sample. Collectively, these studies suggest ML can complement clinical decision-making for violence/aggression risk in schizophrenia, though results vary by sample, features, and methods.

Methodology

Design and setting: Observational predictive modeling study at the Second Affiliated Hospital of Xinxiang Medical University, China, conducted from July 2019 to August 2021.

Sampling: Cluster sampling of hospitalized patients with schizophrenia.

Participants:
- Inclusion criteria: ICD-10 schizophrenia diagnosis; age ≥14; primary school education or above; normal hearing/vision sufficient to complete assessments; prior outpatient/inpatient schizophrenia diagnosis; ≥6 months antipsychotic use.
- Exclusion criteria: intellectual disability or organic brain disease; severe physical illness/adverse drug reactions; severe mental decline or acute agitation; sensory impairment; pregnancy/lactation.
- Ethics: approval obtained; informed consent from patients/guardians.

Measures:
- General Condition Questionnaire: demographics (age, sex, marital status, education, residence, occupation, caregiver, family income); disease information (duration, hospitalizations, family history, past attacks, management style); pre-admission status (medication adherence, follow-up frequency).
- ITAQ: 11 items, score 0–22; higher indicates better insight/treatment attitude; reliability and validity reported.
- Family APGAR: 5 items, score 0–10; higher indicates better family function.
- SSRS: 10 items covering subjective support, objective support, and utilization of support; higher indicates more support.
- FBS: 24 items across 6 dimensions; higher indicates heavier family burden.
- Aggressive behaviors: MOAS administered before discharge; a weighted total score ≥4 defined significant aggressive behavior.

Data collection: Questionnaires were administered by trained psychiatric clinicians and nurses within 3 days of admission using standardized procedures; invalid questionnaires (missing or inconsistent responses) were excluded. The MOAS was administered prior to discharge to ascertain aggression status.

Machine learning: Four algorithms were implemented: MLP, Lasso, SVM, and RF. The data were split 70% for training and 30% for testing. Hyperparameters were tuned via Bayesian optimization with 4-fold cross-validation on the training set, followed by 10× repeated 4-fold cross-validation for inner validation. The final model was retrained on the full training set and evaluated on the testing set.

Performance metrics: Accuracy, sensitivity, specificity, and ROC AUC; model AUCs were compared using the DeLong test.

Feature importance: Assessed in RF via Gini importance, with sequential modeling on the top-ranked features to identify where performance plateaus. A nomogram was constructed from the top eight RF features for clinical application.

Statistical analysis: SPSS 23.0 was also used (chi-square, t-test, and rank-sum tests; p < 0.05 considered significant).
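The data split and resampling scheme described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names, the fixed seed, and the use of class-stratified sampling for the 70/30 split are all assumptions (the summary states only the 70/30 ratio and the 10× repeated 4-fold design). The toy labels reuse the class counts reported in the findings (611 aggressive of 2,037).

```python
import random


def stratified_split(labels, test_fraction=0.30, seed=0):
    """Return (train_idx, test_idx), drawing test_fraction of each class.

    Stratification is an assumption; the source only reports a 70/30 split.
    """
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        idxs = idxs[:]  # copy so the grouping dict is not mutated
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)


def repeated_kfold(indices, k=4, repeats=10, seed=0):
    """Yield (repeat, fold, fit_idx, val_idx) for repeated k-fold CV."""
    rng = random.Random(seed)
    for r in range(repeats):
        shuffled = list(indices)
        rng.shuffle(shuffled)  # a fresh partition for every repeat
        folds = [shuffled[f::k] for f in range(k)]
        for f in range(k):
            val_idx = folds[f]
            fit_idx = [i for g in range(k) if g != f for i in folds[g]]
            yield r, f, fit_idx, val_idx


# Toy labels: 1 = aggressive (MOAS weighted total >= 4), 0 = not aggressive.
labels = [1] * 611 + [0] * (2037 - 611)
train_idx, test_idx = stratified_split(labels)
splits = list(repeated_kfold(train_idx))
print(len(train_idx), len(test_idx), len(splits))
```

Each of the 40 (repeat, fold) pairs would fit a model on `fit_idx` and score it on `val_idx`; the held-out `test_idx` rows are touched only once, after the final model is retrained on the whole training set.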

Key Findings

- Sample: 2,184 recruited; 2,064 valid questionnaires (94.51%); after withdrawals, 2,037 included. Aggressive behaviors were present in 611/2,037 (30.0%).
- Group differences (selected): Longer disease duration, closed management during hospitalization, poorer medication adherence, less frequent follow-up, lower ITAQ, lower APGAR, and slightly lower SSRS were associated with aggression (various p < 0.05). History of previous attacks was much more common in the aggressive group (75.12% vs 36.04%, p < 0.001).
- Model performance on testing set:
  - Random Forest: AUC 0.955 (95% CI 0.935–0.970); Accuracy 0.889; Sensitivity 0.892; Specificity 0.887.
  - SVM: AUC 0.902 (95% CI 0.876–0.924); Accuracy 0.827; Sensitivity 0.949; Specificity 0.770.
  - MLP: AUC 0.904 (95% CI 0.877–0.926); Accuracy 0.866; Sensitivity 0.908; Specificity 0.847.
  - Lasso: AUC 0.901 (95% CI 0.874–0.923); Accuracy 0.866; Sensitivity 0.908; Specificity 0.847.
  - RF AUC was significantly higher than the other three models (p < 0.0001); there were no significant differences among SVM, MLP, and Lasso (p > 0.5).
- Inner validation (10×4-fold CV): Original dataset RF AUC 0.949 (0.938–0.960); balanced dataset RF AUC 0.933 (0.916–0.950). Performance on balanced data was comparable but slightly lower than on the original dataset.
- RF feature importance (top 8): APGAR, ITAQ, Duration of disease, History of previous attacks, SSRS, Medication adherence, Age, FBS. Using the top 8 features maintained high performance (AUC ~0.94), with the best performance using all features (AUC ~0.949 in sequential inclusion analysis).
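For readers less familiar with the metrics reported above, the sketch below shows how accuracy, sensitivity, specificity, and ROC AUC are computed from labels and predictions. It is illustrative only: the function names and the eight-row toy dataset are invented for the example and are not the study's data, and the AUC uses the plain pairwise (Mann-Whitney) definition rather than whatever software the authors used. The last two lines re-derive the two sample percentages quoted in the findings.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity from binary labels (1 = aggressive).

    Assumes both classes are present, so no denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of aggressive cases flagged
        "specificity": tn / (tn + fp),  # share of non-aggressive cases cleared
    }


def roc_auc(y_true, scores):
    """AUC as the chance a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the study.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 cutoff is an assumption

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)

# Arithmetic check of the sample figures quoted in the findings.
valid_rate = round(100 * 2064 / 2184, 2)  # valid questionnaires, percent
prevalence = round(100 * 611 / 2037, 1)   # aggressive behaviors, percent
print(metrics, auc, valid_rate, prevalence)
```

Sensitivity and specificity depend on the chosen cutoff, whereas AUC summarizes ranking quality across all cutoffs, which is why a model such as the SVM above can show high sensitivity and lower specificity at a similar AUC to the others.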

Discussion

The study demonstrates that ML models can effectively predict inpatient aggressive behaviors among individuals with schizophrenia using routinely obtainable demographic, clinical, and psychosocial variables. RF outperformed SVM, MLP, and Lasso, providing both high discrimination and feature importance outputs for interpretability. The most influential predictors emphasize the psychosocial and illness-course context of aggression risk: family functioning (APGAR), insight/treatment attitude (ITAQ), longer duration of illness, prior attacks, social support (SSRS), medication adherence, age, and family burden (FBS). These findings align with literature linking impaired insight, poorer social/family support, and prior aggression to increased violence risk, underscoring the need for integrating family-based interventions and adherence support into clinical care. The construction of a nomogram based on the top eight features offers a practical tool for individualized risk estimation, enabling early warning, targeted interventions, and improved safety management in psychiatric wards. The comparable performance in balanced vs. original datasets suggests model robustness to class imbalance present in clinical populations.

Conclusion

This study built and validated ML models to predict aggressive behaviors in hospitalized patients with schizophrenia using multidimensional clinical and psychosocial data. RF achieved the highest predictive performance and provided interpretable feature rankings. A nomogram based on the top eight features supports individualized clinical risk assessment. Future research should incorporate additional variables, including biological markers and contextual precipitants, examine disease stage effects, and conduct external, multi-center validation to enhance generalizability and clinical utility.

Limitations

- Data collected at admission prevented analysis of disease stage effects or within-stay changes.
- Feature set excluded potential biological indicators and certain precipitating factors; broader biopsychosocial variables are needed.
- Single-site dataset limits generalizability; external validation is required.
- Although class imbalance was examined with a balanced dataset, prospective validation in real-world settings is needed.
