Artificial intelligence in mental health care: a systematic review of diagnosis, monitoring, and intervention applications

Medicine and Health

P. Cruz-gonzalez, A. W. He, et al.

AI is reshaping mental health care. This systematic review shows that AI, especially support vector machines, random forests, machine-learning monitoring tools, and AI chatbots, can accurately detect, classify, predict risk for, and monitor treatment response in mental health conditions. Future work should build more diverse datasets and improve model interpretability.

Introduction
The review situates AI as systems that interpret external data, learn, and adapt to accomplish goals, highlighting rapid advances in machine learning, deep learning, and natural language processing across healthcare tasks relevant to mental health. It emphasizes growing global demand for accurate diagnosis, continuous monitoring, and scalable interventions, heightened during the COVID-19 pandemic, and positions AI as a tool to support early detection, treatment planning, and remote patient monitoring. The review aims to comprehensively examine AI applications across the patient journey in three domains (diagnosis, monitoring, and intervention) while identifying limitations, challenges, and ethical issues. Research questions: (1) How is AI used in diagnosing mental illnesses, monitoring disease progression and treatment effectiveness, and conducting AI-assisted mental health interventions? (2) What are the limitations, challenges, and ethical concerns in the application of AI technologies in mental health?
Literature Review
Background literature notes AI's promise in medical imaging, documentation, and monitoring, with machine learning enabling prediction and categorization. Neural networks and deep learning have supported complex tasks including natural language processing and speech recognition for clinical documentation and conversations. In mental health, prior studies explored early detection, treatment planning, therapy session signal analysis, and continuous monitoring. Persistent barriers include representative data scarcity, data security concerns, fragmented formats, training resource limitations, and skepticism privileging clinical judgment over quantitative measures.
Methodology
Design: Systematic review following PRISMA guidelines; registered on PROSPERO (CRD42023388503).
Databases and timeframe: CINAHL, CCTR, PubMed, PsycINFO, and Scopus, from inception to August 2024 (search terms table provided; filters tailored per database).
Inclusion criteria: Studies using AI-assisted diagnosis tools, AI-monitored treatment effectiveness/prognosis, or AI-based interventions in mental health; studies had to report mental health outcomes.
Exclusion criteria: Studies focused primarily on dementia, ADHD, autism spectrum disorders, or drug abuse; non-English publications; systematic reviews, meta-analyses, protocols, book chapters, and conference presentations.
Domain definitions: Diagnosis: AI used to detect or predict the presence or risk of mental disorders and identify associated features (excluding subgroup classification after diagnosis). Monitoring: AI used to collect data for ongoing prognosis or to monitor treatment effects (excluding prediction prior to treatment initiation). Intervention: AI-assisted interventions (excluding studies using AI solely for data analysis or outcome prediction).
Outcomes: AI approaches; domain; presence/severity of disorders or symptoms; accuracy/effectiveness; applications, limitations, challenges, and ethical concerns.
Selection process: Two authors independently screened titles/abstracts and full texts; discrepancies were resolved by a third author.
Data extraction: Three authors extracted data per domain; one author verified and resolved discrepancies. Extracted items included AI approaches, tools, sample size, effectiveness, and limitations/challenges/ethical considerations; investigators were contacted for missing data.
Quality assessment: NHLBI tools for controlled intervention, observational cohort/cross-sectional, case-control, and before-after (pre-post) studies without a control group. Items were scored yes/no/other; overall quality was rated good/fair/poor by qualitative assessment, with two independent appraisers and a third as resolver.
PRISMA results: 842 records identified (CCTR 294; CINAHL 45; PsycINFO 86; PubMed 192; Scopus 225). Duplicates removed: 163. Titles/abstracts screened: 679; excluded: 425. Full texts screened: 254; excluded: 169 (various reasons). Included: 85 studies (Diagnosis 32; Monitoring 39; Intervention 13; one overlapping Diagnosis and Monitoring). Samples: Diagnosis target population n=327,625; Monitoring cumulative n=168,077; Intervention total n=2,816.
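The PRISMA flow above is simple subtraction at each stage. As an illustrative sanity check, the reported counts can be reproduced in a few lines of Python (every number below is taken directly from the review's PRISMA summary):

```python
# Sanity-check of the PRISMA flow arithmetic reported above.
database_records = {"CCTR": 294, "CINAHL": 45, "PsycINFO": 86,
                    "PubMed": 192, "Scopus": 225}

identified = sum(database_records.values())   # records identified: 842
screened = identified - 163                   # after duplicate removal: 679
full_text = screened - 425                    # after title/abstract screening: 254
included = full_text - 169                    # after full-text screening: 85

print(f"identified={identified}, screened={screened}, "
      f"full_text={full_text}, included={included}")
# → identified=842, screened=679, full_text=254, included=85
```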
Key Findings
• Overall: 85 studies included: Diagnosis (32), Monitoring (39), Intervention (13), with one study overlapping Diagnosis and Monitoring.
• Prevalent methods: Diagnosis relied most on support vector machines (SVM) and random forests (RF); Monitoring used machine learning spanning RF, SVM, elastic net, gradient boosting, and related models; Intervention studies most often used AI chatbots.
• Diagnosis: AI models detected, classified, and predicted risk across depression, schizophrenia, suicide, anxiety, bipolar disorder, OCD, and postpartum depression. Common predictors included demographics, clinical history, physiological data (EEG, HRV), MRI biomarkers, proteomics, and semantic content. Reported accuracies ranged from ~51% to 97.54%; single-modality models achieved satisfactory performance (e.g., EEG CNN accuracy 97.54%; SVM on HRV AUC ~0.74; NLP for suicidal language matched the gold standard at ~96.67%; MRI-based SVMs for schizophrenia vs. controls ~90% accuracy).
• Monitoring: Most studies focused on predicting treatment effectiveness/response (25/40), spanning pharmacologic treatments (SSRIs/SNRIs/TCAs, duloxetine, ketamine, omega-3), psychotherapy (CBT, internet-based), neuromodulation (rTMS), and digital/biobehavioral interventions. Predictors included symptom scales (HDRS, MADRS, BDI, PHQ-9, HADS, QIDS-SR16), demographics, medical history/comorbidities, psychosocial factors, smartphone/passive sensing, physiological metrics, genetics (SNPs, gene expression), EEG/fMRI, speech features, and treatment variables. Performance examples: suicide risk via smartphone (AUC 0.78); psilocybin response prediction (AUC up to 0.88); rTMS response in schizophrenia (balanced accuracy up to 94% in the active group); MDD mobile-sensing sequence model (AUC 0.65); precision-medicine antidepressant selection using pretreatment fMRI (R² 28–48%).
• Intervention: 10 chatbot studies plus 3 AI-assisted tools (a medication adherence app, a therapist support platform, and an AI robotic puppy). Outcomes most frequently used the PHQ-8/9 and GAD-7. Effectiveness was mixed: several trials reported significant reductions in anxiety/depression (e.g., Tess, Vitalk, Emohaa, Fido), while others showed limited or no advantage over usual care (ELME, some postpartum chatbots); combined in-person + AI care was sometimes superior to AI alone.
• Quality appraisal: 50 good, 34 fair, 1 poor (58.8% rated good overall). By domain: Monitoring ~69% good; Diagnosis ~56% good; Intervention ~38% good.
• Ethics: Emphasis on privacy, informed consent, and de-identification; support for high-risk participants (e.g., suicide risk); secure data storage and IRB approvals; caution about model opacity and bias.
• Quantitative highlights: PRISMA counts as above; diagnosis sample n=327,625; monitoring sample n=168,077; intervention sample n=2,816; reported diagnostic accuracies up to ~97.54%; multiple AUCs in the 0.70–0.93 range across tasks.
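Many of the monitoring results above are reported as AUC values (0.65–0.93). As a reminder of what that metric captures, the minimal sketch below computes AUC by its rank interpretation: the probability that a randomly chosen responder receives a higher model score than a randomly chosen non-responder. The scores are invented for illustration and do not come from any included study:

```python
def auc(pos_scores, neg_scores):
    """AUC via the Mann-Whitney U statistic: the fraction of
    (positive, negative) pairs in which the positive case wins
    (ties count as half a win)."""
    wins = sum((p > n) + 0.5 * (p == n)
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical model risk scores, for illustration only.
responders = [0.81, 0.74, 0.66, 0.90]   # e.g., treatment responders
non_responders = [0.35, 0.52, 0.70]     # e.g., non-responders

print(round(auc(responders, non_responders), 3))   # → 0.917
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is why the 0.78–0.94 figures above indicate useful but imperfect discrimination.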
Discussion
Findings across 85 studies indicate AI's utility throughout the mental health care continuum. In diagnosis, SVM/RF and other ML models achieved moderate-to-high accuracy in identifying disorders and risk, often using clinical, physiological, and imaging biomarkers, thereby supporting earlier detection and stratification. In monitoring, AI models predicted treatment response and prognosis across pharmacologic and psychotherapeutic modalities using multimodal predictors, enabling more personalized and adaptive care. In intervention, AI chatbots and platforms demonstrated variable but promising effectiveness in reducing symptoms and supporting adherence and therapy delivery, highlighting scalability potential and the need for optimization. Ethical considerations—privacy, consent, data security, and transparency—are central to responsible deployment, especially for high-risk populations. The synthesis addresses the research questions by mapping AI applications in diagnosis/monitoring/intervention and surfacing limitations/challenges that guide clinical integration, emphasizing data quality, interpretability, external validation, standardized protocols, and careful evaluation of benefit-risk.
Conclusion
This review demonstrates that AI methods can accurately detect and classify mental health conditions, predict treatment response, and assist interventions. Across 85 studies, machine learning—particularly SVM and RF—proved effective for diagnosis; diverse ML models supported monitoring of treatment effectiveness and prognosis; AI chatbots and platforms offered scalable intervention options with mixed but often positive outcomes. To translate promise into practice, future work should prioritize larger, diverse, and high-quality datasets; enhance transparency and interpretability; perform rigorous external validation; standardize experimental protocols; and strengthen ethical safeguards. These steps will improve clinical reliability, personalization, and resource allocation in mental health care.
Limitations
Review-level limitations: exclusion of conference papers may omit emerging AI advances; limited critical analysis of individual AI model architectures constrains deeper efficacy assessment; English-only inclusion reduces cultural and geographic generalizability. Study-level limitations across included works: small and imbalanced samples; limited dataset diversity; incomplete/missing data; confounding variables; model opacity; overfitting risks; limited external validation; trade-offs among performance metrics; cross-cultural implementation barriers; variable reporting quality (adherence, blinding, power calculations). These factors may affect interpretation and generalizability of findings.