Effectiveness of AI-Driven Conversational Agents in Improving Mental Health Among Young People: Systematic Review and Meta-Analysis

Y. Feng, Y. Hang, et al.

AI-driven conversational agents show promise as early interventions for youth depression: research conducted by Yi Feng, Yaming Hang, Wenzhi Wu, Xiaohang Song, Xiyao Xiao, Fangbai Dong, and Zhihong Qiao pooled 15 randomized trials (1,974 participants) and found a moderate-to-large effect on depressive symptoms, while effects for anxiety, stress, affect, and well-being were nonsignificant.

Introduction
Mental health challenges among adolescents and young adults are a growing global concern, affecting 10–20% of youth, with half of mental health disorders emerging before age 14 and 75% by age 25. These disorders are often underdiagnosed and undertreated, with long-lasting impacts on education, psychosocial functioning, and quality of life. The COVID-19 pandemic further increased depression, anxiety, and stress in this group. Concurrently, young people are highly engaged with digital technologies, offering opportunities for scalable, accessible digital mental health interventions that provide anonymity and flexibility compared to traditional therapy.
Early digital interventions (eg, internet-based CBT) face limitations such as limited interactivity, high dropout, and lack of personalization. Advances in AI, deep learning, and natural language processing (NLP) have enabled AI-driven conversational agents (CAs) that simulate human-like interaction, provide psychoeducation, and deliver therapy (eg, CBT). These systems vary by system orientation (general-purpose vs domain-specific), topic constraints (globally open vs vertical), and response patterns (free dialogue vs structured-guided), often integrating multiple approaches. NLP-enhanced systems can better understand context and user intent, enabling personalized therapeutic dialogues that may improve engagement and clinical outcomes.
Despite their growing use among adults, effectiveness for adolescents and young adults remains underexplored. Prior reviews often combined non-NLP and AI-driven CAs or mixed age groups, increasing heterogeneity. This meta-analysis aims to evaluate the effectiveness of AI-driven CAs in reducing mental health symptoms (particularly depression and anxiety) among young people aged 12–25 years and to explore moderators (eg, population and CA characteristics) influencing outcomes.
Literature Review
Previous reviews have frequently combined non-NLP digital interventions with AI-driven CAs or included both young and older adults, contributing to heterogeneity and limiting clarity on efficacy for youth. Earlier meta-analyses reported smaller effects for depression when non-NLP systems were included (Hedges g≈0.26–0.29). Recent work across age groups suggests AI-driven CAs, especially those leveraging NLP and machine learning, may provide greater flexibility, adaptability, and engagement, potentially yielding stronger effects for depression than non-NLP systems. Younger age has been associated with larger CA effects on depression in prior reviews, suggesting digital familiarity may enhance acceptability and effectiveness.
Methodology
Design: Systematic review and meta-analysis following PRISMA guidelines; not preregistered.
Databases and Search: PubMed, PsycINFO, EMBASE, Cochrane Library, and Web of Science searched from inception to August 6, 2024, using comprehensive terms for conversational agents and mental health outcomes. No filters were applied. Reference lists of included studies and prior reviews were hand-searched.
Eligibility (PICOS): Population: adolescents and young adults with mean age 12–25; clinical, subclinical, or nonclinical samples eligible. Intervention: AI-driven CAs using NLP or machine learning to guide conversational responses. Comparator: any control (eg, waitlist, treatment-as-usual, therapist-led). Outcomes: at least one mental health outcome with sufficient data for effect size calculation. Design: randomized controlled trials; English-language original research. Exclusions included non-AI/non-NLP CAs, non-RCTs, inadequate outcome data, non-English, and unpublished articles.
Screening and Selection: Two independent reviewers screened titles/abstracts and assessed full texts, resolving disagreements with a third reviewer. PRISMA flow: 14,909 records identified; 7,412 duplicates removed; 7,497 screened; 397 full texts assessed; 14 articles (15 RCTs) included.
Data Extraction: Authors, year, participant characteristics (sample size, gender, mean age), CA specifications (name, platform, interaction mode), intervention details (length, control type), and outcome measures were extracted by two independent authors and checked by a third.
Quality Assessment: Cochrane Risk of Bias tool assessed random sequence generation, allocation concealment, blinding (participants/personnel and outcome assessors), incomplete outcome data, and selective reporting. Overall study quality was suboptimal; many studies lacked reporting on blinding and registration.
Statistical Analysis: Posttest means, SDs, and sample sizes were used to compute Hedges g (a bias-adjusted effect size) under random-effects models. Intention-to-treat data were preferred when available. Alternative statistics (eg, Cohen d, t, F) were used when means or SDs were missing. Multiarm trials were combined per Cochrane guidance. Heterogeneity was assessed via Q and I²; a study was treated as an outlier when its 95% CI fell outside the pooled 95% CI. Leave-one-out sensitivity analyses were conducted.
Publication Bias: Assessed via funnel plots, Duval and Tweedie trim-and-fill, and Egger tests.
Moderator Analyses: Conducted when the overall effect size (ES) was significant and heterogeneity was present (P<.10 or I²>25%). Subgroup analyses used mixed-effects models; meta-regression used unrestricted maximum likelihood. Moderators: age, gender, intervention length, interaction mode, delivery platform, sample type (clinical/subclinical/nonclinical), and control group type.
Software: Visualization was performed in R and Review Manager; ES calculation was performed in Comprehensive Meta-Analysis v3 and Stata SE v15.1.
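The effect size and pooling steps described under Statistical Analysis can be illustrated with a minimal Python sketch. It is not the authors' pipeline (they used Comprehensive Meta-Analysis and Stata). The DerSimonian-Laird estimator for between-study variance is an assumption, since the review states only that random-effects models were used. The sign convention (positive g favors the intervention) and the trial values in `trials` are likewise hypothetical and chosen only for illustration.

```python
import math


def hedges_g(mean_ctrl, sd_ctrl, n_ctrl, mean_tx, sd_tx, n_tx):
    """Return (g, variance of g) from posttest summary statistics.

    Assumed sign convention: positive g means lower symptom scores
    in the intervention arm than in the control arm.
    """
    df = n_ctrl + n_tx - 2
    pooled_sd = math.sqrt(
        ((n_ctrl - 1) * sd_ctrl**2 + (n_tx - 1) * sd_tx**2) / df
    )
    d = (mean_ctrl - mean_tx) / pooled_sd           # Cohen d
    j = 1 - 3 / (4 * df - 1)                        # small-sample correction
    var_d = (n_ctrl + n_tx) / (n_ctrl * n_tx) + d**2 / (2 * (n_ctrl + n_tx))
    return j * d, j**2 * var_d


def pool_random_effects(effects, variances):
    """DerSimonian-Laird random-effects pooling with Q, I^2, and tau^2."""
    k = len(effects)
    w = [1 / v for v in variances]                  # inverse-variance weights
    fixed = sum(wi * g for wi, g in zip(w, effects)) / sum(w)
    q = sum(wi * (g - fixed) ** 2 for wi, g in zip(w, effects))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1 / (v + tau2) for v in variances]    # random-effects weights
    pooled = sum(wi * g for wi, g in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    i2 = max(0.0, (q - (k - 1)) / q) * 100 if q > 0 else 0.0
    return {
        "g": pooled,
        "ci": (pooled - 1.96 * se, pooled + 1.96 * se),
        "q": q,
        "i2": i2,
        "tau2": tau2,
    }


# Hypothetical trials: (control mean, SD, n, intervention mean, SD, n)
trials = [
    (14.2, 5.1, 40, 11.0, 4.8, 42),
    (12.5, 4.0, 60, 11.9, 4.2, 58),
    (16.0, 6.3, 25, 12.1, 5.9, 27),
]
per_study = [hedges_g(*t) for t in trials]
summary = pool_random_effects(
    [g for g, _ in per_study], [v for _, v in per_study]
)
print(summary)
```

A leave-one-out sensitivity analysis, as mentioned in the text, would amount to calling `pool_random_effects` repeatedly with one study dropped each time and comparing the pooled estimates.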
Key Findings
- Included studies: 14 articles (15 RCTs), involving 1,974 participants.
- Depression (posttest): After removing an outlier and adjusting for publication bias, AI-driven CAs showed a moderate-to-large effect on depressive symptoms (Hedges g=0.61; 95% CI 0.35–0.86); heterogeneity was reduced but remained significant (Q=17.76; I²=54.9%).
- Generalized anxiety: Nonsignificant after publication bias adjustment (g=0.06; 95% CI −0.21 to 0.32); heterogeneity remained.
- Stress: Nonsignificant (g=0.002; 95% CI −0.19 to 0.20); I²=0%.
- Positive affect: Nonsignificant (g=0.01; 95% CI −0.24 to 0.27); I²=63.1%.
- Negative affect: Nonsignificant after outlier removal and publication bias adjustment (g=0.07; 95% CI −0.13 to 0.27); heterogeneity reduced to nonsignificance.
- Mental well-being: Nonsignificant (g=0.04; 95% CI −0.21 to 0.29).
- Subgroup (depression): Sample type moderated effects (Qb=8.46, P<.05). Subclinical samples showed significant, larger effects (g=0.74; 95% CI 0.50–0.98) than nonclinical samples (g=0.04; 95% CI −0.38 to 0.46). The single clinical-sample study yielded g=0.91 (95% CI −0.11 to 1.94).
- Other moderators (depression): Interaction mode, delivery platform, intervention length, control group type, mean age, sex, publication year, and study quality did not significantly moderate effects.
- Publication bias: Generally minimal; trim-and-fill adjustments for anxiety and negative affect did not change conclusions; Egger tests largely nonsignificant.
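The between-subgroup test reported for sample type can be approximately reconstructed from the subgroup estimates listed above. The sketch below is a back-of-envelope check, not the authors' mixed-effects analysis. It assumes that all three subgroups entered the test, that standard errors can be recovered from the rounded 95% CIs with a normal multiplier of 1.96, and that Qb follows a chi-square distribution with (number of subgroups − 1) degrees of freedom.

```python
import math

Z_95 = 1.96  # normal multiplier assumed for the reported 95% CIs

# (label, g, CI lower, CI upper) as reported in the subgroup findings
subgroups = [
    ("subclinical", 0.74, 0.50, 0.98),
    ("nonclinical", 0.04, -0.38, 0.46),
    ("clinical", 0.91, -0.11, 1.94),
]


def q_between(groups):
    """Weighted sum of squared deviations of subgroup means from their
    inverse-variance-weighted grand mean; returns (Q_between, df)."""
    weights, estimates = [], []
    for _, g, lo, hi in groups:
        se = (hi - lo) / (2 * Z_95)     # back out the SE from the CI width
        weights.append(1 / se**2)
        estimates.append(g)
    grand = sum(w * g for w, g in zip(weights, estimates)) / sum(weights)
    q = sum(w * (g - grand) ** 2 for w, g in zip(weights, estimates))
    return q, len(groups) - 1


def chi2_sf_df2(q):
    """Upper-tail probability of a chi-square variable with 2 df."""
    return math.exp(-q / 2)


q_b, df_b = q_between(subgroups)
p_b = chi2_sf_df2(q_b)  # valid here because df_b == 2
print(f"Q_between = {q_b:.2f}, df = {df_b}, p = {p_b:.3f}")
```

Under these assumptions the statistic comes out close to the reported Qb=8.46 with P<.05; small differences are expected because the inputs are rounded.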
Discussion
The meta-analysis addresses whether AI-driven CAs improve mental health among adolescents and young adults. Results demonstrate that AI-driven CAs can significantly reduce depressive symptoms, particularly in subclinical populations, supporting their role as accessible, scalable early interventions in youth mental health. The nonsignificant effects for generalized anxiety, stress, positive/negative affect, and mental well-being suggest current CA designs may be more suited to depression-focused strategies (often CBT-based) and less equipped with behavioral components (eg, exposure) necessary for anxiety and stress. The lack of moderating effects from interaction mode, delivery platform, age, sex, and intervention length indicates robustness of depression benefits across technical and demographic variations, consistent with youths’ familiarity with diverse digital platforms. These findings align with prior work showing stronger CA effects on depression than other outcomes and suggest that NLP-enabled personalization and engagement may particularly benefit young users.
Conclusion
AI-driven conversational agents show robust effectiveness in reducing depressive symptoms among young people, with particularly strong effects in subclinical populations, highlighting their potential for early intervention. Their efficacy for anxiety, stress, and well-being outcomes is not yet robust, indicating a need to refine therapeutic capabilities—such as integrating exposure-based strategies—and to evaluate long-term outcomes. Continued advancements in AI and thoughtful integration with evidence-based therapy components may help bridge the youth mental health treatment gap. Future studies should include follow-up assessments, examine engagement and treatment fidelity, and identify user and CA features that optimize clinical impact.
Limitations
- Few studies reported follow-up outcomes, limiting assessment of long-term effectiveness.
- English-only inclusion may introduce selection bias and limit generalizability.
- Limited numbers of studies for certain outcomes (eg, stress, mental well-being) reduce statistical power.
- Heterogeneity due to diverse CA therapeutic orientations and system designs.
- Insufficient reporting on blinding, registration, and selective reporting in many trials; overall study quality suboptimal.
- Unclear user engagement and interaction patterns with CAs hinder fidelity assessment and identification of active therapeutic components.