Medicine and Health

The evolving field of digital mental health: current evidence and implementation issues for smartphone apps, generative artificial intelligence, and virtual reality

J. Torous, J. Linardon, et al.

Explore how smartphone apps, virtual reality, and generative AI (including large language models) are reshaping digital mental health, from digital phenotyping to real-world implementation. This paper, by John Torous, Jake Linardon, Simon B. Goldberg, Shufang Sun, Imogen Bell, Jennifer Nicholas, Lamiece Hassan, Yining Hua, Alyssa Milton, and Joseph Firth, outlines five themes to boost evidence, engagement, equity, and scalable care.

Introduction

This special article reviews how digital mental health has evolved beyond synchronous telehealth in the post–COVID-19 era and examines whether emerging technologies can increase access and improve care quality. The authors note that telehealth utilization has declined from pandemic peaks and, by relying on clinician time, is constrained in scalability. Asynchronous tools—smartphone apps, virtual reality (VR), and generative AI/large language models (LLMs)—offer flexible self-help, coach-guided, or clinician-supported options, but robust real-world evidence and standards are lacking. The paper’s purpose is to synthesize current evidence and implementation challenges across five themes: (1) technological advances (digital phenotyping, VR, generative AI/LLMs); (2) clinical outcomes of smartphone apps across conditions; (3) engagement barriers and solutions; (4) implementation barriers, facilitators, and strategies; and (5) equity considerations for marginalized populations and low- and middle-income countries (LMICs). The overarching aim is to chart a roadmap for rigorous, generalizable science and scalable, blended-care models that integrate human support with digital tools.

Literature Review

The review collates recent evidence across major domains of digital mental health:

Technological advances: Digital phenotyping using smartphone sensors and ecological momentary assessment shows promising clinical validity for mood and psychotic disorders, with replication in relapse detection and symptom prediction. However, the field lacks standards for data collection/processing and external validation. VR has strong evidence as augmentation to CBT (VR-CBT), generally outperforming waitlist/psychoeducation in anxiety and comparable to traditional CBT for social anxiety, psychosis, PTSD, and phobias, though active-control differences are often nonsignificant and heterogeneity is high. Generative AI/LLMs are rapidly advancing with multimodal capabilities and potential use cases across prevention, risk detection, diagnosis, treatment optimization, and documentation; yet training-data opacity, bias, hallucinations, and lack of standardized evaluation remain critical limitations.
Smartphone apps: Marketplace counts exceed 10,000 apps, yet only a small fraction of public apps have empirical support. Adverse events are underreported; negative effects can reach ~20% in severe mental illness trials. Well-being apps yield modest benefits, and the quality of evidence across the marketplace is low. For depression/anxiety, a meta-analysis of 176 RCTs shows small but significant effects versus controls; CBT-based apps and those with chatbots/mood monitoring may have larger effects. For major depression, adjunctive apps confer small added benefits over standard care. Evidence for bipolar disorder monitoring apps is mixed to negative, with some signals of harm; some self-management apps (e.g., LiveWell) show symptom and quality-of-life improvements without reducing relapse. In schizophrenia/psychosis, RCTs show mixed outcomes with some null findings against digital shams; meta-analytic effects are minimal but improve with human support. Eating disorder app evidence is limited and marketplace quality is low; some blended CBT programs show symptom reductions, and pilots of just-in-time adaptive interventions (JITAIs) show feasibility but are underpowered to establish efficacy. Substance use apps show mixed results: adjunctive smoking apps can show moderate effects, while standalone effects are uncertain; cannabis pilots are positive; alcohol-use app benefits are uncertain; JITAI evidence is mixed and underpowered.
Engagement: Real-world engagement is poor (median daily open rate ~4%, 30-day retention ~3%). Barriers include usability, lack of personalization, privacy concerns, low digital literacy, time constraints, and poor integration into daily life. Solutions include personalization, culturally tailored design, secure data practices, acceptance-facilitating interventions, human support (therapists/coaches/digital navigators), JITAIs, digital literacy training, and leveraging social influence while remaining mindful of its potential downsides.
Implementation: Barriers occur at practitioner (skills, confidence, perceived coldness/safety concerns), service (workflow fit, interoperability, infrastructure, staffing), and system levels (policy, reimbursement, regulation). Facilitators include clinician co-design, training, leadership support, adequate resources, new roles (digital navigators), and evolving regulatory frameworks (e.g., UK DTAC, Australian NSQDMH Standards, German DiGA, US FDA guidances, potential Medicare codes). Hybrid effectiveness–implementation designs and tailored implementation strategies (e.g., ImpleMentAll) are highlighted.
Equity: Non-tailored apps risk exacerbating disparities. Culturally adapted digital interventions for racial/ethnic minorities show large effects (g≈0.90) versus waitlist/TAU but high attrition (~42%); research gaps exist for Black and Indigenous groups. Tailoring and participatory methods are essential. In LMICs, evidence is growing but limited; cultural adaptation frameworks exist; digital tools can also train and supervise lay providers. Digital interventions show promise for conflict-affected populations, though contextual adaptation is often insufficient.
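
The g≈0.90 reported above is a standardized mean difference (Hedges' g). As a rough illustration of how such a value is derived from two-group summary statistics, the following is a minimal Python sketch. The function name and the example numbers are hypothetical and are not taken from the review; the sketch assumes the common approximation J = 1 - 3/(4*df - 1) for the small-sample correction and a pooled standard deviation.

```python
import math


def hedges_g(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Standardized mean difference (group a minus group b) with
    the usual small-sample correction. Illustrative helper only."""
    df = n_a + n_b - 2
    # Pooled standard deviation across the two groups.
    pooled_sd = math.sqrt(((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / df)
    d = (mean_a - mean_b) / pooled_sd       # Cohen's d
    correction = 1 - 3 / (4 * df - 1)       # approximate correction factor J
    return d * correction


# Hypothetical symptom scores (lower is better): control group mean 14.5,
# intervention group mean 10.0, both with SD 5 and n = 50.
g = hedges_g(14.5, 5.0, 50, 10.0, 5.0, 50)
print(round(g, 2))
```

Passing the control group first makes a positive g indicate lower symptoms in the intervention group; whether the cited studies follow that sign convention is not stated in this summary.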

Methodology

This is a narrative, integrative review that synthesizes recent advances and evidence across five thematic areas: (1) smartphones/digital phenotyping; (2) virtual reality; (3) generative AI/LLMs; (4) smartphone app interventions by condition; (5) engagement, implementation, and equity. The authors summarize findings from meta-analyses, randomized controlled trials, pilot and feasibility studies, naturalistic usage analyses, implementation frameworks (e.g., CFIR), and policy/regulatory developments. No formal systematic search protocol or preregistration is reported; instead, the review aggregates current research trends, representative studies, and exemplars to outline evidence strength, gaps, and directions for rigorous evaluation and real-world implementation.

Key Findings

Telehealth contraction and the need for scalability: Telehealth visits in 2024 were <50% of their COVID-19 peak, underscoring the need for scalable asynchronous tools.
Digital phenotyping: Feasible and promising for relapse detection in schizophrenia and symptom prediction in mood disorders; field constrained by lack of standards for data collection/feature extraction and limited external validation.
Virtual reality: VR-CBT is superior to waitlist/psychoeducation for anxiety; generally comparable to traditional CBT for social anxiety, psychosis, PTSD, and specific phobias. Active-control superiority is often nonsignificant; scaling is challenged by hardware, development, and training costs.
Generative AI/LLMs: Potential across prevention, risk detection (including suicidality), assessment, treatment optimization, crisis support, therapy augmentation, and clinician documentation/training. Key risks include bias, hallucinations, data opacity, lack of standardized evaluation, and safety concerns demonstrated in real-world incidents.
Apps for well-being: RCTs show modest improvements (e.g., quality of life, positive affect, mindfulness, psychological flexibility). Only a small fraction of public apps have empirical support; existing trials often have small samples and risk of bias.
Depression/anxiety self-management: A meta-analysis of 176 RCTs shows small but significant symptom reductions versus controls. CBT-based content, chatbots, and mood monitoring are associated with larger effects. Human guidance increases effect sizes.
Mood disorders clinical management: For major depression, adjunctive apps confer small added benefits over standard care. For bipolar disorder, monitoring apps show no consistent symptom benefits and possible risks (e.g., increased depressive/manic episodes in some trials); some self-management apps improve depressive symptoms and relational quality of life without reducing relapse.
Schizophrenia/psychosis: RCT results are mixed, with some null findings versus digital shams. Meta-analysis suggests minimal effects overall, which improve with human support. Marketplace availability and quality for psychosis-focused apps are limited.
Eating disorders: Marketplace quality is low; a blended CBT program reduced symptoms and distress; app-augmented monitoring may reduce dropout; JITAIs show feasibility but inconclusive efficacy in small pilots.
Substance use disorders: Adjunctive smoking cessation apps show moderate added effects; standalone effects are uncertain; benefits improve when paired with pharmacotherapy. Cannabis-use pilots are positive; alcohol-use app efficacy is uncertain. JITAI evidence is mixed and underpowered.
Adverse events and safety: Negative effects are underreported and reach up to ~20% in some severe mental illness trials. Safety and harm assessment need standardization.
Engagement: Real-world engagement is low: the median daily open rate is ~4% and 30-day retention is ~3%; nearly 50% of users may not return after first use; one-third of sessions can be ≤10 seconds; and <1% of sessions occur after 3–12 months of inactivity.
Engagement solutions: Personalization, flexible integration into daily life, clear privacy/data protections, acceptance-facilitating interventions, human support (therapists/coaches/digital navigators), JITAIs, digital literacy training, and appropriate social features.
Implementation: Barriers at clinician, service, and system levels; facilitators include co-design, training, leadership, resources, interoperability plans, and new roles. Regulatory frameworks (DTAC, NSQDMH Standards, DiGA, FDA guidances) and hybrid trials support translation.
Equity and LMICs: Culturally adapted digital interventions for racial/ethnic minorities show large effects (g≈0.90) with high attrition; research gaps for Black/Indigenous populations. AI can increase referrals among underserved groups but risks bias if trained on skewed data. In LMICs, evidence is growing; digital tools can train lay providers and support conflict-affected populations, but cultural/linguistic adaptation and scalability are essential.
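
To make the engagement figures above concrete, here is a minimal Python sketch of one way a day-30 retention rate could be computed from raw usage logs. This summary does not say how the cited ~3% figure was calculated, so the definition used here (the share of installers who open the app exactly 30 days after installing) is an assumption, and the function and variable names are hypothetical.

```python
from datetime import date


def day_n_retention(installs, opens, n=30):
    """Share of installing users with at least one app open exactly
    n days after their install date.

    installs: dict mapping user id -> install date
    opens:    iterable of (user id, date of app open) pairs
    """
    if not installs:
        return 0.0
    retained = {
        user
        for user, opened_on in opens
        if user in installs and (opened_on - installs[user]).days == n
    }
    return len(retained) / len(installs)


# Hypothetical log: four installers, two of whom open the app on day 30.
installs = {
    "a": date(2024, 1, 1),
    "b": date(2024, 1, 1),
    "c": date(2024, 1, 5),
    "d": date(2024, 1, 10),
}
opens = [
    ("a", date(2024, 1, 31)),
    ("b", date(2024, 1, 2)),
    ("c", date(2024, 2, 4)),
    ("a", date(2024, 1, 31)),  # duplicate opens count once per user
]
rate = day_n_retention(installs, opens)
```

Other definitions (for example, any open on or after day 30, or within a window around it) would give different numbers, which is one reason standardized engagement metrics matter when comparing apps.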

Discussion

The synthesis demonstrates that asynchronous digital tools can augment and extend mental health care beyond clinician-limited telehealth, but real-world impact depends on both scientific rigor and implementation quality. Evidence supports small but reliable benefits of self-management apps for depression/anxiety and comparable effectiveness of VR-CBT to traditional CBT in several conditions; however, other areas (bipolar disorder, psychosis, eating disorders, substance use) show mixed or preliminary evidence. Persistent engagement challenges and underreported adverse events limit effectiveness outside trials. Findings emphasize that human support (e.g., therapists, coaches, digital navigators) meaningfully enhances outcomes and engagement, arguing for blended or hybrid care models. Implementation science highlights multilevel barriers (clinician skills/attitudes, workflow fit, infrastructure, policy/reimbursement) and points to co-design, training, leadership, and evolving regulatory frameworks as enablers of scale. Equity analyses show that non-tailored digital tools can exacerbate disparities, while culturally adapted interventions and participatory approaches can produce large benefits. Across technologies—particularly LLMs—standardized evaluation, transparency, and safety protocols are crucial to mitigate bias and harm. Overall, the review addresses its aim by delineating what works, for whom, and under what conditions, proposing a roadmap that couples rigorous, generalizable research (e.g., digital placebos, factorial and hybrid designs, external validation) with practical implementation strategies to achieve sustainable, equitable impact.

Conclusion

Digital mental health now spans smartphone apps, VR, and generative AI/LLMs, with evidence of benefit in several areas but limited translation to routine care at scale. Two central imperatives emerge: (1) strengthen generalizable science through rigorous designs (e.g., digital placebos, factorial trials), standardized metrics, transparent data/model reporting, external validation, and systematic harm assessment; and (2) ensure real-world impact via co-designed, personalized, hybrid models that integrate human support, address engagement and safety, and leverage implementation science for clinician training, workflow integration, interoperability, and policy-aligned reimbursement. Equity requires culturally tailored interventions, participatory design, and attention to digital access and literacy, particularly for marginalized populations and LMICs. Future research should prioritize mechanisms of change, adaptive/personalized interventions (e.g., JITAIs, digital phenotyping-driven precision), clinician-facing AI tools, and hybrid effectiveness–implementation trials to enable safe, scalable, and sustainable deployment.

Limitations

Narrative synthesis without a formal systematic search may introduce selection bias and preclude quantitative meta-analytic estimates across all topics.
Many cited studies are pilots or early-phase trials with small samples, heterogeneity in designs, inadequate control conditions (e.g., lack of digital placebos), and limited follow-up, constraining generalizability.
Adverse events are inconsistently measured and reported in digital interventions, limiting safety conclusions.
Digital phenotyping lacks standardized data collection/processing and external validation; wearable and smartphone heterogeneity complicates comparability.
VR evidence shows heterogeneity and limited superiority over active controls; consumer-ready, scalable applications remain scarce.
LLM research is nascent, with opaque training data, unstandardized evaluation, and risks of bias/hallucinations; many findings lack replication.
Evidence is mixed or limited for bipolar disorder, psychosis, eating disorders, and non-tobacco substance use apps; equity research gaps persist, especially for Black and Indigenous populations.
Implementation evidence on which strategies work best, and under what conditions, remains limited beyond identification of barriers/facilitators.

