“It happened to be the perfect thing”: experiences of generative AI chatbots for mental health


S. Siddals, J. Torous, A. Coxon

Generative AI chatbots such as ChatGPT may offer accessible, meaningful mental health support: users reported high engagement, improved relationships, and healing from trauma and loss, describing their experiences through themes of an 'emotional sanctuary', 'insightful guidance', and the 'joy of connection'. Participants compared the chatbots to human therapy and called for better safety guardrails, human-like memory, and greater capacity to lead the therapeutic process. This research was conducted by Steven Siddals, John Torous, and Astrid Coxon.
Introduction

Mental ill-health is widespread and growing, yet access to effective treatment remains limited globally. Digital mental health interventions (DMHIs) offer scalable, low-cost tools such as psychoeducation, mood tracking, and cognitive behavioral therapy (CBT), but the evidence shows small effect sizes, engagement challenges, and user dissatisfaction. Rule-based AI chatbots (e.g., Woebot, Wysa) have been shown to reduce depressive symptoms and build therapeutic alliance, but they are constrained by their scripts and often produce generic or frustrating responses. Generative AI chatbots (e.g., ChatGPT, Gemini, Pi), trained on large datasets, can produce human-like language, perform strongly on tasks relevant to mental health (e.g., cognitive reframing, relationship advice), and attract high user engagement. However, concerns include safety (hallucinations, bias, liability), unpredictability, and unclear clinical effectiveness, and there is a research gap in understanding real-world, unguided use of generative AI for mental health. This study uses semi-structured qualitative interviews and reflexive thematic analysis to explore how people currently experience generative AI chatbots for mental health and wellbeing in naturalistic settings, aiming to inform researchers, developers, and clinicians.

Literature Review

Prior work on DMHIs highlights modest effectiveness (small effect sizes, with possible publication bias), low sustained engagement, and mixed user experiences. Rule-based chatbots have shown promise: Woebot and Wysa can reduce depressive symptoms and foster therapeutic alliances comparable to those with human therapists, with better engagement and social support reported in qualitative analyses. Yet rule-based approaches are often perceived as scripted and limited, producing generic, repetitive, or constrained responses. Generative AI based on large language models (LLMs) introduces new capabilities: nuanced language understanding and generation, and strong performance on tasks such as medical dialogue, persuasive communication, theory of mind, making people feel heard, relationship guidance, and cognitive reframing. Early evidence suggests generative AI agents may outperform rule-based agents in reducing psychological distress, with promising pilot outcomes in inpatient settings. Nonetheless, the literature cautions about hallucinations, biases, safety risks (including crisis response), and unpredictability, and calls for clinician-in-the-loop and constrained, evidence-based implementations. The lack of qualitative research on real-world, unprompted use of generative AI for mental health motivates this study.

Methodology

Design: Qualitative study using semi-structured interviews followed by reflexive thematic analysis (Braun & Clarke).

Participant selection: Convenience sampling via user forums (Pi, Reddit, an IFS guide app), King's College London channels, and LinkedIn. Inclusion criteria: at least three separate conversations with an LLM-based generative AI chatbot about mental health or wellbeing, each lasting at least 20 minutes; age 16 or over; English proficiency; no geographic restrictions; no compensation offered.

Recruitment and consent: 35 people consented; 19 were interviewed.

Data collection: 19 online interviews conducted by SS between 10 January and 16 March 2024, lasting 49–112 minutes; recorded and auto-transcribed via Microsoft Teams (17 video, 2 audio-only); AC quality-checked the video of the first interview.

Data analysis: Inductive reflexive thematic analysis. SS reviewed the recordings, manually corrected the transcripts, and conducted line-by-line coding, yielding roughly 600 codes, which AC reviewed. Codes were organized into hierarchies of subthemes and themes, iteratively refined for clarity and coherence, integrating positive and negative aspects into unified themes; theme names were adjusted to communicate their essence. Two participants reviewed the outputs and suggested no corrections.

Ethics: Approved by the King's College London Health Faculties Research Ethics Subcommittee (HR/DP-23/24-40197); informed consent was obtained; data were anonymized; recordings and transcripts were securely stored during analysis and then deleted, with anonymized data archived.

Data availability: The theme/subtheme/code hierarchy is available at https://bit.ly/gen-AI-chatbots-mental-health; additional data are available on request to the corresponding author.

Key Findings

Participants (N=19; 12 male, 7 female; ages 17–60; eight countries; primarily Asian and Caucasian) reported high engagement and predominantly positive impacts from generative AI chatbots for mental health.

Usage patterns:
Platform: Pi (15), ChatGPT (3), other (2).
Focus area: anxiety and depression (6), stress and conflict (5), dealing with loss (3), romantic relationships (3), other (2).
Peak usage: fewer than 10 conversations (3), 10–50 conversations (4), several times per week (5), daily for up to 2 hours (5), daily for 2–5 hours (2).

Reported impacts included improved mood, reduced anxiety, healing from trauma and loss, better relationships, and support for ongoing therapy; some participants described the changes as life-changing. Four overarching themes emerged:

1) Emotional sanctuary. Chatbots felt understanding, validating, kind, nonjudgmental, and always available; yet guardrails sometimes interrupted support, felt rejecting, and prompted self-censorship, and listening quality varied (responses could be overlong, or advice offered prematurely).

2) Insightful guidance. Participants valued the advice, especially for relationship conflicts (perspective-taking, setting boundaries) and general self-care. Views on appropriate challenge were mixed: some found the chatbots insufficiently challenging, while others experienced supportive challenge. Trust also varied, ranging from skepticism (hallucinations, unsatisfying advice) to high confidence.

3) Joy of connection. Participants described awe and enjoyment; companionship reduced loneliness; chatbots were perceived as less scripted and more connecting than rule-based apps; some felt the chatbots helped them open up to other people. Desired interface improvements included accessibility, emotion and voice recognition, and rich media such as avatars, VR, and conversation visualizations.

4) The AI therapist? Chatbots augmented human therapy (session preparation, alignment with therapist guidance) and sometimes facilitated starting therapy; they were also used when human therapy was unavailable or insufficient. Perceived limits relative to human therapy included empathy, connection, and a sense of commitment; the inability to lead the therapeutic process (holding users accountable, guiding them through intense emotions) and the lack of human-like memory were key constraints. Creative therapeutic uses included role-play, symbolic imagery, and multi-voice support for processing relationships and loss.

Discussion

These findings address the research question by revealing how people use and experience generative AI chatbots for mental health in real-world settings: users report meaningful, often frequent engagement, a perceived emotional sanctuary, actionable guidance (especially for relationships), and enjoyable connection, with uses that complement or partially substitute for human therapy. Compared with rule-based chatbots, generative AI was perceived to deliver deeper understanding, broader and higher-quality advice, greater flexibility, and novel creative therapeutic uses (role-play, symbolic imagery), though rule-based tools may be more predictable and explainable. Safety considerations are nuanced: participants did not report the overtly harmful or narcissistic behaviors noted in early accounts of generative AI, but guardrails, especially in crisis contexts, sometimes disrupted support, felt rejecting, and limited perceived benefit; emerging evidence suggests generative AI can assist in crisis response if not overly constrained. The results underscore the need for balanced safety frameworks, rigorous effectiveness research (randomized controlled trials against active controls, and longitudinal studies), and attention to moderators such as users' perceptions of AI capabilities and limits. Practical implications: developers should improve listening (hesitancy before giving advice, brevity, interruptibility), build human-like memory and user modeling so chatbots can lead and structure therapeutic processes, and enhance multimodal interfaces; clinicians should become familiar with these tools so they can discuss patient use, leverage their augmentative potential, and address concerns. Accessibility and sustainable business models remain challenges for scaling.

Conclusion

Generative AI chatbots may provide meaningful mental health support, with high engagement and perceived benefits across emotion processing, relationships, mood, and trauma recovery, offering experiences that differ from, and sometimes exceed, those of traditional DMHIs. Participants valued the emotional sanctuary, insightful guidance, and enjoyable, connecting interactions, while identifying gaps in safety guardrails, memory, and the ability to lead therapeutic processes. Future research should rigorously evaluate effectiveness against active controls, explore nuanced safety approaches (especially in crises), and investigate moderators such as users' understanding of AI. Development priorities include improved guardrails, listening and conversational skills, persistent memory and user modeling, accountability structures, and richer multimodal interfaces. If these gaps are addressed, generative AI chatbots could become a scalable component of solutions to the mental health treatment gap.

Limitations

Convenience sampling and self-selection may bias the sample towards tech-savvy, well-educated participants from high-income countries with milder conditions, limiting generalizability and potentially over-representing positive experiences; many populations were not represented. Reflexive thematic analysis introduces subjectivity, especially with a sole primary analyst (SS), although reviews by AC and JT aimed to enhance rigor. The qualitative design cannot establish causal effectiveness, and the rapid evolution of generative AI tools may change user experiences over time.
