Psychology
“It happened to be the perfect thing”: experiences of generative AI chatbots for mental health
S. Siddals, J. Torous, et al.
Mental ill-health is widespread and growing, with nearly one billion people affected globally and a large treatment gap, particularly in low- and middle-income countries. Digital mental health interventions (DMHIs) have aimed to address access barriers with scalable tools (e.g., psychoeducation, CBT programs), yet evidence shows only small effects, publication bias, and poor sustained engagement. Rule-based chatbots (e.g., Woebot, Wysa) can simulate conversation, and have shown some efficacy and capacity to form a therapeutic alliance, but users report generic, constrained responses, and effects are small and not durable. Advances in generative AI (large language models such as ChatGPT, Gemini, Pi) offer new capabilities in language understanding, empathy-like responses, reframing, and persuasive communication, with unprecedented user adoption. However, concerns remain around safety (hallucinations, bias, unpredictability), liability, and clinical effectiveness. Real-world research on how people independently use generative AI chatbots for mental health remains scarce. This study uses semi-structured interviews and reflexive thematic analysis to explore how individuals actually experience generative AI chatbots for mental health and wellbeing in unprompted, real-world settings, with the aim of informing researchers, developers, and clinicians.
Prior DMHI research indicates modest benefits with engagement challenges. Rule-based chatbots have demonstrated improvements in depression and anxiety and formation of therapeutic alliances, but meta-analyses suggest effects are small and often unsustained, and users report impersonal or repetitive interactions. Generative AI chatbots trained on large datasets exhibit strong performance across domains relevant to mental health support (e.g., diagnostic dialogue, making people feel heard, relationship advice, cognitive reframing) and have rapidly scaled to massive user bases. Early evidence includes a meta-analysis suggesting generative AI may reduce psychological distress more effectively than rule-based tools, as well as a pilot in psychiatric inpatient care. Nonetheless, the literature highlights safety and trust concerns: hallucinations, bias, lack of interpretability, and risks in crisis contexts, with calls for constrained, clinician-in-the-loop deployments and the need to demonstrate clinical effectiveness. Qualitative work has begun (forum analyses, student surveys, outpatient trials), but in-depth interviews exploring unguided, real-world usage remain scarce, motivating the present study.
Study design: Qualitative study using semi-structured interviews and reflexive thematic analysis to explore real-world experiences of using generative AI chatbots for mental health.
Participant selection: Convenience sampling via user forums (Pi, reddit, IFS Guide app), King's College London channels, and LinkedIn. Inclusion criteria: at least three separate conversations with an LLM-based generative AI chatbot about mental health/wellbeing, each ≥20 minutes; age ≥16; English-speaking. No geographic restrictions; no compensation. Of 35 people who consented, 19 booked and completed interviews.
Data collection: One interviewer (SS) conducted all 19 online interviews (Microsoft Teams) between January 10 and March 16, 2024, lasting 49–112 minutes; 17 were recorded with video and 2 audio-only, auto-transcribed and then manually corrected. A topic guide was developed (reviewed by AC) and piloted; questions probed first experiences, goals, usage patterns, likes/dislikes, perceived impacts, desired improvements, and comparisons to other mental health approaches. AC quality-checked the first interview video.
Data analysis: Inductive reflexive thematic analysis (Braun & Clarke). SS reviewed recordings, corrected transcripts, and iteratively coded line-by-line, generating ~600 codes; AC reviewed the codes. SS developed initial themes/subthemes from cross-transcript patterns, with feedback from AC and JT. Themes were refined for coherence and narrative fit, and naming was adjusted to convey their essence. The mapping of transcripts to codes and themes was managed in Excel using utilities built by SS. Two participants reviewed their transcripts and the derived themes; no corrections were requested.
Reflexivity: SS has backgrounds in computer science, mathematics, and the psychology/neuroscience of mental health, plus personal experience using and developing generative AI for wellbeing. AC researches technology in healthcare and is a psychotherapist; JT directs a digital psychiatry division. These positions informed sensitivity to both technological potential and clinical considerations.
Ethics: Approved by King's College London Health Faculties Research Ethics Subcommittee (HR/DP-23/24-40197). Informed consent obtained. Data anonymised; identifiable data securely stored and then deleted.
Data availability: The hierarchy of themes/subthemes/codes is available online; additional data on request.
Participants and use: N=19 (12 male, 7 female), ages 17–60, living across eight countries in Europe, North America, and Asia; primarily Asian and Caucasian. Topics included anxiety, depression, stress, conflict, loss, and romantic relationships. Platforms: Pi (n=15), ChatGPT (n=3), and others, including Copilot, Kindroid, and ChatMind (n=2). Many used chatbots several times a week to daily (up to 2–5 hours for some).
Impacts: Most reported positive life impacts: improved mood, reduced anxiety, improved relationships, healing from trauma/loss, and support for ongoing therapy. Some described the effects as life-changing; one participant reported negligible impact during intense emotions.
Themes:
- Emotional sanctuary: Chatbots were experienced as understanding, validating, patient, kind, non-judgmental, always available, and lower-risk than sharing with other people, which helped participants cope with difficult emotions and supported personal growth. Frustrations included overly long or irrelevant responses and premature advice-giving. Safety guardrails sometimes disrupted support and felt limiting or rejecting in moments of vulnerability (e.g., repeated crisis redirections instead of empathic engagement).
- Insightful guidance: Participants valued advice, especially on relationships: gaining perspective on others, setting boundaries, and practical self-care (breathing, meditation, slowing down). Some felt supportively challenged by the chatbot; others felt it rarely challenged them. Users suggested improvements to interfaces and features (emotion recognition, multi-modal cues, avatars/VR, conversation visualization).
- Joy of connection: Many found interactions enjoyable or awe-inspiring, reporting companionship, feeling less alone, and even increased readiness to open up to people. Compared with rule-based apps, generative AI felt less scripted and more connected.
- The AI therapist?: Trust in advice was mixed; some participants were skeptical because of hallucinations or unsatisfying suggestions, whereas others viewed the guidance as credible or aligned with their human therapist's. Chatbots often complemented therapy (e.g., preparing for sessions) and in some cases encouraged initiating therapy; barriers to human therapy (cost, availability) led some to rely on chatbots instead. Many felt chatbots still fell short of human empathy and of a human therapist's leadership in therapy. Key limitations included difficulty leading the therapeutic process (e.g., accountability, structured follow-up) and the lack of persistent memory to build a model of the user over time. Creative therapeutic uses emerged (role-play, symbolic imagery, multi-voice dialogues), enabling experiential processing (e.g., simulated healing conversations).
Safety and crisis response: Participants did not report the harmful or narcissistic behaviors highlighted in early chatbot reports. Several found meaningful crisis support when guardrails were not triggered; however, guardrail-triggered responses could feel like rejection. The authors suggest crisis protocols may need more nuanced calibration.
Overall: High engagement and satisfaction for most, with novel benefits over rule-based chatbots (a sense of being deeply understood, breadth and quality of advice, creative flexibility), yet clear needs for improved listening, memory, process leadership, and balanced safety mechanisms.
The study answers how people currently experience using generative AI chatbots for mental health in real-world, unguided contexts. Participants reported frequent use and meaningful benefits: emotional sanctuary, practical and relationship-focused guidance, enjoyable connection, and complementarity with human therapy. These findings echo known advantages of rule-based chatbots (availability, non-judgmental listening, cognitive reframing) but also highlight potentially novel qualities of generative AI: a stronger felt sense of being understood, broader and higher-quality advice, and flexible, creative therapeutic modalities (role-play, imagery).
Safety considerations emerged as complex. Contrary to early cautionary narratives, participants did not report egregious harmful behaviors. Instead, the most distressing experiences were rejections via safety guardrails during moments of vulnerability. Some participants experienced helpful crisis support when guardrails were not activated, aligning with emerging evidence that generative AI can reduce suicidal ideation in certain contexts. This suggests that crisis handling should move beyond blanket scripted redirections to more nuanced, balanced approaches that leverage model capabilities while mitigating risks.
Implications: Research should rigorously assess effectiveness relative to active controls (traditional DMHIs, human psychotherapy) across populations and conditions, potentially via RCTs of standardized interventions (e.g., CBT) and/or large-scale longitudinal studies harnessing the low cost of AI. Investigating how users' understanding of AI's nature and limits moderates benefits and risks could inform education and guidance. Developers should improve listening (shorter, interruptible responses; better turn-taking), build human-like memory and user models to support continuity and accountability, enable chatbots to lead therapeutic processes where appropriate, and explore richer multimodal interfaces. Accessibility and sustainable business models remain challenges; targeted solutions and digital navigator roles may help. Clinicians should cultivate informed awareness of AI tools, discuss patients' use of them, and consider integrative approaches that avoid stigmatizing AI use in therapy.
Generative AI chatbots can provide meaningful mental health support marked by emotional sanctuary, insightful guidance (especially for relationships), and enjoyable connection, with many users reporting positive life impacts and high engagement. Compared with rule-based chatbots, generative systems may offer a deeper felt understanding, broader advice, and more creative therapeutic opportunities, while still falling short of human empathy and process leadership. To realize safe, effective, and equitable benefits, future work should rigorously evaluate clinical effectiveness, refine crisis guardrails with nuance, and focus development on better listening, persistent memory, and proactive therapeutic guidance. If these challenges are addressed, generative AI chatbots could become a scalable component of the mental health care ecosystem.
Findings are limited by convenience sampling and self-selection; participants were largely tech-savvy, well-educated, and from high-income settings, and many focused on milder conditions, reducing generalizability and possibly biasing the findings towards positive experiences. Important perspectives from underserved populations and from those for whom the technology fails may be underrepresented. Reflexive thematic analysis involves interpretive subjectivity, and coding was carried out by a single primary analyst (SS); reviews by AC and JT and opportunities for participant feedback aimed to bolster rigor.