Introduction
The global mental health crisis necessitates accessible and effective interventions. An estimated 970 million people lived with mental disorders in 2019, a 48% increase since 1990. Access to care remains severely limited: only 23% of individuals with depression receive adequate treatment in high-income countries, and even fewer do in low- and middle-income countries. Digital mental health interventions (DMHIs) offer a promising solution, but their effectiveness has been limited by small effect sizes and low user engagement. Rule-based AI chatbots show some promise, improving depression symptoms and building therapeutic alliances, but users often report frustration with generic and repetitive responses. Generative AI chatbots, trained on vast datasets, offer significant advances in language understanding and generation, potentially addressing the limitations of rule-based chatbots. Although these chatbots demonstrate impressive user engagement and human-level performance on various communication tasks, research on their real-world use in mental health remains limited. This qualitative study aims to fill this gap by exploring individuals' experiences of using generative AI chatbots for mental health support in unprompted, real-world settings.
Literature Review
Existing research on rule-based AI chatbots in mental health demonstrates their potential to improve symptoms and build therapeutic alliances, but their effectiveness is limited by issues such as generic responses and low user engagement. Meta-analyses indicate that therapeutic effects are often small and not sustained over time. Users have expressed frustration with the limitations of pre-defined scripts and algorithms. In contrast, generative AI chatbots, such as ChatGPT, offer significant potential due to their ability to understand and generate human-like language based on extensive training data. Studies suggest they may outperform rule-based chatbots in reducing psychological distress, but more research is urgently needed to validate these findings and explore real-world usage patterns and potential risks. Qualitative studies on user experiences with generative AI chatbots have been limited, with most focusing on thematic analyses of user forum comments or student survey responses. This study addresses the gap by employing semi-structured interviews and reflexive thematic analysis to investigate real-world experiences of using generative AI chatbots for mental health and wellbeing.
Methodology
This qualitative study used semi-structured interviews to explore the experiences of 19 participants who had used generative AI chatbots for mental health and wellbeing. Convenience sampling was employed, recruiting participants through user forums, university channels, and LinkedIn. Participants were required to have engaged in at least three 20-minute conversations with an LLM-based chatbot on mental health topics. A semi-structured interview guide, piloted beforehand, covered participants' initial experiences, goals for using the chatbots, frequency and duration of use, positive and negative aspects, impact on daily life, suggestions for improvement, and comparisons with other mental health approaches. Interviews were conducted online and auto-transcribed. Data analysis followed Braun and Clarke's reflexive thematic analysis approach. The first author (SS) conducted all interviews and performed the initial coding and thematic analysis, while a second author (AC), an expert in qualitative methods, reviewed the coding and themes. The final themes were refined through iterative discussion among the authors.
Key Findings
Nineteen participants (12 male, 7 female), aged 17-60 and from eight countries, took part. Most used Pi (Inflection AI), while some used ChatGPT (OpenAI) and other platforms. A majority used chatbots several times a week and reported positive impacts, including improved relationships, healing from trauma and loss, and improved mood. Four overarching themes emerged:
1. **Emotional Sanctuary:** Participants found the chatbots understanding, validating, patient, kind, non-judgmental, and always available. This created a safe space for processing emotions, even leading to significant life changes for some. However, frustration arose due to irrelevant responses, the chatbots jumping to solutions before fully hearing the user, and safety guardrails that felt limiting or rejecting during vulnerable moments.
2. **Insightful Guidance:** Participants valued the guidance and advice received, particularly on relationship issues. The chatbots helped some understand different perspectives, set healthier boundaries, and gain clarity in complex situations. Advice on self-care, reframing negative thoughts, and managing anxiety was also highly valued. However, some participants questioned the chatbots' ability to challenge inappropriate behaviors: some felt the chatbots did not challenge them enough, while others found themselves challenged in a supportive way.
3. **Joy of Connection:** Many participants enjoyed using the chatbots and felt a sense of awe, companionship, and reduced loneliness. Some preferred the chatbots to human companions due to accessibility and safety. The experience helped some connect more easily with other people.
4. **The AI Therapist?:** Trust in the chatbots' guidance was mixed, with some participants expressing skepticism and others high levels of trust. Many compared and contrasted the chatbot experience with human therapy, often using the chatbots to augment their existing therapy. Limitations included the chatbots' inability to lead the therapeutic process, their lack of memory, and their inability to hold users accountable for change. Some participants described creative and flexible uses of the chatbots, such as role-playing, creating imagery, and utilizing fictional characters for therapeutic benefit.
Discussion
This study reveals that generative AI chatbots offer meaningful mental health support, with high engagement and positive impacts reported by participants. Several themes echoed findings on rule-based chatbots, such as the provision of a non-judgmental listening ear and support in reframing negative thoughts. However, other themes, such as the profound sense of being understood, the quality and breadth of advice, and the creative uses of the technology, appear more unique to generative AI. This research highlights the potential for generative AI chatbots to provide accessible and personalized support, particularly for individuals who lack access to traditional therapy. However, it also underscores the need for a more nuanced approach to safety than simply relying on pre-scripted responses to crisis situations. The study suggests that generative AI's potential to provide support in crises should be explored further and that overly restrictive safety protocols may have detrimental effects. Future research should focus on comparing the effectiveness of generative AI chatbots against other DMHIs and human therapy, considering factors like symptom severity, impairment, clinical status, and relapse rates. This may require large-scale longitudinal studies to account for the complex and personalized nature of chatbot use. Moreover, exploring how users' understanding of AI's capabilities and limitations affects the benefits and risks is crucial. This would inform the development of educational tools for responsible AI use in mental health.
Conclusion
Generative AI chatbots show significant potential for providing accessible and meaningful mental health support, but further research is needed to establish their efficacy and ensure responsible use. Future research should focus on rigorous evaluations of their effectiveness and safety, addressing the limitations identified in this study. Developers should prioritize improvements in listening skills, memory, the ability to lead the therapeutic process, and the development of more nuanced safety protocols. Clinicians should build awareness of these tools and consider how they might be incorporated into their practice.
Limitations
The study's convenience sampling method may have introduced bias towards positive experiences: participants were self-selected, predominantly from high-income countries, highly technologically literate, and generally experiencing milder mental health conditions. Consequently, the findings may not generalize to all populations, particularly those with more severe conditions or limited access to technology. The reflexive nature of thematic analysis, while providing depth, also introduces subjectivity, although rigorous methodology and review of the coding by a second author helped mitigate this. The focus on relatively short-term impacts is a further limitation; long-term studies are needed to assess sustained effectiveness.