Introduction
The study is motivated by the rapid advancements in artificial intelligence (AI), particularly in large language models (LLMs) like ChatGPT, Google Bard, and Bing. These LLMs demonstrate human-like conversational abilities and have sparked interest in their potential applications in various fields, including psychotherapy. Early AI programs, like ELIZA, provided rudimentary therapeutic support, but current LLMs represent a significant leap forward. The researchers note the evolution of AI from the steam-powered machines of the first industrial revolution to the current fourth industrial revolution, characterized by AI's integration into healthcare and psychotherapy. The study aims to evaluate the social intelligence (SI) of these LLMs, comparing their performance to that of human psychologists. SI is a critical skill for effective counseling and psychotherapy, encompassing the ability to understand others' feelings, emotions, and needs. The introduction also reviews existing literature on AI in psychotherapy, highlighting the potential benefits (improved diagnosis, data analysis, and therapeutic support) and challenges (potential for errors, ethical considerations). The authors note the conflicting views on AI's capabilities, with some studies showcasing its strengths and others emphasizing its limitations and potential for harm. The study thus seeks to directly address the question of whether AI has reached a level of social intelligence comparable to that of trained psychologists.
Literature Review
The literature review examines existing research on AI's capabilities in the field of psychotherapy. Several studies suggest AI's potential for improving diagnostics, particularly in identifying personality traits and mental health disorders. Others highlight its role in data-driven analysis and the development of new therapeutic models. AI's use in speech content analysis and physiological signal monitoring is also discussed. However, the review also notes concerns about AI's potential for errors, particularly in suicide risk assessment, and the importance of ethical considerations in its application to mental health care. The literature underscores the need for rigorous evaluation of AI models' performance and further research to ensure their safe and responsible use in psychotherapy.
Methodology
The study employed a stratified random sample of 180 counseling psychology students from King Khalid University (72 bachelor's and 108 doctoral students). Doctoral students were aged 33-46 years, and bachelor's students 20-28 years. The three LLMs (ChatGPT-4, Google Bard, and Bing) were included as additional participants. All participants completed the Social Intelligence Scale (SIS), a 64-item scale with two dimensions: judgment of human behavior and action in social situations. The SIS was originally developed in Arabic and was modified and validated for this study; construct validity and reliability were assessed, with reliability coefficients ranging from 0.67 to 0.77. Each AI model was evaluated once, on August 1, 2023, by presenting it with the 64 SIS scenarios, while the human participants completed the questionnaire via email. Data were analyzed in IBM SPSS version 28: a one-sample t-test compared the SI scores of the AI models against those of the human participants at the bachelor's and doctoral levels, and means, standard deviations, and percentages were calculated to rank the AI models and the psychologists.
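The one-sample t-test used in the analysis can be sketched in a few lines. This is an illustrative example only: the score list and the helper function below are invented, since the raw participant scores are not reported in this summary; only the group means and ChatGPT-4's SIS score of 59 come from the study.

```python
# Illustrative sketch of a one-sample t-test, as used in the study to compare
# a human group's SIS scores against a fixed AI model score.
# The data below are invented for demonstration purposes.
import math
import statistics

def one_sample_t(scores, test_value):
    """Return the t statistic for H0: mean(scores) == test_value."""
    n = len(scores)
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)   # sample standard deviation
    se = sd / math.sqrt(n)          # standard error of the mean
    return (mean - test_value) / se

# Hypothetical doctoral-student scores clustered near the reported mean of 46.73
doctoral_scores = [44, 45, 46, 46, 47, 47, 48, 48, 49, 50]
chatgpt4_score = 59  # ChatGPT-4's reported SIS score

t = one_sample_t(doctoral_scores, chatgpt4_score)
print(f"t = {t:.2f}")  # a large negative t: the group mean falls well below 59
```

In practice this would be computed with a statistics package (the study used SPSS), which also reports the p-value for the test.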
Key Findings
The study revealed significant differences in SI scores between the AI models and the human psychologists. ChatGPT-4 achieved an SI score of 59, significantly exceeding the mean scores of all human participants (39.19 for bachelor's students and 46.73 for doctoral students). Bing scored 48, outperforming 50% of doctoral students and 90% of bachelor's students. Google Bard scored 40, comparable to the bachelor's students' performance but significantly below that of the doctoral students. Overall, ChatGPT-4 and Bing demonstrated superior SI relative to the human participants, while Google Bard performed on par with bachelor's-level students.
Discussion
The results challenge the assumption that human psychologists would demonstrate higher SI than AI models, particularly given the selected sample's presumed high social intelligence. The superior performance of ChatGPT-4 and Bing suggests rapid advancements in AI's ability to understand social cues and respond appropriately. The lower performance of Google Bard might be attributed to its relative novelty and ongoing development at the time of the study. The study also raises ethical questions regarding the use of AI in psychotherapy, including adherence to ethical principles such as confidentiality and empathy. The authors acknowledge limitations in the study's design, including a relatively small and homogeneous sample. While noting concerns about AI potentially replacing human psychologists, the authors suggest that AI could become a valuable tool that complements and enhances the work of human psychotherapists.
Conclusion
The study demonstrates that certain AI models, particularly ChatGPT-4 and Bing, surpass human psychologists in social intelligence as measured by the SIS. This highlights the rapid advancement of AI and its potential to assist in psychotherapy. However, the study underscores the need for further research to address ethical concerns and refine AI's application in mental health care. Future studies should focus on more diverse samples and longitudinal evaluations to track AI development.
Limitations
The study's limitations include the small and relatively homogeneous sample used to validate the scale's psychometric properties. Because the AI models were evaluated only once, longitudinal studies are needed to account for the rapid pace of AI development. The use of a paid version of ChatGPT-4 alongside free versions of Bing and Google Bard may also have affected the results, and the sample of psychology students may not fully represent the broader population of psychotherapists. Future research should address these limitations through larger, more diverse samples and longitudinal study designs.