Large language models are proficient in solving and creating emotional intelligence tests

Psychology


K. Schlegel, N. R. Sommer, et al.

Large Language Models like ChatGPT-4 matched or surpassed human performance on five standard emotional intelligence tests and even generated new test items with comparable difficulty. Research conducted by Katja Schlegel, Nils R. Sommer, and Marcello Mortillaro shows LLMs produce responses consistent with accurate knowledge about human emotions and their regulation.

Abstract
Large Language Models (LLMs) demonstrate expertise across diverse domains, yet their capacity for emotional intelligence remains uncertain. This research examined whether LLMs can solve and generate performance-based emotional intelligence tests. Results showed that ChatGPT-4, ChatGPT-o1, Gemini 1.5 Flash, Copilot 365, Claude 3.5 Haiku, and DeepSeek V3 outperformed humans on five standard emotional intelligence tests, achieving an average accuracy of 81%, compared to the 56% human average reported in the original validation studies. In a second step, ChatGPT-4 generated new test items for each emotional intelligence test. These new versions and the original tests were administered to human participants across five studies (total N = 467). Overall, original and ChatGPT-generated tests demonstrated statistically equivalent test difficulty. Perceived item clarity and realism, item content diversity, internal consistency, correlations with a vocabulary test, and correlations with an external ability emotional intelligence test were not statistically equivalent between original and ChatGPT-generated tests. However, all differences were smaller than Cohen's d = ±0.25, and none of the 95% confidence interval boundaries exceeded a medium effect size (d = ±0.50). Additionally, original and ChatGPT-generated tests were strongly correlated (r = 0.46). These findings suggest that LLMs can generate responses that are consistent with accurate knowledge about human emotions and their regulation.
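To make the equivalence criterion in the abstract concrete, the sketch below (in Python) shows one way to compute Cohen's d between scores on an original test and a ChatGPT-generated test, together with an approximate 95% confidence interval, and to check it against bounds of ±0.25 and ±0.50. The data, sample sizes, and function names are illustrative assumptions for exposition, not the authors' analysis code.

# Illustrative sketch (not the authors' code): Cohen's d between two
# independent groups with an approximate 95% CI, checked against the
# equivalence bounds mentioned in the abstract (|d| < 0.25, CI within ±0.50).
import numpy as np

def cohens_d_with_ci(x, y, z=1.96):
    """Cohen's d for two independent samples plus an approximate 95% CI."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n1, n2 = len(x), len(y)
    # Pooled standard deviation (ddof=1 gives unbiased sample variances).
    pooled_sd = np.sqrt(((n1 - 1) * x.var(ddof=1) + (n2 - 1) * y.var(ddof=1))
                        / (n1 + n2 - 2))
    d = (x.mean() - y.mean()) / pooled_sd
    # Standard large-sample approximation to the standard error of d.
    se = np.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    return d, (d - z * se, d + z * se)

# Hypothetical proportion-correct scores for two participant groups.
rng = np.random.default_rng(0)
original_scores = rng.normal(0.56, 0.12, 200)    # took the original test
generated_scores = rng.normal(0.57, 0.12, 200)   # took the ChatGPT-generated test

d, (lo, hi) = cohens_d_with_ci(original_scores, generated_scores)
print(f"d = {d:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
print("difference smaller than |d| = 0.25:", abs(d) < 0.25)
print("CI stays within medium-effect bounds (±0.50):", lo > -0.50 and hi < 0.50)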
Publisher
Communications Psychology
Published On
May 21, 2025
Authors
Katja Schlegel, Nils R. Sommer, Marcello Mortillaro
Tags
Large Language Models
Emotional Intelligence
Test Generation
Psychometric Validation
Human vs AI Comparison
ChatGPT-4
Item Equivalence