Evaluation of large language models on mental health: from knowledge test to illness diagnosis

Computer Science

Y. Xu, Z. Fang, et al.

Large language models are reshaping mental health tools. This study systematically evaluates 15 state-of-the-art LLMs (including DeepSeek-R1/V3, GPT-4.1, Llama4, and Alibaba’s QwQ) on knowledge testing and diagnostic tasks using Chinese datasets. DeepSeek-R1, QwQ, and GPT-4.1 achieve the highest accuracy, and the results offer guidance for safer model selection.

Citation Metrics
Citations: 0
Influential Citations: 0
Reference Count: 33

Note: The citation metrics presented here have been sourced from Semantic Scholar and OpenAlex.
