Medicine and Health
Frontiers in Psychiatry
Evaluation of large language models on mental health: from knowledge test to illness diagnosis
Y. Xu, Z. Fang, et al.
This study evaluates 15 state-of-the-art large language models (e.g., DeepSeek-R1, GPT-4.1, QwQ) on Chinese mental health tasks, using knowledge and diagnostic benchmarks (Dreaddit, SDCNL, CAS exam). It highlights the top performers and offers guidance for selecting and improving models in sensitive mental health scenarios.
Related Publications
Adjacent work that informs or extends this paper's methodology and findings.
Medicine and Health
A framework for human evaluation of large language models in healthcare derived from literature review
T. Y. C. Tam, S. Sivarajkumar, et al.
Education
How Does ChatGPT Perform on the United States Medical Licensing Examination? The Implications of Large Language Models for Medical Education and Knowledge Assessment
A. Gilson, C. W. Safranek, et al.
Psychology
From Passion to Abyss: The Mental Health of Athletes during COVID-19 Lockdown
L. Pitacho, P. J. D. Palma, et al.

