How Does ChatGPT Perform on the United States Medical Licensing Examination? The Implications of Large Language Models for Medical Education and Knowledge Assessment

Education

A. Gilson, C. W. Safranek, et al.

This study by Aidan Gilson, Conrad W Safranek, Thomas Huang, Vimig Socrates, Ling Chi, Richard Andrew Taylor, and David Chartash evaluates how ChatGPT, a natural language processing model, performs on questions from the USMLE Step 1 and Step 2 exams. It also discusses ChatGPT's potential as a tool in medical education, especially for simulating small group learning.

Abstract
ChatGPT, a 175-billion-parameter natural language processing model, was evaluated on questions from the United States Medical Licensing Examination (USMLE) Step 1 and Step 2 exams. ChatGPT achieved accuracies of 44% (44/100), 42% (42/100), 64.4% (56/87), and 57.8% (59/102) on the AMBOSS-Step1, AMBOSS-Step2, NBME-Free-Step1, and NBME-Free-Step2 data sets, respectively. The model's performance decreased as question difficulty increased. ChatGPT provided logical justification for its answer selection in all cases and included information internal to the question in 96.8% of responses. Information external to the question was present significantly more often in correct answers than in incorrect ones. These findings suggest that ChatGPT has potential for medical education, particularly in simulating small group learning.
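The accuracies reported in the abstract follow directly from the correct/total counts given in parentheses. The short Python sketch below recomputes them as a sanity check. The counts are taken from the abstract; the names `results` and `accuracy_percent` are illustrative only and do not come from the paper.

```python
# Correct/total counts as reported in the abstract.
# The dictionary layout and function name are illustrative assumptions.
results = {
    "AMBOSS-Step1": (44, 100),
    "AMBOSS-Step2": (42, 100),
    "NBME-Free-Step1": (56, 87),
    "NBME-Free-Step2": (59, 102),
}


def accuracy_percent(correct, total):
    """Return accuracy as a percentage rounded to one decimal place."""
    return round(100 * correct / total, 1)


# Map each data set name to its recomputed accuracy.
accuracies = {
    name: accuracy_percent(correct, total)
    for name, (correct, total) in results.items()
}

for name, acc in accuracies.items():
    print(f"{name}: {acc}%")
```

The recomputed values agree with the percentages quoted in the abstract to one decimal place.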
Publisher
JMIR Medical Education
Published On
Feb 08, 2023
Authors
Aidan Gilson, Conrad W Safranek, Thomas Huang, Vimig Socrates, Ling Chi, Richard Andrew Taylor, David Chartash
Tags
ChatGPT
USMLE
medical education
natural language processing
model performance
question difficulty
learning simulation