Abstract
ChatGPT, a 175-billion-parameter natural language processing model, was evaluated on questions from the United States Medical Licensing Examination (USMLE) Step 1 and Step 2 exams. ChatGPT achieved accuracies of 44% (44/100), 42% (42/100), 64.4% (56/87), and 57.8% (59/102) on the AMBOSS-Step1, AMBOSS-Step2, NBME-Free-Step1, and NBME-Free-Step2 data sets, respectively. The model's performance decreased with increasing question difficulty. ChatGPT provided logical justification for its answer selection in all cases and included information internal to the question in 96.8% of responses. The presence of information external to the question was significantly higher in correct answers than in incorrect ones. These findings suggest ChatGPT's potential for medical education, particularly in simulating small group learning.
Publisher
JMIR Medical Education
Published On
Feb 08, 2023
Authors
Aidan Gilson, Conrad W Safranek, Thomas Huang, Vimig Socrates, Ling Chi, Richard Andrew Taylor, David Chartash
Tags
ChatGPT
USMLE
medical education
natural language processing
model performance
question difficulty
learning simulation