Abstract
This study evaluates the clinical accuracy of the LLMs GPT-3.5, GPT-4, and Llama 2 on clinical decision support tasks, benchmarking them against Google search. GPT-4 showed superior performance in suggesting initial diagnoses, examination steps, and treatments across various clinical disciplines and disease frequencies. Llama 2 models showed slightly lower performance. Although the results are promising, they highlight the need for robust and regulated AI models in healthcare, with open-source LLMs offering potential advantages in data privacy and transparency.
Publisher
Nature Communications
Published On
Mar 06, 2024
Authors
Sarah Sandmann, Sarah Riepenhausen, Lucas Plagwitz, Julian Varghese
Tags
AI in healthcare
clinical decision support
GPT-3.5
GPT-4
Llama 2
diagnosis
data privacy