This study evaluates the clinical accuracy of the large language models (LLMs) GPT-3.5, GPT-4, and Llama 2 on clinical decision support tasks, benchmarked against Google search. GPT-4 performed best at suggesting initial diagnoses, examination steps, and treatments across clinical disciplines and disease frequencies, while the open-source Llama 2 models performed slightly worse. Although promising, the results highlight the need for robust, regulated AI models in healthcare. Open-source LLMs may offer advantages in data privacy and transparency.
Publisher
Nature Communications
Published On
Mar 06, 2024
Authors
Sarah Sandmann, Sarah Riepenhausen, Lucas Plagwitz, Julian Varghese
Tags
AI in healthcare
clinical decision support
GPT-3.5
GPT-4
Llama 2
diagnosis
data privacy