Medicine and Health

Systematic analysis of ChatGPT, Google search and Llama 2 for clinical decision support tasks

S. Sandmann, S. Riepenhausen, et al.

This groundbreaking study by Sarah Sandmann, Sarah Riepenhausen, Lucas Plagwitz, and Julian Varghese explores the capabilities of advanced AI models like GPT-3.5, GPT-4, and Llama 2 in clinical decision support. Discover how GPT-4 outperformed its peers, suggesting effective diagnoses and treatments, while pointing to the crucial need for regulated AI in healthcare!

Abstract

It is likely that individuals are turning to Large Language Models (LLMs) to seek health advice, much like searching for diagnoses on Google. We evaluate the clinical accuracy of GPT-3.5 and GPT-4 for suggesting initial diagnosis, examination steps and treatment of 110 medical cases across diverse clinical disciplines. Moreover, two model configurations of the Llama 2 open source LLMs are assessed in a sub-study. For benchmarking the diagnostic task, we conduct a naïve Google search for comparison. Overall, GPT-4 performed best, with superior performance over GPT-3.5 for diagnosis and examination. Better performance on frequent vs rare diseases is evident for all three approaches. The sub-study indicates slightly lower performance for the Llama models. In conclusion, the commercial LLMs show promising potential for medical question answering in two successive major releases. However, some weaknesses underscore the need for robust and regulated AI models in health care. Open source LLMs can be a viable option to address specific needs regarding data privacy and transparency of training.

Publisher

Nature Communications

Published On

Mar 06, 2024

Authors

Sarah Sandmann, Sarah Riepenhausen, Lucas Plagwitz, Julian Varghese

DOI

https://doi.org/10.1038/s41467-024-46411-8
