logo
ResearchBunny Logo
Prompt engineering in consistency and reliability with the evidence-based guideline for LLMs

Medicine and Health

Prompt engineering in consistency and reliability with the evidence-based guideline for LLMs

L. Wang, X. Chen, et al.

Discover how prompt engineering can enhance the reliability of Large Language Models in answering medical queries. Learn from a study conducted by Li Wang, Xi Chen, XiangWen Deng, Hao Wen, MingKe You, WeiZhi Liu, Qi Li, and Jian Li, highlighting the crucial role of effective prompting strategies in medical accuracy.

00:00
00:00
~3 min • Beginner • English
Abstract
The use of large language models (LLMs) in clinical medicine is currently thriving. Effectively transferring LLMs' pertinent theoretical knowledge from computer science to their application in clinical medicine is crucial. Prompt engineering has shown potential as an effective method in this regard. To explore the application of prompt engineering in LLMs and to examine the reliability of LLMs, different styles of prompts were designed and used to ask different LLMs about their agreement with the American Academy of Orthopedic Surgeons (AAOS) osteoarthritis (OA) evidence-based guidelines. Each question was asked 5 times. We compared the consistency of the findings with guidelines across different evidence levels for different prompts and assessed the reliability of different prompts by asking the same question 5 times. gpt-4-Web with ROT prompting had the highest overall consistency (62.9%) and a significant performance for strong recommendations, with a total consistency of 77.5%. The reliability of the different LLMs for different prompts was not stable (Fleiss kappa ranged from -0.002 to 0.984). This study revealed that different prompts had variable effects across various models, and the gpt-4-Web with ROT prompt was the most consistent. An appropriate prompt could improve the accuracy of responses to professional medical questions.
Publisher
npj Digital Medicine
Published On
Jan 01, 2024
Authors
Li Wang, Xi Chen, XiangWen Deng, Hao Wen, MingKe You, WeiZhi Liu, Qi Li, Jian Li
Tags
prompt engineering
Large Language Models
osteoarthritis
consistency
reliability
medical questions
accuracy
Listen, Learn & Level Up
Over 10,000 hours of research content in 25+ fields, available in 12+ languages.
No more digging through PDFs, just hit play and absorb the world's latest research in your language, on your time.
listen to research audio papers with researchbunny