Efficient Detection of Stigmatizing Language in Electronic Health Records via In-Context Learning: A Comparative Analysis and Validation Study

Medicine and Health

H. Chen, M. Alfred, et al.

This study by Hongbo Chen, Myrtede Alfred, and Eldan Cohen examines the effectiveness of in-context learning (ICL) for identifying stigmatizing language in Electronic Health Records. ICL outperformed widely used zero-shot and few-shot baselines while requiring far less labeled data, highlighting its potential for reducing bias in healthcare documentation.

~3 min • Beginner • English
Abstract
Background: The presence of stigmatizing language within Electronic Health Records (EHRs) poses risks to patient care by perpetuating biases, disrupting therapeutic relationships, and diminishing treatment adherence. Prior work has largely relied on supervised machine learning, which requires resource-intensive annotated datasets. In-context learning (ICL) enables large language models (LLMs) to adapt to tasks based on instructions and examples, reducing dependence on labeled data.

Objective: To investigate the efficacy of ICL for detecting stigmatizing language in EHRs under data-scarce conditions.

Methods: We analyzed 5,043 EHR sentences from MIMIC-IV emergency department discharge summaries. ICL was compared against zero-shot (textual entailment) and few-shot (SetFit) approaches, and a fully supervised fine-tuning approach. Four prompting strategies were tested for ICL: Generic, Chain of Thought (COT), Clue and Reasoning Prompting (CARP), and a novel Stigma Detection Guided Prompt. Fairness was evaluated as equality of performance across sex, age, and race via true positive rate (TPR), false positive rate (FPR), and F1 disparities.

Results: In the zero-shot setting, the best ICL model (GEMMA-2 with the Stigma Detection Guided Prompt) achieved F1=0.858 (95% CI [0.854, 0.862]), outperforming the best textual entailment model (DEBERTA-M, F1=0.723, 95% CI [0.718, 0.728]) (P<.001). In the few-shot setting, the best ICL model (LLAMA-3 with the same prompt) exceeded SetFit by 21.2%, 21.4%, and 12.3% in F1 with 4, 8, and 16 annotations per class, respectively (all P<.001). With only 32 labeled instances, the best ICL model reached F1=0.901 (95% CI [0.895, 0.907]), close to a supervised RoBERTa model (F1=0.931, 95% CI [0.924, 0.938]) trained on 3,543 labeled instances. Supervised models showed larger fairness disparities (e.g., highest TPR disparities up to 0.051 by sex, 0.108 by age, and 0.064 by race) than ICL, which remained below 0.016 across subgroups.

Conclusions: ICL effectively detects stigmatizing language, outperforming popular zero- and few-shot baselines and approaching fully supervised performance with orders of magnitude fewer labels. The new Stigma Detection Guided Prompt further enhances ICL detection. ICL provides a data-efficient and more equitable alternative for EHR stigma detection.
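To make the few-shot setup concrete, the sketch below shows the general shape of an ICL classification loop: labeled examples are placed directly in the prompt and the model's completion is parsed as the label. The prompt wording, the example sentences, and the `build_prompt`, `classify`, and `llm` helpers are illustrative assumptions, not the paper's actual Stigma Detection Guided Prompt or code.

```python
from typing import Callable, List, Tuple

# Illustrative sketch of few-shot in-context learning (ICL) for labeling EHR
# sentences. The instruction text, the two example sentences, and the `llm`
# callable are assumptions for illustration; they do not reproduce the paper's
# Stigma Detection Guided Prompt or its annotated MIMIC-IV data.

FewShotExample = Tuple[str, str]  # (sentence, label)

EXAMPLES: List[FewShotExample] = [
    ("Patient claims the medication was taken as prescribed.", "stigmatizing"),
    ("Patient reports taking the medication as prescribed.", "non-stigmatizing"),
]

def build_prompt(sentence: str, examples: List[FewShotExample] = EXAMPLES) -> str:
    """Assemble instruction + labeled examples + the target sentence into one prompt."""
    parts = [
        "Label each sentence from an emergency department discharge summary as "
        "'stigmatizing' or 'non-stigmatizing'. Stigmatizing language casts doubt "
        "on the patient, assigns blame, or uses judgmental descriptors.",
        "",
    ]
    for text, label in examples:
        parts.append(f"Sentence: {text}\nLabel: {label}\n")
    parts.append(f"Sentence: {sentence}\nLabel:")
    return "\n".join(parts)

def classify(sentence: str, llm: Callable[[str], str]) -> str:
    """Query an instruction-tuned LLM (e.g., LLAMA-3 or GEMMA-2) and parse the label.

    `llm` is any callable mapping a prompt string to the model's text output;
    plug in whichever inference client is available. No model weights are
    updated, which is why only a handful of labeled examples are needed.
    """
    answer = llm(build_prompt(sentence)).strip().lower()
    return "non-stigmatizing" if answer.startswith("non") else "stigmatizing"

# Example with a dummy model stand-in:
# classify("Patient insists that he needs more pain medication.",
#          llm=lambda prompt: "stigmatizing")
```

Because the labeled examples live only in the prompt, changing the few-shot budget (4, 8, 16, or 32 instances per the study) or the prompting strategy requires no retraining, in contrast to the supervised RoBERTa baseline trained on 3,543 labeled sentences.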
Publisher
JMIR Medical Informatics
Published On
Nov 20, 2024
Authors
Hongbo Chen, Myrtede Alfred, Eldan Cohen
Tags
in-context learning
stigmatizing language
Electronic Health Records
bias evaluation
data-scarce conditions