Deciphering clinical abbreviations with a privacy protecting machine learning system

Medicine and Health

A. Rajkomar, E. Loreaux, et al.

Physicians often rely on clinical abbreviations, leading to confusion for patients and even their peers. This groundbreaking research from Alvin Rajkomar and team harnesses a machine learning model to decode these shorthand terms with remarkable accuracy, sometimes outperforming board-certified physicians. Discover how technology can bridge the gap in medical communication!

Introduction
The widespread use of abbreviations and shorthand in clinical notes presents significant challenges to both patient understanding and efficient clinical workflow. New US legislation mandating electronic sharing of clinical notes with patients heightens the importance of clear and accessible medical records. Studies show that patient comprehension of common medical abbreviations is low (around 62%), while expanding these abbreviations significantly improves comprehension (to 95%). Even clinicians struggle with deciphering notes due to variations in terminology across specialties and institutions. Misinterpretations of abbreviations can lead to medical errors. While avoiding shorthand is advised, it reduces efficiency and increases administrative burden. This necessitates the development of automated systems to assist clinicians in expanding abbreviations. Prior research in abbreviation disambiguation has focused on individual models for each abbreviation, often relying on limited training data, potentially privacy-compromising clinical data, and complex multi-model systems. The challenges include the lack of a large corpus of original and “translated” text, the reliance on de-identified medical data (with associated privacy risks), and the large number of separate tasks involved. This work investigates a new approach using public web data and a single end-to-end translation model to address these challenges.
Literature Review
Existing literature on abbreviation and acronym disambiguation highlights various methods, including naive Bayes, support vector machines, profile-based approaches, hyperdimensional computing, convolutional neural networks, long short-term memory networks, encoder-based transformers (e.g., clinicalBERT), latent meaning cells, and decoder-based transformers. Previous methods evaluated a varying number of abbreviations (13 to 1116), often focusing on ambiguous abbreviations. Many rely on heuristics such as string matching, which is imperfect due to the dual usage of some abbreviations as English words. Some methods address data scarcity by using costly or imprecise labeling techniques, or by relying on de-identified clinical data, raising privacy concerns. Federated learning has been proposed to avoid central data collection, but it requires consistent data structures across sites, which are often lacking in electronic health record systems. State-of-the-art models for abbreviation detection and expansion are typically trained separately, leading to complex systems.
Methodology
This study proposes a novel approach that leverages public web data and an end-to-end translation model to overcome the limitations of prior methods. The methodology consists of three key components:

1. **Fine-tuning dataset generation with WSRS (Web-Scale Reverse Substitution):** A distributed algorithm, WSRS, creates a training dataset by applying reverse substitution to public web data: long forms are substituted with their corresponding abbreviations in snippets drawn from a large web corpus. WSRS addresses the frequency imbalance among abbreviation-expansion pairs by upsampling rare expansions and capping common ones, making it suitable for web-scale data processing. Two versions of the dataset were generated: MC-WSRS (using a medically related web crawl) and C4-WSRS (using the publicly available C4 dataset).
2. **Model fine-tuning:** Large encoder-decoder Text-to-Text Transfer Transformer (T5) models of varying sizes (60M, 770M, 11B, and 80B parameters) are used. These models are pre-trained on a web corpus and then fine-tuned on the WSRS dataset. The input to the model is a snippet containing abbreviations, and the target is the same snippet with the abbreviations expanded. The masked language modeling loss function is used for fine-tuning.
3. **Model inference:** Three inference methods are compared: standard inference, iterative inference, and elicitive inference. Elicitive inference, a chained-inference technique, feeds the model's output back as input over multiple rounds to elicit further expansions, improving performance. Beam search is used during inference.

Four clinical notes datasets were used for evaluation: a synthetic dataset created by clinicians and three real-world datasets (CASI, MIMIC-III, and i2b2-2014). A token-level variation of the Needleman-Wunsch global sequence alignment algorithm, incorporating custom scoring rules, was used to align input and output sequences for evaluation.
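The reverse-substitution idea behind WSRS can be sketched in a few lines. This is a minimal, single-machine illustration, not the paper's distributed implementation; the toy abbreviation dictionary, the `cap` parameter, and the helper names are assumptions made for the example (the real algorithm also upsamples rare expansions).

```python
from collections import defaultdict

# Toy expansion -> abbreviation dictionary (illustrative only).
EXPANSIONS = {
    "patient": "pt",
    "blood pressure": "bp",
    "shortness of breath": "sob",
}

def reverse_substitute(snippet, expansions=EXPANSIONS):
    """Replace long forms with abbreviations, yielding an
    (abbreviated input, original target) training pair."""
    abbreviated = snippet
    for long_form, abbrev in expansions.items():
        abbreviated = abbreviated.replace(long_form, abbrev)
    return abbreviated, snippet

def build_dataset(snippets, cap=2, expansions=EXPANSIONS):
    """Cap how many pairs are kept per expansion so frequent
    expansions do not dominate rare ones -- the balancing idea
    behind WSRS, reduced to a simple per-expansion quota."""
    counts = defaultdict(int)
    dataset = []
    for snippet in snippets:
        hits = [e for e in expansions if e in snippet]
        if hits and all(counts[e] < cap for e in hits):
            for e in hits:
                counts[e] += 1
            dataset.append(reverse_substitute(snippet, expansions))
    return dataset
```

Because the target is simply the unmodified web snippet, no manual labeling is needed: every occurrence of a known long form yields a supervised (abbreviated, expanded) pair for free.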
Human performance was also assessed using four groups: lay people (with and without Google access), medical students, and board-certified physicians.
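The chained elicitive-inference loop can be sketched as follows. The `toy_model` stub is purely illustrative and stands in for the fine-tuned T5 model; it expands one known abbreviation per call, which makes visible why re-submitting the output elicits further expansions.

```python
def elicitive_inference(model_fn, text, max_rounds=5):
    """Chained inference: re-submit the model's own output
    until it stops changing (a fixed point) or a round
    budget is exhausted."""
    for _ in range(max_rounds):
        expanded = model_fn(text)
        if expanded == text:
            break
        text = expanded
    return text

# Toy stand-in for the fine-tuned model (assumption, not the
# paper's T5): expands the first known abbreviation it finds,
# one per call.
TOY_EXPANSIONS = {"pt": "patient", "bp": "blood pressure"}

def toy_model(text):
    tokens = text.split()
    for abbrev, long_form in TOY_EXPANSIONS.items():
        if abbrev in tokens:
            return " ".join(long_form if t == abbrev else t
                            for t in tokens)
    return text
```

For example, `elicitive_inference(toy_model, "pt with high bp")` expands "pt" in the first round and "bp" in the second, terminating once a round produces no change.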
Key Findings
The study's key findings demonstrate the effectiveness of the proposed approach:

* **High accuracy:** The T5 80B model with elicitive inference achieved state-of-the-art performance on the four test datasets. For example, on the synthetic dataset the model achieved 97.0% total accuracy, surpassing the accuracy of physicians (88.7%). Detection recall ranged from 96.8% to 99.7%, and expansion accuracy from 95.1% to 97.9%, across the datasets. The model performed comparably well on ambiguous and unambiguous abbreviations, and its performance was independent of abbreviation rarity.
* **Clinician-level performance:** The model matched or exceeded board-certified physicians in disambiguating abbreviations, closing the comprehension gap between lay people and experts.
* **Robustness across datasets:** High accuracy was maintained across diverse real-world clinical datasets from various health systems, including both inpatient and outpatient notes.
* **Handling of ambiguous abbreviations and English words:** The model successfully disambiguated the same abbreviation used with different meanings in a single sentence, and correctly distinguished English-word usage from abbreviation usage for words that can function as both.
* **Privacy preservation:** The system was built using only publicly available web data, ensuring patient privacy.
* **Data quality impact:** Models fine-tuned on MC-WSRS outperformed those fine-tuned on C4-WSRS, but elicitive inference closed the gap, highlighting the value of medical data enrichment. The model had trouble expanding common bigrams containing abbreviations, such as medication names.
* **Comparison to human performance:** Lay people without Google access performed poorly (28.6% accuracy), while those with Google access performed better (74.5%). Medical students and physicians performed similarly (88.7%), but the model achieved superior accuracy (97.6%).
Discussion
The results demonstrate a successful, privacy-preserving machine learning system achieving expert-level performance in deciphering clinical abbreviations. The end-to-end translation approach simplifies the system and improves scalability compared with previous methods. The use of public web data addresses privacy concerns, while the elicitive inference technique improves generalizability. The model's ability to handle ambiguous abbreviations and contextual nuances underscores its natural language understanding capabilities. This has the potential to significantly improve patient comprehension of medical records and to facilitate more efficient clinical workflows. The model's capacity to expand abbreviations for rare diseases and expressions, though not fully evaluated due to data limitations, suggests broad applicability.
Conclusion
This study presents a novel, privacy-preserving machine learning system for deciphering clinical abbreviations, achieving accuracy comparable to or exceeding that of expert physicians. The end-to-end translation model, trained on public web data using the WSRS algorithm, effectively addresses the limitations of previous methods. Future research should explore optimizations to reduce the computational cost of elicitive inference and compare performance across different large language models. Further investigation into the clinical impact of model errors and efforts to mitigate biases in clinical notes are also warranted. Addressing the "last-mile problem" of ensuring seamless comprehension for patients with varying health literacy levels remains a crucial area for future work.
Limitations
The study has several limitations: the elicitive inference technique increases computational cost; the study did not compare the model to all high-performing language models; human performance may vary based on factors like literacy and specialty; the clinical effects of model errors are unknown; and the model's handling of potentially insensitive or offensive language in clinical notes requires further investigation.