Med-BERT: pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction

Medicine and Health

L. Rasmy, Y. Xiang, et al.

Discover how Med-BERT, a contextualized embedding model tailored for structured electronic health records, improves disease prediction accuracy. Developed by Laila Rasmy, Yang Xiang, Ziqian Xie, Cui Tao, and Degui Zhi, this research demonstrates significant performance gains across clinical databases, paving the way for cost-efficient AI in healthcare.

Abstract
Deep learning (DL)-based predictive models from electronic health records (EHRs) deliver impressive performance in many clinical tasks but typically require large, annotated datasets. Inspired by bidirectional encoder representations from transformers (BERT) in natural language processing, we propose Med-BERT, a contextualized embedding model adapted to structured EHRs and pretrained on 28,490,650 patients. Fine-tuning experiments on two disease prediction tasks across two clinical databases show that Med-BERT substantially improves the area under the receiver operating characteristic curve (AUC), by 1.21–6.14%. Med-BERT is particularly effective with small fine-tuning sets, boosting AUC by more than 20% in some settings or matching the performance that models without Med-BERT reach only with training sets up to ten times larger. Using a large, widely adopted vocabulary (ICD-9 and ICD-10 codes) and multi-institutional data, Med-BERT demonstrates cross-dataset generalizability. These results suggest Med-BERT can enable high-performing predictive models with limited local data, reduce the data collection burden, and accelerate AI-enabled healthcare.
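The abstract describes pretraining a BERT-style encoder on sequences of structured diagnosis codes and then fine-tuning it for a downstream disease-prediction task. The sketch below illustrates that general setup only; it is not the authors' released implementation. It uses the Hugging Face transformers API as a stand-in, and the vocabulary size, model dimensions, toy patient batch, and the weight file name "med_bert_encoder.pt" are all assumptions made for illustration.

# A minimal, illustrative sketch (not the authors' code) of fine-tuning a
# BERT-style encoder over sequences of ICD diagnosis codes for a binary
# disease-prediction task. All sizes and the toy batch are assumptions.
import torch
from transformers import BertConfig, BertForSequenceClassification

# Each ICD-9/ICD-10 code is treated as one "token"; a patient's visit history
# becomes a token sequence, analogous to a sentence in NLP.
config = BertConfig(
    vocab_size=30_000,          # assumed size of the diagnosis-code vocabulary
    hidden_size=192,
    num_hidden_layers=6,
    num_attention_heads=6,
    max_position_embeddings=512,
    num_labels=2,               # binary outcome, e.g. onset of the target disease
)
model = BertForSequenceClassification(config)
# In practice the encoder weights would come from Med-BERT pretraining, e.g.:
# model.bert.load_state_dict(torch.load("med_bert_encoder.pt"))  # hypothetical file

# One toy fine-tuning step on a fake batch of two patients (code IDs are made up).
input_ids = torch.tensor([[5, 812, 4031, 77, 0, 0],
                          [9, 44, 128, 3055, 671, 2]])
attention_mask = (input_ids != 0).long()   # 0 serves as the padding ID here
labels = torch.tensor([1, 0])              # 1 = disease onset observed

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
outputs.loss.backward()
optimizer.step()
print(float(outputs.loss))

With small labeled cohorts, most of the model's knowledge comes from the pretrained encoder, which is why, as the abstract reports, pretraining yields the largest AUC gains when the fine-tuning set is small.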
Publisher
npj Digital Medicine
Published On
May 20, 2021
Authors
Laila Rasmy, Yang Xiang, Ziqian Xie, Cui Tao, Degui Zhi
Tags
Med-BERT
disease prediction
electronic health records
contextualized embeddings
AI healthcare
clinical databases