Introduction
Deep learning (DL) models for disease prediction on electronic health records (EHRs) show promise but typically require large training datasets, which hinders their adoption in settings where labeled data are scarce. The success of BERT and related models in natural language processing (NLP), built on pre-trained contextualized embeddings, inspired the development of Med-BERT. Existing transfer learning approaches for EHRs often rely on static embeddings or the internal layers of pre-trained models and therefore fail to fully capture contextual information. Med-BERT addresses this by adapting the BERT framework to structured EHR data and pre-training contextualized embeddings on a large multi-institutional dataset. The authors compare Med-BERT to existing models such as BEHRT and G-BERT, highlighting its larger vocabulary, larger pre-training cohort, and use of publicly accessible code sets (ICD-9 and ICD-10) drawn from the multi-institutional Cerner database, which enhances its deployability. The work demonstrates that a BERT-style model for structured EHRs substantially improves performance on real-world predictive modeling tasks, helping to cope with limited training data, reduce data collection expenses, and accelerate AI-aided healthcare.
Literature Review
The introduction reviews existing AI-based disease prediction methods, covering both conventional machine learning and deep learning approaches. It notes the success of deep learning but highlights its dependence on large, high-quality datasets, which are often unavailable because of limited cases for rare conditions, difficulties in data cleaning and annotation, and governance constraints. The authors discuss transfer learning as a remedy, focusing on the pre-training/fine-tuning paradigm that has proven successful in NLP and computer vision. They cite BERT and related models as successful NLP examples and review earlier attempts to apply similar techniques to EHR data, noting limitations such as static embeddings that fail to capture contextual information, smaller vocabularies, and smaller pre-training datasets than Med-BERT's.
Methodology
Med-BERT adopts BERT's transformer architecture and combines three types of embeddings: code embeddings (low-dimensional representations of diagnosis codes), serialization embeddings (encoding the order of codes within a visit), and visit embeddings (distinguishing one visit from another). Unlike BERT, Med-BERT does not use [CLS] and [SEP] tokens, owing to differences in input format; a feed-forward layer (FFL) serves as the classification head instead. Pre-training uses two tasks: masked language modeling (predicting masked codes from their context) and prolonged length-of-stay (LOS) prediction (predicting whether a patient had a prolonged hospital stay). For downstream prediction, a classification layer is added on top of the pre-trained Med-BERT and the entire model is fine-tuned on a disease-specific training cohort. Evaluations cover two disease prediction tasks, heart failure among patients with diabetes (DHF) and pancreatic cancer onset (PaCa), using cohorts from the Cerner and Truven databases. Performance is measured by AUC, comparing Med-BERT against baseline models (GRU, Bi-GRU, RETAIN) with and without additional static embeddings, as well as against an untrained Med-BERT to isolate the benefit of pre-training. A visualization tool is used to interpret the model's attention patterns. A sketch of how such an architecture might be assembled is shown below.
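The following is a minimal PyTorch sketch of this input representation and classification head. It is illustrative only: the class name, embedding dimensions, layer counts, and the mean-pooling step are assumptions for the example, not the paper's actual configuration or code.

```python
import torch
import torch.nn as nn

class MedBertStyleEncoder(nn.Module):
    """Illustrative Med-BERT-style encoder: code, serialization (within-visit
    order), and visit embeddings are summed and fed to a transformer encoder,
    with a feed-forward classification layer in place of a [CLS] token."""

    def __init__(self, vocab_size, max_codes_per_visit=64, max_visits=512, dim=192):
        super().__init__()
        self.code_emb = nn.Embedding(vocab_size, dim)             # diagnosis-code embedding
        self.serial_emb = nn.Embedding(max_codes_per_visit, dim)  # order of a code within its visit
        self.visit_emb = nn.Embedding(max_visits, dim)            # which visit the code belongs to
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=6, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=6)
        self.classifier = nn.Linear(dim, 1)                       # feed-forward classification layer (FFL)

    def forward(self, codes, serial_ids, visit_ids, pad_mask):
        # codes, serial_ids, visit_ids: (batch, seq_len) integer tensors
        # pad_mask: (batch, seq_len) bool tensor, True at padded positions
        x = self.code_emb(codes) + self.serial_emb(serial_ids) + self.visit_emb(visit_ids)
        h = self.encoder(x, src_key_padding_mask=pad_mask)
        # Mean-pool over non-padded positions before the feed-forward head
        # (the pooling choice here is an assumption for illustration).
        mask = (~pad_mask).unsqueeze(-1).float()
        pooled = (h * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)
        return self.classifier(pooled).squeeze(-1)  # logit for the downstream prediction task
```

In this scheme, pre-training would attach masked-code and prolonged-LOS prediction heads to the same encoder, while fine-tuning would train the classification head above together with the encoder on the disease-specific cohort.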
Key Findings
Med-BERT significantly improves prediction accuracy across multiple disease prediction tasks and datasets, boosting AUC by 1.21–6.14% over baseline models. The improvements are particularly pronounced when the fine-tuning dataset is small, with AUC gains exceeding 20% in some cases. Med-BERT's performance on the Truven dataset, which is external to the pre-training data, demonstrates its generalizability. The visualization analysis shows that Med-BERT learns both syntactic and medically meaningful contextual information across visits, capturing temporal relationships between codes. Comparisons with static embedding methods such as t-W2V showed that Med-BERT consistently outperformed them.
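As a concrete illustration of the AUC comparisons reported here, an evaluation might look like the sketch below (using scikit-learn; the model names, labels, and predicted risks are placeholders, not the paper's data or code):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical held-out labels and predicted risks from three fine-tuned models.
y_true = np.array([0, 1, 0, 1, 1, 0, 0, 1])
preds = {
    "GRU":            np.array([0.31, 0.62, 0.28, 0.55, 0.71, 0.40, 0.35, 0.66]),
    "GRU + t-W2V":    np.array([0.29, 0.68, 0.25, 0.60, 0.74, 0.38, 0.33, 0.70]),
    "GRU + Med-BERT": np.array([0.22, 0.75, 0.20, 0.69, 0.81, 0.30, 0.27, 0.78]),
}

for name, p in preds.items():
    print(f"{name:16s} AUC = {roc_auc_score(y_true, p):.3f}")
```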
Discussion
Med-BERT addresses the challenge of limited training data in deep learning-based disease prediction. Its superior performance, particularly with small datasets, demonstrates the effectiveness of pre-trained contextualized embeddings in capturing complex semantic relationships within EHR data. The model's generalizability is highlighted by its success on an external dataset, and the visualization module contributes to interpretability by exposing the learned relationships between codes. The authors also compare against simpler models such as logistic regression (LR): LR is competitive at very small sample sizes, but Med-BERT surpasses it as the training set grows. The broad vocabulary (ICD-9 and ICD-10) enhances Med-BERT's applicability across institutions. While the study focuses on diagnosis codes, future work could incorporate other data types to improve the comprehensiveness and accuracy of disease prediction.
Conclusion
Med-BERT effectively leverages the power of pre-trained contextualized embeddings for improved disease prediction using EHR data, especially beneficial for scenarios with limited training data. The results underscore the model's generalizability and interpretability. Future directions include incorporating additional data types and refining visualization techniques.
Limitations
The current study uses only diagnosis codes (ICD-9 and ICD-10), excluding other valuable EHR information such as medications, procedures, and laboratory results. The length of the time intervals between visits is not explicitly modeled, which may limit how well temporal dependencies are captured. The authors also acknowledge the potential for miscoding of diabetes type. These limitations could affect the model's overall performance and generalizability.