Introduction
Acute kidney injury (AKI) is a severe clinical syndrome with high in-hospital mortality and long-term adverse health outcomes. Current diagnostic tools are limited, especially outside intensive care units. Machine learning on electronic health record (EHR) data offers potential for improved AKI prediction. Existing models show promise, with AUROC values ranging from 0.71 to 0.80 in derivation studies, but face challenges in transportability across healthcare settings. Patient heterogeneity, clinical process variability, and EHR data heterogeneity all hinder model generalization. This study aims to address this limitation by leveraging the standardized data within the US PCORnet platform, specifically the Greater Plains Collaborative (GPC), to develop and validate a transportable AKI prediction model and to develop a method for predicting model transportability.
Literature Review
Traditional machine learning models for AKI prediction have achieved varying success in derivation and external validation studies. The performance of these models often deteriorates when applied to datasets from different hospitals or populations due to variations in patient characteristics, data quality, and clinical practices. While some studies report high AUROC scores in their derivation cohorts, external validation studies show a significant drop in performance, highlighting the critical need for robust and transportable models. The lack of standardized data formats and terminologies contributes to these challenges. The emergence of platforms like PCORnet, which standardize EHR data across multiple institutions, offers a potential solution to improve the generalizability of AI models in healthcare.
Methodology
This study utilized data from the Greater Plains Collaborative (GPC), a PCORnet Clinical Data Research Network encompassing twelve independent health systems. The researchers collected EHR data from January 1, 2010, to December 31, 2018, focusing on inpatient encounters. After data curation, which involved handling outliers and missing values, a dataset containing over 38,000 unique variables was obtained. A gradient boosting model with decision trees in a discrete-time survival framework (DS-GBT) was employed for AKI prediction. The data was split into derivation, calibration, internal validation, and temporal validation sets. The model's performance was evaluated using AUROC and AUPRC. SHAP values were used for interpretability, showing the marginal effects of features. External validation was performed on data from five other GPC sites. An adjusted maximum mean discrepancy (adjMMD) metric was developed to quantify the difference in feature space distributions between the source and target hospitals and predict model transportability. The researchers analyzed the common and specific features across sites, and investigated the relationship between adjMMD and the AUROC drop to find a minimal feature set that could accurately predict transportability.
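As a rough illustration of the idea behind a discrepancy metric of this kind, the sketch below computes the standard (biased) squared maximum mean discrepancy between two samples of feature vectors using an RBF kernel. It is not the study's adjMMD: the adjustment is not described in this summary, and the kernel choice, the `gamma` value, and the function names are all assumptions made for illustration.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def rbf_kernel(x: Vector, y: Vector, gamma: float) -> float:
    """RBF kernel exp(-gamma * ||x - y||^2). The kernel choice is an assumption."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mean_kernel(xs: Sequence[Vector], ys: Sequence[Vector], gamma: float) -> float:
    """Average kernel value over all pairs (x, y) drawn from xs and ys."""
    total = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(source: Sequence[Vector], target: Sequence[Vector],
                gamma: float = 1.0) -> float:
    """Biased estimate of squared MMD between source and target samples.

    This is plain MMD, not the adjusted variant (adjMMD) from the study.
    A value near zero means the two feature distributions look alike
    under the chosen kernel; larger values mean a larger shift.
    """
    if not source or not target:
        raise ValueError("both samples must be non-empty")
    return (mean_kernel(source, source, gamma)
            + mean_kernel(target, target, gamma)
            - 2.0 * mean_kernel(source, target, gamma))
```

In practice the rows would be per-encounter feature vectors restricted to a small set of important features, scaled consistently across the source and target sites. The pairwise loop is quadratic in sample size, so a real analysis would likely subsample or use a vectorized implementation.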
Key Findings
The DS-GBT model performed robustly in the source health system, with a temporal hold-out AUROC of 0.76 for predicting any AKI within 48 hours and 0.81 for predicting AKI of at least stage 2. SHAP values identified serum creatinine (SCr), vancomycin exposure, blood pressure changes, age, BMI, and chest X-ray as top predictors. External validation across five other health systems showed significant performance variability, with AUROC for predicting moderate-to-severe AKI ranging from 0.68 to 0.80. Refitting the model on local data significantly improved performance at most sites. Analysis revealed substantial heterogeneity in risk factors across sites, with many important features specific to individual sites. When computed on the top 13 features from the source model, the adjMMD metric correlated strongly with the AUROC drop (Pearson correlation coefficient of 0.95). This suggests that adjMMD, calculated using a minimal feature set, can effectively predict the performance deterioration of a transported model. A simple linear regression equation was derived to estimate the AUROC drop from the adjMMD score. The robustness of adjMMD was further validated through additional experiments using different models and derivation sites.
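The relationship between a discrepancy score and the AUROC drop can be examined with a Pearson correlation and an ordinary least-squares line. The sketch below shows one way to do this; the function names are hypothetical, and the fitted coefficients from the study are not given in this summary, so none are reproduced here.

```python
import math
from typing import Sequence, Tuple


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("cannot fit a line when all x values are equal")
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def estimate_auroc_drop(score: float, slope: float, intercept: float) -> float:
    """Apply a fitted line to a new discrepancy score (hypothetical helper)."""
    return slope * score + intercept
```

Here `xs` would hold one discrepancy score per target site and `ys` the corresponding observed AUROC drop. With only a handful of sites, both the correlation and the fitted line carry wide uncertainty, which is worth keeping in mind when extrapolating to a new site.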
Discussion
This study highlights the importance of considering model transportability when developing AI models for clinical use. Although the model performed well in the source hospital, its performance varied significantly when deployed in other healthcare systems. The adjusted maximum mean discrepancy (adjMMD) metric successfully predicted these performance changes, which points to its potential utility for assessing the transportability of AI models in new settings. The findings have implications for developing generalizable AI models in healthcare and underline the need for methods that assess model transportability without full access to target data. The interactive dashboards created for interpreting the model provide valuable tools for clinical decision support. Future research could focus on refining the adjMMD metric, exploring additional data harmonization techniques, and developing more sophisticated methods for model adaptation and transfer learning.
Conclusion
This research demonstrates the challenges of transporting AI models for AKI prediction across different healthcare settings. A novel adjusted maximum mean discrepancy (adjMMD) metric was developed and validated to predict model transportability, thereby accelerating the adaptation of external AI models. This metric offers a valuable tool for assessing the suitability of a model in a new setting without requiring full access to target hospital data. Future research should focus on improving data harmonization and exploring advanced transfer learning techniques to create truly generalizable AI models for AKI prediction.
Limitations
The study's limitations include reliance on an SCr-based AKI definition, potential underestimation of AKI incidence in patients without a baseline SCr within 7 days of admission, censoring of patients with a length of stay exceeding 7 days, use of CPT billing codes for procedures, and key variables missing from the PCORnet CDM. The generalization of the adjMMD findings may be limited by the specific characteristics of the GPC network and the model used in this study. Despite these limitations, the study provides valuable insights into the challenges of cross-site model transportability and offers a novel approach for predicting and mitigating performance variations.