Medicine and Health

Real-time prediction of COVID-19 related mortality using electronic health records

P. Schwab, A. Mehrjou, et al.

CovEWS is a risk scoring system that predicts COVID-19-related mortality from electronic health records. Developed by Patrick Schwab, Arash Mehrjou, Sonali Parbhoo, and colleagues, it updates its risk estimates in real time and outperformed established risk scores in the authors' evaluation, giving early warnings that could support timely interventions.

Introduction

The COVID-19 pandemic has strained healthcare systems worldwide, with millions infected and hundreds of thousands of deaths by August 2020. Efficient early detection of patients likely to deteriorate is crucial for optimal allocation of scarce resources. Predictive models leveraging electronic health records (EHRs) can support triage and management by identifying high-risk patients in advance. Prior work has identified demographic and inflammatory markers associated with mortality and proposed risk scores; however, many approaches rely on single-center data, do not incorporate time-varying risk factors, and lack mechanisms to update predictions in real time. To address these gaps, the authors developed CovEWS, a continuously updating risk assessment system for real-time prediction of COVID-19-related mortality. CovEWS is trained on large, diverse, multi-institutional EHR datasets, integrates short- and long-term risk factors, and models non-linear, time-varying relationships to provide early warnings up to several days before mortality events.

Literature Review

Existing COVID-19 mortality risk assessments include generic scores such as SOFA and MEWS and COVID-19-specific models (e.g., Yan et al.; Liang et al.). Prior studies identified risk factors such as age, inflammatory markers, oxygenation status, and comorbidities. However, many models were developed on limited, often single-country cohorts and typically assume static covariates, limiting their ability to react to rapid clinical changes. The literature also highlights the need for external validation, calibration, and consideration of missingness in real-world EHRs. CovEWS is positioned to improve on these aspects by combining time-varying covariates, non-linear modeling, and multi-cohort training with external validation.

Methodology

Data sources and cohorts: De-identified EHRs were obtained from two federated networks: Optum (US) and TriNetX (US and international). Optum provided 47,384 COVID-19-positive patients between March 21 and June 5, 2020 (11 weeks), and an additional temporally separated 'Optum future' cohort of 14,014 patients (June to July 2020). TriNetX provided an external test cohort of 5,005 COVID-19-positive patients (March 21 to June 25, 2020; 13 weeks) from 24 healthcare organizations in the US, Australia, Malaysia, and India. Patients were considered COVID-19-positive by ICD-10 coding aligned with CDC guidelines or positive lab tests. Data included demographics, diagnoses (ICD-9/ICD-10), vital signs, labs (LOINC), procedures (CPT, ICD-9-CM, ICD-10-PCS), and clinical observations. Cohort differences included higher missingness of short-term vitals in TriNetX and differences in disease severity and hospitalization rates.

Splits: From Optum, 23,692 patients (50%) were used for training, 9,477 (20%) for validation/model selection, and 14,215 (30%) for held-out testing. The entire TriNetX cohort (5,005 patients) served as an external test set. The 'Optum future' cohort was used to assess temporal robustness under changing treatment policies (e.g., hydroxychloroquine de-emphasized and dexamethasone adopted).

Handling missing data: Missingness was pervasive. Multiple imputation by chained equations (MICE) was employed. Additional analyses on subsets with fewer missing covariates were performed to assess robustness.

Outcome and prediction horizons: The outcome was COVID-19-related mortality. Performance was evaluated at fixed prediction horizons (1, 2, 4, 8, 16, 24, 48, 96, 192 hours before the event). For censored patients, the last EHR entry date served as reference.

Model: CovEWS is a time-varying survival model based on a neural extension of the Cox proportional hazards model that accommodates non-linear effects and time-varying covariates. The hazard is modeled as h(t) = h0(t) * exp(ϕθ(x(t))), where h0(t) is the baseline hazard and ϕθ is a neural network capturing non-linear interactions of covariates x(t). The network comprises a linear layer, LeakyReLU activation, a final linear layer, and tanh activation. Training maximizes the partial log-likelihood with Efron's method for handling ties, using automatic differentiation and the Adam optimizer (learning rate 0.001) for up to 100 epochs. Dropout regularization was applied. Hyperparameter optimization allotted up to 15 random configurations per algorithm within predefined ranges, evaluated on the validation cohort.

Calibration and thresholds: Post-training recalibration aligned risk outputs with observed event rates using ROC-based analysis to select operating points for desired sensitivities (e.g., 85%, 90%, 95%). Thresholds varied by horizon to prioritize sensitivity for earlier forecasts. Guidance for clinical integration suggests choosing thresholds to mitigate alarm fatigue and aligning with workflow checkpoints (e.g., admission, discharge, pre-/post-interventions, continuous monitoring).

Interpretability: Feature importance was computed via Integrated Gradients (IG), with attributions normalized to [-100%, 100%] and computed at each EHR update to provide temporal attributions of contributing covariates (e.g., SpO2, respiratory rate, blood pressure, creatinine, CRP).

Baselines: Comparators included SOFA, MEWS, COVER-F/COVF, a linear time-varying Cox model (CovEWS-linear), and models by Yan et al. and Liang et al. Where direct score components were unavailable, reasonable assumptions were made based on available EHR fields, acknowledging potential impact on baseline performance.

Statistical evaluation: Specificity at fixed high sensitivities (≥90% and ≥95%) was the primary metric across horizons on the Optum test and TriNetX external cohorts. Bootstrapping (200 samples) provided 95% confidence intervals. One-sided Mann–Whitney–Wilcoxon tests with Bonferroni correction assessed superiority of CovEWS over baselines. Kaplan–Meier stratified survival analyses evaluated time-varying risk stratification by CovEWS score bands (e.g., <60, 60–69, 70–79, 80–89, 90–100).
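
The hazard model described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: it assumes one covariate snapshot per patient (no time-varying updates), uses the simpler Breslow-style treatment of ties rather than Efron's method, omits dropout and the Adam training loop, and all function names, weights, and the hidden-layer size are hypothetical.

```python
import math
import random


def leaky_relu(v, slope=0.01):
    """LeakyReLU activation; the slope value is an assumption."""
    return v if v > 0 else slope * v


def risk_score(x, W1, b1, w2, b2):
    """phi_theta(x): linear layer -> LeakyReLU -> linear layer -> tanh.

    Returns a log-relative-hazard in (-1, 1); the hazard is h0(t) * exp(score).
    """
    hidden = [
        leaky_relu(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(W1, b1)
    ]
    return math.tanh(sum(w * h for w, h in zip(w2, hidden)) + b2)


def neg_partial_log_likelihood(scores, times, events):
    """Negative Cox partial log-likelihood (Breslow-style ties, a simplification).

    scores[i]: phi_theta(x_i); times[i]: follow-up time;
    events[i]: 1 if death was observed, 0 if the patient was censored.
    """
    total = 0.0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:
            continue  # censored patients only contribute through risk sets
        # Risk set: everyone still under observation at the event time t_i.
        log_risk = math.log(
            sum(math.exp(s) for s, t in zip(scores, times) if t >= t_i)
        )
        total += scores[i] - log_risk
    return -total


# Toy usage with random, untrained weights (illustrative values only).
rng = random.Random(0)
n_features, n_hidden = 3, 4
W1 = [[rng.uniform(-1, 1) for _ in range(n_features)] for _ in range(n_hidden)]
b1 = [0.0] * n_hidden
w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
patients = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(5)]
scores = [risk_score(x, W1, b1, w2, 0.0) for x in patients]
loss = neg_partial_log_likelihood(scores, [5, 3, 8, 2, 7], [1, 0, 1, 1, 0])
```

In a full implementation this loss would be minimized with respect to the network weights by gradient descent; the tanh output bounds the log-relative-hazard, which keeps exp(score) within a fixed range.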

Key Findings

- Early warning performance: CovEWS provided clinically meaningful predictions up to 192 hours before mortality. On the Optum held-out test set, specificity at sensitivity >95% decreased from 89.3% (95% CI: 83.0, 91.6%) at 1 hour to 70.5% (95% CI: 65.6, 74.4%) at 192 hours. On the external TriNetX cohort, specificity at sensitivity >95% decreased from 78.8% (95% CI: 76.0, 84.7%) at 1 hour to 69.4% (95% CI: 57.6, 75.2%) at 192 hours.
- Superiority to baselines: Across horizons and cohorts (Optum test and TriNetX external), CovEWS significantly outperformed SOFA, MEWS, COVER-F/COVF, Yan et al., Liang et al., and the linear time-varying Cox model in specificity at fixed high sensitivities (p < 0.05, one-sided Mann–Whitney–Wilcoxon with Bonferroni correction), with few nonsignificant exceptions at the longest horizons in some subsets.
- Generalization and robustness: Performance generalized to the external TriNetX dataset and remained largely robust on the temporally separated Optum future cohort despite changes in treatment policies (e.g., dexamethasone adoption, hydroxychloroquine discontinuation). Variance increased at long horizons due to fewer long follow-ups.
- Subgroup analyses: CovEWS maintained superior predictive value across ethnic subgroups (Caucasian, Black or African American, Hispanic, Asian) and in non-hospitalized patients, though performance was lower in the latter, likely due to higher missingness and differing care patterns.
- Time-varying risk stratification: Kaplan–Meier analyses showed clear separation of survival curves by CovEWS score strata over time; higher strata (e.g., scores 90–100) had higher mortality rates. Separation generalized across Optum and TriNetX; reduced short-term vital availability in TriNetX attenuated near-term risk reactivity in the highest stratum.

Discussion

The study demonstrates that a continuously updating, time-varying, non-linear survival model trained on large, diverse, real-world EHR data can identify patients at high risk of COVID-19-related mortality hours to days in advance. CovEWS addresses limitations of static and single-center models by integrating short- and long-term risk factors and reacting in real time to clinical changes. Superior performance over established generic (SOFA, MEWS) and COVID-19-specific models across multiple horizons, datasets, and subgroups indicates strong generalizability. The ability to provide reliable early warnings has practical implications for clinical decision support, enabling earlier escalation of monitoring, timely therapeutic interventions, and informed goals-of-care discussions, potentially improving outcomes and resource allocation. Differences in short-term variable availability and missingness affect near-term risk responsiveness and may vary across data sources, highlighting the importance of data quality and harmonization. While promising, clinical deployment requires careful threshold calibration to balance sensitivity and alarm burden, and integration into workflows at key decision points.

Conclusion

CovEWS is a real-time early warning system that predicts COVID-19-related mortality risk up to 192 hours in advance using routinely collected EHR data. Trained and validated on large, multi-institutional cohorts, it outperforms existing generic and COVID-19-specific risk scores, generalizes across populations and time periods, and provides interpretable, time-varying risk attributions. Future work should (1) evaluate prospective clinical impact on decision-making, outcomes, and resource utilization; (2) refine calibration and threshold selection tailored to institutional workflows to mitigate alarm fatigue; (3) expand validation across additional geographies and healthcare systems; (4) incorporate social determinants and address bias and health disparities; and (5) improve handling of missingness and standardization of data capture to enhance short-term risk responsiveness.

Limitations

- Data quality and missingness: Real-world EHRs exhibited pervasive missing data, particularly short-term vitals in the TriNetX cohort, limiting near-term risk reactivity; multiple imputation (MICE) was used but residual bias may remain.
- Outcome timing: Exact dates of death were not always available (privacy restrictions), requiring approximations that may underestimate performance or introduce timing uncertainty.
- Baseline implementations: Some comparator scores (e.g., SOFA, COVER-F) required assumptions due to unavailable components, potentially affecting comparative performance.
- Unmeasured confounding: Do-not-resuscitate (DNR) status and social determinants were not available; mortality may partially reflect treatment limitation decisions, complicating causal interpretation and potentially introducing bias.
- Heterogeneity across sites: Differences in data collection practices, treatment policies, and patient populations across federated networks may affect generalizability despite external validation.
- Non-hospitalized subgroup: Higher missingness and differing care patterns reduced performance compared to hospitalized populations.
- Label limitations: Potential under-ascertainment of deaths occurring outside participating institutions could bias outcome labeling.
