A validated, real-time prediction model for favorable outcomes in hospitalized COVID-19 patients

N. Razavian, V. J. Major, et al.

This research by Narges Razavian and colleagues presents a real-time model, integrated into the EHR, that predicts with high precision which hospitalized COVID-19 patients will have a favorable outcome within the next 96 hours, with the aim of supporting discharge planning and resource management.

Introduction

This study addresses a pressing operational and clinical need during the COVID-19 pandemic: identifying hospitalized patients who are at low risk of adverse events and therefore likely to have favorable outcomes within the next 96 hours. The goal is to aid safe discharge and bed management during surges by predicting favorable trajectories rather than deterioration. The authors hypothesize that a parsimonious, interpretable model using real-time EHR data (vital signs, labs, and oxygen support) can accurately and prospectively predict favorable outcomes and be integrated into clinical workflows to support decision-making.

Literature Review

The authors note at least 20 peer-reviewed prognostic models for COVID-19 published early in the pandemic, using demographics, clinical data, and imaging to predict adverse outcomes (e.g., ICU transfer, intubation, death). Most were developed in China, with a few from the U.S., South Korea, and Europe. Only a minority underwent external or held-out validation, even fewer were prospectively validated, and none reported clinical implementation or focused on predicting favorable outcomes. This gap motivates a model tailored to identifying low-risk patients with real-world deployment.

Methodology

  • Design: Two-stage modeling framework with retrospective development/validation followed by prospective validation and EHR implementation.
  • Cohorts: The retrospective cohort included all COVID-19-positive adult hospitalizations across four hospitals from 03/03/2020 to 04/26/2020 (3317 unique patients; 3345 admissions; 28,431 prediction instances across splits). The prospective evaluation covered 05/15/2020 to 05/28/2020, with predictions generated every 30 minutes (445 patients; 474 admissions; 109,913 prediction instances).
  • Outcome: A favorable outcome was defined as the absence of adverse events within 96 hours of prediction. Adverse events included death or discharge to hospice; ICU admission; significant oxygen support (mechanical ventilation; non-invasive positive pressure ventilation, including BiPAP/CPAP; high-flow nasal cannula; face mask, including partial and non-rebreather; or nasal cannula >6 L/min); and, for discharged patients, ED re-presentation or readmission within 96 hours.
  • Predictors: Demographics (age, sex, race/ethnicity, smoking history); labs (neutrophil, lymphocyte, and eosinophil counts and percentages; platelet count and volume; BUN; creatinine; D-dimer; ferritin; LDH; procalcitonin); vital signs aggregated over the prior 12 hours (minimum and maximum of heart rate, respiratory rate, SpO2, and temperature); weight and BMI; oxygen support (room air, nasal cannula with maximum flow rate over the prior 12 hours, or devices beyond nasal cannula); and current length of stay. Real-time availability constraints emphasized inpatient data collected during the admission.
  • Missing data: Retrospective prediction instances with completely missing vitals and no prior measurements (4%) were excluded; remaining missing aggregated vitals (<2.3%) were forward-filled, reducing missingness to <0.02%. Labs such as D-dimer had up to 10% missingness; remaining missing labs and weight/BMI were forward-filled or set to default values, and alternative imputation approaches showed no performance benefit.
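The forward-fill-with-defaults scheme used for vitals and labs can be sketched in a few lines (an illustrative sketch, not the authors' code; the function name and default value are assumptions):

```python
def forward_fill(values, default=0.0):
    """Carry the last observed measurement forward within one admission;
    fall back to a default when nothing has been observed yet (the study
    reports that defaults performed as well as model-based imputation)."""
    out, last = [], None
    for v in values:
        if v is not None:
            last = v
        out.append(last if last is not None else default)
    return out

# One admission's 12-hour SpO2 minima, with two missing aggregates:
print(forward_fill([94.0, None, 92.0, None]))  # → [94.0, 94.0, 92.0, 92.0]
```

Applied per admission, this keeps each prediction instance complete without leaking information across patients.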
  • Real-time missing data: In deployment, missing required inputs suppressed score generation and displayed a “Missing Data” placeholder.
  • Development and validation splits: Retrospective data were split by patient into training (60%; 1990 patients; 17,614 instances), validation (20%; 663 patients; 4903 instances), and held-out test (20%; 664 patients; 5914 instances) sets.
  • Stage 1 (black-box models): Trained logistic regression (L1/L2 regularization with L-BFGS), random forest (scikit-learn; tuned depth and minimum samples; feature subsampling at the square root of the feature count), LightGBM (gradient-boosted decision trees; tuned number of trees, regularization, subsampling, and learning rate, with class-imbalance handling), and an ensemble averaging their predicted probabilities. Model selection used AUROC and AUPRC on the validation set.
  • Stage 2 (parsimonious model): Conditional independence tests quantified each variable’s added information beyond the others (p-value threshold 0.2 with multiple-testing correction) to select features. A logistic regression with quantile normalization was built on the selected variables; individual conditional expectation (ICE) plots guided the linearization of non-linear relationships (e.g., U-shaped patterns) by splitting variables at derived cut points. Ablation removed non-contributing variables, ultimately excluding age, BMI, and maximum SpO2 from the final linear model despite their initial selection. The final parsimonious model was an elastic-net-regularized logistic regression tuned by grid search to maximize average AUROC and AUPRC on the validation data. Coefficients and the intercept are reported (Table 2); positive coefficients are associated with favorable outcomes, negative coefficients with decreased likelihood.
  • EHR implementation: Deployed within Epic’s cloud platform to compute scores every 30 minutes for eligible inpatients with active COVID-19 flags. Scores were color-coded: green (low risk) at a threshold targeting 90% PPV (53% sensitivity on the held-out set), orange at ~80% PPV, and red for the highest risk.
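One way such a PPV-targeted cutoff could be derived on validation data is to scan candidate thresholds from the highest score down and keep the lowest cutoff that still meets the target precision (a sketch under that assumption, not the authors' implementation):

```python
def ppv_targeted_threshold(probs, labels, target_ppv=0.90):
    """Return the lowest score cutoff at which precision for the
    favorable class (labels: 1 = favorable, 0 = adverse event) still
    meets the target PPV; scores >= the cutoff would be flagged green."""
    best = None
    tp = n = 0
    for p, y in sorted(zip(probs, labels), reverse=True):
        n += 1
        tp += y
        if tp / n >= target_ppv:
            best = p
    return best
```

A lower cutoff admits more patients into the green band (higher sensitivity) at the cost of precision, which is the trade-off behind the separate green and orange thresholds.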
  • Display: Predictions and explanations (feature contributions and recent score trends) were embedded in a patient-list column and a COVID-19 summary report. For display clarity, the raw probability of a favorable outcome was inverted and scaled to 0–100, so that lower displayed scores represent lower adverse-event risk.
  • Prospective evaluation: After model parameters were locked, prospective predictions and outcomes were collected; AUROC, AUPRC, and the PPV and sensitivity at the predefined green threshold were computed with bootstrap confidence intervals (100 iterations; 50% resampling). Additional analyses examined the timing of the first green score relative to admission and discharge among patients discharged alive in the held-out set.
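The bootstrap procedure (100 iterations, 50% resampling) can be sketched as follows; `metric` stands in for AUROC or AUPRC, and the percentile bounds are an assumption about how the interval was formed:

```python
import random

def bootstrap_ci(scores, labels, metric, n_iter=100, frac=0.5, seed=0):
    """Percentile bootstrap: repeatedly resample a fraction of the
    prediction instances with replacement, recompute the metric, and
    report the 2.5th/97.5th percentiles of the resulting distribution."""
    rng = random.Random(seed)
    n = len(scores)
    k = int(n * frac)
    stats = []
    for _ in range(n_iter):
        idx = [rng.randrange(n) for _ in range(k)]
        stats.append(metric([scores[i] for i in idx],
                            [labels[i] for i in idx]))
    stats.sort()
    return stats[int(0.025 * n_iter)], stats[min(n_iter - 1, int(0.975 * n_iter))]
```

With 109,913 prediction instances, even 50% resamples are large, which is consistent with the very tight prospective confidence intervals reported below.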

Key Findings
  • Retrospective performance: High discrimination and precision. Reported average precision (AUPRC) ~88.6% (95% CI: 88.4–88.7) and AUROC ~95.1–95.2% on held-out data.
  • Prospective performance: AUROC 90.8% (95% CI: 90.8–90.8) and AUPRC 86.8% (95% CI: 86.8–86.9) across 109,913 predictions for 445 patients (474 admissions).
  • Threshold-based operating point (prospective): Using the pre-specified green threshold, 41.0% of predictions were green, with PPV 93.3% and sensitivity 67.8% (versus 90% PPV and 53% sensitivity on the retrospective held-out set). Observed favorable-outcome rates were 93.3% for green, 72.4% for orange, and 23.5% for red predictions prospectively.
  • Clinical timing: Among held-out patients discharged alive, 77.8% (361/464) had at least one green score; the first green score occurred a median of 3.2 days (IQR 1.4–5.4) before discharge. 91.4% (330/361) of these patients never required ICU care.
  • Adoption: Model integrated into the EHR with clinician-facing displays; preliminary usage metrics indicated clinicians accessed patient lists and the COVID-19 Summary page containing model scores and explanations, suggesting incorporation into workflows.
  • Feature insights: Parsimonious model emphasized real-time indicators including oxygen support status/flow, SpO2 minimum, RR/HR aggregates, temperature, inflammatory markers (CRP, LDH), BUN, and platelets. Some commonly cited risk markers (age, lymphocyte count, D-dimer, sex) did not improve the final model when combined with other variables.
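The operating-point figures above combine two standard quantities, computable directly from green flags and observed outcomes (a sketch with hypothetical inputs):

```python
def green_operating_point(is_green, is_favorable):
    """PPV = share of green predictions followed by a favorable outcome;
    sensitivity = share of favorable outcomes that were flagged green."""
    tp = sum(1 for g, f in zip(is_green, is_favorable) if g and f)
    ppv = tp / sum(is_green)
    sensitivity = tp / sum(is_favorable)
    return ppv, sensitivity
```

Because the cutoff was fixed before the prospective period, these two numbers measure real-world calibration of the green band rather than a retrospectively optimized operating point.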

Discussion

Predicting favorable (low-risk) trajectories provides a practical decision support tool for discharge planning and resource management during COVID-19 surges, complementing rather than duplicating deterioration models. The two-stage approach yielded an interpretable, EHR-implementable model with strong retrospective and prospective performance, demonstrating real-world robustness as clinical practices evolved. Integration into standard clinical workflows and provision of transparent feature contributions likely enhanced clinician trust and adoption. The model’s focus on current physiologic status and oxygen support aligns with clinical reasoning for assessing near-term stability and discharge readiness. Prospective validation confirms generalizability over time within the health system, and early usage patterns suggest that the tool can aid clinicians in safely identifying patients suitable for lower levels of care or discharge.

Conclusion

The study presents a validated, parsimonious, and prospectively evaluated model that predicts favorable outcomes within 96 hours for hospitalized COVID-19 patients using readily available EHR data. The model achieves high AUROC and AUPRC, maintains performance prospectively, and was successfully implemented with clinician-facing explanations to support discharge decision-making. Future work includes a randomized controlled trial to quantify impact on length of stay and broader dissemination via EHR vendor infrastructure to evaluate generalizability across institutions and evolving care contexts.

Limitations
  • Data source and generalizability: Single health system; performance in other institutions and populations requires external validation despite parsimonious design.
  • Real-time data constraints: Some retrospective variables (e.g., codified comorbidities, prior medications) were not reliably available in real time and were excluded, potentially omitting informative context.
  • Measurement variability and missingness: Vital signs can be inaccurately documented; missing data required forward-filling or omission of some instances; real-time missing inputs prevented score generation.
  • Temporal drift: Rapidly evolving testing and treatment practices (e.g., increased D-dimer testing) may affect model inputs/outcomes over time.
  • Feature-outcome coupling: Oxygen support was both a predictor and part of the adverse outcome definition, potentially reinforcing associations; however, exclusion of this feature degraded overall performance.
  • Model form: Linear modeling may inadequately capture non-linear effects (e.g., U-shaped relationships), though ICE-guided transformations were applied.
  • Clinical nuance: The model cannot distinguish certain clinical contexts (e.g., BiPAP for chronic conditions versus acute respiratory failure) without clinician interpretation.
  • Impact evaluation: While prospectively validated for predictive performance, clinical outcome impact awaits results of the ongoing randomized controlled trial.