Early triage of critically ill COVID-19 patients using deep learning

W. Liang, J. Yao, et al.

This study showcases a groundbreaking deep learning survival model that accurately predicts the risk of COVID-19 patients progressing to critical illness based on their clinical characteristics at admission. Developed by a diverse team of researchers, this model not only demonstrates impressive validation results across multiple cohorts but also features an online tool that aids in timely patient triage and resource allocation.

Introduction
COVID-19 can progress rapidly to critical illness in a subset of patients: approximately 6.5% develop critical illness, and mortality in this group is high. Early identification of high-risk patients at admission is crucial for timely treatment and efficient allocation of healthcare resources. Traditional survival analysis models, such as the Cox proportional hazards model, assume that covariates act linearly on the log hazard, which may be insufficient for complex clinical events such as progression to critical illness. Advances in deep learning and the availability of large-scale clinical data motivate integrating deep neural networks with survival analysis to improve prognostic accuracy. This study aims to develop and validate a deep learning-based survival model that predicts the risk of hospitalized COVID-19 patients developing critical illness from routinely available clinical data at admission, and to translate the model into a practical triage tool.
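For reference, the standard Cox model writes the hazard for a patient with covariate vector x as a baseline hazard scaled by the exponential of a linear predictor; it is this linear predictor that the deep survival approach replaces with a learned nonlinear function:

```latex
% Standard Cox proportional hazards model: the log-hazard is linear in x
h(t \mid x) = h_0(t)\,\exp\!\left(\beta^{\top} x\right)
% A deep survival model substitutes a learned nonlinear function f_\theta(x)
% for the linear predictor \beta^{\top} x.
```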
Literature Review
The paper situates its contribution against established methods in survival analysis, particularly the Cox proportional hazards model (CPHM), widely used for prognostic modeling but limited by linearity assumptions. It references successful applications of deep learning in medical imaging and diagnosis (e.g., CNNs for skin cancer) and prior work integrating deep learning with Cox models (e.g., DeepSurv). It also contrasts performance with clinical scoring systems such as CURB-65, used in community-acquired pneumonia, highlighting the need for COVID-19-specific risk prediction leveraging nonlinear relationships and potentially time-dependent covariates.
Methodology
Study design: A retrospective multicenter cohort was assembled under the National Clinical Research Center for Respiratory Disease with data from laboratory-confirmed hospitalized COVID-19 cases reported to China's National Health Commission between 2019-11-21 and 2020-01-31. Only cases confirmed by PCR or high-throughput sequencing were included. Critical illness was defined as a composite of ICU admission, invasive ventilation, or death. The primary training dataset included 1590 patients (131 of whom developed critical illness) from 575 medical centers. Three independent cohorts were collected for external validation: Wuhan (Hankou Hospital), other cities in Hubei province (excluding Wuhan), and Guangdong (Foshan Hospital). Ethical approval was obtained, and informed consent was waived.

Data processing: Experienced clinicians extracted and cross-checked demographic, clinical, laboratory, and radiologic information from electronic medical records into a structured database. Radiologic assessments (X-ray or CT) were abstracted from charts or reviewed when images were available. Missing data were handled via multivariate imputation by chained equations (MICE). Only features with at least 60% completeness were considered.

Feature selection: From 74 baseline clinical features, a Cox model with a LASSO penalty was used for variable selection. K-fold cross-validation determined the LASSO penalty λ that minimized the cross-validated partial-likelihood deviance. Ten predictors with significant associations (p < 0.05) were selected: chest X-ray abnormality, age, dyspnea, COPD, number of comorbidities, cancer history, neutrophil-to-lymphocyte ratio, lactate dehydrogenase (LDH), direct bilirubin, and creatine kinase.

Model development: A three-layer feedforward neural network (deep survival model) with two hidden fully connected layers (tanh activations) and dropout was trained to predict a risk score consistent with the Cox partial likelihood framework; the loss optimized was the negative partial log-likelihood. Training used the Adam optimizer, and hyperparameters (layer sizes, learning rate, dropout rate, epochs) were tuned via Bayesian hyperparameter optimization. The outputs of the deep survival network were combined with the LASSO-selected features in an integrated Cox-based framework termed the Deep Learning Survival Cox model.

Evaluation: The training cohort was split 80/20 into training and internal validation sets with balanced distributions. Discrimination was assessed by the concordance index (C-index) and the area under the ROC curve (AUC). External validation used the three independent cohorts; performance was also evaluated on subsets excluding patients missing three or more variables (Ex3). Risk thresholds were defined at 95% sensitivity/specificity operating points to stratify patients into low, medium, and high risk. Longitudinal risk monitoring used repeated follow-up exam data (CT and labs) for a subset of patients to compute dynamic 30-day critical illness risk. An online triage tool was implemented to return personalized nomograms and 5-, 10-, and 30-day risk estimates.
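To make the model-development step concrete, below is a minimal PyTorch sketch of a two-hidden-layer risk-score network with tanh activations and dropout, trained with the negative Cox partial log-likelihood. The layer sizes, dropout rate, learning rate, and synthetic data are placeholders (the authors tuned these via Bayesian optimization and their code is not reproduced here), so this is an illustration of the technique rather than the published implementation.

```python
import torch
import torch.nn as nn


class DeepSurvivalNet(nn.Module):
    """Risk-score network with two tanh hidden layers and dropout.

    Hidden size and dropout rate are placeholders; the paper tuned these
    via Bayesian hyperparameter optimization.
    """

    def __init__(self, n_features: int, hidden: int = 32, dropout: float = 0.3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden), nn.Tanh(), nn.Dropout(dropout),
            nn.Linear(hidden, hidden), nn.Tanh(), nn.Dropout(dropout),
            nn.Linear(hidden, 1),  # scalar log-risk score per patient
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)


def neg_cox_partial_log_likelihood(log_risk, time, event):
    """Negative Cox partial log-likelihood (Breslow handling of ties).

    log_risk: (N,) model outputs; time: (N,) days to event or censoring;
    event: (N,) 1.0 if critical illness was observed, else 0.0.
    """
    order = torch.argsort(time, descending=True)         # risk sets become cumulative prefixes
    log_risk, event = log_risk[order], event[order]
    log_risk_set = torch.logcumsumexp(log_risk, dim=0)   # log of summed exp(risk) over each risk set
    log_lik = ((log_risk - log_risk_set) * event).sum()
    return -log_lik / event.sum().clamp(min=1.0)


# Illustrative training loop on synthetic data only.
torch.manual_seed(0)
X = torch.randn(128, 10)                  # ten admission features, standardized
time = torch.rand(128) * 30               # follow-up time in days
event = (torch.rand(128) < 0.15).float()  # ~15% observed critical-illness events
model = DeepSurvivalNet(n_features=10)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(50):
    optimizer.zero_grad()
    loss = neg_cox_partial_log_likelihood(model(X), time, event)
    loss.backward()
    optimizer.step()
```

In the paper, the network's output is further combined with the LASSO-selected features in a Cox-based framework (the Deep Learning Survival Cox model); that integration step is not shown in this sketch.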
Key Findings
- Predictors: Ten admission features were selected (X-ray abnormality, age, dyspnea, COPD, number of comorbidities, cancer history, neutrophil-to-lymphocyte ratio, LDH, direct bilirubin, creatine kinase) as significant predictors of progression to critical illness.
- Internal validation: The Deep Learning Survival Cox model achieved a C-index of 0.894 (95% CI 0.857–0.930) and an AUC of 0.911 (95% CI 0.875–0.945), outperforming the classic Cox model (C-index 0.876; AUC 0.889) and CURB-65 (C-index 0.75, 95% CI 0.70–0.80); a sketch of how such metrics are computed follows this list.
- Risk stratification (training cohort): Patients grouped into low (n=875), medium (n=560), and high risk (n=155) had actual critical-illness event probabilities of 0.9%, 7.3%, and 52.9%, respectively, with significant separation on Kaplan–Meier curves.
- External validation (all cases): Wuhan cohort (n=940) AUC 0.881 (95% CI 0.845–0.905), C-index 0.878 (95% CI 0.852–0.903); Hubei cohort (n=389) AUC 0.819 (95% CI 0.623–0.978), C-index 0.769; Guangdong cohort (n=73) AUC 0.967 (95% CI 0.965–1.000), C-index 0.963 (95% CI 0.960–1.000). Performance generally improved when excluding cases missing three or more variables (Ex3): Wuhan AUC 0.893, C-index 0.890; Hubei AUC 0.888, C-index 0.882; Guangdong unchanged (no missing data).
- Longitudinal monitoring: In 457 patients with follow-up exams, prediction performance at follow-up time points exceeded admission performance (AUC 0.960, C-index 0.935 vs admission AUC 0.881, C-index 0.878), indicating that risk estimates become more accurate as the time of assessment approaches the event.
- False-negative analysis: Among 106 critical cases across the external sets, only two were classified as low risk; both had substantial missing data and observed values similar to non-critical cases (no X-ray abnormality, no dyspnea, no comorbidities), suggesting reasonable classification given the observed data.
- Practical tool: An online triage calculator provides personalized nomograms and 5-, 10-, and 30-day critical-illness probabilities; an example nomogram indicates a high-risk classification with 209 total points and probabilities of 0.58, 0.62, and 0.69 at 5, 10, and 30 days, respectively.
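As an illustration of how the discrimination metrics and the three-tier triage reported above can be computed, here is a brief sketch using lifelines and scikit-learn; the function names, the 30-day horizon, and the handling of censoring and cut-offs are assumptions made for the example, not the authors' exact procedure.

```python
import numpy as np
from lifelines.utils import concordance_index
from sklearn.metrics import roc_auc_score


def evaluate(risk_score, time_to_event, event, horizon_days=30):
    """Discrimination metrics of the kind reported above (higher risk_score = higher risk)."""
    # concordance_index expects scores that rise with survival time, so negate the risk score.
    c_index = concordance_index(time_to_event, -risk_score, event)
    # AUC for the binary outcome "critical illness observed within the horizon";
    # for simplicity, patients censored before the horizon count as negatives here.
    label = (event == 1) & (time_to_event <= horizon_days)
    auc = roc_auc_score(label.astype(int), risk_score)
    return c_index, auc


def stratify(risk_score, low_cut, high_cut):
    """Three-tier triage from two cut-offs (the paper chose operating points
    near 95% sensitivity / specificity on the training cohort)."""
    return np.where(risk_score < low_cut, "low",
                    np.where(risk_score < high_cut, "medium", "high"))
```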
Discussion
The study demonstrates that integrating deep learning with Cox survival analysis captures nonlinear relationships among clinical covariates, yielding superior discrimination over the classic Cox model and over a generic clinical score (CURB-65). The identified predictors align with known COVID-19 risk factors, including older age, respiratory symptoms, radiographic abnormalities, lymphopenia (reflected via NLR), and comorbidities such as COPD and cancer. The model generalizes across regions with different healthcare resource constraints and maintains robustness to some missing data, and its dynamic application improves predictive performance during hospitalization. Clinically, the stratification into low, medium, and high risk can guide early triage, prioritize monitoring and interventions for high-risk patients, and improve resource allocation. The analysis of rare false negatives suggests that observed-data-driven predictions were reasonable, supporting the model’s utility in real-world settings.
Conclusion
A deep learning–enhanced survival model using routinely available admission data can accurately predict progression to critical illness in hospitalized COVID-19 patients, outperforming traditional Cox modeling and standard clinical scoring systems. The approach supports early triage and resource allocation and can monitor risk longitudinally during hospitalization. Future work should include prospective validation, broader international external validation, incorporation of time-dependent covariates and imaging data (CT/X-ray) into the model, refinement of handling missingness, and continued evaluation of clinical impact through deployment of the online triage tool.
Limitations
- Missing data: Over half of the patients lacked complete feature sets; although the model tolerated missingness (and the tool allows fewer than three missing variables), missing data can impact performance, especially in smaller or resource-limited hospitals.
- Retrospective design and potential non-response bias: The cohort was compiled from hospitals reporting to the NHC; despite broad coverage (31/34 provinces), non-response bias could not be fully excluded.
- Generalizability: External validation was within China; broader international validation is needed. Some external cohorts had small event counts (e.g., Guangdong with 3 critical cases), leading to wide confidence intervals.
- Scope of inputs: The deployed model uses clinical and laboratory variables at admission; imaging and richer time-dependent data were not integrated into the final admission model.
- Data access: The dataset is not publicly available due to policy constraints, limiting independent replication beyond collaboration.
- Imputation/model assumptions: MICE was used for missing data; model performance may be sensitive to imputation choices and to measurement variability across centers.