Medicine and Health

Prediction of mortality risk and duration of hospitalization of COVID-19 patients with chronic comorbidities based on machine learning algorithms

P. Amiri, M. Montazeri, et al.

This retrospective study by Parastoo Amiri and colleagues explores how machine learning can predict mortality risk and length of hospital stay in COVID-19 patients with chronic comorbidities, and how such predictions can support clinical decision-making and resource allocation in healthcare settings.

Introduction

The study addresses the need to rapidly predict mortality risk and hospital length of stay (LoS) in COVID-19 patients, particularly those with chronic comorbidities, who experience more severe disease, higher mortality, and potentially longer hospitalization. Accurate predictions can improve resource allocation (beds, staff) and support clinical decision-making. Prior work has applied ML to predict COVID-19 mortality and LoS, but it often did not focus on patients with chronic comorbidities, was restricted to specific inpatient groups (e.g., ICU only), or relied on traditional biostatistical approaches. This study aims to develop and compare ML models to predict mortality risk and LoS among COVID-19 patients with any chronic comorbidity and to identify the most important clinical variables.

Research questions: (1) Which ML algorithms best predict mortality risk? (2) Which ML algorithms best predict LoS? (3) Which clinical variables are most important for predicting mortality risk? (4) Which clinical variables are most important for predicting LoS?

Literature Review

The authors summarize multiple studies applying ML to COVID-19 outcomes. Prior mortality prediction studies used clinical and laboratory data with models such as Lasso, SVM, and other classifiers, reporting strong sensitivities and specificities (e.g., Korean nationwide cohort; studies in Denmark and the UK). LoS prediction has been explored using EHR-based ML models with moderate to high accuracy, including Random Forest and MLP in various settings (USA, Saudi Arabia, Iran). Systematic reviews identified age, sex, and comorbidities (HTN, DM) as key predictors of mortality and LoS. Several works focused on ICU cohorts or used standard biostatistical methods rather than ML. Studies also implicate comorbidities such as diabetes, asthma, cancer, hypertension, and cardiovascular diseases as significant predictors of adverse COVID-19 outcomes. This study extends prior work by focusing specifically on COVID-19 patients with any chronic comorbidity and comparing multiple ML algorithms for both mortality and LoS prediction using routinely available clinical features at admission.

Methodology

Design and setting: Retrospective single-center study at Afzalipour Hospital (main COVID-19 center in Kerman, Iran), covering March 2020 to January 2021.

Population: Adult patients (≥18 years) with RT-PCR-confirmed COVID-19 and at least one chronic comorbidity. Exclusions: patients <18 years and pregnant women.

Data sources: Hospital Information System and Electronic Health Records (EHRs), supplemented by paper records to complete missing clinical entries.

Variables: Demographics, chronic comorbidities, admission symptoms, discharge status (alive/dead), and length of stay (LoS). Based on the literature and input from two infectiologists, 26 features were selected as candidate predictors.

Outcomes: (1) Mortality status at discharge (binary: 0 dead, 1 alive); post-discharge mortality was not considered. (2) Hospital LoS (continuous).

Data preparation: Rows with missing data were identified in Excel and deleted. Features were normalized using StandardScaler (Z = (x − μ)/σ). Filter-based feature ranking was used to assess feature importance. To address class imbalance (null values under 5%), RandomOverSampler was used to oversample the minority class by random sampling with replacement. Data were split into 70% training and 30% test sets, ensuring no leakage between sets.

Modeling (mortality, classification): Base ML models included Naïve Bayes, K-Nearest Neighbors (K=1,3,5,10,15,30,50), Support Vector Machine (SVM), Multilayer Perceptron (MLP; e.g., hidden 512×256 with logistic activation), and Random Forest (10, 50, 100 trees; bagging with 100 iterations as base learner). Ensemble methods included Random Forest (n=100), AdaBoost, Gradient Boosting (n=50), HistGradientBoosting (max_iter=50), and Random Forest with Halving Grid Search.

Modeling (LoS, regression): Algorithms included MLP (architecture 32×1024×32, ReLU activation), ElasticNet, Support Vector Regression (SVR), Lasso, and Ridge.
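
The data preparation steps described above (z-score standardization, a 70/30 split, and random oversampling with replacement) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the helper names, the toy data, and the seeds are hypothetical, and the order of operations (split first, fit the scaler on the training rows only, oversample the training rows only) is one common way to avoid leakage that the summary does not confirm. In practice the study used scikit-learn's StandardScaler and RandomOverSampler rather than hand-written helpers.

```python
import random
import statistics


def train_test_split(rows, labels, test_fraction=0.3, seed=0):
    """Shuffle row indices once, then cut them into train and test parts."""
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    n_test = round(len(rows) * test_fraction)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    X_train = [rows[i] for i in train_idx]
    y_train = [labels[i] for i in train_idx]
    X_test = [rows[i] for i in test_idx]
    y_test = [labels[i] for i in test_idx]
    return X_train, y_train, X_test, y_test


def fit_scaler(rows):
    """Per-feature mean and population standard deviation (training rows only)."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns
    return means, stds


def transform(rows, means, stds):
    """Apply Z = (x - mean) / std to every feature of every row."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def random_oversample(rows, labels, seed=0):
    """Resample smaller classes with replacement until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for row, y in zip(rows, labels):
        by_class.setdefault(y, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for y, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(y)
    return out_rows, out_labels


# Hypothetical toy data: two features per patient; labels use the coding
# given in the summary (0 = dead, 1 = alive).
X = [[float(i), float(i % 3)] for i in range(10)]
y = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]

X_train, y_train, X_test, y_test = train_test_split(X, y)
means, stds = fit_scaler(X_train)
X_train_std = transform(X_train, means, stds)
X_test_std = transform(X_test, means, stds)  # reuse training statistics
X_bal, y_bal = random_oversample(X_train_std, y_train)
```

Fitting the scaler and the oversampler on the training rows only keeps duplicated or rescaled information from the test set out of model fitting, which is the usual meaning of "no leakage" in this setting.
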
Training details: MLP models were trained from scratch; for the classification MLP, training used epochs=100, batch size=10, and dropout=0.5, with Keras EarlyStopping monitoring accuracy and loss.

Model explainability: Local Interpretable Model-agnostic Explanations (LIME) was used to generate per-patient explanations (e.g., feature contributions for an example patient).

Performance evaluation: For mortality models, metrics included accuracy, precision, recall (sensitivity), specificity, F1 score, ROC curve, and AUC; confusion matrices were reported. For LoS models, Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE) were computed.

Software: Python (version 3.8 environment) with scikit-learn.

Ethics: Approved by the Kerman University of Medical Sciences ethics committee (IR.KMU.REC.1400.055); patient identifiers were concealed; consent was waived due to the retrospective design.
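
The evaluation metrics named above follow directly from a confusion matrix (classification) and from per-patient errors (regression). The sketch below shows the standard definitions in plain Python. The function names and the toy labels and LoS values are hypothetical, and treating class 1 as the positive class is an assumption: the summary gives the coding (0 dead, 1 alive) but does not say which class the authors treated as positive.

```python
import math


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, specificity and F1 from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(num, den):
        return num / den if den else 0.0  # avoid division by zero

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # also called sensitivity
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }


def regression_metrics(y_true, y_pred):
    """MSE, RMSE and MAE for a continuous outcome such as LoS in days."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae}


# Hypothetical toy predictions, for illustration only.
clf = classification_metrics([1, 1, 1, 0, 0, 1, 0, 1], [1, 0, 1, 0, 1, 1, 0, 1])
reg = regression_metrics([3, 7, 10, 5], [4, 6, 12, 5])
print(clf)
print(reg)
```

RMSE is expressed in the same unit as the outcome (days of stay), which makes it easier to interpret alongside MAE than MSE is.
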

Key Findings

- Cohort: 1291 COVID-19 inpatients with chronic comorbidities; 900 alive (69.6%), 391 dead (30.3%). Mean age: dead 66.2 years, alive 53.9 years. Males: 54.6%.
- Common symptoms at admission: shortness of breath 53.6%, fever 30.1%, cough 25.3%.
- Common chronic comorbidities: diabetes mellitus (DM) 31.3%, hypertension (HTN) 27.3%, ischemic heart disease (IHD) 14.2%.
- Features: 26 predictors selected from clinical records.
- Mortality prediction (base models, Table 3): SVM performed best with accuracy 80.00%, precision 77.68%, recall 79.66%, F1-score 78.66%, ROC AUC 0.85. MLP (hidden 512×256, logistic) achieved accuracy 79.61% and AUC 0.84. Naïve Bayes underperformed (accuracy 63.92%, F1 40.25%).
- Mortality prediction (ensembles, Table 4): Gradient Boosting (n=50) achieved the highest accuracy 84.15%, precision 83.86%, recall 84.15%, F1-score 83.64%, ROC AUC 0.79; other ensembles (Random Forest, AdaBoost, HistGradientBoosting) had accuracies ~82–83%.
- Overall average accuracy across classification models was reported as ~74.11%.
- LoS prediction (Table 5): MLP (32×1024×32, ReLU) was best with MSE 38.96, RMSE 6.24, MAE 4.34; SVR had RMSE 6.32, MAE 4.29; linear regularized models had slightly higher errors.
- Important predictors: Statements in the paper highlight (a) hyperlipidemia (HLP), diabetes, asthma, and cancer as important comorbidities for mortality, and shortness of breath as important for LoS; (b) elsewhere, hypertension was identified as most effective for mortality and asthma for LoS; shortness of breath and sore throat were noted as top symptoms for predicting mortality and LoS, respectively.
- Age was a strong predictor: older age was associated with higher mortality and longer LoS.
- Model explainability via LIME illustrated individual-level feature contributions (e.g., diarrhea positive association; chest tightness negative in an example for Gradient Boosting).
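
Some of the reported figures can be cross-checked against each other with simple arithmetic. The short sketch below only reuses numbers quoted in the findings above; the variable names are illustrative. It assumes the SVM F1-score is the harmonic mean of the reported precision and recall, which the summary does not state explicitly but which the numbers are consistent with.

```python
import math

# Cohort counts quoted above: the two outcome groups should sum to the total.
alive, dead, total = 900, 391, 1291
cohort_consistent = (alive + dead == total)

# LoS regression (MLP): RMSE should equal the square root of MSE.
mse = 38.96
rmse_from_mse = round(math.sqrt(mse), 2)  # 6.24, matching the reported RMSE

# SVM classifier: F1 as the harmonic mean of precision and recall (in percent).
precision, recall = 77.68, 79.66
f1_from_pr = round(2 * precision * recall / (precision + recall), 2)  # 78.66

print(cohort_consistent, rmse_from_mse, f1_from_pr)
```

The same harmonic-mean check is not expected to hold for the ensemble results if those were reported as class-weighted averages, so it is applied here to the SVM row only.
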

Discussion

The study demonstrates that ML models trained on routinely available demographic data, admission symptoms, and chronic comorbidity profiles can predict in-hospital mortality and LoS among COVID-19 patients with chronic comorbidities. Among base classifiers, SVM achieved the strongest discrimination (AUC 0.85), while Gradient Boosting ensembles provided the highest overall accuracy (84.15%) for mortality classification. For LoS, a deep MLP with ReLU activation minimized prediction error (MSE 38.96), outperforming linear regularized and SVR models. These findings address the research questions by identifying top-performing algorithms for both outcomes and highlighting key clinical variables, including age and specific comorbidities and symptoms. The results align with prior studies that found ML methods effective for COVID-19 risk stratification and that age and comorbidities such as hypertension, diabetes, asthma, and cancer are major determinants of adverse outcomes. Implementing such predictive tools can assist clinicians and administrators in triaging patients, anticipating resource needs (beds, staffing), focusing monitoring and interventions on high-risk cases, and potentially improving outcomes and reducing LoS. The use of explainability (LIME) supports interpretability and trust by clarifying feature contributions at the patient level.

Conclusion

ML algorithms can predict mortality risk and LoS in COVID-19 patients with chronic comorbidities using standard clinical and demographic data. In this cohort, Gradient Boosting (mortality classification) and MLP with ReLU (LoS regression) were the best-performing models. Advanced age is a key factor associated with increased mortality and longer hospitalization. Such models can support timely clinical interventions and resource allocation. Future research should use multi-center data, integrate laboratory and radiological biomarkers to improve model generalizability and performance, and include longitudinal monitoring of patients with chronic comorbidities who survive COVID-19.

Limitations

- Single-center retrospective design, potentially limiting generalizability despite the hospital serving as the largest COVID-19 center in Kerman province.
- Important prognostic factors, notably laboratory and radiological biomarkers, were not included; only usual clinical features at admission were used.
