Introduction
The COVID-19 pandemic significantly increased the number of patients presenting to emergency departments (EDs) with respiratory illnesses. Accurate and rapid triage is crucial for effective clinical decision-making and resource allocation, particularly during a pandemic when resources are strained. ED triage is challenging under normal conditions, and the complexities of COVID-19 make it harder. Data-driven risk evaluation using artificial intelligence (AI) offers a potential way to streamline the process.
Chest X-ray imaging is a readily available and cost-effective first-line triage tool for COVID-19 patients, providing valuable information about pulmonary disease. While other modalities like CT scans offer higher resolution, X-rays are less costly, expose patients to lower radiation doses, and are easier to obtain without risking contamination of equipment. Importantly, X-ray findings in COVID-19 patients have shown similarity to those in CT scans.
Existing research using AI and COVID-19 imaging data has predominantly focused on diagnosis rather than prognosis. However, prognostic models predicting mortality, morbidity, and other outcomes are highly valuable for several clinical applications, such as consistent patient triage, alerting bed management teams, and resource allocation. Previous machine learning approaches for COVID-19 prognosis have mainly utilized routinely collected clinical variables (vital signs, lab tests). Some studies have explored scoring systems for chest X-rays to assess disease severity through deep learning or manual clinical evaluation. However, the use of deep learning for prognosis using chest X-rays in COVID-19 patients, and the combined use of imaging and clinical variables, remain under-explored.
This retrospective study developed an AI system for automatic deterioration risk evaluation that integrates chest X-ray imaging with routinely collected clinical variables. The aim was to support critical clinical decision-making in the ED and make patient triage more efficient. The system employs a deep convolutional neural network (COVID-GMIC) for X-ray analysis and a gradient boosting model (COVID-GBM) for clinical variables, and combines their predictions into an overall risk assessment.
Literature Review
The existing literature extensively covers the use of AI for COVID-19 diagnosis from imaging data, but relatively few studies have focused on prognosis using chest X-rays. Studies using machine learning for COVID-19 prognosis have relied primarily on clinical variables such as vital signs and lab tests, which are established predictors of deterioration. While some research has explored automated deep learning scoring systems that assess lung involvement on chest X-ray images, an AI system integrating both imaging and clinical data for prognosis has remained largely unaddressed. The current study addresses this gap by using both data modalities to improve predictive accuracy and clinical utility.
Methodology
The AI system was developed and evaluated using a dataset of 19,957 chest X-ray exams from 4,722 COVID-19 patients at NYU Langone Health (March 3, 2020 to May 13, 2020). The dataset included chest X-ray images and clinical variables (vital signs, lab tests, demographics) collected near the time of image acquisition. Inclusion and exclusion criteria, defined with clinical experts, were applied to ensure data quality and relevance to the prediction task. The final dataset consisted of 7,502 exams (4,204 unique patients). The training set contained 5,224 exams (2,943 patients) and the test set contained 770 exams (718 patients), with no patient appearing in both.
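A split with no patient overlap is usually obtained by partitioning patients rather than exams. The following is a minimal sketch of that idea, not the authors' procedure: the function name `split_by_patient`, the `(exam_id, patient_id)` record layout, and the `test_fraction` parameter are all assumptions made for illustration.

```python
import random


def split_by_patient(exams, test_fraction=0.2, seed=0):
    """Split exam records so that no patient appears in both sets.

    `exams` is a list of (exam_id, patient_id) tuples. Both field names are
    hypothetical stand-ins for whatever identifiers the real data uses.
    """
    # Partition the unique patients, not the individual exams.
    patient_ids = sorted({patient_id for _, patient_id in exams})
    rng = random.Random(seed)
    rng.shuffle(patient_ids)

    n_test = max(1, round(len(patient_ids) * test_fraction))
    test_patients = set(patient_ids[:n_test])

    # Every exam follows its patient into exactly one of the two sets.
    train = [exam for exam in exams if exam[1] not in test_patients]
    test = [exam for exam in exams if exam[1] in test_patients]
    return train, test
```

Because patients contribute different numbers of exams, the exam-level proportions of the two sets will generally differ from `test_fraction`.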
The system comprises two models:
1. **COVID-GMIC (Chest X-ray):** A deep convolutional neural network based on the Globally-Aware Multiple Instance Classifier (GMIC) architecture. COVID-GMIC is designed for both accuracy and interpretability, generating saliency maps highlighting image regions influencing predictions. It employs a global module for overall image understanding and a local module for detailed analysis of specific regions of interest (ROIs) identified by the saliency maps. These are then fused to produce a final prediction.
2. **COVID-GBM (Clinical Variables):** A gradient boosting model trained on routinely collected clinical variables (vital signs, lab results, demographics). Feature engineering was performed on laboratory results to include minimum and maximum values within 12 hours of the vital sign measurement.
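The laboratory feature engineering described for COVID-GBM can be sketched as follows. This is an illustrative assumption about the data layout, not the study's code: `lab_min_max`, the `(lab_name, timestamp, value)` record format, and the choice to treat "within 12 hours" as a symmetric window around the reference time are all hypothetical.

```python
from datetime import datetime, timedelta


def lab_min_max(lab_records, reference_time, window_hours=12):
    """Return {lab_name: (min_value, max_value)} for results near a reference time.

    lab_records: iterable of (lab_name, timestamp, value) tuples (assumed layout).
    reference_time: time of the vital sign measurement.
    The window is taken as symmetric (before or after), which is an assumption.
    """
    window = timedelta(hours=window_hours)
    grouped = {}
    for name, timestamp, value in lab_records:
        # Keep only results whose timestamp falls inside the window.
        if abs(timestamp - reference_time) <= window:
            grouped.setdefault(name, []).append(value)
    return {name: (min(values), max(values)) for name, values in grouped.items()}
```

Labs with no result inside the window are simply absent from the output, so a downstream model would need its own strategy for missing values.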
Both models predict deterioration risk within 24, 48, 72, and 96 hours. Their predictions were combined using a weighted ensemble (optimal weights determined through validation), creating a multi-modal system that leverages complementary information from both data sources. A separate model, COVID-GMIC-DRC, a modified version of COVID-GMIC, was used to generate deterioration risk curves (DRCs), predicting the probability of deterioration over time (3, 12, 24, 48, 72, 96, 144, and 192 hours).
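A weighted ensemble of two probability outputs, with the weight chosen on validation data, might look like the sketch below. The summary does not specify the weight grid or the selection metric, so the grid of evenly spaced weights and the use of AUC as the criterion are assumptions, as are the names `auc`, `ensemble`, and `pick_weight`.

```python
def auc(labels, scores):
    """Area under the ROC curve as P(positive score > negative score); ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def ensemble(image_probs, clinical_probs, weight):
    """Convex combination: weight on the imaging model, (1 - weight) on the clinical model."""
    return [weight * a + (1 - weight) * b for a, b in zip(image_probs, clinical_probs)]


def pick_weight(labels, image_probs, clinical_probs, steps=10):
    """Return the weight in {0, 1/steps, ..., 1} with the highest validation AUC."""
    candidates = [i / steps for i in range(steps + 1)]
    return max(
        candidates,
        key=lambda w: auc(labels, ensemble(image_probs, clinical_probs, w)),
    )
```

In a toy case where each model ranks a different positive case poorly, an intermediate weight can rank all cases correctly even though neither model does alone, which is the sense in which the two modalities carry complementary information.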
Model training involved hyperparameter optimization using random search and Monte Carlo cross-validation. Performance was evaluated using the area under the receiver operating characteristic curve (AUC) and the area under the precision-recall curve (PR AUC), along with positive and negative predictive values (PPV and NPV). A reader study compared COVID-GMIC's performance with that of two radiologists (3 and 17 years of experience) on a subset of 200 chest X-rays from the test set. A preliminary version of the deep neural network, using only chest X-rays, was also silently deployed at NYU Langone Health for prospective validation, where it ran in real time within the hospital's system.
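Random search combined with Monte Carlo cross-validation can be sketched generically as below. Monte Carlo cross-validation draws several independent random train/validation splits instead of fixed folds. The helper names, the number of splits, the validation fraction, and the callback signatures `sample_config(rng)` and `score_config(config, train, val)` are all hypothetical; in a setting like this study the items being split would presumably be patients rather than exams.

```python
import random


def monte_carlo_splits(items, n_splits=3, val_fraction=0.2, seed=0):
    """Yield (train, val) lists from repeated independent random splits."""
    rng = random.Random(seed)
    n_val = max(1, round(len(items) * val_fraction))
    for _ in range(n_splits):
        shuffled = list(items)
        rng.shuffle(shuffled)
        yield shuffled[n_val:], shuffled[:n_val]


def random_search(sample_config, score_config, items, n_trials=20, seed=0):
    """Return (best_config, best_mean_score) over randomly sampled configurations.

    sample_config(rng) draws one hyperparameter configuration.
    score_config(config, train, val) trains on `train` and returns a
    validation score where higher is better.
    """
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = sample_config(rng)
        # Reusing the same seed gives every configuration the same splits.
        scores = [
            score_config(config, train, val)
            for train, val in monte_carlo_splits(items, seed=seed)
        ]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_config, best_score = config, mean_score
    return best_config, best_score
```

Scoring every configuration on identical splits keeps the comparison between configurations from being confounded by split-to-split variation.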
Key Findings
The ensemble model (COVID-GMIC + COVID-GBM) achieved the best performance for predicting deterioration within 96 hours, with an AUC of 0.786 (95% CI: 0.745–0.830) and a PR AUC of 0.517 (95% CI: 0.429–0.600) on the test set. The COVID-GMIC model, using chest X-rays alone, achieved comparable AUC to two experienced radiologists in a reader study, and even outperformed them for time windows exceeding 24 hours. The COVID-GMIC-DRC model effectively discriminated between patients based on the time of the first adverse event (concordance index of 0.713 at 96 hours). The model also demonstrated good calibration.
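For readers unfamiliar with the concordance index, the sketch below implements the standard Harrell's C for right-censored data: among comparable patient pairs, it is the fraction in which the patient with the earlier observed event was assigned the higher risk. This is the textbook definition and may differ from the time-dependent variant used in the study; the function and argument names are illustrative.

```python
def concordance_index(event_times, risk_scores, event_observed):
    """Harrell's concordance index for right-censored outcomes.

    A pair (i, j) is comparable when patient i has an observed event and
    that event occurs strictly before patient j's event or censoring time.
    The pair is concordant when patient i also has the higher risk score;
    tied risk scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(event_times)
    for i in range(n):
        if not event_observed[i]:
            continue  # a censored patient cannot serve as the earlier event
        for j in range(n):
            if event_times[i] < event_times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the reported 0.713 indicates that higher predicted risk usually, but not always, corresponded to earlier deterioration.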
In the prospective silent validation at NYU Langone Health, the preliminary deep neural network ran in real time and produced results comparable to those on the retrospective test set. The system processed each image in approximately 2 seconds without requiring GPUs. The most predictive features in the COVID-GBM model included temperature and age. The saliency maps generated by COVID-GMIC highlighted clinically relevant regions in the chest X-ray images, such as airspace opacities and consolidation. Combining clinical variables and imaging data improved prediction performance over either modality alone, suggesting that the two modalities carry complementary information.
Discussion
The study successfully demonstrates the potential of a multi-modal AI system to predict COVID-19 patient deterioration in the ED setting. The system's ability to integrate chest X-ray images and clinical variables provides a more comprehensive risk assessment than using either modality alone. The results of the reader study highlight that the AI system can perform at a level comparable to experienced radiologists in assessing the risk of deterioration, suggesting a valuable clinical tool. The findings address the need for efficient triage, particularly important during pandemics or other situations with limited resources. The system's real-time performance and relatively low computational demands demonstrated through the silent deployment make it practical for integration into existing clinical workflows.
Conclusion
This research presents a novel AI system for predicting COVID-19 patient deterioration, combining chest X-ray analysis and clinical variables. The system demonstrated strong performance in both retrospective and prospective validations, showing comparable or superior results to radiologists in some scenarios. Its real-time capabilities and interpretability highlight its potential for improving patient care and resource allocation in emergency settings. Future research could focus on incorporating longitudinal image data, expanding the clinical variables included in the model, and conducting more extensive external validation across diverse hospital settings and patient populations.
Limitations
The study's limitations include the use of data from a single institution (NYU Langone Health), potentially affecting generalizability. The silent deployment utilized only the chest X-ray model, excluding clinical variables and intervention data. The COVID-GMIC-DRC model also lacked clinical variables due to calibration challenges with gradient boosting models. Future studies should address these limitations through external validation and model enhancements.