Medicine and Health

Artificial intelligence for the detection of COVID-19 pneumonia on chest CT using multinational datasets

S. A. Harmon, T. H. Sanford, et al.

This study reports that deep learning algorithms trained on chest CT data from a multinational cohort of 1280 patients detected COVID-19 pneumonia with up to 90.8% accuracy (84% sensitivity, 93% specificity) in an independent test set of 1337 patients. The results suggest that AI could support rapid and accurate evaluation of chest CT across diverse clinical settings.

Introduction

The study addresses whether deep learning-based artificial intelligence can detect COVID-19 pneumonia on chest CT and distinguish it from other conditions across diverse populations and imaging protocols. In the COVID-19 pandemic, RT-PCR testing has limitations including delays and variable sensitivity reported as low as 60–70%, while CT can reveal characteristic lung findings such as peripheral ground-glass opacities, consolidations, and vascular enlargement, sometimes even when RT-PCR is negative or before symptoms develop. However, CT use for screening/diagnosis is controversial due to overlap with other pneumonias and concerns from professional societies. Prior AI approaches have shown feasibility but often lack generalizability due to single-center training. This work aims to train and evaluate robust AI models on multinational, multi-institutional CT datasets to achieve accurate and generalizable COVID-19 classification, potentially supporting triage and clinical decision-making in diverse settings.

Literature Review

The paper notes that CT has been reported to have high sensitivity for COVID-19 pneumonia and can detect disease when RT-PCR is negative or before symptom onset, aligning with international observations and consensus recommendations for CT use in specific clinical scenarios (e.g., respiratory distress, resource-constrained triage). Prior AI studies from single centers demonstrated feasibility for COVID-19 detection and even differentiation from community-acquired pneumonia, sometimes reporting up to ~95% accuracy, but these efforts are often limited by overfitting and lack of generalizability due to homogeneous data sources and institutional biases. Professional societies in the US/UK have recommended against routine CT for screening/diagnosis due to overlap with other pneumonias (e.g., influenza). The present study is positioned to address these gaps by training on diverse multinational data and explicitly testing generalization to unseen institutions/populations.

Methodology

Study design and cohorts: The authors assembled a multinational dataset of COVID-19-positive patients and multiple control populations. Patients with RT-PCR-confirmed COVID-19 underwent chest CT at four international centers: Hubei, China; Milan, Italy (two centers); and Tokyo, Japan. The timing of CT acquisition varied by region and clinical practice (e.g., same-day CT with a positive RT-PCR result in Hubei, versus more variable timing and often later-stage disease in Italy). Controls included diverse non-contrast chest CTs from SUNY Upstate Medical University (oncology, emergency/trauma, and other indications), laboratory-confirmed non-COVID pneumonias from SUNY and NIH, a cohort with unremarkable lungs from NIH, and publicly available thoracic CTs with lung nodules from LIDC. The training, validation, and test partitions totaled 1059, 328, and 1337 scans, respectively, with a test-set COVID-19 prevalence of 24.4% (326/1337).

Preprocessing and lung segmentation: To focus classification on the pulmonary parenchyma and minimize confounding from extra-thoracic structures, a lung segmentation model (ALM-net architecture) was trained on 1018 LIDC images and 95 in-house CT volumes with a substantial ground-glass/consolidation burden. Images were resampled to 0.8×0.8×5.0 mm and clipped to the HU range (−1000, 500). Segmentation achieved mean Dice similarity coefficients of approximately 0.85–0.99 (standard deviation < 0.1).

Classification models: Two DenseNet-121-based 3D convolutional classifiers were developed:
- Full 3D model: the entire cropped lung volume (without masking), resized to 192×192×64 voxels, was used as input.
- Hybrid 3D model: the cropped lung volume was resampled to 1×1×5 mm resolution and multiple sub-volumes (e.g., 192×192×32) were sampled; during training, 6 regions per patient were used, and predictions were averaged to yield a patient-level probability.

All images were clipped to HU (−1000, 500) and cropped to a bounding box around the lungs (with a 5-voxel buffer).
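
The preprocessing and patch-aggregation steps above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes a binary lung mask is already available from a segmentation model, that intensities are rescaled to [0, 1] after clipping (the summary states only the clipping), that sub-volumes are sampled along the slice axis only, and that `patch_model` is a hypothetical callable returning a COVID-19 probability for one sub-volume. The `dice` helper shows the overlap metric used to score the segmentation model.

```python
import numpy as np

# Values taken from the summary: HU window (-1000, 500) and a 5-voxel crop buffer.
HU_MIN, HU_MAX = -1000.0, 500.0
CROP_BUFFER = 5


def clip_and_scale(volume: np.ndarray) -> np.ndarray:
    """Clip a CT volume to the HU window, then rescale to [0, 1] (rescaling is an assumption)."""
    clipped = np.clip(volume.astype(np.float32), HU_MIN, HU_MAX)
    return (clipped - HU_MIN) / (HU_MAX - HU_MIN)


def crop_to_lungs(volume: np.ndarray, lung_mask: np.ndarray,
                  buffer: int = CROP_BUFFER) -> np.ndarray:
    """Crop to the bounding box of a binary lung mask, padded by `buffer` voxels per side."""
    coords = np.argwhere(lung_mask)
    if coords.size == 0:
        raise ValueError("lung mask is empty")
    lower = np.maximum(coords.min(axis=0) - buffer, 0)
    upper = np.minimum(coords.max(axis=0) + buffer + 1, volume.shape)  # exclusive bound
    return volume[tuple(slice(lo, hi) for lo, hi in zip(lower, upper))]


def dice(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice coefficient: 2 * |pred AND truth| / (|pred| + |truth|) for binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return float(2.0 * np.logical_and(pred, truth).sum() / denom)


def patient_probability(volume: np.ndarray, patch_model, patch_depth: int = 32,
                        n_patches: int = 6, seed=None) -> float:
    """Average a hypothetical patch classifier over randomly placed slabs of slices."""
    depth = volume.shape[-1]  # assumes the slice axis is last, as in 192x192x32
    if depth < patch_depth:
        raise ValueError("volume is thinner than one patch")
    rng = np.random.default_rng(seed)
    starts = rng.integers(0, depth - patch_depth + 1, size=n_patches)
    probs = [patch_model(volume[..., s:s + patch_depth]) for s in starts]
    return float(np.mean(probs))
```

For a quick smoke test, `patient_probability(vol, lambda p: float(p.mean()))` averages a dummy score over six slabs; in practice `patch_model` would wrap a trained 3D network. Averaging patch probabilities is one simple way to turn sub-volume predictions into a single patient-level score.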

Training and implementation: Aggressive data augmentation was used during training to reduce overfitting. The models were implemented with NVIDIA Clara Train (TensorFlow), and Grad-CAM was used for post hoc visualization of salient regions.

Evaluation strategy: Performance metrics included accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and area under the ROC curve (AUC). McNemar’s test was used to compare model performance. Two evaluation designs were reported: (1) the original training scheme, evaluated on the predefined test set, and (2) an independent testing population design that excluded the Tokyo, Japan cohort from training and validation to assess generalization to an unseen domain. For the independent testing design, identical training configurations and hyperparameters were used; calibration and AUC were compared, and decision-threshold adjustments were explored for the hybrid model to trade off sensitivity against specificity.
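
The evaluation metrics listed above can be computed from binary labels and model outputs as in the dependency-free sketch below. The summary does not say which McNemar variant the authors used, so the continuity-corrected chi-squared form here is an assumption (an exact binomial version is also common). AUC is computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, which equals the area under the ROC curve.

```python
import math
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary labels where 1 = COVID-19."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    return tp, tn, fp, fn


def summary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, PPV and NPV from hard predictions."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC; a tie between a positive and a negative counts as 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t]
    neg = [s for t, s in zip(y_true, scores) if not t]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mcnemar(y_true, pred_a, pred_b) -> Tuple[float, float]:
    """Continuity-corrected McNemar test on paired predictions; returns (statistic, p-value)."""
    triples = list(zip(y_true, pred_a, pred_b))
    only_a = sum(1 for t, pa, pb in triples if pa == t and pb != t)  # only model A correct
    only_b = sum(1 for t, pa, pb in triples if pa != t and pb == t)  # only model B correct
    discordant = only_a + only_b
    if discordant == 0:
        return 0.0, 1.0  # the models never disagree
    stat = (abs(only_a - only_b) - 1) ** 2 / discordant
    # Survival function of chi-squared with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))
```

Only the discordant cases (where exactly one model is correct) enter McNemar’s statistic, which is why it suits comparing two classifiers evaluated on the same patients.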

Key Findings

- In the primary independent test set of 1337 patients (COVID-19 prevalence 24.4%, 326/1337), the AI achieved up to 90.8% accuracy, with 84% sensitivity and 93% specificity for COVID-19 classification. The overall AUC for the full 3D model was 0.949.
- The false-positive rate among 140 patients with laboratory-confirmed non-COVID pneumonias was 10%, indicating high specificity in differentiating COVID-19 from other pneumonias (including influenza/H1N1 cases).
- Validation accuracies during training converged at 92.4% (hybrid 3D) and 91.7% (full 3D) for COVID-19 versus other conditions.
- Under the independent testing population design (Tokyo cohort excluded from training/validation), the full 3D model correctly identified 87/109 COVID-19 patients (79.8% sensitivity), whereas the hybrid 3D model identified 74/109 (67.9%). Lowering the hybrid 3D decision threshold from 0.5 to 0.376 raised its sensitivity to match the full 3D model, at the cost of a modest decrease in specificity (from 95.1% to 92.8%).
- Grad-CAM visualizations showed salient activations in peripheral lung regions consistent with COVID-19-associated disease, including non-consolidated areas, suggesting that the model learned features beyond overt consolidation.
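
As a consistency check on the headline numbers: overall accuracy on a test set equals prevalence × sensitivity + (1 − prevalence) × specificity. The short computation below is plain arithmetic on the figures reported above; because sensitivity and specificity are rounded, the agreement is only approximate.

```python
def expected_accuracy(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Accuracy implied by class prevalence and per-class performance."""
    return prevalence * sensitivity + (1 - prevalence) * specificity


# Primary test set, as reported: 326 COVID-19 cases among 1337 scans.
prevalence = 326 / 1337
accuracy = expected_accuracy(prevalence, sensitivity=0.84, specificity=0.93)
print(f"prevalence={prevalence:.3f}  implied accuracy={accuracy:.3f}")

# Independent testing design: COVID-19 cases detected out of 109.
for name, detected in (("full 3D", 87), ("hybrid 3D, threshold 0.5", 74)):
    print(f"{name}: sensitivity={detected / 109:.1%}")

# A 10% false-positive rate among 140 non-COVID pneumonias is about 14 patients.
print(f"expected false positives ~ {0.10 * 140:.0f}")
```

The implied accuracy comes out at about 0.908, in line with the reported 90.8%.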

Discussion

The findings demonstrate that deep learning models trained on heterogeneous, multinational CT datasets can detect COVID-19-associated lung disease with high accuracy and specificity and can generalize to institutions not seen during training, albeit with some loss of sensitivity. By segmenting the lungs to focus on the parenchyma, using aggressive augmentation, and favoring a simpler full 3D model, the approach mitigated the overfitting and institutional biases common in prior single-center AI studies. High specificity against non-COVID pneumonias, including influenza, addresses a key concern limiting the diagnostic utility of CT and suggests that AI can augment radiologist assessment when clinical presentations overlap. The dependence of performance on disease stage and CT utilization patterns suggests that imaging earlier in the disease course may improve detection, whereas advanced pneumonia may reduce sensitivity. Threshold tuning can tailor the sensitivity/specificity trade-off to different clinical contexts. Overall, the approach supports AI-assisted triage, characterization, and potentially quantification of COVID-19 lung disease across diverse populations and imaging protocols.
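
Threshold tuning of the kind described for the hybrid model (lowering the cut-off from 0.5 to 0.376) can be sketched as choosing the highest threshold that still flags a required number of true positives, then reading off the resulting specificity. The labels and scores below are made up for illustration, and the decision rule `score >= threshold` is an assumption; the summary does not describe how the authors arrived at 0.376.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(labels: Sequence[int], scores: Sequence[float],
                            threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when a case is called positive if score >= threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y and s >= threshold)
    tn = sum(1 for y, s in zip(labels, scores) if not y and s < threshold)
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    return tp / n_pos, tn / n_neg


def threshold_for_min_tp(labels: Sequence[int], scores: Sequence[float], min_tp: int) -> float:
    """Highest threshold at which at least `min_tp` positive cases are flagged."""
    pos_scores = sorted((s for y, s in zip(labels, scores) if y), reverse=True)
    if not 1 <= min_tp <= len(pos_scores):
        raise ValueError("min_tp must be between 1 and the number of positives")
    # Any threshold above the min_tp-th highest positive score flags fewer positives.
    return pos_scores[min_tp - 1]


# Hypothetical validation data: 1 = COVID-19, 0 = other condition.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.45, 0.3, 0.6, 0.35, 0.2, 0.1]

for min_tp in (2, 3, 4):
    t = threshold_for_min_tp(labels, scores, min_tp)
    sens, spec = sensitivity_specificity(labels, scores, t)
    print(f"threshold={t:.2f}  sensitivity={sens:.2f}  specificity={spec:.2f}")
```

Each step down in the threshold raises sensitivity and lowers specificity, the same trade-off reported for the hybrid model when its sensitivity was matched to the full 3D model.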

Conclusion

An AI system trained on heterogeneous, multi-institutional chest CT data achieved strong performance for classifying COVID-19 pneumonia and demonstrated generalizability to unseen populations. The work contributes a robust pipeline combining lung segmentation and 3D classification with explainable visualizations. While CT may not be universally recommended for screening/diagnosis, AI-enhanced CT interpretation could serve as an objective adjunct for triage, research endpoints, and specific clinical scenarios (e.g., resource-limited settings, outbreaks). Future research should refine domain adaptation for new institutions, improve sensitivity in advanced disease, extend from classification to precise lesion localization/segmentation and disease quantification, and prospectively validate the models across varied prevalences and clinical pathways.

Limitations

- The retrospective, multi-cohort design, with positive and negative cases drawn from different populations, limits strict assessment of generalizability and may introduce selection bias.
- Performance is prevalence-dependent; the constructed test set had a COVID-19 prevalence of 24.4%, which may not match all real-world settings.
- CT acquisition protocols and timing relative to disease onset varied across centers; cases with advanced disease showed lower sensitivity than earlier-stage imaging.
- The model provides classifications and saliency maps but does not output precise localization or segmentation of COVID-19 lesions.
- Although the independent testing design (excluding one center) showed generalization, no fully external prospective validation was conducted.
- Use of CT for routine screening/diagnosis remains controversial because of overlap with other pneumonias; despite the high specificity observed here, clinical integration requires careful consideration and further validation.
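
The prevalence dependence noted above follows from Bayes’ rule: with sensitivity and specificity held fixed, predictive values shift with the proportion of COVID-19 cases. The sketch below plugs in the reported 84% sensitivity and 93% specificity purely for illustration; the prevalence values other than 24.4% are hypothetical, and it assumes sensitivity and specificity stay constant across populations, which need not hold in practice.

```python
def ppv(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Positive predictive value: P(disease | positive call)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Negative predictive value: P(no disease | negative call)."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


SENS, SPEC = 0.84, 0.93  # reported test-set values
for prev in (0.05, 0.244, 0.50):  # 0.05 and 0.50 are hypothetical settings
    print(f"prevalence={prev:.3f}  PPV={ppv(prev, SENS, SPEC):.3f}  "
          f"NPV={npv(prev, SENS, SPEC):.3f}")
```

At a hypothetical 5% prevalence the PPV falls to roughly 0.39, even though sensitivity and specificity are unchanged.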
