Medicine and Health

Design and Analysis of a Deep Learning Ensemble Framework Model for the Detection of COVID-19 and Pneumonia Using Large-Scale CT Scan and X-ray Image Datasets

X. Xue, S. Chinnaperumal, et al.

Discover a groundbreaking deep learning ensemble framework for detecting COVID-19 and pneumonia through CT scan and X-ray images. This innovative model, developed by Xingsi Xue and colleagues, boasts an impressive 99% accuracy and outperforms existing methods. Don't miss the chance to explore how advanced transfer learning techniques can revolutionize medical imaging diagnostics.

Introduction
The study addresses the need for fast, accurate detection of COVID-19 and pneumonia from medical imaging, motivated by the limitations of RT-PCR (false negatives/positives and delays) and the expertise required to interpret chest X-rays and CT scans. Convolutional neural networks with transfer learning can leverage large-scale pre-trained models (e.g., VGG, ResNet, DenseNet) to extract discriminative features from medical images with limited labeled data. The research aims to automate optimal model architecture and training parameters, reduce training time via transfer learning, and improve multi-class classification performance for COVID-19, pneumonia, and healthy cases using an ensemble of deep learning models. The importance lies in improving early diagnosis, reducing reliance on manual interpretation, and achieving robust performance across CT and X-ray modalities.
Literature Review
The paper reviews numerous DL-based CAD approaches for COVID-19 detection using chest X-rays and CT scans. Prior works employed transfer learning with models such as VGG16/19, ResNet (32/50/152), DenseNet (121/169/201), Inception variants, Xception, MobileNet, and ensemble strategies, reporting high accuracies in the 90–99% range on various datasets. DenseNet201 achieved ~93.7% accuracy in one study, and VGG-19 showed precision 99.4%, sensitivity 97.4%, specificity 99.4% on a 1000-image sample. Other studies combined attention modules (e.g., CBAM) with ResNet, used data augmentation to mitigate overfitting, and explored stacking ensembles and SVM-based transfer learning. The review highlights that transfer learning consistently improves COVID-19 detection, with VGG-19 often outperforming other models on CT datasets and ensembles improving robustness. Limitations in prior work include noise sensitivity, insufficient feature integration, and potential overfitting on small datasets.
Methodology
Proposed framework: a deep learning ensemble for detecting COVID-19 and pneumonia from CT and chest X-ray images using transfer learning and model stacking.
- Preprocessing and enhancement: Spatial-domain filtering denoises images using linear (mean, Wiener) and non-linear (median) filters. Histogram equalization enhances contrast by redistributing gray-level intensities via a cumulative distribution-based mapping.
- Attention-based ResNet (for CT): An attention module learns feature weights to emphasize relevant features; the attention output is Atn_i(Z) = Q_i(Z)·F_i(Z) + F_i(Z), and relationships among features are modeled via mutual information. An inception layer captures multi-scale features, and classification uses a softmax layer with cross-entropy loss. A learning rate of 0.0001 and 10 epochs are reported for this component.
- Enhanced VGG-16 (for X-ray): A modified VGG-16 with convolutional, pooling (max and average), fully connected, dropout, and softmax layers. Feature maps from average and max pooling are concatenated and passed through a 7×7 convolution and a sigmoid nonlinearity before classification. The architecture targets multi-class classification (COVID-19, pneumonia, healthy); VGG-16 is also used for feature extraction/segmentation in Algorithm 1.
- Ensemble and stacking: Multiple pre-trained models (e.g., ResNet152, ResNet50, DenseNet121, VGG16) are fine-tuned; their outputs are stacked and fed to a meta-learner (a single neuron) to predict the final class labels, integrating the complementary strengths of the base learners (see the sketch after this list).
- Data augmentation: Random resized cropping (scale 0.5–1.0; resized size 224), random rotation (−5° to +5°), random horizontal flipping (p=0.5), and color jittering increase data diversity and reduce overfitting.
- Training settings: Implemented in Python (PyTorch); epochs from 1 to 100, learning rate 0.003 (global setting), Adam optimizer, cross-entropy loss, batch size 16.
- Evaluation metrics: Accuracy, precision, recall (sensitivity), F1-score, ROC analysis, and computational time. Experiments include iteration-count and sample-size analyses to assess convergence, generalization, and efficiency.
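To make the stacking pipeline concrete, here is a minimal PyTorch sketch under stated assumptions: the dataset directory layout (data/train with one folder per class), the color-jitter strengths, and the use of a small linear meta-learner over the concatenated base-model scores (the paper describes a single-neuron meta-learner; a linear layer is used here so the sketch handles three classes end to end). This is an illustration of the technique, not the authors' implementation.

# Sketch of the stacking ensemble: fine-tuned torchvision backbones,
# the augmentation listed above, and the reported global training settings
# (Adam, lr=0.003, cross-entropy, batch size 16).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

NUM_CLASSES = 3  # COVID-19, pneumonia, healthy

# Augmentation as described: resized crop to 224 (scale 0.5-1.0), +/-5 degree
# rotation, horizontal flip (p=0.5), and color jitter (strengths assumed).
train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.5, 1.0)),
    transforms.RandomRotation(degrees=5),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])

def make_base_learner(name):
    """Load an ImageNet-pretrained backbone and replace its head for 3 classes."""
    if name == "resnet50":
        m = models.resnet50(weights="DEFAULT")
        m.fc = nn.Linear(m.fc.in_features, NUM_CLASSES)
    elif name == "resnet152":
        m = models.resnet152(weights="DEFAULT")
        m.fc = nn.Linear(m.fc.in_features, NUM_CLASSES)
    elif name == "densenet121":
        m = models.densenet121(weights="DEFAULT")
        m.classifier = nn.Linear(m.classifier.in_features, NUM_CLASSES)
    elif name == "vgg16":
        m = models.vgg16(weights="DEFAULT")
        m.classifier[6] = nn.Linear(m.classifier[6].in_features, NUM_CLASSES)
    else:
        raise ValueError(f"unknown backbone: {name}")
    return m

class StackingEnsemble(nn.Module):
    """Concatenate the base learners' class scores and map them to the final
    prediction with a small meta-learner."""
    def __init__(self, backbones=("resnet152", "resnet50", "densenet121", "vgg16")):
        super().__init__()
        self.base_models = nn.ModuleList(make_base_learner(b) for b in backbones)
        self.meta = nn.Linear(len(self.base_models) * NUM_CLASSES, NUM_CLASSES)

    def forward(self, x):
        stacked = torch.cat([m(x) for m in self.base_models], dim=1)
        return self.meta(stacked)

def train(model, data_dir="data/train", epochs=100, lr=0.003, batch_size=16,
          device="cuda" if torch.cuda.is_available() else "cpu"):
    """Fine-tune the ensemble end to end with the reported global settings."""
    loader = DataLoader(datasets.ImageFolder(data_dir, transform=train_tf),
                        batch_size=batch_size, shuffle=True)
    model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for epoch in range(epochs):
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            optimizer.step()

if __name__ == "__main__":
    train(StackingEnsemble())

In practice, stacking is often done in two stages (base learners trained first, then the meta-learner fit on their held-out predictions); the single end-to-end loop above is a simplification for brevity.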
Key Findings
- Enhanced VGG-16 achieved 99% accuracy for three-class chest X-ray classification (COVID-19, pneumonia, healthy).
- Ensemble model performance across datasets and experiments (the standard metric definitions used here are sketched after this list):
  - Accuracy: ~95% to 96.2% (ensemble) versus 61–72% for basic CNN/DNN, 84–85.5% for improved CNN, and ~92–92.5% for single-model transfer learning.
  - Precision: ~95% to 96.2% (ensemble) versus 60–72.6% (CNN/DNN), 84–85.5% (improved CNN), 92–92.5% (single-model TL).
  - Recall: ~94.5% to 95.9% (ensemble) versus 59.6–71.5% (CNN/DNN), ~83.5–84% (improved CNN), 91.5–92% (single-model TL).
  - F1-score: ~95.2% to 96.7% (ensemble) versus 60.2–72% (CNN/DNN), ~84.2–84.5% (improved CNN), ~92.2–92.5% (single-model TL).
  - Average F-score reported in the abstract: 95–97%.
- ROC: The ensemble's ROC curves lie closer to the top-left corner than those of competing methods, indicating a superior sensitivity-specificity tradeoff.
- Computational efficiency: Ensemble inference time ~0.4 s (versus up to 0.9 s for baseline methods) with overall time complexity O(n); the conclusion notes total time <0.5 s.
- Data augmentation mitigated overfitting; spatial filtering improved precision and recall by reducing noise and correcting illumination.
- Datasets used included multiple CT and X-ray collections:
  - CT dataset: 746 scans (349 positive, 397 negative; train/val/test 425/118/203).
  - COVID-19 radiography collection: 1200 COVID-positive, 1341 healthy, 1345 pneumonia images.
  - Small X-ray set: 579 images (342 positive, 237 negative; train/val/test 309/70/200).
  - Large CT set: 12,058 scans (2282 positive, 9776 negative; train/val/test 11,400/258/400).
  - SARS-CoV-2 CT: 2482 images (1252 positive, 1230 negative; train/val/test 1800/400/400).
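For reference, a minimal sketch of how the reported accuracy, precision, recall, and F1-score can be computed from model predictions, assuming scikit-learn is available and macro averaging over the three classes (the paper does not state its averaging convention); the label values and arrays are illustrative only:

# Hypothetical label encoding: 0 = healthy, 1 = pneumonia, 2 = COVID-19.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = [2, 2, 1, 0, 1, 0, 2, 1, 0, 2]   # ground-truth classes (illustrative)
y_pred = [2, 2, 1, 0, 1, 0, 1, 1, 0, 2]   # model predictions (illustrative)

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")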
Discussion
The ensemble framework effectively addresses the need for rapid and accurate COVID-19 and pneumonia detection from imaging by combining transfer-learned CNNs, attention mechanisms, and robust preprocessing. Spatial filtering and histogram equalization enhance image quality, improving feature extraction and reducing noise-induced errors. Attention-based ResNet emphasizes salient CT features, while the enhanced VGG-16 effectively differentiates among COVID-19, pneumonia, and healthy classes in X-rays. Stacking integrates complementary predictions, yielding higher accuracy, precision, recall, and F1-score than single-model baselines and prior CNN/DNN approaches. Data augmentation improves generalization, mitigating overfitting on limited medical datasets. The strong ROC performance demonstrates reliable classification across thresholds, and the low computational time supports practical deployment. Overall, the findings validate that a transfer learning-based ensemble with preprocessing can outperform traditional and single-model methods in clinical imaging scenarios for COVID-19/pneumonia detection.
Conclusion
The study introduces an ensemble deep learning framework combining attention-based ResNet and an enhanced VGG-16 with transfer learning and stacking to detect COVID-19 and pneumonia from CT and chest X-ray images. The approach automates optimal architecture/training parameters, reduces training time, and improves multi-class classification performance. Across multiple public datasets, the method achieves high accuracy (~95–96%), precision (~95–96%), recall (~94–96%), and F1-score (~95–97%), with efficient inference (<0.5 s) and favorable ROC characteristics. The model is suitable for pre-training, recognition, and multi-class categorization of respiratory illnesses. Future work includes integrating advanced soft computing methods, improving feature integration, and validating on larger, more diverse datasets to further enhance robustness and generalizability.
Limitations
Performance is affected by image noise levels and by the quality of the extracted features; integrating all relevant features is crucial. The work acknowledges that annotation irregularities and limited images per class remain challenging. Future directions include employing recent soft computing techniques, developing better components that recommend model configurations based on dataset characteristics, and validating on larger-scale datasets to further improve performance while keeping computing time minimal.