COVID-19 Classification on Chest X-ray Images Using Deep Learning Methods

Medicine and Health


P. B. Tchounwou, S. Zimeras, et al.

This study compares five deep learning models for COVID-19 classification on chest X-ray images. ResNet101 performed best, reaching 96% precision, recall, and accuracy. The research was conducted by Paul B. Tchounwou, Stelios Zimeras, Styliani Geronikolou, Marios Constantinou, Themis Exarchos, Aristidis G. Vrahatis, and Panagiotis Vlamos.

Introduction
The study addresses the need for fast, reliable COVID-19 diagnosis, given the disease’s significant global impact and characteristic pulmonary involvement. While RT-PCR is the primary diagnostic tool, it is susceptible to false results due to sample and procedural factors. CT imaging has shown higher diagnostic accuracy but is more expensive and imparts higher radiation doses. Chest X-rays are cheaper, faster, more accessible, and involve less radiation, making them attractive for screening. The research investigates whether state-of-the-art deep learning models can accurately classify COVID-19 from CXR images, aiming to assess the performance and potential of individual models for clinical use.
Literature Review
Prior work indicates that many COVID-19 cases share similar radiographic features, including bilateral abnormalities and ground-glass opacities, particularly in early stages, and pulmonary consolidation in later stages. Numerous deep learning studies have reported strong results for COVID-19 detection on CXR; however, some relied on limited data, potentially limiting generalizability. Reviews have covered AI methods for imaging-based COVID-19 diagnosis, and related advances include low-dose CT denoising and tailored CNN designs for chest imaging. This study builds on that literature by using a large, consolidated CXR dataset (COVID-QU) and systematically comparing several widely used CNN architectures with transfer learning.
Methodology
Dataset: The COVID-QU dataset (33,920 CXR images) with three classes: COVID-19 (11,956 images), non-COVID-19 pneumonia (11,263 images; viral or bacterial), and Normal (10,701 images). Only PA and AP views are included. The dataset also provides lung masks (not used here). Data are split as follows: Train 21,715 images (COVID-19 7,658; non-COVID-19 7,208; Normal 6,849); Validation 5,417 (COVID-19 1,903; non-COVID-19 1,802; Normal 1,712); Test 6,788 (COVID-19 2,395; non-COVID-19 2,253; Normal 2,140). The dataset aggregates images from multiple sources (e.g., BIMCV-COVID19+, RSNA CXR, Chest X-ray Pneumonia, PadChest).

Models: Five CNNs pre-trained on ImageNet were evaluated with transfer learning: ResNet50, ResNet101, DenseNet121, DenseNet169, and InceptionV3.

Pre-processing and augmentation: Random rotation (±10°) and random horizontal flips were applied on the fly during training to mitigate overfitting. Images were resized to 224×224 with bilinear interpolation for the ResNet and DenseNet models; InceptionV3 was used without resizing constraints. Inputs were normalized to either [0, 1] or [-1, 1], depending on each model's preprocessing function.

Model definition: Each base model was loaded with ImageNet weights, without its original classifier, and initially frozen. A custom classifier head was added: Global Average Pooling (Flatten for InceptionV3), Dropout 0.2, and a Dense layer with 3 units (softmax activation, HeNormal initializer).

Training setup: Metrics included categorical accuracy, precision, recall, F1-score, and confusion counts (TP, TN, FP, FN). Loss: categorical cross-entropy. Optimizer: Adam (initial learning rate 4×10^-3, β1 = 0.9, β2 = 0.999, ε = 1×10^-7). Callbacks: ModelCheckpoint (weights only), EarlyStopping (patience 8, restoring the best weights), ReduceLROnPlateau (factor 0.2, patience 3), TensorBoard, and CSVLogger.

Training procedure: The first phase trained only the classifier head with the base layers frozen, for up to 100 epochs with early stopping.
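The augmentation, frozen base, and classifier head described above can be sketched in TensorFlow/Keras (the paper's framework). This is a minimal sketch, not the authors' code: ResNet101 stands in for any of the five backbones, and `weights=None` replaces the paper's ImageNet weights so the sketch builds without a download.

```python
import tensorflow as tf

# On-the-fly augmentation as described: ±10° rotation and horizontal flip.
# RandomRotation takes a fraction of a full turn, so 10° = 10/360.
augment = tf.keras.Sequential([
    tf.keras.layers.RandomRotation(10 / 360),
    tf.keras.layers.RandomFlip("horizontal"),
])

def build_model(num_classes=3, input_shape=(224, 224, 3)):
    # The paper loads ImageNet weights (weights="imagenet");
    # weights=None keeps this sketch download-free.
    base = tf.keras.applications.ResNet101(
        include_top=False, weights=None, input_shape=input_shape)
    base.trainable = False  # phase 1: train only the classifier head

    inputs = tf.keras.Input(shape=input_shape)
    x = augment(inputs)
    x = base(x, training=False)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dropout(0.2)(x)
    outputs = tf.keras.layers.Dense(
        num_classes, activation="softmax",
        kernel_initializer=tf.keras.initializers.HeNormal())(x)
    model = tf.keras.Model(inputs, outputs)

    # Loss, optimizer, and metrics as stated in the training setup.
    model.compile(
        optimizer=tf.keras.optimizers.Adam(
            learning_rate=4e-3, beta_1=0.9, beta_2=0.999, epsilon=1e-7),
        loss="categorical_crossentropy",
        metrics=[tf.keras.metrics.CategoricalAccuracy(),
                 tf.keras.metrics.Precision(),
                 tf.keras.metrics.Recall()])
    return model

model = build_model()
```

With the base frozen, only the Dense head (2,048 × 3 weights plus 3 biases for ResNet101) is trainable in this first phase.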
Fine-tuning: A subset of base layers was then unfrozen and the model retrained for roughly 10–15 epochs with an Adam learning rate of 4×10^-4, keeping the same loss, metrics, and callbacks. After fine-tuning, models were evaluated on the held-out test set.

Parameter counts after unfreezing (Total / Trainable): ResNet50 23,564,800 / 14,970,880; ResNet101 42,632,707 / 25,040,899; DenseNet121 7,040,579 / 5,527,299; DenseNet169 12,647,875 / 11,059,843; InceptionV3 22,023,971 / 17,588,163.

Environment: Python 3.10.2, TensorFlow 2.10, Keras 2.10.0; training ran on CPU (Windows 10) because the available GPU was incompatible with TensorFlow.
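The fine-tuning phase can be sketched under the same assumptions: DenseNet121 is shown for brevity, `weights=None` stands in for ImageNet weights, and `num_unfrozen` is illustrative since this summary does not state how many layers were unfrozen per model.

```python
import tensorflow as tf

def make_frozen_model(input_shape=(224, 224, 3), num_classes=3):
    # Minimal phase-1 model: frozen base plus classifier head.
    # weights=None keeps the sketch download-free (the paper uses ImageNet).
    base = tf.keras.applications.DenseNet121(
        include_top=False, weights=None, input_shape=input_shape)
    base.trainable = False
    inputs = tf.keras.Input(shape=input_shape)
    x = tf.keras.layers.GlobalAveragePooling2D()(base(inputs, training=False))
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)

def fine_tune(model, num_unfrozen=50):
    # Unfreeze the last num_unfrozen base layers (illustrative count);
    # the paper unfreezes a model-specific subset for ~10-15 more epochs.
    base = model.layers[1]  # the wrapped backbone
    base.trainable = True
    for layer in base.layers[:-num_unfrozen]:
        layer.trainable = False  # keep early feature extractors frozen
    # Recompile with the lower fine-tuning learning rate (4e-4).
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=4e-4),
        loss="categorical_crossentropy",
        metrics=["categorical_accuracy"])
    return model

# Callbacks kept identical across both training phases, as in the paper.
callbacks = [
    tf.keras.callbacks.EarlyStopping(patience=8, restore_best_weights=True),
    tf.keras.callbacks.ReduceLROnPlateau(factor=0.2, patience=3),
]

model = fine_tune(make_frozen_model())
```

Recompiling after changing `trainable` flags is required for Keras to pick up the new set of trainable weights.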
Key Findings
- Class balance: the three classes had similar sample sizes in the test set, minimizing class-imbalance concerns.
- ResNet50 (Test, Table 3): Accuracy 0.95; COVID-19 Precision/Recall/F1 = 0.97/0.97/0.97; non-COVID-19 0.95/0.94/0.94; Normal 0.94/0.94/0.94.
- ResNet101 (Test, Table 4): Accuracy 0.96; COVID-19 0.99/0.96/0.98; non-COVID-19 0.95/0.95/0.95; Normal 0.93/0.95/0.94. Best overall performance, achieving 96% across Accuracy, Precision, and Recall.
- DenseNet121 (Test, Table 5): Accuracy 0.93; COVID-19 0.99/0.94/0.96; non-COVID-19 0.86/0.97/0.91; Normal 0.95/0.87/0.91. Notable precision drop for non-COVID-19 and recall drop for Normal.
- DenseNet169 (Test, Table 6): Accuracy 0.94; COVID-19 0.99/0.93/0.96; non-COVID-19 0.95/0.92/0.94; Normal 0.88/0.96/0.92.
- InceptionV3 (Test, Table 7): Accuracy 0.95; COVID-19 0.97/0.97/0.97; non-COVID-19 0.94/0.94/0.94; Normal 0.94/0.93/0.93.
- All models achieved Recall ≥ 93% on the overall test set; ResNet101 was the top performer with the most balanced and highest overall metrics (96% Accuracy/Precision/Recall).
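The per-class figures above follow directly from confusion counts. A minimal sketch in Python, using hypothetical TP/FP/FN counts chosen to be consistent with ResNet101's reported COVID-19 precision (0.99) and recall (0.96) over the 2,395 COVID-19 test images (the actual counts are not given in this summary):

```python
def precision_recall_f1(tp, fp, fn):
    """Per-class precision, recall, and F1 from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts for the COVID-19 class (2,299 + 96 = 2,395 images):
p, r, f1 = precision_recall_f1(tp=2299, fp=23, fn=96)
print(round(p, 2), round(r, 2))  # 0.99 0.96
```

Precision penalizes false alarms, while recall penalizes missed COVID-19 cases; the latter is the more critical error in a screening context.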
Discussion
The findings demonstrate that individual deep learning models can effectively classify COVID-19 from CXR images, addressing the study’s aim of evaluating the potential of popular CNN architectures under a consistent training and evaluation pipeline. The superior performance of ResNet101 suggests that deeper residual networks with higher representational capacity can provide modest gains in generalization across COVID-19, non-COVID-19 pneumonia, and Normal classes. While all models provided high COVID-19 recall—critical in a screening or triage context—confusion between non-COVID-19 and Normal classes persisted for some architectures, indicating room for improvement, possibly via lung segmentation, infection localization, or model ensembling. The results, obtained on a large, consolidated dataset with a held-out test set, support the feasibility of deploying such models as decision-support tools, subject to further clinical validation.
Conclusion
Five transfer learning CNNs (ResNet50/101, DenseNet121/169, InceptionV3) were trained and evaluated on a large COVID-19 CXR dataset for three-way classification (COVID-19, non-COVID-19 pneumonia, Normal). All models achieved strong performance, with overall Recall of at least 93%, and ResNet101 attaining 96% across Accuracy, Precision, and Recall. These results highlight the promise of individual deep learning models for COVID-19 screening from CXR images. Future work includes incorporating lung segmentation and lesion localization to improve accuracy, exploring ensemble models for greater robustness and generalization, and benchmarking the system against professional radiologists to assess clinical utility.
Limitations
- No lung segmentation or infection localization was used; the study focused on whole-image classification, which may limit performance in challenging cases.
- Ensemble techniques were not evaluated; only individual models were compared, potentially underutilizing complementary strengths across architectures.
- External clinical validation was not reported; evaluation was limited to the COVID-QU dataset's predefined splits, and performance against professional radiologists remains to be assessed.
- Some architectures showed notable confusion between the non-COVID-19 and Normal classes, indicating limits to class separability under the current pipeline.