Diagnostic Accuracy of Machine Learning AI Architectures in Detecting and Classifying Lung Cancer: A Systematic Review

A. Pacurari, S. Bhattarai, et al.

This systematic review by A.C. Pacurari, S. Bhattarai, A. Muhammad, and colleagues assesses how accurately various machine learning AI architectures detect and classify lung cancer on radiological imaging.

Introduction
Lung cancer is the leading cause of cancer-related mortality worldwide, with most patients diagnosed at advanced stages, leading to poor prognosis. The heterogeneity in imaging appearances (from microscopic nodules and ground-glass opacities to multiple nodules, effusions, and collapse) and histopathology (adenocarcinoma, squamous cell carcinoma, small-cell carcinoma, and other variants) complicates therapeutic decision-making. Precision medicine requires comprehensive characterization (stage, histology, genomics) and multidisciplinary input to choose chemotherapy, targeted therapy, immunotherapy, surgery, and/or radiotherapy. Clinical workflows are resource-intensive, involving detailed imaging and pathology review. Artificial intelligence (AI), particularly machine learning (ML)–based computer-aided detection/diagnosis (CAD) systems, can assist by rapidly recognizing imaging patterns, highlighting lesion areas, and differentiating abnormal from healthy lung regions. ML approaches have shown potential to classify nodules (benign vs malignant), including very small lesions. In lung oncology, AI aims to individualize diagnosis based on tumor characteristics. Many studies report AI utility in nodule identification, histologic diagnosis, risk stratification, drug development, and prognosis prediction. This systematic review focuses on analyzing and assessing the diagnostic accuracy of existing ML AI architectures in detecting and classifying lung cancer.
Methodology
The systematic review was conducted in February 2023 across PubMed, Web of Science, Cochrane, and Scopus, covering literature published through December 2022. Search terms combined MeSH-linked terms and keywords for lung cancer and AI/ML (e.g., lung cancer, pulmonary nodule, lung neoplasms, thoracic neoplasms, AI, machine learning, cancer screening, neural network, diagnostic imaging). Only English-language journal articles were considered. The protocol followed PRISMA and PROSPERO guidance, and the review was registered on OSF.
Inclusion criteria: adult populations imaged incidentally or through screening; ML-based algorithms (e.g., neural networks, CADs built on ML models) using radiological imaging (e.g., X-ray, CT, HRCT, LDCT) to detect or classify lung cancer; and availability of diagnostic accuracy data (TP, TN, FP, FN) or sufficient information to compute them.
Exclusion criteria: phantom, histopathology, or microscopic images; non-imaging modalities; segmentation-only studies without ML augmentation; deep learning–only studies (to standardize on ML); non-lung diseases; and commentaries, editorials, or abstract-only pieces.
The primary diagnostic test accuracy metrics were sensitivity and specificity.
Study selection: of 5894 records, 517 duplicates were removed, leaving 5377; 5062 were excluded at abstract screening; the remaining 315 full texts were assessed; and 9 studies met the inclusion criteria.
Quality assessment used the NHLBI Study Quality Assessment Tools appropriate to each study design, applied by two independent evaluators. For observational cohort and cross-sectional designs, a scoring system assigned 1 point for “Yes” and 0 for “No/Other,” classifying total scores of 0–4 as fair, 5–9 as good, and ≥10 as excellent. This process aimed to mitigate selection, missing-data, and measurement biases.
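The selection counts and the quality bands described above are simple enough to check by hand, but a short sketch makes the arithmetic explicit. The code below is only an illustration: the function names (`selection_flow`, `quality_band`) are hypothetical and not from the review, and the scoring function assumes each checklist answer arrives as a plain string.

```python
# Record counts as quoted in the Methodology text.
RECORDS_IDENTIFIED = 5894
DUPLICATES_REMOVED = 517
EXCLUDED_BY_ABSTRACT = 5062
FULL_TEXTS_ASSESSED = 315
STUDIES_INCLUDED = 9


def selection_flow() -> dict[str, int]:
    """Recompute each screening stage from the quoted counts (hypothetical helper)."""
    after_dedup = RECORDS_IDENTIFIED - DUPLICATES_REMOVED
    after_abstract = after_dedup - EXCLUDED_BY_ABSTRACT
    return {
        "after_deduplication": after_dedup,
        "full_texts_assessed": after_abstract,
        "excluded_at_full_text": after_abstract - STUDIES_INCLUDED,
    }


def quality_band(answers: list[str]) -> str:
    """Map checklist answers to the quoted bands (hypothetical helper).

    One point per "Yes", zero for any other answer;
    0-4 is fair, 5-9 is good, 10 or more is excellent.
    """
    score = sum(1 for answer in answers if answer.strip().lower() == "yes")
    if score >= 10:
        return "excellent"
    if score >= 5:
        return "good"
    return "fair"


if __name__ == "__main__":
    print(selection_flow())
    print(quality_band(["Yes"] * 6 + ["No"] * 8))
```

The recomputed number of full texts matches the 315 quoted in the text, which suggests the stage counts are internally consistent.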
Key Findings
Nine studies (2014–2022) from Turkey, USA, Poland, Pakistan, Italy, Bangladesh, and India assessed ML architectures for lung cancer detection/classification.
- Designs: five case-control, three retrospective cohort, one prospective cohort.
- Quality ratings: 1 excellent, 3 good, 5 fair.
- ML architectures: ANN, entropy degradation method (EDM), probabilistic neural network (PNN), support vector machine (SVM), partially observable Markov decision process (POMDP), and random forest neural network (RFNN).
- Lesion types: SCLC (2 studies), NSCLC (1 study), and malignant vs benign nodules (6 studies).
- Sample sizes ranged from 32 to 5402 patients.
Performance highlights:
- Sensitivity ranged from 0.81 to 0.99; specificity from 0.46 to 1.00; accuracy from 77.8% to 100%.
- Dandil et al. (ANN): TP 24, TN 34, FP 4, FN 2 on 128 CTs; sensitivity 0.92, specificity 0.89, accuracy 92.3%.
- Wu et al. (EDM): lower performance (sensitivity 0.83, specificity 0.72, accuracy 77.8%).
- Wozniak/Capizzi et al. (PNN): sensitivities 0.95–0.96, specificities 0.90–0.91, accuracies ~92%.
- Khan et al. (SVM): sensitivity 0.97, specificity 0.99, accuracy 98%.
- Petousis et al. (POMDP): high sensitivity (0.97) but low specificity (0.46), reflecting many FPs despite maintaining TP rates.
- Chauvie et al. (RFNN): specificity 1.00 and 100% accuracy with Lung-RADS data, and high PPV without sacrificing sensitivity.
- Hoque et al. (SVM): sensitivity 0.99 but specificity 0.50 (accuracy 95%).
- Kumar et al. (SVM, NSCLC): sensitivity 0.81, specificity 0.82, accuracy 98.8%, outperforming KNN, naïve Bayes, and J48 even with SMOTE.
Overall, ML architectures effectively differentiated malignant from benign nodules and detected SCLC/NSCLC across imaging modalities (CT, HRCT, LDCT, X-ray, RADS), with performance influenced by study design, dataset size/quality, and model type.
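For readers who want to see how the reported metrics follow from a confusion matrix, here is a minimal sketch using the TP/TN/FP/FN counts quoted above for Dandil et al. The class name `ConfusionCounts` is hypothetical and only the standard definitions of sensitivity, specificity, accuracy, and PPV are assumed. With these counts, sensitivity and specificity round to the quoted 0.92 and 0.89. Note, however, that the four counts sum to 64 rather than 128, and the accuracy recomputed from them is about 90.6% rather than the quoted 92.3%. This summary does not explain the difference, so the sketch makes no claim about which figure is correct.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (hypothetical helper for illustration)."""

    tp: int
    tn: int
    fp: int
    fn: int

    @property
    def total(self) -> int:
        return self.tp + self.tn + self.fp + self.fn

    @property
    def sensitivity(self) -> float:
        # Share of true cancers that the model flags.
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Share of non-cancers that the model correctly clears.
        return self.tn / (self.tn + self.fp)

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / self.total

    @property
    def ppv(self) -> float:
        # Share of positive calls that are true cancers.
        return self.tp / (self.tp + self.fp)


# Counts as quoted in the text for Dandil et al. (ANN).
dandil = ConfusionCounts(tp=24, tn=34, fp=4, fn=2)

if __name__ == "__main__":
    print(f"n={dandil.total}")
    print(f"sensitivity={dandil.sensitivity:.3f}")
    print(f"specificity={dandil.specificity:.3f}")
    print(f"accuracy={dandil.accuracy:.3f}")
    print(f"ppv={dandil.ppv:.3f}")
```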
Discussion
Across nine studies, ML-based AI methods demonstrated consistently promising diagnostic accuracy for lung cancer detection and nodule classification, suggesting value as alternatives or adjuncts to expert radiologists and microscopic analysis. Performance varied by architecture and context: SVM and PNN often achieved high sensitivity/specificity on CT and X-ray datasets, whereas POMDP achieved high sensitivity but lower specificity in LDCT screening. RFNN combined with Lung-RADS yielded excellent specificity and accuracy, achieving high PPV without sacrificing sensitivity. Variations in accuracy (from 77.8% to 100%) likely reflect differences in study design, image modality, dataset size/quality, and reference standards. Comparisons with other literature cited within the paper indicate that deep learning can perform strongly with large datasets, but traditional ML may be preferable for smaller datasets because deep learning needs more data and is more prone to overfitting when data are scarce. Clinically, effective ML tools could streamline workflows, reduce the number of missed lesions, enhance early detection, and potentially improve outcomes, though attention to false positives is critical to avoid unnecessary follow-up in benign cases. The findings support the feasibility and potential clinical relevance of ML in lung cancer imaging across multiple modalities, while underscoring the need for rigorous validation and standardization.
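The point about false positives can be made concrete with Bayes' rule: for fixed sensitivity and specificity, the positive predictive value (PPV) depends strongly on how common cancer is in the tested population. The sketch below is illustrative only. The function name `ppv` and the prevalence values are hypothetical assumptions, not figures from the review; only the sensitivity/specificity pairs (0.97/0.46 and 0.97/0.99) are taken from the text.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from Bayes' rule (hypothetical helper)."""
    for name, value in (
        ("sensitivity", sensitivity),
        ("specificity", specificity),
        ("prevalence", prevalence),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    if true_positive_rate + false_positive_rate == 0.0:
        raise ValueError("PPV is undefined when no positive calls are expected")
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Assumed prevalence values, chosen only to illustrate the trend.
HYPOTHETICAL_PREVALENCES = (0.01, 0.05, 0.20)

if __name__ == "__main__":
    for prevalence in HYPOTHETICAL_PREVALENCES:
        low_spec = ppv(0.97, 0.46, prevalence)   # sensitivity/specificity quoted for POMDP
        high_spec = ppv(0.97, 0.99, prevalence)  # sensitivity/specificity quoted for SVM (Khan et al.)
        print(f"prevalence={prevalence:.2f}  "
              f"PPV(spec 0.46)={low_spec:.3f}  PPV(spec 0.99)={high_spec:.3f}")
```

Under these assumptions, the low-specificity operating point produces far more false alarms per true cancer at screening-level prevalence, which is consistent with the concern about unnecessary follow-up raised above.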
Conclusion
This systematic review shows that machine learning AI architectures can accurately detect and classify lung cancer across imaging modalities, effectively distinguishing malignant from benign nodules and identifying SCLC and NSCLC. Reported sensitivity, specificity, and accuracy varied by study and model, with several architectures achieving high performance. Despite these promising results, further optimization, standardization, and validation in diverse, larger, multi-center cohorts are needed to enhance performance and reliability for real-world deployment.
Limitations
Heterogeneity across included studies (patient populations, imaging modalities, lesion types, ML architectures) limits generalizability and precluded robust pooled analysis. The number of eligible studies was small, and several had limited sample sizes or incomplete methodological details. Potential publication bias may be present. Study quality varied (fair to excellent). The review focused on diagnostic accuracy and did not assess downstream clinical impact, patient outcomes, or cost-effectiveness.