
Artificial intelligence enables precision diagnosis of cervical cytology grades and cervical cancer

J. Wang, Y. Yu, et al.

This study introduces an artificial intelligence cervical cancer screening system (AICCS) that grades cervical cytology from whole-slide images. Developed and validated on data from 16,056 participants, AICCS improved the specificity and sensitivity of cytopathologists' readings when used as an assistive tool. The study was conducted by Jue Wang, Yunfang Yu, Yujie Tan, and colleagues.

Introduction

Cervical cancer remains a major global health burden and is the fourth most common cancer and cause of cancer-related death among women worldwide. Early detection through screening is essential to prevent progression from precancerous lesions to invasive cancer. Current screening modalities include cervical cytology, HPV testing, and DNA ploidy testing, with cytology being widely adopted due to its simplicity and cost-effectiveness. However, there is a shortage of skilled cytopathologists, leading to over 10% false-negative rates and limited screening capacity, especially in countries like China. Artificial intelligence (AI) offers the potential to standardize and enhance cytology interpretation, reduce inter-observer variability, and improve sensitivity while maintaining efficiency. The purpose of this study was to develop and validate an AI-based cervical cancer screening system (AICCS) that analyzes whole-slide images (WSIs) of liquid-based cytology to classify cervical cytology grades and support clinical decision-making across retrospective, prospective, and randomized observational datasets.

Literature Review

The paper reviews progress in AI for medical imaging and diagnostics, including deep learning approaches such as CNNs, object detection, ensemble methods, and GANs. Notable successes include AI systems achieving expert-level performance in diabetic retinopathy detection and surpassing clinicians in lung cancer detection, as well as robust applications in breast cancer screening. Prior work in cervical cytology AI includes deep learning models for nucleus segmentation with many categories, hybrid feature approaches (e.g., HDF) focusing on squamous epithelial cells, and AI-assisted systems for TBS classification. While promising, earlier systems often increased task complexity, focused on limited cell types, or lacked broad validation, highlighting the need for a clinically practical and comprehensively validated solution.

Methodology

Study design: Multicenter study with retrospective development, internal and external validation, prospective validation, and a randomized observational trial. A total of 16,056 eligible participants were enrolled between January 2016 and December 2020 across three institutions. For model development, 11,468 WSIs from Sun Yat-sen Memorial Hospital (SYSMH) were split into training (n=9,316) and internal validation (n=2,152) sets. External validation datasets came from Guangzhou Women and Children Medical Center (GWCMC, n=600) and The Third Affiliated Hospital of Guangzhou Medical University (TGHUMC, n=600). Prospective validation at SYSMH included 2,780 eligible participants, and a randomized observational trial at SYSMH included 605–608 eligible participants.
Data acquisition and QC: Liquid-based cytology samples (sedimentation method) were scanned at 40× with a PRECEED 600 (pixel size 0.2529 µm) or a KF-PRO-400H (0.2484 µm) scanner. WSIs were saved in proprietary formats. Six experienced cytopathologists annotated and labeled WSIs according to the Bethesda System (TBS) 2014. Unsatisfactory samples (e.g., low cell count, artifacts) were excluded. An AI-based thumbnail quality assessment (EfficientNet backbone with summary and detail branches) screened for scanning issues such as blurriness and incomplete areas; flagged images were reviewed and excluded as needed.
Annotation and labels: Each WSI was independently annotated by two cytopathologists, and disagreements were adjudicated by an expert. Patch-level annotation used 2,845 WSIs. Positive cells were labeled with bounding boxes in six categories following TBS 2014: ASC-US, LSIL, ASC-H, HSIL, SCC, and AGC (all glandular lesions were grouped as AGC because of low counts and overlapping features). Negative smears were not annotated. A two-stage annotation workflow combined initial manual labeling with AI-suggested annotations that were then verified, expanding the set of high-quality ROIs.
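
The two pixel sizes reported for the scanners imply slightly different physical coverage for the same patch size in pixels. The following is a minimal sketch of that arithmetic. The patch side of 1,024 pixels and the helper names are assumptions for illustration only; the source does not state a patch size or describe any resampling step.

```python
# Pixel sizes (micrometres per pixel) as reported for the two scanners.
PIXEL_SIZE_UM = {"PRECEED 600": 0.2529, "KF-PRO-400H": 0.2484}


def patch_extent_um(side_px: int, scanner: str) -> float:
    """Physical side length, in micrometres, of a square patch."""
    return side_px * PIXEL_SIZE_UM[scanner]


def resample_factor(source: str, target: str) -> float:
    """Resize factor that brings a source-scanner image to the target pixel size."""
    return PIXEL_SIZE_UM[source] / PIXEL_SIZE_UM[target]


if __name__ == "__main__":
    side = 1024  # hypothetical patch size, not stated in the source
    for name in PIXEL_SIZE_UM:
        print(f"{name}: {patch_extent_um(side, name):.2f} um per patch side")
    factor = resample_factor("KF-PRO-400H", "PRECEED 600")
    print(f"KF-PRO-400H -> PRECEED 600 resize factor: {factor:.4f}")
```

The difference is under 2%, so whether explicit resampling matters in practice would depend on the detector's tolerance to scale.
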
WSI-level classes: For WSI-level classification, patches annotated as ASC-H, HSIL, and SCC were grouped into a single high-grade category (HSIL, representing higher-grade squamous lesions) because of their morphological and management similarities. The final WSI-level classes were NILM, ASC-US, LSIL, HSIL, and AGC.
Model architecture: AICCS integrates (1) a patch-level object detection model and (2) a WSI-level classifier. For patch-level detection, RetinaNet (a one-stage detector) with a ResNet+FPN backbone and focal loss was selected over Faster R-CNN based on performance. An additional binary classifier subnet distinguished squamous from glandular cells. The top detections per class (up to 20) were surfaced in the UI for review and were aggregated into slide-level features.
WSI-level features and classifier: From the patch detections, statistical features were computed per class (e.g., max/mean/std of confidence scores; proportions across confidence intervals). A random forest classifier used these features to predict the WSI-level grade. Among the algorithm combinations tested (RetinaNet vs Faster R-CNN; Random Forest vs Neural Network), the Retina-ResNet–Random Forest pipeline performed best.
Data augmentation: Augmentation was applied systematically and included random patch crops around annotations with varying overlaps, rotations, and color augmentations. Color augmentation was performed in stain (DAB) space using Macenko color deconvolution and reconstruction to RGB, with H&E components sampled within constrained bounds to mitigate overfitting and standardize color variation.
Evaluation: Retrospective internal and external validations assessed AUC, sensitivity, specificity, accuracy, and NPV across cytology grade thresholds (ASC-US+, LSIL+, HSIL+). Prospective validation compared AICCS alone, cytopathologists alone, and AICCS-assisted cytopathologists.
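
The WSI-level feature step described above (per-class statistics over detection confidences) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the confidence intervals in `BINS`, the feature names, and the input format of `(class_name, confidence)` pairs are all hypothetical. Only the six patch-level classes, the max/mean/std statistics, and the cap of 20 detections per class come from the text.

```python
from statistics import mean, pstdev

CLASSES = ["ASC-US", "LSIL", "ASC-H", "HSIL", "SCC", "AGC"]
BINS = [(0.0, 0.5), (0.5, 0.8), (0.8, 1.0)]  # hypothetical confidence intervals
TOP_K = 20  # the text mentions up to 20 top detections per class


def slide_features(detections):
    """Turn one slide's detections into a flat dict of per-class statistics.

    detections: list of (class_name, confidence) pairs (assumed format).
    """
    feats = {}
    for cls in CLASSES:
        # Keep the TOP_K most confident detections of this class.
        scores = sorted((s for c, s in detections if c == cls), reverse=True)[:TOP_K]
        feats[f"{cls}_count"] = len(scores)
        feats[f"{cls}_max"] = scores[0] if scores else 0.0
        feats[f"{cls}_mean"] = mean(scores) if scores else 0.0
        feats[f"{cls}_std"] = pstdev(scores) if scores else 0.0
        for lo, hi in BINS:
            # Half-open bins, except that a score of exactly 1.0 joins the last bin.
            in_bin = sum(1 for s in scores if lo <= s < hi or (hi == 1.0 and s == 1.0))
            feats[f"{cls}_frac_{lo}_{hi}"] = in_bin / len(scores) if scores else 0.0
    return feats


if __name__ == "__main__":
    toy = [("HSIL", 0.9), ("HSIL", 0.7), ("ASC-US", 0.4)]  # made-up detections
    for name, value in slide_features(toy).items():
        if value:
            print(name, round(value, 3))
```

A feature vector of this kind could then be passed to any tabular classifier, for example scikit-learn's `RandomForestClassifier`; the source does not report the hyperparameters used.
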
Randomized trial: A randomized observational trial (Aug 1–Dec 14, 2020) at SYSMH randomized participants 1:1 to diagnostic approaches; all cases received an expert histopathology-based gold-standard diagnosis.
Statistical analysis: Analyses used ROC curves, independent t-tests, and chi-square tests, with two-tailed P<0.05 considered significant.
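
The threshold-based metrics named in the evaluation (sensitivity, specificity, accuracy, NPV at ASC-US+, LSIL+, HSIL+) follow from binarizing each grade at the chosen cut-off. Below is a minimal sketch of that calculation. The ordinal ranking in `GRADE_ORDER` is an assumption, AGC is left out because the source does not say how glandular lesions map onto these thresholds, and the toy labels are invented for illustration.

```python
GRADE_ORDER = ["NILM", "ASC-US", "LSIL", "HSIL"]  # assumed ordinal ranking; AGC omitted


def at_or_above(grade, threshold):
    """True if grade is at least as severe as threshold (e.g. LSIL counts as ASC-US+)."""
    return GRADE_ORDER.index(grade) >= GRADE_ORDER.index(threshold)


def screening_metrics(truth, predicted, threshold):
    """Binary screening metrics after binarizing both label lists at threshold."""
    pairs = [(at_or_above(t, threshold), at_or_above(p, threshold))
             for t, p in zip(truth, predicted)]
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, len(pairs)),
        "npv": ratio(tn, tn + fn),
    }


if __name__ == "__main__":
    # Invented labels, for illustration only.
    truth = ["NILM", "NILM", "ASC-US", "LSIL", "HSIL", "NILM"]
    predicted = ["NILM", "ASC-US", "ASC-US", "NILM", "HSIL", "NILM"]
    for cut in ("ASC-US", "LSIL", "HSIL"):
        print(f"{cut}+", screening_metrics(truth, predicted, cut))
```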

Key Findings

Model selection: The Retina-ResNet–Random Forest pipeline achieved AUC 0.922 (95% CI: 0.904–0.940) and sensitivity 0.906 (95% CI: 0.875–0.932) on development comparisons.
Retrospective validation (examples): High performance across ASC-US+, LSIL+, HSIL+ thresholds with accuracy and specificity typically above 0.810 across internal and external datasets. Reported NPVs: 0.973 (SYSMH internal), 0.913 (GWCMC external), 0.958 (TGHUMC external).
Subgroup AUCs for ASC-US+ detection: SYSMH internal 0.932 (95% CI: 0.905–0.941); TGHUMC external 0.879 (95% CI: 0.844–0.913); GWCMC external 0.929 (95% CI: 0.905–0.953).
Prospective assessment: AICCS achieved AUC 0.947, sensitivity 0.946, specificity 0.890, accuracy 0.892.
Comparative performance (prospective dataset): For ASC-US+, AUCs were AICCS 0.947 (95% CI: 0.936–0.958), cytopathologists 0.964 (95% CI: 0.948–0.974), AICCS-assisted 0.965 (95% CI: 0.954–0.976). For LSIL+, AICCS 0.965 (95% CI: 0.956–0.974), cytopathologists 0.975 (95% CI: 0.963–1.000), AICCS-assisted 0.965 (95% CI: 0.956–0.974). For HSIL+, AICCS 0.965 (95% CI: 0.949–0.982), cytopathologists 0.994 (95% CI: 0.992–0.996), AICCS-assisted 0.998 (95% CI: 0.996–0.999).
Randomized observational trial: All three groups (AICCS, cytopathologists, AICCS-assisted) had AUCs >0.900 across thresholds. AICCS assistance significantly improved specificity and accuracy versus cytopathologists alone (P<0.001), with no statistically significant difference in sensitivity (P>0.05). The study also reports a 13.3% increase in sensitivity with AICCS assistance. NPVs reached up to 1.000 for AICCS-assisted reading in some analyses.
Efficiency: AICCS reduced per-WSI reading time to under 120 s compared with approximately 180 s for manual reading.
Deployment: A cloud-based, multi-institutional platform enabled remote uploads and AI-assisted diagnosis.
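
The reading-time figures in the efficiency finding translate into throughput as in the short sketch below. The function names are illustrative. Because the assisted time is reported only as "under 120 s", the assisted results are lower bounds rather than measured values.

```python
AI_ASSISTED_S = 120  # reported upper bound for AICCS-assisted reading, in seconds
MANUAL_S = 180       # approximate manual reading time, in seconds


def slides_per_hour(seconds_per_slide: float) -> float:
    """Slides one reader could process in an hour at a constant pace."""
    return 3600 / seconds_per_slide


def relative_saving(manual_s: float, assisted_s: float) -> float:
    """Fraction of manual reading time saved per slide."""
    return (manual_s - assisted_s) / manual_s


if __name__ == "__main__":
    print(f"Manual:   {slides_per_hour(MANUAL_S):.0f} slides/hour")
    print(f"Assisted: at least {slides_per_hour(AI_ASSISTED_S):.0f} slides/hour")
    print(f"Time saved per slide: at least {relative_saving(MANUAL_S, AI_ASSISTED_S):.0%}")
```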

Discussion

The AICCS system addresses critical bottlenecks in cervical cytology by automating patch-level detection of abnormal cells and aggregating evidence for WSI-level diagnosis, thereby reducing reliance on scarce cytopathology expertise and mitigating inter-observer variability. The system demonstrated robust performance across internal, external, prospective, and randomized trial settings, indicating strong generalizability. Notably, the highest sensitivities were observed in HSIL+ subgroups, aligning with clinical priorities to avoid missing high-grade lesions. Assistance from AICCS consistently improved specificity and accuracy for cytopathologists while preserving sensitivity, supporting its role as an augmentation tool rather than a replacement. The reduction in reading time and the availability of a cloud platform further enhance clinical utility, particularly in resource-limited settings. These results support the hypothesis that AI can standardize and enhance cervical cytology screening, potentially improving alignment with histopathology and enabling scalable, high-quality screening programs.

Conclusion

This study presents an AI-based cervical cancer screening system (AICCS) that combines a RetinaNet-based patch-level detector with a random forest WSI-level classifier for TBS 2014-grade prediction from cervical liquid-based cytology WSIs. Trained on large multicenter datasets and validated across internal/external cohorts, a prospective set, and a randomized observational trial, AICCS achieved high diagnostic accuracy and improved cytopathologists’ performance, while reducing reading time. A cloud-enabled deployment demonstrates practical feasibility for multi-institutional use. Future work should include broader, more diverse datasets, harmonization across scanners and preparation methods, prospective studies in varied healthcare settings, and integration with HPV and clinical data. Continued evaluation of privacy, security, workflow integration, and human-AI collaboration will be essential for safe and equitable adoption.

Limitations

Sampling and operator variability may result in some WSIs not accurately representing true lesion rates, potentially causing false negatives.
Differences in liquid-based cytology preparation across centers can lead to inconsistent smear quality, affecting AI performance.
Generalizability may be influenced by scanner types, staining variations, and site-specific workflows despite quality control and augmentation.
Privacy and security considerations, along with clear delineation of clinician responsibilities and oversight in AI-assisted decisions, require ongoing monitoring.
