Introduction
Artificial intelligence (AI), particularly deep learning, has demonstrated exceptional diagnostic capabilities on high-quality clinical images. In ophthalmology, ultra-widefield fundus (UWF) imaging has become a standard modality, and AI diagnostic systems built on it show promise. However, real-world UWF images often vary in quality because of patient non-compliance, operator error, and hardware limitations. Existing AI systems are typically trained and evaluated on good-quality images only, so their performance is uncertain in real-world settings where poor image quality is unavoidable. This study addresses that limitation by developing a deep learning-based image filtering system (DLIFS) that pre-processes real-world UWF images before AI diagnosis, ensuring that only good-quality images are passed on for analysis. This 'selective eating' approach aims to improve the accuracy and reliability of AI diagnostic systems in routine clinical practice. Manual image quality assessment is time-consuming and requires expertise, making automation crucial in high-throughput settings such as screening programs and multicenter studies. The researchers aimed to develop the DLIFS, evaluate its performance across multiple independent datasets, and analyze its impact on the accuracy of existing AI diagnostic systems for several ocular diseases.
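The filtering step itself is simple to express in code. Below is a minimal Python sketch of the 'selective eating' gate described above; the model objects, their `predict` interface, and the 0.5 operating threshold are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

QUALITY_THRESHOLD = 0.5  # hypothetical operating point; the paper's cutoff is not given here


def selective_eating(images, quality_model, diagnosis_model):
    """Route only good-quality images to the diagnostic model.

    `quality_model` returns P(poor quality) for an image; `diagnosis_model`
    returns disease probabilities. Both are placeholders for trained networks.
    """
    results = []
    for img in images:
        p_poor = float(quality_model.predict(img[np.newaxis, ...])[0])
        if p_poor >= QUALITY_THRESHOLD:
            # Poor-quality image: flag for recapture or ophthalmologist referral
            results.append({"status": "rejected", "p_poor": p_poor})
        else:
            # Good-quality image: forward to the downstream diagnostic system
            results.append({"status": "diagnosed",
                            "prediction": diagnosis_model.predict(img[np.newaxis, ...])[0]})
    return results
```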
Literature Review
The introduction cites several studies demonstrating the success of deep learning in medical image analysis, including diabetic retinopathy detection and skin cancer classification. It also notes the increasing use of UWF imaging in ophthalmology and the development of deep learning-based AI diagnostic systems for various ocular diseases. However, it points out the lack of research addressing variable image quality in real-world settings, which can significantly degrade the performance of these AI systems, and the authors highlight the need for an automated image filtering system to overcome the challenges of manual quality assessment. Previous studies on automated image quality assessment are mentioned, with the novelty of this work lying in applying the approach to UWF images and integrating it with AI diagnostic systems for ocular fundus diseases.
Methodology
The study used a total of 40,562 UWF images from 21,689 individuals across three independent datasets (CMAAI, ZOC, and XOH). An image was labeled poor quality if it met any of three criteria: (1) more than one-third of the fundus was obscured; (2) the macular vessels could not be identified, or more than 50% of the macular area was obscured; or (3) the vessels within one disc diameter of the optic disc could not be identified. Three retina specialists labeled the images independently, with a senior specialist arbitrating disagreements.

The DLIFS was built on InceptionResNetV2, a deep convolutional neural network (CNN) architecture. The model was trained on 25,241 images (augmented to 126,205), then validated and tested on separate subsets of the CMAAI dataset; the ZOC and XOH datasets served as external validation. Preprocessing consisted of downsizing images to 512×512 pixels and normalizing them, and data augmentation included flipping, rotation, and brightness adjustment. Performance was evaluated using sensitivity, specificity, and the area under the receiver operating characteristic curve (AUC), and heatmap visualization was used to interpret the DLIFS's decisions.

The DLIFS was then integrated with pre-existing AI diagnostic systems for lattice degeneration/retinal breaks (LDRB), glaucomatous optic neuropathy (GON), and retinal exudation/drusen (RED) to assess its effect on their performance on real-world datasets of mixed image quality. Statistical analyses used the Wilson score method for confidence intervals on sensitivity and specificity, the empirical bootstrap for the AUC, and a two-proportion Z-test to compare disease distributions between good- and poor-quality images. Sketches of the model, the evaluation statistics, and the heatmap computation follow below.
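First, the classifier itself. This is a minimal TensorFlow/Keras sketch of a binary good/poor-quality model on the InceptionResNetV2 backbone named in the paper; the optimizer, loss, normalization scheme, augmentation magnitudes, and use of ImageNet weights are assumptions, since the paper's summary does not specify them.

```python
import tensorflow as tf
from tensorflow.keras.applications import InceptionResNetV2

IMG_SIZE = 512  # images downsized to 512x512, as described in the methodology


def build_dlifs():
    """Binary good/poor-quality classifier on an InceptionResNetV2 backbone."""
    base = InceptionResNetV2(include_top=False, weights="imagenet",
                             input_shape=(IMG_SIZE, IMG_SIZE, 3), pooling="avg")
    out = tf.keras.layers.Dense(1, activation="sigmoid")(base.output)
    model = tf.keras.Model(base.input, out)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=[tf.keras.metrics.AUC(name="auc")])
    return model


# Augmentation mirroring the paper's flips, rotations, and brightness shifts
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal_and_vertical"),
    tf.keras.layers.RandomRotation(0.1),
    tf.keras.layers.RandomBrightness(0.2),
])


def preprocess(image, label):
    """Resize and normalize one image, matching the described preprocessing."""
    image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
    image = tf.cast(image, tf.float32) / 255.0  # normalize to [0, 1]
    return image, label
```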
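Second, the evaluation statistics. The sketch below implements the three analyses named above (Wilson score intervals, an empirical bootstrap for the AUC, and a two-proportion Z-test) with standard scientific-Python libraries; the counts in the final line are illustrative placeholders, not the study's data.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from statsmodels.stats.proportion import proportion_confint, proportions_ztest


def sensitivity_ci(tp, fn, alpha=0.05):
    """Wilson score interval for sensitivity (the same call works for specificity)."""
    return proportion_confint(tp, tp + fn, alpha=alpha, method="wilson")


def bootstrap_auc_ci(y_true, y_score, n_boot=2000, alpha=0.05, seed=0):
    """Empirical bootstrap confidence interval for the AUC."""
    rng = np.random.default_rng(seed)
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))
        if len(np.unique(y_true[idx])) < 2:  # a resample must contain both classes
            continue
        aucs.append(roc_auc_score(y_true[idx], y_score[idx]))
    return np.percentile(aucs, [100 * alpha / 2, 100 * (1 - alpha / 2)])


# Two-proportion Z-test: disease prevalence in poor- vs good-quality images
# (counts here are made up purely to show the call signature)
stat, p_value = proportions_ztest(count=[120, 300], nobs=[1000, 9000])
```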
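Third, the heatmaps. The summary says heatmaps were used to interpret the DLIFS but not which method; Grad-CAM is a common choice for CNN classifiers, so the sketch below assumes it. The layer name `conv_7b` is InceptionResNetV2's final convolutional layer in Keras, but whether the authors used this layer, or Grad-CAM at all, is an assumption.

```python
import numpy as np
import tensorflow as tf


def grad_cam(model, image, layer_name="conv_7b"):
    """Grad-CAM heatmap for the 'poor quality' output of the sketched DLIFS."""
    grad_model = tf.keras.Model(model.input,
                                [model.get_layer(layer_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[np.newaxis, ...])
        score = preds[:, 0]  # sigmoid probability of poor quality
    grads = tape.gradient(score, conv_out)
    weights = tf.reduce_mean(grads, axis=(1, 2))           # global-average-pool the gradients
    cam = tf.reduce_sum(conv_out * weights[:, None, None, :], axis=-1)
    cam = tf.nn.relu(cam)[0]                               # keep only positive evidence
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()     # normalize to [0, 1]
```

Upsampled to the input resolution and overlaid on the fundus photograph, such a map highlights the obscured regions that drove the poor-quality call, which is the feedback the paper describes giving to photographers.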
Key Findings
The DLIFS demonstrated excellent performance across all three datasets, achieving AUCs of 0.996 (CMAAI), 0.994 (ZOC), and 0.997 (XOH), with consistently high sensitivity and specificity (e.g., 96.9% sensitivity and 96.6% specificity on CMAAI). Heatmaps effectively highlighted the poor-quality regions of images. Integrating the DLIFS with established AI diagnostic systems significantly improved their performance on real-world, mixed-quality data: the diagnostic AUCs rose substantially when only good-quality images (after DLIFS filtering) were used, compared with mixed-quality or poor-quality images alone. For example, in the ZOC dataset, the GON system's AUC increased from 0.964 on mixed-quality images to 0.988 on good-quality images after filtering, with similar improvements for the other diseases and datasets. The analysis also showed a significantly higher proportion of GON, RED, and LDRB cases among poor-quality images than among good-quality images in both external validation datasets, suggesting that poor-quality images are more likely to come from patients with these conditions.
Discussion
The study’s results highlight the significant impact of image quality on the accuracy of AI diagnostic systems. The high sensitivity and specificity of the DLIFS demonstrate its effectiveness in automatically identifying and filtering poor-quality UWF images. The substantial performance improvement observed after integrating the DLIFS with existing AI diagnostic systems underlines the critical need for such a preprocessing step in real-world clinical applications. The heatmap visualization enhances the system's interpretability, allowing photographers to understand why an image is deemed poor quality. This feature aids in improving image acquisition and reducing errors in future image captures. The finding that poor-quality images are more likely associated with disease cases emphasizes the importance of referring such cases to ophthalmologists for further evaluation.
Conclusion
The developed DLIFS accurately identifies poor-quality UWF images, significantly improving the performance of AI diagnostic systems. The "selective eating" approach demonstrated in this study is crucial for developing robust AI systems for real-world clinical use. Future research should focus on extending the DLIFS to identify the causes of poor image quality and on reducing false positives, thereby lessening the burden on healthcare systems. This could involve more sophisticated techniques for analyzing why an image is poor and feeding that information back to photographers.
Limitations
While the DLIFS performs well, it does not identify the cause of poor image quality. Referring all poor-quality images could increase the healthcare burden, and there is a need to refine the system to reduce false positives. The study's findings might not be generalizable to all types of imaging modalities and diseases beyond those specifically examined.
Listen, Learn & Level Up
Over 10,000 hours of research content in 25+ fields, available in 12+ languages.
No more digging through PDFs—just hit play and absorb the world's latest research in your language, on your time.
listen to research audio papers with researchbunny