Introduction
The application of artificial intelligence (AI) in medical image analysis has shown significant potential for early disease detection, making it a promising solution for addressing the global health burden. Many AI models demonstrate accuracy comparable to or exceeding that of human experts, particularly in diabetic retinopathy (DR) screening. While high accuracy is crucial, the real-world cost-effectiveness of AI in long-term health screening is often overlooked. The trade-off between diagnostic performance (sensitivity and specificity) and cost-effectiveness is a significant challenge: raising sensitivity catches more cases but typically lowers specificity and increases referral costs, while raising specificity reduces costs but typically lowers sensitivity and risks missing high-risk patients. Previous studies have explored this trade-off theoretically, often assuming that sensitivity and specificity can be varied independently, which is unrealistic in practice. This study addresses this gap by using real-world data from a large-scale DR screening program in China to investigate whether the most accurate AI model is also the most cost-effective, particularly considering regional variations in DR prevalence and healthcare resource availability.
Literature Review
Existing literature highlights the high accuracy of AI in DR detection, with sensitivity and specificity ranging from 85% to 95% and 74% to 98%, respectively. However, a critical gap exists in understanding the cost-effectiveness of different AI models in real-world settings. Previous studies have attempted to analyze the cost-effectiveness of AI in DR screening by assigning different values to sensitivity and specificity, but they often focus on theoretical scenarios and overlook the inverse correlation between sensitivity and specificity in practical applications. There is a lack of evidence to guide the selection of AI models that balance diagnostic performance and cost-effectiveness, particularly in low- and middle-income countries (LMICs) such as China, where healthcare resources are often limited.
Methodology
This study utilized data from the Lifeline Express DR Screening Program in China, a nationwide program encompassing 251,535 participants with diabetes. A validated AI model was used, and its diagnostic performance was systematically varied by adjusting decision thresholds to generate 1100 different sensitivity/specificity pairs. A hybrid decision tree/Markov model was employed for the cost-effectiveness analysis, simulating annual screening scenarios over 30 years. The 'status quo' was defined as the scenario using the most accurate AI model, and incremental cost-effectiveness ratios (ICERs) were calculated for all other scenarios against it. The analysis considered the prevalence of referable DR (4%, 8%, and 12%), willingness-to-pay (WTP) levels ranging from US$0 to three times per capita GDP, regional setting (rural vs. urban), and age group. Deterministic and probabilistic sensitivity analyses were performed to assess model robustness and parameter uncertainty, and subgroup analyses were conducted for rural/urban settings and different age groups.
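For intuition, the sketch below (illustrative Python, not the study's actual code or data) shows how sweeping a single decision threshold produces linked sensitivity/specificity pairs, and how a scenario's ICER against the status quo would be computed. The cohort, scores, costs, QALYs, and the per capita GDP figure are invented placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for AI risk scores on a screening cohort (~8% referable DR).
y_true = rng.random(10_000) < 0.08
y_score = np.clip(0.35 * y_true + rng.normal(0.30, 0.15, 10_000), 0.0, 1.0)

def sens_spec(y_true, y_score, threshold):
    """Sensitivity and specificity of the AI model at a given decision threshold."""
    y_pred = y_score >= threshold
    sensitivity = np.mean(y_pred[y_true])    # true-positive rate among diseased eyes
    specificity = np.mean(~y_pred[~y_true])  # true-negative rate among healthy eyes
    return sensitivity, specificity

# Sweeping the threshold shows sensitivity and specificity moving inversely,
# which is why the study varies them jointly (1100 paired values) rather than
# treating them as independent inputs.
pairs = [sens_spec(y_true, y_score, t) for t in np.linspace(0.05, 0.95, 19)]

def icer(cost, qaly, cost_ref, qaly_ref):
    """Incremental cost-effectiveness ratio of a scenario versus the status quo."""
    return (cost - cost_ref) / (qaly - qaly_ref)

# Placeholder lifetime outputs that a decision tree/Markov model would supply
# for the status quo and one alternative scenario (numbers are made up).
cost_sq, qaly_sq = 100.0e6, 50_000.0
cost_alt, qaly_alt = 114.8e6, 50_839.0
wtp = 3 * 12_000  # e.g. three times an assumed per capita GDP of US$12,000

ratio = icer(cost_alt, qaly_alt, cost_sq, qaly_sq)
verdict = "cost-effective" if ratio <= wtp else "not cost-effective"
print(f"ICER: ${ratio:,.0f} per QALY gained -> {verdict} at this WTP")
```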
Key Findings
Compared to the status quo (sensitivity/specificity: 93.3%/87.7%), six scenarios were cost-saving and seven were cost-effective. To achieve cost-saving or cost-effectiveness, the AI model needed a minimum sensitivity of 88.2% and a minimum specificity of 80.4%. The most cost-effective AI model exhibited higher sensitivity (96.3%) and lower specificity (80.4%) than the status quo. Higher DR prevalence and higher WTP levels necessitated higher AI sensitivity for optimal cost-effectiveness, and urban regions and younger patient groups likewise required higher sensitivity in AI-based screening. The cost-saving effect was greatest at a sensitivity/specificity of 88.2%/90.3%, which saved US$5.54 million but forwent 1490 quality-adjusted life-years (QALYs). The best cost-effective scenario (highest sensitivity, 96.3%; lowest specificity, 80.4%) cost an additional US$14.8 million but gained 839 extra QALYs. Deterministic sensitivity analysis showed that varying model parameters did not materially alter the cost-effectiveness rankings, and probabilistic sensitivity analysis indicated that the most cost-effective AI model had a 55.43% probability of being the dominant choice at a WTP of three times per capita GDP. Subgroup analysis showed that higher specificity was needed in rural settings and higher sensitivity in urban settings.
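As an illustrative check (not an additional result reported in the paper), the implied ICER of the best cost-effective scenario is the incremental cost divided by the incremental QALYs: US$14.8 million / 839 QALYs ≈ US$17,600 per QALY gained, which is consistent with its classification as cost-effective under the study's WTP ceiling of three times per capita GDP.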
Discussion
This study demonstrates that the most accurate AI model may not be the most cost-effective in real-world DR screening. Cost-effectiveness should be independently evaluated, with sensitivity playing a crucial role. Higher sensitivity is crucial in settings with high DR prevalence and WTP levels. The trade-off between sensitivity and specificity is complex: low sensitivity can lead to missed diagnoses and subsequent blindness, incurring high costs associated with vision loss, especially in LMICs, while low specificity results in unnecessary referrals, straining healthcare resources. AI-based screening offers efficiency by initially filtering out negative cases, but the optimal balance between sensitivity and specificity is context-dependent and should be guided by cost-effectiveness analyses. This study highlights the need for locally tailored minimum performance standards for AI in DR screening, considering regional variations in DR prevalence and healthcare system capacity. The findings suggest that sensitivity should be prioritized over specificity, particularly in areas with higher DR prevalence and greater economic capacity. Although similar cost-effectiveness strategies might be applicable to other countries with comparable healthcare economic frameworks, further validation is needed.
Conclusion
The most accurate AI model is not always the most cost-effective in real-world DR screening. Independent cost-effectiveness evaluation is essential, emphasizing sensitivity, particularly in high-prevalence and high-WTP settings. Future research should focus on validating these findings across diverse settings and scaling up the implementation of cost-effective AI-based DR screening strategies.
Limitations
This study used a single AI model, although 1100 performance variations were simulated. The cost-effectiveness assessment reflects the Chinese healthcare system, limiting direct generalizability to other countries. The model simplified DR progression and did not include all possible health states, such as diabetic macular edema. Finally, the specificity of the AI model in this real-world dataset was lower than reported in its previous validation because ungradable images were reclassified as positive.
Listen, Learn & Level Up
Over 10,000 hours of research content in 25+ fields, available in 12+ languages.
No more digging through PDFs—just hit play and absorb the world's latest research in your language, on your time.
listen to research audio papers with researchbunny