
Medicine and Health

Dermatologist-like explainable AI enhances trust and confidence in diagnosing melanoma

T. Chanda, K. Hauser, et al.

Explore how an explainable AI (XAI) system can transform melanoma diagnosis. Conducted by a team including Tirtha Chanda and Katja Hauser, this study shows how precise, domain-specific explanations enhance dermatologists' confidence and trust in AI-powered diagnostic tools.
Introduction

Melanoma accounts for most skin cancer-related mortality, and early detection is crucial but challenging due to overlap with benign lesions. While deep neural network (DNN)-based diagnostic support improves accuracy in dermoscopy, a lack of transparency impedes clinical adoption and conflicts with regulatory requirements (e.g., the GDPR). Clinicians need case-level, domain-specific explanations to assess AI outputs and to mitigate risks from spurious correlations. The study aims to develop a dermatology-aligned explainable AI that closes this interpretability gap by localizing and naming clinically established dermoscopic features, and to evaluate its effect on clinicians' diagnostic accuracy, confidence, and trust in a controlled reader study.

Literature Review

Two main XAI branches are discussed: (1) post hoc explanations (e.g., CAM, Grad-CAM, LRP) that are broadly applicable but risk unfaithful or ambiguous explanations requiring user interpretation; and (2) inherently interpretable models (e.g., concept-based methods like TCAV, concept whitening) that can be more faithful but impose architectural/training constraints and often trade off with performance. Prior dermatology XAI work predominantly used post hoc methods; a 2022 review found only two studies with inherently interpretable methods, both providing dataset-level concept analyses rather than lesion-level clinical explanations. Recent attempts to bridge the interpretability gap include concept-vector systems trained on small, expert-annotated datasets and architectures with built-in localization and ontology-based textual explanations, but these lacked large-scale evaluations of clinician impact. The paper positions a multimodal, ontology-driven XAI to provide lesion-level visual and textual explanations tailored to dermatologists, addressing prior gaps in scale, faithfulness, and clinician-centered evaluation.

Methodology

Data and annotations: The HAM10000 dataset was used, restricted to biopsy-verified melanoma and nevus images (n=3611 images from 1981 lesions). Fourteen international board-certified dermatologists annotated each image with (a) the presence of ontology-defined characteristics for melanoma or nevus and (b) polygonal regions of interest (ROIs) localizing those characteristics. One dermatologist annotated all images; the others annotated subsets so that each lesion had at least two annotators.

Ontology: A dermoscopy feature ontology (based on pattern analysis), compiled and validated by experts, included melanoma criteria (e.g., thick reticular/branched lines, gray patterns, pseudopods/radial lines not covering the entire circumference, peripheral black dots/globules, white lines/structureless white zones, etc.) and nevus criteria (e.g., a single pattern/color, symmetric patterns/colors, monomorphic vessels, pseudopods/radial lines covering the entire circumference, etc.).

Train/val/test split: From the base set, a held-out test set of 200 unique lesions (100 melanoma, 100 nevus) with one image per lesion was created; the remaining unique lesions were split 82:18 into training (2646 images, 1460 lesions) and validation (599 images, 321 lesions) sets, ensuring no patient or lesion leakage across splits.

Model: A multimodal XAI classifier predicts ontology characteristics from dermoscopic images using an ImageNet-pretrained ResNet-50 backbone within a guided-attention architecture. The diagnosis is inferred as melanoma if at least two melanoma characteristics are detected, approximating clinical 7-point-checklist practice. The architecture comprises (i) Comp_c, a characteristic classifier, and (ii) Comp_a, a guided-attention component that aligns Grad-CAM attention with dermatologist ROIs. The loss combines cross-entropy on the characteristics (L_c) with a Dice-based attention/ROI alignment loss (L_A) between Grad-CAM maps and annotated ROIs, with weights λ_c=1 and λ_A=10. Training used class balancing and augmentations (flips, color jitter, CLAHE, shifts/rotations, resize/normalize).

Confidence calibration: Temperature scaling provided calibrated probabilities.

Explanations: For phase 3, textual explanations list the detected characteristics with certainty levels ("strong evidence" if the calibrated output exceeds 0.7, otherwise "some evidence"), plus localized polygon ROIs drawn over the top-20% attribution regions for those characteristics. If no feature exceeded 0.7, only the most certain feature was shown.

Reader study: A three-phase web-based study with 116 international clinicians (82 board-certified dermatologists, 33 residents, 1 nurse consultant) on the test images. Phase 1 (no AI): diagnose each image as nevus (leave in or excise) or melanoma, select ontology characteristics, draw ROIs, and rate confidence (1–10). Phase 2 (AI support): the same images shown with the AI diagnosis only; clinicians provided their diagnosis, confidence (1–10), and trust in the AI (1–10). Phase 3 (XAI support): the same images with the AI diagnosis plus textual and localized explanations and per-characteristic confidence; clinicians again provided diagnosis, confidence, and trust. At least two weeks separated the phases; image order was preserved, and one image was repeated to assess intra-rater variability.
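To make the training objective concrete, here is a minimal PyTorch-style sketch of the combined loss described above, assuming per-characteristic binary labels, a normalized Grad-CAM attention map per image, and binary ROI masks; the function names and the multi-label binary cross-entropy formulation are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F

def dice_loss(attn, roi, eps=1e-6):
    """Soft Dice loss between a [B, H, W] attention map (values in [0, 1])
    and a binary ROI mask of the same shape."""
    inter = (attn * roi).sum(dim=(1, 2))
    total = attn.sum(dim=(1, 2)) + roi.sum(dim=(1, 2))
    return 1.0 - (2.0 * inter + eps) / (total + eps)

def combined_loss(char_logits, char_labels, attn, roi,
                  lambda_c=1.0, lambda_a=10.0):
    """L = lambda_c * L_c + lambda_a * L_A, with the paper's weights
    (lambda_c = 1, lambda_A = 10). char_labels must be float tensors."""
    # L_c: cross-entropy on the ontology characteristics; treated here as
    # multi-label, hence binary cross-entropy per characteristic (assumption).
    l_c = F.binary_cross_entropy_with_logits(char_logits, char_labels)
    # L_A: Dice-based alignment of Grad-CAM attention with dermatologist ROIs.
    l_a = dice_loss(attn, roi).mean()
    return lambda_c * l_c + lambda_a * l_a
```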
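Temperature scaling, used here for confidence calibration, is a standard post hoc method: a single scalar T is fit on validation logits to minimize the calibration loss, leaving the ranking of predictions unchanged. A minimal sketch follows; the class and function names are hypothetical.

```python
import torch

class TemperatureScaler(torch.nn.Module):
    """Divide logits by a single learned temperature T > 0 (post hoc calibration)."""
    def __init__(self):
        super().__init__()
        self.log_t = torch.nn.Parameter(torch.zeros(1))  # T = exp(log_t) stays positive

    def forward(self, logits):
        return logits / self.log_t.exp()

def fit_temperature(val_logits, val_labels):
    """Fit T on held-out validation logits/labels; model weights stay frozen."""
    scaler = TemperatureScaler()
    opt = torch.optim.LBFGS([scaler.log_t], lr=0.05, max_iter=100)

    def closure():
        opt.zero_grad()
        loss = torch.nn.functional.binary_cross_entropy_with_logits(
            scaler(val_logits), val_labels)
        loss.backward()
        return loss

    opt.step(closure)
    return scaler
```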
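The decision rule (melanoma if at least two melanoma characteristics are detected) and the certainty wording ("strong evidence" above a calibrated probability of 0.7, otherwise "some evidence", falling back to the single most certain feature) can be sketched as below; the characteristic names, the 0.5 detection threshold, and the function name are illustrative assumptions.

```python
MELANOMA_CHARS = {"thick reticular lines", "grey patterns", "white lines"}  # illustrative subset

def diagnose_and_explain(probs, detect_thr=0.5, strong_thr=0.7):
    """probs: dict mapping characteristic name -> calibrated probability."""
    detected = {c: p for c, p in probs.items() if p >= detect_thr}
    # Melanoma if at least two melanoma characteristics are detected.
    diagnosis = "melanoma" if sum(c in MELANOMA_CHARS for c in detected) >= 2 else "nevus"
    shown = {c: p for c, p in detected.items() if p > strong_thr}
    if not shown and detected:
        # No feature above 0.7: show only the single most certain feature.
        top = max(detected, key=detected.get)
        shown = {top: detected[top]}
    text = [f"{'strong' if p > strong_thr else 'some'} evidence of {c}"
            for c, p in shown.items()]
    return diagnosis, text
```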
Statistical analysis: Balanced accuracy per participant; agreement rates with the AI; Sørensen–Dice coefficients (DSC) for the overlap between clinician- and XAI-selected ontology characteristics and for ROI overlap (Grad-CAM vs. human ROIs); explanation faithfulness via contrastive masking of important pixels; the Grad-CAM inside/outside-lesion attribution ratio; and comparisons with a baseline ResNet-50 classifier trained without annotations and two state-of-the-art methods (attention-based and ensemble). Tests: paired t-tests for phase comparisons; Wilcoxon signed-rank tests for non-normal ratios; Mann–Whitney U tests for high- vs. low-confidence effects; Spearman correlations for experience vs. benefit and for trust vs. explanation overlap; bootstrap CIs with 10,000 resamples; α=0.05 with Bonferroni correction.
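For reference, the two recurring quantities can be sketched as follows: the Sørensen–Dice coefficient between two sets of selected characteristics, and a percentile bootstrap confidence interval with 10,000 resamples as used throughout. Both implementations are illustrative sketches, not the authors' code.

```python
import numpy as np

def dice_sets(a, b):
    """Sørensen–Dice coefficient between two sets (e.g., selected characteristics)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))

def bootstrap_ci(values, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean, mirroring the paper's 10,000 resamples."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    means = np.array([rng.choice(values, size=values.size, replace=True).mean()
                      for _ in range(n_boot)])
    return np.quantile(means, [alpha / 2, 1 - alpha / 2])
```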

Key Findings
  • Diagnostic performance: XAI balanced accuracy 81% (95% CI: 75.6–86.3) vs baseline ResNet-50 80% (95% CI: 74.4–85.4); comparable to attention-based approach 79% (95% CI: 73.2–84.5) and ensemble 81.5% (95% CI: 76–86.7). Across eight backbones, XAI outperformed baseline in six, with ResNet-50 highest.
  • Focus on lesion vs background: The mean Grad-CAM attribution ratio inside vs. outside the lesion was 35.9 (95% CI: 30.7–42.0) for XAI vs. 4.1 (95% CI: 3.4–4.7) for the baseline (P<0.0001), indicating fewer spurious correlations (see the ratio sketch after this list).
  • Explanation faithfulness: Contrastively masking the pixels marked as important reduced the model's output scores, supporting the faithfulness of the explanations (reported graphically).
  • Alignment with clinicians: Ontological explanation overlap (DSC) between clinicians and XAI: 0.46 (95% CI: 0.44–0.48) when both predicted melanoma; 0.23 (95% CI: 0.20–0.26) when both predicted nevus; overall 0.27 (95% CI: 0.25–0.29; n=1089 images), comparable to between-clinician overlap 0.28 (95% CI: 0.27–0.29; n=5165 pairs). ROI overlap (DSC) clinician vs XAI: 0.48 (95% CI: 0.46–0.50) vs baseline 0.39 (95% CI: 0.38–0.41), P<0.0001 (n=1120 images).
  • Reader study accuracy: AI support vs no AI increased mean balanced accuracy from 66.2% (95% CI: 63.8–68.7) to 72.3% (95% CI: 70.2–74.3), P<0.0001 (n=109). Adding XAI explanations yielded 73.2% (95% CI: 71.0–75.3), a nonsignificant change vs AI-only (P=0.34; n=116). More experienced clinicians (defined by regular scientific discussion of dermoscopic images) benefited more from XAI than from AI-only support (Spearman ρ=0.2, 95% CI: 0.02–0.37, P=0.03).
  • Agreement with AI: Human-AI diagnosis agreement increased from 77.1% (95% CI: 75–79.2) with AI-only to 79.5% (95% CI: 77.1–81.2) with XAI, mean +2.4 percentage points (95% CI: 0.65–4.2), P=0.009; on AI errors, agreement rose from 63.0% to 67.9% (mean +4.8 pp, 95% CI: −1.2 to 10.9, P=0.126).
  • Confidence: Clinician diagnostic confidence increased by 12.25% (95% CI: 9.06–15.74) with XAI vs AI-only, P<0.0001 (n=1714 image-level comparisons). In phase 3, clinician confidence was slightly higher for high- vs low-confidence AI predictions (means 7.82 vs 7.69; P=0.039); no significant difference in phase 2.
  • Trust: Trust in AI decisions increased by 17.52% (95% CI: 13.74–21.6) with XAI vs AI-only, P<0.0001 (n=1714). Trust depended strongly on clinician agreement with AI (means 7.55 vs 4.8; P<0.0001). AI confidence did not affect trust.
  • Trust–overlap correlation: When clinician and AI diagnoses agreed, trust correlated with ontological explanation overlap overall (ρ=0.087, 95% CI: 0.02–0.15, P=0.01; n=871); stronger for melanoma (ρ=0.23, 95% CI: 0.19–0.34, P<0.0001; n=567); negative for nevus (ρ=−0.10, 95% CI: −0.19 to −0.02, P=0.01; n=505), potentially due to frequent “melanoma simulator” explanations on nevi.
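As referenced in the focus-on-lesion finding above, the inside/outside attribution ratio compares Grad-CAM attribution within the lesion to attribution outside it. Below is a minimal sketch under one plausible reading (mean attribution inside divided by mean attribution outside); the exact definition and all names here are assumptions.

```python
import numpy as np

def attribution_ratio(attn, lesion_mask, eps=1e-8):
    """Mean Grad-CAM attribution inside the lesion divided by the mean outside.

    attn: 2D array of non-negative attributions; lesion_mask: binary 2D array.
    """
    mask = lesion_mask.astype(bool)
    inside = attn[mask].mean()
    outside = attn[~mask].mean()
    return float(inside / (outside + eps))
```
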
Discussion

The study shows that a dermatology-aligned XAI can preserve state-of-the-art diagnostic performance while providing faithful, localized, ontology-based explanations that align with clinicians’ reasoning and attention. Although XAI did not significantly improve diagnostic accuracy beyond AI support, it significantly increased clinicians’ diagnostic confidence and trust, key factors for adoption. Stronger lesion-focused attention and higher ROI overlap suggest learning of human-relevant features and reduced reliance on spurious context, challenging the presumed performance–interpretability trade-off. Experienced clinicians derived more benefit from XAI, while less experienced clinicians benefited primarily from AI’s raw predictions. Trust was positively associated with overlap of human and machine explanations (notably for melanoma), indicating that convergence in reasoning enhances perceived reliability. These findings support the clinical value of transparent, domain-specific explanations to meet regulatory expectations (GDPR) and to foster responsible human–AI collaboration.

Conclusion

This work introduces and evaluates a multimodal, ontology-driven XAI for dermoscopic differentiation of melanoma vs nevus that delivers dermatologist-like textual and localized visual explanations alongside calibrated confidence. The system achieves competitive accuracy, focuses attention within lesions, and aligns with clinician-selected features and ROIs. In a large reader study, XAI significantly increased clinicians’ diagnostic confidence and trust compared to AI-only support, though it did not further improve accuracy. The authors release an expert-annotated explanations dataset and open-source code to catalyze further research. Future directions include evaluating individual explanation components, addressing domain shift with multi-center data and adaptation methods, refining explanation design for clinical workflows, and exploring impacts across varying clinician expertise levels and error scenarios.

Limitations
  • Evaluation occurred under artificial reader-study conditions rather than routine clinical practice.
  • The model is explicitly guided to produce human-like, ontology-based explanations, potentially foregoing non-human-observable predictive patterns.
  • Tight coupling to a domain-specific ontology limits applicability where no standardized descriptive framework exists or where multiple terminologies coexist, which may alienate some users.
  • Effects are measured for a combined multimodal explanation; contributions of individual components (text vs localization vs confidence) were not isolated.
  • The polygon ROI visualization, threshold choices, and non-interactive display may be suboptimal for clinical use and degenerate cases.
  • Domain shift was not addressed; generalizability to images from other institutions and acquisition settings requires validation and possibly domain adaptation.