AI-driven clinical decision support systems (AI-CDSS) hold significant potential for improving medical care, but a translation gap persists, particularly in critical care. Clinicians, researchers, and regulators call for explainable AI (XAI) to enhance trust and adoption, yet most XAI evaluations focus on general problems rather than complex clinical tasks. Studies examining XAI in medical settings typically involve diagnostic scenarios with pre-existing gold standards, unlike non-diagnostic problems such as sepsis management. This study uses the AI Clinician system, designed for sepsis resuscitation, to evaluate the impact of AI recommendations and XAI on clinician decision-making. The researchers assess how clinicians' decisions are influenced by additional information, considering both its source (human peers or AI) and the presence or absence of XAI (simple feature-importance explanations).
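The summary does not detail how the "simple feature importance" explanations were produced. As a minimal, hypothetical sketch of how feature importance can be derived for a dosing model, the snippet below applies scikit-learn's permutation importance to a synthetic dataset; the model, feature names, and data are all illustrative assumptions, not the AI Clinician's actual pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
# Hypothetical patient features a dosing model might condition on
feature_names = ["MAP", "lactate", "heart_rate", "urine_output", "SOFA"]
X = rng.normal(size=(500, len(feature_names)))
# Synthetic dose target: mostly driven by lactate and MAP in this toy setup
y = 0.8 * X[:, 1] - 0.5 * X[:, 0] + rng.normal(scale=0.1, size=500)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)

# Rank features by mean importance drop, as a clinician-facing display might
for idx in result.importances_mean.argsort()[::-1]:
    print(f"{feature_names[idx]:>12}: {result.importances_mean[idx]:.3f}")
```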
Literature Review
The introduction cites existing research highlighting the translation gap between AI's theoretical potential and its practical application in critical care. It notes the demand for XAI and the limitations of current XAI evaluation studies, most of which focus on general problems or on diagnostic scenarios with established gold standards. The researchers point out the lack of research on XAI in complex non-diagnostic areas such as sepsis management, a condition that affects millions globally despite decades of research and international guidelines.
Methodology
The study employed a modified between-subjects design with 86 ICU doctors (31 senior consultants, 42 intermediate, 13 junior). Participants completed 16 trials, each involving a patient case requiring prescription of two drugs (a fluid and a vasopressor). The multi-factorial design comprised four arms: baseline (control), peer recommendations (doses prescribed by other doctors in MIMIC-III), AI suggestions, and AI suggestions with XAI (feature importance). Each doctor experienced all four arms, with each arm applied to a different subset of cases drawn from a pool of 24 patient scenarios; one possible allocation scheme is sketched below. Pre-experiment questionnaires gathered data on clinician demographics, experience, and attitudes towards AI. Post-experiment questionnaires assessed the self-reported usefulness of the different information types. Data analysis covered dose shift and variability, associations between clinician factors (AI attitude, experience) and adherence to AI suggestions, and the correlation between self-reported XAI usefulness and adherence.
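As a hedged illustration of the allocation described above (all four arms per clinician, 16 trials drawn from a 24-scenario pool), the sketch below randomizes arm-to-scenario assignment per clinician. The concrete randomization scheme is an assumption; the study's actual counterbalancing may differ.

```python
import random

ARMS = ["baseline", "peer", "ai", "ai_xai"]   # the four study arms
SCENARIO_POOL = list(range(24))               # pool of 24 patient scenarios
TRIALS_PER_ARM = 4                            # 4 arms x 4 trials = 16 trials

def allocate_trials(clinician_id: int, seed: int = 42):
    """Return a shuffled list of (arm, scenario) trials for one clinician."""
    rng = random.Random(seed + clinician_id)
    # Draw 16 distinct scenarios, then split them evenly across the arms
    scenarios = rng.sample(SCENARIO_POOL, TRIALS_PER_ARM * len(ARMS))
    trials = []
    for i, arm in enumerate(ARMS):
        for s in scenarios[i * TRIALS_PER_ARM:(i + 1) * TRIALS_PER_ARM]:
            trials.append((arm, s))
    rng.shuffle(trials)  # interleave arms so order effects are spread out
    return trials

print(allocate_trials(clinician_id=0)[:4])
```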
Key Findings
Additional information (peer, AI, or AI with XAI) influenced prescriptions, with AI recommendations having a stronger impact than peer recommendations. Simple XAI did not significantly enhance the influence of AI suggestions. Neither clinician attitudes towards AI nor clinical experience significantly predicted adherence to AI recommendations, and self-reported XAI usefulness did not correlate with adherence to XAI-accompanied suggestions. Inter-clinician dose variability was differentially affected by recommendations: it was higher when recommendations exceeded the baseline dose and lower when they fell below it. A toy computation of these measures follows below.
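To make the headline measures concrete, this sketch computes a per-scenario dose shift and inter-clinician dose variability from a toy prescription table. The column names, DataFrame layout, and adherence proxy (absolute deviation from the shown recommendation) are hypothetical; the paper's exact statistics may differ.

```python
import pandas as pd

# Toy prescription table: three clinicians, one scenario, two arms
df = pd.DataFrame({
    "clinician":   [0, 1, 2, 0, 1, 2],
    "scenario":    [5, 5, 5, 5, 5, 5],
    "arm":         ["baseline"] * 3 + ["ai"] * 3,
    "dose":        [0.10, 0.60, 0.35, 0.30, 0.45, 0.40],  # e.g. vasopressor
    "recommended": [None] * 3 + [0.50] * 3,               # AI suggestion shown
})

# Dose shift: prescribed dose relative to the scenario's baseline-arm mean
baseline_mean = (df[df.arm == "baseline"]
                 .groupby("scenario")["dose"].mean().rename("baseline_mean"))
shift = (df[df.arm != "baseline"]
         .join(baseline_mean, on="scenario")
         .assign(dose_shift=lambda d: d.dose - d.baseline_mean))
print(shift[["clinician", "scenario", "dose_shift"]])

# Adherence proxy: absolute deviation from the shown recommendation
print(shift.assign(abs_dev=lambda d: (d.dose - d.recommended).abs())
           [["clinician", "abs_dev"]])

# Inter-clinician variability: dose SD within each scenario/arm cell (the
# paper reports this rising when recommendations sit above baseline and
# falling when they sit below)
print(df.groupby(["scenario", "arm"])["dose"].std())
```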
Discussion
The findings suggest that, in this setting, simple XAI offers limited additional benefit over AI suggestions alone. The study also casts doubt on the reliability of self-reported data for evaluating XAI in clinical settings. The limited impact of XAI could reflect the simplicity of the explanation used, a ceiling effect (AI suggestions alone maximizing trust and adherence), or the repetitive, cognitively demanding nature of the task. Acknowledged limitations include the simplicity of the XAI method, the experimental rather than real-world clinical context, the limited range and low fidelity of the patient scenarios, the convenience sampling method, and the lack of insight into the doctors' internal decision-making processes.
Conclusion
The study demonstrates that while AI recommendations strongly influence clinician decisions, simple XAI provides limited additional benefit in this critical care setting. The lack of correlation between self-reported XAI usefulness and actual influence raises concerns about using self-reports as a reliable evaluation metric for XAI. Future research should explore more sophisticated XAI methods and investigate the cognitive processes underlying clinician decision-making in AI-supported settings.
Limitations
The study's limitations include the simplicity of the XAI method used (feature importance), the low-fidelity nature of the patient scenarios, the convenience sampling of ICU doctors, the lack of detailed insight into physicians' internal decision-making processes, and limited generalizability given the specific scenarios explored. The authors also note that AI suggestions were presented as point values without confidence bounds or ranges, which could affect adherence.