Influence of believed AI involvement on the perception of digital medical advice

Medicine and Health


M. Reis, F. Reis, et al.

Discover the intriguing findings from researchers Moritz Reis, Florian Reis, and Wilfried Kunde, who explored public perceptions of AI-generated medical advice in two comprehensive studies. Their work reveals an unexpected anti-AI bias, shedding light on the challenges faced in integrating AI into the healthcare sector despite physician supervision.

Introduction
Large language models (LLMs) offer new avenues for seeking digital medical advice. While the performance of AI-based tools has been studied extensively, public perception of such advice remains largely unexplored. This research addresses that gap by investigating how the purported source of medical advice (AI, human physician, or human physician supported by AI) influences its perceived reliability, comprehensibility, and empathy, as well as recipients' willingness to follow it. The study's importance stems from the immense potential of AI in medicine and the need to understand public acceptance before broader implementation. Previous research often relied on small samples, lacked experimental designs, or focused solely on physicians' perspectives, neglecting the crucial role of patient perception in treatment adherence and outcomes. The present work addresses these limitations with large, representative samples and a rigorous experimental design to directly assess public attitudes toward AI-generated medical advice. It focuses specifically on the label effect, abstracting from potential differences in the objective quality of AI- versus human-generated advice.
Literature Review
Existing research highlights the potential of AI in medicine, including image analysis and drug-interaction detection. Some studies suggest that LLMs such as ChatGPT can achieve diagnostic accuracy comparable to human physicians, and can even outperform them in perceived quality and empathy when authorship is undisclosed; yet algorithm aversion becomes a significant factor once AI involvement is known. Previous studies on public attitudes toward AI in healthcare often suffered from limitations such as small sample sizes, lack of experimental design, or a focus solely on physician perspectives. This study addresses these gaps by using large, representative samples and a controlled experimental design focused on public perceptions of LLM-generated medical advice.
Methodology
Two preregistered studies were conducted. Study 1 (n = 1,050; participants of diverse nationalities) and Study 2 (n = 1,230; representative UK sample) presented participants with scenarios in which patients received medical advice. The advice itself was identical across conditions, but its stated source was manipulated ('AI', 'human physician', or 'human physician + AI'). Participants rated the advice on reliability, comprehensibility, and empathy (Study 1: 7-point Likert scale; Study 2: 5-point Likert scale). Study 2 additionally measured willingness to follow the advice and behavioral interest in the platform (saving a fictional link). Statistical analyses included ANOVAs and t-tests (Study 1) and mixed-effects regression analyses (both studies) to compare ratings across conditions, with the Holm-Bonferroni method used to correct for multiple comparisons. Exploratory analyses examined correlations between general attitudes toward AI and the ratings.
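As an illustration of the analysis strategy described here, the sketch below runs a one-way ANOVA across three author-label conditions followed by Holm-Bonferroni-corrected pairwise t-tests, using standard scipy and statsmodels routines. The simulated ratings, group sizes, and column names are illustrative assumptions for this sketch, not the study's actual data or code.

```python
# Minimal sketch: ANOVA over three label conditions plus Holm-corrected
# pairwise t-tests, on simulated 7-point Likert ratings (illustrative only).
import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
n_per_group = 350

# Simulated reliability ratings; the 'human' condition is given a slightly
# higher mean purely for demonstration purposes.
ratings = np.concatenate([
    np.clip(np.round(rng.normal(5.2, 1.3, n_per_group)), 1, 7),  # 'human physician'
    np.clip(np.round(rng.normal(4.9, 1.3, n_per_group)), 1, 7),  # 'AI'
    np.clip(np.round(rng.normal(4.9, 1.3, n_per_group)), 1, 7),  # 'human + AI'
])
data = pd.DataFrame({
    "label": np.repeat(["human", "ai", "human_ai"], n_per_group),
    "reliability": ratings,
})

# One-way ANOVA: does the author label affect mean reliability ratings?
groups = [g["reliability"].to_numpy() for _, g in data.groupby("label")]
f_stat, p_anova = stats.f_oneway(*groups)
print(f"ANOVA: F = {f_stat:.2f}, p = {p_anova:.4f}")

# Pairwise t-tests with Holm-Bonferroni correction for multiple comparisons
pairs = [("human", "ai"), ("human", "human_ai"), ("ai", "human_ai")]
p_values = []
for a, b in pairs:
    _, p = stats.ttest_ind(
        data.loc[data.label == a, "reliability"],
        data.loc[data.label == b, "reliability"],
    )
    p_values.append(p)

reject, p_corrected, _, _ = multipletests(p_values, method="holm")
for (a, b), p_corr, rej in zip(pairs, p_corrected, reject):
    print(f"{a} vs {b}: corrected p = {p_corr:.4f}, significant = {rej}")
```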
Key Findings
Both studies consistently showed that 'human'-labeled advice received significantly higher ratings for empathy and reliability than 'AI'- and 'human + AI'-labeled advice, while comprehensibility ratings did not differ significantly across conditions. Study 2 further revealed that participants were significantly less willing to follow advice when AI involvement was indicated. Interestingly, the number of participants who saved the link to the fictional platform did not differ significantly across conditions. In Study 1, there were significant main effects of author label on empathy (F(2, 1047) = 7.98, P < 0.001) and reliability (F(2, 1047) = 9.68, P < 0.001), and post-hoc t-tests revealed significant differences between human-labeled advice and both AI- and human + AI-labeled advice on both measures. Study 2 mirrored these results, with significant differences in empathy, reliability, and willingness to follow the advice between the human condition and the AI and human + AI conditions. Effect sizes (Cohen's d) ranged from 0.21 to 0.31, indicating small to medium effects.
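For reference, Cohen's d for two independent groups is the difference in means divided by the pooled standard deviation. The snippet below computes it on made-up rating data; the means and spreads are illustrative assumptions chosen only to land near the small effect sizes reported above, not values from the study.

```python
# Illustrative Cohen's d for two independent groups (pooled SD); data are made up.
import numpy as np

def cohens_d(x, y):
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    return (np.mean(x) - np.mean(y)) / np.sqrt(pooled_var)

rng = np.random.default_rng(1)
human = rng.normal(5.2, 1.4, 400)  # e.g. reliability ratings under the 'human' label
ai = rng.normal(4.9, 1.4, 400)     # e.g. reliability ratings under the 'AI' label
print(f"Cohen's d = {cohens_d(human, ai):.2f}")  # roughly 0.2, a small effect
```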
Discussion
The findings suggest an anti-AI bias in the perception of digital medical advice, consistent with algorithm aversion and with the notion of 'dehumanizing' effects of AI. The absence of a difference in comprehensibility ratings suggests that the negative perception is not due to a lack of understanding but rather to lower trust and perceived empathy. The finding that interest in AI-based tools, as reflected in the willingness to save a link, did not differ across conditions implies a potential for acceptance if concerns about reliability and empathy are addressed. The persistence of the bias even under physician supervision suggests that merely adding a human element may not be sufficient to overcome these concerns.
Conclusion
This study reveals a significant bias against AI-generated medical advice, even when provided under human supervision. Addressing this bias requires a multi-faceted approach involving public education, transparent communication regarding AI's role, and careful consideration of the framing of AI-assisted healthcare. Future research could explore interventions to mitigate the bias, such as emphasizing AI's ability to personalize care or highlighting its potential to improve diagnostic accuracy and efficiency. Furthermore, investigating the long-term effects of AI-generated medical advice on patient behaviors and health outcomes is warranted.
Limitations
The study's limitations include the use of hypothetical scenarios rather than real-world interactions, the focus on single question-answer exchanges, and potential sampling biases related to the online recruitment method. Because participants adopted the perspective of another individual and did not formulate their own medical questions, the results may not fully generalize; the limited interaction setting likewise may not capture the nuances of real doctor-patient consultations.