Psychology
Out with AI, in with the psychiatrist: a preference for human-derived clinical decision support in depression care
M. M. Maslej, S. Kloiber, et al.
This study by Marta M. Maslej and colleagues examines how psychiatrists evaluate AI-based clinical support tools for major depressive disorder. Despite advances in AI, the findings reveal a consistent preference for human-generated summaries and recommendations, especially when the information is correct. The results illuminate the perceptions shaping the future of psychiatric care.
~3 min • Beginner • English
Introduction
The study investigates how psychiatrists perceive and respond to information from AI-based clinical support tools (CSTs) in the context of major depressive disorder (MDD). While AI-enabled CSTs promise to assist in reviewing clinical data, informing diagnosis, and recommending treatments, their real-world performance has been uneven and sometimes harmful, and improving algorithmic accuracy does not always translate to better clinical outcomes. Given anticipated clinician–AI collaboration, acceptance and trust are crucial. Prior work shows mixed perceptions of AI in healthcare, concerns about over-reliance, and evidence that incorrect AI advice can adversely influence clinical decisions. The research question asks whether psychiatrists evaluate CST information differently when they believe it is AI-derived versus human-derived, and whether such perceptions interact with the objective quality (correct vs incorrect) of the CST information. Secondary questions explore whether clinical experience and familiarity with AI moderate these effects.
Literature Review
The background literature highlights four strands:
- In psychiatry, AI-based CSTs aim to predict treatment response and support personalized care in MDD; transformer-based models are improving clinical note summarization.
- Clinician acceptance is pivotal for successful integration; perceptions of AI vary, and some argue that AI may struggle to outperform human judgment in psychiatry given the heterogeneity of MDD and the interpersonal nature of assessment.
- Experimental studies show that clinicians may follow incorrect AI advice; expertise and AI familiarity can modulate susceptibility, though findings are mixed.
- Qualitative work suggests openness to AI tools alongside concerns about over-reliance and trust.
These strands motivate empirical testing of how perceived source (AI vs psychiatrist) and information quality influence psychiatrists' evaluations of CST outputs.
Methodology
Design: Online experimental study with a mixed design. Between-subjects factor: perceived source of CST information (AI vs psychiatrist). Within-subjects factor: information quality (correct vs incorrect) across four clinical note trials for a single hypothetical patient with moderate MDD and social anxiety. Each trial included two CSTs embedded in a dashboard: a clinical note summary and a treatment recommendation.
Participants: 83 psychiatrists/residents (42 assigned to AI-source condition; 41 to psychiatrist-source condition) recruited primarily from a large Canadian psychiatric hospital and four additional institutions; inclusion required treating adult patients.
Materials and Measures: Participants read four full clinical notes and the corresponding CST outputs. They rated: (a) summary attributes (accuracy relative to the full note, usefulness, confidence in using it, inclusion of important information) and (b) recommendation attributes (agreement with the recommendation, confidence that it was right). Ratings used 5-point Likert-type scales (higher = more favorable), and the per-trial outcome for each CST type was the mean across its attribute ratings, as sketched below. AI familiarity was self-rated on a 1–5 scale after the experiment. Clinical expertise was captured by job title, years practicing (0–5, 6–10, 11–20, >20), and patients seen per week.
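A minimal sketch of that aggregation step in R (the analysis language reported below), assuming a hypothetical long-format data frame `items` with one row per rated attribute:

```r
library(dplyr)

# items: hypothetical columns id (participant), trial (1-4),
# cst ("summary"/"recommendation"), attribute, score (1-5 Likert)
ratings <- items |>
  group_by(id, trial, cst) |>
  summarise(rating = mean(score), .groups = "drop")
```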
Manipulation of information quality across trials: Trial 1: summary and recommendation both correct. Trial 2: summary correct; recommendation incorrect. Trial 3: summary incorrect (omitting important details such as medication side effects and psychosocial stressors); recommendation correct. Trial 4: summary and recommendation both incorrect. Across the four trials, each CST type thus appeared twice in correct form and twice in incorrect form. Correct summaries contained the most relevant details; correct recommendations aligned with clinical guidelines for MDD; incorrect versions contained irrelevant details or guideline-inconsistent advice.
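For reference, this quality manipulation can be encoded as a small design table in R; the sketch below uses hypothetical column names and reflects the trial structure described above:

```r
# Quality manipulation across the four trials (fixed order for all
# participants); two correct and two incorrect instances per CST
design <- data.frame(
  trial          = 1:4,
  summary        = c("correct", "correct", "incorrect", "incorrect"),
  recommendation = c("correct", "incorrect", "correct", "incorrect")
)
design
```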
Procedure: After consent and eligibility confirmation, participants received instructions on interpreting the dashboard and CSTs, then completed the four trials in a fixed order. The content of the CSTs was identical across source conditions; only the stated source (AI vs psychiatrist) varied, and participants were not made aware of the alternative source condition. AI familiarity was rated at the end.
Statistical analysis: For each participant and trial, mean ratings were computed across attributes for each CST type. Mixed-effects models (fit by maximum likelihood) assessed the effects of source, quality, and their interaction on mean summary ratings and mean recommendation ratings, with random intercepts for participants. Estimated marginal means compared conditions, and interactions were probed by stratifying on quality. An a priori power target of ≥70 participants provided 90% power to detect a medium-sized source effect. Secondary analyses tested interactions of information quality with expertise (years practicing) and with AI familiarity (treated as continuous; restricted to the AI-source subgroup). Exploratory analyses examined CST type by quality, source by resident status, and item-level attributes. Model diagnostics included residual checks. Analyses used R 4.1.1; data and code are available online.
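A minimal sketch of the primary analysis in R, assuming the hypothetical `ratings` data frame from above plus columns `source` (AI vs psychiatrist, between subjects) and `quality` (correct vs incorrect, within subjects); the lme4/lmerTest/emmeans toolchain shown here is one plausible choice, not necessarily the authors' exact code:

```r
library(lme4)
library(lmerTest)  # p-values for fixed effects
library(emmeans)

# Source x quality fixed effects, random intercept per participant
# (repeated trials), fit by maximum likelihood as described above
m_summary <- lmer(rating ~ source * quality + (1 | id),
                  data = subset(ratings, cst == "summary"),
                  REML = FALSE)
summary(m_summary)

# Estimated marginal means and condition contrasts
emmeans(m_summary, pairwise ~ source)
emmeans(m_summary, pairwise ~ quality)

# Probe an interaction by stratifying the source contrast on quality
emmeans(m_summary, pairwise ~ source | quality)
```

The same model form applies to mean recommendation ratings by substituting cst == "recommendation".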
Key Findings
Sample characteristics: N=83 (AI n=42; Psychiatrist n=41). Majority residents (58%); most regularly treated depression/anxiety; median patients with depression per week ~10. Among completers (n=74), AI familiarity mean ~2.03 (SD 1.03) on 1–5 scale.
Primary effects: Information quality strongly influenced ratings.
- Summaries: Correct M=3.69 (95% CI 3.45–3.97) vs incorrect M=2.95 (95% CI 2.80–3.09); b=0.646, SE=0.107, p<0.001.
- Recommendations: Correct M=3.37 (95% CI 3.21–3.52) vs incorrect M=1.80 (95% CI 1.64–1.96); b=−1.380, SE=0.149, p<0.001.
Source effects: Human (psychiatrist) source received higher ratings than AI.
- Summaries: Psychiatrist M=3.48 (95% CI 3.30–3.66) vs AI M=3.07 (95% CI 2.85–3.25); b=0.416, SE=0.14, p=0.005.
- Recommendations: Psychiatrist M=2.73 (95% CI 2.56–2.90) vs AI M=2.44 (95% CI 2.26–2.61); b=0.481, SE=0.158, p=0.003.
Interactions:
- Summaries: No source × quality interaction (b=−0.016, SE=0.150, p=0.914); the preference for psychiatrist-derived summaries held regardless of correctness.
- Recommendations: A trend toward a source × quality interaction; stratified analyses showed the source effect only for correct recommendations, with psychiatrist-labeled recommendations rated higher than AI-labeled ones (difference=−0.481, SE=0.159, p=0.003). For incorrect recommendations, the source difference was not significant (difference=−0.113, SE=0.161, p=0.486).
Clinical expertise: No consistent main effects or interactions with quality on recommendation ratings. For summary ratings, a quality × expertise interaction was driven by one small subgroup (11–20 years) showing no significant difference between correct and incorrect summaries; the other groups showed the expected differences (e.g., 6–10 years: difference=0.531, SE=0.218, p=0.016; >20 years: difference=0.515, SE=0.196, p=0.009). Overall, there was limited evidence that years practicing moderated the effects.
AI familiarity (AI-source subgroup): Higher familiarity was associated with slightly lower summary ratings (b≈−0.209, SE=0.097, p=0.036); there were no significant familiarity × quality interactions for summaries or recommendations.
Exploratory: Preference for psychiatrist-generated information was less pronounced for attributes requiring deeper comparison with the full note (accuracy, inclusion of important information) and absent for incorrect recommendations, suggesting potential heuristic vs analytical processing differences.
Discussion
Psychiatrists rated CST information more favorably when it was correct and when they believed it originated from another psychiatrist rather than AI, despite identical content across conditions. This indicates a general preference or bias toward human-derived CST information in depression care. The preference was robust for summaries irrespective of correctness but was observed for recommendations only when recommendations were correct; when recommendations were incorrect, source did not affect ratings, perhaps reflecting heightened scrutiny of potentially harmful advice. Item-level patterns suggest that tasks prompting deeper, analytical review (e.g., assessing summary accuracy against the full note) attenuated source-based preferences, whereas more intuitive judgments (e.g., agreement with correct recommendations) showed stronger human-source preference. Contrary to some prior studies, clinical expertise and AI familiarity did not meaningfully moderate quality effects, though greater AI familiarity corresponded to slightly more critical evaluations of AI summaries, possibly reflecting awareness of AI limitations. These findings imply that successful AI-CST integration must account for clinician biases favoring human input, encourage analytical engagement with AI outputs, and mitigate over- or under-reliance depending on context.
Conclusion
The study demonstrates that psychiatrists prefer human-derived CST information over AI-derived information, rating summaries and (when correct) recommendations from a presumed psychiatrist more favorably than identical outputs labeled as AI. Information quality had a strong, expected influence on ratings, and source-based preferences diminished when evaluation required deeper analysis or when recommendations were incorrect. Clinical expertise and AI familiarity showed limited moderating effects, aside from a small negative association between AI familiarity and ratings of AI summaries. Future research should examine clinician perceptions and behaviors in ecologically valid, simulated or real-world clinical settings, investigate mechanisms such as heuristic versus analytical processing, and test interventions (e.g., cognitive forcing strategies, timing or presentation changes) to promote critical engagement with AI-based CSTs and safe, effective integration into psychiatric care.
Limitations
- Ecological validity: Ratings were obtained in an online, controlled scenario with a hypothetical patient; generalizability to real-world clinical settings is uncertain.
- Behavioral outcomes: The study assessed perceptions, not actual clinical decisions or patient outcomes; impact on care remains unknown.
- Sample composition: Limited variability and small subgroup sizes (e.g., in years of practice) reduced power to detect moderation by expertise; AI familiarity variability was low.
- Fixed trial order and shared scenarios: The same trial sequence and quality manipulations for all participants may introduce order or learning effects.
- Measurement and reporting constraints: Minor inconsistencies appear in the reported statistics, and exploratory analyses were not adjusted for multiple comparisons, increasing the risk of Type I error.