Medicine and Health

Usability Comparison Among Healthy Participants of an Anthropomorphic Digital Human and a Text-Based Chatbot as a Responder to Questions on Mental Health: Randomized Controlled Trial

A. O. Thunström, H. K. Carlsen, et al.

Explore how emotive “digital humans” compare with a text-only chatbot for mental health support in BETSY, a randomized study finding the text-only interface had significantly higher perceived usability while EEG showed no physiological differences, and women reported more annoyance. Research conducted by Almira Osmanovic Thunström, Hanne Krage Carlsen, Lilas Ali, Tomas Larson, Andreas Hellström, and Steinn Steingrimsson.

Introduction

The study investigates whether a digital human interface with anthropomorphic features differs from a traditional text-only chatbot in usability and emotional response for mental health support. Conversational agents have a long history in psychiatry (eg, ELIZA, PARRY), and their use has increased substantially. Prior research suggests chatbots can alleviate mental health symptoms and offer accessible support. Digital humans (voice-controlled avatars with facial expressions) may influence user engagement and emotion, which can be measured through EEG (α and θ associated with relaxation/positive affect; β with stress/anxiety). Usability is commonly measured via the System Usability Scale (SUS-10). Some studies suggest text-based chatbots yield more positive interactions than anthropomorphic agents. This randomized controlled trial evaluates the usability and emotional impact of BETSY’s two interfaces in healthy volunteers.

Literature Review

The paper reviews evidence that rule-based and chatbot-based interventions can improve mental health outcomes (eg, panic disorder CBT support, reduced helplessness/social phobia). EEG has been used to assess emotional states and user experience during chatbot interactions, focusing on α and θ (relaxation/positive affect), β (stress/anxiety), and γ (positivity/problem-solving). Studies show interface design affects perceived usefulness, with anthropomorphic features potentially increasing emotional engagement but also the risk of annoyance when systems underperform. SUS-10 is widely used for usability across platforms, with thresholds for acceptable and excellent usability. Prior work indicates mixed preferences: users often value chatbots’ availability but prefer human therapists; text-based chatbots may elicit more positive interactions than digital humans.
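As a rough illustration of how a SUS-10 score is usually derived, the sketch below applies the conventional scoring rule: ten items rated 1 to 5, odd-numbered items contribute (rating − 1), even-numbered items contribute (5 − rating), and the sum is multiplied by 2.5 to give a 0–100 score. The summary does not state how the study scored its questionnaires, so this is an assumption, and the function name `sus_score` is hypothetical.

```python
def sus_score(ratings):
    """Conventional SUS scoring (assumed here, not confirmed by the study).

    ratings: ten integers from 1 to 5, in questionnaire order (item 1 first).
    Returns a score on the 0-100 scale.
    """
    if len(ratings) != 10:
        raise ValueError("SUS-10 needs exactly ten item ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("each rating must be between 1 and 5")

    total = 0
    for position, rating in enumerate(ratings, start=1):
        if position % 2 == 1:
            # Odd items are positively worded: higher rating is better.
            total += rating - 1
        else:
            # Even items are negatively worded: lower rating is better.
            total += 5 - rating
    return total * 2.5


# A respondent who answers neutrally (3) on every item.
neutral = sus_score([3] * 10)
# A respondent who gives the most favorable answer on every item.
best = sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1])
```

Because the score is a rescaled sum rather than a percentage, it is normally read against published benchmarks instead of being interpreted directly.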

Methodology

Design: Randomized controlled trial with healthy volunteers to compare usability and emotional responses between a digital human interface and a text-only chatbot (BETSY). Participatory design informed chatbot development via public surveys and multidisciplinary workshops.

Interventions: Two Swedish-language BETSY versions covering 24 mental health topics (eg, anxiety, depression, suicidality): (1) a digital human with voice interaction and facial expressions (Dialogflow + UNEEQ; infrastructure by Deloitte Digital and VGR-IT), and (2) a text-only chatbot (Itsalive.io) deployed to a closed Facebook R&D account. Alignment testing showed near-equivalent responses across versions, with one discrepancy noted.

Recruitment and Participants: Volunteers were recruited via Sahlgrenska University social media. Inclusion criteria: ≥18 years, no current mental health disorders, able to attend the Gothenburg facility. Of 50 volunteers, 45 consented and were screened; anyone scoring ≥14 on the GAD-7 was to be excluded (none were).

Randomization: Automated, double-blind allocation by an independent researcher to text-only (n=20) or digital human (n=25).

COVID-19 Procedures: Sessions were held June–November 2021 with protective measures; participants were isolated in sanitized rooms, with remote monitoring via a nonrecordable camera.

Prechat: Baseline measures included blood pressure (digital sphygmomanometer) and pulse (pulse oximeter) after a 5-minute rest, a questionnaire on demographics and prior chatbot/therapy experience, and the Visual Analogue Scale for Well-Being (VAS-W, 1–10).

EEG Setup: MUSE headband (Interaxon) with sensors at Fp1, Fp2, Tp9, Tp10, connected via Bluetooth to an Android phone using the Mind Monitor app (no user registration or identifying data). Calibration was done with eyes closed, followed by continuous EEG recording throughout a chat session of up to 30 minutes.

Postchat: SUS-10 administered to all; self-reported emotional states (relaxed, nervous, sad, annoyance, closeness). The digital human group additionally completed the SUISQ-MR (9 items, 7-point Likert). Open-ended feedback was collected (reported separately).

EEG Analysis: Absolute band powers (log-transformed power spectral density) for δ (1–4 Hz), θ (4–8 Hz), α (7.5–13 Hz), β (13–30 Hz), γ (30–44 Hz) summarized as average dB per session via Mind Monitor’s online tool.

Statistical Analysis: SPSS v28; chi-square for categorical variables; linear regression for continuous outcomes (SUS-10, SUISQ-MR, brain waves, positivity, GAD-7); normality assessed via skewness/kurtosis; t tests conducted as appropriate; significance level .05.

Ethics: Approved by Etikprövningsmyndigheten, Sweden (DRN 2021-02771); conducted per the Declaration of Helsinki; prototype tested only with healthy volunteers to avoid risk.
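The band definitions listed under EEG Analysis can be written down as a small lookup, which also makes one detail visible: as reported, the θ (4–8 Hz) and α (7.5–13 Hz) ranges overlap between 7.5 and 8 Hz. The sketch below is only an illustration under stated assumptions. The half-open range convention, the per-sample dictionary layout, and the names `bands_for` and `session_means` are hypothetical and do not describe Mind Monitor's actual export format or processing.

```python
# Band edges in Hz, copied from the EEG Analysis description.
BANDS_HZ = {
    "delta": (1.0, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (7.5, 13.0),
    "beta": (13.0, 30.0),
    "gamma": (30.0, 44.0),
}


def bands_for(freq_hz):
    """Return every band whose range [low, high) contains freq_hz.

    Treating ranges as half-open is an assumption made for this sketch.
    """
    return [name for name, (low, high) in BANDS_HZ.items() if low <= freq_hz < high]


def session_means(samples):
    """Average each band's power over one session.

    samples: list of dicts mapping band name to a power value, where a
    missing key or None stands for a dropped reading (eg, movement).
    Returns a dict of band name to mean, or None if no readings exist.
    """
    means = {}
    for band in BANDS_HZ:
        values = [s[band] for s in samples if s.get(band) is not None]
        means[band] = sum(values) / len(values) if values else None
    return means


overlap = bands_for(7.8)  # falls inside both theta and alpha
example = session_means([
    {"alpha": 1.0, "theta": None},
    {"alpha": 3.0, "theta": 2.0},
])
```

Averaging over a whole session, as the study did, discards timing information but is less sensitive to brief signal dropouts than a sample-by-sample comparison.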

Key Findings

- Sample and Demographics: No significant differences between groups in sex, marital status, education, occupation, or housing. Age ranged 24–68 years (only 12 reported age; excluded from advanced analyses). No exclusions due to high GAD-7 scores.
- Usability (SUS-10): Text-only mean 75.34 (SD 10.01; range 57–90) vs digital human mean 64.80 (SD 14.14; range 40–90); significant difference favoring text-only (P=.01). Both interfaces rated average or above average usability.
- Voice Usability (SUISQ-MR; digital human group): Mean 4.92 (SD 0.83; range 2.83–6.75), indicating very good voice interaction usability per prior benchmarks.
- Self-Reported Emotions: Digital human users more likely to report nervousness (Yes/sometimes 26.1% vs 0%; P=.02). Annoyance rates similar across interfaces (text-only 47.4% vs digital human 43.5%; P=.80). Relaxation reported by majority in both groups (text-only 89.5% vs digital human 73.9%; P=.23). Closeness to BETSY reported by 35% (text-only) and 45.8% (digital human) (P=.46). Sadness reported only in digital human group (13.0%; P=.10).
- Biometrics: No significant differences in presession or postsession pulse; blood pressure differences not significant (data not shown). VAS-W presession: 8.8 (SD 1.32) text-only vs 8.4 (SD 1.41) digital human (P=.33); postsession: 8.8 (SD 1.23) vs 8.3 (SD 1.27) (P=.14).
- EEG: Data quality suboptimal due to movement; averages computed. Only average α wave activity differed significantly: higher in text-only group (97±27) vs digital human (82±24) (P=.03). δ and θ showed trends (P=.06 and P=.08), β and γ not significant.
- Regression: In text-only group, SUS-10 positively associated with average α (β=0.196, SE=0.083, P=.03) and θ (β=0.212, SE=0.10, P=.05). SUS-10 positively associated with SUISQ-MR in digital human group (β=8.10, SE=2.976, P=.01). Other brain bands not significantly associated.
- Gender Differences: Men less likely to report annoyance than women across both interfaces (text-only P=.03; voice P=.03). Prechat positivity higher in men (mean 8.16±1.50) vs women (6.81±2.30), reported P=.34 (not significant). No gender differences in closeness, relaxation, nervousness, or sadness.
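The SUS-10 comparison reported above can be roughly checked from the summary statistics alone. The sketch below computes Welch's t statistic, the Welch–Satterthwaite degrees of freedom, and a two-sided p value by numerically integrating the t density. It makes two assumptions that the summary does not confirm: that every allocated participant (n=20 and n=25) completed the SUS-10, and that an unequal-variance t test is a fair stand-in for the regression the authors ran in SPSS. All function names are hypothetical.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1 = sd1 ** 2 / n1
    v2 = sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=2000):
    """Two-sided p value using Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = 2 * total * h / 3  # probability mass between -|t| and |t|
    return 1.0 - central


# Summary statistics from Key Findings; group sizes are the allocated n (assumed).
t_stat, dof = welch_t(75.34, 10.01, 20, 64.80, 14.14, 25)
p_value = two_sided_p(t_stat, dof)
```

Under these assumptions the difference of about 10.5 SUS points corresponds to a t statistic near 2.9 and a two-sided p value a little under .01, which is in line with the reported P=.01. The check says nothing about the EEG or self-report comparisons, where the number of respondents per group appears to differ from the allocated n.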

Discussion

The text-only chatbot interface was perceived as more user-friendly than the digital human interface among healthy volunteers, though both met average or better usability benchmarks. The digital human’s voice interaction (SUISQ-MR) scored in the very good range, suggesting promise for voice-based designs despite the lower overall SUS-10. Emotional responses indicated more nervousness with the digital human and higher α activity in the text-only group, which is consistent with more relaxed states; α activity also correlated positively with usability in the text-only condition. Gender analyses revealed that men reported less annoyance postchat, contrary to some prior studies on reactions to anthropomorphic interfaces. Feelings of closeness did not differ by interface, suggesting anthropomorphic features did not enhance perceived connection in this context. The findings support offering users a choice of interaction modality (text vs voice/anthropomorphic) to accommodate preferences and potentially optimize usability and comfort.

Conclusion

The text-only chatbot was perceived as more user-friendly (higher SUS-10) than the digital human, although both interfaces achieved average or above-average usability relative to prior mental health chatbot studies. Pulse, blood pressure, and well-being ratings did not differ significantly between groups, and α/θ activity correlated with higher usability in the text-only group. Men reported less annoyance than women; their prechat positivity was also numerically higher, but that difference was not statistically significant (P=.34). SUISQ-MR indicated strong usability of the voice interface. Overall, mental health chatbots show promise across interfaces; providing multiple interaction options may improve user experience.

Limitations

- Healthy volunteer sample; findings may not generalize to individuals experiencing active mental health symptoms, which can affect usability perceptions and cognitive performance.
- Small sample size limits statistical power and generalizability; age data incomplete.
- EEG data quality was suboptimal due to movement and signal sensitivity; conclusions from biometrics are limited.
- Conversations not recorded/analyzed; study focused on usability rather than content effects.
- Early-stage prototypes without access to modern large language models or advanced avatar/voice technologies (eg, GPT APIs, Metahuman), potentially limiting conversational variability and anthropomorphic quality.
- COVID-19-era testing conditions may have influenced participant experience.
