Usability Comparison Among Healthy Participants of an Anthropomorphic Digital Human and a Text-Based Chatbot as a Responder to Questions on Mental Health: Randomized Controlled Trial

A. O. Thunström, H. K. Carlsen, et al.

This study compared two versions of BETSY, a mental health chatbot: a digital human with anthropomorphic features and a text-only chatbot. The text-only interface was rated significantly more user-friendly, EEG measures showed few differences between groups, and women reported more annoyance than men. The research was conducted by Almira Osmanovic Thunström, Hanne Krage Carlsen, Lilas Ali, Tomas Larson, Andreas Hellström, and Steinn Steingrimsson.

Introduction
Conversational user interfaces (chatbots) have been used since the 1960s (e.g., ELIZA, PARRY), and their use has grown rapidly with modern NLP systems. Prior research suggests chatbots can alleviate mental health symptoms and complement traditional care, including during and after the COVID-19 pandemic. A newer direction involves voice-controlled visual avatars ("digital humans") with anthropomorphic features designed to enhance emotional engagement. Emotional responses are often assessed objectively using EEG (e.g., increased alpha/theta activity linked to positive/relaxed states; beta linked to anxiety/stress), while subjective usability is commonly measured with the System Usability Scale (SUS-10). Evidence on interface effects is mixed: some findings suggest text-based chatbots may elicit more positive interactions than digital humans, and design flaws can reduce perceived usefulness. This study investigated whether a digital human versus a text-only chatbot interface differs in perceived usability among healthy participants and explored effects on self-reported feelings and biometrics during mental health–focused conversations with BETSY. The aim was to compare usability between interfaces and assess emotional impact (self-report and EEG), closeness, and related outcomes.
Literature Review
Prior work on chatbots in mental health shows promise for symptom alleviation and well-being improvements, including randomized comparisons to self-help books and other digital supports. Studies report that availability and convenience drive perceived usefulness, though many users prefer human therapists and see chatbots as complementary tools. Anthropomorphic interfaces can influence emotional states and social engagement. EEG has been used to capture user experience and emotional reactions (alpha/theta: positive/relaxed; beta: anxiety/stress; gamma: positivity/problem-solving). Usability of chatbots is frequently evaluated with SUS-10; scores ≥68 are considered passing, and ≥85 excellent. Prior studies report varied SUS-10 outcomes for chatbots, with design flaws (repetitiveness, incoherence, limited understanding) often lowering scores. Some findings suggest text interfaces can yield more positive user experiences than anthropomorphic avatars, highlighting the importance of interface design and user expectations.
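The SUS-10 thresholds above (≥68 passing, ≥85 excellent) refer to the standard 0–100 scaled score. As a concrete illustration, the conventional scoring rule (odd items contribute rating − 1, even items 5 − rating; the raw sum of 0–40 is multiplied by 2.5) can be sketched as follows. This is a generic illustration of the published scoring scheme, not code from the study:

```python
def sus_score(responses):
    """Compute the System Usability Scale score (0-100).

    responses: list of 10 Likert ratings (1-5); odd-numbered items are
    positively worded, even-numbered items negatively worded.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    total = 0
    for i, r in enumerate(responses, start=1):
        if i % 2 == 1:
            total += r - 1   # odd (positive) items: rating - 1
        else:
            total += 5 - r   # even (negative) items: 5 - rating
    return total * 2.5       # scale the 0-40 raw sum to 0-100

# Strong agreement with positive items, strong disagreement with
# negative items yields the maximum score of 100.
print(sus_score([5, 1] * 5))  # 100.0
```

A neutral response pattern (all 3s) lands at exactly 50, which is why the ≥68 "passing" threshold sits above the scale midpoint.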
Methodology
Design: Randomized controlled trial with healthy volunteers comparing two BETSY interfaces: (1) a voice-activated digital human with anthropomorphic features and (2) a text-only chatbot.

Participatory development: A multidisciplinary team (psychiatrists, nurses, psychologists, a healthcare user, and an engineer) conducted public surveys (via Psytoolkit.org) and four workshops (June–December 2020) to iteratively define design, content, and personality. Two Swedish-language versions were created covering 24 mental health topics (e.g., anxiety, depression, stress, sleep, addiction, eating disorders, hopelessness, loneliness, sadness, suicidality). Conversation logic for the digital human used Dialogflow connected to the UNEEQ platform; infrastructure was hosted by Deloitte Digital and VGR-IT. The text-only version used Itsalive.io, deployed to a closed R&D Facebook account. No personal metadata were collected.

Participants and recruitment: Recruitment was via Sahlgrenska University social media. Inclusion criteria: age ≥18, no current mental health disorder, and ability to attend onsite testing in Gothenburg, Sweden. Exclusion criterion: GAD-7 ≥14. Of 50 volunteers, 45 consented and were randomized (text-only n=20; digital human n=25). Testing occurred June–November 2021 under COVID-19 precautions.

Randomization and blinding: Double-blind allocation was overseen by an independent researcher via automated randomization.

Prechat procedure: Participants were greeted under infection-control protocols. Baseline measurements after a 5-minute rest included blood pressure (digital sphygmomanometer) and pulse (pulse oximeter). Participants wore a MUSE dry-sensor EEG headband (7 sensors: 3 frontal references; active sensors at Fp1, Fp2, Tp9, and Tp10) connected to an Android smartphone running Mind Monitor. A questionnaire collected demographics, prior chatbot/therapy experience, and presession well-being (VAS-W 1–10). The tester provided topic guidance and monitored remotely via a nonrecordable camera; each session lasted up to 30 minutes.

EEG recording: Calibration was performed with eyes closed, followed by continuous recording during the chatbot session. Data were stored anonymously as CSV files on-device. Absolute band powers were derived from the power spectral density and transformed to a 0–100 scale for display; session averages (dB) were computed via Mind Monitor tools. Bands analyzed: delta (1–4 Hz), theta (4–8 Hz), alpha (7.5–13 Hz), beta (13–30 Hz), and gamma (30–44 Hz).

Postchat measures: Participants completed the SUS-10 (scaled 0–100). Emotional state questions assessed relaxation, nervousness, sadness, annoyance, and closeness to BETSY. The digital human group also completed the SUISQ-MR (9 items, 7-point Likert scale; higher scores indicate better usability). An open-ended feedback questionnaire was collected (qualitative results reported elsewhere). Postsession biometrics were recorded.

Statistical analysis: Data were analyzed in SPSS v28. Normality was checked via skewness and kurtosis; t tests and linear regression were used for continuous outcomes (SUS-10, SUISQ-MR, brain waves, positivity, GAD-7). Pearson chi-square tested categorical variables; the significance threshold was p=.05. Group-wise analyses were conducted.

Ethics: The study was approved by the Swedish Ethical Review Authority (Etikprövningsmyndigheten; DRN 2021-02771) and conducted per the Declaration of Helsinki. To avoid potential harm from a prototype-stage system, patients and individuals with severe anxiety were not included.
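The "absolute band power from the power spectral density" step can be sketched as below. This is an illustrative reconstruction using a Welch PSD, not the Mind Monitor implementation; the sampling rate, window length, and synthetic signal are assumptions for demonstration:

```python
import numpy as np
from scipy.signal import welch
from scipy.integrate import trapezoid

# Frequency bands (Hz) as listed in the text
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (7.5, 13),
         "beta": (13, 30), "gamma": (30, 44)}

def band_powers_db(signal, fs):
    """Absolute band power (dB) per band from the Welch PSD of one channel."""
    freqs, psd = welch(signal, fs=fs, nperseg=fs * 2)  # 2-second windows
    out = {}
    for name, (lo, hi) in BANDS.items():
        mask = (freqs >= lo) & (freqs <= hi)
        power = trapezoid(psd[mask], freqs[mask])  # integrate PSD over band
        out[name] = 10 * np.log10(power + 1e-20)   # dB; epsilon guards log(0)
    return out

# Synthetic 10 Hz (alpha-range) sine sampled at an assumed 256 Hz
fs = 256
t = np.arange(0, 10, 1 / fs)
powers = band_powers_db(np.sin(2 * np.pi * 10 * t), fs)
print(max(powers, key=powers.get))  # alpha dominates for a 10 Hz tone
```

For the 10 Hz test tone, the alpha band (7.5–13 Hz) captures essentially all the signal power, mirroring how a relaxed-state alpha rhythm would register in the session averages.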
Key Findings
- Participants: 45 healthy volunteers were randomized (text-only n=20; digital human n=25). No significant demographic differences; no exclusions due to high GAD-7 scores.
- Usability (SUS-10): text-only mean 75.34 (SD 10.01; range 57–90) vs digital human mean 64.80 (SD 14.14; range 40–90); the difference was significant (p=.01). Both interfaces were rated average or above; the text-only interface approached the "good–excellent" range.
- Voice interface usability (SUISQ-MR): digital human mean 4.92 (SD 0.83; range 2.83–6.75), indicating very good usability for voice interaction.
- Self-reported emotions: the digital human group was more likely to report nervousness (yes/sometimes 26.1% vs 0%; p=.02). Relaxation, sadness, annoyance, and closeness did not differ significantly between groups.
- Positivity toward talking to BETSY: similar between groups (text mean 7.1 [SD 2.1] vs voice 7.5 [SD 2.1]; p=.69).
- Biometrics: no significant pre/post differences in pulse. EEG data quality was suboptimal but sufficient for session averages. Only alpha power differed significantly between groups, being higher in the text-only group (mean 97 vs 82; p=.03). Delta (p=.06) and theta (p=.08) showed trends favoring text-only; beta and gamma did not differ.
- Regression with SUS-10 (text-only group): positive associations with average theta (β=0.212, SE 0.10; p=.05) and alpha (β=0.196, SE 0.083; p=.03); trends with delta (p=.07) and beta (p=.10); gamma not associated. In the voice group, SUISQ-MR was positively associated with SUS-10 (β=8.10, SE 2.98; p=.01); no significant EEG–SUS associations.
- Gender differences: men reported less annoyance than women with both interfaces (chat p=.03; voice p=.03). Men reported higher prechat positivity on average (men 8.16 [SD 1.50] vs women 6.81 [SD 2.30]; reported p=.34). No gender differences in closeness or relaxation.
- GAD-7: low in both groups (text 2.32 [SD 2.52] vs voice 2.80 [SD 2.60]; p=.56).
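As an illustration of the group comparison behind the headline SUS-10 result, the sketch below runs a Welch (unequal-variance) two-sample t test on simulated samples drawn with the reported group means, SDs, and sizes. The simulated data are an assumption for demonstration, not the study's data, so the exact t and p values will differ from the published p=.01:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated SUS-10 samples matching the reported summary statistics
text_only     = rng.normal(75.34, 10.01, 20)   # text-only group, n=20
digital_human = rng.normal(64.80, 14.14, 25)   # digital human group, n=25

# Welch's t test (equal_var=False) does not assume equal group variances
t_stat, p_val = stats.ttest_ind(text_only, digital_human, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_val:.3f}")
```

With a mean difference of about 10.5 points against a standard error of roughly 3.6, a significant result is the expected outcome for samples of this size.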
Discussion
The study addressed whether interface modality (digital human vs text-only) influences usability and user affect during mental health–oriented chatbot interactions in healthy participants. Text-only BETSY achieved significantly higher usability (SUS-10), suggesting that a simpler, text-based interface may reduce cognitive load or expectation mismatch, yielding smoother interactions. Despite expectations that anthropomorphic features might enhance emotional engagement, users of the digital human reported more nervousness, and no advantages were observed in closeness or positivity. EEG results showed higher alpha activity in the text-only group and positive associations between alpha/theta and SUS scores in that group, consistent with more relaxed or less aroused states aligning with better usability experiences. However, EEG data quality constraints and small sample size limit strong inferences. Gender analyses revealed that men reported less annoyance than women across interfaces, contrary to some prior findings about reactions to female avatars. This could reflect expectation management or sample-specific factors. Overall, the results suggest interface choice can shape perceived usability and emotional responses, with text-only providing a more user-friendly experience in this context. For deployment in mental health support, offering users a choice of interface may optimize engagement and satisfaction.
Conclusion
The text-only BETSY interface was perceived as more user-friendly than the digital human interface, with both scoring average or above average usability. No substantial differences were found in biometrics aside from higher alpha activity in the text-only group. Men reported less annoyance than women across interfaces. Findings indicate promise for mental health chatbots regardless of interface, but text-only may currently offer a usability advantage. Future research should include larger and more diverse samples, participants with mild to moderate anxiety, enhanced EEG data quality, and evaluation of newer technologies (e.g., large language models, improved avatar/voice generation) that may reduce repetitiveness and improve conversational quality. Allowing users to select their preferred interaction mode (text or voice/avatar) may optimize user experience.
Limitations
- Sample: healthy volunteers; results may not generalize to individuals in acute distress or to clinical populations. The small sample size limits statistical power and generalizability.
- EEG data quality: suboptimal due to movement and signal sensitivity; interruptions affected signal continuity, limiting interpretation of the biometric findings.
- Age data were incomplete (only 12 participants reported age), restricting analyses and limiting representation across the age range.
- Prototype constraints: the systems were developed before widespread availability of large language models and advanced avatar/voice technologies, likely affecting conversational variability and anthropomorphic fidelity.
- Content not analyzed: conversations were not retained; the focus was on usability rather than therapeutic content effects.
- The COVID-19 context and remote monitoring protocols may have influenced user experience.
- Excluding participants with severe anxiety limits insight into usability under higher emotional load.