An automatic speech analytics program for digital assessment of stress burden and psychosocial health

A. M. Y. Chu, B. S. Y. Lam, et al.

This paper presents an automatic speech analytics program (ASAP) for assessing stress burden and psychosocial health among family caregivers. Researchers Amanda M. Y. Chu, Benson S. Y. Lam, Jenny T. Y. Tsang, Agnes Tiwari, Helina Yuk, Jacky N. L. Chan, and Mike K. P. So report 72% accuracy in distinguishing low from high stress burden. The authors present the technology as a quicker, more cost-effective route to timely healthcare referrals.

Introduction
The paper addresses the rising global burden of psychosocial health issues, intensified by COVID-19, and focuses on family caregivers who are especially vulnerable to stress-related problems. It highlights gaps in access to psychosocial care and the limitations of conventional, time-consuming, and potentially subjective assessments. The study proposes early detection and intervention as public health priorities and examines whether an automatic speech analytics program (ASAP) can identify caregivers’ stress burden (low vs high) by analyzing linguistic content from responses to non-sensitive, family-resilience-related questions. The purpose is to evaluate ASAP’s feasibility, efficiency, and accuracy for initial psychosocial assessment in a real-world Cantonese-speaking caregiver population.
Literature Review
Recent NLP research has used traditional and deep learning methods to detect mental health issues (e.g., depression, suicide risk, stress, anorexia) primarily from social media data (Twitter, Facebook, Reddit). These approaches may exclude populations less active online (e.g., older adults) and often function as black boxes, offering limited interpretability for clinicians. The paper notes the importance of explainability and inclusivity, especially for non-dominant languages like Cantonese where common tools (e.g., LIWC) are not applicable. Prior work also supports using linguistic features such as word frequency patterns as indicators of disorders (e.g., depression, anxiety, schizophrenia), motivating a speech-based, interpretable topic-modeling approach.
Methodology
System design (ASAP): ASAP has five components:
(1) Automated Speech Recognition (ASR) using Google Cloud Speech-to-Text; audio recordings were standardized (e.g., to WAV), punctuation and whitespace were removed from the Cantonese transcripts, and Cantonese-specific cleansing was applied.
(2) Text Pre-processing (TP) with PyCantonese for word segmentation and stop-word removal, needed because Cantonese text has no whitespace between words and contains many multi-character words; symbols, numbers, and stopwords were removed.
(3) TF-IDF calculation to weight terms and filter non-discriminative words via two rules: a word is kept only if its total TF-IDF weight is ≥ 0.5 and the ratio (max TF-IDF weight / total TF-IDF weight) is ≥ 0.1, which eliminates both rare and overly common words.
(4) Text Analytics (TA) via a topic ensemble. Phase 1 used Non-negative Matrix Factorization (NMF) to discover 10 topics per run across 70 random seeds (70×10=700 topics). Phase 2 used hierarchical clustering (complete linkage) as a consensus function to integrate ensemble members and form two clusters corresponding to stress levels.
(5) Visual Analysis (VA) with symmetric diagrams and multidimensional scaling to visualize clustering and list salient keywords per topic.
Study design and data: 100 Cantonese-speaking family caregivers were recruited from a Hong Kong nonprofit (HKSKH Lady MacLehose Center). Social workers screened for caregiver stress burden, verified using the validated Chinese Caregiver Burden Inventory (CBI; 24 items, 5 subscales; Cronbach’s alpha 0.95). A total score ≤36 indicated low stress; >36 indicated high stress. Final sample: 53 low-stress and 47 high-stress caregivers. Ethics approval: HKUST Human and Artefacts Research Ethics Committee (HREP-2021-0213).
Interview protocol: 12 open-ended questions based on Walsh’s family resilience framework (belief systems, organizational patterns, communication patterns) were used to elicit non-sensitive narratives about family and support. Speech responses were recorded and processed through ASAP.
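The two TF-IDF filtering rules in component (3) can be illustrated with a minimal pure-Python sketch. The summary does not state which TF-IDF variant the authors used, so the formula here (term frequency normalized by document length, times log(N/df)) is an assumption, as are the function names and the toy English-gloss tokens standing in for segmented Cantonese words. Only the two thresholds (0.5 and 0.1) come from the text.

```python
import math
from collections import Counter

def tfidf_weights(docs):
    """Return one {word: tf-idf} dict per document (each doc is a token list).
    Assumed variant: tf = count / doc length, idf = log(N / document frequency)."""
    n_docs = len(docs)
    doc_freq = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            word: (count / len(doc)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights

def filter_vocabulary(weights, min_total=0.5, min_peak_ratio=0.1):
    """Keep words whose total weight >= min_total and whose
    (max weight / total weight) >= min_peak_ratio."""
    total = Counter()
    peak = {}
    for doc_weights in weights:
        for word, w in doc_weights.items():
            total[word] += w
            peak[word] = max(peak.get(word, 0.0), w)
    return {
        word for word in total
        if total[word] > 0
        and total[word] >= min_total
        and peak[word] / total[word] >= min_peak_ratio
    }

# Toy corpus (hypothetical): "home" is in every document, "travel" and
# "phone" each appear once, "thanks" and "husband" recur in two documents.
docs = [
    ["thanks", "travel", "home"],
    ["thanks", "thanks", "home"],
    ["husband", "phone", "home"],
    ["husband", "husband", "home"],
]
kept = filter_vocabulary(tfidf_weights(docs))
print(sorted(kept))  # ['husband', 'thanks']
```

In this toy run, "home" drops out because its weight is zero in every document, and "travel" and "phone" fall below the total-weight threshold. The ratio rule targets a different case: a word whose weight is spread thinly over many documents has a small max/total ratio even when its total is large.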
Key Findings
- Unsupervised clustering distinguished stress levels: Cluster A contained 38/53 low-stress caregivers; Cluster B contained 34/47 high-stress caregivers. Overall accuracy = 72% ((38+34)/100).
- Visualization (MDS) showed separation between low-stress/Cluster A and high-stress/Cluster B.
- Performance comparison (10-fold CV for supervised baselines), where TP is the true positive rate (low-stress), TN is the true negative rate (high-stress), and Acc is accuracy:
  • Proposed ASAP (unsupervised): TP 71.70%; TN 72.34%; Acc 72% (balanced performance).
  • SVM (Linear): TP 100.00%; TN 2.00%; Acc 54%.
  • SVM (RBF): TP 79.71%; TN 41.08%; Acc 61%.
  • SVM (Sigmoid): TP 60.86%; TN 54.76%; Acc 59%.
  • Deep Learning (Word Embedding): TP 100.00%; TN 0.00%; Acc 53%.
  • Deep Learning (RNN): TP 71.24%; TN 43.19%; Acc 56%.
- Feature selection sensitivity: Selecting the top 60% of words by mutual information yielded the highest accuracy (75%); using all words resulted in 72%, suggesting some interview-content words may be non-informative.
- Topic/keyword differences: Low-stress caregivers exhibited more diverse and positive/relaxation-related topics and gratitude terms (e.g., 多謝 “thank you”, 開開心心 “happy”, 旅行 “travel”, 行街 “wandering”), whereas high-stress caregivers focused more on family-member-related terms and coping/logistics (e.g., 先生 “husband”, 爸爸 “father”, 電話 “phone”, 約出 “appointment”, 解決 “resolve”). More salient topics were identified in the low-stress group (eight) than the high-stress group (three).
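The per-class rates and overall accuracy reported above follow directly from the cluster counts. The short sketch below reproduces that arithmetic. The function names are illustrative, and the SVM line is only a back-of-envelope check from the reported rates (cross-validation averaging in the original study may differ slightly).

```python
def class_rates(low_in_a, n_low, high_in_b, n_high):
    """Per-class rates and overall accuracy from cluster membership counts."""
    tpr = low_in_a / n_low        # low-stress caregivers placed in Cluster A
    tnr = high_in_b / n_high      # high-stress caregivers placed in Cluster B
    acc = (low_in_a + high_in_b) / (n_low + n_high)
    return tpr, tnr, acc

def accuracy_from_rates(tpr, tnr, n_low, n_high):
    """Overall accuracy implied by per-class rates and class sizes."""
    return (tpr * n_low + tnr * n_high) / (n_low + n_high)

N_LOW, N_HIGH = 53, 47  # class sizes from the study sample

tpr, tnr, acc = class_rates(38, N_LOW, 34, N_HIGH)
print(f"{tpr:.2%} {tnr:.2%} {acc:.0%}")  # 71.70% 72.34% 72%

# A classifier that labels nearly everyone low-stress still scores above 50%
# because low-stress is the majority class (53 of 100).
svm_linear_acc = accuracy_from_rates(1.00, 0.02, N_LOW, N_HIGH)
print(f"{svm_linear_acc:.0%}")  # 54%
```

This is why the balanced TP/TN pair matters more than accuracy alone: the linear SVM and word-embedding baselines reach 53-54% accuracy while identifying almost no high-stress caregivers.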
Discussion
The findings suggest that analyzing Cantonese speech content with a topic-ensemble approach can separate caregivers by stress burden using only non-sensitive questions, which helps address the biases from stigma and subjectivity found in traditional assessments. The balanced true positive and true negative rates indicate robustness across classes, and ASAP outperforms common supervised baselines, likely because of noise reduction and its topic-focused representation. The approach improves efficiency (shorter interviews, automated processing) and demonstrates feasibility for scalable, initial psychosocial screening. Differences in topic content between groups align with psychological expectations (low stress: gratitude, leisure; high stress: family-related concerns), reinforcing construct validity. The method’s reliance on TF-IDF rules and ensemble modeling provides interpretability and applicability to non-dominant languages with limited NLP tooling.
Conclusion
The study introduces ASAP, an automated speech analytics pipeline that can provide accurate (72%), efficient, and cost-effective initial assessments of caregiver stress burden from Cantonese speech. It demonstrates practical feasibility for digital psychosocial screening, potential scalability to broader mental health conditions, and adaptability to languages with fewer NLP resources. Future work should: (1) refine modeling to further improve accuracy; (2) optimize the number and content of interview questions to enhance efficiency; (3) extend auto-detection to additional psychosocial conditions (e.g., acute panic, anxiety, depression, social phobia, communication disorders); and (4) explore integration with large language models (e.g., ChatGPT/GPT-4) to augment detection and interpretability.
Limitations
- Outcome definition used a strict CBI cut-off (≤36 low vs >36 high), which may affect classification accuracy; further evidence is needed to validate this thresholding.
- Translation of digital health tools into routine healthcare remains limited; ecosystem factors (human support, organizational processes) can hinder adoption despite demonstrated research utility.
- Technological access barriers may limit digital health reach; although ASAP can operate via ordinary phone calls or in-person visits, broader access inequities persist.
- Deep learning baselines underperformed, likely due to small training sample size (90 documents in cross-validation folds), indicating data scale constraints for certain methods.