Language or rating scales based classifications of emotions: computational analysis of language and alexithymia

Psychology

S. Sikström, M. Nicolai, et al.

Discover research by Sverker Sikström and collaborators showing how language-based responses outperform rating scales in classifying emotional states, even amid the challenges posed by alexithymia. The findings highlight the nuance that emotional narratives can convey.

Introduction
The study addresses whether computational analysis of language-based responses provides higher validity and accuracy than traditional rating scales in classifying emotional states, and how alexithymia affects such assessments. Standardized rating scales (e.g., DSM-5-based instruments) are efficient and widely used for assessing major depressive disorder (MDD) and generalized anxiety disorder (GAD), but they may reduce complex mental states to unidimensional responses and face psychometric challenges. Language is the natural medium for communicating mental states and may capture nuance better. Alexithymia (roughly 10% prevalence) involves difficulty recognizing and describing emotions and has known links to depression and anxiety. Competing theories (Multiple Code Theory, the Referential Process) suggest potential deficits in translating emotional experience into language among high-alexithymia individuals, which could reduce the accuracy of language-based assessments. Prior research shows promising correlations between language-based computational measures (e.g., LSA) and rating scales, and improvements in certain classification tasks, but performance on real participant-generated narratives and the role of alexithymia remained unclear. Hypotheses:
- H1: Question-based computational language assessment (QCLA) classifies emotional narratives more accurately than rating scales.
- H2: Narratives written by high-alexithymia individuals are harder to classify than those written by low-alexithymia individuals.
- H3: Evaluators with high alexithymia show lower accuracy in word- or rating-based classification than evaluators with low alexithymia.
Literature Review
Rating scales (e.g., PHQ-9, GAD-7) are standard for assessing MDD and GAD but may inadequately capture the complexity and individual variability of emotional states. Life satisfaction and harmony are important positive constructs, inversely related to depression and anxiety, and may enhance assessment when measured alongside them. Alexithymia is linked to deficits in cognitive-emotional processing and language expression, potentially reducing emotional vocabulary and nuance, although basic emotion labeling may remain intact. Multiple Code Theory and the Referential Process describe difficulties translating subsymbolic emotional experience into symbolic language in alexithymia. Prior computational language studies show that semantic measures (e.g., LSA-based) can differentiate psychological constructs and sometimes outperform rating scales on specific tasks, such as classifying emotions in facial expressions. However, generalizability to participant-generated narratives, and the impact of alexithymia on both the generation and the evaluation of emotional language, had not been established.
Methodology
Design: Two-phase online study with separate samples. In Phase 1, participants wrote a narrative of about five sentences on one of four emotional states (harmony, satisfaction, depression, anxiety), listed five words describing that state, and completed rating scales (HILS, SWLS, GAD-7, PHQ-9) plus the PAQ for alexithymia. In Phase 2, a new sample each read one Phase 1 narrative, generated five descriptive words summarizing the author's emotion, and completed the same rating scales on the author's behalf.

Participants: Inclusion criteria were adult US residents (age ≥18) with English as a first language. Exclusion criteria: no consent, incomplete survey, failed control questions, or not following instructions (e.g., responses too short or off-topic). Phase 1: 150 recruited, 34 excluded, N=116 analyzed (78 female, 31 male, 5 nonbinary, 2 prefer not to say; age 20–73, M=39.7, SD=13.9). Phase 2: 250 recruited, 18 excluded, N=232 analyzed (150 female, 73 male, 7 nonbinary, 2 prefer not to say; age 18–79, M=32.66, SD=11.15). An additional related dataset (N=348), collected with the same procedures but without alexithymia measures, was added to the training data (see Data analysis).

Materials: PHQ-9 (Patient Health Questionnaire; 9 items, scored 0–3), GAD-7 (Generalized Anxiety Disorder scale; 7 items, 0–3), SWLS (Satisfaction with Life Scale; 5 items, 1–7), HILS (Harmony in Life Scale; 5 items, 1–7), and PAQ (Perth Alexithymia Questionnaire; 24 items, 1–7). Narrative instructions asked for an autobiographical text from the last two months on the assigned state, avoiding the target word itself; five descriptive words were also requested. The semantic questions for the constructs had been validated previously.

Procedure: Phase 1 (about 20 min): narrative, five words, scales (HILS, SWLS, GAD-7, PHQ-9, PAQ), demographics. Phase 2 (about 12 min): read one narrative, provide five descriptive words, complete the scales for the author's state. Recruitment was via Prolific (compensation £2 for Phase 1, £1.25 for Phase 2); the study ran in English via Qualtrics.

Data analysis: The descriptive words were quantified with Latent Semantic Analysis (LSA). The corpus comprised 69,167 word tokens (6,630 unique) from 7,088 responses to similar five-word tasks. A word-by-context co-occurrence matrix was constructed (context = response), normalized with log(1+x), and decomposed by SVD, retaining the first 300 dimensions; word vectors were normalized to unit length. Each five-word response was represented by summing its word vectors and renormalizing. Classification into harmony, satisfaction, depression, and anxiety used multinomial logistic regression with three models: (a) words only (semantic vectors), (b) rating scales only (total scores of PHQ-9, GAD-7, SWLS, and HILS, plus an item-level analysis of all 26 items), and (c) words combined with rating scales. Evaluation used 10-fold nested cross-validation (10% held out per fold), with folds grouped by narrative so that all responses to a given narrative fell entirely within either the training or the test set, preventing leakage. The number of semantic dimensions was tuned within the training folds (mean selected 33.8, SD 6.5). Training data combined Phase 1 and Phase 2 of the current study with the additional dataset described above, for a reported training size of N=732; accuracy was evaluated only on Phase 2 of the current study. Additional analyses included cross-validated multiple linear regression predicting the empirical rating-scale scores from the words. A BERT-based model was also tested but did not outperform LSA for these context-free descriptive words. Word clouds were generated from the regression coefficients of individual words. Sketches of the embedding and classification steps follow.
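As a concrete illustration of the embedding step, here is a minimal Python sketch of the described LSA pipeline (co-occurrence matrix with each response as a context, log(1+x) normalization, truncated SVD, unit-length word vectors, and summed-and-renormalized five-word representations). The toy responses and variable names are placeholders, not the study's data or code, and SciPy/scikit-learn are assumed stand-ins for whatever tooling the authors used:

```python
import numpy as np
from scipy.sparse import lil_matrix
from sklearn.decomposition import TruncatedSVD

# Toy stand-in for the corpus of 7,088 five-word responses.
responses = [
    ["calm", "peaceful", "content", "balanced", "serene"],
    ["sad", "tired", "hopeless", "empty", "numb"],
    ["worried", "tense", "restless", "fearful", "nervous"],
    ["happy", "grateful", "fulfilled", "content", "proud"],
]

vocab = sorted({w for r in responses for w in r})
index = {w: i for i, w in enumerate(vocab)}

# Word-by-context co-occurrence matrix; each five-word response is one context.
counts = lil_matrix((len(vocab), len(responses)))
for j, resp in enumerate(responses):
    for w in resp:
        counts[index[w], j] += 1

# log(1 + x) normalization, then SVD. The paper retains 300 dimensions;
# the toy corpus here only supports a handful.
X = counts.tocsr()
X.data = np.log1p(X.data)
k = min(300, min(X.shape) - 1)
svd = TruncatedSVD(n_components=k, random_state=0)
word_vectors = svd.fit_transform(X)  # one k-dimensional vector per word
word_vectors /= np.linalg.norm(word_vectors, axis=1, keepdims=True)

def embed(five_words):
    """Represent a five-word response: sum its word vectors, renormalize."""
    v = sum(word_vectors[index[w]] for w in five_words if w in index)
    return v / np.linalg.norm(v)

print(embed(["calm", "content", "happy", "peaceful", "proud"]).shape)
```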
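The classification and prediction steps can be sketched the same way. The snippet below uses scikit-learn's GroupKFold and LogisticRegression as assumed stand-ins for the paper's multinomial logistic regression with narrative-grouped folds; the data are simulated, and the inner loop that tunes the number of semantic dimensions is omitted for brevity:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import GroupKFold, cross_val_predict

# Simulated stand-ins: in the study, `X` would hold the five-word semantic
# vectors, `y` the four emotion labels, and `groups` the narrative IDs.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 34))          # ~34 semantic dimensions, as tuned
y = rng.integers(0, 4, size=200)        # harmony/satisfaction/depression/anxiety
groups = rng.integers(0, 50, size=200)  # several evaluations per narrative

# Folds grouped by narrative: every response to a narrative lands entirely
# in either the training or the test split, preventing leakage.
accuracies = []
for train, test in GroupKFold(n_splits=10).split(X, y, groups):
    clf = LogisticRegression(max_iter=1000)  # multinomial for four classes
    clf.fit(X[train], y[train])
    accuracies.append(clf.score(X[test], y[test]))
print(f"mean accuracy: {np.mean(accuracies):.2f} (chance = 0.25)")

# Cross-validated linear regression predicting a rating-scale total
# (e.g., PHQ-9) from the same semantic vectors, echoing the paper's
# prediction analysis (simulated target here).
phq9 = rng.normal(size=200)  # stand-in for empirical PHQ-9 totals
pred = cross_val_predict(LinearRegression(), X, phq9,
                         groups=groups, cv=GroupKFold(n_splits=10))
r = np.corrcoef(phq9, pred)[0, 1]  # compare with the reported r values
```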
Key Findings
- Overall classification accuracy: word-based classification reached 62% correct versus 33% for rating scales (chance = 25%). An item-level rating-scale model (all 26 items) reached 30%, similar to the total-score model. Combining words and rating scales yielded 58.6%, no significant improvement over words alone.
- Statistical significance: the 62% word-based accuracy was significant (χ²(1, 231)=19.48, p<0.001, φ=0.29; see the consistency check after this list).
- By emotion (Phase 2; Tables 2 and 4):
• Words (semantic): sensitivity 0.67 (harmony), 0.42 (satisfaction), 0.75 (depression), 0.71 (anxiety); accuracies 0.85, 0.77, 0.77, 0.85; F1 scores 0.63, 0.54, 0.61, 0.71.
• Rating scales: sensitivity 0.23 (harmony), 0.20 (satisfaction), 0.91 (depression), 0.00 (anxiety); accuracies 0.73, 0.58, 0.61, 0.74; F1 scores 0.24, 0.24, 0.53, and undefined for anxiety (precision 0). Confusion matrices showed fewer misclassifications for the word measures than for the rating scales.
- Test–retest and prediction (Table 5): Phase 1–Phase 2 correlations for words exceeded those for rating scales on all constructs: PHQ-9 r=0.77 (words) vs 0.47 (scales); GAD-7 0.48 vs 0.19; SWLS 0.69 vs 0.17; HILS 0.77 vs 0.48 (all p<0.001). Correlations between word-predicted and empirical ratings: PHQ-9 r=0.62; GAD-7 r=0.33; SWLS r=0.67; HILS r=0.47 (all p<0.001).
- Alexithymia and narrative generation (Phase 1 PAQ, median split at ≤68 vs >68): Phase 2 evaluations were more accurate for narratives from low-PAQ authors than from high-PAQ authors: words 68% vs 55% correct, t(226)=2.03, p=0.04; rating scales 39% vs 26%, t(226)=2.10, p=0.04.
- Alexithymia and evaluators (Phase 2 PAQ): no significant differences in classification accuracy between low- and high-PAQ evaluators for either words or rating scales.
- Mean levels by PAQ (Table 6): rating-scale means differed between PAQ groups (e.g., SWLS higher in the low-PAQ group, 18.31 vs 15.88, p=0.0024; HILS 13.50 vs 15.77, p=0.0153; GAD-7 15.79 vs 12.74, p=0.0011; PHQ-9 8.38 vs 11.24, p=0.0001), whereas the corresponding word-based estimates did not differ significantly.
- Variability (Table 7): some rating-scale standard deviations differed by PAQ (e.g., larger SDs for HILS and GAD-7 in the low-PAQ group), but the SDs of word-based estimates showed no significant differences.
- A BERT-based model did not outperform the LSA approach for these context-free descriptive-word inputs.
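The reported effect size follows the standard conversion for a 1-df chi-square, φ = √(χ²/N). A quick arithmetic check (assuming the N given in the test statistic, 231):

```python
import math

chi2, n = 19.48, 231        # reported chi-square and N from χ²(1, 231)
phi = math.sqrt(chi2 / n)   # phi = sqrt(chi^2 / N) for a 1-df test
print(round(phi, 2))        # -> 0.29, matching the reported effect size
```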
Discussion
Findings support that computational analysis of language-based responses can classify emotional states more accurately than traditional rating scales across depression, anxiety, satisfaction, and harmony. Language-based measures also showed stronger test–retest correlations, suggesting greater stability. The advantage likely stems from language's capacity to capture nuanced, multidimensional aspects of affect that unidimensional rating items may miss. Despite concerns that alexithymia impairs emotional expression, evaluator alexithymia did not reduce classification accuracy for word- or rating-based assessments, indicating that QCLA is robust to alexithymia among evaluators. However, narratives authored by individuals with high alexithymia were harder to classify, consistent with theories (Multiple Code Theory, the Referential Process) positing difficulty translating emotional experience into rich linguistic expression. Clinically, language-based assessments may enhance person-centered evaluation and allow visualization of constructs (e.g., word clouds), complementing rating scales. Although transformer models can capture contextual language, LSA sufficed here and performed on par with or better than BERT given the context-free word inputs. Overall, the results suggest that language-based computational tools can improve assessment accuracy and reliability and remain applicable when evaluators have alexithymia, though narratives from high-alexithymia authors may require additional support or data for accurate classification.
Conclusion
The study demonstrates that computational analysis of language-based responses outperforms rating scales in classifying narratives of depression, anxiety, satisfaction, and harmony, with higher accuracy and stronger test–retest correlations. Evaluator alexithymia does not affect accuracy, but narratives authored by high-alexithymia individuals are more difficult to classify. These results support integrating language-based computational assessments into mental health evaluation as a complement to traditional scales. Future research should test these methods in clinical settings against DSM-5 diagnostic standards, involve clinicians to establish inter-rater reliability and clinical validity, explore additional constructs (e.g., Big Five, personality disorders), and refine models to capture symptoms less likely to appear in spontaneous text. Language-based methods may aid both assessment and intervention (e.g., expressive writing) and serve as effective triage tools.
Limitations
Key limitations include: (1) non-clinical online sample limits generalizability to clinical populations; (2) alexithymia measured by self-report (PAQ), which may be biased; (3) online administration limits control over participant adherence despite attention checks; (4) Phase 2 texts were not rated by multiple evaluators per text for inter-rater reliability estimation and clinicians did not participate, limiting assessment of reliability and clinical validity; (5) spontaneous free-text may omit important symptoms (e.g., anhedonia), so language assessments should currently complement rather than replace clinician-administered rating scales.