A computational text analysis investigation of the relation between personal and linguistic agency



A. Simchon, B. Hadar, et al.

This research, conducted by Almog Simchon, Britt Hadar, and Michael Gilead, uses computational text analysis to examine how personal agency relates to linguistic agency. Recalled experiences of power, social media activity, and participation in support forums all point to a link between personal agency and linguistic expression.

Introduction
The paper investigates whether and how individuals’ personal sense of agency is reflected in their linguistic choices, particularly the use of passive voice as a marker of non-agentive language. Building on philosophical and psycholinguistic claims that language structure can influence and reflect perceived agency, the authors ask whether reduced personal agency (e.g., lower power, lower social rank, depression-related experiences) corresponds to more non-agentive language. The work is motivated by prior research showing that linguistic framing affects observers’ attributions of agency and blame, and by qualitative observations connecting non-agentive language with diminished control. What this literature lacks is large-scale quantitative evidence linking speakers’ own psychological agency to their linguistic agency, and the study aims to provide such evidence across experimental and ecological contexts.
Literature Review
The authors review theory and evidence relating linguistic agency (e.g., active vs. passive constructions) to perceptions of control and blame. Orwell’s writings and subsequent psycholinguistic studies suggest passive voice can diminish perceived agency. Prior work shows agentive framing increases attributions of blame and memory for agents, and other linguistic and paralinguistic cues signal power and agency. Qualitative analyses reported more passive language in narratives of chronic pain and during difficult therapy periods. Quantitative findings include reduced agentive language among individuals with OCD and increased physiological arousal when describing trauma in passive voice. Anthropological evidence from a Western Samoan village shows agentive language correlates with social status. The authors note gaps: limited quantitative, large-scale analyses directly linking individuals’ personal agency to their own linguistic agency, and call for broader, systematic approaches.
Methodology
Across studies, non-agentive language was operationalized as passive voice use, extracted via spaCy (accessed through the spacyr R package) by counting passive auxiliary verbs, including be-passives and got-passives. Validation involved independent human coding (100 texts per platform), correlating human-coded non-agentive instances with spaCy passive counts (Study 1: r=0.70; Study 2: r=0.54; Study 3: r=0.57; all p<0.001). Self-referential language was measured with the LIWC2015 I-pronoun dictionary.
Contextualized Construct Representation (CCR) using SBERT all-MiniLM-L6-v2 embeddings (via the text R package; 384-d vectors) quantified construct loadings for constrained sense of control (constrained items only), internal–external locus of control (anchored continuum from 23 non-filler items), and depression (CESD). Negatively keyed items were rephrased with explicit negation; item embeddings were averaged per construct.
Statistical models primarily used negative binomial generalized linear models (GLMs), with covariates such as text length and source. Model assumptions were checked via posterior predictive checks and visualization.
Study-specific details:
Study 1 (experimental re-analysis): Data from the Kasprzyk & Calin-Jageman replication of a power manipulation were re-analyzed. Participants from MTurk and Prolific were assigned to recall events of high power (power over others) or low power (others had power over them). After exclusions, combined N=835. Passive voice count was predicted by condition (high vs. low power), self-referential language, word count, and text source (MTurk/Prolific). Additional models examined self-referential language as the outcome.
Study 2 (Twitter social rank): 26,473,715 original English tweets (US; Apr–Jun 2019) were collected and cleaned (links, tags, and emoticons removed). Passive voice was extracted as in Study 1. Data were aggregated to the user level, yielding 2,726,733 users with average passive use, average self-referential language, average tweet length, and average follower counts (rounded). A negative binomial GLM predicted follower count from average passive voice, self-referential language, and tweet length, including their interaction. CCR analyses were conducted on a subsample of 100,000 tweets (81,606 unique users after dependency control) using robust negative binomial regression with bootstrapping (1000 iterations; 1000 users per iteration).
Study 3 (Reddit depression forums): Study 3a collected 10,000 r/depression posts and 100 posts from each of 100 randomly selected subreddits (Nov 2019–Jul 2020) via the Pushshift API. After preprocessing (removing links, emoticons, removed/deleted posts, empty texts, and users with multiple or cross-condition posts), final N=8,690 (5,703 depression). A negative binomial GLM predicted passive counts by group (depression vs. control), controlling for word count; self-referential language and interactions were examined. Because the passive counts were zero-inflated, a count model was used in place of the preregistered linear model. Study 3b, a preregistered replication, collected older posts (Nov 2016–Jul 2019), original N≈20,000, final N=9,685 (6,325 depression). Study 3c (replication with support controls) used a curated list of support-oriented but non-psychological subreddits (initially 94 via a ChatGPT-4 prompt; ultimately 70 support and 79 control subreddits) with data from June 2019 via BigQuery. After preprocessing, 8,255 unique posts were sampled per condition (depression, support, control), total N=24,765. CCR embeddings were extracted in Python using Sentence Transformers.
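The passive-voice measure described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline (they used spaCy through the spacyr R package); it assumes that spaCy's English models mark passive auxiliaries with the dependency label "auxpass", and the helper names count_passive_aux and passive_count_spacy are hypothetical. The core function works on a plain list of dependency labels, so it runs without spaCy installed.

```python
from typing import Iterable

# Assumption: spaCy's English pipelines tag passive auxiliaries
# (e.g. "was" in "was written") with the dependency label "auxpass".
PASSIVE_AUX_LABEL = "auxpass"


def count_passive_aux(dep_labels: Iterable[str]) -> int:
    """Count passive auxiliaries in a sequence of dependency labels."""
    return sum(1 for label in dep_labels if label == PASSIVE_AUX_LABEL)


def passive_count_spacy(text: str, nlp) -> int:
    """Count passive auxiliaries in raw text.

    `nlp` is assumed to be a loaded spaCy pipeline, for example
    spacy.load("en_core_web_sm"); it is passed in so that this module
    has no hard dependency on spaCy.
    """
    return count_passive_aux(token.dep_ for token in nlp(text))


# Hand-written labels for "The report was written by the committee."
example_labels = [
    "det", "nsubjpass", "auxpass", "ROOT", "agent", "det", "pobj", "punct",
]
print(count_passive_aux(example_labels))  # 1
```

A per-text count like this would then serve as the outcome (Studies 1 and 3) or, averaged per user, as a predictor (Study 2) in the count models.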
Key Findings
Study 1 (N=835):
- Manipulation checks via CCR on textual responses showed expected differences between high vs. low power conditions: constrained sense of control higher in low power (t(828.62)=-2.28, p=0.023, d=-0.16), locus of control shifted (t(831.96)=6.18, p<0.001, d=0.43), and depression higher in low power (t(827.91)=-6.90, p<0.001, d=-0.48).
- Main outcome: Low power was associated with a 65% increase in passive voice use, IRR=1.65, p<0.001, 95% CI [1.35, 2.02]. No credible effects of text source (IRR=0.94, p=0.498) or self-referential language (IRR=0.96, p=0.082).
- Modeling self-referential language as outcome showed a 29% increase in the low power condition, IRR=1.29, p<0.001, 95% CI [1.22, 1.36].
- Validation: Human coding of non-agentive language correlated with spaCy passives, r=0.70, p<0.001.
Study 2 (Twitter; 2,726,733 users):
- Each additional passive auxiliary verb (user-level average) predicted a 46% decrease in followers, IRR=0.54, p<0.001, 95% CI [0.53, 0.55].
- Self-referential language negatively associated with followers, IRR=0.67, p<0.001, 95% CI [0.67, 0.68].
- Interaction: self-referential language moderated the passive–followers link, IRR=1.23, p<0.001, 95% CI [1.21, 1.25].
- CCR robustness checks on a subsample (81,606 users) yielded coefficients in the hypothesized directions, with 95% CIs including 1: constrained sense of control average IRR=0.33 (95% CI [0.04, 1.21]; 90% CI [0.06, 0.90]); locus of control average IRR=3.61 (95% CI [0.18, 18.89]); depression average IRR=0.31 (95% CI [0.04, 1.22]; 90% CI [0.05, 0.86]).
- Validation: Human vs. spaCy correlation r=0.54, p<0.001.
Study 3 (Reddit):
- Study 3a (N=8,690): Depression forum associated with a 26% increase in passive voice use vs. controls, IRR=1.26, p<0.001, 95% CI [1.18, 1.35]; no main effect of self-referential language on passives (IRR=1.00, p=0.592) but a significant interaction (IRR=0.99, p<0.001). Self-referential language was more than doubled in depression vs. control, IRR=2.54, p<0.001, 95% CI [2.45, 2.62]. CCR loadings showed large differences in constrained control (d≈2.74), locus of control (d≈-0.66), and depression (d≈2.48), all p<0.001.
- Study 3b (replication; N=9,685): Depression forum associated with a 42% increase in passive voice, IRR=1.42, p<0.001, 95% CI [1.32, 1.52]; no main effect or interaction for self-referential language on passives. Self-referential language again doubled, IRR=2.56, p<0.001. CCR results replicated with large effects (e.g., constrained control d≈2.77; depression d≈2.64).
- Study 3c (support controls; N=24,765): Depression forum showed a 16% increase in passive voice vs. control, IRR=1.16, p<0.001, 95% CI [1.11, 1.22]; support groups showed a 10% increase vs. control, IRR=1.10, p<0.001. Planned comparison indicated higher passives in depression vs. support, IRR=1.05, p<0.001. Self-referential language: depression IRR=2.06, p<0.001; support IRR=1.07, p<0.001; depression > support IRR=1.93, p<0.001. CCR ANOVAs showed large group differences (e.g., constrained control η²G≈0.594; depression η²G≈0.533; all p<0.001).
- Validation: Human vs. spaCy correlation r=0.57, p<0.001.
Overall: Across experimental and large-scale observational contexts, reduced personal agency (low power; lower social rank; depression forum participation) corresponded to more non-agentive (passive) language and greater self-referential language, linking linguistic markers to psychological and social indicators of agency.
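The percentages quoted in these findings follow from the incidence rate ratios (IRRs) by simple arithmetic: an IRR of 1.65 is a 65% increase and an IRR of 0.54 is a 46% decrease. The Python sketch below shows that conversion on values taken from the summary. The function names are hypothetical, and the ratio check at the end rests on an assumption not stated in the summary: that the depression and support IRRs in Study 3c come from one model with the same control reference group, in which case their ratio should roughly match the reported direct contrast.

```python
def irr_to_percent_change(irr: float) -> float:
    """Convert an incidence rate ratio to a percent change in the expected count."""
    return (irr - 1.0) * 100.0


def irr_ratio(irr_a: float, irr_b: float) -> float:
    """Ratio of two IRRs that share a reference group (a vs. b contrast)."""
    return irr_a / irr_b


# IRRs as reported in the summary above.
reported = {
    "Study 1, low power -> passives": 1.65,
    "Study 2, passives -> followers": 0.54,
    "Study 3a, depression -> passives": 1.26,
    "Study 3b, depression -> passives": 1.42,
    "Study 3c, depression -> passives": 1.16,
}

for label, irr in reported.items():
    print(f"{label}: IRR={irr:.2f} -> {irr_to_percent_change(irr):+.0f}%")

# Consistency check for Study 3c (assumes a shared control reference group):
# depression vs. support should be close to the reported 1.05 and 1.93.
print(round(irr_ratio(1.16, 1.10), 2))  # passives
print(round(irr_ratio(2.06, 1.07), 2))  # self-referential language
```

Because the reported IRRs are rounded to two decimals, agreement with the published contrasts can only be expected to about the second decimal place.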
Discussion
The findings demonstrate that individuals’ linguistic agency reflects their personal agency. An experimental manipulation of power causally increased non-agentive (passive) language when participants recalled low-agency experiences. In large-scale naturalistic data, higher passive voice use predicted lower social rank (fewer followers) on Twitter, and posts in depression-related communities—characterized by diminished control—used more passive voice than controls, including support-oriented but non-psychological forums. These converging results indicate that subtle linguistic choices mirror psychological states related to control and influence, with implications for social perception, mental health monitoring, and understanding how language use relates to status and well-being. Exploratory analyses showed that lower personal agency also coincides with increased self-referential language, aligning with known links between I-language and depression and extending them to other low-agency contexts. Collectively, the studies provide quantitative support for the long-posed connection between personal and linguistic agency.
Conclusion
This work offers comprehensive quantitative evidence that personal agency and linguistic agency are interrelated. Across an experimental manipulation and ecological analyses of Twitter and Reddit, lower personal agency—operationalized as reduced power, lower social rank, or participation in a depression forum—was associated with greater use of passive voice and increased self-referential language. These results suggest linguistic markers can index meaningful psychological and social states. The authors highlight future directions, including cross-cultural investigations of how linguistic structures and cultural beliefs about control interact, and whether languages that emphasize agentivity foster more agentic beliefs and behaviors. The approach may also inform interventions that assess or alter linguistic framing to influence perceptions of agency.
Limitations
Studies 2 and 3 are correlational, limiting causal inference about whether linguistic agency affects social rank or vice versa. It is unclear whether observed associations reflect stable traits or transient states (e.g., temporary empowerment via online audiences; episodic depression). In Study 3, depression status is inferred from participation in r/depression without clinical diagnoses, making findings more circumstantial. The predictors (sense of power, follower counts, depression) share agency-related variance but may include other contributing factors (e.g., content quality, social status, broader symptomatology, situational influences) that affect language independently of agency. The research examines English in a single cultural-linguistic context, so linguistic markers of agency and their relations may differ across languages and cultures.