Psychology

Using language in social media posts to study the network dynamics of depression longitudinally

S. W. Kelley and C. M. Gillan

This study by Sean W. Kelley and Claire M. Gillan analyzes Twitter data to test whether the connectivity of depression-related language features reflects changes in mental health, both across individuals and within individuals during depressive episodes.


Introduction

The study investigates predictions from the network theory of mental illness, which proposes that mental disorders like depression emerge from causal interactions among symptoms, forming positive feedback loops that sustain episodes. Prior work has suggested that higher symptom network connectivity may reflect vulnerability to sudden state transitions and persistence of depression. However, findings have been inconsistent, partly due to reliance on cross-sectional, between-subject analyses that may not capture within-person causal dynamics. The authors aim to test key predictions longitudinally within individuals: whether networks relevant to depression exhibit greater connectivity in people with higher current depression severity and whether connectivity increases during depressive episodes. To overcome the burden of intensive ecological momentary assessment, the study leverages archived Twitter language as a proxy for depression-relevant features, constructing personalized networks from linguistic markers previously linked to depression and comparing connectivity within versus outside self-reported depressive episodes over the past year.

Literature Review

Prior studies show that individuals with depression often have more strongly connected symptom networks than controls, and higher baseline connectivity can predict persistence and poorer outcomes in depression and other disorders. Some studies suggest increasing connectivity preceding episodes. Yet, other work—especially in adolescents or post-treatment contexts—has failed to replicate these effects or even found increases in connectivity after treatment. A proposed reason is the over-reliance on cross-sectional, group-level networks, which can diverge from personalized, within-subject networks. Two notable n=1 longitudinal studies (in depression and psychosis) qualitatively suggested connectivity rises during acute phases, but formal, larger-sample tests are lacking. Linguistic markers of depression—such as increased use of first-person singular pronouns, negative emotion words, negations, and swear words, and decreased positive emotion words—have been observed across speech, writing, and social media. These language features vary over time and may reflect underlying mental health changes, making them candidates for nodes in personalized networks to test network-theory predictions.

Methodology

Design and participants: A total of 1,713 participants were recruited (1,395 via Clickworker, paid €2.5; 318 unpaid via public ads). Inclusion required age ≥18, ≥30 days of tweets, ≥50% of tweets in English, and passing an attention check. After exclusions (attention-check failures and insufficient or low-English tweets), 946 participants were analyzed (mean age 29.6 years; 65.2% female; primarily from the U.K. and U.S.). Of these, 59.0% reported at least one depressive episode in the past year (mean 1.56 episodes; mean duration 104.1 days), and 45.7% reported physician-diagnosed depression.

Procedure and measures: Participants provided demographics and their Twitter handle (used to retrieve up to the 3,200 most recent tweets and 3,200 likes) and completed a depression measure (the CES-D 8 for the first 263 participants and the Zung SDS for the remainder; scores were standardized by mean and SD to form a composite measure of current depression severity). They retrospectively reported the dates of up to five depressive episodes in the prior year, defined as ≥2 weeks of near-daily low mood and loss of interest. Episodes shorter than 2 weeks were recoded as not depressed, and episodes separated by less than 2 weeks were merged.

Twitter text preprocessing: Analyses were restricted to tweets posted in the 12 months before the survey. Reply symbols (@), hashtags (#), emojis, links/URLs, punctuation, and non-alphanumeric characters were removed (., !, and ? were retained). Tweets were aggregated into daily bins per user, and each day's text was analyzed using LIWC 2015 (~6,400 words; 90 output variables). Days without tweets were omitted.

Feature specification: Nine LIWC features were selected a priori based on the prior depression literature (dictionary size in parentheses): 1st person singular pronouns (24 words), 1st person plural pronouns (12), 2nd person pronouns (30), 3rd person pronouns (28), negative emotions (744), positive emotions (620), swear words (131), articles (3), and negations (62). As an initial validity check, each feature was averaged over the year and correlated with current depression severity.
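
The preprocessing and daily binning described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the regular expressions, the function names `clean_tweet` and `daily_bins`, and the choice to strip only the @ and # symbols (keeping the word that follows) are assumptions, since the summary does not give the exact cleaning rules. LIWC scoring itself is not shown.

```python
import re
from collections import defaultdict
from datetime import datetime

# Hypothetical patterns; the study's exact cleaning rules are not stated here.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
SYMBOL_RE = re.compile(r"[@#]")                   # assumption: drop the symbol, keep the word
NON_TEXT_RE = re.compile(r"[^A-Za-z0-9\s.!?]")    # drops emojis and other punctuation
SPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Strip URLs, @/# symbols, emojis and punctuation, keeping '.', '!' and '?'."""
    text = URL_RE.sub(" ", text)
    text = SYMBOL_RE.sub("", text)
    text = NON_TEXT_RE.sub("", text)
    return SPACE_RE.sub(" ", text).strip()


def daily_bins(tweets):
    """Group cleaned tweets by calendar day.

    tweets: iterable of (iso_timestamp, text) pairs.
    Returns {date: concatenated text}; days without tweets simply do not appear.
    """
    bins = defaultdict(list)
    for timestamp, text in tweets:
        cleaned = clean_tweet(text)
        if cleaned:
            bins[datetime.fromisoformat(timestamp).date()].append(cleaned)
    return {day: " ".join(parts) for day, parts in sorted(bins.items())}
```

Each daily string would then be passed to a dictionary-based scorer such as LIWC to obtain the per-day feature proportions used as network nodes.
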
Outliers more than 3 SD from the group mean of each averaged feature were removed (~1.1% of values).

Network construction: For each participant, time series for the nine features were computed as daily proportions. Personalized networks were estimated using graphicalVAR (v0.2.4) with LASSO regularization, focusing on contemporaneous partial correlations (controlling for lag-1 effects and all other nodes). The hyperparameter gamma was set to 0 (favoring denser networks), with nLambda=10. Connectivity was quantified by node strength (the sum of absolute partial correlations incident to a node) and global network strength (the mean node strength). Individual node strengths more than 3 SD from that node's group mean were excluded (~2.5% omitted).

Analyses:

- Cross-sectional association (N=946): Tested associations between current depression severity and (a) averaged LIWC features, (b) global network strength, and (c) individual node strengths. Edge reliability was assessed via split-half correlations. Distributional properties were evaluated, and analyses were adjusted for number of days as needed.
- Within-subject episode analysis (N=286 with episodes): Two networks were constructed per participant, one within-episode and one outside-episode, requiring ≥15 tweet-days in each period. Global and node strengths were compared using within-subject regressions with episode (1=within, 0=outside) as the predictor. Unequal numbers of days were addressed via covariate adjustment, permutation tests (label shuffling within-subject, 1,000 iterations), Wilcoxon signed-rank tests, and bootstrapping (80% resamples, 1,000 times).
- Generalizability: Of 87 LIWC features, those significantly associated with depression severity (uncorrected p<0.05; ~59%) were labeled "depression-relevant" and the rest "irrelevant." From each list, 1,000 sets of 9 features were randomly sampled (2,000 networks in total), and within- and outside-episode personalized networks were estimated per participant. For each network, the within-subject episode effect on global strength was tested, and the resulting betas were compared across relevant and irrelevant sets via general linear regression.
- Stability: bootnet (v1.5) was used to compute correlation stability (CS) coefficients for strength centrality via case-dropping bootstraps (up to 75% of cases dropped; 1,000 resamples). Mean CS was reported for all, within-episode, and outside-episode networks.

Statistical software: R 3.6.1, with glm/lmer for regressions, qgraph for visualization, and graphicalVAR for network estimation. Control analyses examined omission of the third-person node, exclusion of supra-categories, and number-of-days effects.
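
To make the two connectivity measures concrete, here is a minimal sketch of node strength and global network strength as defined above. It assumes the contemporaneous partial correlations are already available as a symmetric square matrix (for example, exported from graphicalVAR); the function names and the toy 3-node matrix are illustrative, not taken from the study.

```python
def node_strength(pcor):
    """Sum of absolute partial correlations incident to each node.

    pcor: square, symmetric matrix (list of lists); the diagonal is ignored.
    """
    n = len(pcor)
    return [
        sum(abs(pcor[i][j]) for j in range(n) if j != i)
        for i in range(n)
    ]


def global_strength(pcor):
    """Mean node strength across all nodes in the network."""
    strengths = node_strength(pcor)
    return sum(strengths) / len(strengths)


# Toy 3-node network (made-up values, for illustration only).
toy_network = [
    [0.0, 0.2, -0.1],
    [0.2, 0.0, 0.3],
    [-0.1, 0.3, 0.0],
]
print(node_strength(toy_network))
print(global_strength(toy_network))
```

Because absolute values are summed, a strong negative edge contributes to connectivity just as much as a strong positive one.
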

Key Findings

Sample and behavior: Of 946 participants, 558 (59.0%) reported at least one depressive episode in the past year. Those with an episode tweeted and liked more than those without (Tweets β=84.7, SE=38.4, p=0.03; Likes β=193.8, SE=70.0, p=0.006). Paid Clickworker recruits engaged less on Twitter than unpaid participants.

Associations of averaged LIWC features with current depression severity (N=946; Table 1):

- Negative emotions: β=0.14, SE=0.03, p<0.001 (positive association)
- 1st person singular: β=0.17, SE=0.03, p<0.001 (positive)
- 2nd person: β=0.08, SE=0.03, p=0.02 (positive)
- Swear: β=0.11, SE=0.03, p<0.001 (positive)
- Negations: β=0.12, SE=0.03, p<0.001 (positive)
- 1st person plural: β=−0.11, SE=0.03, p<0.001 (negative)
- Articles: β=−0.11, SE=0.03, p<0.001 (negative)
- Positive emotions: β=−0.07, SE=0.03, p=0.03 (negative)
- 3rd person: β=0.06, SE=0.03, p=0.06 (not significant)

Eight of nine features were significantly associated with current depression severity (all but 3rd person).

Personalized network connectivity and depression severity:

- Global network strength was positively associated with depression severity: β=0.008, SE=0.003, p=0.002.
- Node strengths that increased with severity: Negative emotions β=0.02, SE=0.007, p=0.007; Swear β=0.02, SE=0.007, p=0.009; Articles β=0.01, SE=0.003, p<0.001.
- The mean network showed mostly weak positive connections, with a strong positive edge between negative emotions and swear words. Split-half edge reliability was r=0.99 (r=0.97 excluding the dominant NegEmo–Swear edge). Global strength increased with number of days (β=0.00009, SE=0.00002, p<0.001), but this did not account for the association with depression severity, which remained after controlling for days.

Within-subject changes during episodes (N=286 with ≥15 days both within and outside episodes):

- Global network strength was higher within-episode than outside: β=0.03, SE=0.009, p=0.005. The effect was robust to a Wilcoxon test (V=16,840, p=0.009) and bootstrapping (distribution of within-episode betas >0, p<0.001). The permutation label-shuffle showed a bias due to unequal days (β=0.007, SE=0.0002, p<0.001), but 99.3% of permuted betas were smaller than the real effect. After adjusting for number of days, the within-episode increase remained significant (p=0.02).
- Node strengths higher within-episode (unadjusted): 1st person singular β=0.03, SE=0.01, p=0.03; 1st person plural β=0.04, SE=0.01, p=0.002; 2nd person β=0.03, SE=0.01, p=0.04; 3rd person β=0.04, SE=0.01, p<0.001; Articles β=0.04, SE=0.01, p=0.01; Negations β=0.05, SE=0.01, p=0.001. After adjusting for number of days, only Articles remained significant (β=0.03, SE=0.02, p=0.04).
- No feature showed a significant mean-level change within vs outside episodes, indicating that connectivity changes were not driven by shifts in mean frequency.

Generalisability across networks:

- Of 87 LIWC features, ~59% were associated with depression severity at p<0.05 (uncorrected).
- In 2,000 random 9-node networks (1,000 depression-relevant; 1,000 irrelevant), depression-relevant networks showed larger within-episode increases in global connectivity than irrelevant networks (β=0.01, SE=0.0005, p<2e-16).
- The a priori network's within-episode effect lay within the distribution for relevant networks.
- Features such as Tentative and Time appeared frequently (30%) among the top 100 depression-relevant networks most sensitive to episode status.

Stability: Personalized networks showed good stability for strength centrality (mean CS: all data 0.65; within-episode 0.50; outside-episode 0.64).
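
The logic of the within-subject label-shuffle test can be illustrated with a small sketch. Several simplifications are assumptions made for brevity: connectivity is approximated by the mean absolute Pearson correlation between features (a stand-in, not the graphicalVAR partial-correlation networks used in the study), the group-level effect is a simple mean difference rather than a regression beta, and all function names are hypothetical. Shuffling labels within each participant preserves the unequal number of within- and outside-episode days, which is why the permuted effects can reveal a bias from unequal days.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series has no variance."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / (sxx * syy) ** 0.5


def connectivity(rows):
    """Stand-in connectivity: mean |r| over all feature pairs (needs >= 2 features)."""
    cols = list(zip(*rows))
    pairs = [(i, j) for i in range(len(cols)) for j in range(i + 1, len(cols))]
    return mean(abs(pearson(cols[i], cols[j])) for i, j in pairs)


def episode_effect(days, labels):
    """Connectivity within episodes (label 1) minus outside episodes (label 0)."""
    inside = [d for d, lab in zip(days, labels) if lab == 1]
    outside = [d for d, lab in zip(days, labels) if lab == 0]
    return connectivity(inside) - connectivity(outside)


def permutation_test(subjects, n_perm=1000, seed=0):
    """subjects: list of (days, labels) pairs, one per participant.

    Returns the observed mean effect and the share of permuted mean
    effects that fall below it.
    """
    rng = random.Random(seed)
    observed = mean(episode_effect(days, labels) for days, labels in subjects)
    below = 0
    for _ in range(n_perm):
        permuted = []
        for days, labels in subjects:
            shuffled = list(labels)
            rng.shuffle(shuffled)  # shuffle within participant; group sizes unchanged
            permuted.append(episode_effect(days, shuffled))
        if mean(permuted) < observed:
            below += 1
    return observed, below / n_perm
```

Under this setup, a result analogous to the one reported would correspond to the returned share being about 0.993, meaning the observed effect exceeds nearly all effects obtained with shuffled episode labels.
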

Discussion

Findings support key predictions of network theory in a large, longitudinal, within-subject context using linguistic proxies: individuals with higher current depression severity exhibited more strongly connected personalized networks of depression-relevant language, and within persons, network connectivity increased during depressive episodes compared to non-episode periods. These effects generalized beyond a single, preselected network, indicating that many depression-relevant linguistic networks show heightened connectivity during episodes, unlike networks built from depression-irrelevant features. This suggests that co-fluctuation among depression-associated language features intensifies in depressed states, consistent with the notion of increased vulnerability and reduced resilience predicted by network theory. While language features are not direct clinical symptoms, their dynamic interrelations appear to reflect changes in mental state. The approach demonstrates the utility of social media text as a scalable, objective, archival alternative to intensive EMA for testing theoretical predictions about psychopathology network dynamics. Clinically, personalized network insights might inform individualized targets if similar dynamics are confirmed for self-reported symptom networks, though current results are not intended for clinical decision-making and effect sizes are modest.

Conclusion

This study shows that personalized networks of depression-relevant linguistic features derived from Twitter are more strongly connected in individuals with greater depression severity and become more connected within-subject during depressive episodes. The results generalize across many alternate depression-relevant networks, reinforcing a central tenet of network theory regarding heightened connectivity during acute states. The work provides a proof-of-principle that large-scale, noisy, longitudinal language data can be used to study dynamic mental health processes that are otherwise difficult to capture. Future research should (1) test whether personalized networks of validated self-report symptoms show similar within-subject connectivity changes and whether effect sizes are clinically meaningful; (2) examine other, potentially more indicative language sources (e.g., texts, speech) and more sophisticated NLP features; (3) assess generalizability to diverse populations, languages, and clinical samples; and (4) explore how personalized network metrics could guide individualized prevention and intervention strategies.

Limitations

- Language features are proxies and cannot be directly mapped to clinical symptoms; LIWC captures proportions without context, missing irony and nuanced sentiment.
- Twitter data are noisy and subject to impression management and platform-specific usage (work, promotion, venting); effect sizes are small.
- Depressive episodes were retrospectively self-reported with a broadened definition (only two core symptoms required), potentially inflating episode rates and introducing recall error.
- Sample is not representative: social media users and online workers differ demographically and have higher mental health problem rates than the general population; most participants were from English-speaking countries.
- Generalizability to other languages/cultures is unknown; most NLP resources used are English-centric.
- Personalized networks exhibited non-normal node distributions; implications for edge/centrality estimates under Gaussian assumptions are unclear.
- Unequal numbers of days within vs outside episodes may bias connectivity estimates; analyses adjusted for this and used permutation/robust tests, but residual confounding is possible.
- LIWC-based features may be less sensitive than more advanced NLP approaches; the chosen features favored methodological independence over maximal predictive power.
