Persistent interaction patterns across social media platforms and over time

Sociology

M. Avalle, N. Di Marco, et al.

This research by Michele Avalle and colleagues reveals persistent patterns of toxic content across social media platforms over more than three decades. The study uncovers how human behaviour shapes online discourse, highlighting that longer conversations tend to exhibit higher toxicity, yet toxicity does not, on average, discourage participation. Discover the fascinating dynamics behind digital discussions!
Introduction

The advent and proliferation of social media platforms have transformed online participation and become integral to daily life, serving as primary sources for information, entertainment and personal communication. Alongside benefits, platforms present challenges because their engagement-driven designs intertwine with complex social dynamics, raising concerns about polarization, misinformation and antisocial behaviours. A key obstacle to disentangling inherent human behaviour from platform effects is limited, platform-dependent data access that hinders clear causal attribution between design choices and observed behaviours. This study addresses that challenge by focusing on toxicity—defined operationally as rude, disrespectful or unreasonable speech likely to make someone leave a discussion—and by using a comparative, multiplatform, longitudinal approach to uncover invariant patterns of online conversations. The goal is to identify human-behavioural regularities across platforms, topics and time, and to test common assumptions about the evolution and impact of toxicity within discussions.

Literature Review

Prior work documents increased incivility online relative to face-to-face contexts, especially in news comment sections and political debates, where hostile language can undermine deliberation and foster polarized interpretations. Users often seek confirmatory content, forming echo chambers that can intensify polarization; the prevalence of such phenomena varies by platform, influenced by design and recommendation algorithms aimed at maximizing engagement. Research on online toxicity has grown with machine-learning advances for automated detection, yet studies often focus on single platforms or topics, limiting generalizability. This fragmentation complicates assessment of whether online discussions are inherently toxic and how toxic and non-toxic conversations differ. Additionally, debates persist about definitions of toxicity, the scope of abusive language, and the strengths and weaknesses of automated toxicity classifiers. These gaps motivate a broad, comparative analysis to clarify toxicity dynamics across platforms and over time.

Methodology

Data: Approximately 500 million comments spanning eight platforms (Facebook, Gab, Reddit, Telegram, Twitter, Usenet, Voat, YouTube) and multiple topics (for example, news, politics, conspiracy, vaccines, climate change, science, talk) over 34 years, from early Usenet (1989) to recent social media (2023). Table 1 in the article provides per-dataset counts of comments, threads, users, time ranges, and baseline toxicity fractions (generally <10%).

Conversation and activity measures: A conversation (thread) is defined as the chronologically ordered comments following an initial post. User activity is the number of comments per user; thread length is the number of comments per thread. The study characterizes macroscopic patterns (activity distributions and lifetimes) and introduces a participation metric over the course of a thread.

Participation metric: Threads are filtered to ensure sufficient length. Each thread is divided into equal chronological intervals (for example, 0–5%, 5–10%, …, of normalized thread length). Within each interval, participation is defined as the ratio of unique users to the number of comments, with 1 indicating each comment authored by a distinct user. Trends in participation over thread progression are assessed across datasets; Mann–Kendall tests (nonparametric) evaluate monotonic trends.
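The interval-based participation measure described above can be sketched as follows. This is a minimal illustration; the bin count and the handling of intervals too short to populate are assumptions, not details taken from the paper.

```python
import numpy as np

def participation_profile(user_ids, n_bins=20):
    """Participation over a thread's normalized progression.

    user_ids: comment authors in chronological order.
    Returns, for each of n_bins equal chronological intervals, the ratio
    of unique users to comments (1.0 = every comment by a distinct user).
    """
    user_ids = np.asarray(user_ids)
    edges = np.linspace(0, len(user_ids), n_bins + 1).astype(int)
    profile = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        chunk = user_ids[lo:hi]
        if len(chunk) == 0:
            profile.append(np.nan)  # thread too short for this many bins
        else:
            profile.append(len(set(chunk)) / len(chunk))
    return profile
```

A declining profile across intervals indicates that fewer distinct users account for the later comments, which is the signature the Mann–Kendall tests evaluate.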

Toxicity detection: The Google Perspective API assigns toxicity scores in [0,1] to comments, with a threshold of 0.6 used to classify comments as toxic (per prior literature). Robustness checks include varying thresholds and employing alternative classification tools. Toxicity at the user level is the fraction of their comments classified as toxic; thread toxicity is the fraction of toxic comments within that thread.
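The thresholding step can be sketched as below. Whether the 0.6 cutoff is applied strictly or inclusively is an assumption here; the fraction logic applies identically at the user and thread level.

```python
TOXICITY_THRESHOLD = 0.6  # threshold used in the study, per prior literature

def toxic_fraction(scores, threshold=TOXICITY_THRESHOLD):
    """Fraction of comments classified as toxic, given Perspective-style
    scores in [0, 1]. Works for a user's comments or a thread's comments."""
    if not scores:
        return 0.0
    return sum(s >= threshold for s in scores) / len(scores)
```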

Toxicity vs size and time: Conversations are grouped by length using logarithmic binning to examine how average thread toxicity varies with size. Statistical validation includes linear regression of binned trends and Mann–Kendall trend tests; randomization tests shuffle toxicity labels to compare observed slopes to null distributions (z-scores, typically ≥2 s.d.). Toxicity vs conversation lifetime (elapsed time from first to last comment) is analysed similarly.
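A sketch of the size-binning and randomization check, under stated assumptions: bin count, the linear fit through binned means, and the number of shuffles are illustrative choices, not the paper's exact parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def binned_toxicity_trend(sizes, toxicities, n_bins=10):
    """Mean thread toxicity per logarithmic size bin, plus the slope of a
    linear fit through the binned means."""
    sizes = np.asarray(sizes, dtype=float)
    toxicities = np.asarray(toxicities, dtype=float)
    edges = np.logspace(np.log10(sizes.min()), np.log10(sizes.max()),
                        n_bins + 1)
    idx = np.clip(np.digitize(sizes, edges) - 1, 0, n_bins - 1)
    means = np.array([toxicities[idx == b].mean()
                      for b in range(n_bins) if np.any(idx == b)])
    slope = np.polyfit(np.arange(len(means)), means, 1)[0]
    return means, slope

def shuffle_zscore(sizes, toxicities, n_shuffles=200):
    """z-score of the observed slope against a null distribution obtained
    by shuffling toxicity labels across threads."""
    _, observed = binned_toxicity_trend(sizes, toxicities)
    null = [binned_toxicity_trend(sizes, rng.permutation(toxicities))[1]
            for _ in range(n_shuffles)]
    return (observed - np.mean(null)) / np.std(null)
```

Shuffling destroys any size–toxicity association, so an observed slope well above the null distribution (z ≥ 2) supports a genuine increasing trend.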

Toxicity evolution within threads: For sufficiently long threads, the fraction of toxic comments is computed within each normalized interval and averaged across threads to obtain toxicity trajectories. These are compared to participation trajectories. Pearson correlations between participation and toxicity trends are computed per dataset. Threads are also split into toxic vs non-toxic sets based on each dataset’s distribution of long-thread toxicity (toxic if t ≥ μ(T_i)+σ(T_i)); participation trends are compared between sets, and correlations of participation between sets are reported.
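The toxic/non-toxic split uses the dataset-level rule t ≥ μ + σ over long-thread toxicity, which can be sketched directly:

```python
import numpy as np

def split_threads_by_toxicity(thread_toxicities):
    """Split threads into 'toxic' and 'non-toxic' sets: a thread is toxic
    if its toxicity is at least one s.d. above the dataset mean."""
    t = np.asarray(thread_toxicities, dtype=float)
    cutoff = t.mean() + t.std()
    toxic_mask = t >= cutoff
    return toxic_mask, ~toxic_mask
```

Participation trajectories are then computed separately for the two sets and compared.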

Controversy and sentiment: For Facebook News, Twitter News, Twitter Vaccines, and Gab Feed, user political leaning l ∈ [−1,1] is inferred via endorsements (likes/upvotes) of news outlets with independently assessed leanings. Threads with sufficient labelled participation are assigned leaning labels per comment; controversy is quantified as the s.d. σ(l) of leaning across labelled comments in a thread. Relationships among controversy, thread size, and toxicity are analysed (correlations reported). Sentiment of comments is scored using a pretrained multilingual BERT model (Hugging Face NLPTown). For each comment c, a weighted mean sentiment s(c) = Σ x_i p_i (x_i ∈ [1,5]) is computed and normalized to [0,1] per dataset; correlations between mean sentiment dispersion and toxicity are assessed.
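The per-comment weighted sentiment s(c) = Σ x_i p_i can be illustrated as follows. One assumption: normalization here uses the fixed [1, 5] rating range, whereas the study normalizes per dataset.

```python
def weighted_sentiment(probs):
    """Expected star rating from a 5-class sentiment distribution
    (x_i = 1..5, as output by NLPTown's multilingual BERT model),
    min-max normalized from [1, 5] to [0, 1]."""
    assert len(probs) == 5 and abs(sum(probs) - 1.0) < 1e-6
    s = sum(x * p for x, p in zip(range(1, 6), probs))
    return (s - 1) / 4
```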

Endorsement vs toxicity: Mean likes/upvotes are used as an endorsement proxy and examined against comment toxicity scores (binned), testing whether endorsement increases with toxicity beyond the 0.6 threshold.
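The binned endorsement analysis reduces to averaging likes/upvotes per toxicity-score bin; a minimal sketch (the bin count is an assumption):

```python
import numpy as np

def endorsement_by_toxicity_bin(tox_scores, likes, n_bins=10):
    """Mean likes/upvotes per equal-width toxicity-score bin over [0, 1].
    Bins with no comments yield NaN."""
    tox = np.asarray(tox_scores, dtype=float)
    likes = np.asarray(likes, dtype=float)
    idx = np.clip((tox * n_bins).astype(int), 0, n_bins - 1)
    return [likes[idx == b].mean() if np.any(idx == b) else float("nan")
            for b in range(n_bins)]
```

The question is then whether mean endorsement keeps rising in bins beyond the 0.6 threshold.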

Engagement bursts and toxicity: Conversation temporal intensity (comments over time) is analysed using Kleinberg’s burst detection algorithm to identify peak activity levels. Fractions of toxic comments are computed for intervals before, during (peak), and after the highest intensity level; distributions are compared across datasets to assess shifts in toxicity at engagement peaks.
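The before/during/after comparison can be sketched with a simplified stand-in: the study uses Kleinberg's burst-detection algorithm to locate peak activity, whereas this illustration just takes the fixed time window with the most comments.

```python
import numpy as np

def toxicity_around_peak(timestamps, is_toxic, n_windows=20):
    """Toxic fractions before, during, and after the busiest time window
    of a conversation (fixed-window peak as a simplification of
    Kleinberg burst detection)."""
    ts = np.asarray(timestamps, dtype=float)
    tox = np.asarray(is_toxic, dtype=bool)
    edges = np.linspace(ts.min(), ts.max(), n_windows + 1)
    idx = np.clip(np.digitize(ts, edges) - 1, 0, n_windows - 1)
    counts = np.bincount(idx, minlength=n_windows)
    peak = counts.argmax()

    def frac(mask):
        return tox[mask].mean() if mask.any() else float("nan")

    return {"before": frac(idx < peak),
            "during": frac(idx == peak),
            "after": frac(idx > peak)}
```

Comparing these three fractions across many threads gives the distributions whose shifts are tested at engagement peaks.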

Robustness and controls: Analyses are replicated with alternative toxicity thresholds and classifiers, on additional datasets, with varying bin counts; shuffling of labels produces non-increasing trends, and observed slopes are significantly different from randomized baselines. Moderation policies and platform features are discussed in Methods; results are reported as invariant across platform types and topics.

Key Findings
  • Macroscopic invariance: Across all platforms and topics, user activity (comments per user) and thread size (comments per thread) follow heavy-tailed distributions. These macroscopic patterns are consistent regardless of platform features, moderation, user base, or topic.
  • Participation declines over thread progression: On average, the participation metric (unique users per comment interval) decreases as conversations unfold, indicating fewer users remain active while those who stay become more active. Mann–Kendall tests confirm decreasing trends for most datasets, with rare ambiguous cases (for example, Usenet Conspiracy and Talk show weak effects but negative regression slopes).
  • Low prevalence of toxicity overall: Although unmoderated platforms (Gab, Usenet, Voat) host more toxic content, most datasets exhibit <10% toxic comments (Table 1). Extremely toxic users are rare (fractions between ~1e-4 and 1e-1 across datasets), but the most active users have typically written at least one toxic comment; user toxicity follows a sharply decreasing (approximately exponential) distribution. Thread toxicity shows similar patterns.
  • Toxicity increases with conversation size: The fraction of toxic comments increases with thread length across nearly all datasets and topics. Linear regression and Mann–Kendall tests confirm statistically significant increasing trends; shuffling toxicity labels yields non-increasing trends, and observed slopes are ≥2 s.d. above randomized baselines. A notable exception is a decreasing trend for Usenet Politics. Results are robust to binning choices, alternative thresholds, and different classifiers/datasets.
  • Toxicity not tied to conversation lifetime: Toxicity versus conversation lifetime is mostly flat; no general association between toxicity and duration of discussions or user interaction lifetimes.
  • Toxicity does not systematically escalate during threads: Averaged toxicity within normalized intervals remains mostly stable over thread evolution, showing no distinctive increase toward thread ends.
  • Toxicity and participation are largely independent: While participation typically decreases, toxicity remains stable during conversations, and Pearson correlations between their trends are heterogeneous across datasets (no consistent pattern). Splitting long threads into toxic vs non-toxic sets (t ≥ μ+σ) shows highly similar participation patterns (strong positive correlations), and no significant differences in participation trend slopes, indicating toxicity does not, on average, deter participation.
  • Controversy correlates with toxicity and increases with size: In datasets with inferred political leaning (Facebook News, Twitter News, Twitter Vaccines, Gab Feed), controversy (σ(l)) increases with conversation size and its trend is positively correlated with toxicity trends, supporting a link between ideological disagreement and toxicity.
  • Sentiment dispersion relation varies by platform: On moderated platforms (Facebook, Twitter), greater discrepancies in sentiment are positively correlated with toxicity; on Gab, correlations are negative, suggesting platform governance moderates the sentiment–toxicity relationship.
  • Endorsement does not rise with high toxicity: Mean likes/upvotes vs toxicity is not increasing beyond the toxicity threshold (0.6), indicating endorsement does not amplify highly toxic content.
  • Toxicity peaks with engagement bursts: Using burst detection, the fraction of toxic comments is typically higher at the peak of engagement than before, and often higher than after; post-peak is usually higher than pre-peak. Distributions differ significantly in almost all cases, indicating toxicity likely rises with engagement intensity.
Discussion

The study addresses whether observed toxicity patterns are intrinsic to human interaction or driven by platform design by comparing conversations across multiple platforms, topics, and decades. The findings demonstrate robust invariance: heavy-tailed activity, decreasing participation over time, and a general increase in toxicity with conversation size appear consistently across settings. Critically, toxicity does not deter participation on average, nor does it systematically escalate during the course of a thread, contradicting common assumptions and even the operational definition used by the toxicity classifier. Instead, increased toxicity is closely associated with controversy—measured as ideological diversity—and with peaks in engagement intensity, suggesting that polarization and contentious exchanges, rather than toxicity per se, shape discussion dynamics. These insights imply that monitoring polarization/controversy and engagement surges may offer more actionable levers for early moderation and healthier discourse than simply removing toxic comments. The persistence of these patterns over three decades indicates that human behavioural drivers play a prominent role in online discourse irrespective of platform technologies and norms. Nonetheless, the dynamics are multifaceted, with platform moderation and governance influencing relationships among sentiment, endorsement, and toxicity; hence interventions should be nuanced and context-aware.

Conclusion

This work provides a large-scale, multiplatform, longitudinal analysis revealing persistent, invariant patterns in online interactions: heavy-tailed engagement, decreasing user participation as threads evolve, and increasing toxicity with conversation size. Toxicity neither reliably deters participation nor escalates over time within threads; rather, it aligns with controversy and engagement peaks. These results challenge assumptions about the direct impact of toxic content on engagement and suggest that polarization may be a more central driver of hostile exchanges. Practical implications include prioritizing the monitoring of controversy and early identification of engagement bursts for timely interventions, and designing moderation tools that account for context and conversation dynamics rather than isolated toxic utterances. The observed cross-platform homogeneity also suggests that models trained on one platform may transfer to others, encouraging unified approaches to detection and moderation. Future research should further disentangle controversy from other contributors to toxicity (topic specificity, influential users/trolls, temporal factors, demographics), incorporate passive user behaviour (exposure without posting), and deepen comparative studies to separate invariant human factors from platform-specific features in polarization, misinformation, and content consumption.

Limitations
  • Political leaning used as a proxy for broader ideological stance may not capture the full nuance of opinions; leaning inference was feasible only on select platforms (Facebook, Twitter, Gab), though these cover most of the data volume.
  • Breadth-over-depth approach may obscure thread-, platform-, or context-specific complexities, raising concerns of reductionism when homogenizing diverse datasets across time and platforms.
  • Passive user behaviours are unobserved; while toxicity does not appear to make active participants leave, it may still discourage non-participants from joining.
  • Small groups of highly toxic yet highly engaged users, although rare, could disproportionately influence dynamics in specific conversations.
  • Reliance on automated toxicity classifiers (Perspective API and others) entails known limitations and potential biases; while validated across thresholds and tools, classification errors may affect measurements.
  • Platform moderation, recommendation algorithms, and evolving social norms vary across datasets and time; although patterns appear invariant, residual confounding cannot be fully excluded.