Persistent interaction patterns across social media platforms and over time

M. Avalle, N. Di Marco, et al.

Dive into the dynamics of online discourse with this insightful study by Michele Avalle and colleagues. This research uncovers persistent patterns of toxic content across eight social media platforms over 34 years. Explore how human behaviour shapes hostile interactions and discover the nuances of user participation amid rising toxicity.
Introduction

The study investigates whether toxic behaviour and engagement patterns observed in online conversations are inherent to human interaction or driven by platform-specific design and algorithms. Given the challenges of disentangling human behaviour from algorithmic effects due to limited and platform-embedded data access, the authors adopt a multiplatform, longitudinal perspective. They focus on toxicity—defined via an automated classifier—as a salient facet of online discourse, and ask: Are online discussions inherently toxic? How do toxic and non-toxic conversations differ? Do these dynamics persist across platforms, topics, and time? By comparing eight platforms across three decades and multiple topics, the work aims to identify invariant human patterns in online conversations and to test common assumptions (e.g., that toxicity escalates over time or deters participation).

Literature Review

Prior research has explored polarization, misinformation, echo chambers, and antisocial behaviours online, often confined to single platforms or topics, limiting generalizability. The lack of transparent, comprehensive, multiplatform datasets has hindered causal interpretation of platform effects versus inherent behaviour. Advances in machine learning have enabled large-scale toxicity detection, yet definitions and detection reliability remain debated. Studies suggest that echo chambers and exposure to opposing views can influence polarization; incivility affects risk perception and judgement; and moderation and recommendation algorithms may shape online dynamics. However, evidence on the pervasiveness and evolution of toxicity across platforms remains fragmented, motivating this broader comparative analysis.

Methodology

  • Design: Comparative, observational analysis of conversation dynamics across time, platform, and topic, using approximately 500 million comments from eight platforms (Facebook, Gab, Reddit, Telegram, Twitter, Usenet, Voat, and YouTube), covering topics such as news, politics, conspiracy, climate change, science, vaccines, and Brexit (1989–2023).
  • Data collection: Platform-specific strategies combined existing datasets, APIs, and archives: the Facebook Graph API (from prior works), Pushshift for Reddit and Gab, collection via Telegram Web and manual collection from themed channels, the Twitter Academic API, the Usenet Archive, a Voat dataset from prior work, and the YouTube Data API. Dataset sizes and toxicity baselines are summarized in Table 1.
  • Conversation unit: A conversation (thread) is a chronologically ordered sequence of comments following an initial post.
  • Toxicity measurement: The primary classifier is the Perspective API, which defines toxicity as rude, disrespectful, or unreasonable content likely to make someone leave a discussion. Each comment receives a score in [0, 1]; a threshold of 0.6 marks a comment as toxic. Robustness checks used an alternative threshold (0.5) and other classifiers (Detoxify; IMSYPP), with qualitatively consistent results across analyses.
  • Participation metric: For sufficiently long threads (selected via normalized logarithmic binning of thread length; analyses often focus on the [0.7, 1] size interval), threads are divided into equal comment-count intervals (linear binning), and per-interval participation is computed as unique users divided by comments in that interval.
  • Conversation size and toxicity: Threads are grouped via logarithmic binning by length to examine the mean fraction of toxic comments versus size; significance is assessed via linear regression, Mann–Kendall tests, and label-shuffling randomizations (z-scores against randomized slopes; sensitivity to the number of bins tested). A similar analysis is conducted versus conversation lifetime.
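As a rough, self-contained sketch of the toxicity labelling and logarithmic size binning described above (synthetic scores only; `toxic_fraction`, `mean_toxicity_by_size`, and the toy thread generator are illustrative, not the paper's code):

```python
import math
import random
import statistics

def toxic_fraction(scores, threshold=0.6):
    """Share of comments whose toxicity score meets the threshold."""
    return sum(s >= threshold for s in scores) / len(scores)

def mean_toxicity_by_size(threads, n_bins=10):
    """Log-bin threads by length; return mean toxic fraction per bin (None if empty)."""
    lengths = [len(t) for t in threads]
    lo = math.log10(min(lengths))
    hi = math.log10(max(lengths) + 1)
    edges = [10 ** (lo + i * (hi - lo) / n_bins) for i in range(n_bins + 1)]
    bins = [[] for _ in range(n_bins)]
    for t in threads:
        for i in range(n_bins):
            if edges[i] <= len(t) < edges[i + 1]:
                bins[i].append(toxic_fraction(t))
                break
    return [statistics.mean(b) if b else None for b in bins]

# Synthetic threads: each comment carries a fake toxicity score; longer
# threads are generated slightly more toxic, mimicking the reported trend.
random.seed(0)
threads = []
for _ in range(500):
    n = int(10 ** random.uniform(0.3, 2.5))
    p = 0.03 + 0.0005 * n  # per-comment probability of a toxic score
    threads.append([0.9 if random.random() < p else 0.1 for _ in range(n)])

means = mean_toxicity_by_size(threads)
```

With these assumptions, the mean toxic fraction rises across the size bins, which is the shape of result the paper's regression and Mann–Kendall tests are applied to.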
  • Conversation evolution: Toxicity and participation trends are computed over normalized comment positions across thread evolution, with Pearson correlations between participation and toxicity trends computed per dataset. Threads are split into toxic versus non-toxic based on the dataset-specific toxicity distribution of long threads (a thread is labelled toxic if its toxicity t ≥ mean + s.d.) to compare participation trends.
  • Controversy and sentiment: Political-leaning-based controversy is inferred for Facebook News, Twitter News, Twitter Vaccines, and Gab Feed by mapping user endorsements of news outlets (scored by MBFC and NewsGuard) to leaning scores l ∈ [-1, 1]; per-thread controversy is the s.d. of leanings among comments with attributed leanings. Correlations between controversy and toxicity trends, and between sentiment dispersion and toxicity, are assessed. Sentiment is computed via a pretrained BERT model (Hugging Face), yielding weighted mean scores normalized per dataset.
  • Endorsement versus toxicity: Mean likes/upvotes are examined as a function of (binned) comment toxicity to test whether endorsement increases with toxicity beyond the threshold.
  • Engagement peaks and toxicity: Kleinberg burst detection is applied to comment time series (first 24 hours; up to 5,000 sampled threads per dataset; bot-like patterns and low-burst threads excluded) to compare toxicity fractions before, at, and after peak engagement, using Mann–Whitney U-tests with Bonferroni correction.
  • Statistical analyses and validation: Mann–Kendall tests for monotonic trends; linear regressions with slope significance; Pearson correlations; permutation/shuffling for null trends; robustness across thresholds, classifiers, and bin counts; multilingual support for toxicity classifiers; and additional Usenet-specific validation simulating false negatives to test trend stability under potential temporal language shifts.

Key Findings
  • Heavy-tailed dynamics: Across all platforms and topics, user activity (comments per user) and thread length (comments per thread) show heavy-tailed distributions; macroscopic patterns (activity distributions and lifetimes) are consistent across datasets.
  • Participation declines over thread evolution: User participation (unique users per comment interval) decreases as threads progress on nearly all datasets (Mann–Kendall tests show significant decreasing trends; Usenet Conspiracy and Talk are exceptions with ambiguous test results but still negative-slope tendencies).
  • Toxicity prevalence is generally low: Fraction of toxic comments is mostly below 10% per dataset (Table 1), with higher baselines in unmoderated spaces like Voat News/Politics (~0.19) and Gab Feed (~0.13). Extremely toxic users are rare (CCDFs show fractions between ~10^-4 and 10^-10), and most active users posted at least one toxic comment, indicating toxicity is not confined to a few users or threads.
  • Longer conversations are more toxic: The mean fraction of toxic comments increases with conversation size for nearly all datasets (linear regression and Mann–Kendall significant; robustness to bin choices and randomization; only Usenet Politics shows a decreasing trend).
  • Toxicity does not necessarily escalate during a thread: When plotting toxicity over normalized comment positions, average toxicity remains mostly stable rather than increasing near the end, contradicting the assumption that interactions inevitably devolve into toxicity.
  • Participation is largely independent of toxicity: Participation typically decreases while toxicity remains stable during thread evolution; Pearson correlations between participation and toxicity trends are heterogeneous across datasets. Participation dynamics are highly correlated between toxic and non-toxic thread sets (no significant difference in slopes), suggesting similar user behaviour irrespective of toxicity level.
  • Conversation lifetime shows little link to toxicity: Trends of toxicity vs lifetime are largely flat at both user and thread levels, indicating no general association with duration.
  • Controversy and toxicity are linked and increase with size: In datasets where political leanings could be inferred (Facebook News, Twitter News, Twitter Vaccines, Gab Feed), controversy (s.d. of leanings) increases with thread size and is positively correlated with toxicity trends (strong positive correlations in Facebook and Twitter; weaker/near-zero in Gab). Sentiment dispersion correlates positively with toxicity in Facebook and Twitter but negatively in Gab.
  • Endorsement does not increase with high toxicity: Likes/upvotes do not show increasing trends past the toxicity threshold (0.6), suggesting endorsement is not amplified by higher toxicity.
  • Toxicity peaks with engagement: Toxicity fractions are significantly higher at engagement peaks than before (and often after) in nearly all datasets (Mann–Whitney U with correction). Post-peak toxicity tends to exceed pre-peak toxicity as well.
  • Robustness: Core results hold with a lower toxicity threshold (0.5) and across alternative classifiers (Detoxify, IMSYPP), and persist across three decades and diverse moderation regimes.
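The heavy-tailed activity distributions in the first finding above can be illustrated with a small, self-contained CCDF sketch on synthetic data (the function name and Pareto-like toy generator are ours, not the paper's):

```python
import random
from collections import Counter

def ccdf(values):
    """Empirical complementary CDF: for each distinct value x (ascending),
    the fraction of observations >= x."""
    n = len(values)
    counts = Counter(values)
    remaining = n
    curve = []
    for x in sorted(counts):
        curve.append((x, remaining / n))
        remaining -= counts[x]
    return curve

# Synthetic, Pareto-like "comments per user": most users post once or twice,
# while a few post enormously often, producing the heavy tail described above.
random.seed(2)
activity = [int((1.0 - random.random()) ** -1.5) for _ in range(10_000)]
curve = ccdf(activity)
```

Plotted on log-log axes, such a curve is roughly straight over a wide range, the signature by which heavy-tailed activity and thread-length distributions are usually recognized.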

Discussion

The findings indicate that key features of online conversational dynamics—heavy-tailed activity, declining participation over time, and the increase of toxicity with conversation size—are persistent across platforms, topics, and historical periods, suggesting they stem from invariant aspects of human behaviour rather than platform-specific design alone. Contrary to common assumptions, toxicity does not necessarily escalate as threads progress and does not systematically reduce participation, despite the classifier’s definition implying that toxic content could drive users away. Instead, toxicity appears associated with heightened engagement and the presence of opposing views (controversy), aligning with the idea that ideological disagreement and sentiment divergence can fuel hostile exchanges. These insights help disentangle human behavioural regularities from platform mechanics and imply that monitoring polarization/controversy could enable earlier, more effective interventions than focusing on toxicity alone. The results underscore the need for nuanced moderation considering conversational context and dynamics, rather than relying solely on static content labels.

Conclusion

This study offers a large-scale, multiplatform, longitudinal analysis showing that online conversational patterns and toxicity dynamics are remarkably consistent across platforms, topics, and time. It challenges assumptions that toxicity inherently escalates or suppresses participation, instead linking toxicity to engagement peaks and controversy. Contributions include: (1) identifying invariant, human-driven conversation patterns; (2) establishing that longer threads tend to be more toxic independent of platform; (3) demonstrating that toxicity is not a reliable predictor of participation drop-off; and (4) connecting controversy/sentiment divergence with toxicity. Future research should: (a) refine controversy and polarization measurements beyond political leaning; (b) develop context-aware moderation that anticipates toxic escalations via polarization signals; (c) examine the roles of trolls, influential users, subject matter, and temporal factors; (d) test cross-platform transferability of toxicity/polarization models; and (e) integrate conversation-structure context into automated toxicity detection to improve precision and generalizability.

Limitations

Key limitations include: (1) Political leaning is used as a proxy for broader ideological disagreement; it captures only part of opinion diversity and was inferable only for a subset of platforms (Facebook, Twitter, Gab). (2) Breadth vs depth trade-off: Comparing heterogeneous datasets across platforms and eras may reduce contextual nuances specific to each discussion or community. (3) Passive users are not observable; toxicity may still deter potential participants from joining even if it does not drive active participants away. (4) Automated toxicity detection has known limitations and potential biases (context insensitivity, annotator/cultural bias, difficulty with subtle/implicit abuse), though results were validated across classifiers and thresholds. (5) Historical language shifts (e.g., Usenet) may affect classifier performance; dedicated sensitivity analyses suggest minimal impact on core findings. (6) Potential influence of small, highly toxic, highly engaged groups cannot be fully excluded. (7) Endorsement metrics (likes/upvotes) are imperfect proxies for approval and may vary by platform norms.
