Universality, criticality and complexity of information propagation in social media

D. Notarmuzi, C. Castellano, et al.

Based on an analysis of nearly one billion events across several social media platforms, Daniele Notarmuzi, Claudio Castellano, Alessandro Flammini, Dario Mazzilli, and Filippo Radicchi examine the universal nature of information propagation in social media. Their findings reveal striking similarities in how information spreads, challenging preconceived notions about individual systems.
Introduction

The study investigates whether information propagation in online social media exhibits universal, critical dynamics and which contagion mechanisms (simple vs. complex) best describe it. Social media generate bursty activity patterns similar to neuronal firing and earthquakes, where avalanches (bursts) of events are separated by low-activity periods. Prior studies reported power-law avalanche statistics but with highly variable exponent estimates across platforms and methodologies, casting doubt on universality and leaving uncertainty about appropriate theoretical models. Competing hypotheses posit simple contagion (single exposure sufficient; branching-process-like, with τ = 3/2, α = 2) versus complex contagion (multiple exposures required; exemplified by linear threshold models and the Random Field Ising Model, RFIM). This work compiles long-term, large-scale hashtag-based time series from six platforms (Twitter, Telegram, Weibo, Parler, StackOverflow, Delicious) totaling over 900 million events and more than 200 million time series, defines avalanches via a principled percolation-based temporal resolution, and tests for universal scaling, criticality, and the relative support for simple vs. complex contagion. The authors hypothesize that social media exhibit universal and critical avalanche dynamics and that both simple and complex contagion coexist, with semantic content correlating with contagion complexity.

Literature Review

Empirical analyses of online information cascades often find power-law-like avalanche size and duration distributions, but estimated exponents vary widely (e.g., τ between ~2 and ~4; α reported near 3.6 or 2.5), and correlations between size and duration sometimes fail to show power-law relations. Variability may stem from differing avalanche definitions (hashtag time series, reply trees, retweet chains) and from sensitivity to temporal resolution. Theoretical models frequently assume simple contagion akin to disease spread (branching processes with mean-field critical exponents τ = 3/2, α = 2), yet several studies support complex contagion, in which multiple exposures are needed (e.g., linear threshold, RFIM). Network structure, competition, memory, and attention limits have been shown to influence cascade dynamics and criticality. Prior work typically focused on single platforms and short observation windows, limiting generalizability and cross-system comparisons.

Methodology

Data and time series construction: The authors build a time series for each (hash)tag: a sequence of timestamps {t1, t2, …} at which the tag appears. Datasets include: Twitter (2,353,192,777 tweets; 10% random sample; Oct 1–Nov 30, 2019 via OSOME Decahose), Telegram (317,224,715 messages), Parler (183,062,974 posts), Weibo (226,841,249 posts), StackOverflow (46,947,635 questions/answers), Delicious (7,034,524 user actions). Timestamps have one-second resolution, except for StackOverflow (millisecond resolution). Pre-processing ensures roughly constant event rates over the observation window. The corpus contains 206,972,692 time series with 905,377,009 total events.

Avalanche definition and temporal resolution selection: An avalanche is a maximal sequence of consecutive events whose inter-event times are all less than Δ. For an avalanche of size S whose first event occurs at t_a, the duration is T = t_{a+S−1} − t_a. The optimal resolution Δ is chosen via a one-dimensional percolation mapping: for each time series, compute the size S_L of its largest avalanche at a given Δ; across all series, define the percolation strength P_c = ⟨S_L⟩ and the susceptibility χ = ⟨S_L^2⟩ − ⟨S_L⟩^2. The optimal Δ maximizes χ(Δ). Time series with a single event are excluded from the P_c and χ computations. Values of Δ vary by platform (e.g., ~1500 s for Twitter, ~30,000 s for Telegram). Avalanche statistics exclude the largest avalanche of each time series because, as in percolation theory, the largest cluster follows different statistics.
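The avalanche definition and the susceptibility-based choice of Δ can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names (split_avalanches, percolation_observables, optimal_delta) and the explicit list of candidate resolutions are assumptions made for the example.

```python
from statistics import mean, pvariance

def split_avalanches(timestamps, delta):
    """Split one sorted, non-empty time series into avalanches.

    Returns a list of (size, duration) pairs. Two consecutive events belong
    to the same avalanche when their inter-event time is strictly below delta.
    """
    avalanches = []
    start = prev = timestamps[0]
    size = 1
    for t in timestamps[1:]:
        if t - prev < delta:
            size += 1                      # event continues the current avalanche
        else:
            avalanches.append((size, prev - start))
            start, size = t, 1             # event opens a new avalanche
        prev = t
    avalanches.append((size, prev - start))
    return avalanches

def percolation_observables(series_list, delta):
    """Return (percolation strength, susceptibility) at resolution delta.

    Series with a single event are skipped, as described in the text.
    """
    largest = [max(size for size, _ in split_avalanches(ts, delta))
               for ts in series_list if len(ts) > 1]
    # Population variance equals <S_L^2> - <S_L>^2.
    return mean(largest), pvariance(largest)

def optimal_delta(series_list, candidate_deltas):
    """Pick the candidate resolution with the largest susceptibility."""
    return max(candidate_deltas,
               key=lambda d: percolation_observables(series_list, d)[1])
```

For example, split_avalanches([0, 1, 2, 10, 11, 30], 5) yields three avalanches of sizes 3, 2, and 1. On real data the candidate resolutions would typically be spaced logarithmically, since the reported optima span more than an order of magnitude across platforms.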

Aggregate scaling analysis: With Δ set to the platform-specific optimum, compute P(S) and P(T) via logarithmic binning and the relation ⟨S⟩ ∼ T^γ. Estimate τ and α by maximum likelihood; estimate γ by linear regression on log ⟨S⟩ vs log T. Compare empirical exponents and scalings against mean-field BP and RFIM predictions; perform RFIM and BP simulations to assess finite-size and preasymptotic effects on exponent estimates and scaling functions.
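As a rough illustration of the two estimators named above, the sketch below uses the closed-form maximum-likelihood estimator for a continuous power law and an ordinary least-squares fit in log-log space. The continuous approximation is an assumption made here for brevity (avalanche sizes are integers, and the summary does not say which likelihood the authors use), and the function names are hypothetical.

```python
import math
import random

def mle_exponent(values, x_min):
    """Continuous power-law MLE: exponent = 1 + n / sum(ln(x / x_min)).

    Only values >= x_min enter the estimate. At least one value must be
    strictly larger than x_min, otherwise the sum in the denominator is zero.
    """
    tail = [x for x in values if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)

def fit_gamma(durations, mean_sizes):
    """Least-squares slope of log <S> versus log T."""
    xs = [math.log(t) for t in durations]
    ys = [math.log(s) for s in mean_sizes]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Usage: draw synthetic sizes with exponent 9/4 by inverse-transform sampling
# and check that the estimator recovers a value close to it.
rng = random.Random(0)
tau_true = 2.25
sample = [(1.0 - rng.random()) ** (-1.0 / (tau_true - 1.0)) for _ in range(20000)]
tau_hat = mle_exponent(sample, x_min=1.0)
```

The regression is applied to binned averages ⟨S⟩ at each duration T, so the quality of the γ estimate depends on how many avalanches fall into each duration bin.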

Model definitions:

  • Branching process (BP): Discrete-generation process characterized by branching ratio n (critical at n = 1). At criticality, avalanche statistics follow power laws with exponents τ = 3/2, α = 2, γ = 2; away from criticality, the scaling functions depend on the reduced distance from the critical point.
  • Random Field Ising Model (RFIM), mean-field, zero-temperature: Agents with intrinsic propensities h_i (Gaussian, standard deviation R), all-to-all ferromagnetic coupling, and external field H. Starting from all agents inactive, H slowly increases; activation can trigger avalanches under the condition H + h_j + Σ_k y_k > 0. The model is critical at R_c = √(2/π) ≈ 0.8. Exponents: τ = 9/4, α = 7/2, γ = 2; scaling functions differ from BP and induce strong preasymptotic corrections for P(T) and ⟨S⟩(T).
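To make the branching-process definition concrete, here is a minimal simulation sketch. The offspring distribution is an assumption: each active node produces Binomial(2, n/2) children, which gives mean n; the summary does not specify the distribution the authors use. Size counts all nodes including the seed, and duration counts generations.

```python
import random

def simulate_bp(n, rng, max_size=100_000):
    """Simulate one avalanche of a discrete-generation branching process.

    n        : branching ratio (mean offspring per node), with 0 <= n <= 2
    rng      : a random.Random instance
    max_size : safety cap that stops very large (near-critical) avalanches;
               the returned size may slightly exceed it
    Returns (size, duration).
    """
    if not 0 <= n <= 2:
        raise ValueError("this sketch needs 0 <= n <= 2")
    active = size = duration = 1
    while size < max_size:
        # Each of the `active` nodes makes two independent attempts,
        # each succeeding with probability n / 2.
        active = sum(rng.random() < n / 2 for _ in range(2 * active))
        if active == 0:
            break
        size += active
        duration += 1
    return size, duration

# Usage: in the subcritical regime the mean avalanche size is 1 / (1 - n).
rng = random.Random(1)
sizes = [simulate_bp(0.5, rng)[0] for _ in range(20000)]
mean_size = sum(sizes) / len(sizes)
```

Setting n close to 1 produces the broad, power-law-like size distribution described above, at the cost of occasionally hitting the max_size cap.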

Model fitting and selection for individual time series: For each time series with at least 50 events and at least 10 avalanches, construct the empirical P(S) for avalanches with S ≥ S_min (typically S_min = 10; robustness checks provided). Build conditional model distributions Q_RFIM(S|R) and Q_BP(S|n) by discretizing parameter spaces (R in [0.025, 2.7] with step dR = 0.025; n in [0.02, 1.7] with step dn = 0.015) and averaging over 500 parameter samples uniformly drawn within each bin to account for parameter uncertainty. Evaluate the log-likelihood L(P||Q) = Σ_{S≥S_min} P(S) log Q(S), with Q smoothed to avoid numerical issues. The best-fit parameter is the one that maximizes L, which is equivalent to minimizing the cross-entropy between P and Q.
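The grid search over L(P||Q) can be illustrated for the BP case. The sketch below rests on several assumptions that are not in the summary: Poisson offspring, so that avalanche sizes follow the Borel distribution exp(−nS)(nS)^(S−1)/S!; a grid restricted to n < 1; truncation of Q at the largest observed size; and a simple floor eps standing in for the smoothing step. It also skips the averaging over 500 parameter samples per bin. All function names are hypothetical.

```python
import math
from collections import Counter

def borel_log_pmf(s, n):
    """Log-probability of total size s for a Poisson branching process, n > 0."""
    return -n * s + (s - 1) * math.log(n * s) - math.lgamma(s + 1)

def conditional_q(n, s_min, s_max):
    """Model distribution Q(S | n) restricted to s_min <= S <= s_max."""
    sizes = range(s_min, s_max + 1)
    logs = [borel_log_pmf(s, n) for s in sizes]
    peak = max(logs)                       # subtract the peak for numerical stability
    weights = [math.exp(v - peak) for v in logs]
    total = sum(weights)
    return {s: w / total for s, w in zip(sizes, weights)}

def empirical_p(sizes, s_min):
    """Empirical distribution P(S) of avalanche sizes with S >= s_min."""
    tail = [s for s in sizes if s >= s_min]
    return {s: count / len(tail) for s, count in Counter(tail).items()}

def log_likelihood(p, q, eps=1e-12):
    """L(P||Q) = sum_S P(S) log Q(S), with Q floored at eps."""
    return sum(w * math.log(max(q.get(s, 0.0), eps)) for s, w in p.items())

def best_fit_n(p, s_min, n_grid):
    """Grid value of n that maximizes L(P||Q(.|n))."""
    s_max = max(p)
    return max(n_grid,
               key=lambda n: log_likelihood(p, conditional_q(n, s_min, s_max)))

# Usage: recover n = 0.8 from a distribution generated by the model itself.
n_grid = [0.02 + 0.015 * k for k in range(66)]   # 0.02, 0.035, ..., 0.995
p_model = conditional_q(0.8, 10, 400)
n_hat = best_fit_n(p_model, 10, n_grid)
```

Because L is the negative cross-entropy, it is largest when Q matches P, which is why the grid point at n = 0.8 wins in the usage example. The RFIM fit would follow the same pattern with a Q_RFIM(S|R) table, typically built from simulations.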

Goodness-of-fit and model selection: Assign p-values following Clauset et al.: generate synthetic samples in which each avalanche is drawn from the model (S ≥ S_min) with probability Z and from the empirical data with S < S_min with probability 1 − Z, where Z is the empirical fraction of avalanches with S ≥ S_min. Compute KS distances for synthetic vs. model and empirical vs. model; the p-value is the fraction of synthetic samples whose KS distance exceeds the empirical one. A fit is not rejected if p ≥ 0.1 (robustness to this threshold was verified). If one model is rejected and the other is not, select the non-rejected model. If neither is rejected, choose by log-likelihood ratio; if both are rejected, classify the series as “None.” The possibility of mixtures of models within a single time series is neglected. Validation on synthetic data confirms recovery of the ground-truth model and parameters.
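The decision rule described above reduces to a small function once the KS distances and log-likelihoods are available. The sketch below is illustrative only; the function names, the argument layout, and the tie-breaking choice when both log-likelihoods are equal are assumptions.

```python
def ks_p_value(d_empirical, d_synthetic):
    """Fraction of synthetic samples whose KS distance exceeds the empirical one."""
    return sum(d > d_empirical for d in d_synthetic) / len(d_synthetic)

def select_model(p_bp, p_rfim, loglik_bp, loglik_rfim, threshold=0.1):
    """Classify one time series as "BP", "RFIM", or "None".

    A model is not rejected when its p-value is at least `threshold`.
    If both models survive, the one with the larger log-likelihood wins
    (ties go to RFIM here, an arbitrary choice made for this sketch).
    """
    bp_ok = p_bp >= threshold
    rfim_ok = p_rfim >= threshold
    if bp_ok and rfim_ok:
        return "BP" if loglik_bp > loglik_rfim else "RFIM"
    if bp_ok:
        return "BP"
    if rfim_ok:
        return "RFIM"
    return "None"
```

For instance, a series with p-values 0.4 (BP) and 0.02 (RFIM) is classified as BP regardless of the log-likelihoods, because the RFIM fit is rejected outright.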

Key Findings
  • Universality across platforms: After selecting each platform's optimal temporal resolution Δ* via percolation susceptibility maximization and rescaling Δ by Δ*, percolation strength and susceptibility curves collapse across Twitter, Telegram, Parler, Weibo, StackOverflow, and Delicious, indicating universal behavior of information avalanches.
  • Critical avalanche statistics: Aggregate distributions of avalanche size P(S) and duration P(T), and the scaling of average size with duration ⟨S⟩ ∼ T^γ, follow power laws consistent with near-critical dynamics. Maximum-likelihood exponent estimates indicate τ compatible with the mean-field RFIM value (τ = 9/4). Apparent deviations of α and γ from RFIM expectations (α = 7/2, γ = 2) are explained by finite-size and preasymptotic effects observed in RFIM simulations; overall, data are compatible with RFIM rather than BP phenomenology.
  • Parameter ranges and criticality: Best-fit parameters for individual time series span wide subcritical to near-critical ranges for both models, but the majority of events originate from a minority of time series whose parameters lie close to the critical point. Nearly 20% of time series are within 5% of criticality; these account for 53% of all events.
  • Coexistence of contagion mechanisms: At the individual time-series level, a large majority are well fit by at least one model. Model selection splits series into two comparably sized classes, with a mild dominance of complex contagion (RFIM) over simple contagion (BP). Approximately half of time series are better explained by complex contagion.
  • Aggregate universality from mixture: Aggregating only RFIM-class time series yields RFIM-consistent power-law scaling across avalanche sizes. Aggregating BP-class series exhibits a crossover: BP-like scaling for small avalanches and RFIM-like scaling for large avalanches. The overall mixture produces universal distributions more compatible with RFIM.
  • Temporal resolution scales: Optimal Δ varies substantially by platform (e.g., ~1500 s for Twitter; ~30,000 s for Telegram), yet rescaled metrics collapse, reinforcing universality.
  • Semantics correlate with mechanism: Among popular Twitter hashtags, conversational topics (e.g., music, cinema/TV) tend to fall into the simple contagion (BP) class, while political/controversial or periodic topics tend toward the complex contagion (RFIM) class.
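The exponent values quoted above for the two models can be cross-checked against the standard scaling relation of avalanche theory that links the size exponent τ, the duration exponent α, and the size-duration exponent γ. The relation is quoted here as a well-known result and is not stated explicitly in this summary; both models satisfy it with γ = 2.

```latex
\gamma = \frac{\alpha - 1}{\tau - 1}
\qquad
\text{BP: } \frac{2 - 1}{\tfrac{3}{2} - 1} = \frac{1}{1/2} = 2,
\qquad
\text{RFIM: } \frac{\tfrac{7}{2} - 1}{\tfrac{9}{4} - 1} = \frac{5/2}{5/4} = 2 .
```

Because γ is the same in both classes, the size and duration exponents τ and α, together with the shape of the scaling functions, are what distinguish RFIM from BP.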
Discussion

The findings demonstrate that information propagation in diverse social media platforms exhibits universal and near-critical avalanche dynamics, addressing the core question of whether robust universality classes exist for online information diffusion. The aggregate scaling exponents and hyperscaling relations align with the mean-field RFIM rather than the branching process, suggesting that complex contagion, which requires multiple exposures and is influenced by social reinforcement and external signals, is a key driver of large-scale dynamics. Nonetheless, model selection at the time-series level reveals a coexistence of simple and complex contagion processes, reconciling prior conflicting observations: many hashtags are well captured by simple contagion, while others, especially those tied to political or controversial themes or periodic exogenous drivers, reflect complex contagion. The semantic correlation hints that topic content and external forcing shape the effective mechanism of spread. These insights challenge predictive algorithms that rely solely on temporal signals without accounting for semantics or network effects; incorporating both could improve forecasting and intervention strategies. The evident universality across platforms suggests underlying mechanisms that transcend platform-specific features (e.g., feed algorithms, network structure), possibly rooted in collective critical dynamics; elucidating these mechanisms and leveraging them for prediction remain key open problems.

Conclusion

This study provides large-scale evidence that information avalanches in social media exhibit universal, near-critical behavior characterized by power-law size and duration distributions and hyperscaling. Aggregate statistics align with the RFIM universality class, indicating a prominent role for complex contagion. At the micro level, simple and complex contagion coexist, with semantic content correlating with the dominant mechanism. Methodologically, a principled percolation-based temporal resolution and a likelihood-based model selection framework enable quantifying proximity to criticality and classifying dynamics. Future research should: (i) uncover the generative mechanisms producing universality across platforms; (ii) integrate semantic analysis and network structure into predictive models; (iii) extend validation beyond the six platforms and beyond hashtag-based series; (iv) develop and test mixture or hybrid models capturing within-series heterogeneity; and (v) investigate causal drivers (endogenous vs. exogenous) of transitions between contagion regimes.

Limitations
  • Model scope: Each time series is fitted to a single model (BP or RFIM), neglecting the possibility that a single series may be generated by a mixture or switching between mechanisms, which could affect classification and parameter estimates.
  • Finite-size and preasymptotic effects: Deviations in α and γ from RFIM predictions are attributed to finite-size and scaling-function corrections; residual bias may remain, impacting exponent inference and model comparisons.
  • Avalanche definition and resolution: Results depend on the percolation-based temporal resolution Δ and the specific avalanche definition; while principled and robust, alternative definitions or resolutions could alter statistics.
  • Data selection and thresholds: Fitting requires series with ≥50 events and ≥10 avalanches and uses S_min (typically 10), potentially biasing toward more active hashtags and excluding sparse dynamics.
  • Platform coverage and semantics: Only six platforms and hashtag/topic identifiers are considered; generalization to other platforms, languages, content types, or non-hashtag propagation remains to be established. The semantic analysis of topics is qualitative.
  • Network structure: Mean-field RFIM fits suggest effective all-to-all interactions; explicit user network structures and platform algorithmic mediation are not directly modeled, which may limit interpretability for platforms with strong network effects.