A meritocratic network formation model for the rise of social media influencers

N. Pagan, W. Mei, et al.

This research, conducted by Nicolò Pagan, Wenjun Mei, Cheng Li, and Florian Dörfler, studies directed network formation on social media, where users follow fellow creators based on the quality of their content. The model predicts that expected follower counts follow Zipf's law in quality rank, a prediction supported by Twitch data.

Introduction
The paper addresses how the quality of user-generated content (UGC) drives the formation of directed online social networks and the emergence of influencers. Unlike traditional undirected friendship networks (e.g., Facebook), modern platforms (Twitter, Instagram, TikTok, Twitch) allow unilateral following and discovery via hashtags and search, leading users to optimize their followee lists for high-quality content. The authors pose two core questions: (i) how UGC quality relates to the rapid rise of influencers, and (ii) what properties the resulting networks exhibit. They provide empirical motivation from a longitudinal Twitter network of >6000 complex network scientists, showing that users tend to raise their quality threshold when adding followees. This motivates a meritocratic, quality-based formation mechanism and sets the context for analyzing network-level consequences such as degree distributions, small-world features, and robustness to recommendation-like meeting processes.
Literature Review
Classical network formation models often emphasize social or topological incentives without explicit content quality. Stochastic Actor-Oriented Models and strategic network formation models rely on reciprocity, closure, or centrality and typically produce undirected, highly clustered networks. Random graph and preferential attachment models (Barabási–Albert) explain scale-free in-degree distributions via rich-get-richer dynamics but struggle to account for the rise of new influencers without prior popularity. The fitness model incorporates intrinsic node fitness but tends to connect early to high-fitness nodes, not capturing the observed temporal search for increasingly better content. Empirical contrasts show that directed UGC platforms (e.g., Instagram) have low reciprocity (~14%) and low clustering (<10%) compared to Facebook. The authors also discuss Zipf’s law as observed in many systems and highlight open questions on its origins beyond generic power-law behavior, motivating a meritocratic mechanism grounded in content quality.
Methodology
Model: Consider a directed network of N≥2 agents who produce or consume UGC on a common interest. Each agent i has a content quality attribute q_i drawn from any continuous distribution; only the ranking of qualities matters. At each discrete time step t=1,2,..., each agent i meets a distinct agent j sampled from a meeting distribution over all other nodes. The primary analysis assumes uniform meetings; numerical analyses also consider in-degree-based preferential attachment (to mimic recommendation systems) and a 50–50 mixed meeting process.
Utility and update rule: Each agent i’s payoff is the maximum quality among its followees, V_i(t)=max_{j in F_i^{out}(t)} q_j, where F_i^{out}(t) is the set of agents that i follows at time t. Upon meeting j, i forms a link to j if q_j>V_i(t); otherwise the state is unchanged. There are no self-loops, and agents control only their outgoing links.
Empirical motivation (Twitter): The authors construct a longitudinal data set of >6000 network scientists on Twitter. They order potential followees by in-degree rank and, for each user i, compute P_i (the fraction of new followees that exceed the rolling median rank of previous followees). Compared to a null model (temporal reshuffling), the empirical distribution shows a significantly lower median (0.436 vs. 0.5; KS p<1e-8), indicating users tend to seek increasingly higher-quality followees over time.
Theoretical analysis: They prove almost sure convergence to an equilibrium under uniform meetings. They derive the transient probability that a node i is followed by j after t steps and the transient expected in-degree. At equilibrium, with nodes indexed by decreasing quality (i=1 is the highest-quality node), they obtain a closed-form expected in-degree: E[d_i]=N−1 for i=1 and E[d_i]=N/i for i≥2 (Zipf’s law in rank). They derive the average in-degree distribution across nodes, analyze deviations from pure Pareto or log-normal forms, and define an audience overlap metric O(i,j)=|F_i∩F_j|/|F_i|, where F_i is the set of followers of node i. For out-degree, they provide a recursive formula for P[d_i^{out}=d] and show the expected out-degree equals the (N−1)-th harmonic number.
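The update rule under uniform meetings can be sketched in a few lines of Python. This is an illustrative simulation, not the authors' code: the function names `simulate` and `mean_in_degrees`, the seed handling, and the `max_steps` cap are assumptions made here. Agents are indexed by decreasing quality, so the condition q_j>V_i(t) reduces to an integer comparison.

```python
import random


def simulate(n, seed=0, max_steps=100_000):
    """Simulate the quality-threshold rule under uniform meetings.

    Agents are indexed 0..n-1 in decreasing quality (index 0 is rank 1),
    so "q_j > V_i(t)" becomes "j < best[i]". Returns a list whose entry k
    is the set of followers of agent k once equilibrium is reached
    (or after max_steps rounds, whichever comes first).
    """
    if n < 2:
        raise ValueError("the model needs at least two agents")
    rng = random.Random(seed)
    best = [n] * n                      # best followee index; n means "none yet"
    followers = [set() for _ in range(n)]
    for _ in range(max_steps):
        for i in range(n):
            j = rng.randrange(n - 1)    # uniform over the other n-1 agents
            if j >= i:
                j += 1                  # skip i itself (no self-loops)
            if j < best[i]:             # strictly better than the current maximum
                followers[j].add(i)
                best[i] = j
        # Equilibrium: the top agent follows the runner-up and everyone else
        # follows the top agent, so no further meeting can create a link.
        if best[0] == 1 and all(b == 0 for b in best[1:]):
            break
    return followers


def mean_in_degrees(n, runs, seed=0):
    """Average equilibrium in-degree per quality rank over independent runs."""
    totals = [0] * n
    for r in range(runs):
        for k, f in enumerate(simulate(n, seed=seed + r)):
            totals[k] += len(f)
    return [t / runs for t in totals]


if __name__ == "__main__":
    n = 50
    means = mean_in_degrees(n, runs=100)
    for rank in range(1, 6):
        theory = n - 1 if rank == 1 else n / rank
        print(rank, round(means[rank - 1], 2), round(theory, 2))
```

Averaged over enough runs, the simulated in-degree at rank i should approach N−1 for i=1 and N/i for i≥2, which gives a quick sanity check of the closed-form result stated above.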
Empirical validation (Twitch): They crawled Twitch categories chess and poker (English-language), identifying stable top broadcasters and constructing quasi-bipartite follower–broadcaster networks. Crawling occurred hourly over one week (starting Sept 20, 2020), yielding 305 (chess) and 358 (poker) broadcasters; 690,917 (chess) and 708,443 (poker) unique users; and 1,450,403 (chess) and 1,739,712 (poker) directed ties. They analyze top-15 rank–in-degree relations, in-/out-degree distributions, small-world metrics, clustering coefficients, and audience overlap, and compare with theoretical predictions and simulations under uniform and mixed preferential-attachment meetings.
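A rank versus in-degree relation of this kind is commonly checked with a least-squares line in log–log space. The sketch below shows one way to do that; it is not the authors' pipeline. The function name `zipf_fit`, the use of natural logarithms, and the synthetic follower counts are assumptions for illustration only, and no Twitch data is reproduced here.

```python
import math


def zipf_fit(in_degrees):
    """Ordinary least-squares fit of log(in-degree) on log(rank).

    Returns (slope, intercept, r2, rmse). A slope near -1 is what Zipf's law
    predicts. Natural logs are an assumption here; the slope and R^2 do not
    depend on the log base, but the RMSE does.
    """
    ranked = sorted(in_degrees, reverse=True)
    if len(ranked) < 2 or ranked[-1] <= 0:
        raise ValueError("need at least two strictly positive in-degrees")
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(d) for d in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    return slope, intercept, r2, rmse


if __name__ == "__main__":
    # Hypothetical top-15 follower counts that follow d = 1000 / rank exactly.
    synthetic = [1000 / rank for rank in range(1, 16)]
    slope, intercept, r2, rmse = zipf_fit(synthetic)
    print(round(slope, 3), round(r2, 3), round(rmse, 3))
```

On real follower counts the slope would deviate from −1, and the R^2 and RMSE values indicate how closely the top ranks follow a straight line in log–log space.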
Key Findings
- Meritocratic search evidence: On Twitter data, the distribution of P_i has median 0.436 vs 0.5 under the null (mean±sd: 0.450±0.189 vs 0.489±0.173), rejecting randomness (KS p<1e-8) and supporting continuous search for higher quality.
- Convergence: Under uniform meetings, the process reaches equilibrium almost surely. Numerical results show expected time to equilibrium grows rapidly with N; preferential-attachment-based meetings (pure or mixed) accelerate convergence.
- In-degree (transient and equilibrium): Closed-form transient probabilities P[a_{ij}(t)=1] and E[d_i^{(t)}] are derived. At equilibrium, E[d_1]=N−1 and, for i≥2, E[d_i]=N/i (Zipf’s law). Thus, for large N, the top node expects roughly twice the followers of rank 2, three times those of rank 3, and so on; the Zipf slope is −1 in log–log scale.
- Empirical Zipf on Twitch: Top-15 broadcaster followers vs. rank fit: chess slope −1.04 (Pearson −0.98, R^2=0.96, RMSE=0.16); poker slope −0.98 (Pearson −0.97, R^2=0.95, RMSE=0.17), closely matching theory.
- In-degree distribution across nodes: The aggregated theoretical distribution resembles a power law in the tail with estimated α≈−2.06±0.003 (p<1e−8) above d_min≈7, but deviates at low degrees, where a log-normal fits better. Unlike Pareto/log-normal, the theoretical distribution is non-monotonic near the right tail (e.g., slightly higher mass near N/2) and predicts negligible probability near N−1 for nodes other than the top, capturing structured spacing among top influencers.
- Robustness to recommendation-like meetings: With a 50–50 mixed meeting process, heterogeneity increases slightly and some early random advantages can be reinforced, yet the average quality–followers correlation persists and Zipf’s relation remains robust. Tail fits under mixed meetings: power-law α≈−2.11 (d_min≈9); log-normal fit μ≈1.12, σ≈1.05 (for comparison regions).
- Out-degree distribution: Identical across ranks, non-monotonic with a fast-decaying tail (gamma/Poisson-like). The expected out-degree equals the (N−1)-th harmonic number H_{N−1} (grows ~log N). Empirically on Twitch, out-degrees are concentrated: 99th percentile at d=15 (chess) and d=19 (poker); maxima 151 and 142, far below heavy-tail predictions and broadly consistent with the model’s fast tail decay.
- Small-world properties: Average degree grows ~log_2 N; average node distance ≈5 at N=10^4; network diameter grows similarly to log_2 N, aligning with small-world behavior.
- Clustering: Directed clustering coefficient remains small but non-vanishing, decreasing with N yet >10% up to N=10^6; marginally higher with mixed preferential-attachment meetings.
- Audience overlap: At equilibrium, strong structured overlaps arise: all followers of any node also follow rank-1 (O(i,1)=1) and O(1,j)=1/j, with similar patterns across rows on average. Twitch data show qualitatively similar horizontal decay, with some row-wise deviations (e.g., slightly higher overlap for low-ranking nodes).
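The harmonic-number result for out-degree can be illustrated with a standard record-counting argument, which may differ from the recursion used in the paper. Under uniform meetings, an agent adds a followee exactly when the newly met agent is the best it has seen so far, so at equilibrium its out-degree equals the number of running records in a random ordering of the other N−1 agents; the s-th distinct agent met is a record with probability 1/s. The function names below are hypothetical.

```python
def out_degree_distribution(n):
    """Equilibrium out-degree distribution for one agent among n agents.

    Assumes uniform meetings and distinct qualities. Entry k of the returned
    list is P[out-degree = k], built with the record-counting recursion
    P_s(k) = P_{s-1}(k-1) / s + P_{s-1}(k) * (s - 1) / s.
    """
    if n < 2:
        raise ValueError("the model needs at least two agents")
    probs = [1.0]                       # no agents met yet: out-degree 0
    for s in range(1, n):               # s-th distinct agent met, s = 1..n-1
        new = [0.0] * (s + 1)
        for k, p in enumerate(probs):
            new[k + 1] += p / s         # best seen so far: a new followee
            new[k] += p * (s - 1) / s   # not the best so far: no new link
        probs = new
    return probs


def harmonic(m):
    """m-th harmonic number H_m = 1 + 1/2 + ... + 1/m."""
    return sum(1.0 / s for s in range(1, m + 1))


if __name__ == "__main__":
    n = 100
    dist = out_degree_distribution(n)
    mean = sum(k * p for k, p in enumerate(dist))
    print(round(mean, 4), round(harmonic(n - 1), 4))
```

The mean of this distribution matches H_{N−1}, and the probabilities rise before falling off quickly, in line with the non-monotonic, fast-decaying shape described above.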
Discussion
The findings demonstrate that a simple meritocratic decision rule—forming links only when encountering strictly higher-quality content than currently followed—explains the rapid emergence of influencers and the macroscopic properties of directed UGC networks. The model yields Zipf’s law for expected in-degree by rank, capturing the structured spacing among top nodes and providing a principled, content-based mechanism for Zipf beyond generic multiplicative growth or pure degree-based attachment. It reproduces small-world characteristics and realistic directed clustering levels. The in-degree distribution’s deviations from pure Pareto/log-normal at the extremes align with the observed prominence structure among influencers, while the out-degree distribution’s fast decay matches practical constraints on following behavior. Importantly, the key predictions, including Zipf’s law, are robust even when introducing recommendation-like preferential meetings, because link formation remains quality-threshold-driven. Empirical analyses on Twitch strongly support the theory, especially the rank–in-degree Zipf relation and qualitative overlap patterns, indicating the model captures central dynamics of UGC-driven platforms.
Conclusion
This work introduces a meritocratic, quality-based mechanism for directed network formation that parsimoniously explains the rise and structure of social media influencers. The model predicts and empirically validates Zipf’s law for expected in-degree by quality rank, small-world properties, realistic directed clustering, and a fast-decaying out-degree distribution. It further shows robustness to recommendation-like meeting processes. Future research directions include: enriching update rules with additional sociological incentives (e.g., network closure), modeling multi-dimensional qualities for multiple interests, extending to growing networks to study influencer life cycles, analyzing spreading dynamics with emphasis on influencers, testing across additional platforms (e.g., Instagram, TikTok), leveraging longitudinal data to forecast emerging influencers, and systematically studying the interplay between user behavior and platform recommendation systems.
Limitations
- Empirical data constraints: Twitch crawling captured only users live-streaming within a one-week window, potentially underrepresenting less active broadcasters and biasing the left tail of degree distributions. The network is dynamic with users joining/leaving, whereas the model assumes a closed set during formation.
- Model assumptions: Uniform meeting is an idealization; while mixed preferential-attachment meetings were tested numerically, real recommendation systems are more complex. The utility considers only the maximum followee quality; alternative payoff functions (e.g., averages) may alter exploration–exploitation tradeoffs. The assumption of distinct qualities simplifies analysis; ties were addressed only in supplementary materials.
- Network structure: Twitch networks are quasi-bipartite (many viewers, few broadcasters). While compatible with the model, this differs from general social graphs with more reciprocal ties. Deviations in empirical out-degree (e.g., abundance of very low out-degree users) may reflect ongoing growth stages or recommendation effects not fully captured.
- External validity: Validation focused on two Twitch categories (chess, poker); broader platform and topical diversity would strengthen generalizability.