Computer Science
Friendship paradox biases perceptions in directed networks
N. Alipourfard, B. Nettasinghe, et al.
The study investigates how the friendship paradox in directed social networks biases individuals’ perceptions of how common traits or opinions are. While the friendship paradox is known to skew comparisons in undirected settings, many online networks (e.g., Twitter) are directed, with asymmetric information flow. The authors ask: under what structural and attribute conditions do directed networks cause systematic over- or underestimation of trait prevalence? They formalize variants of the friendship paradox for directed graphs, introduce global and local perception bias measures, and analyze when and why perceived prevalence deviates from true prevalence. The work is motivated by real-world misperceptions (e.g., overestimating risky behaviors) and aims both to quantify these biases and to propose a polling method that exploits the paradox to estimate true prevalence more efficiently.
Prior work documents systematic perceptual biases and pluralistic ignorance in social settings and links them to network structure. The classic friendship paradox (Feld, 1991) and its empirical consequences (e.g., the h-index paradox among coauthors) show that, on average, one’s friends are more popular than oneself. Generalized versions demonstrate that any attribute correlated with degree can be misperceived, producing effects like the majority illusion and underestimation of minority sizes. In directed networks, multiple friendship paradox variants have been identified and formalized using vector norms. On polling, studies show that expectation-based and network-informed polling (e.g., social sampling, neighborhood expectation polling) can outperform intent polling by aggregating local neighborhood information. Concepts such as inversity (distinct from assortativity) characterize when the local friendship paradox exceeds the global one in undirected graphs; the present work extends analogous insights to directed networks and arbitrary binary attributes.
The authors develop a theoretical framework for directed networks G = (V, E) in which each node v has in-degree d_i(v), its number of friends (the accounts it follows), and out-degree d_o(v), its number of followers. They define three node sampling schemes: (i) a random node X (uniform over V), (ii) a random friend Y (nodes sampled proportional to d_o), and (iii) a random follower Z (nodes sampled proportional to d_i). They derive four variants of the friendship paradox in directed networks. Two hold universally: random friends have more followers than random nodes on average, and random followers have more friends than random nodes; their magnitudes depend on the variance of the degree distributions. Two additional variants require positive correlation between nodes’ in- and out-degrees: random friends have more friends than random nodes, and random followers have more followers than random nodes.
They model a binary attribute f ∈ {0,1} and define global prevalence as E{f(X)} and perceived prevalence among friends as E{f(Y)}. Global perception bias B_global = E{f(Y)} − E{f(X)} equals Cov(f(X), d_o(X)) divided by the average degree, so the attribute–out-degree correlation drives the bias magnitude; for a fixed correlation, the bias increases with the standard deviation of out-degree and decreases with the average degree. They also analyze E{f(Y)} − E{f(Z)} as the expected difference in opinions between observed and observer nodes.
To capture individual-level perceptions, they define local perception q_f(v) as the fraction of v’s friends with the attribute and local perception bias B_local = E{q_f(X)} − E{f(X)}. They introduce an attention model A(v) = 1 / d_i(v), reflecting attention divided across more friends. They express E{q_f(X)} via expectations over a uniformly sampled edge (U, V), where V follows U, and show that B_local equals B_global if and only if f(U) and A(V) are uncorrelated along random edges.
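The two bias identities stated above follow from rewriting degree-weighted sampling as expectations over a uniform node. The derivation below is a sketch reconstructed from the definitions in this summary rather than copied from the paper. The symbols N = |V|, M = |E|, and \bar{d} = M/N are notation introduced here, and every node is assumed to have d_i(v) > 0 so that q_f and A are defined.

```latex
\begin{align}
\mathbb{E}\{f(Y)\} &= \sum_{v \in V} \frac{d_o(v)}{M}\, f(v)
  = \frac{\mathbb{E}\{f(X)\, d_o(X)\}}{\bar{d}}, \\
B_{global} &= \frac{\mathbb{E}\{f(X)\, d_o(X)\} - \mathbb{E}\{f(X)\}\,\bar{d}}{\bar{d}}
  = \frac{\mathrm{Cov}\big(f(X), d_o(X)\big)}{\bar{d}}, \\
\mathbb{E}\{q_f(X)\} &= \frac{1}{N} \sum_{v \in V} \frac{1}{d_i(v)} \sum_{u:\,(u,v) \in E} f(u)
  = \bar{d}\; \mathbb{E}\{f(U)\, A(V)\}, \\
\mathbb{E}\{A(V)\} &= \sum_{v \in V} \frac{d_i(v)}{M} \cdot \frac{1}{d_i(v)} = \frac{1}{\bar{d}},
  \qquad \mathbb{E}\{f(U)\} = \mathbb{E}\{f(Y)\}, \\
B_{local} &= \bar{d}\,\Big[\mathrm{Cov}\big(f(U), A(V)\big) + \mathbb{E}\{f(U)\}\,\mathbb{E}\{A(V)\}\Big] - \mathbb{E}\{f(X)\}
  = B_{global} + \bar{d}\;\mathrm{Cov}\big(f(U), A(V)\big).
\end{align}
```

The last line makes the equivalence explicit: the two biases coincide exactly when Cov(f(U), A(V)) = 0, and the sign of that covariance decides whether local perception amplifies or offsets the global bias.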
Sufficient conditions for positive local bias (overestimation) are: (i) Cov(f(X), d_o(X)) ≥ 0 (popular nodes are more likely to have the attribute) and (ii) Cov(f(U), A(V)) ≥ 0 (nodes with the attribute tend to be followed by high-attention followers). They characterize cases where B_global and B_local differ in sign, depending on the correlation between attribute and follower attention. They relate these results to inversity in undirected networks by setting f = d and show that their condition generalizes known results.
Empirical data collection:
From Twitter (2014), starting with 100 seed users active around California’s 2012 ballot initiatives, they expanded to 5,599 seed users by retrieving the accounts those users followed, then collected all outgoing links from seed users and the posts by seed users and their friends (over 600K users) from June–November 2014, totaling over 18M hashtags. Seed users are fully observed (their own activity and their feeds, via friends’ posts); friends are partially observed (only their own activity). They compute degree statistics for seed users and evaluate perception biases for hashtags treated as binary attributes.
Polling algorithm:
They propose follower perception polling (FPP). Sampling: select b individuals as random followers (nodes sampled proportional to in-degree, equivalent to sampling random edges and taking their follower endpoints). Query: ask each sampled individual for the fraction of their friends with the attribute (their perception). Estimator: \hat{f}_{FPP} = (1/b) Σ_{i=1}^{b} q_f(Z_i), where Z_i is the i-th sampled follower. They show analytically that Bias(\hat{f}_{FPP}) = B_global and derive an upper bound on Var(\hat{f}_{FPP}) in terms of the attribute–out-degree correlation and spectral properties (the second largest eigenvalue λ2) of the degree-discounted bibliographic coupling matrix B_d = D_o^{-1/2} A D_i^{-1} A^T D_o^{-1/2}. They also describe an unbiased variant based on weighted social sampling, which is less practical operationally.
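To make the matrix B_d in the variance bound concrete, the following is a minimal pure-Python sketch that builds it for a small hypothetical graph and estimates λ2. The adjacency convention (adj[u][v] = 1 when v follows u), the toy graph, and the function names are assumptions made for illustration. Power iteration with the top eigenvector projected out is a technique swapped in here for simplicity; the summary does not say how the authors computed λ2.

```python
import math

def coupling_matrix(adj):
    """Return (B_d, d_out) with B_d = D_o^{-1/2} A D_i^{-1} A^T D_o^{-1/2}.

    Assumed convention: adj[u][v] == 1 when v follows u, so row sums are
    out-degrees (follower counts) and column sums are in-degrees (friend counts).
    """
    n = len(adj)
    d_out = [sum(row) for row in adj]
    d_in = [sum(adj[u][v] for u in range(n)) for v in range(n)]
    if min(d_out) == 0 or min(d_in) == 0:
        raise ValueError("every node needs non-zero in- and out-degree")
    B = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for w in range(n):
            # Shared followers of u and w, each discounted by that follower's in-degree.
            shared = sum(adj[u][v] * adj[w][v] / d_in[v] for v in range(n))
            B[u][w] = shared / math.sqrt(d_out[u] * d_out[w])
    return B, d_out

def second_eigenvalue(B, d_out, iters=2000):
    """Estimate lambda_2 by power iteration orthogonal to the top eigenvector.

    sqrt(d_out) is an eigenvector of B_d with eigenvalue 1, and B_d is positive
    semidefinite, so the dominant eigenvalue on the complement is lambda_2.
    """
    n = len(B)
    norm = math.sqrt(sum(d_out))
    top = [math.sqrt(d) / norm for d in d_out]
    x = [(-1.0) ** k * (k + 1) for k in range(n)]  # arbitrary deterministic start
    lam = 0.0
    for _ in range(iters):
        dot = sum(a * b for a, b in zip(x, top))
        x = [a - dot * b for a, b in zip(x, top)]  # remove the top component
        size = math.sqrt(sum(a * a for a in x))
        if size == 0.0:
            return 0.0
        x = [a / size for a in x]
        y = [sum(B[r][c] * x[c] for c in range(n)) for r in range(n)]
        lam = sum(a * b for a, b in zip(x, y))     # Rayleigh quotient
        x = y
    return lam

# Hypothetical 5-node graph; every node has at least one friend and one follower.
adj = [
    [0, 1, 1, 1, 1],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [1, 0, 0, 0, 1],
    [0, 0, 1, 0, 0],
]
B, d_out = coupling_matrix(adj)
lam2 = second_eigenvalue(B, d_out)
print("estimated lambda_2:", lam2)
```

For a graph of realistic size, a sparse eigenvalue solver would be the more natural choice; the dense loops here are only meant to mirror the formula term by term.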
When exact follower sampling is infeasible, they propose a practical heuristic: sample a random node and ask it to nominate one of its followers at random. In supplemental experiments this heuristic performs comparably.
Synthetic polling experiments:
They extract a subgraph of 5,409 Twitter users (all with non-zero in- and out-degrees) and estimate prevalence for the 500 most frequent hashtags using IP (intent polling on random nodes), NPP (node perception polling on random nodes), and FPP (perception polling on random followers), comparing squared bias, variance, and MSE across sampling budgets (e.g., b = 25 and b = 250).
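The three polling schemes can be compared on synthetic data along the same lines. The sketch below is an illustration under stated assumptions, not the authors' experimental code: the graph generator, the attribute model, the sampling-with-replacement simplification, and all function names are hypothetical, and the numbers it prints say nothing about the Twitter results.

```python
import random

def make_graph(n, extra, rng):
    """Toy directed graph as sorted (friend, follower) pairs.

    The ring guarantees every node has at least one friend and one follower;
    extra edges favor low node ids as friends, giving them more followers.
    """
    edges = {(i, (i + 1) % n) for i in range(n)}
    while len(edges) < n + extra:
        u = min(rng.randrange(n), rng.randrange(n))
        v = rng.randrange(n)
        if u != v:
            edges.add((u, v))
    return sorted(edges)

rng = random.Random(0)
n, b, trials = 200, 25, 2000
edges = make_graph(n, 1000, rng)

# Hypothetical attribute: more likely for low ids, which also have more followers.
f = [1 if rng.random() < 0.5 - 0.4 * u / n else 0 for u in range(n)]

friends = {v: [] for v in range(n)}
for u, v in edges:
    friends[v].append(u)

# q[v]: fraction of v's friends that have the attribute (local perception).
q = [sum(f[u] for u in friends[v]) / len(friends[v]) for v in range(n)]
truth = sum(f) / n

def intent_poll(rng):
    # IP: ask b uniformly sampled nodes for their own attribute.
    return sum(f[rng.randrange(n)] for _ in range(b)) / b

def node_perception_poll(rng):
    # NPP: ask b uniformly sampled nodes for their perception q.
    return sum(q[rng.randrange(n)] for _ in range(b)) / b

def follower_perception_poll(rng):
    # FPP: sample b uniform edges and ask each follower endpoint for q.
    return sum(q[rng.choice(edges)[1]] for _ in range(b)) / b

def evaluate(poll):
    estimates = [poll(rng) for _ in range(trials)]
    mean = sum(estimates) / trials
    var = sum((e - mean) ** 2 for e in estimates) / trials
    bias2 = (mean - truth) ** 2
    return {"bias2": bias2, "var": var, "mse": bias2 + var}

results = {
    "IP": evaluate(intent_poll),
    "NPP": evaluate(node_perception_poll),
    "FPP": evaluate(follower_perception_poll),
}
for name, r in results.items():
    print(name, {k: round(x, 5) for k, x in r.items()})
```

Averaging q over follower endpoints of uniform edges is the same as averaging f over friend endpoints, which is why the expected FPP estimate is E{f(Y)} and its bias is B_global. Whether FPP attains the lowest MSE in a given run depends on the graph and attribute, so the sketch reports the three components rather than assuming a ranking.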
Theoretical findings:
- In directed networks, two friendship paradox variants always hold: random friends have more followers than random nodes; random followers have more friends than random nodes. Two additional variants hold when in- and out-degrees are positively correlated: random friends have more friends than random nodes; random followers have more followers than random nodes.
- Global perception bias B_global = E{f(Y)} − E{f(X)} equals Cov(f(X), d_o(X)) divided by the average degree; thus any positive correlation between attribute and out-degree causes overestimation of prevalence at the global level. For a fixed correlation, the bias increases with the standard deviation of out-degree and decreases with the average degree.
- Local perception bias B_local refines B_global by incorporating follower attention A = 1/d_i. Sufficient conditions for B_local ≥ B_global ≥ 0 are: Cov(f(X), d_o(X)) ≥ 0 and Cov(f(U), A(V)) ≥ 0 along random edges, reflecting popular attribute holders followed by high-attention users. Conditions for disagreement in signs of B_global and B_local are analytically characterized via the sign and magnitude of Cov(f(U), A(V)).
Empirical findings (Twitter 2014 dataset):
- Degree stats for seed users: average degree d = 123.55; Var{d_o(X)} = 30,096.16; Var{d_i(X)} = 24,338.66; Cov{d_o(X), d_i(X)} = 14,226.32 with correlation ρ = 0.52. Friend averages: E{d_o(Y)} = 367.14; E{d_i(Y)} = 238.68. Follower averages: E{d_i(Z)} = 320.54; E{d_o(Z)} = 238.68. These confirm all four paradox variants in this subgraph.
- Among 1,153 popular hashtags (each used by >1,000 users), local perception bias distribution is skewed positive: 865 hashtags show positive B_local (overestimation). Many hashtags linked to social movements, memes, current events, sports, and entertainment are strongly overestimated (e.g., #ferguson perceived E{q_f(X)} = 12.1% vs true E{f(X)} = 3.1%, about 4× overestimation).
- Some hashtags are underestimated (negative bias), including conventions for gaining followers or retweets (e.g., #oscars, #tcot, #quote, #rt), often due to negative Cov(f(X), d_o(X)) and/or negative Cov(f(U), A(V)). Cases exist where B_global and B_local have opposite signs (e.g., several political hashtags such as #sotu, #occupy, #marriageequality).
Polling performance:
- At sampling budget b = 25 (~0.5% of nodes), FPP estimates are biased by B_global but exhibit substantially lower variance than IP and NPP, yielding lower MSE for most hashtags. As b increases, the performance gap narrows; at b = 250 (5% of nodes), FPP outperforms IP in >80% and NPP in >55% of hashtags by MSE.
- The variance bound for FPP depends on λ2 of the degree-discounted bibliographic coupling matrix; for the Twitter data λ2 = 0.5984. The derived bound is conservative (loose) but valid across 503 hashtags evaluated.
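The friend and follower averages listed under the degree statistics above can be cross-checked from the reported mean, variances, and covariance. The short script below is a sanity check added here, not part of the paper. It uses the standard size-biased sampling identities E{d_o(Y)} = d + Var{d_o(X)}/d, E{d_i(Z)} = d + Var{d_i(X)}/d, and E{d_i(Y)} = E{d_o(Z)} = d + Cov{d_o(X), d_i(X)}/d, where d is the average degree. The variable names are hypothetical.

```python
import math

# Values as reported in the empirical findings (rounded in the source).
mean_deg = 123.55
var_out, var_in, cov = 30096.16, 24338.66, 14226.32

friend_followers = mean_deg + var_out / mean_deg  # E{d_o(Y)}
follower_friends = mean_deg + var_in / mean_deg   # E{d_i(Z)}
cross_average = mean_deg + cov / mean_deg         # E{d_i(Y)} = E{d_o(Z)}
rho = cov / math.sqrt(var_out * var_in)           # in-/out-degree correlation

print(f"E[d_o(Y)] ~ {friend_followers:.2f}")
print(f"E[d_i(Z)] ~ {follower_friends:.2f}")
print(f"E[d_i(Y)] = E[d_o(Z)] ~ {cross_average:.2f}")
print(f"rho ~ {rho:.3f}")
```

The recomputed averages agree with the reported 367.14, 320.54, and 238.68 to within a few hundredths, and the correlation agrees with the reported 0.52 to within 0.01. Differences of that size are consistent with the inputs having been rounded. All three averages exceed the mean degree, which is the numerical content of the four paradox variants.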
The results demonstrate that directed network structure can strongly distort individual perceptions of attribute prevalence. When attribute holders tend to be popular (higher out-degree) and are followed by attentive users (low in-degree, hence higher per-friend attention), local perception bias is amplified, making attributes appear far more common than they are. This helps explain overestimation of risky behaviors or salient topics in social media feeds and connects to phenomena like the majority illusion. Empirical analysis on Twitter shows many topics are perceived as multiple times more prevalent, which could in turn accelerate diffusion of behaviors and hashtags by providing social proof. The proposed FPP polling method operationalizes these insights, exploiting the friendship paradox to reduce estimator variance by querying better-informed observers (random followers). Despite an inherent bias equal to B_global, the variance reduction usually yields lower MSE than standard polling methods at practical sample sizes. The spectral and degree-correlation dependencies clarify when FPP will be most effective. The study suggests mitigation strategies such as modifying local topology or attention allocation (e.g., link recommendations or feed design) to increase information exposure of low-attention users, potentially reducing bias. Homophily and preference biases in link formation may contribute to the observed conditions, indicating avenues for interventions.
This work formalizes how the friendship paradox manifests in directed networks and how it biases perceptions of attribute prevalence. It introduces global and local perception bias measures, derives conditions under which biases arise and differ, and relates them to degree distributions, attribute–degree correlation, and follower attention. Empirical analysis of Twitter demonstrates substantial positive local bias for most popular hashtags and provides concrete examples of over- and underestimation. The follower perception polling (FPP) algorithm leverages the paradox to achieve lower mean-squared error than traditional polling by trading bias for significant variance reduction, with a variance bound tied to spectral properties of a degree-discounted bibliographic coupling matrix. Future research directions include designing network or platform interventions to mitigate perception bias, tightening variance bounds, developing practical unbiased estimators with low variance, and addressing sampling constraints and representativeness in large-scale directed networks.
The empirical study relies on a subgraph of Twitter seeded from specific users, which may not be representative of the full network. Only outgoing links from seed users were collected, leaving follower information for non-seed nodes unobserved and potentially biasing degree and attention estimates. Some analyses (e.g., the spectral bounds) depend on assumptions such as a connected, non-bipartite degree-discounted bibliographic coupling graph. The FPP algorithm assumes non-zero in- and out-degrees and, in its exact form, access to link-level sampling or the full network structure; the proposed heuristic mitigates but does not eliminate this requirement. The variance bound is loose, and the results reflect 2014 data, which may limit temporal generalizability.

