Computer Science
Linking the network centrality measures closeness and degree
T. S. Evans and B. Chen
The paper addresses whether different network centrality measures encode redundant information, focusing on the relationship between degree centrality and closeness centrality. Centrality is widely used across domains to identify important nodes, yet many indices are strongly correlated, hinting at redundancy. Prior work reported inconsistent correlations across networks and conjectured that degree and closeness might be more related than other measures because both reflect proximity via direct ties. The authors hypothesize a specific non-linear relationship: the inverse of a node’s closeness (farness) is linearly dependent on the logarithm of its degree. If true, this would explain high but variable linear correlations observed previously and imply that closeness often provides little additional information beyond degree unless adjusted for degree dependence. The work is significant because it offers a simple, general explanation grounded in network distance structure and has practical implications for efficient centrality analysis at scale.
The study situates itself in a long tradition of centrality research from Bavelas, Sabidussi, and Freeman to modern network science texts, with applications in bibliometrics, transportation, biology, and social systems. Numerous studies have documented correlations among centrality indices, often using Pearson correlation, with mixed conclusions about the extent and consistency of overlap. Valente et al. conjectured degree and closeness might be more correlated than other measures, but found only moderate support. Schoch and others highlighted that relationships vary by network and that linear metrics may miss non-linear dependencies. The authors connect their results to established findings on network length scales (small-world O(ln N) distances), average path length, diameter, eccentricity, diffusion-based measures, and ring-based approximations for first-passage times, noting both alignments and distinctions with prior analytic results primarily derived for specific random graph models.
Analytical derivation:
- Consider an unweighted, connected simple graph. The distance between nodes is the shortest-path length. Closeness of node v is the inverse of the average shortest-path distance from v to all others.
- For each root node r, construct a shortest-path spanning tree T(r) (e.g., via breadth-first search). Conjecture: beyond the immediate neighborhood of r, branches of T(r) are statistically similar across roots.
- Assume the number of nodes at distance l from r grows exponentially with rate z, with local initialization by the root’s degree kr: n_r(l) ≈ kr z^(l-1) up to a cutoff L, and 0 beyond.
- Enforce a size cutoff by requiring 1 + sum_{l=1..L} kr z^l = N, giving L ≈ ln(N (z−1)/kr) / ln z for large N.
- Express inverse closeness as a distance-weighted sum over shells: 1/c_r = (1/(N−1)) sum_{u≠r} d_{ru} = (1/(N−1)) sum_{l=1..L} l n_r(l). Using the exponential form and eliminating L yields the predicted relation: 1/c_r = − (1/ln z) ln kr + β, where β depends on global network parameters and, in the crude cutoff model, β(z,N) = (z−1) + (1/ln z) + (1/ln z) ln N.
- In practice, β is treated as a free parameter alongside z to improve empirical fit.
Empirical evaluation on networks:
- Artificial models with average degree ≈ 10: Erdős–Rényi (ER), Barabási–Albert (BA, preferential attachment), and a configuration-model graph with the BA degree sequence (Config-BA). Networks generated using NetworkX. Sizes N = 1000, 2000, 4000; 100 realizations per case.
- For each graph, compute degree k and closeness c for all nodes; bin nodes by degree to obtain mean and standard error of mean 1/c at each k. Fit 1/c versus ln k using linear regression to estimate slope (−1/ln z^(fit)) and intercept β^(fit). Report goodness of fit via reduced chi-square and Pearson correlation ρ(c,k).
- Konect-SNAP real networks: 18 datasets spanning social, communication, citation, co-authorship, and hyperlink networks; analyze largest connected component; perform the same binning and linear fit procedure.
- Netzschleuder real networks: 112 datasets (largest component sizes up to 40,000 nodes, average degree < 300), selected automatically if downloadable and processable; same analysis; investigate effect of density (average degree divided by N−1) on fit quality; summarize success rates by density thresholds.
- Use c^(norm) = c_v / c_v^(fit) to identify node-level deviations after removing degree dependence.
Notes:
- The study focuses on unweighted graphs. Theoretical β from the sharp cutoff is compared to fitted β; fitted β performs better empirically. Potential extensions include improved cutoff models and incorporating second-degree k^(2) to refine shell estimates.
- Main relationship: The inverse of closeness depends linearly on the logarithm of degree: 1/c_r ≈ −(1/ln z) ln k_r + β. This non-linear degree–closeness link explains previously observed high but inconsistent linear correlations.
- Artificial networks (avg degree ~10): Linear fits of 1/c vs ln k show excellent agreement (reduced chi-square near 1). Typical mean inverse closeness values for a given degree are within about 2% of fitted predictions across ER, BA, and Config-BA models. Pearson ρ(c,k) ranged from ~0.65 to 0.94 depending on model and size. Fitted growth factors z^(fit) are of order a few to several units (e.g., ER around 4.3–4.8; BA and Config-BA around 3.6–4.2), reflecting bias toward high-degree nodes near roots in shortest-path trees.
- Konect-SNAP real networks (18 datasets): 11 of 18 (61%) had reduced chi-square < 2.0; 16 of 18 (89%) had χ^2 < 3.0. High Pearson correlations between c and k are common (e.g., many > 0.7), consistent with the proposed relationship.
- Netzschleuder real networks (112 datasets): Of 99 datasets with evaluable χ^2, 50 (51%) had χ^2 < 2.0 and 63 (64%) had χ^2 < 3.0. Success is strongly associated with sparsity: for density ≤ 0.04, roughly 67–70% of networks achieve χ^2 < 2.0 and ~80% achieve χ^2 < 3.0 (Table 3). Dense networks tend to yield poorer fits due to limited distance variation.
- Where the relationship struggles: very dense graphs; networks with strong structural inhomogeneities (e.g., bipartite structure, pronounced core–periphery “octopus” configurations, biased sampling such as snowball-sampled homicide network). Planar or Euclidean-embedded graphs (e.g., street networks) are expected to violate assumptions due to polynomial rather than exponential shell growth.
- Practical implication: On most networks, closeness offers little additional information beyond degree unless adjusted using the fitted degree–closeness relation. A normalized measure c^(norm) = c_v / c_v^(fit) highlights nodes more or less central than expected from their degree.
- Broader links: The analysis is consistent with small-world scaling (length scales O(ln N)); connects to average path length and ring-based first-passage analyses; parameters z^(fit) and β^(fit) provide compact network-level descriptors of exponential growth in neighborhood shells.
The findings support a simple generative picture: shortest-path trees from different roots share statistically similar branch structures beyond local neighborhoods, with exponential shell growth characterized by a common factor z. This yields a universal leading-order dependence of node-centric distance scales on ln k, translating to a linear 1/c versus ln k relation. The results rationalize high observed correlations between closeness and degree and explain their variability as arising from network-dependent z and β. Practically, if the relation holds, degree suffices as a fast proxy for closeness at first order. To extract meaningful additional information from closeness, one should first remove degree dependence using the fitted relation and analyze residuals (e.g., c^(norm)). The fitted parameters themselves (z^(fit), β^(fit)) offer interpretable network summaries (e.g., strength of exponential growth/small-world behavior) and enable comparisons across networks of different sizes and types. Failures of the relation are informative diagnostics: they signal deviations from the assumed homogeneous exponential growth, such as bipartite structure, dense cores, extreme assortativity, strong planarity, or sampling artifacts. Thus, deviations can reveal macroscopic structure or geometric constraints. The analysis also aligns with established theory on network length scales (average path length and diameter growing like ln N in many random graphs) and clarifies why closeness often appears redundant with degree in complex networks, while noting contexts (e.g., Euclidean-embedded graphs) where closeness provides distinct information.
The paper derives and validates a non-linear degree–closeness relationship showing that inverse closeness scales linearly with the logarithm of node degree. This relation emerges from a shortest-path tree approximation with statistically similar branches and exponential shell growth. It fits a wide range of model and real networks—especially sparse ones—often within a few percent error. Consequently, closeness is frequently redundant with degree unless its degree dependence is removed; normalized closeness relative to fitted expectations can then identify genuinely central outliers. The fitted parameters (z, β) compactly characterize network expansion and distance scales. Conversely, failure of the relation indicates strong structural inhomogeneities or geometric constraints. Future directions include refining the theoretical cutoff to improve β predictions, incorporating additional local shell information (e.g., second degree), and extending the framework to structured graphs (e.g., bipartite networks with separate odd/even growth rates), as well as exploring weighted, directed, and dynamic networks.
- Assumes unweighted, connected simple graphs; extensions to weighted or directed networks are not developed here.
- Relies on a shortest-path tree approximation with a sharp shell cutoff; the resulting β expression is crude and empirically inferior to treating β as a free parameter.
- Assumes homogeneous exponential growth in shells and statistical similarity of branches beyond local neighborhoods; violations (e.g., strong community or core–periphery structure, high assortativity) degrade performance.
- Performance declines in dense networks where distances have low variance and in Euclidean-embedded graphs (planar, geometric) with polynomial shell growth.
- Bipartite and other multi-mode structures are not explicitly modeled; these can produce poor fits unless adapted (e.g., separate growth rates for odd/even shells).
- Sampling biases (e.g., snowball sampling) can distort shell structure and invalidate assumptions.
- The approach is phenomenological rather than fully rigorous; z and β are fitted network-level parameters.
Related Publications
Explore these studies to deepen your understanding of the subject.

