
Sociology
Leading countries in global science increasingly receive more citations than other countries doing similar research
C. J. Gomez, A. C. Herman, et al.
This research by Charles J. Gomez, Andrew C. Herman, and Paolo Parigi shows how disparities in citation flows reveal global inequality in scientific recognition. Through their 'citational lensing' framework, the authors document leading countries receiving disproportionately many citations while peripheral nations doing similar research struggle to gain visibility.
~3 min • Beginner • English
Introduction
The paper addresses international inequality in the visibility and recognition of scientific work. It posits that national scientific infrastructures, reputations, and resources create systematic advantages for some countries, leading them to receive more citations than expected based solely on the subject matter similarity of their research. The authors propose that misalignments between the flow of citations and the similarity of research topics reflect distortions in global knowledge production and visibility, motivating a framework to quantify these disparities and assess their implications for inclusion, knowledge incorporation, and scientific progress.
Literature Review
The study builds on the science-of-science tradition that uses citation networks and text analysis to represent idea flows and map scientific structure. Prior work shows that citation networks and textual similarity often diverge, with papers receiving more or fewer citations than their textual similarity would predict. Bibliometrics has typically evaluated both signals against external criteria, while science-of-science research treats their misalignment as evidence that models of diffusion need both citations and text. The paper situates these perspectives in an international context, arguing that misalignments among countries carry practical significance because they blend quality, visibility, and national reputation effects.
Methodology
The authors introduce 'citational lensing' by modeling science as a multiplex network with three layers for each field and publication year t:

- L_citation: an international citation network in which edge l_ij is the number of citations received by papers from country i (published in year t) from country j over a five-year window (t to t+5). Edge weights are standardized as z-scores, and citation inflation is deflated by converting future-year citations to an exchange-rate equivalent for the publication year.
- L_text: an asymmetric text-similarity network derived from the abstracts and titles of papers with English abstracts (with a robustness check that adds machine-translated non-English abstracts). Text processing uses RAKE to extract unigrams through trigrams and to remove academic stopwords. A nation-labeled LDA (NL-LDA) is fit to each field-year corpus to estimate each country's national signature distribution over terms. Directed similarity between countries is computed as the Kullback–Leibler divergence KLD(c_i || c_j), with lower information loss interpreted as higher similarity; KLD values are transformed by taking the negative of their z-scores so that higher values indicate greater similarity. To compare with citation flows from j to i, the transpose of L_text is used.
- L_distortion (the citational well): defined as L_citation − L_text (using the transposed text layer), capturing over- or under-recognition relative to the citations expected given textual similarity. Country-level distortion is summarized via in-degree in L_distortion.

Data span ~20 million papers across ~150 fields from 1980–2012 in the Microsoft Academic Graph (MAG). Fields follow MAG's six-tier scheme (primarily the second-highest level) and are grouped into four broad areas: biomedical/behavioural/ecological, engineering/computational, physical/mathematical, and social sciences.

Analyses include: (a) counts of countries represented over time; (b) yearly network regressions by field (QAP with semi-partialing) of L_text on L_citation, run for (i) all countries (core plus periphery) and (ii) core-only countries (Western Europe; East Asia, namely China, Japan, and South Korea; the United States; Canada; Australia; New Zealand; Singapore; and Israel); (c) estimation and visualization of average national distortion over time and by field area; (d) comparisons of distortion in 2000 versus 2012 by region (Europe; Asia; Africa & Middle East; Latin America & Caribbean) and by field area; (e) shares of overcited versus undercited countries by core/periphery status and field area in 2000 and 2012; and (f) regional maps of average 2012 distortion.

Robustness checks address citation inflation, the inclusion of journals regardless of their tenure since 1980, and English-only versus translated abstracts. Topic-cohesion diagnostics for the NL-LDA nation labels use a modified UMass measure and percentile ranks, with sensitivity analyses excluding lower-cohesion countries.
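The layer construction is compact enough to sketch in code. The following Python snippet is a minimal illustration under stated assumptions, not the authors' implementation: it presumes a raw country-by-country citation-count matrix (already deflated for citation inflation) and per-country term distributions (e.g., rows of a fitted NL-LDA signature matrix) are available, and the function names, smoothing constant, and self-edge handling are all hypothetical.

```python
# Minimal sketch of the citational-lensing layers for one field-year.
# Assumes deflated citation counts and per-country term distributions
# (e.g., NL-LDA national signatures) are already available; names, the
# smoothing constant, and self-edge handling are illustrative assumptions.
import numpy as np
from scipy.stats import zscore
from scipy.special import rel_entr  # elementwise p * log(p / q)

def kld(p, q, eps=1e-12):
    """KLD(p || q): information lost when q is used to approximate p."""
    p, q = p + eps, q + eps
    p, q = p / p.sum(), q / q.sum()
    return rel_entr(p, q).sum()

def citational_lensing(cit_counts, signatures):
    """cit_counts[i, j]: citations papers from country i receive from j.
    signatures[i]: country i's distribution over terms."""
    n = cit_counts.shape[0]
    # L_citation: z-score citation flows over all directed edges.
    L_cit = zscore(cit_counts, axis=None)
    # Directed KLD between national topic signatures.
    D = np.array([[kld(signatures[i], signatures[j]) for j in range(n)]
                  for i in range(n)])
    # L_text: negative z-scores, so higher values mean more similar.
    L_text = -zscore(D, axis=None)
    # L_distortion: citations minus similarity-based expectation; the text
    # layer is transposed so both layers describe flows from j to i.
    L_dist = L_cit - L_text.T
    # Country-level distortion as in-degree: under the convention above,
    # row i sums the weights of edges arriving at country i.
    np.fill_diagonal(L_dist, 0.0)  # drop self-edges (a simplification)
    return L_dist, L_dist.sum(axis=1)
```

Positive entries of the returned distortion vector mark overcited countries (more incoming citations than their textual similarity to citers would predict); negative entries mark undercited ones.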
Key Findings
- Participation: The number of countries represented in both international text similarity and citation networks increased across fields from 1980 to 2012, though growth tapers just before 2012. Adjusting for the number of papers modestly reduces variance: country-related text similarity variance decreases 19% (0.16 to 0.13), and citational distortion variance decreases 12.5% (0.08 to 0.07).
- Citations–text alignment (QAP): In 2012, a one-standard-deviation increase in citations is associated with a 0.228 SD higher KLD-based similarity for all countries (N=135; 95% CI: 0.226–0.231). For core-only countries, the association is stronger: 0.312 SD (N=139; 95% CI: 0.309–0.315). Over time, alignment for core countries is consistently high and slightly increasing, while alignment for all countries (core plus periphery) is lower and weakens over time (a minimal QAP sketch follows this list).
- Citational distortion (L_distortion): The United States is consistently the most central and highly overcited relative to textual similarity across fields and over time; other overcited core countries include Germany, the Netherlands, the United Kingdom, and Japan. China transitions from undercited in the 1980s/early 1990s to overcited in the 2000s, approaching many Western European countries.
- Growing inequality: The gap in average distortion between core and periphery countries widens over time, with core countries increasingly overcited and periphery countries increasingly undercited.
- Field differences: Inequality (core–periphery gap) is most pronounced in the physical and mathematical sciences, followed by engineering/computational and biomedical/behavioural/ecological sciences; the social sciences show a more recent emergence of the gap.
- Stability over time: Comparing country averages in 2000 vs 2012 shows strong clustering near the parity line (Pearson r≈0.659; 95% CI: 0.545–0.749; P<2.2×10^−16), indicating most countries remain overcited or undercited across both years, with limited mobility in status.
- Shares of over/undercited countries (2000→2012): Among core countries, the percentage overcited increased in biomedical/behavioural/ecological (9.68%→22.73%), engineering/computational (13.64%→22.73%), and social sciences (2.27%→9.09%); physical/mathematical remained steady. Among periphery countries, the undercited share rose in biomedical/behavioural/ecological (40.32%→43.55%), engineering/computational (39.83%→45.76%), and social sciences (36.9%→42.86%); physical/mathematical roughly steady. Group-average distortion magnitudes do not necessarily track representation shares.
- Regional patterns: In 2012, regional averages and maps highlight notable overcitation for China in Asia and Brazil in Latin America, while many countries in Africa & the Middle East and parts of South America are near parity, receiving citations proportionate to textual similarity.
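The alignment coefficients above come from QAP network regressions. As a rough illustration of the logic, here is a minimal sketch of a plain node-permutation QAP in Python; the paper uses the semi-partialing variant, so the simpler function below (and every name in it) is an illustrative assumption, not the authors' procedure.

```python
# Rough sketch of a QAP network regression: regress the text layer on the
# citation layer and test the edge-level slope against a node-permutation
# null. The paper uses the semi-partialing QAP variant; this simpler
# version (and all names here) is an illustrative assumption.
import numpy as np

def offdiag(m):
    """Flatten a square matrix, dropping the diagonal."""
    return m[~np.eye(m.shape[0], dtype=bool)]

def qap_regression(y_net, x_net, n_perm=2000, seed=0):
    rng = np.random.default_rng(seed)
    x, y = offdiag(x_net), offdiag(y_net)
    beta = np.polyfit(x, y, 1)[0]  # observed OLS slope on the edges
    n = y_net.shape[0]
    null = np.empty(n_perm)
    for k in range(n_perm):
        p = rng.permutation(n)
        # Relabel countries: permute rows and columns together so the
        # dependence structure within the network is preserved.
        null[k] = np.polyfit(x, offdiag(y_net[np.ix_(p, p)]), 1)[0]
    p_value = (np.abs(null) >= abs(beta)).mean()  # two-sided
    return beta, p_value
```

Permuting rows and columns jointly relabels whole countries rather than shuffling individual edges, which is what distinguishes QAP from a naive edge-level permutation test.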
Discussion
The findings indicate a persistent and growing misalignment between recognition (citations) and research similarity for many countries, especially between core and periphery. This complements prior evidence of China’s rise in global science by showing a concurrent increase in overcitation relative to textual similarity, while some European countries (e.g., Netherlands, Switzerland) do not uniformly mirror trends found in elite researcher-focused inequality studies. The pronounced inequality in the physical sciences is notable given those fields’ established evaluation norms, suggesting systemic reputational or infrastructural advantages drive recognition beyond topic similarity. The observed stratification implies inefficiencies in knowledge circulation: under-recognition of peripheral countries limits integration of ideas and underutilizes human capital, potentially dampening novelty and innovation. Citational lensing offers a diagnostic to identify countries whose scientific visibility exceeds or lags what would be expected based on subject matter, enabling assessment of national science policy effectiveness and the roles of quality, funding, and reputation. The framework may generalize to reputation dynamics among journals or universities and to innovation systems (e.g., patents), providing a tractable tool to study recognition distortions beyond nation-states.
Conclusion
The paper introduces citational lensing, a multiplex network framework contrasting international citation flows with asymmetric text similarity to quantify over- and under-recognition across countries. Using MAG data (1980–2012) across ~150 fields, the study documents growing inequality: core countries (notably the United States, several in Western Europe, Japan, and increasingly China) are overcited relative to textual similarity, while many peripheral countries are undercited, with limited mobility over time. The contribution is both methodological—establishing a scalable, adaptable measure of recognition distortion—and empirical—revealing stratified global attention that may impede efficient knowledge diffusion and innovation. Future research should refine text similarity measures, better isolate causal drivers (quality, visibility, funding, reputation), extend analyses to institutions/journals/patents, and use citational lensing longitudinally to evaluate policy impacts and interventions to improve inclusion and knowledge integration.
Limitations
- Measurement noise in text similarity: Abstract-based NL-LDA topic signatures and KLD provide a noisy proxy for subject matter; alternative text models could yield different similarity estimates.
- Language and coverage: Primary analyses use English-only abstracts (with robustness to machine-translated abstracts), potentially biasing representations for non-English scholarship.
- Journal selection: Main analyses restrict to journals present since 1980; including all journals yields similar trends but selection may affect representation.
- Citation inflation and normalization: Although deflation and z-scoring are applied, residual temporal/field-specific citation dynamics may remain.
- Quality controls: Only rudimentary controls for research quality and visibility (e.g., journal tenure) are used; unobserved confounders (funding, collaboration networks, prestige) may drive distortions.
- Observational design: Results are descriptive; causality regarding national policies or reputations cannot be inferred.
- The stability of the core/periphery classification and of the field taxonomy may influence comparative patterns.