Gravity models do not explain, and cannot predict, international migration dynamics

Social Work

R. M. Beyer, J. Schewe, et al.

Explore the complex dynamics of international migration in this eye-opening research by Robert M. Beyer, Jacob Schewe, and Hermann Lotze-Campen. This study critiques traditional gravity models, revealing their limitations in capturing temporal migration patterns and the potential pitfalls for policymakers relying on these statistical insights.

Introduction

The study examines whether gravity models validly describe both spatial and temporal variation in international migration. These models are widely used to relate bilateral migration flows to demographic, economic, geographic, and socio-political predictors. They are calibrated on spatio-temporal panel data and are often assumed to capture effects both across countries and over time. However, migration flows vary across country pairs by orders of magnitude, whereas temporal variation within a given pair is much smaller. A strong spatial fit can therefore mask poor temporal performance. The authors test the hypothesis that gravity models calibrated on pooled data largely capture spatial patterns but fail to reproduce temporal dynamics. If true, this would undermine their use for explaining past time trends or predicting future flows (e.g., under economic growth or climate change scenarios).
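The spatial-versus-temporal argument can be made concrete by splitting the variance of log flows into two parts. The between-pair (spatial) part measures how far each corridor's average level sits from the overall mean. The within-pair (temporal) part measures year-to-year movement around each corridor's own average. The sketch below is illustrative only. The corridors and flow numbers are invented, and `variance_shares` is a hypothetical helper, not code from the study.

```python
import math
from statistics import fmean

def variance_shares(flows):
    """Split the variance of log flows into between-pair and within-pair shares.

    flows: dict mapping an (origin, destination) pair to its annual flows (all > 0).
    Returns (between_share, within_share); the two shares sum to 1.
    """
    logs = {pair: [math.log(m) for m in series] for pair, series in flows.items()}
    grand_mean = fmean(v for series in logs.values() for v in series)
    between = within = 0.0
    for series in logs.values():
        pair_mean = fmean(series)
        # Spatial part: distance of the pair's average level from the grand mean
        between += len(series) * (pair_mean - grand_mean) ** 2
        # Temporal part: year-to-year deviations around the pair's own average
        within += sum((v - pair_mean) ** 2 for v in series)
    total = between + within
    return between / total, within / total

# Hypothetical corridors: levels differ by a factor of 100, annual noise is about ±20%
wiggle = [0.8, 1.2, 1.0, 1.1, 0.9]
flows = {
    ("A", "X"): [50 * w for w in wiggle],
    ("B", "X"): [5_000 * w for w in wiggle],
    ("C", "Y"): [500_000 * w for w in wiggle],
}
between_share, within_share = variance_shares(flows)
print(f"between-pair share: {between_share:.4f}, within-pair share: {within_share:.4f}")
```

With these invented numbers, more than 99% of the variance lies between pairs. Any model that gets the corridor levels right will therefore score a high pooled R^2, however it handles time.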

Literature Review

Gravity models originated as an analogy to Newton's law of gravitation. They relate migration flows to the populations of origin and destination and to the distance between them. They have since been expanded to include economic, social, climatic, political, and cultural variables. Prior work has used such models to infer that migration from poor countries may rise with income growth. It has also used them to link climate-related factors (temperature, disasters, environmental degradation, exposure/vulnerability) to migration. Common validations emphasize pooled goodness of fit rather than time-series performance. The paper situates its contribution with respect to two categories of gravity models: those with origin–destination fixed effects and those without. It also notes earlier applications (e.g., Haag et al. 1988; Rikani and Schewe 2021) in which pooled fits were good but temporal tracking at the country level was poor. This underscores a persistent validation gap in the literature.
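In the classic Newtonian analogy, the flow between two places grows with their populations and declines with distance. Taking logarithms turns this into a linear form that can be calibrated by pooled least squares. The notation here is illustrative and not taken from the paper: N is population, D is distance, and G, α, β, γ are constants.

```latex
M_{ij} = G \, \frac{N_i^{\alpha} \, N_j^{\beta}}{D_{ij}^{\gamma}}
\quad \Longrightarrow \quad
\log M_{ij} = \log G + \alpha \log N_i + \beta \log N_j - \gamma \log D_{ij}
```

Further predictors, such as log per-capita GDP, enter as additional additive terms on the right-hand side.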

Methodology

Design: Four bilateral gravity models were specified, two without and two with origin–destination fixed effects.

Models without fixed effects: Model 1 includes log populations of origin and destination, log per-capita GDP of origin and destination, and log distance. Model 2 augments Model 1 with the following predictors:
  • unemployment (U)
  • education (E; expected years of schooling)
  • political stability (S)
  • health (H; life expectancy)
  • youth share (Y; share of the population aged 20–35)
  • immigration restrictiveness of the destination (B)
  • colonial link (C_ij)
  • common official language (L_ij)
Variables that are not approximately log-normally distributed enter without a log transform.

Models with fixed effects: Model 3 includes log populations and log per-capita GDPs plus origin–destination fixed effects c_ij. These capture time-invariant bilateral factors such as distance, language, and colonial history. Model 4 adds the broader set of time-varying predictors from Model 2 (U, E, H, S, Y, and B; B is available only up to 2010) alongside c_ij.

Parameterization: The parameters P_k and c_ij were estimated by least squares on the pooled observations.

Data: Annual bilateral migration flows M_ijt came from the OECD (206 origins, 37 OECD destinations) for 1995–2019. Predictor sources were:
  • per-capita GDP (James et al., updated)
  • population and unemployment (World Bank)
  • expected years of schooling and life expectancy (UNDP)
  • youth share (Lutz et al.)
  • political stability (Economist Intelligence Unit)
  • immigration restrictiveness (IMPIC; up to 2010)
After removing zero flows and missing data, 17,759 observations remained for Models 1 and 3 and 9,166 for Models 2 and 4.

Robustness: The analysis was repeated with bilateral flows estimated over 5-year intervals (Abel and Cohen, 2019). It was restricted to the same country pairs and period, with predictors averaged over the corresponding 5-year windows. Sample sizes were 5,265 (Models 1 and 3) and 3,957 (Models 2 and 4).

Evaluation metrics:
  • (i) Pooled goodness of fit: R^2 on log flows (Eq. 5).
  • (ii) Temporal dynamics: relative changes ΔM_ijt and the corresponding ΔR^2 on pooled data (Eq. 7).
  • (iii) Bilateral time-series fit for each origin–destination pair: R^2_ij on levels (Eq. 8) and ΔR^2_ij on changes (Eq. 9).
  • (iv) Pooled versus bilateral coefficients: the time-varying predictors of Models 1 and 3 (populations and per-capita GDPs) were re-estimated in a separate regression for each country pair with at least 6 observations (Eq. 10), using robust bisquare weighting. The distributions of the resulting coefficients p_k^(ij) were then compared with the pooled coefficients.
Two additional diagnostic models tested whether a high pooled R^2 can be achieved without any temporal signal. These were a time-averaged version of Model 1 (Eq. 11) and a fixed-effects-only model log(M) ~ c_ij (Eq. 12).
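The evaluation metrics above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code. The function names are hypothetical, relative change is taken as (M_t − M_{t−1}) / M_{t−1}, and bilateral "levels" are taken to be log flows. The paper's Eqs. 5–9 may define these differently.

```python
import math
from statistics import fmean

def r_squared(observed, predicted):
    """1 - SS_res / SS_tot; negative when predictions do worse than the observed mean."""
    mean_obs = fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def log_series(series):
    """Natural log of each flow (flows must be positive)."""
    return [math.log(m) for m in series]

def relative_changes(series):
    """Year-on-year relative changes (M_t - M_{t-1}) / M_{t-1}; assumed definition."""
    return [(b - a) / a for a, b in zip(series, series[1:])]

def pooled_r2(obs, pred):
    """Pooled R^2 on log flows over all pairs and years."""
    o = [v for pair in obs for v in log_series(obs[pair])]
    p = [v for pair in obs for v in log_series(pred[pair])]
    return r_squared(o, p)

def pooled_delta_r2(obs, pred):
    """Pooled R^2 on relative changes; the benchmark is the constant mean change."""
    o = [v for pair in obs for v in relative_changes(obs[pair])]
    p = [v for pair in obs for v in relative_changes(pred[pair])]
    return r_squared(o, p)

def share_negative_bilateral_r2(obs, pred):
    """Share of pairs whose own log-flow series is fit worse than by its own mean."""
    scores = [r_squared(log_series(obs[pair]), log_series(pred[pair])) for pair in obs]
    return sum(s < 0 for s in scores) / len(scores)

# Hypothetical observed flows, and a "model" that predicts each pair's average flow
obs = {
    ("A", "X"): [100.0, 120.0, 90.0, 130.0],
    ("B", "X"): [10_000.0, 9_000.0, 11_000.0, 10_500.0],
}
pred = {pair: [fmean(series)] * len(series) for pair, series in obs.items()}
print(f"pooled R^2: {pooled_r2(obs, pred):.3f}")
print(f"pooled Delta R^2: {pooled_delta_r2(obs, pred):.3f}")
print(f"share of pairs with R^2_ij < 0: {share_negative_bilateral_r2(obs, pred):.0%}")
```

A prediction that is constant within each corridor already earns a near-perfect pooled R^2 on log flows. Its predicted changes are all zero, however, so ΔR^2 is negative whenever the mean observed change is non-zero. For a single corridor, a constant prediction can at best reach R^2_ij = 0.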

Key Findings
  • Strong pooled fit on levels but not on time dynamics: Using annual data, pooled R^2 values were 0.62 (Model 1), 0.68 (Model 2), 0.94 (Model 3), and 0.95 (Model 4), comparable to prior literature. However, ΔR^2 for temporal changes was negative in all models (≈ −0.003 to −0.004), indicating worse performance than using the constant mean rate of change.
  • Bilateral time-series performance was poor: For annual calibrations, R^2_ij on levels was negative for 87% (Model 1) and 93% (Model 2) of country pairs; even with fixed effects it was negative for 37% (Model 3) and 42% (Model 4). For temporal changes, ΔR^2_ij was negative for 72% (Model 1), 82% (Model 2), 75% (Model 3), and 76% (Model 4) of pairs.
  • Underestimation of temporal variability: Models substantially underestimated magnitudes of both positive and negative year-to-year changes.
  • High pooled R^2 can be achieved without temporal information: A time-averaged version of Model 1 (no temporal variation in predictors) achieved R^2 ≈ 0.66, similar to Model 1. A fixed-effects-only model log(M) ~ c_ij (no temporal variation by design) achieved R^2 ≈ 0.93, close to Models 3–4.
  • Inconsistency of coefficients across space vs time: Bilateral regressions for time-varying predictors (populations and per-capita GDPs) yielded coefficient distributions that often differed markedly from pooled estimates; for around half of country pairs, coefficient signs disagreed with pooled signs, and magnitudes varied widely.
  • Robustness: Results were virtually identical with 5-year interval data for all evaluations.
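The fixed-effects-only diagnostic in the findings above can be reproduced in miniature. Least squares on log(M_ijt) ~ c_ij sets each c_ij to the pair's mean log flow, so the fit carries no temporal information at all. The sketch below uses invented flows and hypothetical function names. It shows the mechanism, not the paper's numbers.

```python
import math
from statistics import fmean

def r_squared(observed, predicted):
    """1 - SS_res / SS_tot, benchmarked against the mean of the observations."""
    mean_obs = fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def fit_fixed_effects_only(flows):
    """Least-squares estimate of log(M_ijt) ~ c_ij: c_ij is the pair's mean log flow."""
    return {pair: fmean(math.log(m) for m in series) for pair, series in flows.items()}

def evaluate_fixed_effects_only(flows):
    """Return (pooled R^2 on log flows, R^2 on relative changes) for the c_ij-only model."""
    c = fit_fixed_effects_only(flows)
    obs_log, fit_log, obs_change = [], [], []
    for pair, series in flows.items():
        obs_log += [math.log(m) for m in series]
        fit_log += [c[pair]] * len(series)
        obs_change += [(b - a) / a for a, b in zip(series, series[1:])]
    # A time-invariant fit implies zero change in every year
    pred_change = [0.0] * len(obs_change)
    return r_squared(obs_log, fit_log), r_squared(obs_change, pred_change)

# Hypothetical annual flows for three corridors of very different size
flows = {
    ("A", "X"): [50, 55, 61, 70],
    ("B", "X"): [5_000, 4_700, 5_200, 5_600],
    ("C", "Y"): [300_000, 330_000, 310_000, 360_000],
}
pooled, changes = evaluate_fixed_effects_only(flows)
print(f"pooled R^2 on log flows: {pooled:.3f}, R^2 on relative changes: {changes:.3f}")
```

The pooled R^2 comes out close to 1 because the three corridors differ by orders of magnitude. The R^2 on changes is negative because the model predicts zero change every year, while the invented flows drift upward on average.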
Discussion

Findings show that gravity models calibrated on pooled data primarily capture spatial variation, not temporal dynamics. Because spatial flows vary by orders of magnitude while temporal fluctuations within a corridor are much smaller, pooled regressions are dominated by spatial signals, enabling high pooled R^2 without describing time trends. Even when fixed effects absorb spatial heterogeneity so that remaining coefficients could, in principle, target temporal patterns, models still failed to capture dynamics consistently across corridors. This indicates that temporal responses to changes in classic predictors (population, income) are heterogeneous and more complex than the cross-sectional relationships. Consequently, inferences that use pooled gravity coefficients to explain past temporal trends or to forecast future changes (e.g., due to income growth or climate stress) are likely artefacts of spatial correlations rather than true temporal mechanisms. The study also highlights that typical validations in the literature—focused on pooled fits—are insufficient to assess temporal adequacy and may obscure this limitation.
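The point that pooled coefficients can reflect spatial correlations rather than temporal mechanisms can be illustrated with a single predictor. In the invented panel below, richer origins send more migrants, giving a positive cross-sectional relation. Within every corridor, however, the flow falls as income rises. The helper names are hypothetical, and plain least squares is used for simplicity. The paper's bilateral regressions used robust bisquare weighting and more predictors.

```python
from statistics import fmean

def ols_slope(x, y):
    """Least-squares slope of y on x, with an intercept."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def pooled_vs_bilateral(panel):
    """panel: dict mapping a pair to (log predictor series, log flow series).

    Returns the pooled slope, the per-pair slopes, and the share of pairs whose
    slope sign disagrees with the pooled sign.
    """
    all_x = [v for x, _ in panel.values() for v in x]
    all_y = [v for _, y in panel.values() for v in y]
    pooled = ols_slope(all_x, all_y)
    bilateral = {pair: ols_slope(x, y) for pair, (x, y) in panel.items()}
    disagree = sum((s > 0) != (pooled > 0) for s in bilateral.values()) / len(bilateral)
    return pooled, bilateral, disagree

# Hypothetical panel: (log per-capita GDP of origin, log flow) over three years
panel = {
    ("A", "X"): ([7.0, 7.1, 7.2], [3.0, 2.9, 2.8]),
    ("B", "X"): ([8.0, 8.1, 8.2], [5.0, 4.9, 4.8]),
    ("C", "Y"): ([9.0, 9.1, 9.2], [7.0, 6.9, 6.8]),
}
pooled, bilateral, disagree = pooled_vs_bilateral(panel)
print(f"pooled slope: {pooled:.2f}")
for pair, slope in bilateral.items():
    print(pair, f"bilateral slope: {slope:.2f}")
print(f"share of pairs with opposite sign: {disagree:.0%}")
```

Here the pooled slope is close to +2, while every corridor's own slope is close to −1. Extrapolating the pooled coefficient forward in time would therefore predict the wrong direction of change in all three invented corridors.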

Conclusion

The paper demonstrates that gravity models, despite strong pooled fits, do not capture temporal dynamics of international migration and thus lack statistical support for explaining past time trends or predicting future flows. The authors recommend that model validation explicitly assess bilateral time-series behavior and temporal change rates, not only pooled fits. Given that temporal responses to even established predictors differ widely across corridors, projections based on current gravity approaches are likely unreliable. While the authors cannot exclude that alternative predictors or very long time horizons might align spatial and temporal patterns, such scenarios are of limited practical relevance and face extreme uncertainty. Future research should develop modeling frameworks that explicitly learn temporal dynamics at the bilateral level, include richer time-varying drivers, and validate against out-of-sample time-series changes, potentially leveraging hierarchical or dynamic models rather than static pooled regressions.
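The recommendation to validate against out-of-sample time-series changes can be sketched for a single corridor. Hold out the last few years and use only the earlier years to set a constant-mean-change benchmark. Then score a model's forecast changes against that benchmark. The function name, the relative-change definition, and the numbers are assumptions for illustration, not a procedure taken from the paper.

```python
from statistics import fmean

def relative_changes(series):
    """Year-on-year relative changes (M_t - M_{t-1}) / M_{t-1}."""
    return [(b - a) / a for a, b in zip(series, series[1:])]

def out_of_sample_skill(series, forecast_changes, n_test):
    """Skill of forecast changes over the last n_test changes of one corridor.

    The benchmark is the mean change in the training years only, so nothing from
    the test years leaks into it. Returns 1 - SSE_model / SSE_benchmark:
    positive beats the benchmark, negative does worse.
    """
    if len(forecast_changes) != n_test:
        raise ValueError("need exactly one forecast change per held-out year")
    changes = relative_changes(series)
    train, test = changes[:-n_test], changes[-n_test:]
    benchmark = fmean(train)
    sse_model = sum((o - f) ** 2 for o, f in zip(test, forecast_changes))
    sse_bench = sum((o - benchmark) ** 2 for o in test)
    return 1.0 - sse_model / sse_bench

# Hypothetical corridor: steady 10% growth, then two years of decline
series = [100, 110, 121, 133.1, 100, 90]
print(out_of_sample_skill(series, [-0.2, -0.1], n_test=2))  # forecast anticipates the downturn
print(out_of_sample_skill(series, [0.3, 0.3], n_test=2))    # forecast extrapolates growth
```

In practice the same score would be computed across all corridors and several holdout windows. The single-corridor version only shows the bookkeeping.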

Limitations
  • Temporal horizon and data scope: Analysis covers 1995–2019 (annual) and comparable 5-year intervals, with destinations limited to 37 OECD countries, which may not capture very long-term or non-OECD dynamics.
  • Predictor set: Although both simple and more complex predictor sets were tested, it remains possible that other, untested time-varying drivers could improve temporal performance.
  • Bilateral coefficient estimation constraints: Country-pair-specific regressions have relatively short time series versus the number of parameters, so coefficient distributions must be interpreted cautiously.
  • Long-term convergence: The study cannot rule out convergence of spatial and temporal relationships over centuries; however, such horizons are impractical for policy and highly uncertain.
  • Sample selection: Zero flows and missing data were excluded, which may affect generalizability to corridors with intermittent flows.
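The caution about short bilateral series applies to the robust bisquare weighting used for the country-pair regressions. A minimal sketch of a bisquare fit for one predictor, via iteratively reweighted least squares, is shown below. It assumes the common tuning constant 4.685 and a median-absolute-deviation scale estimate. These are textbook defaults, not settings confirmed by the paper, and the function names are hypothetical.

```python
from statistics import median

def weighted_line(x, y, w):
    """Weighted least-squares (intercept, slope) of y on x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return my - slope * mx, slope

def bisquare_line(x, y, tuning=4.685, max_iter=50, tol=1e-10):
    """Robust (intercept, slope) via IRLS with Tukey bisquare weights."""
    intercept, slope = weighted_line(x, y, [1.0] * len(x))  # start from OLS
    for _ in range(max_iter):
        resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
        center = median(resid)
        # Robust scale: median absolute deviation, rescaled to a normal SD
        scale = median(abs(r - center) for r in resid) / 0.6745
        # Absolute cutoff assumes values of modest size, e.g. log flows
        if scale < 1e-12:
            break  # most residuals are essentially zero; nothing left to down-weight
        u = [r / (tuning * scale) for r in resid]
        # Bisquare: weight shrinks smoothly with |u| and is zero beyond the cutoff
        w = [(1.0 - ui * ui) ** 2 if abs(ui) < 1.0 else 0.0 for ui in u]
        new_intercept, new_slope = weighted_line(x, y, w)
        done = abs(new_slope - slope) < tol and abs(new_intercept - intercept) < tol
        intercept, slope = new_intercept, new_slope
        if done:
            break
    return intercept, slope

# Hypothetical short series: on the line y = 2x + 1 except for one unusual year
x = [1, 2, 3, 4, 5, 6, 7]
y = [3, 5, 7, 9, 11, 13, 40]
print("OLS:", weighted_line(x, y, [1.0] * len(x)))
print("bisquare:", bisquare_line(x, y))
```

With seven points, the single outlier pulls the ordinary least-squares slope to about 4.7. The bisquare fit down-weights that point to zero and recovers the slope of 2 and intercept of 1. With only six or so observations per pair, a single unusual year can have this much influence.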