Computer Science
A Deep Gravity Model for Mobility Flows Generation
F. Simini, G. Barlacchi, et al.
The paper addresses the challenging problem of mobility flow generation: estimating flows between locations within a region using demographic and geographic characteristics without relying on historical flow data. Classic gravity models, inspired by Zipf and Newtonian analogies, assume flows increase with population and decrease with distance, and have been widely used in transport planning, spatial economics, and epidemic modeling. However, gravity models are limited by their reliance on a small set of variables (typically population and distance) and cannot capture the variability and complex structure of real flows, ignoring crucial geographic information like land use, POIs, and transportation networks. Deep learning has been successful for flow prediction (forecasting future flows from historical ones) but its capability for flow generation without historical flows is underexplored. The authors propose Deep Gravity, which leverages rich geographic features extracted from OpenStreetMap and deep neural networks to learn nonlinear relationships between features and mobility flows. The approach is motivated by the equivalence of the singly constrained gravity model to a shallow linear neural network, suggesting a natural nonlinear extension via deeper architectures. The study evaluates Deep Gravity on England, Italy, and New York State, focusing on performance gains in densely populated areas, generalization to unseen geographies, and interpretability via explainable AI (SHAP).
The literature on human mobility modeling spans migration between cities, urban mobility patterns, traffic and crowd flow prediction, population estimation, and epidemic spread. Gravity models have long been the standard for generating spatial flows due to their interpretability and simplicity, relying on population and distance with deterrence functions (exponential or power-law). Yet, they often fail to capture real-world flow heterogeneity and structural nuances. Prior work highlights the importance of incorporating land use, POIs, and transportation network characteristics. Deep learning methods have been extensively applied to flow prediction tasks using historical data, but offer limited applicability for flow generation when historical flows are unavailable. Explainable AI methods, particularly SHAP, have been proposed to interpret complex models and attribute feature importance, offering transparency to black-box approaches. This work situates Deep Gravity at the intersection of these strands, extending gravity models with deep neural networks and enriched geographic features to improve flow generation and employing SHAP for global and local interpretability.
Problem setup: Given a region of interest R partitioned into nonoverlapping locations (tessellation T) that fully cover R, and given each origin location’s total outflow O_i, estimate flows y(l_i,l_j) between all origin-destination pairs within R without using any historical flows from R during training. Performance metrics include Common Part of Commuters (CPC, Sørensen-Dice index), Pearson correlation, Normalized RMSE, and Jensen–Shannon divergence. CPC, under equal total generated and real outflows, corresponds to accuracy (fraction of trips assigned to the correct destination).
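The CPC metric described above can be sketched in a few lines. This is a minimal illustration, not the authors' evaluation code: the function name, the toy matrices, and the zero-total convention are assumptions made here.

```python
import numpy as np

def common_part_of_commuters(generated, observed):
    """CPC (Sorensen-Dice index) between two flow matrices of equal shape."""
    generated = np.asarray(generated, dtype=float)
    observed = np.asarray(observed, dtype=float)
    total = generated.sum() + observed.sum()
    if total == 0:
        return 0.0  # assumption: convention for the degenerate all-zero case
    # Twice the overlapping trips divided by all trips in both matrices.
    return 2.0 * np.minimum(generated, observed).sum() / total

# Toy example (hypothetical numbers): every origin has the same total outflow
# in both matrices, so CPC equals the share of trips sent to the right place.
observed = np.array([[0, 8, 2], [5, 0, 5], [1, 9, 0]])
generated = np.array([[0, 6, 4], [5, 0, 5], [3, 7, 0]])
cpc = common_part_of_commuters(generated, observed)
```

In this toy case 26 of the 30 trips overlap, so the CPC is 26/30, matching the accuracy interpretation that holds when generated and real outflows are equal.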
From gravity to deep learning: The singly constrained gravity model generates the expected flow via ŷ_{ij} = O_i P_{ij} = O_i m_j f(r_{ij}) / ∑_k m_k f(r_{ik}), with deterrence function f(r) exponential or power-law. This can be cast as a multinomial GLM, whose negative log-likelihood is proportional to the cross-entropy of a shallow linear neural network with softmax, using population and distance as inputs. Interpreting flow generation as a classification problem (assign each unit trip from origin i to destination class j) enables extension with nonlinear deep networks and additional features.
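The equivalence between the singly constrained gravity model and a softmax over a linear score can be checked numerically. The sketch below assumes an exponential deterrence f(r) = exp(-r / scale); the function names, the scale value, and the toy inputs are illustrative choices made here, and the diagonal (self-flows) is kept to follow the formula literally.

```python
import numpy as np

def singly_constrained_gravity(outflows, masses, distances, deterrence):
    """Expected flows y_ij = O_i * m_j f(r_ij) / sum_k m_k f(r_ik)."""
    weights = masses[None, :] * deterrence(distances)
    probs = weights / weights.sum(axis=1, keepdims=True)
    return outflows[:, None] * probs

def gravity_as_softmax(outflows, masses, distances, scale):
    """Same model written as a softmax over a score linear in log m_j and r_ij."""
    logits = np.log(masses)[None, :] - distances / scale
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return outflows[:, None] * probs

# Hypothetical toy region with three locations.
outflows = np.array([100.0, 50.0, 10.0])
masses = np.array([1000.0, 500.0, 200.0])
distances = np.array([[0.0, 3.0, 8.0],
                      [3.0, 0.0, 6.0],
                      [8.0, 6.0, 0.0]])
scale = 5.0  # assumed deterrence length scale, in the same units as distances

flows = singly_constrained_gravity(
    outflows, masses, distances, lambda r: np.exp(-r / scale)
)
flows_softmax = gravity_as_softmax(outflows, masses, distances, scale)
```

Both functions return the same matrix, and each row sums to the origin's outflow O_i, which is the single constraint the model enforces. For a power-law deterrence the same construction would presumably use log r_{ij} in place of r_{ij} as the input.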
Deep Gravity architecture: For each origin i, construct n input vectors x_{ij} = concat(x_i, x_j, r_{ij}), one per candidate destination j in the region. Each x_{ij} contains the origin features x_i, the destination features x_j, and the great-circle distance r_{ij} between centroids. All x_{ij} are processed in parallel by the same feed-forward neural network, which has 15 hidden layers (the bottom six of size 256, the remaining nine of size 128) with LeakyReLU activations. The network outputs a scalar score s_{ij} per pair, and a softmax over destinations turns the scores into probabilities p_{ij}. The generated flows are O_i p_{ij}.
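A forward pass through a network of this shape can be sketched with plain NumPy. This is not the authors' implementation: the weight initialisation, the LeakyReLU slope of 0.01, the input dimension, and all names are assumptions made for illustration, and the weights are random rather than trained.

```python
import numpy as np

HIDDEN_SIZES = [256] * 6 + [128] * 9  # 15 hidden layers, as described above

def init_params(input_dim, rng):
    """Random weights for input -> 15 hidden layers -> scalar score."""
    sizes = [input_dim] + HIDDEN_SIZES + [1]
    return [
        (rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
        for m, n in zip(sizes[:-1], sizes[1:])
    ]

def leaky_relu(z, slope=0.01):
    return np.where(z > 0, z, slope * z)

def destination_probabilities(params, x_origin):
    """x_origin has shape (n_destinations, input_dim): one row x_ij per j."""
    h = x_origin
    for w, b in params[:-1]:
        h = leaky_relu(h @ w + b)       # same weights applied to every pair
    w, b = params[-1]
    scores = (h @ w + b).squeeze(-1)    # s_ij, one scalar per destination
    scores = scores - scores.max()      # numerical stability for the softmax
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()

rng = np.random.default_rng(0)
params = init_params(input_dim=39, rng=rng)
x_origin = rng.normal(size=(20, 39))    # 20 hypothetical candidate destinations
p = destination_probabilities(params, x_origin)
outflow_i = 120.0                       # hypothetical total outflow O_i
generated = outflow_i * p               # generated flows O_i * p_ij
```

Because the softmax normalises over the destinations of a single origin, the generated flows always sum to O_i, just as in the singly constrained gravity model.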
Features: Each location’s features are normalized by area and include:
- Land use areas (5): areas of residential, commercial, industrial, retail, natural.
- Road network (3): lengths of residential, main, and other roads.
- Transport facilities (2): counts of transport-related POIs/buildings (e.g., stations, stops, parking).
- Food facilities (2): counts of related POIs/buildings (e.g., bars, cafes, restaurants).
- Health facilities (2): counts of related POIs/buildings (e.g., clinics, hospitals, pharmacies).
- Education facilities (2): counts of related POIs/buildings (e.g., schools, colleges, kindergartens).
- Retail facilities (2): counts of related POIs/buildings (e.g., supermarkets, department stores, malls).
- Population of origin and destination.
- Geographic distance r_{ij}.
This yields 39 features per origin–destination pair: 18 geographic features for the origin, 18 for the destination, the two populations, and the distance.
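The feature count above can be verified by assembling one input vector. The sketch below is illustrative only: the ordering of the entries, the helper names, and the example coordinates are assumptions, the geographic features are random placeholders taken to be already normalised by area, and the haversine formula is used here as one common way to compute a great-circle distance.

```python
import numpy as np

# Feature groups per location, with the counts listed above.
FEATURE_GROUPS = {
    "land_use": 5, "roads": 3, "transport": 2, "food": 2,
    "health": 2, "education": 2, "retail": 2,
}
N_GEO = sum(FEATURE_GROUPS.values())  # 18 geographic features per location

def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine great-circle distance between two points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2.0) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2)
    return 2.0 * radius_km * np.arcsin(np.sqrt(a))

def build_pair_vector(geo_i, pop_i, centroid_i, geo_j, pop_j, centroid_j):
    """Concatenate origin features, destination features, and distance."""
    r_ij = great_circle_km(*centroid_i, *centroid_j)
    return np.concatenate([geo_i, [pop_i], geo_j, [pop_j], [r_ij]])

rng = np.random.default_rng(1)
x_ij = build_pair_vector(
    rng.random(N_GEO), 2500.0, (51.5074, -0.1278),   # hypothetical origin
    rng.random(N_GEO), 1800.0, (51.4545, -2.5879),   # hypothetical destination
)
```

The resulting vector has 18 + 1 + 18 + 1 + 1 = 39 entries, one such vector per candidate destination of each origin.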
Variants: A light version aggregates POIs to a total count without categories (5 features per flow). A heavy version augments features with averages of the k nearest neighbors’ features for origin and destination (e.g., 77 features for k=2). Both perform comparably to or worse than the main Deep Gravity model.
Training: The loss is the cross-entropy H = −∑_i ∑_j (y(l_i,l_j)/O_i) ln p_{ij}, assuming independence across origins. Optimization uses RMSprop (momentum 0.9) with learning rate 5×10^−6 and batch size 64 origins, trained for 20 epochs. To reduce training time, negative sampling is used with up to 512 randomly selected destinations per origin.
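The loss and the destination sampling can be sketched as follows. The function names and toy flows are hypothetical. The source does not say how the targets are renormalised over the sampled subset, so this sketch simply restricts scores and flows to the sampled columns and keeps the full O_i in the denominator; uniform sampling without replacement is likewise an assumption.

```python
import numpy as np

def log_softmax(scores):
    """Row-wise log of the softmax, computed stably."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def cross_entropy(scores, flows, outflows):
    """H = -sum_i sum_j (y_ij / O_i) ln p_ij, with p_ij = softmax_j(s_ij)."""
    targets = flows / outflows[:, None]
    return -(targets * log_softmax(scores)).sum()

def sample_destinations(n_destinations, max_destinations, rng):
    """Pick up to max_destinations destination indices uniformly at random."""
    k = min(n_destinations, max_destinations)
    return rng.choice(n_destinations, size=k, replace=False)

rng = np.random.default_rng(0)
flows = np.array([[0.0, 6.0, 3.0, 1.0],
                  [4.0, 0.0, 4.0, 2.0]])   # hypothetical observed flows
outflows = flows.sum(axis=1)

# With identical scores the softmax is uniform over the 4 destinations.
uniform_loss = cross_entropy(np.zeros_like(flows), flows, outflows)

# Restrict one training step to a random subset of destinations.
cols = sample_destinations(flows.shape[1], 3, rng)
sampled_loss = cross_entropy(np.zeros((2, 3)), flows[:, cols], outflows)
```

With uniform probabilities over four destinations each origin contributes ln 4, so the loss for two origins is 2 ln 4, a quick sanity check before plugging in network scores.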
Datasets and experimental setup: Experiments are conducted on England (UK), Italy (EU), and New York State (US). Regions of interest (ROIs) are 25×25 km squares: 885 in England, 1551 in Italy, and 475 in New York State. Half of the ROIs are used for training and half for testing, with the split stratified by population decile. Locations within ROIs follow administrative tessellations: Output Areas (OAs) in England, Census Areas (CAs) in Italy, and Census Tracts (CTs) in New York State. Mobility flows for England and Italy are commuting flows from the 2011 national censuses; flows for New York State come from an anonymized mobile-phone-based dataset (Kang et al.). Population per location comes from the census for England and Italy; for New York State it is estimated for each CT as the sum of its outgoing flows. A 10×10 km ROI size is also tested.
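A half-and-half split stratified by population decile might look like the sketch below. The exact procedure used in the paper is not given here, so the decile assignment via quantile edges, the random choice within each decile, and the synthetic populations are all assumptions.

```python
import numpy as np

def stratified_half_split(roi_population, rng):
    """Return a boolean training mask and the decile index (0-9) of each ROI."""
    roi_population = np.asarray(roi_population, dtype=float)
    # Nine interior quantiles separate the ROIs into ten population deciles.
    edges = np.quantile(roi_population, np.linspace(0.1, 0.9, 9))
    decile = np.searchsorted(edges, roi_population, side="right")
    train = np.zeros(len(roi_population), dtype=bool)
    for d in range(10):
        members = np.flatnonzero(decile == d)
        # Send a random half of each decile to the training set.
        chosen = rng.permutation(members)[: len(members) // 2]
        train[chosen] = True
    return train, decile

rng = np.random.default_rng(42)
# Synthetic populations for 885 ROIs (the number of ROIs reported for England).
population = rng.lognormal(mean=10.0, sigma=1.0, size=885)
train_mask, decile = stratified_half_split(population, rng)
test_mask = ~train_mask
```

Stratifying this way keeps sparsely and densely populated ROIs represented in both halves, which matters because performance is later reported per decile.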
Baselines and hybrid models: Comparisons include: (G) classic gravity model; (NG) Nonlinear Gravity using the same deep network structure as DG but only population and distance features; (MFG) Multi-Feature Gravity using the rich geographic feature set but processed by a single-layer linear (softmax) model. Performance is evaluated globally and by population deciles.
Generalization test: A leave-one-city-out setup (LDG) in England evaluates geographic transferability by excluding all ROIs of one major city (London or one of the Core Cities) from training and testing solely on that city, rotating across cities.
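The rotation over held-out cities can be sketched as a small generator. The function name and city labels are illustrative, and how ROIs outside any listed city are handled is not stated above; this sketch assumes they carry the label None and always stay in the training set.

```python
def leave_one_city_out(roi_city):
    """Yield (held_out_city, train_idx, test_idx) for each labelled city."""
    cities = sorted({c for c in roi_city if c is not None})
    for city in cities:
        # Test only on ROIs of the held-out city; train on everything else.
        test_idx = [i for i, c in enumerate(roi_city) if c == city]
        train_idx = [i for i, c in enumerate(roi_city) if c != city]
        yield city, train_idx, test_idx

# Hypothetical labels for five ROIs; None marks an ROI outside the cities.
roi_city = ["London", None, "Leeds", "London", None]
splits = list(leave_one_city_out(roi_city))
```

Each split guarantees that no ROI of the held-out city is seen during training, which is what makes the test a measure of geographic transferability.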
- Deep Gravity (DG) consistently outperforms the classic gravity model (G) and the hybrid baselines (NG, MFG) across England, Italy, and New York State.
- England: Overall CPC for DG ≈ 0.32 versus MFG 0.23, NG 0.12, G 0.11; DG improves over MFG by ~39%, over NG by ~166%, and over G by ~190%. Visual comparisons show DG’s generated flow networks are structurally closer to observed flows than G.
- Italy and New York State: DG achieves substantial gains over G; the relative improvement of DG over G reaches about 66% (Italy) and 1076% (New York State) in the highest population deciles. Improvement is widespread geographically across all ROIs.
- Population density effect: All models tend to achieve higher CPC in sparsely populated ROIs and degrade in highly populated ROIs. However, DG degrades much less; its relative improvement over G increases with population, making the largest gains in dense regions where prediction is hardest.
- Model ranking: Across countries and deciles, G is always worst, DG always best; NG and MFG are intermediate, with country-specific ordering (MFG > NG in England; NG > MFG in Italy and New York State).
- ROI size sensitivity: Using smaller ROIs (10×10 km) reduces CPC by ~0.03 across models compared to 25×25 km; DG’s relative improvement over G in the highest decile remains large (e.g., ~220% in England), albeit somewhat smaller than with 25 km tiles.
- Geographic transferability: Leave-one-city-out DG (LDG) yields CPCs close to standard DG when tested on excluded cities (e.g., London, Newcastle, Liverpool, Nottingham). Slight increases or decreases are observed per city, indicating robust generalization to unseen urban areas.
- Explainability insights (SHAP): Distance is a strong negative contributor for large separations; destination population is highly important in Italy and New York State. In England, population shows mixed effects, and POI/land-use features (food, retail, industrial, roads) often dominate in increasing or decreasing flow probabilities. Case studies show DG can assign different probabilities to symmetric pairs with identical populations and distances due to other geographic features.
The study reframes flow generation as a classification problem and demonstrates that augmenting the gravity model with deep nonlinear transformations and rich geographic features markedly increases realism in generated mobility flows. DG’s advantages are most pronounced in highly populated regions with many plausible destinations, where classic models struggle, thereby addressing a key practical challenge in urban mobility modeling. The interplay between nonlinearity and feature richness underlies DG’s performance: in England, detailed geographic features drive gains, while in Italy and New York State, capturing nonlinear relationships among population and distance is paramount. DG maintains strong performance with different ROI sizes and generalizes well to cities excluded from training, supporting its applicability to regions lacking historical flow data. SHAP-based analyses provide transparency, revealing both global patterns (e.g., distance deterrence, population attraction) and local, instance-specific determinants (e.g., the role of POIs and land use), enhancing trust and interpretability for stakeholders.
This work introduces Deep Gravity, a deep neural model that leverages voluntary geographic information to generate realistic mobility flow probabilities without historical flows. It consistently outperforms gravity-based and shallow alternatives across multiple geographies, with especially large gains in densely populated areas. The approach is transferable to unseen cities and can be interpreted via SHAP to understand feature contributions. Future research directions include: expanding feature sets (e.g., travel times by mode, detailed road-network structure, socio-economic indicators such as housing prices, gentrification, segregation); developing tailored explainability tools for spatial flow models; enabling full flow synthesis by also generating total outflows per location; and exploring cross-scale and cross-region transferability (rural-to-urban, urban-to-rural, cross-country).
- Performance remains lower in highly populated regions (though DG degrades less than baselines), reflecting the inherent complexity and stochasticity of human mobility decisions.
- The model currently assumes known total outflows O_i and does not generate them; extending to outflow generation is future work.
- Reliance on voluntary geographic data (OpenStreetMap) may introduce regional variability due to data completeness and quality differences.
- Generalization was explicitly tested within England using major cities; broader transferability across countries and urban–rural contexts requires further study.
- Interpretability uses general-purpose SHAP; more specialized explanations for spatial flow generation are needed.
- Training efficiency improvements (e.g., negative sampling up to 512 destinations) approximate full softmax over all destinations and might affect exact probabilities compared to exhaustive evaluation.