Introduction
Human mobility affects many aspects of society, including well-being, disease spread, and environmental quality. Accurate mobility flow data is crucial for understanding and predicting these effects, but such data is often unavailable, so flows must be generated with mathematical models. Existing models, such as the classic gravity model, capture only part of the complexity of real-world flows. The gravity model is interpretable and requires few parameters, but because it relies on only two variables (population and distance) it struggles to reproduce the structure of real flows. This paper addresses this gap by proposing Deep Gravity, an approach that incorporates a richer set of geographical features and uses deep neural networks to capture non-linear relationships between these features and mobility flows. The features are extracted from freely available OpenStreetMap (OSM) data and include land use, road networks, transportation facilities, and points of interest (POIs) related to food, health, education, and retail. The aim is more realistic flow generation than existing methods provide.
Literature Review
The gravity model, first proposed by Zipf in 1946, serves as a foundational model for estimating mobility flows, drawing an analogy with Newton's law of gravitation. It assumes that flows between locations increase with population size and decrease with distance. Despite its simplicity and interpretability, the gravity model suffers from inaccuracies due to its limited consideration of geographical complexities like land use, POIs, and transportation networks. While deep learning has been applied to flow prediction using historical data, its application to flow generation without historical information remains largely unexplored. This research aims to address this gap by integrating a large set of geographically detailed variables extracted from OpenStreetMap within a deep learning framework.
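The idea that flows grow with population and shrink with distance can be sketched in a few lines. This is a minimal illustration, assuming a power-law distance decay; the function name, the constant k, and the exponents alpha, beta, and gamma are hypothetical and are not taken from the paper.

```python
def gravity_flow(pop_origin, pop_dest, distance_km,
                 k=1.0, alpha=1.0, beta=1.0, gamma=2.0):
    """Unconstrained gravity flow: k * m_i^alpha * m_j^beta / r_ij^gamma.

    All parameter values are illustrative assumptions; in practice they
    would be fitted to observed flows.
    """
    if distance_km <= 0:
        raise ValueError("distance must be positive")
    return k * (pop_origin ** alpha) * (pop_dest ** beta) / (distance_km ** gamma)


# Same pair of populations, two different distances.
near = gravity_flow(10_000, 5_000, distance_km=2.0)
far = gravity_flow(10_000, 5_000, distance_km=4.0)

# With gamma = 2, doubling the distance divides the flow by four.
print(near / far)
```

Only population and distance enter the formula, which is the limitation the review describes: two destinations with equal population and distance receive the same flow regardless of land use, POIs, or transport links.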
Methodology
Deep Gravity is a deep neural network that extends the singly constrained gravity model. The singly constrained gravity model is formally equivalent to a linear neural network with a softmax layer; Deep Gravity adds multiple hidden layers and non-linearities to capture more complex relationships in the data. The model takes as input features extracted from OpenStreetMap, such as land-use areas, road network features, and counts of POIs in several categories (transport, food, health, education, retail). These features are normalized by the location's area. The distance between locations is also included as a feature. The architecture comprises 15 hidden layers with LeakyReLU activation functions. For a given origin location, the output is a vector of probabilities over the potential destinations that sums to one. The generated flow between two locations is then obtained by multiplying this probability by the origin's total outflow. The model is trained with the cross-entropy loss function and the RMSprop optimizer.
Experiments were conducted on mobility flow datasets from England, Italy, and New York State, using stratified sampling to divide regions of interest into training and testing sets. Performance was evaluated with four metrics: the Common Part of Commuters (CPC), which is equivalent to the Sørensen-Dice index, the Pearson correlation coefficient, the Normalized Root Mean Squared Error (NRMSE), and the Jensen-Shannon divergence (JSD). To better understand where the performance gains come from, two additional models were built: 1) the Nonlinear Gravity model (NG), which uses the same deep neural network architecture as Deep Gravity but takes only population and distance as input features; 2) the Multi-Feature Gravity model (MFG), which uses the same input features as Deep Gravity but employs a single-layer linear neural network. A leave-one-city-out cross-validation was used to assess the model's geographic transferability.
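The softmax formulation described above can be illustrated with a small sketch of the linear (singly constrained) case. The two features, the weights, and the helper names are assumptions chosen for illustration; Deep Gravity would replace the linear score with the output of its deep network and would use many more input features.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def generate_flows(dest_features, weights, total_outflow):
    """Linear score per destination, softmax to probabilities,
    then scale by the origin's total outflow."""
    scores = [sum(w * x for w, x in zip(weights, feats))
              for feats in dest_features]
    probs = softmax(scores)
    return [total_outflow * p for p in probs]


# Hypothetical features per destination: [log population, log distance in km].
dest_features = [
    [math.log(5_000), math.log(2.0)],
    [math.log(20_000), math.log(10.0)],
    [math.log(1_000), math.log(1.0)],
]
# Illustrative weights: +1 on log population, -2 on log distance, so each
# probability is proportional to population / distance^2.
weights = [1.0, -2.0]

flows = generate_flows(dest_features, weights, total_outflow=1_000)
print([round(f, 1) for f in flows])
```

Because the probabilities sum to one, the generated flows always add up to the origin's total outflow, which is the "singly constrained" property.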
SHapley Additive exPlanations (SHAP) values were used to interpret the model's predictions, providing both global (feature importance across the entire dataset) and local (feature importance for individual origin-destination pairs) explanations.
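Two of the evaluation metrics named in the methodology, CPC and JSD, can be sketched from their standard definitions. This is a minimal version that assumes flows are given as aligned lists with positive totals; how the paper bins or aggregates flows before computing JSD is not specified here, and the sample numbers are hypothetical.

```python
import math


def cpc(generated, observed):
    """Common Part of Commuters: 2 * sum(min(g, o)) / (sum(g) + sum(o)).

    Equals 1 when the two flow lists match exactly and 0 when they
    have no overlap.
    """
    overlap = sum(min(g, o) for g, o in zip(generated, observed))
    total = sum(generated) + sum(observed)
    return 2.0 * overlap / total if total > 0 else 0.0


def jsd(p, q):
    """Jensen-Shannon divergence (base 2, so the value lies in [0, 1]).

    Inputs are normalized to probability distributions first; both
    must have a positive sum.
    """
    sp, sq = sum(p), sum(q)
    p = [x / sp for x in p]
    q = [x / sq for x in q]
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Terms with a zero numerator contribute nothing.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical observed and generated flows for three destinations.
observed = [120, 30, 50]
generated = [100, 50, 50]

print(cpc(generated, observed))
print(jsd(generated, observed))
```

Higher CPC and lower JSD both indicate generated flows that are closer to the observed ones.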
Key Findings
Deep Gravity significantly outperforms both the classic gravity model and the hybrid models (NG and MFG) across all three countries. The improvements are substantial, especially in densely populated areas. For example, in highly populated regions of interest, Deep Gravity showed a relative improvement in CPC compared to the classic gravity model of up to 66% (Italy), 246% (England), and 1076% (New York State). The model demonstrates robust generalization capabilities, effectively generating realistic flows for geographic areas not included in the training data, as evidenced by the leave-one-city-out validation. The SHAP value analysis reveals that the relative importance of features varies across countries. While distance and destination population are consistently important, the influence of other geographic features is more pronounced in England, suggesting a greater interplay between these factors in shaping mobility patterns. Conversely, in Italy and New York State, the model predictions largely depend on the non-linear relationship between populations and distance. The performance of all models decreases with increasing population density (except for New York State, where a slight increase is observed). However, Deep Gravity exhibits the smallest performance degradation compared to other models. This result is significant because in densely populated areas, the sheer number of potential destinations makes accurate prediction challenging.
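The relative improvements quoted above are easier to read as ratios. The sketch below assumes the usual definition of relative improvement, (model - baseline) / baseline, expressed as a percentage; the helper names are hypothetical, and the only inputs are the percentages reported in the text.

```python
def relative_improvement(cpc_model, cpc_baseline):
    """Relative improvement of the model over the baseline, in percent."""
    return 100.0 * (cpc_model - cpc_baseline) / cpc_baseline


def implied_ratio(improvement_pct):
    """How many times larger the model's CPC is than the baseline's."""
    return 1.0 + improvement_pct / 100.0


# Percentages reported in the text for highly populated regions.
reported = {"Italy": 66.0, "England": 246.0, "New York State": 1076.0}

ratios = {}
max_baseline = {}
for region, pct in reported.items():
    ratios[region] = implied_ratio(pct)
    # CPC cannot exceed 1, so the baseline CPC is at most 1 / ratio.
    max_baseline[region] = 1.0 / ratios[region]
    print(region, round(ratios[region], 2), round(max_baseline[region], 3))
```

Under this definition, a 1076% improvement means a CPC roughly 11.8 times that of the gravity model, which is only possible if the gravity model's CPC in those regions is below about 0.085.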
Discussion
The findings show that Deep Gravity effectively addresses the limitations of traditional gravity models by incorporating a rich set of geographic features and utilizing deep neural networks to capture non-linear relationships. The superior performance of Deep Gravity, particularly in densely populated areas, demonstrates the value of integrating detailed geographic data within a flexible, non-linear model. The model's good generalization ability enhances its applicability to various geographic locations where training data may be limited or absent. The variable importance analysis highlights potential differences in the factors that drive mobility patterns across different countries. Future research could focus on further refining the model by incorporating additional features (e.g., travel time by different modes of transportation, road network details, socio-economic information) and exploring more sophisticated explanation techniques tailored to the specifics of human mobility.
Conclusion
Deep Gravity offers a substantial improvement over existing methods for mobility flow generation, especially in densely populated regions. The model's ability to generalize to unseen areas and the insights gained from explainable AI techniques pave the way for developing more realistic and interpretable models of human mobility. Future research could investigate the model's scalability to larger datasets, explore the application of Deep Gravity in various domains, and delve deeper into the geographic and cultural influences on mobility patterns across different regions.
Limitations
While Deep Gravity outperforms existing methods, several limitations should be noted. The model’s performance relies on the availability of detailed geographic data, which might not be readily accessible for all regions. The model assumes that flows from different origin locations are independent, which might not always hold true in practice. Further work is needed to explore the model’s sensitivity to different data sources and preprocessing techniques.