Forecasting the evolution of fast-changing transportation networks using machine learning

W. Lei, L. G. A. Alves, et al.

This research explores the dynamics of edge removal in two major transportation networks, the Brazilian bus system and the U.S. air system, using machine learning to predict which connections will be removed. Conducted by Weihua Lei, Luiz G. A. Alves, and Luís A. Nunes Amaral, the study characterizes how these networks behave under external shocks and offers insights for infrastructure planning.

Introduction
The paper addresses how to forecast the evolution of mature, fast-changing transportation networks by predicting which connections (edges) will be removed over time. Transportation networks enable mobility and commerce but facilitate disease spread and contribute significantly to greenhouse gas emissions, with transportation being a large share of U.S. and global emissions and aviation a notable contributor. Anticipated climate policies may drive substantial reconfiguration of transportation networks, creating a need for predictive tools. While network connection dynamics have been studied, temporal edge removal mechanisms in real transportation systems are not well understood because they result from competing business strategies, governmental policies, and contingencies. The authors propose using machine learning to determine whether local network features can predict edge removals, thereby enabling scenario building for infrastructure planning and assessing robustness under shocks like COVID-19.
Literature Review
Prior work on network dynamics includes studies of temporal networks, diffusion, epidemics on networks, and failure processes. Edge addition has often been studied via missing link prediction and growth models, while edge removal has been analyzed in percolation, attack/error tolerance, dismantling strategies, catastrophic and cascading failures, synchronization transitions, pruning underutilized links, among others. However, these processes do not capture bounded edge removals observed in real temporal transportation networks. Machine learning has recently been applied to human mobility, infrastructure sustainability, and pandemic-related demand forecasting, motivating its application here to edge removal prediction. The authors also note that link prediction literature commonly assumes unweighted networks, prompting separate analysis of topological vs weight information.
Methodology
Data: Monthly temporal networks were constructed for two systems.
Brazil Bus net: Brazilian inter-city bus routes (ANTT) from Jan 2005 to Dec 2014 (120 snapshots; ~1,734 nodes and ~18,781 edges on average). Nodes are cities (bus stops). An undirected edge indicates at least one bus route between two cities within the month; edge weights are the number of buses per month.
U.S. Air net: U.S. domestic air transportation (BTS) from Jan 2004 to Dec 2018 (plus Jan 2019–Mar 2021 for the COVID-19 analysis). Nodes are cities (airports). An undirected edge indicates at least one direct airline connection in the month; edge weights are the number of flights. The networks average ~819 nodes and ~6,547 edges across 192 snapshots.
Problem formulation: In snapshot G_m, edges are labeled as added, retained, or removed over the month. Prediction focuses on edges present at the start of the month (retained vs removed), which yields a binary classification problem. For each edge e_ij, features X are computed from G_m and a classifier estimates Prob(e_ij = removed) = f(X).
Train/test design: For a chosen snapshot, 70% of retained and removed edges are sampled for training, with classes balanced by random under-sampling of the majority class. Two testing regimes are used: (i) simultaneous: test on the remaining edges from the same snapshot; (ii) non-simultaneous: test on edges from later snapshots n > m that were not used in training, which probes temporal generalization.
Features: The study evaluates four feature sets for classification: (1) unweighted topological features (11 common link-prediction metrics: CN, Salton, Jaccard, Sørensen, HPI, HDI, LHNI, PA, Adamic–Adar, Resource Allocation, Local Path); (2) weighted topological features; (3) edge weight alone; (4) unweighted topological features plus edge weight. Kolmogorov–Smirnov tests (Bonferroni-corrected α = 0.05/12) show that feature distributions for retained vs removed edges differ significantly (p < 3×10^-4) for all considered features in the Jan 2014 snapshots of both networks.
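The labeling, feature, and balancing steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the toy edge lists, the function names, and the Local Path weight eps = 0.01 are all hypothetical, the indices use their standard textbook definitions, and features are computed on G_m with the edge itself left in place (the summary does not say whether the study removes it first).

```python
import math
import random
from itertools import product

def neighbors(edges):
    """Build an adjacency map {node: set(neighbors)} from undirected edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def edge_features(adj, i, j, eps=0.01):
    """Eleven unweighted local indices for an existing edge (i, j).

    eps is an assumed weight for the length-3 term of the Local Path index.
    """
    ni, nj = adj[i], adj[j]
    ki, kj = len(ni), len(nj)
    common = ni & nj
    cn = len(common)
    # Number of length-3 walks i-u-v-j, i.e. the (i, j) entry of A^3.
    walks3 = sum(1 for u, v in product(ni, nj) if v in adj[u])
    return {
        "CN": cn,
        "Salton": cn / math.sqrt(ki * kj),
        "Jaccard": cn / len(ni | nj),
        "Sorensen": 2 * cn / (ki + kj),
        "HPI": cn / min(ki, kj),
        "HDI": cn / max(ki, kj),
        "LHNI": cn / (ki * kj),
        "PA": ki * kj,
        "AA": sum(1 / math.log(len(adj[z])) for z in common),
        "RA": sum(1 / len(adj[z]) for z in common),
        "LP": cn + eps * walks3,
    }

def label_edges(edges_m, edges_next):
    """Label each edge of G_m: 1 if absent from the next snapshot (removed), else 0."""
    nxt = {frozenset(e) for e in edges_next}
    return {tuple(e): int(frozenset(e) not in nxt) for e in edges_m}

def undersample(labels, seed=0):
    """Randomly drop majority-class edges until both classes have equal size."""
    rng = random.Random(seed)
    removed = [e for e, y in labels.items() if y == 1]
    retained = [e for e, y in labels.items() if y == 0]
    n = min(len(removed), len(retained))
    return rng.sample(removed, n) + rng.sample(retained, n)

# Hypothetical toy snapshots G_m and G_{m+1}.
edges_m = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
edges_next = [("a", "b"), ("b", "c"), ("b", "d"), ("c", "d")]

adj = neighbors(edges_m)
labels = label_edges(edges_m, edges_next)
balanced = undersample(labels)
f_bc = edge_features(adj, "b", "c")
```

In this toy case two of six edges are removed, so the balanced sample keeps two removed and two retained edges. A real pipeline would then draw the 70% training split from the balanced set and pass the feature rows to a classifier.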
Model selection: 27 classification algorithms (from scikit-learn and XGBoost) were benchmarked via stratified 10-fold cross-validation on a balanced training set; balanced accuracy, F1, and ROC-AUC were compared. Eight algorithms achieved high, stable performance (balanced accuracy ~0.6–0.8). XGBClassifier (gradient-boosted trees) was selected for subsequent analyses because of its high accuracy and low variance.
Performance evaluation: Balanced accuracy, F1, and ROC-AUC were computed using scikit-learn. A null model shuffles labels to test whether predictability exceeds chance.
Hyperparameters: A grid search covered learning rate (0.01–0.4), gamma (0–0.2), max_depth (0–10), and n_estimators (0–200). Reported results use XGBoost defaults, which avoid overfitting and underfitting: learning_rate=0.3; n_estimators=100; max_depth=3; objective=binary:logistic; booster=gbtree; gamma=0; min_child_weight=1; subsample=1; colsample_bytree=1; reg_alpha=0; reg_lambda=1; scale_pos_weight=1; base_score=0.5.
Interpretability: SHAP values quantify feature importance and the direction of each feature's effect on predictions across snapshots, highlighting how stable or variable feature rankings are over time.
COVID-19 shock analysis: The U.S. Air net from Jan 2019 to Mar 2021 was used to test prediction robustness under travel restrictions. Simultaneous and non-simultaneous tests were conducted, and feature rankings were tracked.
Long-term forecasting: Two approaches were evaluated for rolling forecasts forward: a model using unweighted topological features and a model using edge weight only. For weight-only forecasts, a regression w_ij(t) = f(w_ij(t-1)) predicts next-month weights; weights for newly added edges are taken from data to avoid modeling additions. Starting from G_m, the model predicts removals for G_{m+1} and then iterates forward; Jaccard similarity between predicted and actual edge sets measures performance over time.
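The evaluation quantities named above can be illustrated with a short pure-Python sketch. It shows balanced accuracy, one simple form of a label-shuffling null baseline, and Jaccard similarity between edge sets. The function names and toy labels are hypothetical, and the permutation scheme here (scoring fixed predictions against shuffled labels) is an assumption: the summary does not specify exactly how the study's null model shuffles labels.

```python
import random

def balanced_accuracy(y_true, y_pred):
    """Mean of recall on the removed class (1) and the retained class (0)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(y_true)
    neg = len(y_true) - pos
    return 0.5 * (tp / pos + tn / neg)

def shuffled_label_scores(y_true, y_pred, n_rounds=200, seed=0):
    """Balanced accuracy of the same predictions against permuted labels."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_rounds):
        shuffled = y_true[:]
        rng.shuffle(shuffled)
        scores.append(balanced_accuracy(shuffled, y_pred))
    return scores

def edge_set_jaccard(predicted, actual):
    """Jaccard similarity between two undirected edge sets."""
    p = {frozenset(e) for e in predicted}
    a = {frozenset(e) for e in actual}
    return len(p & a) / len(p | a)

# Hypothetical labels: 1 = removed, 0 = retained.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]

observed = balanced_accuracy(y_true, y_pred)
null_scores = shuffled_label_scores(y_true, y_pred)
null_mean = sum(null_scores) / len(null_scores)

# Hypothetical forecast vs actual edge sets for one future month.
similarity = edge_set_jaccard(
    [("a", "b"), ("b", "c"), ("c", "d")],
    [("b", "a"), ("c", "d"), ("d", "e")],
)
```

Balanced accuracy is used rather than plain accuracy because removed edges are a small minority in each snapshot; a classifier that always predicts "retained" would score 0.5 on this metric. Comparing the observed score with the spread of the shuffled scores indicates whether the predictions carry more signal than chance.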
Scenario simulation and CO2 estimation: A probabilistic removal process uses a model trained on a known snapshot (e.g., Dec 2018) to assign removal probabilities to edges each month. To simulate policy-driven downsizing, a target edge count N* is imposed and at each step a fraction δN = γ(N_m − N*) of edges is removed according to predicted probabilities, with rates γ ∈ {0.02, 0.04} and targets resulting in overall removal fractions R_γ ∈ {2/3, 4/5}. CO2 emissions are estimated from route-level activity using 2018 average fuel efficiency: Monthly CO2 (tons) = 3.16 × 32.5 g fuel/km × distance (km) × monthly flights × 10^-6 (tons per gram). Spatial implications are visualized alongside Amtrak’s 2035 plan to assess multimodal connectivity tradeoffs.
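The downsizing schedule and the CO2 formula above reduce to a few lines of arithmetic. The sketch below is illustrative only: the starting edge count of 6,000, the 1,000 km route, and the 100 monthly flights are hypothetical inputs, rounding the monthly removal count to whole edges is an assumption, and the step that chooses which edges to remove from the predicted probabilities is left out because the summary does not specify the sampling rule.

```python
def removal_schedule(n_start, n_target, gamma, months):
    """Edge counts under delta_N = gamma * (N_m - N*), rounded to whole edges."""
    counts = [n_start]
    for _ in range(months):
        n_m = counts[-1]
        delta = round(gamma * (n_m - n_target))
        counts.append(n_m - delta)
    return counts

def monthly_co2_tons(distance_km, monthly_flights,
                     fuel_g_per_km=32.5, co2_per_fuel=3.16):
    """Monthly CO2 in tons, using the constants quoted in the text."""
    grams = co2_per_fuel * fuel_g_per_km * distance_km * monthly_flights
    return grams * 1e-6  # grams to tons

# Hypothetical network of 6,000 edges, overall removal fraction R = 2/3.
n_start = 6000
n_target = round(n_start * (1 - 2 / 3))
slow = removal_schedule(n_start, n_target, gamma=0.02, months=120)
fast = removal_schedule(n_start, n_target, gamma=0.04, months=120)

# Hypothetical route: 1,000 km flown 100 times in a month.
route_co2 = monthly_co2_tons(1000, 100)
```

Because each step removes a fixed fraction of the remaining gap to N*, the edge count decays roughly geometrically toward the target; a larger γ reaches any given edge count sooner.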
Key Findings
- Simultaneous prediction (same-month test): Using only unweighted topological features, balanced accuracy averaged 0.65 for Brazil Bus net and 0.70 for U.S. Air net, indicating significant separability of retained vs removed edges from local structure. Adding weighted topological features increased accuracy marginally to 0.69 (Brazil) and 0.71 (U.S.). Using edge weight alone markedly improved accuracy for U.S. Air net to 0.82; combining weight with topological features did not significantly exceed weight alone.
- Non-simultaneous prediction (future months): Brazil Bus net performance collapsed to near chance, indicating temporal instability of removal dynamics and model overfitting to a snapshot. U.S. Air net maintained similar accuracy to simultaneous tests: ~0.70 with unweighted features and ~0.82 with edge weight, demonstrating stable, generalizable removal dynamics.
- Feature importance: For U.S. Air net, edge weight, hub promoted index (HPI), and resource allocation index (RA) consistently had the highest predictive power across snapshots. Low HPI and RA values were associated with higher removal likelihood. Feature rankings were temporally stable for U.S. Air net but unstable for Brazil Bus net, explaining poor generalization in the latter.
- Robustness to COVID-19 shock: Despite a collapse in passenger volumes (70 million in Feb 2020 to 2.87 million in Apr 2020), monthly fractions of removed edges resembled pre-pandemic levels. The model's balanced accuracy for simultaneous and non-simultaneous tests remained similar pre- and post-restrictions, and SHAP-based feature rankings (HPI, RA prominence) persisted, indicating robustness to this exogenous shock.
- Long-term forecasting: Models using only unweighted topological features yielded more stable, consistent predictions across starting months; weight-only models showed better average performance but higher variability, with about 13% of trajectories exhibiting very large errors that degraded to near-random by the end of simulations.
- Scenario forecasting for U.S. Air net downsizing: Under hypothetical reductions (R_γ = 2/3 or 4/5) at rates γ = 0.02 or 0.04, hub-to-hub connections (e.g., Chicago–Boston) had the longest survival times. Edge survival rankings depended on γ and R_γ (Pearson correlation of 0.7 between two scenarios). Projected CO2 emissions decreased relative to 2018, consistent with removal magnitude. Spatially, many Midwestern cities would lose air connections; given limited, non-high-speed rail plans, these losses likely shift trips to automobiles, potentially offsetting emissions gains via increased road travel and infrastructure demands.
Discussion
The study demonstrates that edge removals in transportation networks are not random: in same-month tests, local features (topology and edge weight) separate removed from retained edges with balanced accuracy between 0.65 and 0.82. For the U.S. Air net, consistent importance of edge weight, HPI, and RA suggests a stable removal mechanism: direct hub-to-hub links are preserved, while redundant links to hub-connected cities are more disposable. This stability enables accurate generalization across time and under shocks like COVID-19. In contrast, the Brazil Bus net exhibits time-varying feature importance and removal dynamics, limiting temporal transferability and suggesting stronger influence of shifting operational, regulatory, or economic factors. By enabling long-term forecasts and scenario analyses, particularly when relying on robust unweighted topological features, the approach informs planning under climate policies that may reduce air connectivity. The findings imply that without complementary high-speed rail development, reductions in air links could redirect medium-distance travel to roads, with environmental and infrastructure consequences. The methodology also highlights the value of interpretable ML (SHAP) to uncover structural drivers of network evolution and to inform multi-modal planning.
Conclusion
This work introduces a machine learning framework to predict and forecast edge removals in mature transportation networks using local topological features and, when available, edge weights. The approach achieves high same-month accuracy in both Brazilian bus and U.S. air networks, and strong temporal generalization in the U.S. air network, including during the COVID-19 shock. Feature importance analyses consistently identify edge weight, hub promoted index, and resource allocation index as key predictors in the U.S. air network. Long-term simulations show that unweighted topological features provide more stable forecasts than weight-only models. Scenario analyses illustrate potential network reconfigurations under CO2-reduction policies, highlighting likely persistence of hub-to-hub connections and vulnerability of peripheral links, with implications for multimodal transport planning. Future research directions include expanding the feature set (e.g., demand, fares, costs), modeling multilayer interactions between transportation modes, integrating airline strategic factors and regulatory events, and coupling with models of population shifts and climate impacts to enhance scenario realism.
Limitations
- Feature scope: Only a limited set of topological features was examined. Adding more features may improve prediction. Tests with global measures (edge betweenness, current-flow betweenness) and demographic factors (intercity gravitation flow) did not improve performance in this study.
- Multilayer interactions: The interplay between layers (e.g., air and rail) was not modeled; cross-modal dynamics could influence edge removal/addition patterns.
- Strategic/exogenous factors: Airline strategies (fleet, scheduling, hub optimization), regulatory changes, infrastructure expansions, mergers/bankruptcies, epidemics, and natural disasters were not explicitly modeled; these can alter removal decisions unpredictably.
- Weight forecasting: Models leveraging weights need an additional model to forecast future weights, increasing complexity and potentially reducing long-term robustness due to error accumulation.
- Generalizability across systems: Temporal instability in the Brazil Bus net indicates that snapshot-trained models may not transfer across time in all transportation systems, limiting forecast applicability where dynamics are volatile.