Economics
Eye in outer space: satellite imageries of container ports can predict world stock returns
H. Yu, X. Hao, et al.
The study asks whether real-time satellite imagery of global container ports can predict stock market returns. The motivation comes from two sources. First, asset pricing theory links returns to macroeconomic state variables. Second, many traditional macro indicators, which are lagged and subject to revision, fail to beat simple no-predictability benchmarks. Traditional data are released with delays, revised after publication, publicly available, and low-frequency, which limits their usefulness for daily or weekly forecasting. Advances in AI and remote sensing now enable near-real-time measures of economic activity. Roughly 90% of non-bulk dry cargo is shipped by container, so the number of containers stacked in ports can proxy for supply-chain frictions and economic conditions. An increase signals congestion and lower effective shipping throughput, which presages weaker economic activity and lower future stock returns. The paper tests whether satellite-derived container counts from major ports predict daily returns across 33 global equity indices. It also explores the economic mechanism through links to shipping indicators and industrial production.
The prior return-prediction literature evaluates numerous predictors: the dividend-price ratio and dividend yield, earnings-price ratio, payout ratio, volatility, book-to-market, T-bill rate, term spread, and inflation. Large-sample assessments (Welch and Goyal, 2008; Goyal et al., 2021) find that these predictors typically fail to outperform the historical mean out-of-sample. Satellite data have been used to measure economic variables such as GDP growth and inequality (via night lights) and poverty and sustainable development. In finance, studies show that satellite-based parking lot counts can predict retailer sales and affect price reactions to earnings. Satellite imagery of oil storage has likewise been shown to smooth price responses to government reports. Unlike studies that rely on commercial data providers, this paper builds its dataset from public satellite imagery to examine asset pricing predictability directly. Shipping indicators such as the Baltic Dry Index (BDI) and the RWI/ISL container throughput index are established proxies for global economic activity, but they are published with a lag. The paper positions satellite-based container yard coverage as a higher-frequency, forward-looking complement. This measure may lead those indicators and connect to real output (industrial production).
Data and imagery processing: The authors collect 83,672 multispectral daytime Sentinel-2 images (10 m/pixel) from 2017-01-01 to 2021-11-01. The images cover the top 48 container ports by throughput. Container identification is framed as binary semantic segmentation (container vs. non-container). No standard labeled dataset exists, so 3,711 cloud-free images from 2017 were hand-labeled to create a training set. U-Net models with varied hyperparameters (input size and network depth) were trained and evaluated with 10-fold cross-validation. Performance was best with medium-size inputs (e.g., 480×480) and deep networks. The selected model attains 93.20% accuracy, 92.45% recall, and a 92.81% F-score on a test set. Stack height cannot be observed at Sentinel-2 resolution, so the approach proxies container quantity by the area the containers cover, assuming uniform stack heights.
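A minimal sketch of how pixel-level segmentation metrics of this kind can be computed is shown below. It assumes the U-Net output has already been thresholded into a boolean mask with the same shape as the hand-labeled mask. The function name `segmentation_metrics` is hypothetical. The paper may aggregate over test images differently, for example by averaging per image rather than pooling pixels. The returned container pixel count is the coverage-area proxy that feeds NC_it.

```python
import numpy as np


def segmentation_metrics(pred: np.ndarray, truth: np.ndarray) -> dict:
    """Pixel-level metrics for binary (container vs. non-container) masks."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    tp = int(np.sum(pred & truth))    # container pixels correctly detected
    fp = int(np.sum(pred & ~truth))   # background pixels labeled as container
    fn = int(np.sum(~pred & truth))   # container pixels that were missed
    tn = int(np.sum(~pred & ~truth))  # background pixels correctly rejected
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / pred.size,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
        "container_pixels": int(pred.sum()),  # coverage-area proxy for NC_it
    }


# Toy 4x4 tile: the top row holds containers; the "model" finds 3 of the 4
# container pixels and raises one false alarm in the second row.
truth = np.zeros((4, 4), dtype=bool)
truth[0, :] = True
pred = np.zeros((4, 4), dtype=bool)
pred[0, :3] = True
pred[1, 0] = True
metrics = segmentation_metrics(pred, truth)
print(metrics)  # accuracy 0.875; precision, recall, F-score 0.75; 4 container pixels
```

The F-score is the harmonic mean of precision and recall. Together with the reported recall, it therefore pins down the implied precision.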
Container indicator: For each port i and date t, container pixels are counted to produce NC_it. Cloud cover makes observation intervals irregular, so the indicator is standardized as the daily average growth in the number of containers: GNC_it = [log(NC_it) − log(NC_is)]/(t − s), where s is the most recent clear-image date before t. A higher GNC indicates more containers stacked in the yard (greater congestion) and is hypothesized to predict lower future returns.
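The GNC formula can be computed per port from a sparse series of clear-image counts, as in the sketch below. It assumes counts are strictly positive (a zero count would need special handling before taking logs) and that t − s is measured in calendar days. The function name and the example counts are hypothetical.

```python
import math
from datetime import date


def gnc_series(counts: dict) -> dict:
    """Daily average log growth of container pixel counts for one port.

    counts maps each cloud-free observation date t to NC_t (assumed > 0).
    For every date t after the first, s is the previous clear date and
    GNC_t = [log(NC_t) - log(NC_s)] / (t - s), with t - s in calendar days.
    """
    dates = sorted(counts)
    gnc = {}
    for s, t in zip(dates, dates[1:]):
        gap_days = (t - s).days
        gnc[t] = (math.log(counts[t]) - math.log(counts[s])) / gap_days
    return gnc


# Hypothetical pixel counts with a cloudy stretch between Jan 3 and Jan 8.
obs = {
    date(2019, 1, 1): 1000,
    date(2019, 1, 3): 1100,  # +10% over a 2-day gap
    date(2019, 1, 8): 1050,  # decline spread over a 5-day gap
}
for day, value in gnc_series(obs).items():
    print(day, round(value, 5))
```

Dividing by the gap length puts a two-day change and a five-day change on the same per-day scale. This is what allows cloudy gaps of different lengths to be compared.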
Stock returns and alignment: Daily returns for 33 market indices from 28 countries (2017–2021) are sourced from Wind. To avoid look-ahead bias, Sentinel-2 UTC timestamps are converted to local market time. Only imagery captured within the 24 hours before the market close is used to forecast the next day's close-to-close excess return. The initial in-sample estimation window covers 40% of the sample (2017–2018), and out-of-sample evaluation runs from 2019-01 to 2021-11. Robustness to alternative initial windows is discussed in the appendix.
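The look-ahead guard can be sketched as a window check in local market time. The sketch assumes an image is eligible for a given close if it falls in the interval (close − 24h, close]. The function name, the image timestamp, and the use of a fixed UTC offset are illustrative assumptions.

```python
from datetime import date, datetime, time, timedelta, timezone, tzinfo


def usable_for_close(image_utc: datetime, trade_date: date,
                     close_time: time, market_tz: tzinfo) -> bool:
    """True if an image was captured in the 24 hours up to a market's close.

    The UTC acquisition time is converted to the market's local time and
    checked against the window (close - 24h, close]. An image passing the
    check may inform the forecast of the close-to-close return that starts
    at this close, so no post-close imagery leaks into the forecast.
    """
    image_local = image_utc.astimezone(market_tz)
    close_local = datetime.combine(trade_date, close_time, tzinfo=market_tz)
    return close_local - timedelta(hours=24) < image_local <= close_local


# Shanghai is UTC+8 year-round, so a fixed offset is exact there.
SHANGHAI = timezone(timedelta(hours=8))
img = datetime(2020, 3, 16, 2, 40, tzinfo=timezone.utc)  # 10:40 local time

print(usable_for_close(img, date(2020, 3, 16), time(15, 0), SHANGHAI))  # True
print(usable_for_close(img, date(2020, 3, 17), time(15, 0), SHANGHAI))  # False
```

For markets that observe daylight saving time, pass `zoneinfo.ZoneInfo("America/New_York")` (Python 3.9+) instead of a fixed offset so the conversion tracks the clock change. This sketch ignores weekends and exchange holidays.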
Forecasting framework: For each port, a univariate predictive regression of the excess return r_{t+h} on X_t (an intercept and that port's GNC-based predictor) is estimated by OLS over an expanding window, producing one-step-ahead forecasts. The 48 port-level forecasts are then combined by equal-weight averaging to form the final forecast. The evaluation uses five tools:

- the out-of-sample R^2 (R^2_oos) relative to the historical-mean benchmark;
- Clark–West tests for nested models;
- a bootstrap-based Diebold–Mariano (DM) test, using a stationary bootstrap with optimal block length and 2,000 resamples, to address small-sample size distortions;
- the Success Ratio (SR), with Pesaran–Timmermann (PT) tests for directional accuracy;
- the cumulative sum of squared error differences (CSSED), to assess the stability of performance over time.
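A compact sketch of this pipeline for h = 1 follows. It assumes daily arrays aligned so that x[t] is a port's GNC predictor known at day t and r[t] is that day's excess return. Function names are hypothetical, the data are simulated, and the bootstrap DM test is omitted. The Clark–West statistic uses the simple t-test form; for h > 1 it would normally need HAC standard errors.

```python
import math
import numpy as np


def ols_forecasts(r, x, n_init):
    """Expanding-window OLS forecasts of r[t] from x[t-1], for t >= n_init."""
    out = []
    for t in range(n_init, len(r)):
        # Fit r[s+1] = a + b * x[s] on all pairs observable before date t.
        X = np.column_stack([np.ones(t - 1), x[:t - 1]])
        (a, b), *_ = np.linalg.lstsq(X, r[1:t], rcond=None)
        out.append(a + b * x[t - 1])
    return np.array(out)


def historical_mean_forecasts(r, n_init):
    """Benchmark: mean of all returns observed before date t."""
    return np.array([r[:t].mean() for t in range(n_init, len(r))])


def combination_forecast(r, predictors, n_init):
    """Equal-weight average of one-predictor forecasts (one column per port)."""
    per_port = [ols_forecasts(r, predictors[:, k], n_init)
                for k in range(predictors.shape[1])]
    return np.mean(per_port, axis=0)


def r2_oos(actual, model, bench):
    """Out-of-sample R^2 of the model relative to the benchmark forecast."""
    return 1.0 - np.sum((actual - model) ** 2) / np.sum((actual - bench) ** 2)


def clark_west(actual, model, bench):
    """Clark-West MSFE-adjusted t-statistic and one-sided p-value (h = 1)."""
    f = (actual - bench) ** 2 - ((actual - model) ** 2 - (bench - model) ** 2)
    t_stat = f.mean() / (f.std(ddof=1) / math.sqrt(len(f)))
    return t_stat, 0.5 * math.erfc(t_stat / math.sqrt(2))  # 1 - Phi(t)


# Simulated data (not the paper's): port 0 carries signal, ports 1-2 are noise.
rng = np.random.default_rng(0)
T, n_init = 600, 240
gnc = rng.normal(size=(T, 3))
r = rng.normal(size=T)
r[1:] += -0.5 * gnc[:-1, 0]      # higher GNC today -> lower return tomorrow
actual = r[n_init:]
combo = combination_forecast(r, gnc, n_init)
bench = historical_mean_forecasts(r, n_init)
print("R2_oos:", r2_oos(actual, combo, bench))
print("Clark-West t, p:", clark_west(actual, combo, bench))
```

Equal weighting shrinks each port's forecast toward the benchmark. This is one common reason forecast combinations are more stable out-of-sample than any single noisy predictor.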
Economic value: Two timing strategies are constructed. (1) An untilted strategy goes long the index (financed by bills) when the forecast exceeds the historical mean and goes short otherwise. (2) A tilted strategy switches to a short position only when the forecast falls in the bottom quartile. Position sizes are scaled by z-scores of the forecasts. The z-score is the forecast's deviation from the historical mean (untilted) or from the 25th percentile (tilted), divided by the prevailing standard deviation. Both strategies include a one-day trading lag to reflect data processing delays. Performance is compared with a buy-and-hold benchmark using annualized mean returns and Sharpe ratios.
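The sketch below illustrates the position sizing and the one-day lag under several stated assumptions:

- The "prevailing standard deviation" is taken as the standard deviation of forecasts observed so far.
- Weights are uncapped.
- A strategy's excess return equals its weight times the index excess return.
- `min_hist` is a hypothetical warm-up length.

The data are simulated purely to exercise the functions and say nothing about the paper's results.

```python
import numpy as np


def timing_weights(forecasts, bench, tilted=False, min_hist=20):
    """Z-score position sizes; positive means long the index, negative short.

    Untilted: (forecast - historical mean) / sd of forecasts so far.
    Tilted:   (forecast - 25th percentile of forecasts so far) / same sd,
              so the position turns short only in the bottom quartile.
    """
    w = np.zeros(len(forecasts))
    for i in range(min_hist, len(forecasts)):
        hist = forecasts[:i + 1]                 # forecasts known at date i
        center = np.percentile(hist, 25) if tilted else bench[i]
        sd = hist.std(ddof=1)
        w[i] = (forecasts[i] - center) / sd if sd > 0 else 0.0
    return w


def lagged_strategy_returns(w, target, lag=1):
    """Excess returns when positions start `lag` days after the signal.

    target[i] is the excess return that forecast i refers to; with lag=1 the
    position chosen at date i is applied to target[i + 1].
    """
    return w[:len(w) - lag] * target[lag:]


def annualized_stats(excess, periods=252):
    """Annualized mean excess return and Sharpe ratio."""
    mean = excess.mean()
    return mean * periods, mean / excess.std(ddof=1) * np.sqrt(periods)


# Simulated inputs (not the paper's data).
rng = np.random.default_rng(1)
n = 750
forecasts = 0.0003 + 0.0005 * rng.normal(size=n)
bench = np.full(n, 0.0003)                       # stand-in historical mean
target = 0.01 * rng.normal(size=n)               # daily index excess returns

for label, w in [("buy-and-hold", np.ones(n)),
                 ("untilted", timing_weights(forecasts, bench)),
                 ("tilted", timing_weights(forecasts, bench, tilted=True))]:
    mean, sharpe = annualized_stats(lagged_strategy_returns(w, target))
    print(f"{label:>12}: mean {mean:.2%}, Sharpe {sharpe:.2f}")
```

Slicing with `len(w) - lag` rather than `-lag` keeps `lag=0` working, because `w[:-0]` would be empty.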
Mechanism tests: Aggregate GNC (sum across 48 ports) is compared to shipping indicators through predictive regressions of changes in container throughput (RWI/ISL) and BDI at horizons of 1–4 months, controlling for their own lags. Additional regressions examine whether GNC predicts industrial production growth (ΔIP) across 28 countries using autoregressive distributed lag models at horizons up to 6 months. Interaction terms with a COVID-19 dummy (post-2019-12) test for structural changes in predictive relations.
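The sketch below shows one such autoregressive distributed lag regression for industrial production growth. The regressors are the target's own lags, the GNC predictor, a COVID-19 dummy, and their interaction, estimated by OLS with Newey–West standard errors. Several elements are assumptions, not the paper's specification: the lag order p, the Newey–West lag length, the dummy's main effect, its start month, and how daily GNC is aggregated to monthly frequency. The data are simulated.

```python
import numpy as np


def adl_design(y, x, covid, h, p):
    """Rows for y[t+h] ~ 1 + y[t..t-p+1] + x[t] + covid[t] + x[t]*covid[t]."""
    rows, target = [], []
    for t in range(p - 1, len(y) - h):
        own_lags = [y[t - j] for j in range(p)]
        rows.append([1.0, *own_lags, x[t], covid[t], x[t] * covid[t]])
        target.append(y[t + h])
    return np.array(rows), np.array(target)


def ols_newey_west(X, y, max_lag):
    """OLS coefficients with Newey-West (Bartlett kernel) standard errors."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    scores = X * (y - X @ beta)[:, None]          # x_t * residual_t
    S = scores.T @ scores
    for lag in range(1, max_lag + 1):
        gamma = scores[lag:].T @ scores[:-lag]
        S += (1 - lag / (max_lag + 1)) * (gamma + gamma.T)
    cov = XtX_inv @ S @ XtX_inv
    return beta, np.sqrt(np.diag(cov))


# Simulated monthly series for one country, 2017-01 to 2021-11 (59 months).
rng = np.random.default_rng(2)
T, h, p = 59, 4, 2
gnc = rng.normal(size=T)                          # aggregate monthly GNC
covid = (np.arange(T) >= 36).astype(float)        # assumed: 1 from 2020-01
ip_growth = 0.5 * rng.normal(size=T)
ip_growth[h:] += -0.4 * gnc[:-h]                  # GNC lowers IP growth h months later

X, y = adl_design(ip_growth, gnc, covid, h, p)
beta, se = ols_newey_west(X, y, max_lag=h)
k = 1 + p                                         # column of x[t]
print("GNC coef, t:", beta[k], beta[k] / se[k])
print("GNC x COVID coef, t:", beta[k + 2], beta[k + 2] / se[k + 2])
```

A significant interaction coefficient would indicate that the GNC–IP relation shifted after the dummy switches on. This is the type of structural change the COVID-19 tests look for.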
- Predictability of returns: The equal-weighted combination of port-based forecasts yields positive out-of-sample R^2 across all 33 markets at horizons up to 5 days. At h=1 day, R^2_oos is positive in all markets and statistically significant in 27 out of 33 at the 10% level (average daily R^2_oos ≈ 0.0529%). Predictability persists at longer horizons with average R^2_oos around 0.05%.
- Statistical significance: Bootstrap-based Diebold–Mariano tests indicate significant forecast improvements over the historical mean in 29/33 markets at h=1, 22/33 at h=2, 26/33 at h=3, 32/33 at h=4, and 24/33 at h=5 (10% level).
- Directional accuracy: Success ratios exceed 0.5 in 27 markets at h=1; 23 of these are significant via PT tests. Directional predictability remains above chance at longer horizons with some attenuation.
- Stability over time: CSSED plots show predominantly positive slopes during 2019–2021, indicating consistent outperformance, with marked jumps around March 2020 (COVID-19 onset), especially in U.S. and European markets.
- Economic value: Timing strategies based on GNC forecasts outperform buy-and-hold. The untilted strategy averages a 14.85% annual return (Sharpe ratio 1.16) and beats the benchmark in 26/33 markets. The tilted strategy averages a 16.38% annual return (Sharpe ratio 1.19) and beats the benchmark in 30/33 markets.
- Mechanism via shipping indicators: GNC significantly leads traditional shipping measures at the 2-month horizon. A one-standard-deviation increase in GNC predicts a 27.2% decrease in changes in the RWI/ISL container throughput index and a 26.1% decrease in changes in the BDI (significant at the 5% level for h = 2 months). The COVID-19 interaction terms do not significantly alter these relations.
- Mechanism via real activity: GNC negatively predicts industrial production growth across countries. At a 4-month horizon, 27 of 28 countries show negative coefficients, 15 of which are statistically significant. The average IP growth across countries also shows a significant negative relation at h=4 months, with enhanced predictability during COVID-19 (significant negative COVID interaction).
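The success-ratio, Pesaran–Timmermann, and CSSED diagnostics behind these results can be sketched as follows. The sketch assumes a "success" means the forecast and the realized excess return share the same sign; the paper's exact direction rule is not restated here. The PT statistic follows the standard Pesaran and Timmermann (1992) formula. Function names and the toy numbers are hypothetical.

```python
import math
import numpy as np


def success_ratio_pt(actual, forecast):
    """Success ratio and Pesaran-Timmermann test of sign predictability."""
    y = np.asarray(actual) > 0
    x = np.asarray(forecast) > 0
    n = len(y)
    sr = float(np.mean(y == x))                    # share of correct signs
    py, px = float(y.mean()), float(x.mean())
    p_star = py * px + (1 - py) * (1 - px)         # hit rate under independence
    var_p = p_star * (1 - p_star) / n
    var_p_star = ((2 * py - 1) ** 2 * px * (1 - px) / n
                  + (2 * px - 1) ** 2 * py * (1 - py) / n
                  + 4 * py * px * (1 - py) * (1 - px) / n ** 2)
    denom = var_p - var_p_star
    if denom <= 0:          # e.g. the forecast never changes sign
        return sr, float("nan"), float("nan")
    pt = (sr - p_star) / math.sqrt(denom)
    return sr, pt, 0.5 * math.erfc(pt / math.sqrt(2))   # one-sided p-value


def cssed(actual, model, bench):
    """Cumulative sum of (benchmark squared error - model squared error).

    A rising path means the model is beating the historical mean.
    """
    actual, model, bench = map(np.asarray, (actual, model, bench))
    return np.cumsum((actual - bench) ** 2 - (actual - model) ** 2)


# Toy check: signs predicted correctly 3 times out of 4.
actual = [0.010, -0.020, 0.015, -0.005]
forecast = [0.002, -0.001, 0.001, 0.003]
print(success_ratio_pt(actual, forecast))
print(cssed(actual, forecast, [0.0] * 4))
```

A success ratio above 0.5 is not enough on its own: the PT test compares it with the hit rate expected if forecast and return signs were independent. That expected rate can exceed 0.5 when both series lean positive.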
The findings demonstrate that satellite-derived measures of container yard congestion contain forward-looking information about macroeconomic conditions that is not immediately embedded in prices. By capturing real-time supply-chain frictions and effective shipping capacity constraints, the GNC indicator anticipates declines in shipping throughput, freight rates, and industrial production, which translate into lower expected stock returns. The increased predictability during the COVID-19 period aligns with heightened and sudden disruptions in global logistics and domestic demand, especially in U.S. and European markets, where lockdowns and supply-chain frictions were particularly acute. The results are consistent with market efficiency under costly information acquisition: specialized processing of public satellite data entails costs and expertise, limiting the immediate incorporation of this information into prices and producing short-horizon predictability. Satellite data improve the timeliness and frequency of economic signals compared with lagged official indicators, potentially reducing information asymmetries and enhancing price informativeness as these signals diffuse through markets. Delays in data acquisition, processing, and interpretation plausibly explain the multi-day horizon of predictability. Overall, the study underscores the economic significance of alternative, high-frequency public satellite data in empirical asset pricing and investment management.
Satellite imagery of global container ports, processed with deep learning to estimate container coverage, provides a timely proxy for real economic activity that significantly predicts global equity index returns. The predictive power is statistically robust, economically meaningful, and stronger during periods of acute supply-chain disruption such as COVID-19. The aggregate container indicator leads traditional shipping metrics (RWI/ISL, BDI) and predicts industrial production, offering a plausible mechanism linking port congestion to expected returns. The study highlights the value of public satellite data for investors and policy analysts. Future research could extend coverage beyond the top 48 ports, incorporate higher-resolution or multisensor data to infer stack heights and yard operations, explore nonlinear/machine learning forecast combinations, integrate additional alternative data (AIS vessel tracks, port call logs), and assess transaction costs, implementation frictions, and real-time processing pipelines to translate signals into executable strategies.
- Proxy limitation: Container coverage area assumes uniform stack heights and cannot observe vertical stacking due to Sentinel-2 resolution, introducing measurement error in container counts.
- Data gaps and timing: Cloud cover and processing pipelines create irregular observation intervals and delays; imagery publication and processing often lag by hours to days, reducing real-time immediacy.
- Port coverage: The dataset focuses on 48 major ports; results may not generalize to smaller ports or regions with different logistics dynamics.
- Model training period: U-Net is trained on 2017 labels; domain shifts over time could affect segmentation accuracy despite tests for stability.
- Implementation frictions: The portfolio analysis abstracts from transaction costs, shorting constraints, financing costs, and market impact, which could reduce realized returns.
- Public diffusion: Alternative channels (e.g., text/news, proprietary datasets) may partially transmit similar information before satellite-derived signals are tradable, diluting alpha.
- Macro confounds: While mechanism tests are supportive, causal identification between GNC and economic activity is not fully established, and unobserved factors may drive both.