Predicting global patterns of long-term climate change from short-term simulations using machine learning

Earth Sciences


L. A. Mansfield, P. J. Nowack, et al.

Discover how L. A. Mansfield, P. J. Nowack, M. Kasoar, R. G. Everitt, W. J. Collins, and A. Voulgarakis use machine learning to link short-term climate simulations to long-term surface temperature patterns. Their study captures the regional diversity of responses, particularly for aerosol scenarios, where traditional pattern scaling falls short.

Introduction
The study addresses the challenge of rapidly estimating regional patterns of long-term climate change across diverse anthropogenic forcing scenarios despite the high computational cost of running multi-decadal to centennial Global Climate Model (GCM) simulations. Motivated by evidence of characteristic links between short-term and long-term climate responses to different forcers, the authors propose learning a mapping from short-term (first ~10 years) to long-term (quasi-equilibrium after ~70 years) surface temperature responses within a given GCM. The objective is to construct a fast surrogate that can predict spatial long-term responses from cheaper short-term simulations, thereby accelerating scenario exploration and aiding detection, predictability, and attribution of climate change. The work is set in the context of informing realistic emission pathways to meet mitigation targets (e.g., 1.5–2 °C), where regional responses to varied forcers (greenhouse gases vs. aerosols) can differ substantially and are not fully captured by global aggregate metrics.
Literature Review
Prior studies have linked fast (short-term) and slow (long-term) components of climate response to different forcing agents and highlighted spatially similar energy flux perturbations across greenhouse gases and aerosols. Traditional pattern scaling has been widely used to estimate spatial climate response patterns by scaling a reference pattern (often 2xCO2) but is limited by assumptions of linear scaling with forcing and inability to represent diverse, spatially heterogeneous responses, particularly for short-lived aerosols. Data science and machine learning applications in climate science have grown, including statistical learning to extract forced signals and ML-based parameterizations, but prior to this study no work directly predicted spatial long-term responses across a wide range of forcings from short-term simulations. The paper builds on intercomparison and perturbation experiment datasets (e.g., PDRMIP, ECLIPSE) and on emulator methodologies proposed for expensive environmental models, extending them toward spatially resolved, long-term climate predictions.
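To make the pattern-scaling idea discussed above concrete, the snippet below shows the basic rescaling step as a minimal sketch; the function and argument names are illustrative rather than taken from any particular implementation.

```python
import numpy as np

def pattern_scaling(pattern_2xco2: np.ndarray, erf_scenario: float, erf_2xco2: float) -> np.ndarray:
    """Approximate a scenario's long-term response pattern by rescaling a
    reference pattern (here the 2xCO2 response) by the ratio of global mean
    effective radiative forcings. This assumes the spatial pattern is fixed
    and scales linearly with forcing, an assumption that breaks down for
    short-lived, spatially heterogeneous forcers such as aerosols."""
    return pattern_2xco2 * (erf_scenario / erf_2xco2)
```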
Methodology
Data: The authors use 21 long-term simulations from the HadGEM3 GCM (atmosphere, land, ocean, sea ice; 1.875° × 1.25° resolution; 27,840 grid cells) drawn from PDRMIP, ECLIPSE, and the Kasoar et al. studies. Forcings include long-lived GHGs (CO2 ×2, CH4 ×3, CFC-12 ×10), the solar constant (+2%), global aerosol perturbations (e.g., SO4 ×5, BC ×10), and regional aerosol perturbations (e.g., SO4 ×10 over Europe/Asia; regional SO2 removals over the US, Europe, East Asia, and China; NH mid-latitude SO2/BC removals; global removals of SO2/BC/CO; CH4 emissions −20%). Scenarios whose global mean responses did not exceed control variability were excluded to limit noise.

Definitions: The short-term response is the average over simulation years 1–10; the long-term response is a quasi-equilibrium average after discarding the first 70 years (years 70–100 for PDRMIP and Kasoar et al.; years 70–80 for ECLIPSE, given its limited run length). Responses are computed relative to corresponding control simulations.

Learning setup: The target variable is surface temperature at each grid cell. For each grid cell, an independent regression maps the entire short-term global response field x (all grid cells) to the long-term response y at that grid cell, so nonlocal information can contribute to the prediction (a minimal sketch of this setup follows at the end of this section).

Models: (1) Ridge regression with L2 regularization addresses the underdetermined setting (p = 27,840 predictors vs. N ≈ 20 training samples); the regularization strength λ is chosen by cross-validation, with the implementation in scikit-learn. (2) Gaussian process regression (GPR): a Bayesian non-parametric model with linear mean function µ0(x) = βx and squared-exponential covariance kernel C(x, x′) = σ² exp(−||x − x′||² / (2l²)); the hyperparameters (β, σ², l) are optimized via L-BFGS in GPy. For both models, training uses cross-validation and a leave-one-scenario-out scheme: train on all simulations but one, predict the left-out case, and repeat so that each scenario is predicted once.

Benchmark: The pattern-scaling baseline takes the long-term 2xCO2 response pattern as the reference and scales it by the ratio of global mean effective radiative forcing (ERF) between the target scenario and 2xCO2. ERFs are diagnosed from fixed-SST simulations averaged over 5 years.

Evaluation: Errors are computed as (a) area-weighted grid-cell RMSE between predicted and GCM long-term responses, and (b) absolute errors of regional mean and global mean temperatures over ten broad regions (Arctic, North America, South America, Europe, Northwest Asia, East Asia, Southeast Asia/Australia, Northern Africa, Southern Africa, South Asia). Because grid-scale RMSE can be biased by small spatial displacements of otherwise correct patterns, it is complemented by the regional-scale metrics.

Additional analyses: The authors examine the learned Ridge coefficients to identify influential short-term indicators regionally and globally; assess the impact of training sample size by varying the number of training simulations; and test alternative predictors (e.g., 500 hPa temperature and geopotential height, radiative forcing, sea level pressure), dimensionality reduction (regional averaging, PCA), and other ML methods (LASSO, Random Forest).
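The sketch below illustrates the per-grid-cell regression and the leave-one-scenario-out evaluation described above, together with an area-weighted RMSE. It assumes arrays X (scenarios × grid cells of short-term responses), Y (same shape, long-term responses), and per-cell latitudes; the array layout, regularization grid, and helper names are assumptions for illustration, not the authors' code.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

def predict_loso_ridge(X, Y, alphas=(0.1, 1.0, 10.0, 100.0)):
    """Leave-one-scenario-out prediction with one Ridge model per grid cell.

    X: (n_scenarios, n_cells) short-term (years 1-10) response fields.
    Y: (n_scenarios, n_cells) long-term (post-year-70) response fields.
    Each grid cell's long-term response is regressed on the full short-term
    global field, so nonlocal information can contribute to the prediction.
    """
    n_scen, n_cells = X.shape
    Y_pred = np.full_like(Y, np.nan, dtype=float)
    for s in range(n_scen):                      # hold out one scenario at a time
        train = np.delete(np.arange(n_scen), s)
        for c in range(n_cells):                 # independent regression per target cell
            model = RidgeCV(alphas=alphas)       # regularization strength chosen by cross-validation
            model.fit(X[train], Y[train, c])
            Y_pred[s, c] = model.predict(X[s:s + 1])[0]
    return Y_pred

def area_weighted_rmse(y_true, y_pred, lat):
    """Grid-cell RMSE weighted by cos(latitude) as a proxy for cell area."""
    w = np.cos(np.deg2rad(lat))
    w = w / w.sum()
    return float(np.sqrt(np.sum(w * (y_true - y_pred) ** 2)))
```

The GPR variant described above would replace the per-cell Ridge fit with a Gaussian process (linear mean, squared-exponential kernel) fitted in GPy; in practice the grid-cell loop would be vectorized or parallelized, but the nested loops keep the leave-one-scenario-out logic explicit.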
Key Findings
- Predictive skill: Both ML methods (Ridge and GPR) capture the broad spatial features (enhanced Northern Hemisphere and land warming, Arctic amplification) and, critically, the regional diversity of responses to heterogeneous forcings that pattern scaling misses. GPR generally yields the lowest errors among the tested methods.
- Short-lived forcers: For aerosol scenarios with spatially heterogeneous emissions and short lifetimes, the ML methods reproduce a wider range and variability of regional temperature responses than pattern scaling, which underestimates variability because it imposes a fixed reference pattern regardless of the scenario.
- Error magnitudes: Regional absolute errors show a large spread across scenarios because response magnitudes vary. Large absolute errors (e.g., ~1–2 °C for ML and >3 °C for pattern scaling) mostly occur for strongly forced scenarios and can still be small relative to the response magnitude; for weakly forced scenarios, smaller absolute errors can be large relative to the response, making prediction harder. Regional aerosol perturbations tend to be more challenging than long-lived GHG perturbations.
- Early indicators: Analysis of the Ridge coefficients reveals that certain regions' short-term responses are strong indicators of long-term outcomes, including Arctic sea-ice areas, high-altitude regions, primary emission regions, and mid-latitude jet regions. Predictions for Europe rely on remote Arctic signals, underscoring nonlocal influences (see the sketch after this list).
- Data scaling: Increasing the number of training simulations improves mean accuracy and consistency across regions, indicating substantial potential gains from larger datasets.
- Alternatives tested: Surface temperature outperforms the other predictor variables tested; dimensionality reduction via regional averaging or PCA degrades performance; shorter inputs (e.g., the first 5 years only) show promising skill; alternative ML models (LASSO, Random Forest) underperform Ridge and GPR.
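As a rough illustration of how such coefficient maps might be inspected, the sketch below averages the absolute Ridge coefficients over a chosen target region (e.g., European grid cells); the helper name, the fixed regularization strength, and the aggregation choice are assumptions for illustration, not the authors' exact analysis.

```python
import numpy as np
from sklearn.linear_model import Ridge

def influence_map(X, Y, target_cells, alpha=10.0):
    """Mean absolute Ridge coefficient over a set of target grid cells.

    X: (n_scenarios, n_cells) short-term response fields.
    Y: (n_scenarios, n_cells) long-term response fields.
    target_cells: indices of the grid cells in the region of interest.
    Large values in the returned map flag short-term 'early indicator'
    regions (e.g., Arctic sea-ice areas) whose first-decade response is
    informative about the target region's long-term response.
    """
    n_cells = X.shape[1]
    coef_sum = np.zeros(n_cells)
    for c in target_cells:
        model = Ridge(alpha=alpha).fit(X, Y[:, c])
        coef_sum += np.abs(model.coef_)
    return coef_sum / len(target_cells)
```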
Discussion
The study demonstrates that short-term (first decade) spatial temperature responses contain sufficient information to predict long-term (post-70-year) quasi-equilibrium patterns for a wide range of forcing scenarios within a given GCM. This addresses the core objective of enabling fast emulation of long-term climate responses from computationally inexpensive short-term simulations. The superiority of ML over pattern scaling for short-lived, spatially heterogeneous forcers underscores the importance of data-driven approaches that can learn nonlocal and agent-specific patterns rather than relying on a single reference pattern. The identification of early indicator regions (e.g., Arctic sea-ice and high-altitude areas) suggests practical avenues for detection and attribution, leveraging regions with strong and rapid signals to inform broader long-term projections. The framework paves the way for multi-level emulation strategies: emulating short-term responses to diverse forcings and then mapping them to long-term outcomes, substantially reducing training costs for spatially resolved, long-term emulators. The gains in accuracy with added training data highlight the value of coordinated, multi-model, multi-scenario data sharing to enhance generalization and robustness.
Conclusion
This work introduces and validates a machine learning surrogate that predicts spatial patterns of long-term surface temperature response from short-term GCM simulations. Using Ridge regression and Gaussian Process Regression trained on HadGEM3 perturbation experiments, the approach outperforms traditional pattern scaling, particularly for short-lived aerosol forcings with strong regional imprints. The method identifies early indicator regions whose short-term signals are informative of long-term outcomes, offering insights for detection and attribution. The results indicate substantial potential to accelerate climate projections and to enable development of long-term, spatially resolved emulators via multilevel strategies. Future research should expand and diversify training datasets (more scenarios, stronger short-lived forcer perturbations, ensembles to better isolate forced signals), extend to other variables (e.g., precipitation), explore deep learning as data volume grows, and undertake multi-model collaborations to assess transferability and improve generalizability.
Limitations
- Data volume and diversity: Training is limited to 21 simulations from a single GCM (HadGEM3), creating a high-dimensional, low-sample regime (27,840 predictors vs. ~20 samples). This constrains model generalizability and robustness.
- Internal variability: Only single realizations per perturbation are available; both short- and long-term averages contain noise from internal variability, especially in regions like Europe with weaker signal-to-noise, impacting the learned mappings.
- Regional challenges: Some regions (e.g., Europe) are particularly difficult due to large inter-scenario variability and dependence on remote, variable signals (e.g., the Arctic), increasing prediction uncertainty.
- Methodological constraints: Attempts at dimensionality reduction (regional averaging, PCA) degraded performance; the alternative predictors and ML methods tested underperformed relative to the chosen approaches, limiting options under current data constraints.
- Baseline comparison: While ML outperforms pattern scaling for spatial variability, evaluation at grid scale can be sensitive to small spatial displacements; results are therefore complemented but also limited by the chosen error metrics.
- Model dependence: Learned relationships are specific to HadGEM3; cross-model transferability is untested and may require multi-model training and validation.