Artificial intelligence achieves easy-to-adapt nonlinear global temperature reconstructions using minimal local data

Earth Sciences

M. Wegmann and F. Jaume-Santero

Dive into the fascinating world of climate science! This research by Martin Wegmann and Fernando Jaume-Santero introduces a machine learning method using Recurrent Neural Networks to reconstruct climate variability from sparse data, yielding realistic temperature patterns efficiently. Discover how this innovative approach can adapt to various regions and time periods!

Introduction
The study addresses the challenge of reconstructing realistic monthly-to-annual climate variability, which is crucial for understanding and adapting to climate extremes under ongoing global warming. Observational records are sparse further back in time, and paleoclimate proxies are noisy, seasonal, and unevenly distributed, particularly underrepresenting winters. Traditional reconstruction methods often assume stationarity and linearity, and reanalyses require expensive model priors. The authors propose a flexible, low-cost machine learning approach using recurrent neural networks (RNNs) to reconstruct global temperature anomaly fields from sparse local time series, testing whether simple sequence models trained on existing reanalyses and climate model outputs can learn non-linear spatiotemporal relationships and produce robust reconstructions over centuries.
Literature Review
Prior work on climate field reconstructions includes kriging, principal component regression (PCR), and Bayesian algorithms that often rely on linearity and stationarity assumptions and typically emphasize regional reconstructions. Reanalyses assimilate observations into model priors to yield four-dimensional fields but depend on costly GCM outputs and are limited by data availability. GCMs have known deficiencies in representing climate variability magnitude and patterns in key regions. Recent applications of deep learning in geosciences show promise for extracting features from gridded data, forecasting, and emulating physical systems, with some work on reconstructing missing gridded fields and time series, but limited application to reconstructing global fields from sparse local inputs. Meta-heuristic methods have been used to optimize station/proxy network design. These studies highlight gaps: handling nonlinearity, minimizing costs, and leveraging sparse, noisy, and unevenly distributed inputs, motivating the present RNN-based approach.
Methodology
Data and inputs: The authors use three gridded training datasets of monthly 2 m temperature: (1) the NOAA 20th Century Reanalysis v3 (20CRv3; 1851–2015 used), ensemble mean and 80 members; (2) the MPI Grand Ensemble (MPI-GE; 1850–2005), 100 members; and (3) the CESM Last Millennium Ensemble (CESM-LME; 850–2005), 13 members. All are regridded to EKF400v2's T63 grid (1.875°), and monthly anomalies are computed relative to 1951–1980.

Pseudo-stations: Twenty-five pseudo-station locations are selected to mimic a long, realistic historical station distribution drawn from the ISTI database (all in the Northern Hemisphere, many in Western/Central Europe and North America). For each training dataset, nearest-neighbor values at these coordinates are extracted to form 25 monthly anomaly time series.

Training sizes: Two sample sizes are used: N=1,980 months (e.g., 1851–2015 for 20CRv3) and N=20,000 months (randomly sub-sampled from the model ensembles). A sensitivity test with MPI-GE suggests that N=10,000 with 5% dropout avoids overfitting, whereas N=5,000 requires ~20% dropout.

Model architectures: 140 deep learning models are tested across RNN, LSTM, GRU, and 1D CNN architectures, varying the number of layers, neuron counts, and dropout (including recurrent dropout). Overfitting is assessed via validation loss; performance is assessed via MSE and Pearson correlation on 1,000 withheld time steps per training dataset.

Selected model: Based on these evaluations, a simple LSTM is selected for testing: one LSTM layer with 64 units, tanh activation, and 5% dropout, followed by a dense layer producing 18,432 outputs that are reshaped to the 96×192 (T63) grid. The network has 955,232 trainable parameters. Inputs are sequences of the 25 locations, each carrying three features (latitude, longitude, and temperature anomaly). Optimization uses Adam (learning rate 1e-4) with an MSE loss and an 80/20 train/validation split. Five ensemble members are trained and averaged to produce each reconstruction (a minimal sketch of the architecture appears at the end of this section). Training takes under an hour on a CPU laptop; a GPU (e.g., an RTX 3050 Ti) reduces this to 5–10 minutes.

Validation and evaluation: For each training dataset, 1,000 time steps not used in training are reconstructed and compared to the source data using MSE and Pearson correlation. Simple one-layer LSTM/GRU models with small dropout perform best; CNNs perform acceptably only with 20CRv3 multi-member training data.

Testing: Using the selected LSTM, the 25-location EKF400v2 time series (ensemble mean and individual members) for 1602–2003 (N=4,824 months) are input to produce global reconstructions, termed 20CRv3-REC, MPI-GE-REC, and CESM-LME-REC according to the training source. Reconstructions are compared against the EKF400v2 ensemble mean at monthly, seasonal (JJA, DJF), and annual scales via correlation maps, field correlations, and bias assessments. Additional comparisons are made to LMRv2 (annual only) and to a cold-season Bayesian reconstruction (CSR; Oct–May, 1701–1905) for case studies.

Linear baseline: A principal component regression (PCR) reconstruction is trained on MPI-GE (N=20,000) with a calibration period of 1850–2003, the overlap between MPI-GE and EKF400v2. Performance is assessed over the full period (1602–2003), the pre-calibration period (1602–1849), and the calibration period (1850–2003); an illustrative sketch of such a baseline also appears at the end of this section.

Significance testing: Two-sided t-tests are used to assess significance in the correlation maps; noise sensitivity is probed by using individual-member EKF400v2 pseudo-stations and averaging 30 reconstructed members.

Computational setup: All data are open-access; preprocessing includes bilinear interpolation to the T63 grid and anomaly computation.
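The selected architecture is simple enough to sketch from the details above. The following is a minimal illustration, assuming TensorFlow/Keras and treating the 25 station series as the sequence axis (the exact input layout is an assumption of this sketch); it is not the authors' released implementation.

```python
# Minimal sketch of the selected one-layer LSTM: 25 pseudo-stations with
# (latitude, longitude, anomaly) features in, a 96 x 192 (T63) anomaly
# field out. TensorFlow/Keras and the input layout are assumptions here.
import tensorflow as tf
from tensorflow.keras import layers

N_STATIONS = 25      # pseudo-station locations (all Northern Hemisphere)
N_FEATURES = 3       # latitude, longitude, temperature anomaly
GRID = (96, 192)     # T63 grid; 96 * 192 = 18,432 outputs

inputs = tf.keras.Input(shape=(N_STATIONS, N_FEATURES))
x = layers.LSTM(64, activation="tanh", dropout=0.05)(inputs)  # 64 units, 5% dropout
x = layers.Dense(GRID[0] * GRID[1])(x)                        # dense layer with 18,432 outputs
outputs = layers.Reshape(GRID)(x)                             # reshape to the global grid
model = tf.keras.Model(inputs, outputs)

# Adam with learning rate 1e-4 and an MSE loss, as reported in the paper.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="mse")

# X: (n_months, N_STATIONS, N_FEATURES) station sequences
# Y: (n_months, 96, 192) target anomaly fields
# model.fit(X, Y, validation_split=0.2)  # 80/20 train/validation split
# Five such members are trained and averaged to produce each reconstruction.
```

Because the precise input layout is not fully specified in this summary, the trainable-parameter count of the sketch need not match the reported 955,232.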
The pipeline is implemented with open-source software; code and data links are provided.
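For comparison, the PCR baseline can be illustrated in a few lines as well. This sketch assumes scikit-learn and the common PCR variant that applies PCA to the predictors before least-squares regression; the authors' exact formulation (e.g., the number of retained components, or whether EOFs of the target fields are used instead) is not stated here, so those choices and the synthetic data are placeholders.

```python
# Illustrative PCR baseline (not the authors' exact formulation): PCA on
# the 25 pseudo-station series, then least-squares regression onto the
# flattened anomaly field. Component count and data are placeholders.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X_train = rng.standard_normal((1000, 25))        # placeholder station anomalies (paper: MPI-GE, N = 20,000)
Y_train = rng.standard_normal((1000, 96 * 192))  # placeholder flattened anomaly fields

pcr = make_pipeline(PCA(n_components=15), LinearRegression())  # retained components: assumed
pcr.fit(X_train, Y_train)

X_test = rng.standard_normal((120, 25))           # stand-in for EKF400v2 inputs (paper: 4,824 months, 1602-2003)
Y_hat = pcr.predict(X_test).reshape(-1, 96, 192)  # reconstructed monthly fields
```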
Key Findings
- Model selection and overfitting: For the small training size (N=1,980), most RNNs overfit unless dropout is ≥20%; with N=20,000, 5% dropout suffices. Simple, single-layer LSTM/GRU models (32–64 neurons for small N; up to 256 for large N) outperform deeper or convolutional models in validation loss and correlation. CNNs perform adequately only with 20CRv3 multi-member training.
- Reconstruction skill (MPI-GE-trained LSTM, N=1,980) against EKF400v2: Global mean Pearson correlations are approximately 0.57 (JJA), 0.62 (DJF), 0.51 (monthly), and 0.71 (annual). Adding pseudo-station noise (using individual EKF400v2 members and averaging 30 reconstructions) lowers correlations to ~0.38 (monthly) and ~0.63 (annual). Increasing training to N=20,000 improves monthly correlations (~0.53) and enhances regional skill (e.g., the western US, equatorial West Africa, the eastern Mediterranean). Southern Hemisphere skill remains lower owing to the lack of stations. (See the evaluation sketch after this list.)
- Spatial/seasonal patterns: NH winter (DJF) reconstructions are more skillful thanks to strong planetary-wave teleconnections; summer (JJA) performance is lower and more sensitive to the training data. The tropical ENSO region shows weak correlations across all training datasets, reflecting challenges in representing teleconnections and boundary SSTs. The central North Atlantic is difficult for all models, least so for 20CRv3-trained ones, likely due to assimilated ship pressure data.
- Time-evolving field correlations (10-yr running): In JJA, 20CRv3-REC > MPI-GE-REC > CESM-LME-REC for most decades; CESM-LME-REC is often insignificant over two-thirds of the period. In DJF, all three perform similarly, with MPI-GE-REC frequently outperforming 20CRv3-REC, indicating a strong representation of winter teleconnections in MPI-GE. Correlation skill generally increases over time, reflecting improved EKF400v2 constraints and training familiarity post-1850.
- Bias characteristics: MPI-GE-REC vs EKF400v2 (1602–2003) shows widespread positive biases, strongest over the Southern Ocean; the Arctic is slightly cooler than EKF400v2 but warmer than LMRv2, pointing to a possible warm Arctic bias in EKF400v2. Discrepancies between EKF400v2 and 20CRv3 (1836–1850) align with MPI-GE-REC differences, suggesting EKF400v2 may be too warm in the European Arctic and too cold over many continents. RNN reconstructions have narrower anomaly distributions and a slight positive bias, underrepresenting cold extremes; 20CRv3-trained reconstructions display a wider distribution than model-trained ones.
- Case studies (early 19th-century cold seasons): MPI-GE-REC reproduces the locations of maxima/minima comparably to EKF400v2 and the CSR, though with slightly reduced magnitudes; 20CRv3 is often the outlier among the datasets.
- Linear vs non-linear (MPI-GE training): The LSTM outperforms PCR globally across all periods, with global mean correlations of ~0.53 vs 0.46 (1602–2003), 0.34 vs 0.22 (1602–1849), and 0.53 vs 0.44 (1850–2003). Regionally, the LSTM excels in the northern extratropics; PCR performs better over the tropics and some extratropical oceans, likely benefiting from calibration to EKF400v2 SST-related teleconnections and systematic ocean biases.
- Efficiency and accessibility: Training plus reconstructing >4,800 months of global fields takes about an hour on a mid-range CPU laptop and minutes on a standard GPU, using open data and open-source tools.
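To make the correlation-based skill scores above concrete, the grid-point evaluation could be computed along the following lines. This is a sketch only: per-cell Pearson correlation over time, summarized as a cosine-latitude-weighted global mean, where the weighting scheme is an assumption since the paper's exact averaging is not given in this summary.

```python
# Sketch of the grid-point skill evaluation: Pearson correlation at each
# cell over the time axis, summarized as a cosine-latitude-weighted global
# mean. The weighting scheme is an assumption of this sketch.
import numpy as np

def correlation_map(recon, target):
    """Pearson r per grid cell; inputs are (time, lat, lon) arrays."""
    ra = recon - recon.mean(axis=0)
    ta = target - target.mean(axis=0)
    num = (ra * ta).sum(axis=0)
    den = np.sqrt((ra ** 2).sum(axis=0) * (ta ** 2).sum(axis=0))
    return num / den

def weighted_global_mean(field, lats):
    """Area-weighted mean of a (lat, lon) field on a regular grid."""
    w = np.cos(np.deg2rad(lats))[:, None] * np.ones_like(field)
    return float((field * w).sum() / w.sum())

# recon, target: (n_months, 96, 192) anomaly fields; lats: (96,) grid latitudes
# r_map = correlation_map(recon, target)
# print(weighted_global_mean(r_map, lats))
```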
Discussion
The results demonstrate that simple sequence models (LSTM/GRU) can learn non-linear spatiotemporal relationships from sparse local time series to reconstruct global monthly temperature anomalies. The approach captures key features of climate variability, particularly in the NH extratropics and during boreal winter where planetary waves facilitate teleconnections, addressing the challenge of limited observations and the need for realistic variability. Seasonal and regional differences reflect both model training sources and baseline uncertainties: data assimilation in 20CRv3 aids summer features; MPI-GE provides strong winter teleconnections; the tropical ENSO region remains challenging due to complex teleconnections and SST boundary condition uncertainties. Compared to PCR, the LSTM produces higher global correlations without needing a calibration period, indicating advantages in handling nonlinearity, time dependence, and extrapolation. Bias analyses suggest that reconstruction biases partly mirror training data characteristics and baseline uncertainties (e.g., EKF400v2 vs LMRv2 discrepancies), highlighting the importance of training data choice and evaluation against multiple baselines. Overall, the method provides a fast, low-cost, and adaptable tool that complements existing linear methods and paleo-reanalyses, expanding access to climate reconstructions and enabling event-scale analyses.
Conclusion
The study introduces a fast, adaptable RNN-based framework that reconstructs global monthly temperature anomalies from sparse local inputs, achieving competitive or superior performance to a PCR baseline while requiring modest computational resources. Simple LSTM architectures trained on open reanalyses and model outputs produce realistic patterns and variability, particularly strong in boreal winter and extratropical regions, and can reconstruct specific historical events. The approach is flexible and extensible to other regions, periods, variables, and input types, potentially including proxies and even non-numeric historical documents. Future improvements include: incorporating more and better-distributed input data; optimizing station/proxy locations via meta-heuristics; adding seasonal and covariate information; extending training periods; more thorough hyperparameter and loss-function tuning; and generating larger reconstruction ensembles. Training data selection matters—20CRv3 aids summer features; MPI-GE excels in winter teleconnections; CESM-LME was less suitable in this application. The RNN method, PCR, and paleo-reanalyses are complementary and could be combined, for example by assimilating RNN reconstructions into paleo-reanalyses to reduce uncertainties.
Limitations
- Sparse and uneven input network: Only 25 pseudo-stations, all in the Northern Hemisphere and clustered in Europe, limiting SH and tropical skill and reducing the representation of local processes, especially in summer.
- Dependence on baseline and pseudo-station construction: Pseudo-stations are sampled from gridded EKF400v2 fields (ensemble mean or members), potentially inheriting its biases and smoothing, which affects magnitudes and extremes. Evaluation relies on imperfect baselines (EKF400v2, LMRv2), complicating absolute skill attribution.
- Training data sensitivities: Small training sizes risk overfitting; performance varies with the training source (e.g., CESM-LME underperforms), and seasonal skill differences reflect properties of the training datasets (assimilation vs. free-running models).
- Magnitude biases and extremes: Reconstructions show narrower anomaly distributions and a slight warm bias relative to EKF400v2, underrepresenting cold extremes, especially far from station locations and over the Southern Ocean.
- Variable/temporal limitations: The study focuses on monthly 2 m temperature anomalies without explicit seasonality inputs; other variables (e.g., precipitation) with strong local, short-term dynamics may be harder to reconstruct. Tropical land climate dominated by daily variability is not captured from monthly anomalies.
- Station selection not optimized: Locations were chosen for realism rather than optimal covariance coverage; meta-heuristic optimization could improve the network design.
- Linear baseline constraint: PCR benefits from a calibration period that partly removes biases, giving it an advantage in some regions (tropics, oceans) and making direct comparisons nuanced.