Artificial intelligence reveals past climate extremes by reconstructing historical records

É. Plésiat, R. J. H. Dunn, et al.

This study by Étienne Plésiat, Robert J. H. Dunn, Markus G. Donat, and Christopher Kadow uses artificial intelligence to reconstruct historical European temperature extremes from 1901 to 2018. Its CNN-based CRAI model infills the gaps in the observational dataset, improving the characterization of climate extremes in support of risk management and policy development.

Introduction

Recent years have seen unprecedented warmth globally and in Europe, with 2022 and 2023 featuring exceptional heat, drought, and wildfire activity. Understanding how recent extremes compare with historical events, and how trends vary regionally, requires long, spatially consistent records of extreme indices. Observation-based extreme indices defined by the Expert Team on Climate Change Detection and Indices (ETCCDI; e.g., TX90p, TN90p, TX10p, TN10p) provide a perspective free of model bias but suffer from missing data, especially before 1960. Traditional spatial infilling methods such as inverse distance weighting (IDW), angular distance weighting (ADW), and Kriging can oversmooth, require variogram choices, or struggle when data are sparse. The study asks: can AI reconstruct European monthly temperature extreme indices over 1901–2018 with improved accuracy and spatial realism compared to conventional methods, enabling better historical context and trend analysis? The authors propose a CNN-based approach (CRAI) trained with transfer learning from CMIP6 simulations. It targets HadEX-CAM, a non-infilled intermediate product of HadEX3, over Europe, to deliver improved reconstructions suitable for extreme event analysis and long-term trends.

Literature Review

Prior work shows that deep learning can outperform classical interpolation for climate data inpainting; notably, AI reconstructions of HadCRUT4 achieved lower RMSE and higher spatial correlation than statistical methods. CNN approaches emphasize pixel-level accuracy, whereas GANs emphasize realism, so the choice depends on the task. Traditional methods have known limitations: ADW depends on angular/spatial proximity and can oversmooth; Kriging better captures spatial variability but requires variogram modeling, is sensitive to outliers, and can be computationally expensive. Recent advances include CNN/GAN inpainting of sea surface temperature (SST), radar, and soil moisture fields, as well as physics-informed methods. Together, these motivate a CNN-based method with partial convolutions tailored to the large, irregular missing regions found in historical extreme indices.

Methodology

Data and targets: The study reconstructs four ETCCDI monthly temperature indices over Europe on a 1.875° × 1.25° grid: TX90p (warm days), TX10p (cool days), TN90p (warm nights), TN10p (cool nights). An intermediate non-infilled dataset, HadEX-CAM, is built from the same station inputs as HadEX3 using the Climate Anomaly Method (CAM): station indices are converted to anomalies (1981–2010), averaged within grid boxes when available, and then converted back to absolute values using the theoretical 10% (36.5 days) climatology, masking any values outside 0–100%. A problematic station (Lugano) was excluded for quality control.

AI architecture (CRAI): CRAI employs a U-Net with partial convolution layers, better suited for reconstructing large, irregular missing regions. Inputs are the masked index fields and corresponding binary masks. Training minimizes MAE on missing regions over land only, with a rescaled sigmoid on output to constrain results to 0–100%. Stochastic gradient descent is used with learning rate 5×10⁻⁵ up to 1,000,000 iterations. A hyperparameter grid search (on TX90p) determined the network depth and channels; the same configuration is used for all indices. For each index, 20 independently trained models are ensembled (mean) and the ensemble spread provides uncertainty.

Training data and transfer learning: Because HadEX-CAM is sparse, training uses transfer learning on monthly ETCCDI indices computed from 45 historical runs of 8 CMIP6 models (1901–2014), remapped to the HadEX-CAM grid after index calculation. Masks sampled from HadEX-CAM are applied to complete CMIP6 fields to create training inputs. Data are split randomly into train/validation/test due to low month-to-month autocorrelation: 50,616 train (37/month), 9,576 validation (7/month), 1,368 test (1/month).

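
To make the partial-convolution idea concrete, here is a minimal single-channel sketch in plain Python. It assumes the formulation of Liu et al. (2018), in which each output is renormalized by the share of valid cells under the kernel and the mask is updated to mark cells that could be filled. The source states only that CRAI uses a U-Net with partial convolution layers, so the function name `partial_conv2d`, the zero padding, and the toy averaging kernel are illustrative assumptions rather than the authors' implementation.

```python
def partial_conv2d(x, mask, kernel, bias=0.0):
    """Single-channel partial convolution with "same" output size.

    x      : 2-D list of floats (values at masked-out cells are ignored)
    mask   : 2-D list of 0/1 flags (1 = valid observation, 0 = missing)
    kernel : 2-D list of weights with odd side lengths
    Returns (output, updated_mask).
    """
    rows, cols = len(x), len(x[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    window_size = kh * kw
    out = [[0.0] * cols for _ in range(rows)]
    new_mask = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc, n_valid = 0.0, 0
            for di in range(kh):
                for dj in range(kw):
                    r, c = i + di - ph, j + dj - pw
                    # Cells outside the grid or flagged missing contribute nothing.
                    if 0 <= r < rows and 0 <= c < cols and mask[r][c]:
                        acc += kernel[di][dj] * x[r][c]
                        n_valid += 1
            if n_valid > 0:
                # Rescale by window_size / n_valid so sparse windows are not damped.
                out[i][j] = acc * window_size / n_valid + bias
                new_mask[i][j] = 1  # at least one valid input: cell is now filled
    return out, new_mask


# Toy example: a constant 10% field with the centre cell missing.
field = [[10.0, 10.0, 10.0],
         [10.0, -999.0, 10.0],  # placeholder value, ignored because mask is 0
         [10.0, 10.0, 10.0]]
mask = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
kernel = [[1.0 / 9.0] * 3 for _ in range(3)]
out, new_mask = partial_conv2d(field, mask, kernel)
print(round(out[1][1], 6), new_mask[1][1])
```

In this toy case the missing centre cell is filled from its eight valid neighbours, and the updated mask marks it as valid for the next layer. Stacking such layers is what lets holes shrink progressively through a network.
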
Evaluation datasets and metrics: Generalization is assessed on three types of data not used in training: (i) simulation test set (CMIP6), (ii) reanalysis (ERA5; 1940–2018 primary, with 20CRv3 for early 20th-century comparisons), and (iii) observations-based HadEX-CAM with additional masking stress test (retaining only grid points valid in January 1901 at all times). Metrics include RMSE (percent), Spearman rank-order correlation coefficient (SROCC), Wasserstein distance (WD), and R², computed on reconstructed values only; SROCC is averaged over time steps.

Comparative methods: Baselines include IDW (power=4) and ordinary Kriging (exponential variogram; 200 bins), with PyKrige-based implementation and hyperparameter tuning on the test set. State-of-the-art diffusion models are trained per index using guided-diffusion code with 2000 diffusion steps, a U-Net (channels 128, 256, 512, 256, 128, 1; 3 encoder/decoder layers), batch size 16, learning rate 1×10⁻⁴, 500k iterations. For diffusion, 20 stochastic runs per dataset are averaged to a single reconstruction.

Uncertainty: The ensemble of 20 CRAI models per index provides spread estimates; standard deviations are mapped to quantify reconstruction confidence, notably for case studies (e.g., September 1911).

Analyses: Performance is compared spatially and temporally (RMSE differences CRAI minus Kriging), and reconstructions are used to compute regional means (Europe; NEU, WCE, MED) and to perform spatial linear trend analyses (1901–2018 and 1980–2018) using the Theil–Sen median of pairwise slopes. Spatial structure realism is assessed via Moran’s I. Case studies of known extremes (Nov 1947 warm spell; Sep 1911 heatwave; Feb 1929 coldwave) cross-reference reanalyses and independent proxies (e.g., French mortality).
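
As an illustration of the IDW baseline, the sketch below estimates a missing grid-cell value from nearby valid points using inverse distance weighting with power 4, the exponent reported in the text. Everything else is an assumption made for the example: the helper names `great_circle_km` and `idw_estimate`, the use of great-circle distance, and the use of all available points rather than a limited neighbourhood.

```python
import math

EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def idw_estimate(points, target, power=4):
    """Inverse-distance-weighted mean of (lat, lon, value) points at target (lat, lon)."""
    lat0, lon0 = target
    num = den = 0.0
    for lat, lon, value in points:
        d = great_circle_km(lat, lon, lat0, lon0)
        if d == 0.0:
            return value  # target coincides with a valid point
        w = 1.0 / d ** power
        num += w * value
        den += w
    if den == 0.0:
        raise ValueError("at least one valid point is required")
    return num / den


# Two equidistant points: the estimate is their plain average.
balanced = idw_estimate([(50.0, 9.0, 10.0), (50.0, 11.0, 20.0)], (50.0, 10.0))
# With power 4, a point 8 times closer carries roughly 4096 times the weight.
dominated = idw_estimate([(50.0, 10.5, 10.0), (50.0, 14.0, 20.0)], (50.0, 10.0))
print(round(balanced, 3), round(dominated, 3))
```

The second call shows why a high power makes the estimate track the nearest observations closely, and also why IDW has little to work with where valid points are far away.
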

Key Findings

Overall performance vs statistical and generative methods:
- Test dataset (CMIP6, masked with HadEX-CAM patterns): CRAI outperforms IDW and Kriging across indices and metrics; diffusion models are close but CRAI is superior in a majority of cases.
  - TX90p: RMSE CRAI 4.29 vs IDW 5.47, Kriging 5.12, Diffusion 5.04; SROCC CRAI 0.85 vs IDW 0.80, Kriging 0.81, Diffusion 0.82.
  - TX10p: RMSE CRAI 5.79 vs IDW 7.26, Kriging 6.96, Diffusion 5.79 (tie); SROCC CRAI 0.88 vs IDW 0.84, Kriging 0.85, Diffusion 0.88 (tie).
  - TN90p: RMSE CRAI 4.17 vs IDW 5.02, Kriging 4.77, Diffusion 4.14 (slightly better); SROCC CRAI 0.84 vs IDW 0.80, Kriging 0.81, Diffusion 0.84 (tie).
  - TN10p: RMSE CRAI 6.20 vs IDW 7.55, Kriging 7.12, Diffusion 6.08 (slightly better); SROCC CRAI 0.86 vs IDW 0.82, Kriging 0.84, Diffusion 0.86 (tie).
- ERA5 (1940–2018; masked similarly): CRAI is best for all indices and metrics, indicating strong generalization to observationally constrained data.
  - TX90p: RMSE 4.39 (best), SROCC 0.87 (best) vs Diffusion RMSE 4.71, SROCC 0.86; Kriging RMSE 5.08, SROCC 0.84; IDW RMSE 5.52, SROCC 0.82; HadEX3 RMSE 6.17, SROCC 0.75.
  - TX10p: RMSE 4.70 (best), SROCC 0.88 (best); Diffusion RMSE 4.88; Kriging 5.31; IDW 5.83; HadEX3 8.55.
  - TN90p: RMSE 4.33 (best), SROCC 0.87 (best); Diffusion RMSE 4.36; Kriging 4.95; IDW 5.26; HadEX3 5.96.
  - TN10p: RMSE 5.24 (best), SROCC 0.87 (best); Diffusion RMSE 5.39; Kriging 5.81; IDW 6.30; HadEX3 9.23.
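
The scores above are computed on reconstructed values only, with SROCC averaged over time steps. The sketch below shows one way such scoring could be organized in plain Python. It assumes a mask convention of 1 = observed and 0 = reconstructed, pools squared errors over all time steps for the RMSE, and uses hypothetical helper names (`spearman`, `score_reconstruction`) and made-up toy values; the authors' exact aggregation may differ.

```python
import math


def _average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman rank correlation (Pearson correlation of the ranks).

    Assumes neither input is constant.
    """
    ra, rb = _average_ranks(a), _average_ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    return cov / math.sqrt(va * vb)


def score_reconstruction(truth_steps, recon_steps, mask_steps):
    """RMSE pooled over reconstructed cells; SROCC averaged over time steps."""
    sq_errors, rhos = [], []
    for truth, recon, mask in zip(truth_steps, recon_steps, mask_steps):
        t = [tv for tv, m in zip(truth, mask) if m == 0]
        r = [rv for rv, m in zip(recon, mask) if m == 0]
        sq_errors.extend((tv - rv) ** 2 for tv, rv in zip(t, r))
        if len(t) >= 2:
            rhos.append(spearman(t, r))
    return math.sqrt(sum(sq_errors) / len(sq_errors)), sum(rhos) / len(rhos)


# Two flattened toy time steps (values in percent); 0 in the mask marks infilled cells.
truth = [[10, 20, 30, 40, 50], [5, 15, 25, 35, 45]]
recon = [[10, 22, 28, 40, 51], [5, 15, 29, 27, 45]]
mask = [[1, 0, 0, 1, 0], [1, 1, 0, 0, 0]]
rmse, srocc = score_reconstruction(truth, recon, mask)
print(round(rmse, 3), round(srocc, 3))
```

Restricting the score to infilled cells matters because observed cells are passed through unchanged and would otherwise flatter every method equally.
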
Robustness under data scarcity: CRAI shows lower RMSE than Kriging across most regions and times; advantages are larger where/when missing data are more prevalent (e.g., Mediterranean, northern Africa; early decades 1901–1960).
Spatial realism: CRAI reduces over-smoothing evident in HadEX3’s ADW fields; Moran’s I indicates CRAI and ERA5 have more intricate spatial structures than HadEX3.
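
For readers unfamiliar with Moran’s I, the following sketch computes it on a small regular grid. Values near +1 indicate a smooth field in which neighbours resemble each other, while values near -1 indicate a checkerboard-like field. The source does not state which spatial weights were used, so the rook (four-neighbour) adjacency with unit weights and the function name `morans_i` are assumptions for illustration.

```python
def morans_i(grid):
    """Moran's I for a complete 2-D grid using rook adjacency and unit weights."""
    rows, cols = len(grid), len(grid[0])
    n = rows * cols
    mean = sum(sum(row) for row in grid) / n
    dev = [[v - mean for v in row] for row in grid]
    den = sum(d * d for row in dev for d in row)
    if den == 0.0:
        raise ValueError("Moran's I is undefined for a constant field")
    num, w_sum = 0.0, 0
    for i in range(rows):
        for j in range(cols):
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                r, c = i + di, j + dj
                if 0 <= r < rows and 0 <= c < cols:
                    num += dev[i][j] * dev[r][c]  # weight w_ij = 1 for neighbours
                    w_sum += 1
    return (n / w_sum) * num / den


# A smooth west-east gradient versus an alternating checkerboard.
smooth = [[float(j) for j in range(4)] for _ in range(4)]
checker = [[float((i + j) % 2) for j in range(4)] for i in range(4)]
i_smooth = morans_i(smooth)
i_checker = morans_i(checker)
print(round(i_smooth, 3), round(i_checker, 3))
```

Under this measure, an over-smoothed interpolated field tends to score closer to +1 than a field that retains local detail.
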
Regional means and agreement: At European scale, datasets broadly agree on increasing warm days (TX90p) and decreasing cool nights (TN10p), consistent with IPCC AR6. Discrepancies are largest where HadEX-CAM is sparsest (e.g., MED region, pre-1960), with HadEX3 differing more from CRAI/ERA5.
Case studies:
- November 1947 warm spell: CRAI and ERA5 show localized Iberian/Western Morocco warmth; HadEX3 overextends warmth into North Africa due to sparse observations and ADW smoothing.
- September 1911 heatwave: CRAI reconstructs TX90p > 40% in southern France with low ensemble uncertainty (<3%); correlation with French departmental excess mortality is significant (SROCC 0.39 TX90p; 0.51 TN90p), supporting CRAI’s spatial depiction; HadEX3 signals are weaker.
- February 1929 coldwave: CRAI shows >80% of days below the 10th percentile across central Europe, with spatial gradients consistent with contemporaneous reports and 20CRv3; HadEX3 less clearly represents regional contrasts (e.g., milder Scandinavia).
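
The ensemble uncertainty quoted in the case studies comes from the spread across independently trained models. A minimal sketch of that step is shown below, using three toy members instead of the study's 20. The function name `ensemble_mean_and_spread` and the toy values are hypothetical, and the choice of population rather than sample standard deviation is an assumption, since the source does not specify it.

```python
import statistics


def ensemble_mean_and_spread(members):
    """Per-cell mean and standard deviation across ensemble members.

    members: list of 2-D grids (lists of lists) with identical shape.
    """
    rows, cols = len(members[0]), len(members[0][0])
    mean = [[0.0] * cols for _ in range(rows)]
    spread = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            cell = [m[i][j] for m in members]
            mean[i][j] = statistics.fmean(cell)
            spread[i][j] = statistics.pstdev(cell)  # assumption: population std
    return mean, spread


# Three toy members on a 1 x 2 grid (values in percent of days).
members = [[[40.0, 12.0]], [[42.0, 12.0]], [[44.0, 12.0]]]
mean, spread = ensemble_mean_and_spread(members)
print(mean[0], [round(s, 3) for s in spread[0]])
```

Cells where the members agree get a spread of zero, while cells where they diverge are flagged as less certain.
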
Trend analysis (1901–2018): CRAI reveals finer spatial heterogeneity than HadEX3. TX90p trends are limited in North Africa and along parts of southern Turkey/Syria, but stronger in central Europe and the Baltic. TN10p shows large decreases in North Africa and western Europe in both datasets, but CRAI indicates additional regional contrasts (e.g., negatives in parts of Ukraine/Romania) and smaller decreases in the Middle East versus HadEX3. TN90p patterns are broadly similar between datasets, with pronounced increases from western Norway to Georgia.
Recent trends (1980–2018): CRAI’s spatial structures resemble ERA5 more than HadEX3, including localized extremes (e.g., around the Black Sea), and show more realistic spatial autocorrelation (Moran’s I).
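
The trend maps rest on the Theil–Sen estimator, the median of all pairwise slopes, applied to each grid cell's time series. The sketch below applies it to a single toy series with one outlier. The function name `theil_sen_slope` and the synthetic values are illustrative assumptions, not data from the study.

```python
import statistics
from itertools import combinations


def theil_sen_slope(times, values):
    """Median of the slopes between all pairs of points with distinct times."""
    slopes = [
        (v2 - v1) / (t2 - t1)
        for (t1, v1), (t2, v2) in combinations(zip(times, values), 2)
        if t2 != t1
    ]
    return statistics.median(slopes)


# Synthetic series rising by 0.5 percentage points per year, with one outlier.
years = list(range(1980, 1990))
values = [10.0 + 0.5 * (y - 1980) for y in years]
values[4] = 40.0  # a single extreme year
slope = theil_sen_slope(years, values)
print(slope, slope * 10)  # per year, per decade
```

Because the median ignores the handful of slopes that involve the extreme year, the estimated trend stays at the underlying rate, which is the main reason this estimator suits noisy index series.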

Discussion

The study demonstrates that AI-based reconstruction (CRAI), trained via transfer learning on CMIP6, can reliably infill historical monthly temperature extreme indices in Europe, outperforming established statistical approaches and matching or surpassing diffusion models at lower computational cost. This answers the core question of whether AI can recover historical extremes and trends with improved fidelity, particularly under the data sparsity typical of the early 20th century. CRAI’s gains are most pronounced where conventional methods falter, namely in periods and regions with scarce observations, yielding more realistic spatial variability, better alignment with reanalyses, and corroboration by independent proxies (e.g., mortality in 1911). Consequently, CRAI reconstructions enable more nuanced assessments of regional climate risk and long-term changes, while remaining consistent with the continent-scale signals documented by IPCC AR6 (more warm days, fewer cool nights). The improved spatial detail aids the interpretation of historical events and trend heterogeneity, which is valuable for regional planning, attribution studies, and policy design. The results suggest that CRAI generalizes across datasets and is robust to sparse masks, supporting its broader application and integration alongside reanalyses as an independent, observations-based product.

Conclusion

This work delivers an AI-infilled dataset of European monthly temperature extreme indices (TX90p, TX10p, TN90p, TN10p) for 1901–2018 that maintains the observational basis of HadEX3 while offering higher spatial fidelity and lower reconstruction error than IDW, Kriging, and, in most cases, diffusion models. CRAI improves reconstruction accuracy especially in data-sparse contexts, clarifies historical extreme events (e.g., 1911 heatwave, 1929 coldwave), and reveals spatially heterogeneous long-term trends consistent with, yet more detailed than, prior assessments. Future research directions include: (1) extending the approach globally and to regions with greater data scarcity; (2) incorporating additional correlated datasets (e.g., HadCRUT5 monthly means) to further constrain reconstructions; (3) applying neural downscaling to enhance spatial detail; and (4) training on daily observations to project to monthly ETCCDI indices, leveraging richer temporal information. These advances can further improve the characterization of climate extremes and support targeted climate risk management and policy.

Limitations

Performance degrades across all methods (including CRAI) under extreme masking and sparse valid data, with reduced metric separation between methods when a single severe mask is applied (e.g., January 1901 mask).
The evaluation using ERA5 is limited to 1940–2018; earlier periods rely on 20CRv3 and proxy/qualitative corroboration.
CRAI is trained on CMIP6-simulated indices (transfer learning), which could introduce model-dependent features, although strong agreement with reanalyses and proxies mitigates this concern.
Trend estimation uses linear Theil–Sen slopes over 1901–2018 to summarize non-linear changes, potentially obscuring temporal variability.
Reconstructions are at monthly index level rather than from daily temperatures, which may limit capturing some sub-monthly dynamics; authors note future work to leverage daily data.
The approach and results are currently limited to Europe; generalization to other regions may face different station densities and climatic regimes.
