
Multi-annual prediction of drought and heat stress to support decision making in the wheat sector
B. Solaraju-Murali, N. Gonzalez-Reviriego, et al.
This research by Balakrishnan Solaraju-Murali and colleagues evaluates how well decadal climate forecasts predict drought and heat stress conditions relevant to wheat production worldwide. The findings indicate that the forecasts, once calibrated, are skillful and reliable over many wheat-growing regions and can therefore support decision making in the agricultural sector.
Introduction
The study addresses the need for usable near-term (decadal) climate predictions to support strategic decisions in climate-sensitive sectors, particularly wheat production. Wheat yields are strongly influenced by droughts and heatwaves, especially during flowering and grain filling. The research question is whether initialized decadal prediction systems can skillfully and reliably predict multi-annual drought and heat stress conditions, expressed through agro-climatic indices (SPEI and HMDI), during the months before wheat harvest worldwide. The motivation stems from stakeholders’ needs (e.g., risk estimates, infrastructure investments, supply chain planning, breeding programs) identified in the EU H2020 MED-GOLD project. The study aims to convert existing decadal climate predictions into user-relevant, calibrated information and to evaluate their forecast quality (skill and reliability) over global wheat-harvesting regions.
Literature Review
The paper situates its work within advances in decadal climate prediction and sectoral climate services. Prior studies have demonstrated skill in decadal predictions relevant to extremes and sectoral applications (e.g., Smith et al. 2019; Merryfield et al. 2020) and have highlighted the need for calibration to obtain trustworthy predictions. The impacts of drought and heat on wheat yield are well documented (e.g., Zampieri et al. 2017; Dolferus et al. 2011), with heat stress singled out as particularly relevant for wheat (Zampieri et al. 2017). SPEI is a widely used drought index whose values are sensitive to how PET is estimated (Vicente-Serrano et al. 2010; Beguería et al. 2014; Sheffield et al. 2012). Forecast quality is evaluated with the ranked and continuous ranked probability scores (RPS/CRPS); their fair skill-score versions (FRPSS/FCRPSS) correct the sensitivity of these scores to ensemble size (Müller et al. 2005; Weigel et al. 2007; Ferro 2014). Signal-to-noise and ensemble-size considerations motivate future multi-model approaches (Scaife & Smith 2018).
Methodology
Data and forecasts: The study uses CESM-DPLE initialized decadal hindcasts (NCAR) with 10-year simulations initialized on November 1 each year from 1960 to 2014, forced with CMIP5 historical forcings pre-2015 and RCP8.5 thereafter. There are 40 ensemble members at ~1° resolution. Analysis focuses on forecast years 1–5 over global wheat-harvesting regions. To assess initialization value, comparisons are made against non-initialized CESM Large Ensemble (CESM-LE) historical simulations with the same external forcings but without contemporaneous state initialization.
Observational references: JRA-55 reanalysis for 2 m temperature (1958–present) and GPCC v2018 for precipitation. To gauge observational uncertainty, the supplementary material repeats the verification with ERA5 (including the preliminary ERA5 hourly data) for temperature and GPCC for precipitation, with consistent results.
Target regions and timing: Evaluation is performed only over wheat-producing grid boxes identified with the MIRCA2000 crop calendars, and the aggregation windows are aligned with the local harvest month, which varies by region.
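Aligning windows with the crop calendar amounts to choosing, for each grid box, which calendar months feed the accumulation window. The Python sketch below is illustrative only: `window_months` is a hypothetical helper (not part of MIRCA2000 or the study's code), and it assumes the window ends in the month before harvest unless the hypothetical `include_harvest` flag is set.

```python
from __future__ import annotations


def window_months(harvest_month: int, length: int, include_harvest: bool = False) -> list[int]:
    """Calendar months (1-12) of an accumulation window that precedes harvest.

    Hypothetical helper. Assumption: the window ends in the month before the
    harvest month unless include_harvest is True.
    """
    if not 1 <= harvest_month <= 12:
        raise ValueError("harvest_month must be in 1..12")
    if not 1 <= length <= 12:
        raise ValueError("length must be in 1..12")
    end = harvest_month if include_harvest else harvest_month - 1
    # Step backwards from the last month of the window, wrapping past January.
    months = [(end - k - 1) % 12 + 1 for k in range(length)]
    return months[::-1]  # chronological order


# SPEI6 uses a six-month window, HMDI3 a three-month window.
print(window_months(6, 6))  # June harvest: [12, 1, 2, 3, 4, 5]
print(window_months(1, 3))  # January harvest: [10, 11, 12]
```

For a June harvest the six-month window starts in December of the previous calendar year, so windows have to be assembled per forecast year rather than per calendar year.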
Indices:
- SPEI6 (drought): Compute the monthly climatic water balance (precipitation minus PET), accumulate it over the six months before the local harvest month for each year and ensemble member, and average across forecast years 1–5 for each start date. PET is estimated primarily with the Thornthwaite method; sensitivity tests with the Hargreaves and modified Hargreaves methods show little impact on the results. Standardize the multi-annual averages with a three-parameter shifted log-logistic distribution fitted across all ensemble members and start dates; positive values indicate wet conditions, negative values dry conditions.
- HMDI3 (heat stress): Define heatwaves as periods of at least 3 consecutive days with daily maximum temperature above the 90th percentile (computed from a moving-window climatology for 1961–2018). The Heat Magnitude Day Index sums the daily magnitudes of heatwave days during the three months before harvest. Compute HMDI3 for each year, then average across forecast years 1–5 for each start date; values above 0 indicate heatwaves, and larger values indicate more intense heat stress.
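The SPEI6 averaging and standardization steps can be sketched as follows. The three-parameter log-logistic distribution used for SPEI is, up to a reparametrization, the generalized logistic (GLO) distribution, so this sketch swaps in a GLO fit by L-moments (from unbiased probability-weighted moments) and maps the fitted cumulative probability to a standard normal quantile. The function names, the clipping constant `eps`, and pooling all members and start dates into one sample are assumptions for illustration, not the study's code.

```python
import math
from statistics import NormalDist, fmean


def multi_annual_mean(yearly_values, years=5):
    """Average an index over forecast years 1..years for one start date."""
    if len(yearly_values) < years:
        raise ValueError("not enough forecast years")
    return fmean(yearly_values[:years])


def fit_glo(sample):
    """Fit a generalized logistic distribution by L-moments; returns (xi, alpha, k)."""
    x = sorted(sample)
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 values")
    # Unbiased probability-weighted moments b0, b1, b2 (0-based ranks i).
    b0 = fmean(x)
    b1 = sum(i * v for i, v in enumerate(x)) / (n * (n - 1))
    b2 = sum(i * (i - 1) * v for i, v in enumerate(x)) / (n * (n - 1) * (n - 2))
    l1, l2, l3 = b0, 2 * b1 - b0, 6 * b2 - 6 * b1 + b0
    if l2 <= 0:
        raise ValueError("sample has no spread")
    k = -l3 / l2  # shape parameter: minus the L-skewness
    if abs(k) < 1e-6:  # logistic limit
        return l1, l2, 0.0
    alpha = l2 * math.sin(k * math.pi) / (k * math.pi)
    xi = l1 - alpha * (1 / k - math.pi / math.sin(k * math.pi))
    return xi, alpha, k


def glo_cdf(v, xi, alpha, k):
    """Cumulative probability of v under the fitted distribution."""
    if k == 0.0:
        y = (v - xi) / alpha
    else:
        arg = 1 - k * (v - xi) / alpha
        if arg <= 0:  # below the lower bound (k < 0) or above the upper bound (k > 0)
            return 0.0 if k < 0 else 1.0
        y = -math.log(arg) / k
    # Numerically stable logistic function.
    if y >= 0:
        return 1 / (1 + math.exp(-y))
    e = math.exp(y)
    return e / (1 + e)


def spei(values, eps=1e-6):
    """Standardize pooled multi-annual water-balance values to SPEI units."""
    xi, alpha, k = fit_glo(values)
    normal = NormalDist()
    return [normal.inv_cdf(min(max(glo_cdf(v, xi, alpha, k), eps), 1 - eps)) for v in values]
```

Positively skewed samples (k < 0) give the fitted distribution a lower bound; values below it get probability 0, and the `eps` clipping keeps the normal quantile finite.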
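A minimal sketch of the HMDI3 computation for one pre-harvest window follows. It assumes heatwaves are runs of at least three consecutive days above a calendar-day 90th-percentile threshold, and it assumes the daily magnitude follows the Russo et al. heat wave magnitude convention, (Tmax - T25)/(T75 - T25), with T25 and T75 the 25th and 75th percentiles of annual maximum temperature in the reference period. The magnitude formula, the 31-day window that wraps within a 365-day year, and the helpers `calendar_day_threshold` and `hmdi` are all hypothetical choices, not stated in the text above.

```python
def calendar_day_threshold(tmax_by_year, q=0.9, half_window=15):
    """Calendar-day percentile of daily Tmax pooled over years and a moving window.

    tmax_by_year: one 365-value list of daily Tmax per reference year.
    Assumption: 31-day window (half_window=15) that wraps within the same year.
    """
    ndays = 365
    thresholds = []
    for d in range(ndays):
        pool = sorted(
            year[(d + off) % ndays]
            for year in tmax_by_year
            for off in range(-half_window, half_window + 1)
        )
        pos = q * (len(pool) - 1)  # linear interpolation between order statistics
        lo = int(pos)
        hi = min(lo + 1, len(pool) - 1)
        thresholds.append(pool[lo] + (pos - lo) * (pool[hi] - pool[lo]))
    return thresholds


def daily_magnitude(t, t25, t75):
    """Assumed Russo-style magnitude; t25/t75 are percentiles of annual maxima."""
    return max(t - t25, 0.0) / (t75 - t25)


def hmdi(tmax, threshold, t25, t75, min_run=3):
    """Sum of daily magnitudes over heatwave days in one pre-harvest window.

    A heatwave is taken to be a run of at least min_run consecutive days with
    tmax above that day's threshold.
    """
    total = 0.0
    run = []
    for t, thr in zip(tmax, threshold):
        if t > thr:
            run.append(t)
            continue
        if len(run) >= min_run:
            total += sum(daily_magnitude(v, t25, t75) for v in run)
        run = []
    if len(run) >= min_run:  # a heatwave may still be running at the window end
        total += sum(daily_magnitude(v, t25, t75) for v in run)
    return total
```

As with SPEI6, the yearly values would then be averaged over forecast years 1–5 for each start date.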
Calibration: Variance inflation calibration (Doblas-Reyes et al. 2005) is applied in cross-validation at each grid point, adjusting the forecast interannual variance to match the observed variance while preserving the ensemble-mean correlation. Because HMDI is non-Gaussian, a square-root transform is applied to it before calibration; for SPEI6, the standardization already handles its distributional properties. Both indices are calibrated.
Skill and reliability assessment: Compute probabilistic skill with fair Ranked Probability Skill Score (FRPSS) for tercile categories (below-normal, normal, above-normal) and fair Continuous Ranked Probability Skill Score (FCRPSS) for full distributions, using climatology as baseline. Reliability is assessed with reliability diagrams comparing forecast probabilities to observed frequencies. To assess initialization impact, compute skill relative to the calibrated uninitialized historical simulations as baseline. Spatial maps are averaged over forecast years 1–5 for 1961–2018.
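The fair scores can be sketched from Ferro's (2014) definitions. The fair RPS sums fair Brier scores over the cumulative tercile thresholds, subtracting e(m - e)/(m^2 (m - 1)) when e of the m members fall at or below a threshold; the fair CRPS subtracts half the mean absolute difference between distinct member pairs. The climatological tercile reference (cumulative probabilities 1/3 and 2/3) and the function names are assumptions of this sketch.

```python
def fair_rps(member_categories, obs_category, ncat=3):
    """Fair ranked probability score (Ferro 2014) for one forecast.

    member_categories: category index (0..ncat-1) of each ensemble member.
    """
    m = len(member_categories)
    if m < 2:
        raise ValueError("fair scores need at least two members")
    score = 0.0
    for k in range(ncat - 1):  # the last cumulative threshold always contributes zero
        e = sum(1 for c in member_categories if c <= k)
        o = 1.0 if obs_category <= k else 0.0
        score += (e / m - o) ** 2 - e * (m - e) / (m * m * (m - 1))
    return score


def rps_climatology(obs_category, ncat=3):
    """RPS of the climatological forecast (equal probability for each category)."""
    return sum(((k + 1) / ncat - (1.0 if obs_category <= k else 0.0)) ** 2 for k in range(ncat - 1))


def fair_crps(members, obs):
    """Fair CRPS (Ferro 2014) of an ensemble against one observed value."""
    m = len(members)
    if m < 2:
        raise ValueError("fair scores need at least two members")
    spread = sum(abs(x - y) for x in members for y in members)
    return sum(abs(x - obs) for x in members) / m - spread / (2 * m * (m - 1))


def skill_score(scores, ref_scores):
    """1 - mean score / mean reference score; positive means better than the reference."""
    return 1 - sum(scores) / sum(ref_scores)
```

Passing the scores of the calibrated uninitialized simulations as `ref_scores` gives the initialization-impact comparison instead of the comparison against climatology.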
Communication example: The study provides a climate-service-style product (maps of the probability of the most likely tercile for 2014–2018, from the November 2013 start date) and time series for Italian durum wheat regions (Jesi, Ravenna, Foggia), showing tercile thresholds and the ensemble probability calculation (e.g., 28, 12, and 0 of the 40 members in the below-normal, normal, and above-normal SPEI6 categories translate to probabilities of 70%, 30%, and 0%).
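The ensemble probability calculation is a simple count of members per category. In this sketch the tercile thresholds come from a reference sample via `statistics.quantiles`; which sample the study uses to define the terciles, and how values lying exactly on a threshold are assigned, are assumptions here.

```python
from statistics import quantiles


def tercile_bounds(reference):
    """Lower/upper tercile thresholds of a reference sample (assumed method)."""
    t1, t2 = quantiles(reference, n=3, method="inclusive")
    return t1, t2


def tercile_probabilities(members, t1, t2):
    """Fraction of members below, between, and above the tercile thresholds."""
    m = len(members)
    below = sum(1 for v in members if v < t1)
    above = sum(1 for v in members if v > t2)
    return below / m, (m - below - above) / m, above / m


# 28 of 40 members below normal, 12 normal, none above normal
# (the thresholds -0.43 and 0.43 are illustrative values only).
members = [-1.0] * 28 + [0.0] * 12
print(tercile_probabilities(members, -0.43, 0.43))  # (0.7, 0.3, 0.0)
```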
Key Findings
- SPEI6 skill: Unadjusted forecasts show positive FRPSS across most wheat regions, with exceptions in parts of Africa and the Americas; FCRPSS is low in many regions except Iberian Peninsula, South Africa, Western US, Australia, and the Middle East. Calibration substantially increases FCRPSS globally, especially where unadjusted FCRPSS was negative; FRPSS generally increases but with limited improvement in some areas (e.g., Eastern Europe, Middle East). Reliability diagrams indicate reliable below- and above-normal categories globally, further improved by calibration; the normal category is generally unreliable except in Asia and Africa.
- Drivers of SPEI6 predictability: Skill arises from high predictive skill in the underlying variables (six-month accumulated PET and precipitation) and the model’s ability to capture their influence on SPEI6.
- HMDI3 skill: Unadjusted forecasts yield the best FRPSS/FCRPSS over Europe, the Western US, and South Africa, and negative scores in the Eastern US, South America, and Angola. Calibration increases FRPSS in selected regions (e.g., Eastern US, Andean countries, UK, parts of Africa, Northern Australia) and raises FCRPSS broadly, with the biggest gains where unadjusted scores were strongly negative. Skill is linked to the system’s ability to predict maximum temperature in the 3 months before harvest. Reliability for the below- and above-normal categories is good before calibration and improves after it; the normal category tends to be under-confident at high probabilities before calibration and improves after calibration.
- Initialization impact: Relative to uninitialized historical simulations, initialized decadal predictions improve SPEI6 skill over most wheat regions, notably Australia, Central Europe, South Africa, and Eastern US; HMDI3 improvements occur in Western/Central Europe, Central US, South America, South Africa, and southern Australia. Reliability modestly improves with initialization, particularly for Europe and Australia.
- Application example (2014–2018, start Nov 2013): Calibrated forecasts indicate increased likelihood of drought (SPEI6 below-normal) and heatwaves (HMDI3 above-normal) before harvest across many wheat regions; spatial patterns agree well with observed indices. At Jesi, Italy, multi-annual SPEI6 forecasts capture wet-to-dry transitions and slow drought variability; HMDI3 shows an increasing heat stress trend since the early 1990s in both forecasts and observations. Example probability calculation: for 2014–2018 at Jesi, 28/12/0 of 40 members in below/normal/above SPEI6 terciles correspond to 70/30/0% probabilities.
Discussion
The study demonstrates that decadal predictions, once calibrated, provide skillful and reliable probabilistic information on multi-annual drought and heat stress relevant to wheat harvest periods. Calibration is essential to correct systematic variance errors and enhance both skill (FRPSS/FCRPSS) and reliability, particularly for the below- and above-normal categories. Initialization adds value beyond forced, uninitialized simulations, improving skill in several key wheat-growing regions. These findings address the primary research question by confirming the feasibility of converting decadal climate forecasts into actionable agro-climate indices for decision support. The results have practical significance for climate services, enabling risk assessments, strategic planning (e.g., infrastructure, supply contracts), and breeding program prioritization. The approach is transferable to other crops (e.g., maize, rice) and sectors where water and heat stress are critical.
Conclusion
This work provides an end-to-end assessment showing that initialized, calibrated decadal forecasts can skillfully and reliably predict multi-annual drought (SPEI6) and heat stress (HMDI3) conditions across many wheat-growing regions, supporting climate services for strategic decisions in the wheat sector. The methodology—index computation aligned with crop calendars, calibration, and probabilistic verification—offers a template for operational products, illustrated by global maps and site-specific time series. Future research should: (i) expand to multi-model ensembles to increase robustness and ensemble size; (ii) investigate links to low-frequency climate modes (AMV, PDO) to better understand predictability sources; and (iii) continue co-design with stakeholders to refine products, visualization, and communication for user uptake.
Limitations
- Single-model assessment (CESM-DPLE) limits generalizability; multi-model ensembles and larger ensembles are likely to improve reliability and skill.
- Calibration assumes Gaussianity for variance inflation; non-Gaussian indices (e.g., HMDI) require transformations and may still be imperfectly handled.
- PET estimation primarily via Thornthwaite; while sensitivity tests (Hargreaves variants) show limited impact, more advanced methods (e.g., FAO-56 Penman-Monteith) were not assessed and may affect regional results.
- Normal tercile category exhibits low reliability in many regions, reflecting weak driving signals and forecast challenges for near-normal events.
- Dependence on reanalysis/observational datasets (JRA-55, GPCC) introduces observational uncertainties despite cross-checks with ERA5.
- Skill is reported for multi-annual averages over forecast years 1–5 and may differ for other lead-time windows or single-year predictions.