Agriculture
The optimization of model ensemble composition and size can enhance the robustness of crop yield projections
L. Li, B. Wang, et al.
The study addresses how the composition and size of linked climate and crop model ensembles influence projections of crop yield changes under climate change and the associated uncertainties. With climate change threatening global food security and increasing the risk of simultaneous failures across breadbasket regions, accurate and robust yield projections are critical for informing adaptation and policy. Prior multi-model assessments show largely negative impacts on yields, with substantial uncertainty stemming from both climate models (general circulation models, GCMs) and crop models (global gridded crop models, GGCMs). However, the effect of model selection (which GGCMs and GCMs are included) and of ensemble size on the magnitude and attribution of uncertainty remains unclear. This work aims to quantify how ensemble configuration shapes yield outcomes and uncertainty attribution, to determine minimum effective ensemble sizes, and to provide guidance for designing representative and efficient ensembles for global and regional agricultural impact assessments under a high-emissions scenario (SSP585) for the late 21st century.
Previous intercomparisons (e.g., AgMIP GGCMI) using CMIP5/CMIP6 show large uncertainties in projected yield changes, with GGCM structure often dominating the variance in relative yield changes, though shares vary by crop, region, and time (Rosenzweig et al. 2014; Müller et al. 2021; Jägermeyr et al. 2021). Studies found site- or region-specific dominance of uncertainty sources (e.g., Wang et al. 2020 for wheat), and showed that ensemble composition (which models are included) can strongly influence results. On ensemble size, prior work suggested thresholds for climate models to span the projection space (e.g., ≥13 GCMs for the full spread; representative 5-GCM subsets capturing basic classes) and for crop models to ensure skill over historical periods (e.g., ≥8 models for maize in sub-Saharan Africa), but these thresholds do not necessarily translate into adequate sampling of future uncertainty. The literature highlights the sensitivity of projections to CO2 fertilization effects, temperature response functions, and model process representations, and recommends careful model selection, weighting, or sub-selection to avoid redundancy, while acknowledging the limited guidance available for global-scale crop yield ensemble design.
The study used two complementary datasets and a unified framework under SSP585 for 2069–2099 relative to 1980–2010.
(1) GGCMI Phase 2 emulators: nine GGCM emulators for wheat and maize (CARAIB, EPIC-TAMU, JULES, GEPIC, LPJ-GUESS, LPJmL, pDSSAT, PEPIC, PROMET) and eight for rice and soybean (all except LPJ-GUESS) were driven by growing-season mean temperature and precipitation changes from 32 CMIP6 GCMs plus atmospheric CO2 (baseline 360 ppm; future 810 ppm). Future monthly ΔT and ΔP from each GCM were applied to AgMERRA historical daily data via a change factor method to generate emulator inputs at 0.5° resolution; no bias correction was necessary for Δ fields. Rainfed and irrigated yields were simulated and aggregated using Monfreda et al. harvested areas; for wheat, winter and spring wheat were distinguished following Müller et al. and then combined. Nitrogen inputs followed Elliott et al.; adaptation was set to A0 (no growing season adaptation).
(2) GGCMI Phase 3 simulations: 12 process-based GGCMs driven by bias-adjusted daily CMIP6 outputs from five selected GCMs (GFDL-ESM4, IPSL-CM6A-LR, MPI-ESM1-2-HR, MRI-ESM2-0, UKESM1-0-LL) at 0.5° resolution were used to compare uncertainty attribution.
Analysis steps:
(a) Regionalization: global cropping areas were partitioned into 12 sub-regions per crop using k-means based on baseline crop productivity (multi-model mean), climate variables (mean temperature, solar radiation, precipitation, relative humidity, wind speed), location (lat/lon), and nitrogen input, to reflect environmental and management heterogeneity.
(b) Cluster analysis of ensemble members: agglomerative hierarchical clustering with Canberra distance was applied to grid-level relative yield change fields to group GGCM×GCM ensemble members into three clusters representing distinct spatial-magnitude patterns (approximately decrease, small change, increase). Cluster trees (family trees) were constructed to diagnose whether members grouped by GGCM or by GCM.
(c) Uncertainty attribution: two-way ANOVA decomposed the variance of relative yield changes into contributions from GGCM, GCM, and their interaction at grid, zonal (latitude-mean), and sub-regional scales.
(d) Subsampling experiments: to quantify how ensemble size affects captured uncertainty, exhaustive combinations were evaluated for GGCM subsets of size i (i from 2 to 9 for wheat/maize; 2 to 8 for rice/soybean). For GCM subsets (32 GCMs in total), random combinations (up to 100,000 per i for i=6–27) were used because of the combinatorial explosion, while subsets were enumerated exhaustively for i≤5 and i≥28. The minimum effective ensemble size was defined as the subset size at which the estimated uncertainty proportion fell within ±2.5% of the full-ensemble value for GGCMs, or within ±1.5% of the full-ensemble value for GCMs.
(e) Influence of individual GGCMs: leave-one-out analyses removed each GGCM in turn to assess spatial changes in the GGCM-induced uncertainty share.
Both emulator-based and Phase 3 datasets were analyzed for consistency and contrasts. All analyses considered CO2 fertilization unless stated otherwise.
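Steps (c) and (d) can be illustrated with a minimal sketch. It assumes one relative yield change per GGCM×GCM pair, so the interaction term is simply the residual of the two-way decomposition. It also assumes the subset statistic is the mean GGCM share over all subsets of a given size, and that the tolerance is expressed in absolute share units. The function names, the sample table, and these choices are illustrative assumptions, not the authors' implementation.

```python
from itertools import combinations
from math import comb
from statistics import mean


def anova_shares(table):
    """Two-way ANOVA (one value per cell) on a GGCM x GCM table.

    Rows are crop models, columns are climate models. Returns the fraction
    of the total sum of squares attributed to each factor; with a single
    value per cell, the interaction term is the residual.
    """
    n_rows, n_cols = len(table), len(table[0])
    grand = mean([v for row in table for v in row])
    row_means = [mean(row) for row in table]
    col_means = [mean([row[c] for row in table]) for c in range(n_cols)]
    ss_total = sum((v - grand) ** 2 for row in table for v in row)
    if ss_total == 0:
        raise ValueError("all ensemble members are identical")
    ss_ggcm = n_cols * sum((m - grand) ** 2 for m in row_means)
    ss_gcm = n_rows * sum((m - grand) ** 2 for m in col_means)
    ss_inter = ss_total - ss_ggcm - ss_gcm
    return {
        "ggcm": ss_ggcm / ss_total,
        "gcm": ss_gcm / ss_total,
        "interaction": ss_inter / ss_total,
    }


def mean_ggcm_share(table, size):
    """Average GGCM share over every GGCM subset of the given size."""
    subsets = combinations(range(len(table)), size)
    return mean([anova_shares([table[r] for r in s])["ggcm"] for s in subsets])


def minimum_effective_size(table, tolerance=0.025):
    """Smallest subset size whose mean GGCM share lies within the tolerance
    (assumed here to be in absolute share units) of the full-ensemble share."""
    full = anova_shares(table)["ggcm"]
    for size in range(2, len(table) + 1):
        if abs(mean_ggcm_share(table, size) - full) <= tolerance:
            return size
    return len(table)


# Hypothetical relative yield changes (%) for 5 GGCMs (rows) x 4 GCMs (columns).
yield_change = [
    [-12.0, -9.0, -15.0, -11.0],
    [-3.0, -1.0, -6.0, -2.0],
    [4.0, 7.0, 1.0, 5.0],
    [-20.0, -14.0, -25.0, -18.0],
    [9.0, 10.0, 3.0, 8.0],
]

shares = anova_shares(yield_change)
print({name: round(value, 3) for name, value in shares.items()})
print("minimum effective GGCM subset size:", minimum_effective_size(yield_change))

# Why the GCM side relies on random draws: count the 16-member subsets of 32 GCMs.
print("number of 16-GCM subsets:", comb(32, 16))
```

Exhaustive enumeration is cheap for nine crop models but not for 32 climate models, which is consistent with the random draws described in step (d). The same functions could be applied to columns instead of rows by transposing the table.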
- Cluster structure and drivers: Hierarchical clustering revealed three distinct patterns of future relative yield change for each crop, often grouping by GGCM rather than GCM, indicating that GGCM process differences frequently dominate ensemble structure. However, in GGCMI Phase 3, some clusters for wheat were driven by specific high-ECS GCMs (IPSL-CM6A-LR, UKESM1-0-LL), reflecting strong climate forcing sensitivity in certain GGCMs. Spatially, some regions consistently exhibited yield losses across clusters (e.g., southern Brazil and southern Africa for wheat).
- Uncertainty attribution and dependence on ensemble composition: Across the full emulator ensembles, GGCMs generally contributed a larger share of uncertainty than GCMs at global and many regional scales. Yet dominant sources varied by cluster and region; similar-sized subsets with different model compositions exhibited markedly different uncertainty attributions (e.g., for maize cluster 2 vs cluster 3, GCM dominated uncertainty in Northern Europe, Eastern Asia, and the USA in cluster 2, while GGCM dominated in cluster 3).
- Minimum effective ensemble sizes (emulators): Random selection indicated that approximately 5–6 GGCMs and 9–12 GCMs suffice to capture the full-ensemble uncertainty proportion for global yield changes: six GGCMs for wheat, maize, and rice; five for soybean; nine GCMs for wheat; 10 for maize; and 12 for rice and soybean. Regional requirements typically fell within 4–6 GGCMs and 6–14 GCMs. Spatial patterns of GGCM-induced uncertainty stabilized once approximately five GGCMs were included.
- Cluster-based subset selection: Selecting one GGCM from each cluster (S3) or four GGCMs ensuring cluster coverage (S4) effectively reproduced GGCM-induced variance with only 3–4 models, for both emulator and Phase 3 datasets (with a caveat for wheat in Phase 3, where clustering sometimes centered on GCMs).
- GGCMI Phase 3 ensemble size: Six to seven GGCMs (seven for wheat and soybean; six for maize and rice) were sufficient to reflect overall uncertainty in yield change projections.
- Influence of individual GGCMs: The contribution of each GGCM to uncertainty was region- and crop-dependent. For wheat, CARAIB, PEPIC, and pDSSAT generally increased uncertainty across many regions, whereas LPJ-GUESS and JULES effects varied; for maize, pDSSAT and PEPIC notably increased uncertainty; for rice, PEPIC and JULES increased uncertainty in many regions (e.g., Africa and Asia, respectively); for soybean, JULES increased uncertainty across most regions, with CARAIB, PROMET, and pDSSAT also influential. Outlier behaviors (e.g., strong CO2 fertilization in JULES or strong temperature response in pDSSAT) were key drivers.
- CO2 fertilization and yield change spread: Including CO2 fertilization increased inter-model spread, especially for wheat, rice, and soybean, with model-specific CO2 responses being a major source of variance.
- Geographic and latitudinal patterns: Uncertainty attribution varied by latitude and sub-region, with some regions requiring more GCMs to capture climate-induced uncertainty (e.g., USA, Central Asia, southern South America).
Overall, careful ensemble composition is crucial, as different combinations can switch the dominant uncertainty source and alter projected yield patterns.
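The cluster-based subset selection reported above can be sketched as follows. This is a minimal illustration, not the authors' code: the linkage rule (average linkage), the rule for picking a representative (the most frequent GGCM in each cluster), the member labels, and the four-cell yield change fields are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations
from statistics import mean


def canberra(u, v):
    """Canberra distance; terms where both values are zero contribute 0."""
    total = 0.0
    for a, b in zip(u, v):
        denom = abs(a) + abs(b)
        if denom > 0:
            total += abs(a - b) / denom
    return total


def agglomerate(fields, n_clusters=3):
    """Bottom-up clustering: repeatedly merge the two closest clusters.

    Average linkage is an assumption; the summary does not state the linkage.
    Returns a list of clusters, each a list of member indices.
    """
    clusters = [[i] for i in range(len(fields))]

    def linkage(a, b):
        return mean([canberra(fields[i], fields[j]) for i in a for j in b])

    while len(clusters) > n_clusters:
        pairs = combinations(range(len(clusters)), 2)
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters


def one_ggcm_per_cluster(clusters, labels):
    """For each cluster, pick its most frequent GGCM that is not yet chosen."""
    chosen = []
    for members in clusters:
        counts = Counter(labels[i][0] for i in members)
        for ggcm, _ in counts.most_common():
            if ggcm not in chosen:
                chosen.append(ggcm)
                break
    return chosen


# Hypothetical (GGCM, GCM) members with relative yield change (%) on 4 grid cells.
labels = [("A", "g1"), ("A", "g2"), ("B", "g1"), ("B", "g2"), ("C", "g1"), ("C", "g2")]
fields = [
    [-20, -25, -15, -30],  # A: decrease
    [-22, -24, -18, -28],
    [-2, 1, -1, 2],        # B: small change
    [-1, 2, -2, 1],
    [15, 20, 25, 10],      # C: increase
    [18, 22, 21, 12],
]

clusters = agglomerate(fields, n_clusters=3)
for members in clusters:
    print([labels[i] for i in members])
print("cluster-spanning subset:", one_ggcm_per_cluster(clusters, labels))
```

In this toy case the members group by GGCM, mirroring the dominant pattern in the findings. If a cluster were instead made up of members sharing a GCM, as noted for wheat in Phase 3, picking by GGCM frequency would be less meaningful.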
The analysis demonstrates that ensemble configuration—both which GGCMs and GCMs are included and how many—substantially affects projected yield changes and the attribution of uncertainty. Clustering reveals distinct, interpretable families of responses tied to model structure and, in some cases, to strong climate forcings (high-ECS GCMs). This indicates that relying solely on multi-model means risks masking divergent outcomes and may mislead adaptation planning where sign changes occur across members. The finding that approximately six GGCMs and 9–12 GCMs adequately capture global uncertainty, and that 3–4 GGCMs selected to span identified clusters can emulate the full variance, provides a practical pathway to optimize ensembles, reduce computational burdens, and maintain representativeness. Region- and crop-specific sensitivities, particularly to CO2 fertilization and temperature response functions, imply that locally tailored ensemble composition can improve relevance for decision-making. Recognizing models with outlier behaviors allows more informed inclusion, down-weighting, or targeted evaluation to balance diversity with robustness. Overall, the framework addresses the research question by identifying how model composition and size shape uncertainty and by offering actionable strategies to design efficient, representative ensembles for climate-crop impact assessments.
This study presents a framework combining hierarchical clustering, regionalization, ANOVA-based uncertainty decomposition, and subsampling to optimize climate–crop ensemble design. It shows that ensemble composition strongly influences both projected yield patterns and the dominant sources of uncertainty, while a relatively modest number of models can capture the full-ensemble uncertainty if selected judiciously. Approximately six GGCMs and 9–12 GCMs suffice at the global scale, with 3–4 GGCMs spanning identified clusters efficiently representing GGCM-induced variance. Individual GGCMs contribute unevenly to uncertainty depending on crop and region, underscoring the need for careful, context-specific model selection. The approach can enhance robustness of projections and better inform adaptation strategies and food security assessments. Future work should further harmonize climate and crop modeling (including expanding GCM coverage for process-based simulations), improve representations of CO2 fertilization and extreme events, integrate parameter uncertainty and management factors, and leverage field experiments to constrain model responses, thereby reducing uncertainty and improving confidence in projections.
Key limitations include: (1) Omitted uncertainty sources in GGCM simulations such as soil data quality, management options (e.g., fertilization rates beyond assumed inputs), and pest/disease impacts. (2) Parameter uncertainty within individual GGCMs was not explicitly quantified; models were not comprehensively calibrated, which may affect relative change estimates. (3) Many GGCMs underestimate yield losses from extremely wet conditions; temperature and drought extremes are better represented than excessive wetness. (4) The change factor method for emulator climate inputs does not fully sample changes in extremes or intra-seasonal variability, potentially underrepresenting extreme-event impacts. (5) Emulators may not perfectly reproduce raw GGCM process-based simulations. (6) The Phase 3 comparison used only five GCMs, limiting climate-forcing diversity. (7) Focus on SSP585 and end-of-century changes may not capture all trajectories, though relative uncertainty shares are expected to be similar under lower forcing. These constraints could affect the generalizability and magnitude of projected uncertainties, particularly regarding extreme events and management-driven variability.
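Limitation (4) can be made concrete with a small sketch of a change factor calculation. It assumes an additive monthly temperature offset and a multiplicative monthly precipitation ratio; the summary does not specify how ΔP is applied, so the ratio form, the function name, and all values here are hypothetical.

```python
from statistics import pstdev


def apply_change_factors(daily_temp, daily_precip, months, delta_t, precip_ratio):
    """Shift each day's temperature by its month's offset and scale each
    day's precipitation by its month's ratio (assumed form of the method)."""
    future_temp = [t + delta_t[m] for t, m in zip(daily_temp, months)]
    future_precip = [p * precip_ratio[m] for p, m in zip(daily_precip, months)]
    return future_temp, future_precip


# Hypothetical daily values for a few days in June (6) and July (7).
months = [6, 6, 6, 7, 7, 7]
temp = [20.0, 22.0, 30.0, 24.0, 26.0, 35.0]    # degrees C
precip = [0.0, 5.0, 0.0, 12.0, 0.0, 3.0]       # mm/day
delta_t = {6: 3.0, 7: 4.0}                     # monthly warming, degrees C
precip_ratio = {6: 0.8, 7: 0.9}                # monthly precipitation scaling

future_temp, future_precip = apply_change_factors(
    temp, precip, months, delta_t, precip_ratio
)

# Within a month, day-to-day temperature spread is unchanged by the shift.
june = [i for i, m in enumerate(months) if m == 6]
spread_before = pstdev([temp[i] for i in june])
spread_after = pstdev([future_temp[i] for i in june])

# Scaling precipitation leaves the number of dry days unchanged.
dry_before = sum(p == 0 for p in precip)
dry_after = sum(p == 0 for p in future_precip)

print("June temperature spread:", spread_before, "->", spread_after)
print("dry days:", dry_before, "->", dry_after)
```

Because the historical day-to-day sequence is reused, within-month variability and dry-spell structure carry over unchanged, which is why such inputs may underrepresent changes in extremes.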