Earth Sciences
Seasonal-to-decadal prediction of El Niño-Southern Oscillation and Pacific Decadal Oscillation
J. Choi and S. Son
This study by Jung Choi and Seok-Woo Son explores how far the El Niño–Southern Oscillation (ENSO) and Pacific Decadal Oscillation (PDO) can be predicted ahead of time. Discover how improved radiative forcing estimates and model initialization can enhance our understanding of climate patterns across the Pacific Basin!
~3 min • Beginner • English
Introduction
Near-term climate predictions at seasonal-to-decadal (S2D) timescales are increasingly important for climate risk management. Predictive skill on these horizons depends on both external boundary conditions (e.g., greenhouse gases and aerosols) and initial conditions (particularly the ocean state). Initialized prediction systems have shown improved S2D skill over uninitialized projections, especially when multi-model ensembles (MMEs) with bias adjustment are used. The key sources of S2D predictability are low-frequency sea surface temperature (SST) variability, including ENSO, the PDO, and the Atlantic Multidecadal Oscillation (AMO). While decadal prediction skill in the North Atlantic has been quantified with CMIP5/6 hindcasts, comparable assessments for Pacific variability (ENSO and PDO) with the latest decadal hindcasts have been lacking. ENSO strongly influences global climate; state-of-the-art dynamical models have demonstrated skill up to ~12 months, and machine learning approaches have extended skill to ~18 months, but robust multi-year ENSO prediction remains less explored across systems. The PDO, the leading mode of decadal SST variability in the North Pacific, is driven by complex tropical–extratropical interactions and typically requires ensemble forecasts; prior studies suggest some multi-year skill but have been limited by the number of models available. This study revisits S2D predictions of ENSO and the PDO using very large ensembles from CMIP5 and CMIP6 retrospective decadal predictions spanning more than half a century. It isolates the roles of initialization versus external radiative forcing by comparison with uninitialized simulations, and evaluates the relative importance of ensemble size versus multi-model averaging for multi-year ENSO and PDO prediction.
Literature Review
Prior work has established that initialized near-term predictions improve S2D skill relative to uninitialized projections, particularly when using MMEs and bias correction. For ENSO, dynamical systems commonly achieve skill up to ~12 months, with studies showing potential extension to ~18 months using machine learning techniques; most existing studies, however, focus on ≤18-month leads or specific extreme events, often using single systems. For PDO, earlier analyses using a limited set of CMIP5 decadal hindcasts indicated potential predictability several years ahead, but comprehensive evaluation in CMIP6 decadal hindcasts had not been reported. Systematic extratropical Pacific biases and the intertwined nature of ENSO–PDO variability have been noted as barriers to higher PDO skill. Recent advances also emphasize the benefits of combining multiple prediction systems to surpass single-system performance, especially across seasons affected by the spring predictability barrier for ENSO.
Methodology
- Data: Six CMIP5 and ten CMIP6 initialized decadal prediction systems (retrospective hindcasts), initialized annually from winter 1960/1961 to 2009/2010. Each model has 50 initializations with 3–10 ensemble members, for 142 members in total. Seven models are initialized in January and nine in November; lead-time counting is standardized from January. All fields are interpolated to a 2.5°×2.5° grid.
- Bias correction: A lead-time-dependent model bias (modeled minus observed climatology) is removed from each member before analysis. The MME is the equal-weight average of all bias-corrected members.
- Observations: NOAA ERSSTv5 on a 2.5°×2.5° grid, with anomalies relative to the 1961–2020 monthly climatology.
- Indices: ENSO is the three-month running mean NINO3.4 index (170°–120°W, 5°S–5°N), evaluated at leads up to 3 years. The PDO is the annual-mean index obtained by projecting each model's bias-corrected annual-mean SST anomalies onto the observed leading EOF of North Pacific (≥20°N) ERSSTv5 SST anomalies (after removing the global-mean SST anomaly). PDO skill is evaluated for YR1, YR2, YR3–4, and YR5–9; for YR5–9, 132 members are used (MRI-ESM2-0 provides only five-year hindcasts).
- Uninitialized reference: An MME of 121 members from ten CMIP6 historical simulations quantifies the skill arising from external radiative forcing alone; YR5–9 comparisons are limited to 1961–2014 (the historical-simulation coverage).
- Skill metrics: The anomaly correlation coefficient (ACC) measures phase agreement; the mean-squared skill score (MSSS) assesses amplitude and error-variance reduction relative to climatology; the ratio of predictable components (RPC) compares the predictable signal in observations with that in the ensemble (RPC≈1 indicates a reliable signal-to-noise ratio, RPC<1 overconfidence, RPC>1 underconfidence).
- Significance: A non-parametric block bootstrap (five-year blocks) with 1,000 resamples assesses the significance of ACC, MSSS, RPC deviations from 1, and initialized-minus-uninitialized skill differences (one-tailed where appropriate).
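The three skill metrics can be sketched in a few lines of Python. This is a minimal illustration under simplified assumptions (anomaly time series already bias-corrected, a zero-anomaly climatological reference for MSSS); the function and variable names are ours, not the authors' code:

```python
import numpy as np

def acc(forecast, obs):
    """Anomaly correlation coefficient: phase agreement between
    forecast anomalies and observed anomalies."""
    f = forecast - forecast.mean()
    o = obs - obs.mean()
    return (f * o).sum() / np.sqrt((f ** 2).sum() * (o ** 2).sum())

def msss(forecast, obs):
    """Mean-squared skill score relative to a climatological forecast
    (which predicts zero anomaly): 1 - MSE(forecast) / MSE(climatology)."""
    mse = np.mean((forecast - obs) ** 2)
    mse_clim = np.mean(obs ** 2)
    return 1.0 - mse / mse_clim

def rpc(members, obs):
    """Ratio of predictable components: the predictable fraction in the
    observations (ACC of the ensemble mean) divided by the predictable
    fraction in the model (ensemble-mean spread over total member spread).
    RPC ~ 1 is reliable; RPC < 1 overconfident; RPC > 1 underconfident."""
    ens_mean = members.mean(axis=0)
    r = acc(ens_mean, obs)                 # predictable component in obs
    sig = ens_mean.std() / members.std()   # predictable component in model
    return r / sig

# Toy example: 10 members and 50 start dates sharing a common signal
rng = np.random.default_rng(0)
signal = rng.standard_normal(50)
members = signal + 0.5 * rng.standard_normal((10, 50))
obs = signal + 0.5 * rng.standard_normal(50)
ens_mean = members.mean(axis=0)
print(acc(ens_mean, obs), msss(ens_mean, obs), rpc(members, obs))
```

A perfect forecast gives ACC = 1 and MSSS = 1; a forecast no better than climatology gives MSSS = 0, which is why MSSS is the stricter, amplitude-sensitive metric in the study.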
- Ensemble size sensitivity: For each ensemble size M (1–142), 1,000 bootstrap resamples estimate ACC(M). A min–max normalization of the resulting skill curve defines saturation as the point where normalized ACC reaches 0.95, giving the minimum M needed for near-maximal skill.
- Multi-model averaging effect: Under perfect-model assumptions, the theoretical ensemble-mean error variance and the corresponding skill score (MSSS_THEORY) are derived from individual-member MSE/MSSS, distinguishing the benefit of simply increasing ensemble size from that of canceling model-dependent errors via multi-model averaging. Where practical MSSS exceeds MSSS_THEORY, multi-model error cancellation provides gains beyond pure ensemble-size effects.
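The ensemble-size procedure can be illustrated with synthetic data. This is a hedged sketch, assuming the bootstrap draws members with replacement; the names and the toy signal/noise model are ours:

```python
import numpy as np

def acc(forecast, obs):
    """Anomaly correlation coefficient between two time series."""
    f = forecast - forecast.mean()
    o = obs - obs.mean()
    return (f * o).sum() / np.sqrt((f ** 2).sum() * (o ** 2).sum())

def acc_vs_ensemble_size(members, obs, sizes, n_boot=1000, seed=0):
    """Bootstrap-mean ACC as a function of ensemble size M: for each M,
    draw M members with replacement, average them, correlate with obs."""
    rng = np.random.default_rng(seed)
    n_members = members.shape[0]
    curve = []
    for m in sizes:
        accs = [acc(members[rng.integers(0, n_members, size=m)].mean(axis=0), obs)
                for _ in range(n_boot)]
        curve.append(np.mean(accs))
    return np.array(curve)

def saturation_size(sizes, curve, threshold=0.95):
    """Min-max normalize the skill curve and return the smallest M whose
    normalized ACC reaches the threshold (0.95 in the study)."""
    norm = (curve - curve.min()) / (curve.max() - curve.min())
    return sizes[np.argmax(norm >= threshold)]

# Toy example: 142 members, 50 start dates, noisy common signal
rng = np.random.default_rng(1)
signal = rng.standard_normal(50)
members = signal + 1.5 * rng.standard_normal((142, 50))
obs = signal + 0.5 * rng.standard_normal(50)
sizes = np.arange(1, 143, 10)
curve = acc_vs_ensemble_size(members, obs, sizes, n_boot=200)
m_sat = saturation_size(sizes, curve)
```

Averaging more members suppresses member-specific noise, so the ACC curve rises and then flattens; the saturation point is where adding members no longer buys meaningful skill, which is how the study arrives at thresholds such as ~40 members for one-year ENSO.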
Key Findings
- Global SSTA prediction: Initialized MME shows high skill for YR1 and YR2, especially in the tropical Pacific. At lead times ≥3 years (YR3–4 and YR5–9), initialized and uninitialized skill maps and RPC patterns are similar, indicating dominant contributions from external radiative forcing. Regions with strong forcing influence exhibit low signal-to-noise ratios (RPC>1), implying large inter-member diversity at decadal scales.
- ENSO skill and ensemble size: MME ACC remains statistically significant up to 25–27 lead months (JFM of year 3), while MSSS is significant out to 14–16 lead months (FMA of year 2), indicating reliable amplitude prediction to ~1.25 years and phase prediction to just over two years. RPC is close to 1 up to these leads. The MME generally outperforms individual models, although CanCM4 matches or exceeds the MME ACC at certain leads (ASO of year 1 to SON of year 2). Ensemble size requirements: ~40 members suffice for skillful one-year ENSO (DJF1) prediction, with no significant gain beyond M≈40–50; winter ENSO at a two-year lead (DJF2) requires >70 members to approach maximum skill; summer of year 2 (JJA2) saturates faster (~50–60 members). Multi-model averaging boosts skill beyond MSSS_THEORY in spring–summer, evidencing cancellation of model-dependent errors and partial mitigation of the spring predictability barrier. CMIP6 shows slight improvements over CMIP5 in second-year springtime MSSS, likely owing to larger ensembles.
- PDO skill and ensemble size: MME ACC is significant at YR1, YR2, and YR5–9; long-lead (YR5–9) skill arises from external forcing rather than initialization. MSSS is significant only at YR1 (phase predicted better than amplitude at longer leads). RPC values are <1 (overconfident) but not significantly different from 1. Ensemble size for PDO skill saturation is smaller than for ENSO: ~30 members (YR1) and ~40–50 members (YR2 and YR5–9). Practical MSSS closely matches MSSS_THEORY, indicating limited additional benefit from multi-model averaging (common systematic errors across models). CMIP5 MME outperforms CMIP6 MME at YR1–YR2; skills are comparable at longer leads.
- Initialization versus forcing: Model initialization contributes meaningfully up to ~2 years, particularly for tropical Pacific SST/ENSO. Beyond ~3 years, skill is primarily attributable to external radiative forcing. Effective S2D prediction in the Pacific therefore requires both accurate initialization and realistic near-term forcing estimates.
- Ensemble design implications: Achieving near-maximal skill requires relatively modest ensemble sizes for annual PDO (~30–50) and larger ensembles for multi-year winter ENSO (>70). Multi-model averaging is especially beneficial for ENSO during seasons affected by the spring predictability barrier.
Discussion
The study addresses how well current initialized multi-model decadal prediction systems capture key Pacific low-frequency SST modes and disentangles the roles of initialization versus external forcing. Results show that initialization substantially enhances skill for up to two years (notably in the tropical Pacific and for ENSO), while decadal lead-time predictability of both basin-wide SST and PDO is largely forced. The strong agreement between initialized and uninitialized skill at leads ≥3 years corroborates the forcing-dominated nature of long-lead skill. For ENSO, multi-model averaging reduces model-dependent errors, overcoming some of the spring predictability barrier and delivering skill beyond what ensemble-size effects alone would suggest. Conversely, for PDO, practical skill tracks theoretical expectations, implying common systematic errors that multi-model averaging does not remove. These findings highlight the need to optimize ensemble sizes strategically (larger for multi-year ENSO), improve initial ocean states, and better constrain near-term radiative forcing pathways to enhance Pacific S2D prediction and related regional climate impacts.
Conclusion
Using a very large ensemble of CMIP5/6 initialized decadal hindcasts spanning ~50 years, this study quantifies S2D predictability of ENSO and PDO and isolates the contributions of initialization and external forcing. Key contributions include: (1) demonstrating reliable ENSO amplitude prediction to ~14–16 months and phase prediction to ~25–27 months, with multi-model averaging mitigating the spring predictability barrier; (2) identifying that multi-year winter ENSO prediction requires >70 ensemble members, whereas one-year prediction needs ~40; (3) showing PDO phase skill at YR1–YR2 and YR5–9 with amplitude skill limited to YR1, and that long-lead PDO skill is forcing-driven and not substantially improved by multi-model averaging; and (4) establishing that initialization impacts persist ~2 years, while external forcing dominates skill at ≥3-year leads. Future work should exploit additional CMIP6 DCPP experiments to probe idealized Pacific/Atlantic decadal variability, develop advanced post-processing and statistical error-correction methods (as used for AMO) to improve PDO predictions, and refine both initialization strategies and near-term forcing estimates to enhance S2D prediction skill in the Pacific.
Limitations
- Long-lead (YR5–9) assessments are limited to 132 ensemble members due to one model providing only five-year hindcasts; some YR5–9 maps use only 46 initializations in figure diagnostics.
- Uninitialized historical simulations used for forcing attribution are limited to CMIP6 models and to the 1961–2014 period for YR5–9 comparisons.
- Models have differing initialization months (January vs. November), potentially introducing modest inconsistencies despite standardized lead-time counting from January.
- Bias correction and common-basis projection for PDO mitigate but do not eliminate systematic model errors; extratropical Pacific biases and common errors likely limit PDO improvements from multi-model averaging.
- The study does not exploit all CMIP6 DCPP experiment types; idealized experiments targeting Pacific/Atlantic decadal variability are deferred to future work.