Earth Sciences
Explainable deep learning for insights in El Niño and river flows
Y. Liu, K. Duffy, et al.
Yumin Liu, Kate Duffy, Jennifer G. Dy, and Auroop R. Ganguly apply explainable deep learning to ENSO-driven river flows, replacing black-box prediction with interpretable attributions drawn from global sea surface temperature data.
Introduction
The El Niño–Southern Oscillation (ENSO) is a dominant mode of interannual climate variability with documented impacts on regional hydrology, including flood timing in Africa and flow variability in major rivers such as the Amazon and Congo. Accurate prediction of ENSO and its hydrometeorological effects remains challenging at interannual to multidecadal scales. Traditional ENSO indices rely on fixed rectangular regions that may not capture the broader, interconnected sea surface temperature (SST) oscillations influencing river flows, and the relationships can be highly nonlinear. The study aims to use global SST information and uncover complex geographic dependence structures, including long-range teleconnections, to improve both the prediction and the interpretability of ENSO-driven river flows. The central hypothesis is that explainable deep learning using saliency maps, combined with complex network analysis, can extract interpretable predictive signals from global SST beyond canonical ENSO indices and enhance river flow prediction for the Amazon and Congo.
Literature Review
Conventional approaches to identify dependencies among climate variables include visual comparison, correlation, mutual information, coefficient of determination, and (sparse) linear regression weights; these methods often require heuristic feature selection and are difficult to scale to high-dimensional spatiotemporal features. Recent advances have shown promise for deep learning in climate science, including improved ENSO forecasting and explainable AI techniques (e.g., saliency maps) to interpret models in environmental applications. Studies suggest ENSO is embedded within a larger system of interrelated SST oscillations (e.g., ENSO–IOD coupling), and nonlinear relationships are prevalent in hydroclimate teleconnections. Prior work also indicates biases in Earth System Models (ESMs) in representing ENSO and ocean-atmosphere coupling, motivating the comparison between ESM and reanalysis-based predictors.
Methodology
- Data: Monthly SST from CMIP5 ESM simulations (32 models; NASA NEX; Jan 1950–Dec 2005; 1°×1° global grid) and three reanalysis products (Hadley-OI, COBE, ERSSTv5). Monthly river discharge for the Amazon (Obidos, 1927–2018) and Congo (Kinshasa, 1903–2011) from UCAR; common analysis period Jan 1950–Dec 2005 used.
- Preprocessing: Align coordinates; bilinearly interpolate to 1°×1°; select the common time span; set the few missing values to 0; extract the SST region 37.5°N–41.5°S, 50.5°E–9.5°W (input size 80×300). Compute the 3-month moving mean of river flow and assign it to the third month of each window as the smoothed target. Data split: first 600 months train, next 36 validation, last 36 (Jan 2003–Dec 2005) test.
- Baselines: Ensemble of ML models using Niño 3.4 region indices (mean SST and Niño 3.4 anomaly) as predictors: linear regression, lasso, ridge, elastic net, random forest, and dense neural network. Historical climatological mean river flow as an additional baseline.
- CNN architecture: 4 conv layers (channels: 32, 32, 64, 64; kernel sizes 3×3, 3×3, 3×3, 1×1; stride 1), each followed by ReLU and 2×2 max pooling; 3 fully connected layers (128, 64, 1). Input size 80×300×C; C varies: all ESMs C=32, all reanalysis C=3, mean ESM or mean reanalysis C=1. Batch size 64; Adam optimizer (lr 5e-5, weight decay 1e-5); squared loss minimized. Predictive uncertainty estimated as the standard deviation across five CNN runs with different learning rates.
- Target formulation: For SST-based models, seasonality is retained, so the models predict the 3-month rolling mean river flow with its seasonal cycle included. For Niño 3.4 index-based models, the anomaly is predicted first and seasonality is then added back to reconstruct river flow.
- Explainability: Saliency maps computed as gradients of output w.r.t. input SST. Introduced Cyclical Saliency Maps (Cyclic-SM) by averaging saliency across periodic cycles (e.g., M=12 for months) to enhance robustness and climate interpretability; also aggregated seasonal and yearly saliency.
- Complex network analysis: Construct degree maps by computing pairwise Pearson correlations across ocean grid points; an edge links two points if their correlation exceeds c1, and a point's degree is its number of edges. Define a teleconnection as a pair of points with correlation > c2 and great-circle distance > d (e.g., d>19,000 km for ESM, d>15,000 km for reanalysis). Examine degree distributions and distance histograms under thresholds (c1 and c2 at 0.5 and 0.9) to quantify proximity-based and long-range connections.
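The cyclic averaging step in the explainability bullet can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the per-time-step saliency maps are taken as already computed (gradients of the model output with respect to the input SST), time step 0 is assumed to be the first phase of the cycle (e.g., January), and the name `cyclic_saliency` and the toy values are hypothetical. It uses plain Python lists so it runs without extra packages; real SST grids would normally be held in array libraries.

```python
from typing import List

Grid = List[List[float]]  # one saliency map: rows = latitude, columns = longitude


def cyclic_saliency(maps: List[Grid], period: int = 12) -> List[Grid]:
    """Average saliency maps that share the same phase of a periodic cycle.

    maps[t] is the (precomputed) saliency map for time step t.
    Returns `period` maps; entry m is the mean of maps[m], maps[m + period], ...
    """
    if len(maps) < period:
        raise ValueError("need at least one saliency map per phase of the cycle")
    n_rows, n_cols = len(maps[0]), len(maps[0][0])
    sums = [[[0.0] * n_cols for _ in range(n_rows)] for _ in range(period)]
    counts = [0] * period
    for t, grid in enumerate(maps):
        phase = t % period  # e.g. calendar month when period == 12
        counts[phase] += 1
        for i in range(n_rows):
            for j in range(n_cols):
                sums[phase][i][j] += grid[i][j]
    return [
        [[value / counts[m] for value in row] for row in sums[m]]
        for m in range(period)
    ]


# Toy example (hypothetical numbers): four time steps, a cycle of length 2,
# and 1x2 "maps". Steps 0 and 2 share phase 0; steps 1 and 3 share phase 1.
toy_maps = [[[1.0, 2.0]], [[10.0, 20.0]], [[3.0, 4.0]], [[30.0, 40.0]]]
toy_cyclic = cyclic_saliency(toy_maps, period=2)
# toy_cyclic == [[[2.0, 3.0]], [[20.0, 30.0]]]
```

With `period=12` and monthly inputs, each returned map would summarize which SST regions matter for one calendar month; averaging groups of those maps would give the seasonal and yearly aggregates mentioned above.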
Key Findings
- Predictive performance: CNNs using large-area SST (41.5°S–37.5°N, 50.5°E–9.5°W) with full spatiotemporal information outperformed models using only Niño 3.4 indices for predicting 3-month rolling mean flows of the Amazon and Congo, and surpassed the historical climatological mean baseline. This indicates useful information outside the canonical ENSO region contributes to river flow predictability.
- Amazon-specific results: ESM+CNN achieved lower mean absolute error and higher linear correlation with observed Amazon discharge than the climatological baseline; however, RMSE was higher in spring, when discharge peaks. For the Amazon, models using climatological SST in Niño 3.4 outperformed those using the Niño 3.4 anomaly.
- Congo-specific results: Prediction was more challenging (likely due to basin management), but reanalysis SST-based CNN still achieved lower RMSE than climatology. For Congo, Niño 3.4 anomaly-based models often outperformed climatological SST in Niño 3.4.
- Explainability: Cyclical saliency maps highlighted dominant predictive regions in the tropical Pacific and Indian Oceans for both rivers when using ESM SST, implicating ENSO and the Indian Ocean Dipole (IOD) as key drivers. Saliency from reanalysis SST was more diffuse, suggesting weaker or more spatially distributed relationships; nevertheless, maps confirmed linear and nonlinear information content in global SST about river flows.
- Complex networks: ESM SST fields exhibited high degrees and numerous strong teleconnections among the tropical Pacific, Indian, and Atlantic Oceans, concentrated near the equator, persisting even at high correlation thresholds. Edge-distance histograms showed many long-distance connections, indicating multicollinearity across SST regions. Reanalysis SSTs showed fewer and weaker long-distance connections, especially at higher thresholds, consistent with literature that ESMs tend to overestimate coupling strength relative to observations.
- Uncertainty: Predictive uncertainty was quantified via ensemble variability of repeated CNN trainings, providing dispersion estimates around point forecasts.
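The degree and teleconnection counts behind the complex-network finding can be sketched as follows. This is an illustration only, not the authors' code: the function names, the haversine formula with an assumed 6371 km mean Earth radius, and the four-point toy data are hypothetical. Only the rules come from the methodology: an edge exists if the correlation exceeds c1, and a teleconnection exists if the correlation exceeds c2 and the great-circle distance exceeds d.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees
EARTH_RADIUS_KM = 6371.0     # assumed mean Earth radius


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length, non-constant series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def great_circle_km(p: Point, q: Point) -> float:
    """Haversine great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def degrees_and_teleconnections(
    series: Dict[Point, Sequence[float]], c1: float, c2: float, d_km: float
) -> Tuple[Dict[Point, int], List[Tuple[Point, Point]]]:
    """Return each point's degree (edges with r > c1) and the list of
    teleconnected pairs (r > c2 and distance > d_km)."""
    degree = {p: 0 for p in series}
    teleconnections = []
    for p, q in combinations(series, 2):
        r = pearson(series[p], series[q])
        if r > c1:
            degree[p] += 1
            degree[q] += 1
        if r > c2 and great_circle_km(p, q) > d_km:
            teleconnections.append((p, q))
    return degree, teleconnections


# Toy SST series at four hypothetical grid points.
toy_series = {
    (0.5, 50.5): [1.0, 2.0, 3.0, 4.0],
    (0.5, 51.5): [1.0, 2.0, 3.0, 5.0],    # neighbour, strongly correlated
    (0.5, -120.5): [2.0, 4.0, 6.0, 8.0],  # far away, strongly correlated
    (-30.5, 100.5): [4.0, 3.0, 2.0, 1.0], # anti-correlated with the others
}
toy_degree, toy_tele = degrees_and_teleconnections(
    toy_series, c1=0.5, c2=0.9, d_km=15000.0
)
```

In this toy case the three positively correlated points each have degree 2, the anti-correlated point has degree 0, and only the two pairs that span the wide longitude gap count as teleconnections. On a full 1°×1° ocean grid the pairwise loop is quadratic in the number of points, so a vectorized correlation matrix would be the practical choice.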
Discussion
The study demonstrates that leveraging full-field SST information via CNNs, combined with explainable AI, yields superior prediction of large river flows compared to models constrained to traditional ENSO indices. Saliency analyses attribute predictive skill primarily to ENSO and IOD regions, offering physical interpretability and supporting the hypothesis that broader Indo-Pacific SST variability co-impacts regional hydrology. Complex network diagnostics corroborate strong teleconnection structures in ESMs relative to reanalysis, suggesting model-dependent coupling strengths that can influence predictive pathways. These findings address the research question by revealing additional predictive information content beyond canonical indices and elucidating the spatial dependence structures linking SST and river discharge. The approach informs climate adaptation by enabling more accurate and interpretable projections of river flows, with uncertainty estimates, at interannual to decadal scales; it also points to opportunities for using data-driven methods to diagnose and potentially bridge discrepancies between model and observed coupling.
Conclusion
This work integrates explainable deep learning with complex network analysis to improve prediction and understanding of ENSO-driven river flows. Key contributions include: (1) demonstrating that global SST fields contain predictive information for Amazon and Congo discharge beyond Niño 3.4 indices; (2) introducing cyclical saliency maps to provide physically interpretable attributions highlighting ENSO and IOD regions; and (3) quantifying teleconnections and coupling structures, showing stronger long-range correlations in ESMs than reanalysis. The framework yields improved predictive skill and uncertainty quantification, advancing interpretability and actionable insights for hydrologic projections. Future research should expand to additional rivers and longer records, assess lead-time forecasting, further analyze model–observation coupling differences, and explore causal inference to clarify mechanism pathways and improve ESM representations.
Limitations
- Limited dataset length (1950–2005 common period) constrains training and evaluation; authors note the need for additional discharge records and more rivers to bolster generality.
- Congo River predictions are more challenging, potentially affected by basin management and non-climatic influences not represented in SST predictors.
- Retaining seasonality and using zero-lag (concurrent) SST focuses on contemporaneous mapping; lead-time predictability was not the primary target here.
- Smoothing (3-month moving mean) may dampen extremes, potentially affecting peak-flow error characteristics (e.g., higher RMSE in Amazon spring peak).
- Reanalysis SSTs exhibit weaker correlation structures, leading to more diffuse saliency and potentially reduced interpretability/skill compared to ESM-based predictors.
- Minimal missing data imputed with zeros and domain cropping may introduce minor artifacts; model dependence on ESM coupling biases may limit transferability to observations.
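The smoothing limitation above can be made concrete with a small arithmetic sketch. It assumes a trailing window, with each 3-month mean assigned to the last month of its window, which is one reading of "third month value" in the preprocessing step; the function name and the discharge numbers are hypothetical.

```python
from typing import List


def rolling_mean_target(flow: List[float], window: int = 3) -> List[float]:
    """Trailing moving mean of a monthly series.

    Element k is the mean of flow[k : k + window] and is assumed to be
    assigned to the last month of that window, so the output is
    window - 1 elements shorter than the input.
    """
    return [
        sum(flow[k:k + window]) / window
        for k in range(len(flow) - window + 1)
    ]


# Hypothetical monthly discharge with a single sharp peak.
monthly_flow = [10.0, 12.0, 30.0, 12.0, 10.0]
smoothed_flow = rolling_mean_target(monthly_flow)

raw_peak = max(monthly_flow)        # 30.0
smoothed_peak = max(smoothed_flow)  # 18.0, the mean of 12, 30 and 12
```

The raw peak of 30 becomes 18 after smoothing, which illustrates why a model trained on the smoothed target sees damped extremes and why errors can concentrate around peak-flow months.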