Improved seasonal prediction of harmful algal blooms in Lake Erie using large-scale climate indices

M. Tewari, C. M. Kishtawal, et al.

Harmful algal blooms (HABs) in Lake Erie cause significant economic damage. This study by Mukul Tewari, Chandra M. Kishtawal, Vincent W. Moriarty, Pallav Ray, Tarkeshwar Singh, Lei Zhang, Lloyd Treinish, and Kushagra Tewari presents a machine learning method that improves prediction accuracy by combining nutrient loading data with large-scale climate indices. Seasonal predictions can be made by early June, giving managers lead time for mitigation.

Introduction

Harmful algal blooms (HABs) threaten ecosystems, human health, and economies, with Lake Erie experiencing increasingly intense cyanobacterial blooms driven by nutrient loading and meteorological trends. NOAA’s seasonal severity index (SI) for Lake Erie has traditionally been predicted from spring discharge and total phosphorus (TP), but models using nutrient metrics alone have struggled to capture extreme years (e.g., 2011). The study hypothesizes that large-scale atmospheric variability influences conditions conducive to HABs and that including climate teleconnection indices with nutrient loading in a machine learning framework can improve seasonal SI prediction, particularly for extreme events, while enabling earlier forecasts by early June.

Literature Review

Prior work has linked HAB dynamics to nutrient inputs, temperature, and meteorological drivers. Stumpf et al. showed TP and discharge predict bloom magnitude but leave unexplained variance in extremes. Studies have connected HABs to large-scale teleconnections (e.g., PNA, ENSO), hurricanes, and rainfall impacts on nutrient loads, and demonstrated that meteorological factors improve hypoxia prediction in Lake Erie relative to discharge-only models (82% vs 39% explained variability in Zhou et al.). Large-scale climate indices often outperform local weather in ecological prediction. Machine learning has been applied to HAB forecasting (autoregressive models, SVM, random forests, probabilistic graphical models, ANNs), yet the combined use of nutrient loading and climate indices for Lake Erie SI had not been systematically evaluated.

Methodology

A Genetic Algorithm (GA) machine learning framework was used to predict the Lake Erie HAB seasonal severity index (SI), defined from the maximum bloom biomass over the peak 30 days. Three model classes were developed: (1) GA-chem using March–May integrated nutrient/chemical loading; (2) GA-clim using large-scale climate indices; and (3) GA-chem-clim combining both.
Nutrient predictors (March–May accumulations) included total phosphorus (TP), soluble reactive phosphorus (SRP), total Kjeldahl nitrogen (TKN), total nitrogen (TN; nitrite + nitrate), chlorides (CL), total suspended solids (TSS), silica dioxide (SD), and sulfate (SL). Climate indices considered (monthly) spanned November of the previous year to May of the current year and included Arctic Oscillation (AO), Pacific-North American pattern (PNA), Quasi-biennial Oscillation (QBO), Atlantic Multidecadal Oscillation (AMO), ENSO (Southern Oscillation Index, SOI), Niño3.4 index, North Atlantic Oscillation (NAO), and Pacific Decadal Oscillation (PDO).
The GA began with 2000 size-constrained random functions and evolved via reproduction and mutation over 4000–6000 iterations, selecting only 4–5 predictors to avoid overfitting given 19 annual observations (2002–2020). Fitness was defined by match to observed SI, accepting models with RMSE ≤ 2.0 units in development.
Uncertainty and robustness were evaluated with a jackknife (leave-one-year-out) procedure, yielding 19 models and cross-validations; model strength and standard errors were computed. Additional tests trained on 79% of the data (15 years) and predicted 21% (4 years), designed so each prediction set included at least one extreme. A sensitivity variant GA-chemplus used March–June nutrient loading to assess the impact of June inputs.
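The leave-one-year-out jackknife described above can be sketched in a few lines. This is a minimal illustration, not the study's implementation: a one-predictor least-squares line stands in for the GA-evolved function, and the names `fit_line`, `jackknife_predictions`, `tp_load`, and `severity` are hypothetical, as are the synthetic numbers.

```python
import math


def fit_line(x, y):
    """Least-squares fit y ~ intercept + slope * x (a stand-in for the GA model)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def jackknife_predictions(x, y):
    """Hold out each year in turn, refit on the rest, predict the held-out year."""
    preds = []
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        intercept, slope = fit_line(x_train, y_train)
        preds.append(intercept + slope * x[i])
    return preds


def rmse(obs, pred):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


# Synthetic, illustrative values only (not data from the study).
tp_load = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
severity = [1.2, 1.9, 3.2, 3.8, 5.1, 6.0]

cv_preds = jackknife_predictions(tp_load, severity)
print(len(cv_preds), round(rmse(severity, cv_preds), 3))
```

With 19 annual observations, the same loop would produce 19 refits and 19 out-of-sample predictions, which is how the cross-validated errors in the study are framed.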
To explore mechanisms, composites of large-scale atmospheric and oceanic conditions were analyzed for low-SI (<2) and high-SI (>7) years using ERA5 reanalysis (geopotential height at 200/500/850 hPa, winds), HadISST SSTs, accumulated snow depth, and 2 m temperature; anomalies were computed relative to 2002–2020 climatology. Data sources: nutrient loading from NCWQR; climate indices from NOAA CPC and PSL; ERA5 from Copernicus CDS; SSTs from HadISST. Predictor scaling factors were applied to loading variables as reported (e.g., TP scaled by 2.5×10^3).
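The composite step can be illustrated for a single scalar series. The sketch below assumes one value per year (for example a winter-mean 500 hPa height at one grid point, a hypothetical choice), takes the climatology as the mean over all supplied years, and uses the SI thresholds quoted above (<2 and >7). The function names and the sample numbers are invented for illustration.

```python
def climatology(values_by_year):
    """Mean of the field over all years supplied (the reference period)."""
    return sum(values_by_year.values()) / len(values_by_year)


def composite_anomaly(values_by_year, si_by_year, low=2.0, high=7.0):
    """Return (low-SI composite anomaly, high-SI composite anomaly).

    Each composite is the mean departure from climatology over the years
    whose severity index falls below `low` or above `high`.
    """
    clim = climatology(values_by_year)

    def mean_anomaly(years):
        if not years:
            return None  # no years in this category
        return sum(values_by_year[yr] - clim for yr in years) / len(years)

    low_years = [yr for yr, si in si_by_year.items() if si < low]
    high_years = [yr for yr, si in si_by_year.items() if si > high]
    return mean_anomaly(low_years), mean_anomaly(high_years)


# Synthetic, illustrative values only (not data from the study).
field = {2002: 10.0, 2003: 20.0, 2004: 30.0, 2005: 40.0}
si = {2002: 1.0, 2003: 5.0, 2004: 8.0, 2005: 9.0}

low_comp, high_comp = composite_anomaly(field, si)
print(low_comp, high_comp)
```

For gridded reanalysis fields the same calculation would be applied at every grid point; array libraries make that convenient, but the arithmetic is unchanged.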

Key Findings

Predictor selection and model performance: In GA-chem, TP (March–May) was the dominant predictor (selected by 12 of 19 jackknife models), followed by SRP (8), TSS (6), SD (3), SL (2), and CL (1). Correlations of SI with monthly TP were 0.31 (Mar), 0.43 (Apr), 0.37 (May); with SRP were 0.41 (Mar), 0.35 (Apr), 0.34 (May). GA-clim consistently selected PDO (Nov), PNA (Dec), and ENSO SOI (Apr). GA-chem-clim selected TP (Mar–May) combined with ENSO SOI (Apr), PNA (Dec), and PDO (Nov).
Performance (2002–2020): GA-chem RMSE 2.67, correlation 0.53, standard deviation of bias 2.74; GA-clim RMSE 2.52, correlation 0.58, std. dev. of bias 2.59; GA-chem-clim RMSE 2.26, correlation 0.67, std. dev. of bias 2.32. Per-year GA-chem-clim jackknife models showed R^2 between 0.55 and 0.77 with RMSE 0.96–1.77.
Extremes and June loading: GA-chem underpredicted 2015 severely (observed 10.5 vs predicted 2.70), likely due to an anomalously wet June; including June loads (GA-chemplus) improved the 2015 prediction to 10.28. For high SI years (>7), GA-chem RMSE rose to 4.02, while GA-chem-clim better captured extremes (e.g., 2017 observed 8.0; GA-chem 5.65; GA-chem-clim 8.29).
Reduced-training experiments (4 sets; 15 training years, 4 prediction years), with values listed as GA-chem / GA-clim / GA-chem-clim:
Exp1: RMSE 1.12 / 2.27 / 0.35; mean bias 1.10 / 1.56 / −0.20
Exp2: RMSE 2.25 / 2.54 / 1.15; mean bias 1.60 / 2.20 / −0.19
Exp3: RMSE 2.28 / 1.93 / 1.70; mean bias −0.28 / 1.70 / −0.39
Exp4: RMSE 2.92 / 3.44 / 1.66; mean bias −1.48 / 0.63 / −0.67
GA-chem-clim reduced RMSE in all experiments.
Large-scale drivers: Low-SI years featured positive PNA and El Niño-like conditions with higher mid-tropospheric heights over the northern U.S., reduced cold-air outbreaks, and less snowfall over the Great Lakes. High-SI years showed negative PNA/La Niña-like patterns with lower heights, greater Arctic air intrusions, cooler temperatures, increased snowfall, and enhanced runoff preceding stronger blooms.
Additional insights: SD and TSS emerged as occasional nutrient predictors; SD may indirectly affect cyanobacteria via diatom sedimentation and internal phosphorus loading. Chloride loading in spring was selected in some experiments, potentially reflecting road-salt impacts and serving as a proxy for winter severity.
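The reduced-training comparison can be checked directly from the reported numbers. The sketch below reads each RMSE triple in the order GA-chem, GA-clim, GA-chem-clim (an interpretation of the reported layout), finds the lowest-RMSE model per experiment, and takes a simple average of the four RMSE values per model. That average is only a rough summary and is not the same as an RMSE pooled over all predicted years; the names `REPORTED_RMSE`, `best_model`, and `mean_rmse_by_model` are hypothetical.

```python
# RMSE values as reported for the four reduced-training experiments,
# read in the order GA-chem, GA-clim, GA-chem-clim.
MODELS = ("GA-chem", "GA-clim", "GA-chem-clim")
REPORTED_RMSE = {
    "Exp1": (1.12, 2.27, 0.35),
    "Exp2": (2.25, 2.54, 1.15),
    "Exp3": (2.28, 1.93, 1.70),
    "Exp4": (2.92, 3.44, 1.66),
}


def best_model(rmse_triple):
    """Name of the model with the smallest RMSE in one experiment."""
    return MODELS[rmse_triple.index(min(rmse_triple))]


def mean_rmse_by_model(table):
    """Simple (unweighted) average of the per-experiment RMSE for each model."""
    n = len(table)
    return {
        model: sum(row[i] for row in table.values()) / n
        for i, model in enumerate(MODELS)
    }


winners = {exp: best_model(vals) for exp, vals in REPORTED_RMSE.items()}
averages = mean_rmse_by_model(REPORTED_RMSE)

for exp, winner in winners.items():
    print(exp, winner)
for model, value in averages.items():
    print(model, round(value, 2))
```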

Discussion

Including large-scale climate indices with nutrient loading captures non-linear teleconnection influences on regional hydrometeorology that affect nutrient delivery and bloom development, improving prediction skill versus nutrient-only models. GA-clim’s comparable performance to GA-chem underscores that climate modes (ENSO, PNA, PDO) encode information relevant to seasonal HAB risk. The combined GA-chem-clim better matches observed variability and extremes and enables early-June prediction, providing actionable lead time ahead of peak July–October activity. Composite analyses support a mechanistic pathway: wintertime negative PNA/La Niña configurations yield colder, snowier conditions, augmenting spring runoff and nutrient loads that foster severe blooms; the opposite occurs during positive PNA/El Niño. Selection of chloride loading suggests anthropogenic winter maintenance can modulate spring chemistry and phytoplankton dynamics, while occasional selection of SD and TSS aligns with internal loading and water column optics/sediment dynamics. Despite improvements, some years (e.g., 2013) remain challenging, indicating additional local meteorological, hydrodynamic, or biological processes may be important.

Conclusion

A GA-based model combining nutrient loads and large-scale climate indices (GA-chem-clim) improves seasonal prediction of Lake Erie HAB severity over nutrient-only (GA-chem) and climate-only (GA-clim) models, reducing RMSE and increasing correlation, including for high-severity years. The approach enables forecasts by early June, offering timely guidance for management and mitigation. Composites of reanalysis fields and SSTs indicate distinct winter circulation regimes preceding mild versus severe bloom seasons, supporting the role of teleconnections. Future work should expand predictors (e.g., additional local meteorology, hydrodynamics), extend datasets to reduce overfitting risk, and transition from deterministic to probabilistic forecasting to provide ranges and likelihoods. Application to other water bodies is a promising next step.

Limitations

The dataset spans only 19 annual SI observations (2002–2020), limiting model complexity and increasing overfitting risk; the GA was constrained to 4–5 predictors and validated by jackknife. The primary operational nutrient window excludes June loads, which can be critical in anomalously wet years (addressed in the GA-chemplus sensitivity variant). Local station-based predictors (wind, temperature) were only preliminarily explored at five sites and not fully integrated. The current system is deterministic and does not provide probabilistic uncertainty estimates. Some extreme years (e.g., 2013) remain poorly predicted, suggesting missing predictors or processes.
