Introduction
Seasonal precipitation forecasting in the western United States is challenging because year-to-year variability is high and driven by atmospheric river events. Inaccurate forecasts carry significant economic consequences, particularly during droughts, so better prediction accuracy is needed. Existing dynamical and empirical approaches both have limitations. Dynamical models provide probabilistic forecasts but often have low skill, especially for precipitation. Traditional statistical methods such as canonical correlation analysis (CCA) struggle to incorporate multiple predictors and their nonlinear interactions, and they are constrained by the short observational record. Machine learning could help, but the same short record makes it difficult to train effective models. This study therefore proposes a novel approach: using large climate model ensembles as a long, physically consistent training dataset for machine learning models, with the aim of improving seasonal precipitation forecast skill.
Literature Review
The literature highlights the importance of teleconnections for seasonal forecasting, particularly the El Niño-Southern Oscillation (ENSO). However, the relationship between ENSO and western US precipitation is complex, and traditional ENSO indices sometimes show poor skill. Other studies point to the influence of tropical diabatic heating anomalies in the western tropical Pacific and the Indian Ocean, and work on subseasonal-to-seasonal (S2S) predictability is also relevant. Existing seasonal forecasting methods fall into dynamical, empirical, and hybrid approaches. Dynamical models, such as those in the North American Multi-Model Ensemble (NMME), have shown limited skill for precipitation, especially at longer lead times. Traditional statistical methods, although widely used, are constrained by the short observational record and cannot capture nonlinear relationships. Recent research has applied machine learning to seasonal forecasting and reported promising results when climate model simulations are used as training data.
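Traditional ENSO indices of the kind mentioned above are usually area averages of sea surface temperature (SST) anomalies over a fixed box in the equatorial Pacific. As an illustration only, here is a minimal sketch of the common Niño 3.4 index (5°S-5°N, 170°W-120°W), assuming gridded SST anomalies on a regular latitude-longitude grid with longitudes in 0-360°E. The function name `nino34_index` is hypothetical, and this sketch does not claim to be the ENSO predictor the study actually used.

```python
import numpy as np

def nino34_index(sst_anom, lat, lon):
    """Cosine-latitude weighted mean SST anomaly over the Niño 3.4 box.

    sst_anom: array of shape (time, lat, lon), anomalies in °C
    lat: 1-D latitudes in degrees north
    lon: 1-D longitudes in degrees east (0-360 convention assumed)
    """
    # Niño 3.4 box: 5°S-5°N, 170°W-120°W, i.e. 190°E-240°E
    lat_mask = (lat >= -5) & (lat <= 5)
    lon_mask = (lon >= 190) & (lon <= 240)
    box = sst_anom[:, lat_mask][:, :, lon_mask]
    # Weight each latitude row by cos(latitude) as a proxy for grid-cell area
    w = np.cos(np.deg2rad(lat[lat_mask]))[:, None] * np.ones((1, lon_mask.sum()))
    return (box * w).sum(axis=(1, 2)) / w.sum()

# Usage on a synthetic field: +1.5 °C inside the box, 0 everywhere else
lat = np.arange(-30.0, 30.1, 2.5)
lon = np.arange(0.0, 360.0, 2.5)
sst = np.zeros((2, lat.size, lon.size))
inside = (np.abs(lat) <= 5)[:, None] & ((lon >= 190) & (lon <= 240))[None, :]
sst[:, inside] = 1.5
index = nino34_index(sst, lat, lon)  # both values are approximately 1.5
```

Only the anomalies inside the box affect the result, which is exactly why a single index can miss the other tropical influences described above.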
Methodology
This study uses output from the Community Earth System Model Large Ensemble (CESM-LENS) for 1920-2005 as training data. Because each ensemble member simulates the full period, the training set contains thousands of seasons, a far larger sample than the observational record provides. Four machine learning models are employed: random forests (RF), extreme gradient boosting (XGBoost), neural networks (NN), and long short-term memory (LSTM) networks. The predictand is defined by applying k-means clustering to CESM-LENS precipitation data, yielding four clusters that represent large-scale spatial patterns of precipitation anomalies. These clusters are designed to match the spatial scales of the dominant sources of predictability (Rossby wave trains). Separate models are trained to predict the November-January (NDJ) and January-March (JFM) seasons, using predictors from the preceding months. The trained models are then tested on observations (1980-2020) to evaluate their predictive skill. Interpretability methods, including variable importance, pairwise interactions (examined with partial dependence and accumulated local effects, ALE, plots), and local interpretable model-agnostic explanations (LIME), are used to examine how the models reach their decisions and to identify the relevant physical processes.
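The cluster-based predictand can be illustrated with a minimal k-means sketch in NumPy. It assumes each season's precipitation anomaly map has already been computed and flattened into one row of a matrix; area weighting, masking to the western US, and any normalization the study applied are left out. The helper name `kmeans` and the synthetic data are hypothetical. Farthest-first initialization is used here for determinism; it is a simplification of k-means++ and not necessarily what the study did.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's k-means on rows of X with shape (n_seasons, n_gridpoints).

    Returns (labels, centroids); each centroid is a mean anomaly map.
    """
    rng = np.random.default_rng(seed)
    # Farthest-first initialisation: a random first centroid, then repeatedly
    # the sample farthest from every centroid chosen so far
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[d2.argmax()])
    centroids = np.array(centroids, dtype=float)
    for _ in range(n_iter):
        # Assign each season to its nearest centroid (squared Euclidean distance)
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster becomes empty
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

# Synthetic example: 100 "seasons" scattered around 4 distinct anomaly patterns
rng = np.random.default_rng(1)
patterns = 5.0 * np.eye(4, 20)                # 4 patterns over 20 grid points
true_group = np.repeat(np.arange(4), 25)
X = patterns[true_group] + 0.1 * rng.standard_normal((100, 20))
labels, centroids = kmeans(X, k=4)
```

Each row of `centroids` is the mean anomaly map of one cluster, and `labels` gives the class each season would receive as the prediction target.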
Key Findings
The machine learning models produce skillful predictions relative to baseline forecasts and NMME models, particularly for the JFM season. Accuracy is higher for JFM than for NDJ, with some models reaching 70-80% accuracy once similar clusters are combined. The 'widespread wet' pattern (cluster 2) was the hardest to predict. The Random Forest model, selected for detailed interpretability analysis, identified ENSO as the dominant predictor, consistent with prior knowledge. It also identified other important factors, such as velocity potential anomalies over the Indian Ocean/Maritime Continent and SST anomalies in the western tropical Pacific, which can modulate ENSO's influence on precipitation. Local explanations from LIME showed how specific combinations of predictors, including conditions beyond ENSO, shaped individual seasonal forecasts, both correct and incorrect. This suggests that the model learned complex, physically plausible teleconnections and can explain individual forecasts in real time.
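Combining similar clusters raises accuracy because confusions between the merged clusters stop counting as errors. A minimal sketch of that bookkeeping follows, together with one simple baseline, assumed here for illustration, that always predicts the most frequent training cluster. The labels are invented toy values with ids 0-3 that are unrelated to the study's cluster numbering, and the merge mapping is a hypothetical choice, not the grouping the study used.

```python
import numpy as np

def accuracy(y_true, y_pred, merge=None):
    """Fraction of seasons whose predicted cluster matches the observed one.

    merge: optional dict mapping each original cluster id to a merged class id.
    """
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    if merge is not None:
        y_true = np.array([merge[int(c)] for c in y_true])
        y_pred = np.array([merge[int(c)] for c in y_pred])
    return float(np.mean(y_true == y_pred))

def climatology_baseline(y_train, n):
    """Predict the most frequent training cluster for all n seasons."""
    values, counts = np.unique(y_train, return_counts=True)
    return np.full(n, values[counts.argmax()])

# Invented toy labels (not study data) for 10 observed seasons
y_obs = [0, 1, 2, 3, 1, 2, 0, 3, 2, 1]
y_hat = [0, 2, 2, 3, 1, 1, 0, 3, 2, 3]
merge = {0: 0, 1: 1, 2: 1, 3: 2}  # hypothetical: treat toy clusters 1 and 2 as one class
y_train = [0, 0, 1, 2, 3, 0, 1]

print(accuracy(y_obs, y_hat))         # 0.7
print(accuracy(y_obs, y_hat, merge))  # 0.9
print(accuracy(y_obs, climatology_baseline(y_train, len(y_obs))))  # 0.2
```

Two of the three toy errors are confusions between clusters 1 and 2, so merging them lifts accuracy from 0.7 to 0.9 without changing a single prediction.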
Discussion
The findings demonstrate the potential of training machine learning models on large climate model ensembles for skillful seasonal precipitation forecasting. Because these models match or exceed dynamical models in skill while remaining interpretable, they are a valuable tool for water resource management. The identification of non-ENSO factors that modulate ENSO's influence expands our understanding of seasonal precipitation predictability. Cluster analysis made it possible to examine predictive skill at different spatial scales and gave a focused view of forecast uncertainty. LIME increases trust in the forecasts by transparently explaining individual model predictions.
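The idea behind LIME can be sketched in a few lines: perturb the season being explained, query the trained model on the perturbed inputs, weight each perturbation by its closeness to the original, and fit a weighted linear model whose coefficients serve as the local explanation. This is a simplified NumPy version assuming continuous predictors perturbed with Gaussian noise; the reference `lime` package additionally discretizes features and selects a subset of them, which is omitted here. `lime_explain` and `toy_model` are hypothetical names, the latter a stand-in for a trained classifier's probability for one cluster.

```python
import numpy as np

def lime_explain(predict_proba, x0, scale, n_samples=5000, kernel_width=None, seed=0):
    """Simplified LIME-style local linear surrogate for one tabular instance.

    predict_proba: maps an (n, d) array to an (n,) probability for one class
    x0: (d,) predictor values for the season being explained
    scale: (d,) per-predictor standard deviations used for perturbations
    Returns (d,) local coefficients in standardized units.
    """
    rng = np.random.default_rng(seed)
    d = x0.size
    if kernel_width is None:
        kernel_width = 0.75 * np.sqrt(d)
    # 1. Perturb the instance in standardized units and query the model
    z = rng.standard_normal((n_samples, d))
    y = predict_proba(x0 + z * scale)
    # 2. Weight perturbations by proximity to x0 (exponential kernel)
    w = np.exp(-(z ** 2).sum(axis=1) / kernel_width ** 2)
    # 3. Weighted least squares: intercept plus one coefficient per predictor
    A = np.hstack([np.ones((n_samples, 1)), z])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef[1:]

# Hypothetical stand-in model: strong positive dependence on predictor 0,
# weak negative dependence on predictor 1, none on predictor 2
def toy_model(X):
    return 1.0 / (1.0 + np.exp(-(2.0 * X[:, 0] - 0.5 * X[:, 1])))

coef = lime_explain(toy_model, x0=np.zeros(3), scale=np.ones(3))
```

The recovered coefficients rank the predictors by their local influence on this one forecast, which is the kind of per-season explanation the text describes; they say nothing about the model's behaviour far from `x0`.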
Conclusion
This study demonstrates a novel approach to seasonal precipitation forecasting in which machine learning models are trained on large climate model simulations. The results show skill competitive with established dynamical models, and interpretability techniques reveal important physical teleconnections. Future research could use different climate models, incorporate additional sources of predictability such as the Quasi-Biennial Oscillation (QBO), and investigate more advanced machine learning architectures for further improvement.
Limitations
The study focuses on large-scale precipitation patterns; higher-resolution predictions would require changes to the methodology. Relying on CESM-LENS may introduce biases arising from the model's limitations in representing certain climate phenomena. The interpretability methods provide only approximations of the models' complex decision-making, especially when used to explain individual forecasts.
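Both feature-effect methods named in the Methodology are model-agnostic averages, which is one reason they only approximate what a model does. Below is a minimal NumPy sketch for a single predictor, assuming `predict` is any function returning one score per row (for example, a hypothetical classifier's probability for one cluster). Partial dependence overwrites the predictor with each grid value in every sample and averages the predictions, so with correlated predictors it can evaluate the model on unrealistic combinations; first-order accumulated local effects (ALE) instead average prediction differences only within quantile bins of the observed values and accumulate them. Function names and the toy model are illustrative, not the study's code.

```python
import numpy as np

def partial_dependence(predict, X, j, grid):
    """Mean prediction when feature j is set to each grid value for all rows."""
    out = []
    for v in grid:
        Xv = X.copy()
        Xv[:, j] = v
        out.append(predict(Xv).mean())
    return np.array(out)

def ale_1d(predict, X, j, n_bins=10):
    """First-order ALE of feature j, evaluated at quantile bin edges and centred."""
    x = X[:, j]
    edges = np.unique(np.quantile(x, np.linspace(0, 1, n_bins + 1)))
    # Bin k covers (edges[k], edges[k+1]]; the minimum value falls in bin 0
    idx = np.clip(np.searchsorted(edges, x, side="left") - 1, 0, len(edges) - 2)
    effects = np.zeros(len(edges) - 1)
    counts = np.zeros(len(edges) - 1)
    for k in range(len(edges) - 1):
        rows = X[idx == k]
        if len(rows) == 0:
            continue
        lo, hi = rows.copy(), rows.copy()
        lo[:, j], hi[:, j] = edges[k], edges[k + 1]
        # Local effect: average change in prediction across the bin
        effects[k] = np.mean(predict(hi) - predict(lo))
        counts[k] = len(rows)
    ale = np.concatenate([[0.0], np.cumsum(effects)])
    # Centre so the count-weighted mean effect over bins is zero
    mid = 0.5 * (ale[:-1] + ale[1:])
    return edges, ale - np.sum(mid * counts) / counts.sum()

# Toy additive model, so both curves should recover a slope of 2 for feature 0
def toy_predict(X):
    return 2.0 * X[:, 0] + X[:, 1] ** 2

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 2))
grid = np.linspace(-2, 2, 5)
pd_curve = partial_dependence(toy_predict, X, 0, grid)
edges, ale_curve = ale_1d(toy_predict, X, 0)
```

For this independent, additive toy case the two curves agree; with strongly correlated predictors they can disagree, which is the situation ALE was designed for.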