Introduction
South Korea, like many regions globally, is experiencing increasingly intense heatwaves, which bring significant health risks and mortality. Accurate and timely heatwave prediction is therefore crucial for effective mitigation. This study focuses on the role of teleconnections (the influence of distant land and ocean variability on local weather) in long-term heatwave prediction for South Korea. Interactions between land and ocean variables (snow depth, soil moisture, sea surface temperature, sea-ice concentration) and the atmosphere are complex, and traditional physics-based models such as General Circulation Models (GCMs) struggle to represent them. Data-driven models offer a potential solution because they can capture complex non-linear relationships and may improve prediction performance; within such models, variable selection is crucial for forecasting skill. This study therefore combines a machine learning model with explainable artificial intelligence (XAI) techniques to identify statistically significant teleconnection drivers, evaluate their predictability, and interpret the model's predictions to gain insight into the underlying physical mechanisms.
Literature Review
Previous research has explored the influence of various land and ocean variables on upper atmospheric circulation and regional climate variability. Studies have utilized correlation analysis on global SST anomalies to identify potential teleconnection drivers for predicting extreme weather events, including heatwaves. However, the complexity of teleconnection patterns and the limitations of traditional physical models necessitate the use of advanced data-driven techniques for improved prediction accuracy.
Methodology
The study used global monthly sea surface temperature (SST) and sea ice concentration (SIC) data, together with soil moisture (SM) and snow depth (SD) data. In situ daily maximum temperature data from 103 stations across South Korea were used to calculate the annual heatwave frequency (HF), defined as the total number of days exceeding 33°C in July and August. For predictor screening, points whose correlation with HF was statistically significant (|R| > 0.3, p < 0.05) were clustered with the DBSCAN algorithm, separately for each climate component (SST, SIC, SD, and SM), and each resulting cluster was treated as a teleconnection driver. Nine regression algorithms were compared, and the Light Gradient Boosting Machine (LGBM) performed best in terms of root mean square error (RMSE) and correlation coefficient (R). The LGBM model was then used to predict HF from the selected teleconnection drivers. Model performance was evaluated using leave-one-year-out cross-validation (LOOCV) and compared with a multiple linear regression (MLR) model and the Pusan National University Coupled General Circulation Model (PNU CGCM). SHAP (SHapley Additive exPlanations) values were used to interpret the model's predictions and assess the contribution of each teleconnection driver to HF prediction. Composite analysis was used to examine the relationship between the most significant drivers (snow depth variability in the Gobi Desert and Tianshan Mountains) and observed summer climate conditions in South Korea. The Structural Similarity Index Measure (SSIM) was used to compare the teleconnection patterns identified in this study with known large-scale atmospheric circulation patterns.
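The target variable and the validation scheme described above can be illustrated with a minimal sketch. It assumes a single daily maximum temperature series (how the 103 stations are combined is not stated here), and it uses a one-predictor least-squares line as a hypothetical stand-in for the LGBM model so that the example needs only the standard library. All function names are illustrative.

```python
import math
from datetime import date

HEATWAVE_THRESHOLD_C = 33.0   # threshold given in the text
SUMMER_MONTHS = (7, 8)        # July and August, as given in the text


def heatwave_frequency(daily_tmax):
    """Return {year: number of July-August days with Tmax above 33 °C}.

    daily_tmax maps datetime.date -> daily maximum temperature in °C for one
    (hypothetical) station or area-averaged series.
    """
    counts = {}
    for day, tmax in daily_tmax.items():
        if day.month in SUMMER_MONTHS:
            counts.setdefault(day.year, 0)
            if tmax > HEATWAVE_THRESHOLD_C:
                counts[day.year] += 1
    return counts


def fit_line(xs, ys):
    """Ordinary least squares with one predictor (stand-in for the real model)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def leave_one_year_out(xs, ys):
    """Predict each year from a model fitted on all the other years."""
    predictions = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        predictions.append(slope * xs[i] + intercept)
    return predictions


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)
```

In use, `heatwave_frequency` would supply one HF value per year, a driver index would supply the matching predictor values, and `rmse` and `pearson_r` would be computed between the observed HF series and the output of `leave_one_year_out`. Holding out a whole year at a time keeps each prediction independent of the year being predicted.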
Key Findings
The study identified 16 teleconnection drivers significantly correlated with annual heatwave frequency in South Korea. Snow depth (SD) variability in two regions, the Gobi Desert and the Tianshan Mountains, emerged as the most important and predictable drivers. The LGBM model outperformed both MLR and PNU CGCM in predicting HF, achieving a lower RMSE (3.151 days) and a higher R (0.644) in LOOCV. SHAP analysis showed that March-April-May snow depth in the Gobi Desert (MAM GD SD) and December-January-February snow depth in the Tianshan Mountains (DJF TM SD) had the highest mean absolute SHAP values, meaning they had the strongest influence on HF prediction. Removing either driver substantially reduced the model's predictive skill, and removing MAM GD SD had the larger effect. Composite analysis showed that the negative phase of MAM GD SD was associated with conditions conducive to heatwaves in South Korea: robust, vertically coherent atmospheric ridges, positive temperature anomalies, and negative precipitation anomalies. The positive phase of DJF TM SD was associated with similar atmospheric conditions. The teleconnection patterns of MAM GD SD and DJF TM SD were highly similar to known large-scale atmospheric circulation patterns (SCAND and DT-type), suggesting a link between these drivers and established teleconnection systems. A bootstrapping analysis indicated that the LGBM model's predictive behavior remained consistent as the sample size decreased, and hyperparameter tuning was used to optimize predictive performance.
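The ranking by mean absolute SHAP value can be illustrated with a small, self-contained sketch. It computes exact Shapley values by enumerating feature subsets and replaces "missing" features with a single baseline vector. This is a simplification chosen for clarity: it is not the `shap` library and not necessarily the estimator used in the study, and it is only practical for a handful of features. The toy model and all names are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of each feature for one sample.

    predict:  function taking a list of feature values and returning a number.
    x:        feature values of the sample being explained.
    baseline: reference values used for features that are "switched off".
    """
    n = len(x)

    def value(subset):
        # Features in `subset` take the sample's values, the rest the baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # size of the coalition that excludes feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                members = set(coalition)
                phi[i] += weight * (value(members | {i}) - value(members))
    return phi


def mean_abs_shap(predict, rows, baseline):
    """Average |Shapley value| per feature over all rows (global importance)."""
    totals = [0.0] * len(baseline)
    for x in rows:
        for i, contribution in enumerate(shapley_values(predict, x, baseline)):
            totals[i] += abs(contribution)
    return [total / len(rows) for total in totals]
```

For any sample, the values returned by `shapley_values` sum to `predict(x) - predict(baseline)`, so each one can be read as that feature's share of the departure from the reference prediction. Averaging their absolute values over all years gives a global importance score of the kind used to rank drivers such as MAM GD SD and DJF TM SD.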
Discussion
The findings highlight the critical role of teleconnections, particularly snow depth variability in the Gobi Desert and Tianshan Mountains, in influencing heatwave frequency in South Korea. The superior performance of the LGBM model compared to traditional statistical and climate models underscores the value of machine learning techniques for capturing the complex non-linear relationships between teleconnection drivers and heatwave occurrences. The XAI methods used in this study provide insight into the physical mechanisms connecting these drivers to heatwaves, suggesting that changes in snow cover in these regions can trigger large-scale atmospheric circulation patterns conducive to prolonged hot and dry conditions in South Korea. The high similarity between the identified teleconnection patterns and known large-scale circulation patterns points to a connection between regional land surface changes and established global teleconnection systems.
Conclusion
This study demonstrates the successful application of machine learning and explainable AI in identifying and understanding key teleconnection drivers for heatwave prediction in South Korea. The LGBM model, incorporating snow depth variability in the Gobi Desert and Tianshan Mountains, offers improved prediction accuracy compared to existing methods. Future research should focus on further investigation of the underlying physical mechanisms and exploring additional teleconnection drivers, particularly considering newly emerging patterns. The findings contribute to enhanced heatwave prediction capabilities and improved mitigation strategies in South Korea.
Limitations
The study's primary limitation is the relatively small sample size of annual heatwave frequency data, which could affect the model's generalizability. The LGBM model's sensitivity to hyperparameters also requires careful tuning, although a rigorous optimization process was implemented. The DBSCAN algorithm used for clustering may not capture all relevant drivers, and future studies could explore alternative approaches. The study focuses on South Korea; further research should explore the applicability of the findings to other regions.