Earth Sciences

Lightning nowcasting with aerosol-informed machine learning and satellite-enriched dataset

G. Song, S. Li, et al.

This study by Ge Song, Siwei Li, Jia Xing, and colleagues uses machine learning with aerosol features and geostationary satellite observations to improve lightning nowcasting. The model reaches 94.3% accuracy, and the analysis reveals that different aerosol types influence lightning occurrence in distinct, sometimes unexpected ways.

Introduction

Lightning is a major cause of natural-hazard fatalities and economic losses, so accurate and timely prediction is essential. While numerical models simulate lightning formation, they struggle to balance detection and false alarm rates and are computationally intensive. Observation-based, data-driven models offer a computationally efficient alternative. Machine learning models, such as LightGBM, have shown promise, but their accuracy is limited by insufficient training datasets and incomplete feature data. Previous studies relied heavily on ground-based networks and polar-orbiting satellites, which have limitations in detection efficiency and spatial coverage. The Geostationary Lightning Mapper (GLM) offers real-time observations with full spatiotemporal coverage, improving data quality and availability for lightning prediction models. Furthermore, current models overlook the significant impact of aerosols on lightning formation. Observational studies have demonstrated that aerosols affect lightning through their microphysical and radiative properties. Different aerosol components exert diverse influences; for example, some stimulate convection, while others suppress particle activation. Incorporating aerosol information into machine learning models is therefore expected to enhance the accuracy of lightning prediction. This study extends existing lightning nowcasting by combining aerosol optical depth, aerosol composition, and conventional meteorological variables with geostationary satellite observations from GLM, which serve as the primary lightning data source. The model's performance is evaluated using established nowcasting and forecasting metrics. The results demonstrate the efficacy of aerosol-informed machine learning in predicting lightning occurrence.

Literature Review

Existing literature highlights the challenges of accurate lightning nowcasting. Numerical weather prediction models, while capable of explicitly simulating lightning formation, often struggle to achieve both a high probability of detection and a low false alarm rate, limiting their practical application. Data-driven approaches using machine learning offer a more efficient alternative. Studies have explored various machine learning models, including artificial neural networks, decision trees, LightGBM, support vector machines, random forests, and recurrent neural networks. While these models show potential, they often suffer from high false alarm rates at high probability of detection levels, possibly because of insufficient training data and incomplete feature sets, in particular the absence of aerosol information. Geostationary satellites such as GOES-16, which carries the GLM sensor, offer a significant advancement: they provide real-time, high-resolution lightning data with full spatial coverage, which is crucial for improving model accuracy and temporal resolution. However, the influence of aerosols on lightning patterns has been largely overlooked in previous machine-learning-based lightning prediction models, despite observational evidence demonstrating a substantial impact. This study addresses these limitations by incorporating both advanced satellite data and aerosol information to improve lightning nowcasting.

Methodology

This study uses a LightGBM model for hourly lightning nowcasting. Lightning observations from the Geostationary Lightning Mapper (GLM) onboard GOES-16 serve as the primary data source and as the labels for the model. Meteorological data and aerosol data were obtained from the Copernicus Atmosphere Monitoring Service (CAMS) forecast products. The meteorological variables included surface pressure (SP), temperature at 500 hPa (T500), relative humidity at 500 hPa (SH), 10 m U-component wind speed at 500 hPa (UW), and 10 m V-component wind speed at 500 hPa (VW). Aerosol information included aerosol optical depth (AOD) for five aerosol components (black carbon, dust, organic carbon, sulfate, and sea salt) and surface PM2.5 concentration. All data were gridded to 0.25° × 0.25° spatial resolution and hourly temporal resolution. The LightGBM model's hyperparameters were optimized using a grid search strategy. To address data imbalance (fewer lightning-active cases), a focal loss function was implemented. The model was evaluated using 10-fold day-based cross-validation on the 2020 summer dataset and out-of-sample validation on the 2021 summer dataset. Model performance was assessed using accuracy, probability of detection (POD), false alarm ratio (FAR), critical success index (CSI), Heidke skill score (HSS), and the area under the precision-recall curve (PRC-AUC). The SHapley Additive exPlanations (SHAP) method was used to interpret the model and analyze feature importance. The model was designed to predict the presence or absence of lightning in the next hour; temporal information was captured by including day of year (DOY) and local hour (HH) as features. The proposed model, which incorporates aerosol information, was compared with a baseline model without aerosol information to assess the impact of aerosols on prediction accuracy.
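Two of the ingredients above, the focal loss and the day-based cross-validation, can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the function names `focal_loss` and `day_based_folds`, the default `gamma` and `alpha` values, and the shuffling seed are all assumptions made for illustration, and wiring a focal loss into LightGBM would additionally require its gradient and Hessian as a custom objective, which is not shown here.

```python
import math
import random
from collections import defaultdict


def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Binary focal loss for a single sample.

    p: predicted probability of lightning; y: label (1 = lightning, 0 = none).
    gamma and alpha are illustrative defaults, not values from the study.
    """
    p = min(max(p, eps), 1.0 - eps)            # keep log() finite
    p_t = p if y == 1 else 1.0 - p             # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma shrinks the loss of well-classified (easy) samples,
    # so the rarer, harder cases contribute relatively more.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def day_based_folds(days, k=10, seed=0):
    """Split sample indices into k (train, test) pairs, keeping each day whole.

    days: one day identifier per sample (for example, day of year).
    Every day appears in the test set of exactly one fold.
    """
    by_day = defaultdict(list)
    for idx, day in enumerate(days):
        by_day[day].append(idx)
    unique_days = sorted(by_day)
    random.Random(seed).shuffle(unique_days)   # hypothetical seed for repeatability
    folds = []
    for f in range(k):
        test_days = set(unique_days[f::k])     # every k-th shuffled day
        test = [i for d in unique_days if d in test_days for i in by_day[d]]
        train = [i for d in unique_days if d not in test_days for i in by_day[d]]
        folds.append((train, test))
    return folds


if __name__ == "__main__":
    # Hypothetical toy data: 30 days with 24 hourly samples each.
    days = [d for d in range(30) for _ in range(24)]
    for train, test in day_based_folds(days, k=10):
        print(len(train), len(test))
```

Splitting by day rather than by individual sample keeps hours from the same day out of both the training and test sets, which is presumably the motivation for a day-based scheme given how strongly consecutive hours are correlated.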

Key Findings

The LightGBM model, trained on the 2020 summer data and validated on the 2021 summer data, demonstrated excellent performance. The model achieved an accuracy of 94.3%, a POD of 75%, a FAR of 38.1%, and a PRC-AUC of 0.727. The model significantly outperformed baseline models (Persistence model, CAPE model) across all evaluation metrics (POD, FAR, CSI, HSS). The spatial distribution of model performance (POD and FAR) correlated with lightning density, with higher performance in regions of higher lightning activity (southeastern CONUS). The model showed high transferability, with only a slight reduction in performance when applied to the 2021 data. Using GLM data significantly improved the model compared to using only Lightning Mapping Array (LMA) data; the FAR was substantially lower with the GLM data (38.1% vs 56%). This improvement was attributed in part to the better spatial coverage and detection stability of GLM compared to ground-based networks. The inclusion of aerosol information significantly enhanced the model's performance, especially at higher POD levels (above 75%). The diurnal variation of aerosol optical depth (AOD) was strongly correlated with lightning occurrence, indicating its potential as a predictor. SHAP analysis revealed that sulfate aerosols had the most significant positive impact on lightning occurrence, followed by sea salt and organic compounds. Black carbon exhibited a negative impact. The model's performance was better in regions with high lightning density and high aerosol loading (southeastern and Midwestern CONUS) but was limited in regions with low lightning frequency or aerosol loading (western CONUS).
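For readers less familiar with the verification scores quoted above, the following sketch computes them from a 2 × 2 contingency table using their standard definitions. The function names and the example counts are hypothetical and are not taken from the study.

```python
def contingency_scores(tp, fp, fn, tn):
    """Standard categorical verification scores from a 2x2 contingency table.

    tp: hits, fp: false alarms, fn: misses, tn: correct negatives.
    """
    n = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / n,
        "POD": tp / (tp + fn),          # probability of detection
        "FAR": fp / (tp + fp),          # false alarm ratio
        "CSI": tp / (tp + fp + fn),     # critical success index
        "HSS": 2.0 * (tp * tn - fp * fn)
        / ((tp + fn) * (fn + tn) + (tp + fp) * (fp + tn)),  # Heidke skill score
    }


def csi_from_pod_far(pod, far):
    """CSI implied by POD and FAR, using 1/CSI = 1/(1 - FAR) + 1/POD - 1."""
    return 1.0 / (1.0 / (1.0 - far) + 1.0 / pod - 1.0)


if __name__ == "__main__":
    # Hypothetical counts for an imbalanced case: 40 lightning hours out of 1000.
    for name, value in contingency_scores(tp=30, fp=20, fn=10, tn=940).items():
        print(f"{name}: {value:.3f}")
    # Arithmetic on the rounded POD and FAR quoted above, not a reported result.
    print(f"implied CSI: {csi_from_pod_far(0.75, 0.381):.3f}")
```

With the hypothetical counts, accuracy is 0.97 even though CSI is only 0.5, because correct negatives dominate when lightning is rare. This is why POD, FAR, CSI, and HSS are more informative than accuracy alone for an imbalanced problem like this one. Applying the identity to the rounded POD of 75% and FAR of 38.1% gives a CSI of roughly 0.51.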

Discussion

The results demonstrate that the aerosol-informed machine learning model significantly improves lightning nowcasting accuracy. The integration of high-quality satellite data (GLM) and aerosol features from CAMS forecast products proved crucial in achieving superior performance compared to previous models. The model's ability to capture the relationship between aerosol composition and lightning activity aligns with existing scientific knowledge, providing valuable insights into the complex interactions between aerosols and lightning formation. The limitations in model performance in low-lightning-frequency areas highlight the challenge of handling imbalanced datasets in machine learning. Future work should focus on addressing data imbalance and incorporating additional features related to lightning formation. The superior accuracy of this model has important implications for enhancing emergency preparedness and mitigating economic losses associated with lightning strikes. The model performs best in regions of high lightning activity, which are also the regions where such predictions are most needed. The findings underscore the potential of integrating detailed aerosol information and advanced satellite observations into machine learning models to improve the accuracy and reliability of lightning nowcasting and forecasting.

Conclusion

This study presents a novel approach to lightning nowcasting that integrates aerosol information and high-resolution geostationary satellite observations. The resulting LightGBM model significantly outperforms existing methods, providing more accurate and reliable predictions. The findings highlight the importance of considering aerosols in lightning prediction and offer valuable insights into the complex interplay between aerosol properties and lightning activity. Future research could focus on improving the model's performance in regions with low lightning frequency, exploring other aerosol datasets, and incorporating additional relevant meteorological and environmental features to further enhance nowcasting capabilities. The development of more accurate and timely lightning nowcasting systems is crucial for reducing risks and losses associated with lightning events.

Limitations

The model's accuracy is limited in regions with low lightning frequency and low aerosol loading, primarily in the western CONUS. This is likely due to the challenge of handling imbalanced datasets in machine learning. The aerosol data used in this study are from CAMS forecast products, which may not fully capture the spatial and temporal variability of aerosols. The model's generalizability to regions outside the CONUS remains to be tested. Future studies should address these limitations by improving the data balance, using more accurate aerosol observations, and testing the model in diverse geographical regions.
