Earth Sciences

Mapping wind erosion hazard with regression-based machine learning algorithms

H. Gholami, A. Mohammadifar, et al.

Hamid Gholami, Aliakbar Mohammadifar, Dieu Tien Bui, and Adrian L. Collins map wind erosion hazard in Isfahan province, Iran, using regression-based machine learning methods. Elevation (DEM), precipitation, and vegetation cover emerge as the dominant controls on susceptibility.

Introduction

Wind erosion has significant environmental and socio-economic impacts, affecting both human health and ecosystems. Identifying and predicting land susceptibility to wind erosion hazards (e.g., dust emissions) is essential for mitigation. Prior approaches (remote sensing, data mining, sediment fingerprinting) often require intensive field sampling and costly laboratory analyses, which limits their applicability over large areas. With advances in geospatial technology and computing, machine learning (ML) has been successfully applied to mapping various environmental hazards and soil properties. However, the utility of advanced regression-based ML techniques for predicting land susceptibility to wind erosion has not been comprehensively assessed. This study aims to fill that gap by evaluating 16 regression-based ML models for mapping wind erosion hazard susceptibility in Isfahan province, Iran, and by deriving broader recommendations.

Literature Review

Previous environmental studies have applied a wide range of ML models, including decision trees, linear models, PANFIS, genetic algorithms, SVR, ANN, hybrid models, random forest (RF), Wang and Mendel’s method, PLSR, PCR, Cubist, BART, RBF, XGBoost, and regression trees, to tasks such as mapping land subsidence, gully erosion, landslides, dust provenance, and soil properties. These studies have often highlighted the strong performance of RF and boosting methods for spatial prediction tasks (e.g., landslide susceptibility, PM2.5 estimation, soil carbon mapping). Despite these advances, a comprehensive comparison focused on regression-based ML for wind erosion susceptibility mapping has been lacking, which motivates the current comparative evaluation of 16 regression-based algorithms.

Methodology

Study area: Isfahan province lies in central Iran (30°45′59.51″–34°27′13.27″ N; 49°41′53.86″–55°30′13.67″ E). The climate is arid, and wind erosion is intensive in the southeast (Segzi plain) and in the northern parts of the province. Elevation ranges from 686 m to 4398 m (SRTM-derived DEM). Annual precipitation ranges from 72 to 320 mm, and mean temperature from about 18 °C in the east to 13 °C in the west.

Predictor variables (13 factors): soil properties (available water content, bulk density, calcium carbonate percentage, electrical conductivity, exchangeable sodium percentage, organic carbon content, soil texture); lithology (geology); land use; vegetation (NDVI); topography (DEM); and climate (wind speed, precipitation). Soil properties were extracted from the world soil map at 803 sampling points and spatially interpolated in ArcGIS 10.4.1. Lithology and land use were sourced from the Forests, Rangelands, and Watershed Management Organization of Iran (FRWMOI). NDVI was computed as (NIR − RED)/(NIR + RED). The DEM was derived from SRTM data at 30 m resolution. Wind speed (daily average) and total annual precipitation were derived from 23 meteorological stations and mapped in ArcGIS.

Inventory map and data partitioning: An inventory map of active wind erosion (detachment, transport, deposition) from FRWMOI delineated 10,961 km² (440 pixels) of active regions. Pixels were randomly split into training (70%; 308 pixels) and validation (30%; 132 pixels) sets.

Collinearity assessment: The tolerance coefficient (TC = 1 − R²) and variance inflation factor (VIF = 1/TC) were computed for each predictor, where R² is the coefficient of determination obtained by regressing that predictor on the remaining predictors. Values of TC < 0.1 and VIF > 10 indicate multicollinearity.

Models: Sixteen regression-based ML algorithms implemented via R (caret) were tested: RLR, Cforest (RF-based conditional inference trees), NCPQR (penalized quantile regression), NNFE, MMLPNN, RR (kernel ridge regression), BGLM, NBGLM, BGAM, SGAM, SSR, SGB (gradient boosting machine), SVM (linear kernel), RVM (polynomial kernel), Cubist, and ANFIS.
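The collinearity screen described above (TC = 1 − R², VIF = 1/TC, flagged when TC < 0.1 or VIF > 10) can be sketched in a few lines of Python. This is only an illustration, not the authors' workflow, which the source says was carried out in R. It relies on the standard identity that the VIF of predictor j equals the j-th diagonal element of the inverse of the predictors' correlation matrix. The function names and the toy predictor values are hypothetical and are not taken from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def collinearity_table(predictors):
    """Map each predictor name to (TC, VIF).

    VIF_j is the j-th diagonal entry of the inverse correlation matrix,
    and TC_j = 1 / VIF_j = 1 - R^2_j.
    """
    names = list(predictors)
    columns = [predictors[name] for name in names]
    corr = [[1.0 if i == j else pearson(a, b) for j, b in enumerate(columns)]
            for i, a in enumerate(columns)]
    inverse = invert(corr)
    return {name: (1.0 / inverse[j][j], inverse[j][j])
            for j, name in enumerate(names)}


def is_multicollinear(tc, vif):
    """Apply the thresholds quoted in the text: TC < 0.1 or VIF > 10."""
    return tc < 0.1 or vif > 10


if __name__ == "__main__":
    # Made-up values for illustration only; not data from the study.
    toy = {
        "bulk_density": [1.2, 1.3, 1.4, 1.5, 1.6, 1.7],
        "organic_carbon": [2.1, 1.8, 1.9, 1.2, 1.0, 0.9],
        "ndvi": [0.10, 0.30, 0.12, 0.25, 0.08, 0.20],
    }
    for name, (tc, vif) in collinearity_table(toy).items():
        print(f"{name}: TC={tc:.3f} VIF={vif:.2f} flagged={is_multicollinear(tc, vif)}")
```

Because TC and VIF are reciprocals, the two thresholds flag exactly the same predictors; reporting both simply follows the convention used in the text.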
Output and classification: Model outputs were continuous susceptibility values in [0,1], categorized into four classes: low (0–0.25), moderate (0.25–0.50), high (0.50–0.75), very high (0.75–1).

Performance evaluation: Root mean square error (RMSE), mean absolute error (MAE), mean bias error (MBE), and Taylor diagrams were used for both training and validation datasets. The best-performing model (lowest RMSE, MAE, MBE) was then used to compute the relative importance of the predictors.
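As a rough illustration of the classification and evaluation steps above, the following Python sketch bins a susceptibility value into the four classes and computes RMSE, MAE, and MBE. Two details are assumptions, because the text does not specify them: a value lying exactly on a class boundary (for example 0.25) is assigned to the upper class, and MBE is taken as the mean of predicted minus observed. The function names are hypothetical.

```python
from math import sqrt

# Upper bounds of the first three classes, as listed in the text.
# Assumption: a value exactly on a boundary goes to the upper class.
CLASS_BREAKS = [(0.25, "low"), (0.50, "moderate"), (0.75, "high")]


def susceptibility_class(value):
    """Map a continuous susceptibility value in [0, 1] to one of four classes."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("susceptibility must lie in [0, 1]")
    for upper, label in CLASS_BREAKS:
        if value < upper:
            return label
    return "very high"


def error_metrics(observed, predicted):
    """Return RMSE, MAE, and MBE for paired observed and predicted values.

    Assumption: residuals are predicted minus observed, so a positive MBE
    means the model overestimates on average.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    residuals = [p - o for o, p in zip(observed, predicted)]
    n = len(residuals)
    return {
        "RMSE": sqrt(sum(r * r for r in residuals) / n),
        "MAE": sum(abs(r) for r in residuals) / n,
        "MBE": sum(residuals) / n,
    }


if __name__ == "__main__":
    # Made-up values for illustration only; not results from the study.
    observed = [1.0, 1.0, 0.0, 0.0]
    predicted = [0.9, 0.8, 0.1, 0.4]
    print(error_metrics(observed, predicted))
    print([susceptibility_class(p) for p in predicted])
```

RMSE and MAE measure the size of the errors, while MBE keeps the sign and so shows whether a model tends to over- or under-predict; that is why a model can have an MBE near zero and still have a large RMSE.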

Key Findings

- Multicollinearity: TC and VIF showed no multicollinearity among the 13 predictors. Lowest TC was for EC; highest VIF was 5.93 (bulk density), both within acceptable limits (VIF < 10, TC > 0.1).
- Variable importance (from best model, MMLPNN): DEM (0.95), precipitation (0.80), and NDVI (0.54) were the most influential factors; other factors had lower importance.
- Model performance: Based on RMSE, MAE, MBE, and Taylor diagrams, MMLPNN was the most accurate model for mapping wind erosion hazard. Across evaluation datasets, five models showed low errors: MMLPNN, SGAM, Cforest, BGAM, and SGB. SSR and NBGLM had the lowest accuracies among the 16; NCPQR was identified as the overall worst model. For MMLPNN, the model used for the importance computation, RMSE, MAE, and MBE were all reported as < 0.002%.
- Susceptibility class areas (examples from top models, as a share of Isfahan province):
  • MMLPNN: low 32.8%, moderate 1.1%, high 1.2%, very high 64.9%.
  • SGAM: low 27.4%, moderate 5.6%, high 5.4%, very high 61.6%.
  • Cforest: low 26.0%, moderate 6.4%, high 6.6%, very high 61.0%.
  • BGAM: low 23.2%, moderate 7.8%, high 7.0%, very high 62.0%.
  • SGB: low 32.0%, moderate 0.6%, high 2.2%, very high 65.2%.
- Overall ranges across all 16 models: low 15.5–32.8%, moderate 0.6–15.7%, high 1.2–20.2%, very high 41.0–65.2%.
- Critical controls: Topographic elevation (DEM), precipitation, and vegetation cover (NDVI) were the dominant controls on wind erosion susceptibility in the study area.

Discussion

The study demonstrates that regression-based machine learning can effectively map land susceptibility to wind erosion, addressing the need for scalable, cost-efficient hazard assessment. The superior performance of MMLPNN suggests that monotonic multi-layer perceptron architectures can robustly capture complex, potentially monotonic relationships between environmental predictors and erosion susceptibility. SGAM’s strong performance underscores the value of flexible spline-based representations for non-linear relationships without specifying parametric forms. Cforest (RF with conditional inference trees) performed well, consistent with literature highlighting RF’s robustness for spatial predictions of environmental hazards. Boosting-based methods (BGAM, SGB) also achieved high accuracy, reflecting the strength of ensemble methods for improving predictive performance. The identification of DEM, precipitation, and NDVI as the most influential predictors aligns with physical understanding: terrain and climate govern erosivity and exposure, while vegetation modulates surface protection and roughness. The resulting susceptibility maps consistently indicate large fractions of the province in high to very high susceptibility, informing prioritization for mitigation. Collectively, these findings confirm that advanced regression-based ML provides accurate, generalizable tools for wind erosion hazard mapping.

Conclusion

This work provides a comprehensive comparison of 16 regression-based machine learning algorithms for mapping land susceptibility to wind erosion in an arid region of central Iran. Using 13 environmental and climatic predictors and an inventory of active wind erosion, the study found MMLPNN to be the most accurate model, with SGAM, Cforest, BGAM, and SGB also performing strongly. DEM, precipitation, and NDVI emerged as the key controlling factors. The results support the broader application of regression-based ML for wind erosion hazard mapping in arid and semi-arid ecosystems. Future research should compare regression-based and classification-based ML approaches for wind erosion and dust source mapping to strengthen evidence for management decisions.
