Introduction
Wind erosion poses significant socioeconomic challenges and harms terrestrial and marine ecosystems. Accurately predicting land susceptibility to wind erosion, particularly dust emissions, is crucial for mitigation efforts. While various techniques exist for investigating wind erosion (remote sensing, data mining, sediment fingerprinting), they often require extensive and expensive field sampling, which limits their applicability over large areas. Machine learning (ML) offers a promising alternative, with successful applications in mapping various environmental hazards. However, the potential of advanced ML techniques in predicting wind erosion susceptibility remains largely unexplored. This study aims to fill this gap by comprehensively evaluating the performance of sixteen regression-based ML models in mapping wind erosion hazard in Isfahan province, Iran, with the aim of producing more generalizable recommendations.
Literature Review
Previous research on wind erosion and its consequences has employed diverse methods, including remote sensing, data mining, and sediment fingerprinting. These methods, however, often involve intensive and costly field sampling and laboratory analyses, making them inefficient for large-scale assessments. Recent advancements in geospatial technology and computer science have led to increased use of machine learning in environmental hazard mapping, including land subsidence, gully erosion, landslides, and dust provenance. Existing ML applications in environmental research encompass decision trees, linear equation models, PANFIS, genetic algorithms, SVR, ANNs, hybrid models, RF, WM, PLSR, PCR, Cubist, BART, RBF, XGBoost, and regression tree analysis. Despite this progress, a comprehensive study applying advanced regression-based ML models to map wind erosion hazards remains absent in the literature, motivating this research.
Methodology
The study area is Isfahan province, central Iran, characterized by arid conditions and significant wind erosion. Thirteen factors influencing wind erosion were considered: available water content (AWC), bulk density, calcium carbonate percentage, digital elevation model (DEM), electrical conductivity (EC), exchangeable sodium percentage (ESP), land use, geology, precipitation, organic carbon content, normalized difference vegetation index (NDVI), soil texture, and wind speed. Multicollinearity among these factors was assessed using the tolerance coefficient (TC) and variance inflation factor (VIF). Sixteen regression-based machine learning algorithms were applied: Robust Linear Regression (RLR), Cforest, Non-convex Penalized Quantile Regression (NCPQR), Neural Network with Feature Extraction (NNFE), Monotone Multi-layer Perceptron Neural Network (MMLPNN), Ridge Regression (RR), Boosting Generalized Linear Model (BGLM), Negative Binomial Generalized Linear Model (NBGLM), Boosting Generalized Additive Model (BGAM), Spline Generalized Additive Model (SGAM), Spike and Slab Regression (SSR), Stochastic Gradient Boosting (SGB), Support Vector Machine (SVM), Relevance Vector Machine (RVM), Cubist, and Adaptive Network-Based Fuzzy Inference System (ANFIS). Model performance was evaluated using root mean square error (RMSE), mean absolute error (MAE), mean bias error (MBE), and Taylor diagrams on both training (70%) and validation (30%) datasets derived from a wind erosion inventory map. The best-performing model (MMLPNN) was used to determine the relative importance of the factors influencing wind erosion. Spatial maps of wind erosion hazard were generated for each model and classified into four susceptibility classes (low, moderate, high, very high).
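The multicollinearity check described above can be illustrated with a minimal sketch. It is not the study's implementation: it assumes the VIF of each factor is read from the diagonal of the inverse correlation matrix (equivalent to 1 / (1 - R²) from regressing that factor on the others) and that the tolerance coefficient is its reciprocal. The function names and the sample columns are hypothetical, and the code uses only the standard library so it runs without extra packages.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    k = len(columns)
    return [
        [
            sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
            for j in range(k)
        ]
        for i in range(k)
    ]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    # Augment the matrix with the identity on the right-hand side.
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular correlation matrix (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vif_and_tolerance(columns):
    """Return (VIF list, tolerance list), one value per predictor column."""
    inverse = invert(correlation_matrix(columns))
    vif = [inverse[j][j] for j in range(len(columns))]
    tolerance = [1.0 / v for v in vif]
    return vif, tolerance


# Hypothetical sample values for three factors (not data from the study).
sample = {
    "precipitation": [110.0, 95.0, 140.0, 80.0, 125.0, 160.0],
    "ndvi": [0.12, 0.10, 0.21, 0.08, 0.15, 0.25],
    "wind_speed": [6.1, 5.2, 4.8, 7.0, 5.9, 4.1],
}
vif, tol = vif_and_tolerance(list(sample.values()))
for name, v, t in zip(sample, vif, tol):
    print(f"{name}: VIF={v:.2f}, TC={t:.2f}")
```

A commonly cited rule of thumb flags a factor when its VIF exceeds roughly 5 to 10 (tolerance below roughly 0.1 to 0.2); the thresholds applied in the study are not stated in this summary.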
Key Findings
The multicollinearity test indicated no significant multicollinearity among the thirteen factors. The MMLPNN model exhibited the highest prediction accuracy, followed by SGAM, Cforest, BGAM, and SGB. DEM, precipitation, and NDVI were identified as the most important factors influencing wind erosion. The area proportions of the four susceptibility classes varied significantly among the models. For instance, the MMLPNN model classified 32.8% as low, 1.1% as moderate, 1.2% as high, and 64.9% as very high susceptibility. The SGAM model showed 27.4% low, 5.6% moderate, 5.4% high, and 61.5% very high susceptibility. Statistical indicators (MAE, MBE, RMSE) and Taylor diagrams confirmed the superior performance of MMLPNN, SGAM, Cforest, BGAM, and SGB. NBGLM and RVM demonstrated the lowest accuracies.
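The three statistical indicators used to rank the models can be sketched as follows. This is an illustrative implementation under stated assumptions, not the study's code: MBE is taken here as the mean of predicted minus observed values (some sources use the opposite sign), and the function names and sample values are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean square error: penalizes large errors more heavily."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error: average magnitude of the errors."""
    n = len(observed)
    return sum(abs(p - o) for o, p in zip(observed, predicted)) / n


def mbe(observed, predicted):
    """Mean bias error (assumed convention: predicted minus observed).

    Positive values indicate over-prediction on average, negative values
    under-prediction.
    """
    n = len(observed)
    return sum(p - o for o, p in zip(observed, predicted)) / n


# Hypothetical validation values (not data from the study).
obs = [0.0, 1.0, 1.0, 0.0, 1.0]
pred = [0.2, 0.9, 0.7, 0.1, 0.8]
print(f"RMSE={rmse(obs, pred):.3f}  MAE={mae(obs, pred):.3f}  MBE={mbe(obs, pred):.3f}")
```

Lower RMSE and MAE indicate a closer fit, while an MBE near zero indicates little systematic over- or under-prediction. Computing the metrics separately on the training and validation subsets helps expose overfitting.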
Discussion
The findings highlight the effectiveness of regression-based machine learning models in mapping wind erosion hazards, particularly MMLPNN, which offers a robust and accurate approach. The importance of DEM, precipitation, and NDVI aligns with established understanding of wind erosion processes. The variation in susceptibility class areas among models underscores the importance of model selection based on specific data characteristics and performance metrics. The superior performance of models like MMLPNN and SGAM, known for their ability to handle non-linear relationships, suggests the complex nature of wind erosion processes. The relatively poor performance of some simpler models (NBGLM, RVM) emphasizes the advantage of utilizing advanced ML techniques for complex environmental modeling. Future work could investigate the comparative performance of regression-based and classification-based models, improving the robustness of spatial wind erosion and dust source modeling.
Conclusion
This research evaluated sixteen regression-based machine learning algorithms for mapping wind erosion hazard in an arid region and showed that the approach is effective. The MMLPNN model emerged as the most accurate, highlighting the potential of advanced ML techniques for such applications. Future research should compare regression-based and classification-based ML models to support more robust spatial modeling of wind erosion and dust sources and to inform effective management strategies.
Limitations
The study's limitations include the reliance on existing spatial datasets, which may have inherent uncertainties. The accuracy of the wind erosion inventory map, used for model training and validation, influences the overall model performance. The study's regional focus might limit the generalizability of findings to other geographical areas with different climatic and environmental conditions. Further research could address these limitations by incorporating higher-resolution data, improving the inventory map, and extending the analysis to diverse geographical settings.