Using machine learning to predict the efficiency of biochar in pesticide remediation

Environmental Studies and Forestry

A. Nighojkar, S. Pandey, et al.

Discover how innovative research by Amrita Nighojkar, Shilpa Pandey, Minoo Naebe, Balasubramanian Kandasubramanian, Winston Wole Soboyejo, Anand Plappally, and Xungai Wang is leveraging ensemble machine learning to enhance biochar's efficiency in removing pesticide pollutants from water, revolutionizing agricultural practices and environmental remediation!

Introduction
The study addresses the challenge of pesticide contamination in water, a major environmental concern impacting biodiversity and human health, particularly in rapidly developing regions. While biochar has emerged as a promising, sustainable adsorbent for pesticide remediation in aqueous environments, its adsorption efficiency is influenced by numerous interacting factors, including biochar feedstock and production conditions, water matrix parameters (e.g., pH, temperature, initial pesticide concentration), experimental setup (dose, contact time), and pesticide properties. Existing experimental and review studies provide insights but struggle to quantify the combined effects and relative importance of these variables. The research question is whether ensemble machine learning models can accurately predict pesticide adsorption on biochar across diverse conditions and elucidate the relative contributions of physicochemical and experimental parameters to optimize biochar-based remediation in agricultural systems. The study proposes a data-driven framework using CatBoost, LightGBM, and Random Forest to derive generalizable insights that can guide efficient, sustainable biochar design and application.
Literature Review
Prior work has extensively examined biochar for pesticide and organic pollutant removal, noting mechanisms such as hydrogen bonding, surface complexation, electrostatic interactions, π–π interactions, van der Waals forces, and pore filling. Reviews and meta-analyses have assessed factors like feedstock and production parameters, but have been limited in jointly quantifying variable contributions. Machine learning has been applied in related pesticide domains (identification in real samples, dissipation in plants, impacts on soil microbiota, toxicity assessment) and in predicting adsorption for other contaminants. Molecular simulations have probed biochar–pollutant interactions at microscopic scales but have not captured the bulk physicochemical parameter effects in aqueous biochar systems for pesticides. The gap remains in applying ensemble ML to predict biochar-mediated pesticide adsorption and to rank feature importance across heterogeneous experimental conditions.
Methodology
- Dataset compilation: 96 peer-reviewed articles on pesticide adsorption in biochar-mediated aqueous systems were collected via Google Scholar, Scopus, and Web of Science. Extracted data included biochar properties, water matrix conditions, experimental factors, and adsorption capacities; details are provided in the Supplementary materials, and the dataset is available on GitHub.
- Response variable: Pesticide adsorption capacity (mg/g) was taken from tables and graphs (digitized with WebPlotDigitizer) or computed from the initial and post-treatment concentrations, the solution volume, and the biochar mass.
- Input attributes: Ten attributes were initially considered, grouped under biochar properties, aqueous matrix, and experimental conditions, including biochar surface area (SA), biochar pH (pH_BC), total pore volume (Vt), biochar dose (Dose), cation exchange capacity (CEC), solution pH (pH), initial pesticide concentration (Co), contact time (CT), and temperature (T). After preprocessing, nine features remained (CEC was excluded): eight inputs and one output.
- Missing data handling: Attributes with more than 50% missing values were dropped (CEC). Remaining missing values were imputed with the median to limit bias. Pearson correlation among the nine features was assessed.
- Final dataset: 878 records × 9 columns covering SA, Vt, pH_BC, pH, CT, Dose, T, Co, and the output (pesticide adsorption). Reported ranges (Table 1) include SA 0.25–2192 m²/g, Vt 0.0024–1.085 cm³/g, pH_BC 2.29–10.12, pH 2–10, CT 40–4320 min, Dose 0.001–8 g/L, T 288–333 K, Co 0.2–1000 mg/L, and adsorption 0.1–1120 mg/g.
- Modeling: Three ensemble decision-tree models were developed: CatBoost, LightGBM, and Random Forest (RF). Data were split by stratified random sampling into training (n=703) and testing (n=175) sets, roughly an 80/20 split. Hyperparameters were tuned by grid search, and five-fold cross-validation was used to mitigate overfitting.
- Evaluation metrics: The coefficient of determination (R²) and root-mean-squared error (RMSE) were computed for the training and test sets.
- Model interpretation: Feature importance was analyzed using CatBoost's built-in importance, SHAP values to interpret per-feature contributions, and one- and two-dimensional partial dependence plots (PDPs) to explore marginal and pairwise effects.
- Implementation: Python with scikit-learn and conda packages for CatBoost, LightGBM, and RF; plotting with pyplot.
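The data-preparation and evaluation steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, the capacity formula is the standard batch mass balance q = (C0 − Ce)·V/m (the summary only says capacity was computed from these quantities), and the numbers in the usage lines are made up for demonstration.

```python
import math
from statistics import median


def adsorption_capacity(c0_mg_per_l, ce_mg_per_l, volume_l, mass_g):
    """Batch mass balance: q (mg/g) = (C0 - Ce) * V / m.

    Assumed form; the summary only names the quantities involved.
    """
    if mass_g <= 0:
        raise ValueError("biochar mass must be positive")
    return (c0_mg_per_l - ce_mg_per_l) * volume_l / mass_g


def impute_median(column):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in column if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in column]


def rmse(y_true, y_pred):
    """Root-mean-squared error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative values only (not taken from the study's dataset).
q = adsorption_capacity(c0_mg_per_l=50.0, ce_mg_per_l=10.0,
                        volume_l=0.1, mass_g=0.05)
filled = impute_median([1.0, None, 3.0, 10.0])
```

In practice the same metrics are available as `r2_score` and `mean_squared_error` in scikit-learn, which the study reports using; the hand-written versions here only make the definitions explicit.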
Key Findings
- Predictive performance: CatBoost achieved the best performance with R²_train = 0.968 and R²_test = 0.956, outperforming LightGBM (R²_train = 0.931; R²_test = 0.862) and Random Forest (R²_train = 0.820; R²_test = 0.796). CatBoost showed a smaller train–test loss divergence than LightGBM, and RF errors were higher than those of both boosting models.
- Feature importance: Surface area (SA) was the most influential predictor, followed by initial pesticide concentration (Co), total pore volume (Vt), and biochar dose. SHAP analyses corroborated these rankings.
- Partial dependence insights:
• Textural properties: Increases in SA (≈0.25–1000 m²/g) and Vt (≈0.004–0.5 cm³/g) were strongly and positively associated with adsorption capacity, indicating more active sites and improved diffusion through interconnected pores.
• Dose and contact time: Adsorption capacity decreased at higher biochar doses; the highest capacities were observed for Dose < 1 g/L and CT < 500 min, likely because lower doses reduce particle aggregation and pore blockage and keep sites accessible.
• Water matrix: Adsorption capacity increased approximately linearly with Co across 0.2–2000 mg/L, driven by steeper concentration gradients that enhance mass transfer. pH and temperature had comparatively smaller effects within the studied ranges.
- Practical implication: CatBoost's test R² of about 0.96, combined with its interpretability, enables optimization of biochar design and operating conditions to maximize pesticide removal efficiency in water.
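To make the partial dependence idea concrete, the sketch below computes a one-dimensional partial dependence curve by hand: for each grid value, the chosen feature is overwritten in every record and the model's predictions are averaged. Everything here is an assumption for illustration: `toy_model` is a hypothetical linear stand-in whose coefficients are invented (it is not the study's CatBoost model), and the three records are made-up values.

```python
def partial_dependence_1d(predict, rows, feature, grid):
    """Average prediction over all rows with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = dict(row)       # copy so the original record is untouched
            modified[feature] = value  # overwrite only the feature of interest
            total += predict(modified)
        curve.append(total / len(rows))
    return curve


def toy_model(row):
    """Hypothetical stand-in model; coefficients are invented, not fitted."""
    return 0.05 * row["SA"] + 0.2 * row["Co"] - 3.0 * row["Dose"]


# Made-up records using the summary's feature names (SA, Co, Dose).
rows = [
    {"SA": 100.0, "Co": 10.0, "Dose": 0.5},
    {"SA": 500.0, "Co": 50.0, "Dose": 1.0},
    {"SA": 900.0, "Co": 100.0, "Dose": 2.0},
]

sa_grid = [0.0, 500.0, 1000.0]
pd_sa = partial_dependence_1d(toy_model, rows, "SA", sa_grid)
pd_dose = partial_dependence_1d(toy_model, rows, "Dose", [0.5, 1.0, 2.0])
```

With this toy model the SA curve rises and the Dose curve falls, mirroring the direction of the trends reported above. For a fitted tree ensemble, `sklearn.inspection.partial_dependence` performs the same averaging and also supports two-feature grids.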
Discussion
The findings demonstrate that ensemble ML, particularly CatBoost, can accurately predict pesticide adsorption on biochar across heterogeneous experimental conditions while quantifying the relative impact of key parameters. By highlighting the dominance of biochar textural properties (SA, Vt), initial pesticide concentration, and dose, the models address the central challenge of navigating multidimensional trade-offs that traditional single-factor analyses cannot capture. These insights support evidence-based optimization: researchers can design targeted experiments; producers can tailor biochar with optimal porosity and surface area; farmers and practitioners can select appropriate doses and contact times to enhance removal while minimizing resource use; and regulators can evaluate treatment effectiveness for runoff management. Incorporating more real-world field data will further improve generalizability, enabling robust deployment in diverse agricultural water systems to reduce pesticide loads and protect ecosystems and public health.
Conclusion
This study establishes a data-driven framework using ensemble machine learning to predict and interpret pesticide adsorption by biochar in aqueous systems. CatBoost provided superior accuracy (R²_test ≈ 0.956) and interpretability, revealing that biochar surface area, initial pesticide concentration, pore volume, and dose are principal determinants of adsorption capacity. The approach offers actionable guidance for optimizing biochar design and operational conditions to support sustainable agricultural water management. Future work should integrate richer descriptors of pesticide properties and biochar selectivity, incorporate field-scale and real-world datasets to enhance model robustness, and explore ML-assisted management strategies for post-adsorption challenges, including desorption risks and handling of pesticide-saturated biochar.
Limitations
- Data predominantly derived from laboratory-scale studies; limited representation of complex field conditions may affect external validity.
- Missing data required imputation; one key attribute (CEC) was excluded due to >50% missingness, potentially omitting relevant chemistry.
- Heterogeneity across studies (feedstocks, production methods, analytical protocols) may introduce unobserved confounding.
- The model focuses on adsorption capacity rather than long-term stability, regeneration, or fate of adsorbed pesticides, limiting end-to-end lifecycle insights.