Clustering micropollutants and estimating rate constants of sorption and biodegradation using machine learning approaches

Environmental Studies and Forestry

S. J. Lim, J. Seo, et al.

This study uses machine learning to cluster micropollutants in wastewater and to estimate their sorption and biodegradation rate constants. Seung Ji Lim and colleagues report that the approach reduces the effort needed to monitor environmental contaminants and achieves higher estimation accuracy than previously reported methods.

Introduction
The increasing use of various chemicals, including pharmaceuticals, personal care products, and pesticides, leads to the release of micropollutants (MPs) into aquatic environments via wastewater treatment plants (WWTPs). Traditional monitoring of MPs in WWTP effluent is costly and time-consuming due to the vast number of potential contaminants. This necessitates more efficient monitoring strategies. While monitoring individual MPs is resource-intensive, monitoring a representative subset could significantly reduce the workload and cost associated with comprehensive analysis. Previous studies have attempted to group MPs for more efficient monitoring; however, these methods have had limitations, including insufficient information for interpretation (dendrograms) or low prediction accuracy due to a lack of detailed chemical characteristics (clustering with biotransformation rules). This study aimed to develop a novel approach leveraging machine learning to improve the accuracy and efficiency of MP monitoring in WWTP effluent by identifying marker constituents and predicting rate constants.
Literature Review
Several clustering methods have been employed to group MPs based on their behavior in WWTPs. Dendrograms offer a visual representation of biodegradation rate constants but lack detailed information. Clustering methods based on biotransformation rules, using systems like the Eawag pathway prediction system (Eawag-PPS), are more interpretable but often suffer from insufficient prediction accuracy due to incomplete information regarding chemical characteristics such as functional groups. These limitations highlight the need for a more sophisticated approach that incorporates various physicochemical properties and provides higher predictive accuracy.
Methodology
Batch experiments were conducted under aerobic and anoxic conditions to capture MP removal characteristics under both conditions, and the removal of 42 MPs was evaluated using a pseudo first-order degradation kinetic model. The analysis then followed a two-step machine learning approach combining unsupervised and supervised learning. First, a self-organizing map (SOM) clustered 29 of the 42 MPs (the training dataset) under two input scenarios: (PF) physicochemical properties and functional groups, and (BT) initial biotransformation rules from Eawag-PPS. Ward's method was then applied to the SOM's distance map to define cluster boundaries. Second, a random forest classifier (RFC) was trained on the clustered data to identify marker constituents within each cluster and to classify the remaining 13 MPs (the testing dataset). The markers, which represent the average behavior of the MPs in their respective clusters, were used to estimate sorption coefficients (Kd) and biodegradation rate constants (kbio) for the unlabeled MPs in the testing dataset. Estimation accuracy was evaluated with equations that combine each marker's rate constant with standard deviations to define a range for the rate constants of the unlabeled MPs. Classification performance was assessed using accuracy, F1-score, precision, and recall; rate constant estimation was assessed using accuracy.
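To make the pseudo first-order kinetic model mentioned in the Methodology concrete, the following is a minimal Python sketch. The summary does not give the exact form of the model used in the study, so this assumes the simplest version, C(t) = C0 * exp(-k * t), with a single lumped rate constant k. The function names and the synthetic concentrations are hypothetical and are not data from the paper.

```python
import math


def fit_pseudo_first_order(times_h, concentrations):
    """Estimate a lumped rate constant k (1/h), assuming C(t) = C0 * exp(-k * t).

    Assumes the first sample is taken at t = 0, so C0 = concentrations[0].
    """
    c0 = concentrations[0]
    # Linearize the model: ln(C / C0) = -k * t
    ys = [math.log(c / c0) for c in concentrations]
    # Least-squares slope of a line forced through the origin
    slope = sum(t * y for t, y in zip(times_h, ys)) / sum(t * t for t in times_h)
    return -slope


def removal_efficiency(k, t_h):
    """Fraction of the compound removed after t_h hours under the same model."""
    return 1.0 - math.exp(-k * t_h)


# Synthetic (made-up) batch data generated with k = 0.2 1/h
times = [0, 1, 2, 4, 8]
conc = [100.0 * math.exp(-0.2 * t) for t in times]

k_est = fit_pseudo_first_order(times, conc)
print(f"estimated k = {k_est:.3f} 1/h")
print(f"removal after 8 h = {removal_efficiency(k_est, 8):.1%}")
```

With real batch data the fit would be run once per MP and per condition (aerobic or anoxic). A model that separates sorption (Kd) from biodegradation (kbio) would need additional terms that this sketch leaves out.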
Key Findings
The SOM analysis effectively clustered MPs based on their physicochemical properties and functional groups (PF scenario), achieving a classification accuracy of 0.75 for both aerobic and anoxic conditions. MPs with similar physicochemical properties and functional groups tended to cluster together, reflecting their similar behavior during wastewater treatment. Based on this clustering, 11 marker constituents were identified for each condition. Using the marker constituents, the RFC estimated the rate constants for unlabeled MPs (testing dataset) with an accuracy of 0.77, significantly higher than previously reported accuracies. The estimation accuracy varied with the number of standard deviations used to define the range of rate constants: a wider range improved accuracy but sacrificed precision. Comparing the PF and BT scenarios, PF consistently performed better in both classification and rate constant estimation. The Davies-Bouldin index (DBI), for which lower values indicate more compact and better-separated clusters, was lower for the PF scenario than for the BT scenario (0.49 vs. 0.87). When the model was applied to data from the literature obtained with different microbial communities, the BT scenario performed slightly better, suggesting that the optimal input features depend on the specific dataset and the predominant MP removal mechanisms.
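The trade-off described above between accuracy and precision can be illustrated with a short Python sketch. The summary does not reproduce the study's equations, so this assumes a symmetric range of the form marker value ± n × standard deviation, and counts an estimate as correct when the measured rate constant of an unlabeled MP falls inside the range of its assigned cluster's marker. The cluster names, marker values, standard deviations, and measured values are all made up for illustration.

```python
def marker_range(marker_value, std, n_std):
    """Assumed range: marker value plus or minus n_std standard deviations."""
    half_width = n_std * std
    return marker_value - half_width, marker_value + half_width


def estimation_accuracy(unlabeled, markers, n_std):
    """Fraction of unlabeled MPs whose measured value lies in the marker range.

    unlabeled: list of (assigned_cluster, measured_rate_constant)
    markers:   dict mapping cluster -> (marker_rate_constant, std)
    """
    hits = 0
    for cluster, measured in unlabeled:
        value, std = markers[cluster]
        low, high = marker_range(value, std, n_std)
        if low <= measured <= high:
            hits += 1
    return hits / len(unlabeled)


# Hypothetical markers: cluster -> (rate constant, standard deviation)
markers = {"A": (0.50, 0.10), "B": (2.00, 0.40)}
# Hypothetical unlabeled MPs: (cluster assigned by the classifier, measured value)
unlabeled = [("A", 0.55), ("A", 0.68), ("B", 2.30), ("B", 3.10)]

accuracy_by_n = {n: estimation_accuracy(unlabeled, markers, n) for n in (1, 2, 3)}
for n, acc in accuracy_by_n.items():
    low, high = marker_range(*markers["A"], n)
    print(f"n_std={n}: accuracy={acc:.2f}, cluster A range width={high - low:.2f}")
```

In this toy data the accuracy rises from 0.50 to 0.75 to 1.00 as the range widens from one to three standard deviations, while the range itself becomes three times as wide and therefore less informative.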
Discussion
The high accuracy achieved in estimating the rate constants for unlabeled MPs demonstrates the effectiveness of the proposed machine learning approach in reducing the monitoring effort required for comprehensive MP assessment in wastewater. The identification of marker constituents simplifies monitoring by focusing on a smaller subset of MPs that represent the behavior of larger clusters. This approach is particularly useful for emerging MPs, where data may be limited. The superior performance of the PF scenario, utilizing physicochemical properties and functional groups, suggests that these characteristics are more strongly predictive of MP behavior in WWTPs than the initial biotransformation rules alone. The differences observed between the PF and BT scenarios when using different microbial community datasets highlight the context-dependency of MP removal mechanisms, emphasizing the need for model customization based on specific operational conditions and microbial communities. Future research could investigate the applicability of the model to other wastewater treatment technologies and further explore the interactions between MP properties and removal mechanisms. The development of more comprehensive databases including a wider range of MPs and their physicochemical characteristics will further improve the accuracy and generalizability of the model.
Conclusion
This study successfully developed a machine learning-based approach for clustering micropollutants and estimating their sorption and biodegradation rate constants. The combination of SOM and RFC significantly improved the accuracy of rate constant estimations, enabling efficient monitoring using a smaller set of marker constituents. The findings highlight the importance of considering both physicochemical properties and functional groups for accurate prediction of MP behavior in WWTPs. Future work should focus on expanding the dataset and refining the model to enhance its robustness and generalizability.
Limitations
The study's limitations include the relatively small number of MPs in the dataset, which may affect the generalizability of the model. The accuracy of the rate constant estimations depends on the selection of markers and the number of standard deviations used. The model's performance might vary depending on the specific microbial community and operational conditions of the WWTP. Future studies should address these limitations by expanding the dataset and validating the model in a wider range of settings.