Introduction
Per- and polyfluoroalkyl substances (PFAS) are a class of persistent chemicals with known adverse health effects. Their widespread use since the mid-20th century has resulted in ubiquitous environmental contamination and detectable levels in human serum. Some longer-chain PFAS, such as PFOS and PFOA, have long elimination half-lives in humans, which raises the risk of exposure-related health effects such as elevated cholesterol, reduced infant birth weight, and certain cancers. Vulnerable populations, including infants, children, and those disproportionately burdened by environmental pollution, are at higher risk. The USEPA's Third Unregulated Contaminant Monitoring Rule (UCMR3) tested only a small percentage of US public water systems and found PFAS contamination in many states, including Colorado, but significant data gaps remain, particularly for private wells and smaller water systems. This study aimed to develop a PFAS groundwater contamination risk prediction map for Colorado to guide sampling prioritization by the Colorado Department of Public Health and Environment (CDPHE), so that limited resources can be directed efficiently toward protecting vulnerable communities.
Literature Review
Several studies have explored similar prediction models for PFAS contamination. These studies improved the understanding of PFAS dynamics in the environment, including the identification of potential sources and the factors that influence groundwater vulnerability. This work builds on them by incorporating a broader range of data sources specific to Colorado, with the goal of prioritizing limited water sampling resources.
Methodology
The researchers used random forest classification, a supervised machine learning technique, to create a predictive surface of potential groundwater contamination from PFOS and PFOA. The model drew on a diverse dataset: PFAS water sampling results from ten different sources (1232 data points); information on known and potential PFAS sources (fire stations, military installations, airports, landfills, wastewater treatment plants, etc., categorized and analyzed both individually and as density rasters); physical environmental characteristics (elevation, soil properties, geology, precipitation, groundwater flow, and land use); and population vulnerability data (Colorado's definition of disproportionately impacted (DI) communities, which considers low-income households, people of color, and housing cost-burdened households). Results below the limit of detection (LoD) were assigned a value of ½ the LoD. The model was trained on 75% of the data and validated on the remaining 25%. Model parameters (tree depth, number of trees) were tuned against the performance metrics (false positive rate, sensitivity, specificity, precision, accuracy). The final model classified PFAS risk into three categories: low (<5 ng/L), moderate (5 to <35 ng/L), and high (≥35 ng/L). Predictions were generated on a 1-mile grid across Colorado and interpolated into a continuous risk surface. Variable importance was assessed with the Gini importance measure (mean decrease in Gini impurity). A sampling prioritization plan was developed from the prediction map, considering unsampled public water systems (including transient non-community (TNC) and non-transient non-community (NTNC) systems), census blocks with high private well use, schools, and mobile home parks, particularly those located in DI communities. The distance to various potential source types was also analyzed to identify areas needing further investigation.
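The preprocessing steps described above (substituting a value for non-detects, binning concentrations into the three risk classes, and holding out 25% of the records for validation) can be sketched in plain Python. This is only an illustration under stated assumptions, not the study's code: the function names, the fixed random seed, and the use of `None` to mark a non-detect are all hypothetical, and the substitution fraction of one half is exposed as a parameter because it is an assumed convention.

```python
import random

# Class boundaries in ng/L, taken from the risk categories in the text.
LOW_MAX = 5.0
HIGH_MIN = 35.0


def substitute_lod(value, lod, fraction=0.5):
    """Return the measured value, or fraction * LoD for a non-detect.

    A non-detect is represented here (by assumption) as None or as any
    value below the reported limit of detection.
    """
    if value is None or value < lod:
        return fraction * lod
    return value


def risk_category(concentration):
    """Map a concentration in ng/L to 'low', 'moderate', or 'high'."""
    if concentration < LOW_MAX:
        return "low"
    if concentration < HIGH_MIN:
        return "moderate"
    return "high"


def train_test_split(records, train_fraction=0.75, seed=0):
    """Shuffle a copy of the records and split it into train and test parts."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    # Hypothetical samples: (measured value or None, LoD), both in ng/L.
    samples = [(None, 4.0), (12.0, 2.0), (80.0, 2.0)]
    labels = [risk_category(substitute_lod(v, lod)) for v, lod in samples]
    print(labels)
```

With 1232 records and a 75% training fraction, this split would yield 924 training and 308 validation records. A classifier such as a random forest would then be fitted on the training part only.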
Key Findings
The final random forest model achieved high sensitivity and precision for the "low" and "high" risk categories (85% and 90% for low risk; 80% and 71% for high risk, respectively), but lower performance for the "moderate" risk category (58% and 55%). Population density emerged as the most important variable influencing model predictions. Other important variables included ski resorts, soil permeability class, elevation, airports, annual average precipitation, AFFF spills, fire stations, and water flow direction. The model identified priority sampling locations, including 15 schools and 19 mobile home parks (of which 3 schools and 12 mobile home parks are located in DI communities), over 300 potentially at-risk public water systems (many of them very small, transient non-community, or non-transient non-community systems), and 20 priority census blocks in DI communities with high private well density. Analysis of distances to potential source types indicated a need for additional investigation of historical use of PFAS-containing AFFF, because these data are often lacking.
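Per-class sensitivity and precision, as reported above, are computed one class at a time from the validation predictions: sensitivity is the share of true members of a class that the model labels correctly, and precision is the share of predictions for that class that are correct. The sketch below shows that calculation in plain Python. The function name is hypothetical and the labels in the usage example are made-up toy data, not results from the study.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred, labels):
    """Return {label: {'sensitivity': ..., 'precision': ...}} for each label.

    Each class is treated one-vs-rest: true positives are records whose
    true and predicted labels both equal the class.
    """
    pairs = Counter(zip(y_true, y_pred))  # counts of (true, predicted)
    metrics = {}
    for label in labels:
        tp = pairs[(label, label)]
        fn = sum(n for (t, p), n in pairs.items() if t == label and p != label)
        fp = sum(n for (t, p), n in pairs.items() if t != label and p == label)
        # Guard against classes with no true members or no predictions.
        sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
        precision = tp / (tp + fp) if (tp + fp) else float("nan")
        metrics[label] = {"sensitivity": sensitivity, "precision": precision}
    return metrics


if __name__ == "__main__":
    # Toy labels for illustration only.
    y_true = ["low", "low", "high", "moderate"]
    y_pred = ["low", "moderate", "high", "moderate"]
    for label, m in per_class_metrics(
        y_true, y_pred, ["low", "moderate", "high"]
    ).items():
        print(label, m)
```

Reporting both numbers per class matters here because a three-class model can score well overall while still confusing the middle class with its neighbors, which is the pattern the moderate category shows.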
Discussion
The results provide a practical tool for prioritizing PFAS sampling in Colorado. Identifying specific locations with elevated predicted risk, particularly those serving vulnerable populations, helps focus limited resources effectively. Population density was the strongest predictor in the model, which indicates a correlation rather than a cause and suggests a need for further research into the factors driving it. The data gaps identified, particularly regarding historical PFAS use and the prevalence of PFAS in less-studied source types, underscore the need for expanded data collection to improve future model predictions. The study's approach of integrating multiple data sources with machine learning provides a flexible and adaptable framework for future risk assessments.
Conclusion
This study provides the first state-wide risk assessment for PFAS contamination in Colorado groundwater, enabling targeted sampling and resource allocation to protect vulnerable populations. The random forest model successfully integrated diverse data to predict PFAS contamination risk. Key priority areas were identified, including specific schools, mobile home parks, public water systems, and census blocks. Future work should focus on addressing data gaps, particularly investigating historical PFAS sources and improving data consistency across different sampling efforts. Annual model updates with refined data will further improve risk assessment and resource allocation.
Limitations
The study's limitations primarily stem from data availability and quality. High LoDs in some datasets, preferential sampling in certain areas, inconsistent PFAS analysis across datasets, and limited knowledge of PFAS releases from many potential sources affected model accuracy. The aggregation of PFOS and PFOA data, while necessary for model robustness given data inconsistencies, limited the ability to predict the full range of PFAS contamination. The lack of reliable statewide well depth data also constrained the model's predictive capabilities.