This study addresses the scarcity of reliable drinking water quality data in Ethiopia by integrating existing household survey data with machine learning techniques. Predictive models were developed using data from the 2016 Ethiopia Socio-Economic Survey (ESS) to identify households with contaminated drinking water (≥1 E. coli per 100 mL). The best-performing model (Random Forest) achieved high accuracy (88.5%) and discrimination (AUC 0.91), effectively predicting water quality based on demographic, socioeconomic, and geospatial variables. This model was successfully applied to other ESS waves lacking water quality testing, demonstrating its potential for filling data gaps in drinking water safety monitoring.
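The two reported metrics can be illustrated with a minimal sketch. Accuracy is the share of households classified correctly at a chosen probability cutoff. AUC is the probability that a randomly chosen contaminated household receives a higher predicted score than a randomly chosen uncontaminated one. The labels, scores, 0.5 cutoff, and function names below are hypothetical illustrations, not data or code from the study.

```python
# Illustrative only: toy labels and scores, not values from the ESS.
# Label 1 = contaminated (>=1 E. coli per 100 mL), 0 = not contaminated.

def accuracy(y_true, y_score, threshold=0.5):
    """Share of households classified correctly at the given cutoff."""
    preds = [1 if s >= threshold else 0 for s in y_score]
    return sum(p == t for p, t in zip(preds, y_true)) / len(y_true)

def auc(y_true, y_score):
    """Pairwise (Mann-Whitney) AUC: fraction of positive/negative pairs
    in which the positive case has the higher score; ties count half."""
    pos = [s for s, t in zip(y_score, y_true) if t == 1]
    neg = [s for s, t in zip(y_score, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted probabilities for eight households.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_score = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.2, 0.55]

print(f"accuracy = {accuracy(y_true, y_score):.3f}")
print(f"AUC      = {auc(y_true, y_score):.3f}")
```

Unlike accuracy, AUC does not depend on the cutoff, which is why the two figures are reported side by side.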
Publisher
npj Clean Water
Published On
Sep 08, 2023
Authors
Alemayehu A. Ambel, Robert Bain, Tefera Bekele Degefu, Ayca Donmez, Richard Johnston, Tom Slaymaker
Tags
drinking water quality
machine learning
E. coli
Ethiopia
predictive modeling
socioeconomic factors
water safety monitoring