Tracking and tracing water consumption for informed water sensitive intervention through machine learning approach

Environmental Studies and Forestry

A. T. Abraha, T. A. Woldeamanuel, et al.

This intriguing study, conducted by Abraha Tesfay Abraha, Tibebu Assefa Woldeamanuel, and Ephrem Gebremariam Beyene, delves into residential water consumption in Adama city, Ethiopia. It uncovers significant variances in water use across urban areas, emphasizing the urgent need for sustainable water management practices and conservation efforts.

Introduction
The study addresses the growing challenge of providing reliable access to clean water in developing nations, where progress toward water-related Sustainable Development Goals is lagging. Urban growth, centralized single-source supply, inadequate conservation practices, and limited adoption of alternative water sources contribute to physical and economic water scarcity. Within this context, residential (domestic) water consumption is a major sectoral driver. The research focuses on the water sensitive city framework—promoting diversified sources, integration of centralized and decentralized systems, and water-sensitive communities—to guide interventions. It identifies critical gaps: few quantitative, context-specific studies in developing countries; limited integration of socioeconomic, climatic, and spatial factors; scarce tracking and mapping of water flows; and limited application of machine learning to predict household water use. Using Adama city, Ethiopia, as a case, the study asks: Where are the water sources and how do they support urbanization? Which factors influence residential water consumption? How do neighborhoods differ in consumption and reliability? What practical water-sensitive interventions are suitable? The purpose is to build baseline evidence, identify key determinants, and demonstrate a predictive machine learning approach to support decision-making.
Literature Review
The paper synthesizes prior research showing that residential water use is influenced by complex socioeconomic, physical, climatic, technological, and spatial factors, with inconsistent findings across contexts. Studies report income, housing type and quality, household size, rooms, outdoor amenities, and garden characteristics as influential, though effects vary (e.g., per capita use tends to decline with larger families, but total use rises; income effects are sometimes significant and sometimes not). Educational background shows mixed associations. Climatic variables (temperatures, rainfall) and seasonality influence demand, often increasing use during hotter periods; however, local supply conditions can invert typical seasonal patterns. Prior works have used statistical models (e.g., multiple linear regression, probit, stepwise approaches), ANN, PCA, and other machine learning techniques to model consumption and identify determinants. Gaps highlighted include the predominance of studies in developed countries, limited integration of multi-domain determinants in developing nations, and insufficient use of spatial tracking/mapping and machine learning to support water-sensitive planning.
Methodology
Study design: A mixed-method data strategy combined top-down and bottom-up approaches. Top-down municipal data included city-level monthly water production, consumption, sectoral breakdowns, and non-revenue water (NRW). Bottom-up data came from a household questionnaire survey collecting monthly billed consumption and socioeconomic/parcel attributes, plus Likert-scale measures of conservation behaviors.
Sampling: The sampling frame was the municipal land inventory of residential parcels (95,823 households). Using simple random sampling with proportional allocation across three settlement zones (central, intermediate, periphery), 400 households were surveyed (central 141, intermediate 149, periphery 110), achieving a 100% response rate.
Variables: The response variable was daily household water consumption (liters/household/day). Independent variables included: socioeconomic (household size, monthly income, housing condition, parcel legal status, number of rooms, parcel area, location relative to city center, service reliability), climatic (mean monthly min/max temperature, annual total rainfall), and topographic (elevation from a digital elevation model (DEM), slope, aspect, topographic position index (TPI), terrain ruggedness index (TRI)). See Table 11 of the paper for details.
Data processing and spatialization: Socioeconomic records with unique IDs and coordinates were joined in GIS and interpolated (Kriging) to create raster layers. Topographic and climatic rasters were prepared from NMIE data and a 2 m contour digital topographic map to derive DEM, slope, TPI, TRI, and aspect. All predictor layers were rasterized and aggregated for modeling.
Data cleaning: Missing values were checked in SPSS; two missing entries were corrected via questionnaire back-check. Outliers were detected via Z-score (|Z|>3), and the two outliers found were imputed using median values.
Modeling: A Random Forest Regression (RFR) model was implemented in R (v4.0.5) using caret (ranger), with 16 predictor rasters. Data were split 90:10 into training and testing sets. Cross-validation used 10-fold CV repeated 5 times. The tuned hyperparameters were mtry and min.node.size, with splitrule held at maxstat. Feature importance was computed to rank determinants. Performance metrics included R², RMSE, MAE, NSE, KGE, and others (hydroGOF). A predictive raster map of household water consumption was generated.
Software: R (caret/ranger, terra, dplyr) and GIS applications were used for spatial data preparation, modeling, and mapping.
Ethics and data sources: Ethical approvals and permissions were obtained. Climate and topographic data were sourced from NMIE, and utility data from Adama Water Supply and Sewerage Service Enterprise.
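The goodness-of-fit metrics named in the methodology can be sketched in a few lines. The snippet below is a minimal illustration in plain Python, not the authors' R/hydroGOF code: it uses the standard textbook definitions of RMSE, MAE, Nash-Sutcliffe efficiency (NSE), and the 2009 formulation of Kling-Gupta efficiency (KGE), and it assumes that these match the variants used in the paper. The `observed` and `predicted` values are made-up numbers for demonstration only.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def _std(values):
    # Sample standard deviation (n - 1 denominator).
    m = _mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def rmse(obs, sim):
    # Root mean squared error, in the units of the response (L/day here).
    return math.sqrt(_mean([(s - o) ** 2 for o, s in zip(obs, sim)]))


def mae(obs, sim):
    # Mean absolute error, in the units of the response.
    return _mean([abs(s - o) for o, s in zip(obs, sim)])


def nse(obs, sim):
    # Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches predicting the observed mean.
    m = _mean(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    variance = sum((o - m) ** 2 for o in obs)
    return 1.0 - residual / variance


def pearson_r(obs, sim):
    mo, ms = _mean(obs), _mean(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    ss_obs = sum((o - mo) ** 2 for o in obs)
    ss_sim = sum((s - ms) ** 2 for s in sim)
    return cov / math.sqrt(ss_obs * ss_sim)


def kge(obs, sim):
    # Kling-Gupta efficiency (2009 form): combines correlation,
    # variability ratio (alpha) and bias ratio (beta); 1 is a perfect fit.
    r = pearson_r(obs, sim)
    alpha = _std(sim) / _std(obs)
    beta = _mean(sim) / _mean(obs)
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Hypothetical household consumption values (L/household/day), for illustration only.
observed = [420.0, 610.0, 380.0, 750.0, 560.0]
predicted = [450.0, 580.0, 400.0, 700.0, 590.0]

print(f"RMSE={rmse(observed, predicted):.2f}  MAE={mae(observed, predicted):.2f}")
print(f"NSE={nse(observed, predicted):.3f}  KGE={kge(observed, predicted):.3f}")
```

RMSE and MAE are expressed in the units of the response, while NSE and KGE are dimensionless and peak at 1, which is why the paper can report them side by side with R².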
Key Findings
- Water source and supply system: Adama now relies predominantly on the Awash River (~43,000 m³/day nominal; effective production ~40,663 m³/day). Boreholes are declining in viability (only 6 of 9 functional; ~3,024 m³/day total with irregular flow), with concerns over sustainability and water quality (fluoride).
- Supply-demand gap and infrastructure: 2023 average total daily demand (domestic and non-domestic) ~65,226.6 m³/day vs. supply ~40,663 m³/day, leaving ~24,563 m³/day unmet (≈38% of demand). The distribution network covers only ~45% of the city master plan; SDBI = 0.6. NRW is ~20% of production (2013–2023 range up to 22%).
- Sectoral consumption: Residential sector consumes ~73.15% of total (average annual 8,685,708 m³), followed by commercial (15.13%) and governmental (8.53%).
- Consumption levels: Average household consumption is ~586 L/household/day; average per-capita consumption is ~69.2 L/person/day (below the national standard of 80 L/person/day). Spatial differences: average per-capita consumption is highest in central areas (79 L), followed by intermediate (76 L) and periphery (47 L); per-household averages are intermediate 739 L, central 568 L, periphery 402 L.
- Reliability and access: Weekly supply frequency among 400 households: once (1.8%), twice (15.8%), thrice (12.5%), four times (11.5%), five times (15.0%), six times (11.0%), always available but restricted flow (32.5%). About 30% of households receive water on no more than three days per week; many neighborhoods rely on centralized point supply and shifting schedules. Decentralized truck tankers are used due to unreliability.
- Seasonal pattern: Unlike many cities, monthly residential consumption increases during the rainy season (June–September) due to increased river flow, higher production, and reduced rationing.
- Alternative sources: Estimated annual roof water harvesting potential per household (roof sizes 9–270 m²) ranges from ~6 to 181.9 m³, yet current traditional collection yields only ~1–3 m³.
- Determinants (variable importance, RFR): Most important predictors include total household size (100), very good housing condition (29.24), household monthly income (27.72), number of rooms (26.66), formal parcel ownership (18.15), parcel area (13.14), higher supply reliability (>3 times/week) (11.42), climatic variables (min temp 10.60; max temp 6.24; annual rainfall 4.07), location (intermediate 10.48), housing quality categories, and topographic factors (aspect 5.46; DEM 3.18; TPI 1.74). TRI and slope had minimal importance.
- Model performance: The best cross-validated training model used mtry=13, min.node.size=20, splitrule=maxstat, with RMSE ≈ 166.68 L/day, R² ≈ 0.774, MAE ≈ 114.28 L/day. Test performance: R² ≈ 0.77–0.88 (reported R²=0.88, bR²=0.77), RMSE ≈ 151.89 L/day, MAE ≈ 110.79 L/day; NSE=0.76, KGE=0.72, d=0.92, VE=0.82.
- Predictive mapping: Predicted household daily consumption across the city spans approximately 229–909+ L/household/day, with the highest bands exceeding 1,135 L/day in limited areas; clear spatial gradients align with settlement type, infrastructure, and socioeconomics.
- Equity and formality: Formal parcels consume ~2.4× more water than informal parcels. Higher housing quality households use up to ~3× more water than lower-quality ones.
- Conservation behaviors: Households frequently repair leaks and turn off taps while brushing/washing, but rarely install efficient fixtures (flow restrictors, low-flow showerheads/toilets) or reuse water; outdoor practices favor timing (watering early) but make limited use of alternative sources.
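The supply-demand figures reported in the findings above can be checked with simple arithmetic. The sketch below is an illustrative calculation in Python using only numbers quoted in this summary; the function and variable names are hypothetical and not taken from the paper.

```python
def unmet_demand(demand_m3_day, supply_m3_day):
    # Returns the absolute gap (m³/day) and the gap as a fraction of demand.
    gap = demand_m3_day - supply_m3_day
    return gap, gap / demand_m3_day


def per_capita_shortfall(standard_l_day, actual_l_day):
    # Returns how far actual per-capita use falls below the standard (L/person/day).
    return standard_l_day - actual_l_day


# Figures quoted in the findings (2023 averages).
DEMAND_M3_DAY = 65_226.6       # total daily demand, domestic and non-domestic
SUPPLY_M3_DAY = 40_663.0       # effective daily production
NATIONAL_STANDARD_L = 80.0     # national standard, L/person/day
ACTUAL_PER_CAPITA_L = 69.2     # survey average, L/person/day

gap_m3_day, gap_share = unmet_demand(DEMAND_M3_DAY, SUPPLY_M3_DAY)
shortfall_l = per_capita_shortfall(NATIONAL_STANDARD_L, ACTUAL_PER_CAPITA_L)

print(f"Unmet demand: {gap_m3_day:,.1f} m3/day ({gap_share:.1%} of demand)")
print(f"Per-capita shortfall vs. standard: {shortfall_l:.1f} L/person/day")
```

The gap is expressed as a share of demand rather than of supply, which is the convention behind the ≈38% figure quoted in the findings.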
Discussion
Findings directly answer the research questions. Adama’s water supply is effectively single-sourced (Awash River) and insufficient for current demand, with infrastructure coverage and reliability constraints causing physical and economic water scarcity. Residential consumption dominates city demand, and per-capita usage is below national standards, reflecting constrained access rather than sufficiency. Spatial heterogeneity shows intermediate areas with higher household totals (likely larger households/parcels) and central areas with higher per-capita consumption, consistent with service availability and socioeconomic status. The RFR model demonstrates strong predictive power and identifies key determinants—household size, housing quality, income, rooms, parcel legality, reliability, climatic and topographic features—aligning with and extending prior literature to the Ethiopian urban context. The seasonal peak during rainy months arises from supply-side relaxation (increased river flow and pressure), highlighting how supply reliability can invert typical climate-demand patterns. The results support water-sensitive interventions emphasizing demand management (behavior and devices), source diversification (rainwater, reuse), and fit-for-purpose allocations to reduce unmet demand, with transition pathways toward a water sensitive city integrating centralized and decentralized systems.
Conclusion
This study establishes a comprehensive baseline of water sources, production, consumption, and reliability for Adama city; quantifies residential consumption patterns; and develops a robust machine learning model to predict household water use and identify key determinants. Main contributions include: (1) evidence of a substantial supply-demand gap driven by single-source dependence and limited network coverage; (2) identification of socioeconomic, climatic, and topographic drivers, with household size, housing quality, income, and rooms as leading predictors; (3) demonstration that Random Forest Regression provides reliable predictive performance and spatially explicit consumption mapping for decision support; and (4) actionable water-sensitive intervention strategies. Recommended interventions include improving household conservation behaviors and retrofits, diversifying sources (surface/groundwater, rainwater harvesting, wastewater reuse), implementing fit-for-purpose allocation across sectors, and steering system design toward integrated centralized–decentralized, multi-source configurations characteristic of water sensitive cities. Future research should extend the approach to multiple cities to enhance generalizability and support broader policy design.
Limitations
- Case study scope: The analysis focuses on a single city (Adama), limiting generalizability to other urban contexts.
- Data constraints: Household behaviors and infrastructure conditions were captured via surveys and utility records; while cleaned and validated, measurement and reporting biases may remain. Climatic and topographic rasters are proxies and may not capture micro-scale heterogeneity.
- Modeling scope: The RFR model, while performant, is based on available predictors and cross-sectional billing data; dynamic price effects and temporal behavioral adaptations were not explicitly modeled. Broader validation across cities and time is suggested.