Enhancing the explanation of household water consumption through the water-energy nexus concept

Z. Li, C. Wang, et al.

This study by Zonghan Li, Chunyan Wang, Yi Liu, and Jiangshan Wang uses the water-energy nexus to improve the explanation of household water consumption. Comparing OLS, random forest, and XGBoost models, it finds that adding energy-related features raises explanatory power (average R² from 0.33 to 0.45 across techniques), offering insights for sustainable resource management.

Introduction

Rapid growth in urban settlements is driving substantial increases in residential water consumption, which is projected to rise dramatically by 2050. Explaining household water consumption is critical for infrastructure planning, demand management, and conservation measures, yet existing models often leave more than half of the variance unexplained (typical R² < 0.50). Prior studies have focused on water-use behaviors, household demographics and economics, and housing characteristics, using OLS, ARIMA, ANNs, and tree-based methods. A key gap is the limited or absent consideration of the water-energy nexus, even though many household activities (laundry, bathing, cooking) consume water and energy jointly. In Beijing, up to 65.6% of household water use is associated with energy use, and 54.5% of electricity use is associated with water use, suggesting that energy features could act as proxies for variance that water-only models leave unexplained. This study assesses whether incorporating energy use (EU) and electricity consumption (EC) features improves explanatory power for annual household water consumption, comparing models with and without these features across OLS, random forest, and XGBoost using household-level data from Beijing.

Literature Review

Extensive research has modeled household water use using features such as water-use behaviors, demographics (e.g., family size, income, education), and housing attributes (area, type), with techniques including OLS, ARIMA, ANNs, and tree-based models. Despite this breadth, many models achieve R² < 0.50, indicating substantial unexplained variability. Studies addressing the water-energy nexus have often been narrow (e.g., focusing on specific appliances such as water heaters, or on hot water use), and few have integrated broader energy-use and total electricity features, largely because of data constraints. Prior evidence indicates strong coupling between household water and energy use across multiple regions and time scales, motivating a comprehensive integration of energy-related features to improve explanatory power in household water consumption models.

Methodology

Data were collected in 2020 through a self-designed, face-to-face household questionnaire in the Haidian and Tongzhou Districts of Beijing, using systematic sampling (1320 responses; 1257 valid after cleaning). The survey comprised 78 items consolidated into 24 features in five groups: household information (HI), water use (WU), energy use (EU), water consumption (WC), and electricity consumption (EC). Monthly WC and EC for each season were gathered (or converted from costs using local tariffs) and aggregated to annual totals by month-weighting. WU and EU items captured behavior frequencies and durations as well as appliance characteristics (e.g., washing machine power, water heater power and temperature, cooking and AC durations). Data cleaning involved completeness checks, cross-checking WC against computed totals, outlier removal using the 3-sigma rule, and LASSO-based feature selection to mitigate multicollinearity. The study followed a stepwise-like approach, constructing four models of annual water consumption: (1) HI+WU, (2) HI+EU, (3) HI+WU+EU, (4) HI+WU+EU+EC. Three techniques were applied: OLS multiple regression, random forest (RF), and XGBoost. For RF and XGBoost, hyperparameters were optimized via exhaustive grid search. Performance was evaluated using R², RMSE, and MAPE; for RF and XGBoost, each metric was averaged over 500 repeated runs. To quantify the explanatory power of individual features, Model (4) was refit with the best-performing technique (XGBoost), removing one feature at a time; the resulting changes in R², RMSE, and MAPE measured each feature's contribution. Feature importance for XGBoost Model (4) was computed using normalized impurity reductions (0–1 scale). All modeling was conducted in Python 3.10.
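
The evaluation metrics and the leave-one-feature-out procedure described above can be sketched in plain Python. This is a minimal illustration rather than the authors' code: the function names, the `score` callable, and the toy numbers are all assumptions introduced here. In practice `score` would fit a model (XGBoost in the study) on the given feature subset and return its R².

```python
from math import sqrt
from statistics import mean
from typing import Callable, Dict, List, Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error, in the units of the target."""
    return sqrt(mean([(t - p) ** 2 for t, p in zip(y_true, y_pred)]))


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error as a fraction (assumes no zero targets)."""
    return mean([abs((t - p) / t) for t, p in zip(y_true, y_pred)])


def leave_one_feature_out(
    features: Sequence[str],
    score: Callable[[List[str]], float],
) -> Dict[str, float]:
    """Change in score when each feature is dropped from the full set.

    A negative value means the score fell after removing that feature.
    """
    baseline = score(list(features))
    return {
        f: score([g for g in features if g != f]) - baseline
        for f in features
    }


# Toy data (hypothetical values, not taken from the study).
y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 180.0, 400.0]
print(r2_score(y_true, y_pred), rmse(y_true, y_pred), mape(y_true, y_pred))

# Hypothetical stand-in for "fit a model on these features and return R²":
# each feature simply adds a fixed amount to the score.
toy_gain = {"feature_a": 0.30, "feature_b": 0.15, "feature_c": 0.05}


def toy_score(subset: List[str]) -> float:
    return sum(toy_gain[name] for name in subset)


deltas = leave_one_feature_out(list(toy_gain), toy_score)
print(deltas)
```

Passing the scorer in as a callable keeps the loop independent of the modeling technique, so the same sketch would apply to any of the three techniques compared in the study.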

Key Findings
  • Incorporating energy-related features substantially improved model performance. Across techniques, moving from Model (1) HI+WU to Model (4) HI+WU+EU+EC increased average R² from 0.33 to 0.45 (+34.0%), reduced RMSE by 8.8%, and reduced MAPE by 8.7%.
  • Technique comparison: Machine learning outperformed OLS; XGBoost performed best overall. For Model (4): OLS R²=0.32; RF R²=0.52 (avg 0.50); XGBoost R²=0.55 (avg 0.52). Compared to OLS and RF, XGBoost increased average R² by 62.5% (0.32→0.52) and 4.0% (0.50→0.52), respectively; RMSE decreased by 16.4% (vs OLS) and 1.6% (vs RF); MAPE decreased by 19.4% (vs OLS) and 10.7% (vs RF).
  • Stepwise improvements with XGBoost: Model (2) HI+EU outperformed Model (1) HI+WU; adding EU to Model (1) to form Model (3) raised average R² by 12.2% (0.41→0.46), reduced RMSE by 4.8%, and reduced MAPE by 7.1%. Adding EC to Model (3) to form Model (4) further raised average R² by 13.0% (0.46→0.52), and reduced RMSE and MAPE by 5.1% and 3.8%, respectively.
  • Leave-one-feature-out explanatory power (XGBoost Model 4): Removing EC had the largest negative impact: ΔR² = −0.053 (~10.2% reduction), ΔRMSE = +1.786, ΔMAPE = +0.015, indicating EC’s indispensability. On average, removing one EU feature reduced R² by 0.008 vs 0.006 for WU, implying EU’s stronger explanatory power.
  • Feature importance (XGBoost Model 4): Family size was most important (~0.10), followed by housing location (~0.08). Cumulative EU importance (0.27) exceeded WU (0.26). Notable EU features included power_WH, duration_culinary, and duration_ac (each >0.05). Important WU features included frequency_laundry, frequency_bathing, and frequency_mopping (>0.05). EC was also important (~0.06).
  • Benchmarking: With similar sample sizes (≈1320±660), prior studies’ maximum R² was ~0.42 (avg ~0.33); this study’s XGBoost Model (4) achieved optimized R²=0.55 (avg 0.52), improving explained variance by at least 0.10 (≈23.8%).
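
The percentage changes quoted in the findings above are relative changes, (after − before) / before. The short check below recomputes several of them from the rounded R² values given in the bullets; the function name and labels are illustrative assumptions. Applied to the rounded cross-technique averages (0.33 → 0.45), the same formula gives about 36.4% rather than the reported 34.0%, which presumably reflects the percentage having been computed from unrounded values.

```python
def relative_change_pct(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# (label, before, after) using the rounded R² values quoted in the findings.
reported = [
    ("XGBoost vs OLS, Model (4)", 0.32, 0.52),
    ("XGBoost vs RF, Model (4)", 0.50, 0.52),
    ("XGBoost, Model (1) to Model (3)", 0.41, 0.46),
    ("XGBoost, Model (3) to Model (4)", 0.46, 0.52),
    ("Prior maximum vs this study's average", 0.42, 0.52),
]

for label, before, after in reported:
    print(f"{label}: {relative_change_pct(before, after):+.1f}%")

# Share of the average R² (0.52) lost when EC is removed (ΔR² = -0.053).
ec_loss_pct = 0.053 / 0.52 * 100.0
print(f"Removing EC: -{ec_loss_pct:.1f}% of R²")
```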
Discussion

The findings confirm that integrating the water-energy nexus, via energy use (EU) and electricity consumption (EC) features, captures a substantial portion of the variance left unexplained by traditional water-use-only models. EC exhibited the strongest individual explanatory contribution, while EU features consistently outperformed WU features, consistent with the fact that many end-uses (bathing, cooking, laundry) consume water and energy jointly. XGBoost's superior performance points to nonlinear relationships and interactions among HI, WU, EU, and EC features that linear OLS models do not capture well. The improvement over prior studies with comparable sample sizes demonstrates the practical value of nexus-informed, data-driven approaches. These results support using energy-related features as proxies for otherwise unexplained variance in water models, improving accuracy for planning and demand management. Although the approach is transferable in concept, models must be retrained and their hyperparameters re-tuned for other regions because of differences in demographics, behaviors, and the strength of water-energy coupling.

Conclusion

By explicitly incorporating water-energy nexus features—energy use behaviors/appliance characteristics and total electricity consumption—household water consumption models achieved markedly higher explanatory power. XGBoost emerged as the most suitable technique, highlighting nonlinear effects between features and water use. The approach provides a feasible, generalizable modeling basis to enhance municipal planning and conservation strategies and advances understanding of household-scale water-energy interactions. Future work should extend to longitudinal datasets to assess temporal trends, evaluate pandemic-related behavioral shifts, and investigate causal mechanisms through interventions and social experiments to design targeted demand-side measures.

Limitations
  • Cross-sectional data were used; absence of longitudinal series limits temporal inference.
  • COVID-19 likely altered water and electricity use behaviors, potentially affecting the strength and stability of the water-energy nexus and estimates of explanatory power.
  • The study emphasizes prediction rather than causal identification; future interventions and experimental designs are needed to establish causality.
  • Regional transferability requires retraining and hyperparameter re-optimization due to differing sociodemographics, behaviors, and climate/context.