Wastewater-based epidemiology predicts COVID-19-induced weekly new hospital admissions in over 150 USA counties
X. Li, H. Liu, et al.
COVID-19 surges have repeatedly strained U.S. healthcare systems, with hospital occupancy reaching up to 90% at peaks and still accounting for substantial bed use into early 2023. Accurate, reliable forecasts of hospital admissions at actionable geographic scales are essential for preparedness and resource allocation. Traditional forecasts often rely on reported cases or prior admissions at state/national scales, but post-emergency changes in testing behavior, availability, and reporting reduce case data reliability, and county-level hospitalization dynamics vary due to demographics and resources. Wastewater-based epidemiology (WBE) offers unbiased, community-wide infection surveillance at low cost and can capture infection trends earlier than clinical data. This study aims to assess whether county-level WBE can predict weekly new COVID-19 hospital admissions 1–4 weeks ahead, quantify the contribution of demographics (CCVI), vaccination, and weather, evaluate the value of periodic model updates, and test transferability to new counties and states.
Prior forecasting of COVID-19 hospitalizations has been predominantly at national or state levels, leveraging confirmed cases or historical admissions, sometimes via ensemble models. However, case data may miss asymptomatic or untested infections and can lag admissions. WBE has shown strong correlations between SARS-CoV-2 RNA in wastewater and cases across many studies. Limited studies linked wastewater signals to hospitalizations with short lead times (1–8 days) and in a small number of localities over short durations, reducing generalizability. A state-level study in Austria suggested wastewater could predict hospital occupancy with 8–18 days’ lead, but demographic and health vulnerability factors (e.g., vaccination, chronic conditions) were not included. There is a gap for large-scale, county-level, weekly hospitalization predictions incorporating population vulnerability and other covariates, aligned with healthcare planning cycles.
Data and scope: Weekly county-level datasets were assembled for June 2021–January 2023 across the USA. Model establishment used June 2021–May 2022 data from 99 counties in 40 states; evaluation and transferability used June 2022–January 2023 data covering the 99 original counties plus 60 new counties (159 total) across 45 states.
Targets: three county-week hospitalization indicators (patients/100k): (1) weekly new admissions; (2) census inpatient sum (total patients occupying inpatient beds during the week); (3) census inpatient average (daily average inpatients during the week). For each target, four lead times relative to wastewater sampling were defined: Hos1w–Hos4w (1 to 4 weeks ahead).
Explanatory variables: common covariates across model types included CCVI indices (overall and 7 themes: socioeconomic status; minority/language; household/transportation; epidemiological; healthcare system; high-risk environment; population density), county population size, vaccination coverage (first and second dose, Vaccine_1st and Vaccine_2nd, % of total population), and weather (weekly mean air temperature Ta; wastewater temperature Tw derived from Ta; weekly precipitation).
Model types: (a) WBE-based: wastewater SARS-CoV-2 RNA concentration (CRNA, normalized to PMMoV) and Tw plus the common covariates; (b) case-based: weekly cases per 100k and test positivity plus the common covariates; (c) record-based: the concurrent-week value of the hospitalization indicator plus the common covariates.
Data sources: wastewater (Biobot Nationwide Wastewater Monitoring Network); hospitalizations (HealthData.gov, aggregated to county-week and scrubbed for redacted cells); CCVI (precisionforcovid.org/ccvi); vaccination (CDC); cases and positivity (USAFacts); weather (NOAA/EPA LCD).
Modeling: random forest regression models were built in R for each combination of indicator × lead time × model type (36 models).
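The indicator × lead time × model type grid described above can be sketched as follows. This is a hypothetical Python analogue (the study's models were built in R) using scikit-learn and synthetic county-week data; all variable names and values here are illustrative assumptions, not the study's data.

```python
# Minimal sketch of the model grid: 3 hospitalization indicators x 4 lead
# times x 3 predictor sets = 36 random forest models.
# Synthetic data only; the paper's R pipeline is not reproduced here.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 500  # synthetic county-week rows

# Common covariates shared by all model types (names follow the text).
common = pd.DataFrame({
    "ccvi_overall": rng.uniform(0, 1, n),
    "population": rng.integers(10_000, 1_000_000, n),
    "vaccine_2nd": rng.uniform(20, 90, n),      # % of total population
    "ta_weekly_mean": rng.uniform(-5, 30, n),   # air temperature, deg C
})

# Predictor blocks distinguishing the three model types.
predictor_sets = {
    "wbe":    common.assign(crna_pmmov=rng.lognormal(0, 1, n)),
    "case":   common.assign(cases_per_100k=rng.uniform(0, 500, n),
                            test_positivity=rng.uniform(0, 0.3, n)),
    "record": common.assign(current_week_value=rng.uniform(0, 40, n)),
}

indicators = ["new_admissions", "census_sum", "census_avg"]
leads = [1, 2, 3, 4]  # Hos1w-Hos4w

models = {}
for indicator in indicators:
    y = rng.gamma(2.0, 5.0, n)  # placeholder target, patients/100k
    for lead in leads:
        for mtype, X in predictor_sets.items():
            rf = RandomForestRegressor(n_estimators=50, random_state=0)
            rf.fit(X, y)
            models[(indicator, lead, mtype)] = rf

print(len(models))  # one fitted model per grid cell
```

In a real pipeline, `y` would be the indicator value shifted forward by `lead` weeks relative to the wastewater sampling week, rather than a placeholder.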
Establishment: for June 2021–May 2022, data were randomly split into training (70%), validation (15%), and test (15%) sets. Performance metrics were the correlation coefficient (R), mean absolute error (MAE), and normalized MAE (NMAE). Feature importance and significance were assessed via permutation (% increase in MSE) with rfPermute (5-fold CV, 5 repetitions), and one- and two-factor partial dependence analyses were used to interpret relationships, especially interactions with CRNA.
Evaluation and comparison: established models were applied prospectively to June 2022–January 2023 data to compare model types, indicators, and lead times using MAE/NMAE.
Progressive learning: selected WBE-based models for weekly new admissions (Hos1w–Hos4w) were updated every 4 weeks from June 2022 to January 2023, rebuilding each model with all data up to the prior week (80% training, 20% test at each update).
Transferability: progressive models trained on the 99 original counties were applied to 60 unseen counties in 30 states (including 5 new states); a second experiment incorporated the new-county data monthly to assess gains in transfer performance.
Residual diagnostics (e.g., ACF) verified error properties. Code and data access are provided (Zenodo DOI; sources listed).
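The evaluation metric and the 4-weekly progressive-updating loop can be sketched as below. This is a Python sketch on synthetic data, assuming NMAE is defined as MAE divided by the mean observed value; the study's actual R implementation and data are not reproduced.

```python
# Sketch of MAE/NMAE and progressive (4-weekly) model rebuilding.
# Assumption: NMAE = MAE / mean(observed). Synthetic data only.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))

def nmae(y_true, y_pred):
    return mae(y_true, y_pred) / float(np.mean(y_true))

rng = np.random.default_rng(1)
weeks = 32
rows_per_week = 99  # one row per county
X = rng.normal(size=(weeks * rows_per_week, 5))
y = np.abs(X @ rng.normal(size=5)) + rng.gamma(2.0, 1.0, weeks * rows_per_week)
week_idx = np.repeat(np.arange(weeks), rows_per_week)

scores = []
for update_week in range(4, weeks, 4):  # rebuild every 4 weeks
    past = week_idx < update_week       # all data up to the prior week
    X_tr, X_te, y_tr, y_te = train_test_split(
        X[past], y[past], test_size=0.2, random_state=0)  # 80/20 split
    rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_tr, y_tr)
    scores.append(nmae(y_te, rf.predict(X_te)))

print([round(s, 2) for s in scores])  # one NMAE per update cycle
```

Each cycle refits on the full accumulated history, mirroring the "rebuild with all data up to the prior week" scheme rather than a sliding window.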
- WBE predicts weekly new admissions best among indicators: For prospective prediction (June 2022–January 2023), WBE-based models achieved R = 0.81–0.82, MAE = 3.30–3.84 patients/100k, NMAE = 0.32–0.37 across 1–4 week leads for weekly new admissions. In contrast, for census inpatient sum: R = 0.59–0.67, NMAE = 0.53–0.76; for census inpatient average: R = 0.66–0.69, NMAE = 0.51–0.65.
- Comparison with alternatives: For weekly new admissions, case-based models yielded R = 0.40–0.51, MAE = 4.23–4.46, NMAE = 0.40–0.42; record-based models yielded R = 0.56–0.78, MAE = 3.90–4.63, NMAE = 0.38–0.45. Thus, WBE outperformed both in accuracy and offered consistent 1–4 week lead times.
- County-level error profile: MAE generally increased with higher admission levels; most counties had NMAE within 0.2–0.4.
- Feature contributions: CRNA was the dominant predictor for weekly new admissions (permutation importance: 50–67% MSE increase; p≈0.01), with diminishing relative importance at 4 weeks. Vaccination coverage contributed substantially (Vaccine_2nd 21–28%, Vaccine_1st 19–24% MSE increase; p=0.01–0.10). CCVI themes contributed 10–25%, with household/transportation and population density gaining importance at longer leads. Weather variables had limited impact (Tw, Ta: 7–14%); precipitation negligible (1–2%).
- Partial dependence: Higher vaccination coverage (especially Vaccine_2nd > 60%) reduced predicted admissions at a fixed CRNA. Higher vulnerability (overall CCVI and the household/transportation, epidemiological, and socioeconomic themes > 0.5) increased predicted admissions at a fixed CRNA.
- Progressive updates improve accuracy: Transitioning from batch to progressive learning reduced MAE from ≈4 to ≈3 patients/100k and NMAE from 0.32–0.37 to 0.28–0.29 across all leads; residuals were near white noise; county MAE bands narrowed (≈1–12 patients/100k).
- Transferability: Applying progressive models to 60 unseen counties yielded average MAE 7–8 patients/100k and NMAE 0.43–0.48 for 1–4 week leads. Incorporating new-county data monthly improved performance to MAE 4–5 (NMAE 0.31–0.35) for 1–3 week leads and MAE ≈6 (NMAE ≈0.45) at 4 weeks, without degrading performance on original counties (NMAE ≈0.27–0.28 for 1–3 week leads).
The study demonstrates that WBE can serve as a robust, county-level early warning system for weekly new COVID-19 hospital admissions, offering 1–4 week actionable lead times. Mechanistically, wastewater RNA reflects early infection and shedding dynamics, preceding clinical testing and hospital presentation, aligning with observed incubation, infectious periods, and median time to hospitalization. Consequently, WBE better captures upcoming new admissions than census occupancy metrics, which are influenced by variable lengths of stay and ongoing admissions from prior weeks. Incorporating population health (vaccination, epidemiological vulnerability) and transmission-related factors (population density, household/transportation) further refines predictions, with transmission context growing more important at longer horizons. Compared with case- or record-based models, WBE provides improved accuracy and timeliness, likely due to unbiased community coverage independent of testing behaviors and reduced lag relative to hospital records. Progressive updating significantly enhances performance and responsiveness to evolving immunity and variant landscapes, and models exhibit reasonable transferability that further improves when localized data are assimilated. These findings address the need for reliable, granular forecasts to support county-level healthcare preparedness and resource allocation.
This work establishes and validates wastewater-based random forest models that accurately forecast county-level weekly new COVID-19 hospital admissions 1–4 weeks ahead across the USA. WBE-based models outperform case- and record-based approaches for the target most relevant to operational planning (weekly new admissions), and their accuracy improves with progressive updates. Incorporating demographic vulnerability, vaccination, and minimal weather factors enhances performance and interpretability. The approach transfers reasonably to new counties and states and benefits from localized data assimilation. Practically, periodically updated WBE-informed models can provide early warning windows of approximately 5–28 days for healthcare systems. Future work should integrate time-weighted vaccination and prior infection histories to capture waning immunity, include timely variant/subvariant data, consider mobility-informed exposure metrics, and explore higher-resolution (e.g., daily) predictions where appropriate.
- Immunity data limitations: County-level booster coverage and timing, and infection-induced immunity were unavailable; immunity wanes over time and varies by variant, potentially affecting admissions under the same infection signal.
- Variant/subvariant effects: Lack of timely variant composition data (clinical/wastewater) prevented modeling variant-specific hospitalization risks during the study period.
- Retrospective design and population changes: Models trained on retrospective data may be sensitive to shifts in population structure (aging, relocation, seasonal movements).
- Mobility and catchment uncertainties: Despite normalization to PMMoV, population mobility and sewer catchment variability introduce noise not fully captured; mobility data were not available at required granularity.
- Indicator scope: Weekly aggregation aligns with resource planning but may limit applicability where daily forecasts are required; census occupancy metrics remain harder to predict due to variable lengths of stay and comorbidities.
- Generalizability and transfer: Initial transfer to new counties showed reduced accuracy, improved after localized data incorporation, indicating the need for periodic, region-specific updates.