SARS-CoV-2 genomic surveillance in wastewater as a model for monitoring evolution of endemic viruses

M. Yousif, S. Rachida, et al.

This study shows how wastewater sequencing can track SARS-CoV-2 variants in South Africa. Mukhlid Yousif, Said Rachida, and colleagues identified multiple viral lineages that clinical surveillance often missed, and shifts in wastewater mutation profiles offered an early signal of upcoming lineage transitions.

Introduction
Clinical surveillance for SARS-CoV-2 relies on testing and sequencing samples from infected individuals. It is limited by health-seeking behavior, access to testing, and clinician practices, so it captures mainly symptomatic cases and varies across regions and over time. SARS-CoV-2 is shed in stool and urine and is detectable in wastewater, so wastewater-based epidemiology can overcome many of these limitations by sampling entire communities at lower cost. South Africa, a middle-income country with extensive sewerage infrastructure and over 1,000 wastewater treatment plants, established the South African Collaborative COVID-19 Environmental Surveillance System (SACCESS) to monitor SARS-CoV-2 in wastewater. The research question is whether wastewater genomic surveillance can identify and characterize SARS-CoV-2 variants and track their dynamics in South Africa, complementing clinical genomic surveillance, particularly as testing frequency declines. The study aims to assess concordance with clinical data, detect lineages missed clinically, and evaluate mutation patterns as early indicators of lineage transitions.
Literature Review
Prior studies demonstrated that wastewater genomic surveillance can recover complete SARS-CoV-2 genomes, mirror clinical lineage dynamics, and sometimes detect novel mutations and lineages before clinical identification. Real-time PCR and whole genome sequencing have been applied to wastewater for variant detection. Globally, over 70 countries monitor wastewater SARS-CoV-2 trends. In South Africa, the NGS-SA network has provided national genomic surveillance and first reported Beta and Omicron VOCs, while wastewater monitoring through SACCESS tracks viral trends. However, wastewater sequencing has been less widely applied in low- and middle-income countries, highlighting a gap this study addresses.
Methodology
Study design and sampling: 325 wastewater samples were collected from 15 wastewater treatment plants (WWTPs) in metropolitan areas of Gauteng, Eastern Cape, Western Cape, Free State, and KwaZulu-Natal, aligned with ongoing NICD wastewater quantification (April 2021 to January 2022).
Ethics: Human research ethics waiver approved (University of the Witwatersrand HREC R14/49).
Data deposition: Raw reads in NCBI BioProject PRJNA941107.
Sequencing and inclusion criteria: Libraries were sequenced using established protocols (Galaxy pipeline employing BWA mem, samtools, iVar, LoFreq). Of 325 samples, 229 (70.5%) had >1 million reads and were used for mutational analyses and heatmaps; 183 (56.3%) achieved >50% genome coverage (10x depth) and were included in lineage deconvolution (Freyja) analyses. No minimum coverage threshold was required for the signature mutation analysis, enabling inclusion of lower coverage samples.
Lineage deconvolution (Freyja): Samples with ≥50% genome coverage were analyzed using Freyja (v1.3.10), which leverages a barcode library of lineage-defining mutations and depth-weighted least absolute deviation regression to estimate relative lineage abundances in mixed wastewater samples.
Clinical comparator data: The NGS-SA sequenced randomly selected SARS-CoV-2-positive clinical samples using Oxford Nanopore Midnight or Illumina COVIDseq; South African sequences (April 1, 2021 to January 31, 2022) with ≤5% Ns and pangolin lineages were downloaded from GISAID for comparison. Custom Python scripts summarized monthly lineage prevalence.
Signature mutation analysis: Variant-defining spike protein amino acid mutations (signature mutations) were curated (Table S3) and used to infer variant presence from fragmented wastewater reads. Using ARTIC pipeline amino acid variation outputs, R (v4.2.0) scripts constructed matrices of mutation frequencies per sample and identified variants based on presence of signature mutations at >1% prevalence in GISAID. This enabled analysis of samples below the Freyja coverage threshold.
Spike mutation profiling and visualization: A heatmap of spike amino acid mutations across samples (chronologically ordered) was generated to visualize temporal mutation patterns across spike regions (SP, NTD, RBD, SD1/SD2, FP, HR1). An in-house R script produced a mutational dot plot highlighting uncommon mutations.
Identification of uncommon mutations: Uncommon mutations were defined as those with <1% prevalence in outbreak.info during the study period. Their occurrence over time and across sites was cataloged and compared with South African and global GISAID prevalence.
Software and code: Exatype v4.2.0-dev20230731; Galaxy (BWA mem 0.7.17.1, samtools 1.9, iVar 1.2.2, LoFreq 2.1.5); Freyja v1.3.10; custom R/Python scripts (public GitHub/Zenodo links).
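To make the idea of depth-weighted least absolute deviation regression concrete, the following is a minimal sketch for the simplest possible case: a mixture of only two lineages. It is not Freyja's implementation, which solves the full multi-lineage problem with a convex solver. The function name, the toy barcodes, and the use of raw read depth as the weight are all assumptions made for illustration.

```python
def deconvolve_two_lineages(barcode_a, barcode_b, observed_freq, depth):
    """Estimate the fraction p of lineage A in a two-lineage mixture.

    Minimizes sum_i depth[i] * |observed_freq[i] - (p*a_i + (1-p)*b_i)|
    over 0 <= p <= 1, where a_i and b_i are 0/1 barcode entries saying
    whether each lineage carries the mutation at site i.
    Weighting by raw depth is an assumption of this sketch.
    """
    def loss(p):
        return sum(
            d * abs(f - (p * a + (1 - p) * b))
            for a, b, f, d in zip(barcode_a, barcode_b, observed_freq, depth)
        )

    # The objective is convex and piecewise linear in p, so its minimum
    # lies at a bound (0 or 1) or at a breakpoint where one residual is zero.
    candidates = {0.0, 1.0}
    for a, b, f in zip(barcode_a, barcode_b, observed_freq):
        if a != b:
            p = (f - b) / (a - b)
            if 0.0 <= p <= 1.0:
                candidates.add(p)
    return min(sorted(candidates), key=loss)


# Hypothetical toy data: four sites, two carried only by lineage A and
# two carried only by lineage B. These are not values from the study.
barcode_a = [1, 1, 0, 0]
barcode_b = [0, 0, 1, 1]
observed_freq = [0.72, 0.68, 0.30, 0.31]
depth = [120, 80, 200, 15]

p_a = deconvolve_two_lineages(barcode_a, barcode_b, observed_freq, depth)
print(f"lineage A: {p_a:.2f}, lineage B: {1 - p_a:.2f}")
```

Because the loss is absolute rather than squared, the estimate behaves like a depth-weighted median of the per-site estimates, so a single noisy low-depth site has little influence on the result.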
Key Findings
- Sequencing yield: 325 wastewater samples processed; 229 (70.5%) had >1M reads and were included in mutation/heatmap analyses; 183 (56.3%) had >50% genome coverage for Freyja lineage deconvolution.
- Concordant variant waves: Wastewater variant dynamics mirrored clinical genomic surveillance nationally. Beta dominated early (April–May 2021), replaced by Delta from June 2021; Omicron emerged in November 2021 and rapidly dominated. Alpha was briefly detectable in June 2021.
- Lineage resolution: Freyja identified AY.45 as the dominant Delta lineage, with low-frequency detection of C.1.2 and B.1.1.X lineages leading up to Omicron. Cryptic circulation of A lineage viruses (e.g., A.25) and the Alpha–Delta recombinant XC (June 2021) were detected in wastewater but not reported in clinical data during the same period.
- Omicron sublineages and recombinants: Rapid rise of BA.1 followed by displacement by BA.2 by January 2022. Substantial BA.3 and Omicron BA.1–BA.2 recombinants (XE, XAD, XAP) were observed in wastewater; BA.3 was rarely seen in clinical samples (only three clinical detections, all in December 2021).
- Province-level dynamics: Gauteng showed early Beta dominance shifting to Delta and then Omicron; BA.2 was detected later in wastewater (January 2022). KwaZulu-Natal (KZN) exhibited early Delta dominance; Omicron became dominant in November 2021 with BA.2 co-dominant by January 2022; recombinants XE and XAP were detected. Free State showed Beta then Delta; Omicron dominated clinically from November 2021; BA.2 was not detected in wastewater. A lineage (A.25) circulation was detected in Eastern Cape wastewater despite limited sampling.
- Signature mutation analysis: Signature mutations were identified in 170/325 samples (52.3%): Gauteng (79), KZN (32), Free State (32), Western Cape (12), Eastern Cape (15). For focused analyses, 143 samples from Gauteng, KZN, and Free State were included, successfully recovering Beta→Delta→Omicron BA.1→BA.2 transitions; Alpha and C.1.2 signature mutations were detected in expected windows by province.
- Spike mutation landscape: 411 spike amino acid mutations observed; 68 occurred at >1% prevalence. Transition from Delta to Omicron featured loss of NTD mutations (E156del, F157del, R158G) and appearance of RBD (e.g., G339D, S371L, N440K, S477N, E484A, Q493R, G496S, Q498R), FP (N764K, D796Y), and HR1 (Q954H, N969K, L981F) mutations. Periods of low case incidence showed lower spike coverage and fewer detected mutations.
- Uncommon mutations: Ten spike mutations were detected at >1% prevalence in wastewater yet <1% in global clinical sequences on GISAID (e.g., S50L, H66Y, T250S, A288T, K444T, Q498H, D627H, L828F, T859N, Q1201K). Several were rare in South African clinical data as well. Some uncommon mutations have known functional associations (e.g., S50L reduced protein stability; Q498H increased ACE2 binding). These may represent cryptic lineages from unsampled infections or potential animal reservoirs.
- Temporal early signals: Shifts in spike mutational profiles in wastewater anticipated lineage transitions (e.g., Delta→Omicron), suggesting wastewater mutation surveillance can provide early warning of emerging variants.
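The uncommon-mutation screen in the findings above (more than 1% prevalence in wastewater, less than 1% in clinical sequences) amounts to a simple two-sided filter. The sketch below illustrates it under stated assumptions: the function name is hypothetical, the prevalence values are invented for the example rather than taken from the study, and a mutation absent from the clinical table is treated as having zero clinical prevalence.

```python
def uncommon_mutations(wastewater_prev, clinical_prev, threshold=0.01):
    """Return mutations above `threshold` in wastewater but below it clinically.

    Both arguments map mutation name -> prevalence as a fraction (0..1).
    Mutations missing from `clinical_prev` are assumed to have prevalence 0.
    """
    return sorted(
        mutation
        for mutation, ww in wastewater_prev.items()
        if ww > threshold and clinical_prev.get(mutation, 0.0) < threshold
    )


# Hypothetical prevalences for illustration only.
wastewater = {"Q498H": 0.04, "S50L": 0.02, "N440K": 0.35, "A288T": 0.005}
clinical = {"Q498H": 0.0002, "S50L": 0.001, "N440K": 0.40}

print(uncommon_mutations(wastewater, clinical))
```

In this toy input, N440K is excluded because it is common clinically, and A288T is excluded because it falls below the wastewater threshold.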
Discussion
Wastewater genomic surveillance provided a community-level, cost-effective complement to clinical genomic surveillance, capturing variant dynamics contemporaneously and detecting additional lineages and recombinant forms under-sampled clinically. The approach leveraged lineage deconvolution (Freyja) and signature mutation analysis to infer lineage prevalence even from fragmented, mixed viral RNA without relying on consensus sequences. Spike mutation profiling revealed characteristic mutational shifts corresponding to epidemiological waves and highlighted rare mutations and cryptic lineages potentially missed by patient-based sampling. These findings address the research question by demonstrating that wastewater genomics can reliably recover circulating variant prevalence, detect recombinants and rare mutations, and potentially flag emerging lineage transitions earlier than clinical data. This is especially relevant in periods of low clinical testing and in resource-limited settings, enhancing public health situational awareness and guiding response.
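Signature mutation analysis, as described above, infers which variants are present from individual mutations rather than from a consensus genome. The sketch below shows one way such a call could work. The signature sets are only the Delta NTD and Omicron RBD mutations named in this summary, not the curated Table S3, and the `min_freq` and `min_fraction` cutoffs are hypothetical parameters chosen for illustration.

```python
# Illustrative signature sets built from mutations named in this summary.
# They stand in for the study's curated table, which is not reproduced here.
SIGNATURES = {
    "Delta": {"E156del", "F157del", "R158G"},
    "Omicron": {"G339D", "S371L", "N440K", "S477N",
                "E484A", "Q493R", "G496S", "Q498R"},
}


def call_variants(sample_freqs, signatures=SIGNATURES,
                  min_freq=0.05, min_fraction=0.5):
    """Call variants whose signature mutations are sufficiently represented.

    sample_freqs maps mutation name -> read frequency (0..1) in one sample.
    A variant is called when at least `min_fraction` of its signature
    mutations are seen at frequency >= `min_freq` (both cutoffs are assumed).
    Returns {variant: fraction of its signature mutations detected}.
    """
    calls = {}
    for variant, mutations in signatures.items():
        hits = [m for m in mutations if sample_freqs.get(m, 0.0) >= min_freq]
        support = len(hits) / len(mutations)
        if support >= min_fraction:
            calls[variant] = support
    return calls


# Hypothetical sample: two Delta markers at high frequency, one Omicron
# marker at low frequency.
sample = {"E156del": 0.60, "R158G": 0.55, "G339D": 0.08}
print(call_variants(sample))
```

Because the call depends only on which mutations are observed, it still works when coverage is too patchy for whole-genome deconvolution, which is why no minimum coverage threshold is needed for this kind of analysis.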
Conclusion
This study demonstrates that wastewater sequencing can effectively track SARS-CoV-2 variant dynamics, largely concordant with clinical genomic surveillance, while revealing additional lineages (e.g., A.25, XC, and Omicron recombinants) and uncommon spike mutations not readily observed in clinical datasets. By combining lineage deconvolution and signature mutation analysis, the method recovers variant prevalence from mixed, fragmented RNA and can provide early indicators of lineage transitions via mutational profile shifts. Wastewater genomics thus offers a valuable adjunct to clinical surveillance for monitoring the evolution and spread of SARS-CoV-2 and, by extension, other endemic viruses. Future work should refine sampling and concentration methods, improve primer schemes to accommodate emerging variants, standardize best practices (particularly for low- and middle-income contexts), and advance bioinformatics tools that reduce reliance on prior lineage definitions to enhance early detection of novel variants.
Limitations
- Matrix and inhibition: Wastewater contains inhibitors and highly fragmented RNA, complicating amplification and sequencing; methodological refinements are needed to mitigate inhibition and improve recovery.
- Low prevalence settings: When community incidence is low, viral RNA concentrations can fall below detection limits, reducing amplification success and coverage, particularly between waves.
- Primer mismatch: Emergence of new variants (e.g., Omicron BA.1/BA.2) can impair primer binding, leading to uneven or poor coverage, notably in spike.
- Bioinformatics dependence: Tools like Freyja and signature mutation approaches rely on lineage definitions derived from clinical sequencing and public databases, potentially limiting detection of truly novel lineages.
- Coverage variability: Some samples failed to amplify or did not meet coverage thresholds, restricting inclusion in certain analyses; provincial data gaps occurred (e.g., limited amplification in specific months/regions).
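The coverage thresholds mentioned above can be checked per sample from a list of per-position read depths. The following sketch assumes such a list is already available (for example, parsed from the output of a tool like `samtools depth`) and uses hypothetical function names. It applies the 10x depth and 50% breadth cutoffs from this summary, using "at least 50%" as the rule, since the summary gives the cutoff as both >50% and ≥50%.

```python
def breadth_of_coverage(depths, min_depth=10):
    """Fraction of genome positions covered by at least `min_depth` reads."""
    if not depths:
        return 0.0
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)


def eligible_for_deconvolution(depths, min_depth=10, min_breadth=0.5):
    """True when breadth of coverage at `min_depth` reaches `min_breadth`."""
    return breadth_of_coverage(depths, min_depth) >= min_breadth


# Hypothetical per-position depths for two short toy "genomes".
good_sample = [0, 5, 12, 30, 10, 9, 100, 15]
poor_sample = [0, 0, 3, 12]

print(breadth_of_coverage(good_sample), eligible_for_deconvolution(good_sample))
print(breadth_of_coverage(poor_sample), eligible_for_deconvolution(poor_sample))
```

A sample that fails this check could still be used in the signature mutation analysis, which has no minimum coverage requirement.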