Evaluation of the Impact of Concentration and Extraction Methods on the Targeted Sequencing of Human Viruses from Wastewater

Environmental Studies and Forestry

M. Jiang, A. L. W. Wang, et al.

Sequencing human viruses in wastewater is challenging because of their low abundance. This study compared four concentration/extraction methods (Innovaprep, Nanotrap, Promega, Solids) and found that the recovered virus profiles were method-dependent. Innovaprep ultrafiltration yielded the highest sequencing sensitivity and enabled assembly of near-complete genomes, while Promega and Nanotrap were more sensitive for SARS-CoV-2 by dPCR.
Introduction

The study addresses how wastewater virus concentration and nucleic acid extraction methods influence the performance of probe-capture targeted sequencing of human viruses from wastewater. Wastewater-based epidemiology (WBE), widely deployed during COVID-19, has relied heavily on PCR for specific targets, but sequencing offers simultaneous genome-level monitoring of many viruses. Amplicon approaches are limited for novel viruses, while untargeted metagenomics often yields very low proportions of human viral nucleic acids. Probe-capture enrichment, adapted from clinical applications, can relax sequence-matching constraints relative to PCR and enrich divergent targets, but in wastewater the enriched reads can still be dominated by non-target viruses (e.g., bacteriophages and plant viruses). Prior work suggests that upstream processing steps affect viral sequence recovery and composition, yet direct comparisons for probe-capture panels are limited. The study therefore compares four commonly used concentration/extraction workflows (Innovaprep ultrafiltration following solids removal, Nanotrap affinity capture, Promega direct extraction, and pelleted solids extraction) at a constant input volume and evaluates their impact on enrichment sequencing with the Illumina Virus Surveillance Panel (VSP). Aligning wet-lab workflows and bioinformatic processing with surveillance goals is important for improving the sensitivity, richness, and recovery of human viral genomes from wastewater.

Literature Review

Existing literature shows that: (1) PCR-based WBE for respiratory and enteric viruses is effective, and tiled amplicon panels (e.g., ARTIC for SARS-CoV-2, HAdV-F41, and Swift panels) enable subtyping and tracking but struggle with novel viruses because of the complexity of primer design. (2) Untargeted deep sequencing provides comprehensive snapshots of viral diversity, but human viruses make up a tiny fraction of the data (≈0.011% of unique reads or ≈0.1% of assembled contigs). (3) Probe-capture enrichment from clinical research has been adapted to wastewater and can greatly increase viral read proportions (up to ~81%, compared with untargeted sequencing), yet most recovered viral content can still be bacteriophages and plant viruses (>80%), highlighting the limits of target enrichment against a high background. (4) Pre-COVID wastewater virus sequencing relied on large-volume, time-intensive concentration methods (e.g., PEG precipitation, skim milk flocculation, ultracentrifugation, membrane filtration) that produced method-dependent profiles and limited recovery of enveloped viruses. (5) During COVID-19, streamlined, lower-volume methods (e.g., Innovaprep CP-Select, centrifugal ultrafiltration, Nanotrap beads, HA membranes, Promega Wizard Enviro, and solids extraction) were widely adopted for qPCR/dPCR and later for sequencing, with varying success. McCall et al. compared methods at different volumes and suggested that direct extraction yields lower equivalent volumes than prefiltered samples; Spurbeck et al. indirectly compared five methods across locations and found that Innovaprep gave the highest recovery for untargeted RNA sequencing (mainly of bacteriophages). (6) Few studies directly compare the impact of concentration/extraction on probe-capture targeted sequencing, which motivates this work to control input volume and compare method-induced biases in sequencing performance.

Methodology

Sample collection: 24-h composite influent wastewater was collected on March 1, April 19, and April 26, 2023 from the East Bay Municipal Utility District (EBMUD) wastewater treatment plant (Alameda County, CA; serving ~700,000 people). Twelve 40 mL aliquots were prepared per date. Bovine coronavirus (BCoV) vaccine (Merck) was resuspended and diluted 10-fold, and 50 µL was spiked into each aliquot as a process control before overnight incubation at 4 °C. Concentration/extraction methods (each performed in triplicate per date, with PBS negative controls):

  • Innovaprep CP-Select ultrafiltration (IP): Add 400 µL of 5% Tween 20, mix by inversion, and centrifuge at 7000 g for 10 min; ultrafilter the supernatant on the CP-Select and elute (eluate volumes 160–882 µL). Extract TNA from up to 200 µL of eluate using the Qiagen AllPrep PowerViral DNA/RNA kit (liquid protocol) and elute in 100 µL.
  • Nanotrap Microbiome A beads (NT): Add 115 µL ER2 and 600 µL Microbiome A particles, mix/incubate; magnetically separate; wash with 1 mL water; add 600 µL preheated PM1 + BME (AllPrep PowerViral), heat 95 °C 10 min; remove beads; proceed with AllPrep liquid protocol, elute 100 µL.
  • Promega Wizard Enviro TNA direct extraction (PMG): Add 0.5 mL protease to 40 mL sample, incubate 30 min; centrifuge 3000 g for 10 min; add binding buffers + isopropanol; pass through PureYield binding column; wash; elute in 1 mL nuclease-free water; further purify/concentrate on PureYield Minicolumn; final elution 100 µL.
  • Solids pelleting and extraction (Solids): Centrifuge 40 mL at 20,000 g for 10 min; extract from 0.25 g of wet solids using the AllPrep PowerViral solids protocol, including 10 min of bead beating; elute in 100 µL.

Quantification: DNA and RNA were quantified with the Qubit dsDNA HS and RNA HS assays. Aliquots were stored at −20 °C for dPCR within 1 week and at −80 °C for sequencing.

dPCR: Performed on a QIAcuity Four (Qiagen) using the OneStep Advanced Probe Kit on 8.5k or 26k nanoplates. Positive controls were a linearized plasmid (SARS-CoV-2) or a gBlock dsDNA fragment (BCoV). Valid partitions per well ranged from ~7920 to 8269 (8.5k) and from 12,548 to 25,493 (26k). Data were analyzed in QIAcuity Suite v1.1.3 with an operational LOD of ≥3 positive partitions per well.

Library preparation and targeted sequencing: DNA and RNA quality were assessed by Fragment Analyzer and Bioanalyzer. Libraries were prepared from combined DNA and RNA with Illumina RNA Prep with Enrichment kits. For the Apr 19 and Apr 26 samples, RNA was diluted to ≤100 ng/µL as needed (IP and NT were not diluted because of their low RNA yields); Mar 1 samples were used undiluted. The workflow comprised denaturation, first- and second-strand synthesis, BLT tagmentation, adapter addition, AMPure cleanup, indexing, and library quantification. Enrichment with the Illumina Virus Surveillance Panel (VSP) pooled 200 ng of each library from the three biological replicates into a hybridization reaction, followed by bead capture, amplification, cleanup, and quantification. Final pooled libraries were sequenced on one lane of an Illumina NovaSeq 6000 SP flow cell (2 × 150 bp paired-end).

Bioinformatics: Reads were adapter- and quality-trimmed with BBDuk v39.01 and deduplicated with SeqKit v2.4.0, which also provided unique read counts. Human reads were removed with Bowtie2 v2.5.1 against GRCh38.p14 and CHM13v2.0. Taxonomic classification used Centrifuge v1.0.4 and Recentrifuge against a decontaminated NCBI-nt database (June 5, 2023), with minimum hit length (MHL) thresholds of 15 (Centrifuge) and 40 (Recentrifuge). One outlier library (PMG_426_2) was excluded because of failed enrichment. Viral reads were extracted and compared with MASH v2.3, followed by PCoA (scikit-learn) and PERMANOVA (vegan, 999 permutations).
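
The Mash-distance and PERMANOVA steps above can be illustrated with a minimal, dependency-free Python sketch. It is not the study's code. It computes the Jaccard index exactly from full k-mer sets instead of Mash's MinHash sketches, uses hypothetical toy sequences in place of viral reads, and uses k = 15 rather than Mash's default of 21 so that the short toy sequences still give informative distances. The pseudo-F is computed directly from the distance matrix, which is the statistic PERMANOVA tests.

```python
import math
import random
from itertools import combinations


def kmers(seq: str, k: int) -> set[str]:
    """Set of k-mers in a sequence (forward strand only, for brevity)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def mash_distance(a: set[str], b: set[str], k: int) -> float:
    """Mash distance D = -(1/k) * ln(2j / (1 + j)) from the Jaccard index j.

    Mash estimates j from MinHash sketches; here j is computed exactly from the
    full k-mer sets, which is only practical for toy inputs.
    """
    j = len(a & b) / len(a | b)
    if j == 0:
        return 1.0  # no shared k-mers: treat as the maximal distance
    return -math.log(2 * j / (1 + j)) / k


def pseudo_f(d: list[list[float]], groups: list[str]) -> float:
    """PERMANOVA pseudo-F computed directly from a distance matrix."""
    n, labels = len(groups), sorted(set(groups))
    ss_total = sum(d[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for g in labels:
        idx = [i for i in range(n) if groups[i] == g]
        ss_within += sum(d[i][j] ** 2 for i, j in combinations(idx, 2)) / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (len(labels) - 1)) / (ss_within / (n - len(labels)))


def permanova(d, groups, permutations=999, seed=0):
    """Return (pseudo-F, permutation p-value) with p = (hits + 1) / (permutations + 1)."""
    rng = random.Random(seed)
    f_obs = pseudo_f(d, groups)
    shuffled, hits = list(groups), 0
    for _ in range(permutations):
        rng.shuffle(shuffled)  # permute group labels across samples
        if pseudo_f(d, shuffled) >= f_obs:
            hits += 1
    return f_obs, (hits + 1) / (permutations + 1)


# Toy usage: three hypothetical "methods", each with four noisy copies of a base sequence.
rng = random.Random(42)


def mutate(seq: str, n_sites: int) -> str:
    """Introduce n_sites random point substitutions."""
    s = list(seq)
    for pos in rng.sample(range(len(s)), n_sites):
        s[pos] = rng.choice([b for b in "ACGT" if b != s[pos]])
    return "".join(s)


K = 15
bases = {m: "".join(rng.choice("ACGT") for _ in range(2000)) for m in ("IP", "NT", "Solids")}
labels = [m for m in bases for _ in range(4)]
sketches = [kmers(mutate(bases[m], 10), K) for m in labels]
dist = [[mash_distance(a, b, K) for b in sketches] for a in sketches]
f_stat, p_value = permanova(dist, labels)
print(f"pseudo-F = {f_stat:.1f}, p = {p_value:.3f}")
```

vegan uses the (hits + 1)/(permutations + 1) convention for the permutation p-value, so with 999 permutations the smallest attainable value is 0.001. The PERMANOVA p = 0.001 reported for both method and date therefore sits at that floor.
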
SARS-CoV-2 reads (taxID 694009) were mapped to 463 GISAID genomes (Jan–May 2023), and only alignments with <5 mismatches were kept. Assemblies were built with metaSPAdes v3.15.5, and viral scaffolds were identified with VirSorter2 v2.2.4. Scaffolds >1000 bp with >10× coverage were queried by BLASTn against the NCBI-nt virus database, keeping hits with >80% identity, alignment length >90% of query length, and e-value <1e-8. The best hit per scaffold was retained, and scaffolds with >70% alignment to their reference were considered near-complete genomes. Near-complete JC polyomavirus scaffolds were curated and used for phylogenetic analysis (MUSCLE, Gblocks, IQ-TREE, MEGA).

Statistics: Normality was tested with Shapiro–Wilk; groups were compared with Kruskal–Wallis tests followed by post hoc Dunn's tests, with significance at p<0.05.

Data availability: SRA SUB13892842, BioProject PRJNA1047067; processed data and code are available at https://github.com/mj2770/Wastewater-virus-surveillance.
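
The scaffold and BLASTn thresholds described above can be expressed as a small filter. This is a sketch under several assumptions. It expects BLAST tabular output requested with custom columns, e.g. `-outfmt "6 qseqid sseqid pident length qlen slen evalue bitscore"`. It reads length and coverage from metaSPAdes-style scaffold names (`NODE_<n>_length_<len>_cov_<cov>`); that coverage is k-mer coverage and may differ from the read depth the study used. It also interprets ">70% alignment" as alignment length divided by reference (subject) length, which is an assumption.

```python
import re
from dataclasses import dataclass

# metaSPAdes scaffold names encode length and (k-mer) coverage.
HEADER = re.compile(r"NODE_\d+_length_(\d+)_cov_([\d.]+)")


@dataclass
class Hit:
    qseqid: str
    sseqid: str
    pident: float
    length: int
    qlen: int
    slen: int
    evalue: float
    bitscore: float


def parse_hits(lines):
    """Parse tab-separated BLAST rows in the assumed 8-column layout."""
    hits = []
    for line in lines:
        f = line.rstrip("\n").split("\t")
        hits.append(Hit(f[0], f[1], float(f[2]), int(f[3]), int(f[4]),
                        int(f[5]), float(f[6]), float(f[7])))
    return hits


def passes_scaffold_filter(name: str, min_len: int = 1000, min_cov: float = 10.0) -> bool:
    """Keep scaffolds longer than 1000 bp with coverage above 10x."""
    m = HEADER.search(name)
    return m is not None and int(m.group(1)) > min_len and float(m.group(2)) > min_cov


def best_hits(hits):
    """Apply identity, query-coverage and e-value cut-offs, then keep the top bitscore per scaffold."""
    best = {}
    for h in hits:
        if not passes_scaffold_filter(h.qseqid):
            continue
        if h.pident <= 80 or h.length / h.qlen <= 0.90 or h.evalue >= 1e-8:
            continue
        if h.qseqid not in best or h.bitscore > best[h.qseqid].bitscore:
            best[h.qseqid] = h
    return best


def near_complete(hit: Hit, min_ref_fraction: float = 0.70) -> bool:
    """Assumed definition: the alignment spans more than 70% of the reference length."""
    return hit.length / hit.slen > min_ref_fraction


# Hypothetical BLAST rows (reference names are placeholders).
rows = [
    "NODE_1_length_5000_cov_25.3\tJCPyV_ref\t98.5\t4900\t5000\t5130\t0.0\t9000",
    "NODE_1_length_5000_cov_25.3\tother_ref\t85.0\t4600\t5000\t40000\t1e-50\t3000",
    "NODE_2_length_800_cov_50.0\tshort_ref\t99.0\t800\t800\t900\t0.0\t1400",
    "NODE_3_length_3000_cov_12.0\tpartial_ref\t82.0\t2950\t3000\t7000\t1e-20\t2000",
    "NODE_4_length_4000_cov_8.0\tlowcov_ref\t97.0\t3900\t4000\t4100\t0.0\t7000",
]
best = best_hits(parse_hits(rows))
for name, hit in best.items():
    print(name, hit.sseqid, "near-complete" if near_complete(hit) else "partial")
```

In this toy input, NODE_2 fails the length cut-off, NODE_4 fails the coverage cut-off, and the lower-scoring second hit for NODE_1 is discarded.
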
Key Findings
  • Yield and nucleic acid quality: Solids extraction produced the highest total DNA and RNA yields; IP had significantly lower yields than Solids and PMG (IP vs Solids DNA p=3×10⁻⁷; IP vs PMG DNA p=0.02; IP vs Solids RNA p=3×10⁻⁷; IP vs PMG RNA p=0.004). All methods yielded more RNA than DNA, but RNA:DNA ratios differed (Kruskal–Wallis p=0.002), ranging from 2.0±0.7 (NT) to 4.3±1.6 (PMG). NT yielded shorter RNA fragments and no detectable 16S/23S rRNA, whereas PMG had the highest RNA integrity (RIN 6.4±1.0).
  • Sequencing throughput: Across the 35 analyzed libraries (one excluded), 527 million reads were generated (15.05±4.37 million per sample). Deduplication reduced read counts by >50%. IP retained fewer unique reads (3.3±1.3 million) than NT and Solids (IP vs NT p=0.005; IP vs Solids p=0.04). This difference did not clearly track input nucleic acid concentrations, likely because of dilution and library pooling effects.
  • Taxonomic composition: >40% of unique reads remained unclassified at the domain level, and most classified reads were bacterial. Viral read proportions (of unique reads) were significantly higher in IP (1.82±0.46%) than in Solids (0.17±0.02%, p=8×10⁻⁷) and NT (p=0.004); PMG (1.06±0.18%) also exceeded Solids (p=0.002). IP concentrated more RNA viruses and human/vertebrate-associated viruses (human viruses 0.64±0.27% of unique reads; IP vs NT p=0.002; IP vs Solids p=1×10⁻⁶). Bacterial:viral classified-read ratios were lower for IP and PMG (25±14:1 and 38±24:1) than for NT and Solids (66±12:1 and 241±83:1) (IP vs NT p=0.04; IP vs Solids p=0.001; PMG vs Solids p=0.0006).
  • Viral composition similarity: PCoA of MASH distances showed strong separation by method along PC1 (37.2% of variance; PERMANOVA p=0.001). IP and PMG clustered together, whereas NT and Solids formed distinct clusters, likely reflecting their higher bacteriophage content. Samples also separated by date along PC2 (24.5% of variance; p=0.001).
  • Human virus richness and detection: At a >10-read threshold, PMG and IP had higher species-level richness (total viruses: 241 and 176; human viruses: 20 and 26, respectively) than NT and Solids. Releasing solid-associated viruses and then removing solids (IP/PMG) did not reduce human virus richness relative to methods that retained solids. Human viruses detected consistently across methods included human polyomaviruses, mastadenoviruses, Mamastrovirus 1, and Norwalk virus. Several RNA viruses (e.g., severe acute respiratory syndrome-related coronavirus, Sapporo virus, and enteroviruses) were not detected in NT or Solids samples.
  • Genome recovery: Seven near-complete human virus genomes were assembled from IP samples, none from Solids, aligning with higher total and human virus read counts in IP (total virus reads 59,965±28,180; human virus reads 20,242±9,294) versus Solids (11,043±2,720 and 213±99). Multiple near-complete JC polyomavirus genomes (IP, PMG, NT) were recovered and subjected to phylogenetic analysis.
  • dPCR vs sequencing sensitivity: dPCR showed higher sensitivity than sequencing across methods (SARS-CoV-2 detected in 33/34 by dPCR vs 10/35 by sequencing; BCoV 34/34 by dPCR vs 17/35 by sequencing). Although IP and PMG often had higher dPCR concentrations and unique virus reads than NT and Solids, within-method dPCR concentrations did not consistently correlate with sequencing read counts, likely due to varying target-to-nontarget ratios affecting probe-capture efficiency.
  • Overall: Innovaprep ultrafiltration (with solids removal) provided the highest targeted sequencing sensitivity and richness and enabled near-complete genome assembly for several human viruses, whereas PMG and NT were more sensitive for SARS-CoV-2 by dPCR. Astroviruses and polyomaviruses were the most abundant human viruses; SARS-CoV-2 was rare across methods.
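
The Kruskal–Wallis and Dunn's comparisons behind the p-values above can be sketched in pure Python. The sketch computes a tie-corrected H statistic, takes its chi-square p-value for integer degrees of freedom, and computes Dunn's z statistics with the usual tie correction. The summary does not say whether Dunn's p-values were adjusted for multiple comparisons, so no adjustment is applied here. The input values are hypothetical, not the study's data.

```python
import math
from collections import Counter
from itertools import combinations


def midranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability for integer df, via Q(x; k+2) = Q(x; k) + term_k."""
    if df % 2 == 0:
        q, k, term = math.exp(-x / 2), 2, (x / 2) * math.exp(-x / 2)
    else:
        q, k, term = math.erfc(math.sqrt(x / 2)), 1, math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    while k < df:
        q += term
        k += 2
        term *= x / k
    return q


def kruskal_dunn(groups: list[list[float]]):
    """Tie-corrected Kruskal-Wallis H and p, plus unadjusted two-sided Dunn p-values."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = midranks(pooled)
    mean_ranks, start, h = [], 0, 0.0
    for g in groups:
        r = ranks[start:start + len(g)]
        start += len(g)
        mean_ranks.append(sum(r) / len(g))
        h += sum(r) ** 2 / len(g)
    h = 12 / (n * (n + 1)) * h - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    h /= 1 - ties / (n ** 3 - n)  # correction for tied values
    p_kw = chi2_sf(h, len(groups) - 1)
    base = n * (n + 1) / 12 - ties / (12 * (n - 1))
    dunn = {}
    for a, b in combinations(range(len(groups)), 2):
        se = math.sqrt(base * (1 / len(groups[a]) + 1 / len(groups[b])))
        z = (mean_ranks[a] - mean_ranks[b]) / se
        dunn[(a, b)] = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return h, p_kw, dunn


# Hypothetical viral-read percentages for three methods.
h, p, dunn = kruskal_dunn([[0.15, 0.17, 0.19], [0.9, 1.0, 1.2], [1.5, 1.8, 2.3]])
print(f"H = {h:.2f}, p = {p:.4f}", {k: round(v, 4) for k, v in dunn.items()})
```

With only three replicates per group, even perfectly separated groups give p ≈ 0.027. Very small p-values therefore require the larger pooled sample sizes that come from combining dates.
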
Discussion

The findings demonstrate that upstream concentration and extraction methods substantially affect the performance of probe-capture targeted sequencing for human viruses in wastewater. Methods that reduce non-target nucleic acids—particularly removing solids after releasing solid-associated viruses via surfactant or protease—improved the fraction of viral reads, increased recovery of RNA and human/vertebrate-associated viruses, enhanced species-level richness of human viruses, and enabled near-complete genome assembly (notably with Innovaprep). Conversely, methods retaining more solids (NT, Solids) yielded higher bacterial and bacteriophage content, lower human RNA virus detection, and poorer genome recovery. The divergence between dPCR and sequencing results underscores that absolute nucleic acid yield and target copy numbers (by dPCR) are not sufficient predictors of targeted sequencing success; rather, the ratio of target-to-background and the composition of non-target nucleic acids influence capture efficiency. Method-driven viral composition differences were larger than sampling date effects along the primary variance component, yet temporal shifts were evident across dates, indicating both methodological and temporal factors shape wastewater viromes. Collectively, these results support aligning wet lab workflows (e.g., solids management, surfactant/protease treatment) and bioinformatic stringency with specific surveillance goals—whether maximizing breadth of human viruses, assembling genomes, or focusing on specific targets like SARS-CoV-2. Panel selection also matters: broad panels capture diversity but may dilute signals for low-abundance targets, whereas narrower panels (e.g., RVOP) can increase sensitivity for specific viruses. Incorporating pre-library metrics (e.g., target:nontarget ratios via dPCR/Qubit, RNA integrity, fragment size distributions) may improve prediction of sequencing success and guide triage of samples for enrichment sequencing.
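
One hypothetical way to compute the target:nontarget metric suggested above is to convert dPCR copies into an approximate RNA mass and divide it by the Qubit total RNA mass of the same extract. The sketch makes three assumptions. It uses an average of ~320.5 g/mol per ssRNA nucleotide. It treats each dPCR copy as one full-length genome, which is a simplification because dPCR detects a short amplicon and wastewater RNA is fragmented. It uses made-up input values and the 29,903-nt SARS-CoV-2 reference genome length.

```python
AVOGADRO = 6.02214076e23
NT_MASS_G_PER_MOL = 320.5  # approximate average mass of one ssRNA nucleotide (assumption)


def target_mass_ng(copies: float, genome_nt: int) -> float:
    """Approximate mass (ng) of `copies` single-stranded RNA genomes of length genome_nt."""
    return copies * genome_nt * NT_MASS_G_PER_MOL / AVOGADRO * 1e9


def target_fraction(dpcr_copies_per_ul: float, genome_nt: int, qubit_ng_per_ul: float) -> float:
    """Target RNA mass as a fraction of total RNA mass, both per µL of extract."""
    return target_mass_ng(dpcr_copies_per_ul, genome_nt) / qubit_ng_per_ul


# Hypothetical extract: 10,000 SARS-CoV-2 genome copies/µL and 20 ng/µL total RNA.
frac = target_fraction(1e4, 29_903, 20.0)
print(f"target fraction ~ {frac:.2e}")
```

Even a relatively high dPCR signal corresponds to a target fraction on the order of 10⁻⁵ of total RNA in this example. This illustrates why background composition, rather than copy number alone, can dominate capture efficiency.
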

Conclusion

This study provides a controlled comparison showing that concentration/extraction approaches that reduce non-target material—specifically Innovaprep ultrafiltration and Promega direct extraction with solids removal—improve probe-capture targeted sequencing of human viruses from wastewater, increasing viral read fractions, human virus richness, and enabling assembly of near-complete genomes. Innovaprep achieved the highest sequencing sensitivity and richness, while Promega and Nanotrap showed higher dPCR sensitivity for SARS-CoV-2. These results emphasize that the choice of upstream method should be matched to downstream objectives (genome recovery vs. quantitation), and that absolute nucleic acid yield or target copies alone do not predict capture sequencing performance due to background effects. Future work should: (1) evaluate methods across multiple plants and seasons to assess generalizability under varying wastewater characteristics and viral prevalence; (2) compare additional extraction chemistries and include steps such as DNase treatment or rRNA depletion to reduce background; (3) optimize effective processed volumes and elution conditions; (4) develop and benchmark custom probe panels that balance breadth and sensitivity for emerging targets; and (5) systematically relate pre-library metrics (target:nontarget ratios, integrity, fragment sizes) to sequencing outcomes to establish predictive thresholds for successful enrichment sequencing.

Limitations
  • Sampling scope: Single wastewater treatment plant over two months, limiting spatial and seasonal generalizability.
  • Limited chemistries and panel: Two nucleic acid extraction kits and one probe-capture panel (Illumina VSP) were evaluated; findings may differ with other extraction chemistries and panels.
  • Effective volume constraints: For IP and Solids, the effective processed volumes were less than 40 mL due to extraction kit input limits (IP ~16.3±13 mL; Solids ~18.93±5.04 mL), potentially affecting comparative sensitivities.
  • Classification stringency and reference limitations: A substantial fraction of reads remained unclassified at domain level with chosen thresholds; alternative thresholds or databases could shift assignments and downstream analyses.
  • Outlier library: One PMG sample was excluded due to apparent failed enrichment during library preparation, reducing replication for that date/method.
  • dPCR vs sequencing comparability: Method performance for sequencing and dPCR differed; direct correlation between dPCR concentrations and sequencing read counts was not observed, complicating cross-platform sensitivity interpretations.
  • Panel design constraints: The proprietary composition of probe targets limits precise interpretation of species/strain capture biases; BCoV detection relied on cross-capture via related hCoV-OC43 probes rather than direct BCoV targeting.
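
The effective-volume limitation and the dPCR quantification described in the Methodology can be combined to back-calculate copies per mL of wastewater. This is a sketch under several assumptions:

  • Partition volumes are the nominal QIAcuity values (about 0.34 nL for 8.5k and 0.91 nL for 26k nanoplates); check these against the plate documentation.
  • Effective volume is taken as the extracted fraction of the eluate (IP) or of the wet pellet (Solids).
  • The reaction, template, and elution volumes and the pellet mass are hypothetical, because the summary does not report them.

```python
import math

# Nominal partition volumes in nL (assumption; confirm against QIAcuity plate documentation).
PARTITION_NL = {"8.5k": 0.34, "26k": 0.91}


def copies_per_ul_reaction(positive: int, valid: int, plate: str = "26k", min_positive: int = 3):
    """Poisson-corrected target concentration (copies/µL of reaction), or None below the LOD."""
    if positive < min_positive:
        return None  # operational LOD: at least 3 positive partitions per well
    if positive >= valid:
        raise ValueError("all partitions positive; concentration cannot be estimated")
    mean_copies_per_partition = -math.log(1 - positive / valid)
    return mean_copies_per_partition / (PARTITION_NL[plate] * 1e-3)  # nL -> µL


def ip_effective_volume_ml(eluate_ul: float, sample_ml: float = 40.0, extracted_ul: float = 200.0) -> float:
    """IP: only up to 200 µL of the CP-Select eluate enters extraction (assumed definition)."""
    return sample_ml * min(1.0, extracted_ul / eluate_ul)


def solids_effective_volume_ml(pellet_g: float, sample_ml: float = 40.0, extracted_g: float = 0.25) -> float:
    """Solids: only 0.25 g of the wet pellet is extracted (assumed definition)."""
    return sample_ml * min(1.0, extracted_g / pellet_g)


def copies_per_ml_wastewater(conc_reaction: float, reaction_ul: float, template_ul: float,
                             elution_ul: float, eff_ml: float) -> float:
    """Back-calculate copies per mL of wastewater from a dPCR reaction concentration."""
    copies_per_ul_extract = conc_reaction * reaction_ul / template_ul
    return copies_per_ul_extract * elution_ul / eff_ml


# Hypothetical IP sample: 882 µL eluate; 100 of 25,000 valid partitions positive on a 26k plate;
# 40 µL reaction with 5 µL template; 100 µL final TNA elution.
eff = ip_effective_volume_ml(882)
conc = copies_per_ul_reaction(100, 25_000, "26k")
print(f"effective volume {eff:.2f} mL; {copies_per_ml_wastewater(conc, 40, 5, 100, eff):.0f} copies/mL")
```

Because the effective volume appears in the denominator, a smaller processed volume raises the back-calculated concentration for the same dPCR signal. It also reduces the absolute number of target copies available for library preparation, which is why the differing effective volumes complicate cross-method sensitivity comparisons.
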