Hi-C metagenome sequencing reveals soil phage-host interactions

R. Wu, M. R. Davison, et al.

This study uses high-throughput Hi-C metagenomic sequencing to examine phage-host relationships in soil. Ruonan Wu, Michelle R. Davison, and colleagues show how soil drying affects lysogenic infections and alters phage host ranges, ultimately influencing bacterial populations.

Introduction

Viruses are highly abundant in soil and can regulate host population dynamics. Environmental moisture has been hypothesized to influence phage life strategies, with drier soils favoring lysogeny and wetter conditions favoring lytic cycles. However, identifying specific phage-host pairs in complex soil communities at the time of sampling is challenging, and most host assignments rely on indirect computational predictions that may miss current infections. This study applies Hi-C metagenomic sequencing to grassland soils sampled before and after a two-week desiccation incubation to directly capture contemporaneous phage-host interactions and to test how soil drying impacts phage lifestyle, host range, and consequences for host population dynamics and community structure.

Literature Review

Prior work estimated 10^7–10^10 virus-like particles per gram of soil and suggested that moisture modulates phage lifestyles, with increased lysogeny under dry conditions and lytic activity when wet. Host prediction typically uses indirect genomic methods: CRISPR spacer matching, alignment-dependent protein family approaches (e.g., VPF-Class), and alignment-free genomic signatures (e.g., WISH, VHM, PHP). These approaches have expanded understanding of potential hosts but may not reflect active infections due to phage-host dynamics, rapid viral evolution, and persistence of historical CRISPR spacers. Hi-C proximity ligation has been used to link phages to hosts in the human gut but had not been applied to soil prior to this work.

Methodology

Experimental design: Grassland surface soils (0–20 cm) were collected near the Tall Wheatgrass Irrigation Field Trial (Prosser, WA; 46°15′04″N, 119°43′43″W) in June 2020. Soils were homogenized; water holding capacity (WHC) was determined. Sixty grams (dry-weight equivalent) were incubated in triplicate jars at 75% WHC with a pre-incubation temperature ramp to 30°C over 7 days. Pre-desiccation samples (n=3) were collected at 30°C. Remaining jars were held at 30°C without water additions until mass-stable complete desiccation at 14 days; post-desiccation samples (n=3) were collected. Samples were stored at −80°C.

Sequencing data generation: For each of the six samples, shotgun metagenomes (DNA) and bulk metatranscriptomes (RNA) were generated, and separate Hi-C metagenomes were produced on cross-linked DNA from extracted host cells. DNA was extracted from 0.25 g soil (ZymoBIOMICS kit). RNA was extracted from 2 g soil (Qiagen PowerSoil RNA kit), DNase-treated, cleaned, and quality-checked. Shotgun libraries (ProxiMeta reagents) were sequenced on Illumina NovaSeq (an average of ~177 million 150-bp read pairs per sample). Metatranscriptomes were sequenced at JGI on NovaSeq S4. For Hi-C, 5 g soil per replicate was processed with the Phase Genomics low-biomass protocol: soil suspension, low-speed spin to remove sediment, 1% formaldehyde cross-linking, and 1% glycine quench. Hi-C libraries (ProxiMeta Microbiome v4.0 kit) used Sau3AI and MluCI digestion, proximity ligation with biotin, streptavidin bead capture, and NovaSeq sequencing.

Shotgun processing and viral detection: Reads were trimmed/filtered/normalized with fastp v0.20.1. Co-assembly across replicates used MEGAHIT v1.2.9 (meta-large). Contigs >1 kb were retained; viral screening considered contigs >5 kb. Viral contigs were identified using multiple tools with stringent criteria: VirSorter2 (min score 0.5), VIBRANT v1.2.1 (virus tag by NN), DeepVirFinder (score >0.9, p<0.05), and CheckV v0.7.0 for quality. A contig was viral if called by ≥2 tools or by one tool as complete/high-to-medium quality. vOTUs were defined at species level using 95% ANI and 85% alignment fraction via greedy centroid clustering; a proteomic tree was built with ViPTreeGen v1.1.2 (tBLASTx-based neighbor joining). Taxonomy was assigned by clustering against INPHARED and NCBI RefSeq (vConTACT2 v0.9.19) and by Demovir AA homology to TrEMBL; many remained unclassified, others annotated as Caudoviricetes.
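The greedy centroid clustering step can be illustrated with a minimal sketch. It assumes pairwise ANI and alignment-fraction values have already been computed by some upstream aligner and are passed in as a plain dictionary; the function name, the input layout, and the use of inclusive (>=) thresholds are assumptions made for illustration, not the authors' implementation.

```python
from typing import Dict, List, Tuple


def greedy_centroid_clusters(
    lengths: Dict[str, int],
    hits: Dict[Tuple[str, str], Tuple[float, float]],
    min_ani: float = 95.0,
    min_af: float = 85.0,
) -> Dict[str, List[str]]:
    """Group contigs into vOTUs around the longest unassigned contig.

    `hits[(centroid, member)]` holds (ANI %, alignment fraction %) for the
    member aligned against the candidate centroid (hypothetical layout).
    """
    # Visit contigs from longest to shortest; ties are broken by name.
    order = sorted(lengths, key=lambda c: (-lengths[c], c))
    assigned = set()
    clusters: Dict[str, List[str]] = {}
    for contig in order:
        if contig in assigned:
            continue
        # The longest contig not yet assigned seeds a new vOTU.
        assigned.add(contig)
        clusters[contig] = [contig]
        for other in order:
            if other in assigned:
                continue
            ani, af = hits.get((contig, other), (0.0, 0.0))
            if ani >= min_ani and af >= min_af:
                assigned.add(other)
                clusters[contig].append(other)
    return clusters


# Toy input, invented for illustration only.
toy_lengths = {"c1": 40000, "c2": 38000, "c3": 12000, "c4": 9000}
toy_hits = {
    ("c1", "c2"): (97.2, 91.0),
    ("c1", "c3"): (96.0, 60.0),  # ANI passes, alignment fraction does not
    ("c3", "c4"): (95.0, 85.0),
}
votus = greedy_centroid_clusters(toy_lengths, toy_hits)
```

Seeding each cluster with the longest remaining contig makes the result deterministic and keeps the most complete genome as the vOTU representative, which is the usual motivation for this style of clustering.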

Coverage and expression: Per-sample coverage depth/breadth for contigs was computed with BBMap v38.34; contigs with >50% breadth were considered detected; abundance was average depth normalized by assembled reads. Metatranscriptome reads were mapped to sample-matched detected viral contigs using BamM v1.7.3 (95% identity, 80% AF). Transcript levels were average base coverage normalized by total metatranscriptome reads; transcriptionally active richness was the count of vOTUs with mapped transcripts.
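As a rough illustration of the detection and abundance rule, the sketch below takes a per-base depth vector for one contig and applies the >50% breadth cutoff before normalizing mean depth by the number of assembled reads. The per-million scaling factor and the function name are assumptions; the summary does not state the scaling used.

```python
from typing import Sequence, Tuple


def contig_abundance(
    per_base_depth: Sequence[int],
    assembled_reads: int,
    min_breadth: float = 0.5,
    scale: float = 1e6,  # assumed per-million scaling, not from the source
) -> Tuple[float, float]:
    """Return (breadth, normalized abundance) for one contig in one sample."""
    if not per_base_depth:
        raise ValueError("per_base_depth must not be empty")
    # Breadth is the fraction of positions covered by at least one read.
    covered = sum(1 for d in per_base_depth if d > 0)
    breadth = covered / len(per_base_depth)
    # Contigs at or below the breadth cutoff count as not detected.
    if breadth <= min_breadth:
        return breadth, 0.0
    mean_depth = sum(per_base_depth) / len(per_base_depth)
    return breadth, mean_depth / assembled_reads * scale


# Toy depth vectors, invented for illustration only.
detected = contig_abundance([0, 0, 4, 4, 4, 4, 4, 4, 4, 4], 2_000_000)
missed = contig_abundance([0] * 6 + [10] * 4, 2_000_000)
```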

Hi-C processing and host binning: Hi-C reads underwent the same QC as shotgun, then were mapped to assemblies using BWA-MEM v0.7.17 (-SP). PCR duplicates were flagged with SAMBLASTER v0.1.24 and removed; alignments were filtered with samtools v1.9 (-F 2304). ProxiMeta deconvolution generated MAGs using a graph-based clustering of contig-contig Hi-C interactions (minimum mapping score 20; exclude non-unique/self links; contigs >1 kb). MAG quality was assessed with CheckM v1.2.0. Hi-C contact maps were visualized with bin3C v0.11. MAGs were quality-filtered (<9% contamination) and dereplicated at 99% ANI using dRep v3.4.0; taxonomy assigned via CheckM lineage-wf and GTDB-Tk v2.1.0. MAG abundance was average read depth normalized by length.

Phage-host linkage and VPH: Hi-C phage-host linkages were identified and filtered in two rounds. In the first round, each phage-host pair was required to have ≥2 Hi-C read links, a phage-host connectivity ratio R≥0.1, and intra-MAG connectivity of ≥10 links. R was computed from the Hi-C connectivity densities for phage-host (DPH) and host self (DH) contacts, normalized by the average viral copies per host (VPH). In the second round, an ROC-derived threshold was chosen to maximize true links and minimize false positives; links with average counts <80% of the maximum for that viral contig were removed, and phages with unusually broad interactions were further curated. Phage-host pairs were consolidated by grouping viral contigs within the same vOTU and host bins >99% identical (dRep). vOTUs were categorized as multiple-host, single-host, or no-host-detected. VPH was calculated using Hi-C link counts and abundance estimates of phage vOTUs and hosts.
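The threshold-based parts of the two filtering rounds can be sketched as follows. This is a simplified illustration under stated assumptions: R is taken as a precomputed input (its exact formula is not given here), the ROC-derived threshold and manual curation are omitted, the per-virus maximum is computed after the first round, and the `Link` record and its field names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Link:
    virus: str             # viral contig ID
    host: str              # host MAG ID
    links: float           # Hi-C link count (stand-in for the average count)
    ratio: float           # precomputed connectivity ratio R
    host_intra_links: int  # Hi-C links within the host MAG


def filter_links(
    candidates: List[Link],
    min_links: float = 2,
    min_ratio: float = 0.1,
    min_intra: int = 10,
    frac_of_max: float = 0.8,
) -> List[Link]:
    # Round 1: minimum link count, ratio R, and intra-MAG connectivity.
    first = [
        l for l in candidates
        if l.links >= min_links
        and l.ratio >= min_ratio
        and l.host_intra_links >= min_intra
    ]
    # Round 2: drop links below 80% of the strongest link for the same virus.
    best: Dict[str, float] = defaultdict(float)
    for l in first:
        best[l.virus] = max(best[l.virus], l.links)
    return [l for l in first if l.links >= frac_of_max * best[l.virus]]


# Toy candidates; all numeric values are invented for illustration.
toy = [
    Link("V1", "B8", 50, 0.4, 200),
    Link("V1", "B102", 45, 0.3, 150),
    Link("V1", "B94", 10, 0.2, 300),  # removed in round 2 (10 < 0.8 * 50)
    Link("V2", "B117", 1, 0.5, 80),   # too few links
    Link("V3", "B117", 6, 0.05, 80),  # ratio R too low
    Link("V4", "B117", 12, 0.3, 8),   # host MAG too sparsely connected
]
kept = filter_links(toy)
```

Because the second round is relative to each virus's strongest link, a phage can still retain several hosts when their link counts are comparable, which is how multiple-host vOTUs survive the filter.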

CRISPR-based host prediction: CRISPRCasFinder v3.1.0 (-gscf -cas) was used to extract CRISPR spacers from MAGs; spacers were queried by BLAST (blastn-short) against viral contigs with the following filters: ≥95% identity, ≤1 mismatch, and a maximum of one target sequence per spacer.
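A minimal sketch of the spacer-hit filter, assuming the BLAST results are available in the standard 12-column tabular layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). Keeping the highest-bitscore hit per spacer is one way to interpret the one-target limit, and the spacer and contig IDs are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def best_spacer_hits(
    blast_tab_lines: Iterable[str],
    min_identity: float = 95.0,
    max_mismatch: int = 1,
) -> Dict[str, str]:
    """Map each spacer ID to its single best-scoring viral contig."""
    best: Dict[str, Tuple[str, float]] = {}
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.rstrip("\n").split("\t")
        spacer, contig = fields[0], fields[1]
        pident = float(fields[2])
        mismatch = int(fields[4])
        bitscore = float(fields[11])
        if pident < min_identity or mismatch > max_mismatch:
            continue
        # Keep only the highest-bitscore target for each spacer.
        if spacer not in best or bitscore > best[spacer][1]:
            best[spacer] = (contig, bitscore)
    return {spacer: contig for spacer, (contig, _) in best.items()}


# Toy BLAST rows with hypothetical IDs, invented for illustration.
rows = [
    "B94_sp1\tvc_12\t100.0\t32\t0\t0\t1\t32\t500\t531\t1e-10\t60.2",
    "B94_sp1\tvc_40\t96.9\t32\t1\t0\t1\t32\t90\t121\t1e-8\t52.0",
    "B8_sp3\tvc_7\t93.8\t32\t2\t0\t1\t32\t10\t41\t1e-5\t44.0",
]
spacer_links = best_spacer_hits(rows)
```

This sketch does not check that the alignment spans the full spacer length; a stricter filter would also compare the alignment length against the spacer length.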

Co-occurrence networks: Using metatranscriptomes, the housekeeping arginine-tRNA ligase gene (argS) served as a proxy for MAG abundance. Networks were inferred with Pearson (|r|>0.8), CLR (top 200 edges, ≈95th percentile), and GENIE3 (top ~311 directional edges). Centrality (betweenness and degree) was computed in Cytoscape; central MAGs were cross-referenced to Hi-C phage hosts.
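The Pearson variant of the network inference can be sketched with the standard library alone. The sketch assumes each MAG is represented by one argS expression profile across samples and that an edge is drawn whenever |r| exceeds 0.8; degree is then counted per node. CLR, GENIE3, and betweenness centrality are not covered, and the profile values are invented.

```python
from collections import Counter
from itertools import combinations
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile has no defined correlation
    return sxy / sqrt(sxx * syy)


def pearson_edges(
    profiles: Dict[str, Sequence[float]], min_abs_r: float = 0.8
) -> List[Tuple[str, str, float]]:
    """Return undirected edges (a, b, r) for every pair with |r| > min_abs_r."""
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if abs(r) > min_abs_r:
            edges.append((a, b, r))
    return edges


def degree(edges: List[Tuple[str, str, float]]) -> Counter:
    counts: Counter = Counter()
    for a, b, _ in edges:
        counts[a] += 1
        counts[b] += 1
    return counts


# Toy argS profiles across six samples; values are invented.
toy_profiles = {
    "B8": [1, 2, 3, 4, 5, 6],
    "B102": [2, 4, 6, 8, 10, 12],
    "B117": [6, 5, 4, 3, 2, 1],
    "B94": [3, 1, 4, 1, 5, 9],
}
toy_edges = pearson_edges(toy_profiles)
toy_degree = degree(toy_edges)
```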

Statistics: Two-sided t-tests compared richness, relative abundance, and transcriptional metrics between pre- and post-desiccation treatments (n=3 each). Linear regression assessed VPH vs host abundance relationships (95% CI; significance by F-test with adjusted R² and p-value).
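For the regression step, a minimal ordinary-least-squares sketch is given below. It assumes host abundance is the predictor and VPH the response, and it reports only slope, intercept, R², and adjusted R²; the F-test p-value and confidence band would come from a statistics package. Variable names and data values are invented, and any transformation the authors applied before fitting is not reproduced.

```python
from typing import Dict, Sequence


def ols_fit(x: Sequence[float], y: Sequence[float]) -> Dict[str, float]:
    """Simple linear regression y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual sum of squares around the fitted line.
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    r2 = 1.0 - ss_res / syy
    # Adjusted R² for a single predictor (p = 1).
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - 2)
    return {"slope": slope, "intercept": intercept, "r2": r2, "adj_r2": adj_r2}


# Toy data: host abundance (x) and VPH (y); values are invented.
host_abundance = [1.0, 2.0, 3.0, 4.0, 5.0]
vph = [5.1, 4.4, 4.2, 3.6, 3.2]
fit = ols_fit(host_abundance, vph)
```

A negative slope in such a fit is what the study interprets as a signature of lytic infection reducing host populations.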

Key Findings
  • Detected 583 viral contigs clustered into 479 vOTUs from soil metagenomes; nearly half unclassified, remainder annotated as Caudoviricetes.
  • Hi-C identified 118 unique phage-host pairs and 148 unique host MAGs spanning nine bacterial phyla. Host-associated phages belonged to 19 of the 479 vOTUs, accounting for 5.3%–15.0% of total phage sequence abundance.
  • CRISPR spacer matching recovered 124 spacers yielding 121 unique phage-host links, but none overlapped with Hi-C-detected links, consistent with CRISPR reflecting historical rather than current infections.
  • Community shift with drying: Only 18.0% of vOTUs were shared between pre- and post-desiccation soils. Host-associated phage communities (Hi-C) were significantly impacted by soil drying (p<0.005).
  • Relative richness of host-associated vOTUs increased post-desiccation (p=0.02), while their relative abundance did not differ significantly (p=0.18).
  • Transcriptional profiles: A higher percentage of transcriptionally active vOTUs were host-associated after drying (p=0.009), but the fraction of total transcripts mapping to host-associated vOTUs tended to decrease post-desiccation (p=0.06, marginal), suggesting reduced average transcriptional activity.
  • Multiple-host vOTUs (putative generalists) exhibited higher richness and abundance than single-host vOTUs, especially in pre-desiccation soils (richness p=0.02; abundance p=0.006). Post-desiccation differences were smaller (richness p=0.28; abundance p=0.06).
  • Infection networks differed by condition: Pre-desiccation, 5 host-associated vOTUs were observed (2 single-host linkages seen in one replicate each; 3 multi-host vOTUs (V1–V3) consistent across replicates). Post-desiccation, 14 vOTUs linked to hosts, often consistently across replicates (e.g., V8 infecting the same Actinobacterial host in all replicates). Only one host population (MAG B94, Alphaproteobacteria) was infected in both conditions by different vOTUs.
  • A post-desiccation Actinobacterial host (MAG B117) was targeted by six vOTUs (V4, V7–V10, V12). Overall, post-desiccation vOTUs were associated with fewer distinct host MAGs than pre-desiccation.
  • Phage hosts occupy central positions in bacterial co-occurrence networks. MAG B117’s two argS nodes ranked 2nd and 3rd by betweenness and 2nd and 4th by degree in the CLR network (64 nodes). MAGs B8 and B102 were top-ranked in GENIE3; B102 was also central in Pearson networks. Centrality was not explained solely by abundance.
  • Average viral copies per host (VPH) increased after soil drying overall (p=0.03) and for some taxa (e.g., Actinobacteria), consistent with more prevalent infections per host population under dry conditions.
  • VPH correlated negatively with host abundance pre-desiccation (slope −0.41, p<0.001), consistent with lytic infections reducing host populations. No significant relationship post-desiccation (slope 0.089, p=0.59), consistent with increased lysogeny.
  • Hi-C provided direct evidence of phage generalists: 15 unique viral contigs linked to multiple dereplicated MAGs; generalists were relatively richer and more abundant after drying, indicating competitive advantage under desiccation.

Discussion

Applying Hi-C metagenomics directly linked soil phages to their contemporaneous hosts, overcoming limitations of indirect host prediction and providing empirical evidence for phage-host interactions in situ. The combined sequencing and network analyses show that soil drying induces a shift toward lysogeny: increased VPH with reduced phage transcriptional activity post-desiccation suggests more lysogenic infections and broader host association among phages, while pre-desiccation negative VPH–host abundance correlations indicate active lytic cycles affecting host population sizes. Infection networks changed markedly between conditions, with minimal overlap in phage-host pairs, reflecting environmental control over host susceptibility and phage activity. Phage hosts were often central nodes in bacterial co-occurrence networks, implying that even limited infection of key taxa can substantially impact community interactions via processes such as viral shunt, lysogenic conversion, and auxiliary metabolic gene expression. Detection of phage generalists with multiple hosts, especially enriched under desiccation, suggests adaptive strategies to heterogeneous, fragmented soil habitats and supports the Piggyback-the-Winner hypothesis wherein lysogenic phages favor fitter hosts (e.g., drought-tolerant Actinobacteria). These results clarify how moisture regimes modulate phage lifestyles, host range, and community consequences, informing predictions of microbial ecosystem responses to climate change.

Conclusion

This study is the first to apply Hi-C metagenomics to soils to capture active phage-host interactions at sampling time. It reveals that soil drying shifts phage lifestyles toward lysogeny, increases average infections per host population (VPH), and restructures phage-host networks with limited overlap across moisture states. Phage hosts tend to be central community members, indicating that phage dynamics can disproportionately influence bacterial network structure and function. Hi-C also provided direct evidence for soil phage generalists, which were richer and more abundant after desiccation. Together, these findings improve mechanistic understanding of how changing moisture affects soil virosphere–microbiome interactions and offer a framework to predict ecological outcomes under climate change. Future research should deepen sequencing to increase overlap across methods (Hi-C and CRISPR-based predictions), optimize Hi-C for complex soils to improve host cell recovery, quantify fitness trade-offs of generalist versus specialist phages, and extend this approach to diverse soil systems and environmental gradients.

Limitations

Hi-C requires extraction of intact host cells with associated phage from soil, likely under-sampling the full diversity and biasing against hosts not efficiently recovered from the matrix; consequently, only 5.3%–15.0% of total phage sequence abundance was host-associated by Hi-C, far lower than in systems like the human gut. Many soil phages may be free particles not captured by Hi-C. High viral novelty limited taxonomic assignment. The study’s design cannot exclude temporal effects unrelated to moisture, nor account for potential changes in host susceptibility (e.g., outer membrane alterations) due to desiccation. Both Hi-C and CRISPR-based methods are subject to under-sampling given high soil viral diversity; limited sequencing depth likely reduced overlap between methods. Filtering thresholds and network inference choices may influence detected linkages and centrality, and n=3 replicates per condition limits statistical power for some comparisons.
