Genomic network analysis of environmental and livestock F-type plasmid populations

W. Matlock, K. K. Chau, et al.

F-type plasmids, key carriers of antimicrobial resistance genes, show substantial diversity shaped by environmental niche. This study examines plasmid network communities, their distinct genetic make-up, and their potential for rapid adaptation, and reports persistent plasmid structures across different environments.
Introduction

Environmental (non-clinical and non-human) populations of Enterobacterales may act as a genetic reservoir for antimicrobial resistance (AMR), including livestock- and water-borne resistance. Frequent horizontal gene transfer in Enterobacterales results in a large, open pangenome that facilitates widespread transmission of AMR genes, including between humans and the environment. However, evidence for such transmission is often specific to a context or sequence type, and broader patterns remain less conclusive. F-type plasmids are a diverse group of Enterobacterales-associated plasmids whose replication depends on DNA gyrase, DnaB, DnaC, DnaG, single-strand binding protein, and DNA polymerase III. They are strongly implicated in the dissemination of ESBL genes (e.g., blaCTX-M-15) and carry a substantial fraction of plasmid-borne carbapenemases, as well as clinically important virulence and colicin genes. F-type plasmids are typically low copy-number and can be conjugative, and F-type replicons are common in multireplicon plasmids.

Previous F-type plasmid studies have focused on clinically relevant, ESBL-encoding isolates and relatively small sample sizes. Here, hundreds of F-type plasmids were analyzed from environmental Enterobacterales isolated from livestock (cattle, pig, sheep) and wastewater-associated waterways, sampled at three time-points in 2017 in a region of South-Central England, yielding a high-quality dataset of 726 plasmids.

Because plasmid evolution involves co-integration, recombination and insertion elements, phylogenetic trees alone are insufficient; networks based on sequence similarity can capture both vertical and horizontal evolution. Communities in such networks (groups that are densely connected internally and sparsely connected to each other) provide a coarse-grained view of plasmid populations. Prior work often examined relationships between network features and plasmid classifications or relied on curated databases, so it remained unclear whether similar community structures exist in large natural populations and how large shotgun-sequencing datasets can be analyzed efficiently. The aim was therefore to provide a scalable framework that relates plasmid sequence communities to metadata and to explore core-accessory genome structure within communities.

Methodology
  • Sampling and sequencing: Enterobacterales isolates were collected over three time-points in 2017 from South-Central England, UK, including 14 livestock farms (4 pig, 5 cattle, 5 sheep) and waterways (influent, effluent, and upstream/downstream rivers) surrounding five wastewater treatment works (WwTWs). Genomes were generated by hybrid assembly using Illumina 150 bp paired-end short reads and long reads (PacBio or Oxford Nanopore) from cultured isolates.
  • Plasmid recovery and characterization: From these assemblies, 726 circularized plasmids containing at least one F-type replicon were recovered. Host genera included Citrobacter, Enterobacter, Escherichia, and Klebsiella. Plasmid lengths ranged ~20–480 kb. Mobility was predicted as conjugative, mobilisable, or non-mobilisable. Replicon content was assessed, identifying 24 replicons in 52 combinations (replicon haplotypes); plasmids carried 1–5 replicons. Associations between plasmid length and replicon count were tested via one-way ANOVA with Tukey’s HSD post hoc testing. Relative AT-richness compared to host chromosomes was assessed, with differences between host genera evaluated by one-way ANOVA with Tukey’s HSD.
  • Sequence similarity and network construction: Pairwise Mash distances (k-mer based; range 0–1) were computed between plasmids; similarities (1 − Mash distance) were used as weighted edges in a plasmid network. Because the all-vs-all network was too dense for consistent Louvain performance, edges were thresholded (sparsified) by removing edges below a fixed Mash similarity threshold. Threshold optimization considered (i) number of detected communities (with minimum size constraints), (ii) proportion of plasmids recruited into communities, and (iii) kernel density estimates of edge weights stratified by sampling compartment. Network component evolution (largest connected component size and number of components) was examined across thresholds.
  • Community detection and validation: Communities were detected with the weighted Louvain algorithm, which optimizes modularity. To address stochasticity in community assignment due to local optima and random starts, statistics were averaged over 100 Louvain runs. A Mash similarity threshold of 0.95 was selected as it yielded the highest number of communities (13 with ≥10 plasmids) and >50% coverage, while preserving livestock structure with minimal WwTW structure loss. Community quality was validated by normalized mutual information (NMI) against MOB-cluster IDs and replicon haplotypes, and by inspecting dominant replicon haplotypes and MOB-clusters within communities.
  • Metadata association analyses: For each plasmid, metadata included sampling compartment (livestock type or WwTW association), host genus, and time-point. Two entropy-based measures were used: homogeneity (h), which is high when each community contains plasmids sharing a single label, and completeness (c), which is high when all plasmids sharing a label fall into a single community. Both measures were averaged over 100 Louvain runs and evaluated at multiple resolutions (e.g., livestock vs WwTW; farm level; WwTW influent/upstream vs effluent/downstream).
  • Core-genome analyses: Pangenome-style analyses were performed within communities to define core gene sets, and plasmid phylogenies were built from core-gene alignments to assess links between accessory function and core content.
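The length and AT-richness comparisons rely on a one-way ANOVA. Below is a minimal pure-Python sketch of the F statistic, together with one assumed definition of relative AT-richness (plasmid AT fraction minus host chromosome AT fraction); the study may define it differently, and the function names are illustrative only. Tukey's HSD post hoc test is not reimplemented here; in practice `scipy.stats.f_oneway` and `scipy.stats.tukey_hsd` (SciPy 1.8 or later) would cover both steps.

```python
from statistics import mean


def at_content(seq: str) -> float:
    """Fraction of A/T among unambiguous A/C/G/T bases (other symbols ignored)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return at / (at + gc)


def relative_at(plasmid_seq: str, chromosome_seq: str) -> float:
    """Assumed definition: plasmid AT fraction minus host chromosome AT fraction.

    Positive values mean the plasmid is AT-rich relative to its host.
    """
    return at_content(plasmid_seq) - at_content(chromosome_seq)


def one_way_anova(groups: dict[str, list[float]]) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA across the groups."""
    values = [x for g in groups.values() for x in g]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    ss_within = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy relative AT-richness values for two host genera (not study data).
groups = {
    "Escherichia": [0.02, 0.03, 0.025],
    "Klebsiella": [0.06, 0.07, 0.065],
}
f_stat, df_b, df_w = one_way_anova(groups)
print(f"F({df_b},{df_w}) = {f_stat:.1f}")  # prints F(1,4) = 96.0
```

With k groups and n observations the degrees of freedom are k − 1 and n − k, which is why five replicon-count groups (1–5 replicons) across 726 plasmids give the F(4,721) reported for plasmid length.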
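The network construction step can be sketched as follows. Mash estimates the Jaccard index j between two k-mer sets from MinHash sketches and converts it to a distance D = −(1/k)·ln(2j/(1+j)); this sketch instead computes j exactly from full sets of canonical k-mers, which is only practical for small examples. The helper names (`canonical_kmers`, `sparsified_edges`, `component_stats`) are hypothetical. Edges whose similarity 1 − D falls below the threshold are dropped, and a union-find pass reports the component statistics tracked across thresholds (number of components, largest component size, singletons).

```python
import math
from itertools import combinations

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq: str, k: int) -> set[str]:
    """Canonical k-mers: lexicographic minimum of each k-mer and its reverse complement."""
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i : i + k]
        revcomp = kmer.translate(_COMPLEMENT)[::-1]
        kmers.add(min(kmer, revcomp))
    return kmers


def mash_distance(a: set, b: set, k: int) -> float:
    """Mash distance from an exact Jaccard index (Mash itself estimates j via MinHash)."""
    union = a | b
    j = len(a & b) / len(union) if union else 0.0
    if j == 0:
        return 1.0  # no shared k-mers: maximal distance
    d = -math.log(2 * j / (1 + j)) / k
    return min(1.0, max(0.0, d))


def sparsified_edges(kmer_sets: dict, k: int, threshold: float) -> list:
    """Weighted edges (u, v, similarity), kept only if 1 - distance >= threshold."""
    edges = []
    for (u, a), (v, b) in combinations(kmer_sets.items(), 2):
        similarity = 1.0 - mash_distance(a, b, k)
        if similarity >= threshold:
            edges.append((u, v, similarity))
    return edges


def component_stats(nodes, edges) -> dict:
    """Number of connected components, largest component size and singleton count."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v, _ in edges:
        parent[find(u)] = find(v)
    sizes = {}
    for n in nodes:
        root = find(n)
        sizes[root] = sizes.get(root, 0) + 1
    return {
        "components": len(sizes),
        "largest": max(sizes.values()),
        "singletons": sum(1 for s in sizes.values() if s == 1),
    }


# Toy sequences (not study data); k=4 only because they are short (Mash defaults to k=21).
seqs = {"p1": "ACGTACGTTGCA", "p2": "ACGTACGTTGCA", "p3": "TTTTTTTTTTTT"}
k = 4
kmer_sets = {name: canonical_kmers(s, k) for name, s in seqs.items()}
edges = sparsified_edges(kmer_sets, k, threshold=0.95)
print(edges)  # [('p1', 'p2', 1.0)]
print(component_stats(list(seqs), edges))  # {'components': 2, 'largest': 2, 'singletons': 1}
```

Re-running `sparsified_edges` and `component_stats` over a range of thresholds gives the kind of component curve used to choose the cut-off.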
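Louvain searches for a partition that maximizes modularity. The sketch below only scores a given partition of a weighted, undirected edge list (assumed to have no self-loops) using Q = Σ_c [L_c/m − (d_c/2m)²], where m is the total edge weight, L_c the weight of edges inside community c, and d_c the summed weighted degree of its members; it does not implement the Louvain search itself. For the search, NetworkX provides `networkx.community.louvain_communities(G, weight="weight", seed=...)`, and repeating it with different seeds would mirror the averaging over 100 runs described in the methodology.

```python
from collections import defaultdict


def modularity(edges, partition) -> float:
    """Weighted modularity of a partition.

    edges: (u, v, weight) tuples of an undirected graph without self-loops.
    partition: dict mapping node -> community label.
    """
    m = sum(w for _, _, w in edges)  # total edge weight
    internal = defaultdict(float)  # L_c: weight of edges inside community c
    degree = defaultdict(float)  # d_c: summed weighted degree of nodes in c
    for u, v, w in edges:
        cu, cv = partition[u], partition[v]
        degree[cu] += w
        degree[cv] += w
        if cu == cv:
            internal[cu] += w
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by a single bridge edge (toy graph, unit weights).
edges = [("a", "b", 1), ("b", "c", 1), ("a", "c", 1),
         ("d", "e", 1), ("e", "f", 1), ("d", "f", 1),
         ("c", "d", 1)]
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
merged = {n: 0 for n in split}
print(round(modularity(edges, split), 4))   # 0.3571 (= 5/14)
print(round(modularity(edges, merged), 4))  # 0.0
```

Splitting the toy graph at the bridge scores higher than lumping everything together, which is the comparison Louvain makes repeatedly as it moves nodes between communities.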
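Community validation compares two labelings of the same plasmids, for example network communities vs MOB-cluster IDs or replicon haplotypes. A minimal sketch of normalized mutual information follows, assuming the arithmetic-mean normalization NMI = 2·I(A;B)/(H(A) + H(B)), which is the default in scikit-learn's `normalized_mutual_info_score`; the normalization used in the study is an assumption here. Labels in the example are toy values.

```python
import math
from collections import Counter


def entropy(labels) -> float:
    """Shannon entropy (nats) of a labeling."""
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b) -> float:
    """Mutual information (nats) between two labelings of the same items."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        n_xy / n * math.log(n_xy * n / (count_a[x] * count_b[y]))
        for (x, y), n_xy in joint.items()
    )


def nmi(a, b) -> float:
    """NMI with arithmetic-mean normalization: 2 I(A;B) / (H(A) + H(B))."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a + h_b == 0:
        return 1.0  # both labelings put every item in a single group
    return 2 * mutual_information(a, b) / (h_a + h_b)


# Toy labels for six plasmids (not study data).
communities = [0, 0, 1, 1, 2, 2]
mob_clusters = ["m1", "m1", "m2", "m2", "m3", "m3"]
print(round(nmi(communities, mob_clusters), 6))  # 1.0: same grouping, different names
print(nmi([0, 0, 1, 1], [0, 1, 0, 1]))           # 0.0: labelings are independent
```

NMI ignores the label names themselves, so a community that matches a MOB-cluster exactly scores 1 regardless of how either is numbered.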
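The homogeneity and completeness measures can be sketched with the standard entropy-based definitions h = 1 − H(C|K)/H(C) and c = 1 − H(K|C)/H(K), where C are the metadata labels and K the communities; these are the definitions implemented by scikit-learn's `homogeneity_completeness_v_measure`, and assuming the study used the same formulation. The toy labels are illustrative only.

```python
import math
from collections import Counter


def _entropy(labels) -> float:
    """Shannon entropy (nats) of a labeling."""
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())


def _conditional_entropy(target, given) -> float:
    """H(target | given) for two paired labelings of the same items."""
    n = len(target)
    joint = Counter(zip(target, given))
    given_counts = Counter(given)
    return -sum(
        n_tg / n * math.log(n_tg / given_counts[g]) for (_, g), n_tg in joint.items()
    )


def homogeneity_completeness(classes, communities) -> tuple[float, float]:
    """h = 1 - H(C|K)/H(C); c = 1 - H(K|C)/H(K) (C: metadata labels, K: communities)."""
    h_c, h_k = _entropy(classes), _entropy(communities)
    h = 1.0 if h_c == 0 else 1.0 - _conditional_entropy(classes, communities) / h_c
    c = 1.0 if h_k == 0 else 1.0 - _conditional_entropy(communities, classes) / h_k
    return h, c


# Toy example (not study data): livestock plasmids form one community,
# WwTW plasmids are split across two communities.
classes = ["livestock"] * 4 + ["WwTW"] * 4
communities = [0, 0, 0, 0, 1, 1, 2, 2]
h, c = homogeneity_completeness(classes, communities)
print(round(h, 3), round(c, 3))  # 1.0 0.667
```

Every toy community is pure, so homogeneity is 1, but splitting the WwTW label across two communities lowers completeness; this is the same qualitative pattern (higher h, lower c) reported for livestock vs WwTW. Averaging over Louvain runs would simply take the mean of (h, c) across the runs' partitions.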
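Defining core genes within a community can be sketched as a presence/absence count, followed by a pairwise overlap check between communities' core sets. The `core_fraction` cutoff of 0.95, the Jaccard overlap, and the gene names are assumptions for illustration; the prevalence threshold actually used in the study's pangenome analysis is not stated here.

```python
from collections import Counter


def core_genes(gene_sets, core_fraction=0.95) -> set:
    """Genes present in at least `core_fraction` of the plasmids of one community."""
    n = len(gene_sets)
    counts = Counter(g for genes in gene_sets for g in set(genes))
    return {g for g, c in counts.items() if c / n >= core_fraction}


def core_overlap(core_a: set, core_b: set) -> float:
    """Jaccard overlap between two communities' core gene sets."""
    union = core_a | core_b
    return len(core_a & core_b) / len(union) if union else 0.0


# Hypothetical gene presence/absence per plasmid in two communities (not study data).
community_a = [{"geneA", "geneB", "geneC", "cargo1"},
               {"geneA", "geneB", "geneC"},
               {"geneA", "geneB", "geneC", "cargo2"}]
community_b = [{"geneX", "geneY", "geneA"},
               {"geneX", "geneY"}]
core_a, core_b = core_genes(community_a), core_genes(community_b)
print(sorted(core_a), sorted(core_b), core_overlap(core_a, core_b))
# ['geneA', 'geneB', 'geneC'] ['geneX', 'geneY'] 0.0
```

Genes outside the core set (here `cargo1`, `cargo2`) form the accessory content whose functions would then be compared against core-gene phylogenies.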
Key Findings
  • Dataset composition: 726 circularized F-type plasmids were recovered from 558 Enterobacterales hosts sampled across livestock and WwTW-associated environments. Host distribution: Citrobacter (n=53 C. freundii), Enterobacter (n=67; 65 E. cloacae, 2 untyped), Escherichia (n=471 E. coli), Klebsiella (n=135; 61 K. oxytoca, 67 K. pneumoniae, 7 untyped). Livestock plasmids predominantly originated from Escherichia (392/407), whereas WwTW plasmids were more uniformly distributed across genera.
  • Plasmid features: Lengths ~20–480 kb; predicted mobilities: conjugative 516/726, mobilisable 39/726, non-mobilisable 171/726; all conjugative plasmids >42 kb, consistent with ~33 kb F-type tra region. Identified 24 replicons across 52 haplotypes; many unique (22 haplotypes observed only once); most plasmids carried 2 (328/726) or 3 (258/726) replicons. Plasmid length increased with number of replicons (ANOVA F(4,721)=7.34, p=8.6e-6). All plasmids had ≥1 F-type replicon: FII (574), FIB (460), FIA (445). Non-F-type replicon II was common and always co-occurred with FII. Distinct co-occurrence patterns observed (e.g., U only with FIB; N only with FII), corroborating frequent F-type associations with II, X, R. F-type plasmids were AT-rich relative to hosts; relative AT-richness differed by host genus (ANOVA F(3,561)=111, p<2e-16), with Klebsiella plasmids being more AT-rich relative to their hosts than other genera.
  • Network and communities: Kernel density estimates of Mash similarities indicated that livestock plasmids were more similar to each other (median similarity 0.85) than WwTW plasmids (median 0.74), suggesting higher diversity in WwTW environments. A Mash similarity threshold of 0.95 yielded 13 communities (≥10 members) and >50% plasmid coverage; the largest connected component contained 201 plasmids, and the network had 182 components in total, including 99 singletons. Community validation showed strong correspondence with MOB-clusters (NMI=0.73) and replicon haplotypes (NMI=0.55), with communities dominated by specific replicon/MOB profiles.
  • Metadata associations: High homogeneity for livestock vs WwTW (h=0.713) indicated that communities are largely distinct between these niches. Homogeneity was lower when distinguishing livestock types (h=0.592) or individual farms (h=0.406), and was also low for individual WwTWs (h=0.468). It was higher when contrasting influent/upstream with effluent/downstream samples (h=0.553) than for individual WwTWs, suggesting some differences before and after treatment. Completeness was low for livestock vs WwTW (c=0.200) and changed little when stratified by individual WwTWs (c=0.238), reflecting the high diversity of WwTW plasmids and their spread across many communities.
  • Core-accessory relationships (from pangenome analyses): Communities exhibited distinct, largely non-overlapping core gene combinations; phylogenies built from core genes showed accessory functions closely linked to core content, supporting stable F-type backbone structures with variable accessory cargo.
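The threshold criteria reported above (the number of communities with at least 10 members, and the fraction of plasmids recruited into them, averaged over repeated Louvain runs) reduce to two simple statistics per run. A small sketch follows, assuming each run's output is a mapping from plasmid ID to community label; the function names and toy partitions are hypothetical.

```python
from collections import Counter


def community_summary(partition: dict, min_size: int = 10) -> tuple[int, float]:
    """Count communities with >= min_size members and the share of plasmids in them.

    partition maps plasmid ID -> community label (singletons have their own label).
    """
    sizes = Counter(partition.values())
    kept = [s for s in sizes.values() if s >= min_size]
    return len(kept), sum(kept) / len(partition)


def average_over_runs(partitions: list, min_size: int = 10) -> tuple[float, float]:
    """Mean community count and coverage across repeated community-detection runs."""
    results = [community_summary(p, min_size) for p in partitions]
    n_runs = len(results)
    return (sum(r[0] for r in results) / n_runs,
            sum(r[1] for r in results) / n_runs)


# Two toy runs over 25 plasmids (not study data); labels s22..s24 are singletons.
run1 = {f"p{i}": "A" if i < 12 else "B" if i < 22 else f"s{i}" for i in range(25)}
run2 = {f"p{i}": "A" if i < 15 else "B" if i < 22 else f"s{i}" for i in range(25)}
print(community_summary(run1))  # (2, 0.88)
print(average_over_runs([run1, run2]))
```

In the second toy run community B drops below the minimum size, so both the community count and the coverage fall; averaging over runs smooths such boundary effects, which is the motivation for the 100-run averaging.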
Discussion

The study demonstrates that environmental and livestock F-type plasmid populations are structured by ecological niche: plasmid communities in sequence-similarity networks separate largely between livestock and WwTW environments, with additional, though weaker, structure across wastewater compartments (influent/upstream vs effluent/downstream). Livestock plasmids show greater within-group similarity than WwTW plasmids, consistent with lower diversity in livestock-associated contexts and higher heterogeneity in wastewater-associated environments. Replicon content and MOB profiles align closely with network-defined communities, validating the approach and indicating that these communities represent biologically meaningful subpopulations. Co-occurrence patterns among replicons and differences in relative AT-richness by host genus emphasize interactions between plasmids and their hosts that may contribute to niche adaptation. Pangenome-style analyses reveal that each plasmid community harbors a unique core backbone with little overlap across communities, and core-gene phylogenies correlate with accessory gene functions. This suggests that stable F-type backbone structures persist in environmental settings while accommodating diverse accessory gene repertoires, facilitating rapid adaptation to niche-specific selective pressures. The strong association of F-type plasmids with AMR genes may, therefore, reflect their suitability as vehicles for rapid niche adaptation and dissemination of resistance.

Conclusion

This work provides a scalable, tractable framework to analyze large plasmid populations using k-mer-based distance networks, community detection, and metadata association metrics. In a large, natural dataset of environmental and livestock F-type plasmids, niche strongly partitions plasmid diversity, with livestock and WwTW plasmids forming largely distinct communities. Communities correspond to characteristic replicon/MOB profiles and define unique core backbones, with accessory functions linked to core content, indicating persistent backbones with flexible cargo that likely aid niche adaptation and AMR dissemination. Future work could expand to broader geographies and timeframes, integrate more detailed functional annotations, and apply the framework to other plasmid types and metagenomic datasets to further elucidate plasmid ecology and transmission pathways.

Limitations
  • The plasmid network required edge-thresholding (sparsification) to achieve stable community detection; results may be sensitive to the chosen Mash similarity threshold (0.95).
  • Louvain community detection exhibits variability at community boundaries; metrics were averaged over 100 runs to mitigate stochastic effects, but boundary assignments remain uncertain.
  • The dataset is geographically and temporally bounded (South-Central England; three time-points in 2017), which may limit generalizability.
  • High overall diversity led to many singleton plasmids, potentially reducing power to detect structure in some subpopulations.
  • Analyses are based on cultured isolates and assembled plasmids, which may not capture uncultured or low-abundance plasmid diversity.