Comparative genomics unveils extensive genomic variation between populations of *Listeria* species in natural and food-associated environments

Food Science and Technology

J. Liao, X. Guo, et al.

This study by Jingqiu Liao, Martin Wiedmann, and colleagues examines genomic variation of *Listeria* species across environments. It links soil properties, climate, and bacterial genetics, and considers how these factors may limit transmission between natural and food-associated settings.

Introduction
The study investigates how genomic variation in Listeria species relates to distinct ecological contexts, specifically natural (soil) versus food-associated (agricultural water and produce processing facilities) environments. Bacterial genomes vary through gene gain/loss and recombination under environmental selection, yet for pathogens that inhabit both natural and human-associated settings (e.g., Listeria monocytogenes), comparative data across environments are limited. Listeria spp. are widespread in soils, waters, and food processing environments, with L. monocytogenes and L. ivanovii as facultative pathogens and other species (L. seeligeri, L. innocua, L. welshimeri) serving as indicators of conditions that may facilitate L. monocytogenes contamination. Food-associated environments impose diverse stresses (low pH/water activity, low temperature, high salinity, sanitizers, antimicrobials), to which Listeria responds via stress regulons and stress survival islets (SSI-1, SSI-2). Natural environments tend to be more stable, with different nutrient and physicochemical contexts. The central question is whether and how Listeria populations genetically diverge and adapt to these environments, what genomic features (core and accessory) are associated with source environments, and what abiotic and biotic factors drive such divergence. The study aims to leverage large-scale genomic datasets and environmental metadata to elucidate environment-associated genomic signatures, predict source of isolation using machine learning, and identify ecological drivers of genomic variation, thereby informing public health surveillance and understanding of transmission potential from natural to food-associated environments.
Literature Review
Prior work indicates that bacterial genomes diversify through gene gain/loss and recombination under selection and dispersal, enabling adaptation to varied ecological niches. For Listeria, stress responses include sigma-dependent regulons and the stress survival islets SSI-1 (low pH/high salt) and SSI-2 (high pH/oxidative stress). Listeria spp. are saprotrophic in nature and widely found in soils, agricultural waters, and processing facilities. Existing studies have emphasized human-associated environments, with fewer intensive investigations in natural settings, which limits understanding of pathogen adaptation outside hosts. Transcriptomic studies have shown upregulation of cell envelope synthesis and stress response genes in L. monocytogenes under biocide exposure and heat shock. Plasmids in Listeria can carry metal resistance and other adaptive traits, and some sequence types (e.g., ST121, ST14) are associated with specific SSIs. Environmental and landscape factors (soil pH, moisture, land cover, weather) influence survival and detection of Listeria and other pathogens in soils and waters. This study builds on these findings by integrating comparative genomics across environments with environmental and community (16S) data.
Methodology
Study design and isolates:
- A total of 839 Listeria isolates: 449 from soil (natural environment) across the United States (2018), 115 from agricultural water (NY, AZ; 2017–2018), and 275 from produce processing facilities (2017–2018). Species included L. monocytogenes (LM) and non-pathogenic L. seeligeri (LS), L. innocua (LI), and L. welshimeri (LW).
- Soil: 177 LM (lineage I n=12, II n=39, III n=126), 98 LS, 33 LI, 141 LW.
- Agricultural water: 54 LS, 19 LI, 42 LW.
- Processing facilities: 176 LM (lineage I n=75, II n=68, III n=33), 47 LS, 37 LI, 15 LW.
Environmental and 16S data:
- Environmental variables for soil sites included latitude/longitude, 17 soil properties, 4 climate variables, and 10 land-use variables.
- 16S rRNA gene amplicon sequencing was performed on 311 Listeria-positive and 311 Listeria-negative soil samples (V4 region; Illumina MiSeq 2×250 bp). QIIME2 was used for processing; OTUs were identified as described in the Supplementary Information. Diversity metrics (OTU richness, Shannon index) were computed.
Whole-genome sequencing and phylogenomics:
- WGS data for soil and processing isolates were previously reported; agricultural water isolates were sequenced using established protocols from Liao et al. (DNA extraction, assembly, QC).
- Core SNPs per species were identified using kSNP3 v3.1.2. Maximum-likelihood phylogenies were built with RAxML v8.2.12 (GTR+G with ascertainment bias correction; 500–1000 bootstraps as specified). Midpoint rooting was applied.
- L. monocytogenes cgMLST assignments were made via an in-house pipeline and the BIGSdb-Lm database.
- Genome annotation and accessory gene identification followed Liao et al. 2021.
Plasmid, stress survival islet, and virulence gene detection:
- Plasmids were detected using PlasmidFinder 2 (identity cutoff 0.6) and grouped into families (e.g., Inc18) and rep groups (rep13, rep25, rep26, rep32, rep33, rep35, rep7a, repUS25, repUS43).
For rep groups with >3 genomes (rep25, rep26, repUS25, repUS43), plasmid sequences were extracted and aligned (MUSCLE 5.1), and maximum-likelihood gene trees were built (RAxML, GTR+G, 1000 bootstraps).
- SSI-1 (lmo0444–lmo0448) and SSI-2 (lin0464–lin0465) were screened in all species.
- LM virulence genes screened: LIPI-1 (prfA, plcA, hly, mpl, actA, plcB), LIPI-3 (llsAGHXBYDP), LIPI-4 (LM9005581_70009–LM9005581_70014), and internalins (inlA, inlB, inlC, inlE, inlF, inlG, inlH, inlJ, inlK, inlI, inlP).
- Detection used BLASTN (default settings) against BIGSdb-Lm references. Gene status was defined as putative functional (coverage >0.8 and no premature stop codon), putative non-functional (0.3 ≤ coverage <0.8 or premature stop), or absent (no hit or coverage <0.3). Prevalences were calculated using putative functional genes only.
Statistical analyses for source associations:
- Within species/lineages, Fisher’s exact tests (with BH FDR correction) assessed associations between sources and presence of virulence genes (LM), plasmids, SSIs, and accessory genes. FDR-adjusted P<0.05 was considered significant.
- Phi correlation was used to assess correlation between source-associated genes and Inc18 plasmids.
- Enrichment of COG function categories among source-associated accessory genes was assessed using a binomial model (details in Supplementary Information).
Machine learning for source prediction (LM lineages):
- Gradient boosting (LightGBM via scikit-learn) was trained per lineage using presence/absence of cgMLST alleles as features; alleles present or absent in >99% of samples were removed.
- Parameters: learning_rate=0.05, max_depth=10, num_leaves=80; others default.
- Two-fold stratified cross-validation was used; performance was assessed via auROC and auPR. Feature importance was computed within LightGBM and averaged across folds.
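The feature filtering and two-fold stratified split described for the source-prediction models can be sketched in plain Python. This is only an illustration under stated assumptions: the function names, the toy allele matrix, and the random seed are hypothetical, and the model fitting itself (LightGBM in the study) is left out.

```python
import random
from collections import defaultdict


def filter_near_constant(features, threshold=0.99):
    """Drop binary features present (1) or absent (0) in more than
    `threshold` of samples. `features` maps a feature name to a list of 0/1."""
    kept = {}
    for name, values in features.items():
        frac_present = sum(values) / len(values)
        if frac_present > threshold or (1 - frac_present) > threshold:
            continue  # nearly constant: carries almost no signal
        kept[name] = values
    return kept


def stratified_two_fold(labels, seed=0):
    """Split sample indices into two folds while keeping the class
    proportions of `labels` roughly equal in both folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    fold_a, fold_b = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        half = len(indices) // 2
        fold_a.extend(indices[:half])
        fold_b.extend(indices[half:])
    return sorted(fold_a), sorted(fold_b)


# Labels mirror the LM lineage II counts (39 soil, 68 processing isolates).
labels = ["soil"] * 39 + ["processing"] * 68

# Hypothetical allele presence/absence columns, for illustration only.
features = {
    "allele_always_present": [1] * 107,
    "allele_soil_only": [1] * 39 + [0] * 68,
}
kept = filter_near_constant(features)
fold_a, fold_b = stratified_two_fold(labels)
```

In a full pipeline, each fold would serve once as the held-out set while a classifier configured with the reported parameters (learning_rate=0.05, max_depth=10, num_leaves=80) is trained on the other fold.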
Abiotic variable associations with genomic similarity:
- Partial Mantel tests (vegan in R; 999 permutations) were run between environmental variable dissimilarity and ANI similarity per species/lineage, controlling for geographic distance; BH FDR correction was applied.
- Significant variables (FDR<0.05) were used in random forest models (randomForest in R) to predict ANI; importance was assessed via %IncMSE (and IncNodePurity).
Bacterial community co-occurrence:
- Correlations (Phi) between presence of each Listeria taxon and 1,172 bacterial species from 16S data were computed. Species with r>0.2 or r<−0.2 were included in network analyses (ggraph in R).
- Diversity was compared between Listeria-positive and Listeria-negative samples using t-tests.
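As a rough illustration of the co-occurrence step, the Phi coefficient between two presence/absence vectors can be computed from the 2x2 contingency counts and then filtered with the ±0.2 cutoff. The helper names and the toy sample data below are hypothetical, and returning 0.0 when a vector is constant is an assumption made here, not something stated in the study.

```python
import math


def phi_coefficient(x, y):
    """Phi coefficient between two equal-length binary (0/1) sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    n11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    n10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    n01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    n00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return 0.0  # assumption: treat an undefined coefficient as 0
    return (n11 * n00 - n10 * n01) / denom


def select_correlated(listeria_presence, taxa_presence, cutoff=0.2):
    """Keep taxa whose Phi with Listeria presence is > cutoff or < -cutoff."""
    selected = {}
    for taxon, presence in taxa_presence.items():
        r = phi_coefficient(listeria_presence, presence)
        if r > cutoff or r < -cutoff:
            selected[taxon] = r
    return selected


# Toy data: eight soil samples, Listeria detected in the first four.
listeria = [1, 1, 1, 1, 0, 0, 0, 0]
taxa = {
    "taxon_cooccurring": [1, 1, 1, 1, 0, 0, 0, 0],
    "taxon_excluding": [0, 0, 0, 0, 1, 1, 1, 1],
    "taxon_unrelated": [1, 0, 1, 0, 1, 0, 1, 0],
}
network_nodes = select_correlated(listeria, taxa)
```

With these toy vectors, only the first two taxa pass the cutoff and would be carried into a network analysis.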
Key Findings
- Core genome-environment associations:
  - L. monocytogenes (LM) lineages I and II were significantly overrepresented in produce processing facility isolates, while lineage III predominated in soil (Fisher’s exact P<0.001). Environment-associated subclades were evident in lineages II and III (P<0.01).
  - Few closely related LM isolate pairs (<50 cgMLST allelic mismatches) were found between soil and processing facilities: 42 (lineage I), 5 (lineage II), 1 (lineage III). Many pairs involved produce facility isolates closely related to soil isolates (31 in lineage I, 5 in II, 1 in III).
  - For non-pathogenic species, phylogenies showed strong clustering by environment in L. innocua and L. welshimeri (P<0.001), and two subclades in L. seeligeri with soil overrepresented in subclade B (P<0.01). Few cross-environment close pairs were found based on core SNPs: 22 (LS), 9 (LI), 0 (LW).
- Machine learning prediction of source (cgMLST features):
  - LM lineage II: auROC 0.89, auPR 0.85; lineage III: auROC 0.88, auPR 0.95; lineage I: auROC 0.59, auPR 0.18.
  - Top features often encoded cell-surface/transport/motility proteins (PTS transporters, flagellar components, ABC transporters).
- Accessory genome source associations:
  - LM: 902 accessory genes were associated with soil vs processing; by lineage, 50 (I), 36 (II), and 195 (III) were source-associated. Overrepresentation counts: soil (LM total 450; I 50; II 29; III 80) vs processing (LM total 452; I 0; II 7; III 115) at FDR<0.05.
  - LS: 6; LI: 357; LW: 306 source-associated accessory genes across soil, agricultural water, and processing.
  - Enrichment analyses highlighted COG categories for cell wall/membrane/envelope biogenesis (M) and carbohydrate transport/metabolism (G) among source-associated genes across taxa/lineages.
- Virulence gene distributions in LM:
  - LIPI-1 was present as putative functional in nearly all isolates; some actA truncations were found in 6/87 lineage I and 5/159 lineage III isolates.
  - Internalins were largely present, but selected internalins (e.g., inlF, inlG) had lower prevalence, especially in lineage III.
  - LIPI-3 prevalence: lineage I 78.2%, II 6.5%, III 3.8%. LIPI-4 prevalence: lineage I 43.7%, II 9.3%, III 43.4%.
  - Many functional internalins and LIPI genes were source-associated (FDR<0.05): inlA and LIPI-4 were enriched in soil; inlC, inlE, inlF, inlG, inlH, inlI, inlP and LIPI-3 were enriched in processing.
- Plasmids:
  - Prevalence: LM 14.4% (51/353), LS 3.5% (7/199), LI 33.7% (30/89), LW 19.2% (38/198).
  - Source associations (Fisher’s exact P<0.05): LM overall and LM lineages I and III had plasmids overrepresented in soil; in LM lineage I, plasmids were exclusive to soil. In LI, plasmids were overrepresented in processing facilities.
  - Inc18-family plasmids correlated with numerous source-associated accessory genes in LM (10 genes), LM lineage I (50), and LI (59) (Phi r>0.5), including replication and metal resistance functions.
  - Plasmid group distributions: repUS25 predominantly in soil (81% of 84), mixed across species and environments; repUS43 predominantly in food-associated environments (91% of 11), exclusive to LI; rep25 predominantly food-associated (97% of 29), mixing LI and LM lineage II; rep26 exclusive to processing, present across LW/LI and LM lineage II/LW clades.
- Stress survival islets (SSI):
  - SSI-1 was detected in 24.9% of LM and was enriched in processing in LM and LM lineage III (FDR<0.05). SSI-1 was frequent in LW (89.1%) but not source-associated.
  - SSI-2 was detected in 50.1% of LM and was enriched in soil for LM and LM lineage III (FDR<0.05). In LI, SSI-2 was present in 82.0% and overrepresented in agricultural water and processing (FDR<0.05).
  - No functional SSI-1 was found in LI and no functional SSI-2 in LW; neither was detected in LS.
- Abiotic drivers of genomic similarity (partial Mantel with FDR correction):
  - LM: 11 soil variables (e.g., aluminum, organic matter, manganese), 3 climate variables (including precipitation), and 4 land-use variables (e.g., grassland) correlated with ANI.
  - LM lineage II: soil variables only (total nitrogen, moisture, total carbon, potassium, sodium).
  - LS: 4 soil variables (e.g., pH, manganese), all 4 climate variables, and 3 land-use variables (e.g., grassland, pasture).
  - LI: 5 soil variables (e.g., sulfur, magnesium), 2 climate variables (max/min temperature), and 2 land-use variables (wetland, shrubland).
  - None were significant for LM lineages I/III or LW.
  - Random forest importance: precipitation and aluminum for LM; potassium and sodium for LM lineage II; proximity to forest and wind speed for LS; annual max/min temperature and proximity to wetland for LI.
- Biotic associations (soil bacterial communities):
  - Listeria-positive samples had higher OTU richness and Shannon diversity than negatives (t-test P<0.05).
  - Positive correlations (Phi r>0.2): 22 species with LM (many Proteobacteria), 14 with LS (notably Planctomycetes), 4 with LI, 30 with LW (many Actinobacteria).
  - Negative correlations (r<−0.2): 12 with LM (mostly Actinobacteria), 1 with LS (Acidobacteria), 0 with LI, 8 with LW (half Proteobacteria).
  - These patterns suggest co-occurring taxa (Proteobacteria, Actinobacteria) as potential ecological interactors influencing Listeria adaptation.
- Overall, core genomes and accessory elements (cell envelope and carbohydrate metabolism genes, plasmids, SSIs) show strong environment-associated patterns, and ML models can predict source at the lineage level for LM.
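The lineage-by-source association reported at the top of the key findings can be illustrated with a small, dependency-free Fisher's exact test applied to the LM isolate counts given in the Methodology. The summary does not state how the authors built their contingency tables, so the one-lineage-versus-the-rest 2x2 layout used here is an assumption, as is the function name.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # All tables share the denominator comb(n, col1), so the integer
    # numerators can be compared exactly, with no float tolerance.
    weights = {k: comb(row1, k) * comb(row2, col1 - k) for k in range(lo, hi + 1)}
    observed = weights[a]
    extreme = sum(w for w in weights.values() if w <= observed)
    return extreme / comb(n, col1)


# LM isolate counts per lineage, as listed in the Methodology.
soil = {"I": 12, "II": 39, "III": 126}
processing = {"I": 75, "II": 68, "III": 33}

p_values = {}
for lineage in soil:
    a = soil[lineage]                    # soil isolates in this lineage
    b = sum(soil.values()) - a           # soil isolates in other lineages
    c = processing[lineage]              # processing isolates in this lineage
    d = sum(processing.values()) - c     # processing isolates in other lineages
    p_values[lineage] = fisher_exact_two_sided(a, b, c, d)
```

When many genes are tested this way, as in the accessory-genome analysis, the resulting P values would additionally need the BH FDR correction described in the Methodology.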
Discussion
The study demonstrates that Listeria populations exhibit pronounced genomic divergence between natural soils and food-associated environments, addressing the central question of environment-linked adaptation. Core genome phylogenies show environment-associated clustering across LM lineages and non-pathogenic species, implying long-standing ecological partitioning and infrequent transmission between environments. Accessory genome analyses reveal that genes involved in cell envelope biogenesis and carbohydrate transport/metabolism are enriched among source-associated genes, consistent with selection pressures from environmental stresses (e.g., pH, salinity, sanitizers) and differences in nutrient availability. The ML models leveraging cgMLST features confirm that core genome variation is predictive of isolation source at a fine taxonomic resolution, especially for LM lineages II and III, suggesting practical utility for source attribution in surveillance. Virulence loci and internalins show differential prevalence by environment, aligning with potential metabolic costs and differing functional relevance outside hosts. Plasmid patterns suggest habitat-dependent maintenance: LM (particularly lineages I and III) carries plasmids more often in soil, whereas LI carries them more often in food-associated settings, possibly due to beneficial metal resistance traits. SSIs exhibit complementary source associations (SSI-1 with processing in LM and LM lineage III; SSI-2 with soil in LM and LM lineage III, and with food-associated settings in LI), supporting roles in coping with different stress regimes. Ecologically, combinations of soil chemistry (e.g., aluminum, potassium, sodium), climate (precipitation, temperature), and land use (forest, grassland, wetlands) correlate with genomic similarity and vary by taxon, indicating multifactorial abiotic drivers of diversification.
Biotic community analyses highlight Proteobacteria and Actinobacteria as frequent positive/negative correlates, respectively, suggesting potential indirect facilitation or antagonism that could shape Listeria population structures. Collectively, these findings imply that both intrinsic genetic adaptations and extrinsic environmental and community factors create barriers that limit efficient cross-environment transmission, though sporadic transmission events (e.g., LM lineage I links between soil and facilities) do occur and are epidemiologically relevant.
Conclusion
This work provides strong evidence that Listeria species, including L. monocytogenes, have differentially adapted to natural and food-associated environments through both core genome diversification and accessory genome dynamics. Genes related to cell envelope biogenesis and carbohydrate transport/metabolism, along with plasmids and stress survival islets, are key intrinsic contributors to environment-associated genomic patterns. Machine learning models using core genome-derived features predict sources of isolation at the lineage level for LM, performing well for lineages II and III (auROC 0.89 and 0.88) but poorly for lineage I (auROC 0.59); where performance is adequate, this supports their application in source tracking of foodborne contamination and outbreaks. Additionally, combinations of soil properties, climate, land-use factors, and co-occurring bacterial taxa (notably Proteobacteria and Actinobacteria) emerge as potential extrinsic drivers of genomic diversification and ecological partitioning. These intrinsic and extrinsic factors together likely limit frequent transmission from natural to food-associated environments. Future research should include time-resolved sampling to pinpoint divergence timing, strain-level community analyses to move from correlation to causation in biotic interactions, and expansion to other natural and food-associated habitats to generalize drivers of adaptation.
Limitations
- The environmental community associations are correlative (based on 16S rRNA species-level OTUs) and do not establish causation; strain-level interactions and functional assays are needed.
- Temporal sampling is limited; the timing and dynamics of divergence events between environments cannot be inferred without broader time-series data.
- L. monocytogenes isolates were not available from agricultural water, limiting comparison across all environments for this species.
- cgMLST was only applied to L. monocytogenes; SNP-based relatedness was used for other species due to scheme availability, which may affect cross-species comparability of relatedness thresholds.
- Plasmid detection relied on in silico prediction (PlasmidFinder with identity cutoff 0.6) and may miss novel/low-identity plasmids or misclassify chromosomal elements.
- Gene presence/absence assessment depends on assembly quality and BLAST coverage thresholds; genes split across contigs may be underdetected despite coverage thresholds accommodating contig breaks.
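The coverage thresholds mentioned in the last limitation are the ones given in the Methodology, and the rule can be sketched as a small classifier. The function names are hypothetical. Two boundary cases are not specified in the summary and are assumptions here: a coverage of exactly 0.8 is treated as putative non-functional, and a hit below 0.3 coverage is treated as absent even if it contains a premature stop.

```python
def classify_gene(coverage, has_premature_stop):
    """Assign a gene status from the best BLASTN hit.

    coverage: fraction of the reference covered (0 to 1), or None if no hit.
    has_premature_stop: True if the hit contains a premature stop codon.
    """
    if coverage is None or coverage < 0.3:
        return "absent"
    if coverage > 0.8 and not has_premature_stop:
        return "putative functional"
    # Covers 0.3 <= coverage <= 0.8, or any retained hit with a premature stop.
    return "putative non-functional"


def prevalence(statuses):
    """Fraction of isolates in which the gene is putative functional,
    matching the rule that prevalences count putative functional genes only."""
    if not statuses:
        raise ValueError("no isolates given")
    functional = sum(1 for s in statuses if s == "putative functional")
    return functional / len(statuses)


# Hypothetical hits for one gene across four isolates.
hits = [(0.95, False), (0.95, True), (0.50, False), (None, False)]
statuses = [classify_gene(cov, stop) for cov, stop in hits]
gene_prevalence = prevalence(statuses)
```

A gene fragmented across contigs would typically show up here as a hit with reduced coverage, which is why assembly quality feeds directly into the reported prevalences.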