Food Science and Technology
Monitoring the microbiome for food safety and quality using deep shotgun sequencing
K. L. Beck, N. Haiminen, et al.
The study investigates whether shifts in food microbiome composition and activity can serve as indicators of food safety and quality issues, including contamination or environmental changes. Traditional pathogen detection relies on culturing and whole-genome sequencing (WGS). This introduces culture bias and overlooks interactions within the broader microbial community. Total DNA/RNA sequencing can characterize microbial niches in their native state; RNA in particular can indicate biological activity, minimize PCR bias, and improve detection sensitivity. The authors focus on raw food ingredients (high protein powders derived from poultry meal). They develop a pipeline tailored to food matrices that accurately profiles the microbiome and quantifies relative abundances. They then explore whether microbiome shifts correlate with matrix composition, supplier/source, and pathogen culturability (notably Salmonella).
The paper situates its work within several domains:

- prior microbiome studies of food quality and fermentation (e.g., kefir, Maasdam cheese);
- agricultural contexts (grape, apple);
- processing effects (Cheddar cheese).

Regulatory bodies (FDA, CDC, USDA, EFSA) increasingly use WGS for outbreak investigations, but culture-based approaches are biased and may not reflect native microbial communities. The 100K Pathogen Genome Project expanded the collection of reference genomes that are crucial for outbreak investigation and microbiome studies. Total RNA sequencing offers advantages over DNA or 16S amplicon approaches: it avoids PCR bias and improves taxonomic resolution and reproducibility. However, eukaryotic sequences from the food matrix can produce false microbial hits, so matrix filtering is necessary. Prior work by Haiminen et al. demonstrated metagenomic food authentication from shotgun reads, motivating matrix-aware pipelines. The literature also notes the limitations of single-gene markers and emphasizes compositional data analysis for microbiomes.
- Samples: 31 high protein powder (HPP) poultry meal ingredient samples collected from train cars in Reno, NV (April 2015–February 2016), from two suppliers (A and B) across four batches. Each HPP sample was a pool of five sub-samples. A portion of each sample was stored in Trizol; the remainder was kept at room temperature.
- Sequencing: Deep total RNA sequencing (~300 million paired-end 150 bp reads per sample) on an Illumina HiSeq 4000 (HiSeq 3000 for MFMB-04 and MFMB-17). Data are available under BioProject PRJNA186441. For the RNA vs DNA comparison, total DNA from some samples (MFMB-03/08) was also sequenced on a HiSeq 2000 (100 bp paired-end).
- Quality control: Adapter trimming with Trim Galore (min length 50 bp). PhiX contaminant removal via Kraken using a PhiX-only database.
- Matrix filtering: Custom Kraken database of 31 common food and contaminant eukaryotic genomes; k-mer size 31; low-complexity regions and repeats retained; k-mer set reduced (kraken-build --max-db-size) to fit in 188 GB; conservative Kraken score threshold of 0.1 to avoid over-filtering. This step removed eukaryotic matrix reads and estimated each sample's percent matrix content.
- In silico validation of matrix filtering: Constructed two simulated mixtures with high eukaryotic matrix content (animal and plant) and few microbial reads (15K reads from 15 microbial species). Generated synthetic 150 bp PE reads with DWGSIM (e=0.005, d=500, r=0.001, R=0.15, X=0.3). Assessed true and false positives at the genus level with and without matrix filtering.
- Microbial identification: Remaining reads classified with Kraken (k=31) against NCBI RefSeq Complete microbial genomes (bacteria, archaea, viruses, microbial eukaryotes; ~7800 genomes, April 2017), with low-complexity regions masked by Dustmasker. Kraken score threshold of 0.05, chosen to maximize F-score. Relative abundance computed as reads per million: RPM = Ry × 1,000,000 / Ro. Here Ry is the number of reads assigned to genus y and Ro is the sample's total read count. A genus was considered present at RPM ≥ 0.1. Analyses conducted at the genus level due to database limitations.
- Diversity and compositional analyses: Alpha diversity via rarefaction at 5M-read intervals, subsampling counts scaled by Ro/Rp; the median elbow of the rarefaction curves was located with kneed. Beta diversity via Aitchison distances (after adding a pseudo-count of 1) computed with robCompositions, followed by hierarchical clustering (ward.D2). Pairwise Spearman correlations between samples' genus RPM vectors. Differential abundance between suppliers via two-sample t tests with Benjamini–Hochberg FDR correction.
- Unclassified reads: GC% distributions compared among matrix, microbial, and unclassified fractions using FastQC and MultiQC.
- Pathogen-containing genera: Focused on 14 genera relevant to food safety (Aeromonas, Bacillus, Campylobacter, Clostridium, Corynebacterium, Cronobacter, Escherichia, Helicobacter, Listeria, Salmonella, Shigella, Staphylococcus, Vibrio, Yersinia). Assessed presence and RPM distributions across samples.
- Salmonella culturability vs sequencing: Culture testing per FDA BAM/AOAC-compliant methods with qPCR confirmation, applied to 27 samples (4 positive). Culture status was compared with three sequencing-based measures:
  (1) Kraken genus-level classification against the multi-microbe RefSeq Complete database;
  (2) Bowtie2 --very-sensitive-local alignments to an augmented Salmonella-only reference of 1447 genomes (264 RefSeq Complete Salmonella genomes plus 1183 additional genomes);
  (3) Bowtie2 alignments to 4846 Salmonella ef-Tu gene sequences from the Functional Genomics Platform.
  Samples were ranked by RPM, and Wilcoxon rank-sum tests compared the rank distributions of culture-positive and culture-negative samples. Co-occurrence with other genera was assessed using the point-biserial correlation (r_pb) between culture status and genus RPM.
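The RPM normalization, the presence threshold, and the Aitchison distance used for beta diversity can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code:

- the function names and example counts are hypothetical;
- the pseudo-count is assumed to be added to raw genus read counts;
- the Aitchison distance is computed as the Euclidean distance between centered log-ratio (CLR) transformed vectors. This is mathematically equivalent to the pairwise log-ratio definition, but robCompositions may compute it differently.

```python
import math

def reads_per_million(genus_counts, total_reads):
    """RPM_y = R_y * 1,000,000 / R_o for each genus y in one sample."""
    return {genus: reads * 1_000_000 / total_reads for genus, reads in genus_counts.items()}

def present_genera(rpm, threshold=0.1):
    """Genera at or above the presence threshold (RPM >= 0.1 in the text)."""
    return {genus for genus, value in rpm.items() if value >= threshold}

def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform after adding a pseudo-count to every component."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

def aitchison_distance(a, b, pseudocount=1.0):
    """Aitchison distance as the Euclidean distance between CLR-transformed vectors."""
    if len(a) != len(b):
        raise ValueError("samples must list the same genera in the same order")
    return math.dist(clr(a, pseudocount), clr(b, pseudocount))

# Hypothetical genus read counts; R_o = 300 million reads for both samples.
sample_1 = {"Salmonella": 30, "Bacteroides": 3_000, "Vibrio": 12}
sample_2 = {"Salmonella": 90, "Bacteroides": 2_400, "Vibrio": 0}
rpm_1 = reads_per_million(sample_1, total_reads=300_000_000)
print(present_genera(rpm_1))
genera = sorted(sample_1)
print(aitchison_distance([sample_1[g] for g in genera], [sample_2[g] for g in genera]))
```

The pseudo-count is what lets a genus with zero reads (Vibrio in sample_2) enter the log-ratio transform at all. The resulting pairwise distances are what a Ward-linkage clustering would then operate on.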
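Two steps rest on standard confusion-matrix arithmetic: the genus-level evaluation of the simulated mixtures, and the choice of a Kraken score threshold that maximizes F-score. The sketch below makes one assumption: true negatives are the database genera that are neither present in the mixture nor called. The paper's exact specificity denominator may differ, and the function names are hypothetical.

```python
def genus_metrics(true_genera, called_genera, database_genera):
    """Confusion-matrix metrics for genus calls against a simulated ground truth.

    Assumption: every genus in database_genera that is neither truly present
    nor called counts as a true negative.
    """
    truth, called, universe = set(true_genera), set(called_genera), set(database_genera)
    tp = len(truth & called)
    fp = len(called - truth)
    fn = len(truth - called)
    tn = len(universe - truth - called)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "specificity": tn / (tn + fp) if tn + fp else 1.0,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }

def best_threshold(calls_by_threshold, true_genera, database_genera):
    """Return the classifier threshold whose genus calls give the highest F-score."""
    return max(
        calls_by_threshold,
        key=lambda t: genus_metrics(true_genera, calls_by_threshold[t], database_genera)["f_score"],
    )

# Hypothetical example: 10 database genera, 3 truly present in the mixture.
database = [f"genus_{i}" for i in range(10)]
truth = {"genus_0", "genus_1", "genus_2"}
calls = {0.0: truth | {"genus_3", "genus_4"}, 0.05: truth, 0.5: {"genus_0"}}
print(best_threshold(calls, truth, database))
```

In a real run, each entry of calls_by_threshold would come from re-filtering the same Kraken output at a different score threshold.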
- Matrix filtering accuracy: In silico validation showed specificity improved to >99.96% (from 78–93% without filtering) and removed ~90–99.8% of false positives, with zero false negatives at genus level.
- RNA vs DNA: Total RNA and DNA genus-level quantifications correlated strongly (R² = 0.93). RNA also detected higher alpha diversity and, after depth normalization, yielded 2.4× more reads assigned to microbial genera.
- Microbial richness: Per sample, 98–195 genera were detected (average 119) at RPM ≥ 0.1. The 65 genera present in all samples accounted for 88–99% of total abundance. The most abundant core genera included Bacteroides, Clostridium, Lactococcus, Aeromonas, and Citrobacter. In total, 229 genera were observed across all samples.
- Sequencing depth: Rarefaction indicated median elbow at ~67 million reads; below this, microbial diversity was not saturated, suggesting deeper/selective sequencing would reveal more taxa.
- Unclassified reads: 2–4% (~5–14 million) of reads per sample remained unclassified; their GC% resembled microbial reads, suggesting missing/divergent references.
- Beta diversity and source: Aitchison distance clustering largely separated samples by supplier; matrix-contaminated samples (with pork/beef reads) formed a distinct subcluster. Mean pairwise Spearman correlations were 0.946 within Supplier A, 0.816 within Supplier B, and 0.805 between suppliers. Matrix-contaminated samples correlated less with the Supplier A baseline (MFMB-04: 0.656; MFMB-20: 0.866; MFMB-38: 0.885). Their correlation rose as the non-poultry matrix percentage fell (16.7%, 1.5%, 1.2%, respectively).
- Differential abundance: 55 genera differed significantly between suppliers (FDR < 0.01). Highly variable genera, given as (median RPM, MAD), included:
  Bacteroides (148.1, 30.6), Clostridium (37.4, 24.2), Lactococcus (36.8, 18.2), Lactobacillus (24.2, 7.2), and Pseudomonas (11.1, 12.2).
  Certain samples (MFMB-04, MFMB-20, MFMB-83) showed marked elevations for specific genera.
- Matrix contamination signal: Three samples had elevated pork/beef matrix (MFMB-04: 7.74% pork, 8.99% beef; MFMB-20: 0.53%, 1.00%; MFMB-38: 0.92%, 0.29%). These showed distinct microbiomes and increased Lactococcus, Lactobacillus, Streptococcus. MFMB-04 had 44 unique genera (e.g., Macrococcus 35.8 RPM; Psychrobacter 23.8; Brevibacterium 18.1); Paenalcaligenes was unique to MFMB-04/20.
- Pathogen-containing genera: Eight of fourteen examined genera (Aeromonas, Bacillus, Campylobacter, Clostridium, Corynebacterium, Escherichia, Salmonella, Staphylococcus) were detected in every HPP sample, indicating a baseline of reads attributed to potentially pathogenic genera by NGS.
- Salmonella culturability vs sequencing: Of 27 cultured samples, 4 were positive.
  Culture-positive status did not correlate with Kraken-based Salmonella relative abundance (Wilcoxon P = 0.86).
  Aligning to the augmented Salmonella genome set increased mapped reads (~370× the Kraken counts). It also shifted culture-positive samples toward higher ranks (P = 0.06), though the separation was incomplete.
  ef-Tu gene abundances did not distinguish culture status (P = 0.56).
  In the co-occurrence analysis, Erysipelothrix, Lactobacillus, Anaerococcus, Brachyspira, and Jeotgalibaca correlated positively with Salmonella culture status (r_pb > 0.5). Gyrovirus correlated negatively (r_pb = −0.54).
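The two statistics behind the culturability comparison, the Wilcoxon rank-sum test and the point-biserial correlation, can be sketched without external libraries. This is an illustrative reimplementation, not the authors' code. The rank-sum P uses the large-sample normal approximation with no tie or continuity correction. It will therefore not exactly match exact P values (for example, those R's wilcox.test reports for small samples without ties). The function names and example values are hypothetical.

```python
import math
from statistics import mean, pstdev

def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks

def rank_sum_test(group_a, group_b):
    """Wilcoxon rank-sum W for group_a with a two-sided normal-approximation P.

    Simplification: no tie correction to the variance and no continuity correction.
    """
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a, n_b = len(group_a), len(group_b)
    w = sum(ranks[:n_a])
    expected = n_a * (n_a + n_b + 1) / 2
    sd = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (w - expected) / sd
    return w, math.erfc(abs(z) / math.sqrt(2))

def point_biserial(status, values):
    """Point-biserial r between a 0/1 status and a continuous variable.

    Equivalent to the Pearson correlation; fails if values are all identical.
    """
    ones = [v for s, v in zip(status, values) if s]
    zeros = [v for s, v in zip(status, values) if not s]
    p = len(ones) / len(values)
    return (mean(ones) - mean(zeros)) / pstdev(values) * math.sqrt(p * (1 - p))

# Hypothetical Salmonella RPM values for culture-positive and culture-negative samples.
positive = [2.1, 0.8, 3.5, 1.9]
negative = [1.2, 0.4, 2.8, 0.9, 1.5, 0.7]
w, p = rank_sum_test(positive, negative)
status = [1] * len(positive) + [0] * len(negative)
r = point_biserial(status, positive + negative)
print(f"W = {w}, two-sided P ~ {p:.3f}, r_pb = {r:.2f}")
```

Testing raw RPM values and testing their ranks give the same rank-sum result, because the test only uses the ordering.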
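The Benjamini–Hochberg correction behind the supplier comparison (55 genera at FDR < 0.01) follows the step-up procedure sketched below. The function is a hypothetical stand-in that computes adjusted P values the way R's p.adjust(method = "BH") does. The per-genus t tests themselves are not reimplemented, and the example P values are made up.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted P values in the original order.

    Walk from the largest P down, keeping a running minimum of p * m / rank,
    capped at 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical per-genus P values from supplier A vs B t tests.
genus_pvalues = {"Bacteroides": 0.0004, "Clostridium": 0.002, "Pseudomonas": 0.03, "Vibrio": 0.4}
adjusted = benjamini_hochberg(list(genus_pvalues.values()))
significant = [g for g, q in zip(genus_pvalues, adjusted) if q < 0.01]
print(significant)
```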
The findings support the hypothesis that shifts in microbiome composition can signal differences in ingredient composition and supply-chain source. By implementing a food-specific eukaryotic matrix filtering step, the pipeline greatly reduced false-positive microbial calls and enabled robust genus-level characterization from total RNA. Core and variable microbiota profiles distinguished suppliers and identified samples with matrix contamination (pork/beef), demonstrating potential for origin tracking and contamination detection. However, Salmonella reads were present in every sample, yet sequencing-based abundance measures did not reliably indicate culturability or viability. This held even when targeting ef-Tu, the gene encoding translation elongation factor Tu, and it underscores the limitations of using sequencing alone to predict pathogen viability. Enhanced organism-specific reference genomes improved discriminatory power but did not yield full concordance with culture. The study emphasizes three needs for interpreting microbiome signals for food safety and quality: expanded, diverse reference databases; multi-gene or genome-wide approaches; and integrated analyses that include culture and metadata. Monitoring unclassified read fractions and co-occurrence patterns may further enhance anomaly detection.
This work introduces and validates a metatranscriptomic pipeline tailored for food ingredients that incorporates eukaryotic matrix filtering, enabling accurate microbial community profiling in complex food matrices. Applied to 31 poultry meal HPP samples, the approach identified a core set of 65 genera and demonstrated microbiome-based discrimination by supplier and detection of matrix contamination. While total RNA sequencing provides a rich, robust description of food microbiomes and potential hazard indicators, it does not by itself predict pathogen viability, as shown for Salmonella. Future research should: expand and update microbial reference databases (especially for pathogens and spoilage organisms); integrate multi-omics with culture-based assessments; develop quantitative baselines and anomaly thresholds per food type; evaluate additional functional markers of viability; and analyze unclassified read dynamics to detect emerging or divergent microbes. The pipeline can be adapted to other food types and environmental or human microbiomes with appropriate modifications and indicators.
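One way to turn the supplier-level correlations reported in the results into a per-food-type anomaly threshold is to compare each new sample's genus RPM vector with a reference baseline using Spearman correlation. The sketch below is hypothetical throughout. The baseline is taken as the per-genus median RPM across reference samples. The 0.9 cutoff is an arbitrary placeholder that would need calibrating against real within-supplier correlations, such as the mean of 0.946 reported within Supplier A.

```python
import math
from statistics import median

def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rho as the Pearson correlation of average ranks (handles ties)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)  # undefined if either vector is constant

def baseline_profile(reference_samples):
    """Hypothetical baseline: per-genus median RPM across reference samples."""
    return [median(column) for column in zip(*reference_samples)]

def flag_anomalies(samples, baseline, min_rho=0.9):
    """Return {sample: rho} for samples whose correlation to the baseline is below min_rho."""
    flagged = {}
    for name, rpm_vector in samples.items():
        rho = spearman(rpm_vector, baseline)
        if rho < min_rho:
            flagged[name] = rho
    return flagged

# Hypothetical RPM vectors over the same four genera, in a fixed order.
baseline = baseline_profile([[100, 50, 10, 1], [90, 60, 12, 0.5]])
print(flag_anomalies({"typical": [80, 40, 9, 2], "shifted": [1, 10, 50, 100]}, baseline))
```

Spearman correlation is used here because the text reports it, and because ranks are insensitive to the large dynamic range of RPM values across genera.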
- Sequencing-based detection did not reliably indicate pathogen viability; Salmonella culturability did not correlate with RNA-based abundance or ef-Tu expression.
- Reference database incompleteness likely contributed to the unclassified reads (2–4%) and to potential misclassification; organism-specific references improved results but did not achieve full concordance with culture.
- Genus-level analysis was chosen due to database limitations; species/strain-level accuracy remains constrained.
- Single-gene markers (ef-Tu) were insufficient to mirror culture results, highlighting limitations of targeted approaches.
- Rarefaction indicated incomplete capture of diversity at lower depths; deeper or selective sequencing may be needed for full diversity characterization.
- Aligning to organism-only references, which lack competing genomes, can inflate mapped counts; reads that co-map or align ambiguously can bias counts.