Biology
Experimental validation that human microbiome phages use alternative genetic coding
S. L. Peters, A. L. Borges, et al.
Bacteriophages shape microbial communities through predation, metabolic reprogramming of their hosts, and biomass turnover, yet their biology remains poorly understood because of methodological limitations. Fundamental questions persist about how phages interact with and redirect host translation systems during infection. Metagenomic studies suggest that some phages exploit the bacterial ribosome to translate their proteins using both the standard genetic code and alternative codes. In particular, certain phages are predicted to reassign the TAG stop codon to glutamine (genetic code 15) or the TGA stop codon to tryptophan; such recoding is common in human and animal microbiomes and prevalent among phages infecting Firmicutes and Bacteroidetes. Failing to recognize stop codon reassignment complicates phage discovery and gene annotation, producing truncated proteins, incorrect reading frames, and artificially low coding density. Although bioinformatic predictions exist, experimental validation of alternative coding in phages has been lacking. Here, using LC-MS/MS-based metaproteomics of human fecal samples, the study aims to directly demonstrate TAG-to-glutamine recoding in crAss-like phages and to assess when during infection this alternative coding is expressed.
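To make the truncation problem concrete, here is a minimal sketch that translates the same toy open reading frame under genetic code 11 and genetic code 15. It relies only on the fact that the amino-acid assignments of NCBI translation tables 11 and 15 differ solely at TAG (stop in code 11, glutamine in code 15). The DNA sequence and the names `translate_orf`, `CODE_11`, and `CODE_15` are hypothetical illustrations, not part of the study.

```python
# Sketch (hypothetical names and toy sequence): the same ORF under code 11 vs code 15.
BASES = "TCAG"
# NCBI amino-acid string shared by tables 1 and 11 (codon order TTT, TTC, TTA, ...).
AA_11 = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODE_11 = {
    b1 + b2 + b3: AA_11[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
# Genetic code 15: identical amino-acid assignments except TAG -> glutamine (Q).
CODE_15 = {**CODE_11, "TAG": "Q"}


def translate_orf(dna: str, code: dict) -> str:
    """Translate in frame from the first base until the first stop codon."""
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        aa = code[dna[pos:pos + 3]]
        if aa == "*":  # stop codon under this code
            break
        protein.append(aa)
    return "".join(protein)


toy_gene = "ATGGCTTAGAAATGGTAA"  # hypothetical: ATG GCT TAG AAA TGG TAA
print(translate_orf(toy_gene, CODE_11))  # MA    (truncated at TAG)
print(translate_orf(toy_gene, CODE_15))  # MAQKW (TAG read as glutamine)
```

Under code 11 the gene caller would report a two-residue fragment, while code 15 recovers the full product; in a real genome the downstream sequence would then be split into separate, often spurious, gene calls.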
Prior bioinformatic work has inferred alternative genetic codes in bacteriophages, including reassignment of the TAG stop codon to glutamine (genetic code 15) in some phages and of the TGA stop codon to tryptophan in others. Such recoding appears widespread in human and animal microbiomes and is especially frequent in phages of Firmicutes and Bacteroidetes. Failing to account for these alternative codes leads to fragmented gene predictions, incorrect reading frames, and misannotated proteins. Although proteomic confirmation of stop codon reassignment has so far come largely from bacteria, experimental validation in phages had not previously been reported. Alternative code usage can be inferred computationally from restored open reading frames and from conserved amino acid alignments with homologous proteins, and coding density patterns have been proposed as indicators of code usage. However, direct proteomic evidence for genetic code 15 in phages was missing prior to this study.
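The coding-density indicator can be illustrated with a deliberately simplified calculation: count the fraction of bases covered by long stop-to-stop stretches when TAG is, or is not, treated as a stop. This is a sketch under stated assumptions (forward strand only, no start-codon requirement, a hypothetical `min_codons` cutoff, stretches still open at the sequence end ignored); published approaches run a full gene caller under each code instead.

```python
# Sketch: coding density under code 11 vs code 15 (simplified, hypothetical cutoff).
STOP_CODONS = {
    11: {"TAA", "TAG", "TGA"},
    15: {"TAA", "TGA"},  # under code 15, TAG encodes glutamine
}


def coding_density(dna: str, code: int, min_codons: int = 100) -> float:
    """Fraction of bases covered by stop-to-stop stretches of >= min_codons codons.

    Forward strand only; a stretch still open at the end of the sequence is ignored.
    """
    covered = [False] * len(dna)
    for frame in range(3):
        run_start = frame
        for pos in range(frame, len(dna) - 2, 3):
            if dna[pos:pos + 3] in STOP_CODONS[code]:
                if (pos - run_start) // 3 >= min_codons:
                    # Mark the stretch plus its terminating stop codon as coding.
                    for i in range(run_start, pos + 3):
                        covered[i] = True
                run_start = pos + 3
    return sum(covered) / len(dna) if dna else 0.0


toy = "GCT" * 50 + "TAG" + "GCT" * 50 + "TAA"  # hypothetical 306-nt sequence
print(coding_density(toy, 11, min_codons=60))  # 0.0: TAG splits the frame into two short pieces
print(coding_density(toy, 15, min_codons=60))  # 1.0: one 101-codon ORF spans the whole sequence
```

A genome region whose density jumps sharply when TAG is reassigned is a candidate for code 15; a region whose density is similar under both codes gives no such signal.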
Human adult and infant stool samples were collected with informed consent under an IRB-approved protocol (University of Pittsburgh). Short-read shotgun metagenomic sequencing (NCBI BioProject: PRJNA696896) identified one adult and one infant sample with abundant alternatively coded crAss-like phages; notably, phage L12,026.000M1_scaffold35 had 659× coverage.
For proteomics, 100 mg of stool was homogenized, clarified, and filtered (0.8 µm, then a 300 kDa MWCO filter), and both the filtrate and the retained biomass were processed. Proteins were reduced and alkylated (15 mM iodoacetamide, IAA), captured by protein aggregation capture (PAC) on magnetic carboxylate beads with acetonitrile-induced aggregation, washed, eluted, and quantified by A205. They were then reaggregated, washed, and digested with trypsin (1:75 w/v) at 37 °C overnight and for an additional 4 h. Tryptic peptides were passed through a 10 kDa MWCO filter.
LC-MS/MS was performed on a Vanquish UPLC coupled to a Q Exactive Plus mass spectrometer, using a trapping column and a nanospray emitter packed with C18 resin. Peptides were loaded and desalted (10 min at 100% solvent A), separated with a gradient up to 30% solvent B, and measured by data-dependent acquisition (DDA; m/z 300–1500, MS1 resolution 70,000, MS/MS resolution 15,000, loop count 20, isolation window 1.8 m/z, charge states 2–6).
MS/MS spectra were searched in PEAKS Studio 10.6 using de novo-assisted database searching against a composite database: metagenome-annotated proteomes (excluding the target phage contigs), the phage proteomes predicted under both genetic code 15 and the standard code, the human UniProt reference proteome, common contaminants, and decoys. Secondary databases, in which genetic code 15 proteins were extended upstream of their predicted start codons, were used to test for translation upstream of the annotated starts. Mass tolerances were ±10 ppm for precursors and ±0.02 Da for fragments. PSMs required tryptic or semi-specific peptides with up to three missed cleavages. Cys carbamidomethylation (+57.02 Da) was set as a fixed modification and Met oxidation (+15.99 Da) as a variable modification (at most three variable modifications per peptide).
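Several of the search parameters above (trypsin specificity, up to three missed cleavages, fixed Cys carbamidomethylation of +57.02 Da, a ±10 ppm precursor tolerance) can be illustrated with a small in-silico digest and mass check. This is a hedged sketch, not how PEAKS Studio works internally: it covers only fully tryptic peptides (no semi-specific cleavage), omits variable Met oxidation, and the function names and toy protein are hypothetical. Residue masses are standard monoisotopic values.

```python
import re

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565           # added once per peptide (termini)
PROTON = 1.007276           # charge carrier for [M + zH]z+
CARBAMIDOMETHYL = 57.02146  # fixed modification on every Cys


def tryptic_peptides(protein: str, max_missed: int = 3, min_len: int = 6) -> list[str]:
    """Fully tryptic peptides (cleave after K/R, not before P) with missed cleavages."""
    cuts = [0] + [m.end() for m in re.finditer(r"[KR](?!P)", protein)]
    if cuts[-1] != len(protein):
        cuts.append(len(protein))
    peptides = []
    for i in range(len(cuts) - 1):
        # Joining k + 1 adjacent fragments corresponds to k missed cleavages.
        for j in range(i + 1, min(i + max_missed + 2, len(cuts))):
            peptide = protein[cuts[i]:cuts[j]]
            if len(peptide) >= min_len:
                peptides.append(peptide)
    return peptides


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass with carbamidomethylated Cys."""
    return (sum(RESIDUE[aa] for aa in peptide) + WATER
            + CARBAMIDOMETHYL * peptide.count("C"))


def precursor_mz(peptide: str, charge: int) -> float:
    """m/z of the [M + zH]z+ precursor ion."""
    return (peptide_mass(peptide) + charge * PROTON) / charge


def within_ppm(observed: float, theoretical: float, tol_ppm: float = 10.0) -> bool:
    """True if observed lies within +/- tol_ppm of the theoretical value."""
    return abs(observed - theoretical) / theoretical * 1e6 <= tol_ppm


toy_protein = "PEPTIDEKAAAAAARPLLLLLK"  # hypothetical sequence; "RP" is not cleaved
for pep in tryptic_peptides(toy_protein, max_missed=1, min_len=1):
    print(pep, round(precursor_mz(pep, 2), 4))
```

A candidate peptide is accepted at the precursor stage only if `within_ppm(observed_mz, precursor_mz(pep, z))` holds; fragment matching at ±0.02 Da follows the same idea with an absolute rather than relative tolerance.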
A 1% FDR was applied at both the peptide and protein levels, requiring at least one unique peptide per protein. De novo sequence tags with residue-level confidence above 90% served as complementary evidence, especially around positions where TAG was reassigned to Gln. Final peptide lists required database hits that passed the FDR threshold, plus manual validation linking de novo tags to database sequences. Analyses were performed on both phage-enriched (virus-like particle) and unenriched fractions to distinguish early from late protein expression during infection.
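The 1% FDR cutoff relies on decoy hits to estimate how many accepted target hits are false. PEAKS Studio applies its own FDR estimation, so the following is only a generic target-decoy sketch: rank PSMs by score, estimate FDR as decoys/targets above each threshold, convert to q-values, and keep target PSMs with q ≤ 0.01. The `PSM` class, `filter_at_fdr`, and the example scores are hypothetical, and score ties are not handled specially.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    peptide: str
    score: float     # higher = better match (search-engine score)
    is_decoy: bool   # True if the hit came from the decoy database


def filter_at_fdr(psms: list[PSM], max_fdr: float = 0.01) -> list[PSM]:
    """Return target PSMs whose q-value is <= max_fdr (simple target-decoy estimate)."""
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    fdrs = []
    targets = decoys = 0
    for psm in ranked:
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR if the score threshold were placed at this PSM.
        fdrs.append(decoys / targets if targets else 1.0)
    # q-value: smallest estimated FDR at which this PSM would still be accepted.
    qvalues = [0.0] * len(ranked)
    running_min = float("inf")
    for i in range(len(ranked) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qvalues[i] = running_min
    return [p for p, q in zip(ranked, qvalues) if not p.is_decoy and q <= max_fdr]


hits = [PSM("PEPTIDEK", 42.0, False), PSM("KEDITPEP", 12.0, True), PSM("AAAAAR", 8.0, False)]
print([p.peptide for p in filter_at_fdr(hits)])  # ['PEPTIDEK']
```

Here the decoy outranks "AAAAAR", so accepting it would imply an estimated FDR of 50%; only the top hit survives. The study's additional requirements (a unique peptide per protein, de novo tag support, manual validation) would be applied on top of such a filter.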
- LC-MS/MS of phage-enriched fractions identified 173 phage-specific peptides at <1% peptide-level FDR, mapping to 16 phage proteins in the infant sample and 14 in the adult sample.
- Roughly half of the identified phage peptides mapped uniquely to proteins predicted only under genetic code 15, indicating widespread recoding-dependent protein expression.
- Several of the identified code 15 proteins were annotated as structural (capsid, portal, tail-associated), consistent with late-infection expression and with their enrichment in the virus-like particle preparation.
- Across all identified phage proteins, 67% of genetic code 15 predictions could be confidently validated versus only 34% for standard genetic code 11 predictions, underscoring misannotation risks when using the wrong code.
- A newly discovered tail fiber protein (infant sample; 13_O63_3206) showed extensive sequence coverage; 11 peptides were identified exclusively under code 15, directly confirming reassignment of the TAG stop codon to glutamine at internal peptide positions.
- High-quality MS/MS spectra of tryptic peptides carrying glutamine at reassigned TAG positions showed near-complete y-ion series, corroborated by de novo sequence tags (>90% residue confidence).
- No peptide evidence of TAG recoding was observed in genomic regions predicted to use standard code 11 (regions with similar coding density under both codes), supporting region-specific code usage.
- Genome maps showed that code 11 predictions yield fragmented genes and low coding density, whereas code 15 restores open reading frames; detected peptides, including those with TAG-encoded glutamines, mapped within these restored ORFs. Suppressor tRNAs were predicted to enable readthrough of reassigned TAG codons.
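The strongest evidence in these results comes from peptides that span a TAG-encoded glutamine, as in the tail fiber peptides: such a peptide cannot be produced if TAG terminates translation. A peptide elsewhere in a code 15 protein supports the code 15 gene model but does not by itself show readthrough. A minimal sketch of that distinction follows; the function names, labels, and toy CDS are hypothetical and this is not the study's mapping pipeline.

```python
def tag_residue_indices(cds: str) -> list[int]:
    """0-based residue indices whose codon is TAG (glutamine under code 15)."""
    return [i // 3 for i in range(0, len(cds) - 2, 3) if cds[i:i + 3] == "TAG"]


def classify_peptide(peptide: str, protein_code15: str, cds: str) -> str:
    """Label a peptide by whether it covers a TAG-encoded residue.

    Assumes protein_code15 is the code 15 translation of cds (in frame from the
    first codon); only the first occurrence of the peptide is checked.
    """
    start = protein_code15.find(peptide)
    if start == -1:
        return "not found"
    end = start + len(peptide)
    if any(start <= idx < end for idx in tag_residue_indices(cds)):
        return "spans recoded TAG"
    return "no recoded TAG"


cds = "ATGGCTTAGAAATGGCGTTAA"  # hypothetical: ATG GCT TAG AAA TGG CGT TAA
protein = "MAQKWR"             # its code 15 translation (TAG read as Q)
for pep in ("MAQK", "WR"):      # the two tryptic peptides of this toy protein
    print(pep, classify_peptide(pep, protein, cds))
```

"MAQK" covers the glutamine at residue index 2 and is labeled "spans recoded TAG"; "WR" lies downstream and is labeled "no recoded TAG".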
The study provides direct proteomic evidence that certain human microbiome crAss-like phages translate TAG stop codons as glutamine within protein-coding regions, confirming expression of genetic code 15. The predominance of alternatively coded structural proteins (capsid, portal, tail) supports a model in which alternative coding is deployed late in infection to regulate the temporal expression of structural and lytic genes, preventing their premature synthesis before genome replication and assembly. The marked disparity in confident annotation rates (67% for code 15 versus 34% for code 11) highlights the importance of correct code assignment for accurate gene prediction and functional inference. The absence of recoding evidence in regions predicted to use standard code 11 is consistent with mosaic, region-specific deployment of genetic codes and supports coding density and ORF restoration as practical indicators of code usage. These findings confirm a long-standing metagenomic prediction and provide a framework for integrating metaproteomics to refine genome annotations and to map the temporal regulation of alternative coding during the phage life cycle. Broader implications include improved phage genome assembly and annotation, and applications in synthetic biology and phage engineering that harness nonstandard codon usage.
This work experimentally validates the use of genetic code 15 (TAG reassigned to glutamine) in two crAss-like phages from the human gut microbiome using LC-MS/MS-based metaproteomics. The results demonstrate region-specific deployment of alternative coding, prominently in late-stage structural proteins, and show that employing the correct genetic code restores open reading frames and improves annotation accuracy. The study establishes a generalizable proteomic framework for validating alternative genetic coding in phages and suggests avenues to extend this approach to other recoding events (e.g., alternative start codons, selenocysteine, pyrrolysine). Future research should map the temporal dynamics of code switching across infection stages, broaden sampling across hosts and phage taxa, and integrate proteogenomic pipelines into standard phage genome annotation workflows.