
Identification of microbial markers across populations in early detection of colorectal cancer
Y. Wu, N. Jiao, et al.
This study by Yuanqi Wu and colleagues examines whether specific fecal microbial markers can support early detection of colorectal cancer (CRC). Analyzing more than a thousand fecal samples, the authors identified markers that distinguish adenoma from controls and from CRC, which could improve non-invasive screening.
Introduction
The study addresses the challenge of identifying robust, stool-based microbial biomarkers for early detection of colorectal cancer at the precancerous adenoma stage. Although the gut microbiome is implicated in CRC initiation and progression and fecal microbial markers have been reported for CRC, reliable adenoma-specific markers with cross-population generalizability are lacking. Prior studies show inconsistent markers across cohorts due to biological variability and methodological differences. Early detection at the adenoma stage can dramatically improve outcomes and reduce healthcare burden. The authors propose an integrated, multi-cohort 16S rRNA gene-based analysis to discover reproducible adenoma-associated microbial markers, build diagnostic models, and validate them across populations, while also exploring functional microbiome alterations involved in adenoma and CRC.
Literature Review
Previous work has identified fecal microbial signatures for CRC, including species such as Fusobacterium nucleatum, but reproducibility across studies has been limited. Meta-analyses using whole-metagenome shotgun (WMS) data have effectively identified CRC markers across cohorts; however, adenoma-specific markers were rarely reported and demonstrated lower diagnostic accuracy. A WMS-based meta-analysis reported low AUCs for adenoma detection (0.54 vs controls; 0.69 vs CRC), potentially due to the dependence on reference genomes and limited taxonomic coverage in WMS profiling. Tissue-based microbiome markers have also been studied but are less practical for population screening. The fecal immunochemical test (FIT), commonly used for non-invasive screening, shows poor sensitivity for adenomas. Given these limitations, integrating multi-cohort 16S rRNA datasets may better capture community composition and yield robust, generalizable markers for adenoma detection.
Methodology
- Public data collection: Four studies with 16S rRNA V4 region sequencing data of controls, adenoma, and CRC with sufficient metadata were retrieved from SRA/ENA using SRA Toolkit. Two additional independent cohorts (from the US and China) were used for external validation. Five non-CRC disease cohorts (CD, UC, IBS, NAFLD, T2D) were used to assess marker specificity.
- Patient recruitment (new cohort for qRT-PCR): Participants (adenoma, CRC, controls) were recruited at Fudan University Shanghai Tumor Center with ethics approval and informed consent; exclusion criteria included recent antibiotics, hereditary syndromes, active infections, and major chronic diseases.
- 16S processing: QIIME2 (v2018.1) pipeline with DADA2 for quality filtering and chimera removal; taxonomy assigned via RDP against ribosomal databases; alignment with MAFFT; phylogeny with FastTree; ASVs present in at least three studies and in ≥20% of samples were retained.
- Confounder analysis: Variance partitioning considered disease status and confounders (age, BMI, antibiotics, NSAID, platform, race, sex, study). Given predominant study effects on composition, analyses adjusted for study as a blocking factor.
- Differential abundance: Blocked two-sided Wilcoxon rank-sum tests (R coin package) with study as block identified differentially abundant ASVs between groups. Generalized fold change metrics were computed to summarize effect sizes.
- Modeling: Random Forest (caret) with stratified 10-fold CV, 501 trees, and mtry set to roughly 10% of the features. Features included differential ASVs, alpha diversity (Shannon, Simpson, observed ASVs), and metadata (sex, age, BMI as available). Feature importance guided selection of parsimonious marker panels. Performance metrics included AUC, accuracy, sensitivity, specificity, precision, and F1.
- Cross-study validation: Study-to-study transfer validation (train on one study, test on another) and leave-one-dataset-out (LODO) validation assessed generalizability. Additional comparisons evaluated models using all ASVs, only differentials, or the important features.
- Co-occurrence networks: SparCC inferred correlations among differential ASVs (significance by bootstrap, P<0.05; magnitude thresholds 0.1 or 0.2). Network modules were identified with Cytoscape and MCODE-based community detection.
- Functional inference: PICRUSt2 predicted pathway profiles; differential pathways identified between groups; contributions of ASVs to pathways quantified. Focus was on ADP-L-glycero-beta-D-manno-heptose (ADP-heptose) and menaquinone (MK-10) biosynthesis pathways.
- qRT-PCR validation: Targeted quantification of key genes in ADP-heptose biosynthesis (e.g., hldE, rfaD, gmhA) comparing control vs adenoma, and menaquinone biosynthesis genes (e.g., menH, menF, menC) comparing adenoma vs CRC using newly collected fecal samples; two-sided Wilcoxon/Welch tests used for comparisons.
- FIT comparison: FIT data were retrieved from a prior study; RF models were built using FIT alone, microbial markers alone, and both combined to assess adenoma detection performance.
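The leave-one-dataset-out (LODO) scheme described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: it uses plain Python rather than R's caret, the helper names `lodo_splits` and `auc` and the toy data are hypothetical, and fixed scores stand in for Random Forest predictions.

```python
from collections import defaultdict


def lodo_splits(study_labels):
    """Yield (held_out_study, train_idx, test_idx), holding out each study in turn."""
    by_study = defaultdict(list)
    for idx, study in enumerate(study_labels):
        by_study[study].append(idx)
    for held_out in sorted(by_study):
        test_idx = by_study[held_out]
        train_idx = sorted(
            i for study, idxs in by_study.items() if study != held_out for i in idxs
        )
        yield held_out, train_idx, test_idx


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): study of origin, label (1 = adenoma, 0 = control), model score.
studies = ["A", "A", "A", "B", "B", "B", "C", "C", "C", "C"]
labels = [1, 0, 1, 0, 1, 0, 1, 1, 0, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.7, 0.5, 0.8, 0.3, 0.35, 0.1]

lodo_auc = {}
for held_out, train_idx, test_idx in lodo_splits(studies):
    # A real run would fit the classifier on train_idx here; the fixed scores
    # above stand in for its predictions on the held-out study.
    lodo_auc[held_out] = auc(
        [labels[i] for i in test_idx], [scores[i] for i in test_idx]
    )

for study, value in lodo_auc.items():
    print(f"held out {study}: AUC = {value:.2f}")
```

Study-to-study transfer follows the same pattern, except that the training indices come from a single study instead of all remaining ones.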
Key Findings
- Multi-cohort classifiers: A Random Forest model using 11 microbial markers discriminated adenoma from controls with AUC ≈ 0.80 in cross-validation. A model using 26 markers discriminated adenoma from CRC with AUC ≈ 0.89.
- Cross-study robustness: Study-to-study transfer validation for control vs adenoma yielded AUCs 0.52–0.81 (average ~0.64); LODO AUCs 0.63–0.93 (average ~0.76). For adenoma vs cancer, study-to-study AUCs 0.59–0.93 (average ~0.76); LODO AUCs 0.86–0.95 (average ~0.89). Control vs cancer models were also robust (average AUC ~0.83 study-to-study; ~0.79 LODO).
- Independent validation: Reconstructed RF models in two external cohorts achieved AUCs ~0.82 (adenoma vs control; accuracy 0.70, sensitivity 0.96, specificity 0.59, precision 0.71, F1 0.77) and ~0.84 (adenoma vs cancer; accuracy 0.79, sensitivity 0.79, specificity 0.80, precision 0.78, F1 0.72). Feature rankings were consistent with discovery models.
- Specificity vs other diseases: The adenoma marker panel showed lower AUCs in non-CRC diseases (CD, UC, IBS, NAFLD, T2D) than in adenoma detection, indicating specificity for adenoma.
- FIT comparison: FIT alone achieved AUC ~0.60 for adenoma vs control; microbial markers achieved ~0.78 on the same dataset; combining FIT with microbial markers improved AUC to ~0.81.
- Taxonomic insights: Dominant phyla across groups included Firmicutes and Bacteroidetes; adenoma-associated markers differed from CRC-associated markers, with no common ASVs between control-vs-adenoma and control-vs-CRC biomarker sets. Notable contributors included Veillonella parvula (adenoma) and Bacteroides dorei (CRC vs adenoma).
- Functional shifts: Adenoma showed increased ADP-L-glycero-beta-D-manno-heptose (ADP-heptose) biosynthesis (linked to LPS and NF-κB activation), with key genes hldE, rfaD, and gmhA enriched by PICRUSt2 and validated by qRT-PCR. CRC showed elevated menaquinone-10 (vitamin K2) biosynthesis, with increased abundance of genes such as menF, menE/menC, confirmed by qRT-PCR. ASVs contributing strongly included Veillonella parvula (ADP-heptose) and Bacteroides dorei (MK-10).
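The validation metrics quoted above (accuracy, sensitivity, specificity, precision, F1) all derive from a 2x2 confusion matrix. The sketch below shows the standard definitions; the function name `classification_metrics` and the counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # positive predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # F1 is the harmonic mean of precision and sensitivity.
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts: 50 adenoma samples (40 detected), 50 controls (30 correctly cleared).
metrics = classification_metrics(tp=40, fp=20, tn=30, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

When metrics are averaged over cross-validation folds or repeated runs, the averaged F1 need not equal the harmonic mean of the averaged precision and sensitivity.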
Discussion
The study demonstrates that stool-based, 16S rRNA-derived microbial signatures can robustly and specifically detect colorectal adenomas across diverse cohorts, addressing a critical need for early, non-invasive screening. By rigorously adjusting for confounders and study effects, and validating across studies and independent cohorts, the identified markers show generalizability beyond single-cohort findings. Compared with FIT, microbial markers discriminated adenomas from controls better (AUC ~0.78 vs ~0.60 on the same dataset), and combining them with FIT raised the AUC further (~0.81), suggesting a practical path for clinical implementation. Functionally, enrichment of ADP-heptose biosynthesis in adenoma implicates microbiome-driven pro-inflammatory signaling (NF-κB) early in neoplasia, whereas increased menaquinone-10 biosynthesis in CRC may reflect microbial adaptations or compensatory responses within the tumor microenvironment. These mechanistic inferences, supported by qRT-PCR validation, provide biologically plausible pathways linking microbiome alterations to colorectal tumorigenesis and point to potential intervention targets.
Conclusion
Across multiple populations and datasets, the authors identify and validate adenoma-specific microbial markers enabling non-invasive early detection of colorectal cancer precursors, with AUCs of about 0.80 for adenoma vs controls in cross-validation and independent cohorts. The classifiers are robust to technical and geographic variation and outperform standard FIT, with further gains from combining both. Functional analyses highlight increased ADP-heptose biosynthesis in adenomas and elevated menaquinone-10 biosynthesis in CRC, suggesting mechanistic roles and therapeutic opportunities. Future work should include prospective and interventional studies, integration with additional omics, and larger, harmonized cohorts to refine marker panels and evaluate clinical implementation in screening programs.
Limitations
- No interventional or prospective study was conducted to establish causality or assess real-world screening performance.
- 16S rRNA profiling limits taxonomic resolution and may miss strain-level differences; functional inference via PICRUSt2 is predictive rather than directly measured.
- Heterogeneity across studies (geography, protocols) necessitated blocking by study; residual confounding may remain.
- Some validation cohorts lacked complete metadata (e.g., age, BMI), potentially affecting model adjustment and generalizability.
- Inconsistencies and varying sample sizes across contributing datasets may influence feature selection and performance estimates.