Faecal microbiome-based machine learning for multi-class disease diagnosis

Medicine and Health

Q. Su, Q. Liu, et al.

This study by Qi Su and colleagues shows how systematic characterisation of the human faecal microbiome can support non-invasive disease diagnostics. Using metagenomic data from 2,320 individuals, the authors developed a machine-learning model that discriminates among nine phenotypes with high predictive power, illustrating the promise of microbiome-based tools in clinical applications.

Introduction
The study addresses the challenge that many human diseases share overlapping gut microbiome signatures, which can confound traditional single-disease (binary) classifiers and lead to misclassification. Recent work shows dysbiosis is implicated in diverse conditions, and prior microbiome diagnostic models largely used binary tasks. An earlier multi-class attempt relied on heterogeneous public datasets, introducing technical biases and batch effects. To overcome these limitations, the authors assembled the largest single-site, well-controlled cohort spanning nine phenotypes and investigated whether a faecal microbiome-based multi-class machine learning model can robustly and specifically diagnose multiple diseases simultaneously, thereby improving clinical applicability over binary approaches.
Literature Review
The authors reference evidence that dysbiosis contributes to numerous diseases and that many disease states exhibit shared microbiome responses, potentially confounding single-disease models. Prior diagnostic efforts predominantly used binary classifiers (e.g., CRC vs healthy). A previous attempt at multi-class classification suffered from heterogeneity and batch effects due to reliance on disparate public datasets. Meta-analyses indicate ecological indices (diversity, richness) are inconsistent markers of disease, and many microbial signals are shared across diseases. Previous CRC microbiome diagnostic models are noted for comparison. Collectively, the literature motivates a need for robust, multi-class, species-level models developed on harmonised datasets.
Methodology
Study design and cohort: 2,320 Han Chinese adults (mean age 54.9 years, 48.7% female) were recruited at a single centre (Prince of Wales Hospital, Hong Kong) from Jan 2017 to Mar 2022 across nine phenotypes: colorectal cancer (CRC; n=174), colorectal adenomas (n=168), Crohn’s disease (CD; n=200), ulcerative colitis (UC; n=147), diarrhoea-predominant irritable bowel syndrome (IBS-D; n=145), obesity (BMI>28; n=148), cardiovascular disease (CVD) risk (n=143), post-acute COVID-19 syndrome (PACS; n=302), and healthy controls (n=893). Inclusion and diagnostic criteria followed standard clinical guidelines; extensive exclusion criteria limited confounders (e.g., recent antibiotics/probiotics, chronic medications affecting the microbiome, comorbidities). Participants followed a stable traditional Chinese diet; non-obesity groups had normal BMI (18.5–22.9). Ethics approval was obtained (Joint CUHK-NTEC CREC). An additional 60 post-COVID subjects were prospectively followed and confirmed fully recovered without PACS for independent validation.
Sample collection and processing: Faecal samples were collected at home into Norgen preservative tubes, delivered within 24 h, and stored at −80°C. DNA was extracted with the QIAamp DNA Stool Mini Kit, libraries were prepared with Illumina Nextera DNA Flex, and sequencing was performed on an Illumina NextSeq 550 (150 bp paired-end). Positive controls included ZymoBIOMICS spike-in and community standards. Processing steps were randomised.
Metagenomic profiling: Raw reads were filtered with Trimmomatic (adapters, bases with Q<20, and reads shorter than 50 bp removed); human reads were removed with Kneaddata (GRCh38.p12). Microbiome composition was profiled using MetaPhlAn3 (v3.0.14) on quality-filtered forward reads; GNU parallel was used to accelerate processing. Species with average abundance <0.15% and prevalence <5% were filtered out. Alpha diversity (Shannon, Chao1) was computed with phyloseq.
Statistical microbiome analysis: Beta-diversity (Bray–Curtis) was visualised by PCoA; PERMANOVA (vegan::adonis, 999 permutations) tested compositional differences. Associations between species and phenotypes were modelled with MaAsLin2 (healthy as reference), adjusting for age, sex, and technical factors (library DNA concentration, read depth, sequencing batch). Multiple testing was controlled by Benjamini–Hochberg FDR<0.05.
Machine learning: Data were split per phenotype into 70% training (n=1,624) and 30% test (n=696), maintaining class balance. Classifiers evaluated: Random Forest (RF), K-nearest neighbours (KNN), multi-layer perceptron (MLP), support vector machine (SVM), and a graph convolutional neural network (GCN). RF was implemented via scikit-learn with n_estimators=2000 and class_weight=balanced; other models used default settings; the GCN was reconstructed per the published architecture. Nested CV within training: 20-times repeated, fivefold stratified CV was used to select optimal models; final performance was evaluated on the withheld test set; the process was repeated 20 times to obtain distributions of AUROC/AUPR. Feature importance was derived from frequently selected, highly ranked species.
Binary classifier experiments: For comparison, 36 binary sub-cohorts (all pairwise phenotype contrasts) were trained with RF (sklearn) using ComBat batch correction (SVA) on relative abundances, with a 70/30 split and 20× repeated fivefold CV; AUROC was evaluated on validation splits.
External validation: 1,597 publicly available shotgun faecal metagenomes were aggregated from 12 studies across 11 countries (covering UC, CD, CVD, CRC, adenoma, IBS-D, obesity, and healthy) and processed identically. An unrelated-disease dataset (liver cirrhosis n=38; IBS-C n=22) was also tested. The recovered COVID cohort (n=60) was used to assess health classification post-recovery.
Data and code: Raw metagenomes were deposited (PRJNA841786). Public data were accessed from SRA per the cited studies. Code is available at https://github.com/qsu123/multi_class_diagnosis.
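The per-phenotype 70/30 split described under Machine learning can be illustrated with a minimal sketch that uses only the Python standard library. The function name `stratified_split`, the seed, and the rounding rule are illustrative assumptions and not the authors' code; only the class sizes are taken from the cohort description.

```python
import random
from collections import defaultdict

# Cohort sizes per phenotype, as listed in the study design.
COHORT = {
    "CRC": 174, "adenoma": 168, "CD": 200, "UC": 147, "IBS-D": 145,
    "obesity": 148, "CVD": 143, "PACS": 302, "healthy": 893,
}

def stratified_split(labels, train_frac=0.7, seed=0):
    """Split sample indices into train/test, class by class.

    Splitting each class separately keeps the class proportions of the
    training and test sets close to those of the full cohort.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Assumed rounding rule: nearest integer to train_frac * class size.
        n_train = round(train_frac * len(members))
        train_idx.extend(members[:n_train])
        test_idx.extend(members[n_train:])
    return sorted(train_idx), sorted(test_idx)

# One label per participant, in cohort order (placeholder for real sample IDs).
labels = [name for name, n in COHORT.items() for _ in range(n)]
train_idx, test_idx = stratified_split(labels)
```

Because every phenotype is split on its own, small classes such as CVD risk (n=143) are guaranteed to appear in both partitions, which a purely random split of the pooled cohort does not ensure.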
Key Findings
- Dataset: 2,320 individuals; 14.3 TB sequence; mean depth 6.15 Gb/metagenome; 1,208 species detected; 325 species retained after abundance/prevalence filtering.
- Shared signatures: 1,061 significant associations between nine phenotypes and 215 species (FDR<0.05); >94% of species associated with two or more diseases. Examples: Klebsiella pneumoniae positively associated with CD, CRC, IBS-D, obesity, PACS, UC; Roseburia intestinalis negatively correlated with the same six disease phenotypes.
- Binary models misclassification: Disease-vs-control RF binary classifiers showed high misdiagnosis rates when applied to unrelated diseases (average 0.52, IQR 0.41–0.65), indicating limited disease specificity.
- Multi-class performance (internal test set): All models feasible (mean AUROC 0.67–0.99, IQR 0.81–0.92). RF outperformed others with mean AUROC 0.90–0.99 (IQR 0.91–0.94; one-vs-all). At thresholds by highest Youden’s index, sensitivities 0.81–0.95 (IQR 0.87–0.93), specificities 0.76–0.98 (IQR 0.83–0.95), accuracies 0.77–0.98 (IQR 0.82–0.92). Examples: CRC AUROC 0.94, sensitivity 0.88, specificity 0.85 (accuracy 0.85); IBS-D AUROC 0.99, sensitivity 0.94, specificity 0.98 (accuracy 0.98); PACS AUROC 0.98.
- One-vs-one evaluations: Mean AUROC 0.94 (IQR 0.92–0.98), sensitivities IQR 0.88–0.95, specificities IQR 0.83–0.94.
- Robustness: Stable performance across different train/test split ratios and age strata; AUROC increased with more features, arguing against overfitting. Size-balanced training (143 per phenotype, total n=1,287) yielded AUROC 0.83–0.99 (IQR 0.89–0.96), comparable to full-cohort results.
- External validation: On 1,597 public samples across Asia/Europe/North America, RF achieved AUROC 0.69–0.91 (IQR 0.79–0.87), outperforming other models.
- Recovered COVID cohort: 83.3% (50/60) classified as healthy, supporting microbiome normalisation post-recovery without PACS.
- Unrelated diseases (liver cirrhosis and IBS-C, n=60): High rates of undetermined predictions (48/60) due to failing thresholds; low misclassification per phenotype (0–5%), indicating high specificity for the nine trained phenotypes.
- Feature–phenotype associations: Top 50 species achieved AUROC 0.88–0.99 (IQR 0.90–0.93) internally and 0.67–0.90 (IQR 0.78–0.86) externally; 363 significant associations (FDR<0.05). Patterns included decreased Firmicutes/Actinobacteria and increased Bacteroidetes across most disease states. Disease-specific markers included Parvimonas micra enriched in CRC (vs healthy and adenoma), Bacteroides ovatus enriched in UC, Bacteroides vulgatus and B. xylanisolvens increased in PACS but decreased in CD; Actinomyces spp. increased in obesity; Collinsella spp. increased in IBS-D.
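The one-vs-all AUROC and the Youden-index thresholds reported above follow standard definitions, which can be sketched as below. This is a plain-Python illustration on made-up scores, not the study's evaluation code: the function names, the toy probabilities, and the `is_crc` labels are all hypothetical. In a one-vs-all setting, the "positive" samples belong to one phenotype (here, CRC as an example) and all other phenotypes are pooled as negatives.

```python
def auroc(scores, positives):
    """AUROC as the probability that a positive outranks a negative.

    Uses the Mann-Whitney identity; tied scores count as half a win.
    """
    pos = [s for s, y in zip(scores, positives) if y]
    neg = [s for s, y in zip(scores, positives) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def youden_threshold(scores, positives):
    """Return (threshold, sensitivity, specificity) maximising
    Youden's J = sensitivity + specificity - 1.

    A sample is called positive when its score is >= threshold.
    Ties in J are resolved in favour of the lowest threshold.
    """
    n_pos = sum(positives)
    n_neg = len(positives) - n_pos
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, positives) if y and s >= t)
        tn = sum(1 for s, y in zip(scores, positives) if not y and s < t)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, t, sens, spec)
    return best[1:]

# Hypothetical predicted probabilities for one phenotype (toy data only).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
is_crc = [1, 1, 0, 1, 0, 0, 1, 0]  # 1 = sample truly has the phenotype

print(auroc(scores, is_crc))             # 0.75
print(youden_threshold(scores, is_crc))  # (0.6, 0.75, 0.75)
```

Repeating this per phenotype gives one AUROC and one operating threshold per class; a sample whose probabilities fall below every class threshold would then be left undetermined, which is consistent with how the unrelated-disease samples are described above.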
Discussion
Findings demonstrate that faecal microbiome-based multi-class machine learning enables accurate and specific discrimination among multiple diseases, addressing confounding from shared microbial signatures that limit binary models. The high-quality, harmonised single-site cohort and robust RF classifier produced strong internal performance and generalised across diverse external datasets. The model’s probabilistic outputs and high specificity suggest potential utility for non-invasive screening, differential diagnosis among gastrointestinal and systemic conditions, and monitoring treatment responses. The identified shared and disease-specific microbial signatures provide biological plausibility for predictions and may inform biomarker discovery and therapeutic stratification, including cross-disease strategies targeting common microbial shifts.
Conclusion
This study assembled the largest single-site faecal metagenome dataset spanning nine disease phenotypes and developed a species-level, multi-class RF classifier achieving high AUROC, sensitivity, and specificity internally and solid performance externally. The work establishes feasibility for microbiome-based, non-invasive multi-disease diagnostics and highlights stable, disease-specific microbial markers (e.g., Parvimonas micra for CRC, Bacteroides for UC, Actinomyces for obesity). Future work should expand the disease spectrum, validate multi-disease diagnosis in single patients, and elucidate causal mechanisms underlying microbiome–phenotype associations to enhance clinical translation and guide therapy response prediction.
Limitations
- Disease spectrum was limited to nine phenotypes; inclusion of additional conditions may improve utility and calibration.
- Biological/mechanistic evidence for identified associations remains limited; causal links require further experimental validation.
- The pooled public validation datasets lacked detailed data on comorbidities and antibiotic use, which may affect performance estimates.
- While the model outputs simultaneous probabilities for multiple diseases, true multi-disease diagnosis in single patients was not validated and should be tested prospectively.