Gut Microbiome Wellness Index 2 enhances health status prediction from gut microbiome taxonomic profiles

D. Chang, V. K. Gupta, et al.

This study introduces the Gut Microbiome Wellness Index 2 (GMWI2), an index that assesses gut health from microbial taxonomic profiles. The research team, including Daniel Chang and Vinod K. Gupta, shows that GMWI2 distinguishes healthy from non-healthy individuals more accurately than the original index, supporting its use as a tool for health evaluation.

Introduction

The gut microbiome has been linked to numerous complex, chronic diseases, motivating the development of quantitative tools to assess health from microbial signatures. The original Gut Microbiome Wellness Index (GMWI; formerly GMHI) provided a disease-agnostic indicator using species-level abundances and α-diversity metrics, achieving balanced accuracies around 70% across pooled and validation datasets. However, it classified healthy samples more accurately than diseased samples, likely due to heterogeneity among diseases and the prevalence-based species selection with equal weighting. To address these limitations, the authors present GMWI2, which leverages multi-rank taxonomic features and Lasso-penalized logistic regression to learn variable feature importances from a substantially expanded pooled dataset (8069 metagenomes), aiming to improve balanced accuracy and generalizability for distinguishing healthy versus non-healthy gut microbiomes.

Literature Review

Prior studies have demonstrated strong associations between gut microbiome composition and inflammatory, metabolic, oncologic, neurologic, and autoimmune diseases. The original GMWI (GMHI), derived from 4347 metagenomes across 34 studies, provided a disease-agnostic health indicator based on a log-ratio of species abundances and α-diversity measures, achieving a balanced accuracy of 69.7% in the pooled set and 73.7% in an external cohort. Subsequent research has used GMHI/GMWI to examine environmental, genetic, and socioeconomic influences and to identify longevity-associated microbial signatures. Methodological advances in metagenomic profiling (e.g., MetaPhlAn3) and machine-learning meta-analyses of large datasets suggest that performance can be improved by integrating features across taxonomic levels and by using penalized regression to handle high-dimensional, correlated microbiome features. The literature also underscores the challenge of batch effects and the importance of inter-study validation for building generalizable classifiers.

Methodology

Data sources and pooling: The authors searched PubMed and Google Scholar (through January 2022) for studies with publicly available adult human stool shotgun metagenomes and metadata. From 73 candidate studies (12,967 samples), they applied inclusion/exclusion criteria to obtain 54 studies totaling 8069 samples (5547 healthy, 2522 non-healthy) across 12 phenotypes (healthy plus 11 disease phenotypes), spanning 26 countries and six continents.

Definitions: Healthy subjects had no reported disease and a normal BMI (subjects who were underweight, overweight, or obese were not considered healthy). Non-healthy subjects had a clinically diagnosed disease. The following were excluded: early-stage conditions (e.g., impaired glucose tolerance [IGT], hypertension), rare or genetic disorders, cancers other than colon cancer, studies of newborns, infants, or children, subjects with heavy drug or alcohol use, subjects older than 100 years, and longitudinal study endpoints at which disease had developed.

Sequencing platform and sample exclusion: Only samples sequenced on Illumina platforms were included; those from other platforms (454, Ion Torrent, BGISEQ-500) were excluded. Samples were removed if they had fewer than 1 million reads before QC, more than 90% unmapped reads after profiling, more than 25% unknown taxa, or fewer than 100 identified taxa. Studies with fewer than 20 samples remaining after filtering were excluded.
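The per-sample and per-study exclusion rules above can be expressed as a small filter. This is a minimal sketch, not the authors' pipeline: the `SampleStats` record and its field names are hypothetical, and `unknown_fraction` assumes the 25% rule refers to the fraction of the profile assigned to unknown taxa.

```python
from collections import Counter
from dataclasses import dataclass

# Thresholds as stated in the text.
MIN_READS_PRE_QC = 1_000_000   # drop samples with < 1M reads before QC
MAX_UNMAPPED_FRACTION = 0.90   # drop samples with > 90% unmapped reads
MAX_UNKNOWN_FRACTION = 0.25    # drop samples with > 25% unknown taxa
MIN_TAXA = 100                 # drop samples with < 100 identified taxa
MIN_SAMPLES_PER_STUDY = 20     # drop studies left with < 20 samples


@dataclass
class SampleStats:
    """Hypothetical per-sample QC summary (field names are illustrative)."""
    sample_id: str
    study_id: str
    reads_pre_qc: int
    unmapped_fraction: float  # fraction of reads unmapped after profiling
    unknown_fraction: float   # assumed: fraction of profile assigned to unknown taxa
    n_taxa: int               # number of identified taxa


def passes_sample_filters(s: SampleStats) -> bool:
    """True if the sample survives every per-sample exclusion rule."""
    return (
        s.reads_pre_qc >= MIN_READS_PRE_QC
        and s.unmapped_fraction <= MAX_UNMAPPED_FRACTION
        and s.unknown_fraction <= MAX_UNKNOWN_FRACTION
        and s.n_taxa >= MIN_TAXA
    )


def filter_samples(samples: list[SampleStats]) -> list[SampleStats]:
    """Apply per-sample filters first, then drop under-sized studies."""
    kept = [s for s in samples if passes_sample_filters(s)]
    per_study = Counter(s.study_id for s in kept)
    return [s for s in kept if per_study[s.study_id] >= MIN_SAMPLES_PER_STUDY]
```

Applying the study-size rule after the per-sample rules matches the order described in the text: a study is judged by how many samples it has left, not by its original size.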

Processing and QC: Human reads were removed by alignment to the GRCh38/hg38 reference with Bowtie2. Adapters were detected with FastQC and trimmed with Trimmomatic using specified parameters (ILLUMINACLIP, quality trimming, and a length cutoff of 60 bp). Taxonomic profiling used MetaPhlAn3 (database mpa_v30_CHOCOPhlAn_201901) with default parameters, and unknown/unclassified clades were removed. To address compositionality and simplify modeling, relative abundances were converted to binary presence/absence values per sample using a relative-abundance threshold of 0.00001.
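The presence/absence conversion can be sketched as a mapping from a MetaPhlAn-style profile onto a fixed feature order. The function name and the strict `>` comparison are assumptions, as is whether the 0.00001 threshold refers to fractions or to MetaPhlAn's percentage scale.

```python
# Threshold from the text. Whether it applies to fractions (0-1) or to
# MetaPhlAn's percentage scale (0-100) is an assumption in this sketch.
PRESENCE_THRESHOLD = 0.00001


def to_presence_absence(
    profile: dict[str, float],
    feature_order: list[str],
    threshold: float = PRESENCE_THRESHOLD,
) -> list[int]:
    """Binarize a {clade_name: relative_abundance} profile.

    Returns one 0/1 entry per clade in `feature_order` (clades at every
    rank, e.g. phylum, genus, species). A clade counts as present if its
    abundance is strictly greater than `threshold` (assumed comparison);
    clades missing from the profile are treated as absent.
    """
    return [1 if profile.get(clade, 0.0) > threshold else 0 for clade in feature_order]


# Example with MetaPhlAn-style multi-rank clade names.
features = [
    "k__Bacteria|p__Firmicutes",
    "k__Bacteria|p__Bacteroidetes",
    "k__Archaea",
]
sample = {"k__Bacteria|p__Firmicutes": 40.0, "k__Bacteria|p__Bacteroidetes": 0.000001}
print(to_presence_absence(sample, features))  # [1, 0, 0]
```

Fixing the feature order once means every sample yields a vector of the same length, which is what the downstream classifier expects.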

Exploratory analyses: PCA was conducted on the presence/absence profiles. Bray–Curtis distances computed from relative abundances (phylum to species) were used in a PERMANOVA (adonis2, 999 permutations) to test for association with health status. The separation was statistically significant but small: health status explained 1.2% of the variation in Bray–Curtis distances (Adonis R^2 = 1.2%, P=0.001).
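For readers unfamiliar with the test, here is a minimal one-factor PERMANOVA on Bray–Curtis distances, following Anderson's sum-of-squares formulation that adonis2 also uses. It is an illustrative NumPy sketch (function names are hypothetical), not the vegan implementation, and it omits adonis2 features such as strata and multiple model terms.

```python
import numpy as np


def bray_curtis_matrix(X: np.ndarray) -> np.ndarray:
    """Pairwise Bray-Curtis dissimilarity between rows of a non-negative
    abundance matrix X (samples x taxa): sum|x_i - x_j| / sum(x_i + x_j)."""
    n = X.shape[0]
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            denom = np.sum(X[i] + X[j])
            d = np.sum(np.abs(X[i] - X[j])) / denom if denom > 0 else 0.0
            D[i, j] = D[j, i] = d
    return D


def permanova(D: np.ndarray, labels: np.ndarray, n_perm: int = 999, seed: int = 0):
    """One-factor PERMANOVA on a distance matrix.

    Returns (R^2, pseudo-F, permutation p-value).
    """
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    D2 = D ** 2
    ss_total = D2[np.triu_indices(n, k=1)].sum() / n
    k = len(np.unique(labels))

    def ss_within(lab: np.ndarray) -> float:
        # Sum over groups of (squared within-group distances) / group size.
        ss = 0.0
        for g in np.unique(lab):
            idx = np.flatnonzero(lab == g)
            sub = D2[np.ix_(idx, idx)]
            ss += sub[np.triu_indices(len(idx), k=1)].sum() / len(idx)
        return ss

    def pseudo_f(lab: np.ndarray):
        ssw = ss_within(lab)
        ssa = ss_total - ssw  # between-group sum of squares
        return (ssa / (k - 1)) / (ssw / (n - k)), ssa

    f_obs, ssa_obs = pseudo_f(labels)
    r2 = ssa_obs / ss_total  # share of total variation explained by the grouping
    exceed = sum(pseudo_f(rng.permutation(labels))[0] >= f_obs for _ in range(n_perm))
    p = (exceed + 1) / (n_perm + 1)
    return r2, f_obs, p
```

With 999 permutations, the smallest attainable p-value is 1/(999 + 1) = 0.001, so the reported P=0.001 means no permutation produced a pseudo-F as large as the observed one.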

Model: GMWI2 employs L1-penalized logistic regression (LIBLINEAR solver via scikit-learn v1.0.2) with class_weight='balanced' and random_state=42, trained on presence/absence vectors of 3200 taxonomic features (clades across multiple taxonomic ranks). Nested cross-validation within an inter-study validation (ISV) framework was used to select the regularization parameter C, and C=0.03 was consistently chosen as optimal. The learned coefficient vector had 95 non-zero entries (49 positive, 46 negative; all other taxa received zero coefficients), spanning 1 class, 3 orders, 4 families, 19 genera, and 68 species, with coefficient values ranging from -0.68 to 0.54.
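The model configuration described above maps directly onto scikit-learn's `LogisticRegression`. The sketch below fixes C=0.03 rather than re-running the nested cross-validation, and assumes labels are encoded as 1 = healthy and 0 = non-healthy so that positive coefficients mark health-associated taxa; the helper names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def fit_gmwi2_like(X: np.ndarray, y: np.ndarray, C: float = 0.03) -> LogisticRegression:
    """L1-penalized logistic regression with the settings described in the text.

    X: samples x taxa presence/absence matrix (0/1).
    y: assumed encoding 1 = healthy, 0 = non-healthy.
    """
    model = LogisticRegression(
        penalty="l1",             # Lasso penalty drives most coefficients to zero
        solver="liblinear",       # LIBLINEAR backend, which supports L1
        C=C,                      # inverse regularization strength
        class_weight="balanced",  # reweight classes by inverse frequency
        random_state=42,
    )
    return model.fit(X, y)


def nonzero_coefficients(model: LogisticRegression, feature_names: list[str]):
    """Split the non-zero learned coefficients into positive and negative sets."""
    coef = model.coef_.ravel()
    positive = {n: w for n, w in zip(feature_names, coef) if w > 0}
    negative = {n: w for n, w in zip(feature_names, coef) if w < 0}
    return positive, negative
```

A small C means strong regularization, which is why only 95 of the 3200 candidate taxa keep non-zero weights in the published model.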

Score definition and classification: For a sample with presence/absence vector x_test, the GMWI2 score is the predicted log-odds of being healthy, θ^T x_test; positive scores indicate a likely healthy microbiome and negative scores a likely non-healthy one. A user-defined magnitude cutoff c implements a defer/reject option: a sample is classified as healthy if score > c, as non-healthy if score < -c, and deferred if the score lies within [-c, c].
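The scoring and defer/reject rule can be sketched in a few lines. Because the score is a log-odds, it maps to a predicted probability of being healthy via the logistic function, so a cutoff of c = 1.0 keeps only samples whose predicted probability for one class is above roughly 0.73. Whether the released model adds an intercept is not stated here, so the sketch exposes it as an optional, hypothetical parameter; the taxa and coefficients in the example are made up.

```python
import math


def gmwi2_score(coef: dict[str, float], present: set[str], intercept: float = 0.0) -> float:
    """Log-odds of 'healthy' for one sample.

    With presence/absence features, theta^T x is simply the sum of the
    coefficients of the taxa present in the sample. `intercept` is a
    hypothetical optional term (0 by default).
    """
    return intercept + sum(w for taxon, w in coef.items() if taxon in present)


def classify(score: float, c: float = 0.0) -> str:
    """Defer/reject rule: healthy if score > c, non-healthy if score < -c."""
    if score > c:
        return "healthy"
    if score < -c:
        return "non-healthy"
    return "defer"


def to_probability(score: float) -> float:
    """Logistic function: converts log-odds into P(healthy)."""
    return 1.0 / (1.0 + math.exp(-score))


# Toy coefficients (hypothetical taxa, not the published model).
toy_coef = {"s__Taxon_A": 0.50, "s__Taxon_B": -0.68, "g__Taxon_C": 0.54}
s = gmwi2_score(toy_coef, {"s__Taxon_A", "g__Taxon_C"})
print(round(s, 2), classify(s, c=1.0))  # 1.04 healthy
```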

Evaluation: Performance was assessed using balanced accuracy (the mean of the proportion of healthy samples and the proportion of non-healthy samples classified correctly) under four schemes: (i) training on the full pooled dataset, (ii) leave-one-out cross-validation (LOOCV), (iii) repeated 10-fold CV, and (iv) ISV (leave-one-study-out). ROC AUCs were computed for training, 10-fold CV, and ISV. External validation used 1140 independent samples from six datasets, which together comprised five healthy cohorts and three disease cohorts (ankylosing spondylitis, Parkinson’s disease, pancreatic cancer). Longitudinal demonstrations included: (1) IBS patients before and 6 months after fecal microbiota transplantation (FMT), (2) a short-term diet intervention (Vegan, Omnivore, Exclusive Enteral Nutrition), (3) antibiotic exposure and recovery (days 0, 4, 8, 42, 180), and (4) in vitro fecal fermentation with prebiotics (FOS, inulin [IN], GOS, XOS, 2FL) and controls.
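Balanced accuracy under inter-study validation can be sketched with scikit-learn's `LeaveOneGroupOut` and `balanced_accuracy_score`, treating each study as a group. This simplified sketch keeps C fixed at 0.03 instead of tuning it in an inner cross-validation loop, and the function name is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import LeaveOneGroupOut


def inter_study_validation(X: np.ndarray, y: np.ndarray, study_ids: np.ndarray, C: float = 0.03):
    """Leave-one-study-out evaluation.

    For each study: train on all other studies, predict the held-out study,
    and record balanced accuracy, i.e. the mean of per-class recall
    (fraction of healthy correct + fraction of non-healthy correct) / 2.
    Returns the average over held-out studies and the per-study values.
    """
    per_study = []
    for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=study_ids):
        model = LogisticRegression(
            penalty="l1", solver="liblinear", C=C,
            class_weight="balanced", random_state=42,
        )
        model.fit(X[train_idx], y[train_idx])
        y_pred = model.predict(X[test_idx])
        per_study.append(balanced_accuracy_score(y[test_idx], y_pred))
    return float(np.mean(per_study)), per_study
```

Holding out whole studies, rather than random samples, is what makes ISV a stricter test than LOOCV or 10-fold CV: study-specific protocols never leak from training into evaluation.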

Software and availability: A command-line tool is available via Bioconda (GMWI2), with source code, processed datasets, and reproducible notebooks at https://github.com/danielchang2002/GMWI2.

Key Findings

Model composition and separation: Lasso-penalized logistic regression selected 95 taxa with non-zero coefficients (49 positive, 46 negative), with coefficient values ranging from -0.68 to 0.54. Healthy and non-healthy groups showed significantly different microbiome profiles (PERMANOVA Adonis R^2 = 1.2%, P=0.001).

Training performance: On the 8069 pooled samples, GMWI2 achieved a balanced accuracy of 79.9% (79.2% of healthy and 80.6% of non-healthy samples classified correctly). GMWI2 outperformed the original GMWI and species-level α-diversity indices in stratifying healthy vs non-healthy samples (Cliff’s delta d=0.75 for GMWI2 vs d=0.63 for GMWI). Healthy individuals had significantly higher GMWI2 scores than individuals in each of the 11 disease phenotypes.
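Cliff’s delta, the effect size quoted above, compares every healthy score with every non-healthy score. The sketch below is a direct O(n·m) implementation (function names are hypothetical). Because d = P(X > Y) - P(X < Y) and ROC AUC = P(X > Y) + 0.5·P(X = Y), the two are linked by d = 2·AUC - 1; if both were computed on the same training scores, d = 0.75 corresponds to an AUC of 0.875, consistent with the training AUC of 0.88 reported below.

```python
def cliffs_delta(x: list[float], y: list[float]) -> float:
    """Cliff's delta: P(X > Y) - P(X < Y) over all (x_i, y_j) pairs.

    x: scores of the first group (e.g. healthy), y: second group.
    Ranges from -1 to 1; this direct O(n*m) version is for illustration.
    """
    greater = sum(1 for a in x for b in y if a > b)
    less = sum(1 for a in x for b in y if a < b)
    return (greater - less) / (len(x) * len(y))


def auc_from_delta(d: float) -> float:
    """ROC AUC implied by Cliff's delta on the same scores: AUC = (d + 1) / 2."""
    return (d + 1) / 2


print(auc_from_delta(0.75))  # 0.875
```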

Confidence and magnitude cutoffs: Samples with larger absolute GMWI2 scores were classified more accurately. Cutoffs of |score| ≥ 0.5 and ≥ 1.0 yielded balanced accuracies of 85.8% and 91.0% while retaining 78.9% (n=6364) and 58.4% (n=4712) of samples, respectively. In LOOCV and 10-fold CV, the |score| ≥ 1.0 cutoff achieved balanced accuracies of about 90% (90.4% and 90.2%, respectively), close to training performance.
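The accuracy-versus-coverage trade-off can be reproduced from a vector of scores by sweeping the cutoff. This is a minimal pure-Python sketch, assuming labels 1 = healthy and 0 = non-healthy and a healthy prediction whenever the score is positive; the function names are hypothetical. As an arithmetic check, 6364/8069 ≈ 0.789 and 4712/8069 ≈ 0.584, matching the retention percentages above.

```python
def balanced_accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Mean recall over the classes present in y_true (0 = non-healthy, 1 = healthy)."""
    recalls = []
    for cls in (0, 1):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        if idx:
            recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls) if recalls else float("nan")


def accuracy_coverage(scores: list[float], y_true: list[int], cutoffs: list[float]):
    """For each cutoff c, keep samples with |score| >= c and report
    (c, balanced accuracy on kept samples, fraction of samples kept)."""
    rows = []
    n = len(scores)
    for c in cutoffs:
        kept = [i for i, s in enumerate(scores) if abs(s) >= c]
        y_kept = [y_true[i] for i in kept]
        pred = [1 if scores[i] > 0 else 0 for i in kept]  # positive score -> healthy
        rows.append((c, balanced_accuracy(y_kept, pred), len(kept) / n))
    return rows


# Toy scores and labels (illustrative only).
for c, ba, cov in accuracy_coverage([2.0, 0.3, -0.2, -1.5, 1.2, -0.4], [1, 1, 1, 0, 0, 0], [0.0, 1.0]):
    print(f"|score| >= {c}: balanced accuracy {ba:.3f}, coverage {cov:.2f}")
```

Raising the cutoff discards the least confident samples, so accuracy on the remaining ones tends to rise while coverage falls; where to set c depends on how costly a deferred sample is in a given setting.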

Cross-validation and ISV: LOOCV and 10-fold CV balanced accuracies were 79.1% (healthy 78.6%, non-healthy 79.5%) and 79.0% (healthy 78.6%, non-healthy 79.3%), respectively. ISV average balanced accuracy across 54 held-out studies was 75.8%, improving to 86.9% for |score| > 1. AUCs were 0.88 (training), 0.87 (10-fold CV), and 0.84 (ISV), indicating robust generalization.

External validation: Across 1140 independent samples (494 healthy, 646 non-healthy), GMWI2 scores were significantly higher in healthy vs non-healthy (P=1.6×10^-43; Cliff’s delta d=0.48). Balanced accuracy was 72.1%, improving to 75.4% and 80.1% with cutoffs |score| ≥ 0.5 and ≥ 1.0, retaining 74.3% and 49.3% of samples. Per-cohort accuracies: healthy cohorts ranged from 28.1% to 96.3%; disease cohorts: pancreatic cancer 90.7% (39/43), Parkinson’s disease 81.2% (398/490), ankylosing spondylitis 80.5% (91/113). Notably, Parkinson’s disease was not included in training yet showed strong discrimination.

Longitudinal insights: (1) FMT in IBS: Only recipients reporting symptom relief showed a significant GMWI2 increase at 6 months (P=0.039), whereas α-diversity measures gave mixed signals. (2) Diet intervention: Exclusive Enteral Nutrition (fiber-free) caused significant GMWI2 decreases from day 2 onward (P<0.05), while the Vegan and Omnivore groups showed no significant change; α-diversity did not change significantly, suggesting that GMWI2 is sensitive to fiber removal. (3) Antibiotics: Although the Shannon index and richness partially recovered by days 42–180, GMWI2 remained significantly below baseline even at day 180 (P<0.05), consistent with persistent dysbiosis. (4) In vitro prebiotics: FOS, inulin, and GOS significantly increased GMWI2 relative to NS0 (baseline control), and these three plus XOS increased GMWI2 relative to NS24; α-diversity metrics decreased in all prebiotic groups compared with NS0, highlighting the complementary information captured by GMWI2.

Discussion

GMWI2 directly addresses limitations of the original GMWI by using a penalized logistic regression framework with multi-rank taxonomic features and learned feature weights, improving balanced accuracy and reducing the gap between healthy and non-healthy classification accuracy. Its consistent performance under LOOCV, 10-fold CV, and ISV, alongside strong AUCs, indicates generalizability across diverse cohorts and limited sensitivity to study-specific batch effects. The magnitude-based cutoff offers a tunable confidence-based reject option, letting users trade coverage for accuracy and defer uncertain predictions to alternative screening.

Beyond case-control discrimination, GMWI2 detects dynamic shifts in gut health in longitudinal contexts: improvement with effective FMT in IBS, rapid deterioration with fiber removal (EEN), prolonged adverse impacts after broad-spectrum antibiotics, and beneficial modulation with selected prebiotics in vitro. These findings suggest GMWI2 captures disease-agnostic dysbiosis signatures that are not fully reflected by traditional α-diversity metrics, expanding translational utility in monitoring, donor selection for FMT, and potentially informing dietary or therapeutic decisions. Overall, the work underscores the value of large-scale data sharing and standardized processing for building resilient microbiome-based predictive tools.

Conclusion

GMWI2 is a disease-agnostic, metagenome-based health status index that significantly improves classification of healthy vs non-healthy gut microbiomes over the original GMWI and α-diversity metrics. Trained on 8069 metagenomes across 54 studies with standardized processing and multi-rank features, it achieves ~80% balanced accuracy overall and ~90% for high-confidence samples, generalizing across studies, diseases, and external cohorts. Re-analyses of longitudinal datasets reveal clinically relevant sensitivity to interventions such as FMT, diet, antibiotics, and prebiotics. GMWI2 is released as an open-source command-line tool, facilitating broad adoption.

Future work should integrate richer microbiome features (growth rates, strain-level variation, functional profiles), expand training to underrepresented populations and additional diseases (e.g., neuropsychiatric), explore the impact of refined health/disease definitions, and embed GMWI2 into multi-omic decision frameworks with user-tailored reject thresholds for precision wellness monitoring and preventive health.

Limitations

GMWI2 reflects associations with health status (defined as the presence or absence of clinical disease) and is not a causal or diagnostic measure, nor a replacement for clinical tests. The model currently omits strain-level resolution, growth dynamics, and functional (metabolic) features that could improve accuracy and interpretability. Despite standardized reprocessing, residual batch effects from collection, storage, and sequencing protocols may persist. The pooled dataset, while globally diverse, still underrepresents certain geographies and ethnicities; broader inclusion would enhance generalizability. Health and non-health definitions follow prior work, and the effect of subtle variations in these definitions was not systematically assessed. Some known pathogens did not receive negative coefficients in the model, and pathogenicity is often strain-specific, which is beyond the model’s species-level resolution. Unmeasured confounders (e.g., transit time, stool consistency) may influence taxonomic composition. The confidence cutoff introduces a trade-off between accuracy and sample coverage, requiring users to calibrate it to their context.
