Health and disease markers correlate with gut microbiome composition across thousands of people

O. Manor, C. L. Dai, et al.

This study from Ohad Manor and colleagues examines the relationships between the gut microbiota and a wide range of host phenotypic features across thousands of individuals. Its findings link host factors to substantial variation in the gut microbiome and point to potential microbiome-targeted interventions for improving host health.

Introduction

The human gut microbiome is implicated in the etiology of diverse diseases, including inflammatory bowel disease, type 2 diabetes, hypertension, and colorectal cancer. Individual blood markers such as those related to glycemic control and cholesterol have been linked to specific gut bacterial abundances. Lifestyle factors, particularly diet and physical activity, can substantially reshape gut microbiome composition, with robust evidence from animal studies and suggestive findings from small human cohorts. However, large-scale human studies integrating gut microbiome profiles with clinical blood phenotypes, diet, lifestyle, disease, and medication usage remain limited. This study investigates, at population scale, how gut microbiome diversity, taxa, and inferred functions associate with a wide array of host clinical markers and lifestyle behaviors. The goal is to identify robust, composition-specific associations that could inform targeted interventions and improve understanding of host–microbe relationships relevant to health and disease.

Literature Review

Prior research has associated gut microbiome variation with multiple diseases, including inflammatory bowel disease, type 2 diabetes, hypertension, and colorectal cancer. Specific blood biomarkers for diabetes and cholesterol have been reported to correlate with certain gut bacteria. Diet is a major determinant of microbiome composition, and physical activity has been shown to shift gut microbial communities in animal models, with preliminary evidence in humans. Previous analyses often used smaller cohorts or focused on limited data types, leaving a gap for comprehensive, large-scale studies that integrate microbiome features with clinical, dietary, and lifestyle measurements. The literature also highlights major axes of gut compositional variation such as the Firmicutes-to-Bacteroidetes balance and the non-overlapping distributions of Bacteroides and Prevotella, as well as prior associations between alpha diversity and health outcomes. The present work builds on these findings by systematically analyzing thousands of individuals with dense phenotyping to validate known associations and uncover new, composition-specific relationships.

Methodology

Study population: Baseline data were collected from 3,409 consenting participants in a commercial Scientific Wellness program (Arivale Inc., Seattle, WA). Participants completed lifestyle, stress, digestion, and diet questionnaires; clinical laboratory tests from blood were available for most participants. The cohort included 59% females, 84% self-reported European-American, mean age 49 ± 12 years, mean BMI 27 ± 6 kg/m². IRB approvals were obtained at Arivale and the Institute for Systems Biology.

Sample collection and laboratory methods: Stool samples were profiled using 16S rRNA amplicon sequencing. DNA extraction used MoBio PowerSoil or equivalent workflows with stabilization, bead beating, and automated platforms. OTU picking was done using QIIME v1.9.1 with closed-reference against Greengenes 13_08. Samples were rarefied to 50,000 reads for diversity analyses with rarefaction curves indicating saturation.

Functional inference: Microbiome functional capacity was inferred using PICRUSt2 to predict KEGG pathway profiles from OTU relative abundances. Predicted KO profiles were further processed to estimate average copy numbers and aggregated to pathways for downstream analyses.

Alpha diversity and adjusted diversity: Multiple alpha diversity metrics were computed (e.g., Shannon, Pielou’s evenness, species richness, Faith’s PD). To isolate associations beyond dominant phylum effects, Bacteroidetes-adjusted diversity (BA-diversity) was defined as residuals from a cubic polynomial regression of Shannon diversity on Bacteroidetes relative abundance.

Taxonomic ordination and clustering: edgePCA (a phylogeny-aware PCA) was applied to OTU-level counts without rarefaction, and additional ordinations included PCoA with weighted UniFrac and NMDS. Four compositional clusters along the Firmicutes–Bacteroidetes and Bacteroidetes–Prevotella axes were defined: (i) reference Firmicutes-average cluster, (ii) Firmicutes-rich, (iii) Bacteroidetes-rich, and (iv) Prevotella-rich, using thresholds on relative abundances to capture continuum positions.

Host factors and assessments: Extensive self-reported host data were collected via validated instruments (e.g., Dietary Targets Monitor, Oxford Happiness Questionnaire, I-PIP, PAS-4) and in-house questionnaires on health history, lifestyle, and digestive health. Psychometric analyses indicated high internal consistency for these measures.

Statistical analysis: Analyses were conducted in R. Associations between Shannon diversity and each host factor (lifestyle, diet, clinical tests, digestion) were tested using linear regression adjusted for age, sex, race, sampling season, and microbiome sequencing vendor. For genus- and pathway-level associations, generalized linear models were used appropriate to data distributions (e.g., logistic regression for rare genera, Poisson or linear models otherwise) with the same covariates; additional models adjusted for Shannon diversity to assess independence from overall alpha diversity. Multiple hypothesis testing was controlled using Benjamini–Hochberg FDR at α = 0.05.

Cluster-specific analyses: Interaction models tested whether associations between host factors and diversity or taxa differed across the four compositional clusters, identifying cluster-specific relationships after FDR correction.

Medication analyses: Participants self-reported usage of medication classes (e.g., cholesterol-lowering, antihypertensives, blood sugar–regulating). Models tested associations between medication use and taxa or pathways, adjusted for age, sex, race, sequencing vendor, and relevant clinical biomarkers (e.g., LDL for statins, blood pressure for antihypertensives, fasting glucose and insulin for blood sugar medications), with FDR control.
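
The Bacteroidetes-adjusted diversity described above can be illustrated with a minimal sketch. The study's analyses were run in R; the Python below is only an illustration, not the authors' code. It assumes Shannon diversity is computed with the natural logarithm (the summary does not state the base) and fits the cubic by ordinary least squares through the normal equations. The function names and the toy values are hypothetical.

```python
import math


def shannon(counts):
    """Shannon diversity from raw taxon counts (natural log is an assumption)."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def cubic_residuals(x, y):
    """Residuals of an ordinary least-squares cubic fit of y on x."""
    design = [[1.0, xi, xi ** 2, xi ** 3] for xi in x]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(4)]
           for i in range(4)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(4)]
    beta = solve(xtx, xty)
    return [yi - sum(b * v for b, v in zip(beta, row))
            for row, yi in zip(design, y)]


# Hypothetical toy data: Bacteroidetes relative abundance and Shannon diversity
# for six samples (values invented for illustration only).
bacteroidetes = [0.05, 0.15, 0.30, 0.50, 0.70, 0.90]
shannon_values = [4.1, 4.6, 4.2, 3.6, 3.0, 2.2]

# BA-diversity: the part of Shannon diversity not explained by the cubic trend.
ba_diversity = cubic_residuals(bacteroidetes, shannon_values)
```

A positive residual marks a sample that is more diverse than its Bacteroidetes fraction alone would predict. Because the fit includes an intercept, the residuals average to zero across the cohort.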

Key Findings
  • Gut microbiome composition showed wide variation: Firmicutes ranged from ~6% to ~100% and Bacteroidetes from ~0% to ~90%. Shannon diversity was strongly negatively correlated with Bacteroidetes abundance (r = −0.67, P < 1e−15), but the relationship was nonlinear; diversity peaked at ~15% Bacteroidetes and ~80% Firmicutes.
  • Other phyla correlations with diversity: Proteobacteria (r = −0.18, P < 1e−15), Fusobacteria (r = −0.13, P < 1e−13), TM7 (r = −0.10, P < 1e−15) were negatively correlated; Tenericutes (r = 0.28, P < 1e−15), Euryarchaeota (r = 0.19, P < 1e−15), Lentisphaerae (r = 0.17, P < 1e−15), and Cyanobacteria (r = 0.13, P < 1e−13) were positively correlated.
  • After adjusting diversity for Bacteroidetes (BA-diversity), Actinobacteria was most negatively correlated (r = −0.21, P < 1e−15), followed by Bifidobacterium (r = −0.19, P < 1e−15), revealing associations masked by dominant phylum effects.
  • Ordination revealed major axes of taxonomic variance: edgePCA PC1 (~54% variance) tracked the Firmicutes–Bacteroidetes continuum; PC2 (~17%) captured the Prevotella versus Bacteroides axis. Prevotella- and Bacteroides-rich states were largely non-overlapping; additional PCs involved clades within Clostridiales. Bifidobacterium was positively correlated with PC3 and inversely with Faecalibacterium, suggesting potential niche competition between these beneficial taxa.
  • Diversity associations with 148 host factors identified 75 significant relationships (FDR < 0.05). Known markers such as diabetes-related measures (e.g., fasting insulin), inflammation (hs-CRP, P < 1e−14), liver function (ALAT, P < 1e−9), and cholesterol (LDL, P < 1e−16) were associated with diversity. Omega-3s (e.g., DHA) and mercury (fish intake marker) were positively correlated (P ≤ 1e−13 to 1e−16). BMI, weight, and blood pressure were negatively correlated with diversity; height was positively correlated.
  • Physical activity showed robust, independent associations with diversity: both moderate and vigorous physical activity frequencies correlated positively (P < 1e−15 each). Associations persisted after adjusting for dietary factors (P < 1e−5 to 1e−15) and, with additional BMI adjustment, remained significant for vigorous activity (P = 0.04). Veillonella abundance was positively associated with vigorous activity (P < 1e−5) after multivariable adjustment.
  • Composition-specific effects: Compared to the reference cluster, there were 0, 14, and 14 unique associations within Firmicutes-rich, Bacteroidetes-rich, and Prevotella-rich clusters, respectively (FDR < 0.05). The diversity–insulin association was stronger in Bacteroidetes-rich individuals (P < 1e−6). Vegetable intake was more positively associated with diversity in the Prevotella-rich cluster (P < 1e−4).
  • Host factors aggregated into health- and disease-related groups by their multivariate association patterns with microbial genera and pathways. Health-related factors (e.g., vitamin D, HDL, adiponectin, fruit/vegetable intake, physical activity, easier bowel movements) were positively associated with Coprococcus, Lachnospira, Faecalibacterium, and unclassified Ruminococcaceae/Clostridiales clades. Disease-related factors (e.g., higher BMI, insulin/glucose/HbA1c, LDL/triglycerides/blood pressure, CRP/IL-6, digestive symptoms) were positively associated with Bacteroides, Sutterella, Bilophila, Acidaminococcus, Megasphaera, and some Ruminococcaceae.
  • Adjusting for Shannon diversity clarified independent associations: many Bacteroides associations lost significance after diversity adjustment, while Sutterella and Ruminococcaceae associations largely persisted, indicating diversity-independent relationships for some taxa.
  • Functional pathways mirrored factor grouping: disease-related factors were positively associated with glycan/carbohydrate metabolism, bile acid metabolism (primary and secondary), and vitamin or vitamin-like metabolism pathways; xenobiotics metabolism pathways were associated with health-related factors. Several pathway associations (e.g., bile acid metabolism) remained significant after adjusting for diversity.
  • Medications associated with taxonomic and functional shifts: blood sugar–lowering medication users showed enrichment of Klebsiella and reduced Faecalibacterium; cholesterol-lowering drug users had enriched Enterobacteriaceae and Burkholderiaceae. Functionally, fructose and mannose metabolism pathways were enriched among blood sugar medication users, consistent with prior reports on metformin.
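
The FDR < 0.05 thresholds reported in the findings above rest on the Benjamini–Hochberg procedure named in the Methodology. As a rough sketch of how that step-up rule selects significant associations (the study used R, so this Python is only an illustration; the function name and the example P-values are hypothetical):

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per hypothesis: True if rejected at FDR level alpha."""
    m = len(p_values)
    # Indices of the hypotheses ordered from smallest to largest P-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank

    # Reject every hypothesis ranked at or below the cutoff, even if its own
    # P-value exceeded its individual threshold.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff:
            rejected[idx] = True
    return rejected


# Hypothetical P-values for ten host-factor associations (invented values).
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
significant = benjamini_hochberg(example_p, alpha=0.05)
```

The procedure bounds the expected share of false positives among the associations called significant, rather than the chance of any single false positive. That suits a screen across roughly 150 host factors.
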
Discussion

This large, integrated analysis demonstrates that gut microbiome diversity and composition relate to a broad spectrum of host clinical markers, dietary patterns, lifestyle behaviors, and medication use in a composition-dependent manner. The strong but nonlinear association between Bacteroidetes abundance and alpha diversity suggests that simple ratios (e.g., Firmicutes-to-Bacteroidetes) are insufficient to capture health-relevant microbiome states. Introducing adjusted diversity (BA-diversity) exposed additional taxa–diversity relationships that are otherwise obscured by dominant phyla. Major axes of variation along Firmicutes–Bacteroidetes and Prevotella–Bacteroides continua reaffirm known structural features of the human gut microbiome while highlighting potential competitive dynamics between beneficial genera such as Bifidobacterium and Faecalibacterium.

The persistence of associations between vigorous physical activity and both diversity and specific taxa (e.g., Veillonella) after adjusting for key confounders indicates that physical activity independently relates to microbiome structure. Moreover, cluster-specific associations show that the microbiome’s compositional context modifies host–microbe relationships (e.g., stronger diversity–insulin links in Bacteroidetes-rich profiles and enhanced vegetable–diversity associations in Prevotella-rich profiles). These findings imply that personalized dietary and lifestyle interventions may benefit from baseline microbiome stratification.

The aggregation of host factors into health- and disease-related groups based on multivariate microbiome association patterns suggests that the gut microbiome integrates diverse host phenotypes into coherent microbial signatures. Adjusted analyses revealed which taxa and pathways maintain associations independent of overall diversity (e.g., Sutterella and bile acid metabolism), pointing to potential mechanistic links worthy of targeted investigation. Observed medication–microbiome associations, even after adjusting for corresponding clinical biomarkers, underscore the need to consider pharmacologic exposures when interpreting microbiome–health relations and designing interventions.

Conclusion

This study maps comprehensive, population-scale relationships between gut microbiome composition/function and approximately 150 host clinical, dietary, lifestyle, and medication variables across more than 3,400 individuals. Key contributions include identifying a nonlinear diversity maximum along the Firmicutes–Bacteroidetes axis, demonstrating independent associations of vigorous physical activity with alpha diversity and specific taxa, revealing composition-specific host–microbe relationships, and showing that host factors cluster into health- and disease-related groups with distinct microbial genera and pathway signatures. These insights support using microbiome profiles to guide personalized interventions and to stratify participants in clinical trials. Future work should employ randomized controlled interventions and mechanistic studies to establish causality, dissect medication versus disease effects, and test whether microbiome-informed dietary and lifestyle strategies improve clinical outcomes. Metrics such as Bacteroidetes-adjusted diversity and baseline compositional clustering may serve as useful biomarkers for tailoring and monitoring interventions.

Limitations
  • Cross-sectional design limits causal inference; associations do not establish directionality.
  • Despite adjustment for many covariates, residual and unmeasured confounding is likely.
  • Medication analyses, even when adjusted for relevant clinical biomarkers, may not fully disentangle drug effects from underlying disease effects.
  • Self-reported lifestyle, diet, digestive symptoms, and medication use may introduce measurement error or bias.
  • 16S rRNA amplicon sequencing and inferred functional profiling (PICRUSt2) provide indirect functional estimates; metagenomic/metatranscriptomic validation would strengthen functional conclusions.
  • Cohort demographics (e.g., majority European-American, participants in a wellness program) may limit generalizability to other populations.