
Diet-omics in the Study of Urban and Rural Crohn disease Evolution (SOURCE) cohort
T. Braun, R. Feng, et al.
Introduction
The global incidence of Crohn disease (CD) has risen dramatically with modernization and urbanization, suggesting environmental and dietary drivers rather than genetic drift. In China, rapid urban-industrial transition provides a unique setting to study environmental shifts associated with the emerging CD burden, complementing Western populations. Prior work has implicated diet- and environment-induced alterations of the gut microbiome and mucosal immune-epithelial responses as key to chronic intestinal inflammation in CD, yet many patients do not achieve durable control with current therapies, reflecting gaps in causal understanding. Motivated by this, the multicenter SOURCE cohort was established across rural and urban settings in Guangdong, China, and Israel to profile demographics, environmental and dietary exposures, fecal microbiome, stool metabolome, and ileal mucosal transcriptome in newly diagnosed, treatment-naïve CD and controls. The study aimed to determine how rural-to-urban exposure gradients relate to microbiome/metabolome shifts and to map host mucosal gene expression modules linked to specific dietary factors, metabolites, and microbes.
Literature Review
The study situates its aims within evidence that modernization correlates with increased CD incidence globally. A meta-analysis comparing Eastern and Western populations reported associations of higher total fat, MUFA, and n-3/n-6 PUFA intake with CD specifically in Eastern cohorts. Prior human CD cohort studies and biospecimen analyses highlighted disease-specific microbiome and ileal transcriptome signatures but lacked integrated diet and metabolomics. Existing animal models incompletely recapitulate human CD pathobiology, underscoring the value of human multi-omics cohorts. Earlier omics consortia (e.g., RISK, HMP2) advanced understanding of host–microbiome interactions in IBD but included limited dietary exposure data, motivating integrated diet-omics approaches such as SOURCE.
Methodology
Design and cohorts: Multicenter, cross-sectional, multi-omics study conducted in Guangdong province, China (Sun Yat-Sen First Affiliated Hospital) and Israel (Sheba Medical Center), enrolling 380 participants (2019–2021): China—40 newly diagnosed, treatment-naïve CD patients, 121 urban healthy controls (Guangzhou), and 162 rural healthy controls (Shaoguan); Israel—25 newly diagnosed, treatment-naïve CD patients and 32 healthy controls. Rural residents were stratified by time spent in urban environments in the past year: rural (<50% urban time, n=88) vs rural-urban (≥50%, n=74). Ethics approvals and informed consent obtained at both sites.
Exposures and diet: Participants completed an IOIBD-derived environmental questionnaire (childhood factors, smoking, sanitary conditions, etc.) with an added item quantifying time spent in urban areas, and a comprehensive FFQ adapted for Israeli and Chinese diets. The Israeli FFQ (computerized) quantified macro/micronutrients and servings; the Chinese FFQ (manual extraction) summarized nutrient and food consumption.
Biospecimens and assays: Stool samples collected (≥3 weeks post-antibiotics), aliquoted, stored at −80°C. Ileal biopsies obtained during diagnostic colonoscopy, stored in RNAlater at −80°C.
- Microbiome 16S rRNA gene sequencing (V4): Israel—Extract-N-Amp direct PCR; China—OMEGA Soil DNA kit; sequenced on MiSeq (Israel) or NovaSeq (China). QIIME2 pipeline, Deblur ASV inference, Greengenes taxonomy, contaminant filtering via dbBact, rarefaction (China 33k reads; Israel 4k). Diversity: Faith’s PD (alpha), Unweighted UniFrac (beta), PCoA.
- Shotgun metagenomics (MGX): Israel: Nextera libraries, NextSeq 500; China: TruSeq Nano, HiSeq X-ten. KneadData decontamination, MetaPhlAn 4 taxonomy, HUMAnN 3 functional profiling; feature filtering (>0.01% in ≥10% samples).
- Fecal metabolomics: Israel: untargeted LC–MS (UPLC–Orbitrap Q-Exactive, ZIC-pHILIC), 405 metabolites retained. China: targeted UPLC–MS/MS (Q300 kit), 185 metabolites retained. Ninety-two metabolites overlapped for cross-cohort validation. Values normalized by total ion sum; Canberra distance PCoA.
- Ileal transcriptomics: PolyA RNA-seq (Illumina) with kallisto quantification (Gencode v24). Protein-coding genes with TPM>1 in ≥20% samples retained.
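The prevalence filters listed above (for example, protein-coding genes with TPM>1 in ≥20% of samples, or metagenomic features >0.01% in ≥10% of samples) share one simple rule. The sketch below illustrates that rule only; the function name, the dictionary layout, and the toy values are hypothetical and are not taken from the study's pipeline.

```python
def prevalence_filter(table, min_value, min_fraction):
    """Keep features whose value exceeds min_value in at least
    min_fraction of samples.

    table maps a feature name to a list of per-sample values
    (a hypothetical layout chosen for illustration).
    """
    kept = {}
    for feature, values in table.items():
        n_pass = sum(1 for v in values if v > min_value)
        if n_pass >= min_fraction * len(values):
            kept[feature] = values
    return kept


# Toy TPM table with five samples (made-up numbers, not study data).
tpm = {
    "GENE_A": [5.0, 3.2, 0.0, 8.1, 2.4],  # above 1 TPM in 4 of 5 samples
    "GENE_B": [0.0, 1.4, 0.0, 1.5, 0.2],  # above 1 TPM in 2 of 5 samples
    "GENE_C": [0.0, 0.1, 0.0, 0.9, 0.3],  # never above 1 TPM
}

# Transcriptomics-style rule: TPM > 1 in at least 20% of samples.
kept_genes = prevalence_filter(tpm, min_value=1.0, min_fraction=0.20)
print(sorted(kept_genes))
```

The same helper would cover the metagenomic rule by passing `min_value=0.01` (percent relative abundance) and `min_fraction=0.10`.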
Analytical strategy:
- Indices: Health index (ratio of health- vs disease-associated ASVs) and an independent rural index (derived from an external Hunan cohort, PRJNA349463) computed as log-ratio of rural vs urban-enriched ASVs, applied to SOURCE.
- Multivariate and association tests: PERMANOVA (vegan/adonis) for variance explained by exposures/diet within each subgroup (controlling for age and gender). MaAsLin2 for differential ASVs/metabolites and diet–ASV associations (FDR≤0.25, highlighting FDR≤0.1). HAllA for hierarchical all-against-all associations among diet, microbiome, and metabolites (FDR≤0.25).
- Transcriptomics modules: WGCNA on Israeli ileal data to define co-expression modules; module eigengenes applied to Chinese data. Modules associated with CD (p≤0.05) were functionally annotated (ToppGene/ToppFun) and correlated with clinical markers, FFQ, metabolites, and microbial features (BH FDR≤0.25).
- Multi-omics integration: Sparse PLS (sPLS) to quantify shared variation between omic pairs with permutation-based significance; DIABLO to integrate omics and distinguish CD vs controls, extracting top loadings (features) associated with disease across omes.
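The health and rural indices above are described as ratios or log-ratios of two ASV sets. The exact formula is not given in this summary, so the sketch below assumes one common form: the log10 ratio of summed abundances with a pseudocount. The function name, pseudocount, ASV labels, and counts are all hypothetical.

```python
import math


def log_ratio_index(abundances, numerator_asvs, denominator_asvs, pseudocount=1.0):
    """Assumed form of a log-ratio index: log10 of the summed abundance of
    one ASV set over another, with a pseudocount to avoid dividing by zero."""
    num = sum(abundances.get(asv, 0.0) for asv in numerator_asvs)
    den = sum(abundances.get(asv, 0.0) for asv in denominator_asvs)
    return math.log10((num + pseudocount) / (den + pseudocount))


# Hypothetical ASV sets, standing in for lists learned from an external cohort.
rural_asvs = ["asv_r1", "asv_r2"]
urban_asvs = ["asv_u1"]

# Toy read counts for two samples (made-up numbers, not study data).
rural_like = {"asv_r1": 60, "asv_r2": 39, "asv_u1": 9}
urban_like = {"asv_r1": 9, "asv_u1": 99}

rural_index_a = log_ratio_index(rural_like, rural_asvs, urban_asvs)
rural_index_b = log_ratio_index(urban_like, rural_asvs, urban_asvs)
print(rural_index_a, rural_index_b)
```

Under this assumed definition, a positive value means the rural-enriched set dominates and a negative value means the urban-enriched set dominates, which matches the direction in which the index is interpreted in the findings.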
Statistics: Nonparametric tests (Mann–Whitney), Spearman correlations, chi-square/Fisher’s exact, BH FDR control (cutoffs 0.1/0.25 as indicated). Analyses performed per country to avoid batch/site confounding; cross-cohort validation used overlapping features and independent indices.
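To make the two FDR cutoffs concrete, here is a minimal pure-Python sketch of the Benjamini–Hochberg adjustment. It is a generic illustration of the procedure, not the study's code; the p-values are invented for the example.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted q-values in the same order as the input p-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Made-up p-values for five hypothetical feature tests.
pvals = [0.001, 0.02, 0.04, 0.15, 0.80]
qvals = benjamini_hochberg(pvals)

n_strict = sum(q <= 0.1 for q in qvals)    # stricter cutoff
n_lenient = sum(q <= 0.25 for q in qvals)  # discovery cutoff
print(qvals, n_strict, n_lenient)
```

In this toy case the fourth test passes only the lenient cutoff, which is the kind of result the study reports at FDR≤0.25 while highlighting FDR≤0.1.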
Key Findings
- Rural-to-urban exposure gradient and microbiome/metabolome: Among rural Chinese, time in urban settings (rural vs rural-urban) was a major driver of fecal microbiome differences (Unweighted UniFrac PCoA; PERMANOVA p=0.002), with reduced alpha diversity in rural-urban vs rural (Faith’s PD, p=0.004) and a numerically lower microbial health index. An independent rural index (from Hunan cohort PRJNA349463) was significantly lower in rural-urban vs rural and also lower in CD vs urban controls (e.g., Mann–Whitney p=0.0001 and p=0.0007, respectively). MaAsLin2 identified 41 ASVs increased in rural-urban (including Bacteroides spp., Ruminococcus gnavus, and Fusobacteriaceae, taxa previously linked to CD) and 37 ASVs enriched in rural (e.g., Actinomyces, Bifidobacterium).
- Metabolome mirrors CD changes: In rural vs rural-urban Chinese (n=40 metabolomics), 22 metabolites differed (FDR≤0.25; 8 higher, 14 lower in rural-urban). Of these, 8/8 metabolites elevated in rural-urban were also elevated in CD vs urban controls, and 12/14 reduced in rural-urban were also reduced in CD. Effect sizes across comparisons were highly correlated (Spearman r=0.902, p=9.6×10^-9). Examples: higher N-acetyltryptophan, N-acetylalanine, oleic/palmitoleic acids; lower phenylpyruvate, glutarate, aminoadipate.
- Diet and exposures shaping microbiota: PCA of FFQ showed macronutrients and added sugar as major axes of variation; PERMANOVA within subgroups implicated total/saturated fat, fruits, iron (ferrous), dairy, added sugar, and early-life factors (farm animals, siblings) as significant contributors to microbiome composition (multiple groups, adjusted for age/gender). In the Chinese (Sun Yat-Sen, SYS) controls (n=283), higher iron intake was associated with decreased abundance of 16 ASVs (Lachnospiraceae/Ruminococcaceae) and increased abundance of 12 ASVs (Actinomyces, Streptococcus). Higher total fat was associated with reduced Oscillospira/Lachnospira and increased Veillonellaceae (Acidaminococcus, Megasphaera) and Streptococcus. dbBact enrichment linked ASVs reduced with fat/iron to taxa typically decreased in CD (chi-square p=3×10^-6) and ASVs increased with fat/iron to salivary-origin bacteria (p=0.005), echoing CD-associated oralization.
- CD-associated taxa enriched in mucosa: Both cohorts showed reduced alpha diversity and health index in CD. MaAsLin2 (controlling for confounders) identified CD-increased taxa, more prominent in mucosal biopsies; shared increases included Enterobacteriaceae (ASV05780), Actinomyces (ASV08231), and Fusobacteriaceae (ASV15593). Within-subject stool vs biopsy samples were more similar than across-subject comparisons; ileum and rectum biopsies were more similar to each other than either was to stool.
- Ileal transcriptomics WGCNA modules: Nine modules associated with CD across Israel/China. Down in CD: lipid metabolism (yellow), mitochondrial translation/structure (green), respiration (red), DNA damage/repair (pink). Up in CD: immune/ECM (brown; CXCLs, OSM, TREM1, MMPs), myeloid (black; TLRs, CARD9), tuft cells/eosinophils (salmon; CCR3, CLC, ALOX15, TFFs), epithelial innate immunity (tan; DUOX2, CEACAMs), and cell cycle/mitosis/T&B cells (purple). All modules showed consistent directionality across cohorts except DNA repair (pink) and cell cycle/T&B (purple), which reached significance only in China.
- Diet–module and metabolite–module links: In Israel, manganese and vitamin D intakes positively correlated with control-associated epithelial lipid/mitochondrial modules and coffee inversely correlated with immune modules; starch, iodine, and selenium tracked with disease-associated directions (FDR≤0.25). In China, vegetables/fruits associated with control signals, processed food with immune/ECM (brown) module; trends linked vitamin D with control and fat with CD-associated modules. Israeli metabolomics showed 234 significant metabolite–module correlations (FDR≤0.25), largely aligning with control modules for lipid fatty acids and with CD immune modules for amino acids. Of 92 cross-cohort metabolites, 39 correlated with modules and 32 (82%) showed consistent directionality across Israel and China (binomial p=7.1×10^-5).
- Microbe- vs diet-linked metabolites differentially map to host modules: HAllA indicated diet-linked fecal metabolites associate with epithelial metabolic modules, whereas microbe-linked metabolites associate with immune modules (Fisher’s p<1×10^-5). Examples: methionine linked to Veillonella dispar and the myeloid (black) module; malonate and 3-hydroxymethylglutaric acid linked to Veillonella parvula and Haemophilus parainfluenzae; ureidopropionate linked to R. gnavus. Potentially beneficial metabolites included azelate (PPARγ-linked anti-inflammatory) and phenylethanolamine and L-dopa (dopaminergic pathway), which were associated with control epithelial modules and anti-correlated with disease immune modules.
- Multi-omics integration: sPLS revealed high shared variation between omic pairs—transcriptomics–metabolomics 0.867 (p<0.01), species–metabolomics 0.848 (p<0.01), species–functional profiles 0.569–0.743; FFQ linked with functional metagenomics (0.764, p<0.05) and transcriptomics (0.65, p<0.08). DIABLO showed 13–53% explained variance across omes and separated CD from controls; 67 cross-omic features associated with CD included R. gnavus and E. ramosum clustering with immune/myeloid modules, opposite to fibers, vegetables, and A. putredinis in the control space.
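The cross-cohort consistency figure above (32 of 39 module-correlated metabolites sharing direction in Israel and China) can be sanity-checked with an exact binomial test. The sketch below assumes a two-sided test against a 50/50 null, which this summary does not state explicitly; under that assumption the result lands close to the reported p-value.

```python
from math import comb


def binomial_two_sided_p(successes, trials):
    """Exact two-sided binomial p-value under a 50/50 null.
    The null distribution is symmetric, so the larger tail is doubled."""
    k = max(successes, trials - successes)
    tail = sum(comb(trials, i) for i in range(k, trials + 1)) / 2 ** trials
    return min(1.0, 2 * tail)


consistent, total = 32, 39          # counts quoted in the findings
fraction = consistent / total       # about 0.82
p_consistency = binomial_two_sided_p(consistent, total)
print(f"{fraction:.0%} consistent, p = {p_consistency:.2e}")
```

The small gap between this value and the reported 7.1×10^-5 could reflect rounding or a slightly different test definition.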
Discussion
The SOURCE cohort leveraged ongoing rural-to-urban transitions in China alongside a Western Israeli cohort to delineate how environmental and dietary exposures map onto gut microbiome, fecal metabolome, and ileal mucosal transcriptome in new-onset CD. Time spent in urban environments by rural residents associated with decreased microbial diversity, reduced rural and health indices, enrichment of CD-associated taxa, and metabolomic shifts that closely mirrored those observed in CD, indicating a continuum of exposome-driven alterations rather than a binary rural/urban state. Unbiased host transcriptomic module analysis resolved CD signals into epithelial metabolic suppression (lipid and mitochondrial pathways) and activation of immune, myeloid, tuft/eosinophil, and epithelial innate programs. Integrative correlations showed that diet-linked metabolites preferentially associated with epithelial metabolic modules, whereas microbe-linked metabolites tracked with immune modules, bridging environmental inputs to host–microbe crosstalk. Protective dietary factors (manganese, vitamin D, coffee) correlated with healthier epithelial modules and microbial profiles, while saturated fat, sugar, and processed foods aligned with immune-inflammatory modules. Multi-omics integration (sPLS, DIABLO) reinforced these associations and prioritized cross-ome features consistent with CD pathobiology. Together, the findings address the central hypothesis that modernization-related exposures perturb the gut ecosystem and mucosal programs in ways that resemble early CD pathogenesis, highlighting actionable dietary and metabolite targets.
Conclusion
This multicenter, multi-omics study across China and Israel demonstrates that rural-to-urban exposures are associated with microbiome and metabolome profiles that mirror new-onset CD and that specific dietary components map to distinct ileal mucosal gene modules. The work prioritizes potentially beneficial exposures (e.g., manganese, vitamin D, coffee) and metabolites (e.g., azelate, dopaminergic intermediates) linked to healthier epithelial metabolic programs and identifies microbe-associated metabolites connected to immune activation. The integrated dataset offers a resource for hypothesis generation beyond North American cohorts and suggests testable interventions to modulate diet and metabolite landscapes to favor protective host–microbe states. Future research should include longitudinal and interventional studies to validate causality, expand sample sizes and diversity, incorporate single-cell and spatial profiling, and trial prioritized dietary/metabolite interventions in models and patients.
Limitations
- Cohort size and omic subset availability were modest, limiting power to detect weaker effects; transcriptomics and metabolomics were available for subsets.
- Dietary data were derived from FFQs, which entail recall and estimation biases; more precise intake tracking might reveal additional associations.
- Each subgroup originated from a single geographic site, potentially limiting generalizability despite cross-cohort validation and use of an independent rural index.
- Rural CD cases were not included (CD cases were primarily urban), constraining comparisons across all exposure strata.
- Analyses commonly used FDR≤0.25 to balance discovery and power, although results were supplemented with stricter thresholds and independent validations when possible.
- Single-cell transcriptomics was not performed; whole-biopsy bulk RNA-seq may mask cellular heterogeneity.
- Enteric infection burden was not directly assayed; CRP did not differ between the rural and rural-urban groups.
- Due to COVID-19, samples were processed locally at each site; harmonized pipelines and within-country analyses mitigated batch effects, with cross-cohort validation where feasible.