Unique genetic and risk-factor profiles in clusters of major depressive disorder-related multimorbidity trajectories
A. Gezsi, S. V. D. Auwera, et al.
Major depressive disorder (MDD) is highly prevalent, heterogeneous in presentation and neurobiology, and often treatment-resistant. Prior GWAS and genetic correlation studies suggest pleiotropic genetic influences shared with psychiatric and somatic diseases, while twin studies imply a substantial role of non-genetic factors and comorbidities. The study asks whether age-dependent, directly depression-related multimorbidity trajectories can define biologically distinct subgroups of MDD with enriched genetic and non-genetic signals. The hypothesis is that focusing on strongly relevant (nonmediated) multimorbidities across the lifespan will enrich the genetic basis of MDD and reveal clusters with distinct genetic architecture, risk-factor profiles, and clinical courses.
Existing subtyping efforts in depression often rely on cross-sectional clinical features, limiting biological interpretability. GWAS of MDD indicate diverse, largely nonspecific pathways and pleiotropy with somatic and mental disorders. Prior work mapped strong multimorbidity patterns among hundreds of common diseases at phenotypic and genetic levels, including links between psychiatric, cardiovascular, and respiratory conditions. Time-dependent psychiatric multimorbidity clusters have been reported in schizophrenia. The network medicine perspective supports that directly related comorbidities reflect stronger biological overlap than mediated associations and are time-dependent. These findings motivate a temporal, systems approach to define biologically meaningful depression subtypes via multimorbidity trajectories.
Design and cohorts: The TRAJECTOME project analyzed 1,576,598 participants from seven European general-population cohorts. Discovery cohorts (N=1,189,509) included UK Biobank (UKB), Catalan Health Surveillance System (CHSS), and Finnish Institute for Health and Welfare surveys (THL). Validation cohorts (N=387,089) included FinnGen and SHIP.
Selection of cross-cohort diseases: From ICD-10 three-character categories, diseases with prevalence >1% in the cohort or in MDD cases were retained per cohort (266 in UKB, 356 in CHSS, 339 in THL). Strong relevance to MDD (ICD-10 F32/F33) was estimated using inhomogeneous dynamic Bayesian networks (BNs) over cumulative onset intervals [0–20], [0–40], [0–60], [0–70]. Strong relevance was defined as being in the Markov boundary of MDD (direct or interactional dependence). Diseases with posterior probability of strong relevance >0.5 in any interval in any cohort and present across all cohorts were kept, yielding 86 cross-cohort diseases.
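The two filtering rules in this step can be illustrated with a minimal sketch. It assumes per-cohort prevalence tables and per-interval posterior probabilities are already available as plain dictionaries; the function names, the data layout, and the reading of "present across all cohorts" as "retained by the prevalence filter in every cohort" are illustrative assumptions, not the authors' code.

```python
from typing import Dict, List, Set


def retained_by_prevalence(prev_cohort: Dict[str, float],
                           prev_mdd: Dict[str, float],
                           threshold: float = 0.01) -> Set[str]:
    """Keep ICD-10 categories whose prevalence exceeds the threshold
    either in the whole cohort or among MDD cases."""
    codes = set(prev_cohort) | set(prev_mdd)
    return {c for c in codes
            if prev_cohort.get(c, 0.0) > threshold
            or prev_mdd.get(c, 0.0) > threshold}


def cross_cohort_diseases(retained: Dict[str, Set[str]],
                          posteriors: Dict[str, Dict[str, List[float]]],
                          cutoff: float = 0.5) -> List[str]:
    """retained[cohort] is the prevalence-filtered set for that cohort;
    posteriors[cohort][disease] lists the posterior probability of strong
    relevance for each cumulative interval (hypothetical layout).
    A disease is kept if it exceeds the cutoff in any interval of any
    cohort and is present in every cohort."""
    present_everywhere = set.intersection(*retained.values())
    strongly_relevant = {d
                         for per_cohort in posteriors.values()
                         for d, probs in per_cohort.items()
                         if max(probs) > cutoff}
    return sorted(present_everywhere & strongly_relevant)
```

In the study this selection yields 86 diseases; the sketch only shows the set logic, not how the posterior probabilities themselves are estimated.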
Temporal BN modeling: Disease onsets were discretized into cumulative intervals, and the dynamic BN allowed only forward-in-time edges. Posterior probabilities of strong relevance were estimated by DAG-based MCMC (burn-in of 2×10^6 steps; 10^7 samples; at most 8 parents per node; convergence reached for 99.5% of the estimated probabilities).
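As a small illustration of what "cumulative" means here, the sketch below turns an age at first onset into one binary indicator per interval. The inclusive upper bounds and the treatment of a missing onset as "never observed" are assumptions; censoring (a participant who has not yet reached an interval's upper bound) is deliberately not handled.

```python
from typing import List, Optional

# Upper bounds of the cumulative intervals [0-20], [0-40], [0-60], [0-70].
INTERVAL_UPPER_BOUNDS = (20, 40, 60, 70)


def cumulative_indicators(onset_age: Optional[float]) -> List[int]:
    """Return 1 for every interval by whose end the disease had its
    first onset, else 0. None means the disease was never recorded."""
    if onset_age is None:
        return [0] * len(INTERVAL_UPPER_BOUNDS)
    return [1 if onset_age <= upper else 0 for upper in INTERVAL_UPPER_BOUNDS]
```

Because the intervals are nested, an indicator that switches to 1 stays 1 in all later intervals, for example an onset at age 35 gives `[0, 1, 1, 1]`.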
Multimorbidity scores and clustering: For each participant and interval, a weighted direct MDD-related multimorbidity score was computed as the sum over first-onset diseases, each multiplied by its cross-cohort relevance score. These four interval-specific scores defined a 4D space for k-means clustering. Participants aged ≥70 with complete scores determined the cluster centers; younger individuals were assigned to the nearest center. Posterior probabilities of cluster membership were derived from the normalized exponential of the negative Euclidean distances to the cluster centers; posterior log-odds were used as quantitative cluster traits. Participants under 60 with a maximum posterior probability ≤0.25 were excluded from downstream analyses, leaving N=364,008 for comparisons and genetics. A privacy-preserving federated pipeline aggregated site-specific relevance and score counts without sharing individual-level data.
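The scoring and soft-assignment steps can be sketched as follows, assuming the cluster centers have already been obtained by k-means. The function names and the per-interval layout of the relevance weights are hypothetical; the text does not state whether a single weight or an interval-specific weight is used per disease.

```python
import math
from typing import Dict, List, Sequence


def multimorbidity_scores(indicators: Dict[str, List[int]],
                          relevance: Dict[str, List[float]]) -> List[float]:
    """One score per interval: sum of relevance weights over the diseases
    with a first onset by that interval (assumed per-interval weights)."""
    n_intervals = len(next(iter(relevance.values())))
    zeros = [0] * n_intervals
    return [sum(relevance[d][k] * indicators.get(d, zeros)[k] for d in relevance)
            for k in range(n_intervals)]


def membership_posteriors(scores: Sequence[float],
                          centers: Sequence[Sequence[float]]) -> List[float]:
    """Normalized exponential of negative Euclidean distances to the centers."""
    dists = [math.dist(scores, c) for c in centers]
    nearest = min(dists)  # subtracting the minimum avoids underflow
    weights = [math.exp(-(d - nearest)) for d in dists]
    total = sum(weights)
    return [w / total for w in weights]


def log_odds(p: float, eps: float = 1e-12) -> float:
    """Posterior log-odds, clipped away from 0 and 1 (clipping is an assumption)."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))
```

A participant's quantitative trait for a given cluster would then be `log_odds(membership_posteriors(scores, centers)[k])` for that cluster index `k`.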
Clinical characterization: Weighted Cox models (hazard ratios for cluster membership) assessed risk for each of the 86 diseases, adjusting for sex, income (if available), and birth year (UKB). Weighted Kaplan–Meier curves estimated MDD-free survival by cluster.
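A weighted Kaplan–Meier estimate replaces event and at-risk counts with sums of participant weights. The minimal sketch below assumes the weights are the posterior probabilities of membership in the cluster being plotted, which the summary implies but does not state explicitly; the function name and input layout are illustrative.

```python
from typing import List, Sequence, Tuple


def weighted_kaplan_meier(times: Sequence[float],
                          events: Sequence[int],
                          weights: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (time, survival) pairs at each distinct event time.
    events[i] is 1 for an observed MDD onset and 0 for censoring."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Weighted number still at risk just before time t.
        at_risk = sum(w for ti, w in zip(times, weights) if ti >= t)
        # Weighted number of events exactly at time t.
        failed = sum(w for ti, e, w in zip(times, events, weights) if ti == t and e)
        if at_risk <= 0:
            continue  # only zero-weight participants remain
        survival *= 1.0 - failed / at_risk
        curve.append((t, survival))
    return curve
```

With all weights equal to 1 this reduces to the ordinary Kaplan–Meier product-limit estimator.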
Genetics: GWAS were conducted treating posterior log-odds of cluster membership as quantitative traits, adjusted for age (splines), sex, PCs, and batch variables (PLINK 2.0 in UKB; Regenie in FinnGen). Post-GWAS included FUMA locus mapping, MAGMA gene-level tests (Holm correction), functional enrichment with g:Profiler (GO/KEGG, g:SCS), and LD score regression for heritability and genetic correlations. Pleiotropy with MDD was tested by overlap with MDD genes (hypergeometric and GSEA), and network-based propagation on the STRING interactome identified cluster-specific functional modules influenced by MDD-associated genes. Polygenic risk scores (PRS-CS) derived from UKB cluster GWAS were tested in THL and SHIP, adjusting for covariates.
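Two of the statistical steps named above, Holm correction of gene-level P values and the hypergeometric test for gene-set overlap, are simple enough to sketch with the standard library. This is an illustrative implementation of the textbook procedures, not the MAGMA or study pipeline; the function names and the choice of gene universe are assumptions.

```python
from math import comb
from typing import List, Sequence


def holm_adjust(pvalues: Sequence[float]) -> List[float]:
    """Holm step-down adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        value = min(1.0, (m - rank) * pvalues[i])
        running_max = max(running_max, value)  # enforce monotonicity
        adjusted[i] = running_max
    return adjusted


def hypergeom_overlap_pvalue(universe: int, n_mdd_genes: int,
                             n_cluster_genes: int, overlap: int) -> float:
    """P(X >= overlap) when n_cluster_genes are drawn without replacement
    from a universe of genes that contains n_mdd_genes MDD genes."""
    upper = min(n_mdd_genes, n_cluster_genes)
    total = comb(universe, n_cluster_genes)
    tail = sum(comb(n_mdd_genes, k) * comb(universe - n_mdd_genes, n_cluster_genes - k)
               for k in range(overlap, upper + 1))
    return tail / total
```

The universe size matters: using all tested protein-coding genes rather than all genes in the genome changes the resulting P value, so it should match the set of genes actually tested.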
Non-genetic risk factors: Simple (one predictor with age, sex) and complex multivariable linear regressions related behavioral/physiological/psychological factors (e.g., BMI, blood pressure, CRP, smoking, alcohol, insomnia, neuroticism, stress, current depression) to cluster log-odds; Bonferroni correction applied. Validation of risk-factor associations was performed in THL and SHIP.
QC and genotyping: Extensive genotyping QC and imputation protocols were applied in each cohort (UKB v3; FinnGen DF10; THL arrays with SISU reference panels; SHIP Affy 6.0 with HRC v1.1 imputation).
- Seven temporal multimorbidity clusters were identified across cohorts, reflecting distinct age-at-onset distributions and disease burdens. Clusters 1–4 showed low comorbidity and later disease onset; Cluster 6 showed higher late-life burden; Cluster 5 had earlier onsets for musculoskeletal, respiratory, and genitourinary diseases; Cluster 7 had early onsets for allergic/respiratory inflammatory diseases, migraine, and dermatitis with a bimodal pattern (pre-20 years and later age-related peak).
- MDD burden: Clusters 1–4 had low MDD prevalence and later onset; Clusters 5–7 had high MDD burden. Cluster 5 showed increased schizophrenia and pain disorders; Cluster 6 showed increased stress-related and somatoform disorders; Cluster 7 showed increased asthma, allergic rhinitis, migraine, dermatitis alongside MDD.
- Disease risk: Weighted Cox models in UKB showed decreased hazard for most diseases in Clusters 1–2; modest increases in cerebrovascular/kidney/hypertension in Cluster 3 and lipid disorders/hypothyroidism in Cluster 4; broad increases in Clusters 5–6; selective increases for allergic/inflammatory conditions in Cluster 7. MDD-free survival was highest in Clusters 1–4 and lowest in Clusters 5–7 across cohorts.
- Genetics (UKB N=249,167): Across clusters, 6,141 distinct genome-wide significant SNPs mapped to 42 risk loci on 20 chromosomes. Heritability (LDSC h^2) ranged from approximately 0.0148 to 0.0483. Clusters 1–4 (low burden) showed many significant loci/genes enriched in immune pathways (HLA region, interleukin/TLR signaling; enriched GO/KEGG terms including cytokine-cytokine receptor interaction, MHC class II assembly, Th1/Th2 differentiation). Signals overlapped with allergic diseases (asthma, rhinitis, eczema), cardiometabolic traits (BMI, CRP, HDL), autoimmune diseases, IBD, and blood measures (WBC, vitamin D).
- High-burden clusters: Clusters 5–6 had weaker GWAS signals (few loci; overlaps with psoriasis for Cluster 5 and with cardiovascular/asthma/RA/blood traits for Cluster 6). Cluster 7 had the strongest genetic contribution, numerous loci, and negative genetic correlations with Clusters 1–4; overlaps included apolipoprotein AI, coeliac disease, CAD, vasculitis, and cholangitis.
- Genetic correlations: Only Clusters 5–6 showed significant positive genetic correlation with PGC MDD; Clusters 1–2 were negatively correlated with MDD and BD; Cluster 7 was positively correlated with BD; no cluster significantly correlated with schizophrenia. With asthma, Clusters 1–4 showed negative, Clusters 5 and 7 positive, and Cluster 6 nonsignificant correlations. Case-only MDD analysis in UKB showed high genetic correlations (0.78–1) between population-based and MDD-specific clusters.
- Pleiotropy with MDD: Seventeen genes overlapped between cluster-significant genes and MDD genes; GSEA showed significant enrichment of MDD genes across all clusters. Network analysis identified 31 cluster-specific functional modules significantly influenced by MDD-associated genes, indicating pleiotropy at the module level.
- Validation: In THL, PRSs for all seven clusters were significantly associated with cluster probabilities (adjusted P from 1.0×10^-15 to 1.7×10^-2). In FinnGen (N=277,252), multiple SNPs, loci, genes, and immune-related functional enrichments replicated; HLA gene signals replicated particularly in Clusters 1–2. Cross-cohort genetic correlations among clusters were similar (e.g., rg UKB–FinnGen ranged ~0.37–0.80 by cluster). In SHIP, five PRSs were positively correlated with cluster probabilities (Cluster 1 P=0.025; Cluster 7 P=0.067), and non-genetic factor associations (age, BMI, BP, insomnia, neuroticism, current depression) were consistent with UKB.
- Non-genetic profiles: Clusters 1–2 had older age but fewer adverse behaviors; Clusters 3–4 had higher BMI, lower education/income, more smoking and insomnia (Cluster 4); Clusters 5–6 had younger age with accumulations of behavioral/psychological risk (stress, insomnia, neuroticism); Cluster 7 showed a more favorable behavioral profile overall despite higher allergic/respiratory disease burden and MDD risk.
- General patterns indicate three divergent risk profiles: protective (Clusters 1–4), high-risk/lifestyle-linked (Clusters 5–6), and genetically driven inflammatory/allergic (Cluster 7).
The study demonstrates that focusing on strongly relevant, age-dependent multimorbidities identifies seven clinically and biologically distinct MDD-related trajectories. The clusters recapitulate and refine the interplay between immune/inflammatory biology and depression risk. Low-burden clusters (1–4) exhibit shared protective immune genetics and fewer behavioral risks, yet show specificity in age-related cerebrovascular/metabolic tendencies. High-burden clusters (5–6) align with greater multimorbidity, lifestyle and stress-related risks, and positive genetic correlation with MDD, suggesting interaction between behavioral exposures and genetic liability. Cluster 7 highlights a genetically driven, early-onset inflammatory/allergic profile with increased MDD independent of adverse lifestyle, providing a mechanistic link between immune dysregulation and depression. These findings reconcile prior inconsistent inflammation–depression associations by revealing opposing pleiotropic effects across subgroups. The temporal, systems-based framework supports precision strategies: early identification of high-risk trajectories, targeted behavioral interventions for lifestyle-linked clusters, and immuno-modulatory approaches for genetically driven inflammatory subtypes. Replication across cohorts and validation via PRS underscore robustness and generalizability, including feasibility in settings with limited disease data.
Using dynamic Bayesian networks and temporal multimorbidity trajectories across 1.6 million individuals, the study identifies seven MDD-related clusters with distinct genetic architectures, immune-related pleiotropy, and non-genetic risk profiles. The results emphasize neuroinflammatory processes in specific depression subgroups and provide a practicable route to biologically informed subtyping for prevention and personalized therapy. Future work should include deeper phenotyping within clusters, causal inference to disentangle confounding, prospective validation for risk stratification, and extension of this trajectory-based approach to other complex multimorbid diseases.
- Heterogeneity in healthcare systems, digitization timelines, and socioeconomic factors across cohorts may influence diagnosis rates and onset recording, potentially biasing prevalence and timing estimates.
- The method does not distinguish between acute and chronic disease entities; acute events (often inflammatory) may have different long-term effects than chronic conditions.
- Bayesian network inference from observational data is sensitive to unmeasured confounding and selection bias; direct edges denote strong probabilistic dependence, not causality.
- Some cohorts (e.g., THL) had limited sample size for GWAS, reducing power and yielding small or negative heritability estimates.
- Limited disease availability in certain cohorts (e.g., SHIP) constrains clustering accuracy, though simulations and PRS validation support applicability.
- Cluster assignment in younger individuals relies on partial trajectories, introducing uncertainty that was mitigated by exclusion thresholds but may still affect generalizability.