Unified AI framework to uncover deep interrelationships between gene expression and Alzheimer's disease neuropathologies
N. Beebe-Wang, S. Celik, et al.
Alzheimer's disease (AD) is a leading cause of death with no disease-modifying therapy. Major challenges include clinical and pathological heterogeneity in older individuals and limited understanding of the molecular drivers of amyloid and tau proteinopathies and AD dementia. While large GWAS have implicated pathways such as tau binding, APP metabolism, and immune function, and postmortem transcriptomic studies have examined correlates of AD phenotypes, the scarcity of brain expression data within individual cohorts and reliance on linear methods limit discovery of complex gene–phenotype relationships. Integrative analysis across multiple cohorts is further hindered by the need for harmonized phenotypes and batch/cohort effects that can overshadow disease signals. The research goal is to develop a unified, data-driven framework that jointly models multiple related neuropathological phenotypes from heterogeneous, sparsely labeled multi-cohort brain gene expression data to uncover robust, potentially non-linear molecular relationships linked to AD pathology and clinical outcomes. The study introduces MD-AD, a multi-task deep learning approach designed to improve prediction of neuropathology from gene expression, enable cross-cohort integration despite missing labels, and yield biologically interpretable insights including sex-specific effects and immune/microglial involvement.
Prior transcriptomic investigations in AD have ranged from pairwise gene–trait correlations and case-control differential expression to network-based analyses identifying co-expression modules associated with neuropathology and cognition within single cohorts. Consensus approaches have aggregated modules across datasets or regions via meta-analysis, but no single unified model has directly integrated multiple AMP-AD datasets while jointly modeling several neuropathological outcomes. Linear or module-based methods tend to capture broad, cohort-level variation, potentially obscuring subtler disease-relevant signals. Deep learning methods can model non-linear relationships but have been underutilized due to limited per-cohort sample sizes. Multi-task learning offers a useful inductive bias for related outputs, suggesting potential advantages for jointly modeling multiple, noisy measurements of AD pathology. The Methods section also reviews and contrasts earlier systems/network approaches and recent single-cell analyses that highlighted microglial programs and sex influences, setting the stage for a unified multi-task deep learning framework.
Data: RNA-seq brain gene expression and neuropathology labels from three AMP-AD cohorts—ROSMAP, ACT, and MSBB—comprising 1,758 samples from 925 individuals across nine brain regions. Gene IDs were standardized to symbols, genes present across datasets retained (14,591 genes; ~96.3% autosomal), expression log-transformed where needed, scaled to [0,1], and batch-corrected using ComBat. Neuropathology labels included six phenotypes: amyloid-related (Aβ IHC, neuritic plaques/NPs, CERAD score) and tau-related (τ IHC, tangles, Braak stage). Labels were aligned across regions (global CERAD/Braak assigned to all regions from same donor; regional IHC matched to same/nearest region; plaques/tangles averaged for consistency). All phenotype variables were normalized to [0,1]. Preprocessing for modeling used PCA to 500 principal components (capturing ~92% variance), employed for MD-AD and baselines to improve efficiency and reduce overfitting.
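The log transform and [0,1] scaling described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, the log2(x + 1) form is an assumption (the source only says "log-transformed where needed"), and scaling each gene column independently is likewise assumed. ComBat batch correction and the PCA step are omitted.

```python
import math

def log_transform(matrix):
    """Apply log2(x + 1) to every entry (assumed form of the log transform)."""
    return [[math.log2(value + 1.0) for value in row] for row in matrix]

def min_max_scale_columns(matrix):
    """Scale each column (gene) of a samples-by-genes matrix to [0, 1].

    Columns with no variation are mapped to 0.0 to avoid division by zero.
    """
    n_cols = len(matrix[0])
    mins = [min(row[j] for row in matrix) for j in range(n_cols)]
    maxs = [max(row[j] for row in matrix) for j in range(n_cols)]
    scaled = []
    for row in matrix:
        scaled.append([
            (row[j] - mins[j]) / (maxs[j] - mins[j]) if maxs[j] > mins[j] else 0.0
            for j in range(n_cols)
        ])
    return scaled

# Toy expression matrix: 3 samples x 2 genes (made-up values).
raw_counts = [[0.0, 5.0], [3.0, 5.0], [15.0, 5.0]]
scaled_expression = min_max_scale_columns(log_transform(raw_counts))
```

The same min-max step could be applied to the phenotype variables, which the source also reports as normalized to [0,1].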
Model: MD-AD is a multi-task deep neural network that jointly predicts the six neuropathological outcomes from the same input via shared hidden layers followed by task-specific branches, enabling learning from sparsely labeled data (samples contribute gradients only for available labels while updating shared layers). Baselines included six single-output MLPs (identical architecture without shared layers) and six regularized linear models (ridge) per phenotype. All networks used ReLU activations, dropout (0.1), Adam optimizer (grid-searched learning rate 1e-3 vs 1e-4; gradient clip 0.1 vs 0.01), kernel regularization (1e-3 vs 1e-5), trained for 200 epochs with batch size 20. Hyperparameters were selected via five-fold cross-validation (CV) within each of five train/test splits; final models were selected by average rank across splits and retrained on all data for external validation and interpretation. Alternative architectures with different shared/task-specific depths were evaluated and performed similarly or worse.
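The idea that samples contribute gradients only for the labels they have can be illustrated with a masked squared-error loss. The sketch below is framework-free and uses hypothetical names; averaging over all observed labels (rather than, say, per phenotype) is an assumption, since the source does not specify how task losses are combined.

```python
def masked_mse(predictions, labels):
    """Mean squared error over observed labels only.

    predictions: list of per-sample lists, one predicted value per phenotype.
    labels: same shape; None marks a phenotype not measured for that sample.
    Returns (loss, number_of_observed_labels).
    """
    total, n_observed = 0.0, 0
    for pred_row, label_row in zip(predictions, labels):
        for pred, label in zip(pred_row, label_row):
            if label is None:
                continue  # missing label: no error term for this output
            total += (pred - label) ** 2
            n_observed += 1
    loss = total / n_observed if n_observed else 0.0
    return loss, n_observed

def masked_mse_grad(predictions, labels):
    """Gradient of masked_mse with respect to each prediction.

    Entries for missing labels are exactly zero, so the corresponding
    task-specific branch receives no update from that sample.
    """
    n_observed = sum(label is not None for row in labels for label in row)
    grads = []
    for pred_row, label_row in zip(predictions, labels):
        grads.append([
            0.0 if label is None or n_observed == 0
            else 2.0 * (pred - label) / n_observed
            for pred, label in zip(pred_row, label_row)
        ])
    return grads

# Two samples, two phenotypes; each sample is missing one label.
example_predictions = [[0.5, 0.2], [0.1, 0.9]]
example_labels = [[1.0, None], [None, 0.9]]
example_loss, example_count = masked_mse(example_predictions, example_labels)
example_grads = masked_mse_grad(example_predictions, example_labels)
```

In a shared-layer network, the non-zero gradient entries would still flow back through the shared layers, which is how a sample with only one measured phenotype can inform the representation used by all six outputs.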
Evaluation: Internal performance used five repeated train/test splits with CV-based tuning; metric was 1−R^2_cv (MSE divided by test-set variance). Additional analyses trained on subsets of cohorts and tested on a held-out cohort (e.g., ROSMAP) to assess batch effects and benefits of heterogeneous training data, including cases where added cohorts lacked specific phenotype labels. Splitting by individual (no donor overlap between train/validation) was also evaluated to rule out leakage from multiple regions per individual. Covariate correction analyses regressed out PMI and RIN per gene to confirm robustness to sequencing covariates.
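As a concrete reading of the metric, 1−R² is the test-set mean squared error divided by the variance of the test-set labels, so 0 is a perfect prediction and 1 matches a model that always predicts the test-set mean. The function name below is hypothetical, and the use of population variance (dividing by n) is an assumption.

```python
def one_minus_r2(y_true, y_pred):
    """Return MSE / variance of y_true (lower is better)."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    variance = sum((t - mean_true) ** 2 for t in y_true) / n
    if variance == 0:
        raise ValueError("test-set labels have zero variance")
    return mse / variance

# Toy labels (made-up values).
labels = [0.0, 1.0, 2.0, 3.0]
perfect_error = one_minus_r2(labels, labels)           # exact predictions
mean_error = one_minus_r2(labels, [1.5] * len(labels))  # always predict the mean
```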
External validation: Out-of-cohort human brain datasets (MSBB microarray N=1047, HBTRC N=338, Mayo N=157) were normalized to match training distributions; neuropathology predictions across six outputs were aggregated to a per-sample “neuropathology score” (phenotype-wise within-dataset percentiles averaged). Cross-species validation used mouse brain expression from TASTPM and wild-type mice (mapped to 7,057 orthologous genes, retrained MD-AD on intersecting genes) and computed neuropathology scores. Cross-tissue validation applied MD-AD to blood microarray datasets from AddNeuroMed (Blood1 GSE63060; Blood2 GSE63061; total n=711; retrained on genes intersecting blood and brain; generated neuropathology scores and assessed separation across CTL/MCI/AD and by age).
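The aggregation into a per-sample neuropathology score (phenotype-wise within-dataset percentiles, then averaged) might look like the sketch below. The function names are hypothetical, and the percentile definition used here (fraction of samples in the same dataset with a value less than or equal to the sample's value) is an assumption, as the source does not spell it out.

```python
import bisect

def percentile_ranks(values):
    """Fraction of values in the dataset that are <= each value."""
    ordered = sorted(values)
    n = len(values)
    return [bisect.bisect_right(ordered, value) / n for value in values]

def neuropathology_scores(predictions):
    """Average the per-phenotype percentile ranks for each sample.

    predictions: list of per-sample lists of predicted phenotypes,
    all drawn from ONE dataset so percentiles are computed within it.
    """
    n_samples = len(predictions)
    n_phenotypes = len(predictions[0])
    ranks_by_phenotype = [
        percentile_ranks([row[j] for row in predictions])
        for j in range(n_phenotypes)
    ]
    return [
        sum(ranks_by_phenotype[j][i] for j in range(n_phenotypes)) / n_phenotypes
        for i in range(n_samples)
    ]

# Three samples, two predicted phenotypes (made-up values).
toy_predictions = [[0.1, 0.3], [0.2, 0.1], [0.9, 0.8]]
toy_scores = neuropathology_scores(toy_predictions)
```

Because ranks are computed within each dataset, the score is insensitive to dataset-specific shifts in the raw predictions, which is presumably why it suits comparisons across platforms, species, and tissues.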
Interpretability and embeddings: The last shared layer provides a supervised embedding (50-node consensus) constructed by training 100 MD-AD models, collecting last-shared-layer activations, clustering nodes across runs via k-means (k=50), and summarizing clusters by medoids. Embeddings were visualized using t-SNE and evaluated by correlations of individual nodes with neuropathology and higher-level clinical phenotypes (dementia diagnosis, cognition, dementia duration). Integrated Gradients (IG) provided sample-specific gene attributions for each output and for last-shared-layer nodes (by temporarily truncating the network). Gene importance rankings were aggregated across samples and 100 runs to obtain consensus ranks per phenotype and overall; pathway enrichment was performed via GSEA for REACTOME (and KEGG in supplements). Non-linear interactions with covariates were probed by modeling per-sample consensus IG scores as a function of gene expression, covariate (primarily sex), and their interaction, testing the interaction coefficient with FDR correction across genes. Enrichment of sex-interacting genes was assessed for REACTOME categories and for microglial cluster signatures from single-cell studies; broader cell-type signature enrichment (41 clusters across six cell types) was also tested using gene set enrichment on the final gene rankings.
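Integrated Gradients attributes a prediction to each input feature by averaging the model's gradient along a straight path from a baseline to the sample and multiplying by the feature's displacement from the baseline. The sketch below is a generic illustration, not the authors' implementation: the toy model is a hypothetical stand-in for a trained network, gradients are taken by finite differences rather than backpropagation, and the all-zeros baseline and midpoint Riemann sum are assumptions.

```python
def numerical_gradient(f, x, eps=1e-5):
    """Central finite-difference gradient of scalar function f at point x."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad

def integrated_gradients(f, x, baseline=None, steps=50):
    """Approximate Integrated Gradients attributions for input x."""
    if baseline is None:
        baseline = [0.0] * len(x)  # assumed baseline: all-zero expression
    summed = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th path segment
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grad = numerical_gradient(f, point)
        summed = [s + g for s, g in zip(summed, grad)]
    # Scale the average gradient by each feature's distance from the baseline.
    return [(xi - b) * s / steps for xi, b, s in zip(x, baseline, summed)]

def toy_model(x):
    """Hypothetical non-linear model: an interaction plus a squared term."""
    return x[0] * x[1] + x[2] ** 2

sample = [1.0, 2.0, 3.0]
attributions = integrated_gradients(toy_model, sample)
```

A useful sanity check is the completeness property: the attributions should sum to the difference between the model output at the sample and at the baseline (here 11 − 0). For real networks a framework's automatic differentiation would replace the finite-difference gradient.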
- Performance and multi-task gains: Compared with single-output MLPs, MD-AD reduced prediction error (1−R^2_cv) by 7% (CERAD), 13% (Braak), 7% (NPs), 25% (tangles), 10% (Aβ IHC), and 14% (τ IHC). MLPs consistently outperformed linear models, and MD-AD outperformed MLPs across all six outcomes.
- Benefits of heterogeneous and sparse labels: Training on multiple cohorts reduced test error on a held-out cohort (e.g., ROSMAP) despite initial increases from adding a single new dataset, indicating benefits of increased heterogeneous samples outweigh batch/labeling differences. Importantly, adding new samples improved performance for phenotypes even when those labels were absent in the added datasets, demonstrating value of shared representations learned from related outcomes.
- External human brain validation: In three independent datasets (MSBB-M N=1047, HBTRC N=338, Mayo N=157), MD-AD neuropathology scores were significantly higher in AD cases vs controls (two-sided t test: t=22.98, p<0.001), exceeding separations achieved by MLP and linear baselines. Stratified by age, differences remained significant and were largest under age 75.
- Cross-species generalization: In mouse brain expression data, MD-AD predicted higher neuropathology scores in TASTPM vs wild-type mice (homozygous vs WT: t=3.45, p<0.001). The MLP baseline was weaker (t=3.01, p<0.01) and the linear baseline failed. Trends aligned with gene dosage (heterozygous mice were intermediate; t=1.38, p=0.17 vs WT). MD-AD captured increasing predicted pathology with age and strain severity.
- Supervised embeddings capture AD severity: The last shared layer embedding coherently organized samples by all six neuropathological measures and generalized to external human and mouse samples. Embedding nodes showed significant correlations with higher-level AD phenotypes—dementia diagnosis, cognition (age/sex/education-adjusted), and dementia duration—often outperforming MLP nodes and always outperforming unsupervised and module-based embeddings (paired tests, FDR-corrected).
- Gene attributions and pathways: IG-derived consensus rankings highlighted enrichment for pathways including metabolism of RNA/proteins, immune system, cell-cell communication, signal transduction, hemostasis, and complement. Top genes included immune and microglial markers (e.g., TREM2, C4B, SERPINA3, MS4A7, SIGLEC1, GFAP). Compared with correlation-based rankings, MD-AD prioritized metabolism, immune, and signaling genes, whereas correlation favored transcription-related genes.
- Sex-specific interactions: Of 14,591 genes, 6,465 displayed significant sex-by-expression interactions in IG importance (FDR<0.05), indicating widespread sex-differential contributions to predicted neuropathology. Among the top 100 MD-AD genes, immune, reproduction, and hemostasis pathways showed strong sex interaction enrichment. Exemplars with strong sex interactions included KNSTRN and P2RY11 (higher effect in females) and C4B, CMTM4, TREM2, SERPINA3 (higher effect in males). Complement pathway genes (e.g., C4B) and TREM2 showed pronounced sex-dependent associations.
- Microglial and cell-type context: Many top MD-AD genes overlapped microglial single-cell clusters (stress, interferon/cytokine signaling, antigen presentation, transcription factor, proliferating clusters), with sex-interacting effects enriched in stress/immune/proliferation clusters. Broader enrichment was observed for astrocyte and inhibitory neuron signatures, indicating multi-cell-type transcriptomic contributions.
- Cross-tissue transfer to blood: Applying brain-trained MD-AD to blood datasets predicted significantly higher neuropathology scores in MCI (t=7.34, p<0.001) and AD dementia (t=5.87, p<0.01) vs controls (N=238 CTL, 189 MCI, 284 AD). Predictions increased with age in controls, and separation of CTL vs MCI/AD was strongest under age 80. Linear models failed to transfer meaningfully; MD-AD embeddings of blood stratified samples by predicted neuropathology consistent with cognitive status.
The study demonstrates that multi-task deep learning can effectively integrate heterogeneous, sparsely labeled multi-cohort brain transcriptomic data to learn robust, non-linear relationships between gene expression and AD neuropathology. By jointly predicting six related amyloid and tau phenotypes, MD-AD leverages shared biological signal and suppresses phenotype-specific noise, improving predictive accuracy relative to single-task deep and linear models. The learned representation generalizes across cohorts, species, and tissue, supporting the presence of conserved transcriptomic signatures of neuropathology beyond cohort- or platform-specific artifacts.
Interpreting MD-AD via Integrated Gradients and supervised embeddings reveals molecular processes and interactions not captured by linear approaches. Notably, immune and complement pathways, and microglial-related programs, emerge as key contributors, with pervasive sex-by-gene interaction effects shaping predicted neuropathology. These findings contextualize and refine previous genetic and transcriptomic evidence implicating immunity in AD, suggesting that sex-specific microglial and immune mechanisms differentially influence amyloid and tau pathologies and that non-linear models can uncover such context-dependent effects.
The ability of MD-AD’s representation to correlate with clinical dementia, cognitive function, and disease duration indicates it captures a general AD severity axis beyond individual pathology measures. Cross-tissue transfer to blood underscores potential translational applications for risk stratification or monitoring, while cross-species consistency supports biological relevance of the learned signatures. Overall, MD-AD addresses the initial challenge of integrating multi-cohort transcriptomes without harmonized labels and reveals nuanced biology central to AD pathogenesis.
MD-AD provides a unified, multi-task deep learning framework that: (1) accurately imputes AD neuropathology from heterogeneous brain gene expression; (2) learns stable, generalizable representations that transfer across cohorts, species, and even tissue; (3) uncovers non-linear, sex-dependent immune and microglial contributions to AD pathology not captured by linear methods; and (4) links transcriptomic signatures to clinical dementia, cognition, and disease duration. These insights converge with genetic and proteomic evidence implicating complement and immune pathways in AD and highlight complex sex-specific effects. Future work should extend MD-AD to larger and newer datasets, including single-nucleus RNA-seq and additional cohorts/regions, refine cell-type and pathway resolution, and design sex-informed mechanistic and clinical studies to test hypotheses emerging from these models, potentially guiding precision interventions targeting immune pathways in AD.
- Causality cannot be inferred: Both gene expression and neuropathology labels are from postmortem observational data, precluding causal conclusions about gene effects on pathology.
- Cohort and batch heterogeneity: Differences in labeling conventions and residual batch effects may influence models, although benefits of heterogeneous data outweighed such effects in practice.
- Non-i.i.d. samples: Multiple regions per individual challenge i.i.d. assumptions; analyses with donor-separated splits showed similar performance, mitigating concerns but not eliminating dependency.
- Dimensionality reduction: Using 500 PCs may discard some predictive information, though analyses suggested that linear predictive performance with 500 PCs was comparable to that obtained using all genes.
- Sparse and uneven labels: Some phenotypes were sparsely measured across cohorts, which MD-AD addresses via multi-task learning but still represents a constraint of available data.