Unified AI framework to uncover deep interrelationships between gene expression and Alzheimer's disease neuropathologies

N. Beebe-Wang, S. Celik, et al.

MD-AD is a multi-task deep learning framework, developed by Nicasia Beebe-Wang and colleagues, that jointly analyzes heterogeneous Alzheimer's Disease datasets to capture complex non-linear relationships and subtle disease signals, with predictions that transfer across species and tissues.

Introduction

Alzheimer's Disease (AD), a leading cause of death, remains difficult to study because it is heterogeneous and its molecular drivers are poorly understood. Current research relies on two main approaches: large-scale genome-wide association studies (GWAS), which identify genetic variants linked to AD, and moderate-scale post-mortem transcriptomic studies, which explore molecular correlates of neuropathological outcomes. Both have limitations. GWAS focus primarily on clinical diagnoses and neglect the rich spectrum of neuropathological variation. Transcriptomic studies are richer in phenotypic detail but have limited sample sizes per cohort, which hinders the use of complex models such as deep neural networks (DNNs). The AMP-AD consortium's collection of post-mortem brain RNA-sequencing datasets offers an opportunity to overcome these limitations by integrating multiple datasets. However, existing integration methods struggle with the heterogeneity of these datasets: they require harmonized phenotypes and often model only linear relationships, which can obscure subtle disease signals. This study addresses the gap by developing a framework that can analyze the combined data effectively.

Literature Review

Previous research on the molecular mechanisms of AD has employed two main approaches: large-scale GWAS and moderate-scale postmortem transcriptomic studies. GWAS have identified genetic variants associated with AD, implicating pathways like tau protein binding, amyloid precursor protein metabolism, and immune responses. Transcriptomic studies have investigated the molecular correlates of various neuropathological outcomes. Early work focused on pairwise correlations between gene expression and AD traits, while more recent studies used single-cohort data to infer gene regulatory networks or co-expressed modules associated with AD phenotypes. However, the use of complex models like DNNs has been limited by the scarcity of data within each cohort. Existing methods for integrating multiple datasets often rely on separate analyses of each dataset followed by a consensus approach, which neglects potential complex interactions among variables. This study aims to overcome the limitations of these previous approaches by developing a unified framework for analyzing multiple datasets simultaneously.

Methodology

The MD-AD (Multi-task Deep learning for Alzheimer's Disease neuropathology) framework was developed to analyze RNA-sequencing data from three cohorts (ROSMAP, ACT, and MSBB), encompassing 1,758 samples across nine brain regions. The model takes gene expression profiles as input and simultaneously predicts six AD-related neuropathological phenotypes: three related to amyloid plaques and three to tau tangles. MD-AD accommodates sparsely labeled data by updating model parameters only for the phenotypes that are labeled in a given sample. Its architecture combines shared layers, which capture features common to all phenotypes, with task-specific layers for individual phenotype predictions. For comparison, the study used two baseline models: a regularized linear model (ridge regression) and a single-output deep neural network (MLP). Performance was evaluated with five-fold cross-validation (CV), using the average 1−R²cv error as the metric. External validation used three independent datasets (MSBB-M, HBTRC, and Mayo Clinic Brain Bank), mouse models (TASTPM mice and wild-type mice), and blood samples from the AddNeuroMed cohort. For interpretability, Integrated Gradients (IG) quantified the contribution of each gene to the predictions, which allowed the identification of genes and pathways relevant to each neuropathological phenotype, including in sex-specific analyses. Additional analyses included functional enrichment analysis using gene set enrichment analysis (GSEA), an examination of interactions between gene expression, sex, and MD-AD importance scores, and an assessment of the enrichment of cell-type-specific gene signatures from previous single-cell RNA-seq analyses.
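To make the sparse-label idea concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes a single shared ReLU layer, one linear head per phenotype, a mean-squared-error loss, and `None` as the marker for a missing label. All function names (`forward`, `masked_mse`, `one_minus_r2`) are hypothetical. The summary does not state the loss function or layer sizes, so those are illustrative choices.

```python
def relu(values):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, v) for v in values]


def matvec(weights, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def forward(x, shared_weights, head_weights):
    """Shared layer followed by one linear head per phenotype.

    Assumption: one shared layer and linear heads, chosen for brevity.
    """
    hidden = relu(matvec(shared_weights, x))  # representation shared by all tasks
    return matvec(head_weights, hidden)       # one prediction per phenotype


def masked_mse(preds, labels):
    """Mean squared error over labeled entries only.

    Entries whose label is None add nothing to the loss, so they would
    produce no gradient and no parameter update for that phenotype.
    """
    total, count = 0.0, 0
    for pred_row, label_row in zip(preds, labels):
        for p, y in zip(pred_row, label_row):
            if y is not None:
                total += (p - y) ** 2
                count += 1
    return total / count if count else 0.0


def one_minus_r2(preds, labels, task):
    """1 - R^2 for one phenotype, using only samples that carry a label."""
    pairs = [(p[task], y[task]) for p, y in zip(preds, labels) if y[task] is not None]
    ys = [y for _, y in pairs]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for p, y in pairs)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return ss_res / ss_tot if ss_tot else float("nan")


# Two samples, two phenotypes; the first sample lacks a label for phenotype 1.
preds = [[2.0, 0.5], [1.0, 1.0]]
labels = [[1.0, None], [1.0, 3.0]]
print(masked_mse(preds, labels))
```

In a cross-validated setting, `one_minus_r2` would be applied to held-out predictions in each fold and then averaged. How the folds are aggregated here is an assumption, since the summary only names the metric.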

Key Findings

MD-AD significantly outperformed the baseline models in predicting AD neuropathological phenotypes, demonstrating the advantage of its multi-task deep learning approach. The improvement was particularly pronounced for phenotypes with more missing labels, highlighting MD-AD's ability to handle sparse data effectively. The model's predictions generalized well to independent human datasets, mouse models, and blood samples, indicating the robustness and generalizability of the learned features. The model's last shared layer embedding provided a coherent representation of gene expression, consistently capturing the relationship between gene expression and overall AD severity, outperforming unsupervised methods. Interpretation of the model revealed that genes relevant to MD-AD's predictions were enriched in pathways related to RNA and protein metabolism, immune system, cell-to-cell communication, and signal transduction. Sex-specific analyses uncovered widespread sex-differential effects in gene importance, particularly for immune system genes. Many top-ranked genes were upregulated in multiple microglial clusters, suggesting a crucial role for microglia in AD pathogenesis, especially in relation to sex-specific immune responses. The model's predictions proved consistent across brain and blood samples, indicating that the learned patterns are transferable across tissues and opening avenues for early identification of at-risk individuals.

Discussion

The findings demonstrate MD-AD's capacity to capture complex, non-linear relationships between gene expression and AD neuropathology, surpassing the limitations of conventional methods. The improved prediction accuracy and generalizability across diverse datasets highlight the framework's robustness and its potential for broader application. The identification of sex-specific interactions in immune-related genes provides new insights into the complex interplay between sex and AD pathogenesis, refining our understanding of the underlying molecular mechanisms. The transferability of the model's predictions to blood samples suggests the possibility of developing non-invasive diagnostic tools. The limitations of the study include the reliance on post-mortem brain samples, which restricts the ability to establish direct causal relationships. Future research could focus on longitudinal studies to elucidate the causal relationships between gene expression and AD progression and explore the potential of MD-AD as a diagnostic and prognostic tool.

Conclusion

This study introduces MD-AD, a multi-task deep learning framework capable of identifying complex, non-linear relationships between gene expression and AD neuropathology. MD-AD outperforms existing methods in prediction accuracy and generalizability, highlighting the value of integrating multi-cohort data and using DNNs. The framework's ability to reveal sex-specific molecular mechanisms and its applicability across tissues open avenues for future research, including the development of non-invasive diagnostic tools and a better understanding of AD pathogenesis.

Limitations

The study's reliance on post-mortem brain tissue limits the ability to establish direct causal relationships between gene expression and AD progression. The retrospective nature of the data also introduces potential confounding factors that cannot be fully controlled for. While the model's generalizability to mouse models and blood samples is encouraging, further validation in larger, prospective studies is warranted to confirm these findings across different populations and clinical settings. The black-box nature of deep learning models makes it difficult to fully understand the biological mechanisms behind the model's predictions, although the Integrated Gradients method addresses this limitation to some degree.
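As a rough illustration of how Integrated Gradients assigns importance to input features, the sketch below applies the standard definition (input minus baseline, times the path-averaged gradient) to a toy function. It is not the study's code. The `model` function, the all-zero baseline, the finite-difference gradients, and the step count are assumptions made so the example runs with the standard library alone; a real network would use automatic differentiation.

```python
import math


def model(x):
    """Toy differentiable predictor standing in for a trained network (hypothetical)."""
    return math.tanh(0.8 * x[0] - 0.5 * x[1]) + 0.3 * x[0] * x[2]


def numerical_grad(f, x, eps=1e-5):
    """Central-difference gradient of f at x (stand-in for autodiff)."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def integrated_gradients(f, x, baseline, steps=200):
    """Approximate IG with a midpoint Riemann sum along the straight path
    from the baseline to x."""
    avg_grad = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grad = numerical_grad(f, point)
        avg_grad = [a + g / steps for a, g in zip(avg_grad, grad)]
    # Scale the averaged gradient by each feature's distance from the baseline.
    return [(xi - b) * a for xi, b, a in zip(x, baseline, avg_grad)]


x = [1.0, 2.0, -1.0]          # e.g. three expression values (illustrative)
baseline = [0.0, 0.0, 0.0]    # assumed reference input
attributions = integrated_gradients(model, x, baseline)
print(attributions)
```

A useful sanity check is the completeness property: the attributions should sum, up to numerical error, to the difference between the model output at the input and at the baseline.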
