Modeling transcriptomic age using knowledge-primed artificial neural networks

N. Holzscheck, C. Falckenhayn, et al.

Nicholas Holzscheck and colleagues present an age clock that predicts chronological age from skin transcriptome data while exposing the state of individual aging pathways. Because biological pathway annotations are built into the neural network architecture, the model also offers insight into accelerated-aging conditions and longevity interventions.

Introduction
Recent advances in high-throughput molecular profiling and machine learning have accelerated biomarker discovery in aging research. Epigenetic clocks using DNA methylation are highly accurate, and clocks based on transcriptomic, proteomic, and metabolic data have also emerged. Yet, interpretability has often been neglected, even though gene expression and metabolites are conceptually closer to phenotype. The authors argue that increasing interpretability will enhance utility, especially in applied settings (e.g., human cell culture screening) where mechanistic insight is valuable. They propose a knowledge-primed artificial neural network (ANN) that embeds gene-to-pathway annotations into the network architecture. By restricting connections according to pathway membership, the ANN learns pathway-based representations, enabling monitoring of pathway aging states via neuron activations. They trained such a pathway-based ANN on RNA-seq data from 887 epidermal skin samples (aged 30–89) from the population-based SHIP-TREND cohort, leveraging skin’s suitability for aging studies and the opportunity to study extrinsically accelerated aging (photoaging). The unbiased cohort design supports investigation of natural aging progression.
Methodology
Cohort and samples: RNA-seq data were generated from 887 epidermal suction blister samples (7 mm) collected from the SHIP-TREND cohort (ages 30–89). Samples were split 70/30 into training and test (n=267) sets. The training set size is reported inconsistently: 640 in the text and 620 in the parameter table; 620 is the figure consistent with 887 total samples and 267 test samples. Ethical approvals were obtained; all participants gave informed consent.
Nucleic acid extraction and RNA sequencing: Tissue was homogenized, and RNA was extracted using the Qiagen RNeasy Fibrous Tissue Mini Kit. Libraries were prepared with Illumina TruSeq and sequenced 1×50 bp on HiSeq to ~100M reads per sample.
QC and processing: FastQC 0.11.7 was used for QC, Trimmomatic 0.36 for trimming, and Salmon 0.8.1 for quasi-mapping to GRCh38 and quantification (TPM).
Knowledge-primed neural network architecture: The model was implemented in Keras/TensorFlow using R 3.6.1.
Pathway prior: The Hallmark collection (50 gene sets) from MSigDB was used to create a binary gene×pathway filter matrix mapping input genes (n=4359) to pathways.
Architecture: Input gene layer → 4 hidden pathway-centric layers → final linear pathway output layer (one neuron per pathway, 50 neurons; auxiliary output) → main output neuron (age prediction). Within the hidden layers, neurons belonging to the same pathway are densely connected; there are no inter-pathway connections, which preserves interpretability. Pathway layer sizes scale with pathway gene count: neurons per hidden layer = 5 + (number of genes)/f with f=2. Dropout (rate 0.1) was applied between hidden layers, together with L2 weight decay (0.01). Activation: ELU; He initialization.
Loss and training: A joint loss combines the main output MSE and the auxiliary pathway output MSE: loss = (1−alpha)·MSE_main + alpha·MSE_auxiliary with alpha=0.4. This enforces training across all pathways and yields positively scaled pathway “ages.” Optimizer: Adam (lr=0.001). Mini-batch size: 16. Epochs: 200. Total trainable parameters: ~1,740,858.
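The pathway prior, layer sizing rule, and joint loss described above can be illustrated with a small standard-library sketch. This is not the authors' Keras/R implementation: the function names, the toy gene-to-pathway assignments, the truncation used when the layer size is not a whole number, and the choice of chronological age as the target for every auxiliary pathway neuron are all assumptions made for illustration.

```python
from typing import Dict, List


def build_pathway_mask(genes: List[str], pathways: Dict[str, List[str]]) -> List[List[int]]:
    """Binary gene x pathway matrix: entry is 1 if the gene belongs to the pathway."""
    members = {name: set(gene_set) for name, gene_set in pathways.items()}
    return [[1 if gene in members[name] else 0 for name in pathways] for gene in genes]


def hidden_units(n_genes: int, f: float = 2.0) -> int:
    """Hidden-layer width for one pathway: 5 + n_genes / f.

    Truncating to an integer is an assumption; the source does not state the rounding.
    """
    return 5 + int(n_genes / f)


def mse(pred: List[float], target: List[float]) -> float:
    """Mean squared error between two equally long lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def joint_loss(main_pred: List[float], aux_pred: List[List[float]],
               age: List[float], alpha: float = 0.4) -> float:
    """(1 - alpha) * MSE_main + alpha * MSE_auxiliary.

    aux_pred[i][k] is the output of pathway neuron k for sample i. Comparing every
    pathway neuron against chronological age is an assumption of this sketch.
    """
    aux_flat = [a for row in aux_pred for a in row]
    age_flat = [t for row, t in zip(aux_pred, age) for _ in row]
    return (1 - alpha) * mse(main_pred, age) + alpha * mse(aux_flat, age_flat)


# Toy example with illustrative gene-to-pathway assignments.
genes = ["TP53", "CDKN1A", "HK2"]
pathways = {"P53_PATHWAY": ["TP53", "CDKN1A"], "GLYCOLYSIS": ["HK2"]}
mask = build_pathway_mask(genes, pathways)
print(mask)                # one row per gene, one column per pathway
print(hidden_units(200))   # width of each hidden layer for a 200-gene pathway
print(joint_loss([52.0], [[50.0, 54.0]], [50.0]))
```

In a real network the mask would be multiplied element-wise with the first layer's weight matrix so that a gene can only feed the pathways it is annotated to; the sketch stops at constructing the mask.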
Ensemble learning: Ten separately trained networks (same inputs and architecture) were combined into a stacked ensemble by sharing the input and averaging the main and auxiliary outputs across models, improving accuracy and reproducibility.
Comparison model: A fully connected ANN ensemble (comparable parameter count ~1,789,301; hidden layers [350,350,350,50]) was trained with identical optimizer, regularization, epochs, and data split to assess the accuracy/interpretability trade-off.
Visual age assessment: Standardized frontal, non-polarized, color-controlled portrait images were collected for 154 randomly sampled test-set subjects. A blinded panel of 31 experts estimated age from the images; estimates were averaged per subject. Linear models in R tested the association between transcriptomic age and visual age, adjusting for chronological age and gender.
In silico perturbation experiments: Single-gene knockdowns (log2 fold-change −2) and overexpression (as specified) were simulated by altering expression for all test samples and comparing predicted ages to baseline per sample. Pathway effects were assessed via the auxiliary pathway neuron outputs. For complex signatures, significantly differentially expressed genes (FDR<0.05) were perturbed by their reported log2 fold-changes. Signatures included photoaging, Hutchinson-Gilford progeria syndrome (HGPS), replicative senescence, actinic keratosis (AK), and cutaneous squamous cell carcinoma (SCC) (human skin), as well as caloric restriction (CR) in rat (skin, liver, fat, brain); rat genes were mapped to human homologs using biomaRt. Significance was tested with one-sample Wilcoxon signed-rank tests against a median effect of 0, with Holm–Bonferroni correction where applicable. A t-SNE embedding was computed from pathway activation changes across all gene knockdowns to visualize the aging pathway landscape.
General analysis and visualization used the R packages data.table, dplyr, ggplot2, and ggpubr.
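The ensemble averaging and in silico knockdown procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: the two "models" are hypothetical linear stand-ins for trained networks, expression values are assumed to be on a log2 scale so that a log2 fold-change is applied by addition, and all function names are invented for this example.

```python
from typing import Callable, Dict, List

Sample = Dict[str, float]          # gene -> expression on a log2 scale (assumed)
Model = Callable[[Sample], float]  # any function mapping a sample to a predicted age


def ensemble_predict(models: List[Model], sample: Sample) -> float:
    """Average the age predictions of all ensemble members for one sample."""
    return sum(model(sample) for model in models) / len(models)


def perturb(sample: Sample, log2_fc: Dict[str, float]) -> Sample:
    """Return a copy of the sample with the given log2 fold-changes applied."""
    shifted = dict(sample)
    for gene, fold_change in log2_fc.items():
        if gene in shifted:
            shifted[gene] += fold_change
    return shifted


def age_shifts(models: List[Model], samples: List[Sample],
               log2_fc: Dict[str, float]) -> List[float]:
    """Per-sample difference: predicted age after perturbation minus baseline."""
    return [
        ensemble_predict(models, perturb(s, log2_fc)) - ensemble_predict(models, s)
        for s in samples
    ]


# Hypothetical stand-in models; the weights are made up for illustration.
toy_models: List[Model] = [
    lambda s: 50.0 - 1.0 * s["SIRT1"],
    lambda s: 50.0 - 2.0 * s["SIRT1"],
]
toy_samples: List[Sample] = [{"SIRT1": 5.0}, {"SIRT1": 3.0}]

# Simulated knockdown: log2 fold-change of -2 for a single gene.
shifts = age_shifts(toy_models, toy_samples, {"SIRT1": -2.0})
print(shifts)
```

The resulting per-sample shifts are the quantities that would then be tested against a median of zero; the same `perturb` function accepts a multi-gene signature by passing one log2 fold-change per gene.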
Key Findings
- Accuracy: The pathway-based ensemble achieved a median absolute error (MAE) of 4.7 years on the independent test set (n=267). A comparable fully connected “black box” model slightly outperformed it (MAE 4.4 years), indicating a small trade-off between interpretability and precision.
- Phenotypic association: Transcriptomic age was significantly associated with visual age estimates from a blinded expert panel after adjusting for chronological age and gender (p=0.016), supporting biological relevance to visible skin aging.
- Pathway landscape of aging: Pathway neuron activations increased with age. p53 signaling and TNFα/NFκB signaling showed the highest correlations with chronological age among Hallmark pathways, though many pathways were significantly age-associated versus a random control pathway, indicating a global transcriptomic aging effect. The pancreas beta-cell gene set showed low correlation, consistent with limited relevance to skin.
- Single-gene perturbations recapitulate known biology: In silico knockdown (log2 FC −2) of SIRT1 increased predicted age; TXNIP knockdown decreased predicted age; SERPINE1 knockdown decreased predicted age; KLF4 knockdown increased predicted age. Each result is consistent with literature on lifespan/aging phenotypes. Continuous HK2 modulation showed that overexpression reduced predicted age (rejuvenating), aligning with reports of age-related HK2 decline in skin.
- Genome-wide knockdown screen: Simulated knockdowns across all modeled genes yielded approximately symmetric age increases and decreases ranging from about +1 to −0.5 years. Top-impact genes included known aging markers (SERPINE1, IGFBP3, CDKN2A, TIMP1) and candidates such as HK2. High-impact genes tended to influence multiple pathways; their knockdowns altered at least two distinct pathway neurons, suggesting emergent emphasis on master regulators.
- Complex signature perturbations:
  • HGPS: Strongly accelerated predicted ages (often >1200 years), reflecting a transcriptomic state far beyond natural aging. The most affected pathway was epithelial–mesenchymal transition (EMT), with additional strong effects on proteostasis/protein secretion, immune signaling, and estrogen response.
  • Photoaging: Increased predicted age by ~2.1 years on average. Pathways shifted toward older states included ROS, Wnt and KRAS signaling, and glycolysis; younger shifts were seen in G2 damage checkpoint and estrogen response.
  • Replicative senescence: Increased predicted age by >100 years on average, consistent with senescence as a hallmark of aging and its strong in vitro aging signature.
  • AK and SCC: Both signatures substantially increased predicted age (~+54 and +52 years, respectively). Pathway activation patterns were highly correlated between AK and SCC, with stronger deregulation in IL6–JAK–STAT signaling, immune pathways, and coagulation. Senescence-related genes (e.g., SERPINE1) showed higher activation in AK, while immune and JAK–STAT alterations were stronger in SCC, consistent with tumor immune evasion and STAT3 roles.
  • Caloric restriction (CR): Human-skin-predicted rejuvenation averaged ~−0.2 years (significant in subjects >50 years). Liver and fat signatures predicted larger rejuvenation (−0.4 and −1.5 years), while the brain signature predicted a small age acceleration (~+0.4 years). Rejuvenated pathways under CR included ROS/peroxisome-related processes, mTOR signaling, and metabolism; skin CR also showed a rejuvenated p53 signaling profile.
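The accuracy figures above are median absolute errors, which are less sensitive to a few badly predicted samples than the more common mean absolute error. A minimal sketch of both metrics follows; the ages are made-up toy values, not data from the study, and the function names are chosen for this example.

```python
from statistics import mean, median
from typing import List


def median_absolute_error(predicted: List[float], actual: List[float]) -> float:
    """Median of the per-sample absolute differences between predicted and actual age."""
    return median(abs(p - a) for p, a in zip(predicted, actual))


def mean_absolute_error(predicted: List[float], actual: List[float]) -> float:
    """Mean of the per-sample absolute differences, shown for contrast."""
    return mean(abs(p - a) for p, a in zip(predicted, actual))


# Toy values: one sample is predicted 10 years off, the others are close.
predicted_age = [33.0, 38.0, 60.0]
chronological_age = [30.0, 40.0, 50.0]

print(median_absolute_error(predicted_age, chronological_age))
print(mean_absolute_error(predicted_age, chronological_age))
```

With these toy values the single large miss pulls the mean error up while the median stays at the typical error, which is why the two metrics should not be compared directly across studies.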
Discussion
The knowledge-primed ANN age clock achieves accuracy comparable to prior transcriptomic clocks while providing interpretable, pathway-level insights. The global age associations across many pathways support the deleteriome hypothesis that aging arises from the accumulation of numerous small detrimental changes rather than a single master driver. The model recapitulates known gene-level aging associations (e.g., SIRT1, TXNIP, SERPINE1, KLF4) and identifies candidate targets such as HK2. The emphasis on genes influencing multiple pathways suggests the architecture naturally prioritizes master regulators and broad effectors, aligning with biological intuition. Complex perturbation analyses demonstrate the model’s ability to decode mechanisms of accelerated aging and interventions: HGPS profoundly shifts EMT and related pathways; photoaging prominently impacts ROS and metabolic processes; senescence exerts strong acceleration; and CR yields tissue-specific rejuvenation via ROS reduction, peroxisome function, mTOR modulation, and metabolic remodeling, with skin-specific benefits potentially involving p53 signaling. Differences between AK and SCC pathway perturbations (notably IL6–JAK–STAT and immune pathways) align with known tumorigenic mechanisms and suggest hypotheses for AK-to-SCC progression heterogeneity. Overall, embedding biological priors into ANN architecture ties learned representations to interpretable pathway states, enabling mechanistic exploration alongside age prediction.
Conclusion
This study introduces an interpretable, pathway-guided neural age clock that predicts age from skin transcriptomes with competitive accuracy while revealing pathway-level aging states. By incorporating Hallmark pathway priors into the network architecture and leveraging an auxiliary pathway output, the model connects predictions to specific biological processes, facilitating hypothesis generation and mechanistic insight. It validates known gene and pathway associations with aging, highlights candidate targets (e.g., HK2), and deciphers mechanisms underlying accelerated aging (HGPS, photoaging, senescence) and pro-longevity interventions (caloric restriction). Although slightly less precise than fully connected models, the substantial gain in interpretability enhances utility for research and translational applications. Future work could extend this framework to other tissues and multi-omics, refine pathway annotations, and explore causal inference and intervention design using interpretable deep learning.
Limitations
- Tissue specificity: The model is trained on epidermal skin transcriptomes; findings and predictions may not generalize to other tissues without retraining.
- Accuracy trade-off: The slight reduction in predictive accuracy compared to fully connected neural networks (MAE 4.7 vs 4.4 years) reflects a transparency–precision trade-off.
- Out-of-distribution predictions: Extreme conditions (e.g., HGPS, in vitro senescence) can produce age estimates far outside the natural training range, reflecting underlying pathophysiology and model extrapolation rather than literal ages.
- Cross-species signature translation: CR signatures derived from rat tissues were mapped to human homologs; tissue- and species-specific regulatory differences may introduce translation errors.
- Data access constraints: Primary RNA-seq data are controlled-access due to privacy legislation, potentially limiting external replication.
- Model training details: The training sample count is reported inconsistently (620 in the parameter table vs 640 in the text); the overall split was nonetheless 70/30 with a test set of n=267.
- Benchmarking: Comparisons to other clocks are indirect; no head-to-head benchmarking against DNA methylation clocks was performed on the same samples.