Phenotype prediction using biologically interpretable neural networks on multi-cohort multi-omics data
A. van Hilten, J. van Rooij, et al.
This study by Arno van Hilten, Jeroen van Rooij, M. Arfan Ikram, Wiro J. Niessen, Joyce B. J. van Meurs, and Gennady V. Roshchupkin introduces interpretable predictive models for multi-omics data based on biologically informed neural networks. The models predict smoking status with high performance and highlight the genes involved, and the multi-omics networks are more stable and generalize better than single-omics models.
Introduction
The study addresses the challenge of integrating multiple omics layers (e.g., gene expression and DNA methylation) to improve phenotype prediction and provide biological insight. While numerous loci and CpGs have been associated with complex traits, the joint effects within and across omics remain underexplored. Multi-omics integration is complicated by differences in preprocessing, dimensionality, and correlation structures across data types. Visible machine learning embeds prior biological knowledge (genes, pathways) within neural network architectures to enhance interpretability alongside predictive performance. This work evaluates whether biologically informed, interpretable neural networks can accurately predict smoking status, age, and LDL levels across multiple cohorts, and whether multi-omics integration improves generalizability and stability of interpretations.
Literature Review
Recent developments in multi-omics integration include statistical and machine learning frameworks that combine diverse omics into single analyses. Visible machine learning approaches (e.g., GenNet, P-net) incorporate gene and pathway annotations to achieve interpretability, but interpretation stability can be sensitive to weight initialization. Other biologically informed networks include PasNet (pathway-informed survival prediction), DrugCliP (Gene Ontology-informed drug response), and ParsVNN (pruned visible networks for improved performance). Knowledge-primed neural networks and multi-task attention models (e.g., MOMA) also demonstrate integrative capabilities. These works highlight the promise and challenges of interpretable deep learning for genomics, motivating robust, biologically guided architectures that quantify contributions at omic, gene, and pathway levels.
Methodology
Data: Multi-omics data from the BIOS consortium across four cohorts (Lifelines, Leiden Longevity Study, Netherlands Twin Register, Rotterdam Study; total N=2940). Methylation profiled on the Illumina 450K array; RNA-seq for gene expression. Y chromosome excluded; X-chromosome and autosomes included. Expression filtered at ≥1 count per million on average across samples. CpGs were annotated to nearest genes using GREAT. After preprocessing, 14,248 expression genes remained; intersecting with methylation annotations yielded 10,404 overlapping genes used for all models. The methylation layer encompassed 324,295 CpGs mapped to these genes. Pathway annotations were derived from KEGG via ConsensusPathDB.
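The expression filter above (mean of at least 1 count per million across samples) can be illustrated with a small sketch. It assumes the standard definition of counts per million (raw count scaled by the sample's total count); the function name, the input layout, and the toy gene labels are hypothetical and are not taken from the study's preprocessing pipeline.

```python
def filter_genes_by_mean_cpm(counts, min_mean_cpm=1.0):
    """Keep genes whose mean counts-per-million across samples is >= min_mean_cpm.

    counts: dict mapping gene name -> list of raw read counts, one per sample.
    (Hypothetical input layout, chosen only for this illustration.)
    """
    genes = list(counts)
    n_samples = len(next(iter(counts.values())))
    # Library size = total reads per sample, summed over all genes
    lib_sizes = [sum(counts[g][s] for g in genes) for s in range(n_samples)]
    kept = []
    for g in genes:
        # CPM = raw count * 1e6 / library size of that sample
        cpm = [counts[g][s] * 1e6 / lib_sizes[s] for s in range(n_samples)]
        if sum(cpm) / n_samples >= min_mean_cpm:
            kept.append(g)
    return kept


# Toy data: two samples with library sizes of 1,000,000 and 2,000,000 reads
toy_counts = {
    "G1": [999_998, 1_999_995],
    "G2": [2, 4],   # 2 CPM in both samples, so it passes the filter
    "G3": [0, 1],   # 0 and 0.5 CPM, mean 0.25, so it is dropped
}
kept_genes = filter_genes_by_mean_cpm(toy_counts)
```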
Network architectures: Built upon the GenNet framework with prior knowledge dictating connectivity. Three base variants: (1) ME (methylation-only): CpG inputs aggregated to gene-level nodes, directly connected to output. (2) GE (gene expression-only): expression inputs at gene level to output. (3) ME+GE (multi-omics): for each gene, methylation-derived node and expression input merged into a combined gene representation before the output. Activation functions: ArcTan for hidden layers; sigmoid at output for classification. For regression tasks, linear output with MSE loss; for classification, weighted binary cross-entropy.
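As a rough illustration of the ME+GE wiring described above, the following sketch runs one forward pass in plain Python. Only annotated CpG-to-gene connections exist, each gene merges its methylation node with its expression input, and the output is a sigmoid. The function and argument names, and the choice to merge the two modalities as a weighted sum passed through ArcTan, are assumptions made for illustration; this is not the GenNet implementation.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def me_ge_forward(cpgs, expr, cpg_to_gene, w_cpg, w_meth, w_expr, w_out, b_out=0.0):
    """One forward pass through a toy ME+GE network.

    cpgs:        methylation value per CpG
    expr:        expression value per gene
    cpg_to_gene: gene index for each CpG (the prior-knowledge annotation)
    w_cpg:       one weight per CpG -> methylation node of its gene
    w_meth:      one weight per gene, methylation node -> combined gene node
    w_expr:      one weight per gene, expression input -> combined gene node
    w_out:       one weight per gene, combined gene node -> output
    """
    n_genes = len(expr)
    # Methylation node: a CpG only feeds the gene it is annotated to
    meth_sum = [0.0] * n_genes
    for i, g in enumerate(cpg_to_gene):
        meth_sum[g] += w_cpg[i] * cpgs[i]
    meth_node = [math.atan(s) for s in meth_sum]
    # Combined gene node (assumed merge: weighted sum, then ArcTan)
    gene_node = [
        math.atan(w_meth[g] * meth_node[g] + w_expr[g] * expr[g])
        for g in range(n_genes)
    ]
    # Sigmoid output for a binary phenotype such as smoking status
    logit = b_out + sum(w_out[g] * gene_node[g] for g in range(n_genes))
    return sigmoid(logit), gene_node


# Toy example: three CpGs, the first two annotated to gene 0, the third to gene 1
prob, gene_nodes = me_ge_forward(
    cpgs=[0.2, 0.8, 0.5],
    expr=[1.0, -1.0],
    cpg_to_gene=[0, 0, 1],
    w_cpg=[1.0, 1.0, 1.0],
    w_meth=[1.0, 1.0],
    w_expr=[1.0, 1.0],
    w_out=[1.0, 1.0],
)
```

Because each CpG connects to a single gene, changing one CpG weight only alters that gene's node, which is what makes the learned weights readable at gene level.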
Deeper networks: Added three hierarchical KEGG pathway layers (321, 44, 6 nodes) atop the gene layer to model multi-level functional structure; unannotated genes used skip connections directly to output. A comparable deeper baseline replaced pathway layers with fully-connected layers (321, 44, 6 nodes) without biological priors.
Regularization and omic-specific penalties: L1 penalties applied to weights to enforce sparsity and enhance interpretability. Additional omic-specific L1 penalties allowed assessment of unique contributions from methylation vs expression by penalizing one modality.
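One way to express an omic-specific L1 penalty is to give each modality its own coefficient and add the weighted sums of absolute weights to the loss. The sketch below assumes that form; the function name and the coefficient names `lam_meth` and `lam_expr` are hypothetical.

```python
def omic_l1_penalty(w_meth, w_expr, lam_meth, lam_expr):
    """L1 penalty with a separate strength per omic layer.

    Raising lam_meth relative to lam_expr pushes the model to rely less on
    methylation weights, and vice versa.
    """
    meth_term = lam_meth * sum(abs(w) for w in w_meth)
    expr_term = lam_expr * sum(abs(w) for w in w_expr)
    return meth_term + expr_term


# Penalize methylation only: the expression weights add nothing to the penalty
penalty = omic_l1_penalty(w_meth=[1.0, -2.0], w_expr=[0.5], lam_meth=0.01, lam_expr=0.0)
```

During training this term would be added to the task loss (weighted binary cross-entropy or MSE), so comparing runs with different coefficient pairs indicates how much each modality contributes on its own.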
Covariates: Evaluated two strategies: (a) covariates as a late fusion layer, and (b) covariates injected at each gene node to probe covariate–gene interactions.
Training and evaluation: Cohort-wise cross-validation (leave-one-cohort-out). In each fold, three cohorts split into 75% train and 25% validation; the held-out cohort served as test. Hyperparameters tuned on validation: learning rates {1e-3, 5e-4, 1e-4, 1e-5}; L1 penalties {0.01, 0.001, 0.0001} for gene and/or methylation weights. Best validation configuration retrained and evaluated 10 times with different random seeds to assess stability. Metrics: AUC for classification; RMSE and explained variance (sklearn explained_variance_score) for regression.
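The cohort-wise split can be sketched as follows. The code holds out one cohort as the test set and divides the remaining samples 75%/25% into training and validation. It assumes a simple random split over the pooled remaining samples (whether the original split was stratified per cohort is not stated here), and the function name and cohort labels are hypothetical.

```python
import random


def leave_one_cohort_out(cohorts, val_fraction=0.25, seed=0):
    """Yield (held_out, train_idx, val_idx, test_idx) for every cohort.

    cohorts: list with one cohort label per sample.
    """
    rng = random.Random(seed)
    for held_out in sorted(set(cohorts)):
        test_idx = [i for i, c in enumerate(cohorts) if c == held_out]
        rest = [i for i, c in enumerate(cohorts) if c != held_out]
        # Assumed: random 75/25 split over the pooled remaining cohorts
        rng.shuffle(rest)
        n_val = int(round(val_fraction * len(rest)))
        val_idx = sorted(rest[:n_val])
        train_idx = sorted(rest[n_val:])
        yield held_out, train_idx, val_idx, test_idx


# Toy example: four cohorts with four samples each
labels = ["A"] * 4 + ["B"] * 4 + ["C"] * 4 + ["D"] * 4
folds = list(leave_one_cohort_out(labels))
```

Each fold would then be tuned on the validation indices, and the best configuration retrained with several random seeds before scoring on the held-out cohort.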
Interpretation: Contribution scores computed from absolute-weight path products: for each input-to-output path, multiply absolute weights; sum over all paths to compute an input’s total contribution; normalize by total contribution to obtain percentages. Hidden node contributions computed by propagating along paths to the node and normalizing within-layer. Activation analyses: PCA on per-individual activations at the gene layer to identify subgroups and latent factors (e.g., sex effects).
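The path-product scoring described above can be sketched for a small layered network. Weights are stored as nested lists in which a zero marks a connection that does not exist, so summing absolute-weight products over all paths reduces to multiplying the absolute weight matrices from the output backwards. The function name and data layout are assumptions for illustration.

```python
def path_contributions(layers):
    """Percent contribution of each input, from absolute-weight path products.

    layers: list of weight matrices; layers[k][i][j] is the weight from node i
    in layer k to node j in layer k + 1. The last matrix has one column (the
    output node). Absent connections are encoded as 0.0.
    """
    # downstream[j] = summed |weight| path products from node j to the output
    downstream = [1.0]
    for weights in reversed(layers):
        downstream = [
            sum(abs(row[j]) * downstream[j] for j in range(len(downstream)))
            for row in weights
        ]
    total = sum(downstream)
    if total == 0.0:
        return [0.0 for _ in downstream]
    # Normalize by the total contribution to obtain percentages
    return [100.0 * c / total for c in downstream]


# Toy network: three inputs -> two gene nodes -> one output
toy_layers = [
    [[1.0, 0.0], [-2.0, 0.0], [0.0, 1.0]],  # inputs 0 and 1 feed gene 0, input 2 feeds gene 1
    [[0.5], [3.0]],                         # gene nodes -> output
]
scores = path_contributions(toy_layers)
```

In the toy network, the third input scores highest because its only path runs through the gene node with the largest output weight, even though its own input weight is small. Scores for hidden nodes could be obtained the same way by stopping the backward pass at that layer.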
Baselines: A traditional neural network with a locally connected 1D layer per omic followed by two dense layers was used for comparison. Additional analyses included deeper architectures with/without pathways and models with covariate integration.
Key Findings
- Smoking status: Multi-omics visible networks (ME+GE) achieved high and consistent performance across cohorts, with an overall mean AUC of about 0.95 (reported 95% CIs: 0.90–1.00 overall; 0.93–0.98 on validation). Single-omic models performed worse (ME ~0.85 AUC; GE ~0.80 AUC). The interpretation highlighted well-established smoking-related genes, including AHRR, GPR15, and LRRN3. Omic-specific L1 penalties showed that penalizing methylation reduced reliance on methylation weights; even so, expression sometimes contributed uniquely for top genes, indicating that the two modalities are complementary.
- Age: The ME+GE network inferred age with a mean error of 5.16 years (95% CI: 3.97–6.35). Performance varied by test cohort due to differing age distributions; explained variance for ME+GE averaged around 0.72 across folds, exceeding GE (~0.30) and ME (poor generalization). PCA of activations revealed strong separation by sex along PC1 (driven by X-chromosome signals) and age-related variation along PC2. Gene-level contributions were diffuse, suggesting multifactorial patterns, with genes such as COL11A2, AP1G1, OTUD7A, ADARB2, and CD34 repeatedly implicated.
- LDL levels: Training/validation R² up to ~0.17 was observed, but generalization to test cohorts was limited; only one fold showed modest test R² of 0.07 (95% CI: 0.05–0.08) for ME+GE (GE ~0.04). Gene weight distributions were small and diffuse (e.g., top gene FAM53A accounted for ~0.052% of total weight), indicating absence of singular strong predictors from blood-derived omics. Adding pathway or dense layers did not improve performance.
- Multi-omics advantage and stability: Across regression tasks, ME+GE improved performance, stability, and generalizability versus single-omic networks. Deeper networks did not consistently outperform shallower visible networks within the explored hyperparameter/data regime. Interpretation stability varied with random initialization, motivating multiple-seed training for robust insights.
Discussion
Embedding biological priors at gene and pathway levels enabled interpretable neural networks that both predicted phenotypes and exposed the contributing omics, genes, and pathways. For smoking, strong methylation and expression signatures produced robust, high AUC across cohorts, with interpretable drivers matching established biology (e.g., AHRR, GPR15, LRRN3), which supports the validity of the visible network approach. For age, multi-omics integration improved explained variance and generalization beyond single-omic models, showing that complementary signals across methylation and expression enhance stability; activation analyses clarified that sex acts as the dominant latent factor, while age contributes along a secondary axis. LDL prediction highlighted the limits of blood-based transcriptome and methylome signals for this trait and the challenge of cross-cohort generalization when effect sizes are weak. Overall, the findings support the hypothesis that visible multi-omics networks enhance predictive performance and interpretability, but they also reveal that interpretations are sensitive to random initialization and to distributional shifts between cohorts. Proper regularization (e.g., L1) and careful model selection are crucial for meaningful interpretation.
Conclusion
This work extends the GenNet framework to a multi-omics, biologically interpretable neural network that merges methylation and expression at the gene level and can incorporate pathway hierarchies and covariates. Evaluated across four BIOS cohorts and three phenotypes, the approach achieved high accuracy for smoking, improved age prediction versus single-omic models, and limited yet informative performance for LDL. The architecture yields gene- and pathway-level contribution scores, offering mechanistic insights alongside predictions. Multi-omics integration generally improved performance, stability, and cross-cohort generalization, whereas deeper architectures did not confer benefits under the current data and hyperparameter settings. Future work should pursue richer and tissue-specific annotations (e.g., ENCODE), broader hyperparameter searches, larger training datasets, external validations, additional omics layers, and strategies to enhance interpretation stability across random seeds.
Limitations
- Interpretation stability: Contribution profiles varied across random seeds; multiple-seed training is recommended to capture robust signals.
- Generalization: Age and especially LDL predictions showed variable cross-cohort performance, influenced by cohort-specific distributions (e.g., age range) and weak trait–omics associations in blood.
- Data/annotation constraints: CpG-to-gene mapping via genomic proximity and KEGG-based pathways may omit tissue-specific and functional context; improved priors (e.g., ENCODE) could enhance interpretability and performance.
- Model complexity: Deeper networks did not outperform shallower models, possibly due to limited data size or suboptimal hyperparameters; more data and extensive tuning may be needed.
- Causality: Predictive genes/pathways reflect association and model usage, not necessarily causal mechanisms; effects may be mediated.
- Cohort composition: Class exclusions (e.g., former smokers), sex effects (X-chromosome signals), and differing age distributions may bias learned patterns and limit generalizability.