Medicine and Health

Phenotype prediction using biologically interpretable neural networks on multi-cohort multi-omics data

A. van Hilten, J. van Rooij, et al.

This study by Arno van Hilten, Jeroen van Rooij, M. Arfan Ikram, Wiro J. Niessen, Joyce B. J. van Meurs, and Gennady V. Roshchupkin introduces interpretable predictive models for multi-omics data using biologically informed neural networks. The models predict smoking status with consistently high performance and point to the genes involved. For the regression tasks (age and LDL levels), multi-omics networks were more stable and generalizable than single-omics models.

Abstract

Integrating multi-omics data into predictive models has the potential to enhance accuracy, which is essential for precision medicine. In this study, we developed interpretable predictive models for multi-omics data by employing neural networks informed by prior biological knowledge, referred to as visible networks. These neural networks offer insights into the decision-making process and can unveil novel perspectives on the underlying biological mechanisms associated with traits and complex diseases. We tested the performance, interpretability and generalizability for inferring smoking status, subject age and LDL levels using genome-wide RNA expression and CpG methylation data from the blood of the BIOS consortium (four population cohorts, Ntotal = 2940). In a cohort-wise cross-validation setting, the consistency of the diagnostic performance and interpretation was assessed. Performance was consistently high for predicting smoking status with an overall mean AUC of 0.95 (95% CI: 0.90–1.00) and interpretability revealed the involvement of well-replicated genes such as AHRR, GPR15 and LRRN3. LDL-level predictions were only generalized in a single cohort with an R² of 0.07 (95% CI: 0.05–0.08). Age was inferred with a mean error of 5.16 (95% CI: 3.97–6.35) years with the genes COL11A2, AP1G1, OTUD7A, ADARB2 and CD34 consistently predictive. For both regression tasks, we found that using multi-omics networks improved performance, stability and generalizability compared to alternative single-omics models. We believe that visible neural networks have great potential for multi-omics analysis; they combine multi-omics data elegantly, are interpretable, and generalize well to data from different cohorts.
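
The cohort-wise cross-validation mentioned in the abstract can be illustrated with a small sketch. It assumes a leave-one-cohort-out scheme, which may differ from the split the authors actually used, and it replaces the visible neural network with a placeholder `fit_predict` callback. All function names, cohort labels ("A", "B") and toy values below are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def leave_one_cohort_out(cohort_labels):
    """Yield (held_out_cohort, train_idx, test_idx) for each distinct cohort."""
    by_cohort = defaultdict(list)
    for idx, cohort in enumerate(cohort_labels):
        by_cohort[cohort].append(idx)
    for held_out in sorted(by_cohort):
        test_idx = by_cohort[held_out]
        train_idx = sorted(
            i for c, idxs in by_cohort.items() if c != held_out for i in idxs
        )
        yield held_out, train_idx, test_idx


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cohort_wise_auc(cohort_labels, y_true, fit_predict):
    """Return {cohort: AUC}, scoring each cohort with a model fit on the others.

    `fit_predict(train_idx, test_idx)` is a hypothetical callback that trains
    on the training indices and returns one score per test index.
    """
    results = {}
    for held_out, train_idx, test_idx in leave_one_cohort_out(cohort_labels):
        scores = fit_predict(train_idx, test_idx)
        results[held_out] = roc_auc([y_true[i] for i in test_idx], scores)
    return results


# Toy data: two made-up cohorts, binary labels, and one feature per subject.
cohorts = ["A", "A", "A", "A", "B", "B", "B", "B"]
labels = [0, 0, 1, 1, 0, 1, 0, 1]
feature = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.3, 0.7]


def toy_fit_predict(train_idx, test_idx):
    # Placeholder "model": ignores the training data and scores by the feature.
    return [feature[i] for i in test_idx]


per_cohort = cohort_wise_auc(cohorts, labels, toy_fit_predict)
mean_auc = sum(per_cohort.values()) / len(per_cohort)
print(per_cohort, mean_auc)
```

Holding out a whole cohort at a time, rather than random subjects, is what lets the per-cohort scores speak to generalizability across cohorts. In practice the placeholder callback would be swapped for a real model fit on the training indices.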

Publisher

npj Systems Biology and Applications

Published On

Jul 12, 2024

Authors

Arno van Hilten, Jeroen van Rooij, M. Arfan Ikram, Wiro J. Niessen, Joyce B. J. van Meurs, Gennady V. Roshchupkin

DOI

https://doi.org/10.1038/s41540-024-00405-w
