Predictive modeling in neuroimaging often suffers from data leakage, where information from test data inadvertently influences model training. This study investigates five leakage types in connectome-based machine learning across four datasets and three phenotypes. Feature selection and repeated subject leakage significantly inflate prediction performance, while other leakage types show minor effects. Small datasets exacerbate leakage's impact. The findings highlight the variability of leakage's effects and the importance of avoiding it for valid and reproducible results.
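One way to guard against repeated subject leakage is to split data by subject rather than by sample. The sketch below assumes that "repeated subject leakage" means the same participant contributing data to both the training and test sets. It is an illustration, not the authors' pipeline. The helper names `subject_wise_folds` and `leaked_subjects` and the toy subject IDs are hypothetical.

```python
import random
from collections import defaultdict


def subject_wise_folds(subject_ids, n_folds=5, seed=0):
    """Return a list of folds (lists of sample indices) such that all
    samples sharing a subject ID are placed in the same fold."""
    unique_subjects = sorted(set(subject_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_subjects)
    # Deal shuffled subjects out to folds round-robin.
    fold_of_subject = {s: i % n_folds for i, s in enumerate(unique_subjects)}
    folds = defaultdict(list)
    for idx, subject in enumerate(subject_ids):
        folds[fold_of_subject[subject]].append(idx)
    return [folds[f] for f in range(n_folds)]


def leaked_subjects(subject_ids, train_idx, test_idx):
    """Return the set of subjects that appear in both train and test."""
    train = {subject_ids[i] for i in train_idx}
    test = {subject_ids[i] for i in test_idx}
    return train & test


# Hypothetical toy data: subjects "s1" and "s2" were each scanned twice.
subject_ids = ["s1", "s1", "s2", "s2", "s3", "s4", "s5", "s6"]

# A naive sample-wise split can put one scan of "s1" in each set.
naive_leak = leaked_subjects(subject_ids, train_idx=[1, 2, 3, 4, 5, 6, 7], test_idx=[0])

# A subject-wise split keeps every subject on one side of each fold.
folds = subject_wise_folds(subject_ids, n_folds=3)
for k, test_idx in enumerate(folds):
    train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
    assert not leaked_subjects(subject_ids, train_idx, test_idx)
```

The same principle applies to feature selection leakage: any step that looks at the outcome variable, including selecting features, should be fit on the training fold only and then applied unchanged to the test fold.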
Publisher: Nature Communications
Published On: Feb 28, 2024
Authors: Matthew Rosenblatt, Link Tejavibulya, Rongtao Jiang, Stephanie Noble, Dustin Scheinost
Tags: predictive modeling, data leakage, neuroimaging, machine learning, feature selection, connectome, reproducibility