Abstract
Predictive modeling in neuroimaging often suffers from data leakage, where information from test data inadvertently influences model training. This study investigates five leakage types in connectome-based machine learning across four datasets and three phenotypes. Feature selection and repeated subject leakage significantly inflate prediction performance, while other leakage types show minor effects. Small datasets exacerbate leakage's impact. The findings highlight the variability of leakage's effects and the importance of avoiding it for valid and reproducible results.
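As an illustration of the repeated subject leakage mentioned above, here is a minimal sketch that keeps every sample from one subject in the same cross-validation fold. It is not the authors' code: the function names `subject_wise_folds` and `shared_subjects`, the subject IDs, and the round-robin fold assignment are all hypothetical choices made for this example.

```python
def subject_wise_folds(subject_ids, n_folds):
    """Return (train_idx, test_idx) pairs in which no subject spans both sets."""
    # Assign each distinct subject to a fold, round-robin in order of first appearance.
    fold_of_subject = {}
    for sid in subject_ids:
        if sid not in fold_of_subject:
            fold_of_subject[sid] = len(fold_of_subject) % n_folds

    folds = []
    for k in range(n_folds):
        # All samples of a subject follow that subject's fold assignment.
        test_idx = [i for i, sid in enumerate(subject_ids) if fold_of_subject[sid] == k]
        train_idx = [i for i, sid in enumerate(subject_ids) if fold_of_subject[sid] != k]
        folds.append((train_idx, test_idx))
    return folds


def shared_subjects(subject_ids, train_idx, test_idx):
    """Return the set of subjects that appear in both the training and test sets."""
    train_subjects = {subject_ids[i] for i in train_idx}
    test_subjects = {subject_ids[i] for i in test_idx}
    return train_subjects & test_subjects


# Hypothetical data: six samples (e.g. scans), two subjects scanned twice.
subject_ids = ["s1", "s1", "s2", "s3", "s3", "s4"]

# Naive sample-wise split: even indices for testing, odd indices for training.
naive_test = [0, 2, 4]
naive_train = [1, 3, 5]
leaked = shared_subjects(subject_ids, naive_train, naive_test)

# Subject-wise split: each subject lands entirely in one fold.
folds = subject_wise_folds(subject_ids, n_folds=2)
overlaps = [shared_subjects(subject_ids, train, test) for train, test in folds]
```

In this toy data the naive split places subjects `s1` and `s3` on both sides of the train/test boundary, while the subject-wise folds share no subjects. The same principle applies to feature selection: selecting features using only the training portion of each fold keeps test information out of model training.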
Publisher
Nature Communications
Published On
Feb 28, 2024
Authors
Matthew Rosenblatt, Link Tejavibulya, Rongtao Jiang, Stephanie Noble, Dustin Scheinost
Tags
predictive modeling
data leakage
neuroimaging
machine learning
feature selection
connectome
reproducibility