Data leakage inflates prediction performance in connectome-based machine learning models

M. Rosenblatt, L. Tejavibulya, et al.

Matthew Rosenblatt, Link Tejavibulya, Rongtao Jiang, Stephanie Noble, and Dustin Scheinost examine data leakage in neuroimaging predictive modeling. Testing five types of leakage across four datasets, the study shows that feature selection leakage and repeated subject leakage can substantially inflate prediction performance, particularly in smaller datasets.

Abstract

Predictive modeling in neuroimaging often suffers from data leakage, where information from test data inadvertently influences model training. This study investigates five leakage types in connectome-based machine learning across four datasets and three phenotypes. Feature selection and repeated subject leakage significantly inflate prediction performance, while other leakage types show minor effects. Small datasets exacerbate leakage's impact. The findings highlight the variability of leakage's effects and the importance of avoiding it for valid and reproducible results.
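
Feature selection leakage, as described in the abstract, can be illustrated with a small simulation. The sketch below is not the authors' pipeline: it uses synthetic pure-noise data, a simple signed sum-score model, and hypothetical names (`select_features`, `cross_validate`) and sizes (40 subjects, 500 features, 10 selected features, 5 folds) chosen only for illustration. It compares selecting features once on all subjects (leaky) with selecting them inside each training fold (proper).

```python
import random
from statistics import mean


def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def select_features(X, y, rows, k):
    """Pick the k features most correlated with y, using only the given rows.

    Returns a list of (feature_index, sign) pairs.
    """
    ys = [y[i] for i in rows]
    ranked = []
    for j in range(len(X[0])):
        r = pearson([X[i][j] for i in rows], ys)
        ranked.append((abs(r), j, 1 if r >= 0 else -1))
    ranked.sort(reverse=True)
    return [(j, sign) for _, j, sign in ranked[:k]]


def score(X, i, feats):
    """Signed sum of the selected features for subject i."""
    return sum(sign * X[i][j] for j, sign in feats)


def cross_validate(X, y, k_feats, n_folds, leaky):
    """K-fold cross-validation; returns the correlation of predictions with y.

    leaky=True selects features once using ALL subjects (test data included).
    leaky=False selects features from the training subjects of each fold only.
    """
    n = len(y)
    all_rows = list(range(n))
    folds = [all_rows[f::n_folds] for f in range(n_folds)]
    global_feats = select_features(X, y, all_rows, k_feats) if leaky else None
    preds = [0.0] * n
    for test in folds:
        held_out = set(test)
        train = [i for i in all_rows if i not in held_out]
        feats = global_feats if leaky else select_features(X, y, train, k_feats)
        # Fit a one-variable linear regression of y on the sum score (training only).
        s_train = [score(X, i, feats) for i in train]
        y_train = [y[i] for i in train]
        ms, my = mean(s_train), mean(y_train)
        slope = (sum((s - ms) * (t - my) for s, t in zip(s_train, y_train))
                 / sum((s - ms) ** 2 for s in s_train))
        for i in test:
            preds[i] = my + slope * (score(X, i, feats) - ms)
    return pearson(preds, y)


# Synthetic data: features and phenotype are independent noise,
# so there is no true signal to predict.
rng = random.Random(0)
n_subjects, n_features = 40, 500
X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_subjects)]
y = [rng.gauss(0, 1) for _ in range(n_subjects)]

r_leaky = cross_validate(X, y, k_feats=10, n_folds=5, leaky=True)
r_proper = cross_validate(X, y, k_feats=10, n_folds=5, leaky=False)
print(f"leaky feature selection:  r = {r_leaky:.2f}")
print(f"proper feature selection: r = {r_proper:.2f}")
```

Because the phenotype here is unrelated to every feature, an honest estimate of prediction performance should sit near zero. The leaky variant typically reports a clearly positive correlation anyway, since the features were chosen with knowledge of the held-out subjects. With few subjects and many features, chance correlations are larger, which is consistent with the abstract's point that small datasets exacerbate the effect.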

Publisher

Nature Communications

Published On

Feb 28, 2024

Authors

Matthew Rosenblatt, Link Tejavibulya, Rongtao Jiang, Stephanie Noble, Dustin Scheinost
