Abstract
Materials datasets often contain redundant (highly similar) materials, which skews performance evaluations of machine learning (ML) models. This paper surveys cases of overestimated ML performance in materials science and proposes MD-HIT, a redundancy reduction algorithm. Applying MD-HIT to formation energy and band gap prediction, the study shows that controlling redundancy yields lower but more realistic performance estimates.
Publisher
npj Computational Materials
Published On
Oct 18, 2024
Authors
Qin Li, Nihang Fu, Sadman Sadeed Omee, Jianjun Hu
Tags
materials datasets
machine learning
redundancy reduction
MD-HIT
performance evaluations
formation energy
band gap prediction