This paper compares the performance of classical machine learning (support vector regression, SVR), few-shot learning classifiers (FSLC), and transformer models (MolBART) in drug discovery applications across datasets of varying size and diversity. The authors identify a 'Goldilocks zone' for each model type, in which dataset size and diversity determine the optimal algorithm choice: FSLC outperforms the others on small datasets, transformers excel on small-to-medium, diverse datasets, and classical models perform best on large datasets.
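To make the classical baseline concrete, the following is a minimal sketch of an SVR model trained on Morgan fingerprints of SMILES strings, assuming RDKit and scikit-learn; the toy molecules, activity values, and hyperparameters are illustrative assumptions, not the authors' actual pipeline or data.

```python
# Illustrative SVR baseline on molecular fingerprints (not the paper's exact setup).
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

def featurize(smiles_list, radius=2, n_bits=2048):
    """Convert SMILES strings to Morgan (ECFP-like) bit vectors."""
    fps = []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)
        fps.append(np.array(fp))
    return np.vstack(fps)

# Hypothetical toy data: SMILES strings with a continuous activity label.
smiles = ["CCO", "c1ccccc1", "CC(=O)O", "CCN", "CCCC", "c1ccncc1"]
activity = np.array([0.5, 1.2, 0.8, 0.6, 0.9, 1.1])

X = featurize(smiles)
X_train, X_test, y_train, y_test = train_test_split(
    X, activity, test_size=0.33, random_state=0
)

model = SVR(kernel="rbf", C=1.0)  # classical ML baseline
model.fit(X_train, y_train)
print("R^2 on held-out molecules:", r2_score(y_test, model.predict(X_test)))
```

In practice, a baseline like this becomes competitive only once the training set is large, which is consistent with the paper's finding that classical models dominate in the large-data regime.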
Publisher
Communications Chemistry
Published On
Jun 12, 2024
Authors
Scott H. Snyder, Patricia A. Vignaux, Mustafa Kemal Ozalp, Jacob Gerlach, Ana C. Puhl, Thomas R. Lane, John Corbett, Fabio Urbina, Sean Ekins
Tags
machine learning
drug discovery
SVR
FSLC
transformer models
dataset size
optimal performance