The Goldilocks paradigm: comparing classical machine learning, large language models, and few-shot learning for drug discovery applications

S. H. Snyder, P. A. Vignaux, et al.

This study by Scott H. Snyder, Patricia A. Vignaux, Mustafa Kemal Ozalp, Jacob Gerlach, Ana C. Puhl, Thomas R. Lane, John Corbett, Fabio Urbina, and Sean Ekins examines which machine learning models perform best in drug discovery. It shows how dataset size and diversity define a 'Goldilocks zone' for each of three model types: SVR, FSLC, and transformer models.

Abstract
This paper explores the performance of classical machine learning (SVR), few-shot learning (FSLC), and transformer models (MolBART) in drug discovery applications across various dataset sizes and diversities. The authors find a 'Goldilocks zone' for each model type, where dataset size and diversity determine optimal algorithm choice. FSLC outperforms others with small datasets; transformers excel with small-to-medium, diverse datasets; and classical models perform best with large datasets.
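The selection heuristic described in the abstract can be sketched as a small decision function. This is only an illustration, not the authors' method: the abstract gives no numeric cutoffs, so the `Thresholds` values below, the 0 to 1 diversity scale, and the fallback to SVR for medium-sized datasets with low diversity are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical cutoffs; the abstract does not state numeric values."""
    small_max: int = 50          # assumed upper bound (exclusive) for a "small" dataset
    medium_max: int = 240        # assumed upper bound (inclusive) for "small-to-medium"
    min_diversity: float = 0.5   # assumed diversity score (0 to 1) counted as "diverse"


def suggest_model(n_molecules: int, diversity: float,
                  t: Thresholds = Thresholds()) -> str:
    """Return the model family the abstract's heuristic would favor."""
    if n_molecules < 0:
        raise ValueError("n_molecules must be non-negative")
    if not 0.0 <= diversity <= 1.0:
        raise ValueError("diversity must lie in [0, 1]")

    # Small datasets: few-shot learning is reported to outperform the others.
    if n_molecules < t.small_max:
        return "FSLC"
    # Small-to-medium AND diverse datasets: transformers are reported to excel.
    if n_molecules <= t.medium_max and diversity >= t.min_diversity:
        return "transformer"
    # Large datasets: classical models are reported to perform best.
    # Assumption: medium-sized, low-diversity datasets also fall through to SVR.
    return "SVR"


if __name__ == "__main__":
    for n, d in [(20, 0.9), (120, 0.8), (120, 0.1), (5000, 0.9)]:
        print(n, d, suggest_model(n, d))
```

In practice the cutoffs would need to be calibrated against the datasets at hand, which is why they are exposed as parameters rather than hard-coded.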
Publisher
Communications Chemistry
Published On
Jun 12, 2024
Authors
Scott H. Snyder, Patricia A. Vignaux, Mustafa Kemal Ozalp, Jacob Gerlach, Ana C. Puhl, Thomas R. Lane, John Corbett, Fabio Urbina, Sean Ekins
Tags
machine learning
drug discovery
SVR
FSLC
transformer models
dataset size
optimal performance