Computer Science

The Goldilocks paradigm: comparing classical machine learning, large language models, and few-shot learning for drug discovery applications

S. H. Snyder, P. A. Vignaux, et al.

This study by Scott H. Snyder, Patricia A. Vignaux, Mustafa Kemal Ozalp, Jacob Gerlach, Ana C. Puhl, Thomas R. Lane, John Corbett, Fabio Urbina, and Sean Ekins examines which machine learning models perform best in drug discovery, and shows how dataset size and diversity create a 'Goldilocks zone' for each of SVR, FSLC, and transformer models.

Abstract

This paper compares the performance of classical machine learning (SVR), few-shot learning (FSLC), and transformer models (MolBART) in drug discovery applications across datasets of varying size and diversity. The authors find a 'Goldilocks zone' for each model type, in which dataset size and diversity determine the optimal choice of algorithm: FSLC outperforms the other approaches on small datasets, transformers excel on small-to-medium, diverse datasets, and classical models perform best on large datasets.
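The selection rule described in the abstract can be sketched as a small heuristic. This is only an illustration, not the authors' procedure: the function name `suggest_model`, the `Thresholds` class, every numeric cutoff, and the idea of a diversity score scaled to [0, 1] are assumptions introduced here, since the text above gives no concrete boundaries. The fallback to SVR for small-to-medium datasets that are not diverse is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical cutoffs; the abstract does not state numeric values."""
    small_max: int = 50        # assumed upper bound of a "small" dataset
    medium_max: int = 240      # assumed upper bound of "small-to-medium"
    diverse_min: float = 0.5   # assumed cutoff on a diversity score in [0, 1]


def suggest_model(n_compounds: int, diversity: float,
                  t: Thresholds = Thresholds()) -> str:
    """Return a model family following the abstract's qualitative rule."""
    if n_compounds < 0:
        raise ValueError("n_compounds must be non-negative")
    if not 0.0 <= diversity <= 1.0:
        raise ValueError("diversity must lie in [0, 1]")

    # Small datasets: few-shot learning is reported to do best.
    if n_compounds <= t.small_max:
        return "FSLC"
    # Small-to-medium and diverse datasets: transformers are reported to do best.
    if n_compounds <= t.medium_max and diversity >= t.diverse_min:
        return "MolBART"
    # Large datasets: classical models are reported to do best.
    # Assumption: non-diverse small-to-medium datasets also fall back to SVR.
    return "SVR"


if __name__ == "__main__":
    for n, d in [(20, 0.9), (150, 0.8), (150, 0.2), (5000, 0.9)]:
        print(n, d, suggest_model(n, d))
```

In practice the cutoffs would need to be calibrated against benchmark results for the target of interest rather than taken from the defaults shown here.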

Publisher

Communications Chemistry

Published On

Jun 12, 2024

Authors

Scott H. Snyder, Patricia A. Vignaux, Mustafa Kemal Ozalp, Jacob Gerlach, Ana C. Puhl, Thomas R. Lane, John Corbett, Fabio Urbina, Sean Ekins
