Leveraging large language models for predictive chemistry

Chemistry

K. M. Jablonka, P. Schwaller, et al.

This research by Kevin Maik Jablonka, Philippe Schwaller, Andres Ortega-Guerrero, and Berend Smit shows that GPT-3, fine-tuned on chemistry and materials science tasks, can match or outperform conventional machine learning models, particularly in low-data settings. The fine-tuned model can also perform inverse design, and its ease of use could change how machine learning is applied in these fields.
Abstract
Machine learning has transformed many fields and has recently found applications in chemistry and materials science. The small datasets commonly found in chemistry sparked the development of sophisticated machine learning approaches that incorporate chemical knowledge for each application and, therefore, require specialized expertise to develop. Here we show that GPT-3, a large language model trained on vast amounts of text extracted from the Internet, can easily be adapted to solve various tasks in chemistry and materials science by fine-tuning it to answer chemical questions in natural language with the correct answer. We compared this approach with dedicated machine learning models for many applications spanning the properties of molecules and materials to the yield of chemical reactions. Surprisingly, our fine-tuned version of GPT-3 can perform comparably to or even outperform conventional machine learning techniques, in particular in the low-data limit. In addition, we can perform inverse design by simply inverting the questions. The ease of use and high performance, especially for small datasets, can impact the fundamental approach to using machine learning in the chemical and material sciences. In addition to a literature search, querying a pre-trained large language model might become a routine way to bootstrap a project by leveraging the collective knowledge encoded in these foundation models, or to provide a baseline for predictive tasks.
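The core idea in the abstract is to pose a chemistry task as a natural-language question with the property value as the answer, and to get inverse design by swapping question and answer. That idea can be sketched as prompt/completion pairs of the kind used for fine-tuning. This is a minimal illustration under stated assumptions. The question templates, the `###`/`@@@` end markers, the helper names (`forward_example`, `inverse_example`, `to_jsonl`), and the toy labels are all hypothetical and are not the authors' exact format. No model or API is called.

```python
import json
from typing import Iterable

# Hypothetical end markers for prompts and completions (an assumption,
# not the exact format used in the paper).
PROMPT_END = "###"
COMPLETION_END = "@@@"


def forward_example(smiles: str, label: str, prop: str) -> dict:
    """Property prediction: ask for the property of a given molecule."""
    return {
        "prompt": f"What is the {prop} of {smiles}?{PROMPT_END}",
        "completion": f" {label}{COMPLETION_END}",
    }


def inverse_example(smiles: str, label: str, prop: str) -> dict:
    """Inverse design: invert the question to ask for a molecule with a given property."""
    return {
        "prompt": f"What is a molecule with {label} {prop}?{PROMPT_END}",
        "completion": f" {smiles}{COMPLETION_END}",
    }


def to_jsonl(records: Iterable[dict]) -> str:
    """Serialize prompt/completion records as JSON Lines, one object per line."""
    return "\n".join(json.dumps(r) for r in records)


# Toy (SMILES, label) pairs with hypothetical coarse labels.
data = [("CCO", "high"), ("c1ccccc1", "low")]
forward = [forward_example(s, y, "solubility") for s, y in data]
inverse = [inverse_example(s, y, "solubility") for s, y in data]
print(to_jsonl(forward + inverse))
```

The same labelled data yields both directions. In the forward set the molecule appears in the prompt, and in the inverse set it becomes the completion. This mirrors the abstract's "inverting the questions" without any change to the underlying model.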
Publisher
Nature Machine Intelligence
Published On
Feb 06, 2024
Authors
Kevin Maik Jablonka, Philippe Schwaller, Andres Ortega-Guerrero, Berend Smit
Tags
GPT-3
chemistry
materials science
machine learning
inverse design
natural language processing
low-data scenarios