Computer Science

Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers

C. Si, D. Yang, et al.

This study by Chenglei Si, Diyi Yang, and Tatsunori Hashimoto examines whether large language models can generate novel, expert-level research ideas. Ideas produced by an LLM ideation agent were rated significantly more novel than those written by human NLP experts, though they were judged slightly weaker on feasibility.

Abstract

Recent advancements in large language models (LLMs) have sparked optimism about their potential to accelerate scientific discovery, with a growing number of works proposing research agents that autonomously generate and validate new ideas. Despite this, no evaluations have shown that LLM systems can take the very first step of producing novel, expert-level ideas, let alone perform the entire research process. We address this by establishing an experimental design that evaluates research idea generation while controlling for confounders and performs the first head-to-head comparison between expert NLP researchers and an LLM ideation agent. By recruiting over 100 NLP researchers to write novel ideas and blind reviews of both LLM and human ideas, we obtain the first statistically significant conclusion on current LLM capabilities for research ideation: we find LLM-generated ideas are judged as more novel (p < 0.05) than human expert ideas while being judged slightly weaker on feasibility. Studying our agent baselines closely, we identify open problems in building and evaluating research agents, including failures of LLM self-evaluation and their lack of diversity in generation. Finally, we acknowledge that human judgements of novelty can be difficult, even by experts, and propose an end-to-end study design which recruits researchers to execute these ideas into full projects, enabling us to study whether these novelty and feasibility judgements result in meaningful differences in research outcome.

Authors

Chenglei Si, Diyi Yang, Tatsunori Hashimoto

DOI

https://doi.org/10.48550/arxiv.2409.04109
