LARGE LANGUAGE MODELS CANNOT SELF-CORRECT REASONING YET

Computer Science

J. Huang, X. Chen, et al.

This paper critically examines self-correction in Large Language Models, focusing on intrinsic self-correction, in which a model tries to fix its own answers without external feedback. The authors find that LLMs often struggle to self-correct their reasoning and that performance can even degrade after a correction attempt. They also offer directions for future research and applications. The research was conducted by Jie Huang, Xinyun Chen, Swaroop Mishra, Huaixiu Steven Zheng, Adams Wei Yu, Xinying Song, and Denny Zhou.
Abstract
Large Language Models (LLMs) have emerged as a groundbreaking technology with their unparalleled text generation capabilities across various applications. Nevertheless, concerns persist regarding the accuracy and appropriateness of their generated content. A contemporary methodology, self-correction, has been proposed as a remedy to these issues. Building upon this premise, this paper critically examines the role and efficacy of self-correction within LLMs, shedding light on its true potential and limitations. Central to our investigation is the notion of intrinsic self-correction, whereby an LLM attempts to correct its initial responses based solely on its inherent capabilities, without the crutch of external feedback. In the context of reasoning, our research indicates that LLMs struggle to self-correct their responses without external feedback, and at times, their performance even degrades after self-correction. Drawing from these insights, we offer suggestions for future research and practical applications in this field.
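To make the idea of intrinsic self-correction concrete, the following is a minimal sketch of how such a loop and its before/after evaluation might look. It is not the authors' code: the `Model` interface, the prompt wording, the function names, and the `toy_model` stub are all assumptions made for illustration. The stub simply changes its answer whenever it is asked to review, so the numbers it produces are not results from the paper.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: a model is any function from prompt text to answer text.
Model = Callable[[str], str]


def intrinsic_self_correct(model: Model, question: str, rounds: int = 1) -> List[str]:
    """Return the initial answer followed by one revised answer per round.

    The model only ever sees the question and its own previous answer;
    no label, tool output, or other external feedback is provided.
    """
    answers = [model(f"Question: {question}\nAnswer:")]
    for _ in range(rounds):
        review_prompt = (
            f"Question: {question}\n"
            f"Previous answer: {answers[-1]}\n"
            "Review the previous answer, find any problems, and give a final answer."
        )
        answers.append(model(review_prompt))
    return answers


def accuracy_before_after(
    model: Model, dataset: List[Tuple[str, str]], rounds: int = 1
) -> Tuple[float, float]:
    """Compare accuracy of initial answers with accuracy after self-correction.

    Labels are used only for scoring here, never shown to the model.
    """
    before = after = 0
    for question, label in dataset:
        answers = intrinsic_self_correct(model, question, rounds)
        before += answers[0].strip() == label
        after += answers[-1].strip() == label
    n = len(dataset)
    return before / n, after / n


def toy_model(prompt: str) -> str:
    """Illustrative stub (an assumption): changes its answer whenever asked to review."""
    return "5" if "Previous answer:" in prompt else "4"


before, after = accuracy_before_after(toy_model, [("What is 2 + 2?", "4")])
print(before, after)
```

In this sketch the ground-truth label is kept out of the review prompt, which is what distinguishes the intrinsic setting from one that relies on external feedback.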
Publisher
International Conference on Learning Representations (ICLR) 2024
Authors
Jie Huang, Xinyun Chen, Swaroop Mishra, Huaixiu Steven Zheng, Adams Wei Yu, Xinying Song, Denny Zhou
Tags
Large Language Models
self-correction
intrinsic self-correction
reasoning
external feedback
performance degradation
evaluation