Computer Science

LARGE LANGUAGE MODELS CANNOT SELF-CORRECT REASONING YET

J. Huang, X. Chen, et al.

This paper critically examines self-correction in Large Language Models, focusing on intrinsic self-correction, in which a model tries to fix its own answers without external feedback. The authors find that LLMs often struggle to self-correct during reasoning and can even perform worse after attempting fixes, and they offer directions for future research and applications. This research was conducted by Jie Huang, Xinyun Chen, Swaroop Mishra, Huaixiu Steven Zheng, Adams Wei Yu, Xinying Song, and Denny Zhou.

Abstract

Large Language Models (LLMs) have emerged as a groundbreaking technology with their unparalleled text generation capabilities across various applications. Nevertheless, concerns persist regarding the accuracy and appropriateness of their generated content. A contemporary methodology, self-correction, has been proposed as a remedy to these issues. Building upon this premise, this paper critically examines the role and efficacy of self-correction within LLMs, shedding light on its true potential and limitations. Central to our investigation is the notion of intrinsic self-correction, whereby an LLM attempts to correct its initial responses based solely on its inherent capabilities, without the crutch of external feedback. In the context of reasoning, our research indicates that LLMs struggle to self-correct their responses without external feedback, and at times, their performance even degrades after self-correction. Drawing from these insights, we offer suggestions for future research and practical applications in this field.
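As a rough illustration of what "intrinsic" means here, the loop below sketches a model answering, critiquing its own answer, and then revising it, with no ground-truth label or external tool in the loop. This is a minimal sketch under stated assumptions: `generate` is a hypothetical prompt-to-text callable, the prompt wording is illustrative rather than taken from the paper, and `stub_model` is a contrived stand-in, not an experimental result.

```python
from typing import Callable, List

# Hypothetical interface: any function that maps a prompt string to a completion.
Generate = Callable[[str], str]


def intrinsic_self_correct(question: str, generate: Generate, rounds: int = 1) -> List[str]:
    """Return the initial answer followed by one revised answer per round.

    No external feedback is used: the critique comes from the same model.
    """
    answers = [generate(f"Question: {question}\nAnswer:")]
    for _ in range(rounds):
        # Step 1: the model reviews its own previous answer.
        critique = generate(
            f"Question: {question}\nPrevious answer: {answers[-1]}\n"
            "Review the previous answer and list any problems with it."
        )
        # Step 2: the model revises based only on its own critique.
        revised = generate(
            f"Question: {question}\nPrevious answer: {answers[-1]}\n"
            f"Problems found: {critique}\n"
            "Based on these problems, give an improved answer."
        )
        answers.append(revised)
    return answers


def stub_model(prompt: str) -> str:
    """Contrived stand-in for an LLM (assumption, for illustration only)."""
    if "improved answer" in prompt:
        return "5"  # the revision replaces a correct answer with a wrong one
    if "list any problems" in prompt:
        return "The arithmetic may be wrong."
    return "4"  # initial answer, which happens to be correct


history = intrinsic_self_correct("What is 2 + 2?", stub_model)
print(history)
```

With the stub, the initial answer is correct and the revision is not, which mirrors the kind of degradation the abstract describes. Because nothing in the loop checks the answer against a reference, the procedure itself cannot tell whether a revision helped or hurt.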

Publisher

International Conference on Learning Representations (ICLR) 2024

Authors

Jie Huang, Xinyun Chen, Swaroop Mishra, Huaixiu Steven Zheng, Adams Wei Yu, Xinying Song, Denny Zhou

DOI

https://doi.org/10.48550/arXiv.2310.01798

Related Publications

Computer Science

Confidence in the Reasoning of Large Language Models

Y. Pawitan and C. Holmes

Computer Science

Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation

J. Liu, S. Chunqiu, et al.

Computer Science

AI-AI bias: Large language models favor communications generated by large language models

W. Laurito, B. Davis, et al.

Linguistics and Languages

DISSOCIATING LANGUAGE AND THOUGHT IN LARGE LANGUAGE MODELS

K. Mahowald, I. A. Blank, et al.
