Computer Science
Harvard Data Science Review
Confidence in the Reasoning of Large Language Models
Y. Pawitan and C. Holmes
This study by Yudi Pawitan and Chris Holmes assesses the confidence of large language models (LLMs) in two ways. The qualitative measure is whether a model persists with its answer when prompted to reconsider. The quantitative measure is the confidence score the model reports for itself. The models tested are GPT4o, GPT4-turbo, and Mistral, on tasks involving causal judgment, formal fallacies, and probability puzzles. All perform above chance, but the stability of their answers varies. They show a strong tendency to overstate their confidence, and their confidence signals are not internally coherent.
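As a rough illustration of the two kinds of measurement described above, here is a minimal Python sketch. It assumes each question yields three things: whether the first answer was correct, the model's self-reported confidence on a 0 to 100 scale, and whether the model kept its answer after being asked to reconsider. The names `Trial` and `summarize` are hypothetical, and so are the toy data. Overconfidence is taken here, as a simple assumed measure, to be mean stated confidence minus accuracy. This is not the authors' exact analysis.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    # Hypothetical record of one question put to a model.
    correct: bool       # was the model's first answer correct?
    confidence: float   # self-reported confidence, assumed on a 0-100 scale
    kept_answer: bool   # did the answer stay the same after "please reconsider"?


def summarize(trials):
    """Return accuracy, mean stated confidence, their gap, and answer stability."""
    n = len(trials)
    accuracy = sum(t.correct for t in trials) / n
    # Rescale the 0-100 confidence to 0-1 so it is comparable with accuracy.
    mean_confidence = sum(t.confidence for t in trials) / n / 100
    # Fraction of answers that survived the prompt to reconsider.
    stability = sum(t.kept_answer for t in trials) / n
    return {
        "accuracy": accuracy,
        "mean_confidence": mean_confidence,
        # A positive gap means stated confidence exceeds actual accuracy.
        "overconfidence": mean_confidence - accuracy,
        "stability": stability,
    }


# Toy, made-up data for illustration only.
trials = [
    Trial(correct=True, confidence=95, kept_answer=True),
    Trial(correct=False, confidence=90, kept_answer=False),
    Trial(correct=True, confidence=100, kept_answer=True),
    Trial(correct=False, confidence=85, kept_answer=True),
]
print(summarize(trials))
```

With these toy numbers, accuracy is 0.5 and mean stated confidence is 0.925. The positive gap of roughly 0.43 is the kind of pattern the summary describes as overstated confidence. A stability of 0.75 means one of the four answers changed after the prompt to reconsider.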

