Computer Science

Harvard Data Science Review

Confidence in the Reasoning of Large Language Models

Y. Pawitan and C. Holmes

The research was conducted by Yudi Pawitan and Chris Holmes. It assesses LLM confidence in two ways: qualitatively, by whether a model persists in its answer when prompted to reconsider, and quantitatively, by self-reported confidence scores. The evaluation covers GPT4o, GPT4-turbo, and Mistral on causal judgment, formal fallacies, and probability puzzles. Findings show performance above chance but variable answer stability, a strong tendency to overstate confidence, and a lack of internally coherent confidence signals.
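As a rough illustration of the two kinds of measurement described above, the sketch below computes an answer-persistence rate and an overconfidence gap (mean self-reported confidence minus accuracy) from a handful of trials. The `Trial` record, the `summarize` helper, the rescaling of confidence to [0, 1], and the toy values are all assumptions made for illustration; they are not the authors' data, code, or exact metrics.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One question posed to a model (hypothetical record layout)."""
    correct: bool       # was the initial answer correct?
    kept_answer: bool   # did the model keep its answer when asked to reconsider?
    confidence: float   # self-reported confidence, assumed rescaled to [0, 1]


def summarize(trials):
    """Return accuracy, persistence rate, mean confidence, and their gap."""
    n = len(trials)
    if n == 0:
        raise ValueError("need at least one trial")
    accuracy = sum(t.correct for t in trials) / n
    persistence = sum(t.kept_answer for t in trials) / n
    mean_confidence = sum(t.confidence for t in trials) / n
    return {
        "accuracy": accuracy,
        "persistence": persistence,
        "mean_confidence": mean_confidence,
        # Positive gap: stated confidence exceeds observed accuracy.
        "overconfidence_gap": mean_confidence - accuracy,
    }


# Made-up toy trials, for illustration only.
toy_trials = [
    Trial(correct=True, kept_answer=True, confidence=0.9),
    Trial(correct=False, kept_answer=True, confidence=0.9),
    Trial(correct=True, kept_answer=False, confidence=1.0),
    Trial(correct=False, kept_answer=False, confidence=0.8),
]

summary = summarize(toy_trials)
print(summary)
```

In this toy setup, a gap well above zero would correspond to the overstated confidence the summary describes, while a persistence rate that does not track correctness would be one sign of incoherent confidence signals.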

