The Illusion of the Illusion of Thinking


C. Opus and A. Lawson

Shojaee et al. (2025) report that Large Reasoning Models suffer "accuracy collapse" on complex planning puzzles. C. Opus and A. Lawson challenge that interpretation, arguing that the reported failures largely reflect experimental design limitations rather than genuine reasoning failures.
Abstract
Shojaee et al. (2025) report that Large Reasoning Models (LRMs) exhibit "accuracy collapse" on planning puzzles beyond certain complexity thresholds. We demonstrate that their findings primarily reflect experimental design limitations rather than fundamental reasoning failures. Our analysis reveals three critical issues: (1) Tower of Hanoi experiments systematically exceed model output token limits at reported failure points, with models explicitly acknowledging these constraints in their outputs; (2) The authors' automated evaluation framework fails to distinguish between reasoning failures and practical constraints, leading to misclassification of model capabilities; (3) Most concerningly, their River Crossing benchmarks include mathematically impossible instances for N ≥ 6 due to insufficient boat capacity, yet models are scored as failures for not solving these unsolvable problems. When we control for these experimental artifacts, by requesting generating functions instead of exhaustive move lists, preliminary experiments across multiple models indicate high accuracy on Tower of Hanoi instances previously reported as complete failures. These findings highlight the importance of careful experimental design when evaluating AI reasoning capabilities.
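To make the contrast between an exhaustive move list and a generating function concrete, here is a minimal Python sketch. It is an illustration only, not the authors' experimental code: the function names, the peg labels "A", "B", "C", and the validity checker are all assumptions introduced for this example. It relies on the standard result that the optimal Tower of Hanoi solution for N disks takes 2^N - 1 moves, so a fully enumerated answer grows exponentially with N while the function that generates it stays a few lines long.

```python
from typing import Iterable, Iterator, Tuple

Move = Tuple[int, str, str]  # (disk, from_peg, to_peg); hypothetical representation


def hanoi_moves(n: int, src: str = "A", aux: str = "B", dst: str = "C") -> Iterator[Move]:
    """Lazily yield the optimal move sequence for n disks from src to dst."""
    if n == 0:
        return
    # Move the top n-1 disks out of the way, onto the auxiliary peg.
    yield from hanoi_moves(n - 1, src, dst, aux)
    # Move the largest disk directly to the destination.
    yield (n, src, dst)
    # Move the n-1 disks from the auxiliary peg onto the destination.
    yield from hanoi_moves(n - 1, aux, src, dst)


def optimal_move_count(n: int) -> int:
    """Closed form for the length of the optimal solution: 2^n - 1."""
    return 2 ** n - 1


def is_valid_solution(n: int, moves: Iterable[Move]) -> bool:
    """Replay the moves and check the rules and the final state."""
    pegs = {"A": list(range(n, 0, -1)), "B": [], "C": []}
    for disk, a, b in moves:
        if not pegs[a] or pegs[a][-1] != disk:
            return False  # the named disk is not on top of the source peg
        if pegs[b] and pegs[b][-1] < disk:
            return False  # a larger disk would land on a smaller one
        pegs[b].append(pegs[a].pop())
    return pegs["C"] == list(range(n, 0, -1))


if __name__ == "__main__":
    # The generator is constant in size; the enumerated output is not.
    for n in (5, 10, 15, 20):
        print(f"N={n:2d}: {optimal_move_count(n):,} moves in the full list")
    print("N=10 solution valid:", is_valid_solution(10, hanoi_moves(10)))
```

A model asked to print every move must emit output proportional to 2^N - 1, whereas a model asked for a function like `hanoi_moves` can demonstrate the same algorithmic knowledge in a short, checkable answer.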
Publisher: arXiv
Published On: Jun 10, 2025
Authors: C. Opus, A. Lawson
Tags: Large Reasoning Models, accuracy collapse, planning puzzles, experimental design, mathematical constraints