Psychology
Why do we think? The dynamics of spontaneous thought reveal its functions
J. N. Mildner and D. I. Tamir
The study asks why humans devote substantial cognitive resources to spontaneous thought (mind wandering, daydreaming, creative ideation). It tests two hypotheses about its function: (1) memory optimization—spontaneous thought prioritizes episodic details that facilitate memory consolidation and semantic abstraction; and (2) current concerns—spontaneous thought prioritizes content relevant to ongoing goal pursuits. Leveraging parallels with semantic fluency and memory replay, the authors evaluate whether the timing of topic jumps in free thought reveals prioritization of episodic detail and goal-relevant content.
Prior work indicates a large fraction of waking thought is spontaneous and episodic in nature. Neuroscience research shows that memory replay—reactivation of hippocampal traces during rest—supports consolidation of episodic and semantic information. Spontaneous thought occurs at rest and shares neural dynamics with replay, suggesting a role in memory optimization via variable, recombinatory sequences that decorrelate episodes and support abstraction. Concurrently, theories of current concerns posit that spontaneous thought frequently features goal-relevant content; cues and importance of goals increase future-oriented and goal-related thinking. Experience-sampling shows a high base rate of goal relevance, and resting-state default network activity relates to subsequent self-referential performance. Together, these literatures motivate testing whether spontaneous thought dynamics prioritize episodic detail (supporting memory) and goal relevance (supporting goal pursuit).
Design: Observational analysis of spontaneous thought using a 2-minute Think Aloud task (spoken or typed) collected online across three data collections (April 2020–May 2021). Participants reported their ongoing thoughts continuously without constraints.
Participants: Initially 1,679 participants provided 3,359 responses. After predefined exclusions (word-count thresholds, audio quality/silence, non-English/gibberish/copied text/background speech), 1,524 participants contributed 2,901 responses. Demographics: mean age 39.8 (range 19–79); 56.6% women; 1.3% genderqueer/nonbinary/other; 65.7% White, 15.9% Asian/Asian American, 14.1% Black, 7.0% Latinx, 1.0% Native American. Recruitment sources: Princeton lists/social media (volunteer; n=371; 532 responses), University of Chicago CDR (paid; n=285; 488 responses), Prolific (paid; n=1,023; 2,311 responses; sample matched to U.S. census for sex, age, ethnicity).
Data acquisition: Audio was transcribed with OpenAI Whisper; validation against human-edited Temi transcripts showed a low word error rate (8%). Participants could alternatively type their thoughts. Of 3,208 included responses at an intermediate step, 819 were audio (mean 242.73 words, SD 79.08) and 2,389 were written (mean 98.15 words, SD 49.74).
Text processing pipeline: (1) Transcription (Whisper). (2) Identification of units of thought via automated parsing of independent clauses using spaCy; coordinating conjunctions with accompanying verbs defined independent clauses as minimal units. Validation: human coders on 132 transcripts; human–human Fleiss’ kappa = 0.58; human–automated kappa = 0.60 for boundary similarity, indicating comparable reliability. (3) Topic identification via hierarchical clustering (scikit-learn) to assign units to topics and detect topic jumps (defined as significant changes in time, place, and/or situation). Validation on 91 transcripts: human–human kappa = 0.41; human–automated kappa = 0.30.
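The transcription check mentioned under data acquisition reports a word error rate (WER). As a rough illustration of what that number measures, the sketch below computes WER as the word-level edit distance between a reference transcript and an automated one, divided by the reference length. The function name, the lowercasing, and the whitespace tokenization are assumptions made for this example; the summary does not say how the authors normalized text before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    # Assumption: lowercase and split on whitespace; real pipelines often
    # also strip punctuation and normalize numerals.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word dropped (deletion)
                curr[j - 1] + 1,     # extra hypothesis word (insertion)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    human = "i was thinking about my trip to the lake last summer"
    machine = "i was thinking about my trip to lake last summer"
    print(f"WER = {word_error_rate(human, machine):.3f}")
```

In the toy example one of eleven reference words is dropped, so the rate is 1/11. A reported 8% would correspond to roughly eight word-level edits per hundred reference words.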
Behavioral structure: Participants produced on average 10.82 units of thought (SD 7.78) across 5.6 topics (SD 4.31), with 3.04 thoughts per topic (SD 1.63).
Measures:
- Episodic detail: Automated scoring tool for the Autobiographical Interview identified episodic (internal) and nonepisodic (external) details per sentence (van Genugten & Schacter, 2024). Proportion episodic detail = episodic details / total details for each thought unit.
- Current concern relevance: COVID-19 used as a universal current concern during collection period. A curated dictionary of pandemic-related words yielded percent pandemic-related words per unit. Participants rated COVID-19 concern on a 7-point scale.
Modeling: Mixed linear models tested effects of thought position within a topic on (a) episodic detail and (b) pandemic-related content. Thought position coded relative to a topic jump (0 = at jump; positive values after, negative values before). Models included random intercepts for data collection and participants; no random slopes to avoid overfitting. For current concerns, an additional model included fixed effects of thought position, COVID-19 concern, and their interaction.
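The two measures and the position coding described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: the word list `PANDEMIC_WORDS` is a hypothetical stand-in for the curated dictionary, the tokenization is naive, and the exact treatment of units before a jump (coded here as -1 for the last unit before the jump, -2 for the one before that, and so on) is an assumption, since the summary only states that 0 marks the jump.

```python
from itertools import groupby

# Hypothetical mini-dictionary; the study's curated word list is not reproduced here.
PANDEMIC_WORDS = {"covid", "pandemic", "quarantine", "vaccine", "lockdown"}


def proportion_episodic(episodic: int, nonepisodic: int):
    """Episodic details / total details for one unit; None if nothing was scored."""
    total = episodic + nonepisodic
    return episodic / total if total else None


def percent_pandemic_words(unit_text: str) -> float:
    """Percentage of tokens in a unit that appear in PANDEMIC_WORDS."""
    tokens = [t.strip(".,!?;:\"'").lower() for t in unit_text.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return 0.0
    return 100.0 * sum(t in PANDEMIC_WORDS for t in tokens) / len(tokens)


def code_positions(topic_labels):
    """Return one (after_jump, before_jump) pair per unit of thought.

    after_jump:  0 for the first unit of a topic that follows a jump, then 1, 2, ...
    before_jump: -1 for the last unit before the next jump, then -2, -3, ...
    Units in the first topic have no preceding jump and units in the last
    topic have no following jump, so that side is None (an assumption).
    """
    # Consecutive identical labels form one topic; any label change is a jump.
    run_lengths = [len(list(group)) for _, group in groupby(topic_labels)]
    positions = []
    for run_index, length in enumerate(run_lengths):
        for i in range(length):
            after = i if run_index > 0 else None
            before = i - length if run_index < len(run_lengths) - 1 else None
            positions.append((after, before))
    return positions


if __name__ == "__main__":
    labels = ["work", "work", "covid", "covid", "covid", "dinner"]
    for label, pos in zip(labels, code_positions(labels)):
        print(label, pos)
```

Each unit then contributes one row (participant, data collection, position, outcome) to a mixed model of the kind described above. Note that this sketch treats a return to an earlier topic label as a new jump.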
- Memory optimization (episodic detail): Thought position significantly predicted episodic detail. After topic jumps, episodic detail was highest and decreased with each subsequent thought (B = −0.019, SE = 0.005, P < 0.001). Leading up to a topic jump, episodic detail decreased, with the lowest level immediately before the jump (B = −0.022, SE = 0.005, P < 0.001). This indicates that topic jumps occur as episodic detail wanes, consistent with prioritizing episodic detail (memory optimization).
- Current concerns (pandemic relevance): After topic jumps, pandemic-related content was highest and decreased over subsequent thoughts (β = −0.080, SE = 0.005, P < 0.001). Contrary to predictions, pandemic-related content increased, rather than decreased, in the thoughts leading up to a topic jump (β = 0.064, SE = 0.005, P < 0.001).
- Moderation by COVID-19 concern: Greater individual concern amplified position effects after the jump (interaction β = −0.034, SE = 0.014, P = 0.015), showing steeper decreases. No significant interaction before the jump (β = 0.011, SE = 0.015, P = 0.469). Main effects: higher concern predicted more pandemic-related content both before (β = 0.039, SE = 0.011, P < 0.001) and after jumps (β = 0.043, SE = 0.008, P < 0.001). Main effects of position remained significant: decrease after jumps (β = −0.049, SE = 0.013, P < 0.001) and increase before jumps (β = 0.042, SE = 0.013, P < 0.001).
Overall, spontaneous thought dynamics prioritize both episodic detail and current-concern content, though the pre-jump increase for current concerns deviated from the initial threshold-based prediction.
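One way to read the moderation result is as a conditional slope: the effect of thought position at a given level of concern equals the position coefficient plus the interaction coefficient times that level. The sketch below applies this to the after-jump coefficients reported above. It assumes the concern rating was standardized (mean 0, SD 1) before modeling, so that the position coefficient is the slope at average concern; the summary does not state how predictors were scaled, so the resulting numbers are illustrative only.

```python
def conditional_slope(position_beta: float, interaction_beta: float, concern_z: float) -> float:
    """Slope of thought position at a given (assumed standardized) concern level."""
    return position_beta + interaction_beta * concern_z


# After-jump coefficients as reported in the moderation model.
AFTER_JUMP = {"position": -0.049, "interaction": -0.034}

if __name__ == "__main__":
    # Assumption: z = -1, 0, +1 stand for low, average, and high concern.
    for z in (-1.0, 0.0, 1.0):
        slope = conditional_slope(AFTER_JUMP["position"], AFTER_JUMP["interaction"], z)
        print(f"concern z = {z:+.0f}: position slope = {slope:.3f}")
```

Under that scaling assumption, the post-jump decline would be about −0.015 per thought at low concern, −0.049 at average concern, and −0.083 at high concern, which is the sense in which higher concern shows a steeper decrease.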
Analyzing think-aloud streams with automated NLP/ML revealed functional structure in spontaneous thought. The timing of topic jumps matched a memory-optimization account: when episodic detail within a topic waned, thoughts transitioned to a new topic rich in episodic details, consistent with processes that facilitate decorrelation of episodes and support semantic abstraction akin to memory replay. Current concerns also shaped thought dynamics: new topics after jumps were goal-relevant (e.g., COVID-19) and then waned in relevance, especially for individuals with higher concern. However, goal-relevant content increased prior to jumps, suggesting that salient concerns act as attractor states, or drive rumination-like dynamics, that draw thought back before it transitions to a new topic. Thus, spontaneous thought appears multifaceted, simultaneously supporting memory functions and maintaining alignment with ongoing goals, though the mechanisms for goal relevance may differ from a simple threshold-based foraging rule.
This work bridges theoretical models of spontaneous thought with empirical analysis using large-scale think-aloud data and NLP. It shows that spontaneous thought dynamics prioritize episodic detail—consistent with optimizing memory consolidation—and also emphasize current concerns, keeping minds oriented toward goals. The study introduces an automated pipeline for individuating thoughts and detecting topic jumps, enabling scalable analysis. Future directions include: (1) mechanistic linkage of episodic detail dynamics to specific recent experiences and later recall; (2) individualized assessments of diverse current concerns beyond COVID-19 and the role of affect (e.g., anxiety) and rumination; (3) refining automated methods to improve segmentation and clustering reliability; and (4) formal semantic foraging models to estimate optimal patch-switch thresholds and new metrics (semantic distance, thought speed, topic prevalence, affective dynamics) to explain individual differences in creativity and psychopathology-linked thought patterns.
- Automated and human segmentation/clustering showed only moderate to fair reliability (units: human–human kappa = 0.58; human–automated = 0.60; topic jumps: human–human = 0.41; human–automated = 0.30), indicating room for improvement in defining and detecting units and topic boundaries.
- Effect sizes were small, potentially due to brief think-aloud duration and use of NLP tools optimized for longer, structured text.
- Current concerns were operationalized via COVID-19 only, an anxiety-laden topic during data collection; generalization to other idiosyncratic concerns is uncertain.
- Some data were transcribed automatically; while validated, ASR errors may introduce noise.
- Mixed-effects models included only random intercepts to avoid convergence issues; more complex random structures were not tested.
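The agreement figures in the first limitation are chance-corrected statistics. As a minimal sketch of how such a number arises, the code below computes Cohen's kappa for two raters who each mark, for every candidate position in a transcript, whether a boundary falls there. Cohen's kappa is used here as a simple two-rater stand-in: the summary reports Fleiss' kappa for the human coders and does not specify the variant used for the human versus automated comparison. The boundary labels in the example are made up.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b) -> float:
    """Chance-corrected agreement between two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)

    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Expected agreement if each rater labeled at random with their own base rates.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical labels: 1 = boundary after this clause, 0 = no boundary.
    human = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
    automated = [1, 0, 0, 0, 0, 1, 1, 0, 0, 0]
    print(f"kappa = {cohens_kappa(human, automated):.2f}")
```

In this toy case the raters agree on 8 of 10 positions, but because most positions carry no boundary, 58% agreement is expected by chance, which pulls kappa well below the raw agreement rate.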