Optimizing self-organized study orders: combining refutations and metacognitive prompts improves the use of interleaved practice

E. Onan, F. Biwer, et al.

This study by Erdem Onan, Felicitas Biwer, Roman Abel, Wisnu Wiradhany, and Anique de Bruin tests whether refutation texts combined with metacognitive prompts help students shift from blocked to interleaved practice when they organize their own study order. The intervention increased the use of interleaving, and more interleaving predicted better classification accuracy in visual category learning tasks.

Introduction

The study addresses why students rarely adopt interleaved practice when they self-organize their study order. Interleaving is a desirable difficulty: it initially increases effort and depresses immediate performance but enhances long-term learning and transfer. Prior interventions promoting interleaving often used experimenter-controlled sequences in which learners merely chose between blocked and interleaved orders globally, which does not reflect authentic self-study, where item-by-item decisions are required. Misconceptions about strategy effectiveness (e.g., the belief that blocking is superior) and misinterpretation of effort (perceiving high effort as a sign of poor learning) may deter use of interleaving. The authors test whether combining refutation texts (to challenge erroneous beliefs and warn about misleading on-task experiences) with visual metacognitive prompts (to help learners monitor effort and learning trajectories and compare strategies) increases self-regulated interleaving and benefits learning when students create their own study order. The preregistered research questions and expectations were:

  • RQ1 (Beliefs): Before the intervention, blocked practice would be perceived as more effective; over time, the perceived effectiveness of interleaving would increase more in the intervention condition than in the control condition.
  • RQ2 (Behavior): Before the intervention, participants would block more than they interleave; over time, use of interleaving would increase more in the intervention condition than in the control condition.
  • RQ3 (Outcomes): Greater interleaving (more switches) would predict higher classification accuracy.

Literature Review

Prior work shows that learners often prefer blocking when sequencing exemplars for inductive category learning. Tauber et al. observed frequent blocking under free choice, with interleaving rarely exceeding 30%. Kornell and Vaughn reported about 47% blocking, which is still above chance but lower than in Tauber et al. Using imagined scheduling, Yan et al. found that most participants would primarily block. Beyond perceptual categories, Hartwig et al. showed that students scheduled math practice largely in blocked fashion, though some interleaving occurred near test time; even when students knew about the benefits of interleaving, blocking predominated.

Frameworks such as Study Smart and KBCP (knowledge, belief, commitment, and planning) argue that students need accurate knowledge and beliefs about strategies. However, misconceptions and misleading on-task experiences (effective strategies often feel effortful) lead learners to misinterpret effort and avoid desirable difficulties. Previous interventions in experimenter-controlled contexts used minimal theory-based instructions, metacognitive prompts, and/or performance feedback; Sun et al. showed durable shifts toward interleaving after instruction plus feedback. The authors advocate combining theory- and experience-based methods: refutation texts to directly counter misconceptions and preempt misinterpretation of effort, and experience-based metacognitive prompts to help learners interpret effort and learning trajectories and internalize the benefits.

Prior observational work on self-organized schedules indicates modest increases in interleaving under specific motivations or task characteristics (e.g., category similarity), but students still largely block. Evidence for performance benefits of interleaving in self-controlled contexts is mixed, possibly because item-level control adds cognitive load. This motivates testing an intervention directly in authentic, self-organized sequencing tasks and assessing both strategy use and learning outcomes.

Methodology

Design: Preregistered 3 (time: pre-intervention, post-intervention, delayed transfer) × 2 (condition: intervention vs control; between-subjects) mixed design. In the pre- and post-intervention tasks, participants learned the painting styles of artists; in the delayed-transfer task, they learned bird species.

Participants: 96 undergraduates recruited at Maastricht University (≈20 years; 52 female, 42 male, 2 non-binary). Five did not complete Session 2 within the preregistered window and were excluded; two had partial data on Day 1 and were excluded only from affected analyses, yielding n=91 for main analyses. Ethics approval: Maastricht University REC/2022/091; compensation €15. Data, code, and materials are available on OSF.

Stimuli: 180 paintings (24 artists) and 48 bird images (6 species). Pre-intervention: 36 paintings (6 artists × 6 each). Post-intervention: 48 paintings (6 artists × 8 each; 6 for study, 2 for test). Delayed transfer: 48 bird images (6 species × 8 each; 6 for study, 2 for test). The intervention phase used the remaining 96 paintings (12 artists × 8 each; 6 for study, 2 for test).

Intervention (three components):

  • Refutations: Two pre-study instructional texts (~273 and ~278 words) explicitly stated common erroneous beliefs (e.g., blocked > interleaved; misinterpreting effort as poor learning), labeled them false, provided correct information (interleaving benefits), and evidence-based explanations (e.g., interleaving highlights category differences). Included citations and graphics.
  • Strategy implementation: An experimenter-controlled phase where participants studied 12 artists under both schedules: six artists in blocked units (6 paintings of one artist) and six in interleaved units (6 paintings, one from each of six artists). Presentation: fixation cross 1s, painting 3s with artist name. Unit order: B-I-I-B-B-I-I-B-B-I-I-B (counterbalanced assignment). After each unit, participants rated perceived effort and perceived learning (single-item 9-point scales).
  • Visual metacognitive prompts: Post-implementation, participants viewed two line graphs summarizing their perceived effort and learning over time for blocked and interleaved practice side-by-side, with prompt questions guiding attention to temporal changes, interpretations, and strategy effectiveness judgments.

Control condition: Participants completed the strategy implementation phase (to allow comparable exposure), followed by a filler task instead of refutations and visual prompts, and then the beliefs measure.

Measures:
  • Learning strategy beliefs: Perceived effectiveness of blocked and interleaved practice for long-term learning (two separate single-item Likert scales, 1–6), plus open-ended rationale.
  • Self-organized study sequences: In pre-, post-, and delayed-transfer tasks, participants freely chose item order to study all exemplars (35 unique choices per task). Interleaving operationalized as switch decisions (category n ≠ category n−1); proportion of switches (n_switch/35) computed per participant.
  • Classification accuracy: Tests on novel exemplars, given after strategy implementation (24 trials; two per category) and after each self-organized task (12 trials; two per category). Items were presented centrally with category labels; responses were untimed.

Procedure: Session 1 (in lab): pre-intervention free-choice learning of painting styles (no strategy instruction), then the intervention (or control) including the beliefs measure and classification test, then post-intervention free-choice learning of painting styles with a classification test. Session 2 (online), 5–7 days later: delayed-transfer free-choice learning of bird species, beliefs measure, and classification test. Tasks were programmed in SoSci Survey; the intervention was delivered via Qualtrics.

Analyses: ANOVAs (afex in R) for beliefs and interleaving use; t-tests; generalized linear mixed-effects models (lme4, lmerTest) for trial-level accuracy with random intercepts for subjects and items.
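
The switch-based interleaving measure described under Measures (a switch is any choice where category n ≠ category n−1, and the proportion is n_switch/35) can be illustrated with a minimal Python sketch. This is not the authors' analysis code, which the summary places on OSF in R. The function names, the example sequences, and the definition of the longest interleaving run as the longest streak of consecutive switches are assumptions made for illustration.

```python
from typing import Sequence


def switch_proportion(categories: Sequence[str]) -> float:
    """Proportion of study decisions that switch category (n != n-1)."""
    if len(categories) < 2:
        raise ValueError("need at least two studied items")
    pairs = zip(categories, categories[1:])
    n_switch = sum(1 for prev, cur in pairs if cur != prev)
    # 36 studied items give 35 decisions, matching n_switch / 35.
    return n_switch / (len(categories) - 1)


def longest_interleaving_run(categories: Sequence[str]) -> int:
    """Longest streak of consecutive switch decisions (assumed definition)."""
    longest = current = 0
    for prev, cur in zip(categories, categories[1:]):
        current = current + 1 if cur != prev else 0
        longest = max(longest, current)
    return longest


# Hypothetical study orders: six categories (A-F), six exemplars each.
fully_blocked = [label for label in "ABCDEF" for _ in range(6)]
fully_interleaved = list("ABCDEF") * 6

blocked_share = switch_proportion(fully_blocked)          # 5 switches / 35
interleaved_share = switch_proportion(fully_interleaved)  # 35 switches / 35
```

Under this definition, a fully blocked order still contains five switches (one at each category boundary), so the measure never reaches zero when all six categories are studied.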
Key Findings

RQ1 (Beliefs): Before the intervention, blocked practice was rated more effective than interleaved practice (blocked M=4.39, SD=0.85; interleaved M=4.00, SD=1.02), t(90)=2.36, p=0.020, d=0.25. Over time, Time × Condition interactions emerged for both strategies. Perceived effectiveness of blocked practice decreased significantly in the intervention condition (F(2.00,86.00)=28.24, p<0.001, ηp²=0.229), but not in control. Post-intervention, blocked practice was rated lower in intervention (M=3.43, SD=0.95) than control (M=3.91, SD=1.10), t(87)=2.19, p=0.031; after the delay, intervention remained lower (M=3.54, SD=0.93) than control (M=4.02, SD=0.94), t(87)=2.49, p=0.015. Perceived effectiveness of interleaved practice increased over time in the intervention condition (F(1.57,67.44)=11.62, p<0.001, ηp²=0.115), with no change in control. Post-intervention, interleaved practice was rated higher in intervention (M=4.80, SD=0.80) than control (M=4.04, SD=1.04), t(87)=−3.81, p<0.001; after the delay, the difference was not significant (control M=4.22, SD=1.13; intervention M=4.61, SD=0.93), t(87)=−1.77, p=0.080.

RQ2 (Use of interleaving): At baseline, blocking predominated (blocked proportion ≈0.57 vs interleaved ≈0.43); the blocking rate exceeded chance, t(90)=12.89, p<0.001. A Time × Condition interaction showed that interleaving increased significantly in the intervention condition (F(1.64,73.63)=40.38, p<0.001, ηp²=0.266), but not significantly in control. Post-intervention, the interleaving proportion was higher in intervention (M=0.76, SD=0.31) than control (M=0.55, SD=0.36), p=0.004; after the delay, intervention remained higher (M=0.81, SD=0.29) than control (M=0.57, SD=0.34), p<0.001.

RQ3 (Learning outcomes): In the experimenter-controlled phase, interleaved study yielded higher classification accuracy than blocked study (interleaved M=6.56, SD=2.97; blocked M=3.57, SD=2.40), t(90)=10.00, p<0.001, d=1.05, with no condition difference. In the self-controlled phases, there were no main effects of condition or time on overall accuracy, but GLMMs showed that higher interleaving rates predicted better accuracy: post-intervention painting styles, Estimate=1.56, SE=0.54, z=2.77, p<0.005 (each additional switch increased the odds of correct classification by ~4%); delayed-transfer bird species, Estimate=1.65, SE=0.47, z=3.48, p<0.001 (~5% odds increase per switch).

Exploratory: The longest interleaving run was greater in intervention than in control post-intervention (M=21.15 vs 13.67; p=0.012) and after the delay (M=24.67 vs 13.37; p<0.001). The longest interleaving run correlated with accuracy post-intervention, r=0.22 (p=0.035), and after the delay, r=0.33 (p<0.001).
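
The per-switch odds figures and the reported effect sizes can be roughly reproduced with a short Python sketch. It rests on assumptions the summary does not confirm: that the GLMM estimates are log-odds coefficients on the proportion of switches (so one switch is a 1/35 step), and that d for the within-subject comparisons was computed as t/√n with n=91. The function names are hypothetical.

```python
import math


def per_switch_odds_change(estimate: float, n_decisions: int = 35) -> float:
    """Fractional change in odds for one extra switch.

    Assumes `estimate` is a log-odds coefficient on n_switch / n_decisions.
    """
    return math.exp(estimate / n_decisions) - 1.0


def paired_d(t_value: float, n: int) -> float:
    """Cohen's d for paired data, assumed here to be t / sqrt(n)."""
    return t_value / math.sqrt(n)


# Reported GLMM estimates (post-intervention paintings, delayed birds).
paintings_change = per_switch_odds_change(1.56)
birds_change = per_switch_odds_change(1.65)

# Reported t-values with df = 90, i.e. n = 91 participants.
belief_d = round(paired_d(2.36, 91), 2)
schedule_d = round(paired_d(10.00, 91), 2)
```

Under these assumptions, both odds changes fall between 4% and 5% per switch, in line with the reported ~4% and ~5%, and the two d values round to the reported 0.25 and 1.05.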

Discussion

Findings confirm that students initially overvalue blocked practice and tend to block when freely sequencing. The combined intervention successfully corrected beliefs, increased perceived effectiveness of interleaving, and substantially increased self-regulated interleaving in both immediate and delayed tasks where learners created item-by-item study orders. Importantly, greater interleaving predicted better classification accuracy under self-controlled conditions, extending the interleaving benefit beyond experimenter-controlled paradigms. The intervention likely worked via three mechanisms: (1) refutations directly challenged misconceptions and warned about misleading on-task experiences, timed before study to preempt reinforcement of erroneous beliefs; (2) credible, explanatory content increased persuasiveness; and (3) visual metacognitive prompts helped learners contextualize effort/learning trajectories and internalize that interleaving “works for me,” reducing secondary cognitive load from self-regulation. Compared to performance feedback approaches, which may sometimes contradict normative information due to immediate score variability, the theory-plus-experience approach may foster more robust conceptual change and behavior shifts. Results suggest that interleaving benefits can manifest even in cognitively demanding self-controlled contexts, potentially because the intervention mitigated the metacognitive load that can otherwise obscure benefits.

Conclusion

A combined refutation and metacognitive-prompt intervention effectively shifts students’ beliefs and self-regulated behavior toward interleaving in authentic, self-organized study sequences and is associated with improved classification performance. The work advances strategy training beyond experimenter-controlled decisions to item-level sequencing, demonstrating durable behavior change over a 5–7 day delay. Future research should evaluate classroom and real-world implementation, examine generalization to more ecologically relevant materials and assessments, and identify optimal degrees of interleaving under varying task demands.

Limitations

Interleaving was treated as a desirable difficulty in visual category learning, but desirability depends on task demands; in some contexts, blocking may be advantageous depending on task complexity, category similarity, and test format. The study used relatively low-authenticity materials (painting styles, bird species), limiting generalizability to students’ real courses where motivation, task interest, goals (mastery vs performance), and prior knowledge may influence strategy adoption and effectiveness. Further work should test the intervention in classroom settings and determine optimal interleaving levels.
