Interleaved practice enhances memory and problem-solving ability in undergraduate physics

J. Samani and S. C. Pan

A study by Joshua Samani and Steven C. Pan shows that interleaved practice can boost memory and problem-solving skills in undergraduate physics students. Although students performed significantly better after interleaved practice, they rated it as more difficult and judged it less effective for learning than blocked practice.

Introduction

The study investigates whether interleaved practice (alternating between different problem types or topics during practice) enhances memory and problem-solving ability in an authentic undergraduate physics course. While traditional instruction often uses blocked practice (focusing on one topic at a time), research suggests that interleaving can benefit inductive learning as well as attention, retrieval, and discrimination processes. However, interleaving’s utility for factual memory and complex problem-solving in real educational settings remains underexplored. Given the heavy problem-solving demands of physics and historically low student performance in the subject, the authors test whether interleaving, implemented through routine homework, improves longer-term retention and transfer to novel problems compared with blocking.

Literature Review

Prior work has primarily examined interleaving in laboratory studies of perceptual category learning (e.g., artists’ styles, biological taxa, artificial shapes), typically finding superior classification after interleaved versus blocked study. A meta-analysis reported interleaving benefits with Hedges’ g ≈ 0.67 (95% CI [0.57, 0.77]) for artists’ paintings and g ≈ 0.31 (95% CI [0.17, 0.45]) for artificial shapes, with larger benefits for sets of perceptually similar categories. Proposed mechanisms include temporal spacing (a form of distributed practice), increased attention to differences between categories (discriminative contrast), and enhanced retrieval processes. Emerging classroom evidence shows promise beyond pure induction tasks: in one classroom study, interleaved homework in middle-school mathematics nearly doubled surprise test performance relative to blocked practice, and a large randomized controlled trial across 54 classrooms reported Cohen’s d ≈ 0.83 (95% CI [0.68, 0.97]) on delayed surprise tests. Evidence has also appeared in second-language learning. Nonetheless, gaps remain regarding interleaving’s benefits for (a) memory for factual content, (b) problem-solving requiring multi-step reasoning, and (c) effectiveness in authentic higher-education contexts like undergraduate physics.

Methodology

Design and setting: A preregistered, counterbalanced, within-subjects experiment was conducted in two large lecture sections of an introductory undergraduate physics course (Physics for Life Science Majors) at a major U.S. public university over the first 8 weeks of a 10-week term. Only the arrangement of homework problems was manipulated (blocked vs. interleaved); all other course elements proceeded as usual.
Participants: 350 undergraduates enrolled across two back-to-back lecture sections. Inclusion criteria (preregistered): completion and submission of all homework in a stage and completion of the associated surprise criterial test. Analyzed samples: Stage 1 (weeks 1–4): Lecture 1 n=159, Lecture 2 n=151; Stage 2 (weeks 5–8): Lecture 1 n=137, Lecture 2 n=149. No significant GPA differences between sections.
Intervention: Thrice-weekly homework assignments (Mon/Wed/Fri). Each stage had 10 assignments totaling 84 problems; the same problems were used across conditions within stage, differing only in arrangement. Blocked: three successive isomorphic problems per topic (one topic at a time). Interleaved: one problem per topic per assignment, alternating topics across problems and assignments (isomorphs for a topic appeared on subsequent assignments, not adjacent). Topics (30 per stage) spanned electrostatics, circuits, magnetism, waves, and modern physics (see table of topics in text).
Assessment: At the end of each stage (Friday of weeks 4 and 8), an in-class, unannounced criterial test with three novel, more challenging problems assessed retention and transfer. Problems required recognizing relevant principles, recalling equations, and integrating knowledge across topics to devise solution strategies; problem combinations tested were not explicitly practiced in homework. Sub-measures: (a) Memory (recall/notation of necessary formulas/principles), and (b) Correctness (exact final numerical answer with correct units).
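The blocked and interleaved homework arrangements described under Intervention can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' actual assignment generator: the function names, the three placeholder topic labels, and the assumption of exactly three isomorphs per topic are all hypothetical (84 problems over 30 topics per stage means the real sets were less regular than this).

```python
from typing import List, Tuple

Problem = Tuple[str, int]  # (topic label, isomorph number); hypothetical representation


def blocked_order(topics: List[str], n_isomorphs: int = 3) -> List[Problem]:
    """All isomorphs of one topic appear back to back before the next topic starts."""
    return [(topic, k) for topic in topics for k in range(1, n_isomorphs + 1)]


def interleaved_order(topics: List[str], n_isomorphs: int = 3) -> List[Problem]:
    """One problem per topic in turn; a topic's next isomorph appears only in a later pass."""
    return [(topic, k) for k in range(1, n_isomorphs + 1) for topic in topics]


def split_into_assignments(problems: List[Problem], per_assignment: int) -> List[List[Problem]]:
    """Cut an ordered problem list into consecutive homework assignments."""
    return [problems[i:i + per_assignment] for i in range(0, len(problems), per_assignment)]


# Placeholder topic labels (assumption: not the study's actual topic list).
topics = ["electrostatics", "circuits", "magnetism"]

blocked = split_into_assignments(blocked_order(topics), per_assignment=3)
interleaved = split_into_assignments(interleaved_order(topics), per_assignment=3)

for number, (b, i) in enumerate(zip(blocked, interleaved), start=1):
    print(f"Assignment {number}: blocked={b} interleaved={i}")
```

Both orderings contain exactly the same problems; only the sequence differs, which mirrors the single manipulated variable in the study.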
Scoring: Tests were scored using a rubric common to the department by at least two trained raters blind to condition; discrepancies were adjudicated by a third rater. Inter-rater reliability: Cohen’s κ ≈ 0.81 (Stage 1), 0.853 (Stage 2). The overall proportion correct on criterial tests was low (≤0.34), reflecting the difficulty of the problems.
Perceptions and behavior: After each homework, students rated difficulty and perceived learning. An exit survey captured study time distributions and behaviors. Reported study time prior to criterial tests did not differ significantly between conditions.
Analyses: Preregistered comparisons of overall criterial test performance across conditions used two-tailed t-tests with effect sizes (d) and 95% CIs; supplementary permutation tests yielded similar results. Exploratory analyses examined homework accuracy and metacognitive judgments by condition and stage.
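As a rough illustration of the kind of analysis named above, the following sketch computes Cohen's d with a pooled standard deviation and a two-sided Monte Carlo permutation p-value for a difference in group means. The summary does not specify the authors' exact procedure, so the function names, the number of permutations, and the seed are assumptions, and the scores are made-up toy values, not study data.

```python
import random
import statistics


def cohens_d(a, b):
    """Cohen's d for two independent groups, using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / pooled_var ** 0.5


def permutation_p_value(a, b, n_permutations=2000, seed=0):
    """Two-sided Monte Carlo permutation test on the absolute difference in means."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign scores to the two groups
        diff = abs(statistics.mean(pooled[:len(a)]) - statistics.mean(pooled[len(a):]))
        if diff >= observed - 1e-12:  # small tolerance for floating-point ties
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Toy scores only (assumption: invented for illustration, not taken from the study).
toy_interleaved = [0.5, 0.6, 0.7]
toy_blocked = [0.2, 0.3, 0.4]

d = cohens_d(toy_interleaved, toy_blocked)
p = permutation_p_value(toy_interleaved, toy_blocked)
print(f"d = {d:.2f}, permutation p = {p:.3f}")
```

A permutation test makes no normality assumption, which is one reason it is a common supplement to a t-test when scores pile up near zero, as they can on a difficult test.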

Key Findings

Criterial tests (primary outcome): Interleaving > Blocking.
- Stage 1: d = 0.40, 95% CI [0.17, 0.651], t(288) = 3.41, p = 0.0008; median test score improvement ≈ 50% relative to blocking.
- Stage 2: d = 0.91, 95% CI [0.60, 1.21], t(284) = 7.68, p < 0.0001; median improvement ≈ 125% relative to blocking; larger effect with more cumulative content.
Sub-measures on criterial tests:
- Memory (recall of necessary formulas/principles): Interleaving improved memory in Stage 1, d = 0.41, 95% CI [0.17, 0.66], t(288) = 3.49, p = 0.0006; Stage 2, d = 0.96, 95% CI [0.70, 1.24], t(284) = 8.05, p < 0.0001.
- Correctness (exact final answer with units): Interleaving improved correctness in Stage 1, d = 0.25, 95% CI [0.02, 0.48] (text truncated for full stats); overall correctness rates were low (≤0.34) given difficulty.
Homework performance and metacognition (Table 2):
- Accuracy: Interleaved homework had lower mean accuracy than blocked.
  - Stage 1: 0.69 (0.68, 0.70) interleaved vs 0.74 (0.73, 0.75) blocked.
  - Stage 2: 0.67 (0.66, 0.68) interleaved vs 0.76 (0.75, 0.76) blocked.
- Judged difficulty: Interleaved rated more difficult.
  - Stage 1: 0.94 (0.93, 0.95) interleaved vs 0.86 (0.84, 0.87) blocked.
  - Stage 2: 0.89 (0.87, 0.90) interleaved vs 0.81 (0.80, 0.83) blocked.
- Judged learning: Interleaved perceived as yielding less learning.
  - Stage 1: 0.51 (0.48, 0.53) interleaved vs 0.57 (0.54, 0.60) blocked.
  - Stage 2: 0.40 (0.39, 0.43) interleaved vs 0.48 (0.45, 0.50) blocked.
Study time: No significant differences in self-reported study time between conditions; most students reported minimal study (≤3 hours/week) before criterial tests and substantial cramming before midterms.
Overall: Despite lower homework accuracy and less favorable student perceptions, interleaving produced substantially better performance on surprise criterial tests, indicating superior long-term retention and transfer to novel problems.
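The reported Stage 2 effect size can be cross-checked against its t statistic. The sketch below assumes an independent two-sample t-test with pooled variance, which the summary does not state explicitly but which is consistent with the reported degrees of freedom (284 = 137 + 149 − 2); under that assumption d = t·sqrt(1/n1 + 1/n2). The function name is hypothetical.

```python
import math


def d_from_t(t: float, n1: int, n2: int) -> float:
    """Cohen's d implied by a pooled-variance two-sample t statistic and the group sizes."""
    return t * math.sqrt(1.0 / n1 + 1.0 / n2)


# Stage 2 values as reported in this summary: t(284) = 7.68, n = 137 and n = 149.
stage2_d = d_from_t(7.68, 137, 149)
print(round(stage2_d, 2))  # prints 0.91, matching the reported d
```

The same conversion does not reproduce the reported confidence intervals, which depend on how the authors computed them.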

Discussion

The findings directly address the research question by demonstrating that interleaved practice, implemented in routine homework without altering other course elements, enhances both long-term memory for physics concepts and problem-solving performance on novel, more challenging tasks. Benefits were observed across two counterbalanced stages, with a larger effect as content accumulated. The dissociation between homework performance/metacognitive judgments and criterial test outcomes underscores a metacognitive illusion: students perceived interleaving as harder and less effective even though it led to more durable learning and better transfer. Several mechanisms may underlie these benefits: (1) facilitation of inductive learning of problem categories based on underlying principles rather than surface features; (2) distributed practice via temporal spacing between isomorphic problems; (3) enhanced discriminative contrast and relational processing by juxtaposing topics; (4) frequent retrieval practice prompted by topic switching; and (5) improved strategy selection due to the unpredictability of problem types in interleaving, unlike the repetitive predictability of blocking. These accounts are complementary and may jointly contribute to the observed gains. While memory benefits were robust, evidence for far transfer (multi-step integration to produce fully correct solutions) was smaller and more variable, partly constrained by generally low exact-correctness rates and coarse measurement of problem-solving processes. Importantly, interleaving’s benefits were evident on surprise criterial tests after one to several weeks but were not clearly detected on subsequent high-stakes exams, potentially due to cramming effects that can mask instructional differences. The real-world, business-as-usual implementation supports the ecological validity and scalability of interleaving for STEM courses.

Conclusion

This preregistered classroom experiment shows that interleaving homework problems across topics in an undergraduate physics course substantially improves long-term memory and problem-solving on novel tasks relative to conventional blocked practice. Despite lower immediate homework accuracy and lower perceived efficacy, interleaving produced medium-to-large improvements on surprise criterial tests across two stages. Practical implications: Instructors can often adopt interleaving by simply rearranging existing problem sets to alternate topics, thereby promoting durable and generalizable learning without overhauling course content. Future directions: (1) employ fine-grained assessments of problem-solving (e.g., step-by-step strategy analyses) to better capture far transfer; (2) test generalizability to courses and disciplines with fewer isomorphic problems; (3) examine durability beyond course timelines and effects on high-stakes exams when cramming is minimized; (4) identify optimal interleaving schedules and topic similarity structures; (5) develop interventions to mitigate metacognitive illusions and improve student buy-in.

Limitations

Outcome generalization: Interleaving benefits were robust on surprise criterial tests but were not clearly evident on high-stakes exams, potentially due to cramming, limiting conclusions about exam performance.
Measurement constraints: Exact-correctness rates were low, and rubric-based scoring may not capture partial yet meaningful improvements in multi-step strategies; more granular process data were not collected.
Context specificity: Single-course, single-institution implementation with physics topics featuring multiple isomorphic problems; results may not generalize to courses without such structure.
Final exam data: The final exam was optional and take-home because of COVID-19, which resulted in insufficient data.
Perceptual/metacognitive factors: Students perceived interleaving as harder and less effective, which may affect adoption and adherence without explicit guidance.
Minor implementation variations: Assignment-length irregularities late in cycles and exclusion of topics from tests due to timing constraints could introduce slight imbalances, though both conditions used identical problem sets per stage differing only in order.
