Optimizing practice scheduling requires quantitative tracking of individual item performance

Luke G. Eglington and Philip I. Pavlik Jr.

This research by Luke G. Eglington and Philip I. Pavlik Jr. combines microeconomic principles with a computational model of spacing to schedule practice for better memory retention. It reports that optimized practice scheduling can lead to up to 40% more items recalled, a result with direct relevance to online education.

Introduction

The study addresses how to optimally schedule spaced retrieval practice to maximize learning given real-world constraints on time. Prior research shows that spacing and testing improve memory, but concrete guidance on when and how often to practice specific items is lacking. The authors argue that conventional, nonadaptive spacing schedules are inherently suboptimal because they ignore time costs, individual differences, and item-level variability. They propose an adaptive scheduling approach that uses a computational memory model to estimate recall probabilities and combines this with microeconomic principles to optimize efficiency (learning gain per unit time). They hypothesize that there exists an optimal efficiency threshold (OET) for item difficulty (operationalized as recall probability) that maximizes learning efficiency, and that scheduling practice to keep items near this threshold will outperform conventional schedules and difficulty-focused heuristics. They further predict that the optimal threshold for paired-associate learning will be relatively high due to faster and more efficient correct trials and the time costs of feedback for errors.

Literature Review

The introduction surveys classic and contemporary work on spacing and retrieval practice, including uniform, expanding, and contracting schedules, with mixed empirical outcomes regarding which schedule is best. It reviews heuristic adaptive methods such as drop-after-correct rules and the Region of Proximal Learning framework, noting their limitations in accounting for time costs and reintroduction timing. The paper emphasizes the often-overlooked impact of difficulty on response time and total study time, arguing for evaluating practice by efficiency (gain per second). It integrates findings that practice reduces response time and can slow forgetting, and that correct trials are typically faster than incorrect trials with feedback. Prior computational models (e.g., Pavlik and Anderson; Lindsey et al.; Walsh et al.) predict recall and spacing effects and can support adaptive scheduling. The authors critique recommendations to practice items near forgetting (e.g., probability ≈ 0.40) by highlighting the substantial time costs associated with low recall probabilities and slower responding, suggesting that higher thresholds may be more efficient when time is considered.

Methodology

Overview: The authors used a two-stage approach: (1) parameterize models with an initial experiment (Experiment 1), and (2) run simulations comparing adaptive OET-based scheduling to conventional schedules, followed by an empirical test (Experiment 2) of key simulation predictions. Materials were Japanese-English word pairs throughout.

Experiment 1 (Model parameterization):

  • Participants: 132 MTurk participants (IRB-approved; informed consent). Retention interval groups: 2 min (n=43), 1 day (n=45), 3 days (n=44).
  • Materials: 48 Japanese-English word pairs per participant (English targets 4 letters, average familiarity/imageability).
  • Design: Two practice sessions. Session 1 manipulated repetitions per item (2, 4, 8) and spacing (very narrow: 0–1 intervening trials; narrow: ~4; wide: ~8; very wide: ~13), with jittering to avoid predictability. Session 2 (after 2 min, 1 day, or 3 days) tested each item three times with feedback in randomized blocks.
  • Procedure: The first trial for each item was a study trial (7 s). Subsequent trials were cued recall tests (7 s to respond). Correct responses were followed by a 500 ms “correct” message and a 500 ms ISI; incorrect responses or timeouts were followed by 4 s of corrective feedback. The response timer reset when typing began, so responses in progress were not cut off.
  • Modeling: A logistic-form correctness model predicted recall probability using components for successes and failures with spacing and decay parameters (item and participant intercepts). An RT model mapped predicted log-odds to RT for correct trials (exponential decay form). Incorrect RTs were set to empirical median plus 4 s feedback.
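
The two models in the Modeling bullet can be sketched as follows. This is a minimal illustration, not the authors' fitted model: the function names and the `scale`, `rate`, and `floor` values are hypothetical placeholders, and the specific exponential form is an assumption consistent with "exponential decay form". Only the logistic link, the 4 s feedback duration, and the 8.98 s median incorrect latency come from the summary.

```python
import math


def recall_probability(log_odds: float) -> float:
    """Logistic link: convert the correctness model's log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def correct_rt_seconds(log_odds: float, scale: float = 3.0,
                       rate: float = 0.5, floor: float = 1.5) -> float:
    """Assumed exponential-decay mapping from log-odds to correct-trial latency.

    scale, rate, and floor are placeholders, NOT values fitted in Experiment 1.
    Higher log-odds (better-learned items) give shorter latencies.
    """
    return floor + scale * math.exp(-rate * log_odds)


def incorrect_trial_seconds(median_incorrect_rt: float,
                            feedback_s: float = 4.0) -> float:
    """Incorrect trials: empirical median latency plus 4 s of corrective feedback."""
    return median_incorrect_rt + feedback_s


if __name__ == "__main__":
    for lo in (-1.0, 0.0, 2.0):
        print(lo, round(recall_probability(lo), 3), round(correct_rt_seconds(lo), 3))
    print(incorrect_trial_seconds(8.98))
```

The point of the design is that a single predicted quantity (log-odds) drives both the probability of success and the time a success takes, which is what lets a scheduler reason about gain per second.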

Simulations:

  • Setup: Each simulation represented a learner studying 30 items for 22 minutes, then a test 3 days later. 200 simulated students per condition. Item and student intercepts were sampled from Experiment 1 fits.
  • Conditions: 22 OET conditions (probability thresholds from 0.20–0.80 in steps of 0.05, and 0.80–0.98 in steps of 0.02). Conventional schedules included massed, uniform, expanding, contracting, and a Drop-1 heuristic (drop after first success; restart after all succeeded). Conventional conditions were simulated with fixed duration trials (11 s; common in literature) and variable duration governed by the RT model. In variable-duration conditions, successes were faster and allowed more trials within the fixed total study time.
  • Adaptive OET scheduling: On each trial, select the item with predicted recall probability closest to but below the threshold; if all items exceed the threshold, choose the closest above it. After practice, the item’s probability generally jumps above threshold and is deferred until it decays back under the OET. This produces adaptive spacing that depends on item difficulty and practice history.
  • Timing: First exposure per item: 7 s study. For self-paced conditions, correct trial duration was RT model estimate plus ~0.5 s feedback; incorrect trial duration fixed at 8.98 s (median incorrect) plus 4 s feedback; 1 s ISI throughout.
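
The threshold grid and the item-selection rule described in the bullets above can be sketched as follows. The names `OET_GRID` and `select_item` are hypothetical, and treating a probability exactly equal to the threshold as "not above" is an assumption; the summary does not say how ties are handled.

```python
from typing import Dict

# 0.20-0.80 in steps of 0.05, then 0.82-0.98 in steps of 0.02 (0.80 counted once).
OET_GRID = [x / 100 for x in range(20, 81, 5)] + [x / 100 for x in range(82, 99, 2)]


def select_item(probabilities: Dict[str, float], threshold: float) -> str:
    """Pick the next item to practice under an OET policy.

    Prefer the item whose predicted recall probability is closest to the
    threshold without exceeding it; if every item exceeds the threshold,
    fall back to the item closest above it.
    """
    at_or_below = {k: p for k, p in probabilities.items() if p <= threshold}
    if at_or_below:
        # Highest probability among those not above the threshold.
        return max(at_or_below, key=at_or_below.get)
    # All items are above the threshold: take the lowest one.
    return min(probabilities, key=probabilities.get)


if __name__ == "__main__":
    print(len(OET_GRID))
    print(select_item({"a": 0.50, "b": 0.90, "c": 0.97}, threshold=0.94))
    print(select_item({"a": 0.95, "b": 0.99}, threshold=0.94))
```

In a full simulation, the probabilities would be re-estimated by the memory model before every trial, so an item just practiced rises above the threshold and is skipped until it decays back.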

Experiment 2 (Empirical test of simulation predictions):

  • Participants: 322 MTurk workers recruited via TurkPrime; exclusions: 25 with Japanese knowledge and 6 noncompleters; final Ns per condition: S15 (n=59), OET40 (n=53), OET70 (n=57), OET86 (n=50), OET94 (n=56), OET98 (n=16). Power targeted for medium-to-large effects.
  • Materials: 30 word pairs per participant (subset of the same pool).
  • Procedure: Session 1: 22 minutes of practice. Session 2: 3-day delayed cued recall test on all 30 items, randomized order, same feedback regime as practice. In adaptive conditions, the model (with fixed parameters from Experiment 1 and item intercepts refit; no participant intercepts) estimated item recall probabilities each trial and scheduled according to assigned OET (0.40, 0.70, 0.86, 0.94, 0.98). In conventional S15, items were split into two blocks of 15, each repeated approximately every 15 trials with jitter.
  • Outcomes: Delayed test recall probability; model fit metrics (predictive validity).
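
As a quick consistency check on the participant numbers reported above, the arithmetic can be reproduced directly. The variable names are illustrative; all figures are taken from the Participants bullet.

```python
recruited = 322
excluded = {"japanese_knowledge": 25, "noncompleters": 6}
final_n = {"S15": 59, "OET40": 53, "OET70": 57,
           "OET86": 50, "OET94": 56, "OET98": 16}

# Participants remaining after exclusions, and the sum of per-condition Ns.
retained = recruited - sum(excluded.values())
total_assigned = sum(final_n.values())

print(retained, total_assigned)  # both are 291
```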

Key Findings

Simulations:

  • All spaced schedules outperformed massed practice; greater average spacing generally improved outcomes. Allowing variable trial duration (faster correct trials) improved performance compared to fixed durations by enabling more practice within the same total time.
  • Adaptive OET-based scheduling was superior to conventional schedules. The optimal simulated threshold was approximately OET = 0.94. OETs ≥ 0.80 yielded 39–55% higher delayed recall than the best conventional variable-duration schedule.
  • Achieved recall probability during practice in conventional schedules remained relatively low on average due to lack of adaptivity, so items were often too difficult when practiced. OET-based practice naturally adjusted item spacing: easier items received wider spacing; harder items narrower spacing. In high OET conditions (e.g., 0.94), spacing expanded with practice; in low OET conditions (e.g., 0.40), spacing tended to contract.
  • Adaptive scheduling mitigated ability-based inequities in self-paced settings: number of trials completed was strongly correlated with ability in conventional self-paced conditions but not in adaptive OET conditions, leading to fairer distribution of practice opportunities.

Experiment 2:

  • Delayed recall followed a skewed inverted-U across thresholds, with high OETs best. OET86 and OET94 significantly outperformed OET40, OET98, and conventional S15.
  • Statistics: Kruskal-Wallis H(4) = 16.5, p < 0.01. Mixed-effects model comparison: adding condition improved fit, chi-square = 18.72, p = 0.002. Pairwise contrasts showed OET86 and OET94 > OET40, OET98, and S15 (Z > 2.83, ps < 0.005). OET86 (mean ≈ 0.48) was numerically highest but not significantly different from OET94 (mean ≈ 0.47), p = 0.64; OET70 did not differ significantly from other conditions (ps > 0.07, 0.17 depending on comparison).
  • Predictive validity: Model fit McFadden pseudo-R² = 0.37; predicted vs actual recall r = 0.775, p < 0.001; AUC = 0.871 (95% CI 0.868–0.874). Performance decreased at the very highest threshold (0.98), consistent with the inverted-U (quadratic term significantly improved fit, chi-square = 23.47, p < 0.001).
  • Magnitude: OET94 produced approximately 40% higher retention than conventional S15 and than OET40. Overall advantage of the adaptive approach vs conventional spacing had Cohen’s d ≈ 0.64, comparable to classic spacing effects (d ≈ 0.42).
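
For readers unfamiliar with the predictive-validity metrics above, the sketch below shows the standard textbook definitions of McFadden's pseudo-R² and AUC for binary recall outcomes. It is an assumption that the paper used these exact forms (for example, its null model may differ from the intercept-only baseline used here), and the toy data are made up purely for illustration.

```python
import math
from typing import Sequence


def log_likelihood(y: Sequence[int], p: Sequence[float]) -> float:
    """Bernoulli log-likelihood of outcomes y under predicted probabilities p."""
    eps = 1e-12  # guard against log(0)
    return sum(yi * math.log(max(pi, eps)) + (1 - yi) * math.log(max(1 - pi, eps))
               for yi, pi in zip(y, p))


def mcfadden_r2(y: Sequence[int], p: Sequence[float]) -> float:
    """1 - LL(model) / LL(null), with an intercept-only null (the base rate).

    Assumes y contains both successes and failures.
    """
    base_rate = sum(y) / len(y)
    ll_null = log_likelihood(y, [base_rate] * len(y))
    return 1.0 - log_likelihood(y, p) / ll_null


def auc(y: Sequence[int], p: Sequence[float]) -> float:
    """Probability that a random success is ranked above a random failure."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    toy_y = [1, 1, 0, 0]          # made-up outcomes
    toy_p = [0.9, 0.8, 0.2, 0.1]  # made-up predictions
    print(mcfadden_r2(toy_y, toy_p), auc(toy_y, toy_p))
```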

Sensitivity analyses:

  • Alternate assumptions increasing efficiency of failures (faster or more informative feedback) shifted the optimal threshold somewhat lower but still favored high thresholds for paired-associate learning. The relative benefit over lower OETs decreased as failure efficiency increased.
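
The logic of this sensitivity analysis can be illustrated with a minimal sketch of efficiency as expected gain divided by expected time. Everything here is an assumption made for illustration: the function names, the gain values, and the 3 s correct-trial duration are hypothetical, and in the actual model both gain and latency depend on the item's practice history. Only the 12.98 s failure duration (8.98 s median plus 4 s feedback) comes from the simulation setup.

```python
def expected_trial_seconds(p: float, correct_s: float, incorrect_s: float) -> float:
    """Expected duration of one trial at recall probability p."""
    return p * correct_s + (1.0 - p) * incorrect_s


def efficiency(p: float, gain_success: float, gain_failure: float,
               correct_s: float, incorrect_s: float) -> float:
    """Expected learning gain per second of practice at recall probability p.

    gain_success and gain_failure are hypothetical per-trial gains supplied by
    the caller; they are not quantities reported in the summary.
    """
    expected_gain = p * gain_success + (1.0 - p) * gain_failure
    return expected_gain / expected_trial_seconds(p, correct_s, incorrect_s)


if __name__ == "__main__":
    # Same low-probability trial, with slow versus fast failures.
    slow = efficiency(0.40, gain_success=1.0, gain_failure=0.5,
                      correct_s=3.0, incorrect_s=12.98)
    fast = efficiency(0.40, gain_success=1.0, gain_failure=0.5,
                      correct_s=3.0, incorrect_s=6.0)
    print(slow, fast)
```

Shortening the failure duration or raising `gain_failure` increases the efficiency of low-probability trials, which is the mechanism by which more efficient failures pull the optimal threshold downward.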

Discussion

Findings support the hypothesis that incorporating time costs into adaptive scheduling yields better learning than conventional, nonadaptive spacing. An efficiency-based policy that practices items near an optimal recall probability (high but below ceiling) maximizes learning per unit time by balancing gains from spacing and testing against the time costs of errors and slower responses. OET-based scheduling adapts to item and learner variability, naturally producing expanding intervals for learned items and narrower spacing for difficult items, and equitably distributing practice across learners of different abilities in self-paced settings. The optimal level of difficulty is task-dependent: for fast cued-recall tasks with costly feedback and substantial forgetting, high OETs (≈0.86–0.94) are optimal; if failures are faster or more beneficial, lower OETs may be preferred. The results refine the desirable difficulty framework by quantifying optimal difficulty levels when time is considered, and they show how computational memory models can be practically used to guide adaptive educational technologies.

Conclusion

The paper introduces an efficiency-centered, model-based approach to scheduling practice that quantifies and targets an optimal difficulty threshold. Simulations and an experiment demonstrate that high OETs substantially improve delayed recall relative to conventional spacing and to policies focusing on harder items. The approach is readily implementable in adaptive learning systems and naturally handles item reintroduction and expanding spacing. Future work should extend the framework to other materials with inter-item relations, incorporate dynamic parameter updates and learner feedback, integrate adaptive cue strength to control early-trial difficulty, and explore how varying costs and benefits of failures versus successes shift the optimal threshold.

Limitations

  • Generalizability: The empirical test used independent paired associates (Japanese-English word pairs). Results may differ for materials with rich inter-item structure or different task demands.
  • Model assumptions: The RT and correctness models may not fully capture interference dynamics during adaptive practice; simulations underestimated performance in high-difficulty conditions, suggesting unmodeled factors (e.g., reduced interference) contributed.
  • Parameterization: Approach relies on having prior data to estimate model parameters and item intercepts; parameter estimates may be biased if prior datasets lack sufficient variability in spacing or repetitions.
  • Task dependence: The optimal threshold varies with the relative gains and time costs of successes versus failures; high OETs may not be optimal when failures are fast and highly informative.
  • Early-phase difficulty: Items can be practiced far from the target OET early on; additional mechanisms (e.g., adaptive cueing) may be needed to efficiently scaffold initial learning.