Psychology
The role of variable retrieval in effective learning
E. Butowska-Buczyńska, P. Kliś, et al.
Effective learning in educational settings is strongly supported by two robust techniques: retrieval practice and temporal spacing of learning sessions. Theoretical accounts suggest that encoding variability (processing different facets or contexts of the same information across sessions) should enrich episodic contextual representations and thereby enhance long-term retention. Despite the prominence of encoding variability in memory models, empirical findings have been mixed, sometimes favoring constant over variable encoding, especially when variability was implemented during restudy. The present work hypothesizes that the benefits of variability should emerge most strongly when learning proceeds via retrieval practice (as opposed to restudy), because retrieval combines the original encoding context with the retrieval cue, yielding richer composite contextual representations. Moreover, variability should magnify the benefits of spacing, because using different cues across sessions maximizes contextual differentiation. The study aims to (a) test variable retrieval via changing meaningful cues (sentences or questions) while learning foreign vocabulary and lecture content, (b) examine interactions between variability, retrieval practice versus restudy, and spacing, and (c) assess learners’ metacognitive judgments about the effectiveness of variable versus constant cueing.
Foundational theories posit that encoding variability enhances retention by associating items with multiple contextual features (Estes; Glenberg). Distributed context representations (Howard & Kahana) and episodic context accounts of retrieval-based learning (Karpicke et al.) emphasize the role of context as a retrieval cue. Empirical reviews confirm strong testing (retrieval practice) and spacing effects (Roediger & Karpicke; Cepeda et al.; Carpenter & Pan). Prior variability studies report mixed outcomes: some document null or detrimental effects when variability is implemented during restudy or with incidental context changes (Postman & Knecht; Verkoeijen et al.; Young & Bellezza), while others show benefits under constrained conditions or specific tasks (Huff & Bodner; Zawadzka et al.; Hendriks et al.). Smith & Handy found benefits of varying incidental environmental contexts during retrieval but not restudy; Butler (2010) found superior transfer from repeated testing vs. study with limited cue variation; Butler et al. (2017) showed that retrieving and applying knowledge to different examples promotes transfer. The desirable difficulties literature suggests that challenging retrieval (weaker cues, cued recall) enhances learning (Carpenter; DeLosh; Pyc & Rawson), but combining difficulty-inducing strategies can be subadditive in some cases (Birnbaum et al.; Appleton-Knapp et al.). Metacognitive work indicates that learners often misjudge effective strategies, underutilizing retrieval practice and misattributing fluency (Karpicke & Butler; Benjamin & Bjork; Yan et al.).
Seven experiments tested variable versus constant cueing across retrieval practice and restudy, manipulating spacing and assessing metacognition.
- General approach: Participants learned Finnish-to-Polish translations with sentences serving as meaningful cues; variability was implemented by changing the sentence for a given word across cycles versus keeping it constant. Final tests used the foreign word without sentence as the cue. Later, lecture materials with conceptual questions were used to assess transfer and metacognitive judgments.
Experiments 1a & 1b (spaced retrieval practice; foreign vocabulary):
- Participants: 1a: 31 Prolific recruits (Polish-speaking; no Finnish). 1b: 31 SWPS University undergraduates (Polish-speaking; no Finnish).
- Materials: 40 Finnish-Polish word pairs; 5 distinct Polish sentences per word; 2 pairs for training.
- Design: Within-participants manipulation of learning condition (constant vs. varied sentences). Five practice cycles with average lag ~40 items. 1a included an initial study phase (word + translation) and five retrieval cycles without feedback. 1b used five retrieval cycles with feedback (correct translation shown for 3 s after each attempt) and no initial study phase. Final cued-recall: translate Finnish words with no sentence cue.
- Procedure: Online, timed retrieval attempts (10 s); instructions emphasized using sentence context; assignment to conditions counterbalanced; randomized order.
Experiments 2a & 2b (retrieval vs restudy × variability):
- Participants: 2a: 52 SWPS undergraduates. 2b: 69 Prolific recruits (Polish-speaking; no Finnish).
- Design: 2 (learning condition: varied vs. constant) × 2 (learning mode: retrieval practice with feedback vs. restudy) within-participants. Restudy trials presented the correct translation outright below the sentence for 13 s (no retrieval). 2a immediate test; 2b delayed test (24 h ± 2 h). Other materials and procedures as in 1b.
Experiment 3 (spacing × variability):
- Participants: 80 Prolific recruits (Polish-speaking; no Finnish).
- Design: Within-participants learning condition (varied vs. constant); between-participants lag manipulation: long-lag (~40 items, as in 1b) vs. short-lag (mini-blocks of two word pairs; average lag 0.5 item). Retrieval practice with feedback; immediate final test.
Experiment 4 (metacognition with foreign vocabulary):
- Participants: 40 Prolific recruits (Polish-speaking; no Finnish).
- Design: As in 1b (varied vs. constant, retrieval with feedback), plus metacognitive measures: (a) global predictions before study (estimated final performance for constant vs. varied; 0–100%), (b) item-by-item JOLs during cycle 5 immediately after feedback (0–100%), (c) global postdictions after the final test (0–100%). Orders counterbalanced; judgments self-paced.
Experiment 5 (lecture materials; transfer and metacognition):
- Participants: 47 Prolific recruits; after exclusions (performance ≤ 0.08; copying; AI use), final N = 38 (English-speaking from UK, Canada, Australia, US, Ireland).
- Materials: Five ~5–8 min geology lecture segments covering 12 concepts; four questions per concept (A–D) requiring retrieval and application; example set provided. Due to copyright, materials available on request.
- Design: Concepts assigned to constant or varied condition (odd/even counterbalanced). Constant: one question repeated three times during practice; a different question used at final transfer test. Varied: three different practice questions; a fourth novel question used at test. Practice with feedback after each response; 2 min per question max; test self-paced with 2 min limit per question.
- Metacognition: Pre- and post-test explicit choice of superior method (Repeated questions; Different questions; Equally effective).
- Scoring: Interrater scoring by three raters; full agreement 84.7%; majority rule for disagreements.
Data and ethics: Online testing; informed consent; ethics approval at SWPS; data/materials link (OSF).
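The Experiment 5 assignment of concepts and questions described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the authors' code: the names `ConceptSchedule`, `build_schedule`, and `varied_on_odd` are hypothetical, and the choice of which of the four questions (A–D) serves for practice versus the final test is an assumption, since the summary does not specify it.

```python
from dataclasses import dataclass

QUESTIONS = ("A", "B", "C", "D")  # four questions per concept, as in the summary


@dataclass(frozen=True)
class ConceptSchedule:
    concept: int       # concept number, 1..12
    condition: str     # "constant" or "varied"
    practice: tuple    # question labels used in the three practice rounds
    test: str          # question label used at the final transfer test


def build_schedule(n_concepts=12, varied_on_odd=True):
    """Assign concepts to conditions by odd/even number and pick questions.

    Assumption (not stated in the summary): question D is always held out
    for the transfer test; constant concepts practice question A three times.
    """
    schedule = []
    for concept in range(1, n_concepts + 1):
        is_odd = concept % 2 == 1
        varied = is_odd == varied_on_odd
        if varied:
            practice = QUESTIONS[:3]        # three different practice questions
        else:
            practice = (QUESTIONS[0],) * 3  # one question repeated three times
        condition = "varied" if varied else "constant"
        schedule.append(ConceptSchedule(concept, condition, practice, QUESTIONS[3]))
    return schedule


if __name__ == "__main__":
    for row in build_schedule():
        print(row.concept, row.condition, row.practice, "->", row.test)
```

Calling `build_schedule(varied_on_odd=False)` gives the counterbalanced version, in which even-numbered concepts receive varied questions. In both conditions the test question never appears during practice, so the final test measures transfer rather than memory for a practiced question.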
Across seven experiments, variable cues during spaced retrieval practice improved memory performance relative to constant cues and enhanced the benefits of both retrieval practice and spacing. Specific results:
- Experiment 1a (study + retrieval; no feedback): Varied > constant, t(30) = 4.52, P < 0.001, d = 0.81.
- Experiment 1b (retrieval with feedback): Varied > constant, t(30) = 3.71, P < 0.001, d = 0.67.
- Experiments 2a (immediate test) and 2b (24-h delay) with retrieval vs. restudy:
  - 2a ANOVA: main effect of learning condition (varied > constant), F(1,51) = 15.51, P < 0.001, ηp² = 0.233; main effect of mode (retrieval > restudy), F(1,51) = 11.28, P < 0.001, ηp² = 0.181; interaction, F(1,51) = 5.87, P = 0.019, ηp² = 0.103. Retrieval > restudy in varied, t(51) = 3.60, P < 0.001, d = 0.50; no significant benefit in constant, t(51) = 1.41, P = 0.17, d = 0.20.
  - 2b ANOVA: main effect of learning condition (varied > constant), F(1,68) = 25.30, P < 0.001, ηp² = 0.271; main effect of mode (retrieval > restudy), F(1,68) = 18.42, P < 0.001, ηp² = 0.213; interaction, F(1,68) = 5.71, P = 0.026, ηp² = 0.071. Retrieval > restudy in varied, t(68) = 5.05, P < 0.001, d = 0.61; smaller benefit in constant, t(68) = 2.10, P = 0.040, d = 0.25.
  - Means (SE): varied M ≈ 0.58–0.61 (SE ≈ 0.026–0.035); constant M ≈ 0.51–0.53 (SE ≈ 0.026–0.035).
- Experiment 3 (spacing): Varied > constant, F(1,78) = 39.39, P < 0.001, ηp² = 0.336; long-lag > short-lag, F(1,78) = 9.97, P = 0.002, ηp² = 0.113; interaction, F(1,78) = 6.65, P = 0.012, ηp² = 0.079. The spacing benefit (long vs. short lag) was larger with varied cues, t(78) = 3.89, P < 0.001, d = 0.87, than with constant cues, t(78) = 2.31, P = 0.024, d = 0.52. Means (SE): varied M = 0.65 (0.026) vs constant M = 0.57 (0.026); long-lag M = 0.69 (0.035) vs short-lag M = 0.54 (0.035).
- Experiment 4 (metacognition with vocabulary): Memory benefit of varied cues, t(39) = 2.11, P = 0.041, d = 0.33. Global predictions before study: no preference, t(33) = 0.50, P = 0.62, d = 0.09. JOLs favored constant over varied, t(38) = 3.80, P < 0.001, d = 0.61 (means: constant 73.62% vs varied 67.45%). Global postdictions after test still favored constant, t(31) = 4.16, P < 0.001, d = 0.73 (means: constant 67.67% vs varied 56.97%).
- Experiment 5 (lecture transfer & metacognition): Transfer performance was better after varied than constant practice questions, t(37) = 3.30, P = 0.002, d = 0.53. Metacognitive judgments showed a preference for constant questions both pre- and post-test, χ²(2, N = 38) = 6.52, P = 0.038. Counts: the majority chose constant (N = 20 both times), followed by varied (N = 8 pre; N = 10 post) and no difference (N = 10 pre; N = 8 post).
Practice performance patterns (Table 2) show that, when feedback was provided, retrieval attempts were generally less successful during learning with varied cues than with constant cues, indicating that varied cueing increases retrieval difficulty at practice but enhances delayed retention.
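The summary does not say how the effect sizes in the results above were computed. As a reading aid, the sketch below applies standard textbook conversions (d = t/√n for within-participants contrasts, d = t·√(1/n₁ + 1/n₂) for independent groups, ηp² = F·df₁/(F·df₁ + df₂), and a goodness-of-fit χ² against an equal split) and shows that they reproduce several of the reported values. The function names are hypothetical, the equal group sizes of 40 in Experiment 3 are an assumption (only the total of 80 is given), and treating the Experiment 5 test as a goodness-of-fit test against equal expected counts is likewise an assumption.

```python
import math


def cohens_d_paired(t, n):
    """Cohen's d for a within-participants contrast: d = t / sqrt(n)."""
    return t / math.sqrt(n)


def cohens_d_independent(t, n1, n2):
    """Cohen's d for two independent groups: d = t * sqrt(1/n1 + 1/n2)."""
    return t * math.sqrt(1 / n1 + 1 / n2)


def partial_eta_squared(f, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its dfs."""
    return (f * df_effect) / (f * df_effect + df_error)


def chi_square_equal_split(counts):
    """Goodness-of-fit chi-square against equal expected counts."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def chi_square_p_df2(chi2):
    """Upper-tail p value of a chi-square statistic; valid only for df = 2."""
    return math.exp(-chi2 / 2)


if __name__ == "__main__":
    # Experiment 1a: t(30) = 4.52 with 31 participants; reported d = 0.81
    print(round(cohens_d_paired(4.52, 31), 2))
    # Experiment 2a, learning condition: F(1,51) = 15.51; reported 0.233
    print(round(partial_eta_squared(15.51, 1, 51), 3))
    # Experiment 3, varied cues: t(78) = 3.89; 40 per group is an assumption
    print(round(cohens_d_independent(3.89, 40, 40), 2))
    # Experiment 5, pre-test choices: constant 20, varied 8, no difference 10
    chi2 = chi_square_equal_split([20, 8, 10])
    print(round(chi2, 2), round(chi_square_p_df2(chi2), 3))
```

Under these assumptions the conversions give d ≈ 0.81, ηp² ≈ 0.233, d ≈ 0.87, and χ² ≈ 6.53 with P ≈ 0.038, in line with the reported figures up to rounding. Because the post-test counts (20, 10, 8) are a permutation of the pre-test counts, the same χ² value applies to both.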
The findings demonstrate that variable retrieval—changing meaningful cues across practice—robustly enhances learning, especially when combined with spaced retrieval. This supports an episodic context account in which retrieval integrates original encoding context with the current cue, forming rich composite contextual representations. Variability across retrieval attempts increases the diversity of contextual features bound to the target, thereby raising the likelihood of successful retrieval under novel test contexts. From a desirable difficulties perspective, varying cues makes practice more challenging, increasing reliance on contextual cues and promoting deeper encoding of context, which improves long-term retention. Critically, variability produces superadditive benefits when combined with both retrieval practice (vs. restudy) and spacing, contrasting with prior reports of subadditivity when variability was implemented during restudy and learning relied on spontaneous reminding. The present work suggests superadditivity arises under deliberate retrieval because learners can reinstate prior study contexts and self-generated cues, facilitating updating and integration of varied contextual information. Despite performance gains, learners misappraise variable retrieval as less effective, likely due to reliance on immediate retrieval fluency during practice—constant cues enhance practice fluency but do not generalize to variable test contexts. This metacognitive illusion persisted even after experiencing the final test, indicating challenges in attributing item performance to the learning condition and in overcoming fluency-based theories of learning. These results have practical implications: educators should incorporate varied, meaningful cues in spaced retrieval practice to maximize retention and transfer, and interventions are needed to align learners' beliefs with effective practices.
Variable retrieval, implemented by changing meaningful cues across spaced practice sessions, reliably enhances memory for foreign vocabulary and promotes transfer of lecture content. It magnifies the benefits of retrieval practice (relative to restudy) and of spacing, yielding superadditive effects consistent with episodic context mechanisms and desirable difficulties. However, learners typically undervalue this approach, favoring constant cues based on practice fluency, creating a metacognitive mismatch. The study strengthens evidence-based guidelines by adding variable cueing to spaced retrieval practice as an optimal strategy. Future research should: (a) generalize these findings across broader educational materials, settings, and populations; (b) disentangle mechanisms by comparing retrieval with and without feedback under controlled difficulty; (c) develop metacognitive interventions to correct learners’ misbeliefs; and (d) explore design parameters for cue variability (type, degree, and timing) to optimize learning and transfer.
- Materials: Most experiments used simplified foreign vocabulary translations; only one experiment used lecture materials. Generalizability to diverse educational domains and complex curricula requires further study.
- Feedback confound: In experiments with feedback, retrieval practice effects and variability benefits largely reflect the impact of retrieval attempts on subsequent encoding of feedback. This complicates isolating the mechanism of variability independent of feedback.
- Practice difficulty differences: Varied cueing increased retrieval difficulty during practice (lower interim retrieval rates), which may influence subsequent learning via desirable difficulties; matching difficulty across conditions is challenging.
- Metacognitive attribution: Learners’ post-test judgments remained misaligned, potentially due to difficulty attributing item performance to learning conditions; this may limit learner-led adoption of variable retrieval strategies.
- Materials availability: Copyright restrictions limit public sharing of lecture materials; replication may require access upon request.
- Sample characteristics: Online samples (Prolific, university students) may limit external validity across age ranges and educational contexts.

