Psychology
Comparing models of learning and relearning in large-scale cognitive training data sets
A. Kumar, A. S. Benjamin, et al.
This study by Aakriti Kumar, Aaron S. Benjamin, Andrew Heathcote, and Mark Steyvers examines how people learn and relearn in real-world settings, analyzing data from more than 39,000 individuals on the Lumosity platform. The findings reveal a nuanced interplay between long-term skill acquisition and short-term task preparedness that could reshape our understanding of cognitive training.
Introduction
The study investigates how learning and relearning unfold over extended, naturalistic timescales when practice is self-scheduled and includes irregular, often lengthy gaps. Traditional laboratory studies tightly control practice schedules, limiting their ability to capture real-world variability in spacing, circadian factors, and interruptions. Using large-scale gameplay data from two Lumosity cognitive tasks, the authors aim to characterize naturalistic learning patterns—rapid within-session gains, gradual across-session improvement, and significant between-session losses with rapid recovery—and to determine what theoretical mechanisms are required to explain them. The central research questions are: (1) What components (e.g., long-term skill, short-term task-set preparedness, forgetting) best account for performance over long, variable intervals? (2) How does relearning after breaks depend on these components and their interaction? The work is motivated by spacing literature, warm-up decrement phenomena, and theories positing multiple timescales of learning, with the goal of establishing a robust account that generalizes beyond laboratory constraints.
Literature Review
Prior research consistently shows that spacing practice enhances long-term retention across domains, with expanding schedules often outperforming equal spacing. Classic and contemporary models emphasize the balance of learning and forgetting and sometimes multiple timescales, including power-law or exponential decay of performance contributions from individual learning events (e.g., Anderson’s model, Bjork & Bjork’s storage vs. retrieval strength). Warm-up decrement literature attributes immediate post-break performance drops to factors moderating expression of skill—attentional state, preparedness, and interface familiarity—rather than loss of skill per se. Theories in motor learning and reanalyses of historic data also support distinct fast (short-term, within-session) and slow (long-term, across-session) learning processes. However, laboratory designs often limit opportunities for forgetting by minimizing inter-practice intervals, potentially obscuring dynamics prominent in self-guided, real-world learning where gaps span days to years. The present work leverages naturalistic data with wide interval variability to test whether models must include both multiple timescales and explicit, interval-dependent forgetting, and whether interaction between skill and preparedness is necessary to account for rapid relearning after breaks.
Methodology
Data source and tasks: Gameplay histories from Lumosity for two tasks: Lost in Migration (flanker/selective attention; 45 s per gameplay) and Ebb and Flow (task switching; 60 s per gameplay). Performance metric: number of correct trials per gameplay (excluding bonus points). Time span: December 18, 2012 to October 31, 2017. Users were primarily on the web-based platform.
Sampling and preprocessing: The full dataset comprises 194,695 users, 389,389 learning curves, and 41,006,715 gameplay events. Gameplays were clustered into sessions using a 1-hour threshold: consecutive gameplays separated by <1 hour belonged to the same session, while a gap of ≥1 hour marked a new session. This produced 34,722,958 sessions, 81% of which contained a single gameplay. Because studying relearning requires multiple gameplays per session, users with a gameplay-to-session ratio >1.5 were retained, 25,000 users per game were then sampled, and the sample was finally restricted to users with >50 sessions, yielding 19,463 users (flanker) and 19,694 users (task switching). Missing gameplay records (timestamps without data) were removed.
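The 1-hour clustering rule can be sketched as follows (an illustrative Python sketch, not the authors' code; the function name and toy timestamps are invented):

```python
from datetime import datetime, timedelta

def cluster_sessions(timestamps, gap=timedelta(hours=1)):
    """Group a user's sorted gameplay timestamps into sessions:
    a gap of >= 1 hour between consecutive gameplays starts a new session."""
    sessions = []
    for t in timestamps:
        if sessions and t - sessions[-1][-1] < gap:
            sessions[-1].append(t)   # gap < 1 hour: same session
        else:
            sessions.append([t])     # gap >= 1 hour (or first play): new session
    return sessions

plays = [datetime(2015, 3, 1, 9, 0), datetime(2015, 3, 1, 9, 10),
         datetime(2015, 3, 1, 10, 30), datetime(2015, 3, 4, 20, 0)]
sessions = cluster_sessions(plays)
print([len(s) for s in sessions])  # → [2, 1, 1]
```

Two plays ten minutes apart share a session; plays separated by 80 minutes or by days start new ones.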
Train-test split: Two subsets were created by inter-session interval: >100 days and <100 days. From each subset, 20% of gameplays were held out for testing and the remainder used for training, ensuring that both the training and test sets covered long delays.
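The split might look like the following sketch (assumed structure; the field name and toy intervals are invented for illustration):

```python
import random

def split_by_interval(gameplays, threshold_days=100, holdout=0.2, seed=0):
    """Partition gameplays by preceding inter-session interval, then hold
    out a fraction of each subset so both splits cover long delays."""
    rng = random.Random(seed)
    train, test = [], []
    for keep_long in (True, False):
        subset = [g for g in gameplays
                  if (g["interval_days"] > threshold_days) == keep_long]
        rng.shuffle(subset)
        cut = int(len(subset) * holdout)     # 20% of each subset held out
        test.extend(subset[:cut])
        train.extend(subset[cut:])
    return train, test

data = [{"interval_days": d}
        for d in (1, 2, 3, 5, 10, 20, 30, 50, 70, 90,             # short gaps
                  120, 150, 200, 300, 400, 500, 600, 700, 800, 900)]  # long gaps
train, test = split_by_interval(data)
print(len(train), len(test))  # → 16 4
```

Stratifying by interval before holding out guarantees that long delays appear in the test set rather than being swamped by the far more common short gaps.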
Proportional performance change metric: To quantify losses/gains across sessions while controlling for individual baselines, proportional change from gameplay t to t+1 was computed as (Y−B)/(X−B), where X and Y are performances at t and t+1, and B is baseline estimated as the mean of the first three gameplays. Observations with (X−B)<2 were excluded (<2% of scores) to avoid instability.
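In code, the metric reads (a minimal sketch assuming the definitions above; the numbers are invented):

```python
def proportional_change(X, Y, B):
    """Proportional change from gameplay t (score X) to t+1 (score Y),
    relative to baseline B; a value near 1 means performance was retained."""
    if X - B < 2:                 # excluded as unstable (<2% of scores)
        return None
    return (Y - B) / (X - B)

scores = [10.0, 12.0, 14.0]       # first three gameplays define the baseline
B = sum(scores) / len(scores)     # B = 12.0
print(proportional_change(20.0, 18.0, B))  # (18-12)/(20-12) → 0.75
```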
Models compared: Four increasingly complex models predicting gameplay score y as a function of cumulative practice (j), within-session index (k), and inter-session delay (t_j).
- M1 (baseline learning): y_j = A − U e^(−λ j). Captures overall practice gains; no within-session dynamics or forgetting.
- M2 (two-timescale learning): y_j = A − U(e^(−λ j) + r e^(−β k)). Adds fast within-session learning (β) relative to slow across-session skill (λ); r scales fast component. Implicit session resets for preparedness but no delay-dependent forgetting.
- M3 (two-timescale + additive forgetting): y_j = A − U(e^(−λ j) + r e^(−β k) − δ e^(−γ t_j)). Adds explicit forgetting that grows with inter-session delay t_j, weighted by δ with common decay rate γ shared across participants.
- M4 (interactive model with context loss): y_j = A − U(e^(−λ j) + (1 + (1−r) L(t_j)) e^(−β k)), where L(t) = 1 − e^(−γ t). Skill (slow component) is permanent; task-set preparedness (fast component) decays with delay via context loss and is relearned more rapidly as skill increases (interaction). γ shared across participants.
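Under one straightforward reading of the equations above, the four prediction functions can be written as follows (parameter names follow the text; this is a sketch, not the authors' implementation):

```python
import math

def m1(j, A, U, lam):
    """M1: slow gains with cumulative practice j."""
    return A - U * math.exp(-lam * j)

def m2(j, k, A, U, lam, beta, r):
    """M2: adds fast within-session learning over gameplay index k."""
    return A - U * (math.exp(-lam * j) + r * math.exp(-beta * k))

def m3(j, k, t, A, U, lam, beta, r, delta, gamma):
    """M3: adds additive forgetting that depends on inter-session delay t."""
    return A - U * (math.exp(-lam * j) + r * math.exp(-beta * k)
                    - delta * math.exp(-gamma * t))

def context_loss(t, gamma):
    """L(t) = 1 - e^(-gamma t): share of task-set context lost after delay t."""
    return 1.0 - math.exp(-gamma * t)

def m4(j, k, t, A, U, lam, beta, r, gamma):
    """M4: permanent skill; preparedness penalty scaled by context loss."""
    fast = (1.0 + (1.0 - r) * context_loss(t, gamma)) * math.exp(-beta * k)
    return A - U * (math.exp(-lam * j) + fast)
```

The nesting relations are easy to check: with delta = 0, M3 reduces to M2, and with t = 0 (no delay) M4's fast component matches M2 with r = 1.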
Model estimation: Parameters were estimated by minimizing root-mean-square error (RMSE) between observed and predicted scores (equivalent to maximum likelihood under Gaussian error), using automatic differentiation and the Adam optimizer. Constraints: λ in [0, 0.1]; the fast-learning and forgetting parameters (β, δ, γ, r) in [0, 1]. Each participant had individual parameters except γ, which was shared across participants in M3 and M4: γ was first estimated on a large participant subset and then fixed for the remaining participants.
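A toy version of this estimation scheme fits M1 to a synthetic learning curve with a hand-rolled Adam optimizer and finite-difference gradients (the authors used automatic differentiation; everything here, including the synthetic data and step sizes, is illustrative):

```python
import math

def m1(j, A, U, lam):
    return A - U * math.exp(-lam * j)

def rmse(params, data):
    A, U, lam = params
    return math.sqrt(sum((y - m1(j, A, U, lam)) ** 2 for j, y in data) / len(data))

def grad(params, data, eps=1e-6):
    """Finite-difference gradient of the RMSE objective."""
    base = rmse(params, data)
    g = []
    for i in range(len(params)):
        p = list(params)
        p[i] += eps
        g.append((rmse(p, data) - base) / eps)
    return g

# Synthetic curve from known parameters (A=30, U=20, lam=0.05).
data = [(j, m1(j, 30.0, 20.0, 0.05)) for j in range(100)]

params = [20.0, 10.0, 0.01]                      # initial guess
bounds = [(0.0, 60.0), (0.0, 60.0), (0.0, 0.1)]  # lam constrained to [0, 0.1]
m = [0.0, 0.0, 0.0]
v = [0.0, 0.0, 0.0]
for step in range(1, 3001):
    g = grad(params, data)
    lr = 0.05 * 0.998 ** step                    # decaying step size
    for i in range(3):
        m[i] = 0.9 * m[i] + 0.1 * g[i]           # Adam first moment
        v[i] = 0.999 * v[i] + 0.001 * g[i] ** 2  # Adam second moment
        mh = m[i] / (1 - 0.9 ** step)            # bias correction
        vh = v[i] / (1 - 0.999 ** step)
        params[i] -= lr * mh / (math.sqrt(vh) + 1e-8)
        params[i] = min(max(params[i], bounds[i][0]), bounds[i][1])
print([round(p, 3) for p in params], round(rmse(params, data), 3))
```

The recovered parameters should land near the generating values (A=30, U=20, λ=0.05) with a small residual RMSE.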
Additional benchmark: The Predictive Performance Equation (PPE) from the spacing literature was adapted with predefined settings and fitted via maximum likelihood; methods and results are summarized in the Supplemental Materials.
Visualization and diagnostics: Aggregate learning curves were plotted by within-session gameplay index across sessions; retention curves were plotted as a function of inter-session delay for the first vs. second gameplay to assess warm-up decrement and rapid-recovery dynamics. A context loss function L(t) was fitted; the best-fitting curve indicated ~50% context loss by ~450 days and ~80% by ~800 days.
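As a rough consistency sketch, anchoring the exponential form L(t) = 1 − e^(−γt) at the reported ~50% loss by 450 days pins down an implied γ (the authors' fitted value is not given here, so this is only an approximation):

```python
import math

gamma = math.log(2) / 450            # solves 1 - exp(-gamma * 450) = 0.5
def L(t):
    """Context loss after a delay of t days, assuming L(450) = 0.5."""
    return 1.0 - math.exp(-gamma * t)

print(round(L(450), 2), round(L(800), 2))  # → 0.5 0.71
```

A single exponential anchored this way gives roughly 70% loss at 800 days, slightly below the reported ~80%, so the reported figures should be read as approximate summaries of the fitted curve rather than exact points on one exponential with this γ.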
Key Findings
- Naturalistic learning exhibits three key features: rapid within-session gains (often 50–70% of asymptote in the first session), gradual across-session improvements, and between-session losses that are rapidly recovered.
- Relearning is exceptionally rapid even after long breaks: performance after intervals of up to a year or two recovers within the first few gameplays of a session.
- Delay-dependent forgetting continues to grow over very long spans: losses from 700 to 800 days are comparable to those from 0 to 100 days, indicating substantial ongoing context loss at extended delays.
- The interactive model (M4) provides the most general account over the wide range of naturalistic intervals. For Lost in Migration (flanker), M4 achieved the best overall out-of-sample RMSE (~3.55). For Ebb and Flow (task switching), M3 and M4 were essentially tied overall (~4.53). Across interval-specific subsets, the simpler M3 sometimes outperformed M4, especially at certain delay ranges, but M4 uniquely captured the combined patterns of learning, relearning, and forgetting across the extreme variability in spacing.
- Fitted context loss function L(t) suggested that approximately 50% of task-set context is lost after ~450 days and ~80% after ~800 days.
- Empirical patterns matched M4’s diagnostic prediction: the rate of within-session learning increases across practice sessions (interaction between slow skill and fast preparedness), explaining steeper growth for earlier gameplays within a session and the rapid recovery from the first to second gameplay after a delay.
Discussion
The findings address the central questions by demonstrating that two distinct timescales—slow, durable skill acquisition and fast, labile task-set preparedness—are necessary to explain real-world learning trajectories. Critically, the components must interact: greater underlying skill accelerates reacquisition of task-set preparedness, producing rapid relearning after breaks. This interactivity explains steeper growth on earlier within-session gameplays and shallower forgetting evident by the second gameplay after a delay. Moreover, forgetting grows with interval length over extensive timescales, necessitating a delay-dependent context loss process. While models with independent learning and forgetting (e.g., M3) can fit restricted interval ranges, only the interactive model (M4) robustly captures the full naturalistic spectrum of spacing, retention, and relearning observed in the data. These results caution against directly generalizing models calibrated on controlled laboratory schedules to self-paced, real-world learning and underscore the value of naturalistic data in testing theoretical robustness.
Conclusion
This work shows that naturalistic, long-timescale learning is governed by at least two interacting components: permanent skill and transient task-set preparedness, with forgetting that increases with delay via context loss. An interactive model that links preparedness to accumulated skill best accounts for rapid relearning and the dynamics observed across highly variable, self-scheduled practice. Contributions include: (1) large-scale characterization of learning, forgetting, and relearning under naturalistic spacing; (2) formulation and comparison of four models, highlighting the necessity of interactivity; (3) empirical estimation of a context loss function operating over months to years. Future research could extend interactive models to additional tasks and domains, incorporate individual differences in forgetting (e.g., participant-specific γ), examine metacognitive self-scheduling strategies, and integrate adaptive scheduling policies optimized for long-term retention in real-world contexts.
Limitations
- Observational, naturalistic data lack experimental control; self-selection and uncontrolled confounds (e.g., motivation, device differences, time-of-day, sleep) may influence performance.
- Sample is biased toward Western countries and older adults; generalizability to other populations is limited.
- Session definition (1-hour threshold) may imperfectly capture cognitive session boundaries.
- Many sessions contain only a single gameplay, limiting within-session learning observations for some users.
- The forgetting rate parameter γ was shared across participants (for estimation stability), limiting modeling of individual differences in delay sensitivity.
- Performance metric excludes game bonus points and may not capture all aspects of speed–accuracy trade-offs.
- Potential inconsistencies between platform feedback and analyzed metric, and removal of sparse data may introduce minor biases.