Psychology
Abstract representations of events arise from mental errors in learning and memory
C. W. Lynn, A. E. Kahn, et al.
Humans excel at recognizing abstract patterns, but how do our mental mishaps shape this ability? Researchers Christopher W. Lynn, Ari E. Kahn, Nathaniel Nyema, and Danielle S. Bassett explore an intriguing perspective on how learning errors influence our abstract representations of the world, with implications for designing better information sources. Discover the science behind our psychological shortcuts and the surprising role of errors in learning!
~3 min • Beginner • English
Introduction
The study addresses how humans infer higher-order statistical structure from sequences of events and why internal representations reflect abstract features such as community structure in transition networks. While humans are known to detect differences in immediate transition probabilities (e.g., in language), accumulating evidence shows sensitivity to abstract, multi-scale regularities and network organization. The central hypothesis is that the brain balances accuracy against computational complexity, leading to mental errors that blur fine-grained details and accentuate higher-order structure. Using the free energy principle, the authors formalize this hypothesis and propose that abstract representations arise naturally from resource-constrained learning and memory processes, rather than requiring complex hierarchical algorithms or error-free Bayesian inference. The work aims to explain network effects on expectations and reaction times and to provide a simple, predictive model that links transition network structure to behavior.
Literature Review
Prior work shows early sensitivity to transition frequencies in infants and adults and the importance of statistical learning for language, vision, and cognition. Network science has provided a unifying framework for conceptualizing stimuli as nodes and transitions as edges, with random walks generating stimulus sequences. Studies of artificial grammars and temporal community structure demonstrate that modular organization influences neural representations and behavior, including reaction times, and that humans can detect higher-order structures beyond first-order transitions. Prevailing approaches often emphasize either ideal Bayesian inference or explicit hierarchical learning, which can be computationally demanding. The authors build on the free energy and information-theoretic traditions, positing that resource constraints shape internal models, and relate their approach to concepts like successor representations and communicability while distinguishing their account via direct measurements of memory-induced temporal shuffling errors.
Methodology
Modeling: The authors derive a maximum entropy model of internal transition estimates by assuming that, during sequence learning, memory errors temporally shuffle past stimuli. Let P(Δt) denote the probability of recalling a stimulus Δt time steps away from the target. They define a free energy functional F(Q) = βE(Q) − S(Q), where E(Q) = Σ_Δt Q(Δt)Δt is the average temporal error and S(Q) is the entropy, and minimize F to obtain a Boltzmann memory distribution P(Δt) ∝ e^{−βΔt}. In the infinite-sequence limit, the internal estimate of the transition matrix becomes an exponentially weighted average over walks of all lengths, with closed form Â = (1 − e^{−β}) A (I − e^{−β} A)^{−1}, where β indexes precision (higher β means fewer errors).
Behavioral mapping: For a given sequence x1…xt, the anticipation a(t) = Â_{x_t x_{t+1}}(t−1) is linked to reaction time via τ(t) = r0 + r1 a(t). Parameters (β, r0, r1) are fit per individual by minimizing the RMSE between predicted and observed reaction times, after regressing out biomechanical factors, practice effects, and recency using mixed effects models.
Competing models: A hierarchy of explicit ℓth-order transition models, π^(ℓ)(t) = c^(0) + Σ_{k=1}^{ℓ} c^(k) a^(k)(t), serves as a benchmark, with comparisons via RMSE and BIC.
Experiments: (1) Serial response task with random walks on two 15-node, degree-4 graphs: a modular graph (3 communities of 5 nodes) and a 3×5 lattice with periodic boundaries; transition probabilities are uniform (0.25). (2) Hamiltonian-walk condition controlling for recency: sequences interleave random and Hamiltonian walks on the modular graph, and analyses focus on the Hamiltonian trials. (3) Ring-graph violations: sequences on a 15-node ring with nearest- and next-nearest-neighbor edges include 50 interspersed novel transitions (violations) at short topological distance (2) and long topological distances (3–4) to test topological surprise. (4) n-back memory task (n = 1, 2, 3) using letter sequences; positive responses are treated as samples from P(Δt) to estimate β directly via exponential fits, and β from the n-back task is correlated with β from the serial response task.
Data processing and statistics: The first 500 trials are excluded, and incorrect and implausible RTs are filtered out. Linear mixed effects models estimate fixed effects (e.g., transition type, graph, topological distance) with subject-level random effects; model accuracy is assessed with RMSE and BIC, and significance with two-sided tests. Participants were recruited via Amazon Mechanical Turk across experiments, with non-overlapping samples and standardized instructions.
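To make the closed form concrete, here is a minimal numerical sketch (assuming Python with NumPy; the modular-graph construction follows the description above, and the function and variable names are illustrative, not the authors' code). It builds the 15-node modular graph, computes the internal estimate Â at a β near the fitted group mean, and checks that within-community transitions receive higher anticipation than between-community ones:

```python
import numpy as np

def internal_estimate(A, beta):
    """Closed form from the text: A_hat = (1 - e^-beta) A (I - e^-beta A)^-1."""
    eta = np.exp(-beta)
    return (1.0 - eta) * A @ np.linalg.inv(np.eye(len(A)) - eta * A)

# Modular graph: 3 communities of 5 nodes, every node of degree 4. Within a
# community all pairs are connected except the two boundary nodes, which
# instead connect to a boundary node of an adjacent community.
n_comm, size = 3, 5
adj = np.zeros((n_comm * size, n_comm * size))
for c in range(n_comm):
    lo = c * size
    for i in range(lo, lo + size):
        for j in range(lo, lo + size):
            if i != j:
                adj[i, j] = 1.0
    adj[lo, lo + size - 1] = adj[lo + size - 1, lo] = 0.0  # split boundary pair
for c in range(n_comm):  # ring of communities: last node of c -> first of c+1
    a, b = c * size + size - 1, ((c + 1) % n_comm) * size
    adj[a, b] = adj[b, a] = 1.0

assert (adj.sum(axis=1) == 4).all()   # uniform degree 4
A = adj / 4.0                         # uniform transition probability 0.25

A_hat = internal_estimate(A, beta=0.30)   # beta near the fitted group mean
print("within-community edge:  ", round(A_hat[1, 2], 3))
print("between-community edge: ", round(A_hat[4, 5], 3))
# Within-community anticipation exceeds between-community anticipation,
# which maps onto faster within-community reaction times.
```

Feeding the entries of Â through the behavioral mapping τ(t) = r0 + r1 a(t) with a negative r1 then yields the qualitative prediction tested in the experiments: between-community transitions are anticipated less and answered more slowly.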
Key Findings
- Network effects on expectations and RTs:
  • Cross-cluster surprisal in the modular graph: within-cluster transitions are responded to faster than between-cluster transitions. Random walks: ΔRT = 35 ± 6 ms (p < 0.001, t = 5.77, df = 1.61×10^5; n = 73). Hamiltonian walks: ΔRT = 36 ± 14 ms (p = 0.010, t = 2.59, df = 1.31×10^4; n = 120).
  • Modular-lattice effect: overall faster RTs on the modular graph than on the lattice. Random walks: ΔRT = 23 ± 6 ms (p < 0.001, t = 3.95, df = 3.33×10^5; n = 286).
  • These effects persist after controlling for recency and other covariates; accuracy is also higher for within-cluster transitions.
- Maximum entropy model predictions:
  • The model with P(Δt) ∝ e^{−βΔt} explains the emergence of higher-order structure in internal estimates: at intermediate β, communities become salient while fine-scale edges fade.
  • The analytic predictions account for both the cross-cluster surprisal and the modular-lattice effect.
- Individual-level modeling:
  • Strong linear relation between anticipation and RTs: mean slope r1 ≈ −735 ms (random walks) and −767 ms (Hamiltonian), with intercepts r0 ≈ 931 ms and 975 ms, respectively.
  • Estimated β: in random walks (358 sequences across 286 subjects), 40 sequences were best fit by β→∞ (error-free, MLE-like estimation), 73 by β→0 (no structure learned), and the remaining 245 had mean β = 0.30. In the Hamiltonian data (120 subjects), 20 were fit by β→∞, 19 by β→0, and the remaining 81 had mean β = 0.61.
  • The model outperforms explicit higher-order models up to third order in RMSE and BIC for random walks; for the Hamiltonian data it outperforms the first- and second-order models in RMSE and has lower BIC than the second- and third-order models.
- Direct memory evidence:
  • n-back measurements show that P(Δt) decreases exponentially with Δt; the combined estimate is β = 0.32 ± 0.01 (bootstrap).
  • β from the n-back task correlates with β from the serial response task (Spearman r = 0.28, p = 0.047), linking memory errors to internal transition estimates (see the sketch after this list).
- Novel transition (violation) effects depend on topology:
  • On a ring graph, RTs increase for violations relative to standard transitions; long violations (topological distances 3–4) elicit 28 ms longer RTs than short violations (distance 2) (p = 0.011). Relative to standard transitions: short +38 ms, long +63 ms (both p < 0.001). This confirms that surprise scales with topological distance, as the model predicts.
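To illustrate how β can be read off the n-back responses, here is a minimal sketch using synthetic recall offsets (assuming Python with NumPy; the data generation and names are hypothetical, standing in for the real positive-response offsets). It fits an exponential to the empirical distribution of offsets Δt, mirroring the reported combined estimate:

```python
import numpy as np

# Synthetic recall offsets: each positive n-back response yields an offset
# Delta_t between the recalled and the true target stimulus, treated as a
# sample from P(Delta_t) ~ exp(-beta * Delta_t).
rng = np.random.default_rng(0)
true_beta = 0.32
# A geometric variable with p = 1 - e^-beta, shifted to start at 0, has
# exactly the Boltzmann form P(Delta_t) = (1 - e^-beta) e^(-beta * Delta_t).
offsets = rng.geometric(p=1.0 - np.exp(-true_beta), size=5000) - 1

# Exponential fit: log P(Delta_t) is linear in Delta_t with slope -beta.
vals, counts = np.unique(offsets, return_counts=True)
mask = counts >= 10                      # keep well-sampled offsets only
log_p = np.log(counts[mask] / counts.sum())
slope, _ = np.polyfit(vals[mask], log_p, deg=1)
print(f"estimated beta = {-slope:.2f}")  # close to the generating value 0.32
```

Applied per participant, the same exponential-fit logic is what lets the study compare the memory-derived β with the β fit from reaction times in the serial response task.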
Discussion
The findings support the hypothesis that abstract representations of event structure can arise from resource-constrained cognition that trades accuracy for simplicity. By invoking the free energy principle, the model predicts an exponential memory error kernel that temporally shuffles observations, yielding internal transition estimates that accentuate higher-order network features (e.g., communities) and de-emphasize fine-scale details. This mechanism qualitatively and quantitatively explains cross-cluster surprisal, faster processing in modular vs. lattice structures, and graded surprise to novel transitions as a function of topological distance. The correspondence between directly measured memory decay (β from n-back) and β inferred from RTs strengthens the proposed link between mental errors and internal statistical representations. Compared to explicit higher-order transition learning, the maximum entropy model is more predictive with fewer computational commitments, suggesting that humans may not learn explicit higher-order chains but instead rely on a blurred, integrated representation shaped by memory noise. The work relates to, but is distinct from, frameworks like Bayesian inference with non-Markov priors and successor representations; direct evidence for the specific memory errors and parameter values differentiates the present account. The results have implications for designing optimally learnable networks: structures with hierarchical communities may be more robust to mental errors and thus more easily learned and used.
Conclusion
This study presents a principled, maximum entropy model showing that abstract, higher-order representations of transition structure can arise from natural memory errors governed by an accuracy–efficiency trade-off. The model has a concise analytic form, predicts population-level network effects and individual reaction times, and is supported by direct measurements of the memory distribution in an n-back task. It further predicts graded surprise to novel transitions based on network topology. These contributions bridge information theory, free energy principles, and human statistical learning, offering practical guidance for constructing learnable information sources. Future research should examine dynamic and non-uniform networks, neural correlates of the proposed representations, individual differences in β and their cognitive determinants, interactions with attentional and reinforcement-learning systems, and a more precise dissociation of recency and structural effects.
Limitations
- Although recency was controlled analytically and via Hamiltonian walks, the authors note that future work should further disambiguate recency effects from structural learning.
- The networks used had uniform degree and transition probabilities; generalization to heterogeneous or weighted graphs remains to be tested.
- A subset of participants was best fit by degenerate β values (β→0 or β→∞), indicating variability in strategy or engagement.
- Reaction time measures from online participants can be noisy; stringent preprocessing was used, including exclusion of early trials and outlier RTs, which might limit generalizability.
- The model assumes an exponential memory error function; while supported by the n-back data, alternative memory dynamics could operate in other contexts.
- Comparisons to alternative frameworks (e.g., the successor representation) are discussed but not adjudicated with neural evidence.
- The first-order model had a lower BIC than the maximum entropy model in the Hamiltonian condition, suggesting context-dependent reliance on statistics of different orders.