
The rational use of causal inference to guide reinforcement learning strengthens with age
A. O. Cohen, K. Nussenbaum, et al.
A study by Alexandra O. Cohen, Kate Nussenbaum, Hayley M. Dorfman, Samuel J. Gershman, and Catherine A. Hartley examines how beliefs about environmental controllability shape learning from childhood to adulthood. Adults and adolescents used causal inference to guide their learning, whereas younger children tended to update their choices based on outcomes alone.
Introduction
The study investigates how individuals across development infer and use the causal structure of their environment to guide reinforcement learning. Real-world outcomes often result from external, unobservable causes, reducing controllability and complicating simple action–reward associations. Adults are known to modulate learning based on inferred controllability, discounting outcomes likely caused by external agents. The research questions are whether children and adolescents can infer latent external causes of positive and negative outcomes and whether they incorporate these beliefs to adjust credit assignment during learning. Using a task with hidden agents producing valenced or random outcomes, the authors hypothesized that the rational use of causal beliefs to guide value updating would increase with age.
Literature Review
Prior work shows that adults adapt learning to inferred controllability, discounting outcomes attributable to hidden causes (Dorfman et al., 2019). Infants and toddlers can infer hidden causes and link them to probabilistic events, indicating that causal inference abilities develop early. However, developmental studies suggest marked changes in how causal relationships are learned from childhood through adolescence into adulthood, with adolescents showing patterns distinct from those of younger and older individuals. Emerging evidence indicates that children and adolescents may rely more on simpler, model-free action–outcome learning and may underuse knowledge of complex reward structure. Adults display optimistic attribution biases (attributing negative outcomes to external causes more often than positive ones), which have been linked to perceived control. These strands motivate assessing both explicit causal attributions and their use in reinforcement learning across ages.
Methodology
Participants: Ninety volunteers aged 7–25 years (mean age = 15.89, SD = 5.24; 47 female) from New York City completed the study. Recruitment targeted equal numbers of children (7–12 years; mean = 10.13, SD = 1.89), adolescents (13–17; mean = 15.54, SD = 1.50), and adults (18–25; mean = 21.99, SD = 2.34). Twelve additional participants were excluded for making fewer than 60% optimal choices. Individuals with psychiatric diagnoses, learning disabilities, use of beta blockers or psychoactive medications, or colorblindness were not eligible. The study was IRB-approved, informed consent or assent was obtained, and participants were paid $15/hour plus a $5 bonus.
Task: The task was adapted from Dorfman et al. (2019). On each trial, participants sought gold by choosing between two mines. Within each block, baseline reward probabilities were fixed: one mine yielded gold with 80% probability (the optimal choice) and the other with 20%. Participants completed three blocks (territories) of 50 trials each: Millionaire (a benevolent agent sometimes put gold in both mines), Robber (an adversarial agent sometimes replaced gold with rocks in both mines), and Sheriff (a random agent sometimes placed gold or rocks at random). In each block, the hidden agent intervened on 30% of trials. Participants were told which territory they were in and that interventions were infrequent, but not the exact rate. After each choice and a 2 s outcome display (gold or rocks), participants indicated whether the hidden agent had caused the outcome (yes/no). Both the choice and the attribution were self-paced. Practice trials demonstrated probabilistic outcomes and the interventions each agent could make, with corrective feedback to ensure participants understood the agents' effects. Six counterbalanced task versions kept reward structures comparable, and territory order was counterbalanced. The task was implemented in PsychoPy 1.85.6.
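As an illustration of the generative process described above, the sketch below simulates the intervention schedule and outcomes for one 50-trial block. It is a minimal sketch, not the authors' PsychoPy code. The names `make_block`, `draw_outcome`, `AGENT_GOLD_PROB`, and `BASE_GOLD_PROB` are hypothetical, and it assumes that exactly 30% of trials (15 of 50) are interventions placed in random order and that the Sheriff's random intervention yields gold with probability 0.5.

```python
import random

# Gold probability when the hidden agent intervenes.
# Assumption: the Sheriff's random intervention yields gold with probability 0.5.
AGENT_GOLD_PROB = {"Millionaire": 1.0, "Robber": 0.0, "Sheriff": 0.5}
# Baseline (non-intervention) gold probabilities, fixed within a block.
BASE_GOLD_PROB = {"better": 0.8, "worse": 0.2}


def make_block(rng: random.Random, n_trials: int = 50, rate: float = 0.3) -> list:
    """Intervention flags for one block (True = the hidden agent acts on that trial).

    Assumption: exactly round(rate * n_trials) trials are interventions, in random order.
    """
    n_interventions = round(rate * n_trials)
    flags = [True] * n_interventions + [False] * (n_trials - n_interventions)
    rng.shuffle(flags)
    return flags


def draw_outcome(rng: random.Random, territory: str, mine: str, intervened: bool) -> int:
    """Return 1 (gold) or 0 (rocks) for the chosen mine on one trial."""
    p_gold = AGENT_GOLD_PROB[territory] if intervened else BASE_GOLD_PROB[mine]
    return int(rng.random() < p_gold)


rng = random.Random(2024)
flags = make_block(rng)
# Outcomes of always choosing the better mine in the Robber territory.
outcomes = [draw_outcome(rng, "Robber", "better", f) for f in flags]
```

Because outcomes on intervention trials come from the agent rather than the mine, a learner that treats every outcome as informative about the chosen mine will misestimate its value; this is the credit-assignment problem the study targets.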
Effective average probabilities of gold, accounting for the 30% intervention rate, were as follows. Millionaire: better mine 85.74%, worse mine 42.65%. Robber: better mine 55.62%, worse mine 10.89%. Sheriff: better mine 71.15%, worse mine 29.37%.
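These effective probabilities follow from mixing the baseline rate with the agent's outcome. With intervention rate p, baseline probability q, and agent gold probability g, P(gold) = (1 − p)q + pg; for the better mine in the Millionaire territory this gives 0.7 × 0.8 + 0.3 × 1 = 0.86. The sketch below computes these idealized expected values, assuming g = 0.5 for the Sheriff (the function name is hypothetical). The reported percentages are presumably averages over the realized trial sequences of the task versions, so they deviate somewhat from the idealized values, most noticeably for the Robber territory's worse mine (10.89% reported versus 14% expected).

```python
# Gold probability when the hidden agent intervenes.
# Assumption: the Sheriff's random intervention yields gold with probability 0.5.
AGENT_GOLD_PROB = {"Millionaire": 1.0, "Robber": 0.0, "Sheriff": 0.5}


def effective_gold_prob(base: float, agent_gold: float, rate: float = 0.3) -> float:
    """Expected P(gold) = (1 - rate) * base + rate * agent_gold."""
    return (1.0 - rate) * base + rate * agent_gold


# (better mine, worse mine) with baseline probabilities 0.8 and 0.2.
expected = {
    territory: (effective_gold_prob(0.8, g), effective_gold_prob(0.2, g))
    for territory, g in AGENT_GOLD_PROB.items()
}
for territory, (better, worse) in expected.items():
    print(f"{territory:<12} better mine {better:.0%}, worse mine {worse:.0%}")
```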
Statistical analyses: Trial-wise logistic mixed-effects models (lme4 glmer) examined (a) attributions and (b) optimal choices. Fixed effects included territory, reward outcome (attribution models), trial number (learning models), continuous age and age², and their interactions; models included random intercepts per participant and random slopes for within-subject effects and their interactions. Age and trial number were z-scored. Likelihood ratio tests showed that including both linear and quadratic age terms improved model fit.
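The analyses were run with lme4 in R, and the Python sketch below does not reproduce `glmer`. It only illustrates how a likelihood ratio test between nested models turns two maximized log-likelihoods into a p-value: the statistic is 2(LL_full − LL_reduced), referred to a χ² distribution whose degrees of freedom equal the number of added parameters (one for a single age² term). To stay within the standard library, the tail probability uses closed forms that exist only for 1 and 2 degrees of freedom. The function names and log-likelihood values are hypothetical.

```python
import math


def chi2_sf(stat: float, df: int) -> float:
    """Upper-tail probability P(X >= stat) for X ~ chi-square(df), df in {1, 2}.

    df = 1: X is a squared standard normal, so P = erfc(sqrt(stat / 2)).
    df = 2: X is exponential with mean 2, so P = exp(-stat / 2).
    """
    if stat < 0:
        raise ValueError("chi-square statistic must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("closed form implemented here only for df = 1 or 2")


def likelihood_ratio_test(loglik_reduced: float, loglik_full: float, df: int):
    """Return (statistic, p-value) for nested models differing by df parameters."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    # Clamp tiny negative values that can arise from optimizer imprecision.
    return stat, chi2_sf(max(stat, 0.0), df)


# Hypothetical log-likelihoods: a model with linear age vs. one adding an age^2 term.
stat, p = likelihood_ratio_test(loglik_reduced=-1000.0, loglik_full=-997.0, df=1)
```

As a sanity check on the tail function, `chi2_sf(4.85, 1)` is about 0.028 and `chi2_sf(6.89, 2)` is about 0.032, in line with the p-values reported for those χ² statistics under Key Findings.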
Computational modeling: Seven models were fit to choices. Three RL models did not use attribution beliefs (one learning rate; two learning rates, α+ and α−; and three learning rates, one per territory). Four Bayesian RL variants incorporated causal beliefs: empirical Bayesian (the learning rate is scaled by the posterior probability of agent intervention, with the prior intervention probability derived from each participant's overall attribution rate), empirical Bayesian by territory (priors derived separately for each territory and outcome), adaptive Bayesian (the intervention probability is learned online from experience), and noisy Bayesian (like empirical Bayesian, but allowing a probability ε of believing in interventions the agent could not make). All models used a softmax choice function with inverse temperature β and a stickiness parameter λ capturing the tendency to repeat choices. Models were compared using random-effects Bayesian model selection on Laplace-approximated log marginal likelihoods (mfit), computing protected exceedance probabilities (PXPs) within each age group. Model recovery: 10,000 agents per model were simulated from the empirical parameter distributions and attribution rates; only simulated agents with >60% optimal choices were retained, and recoverability was assessed with PXPs. Simulations: each model was run 100 times per participant, using that participant's fitted parameters and trial order, to compare qualitative learning trajectories.
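A minimal sketch of the non-Bayesian model family, assuming common conventions that the text does not spell out: values start at 0.5, the stickiness bonus λ is added to the logit of whichever mine was chosen on the previous trial, and the two-learning-rate variant applies α+ to positive and α− to negative prediction errors. Function names are hypothetical. Fitting would minimize `negloglik` over the parameters; the study instead used mfit with Laplace-approximated marginal likelihoods, which this sketch does not implement.

```python
import math
from typing import List, Optional, Sequence


def choice_probs(q: Sequence[float], beta: float, lam: float,
                 prev: Optional[int]) -> List[float]:
    """Softmax over mines: logit_i = beta * Q_i + lam * [i == previous choice]."""
    logits = [beta * v + (lam if i == prev else 0.0) for i, v in enumerate(q)]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def negloglik(choices: Sequence[int], rewards: Sequence[int], alpha: float,
              beta: float, lam: float, alpha_neg: Optional[float] = None,
              q0: float = 0.5) -> float:
    """Negative log-likelihood of one block of choices under a delta-rule model.

    alpha_neg=None gives the one-learning-rate model (alpha for every prediction
    error); otherwise alpha / alpha_neg apply to positive / negative prediction
    errors (the two-learning-rate model).
    """
    if alpha_neg is None:
        alpha_neg = alpha
    q = [q0, q0]  # assumed initial value of each mine
    prev: Optional[int] = None
    nll = 0.0
    for c, r in zip(choices, rewards):
        nll -= math.log(choice_probs(q, beta, lam, prev)[c])
        delta = r - q[c]  # reward prediction error for the chosen mine
        q[c] += (alpha if delta > 0 else alpha_neg) * delta
        prev = c
    return nll


# Children's group-mean estimates (alpha = 0.38, beta = 4.85); lam = 0.5 is hypothetical.
nll = negloglik([0, 0, 1, 0], [1, 0, 0, 1], alpha=0.38, beta=4.85, lam=0.5)
```

Keeping the one- and two-learning-rate models in a single function makes the nesting explicit: setting α− equal to α+ recovers the simpler model.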
Key Findings
Causal attributions: Participants' attributions aligned with task structure. Significant reward outcome by territory interaction: χ²(2, N=90)=87.69, p<0.0001; negative outcomes most attributed to Robber, then Sheriff, rarely Millionaire; positive outcomes most to Millionaire, then Sheriff, rarely Robber. Reward outcome by age interaction: χ²(1, N=90)=4.85, p=0.028; younger participants more often attributed positive outcomes to external agents than older participants; attribution of negative outcomes was relatively age-invariant. Main effects: territory χ²(2)=45.04, p<0.0001; reward outcome χ²(1)=7.44, p=0.006; age χ²(1)=17.09, p<0.0001; age² χ²(1)=6.14, p=0.013. Younger participants attributed more outcomes to hidden agents overall.
Learning performance: Main effects indicated learning across trials and with age: trial number χ²(1)=100.46, p<0.0001; age χ²(1)=13.97, p<0.001; age² χ²(1)=5.17, p=0.023. Interactions: trial number by territory χ²(2)=15.40, p<0.001 (steeper learning in Millionaire and Sheriff than Robber); territory by age² χ²(2)=6.89, p=0.032; trial number by territory by age² χ²(2)=6.81, p=0.033; marginal trial number by age χ²(1)=3.82, p=0.051. Older participants learned faster overall; younger participants showed better learning in Millionaire territory.
Model comparison: Children (7–12) best fit by one learning rate model (PXP=0.98; others <0.01), indicating updating based on outcomes without discounting for inferred agent intervention. Adolescents (13–17) best fit by adaptive Bayesian model (PXP=0.89; others <0.10), suggesting flexible estimation of intervention probabilities guiding learning. Adults (18–25) best fit by empirical Bayesian model (PXP=0.75; others <0.08), indicating learning modulated by explicitly reported beliefs about agent intervention. These results replicate adult findings from Dorfman et al. (2019) and show developmental shifts in integrating causal structure into RL.
Model recovery: All three best-fitting models (one learning rate, adaptive Bayesian, empirical Bayesian) were recoverable in simulations (PXP = 1 for each). After accuracy filtering (>60% optimal choices), between 9,691 and 9,748 simulated participants remained per model.
Simulations: One learning rate model produced differing learning trajectories across territories with dips in Robber territory due to undiscounted negative prediction errors; this matched children’s empirical patterns. Bayesian models produced more similar trajectories across territories, aligning with adolescent and adult behavior.
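The dip can be seen in a single update. The sketch below contrasts an undiscounted delta-rule update with one whose learning rate is multiplied by the posterior probability that the agent did not intervene, which is one way to read the empirical Bayesian model's scaling of the learning rate by the posterior probability of agent intervention; it is not necessarily the authors' exact parameterization. It assumes a fixed prior intervention probability of 0.3 (an empirical-Bayesian-style prior), treats the chosen mine's current value as the probability of gold absent intervention, and gives the Sheriff's random intervention a 0.5 chance of gold. Names are hypothetical. With α = 0.38 (the children's mean estimate), one rocks outcome from a mine valued at 0.8 in the Robber territory drops the undiscounted value to 0.496, while the discounted update stops near 0.70, because Bayes' rule assigns about 68% (0.3 / 0.44) of the blame for the rocks to the Robber.

```python
# Gold probability when the hidden agent intervenes.
# Assumption: the Sheriff's random intervention yields gold with probability 0.5.
AGENT_GOLD_PROB = {"Millionaire": 1.0, "Robber": 0.0, "Sheriff": 0.5}


def posterior_intervention(territory: str, reward: int, q_chosen: float,
                           prior: float) -> float:
    """P(agent intervened | outcome) by Bayes' rule.

    Assumption: q_chosen is used as P(gold | no intervention) for the chosen mine.
    """
    g = AGENT_GOLD_PROB[territory]
    lik_agent = g if reward == 1 else 1.0 - g
    lik_no_agent = q_chosen if reward == 1 else 1.0 - q_chosen
    num = prior * lik_agent
    denom = num + (1.0 - prior) * lik_no_agent
    return num / denom if denom > 0 else 0.0


def update(q_chosen: float, reward: int, alpha: float, p_agent: float = 0.0) -> float:
    """Delta-rule update with the learning rate scaled by (1 - p_agent)."""
    return q_chosen + alpha * (1.0 - p_agent) * (reward - q_chosen)


# One rocks outcome from the better mine in the Robber territory.
q, alpha, prior = 0.8, 0.38, 0.3
p_agent = posterior_intervention("Robber", 0, q, prior)
q_one_lr = update(q, 0, alpha)          # undiscounted, as in the one-learning-rate model
q_bayes = update(q, 0, alpha, p_agent)  # discounted by the inferred intervention
```

The same function shows why no discounting occurs for rocks in the Millionaire territory: that agent never produces rocks, so the posterior probability of intervention is zero and the full prediction error is applied.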
Parameter estimates (means [SE]) from best-fitting models: Children (one learning rate): α=0.38 (0.04), β=4.85 (0.30). Adolescents (adaptive Bayesian): β=4.87 (0.39), Φ=0.94 (0.15). Adults (empirical Bayesian): β=5.43 (0.35), Φ=0.85 (0.22).
Discussion
The study demonstrates that although individuals of all ages can explicitly infer the latent causal structure (appropriately attributing outcomes to benevolent, adversarial, or random agents), the implicit use of these beliefs to guide reinforcement learning strengthens with age. Children showed higher overall attribution rates and were especially likely to attribute positive outcomes to agents, but their choice behavior was best captured by a simple, model-free RL model with a single learning rate, indicating limited discounting of uncontrollable outcomes. Adolescents and adults incorporated causal structure into learning via Bayesian RL: adolescents favored a more flexible adaptive approach that estimates intervention probabilities online, whereas adults relied more on their explicit attribution beliefs (empirical Bayesian). These developmental differences suggest increasing sensitivity to controllability and an age-related shift from model-free toward more model-based, causally informed learning strategies. The findings connect to the literature on perceived control and optimistic biases: because younger participants attributed positive outcomes to external agents more often, they may show a weaker optimistic attribution bias than adults. Potential neural underpinnings include the protracted development of prefrontal-hippocampal-striatal circuits supporting model-based control and proactive behavior. The results highlight adolescence as a period of emerging flexible use of mental models in decision-making, which may be adaptive in novel, volatile contexts.
Conclusion
Children, adolescents, and adults can learn the causal structure of environments with hidden agents, but the rational use of these beliefs to modulate reinforcement learning increases with age. Children rely predominantly on outcome-driven, undifferentiated updating, whereas adolescents and adults discount outcomes likely caused by external agents in a manner consistent with Bayesian inference, with adolescents showing more flexible estimation of intervention probabilities and adults more directly mapping explicit beliefs onto learning. This work extends prior adult findings to development and clarifies how controllability beliefs shape learning across ages. Future research should: (1) test whether more explicit, observable interventions change children’s integration of causal information into learning; (2) develop models capturing potential age-specific priors over external causes; (3) better manipulate and account for task structure to reconcile mixed findings on valence asymmetries; and (4) investigate neural mechanisms supporting the maturation of causally informed, model-based learning.
Limitations
The influence of external causes was invisible and ambiguous; participants could not know with certainty whether an agent intervened on a given trial and had to rely on inference. Children, despite higher attribution rates, may have been less certain and thus underweighted these beliefs in value updating. There was heterogeneity in best-fitting models within age groups, particularly among younger participants, and increased variability in children’s choice behavior, suggesting individual differences not fully captured by the implemented Bayesian models. The modeling framework may not encompass all ways younger participants incorporate causal beliefs (e.g., different priors on external causes).