Psychology
Observational reinforcement learning in children and young adults
J. M. R. Buritica, B. Eppinger, et al.
The study investigates how children (8–10 years) and young adults (18–20 years) learn from observing others versus learning from their own outcomes. Observational learning is pervasive in social contexts (e.g., schools and playgrounds) and may confer advantages in hazardous or novel environments by allowing learning without direct action. Prior work shows adults typically outperform children in instrumental reinforcement learning, likely due to developmental improvements in cognitive control. However, the neurocomputational mechanisms of observational learning across development remain unclear. Previous EEG studies suggest children exhibit larger responses to observed outcomes but use others’ information less efficiently than adults. The authors aim to characterize and compare the computational (reinforcement learning parameters) and neural correlates (model-based fMRI prediction error signals) of observational versus individual learning across development, and to test age differences in these processes and their relation to behavior.
Reinforcement learning (RL) models explain learning via prediction errors (PEs), the deviations of outcomes from expectations. PEs are weighted by learning rates to update option values, and an inverse temperature parameter governs how closely choices follow those values. Developmental RL work reports mixed age differences in learning rates but a consistent increase in choice specificity with age. Social/observational learning can involve dual updating: from one’s own outcomes and from others’ outcomes. Neuroimaging links PEs in non-social contexts to ventral striatum and vmPFC, whereas social learning additionally recruits dmPFC and ACC (often implicated in other-related reward and monitoring), and mentalizing regions including dmPFC, TPJ, and pSTS, especially when inferring others’ mental states or during strategic interactions. Developmental studies of social RL are sparse; EEG work indicates that children show heightened responses when observing peers but less learning-related modulation and less behavioral benefit from others’ outcomes than adults. The present study extends this literature with model-based fMRI to directly compare observational and individual PE coding and their developmental differences.
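The model ingredients described above can be illustrated with a minimal sketch. It assumes a standard Q-learning update with valence-specific learning rates and a two-option softmax rule; the function names, the +1/−1 outcome coding, and the example values are hypothetical, since this summary does not give the study's exact equations.

```python
import math

def update_value(q, outcome, alpha_pos, alpha_neg):
    """Return (new_q, pe) after one outcome.

    The prediction error (PE) is the outcome minus the current expectation.
    It is scaled by alpha_pos when positive and by alpha_neg otherwise
    (an assumed convention, not confirmed by the source).
    """
    pe = outcome - q
    alpha = alpha_pos if pe > 0 else alpha_neg
    return q + alpha * pe, pe

def choice_probability(q_chosen, q_other, beta):
    """Two-option softmax: probability of picking the option valued q_chosen.

    beta is the inverse temperature: beta = 0 gives random choice (0.5),
    larger beta makes choices follow the value difference more closely.
    """
    return 1.0 / (1.0 + math.exp(-beta * (q_chosen - q_other)))

# Illustrative values only (not estimates from the study).
q, pe = update_value(q=0.0, outcome=1.0, alpha_pos=0.5, alpha_neg=0.25)
p = choice_probability(q_chosen=q, q_other=0.0, beta=2.0)
```

In this sketch a higher inverse temperature corresponds to what the text calls greater choice specificity: the same value difference translates into a more deterministic choice.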
Design: A probabilistic observational learning task was performed during fMRI. Participants made repeated choices between the two abstract stimuli of a pair (each pair presented 8 times). One option yielded gains with 80% probability and losses with 20% probability; the other reversed these odds (20% gains, 80% losses). Before each self-choice, participants observed an age- and sex-matched peer’s choice. Two conditions were intermixed across runs: (1) Individual learning (IL): no information about the other’s choice or outcomes; (2) Observational learning (OL): participants observed the other’s action and outcome before making their own choice. There were four stimulus pairs per run (two per condition) and 32 trials per run; three runs in total (~9 min each) yielded 48 trials per condition per participant.
Participants: 59 total (29 children aged 8–10 years, 18 female; 30 young adults aged 18–20 years, 16 female). One child was excluded for non-completion. All participants were right-handed, had normal or corrected vision, and had no neurological or psychiatric disorders. IQ was estimated via WISC-III subtests (Similarities, Block Design). Children had higher age-normed IQ than adults; IQ was included as a covariate in behavioral analyses.
Task timing: Each trial had an observational phase (jittered fixation 1–8 s; 1 s cue with the other’s photo; 2 s response window to reveal the other’s choice; 1 s outcome) followed by an action phase (jitter 1–8 s; 1 s self-photo cue; 2 s choice; 1 s outcome). In IL, the observational phase displayed the stimuli but not the other’s action or outcome. Missed responses (“too slow”) were rare (adults mean 2.41 trials; children mean 6.89 trials across the task).
Computational modeling: A dual-update Q-learning algorithm captured updating from others’ outcomes (OL observational stage) and own outcomes (OL action stage; IL action). Best-fitting models in both IL and OL included separate learning rates for positive and negative outcomes (αpos, αneg) and an inverse temperature β. Parameters were bounded (α in [0,1], β in [0,5]) and fit individually to choices; model selection used BIC. For model-based fMRI, trial-wise PEs were computed using median parameter estimates per age group for the best-fitting model in each condition and were scaled and mean-centered.
Behavioral analysis: Mixed-effects generalized linear models (lme4 in R) predicted accuracy with fixed effects of age group (children/adults), condition (IL/OL), trial (1–8), their interactions, and intelligence, with random intercepts and slopes (condition and trial) per subject. Robust mixed-effects models (robustlmm) tested age, condition, and valence effects on αpos, αneg, and β (with simplified random effects due to convergence constraints). An additional analysis equated information exposure across conditions (OL trials 1–4 vs IL trials 2, 4, 6, 8) to rule out confounds from the amount of information available.
fMRI acquisition: 3T Philips Achieva; T2* EPI, TR=2200 ms, TE=30 ms, 38 slices, 2.75 mm thickness, ascending acquisition; high-resolution T1 (TR=9.76 ms, TE=4.59 ms, flip angle 8°, FOV 224×177.33×168 mm, in-plane 0.875×0.875 mm, slice thickness 2 mm). There were three functional runs; the first two volumes were discarded. Motion handling included censoring volumes with framewise displacement >0.5 mm (2–12 volumes; <10% per run).
fMRI preprocessing and GLM: SPM8 was used for slice-timing correction, motion correction, T1 co-registration/segmentation, normalization to MNI305 space (12-parameter affine + nonlinear), resampling to 3 mm isotropic voxels, and 6 mm FWHM smoothing. GLMs modeled choice onsets (RT as duration) for observational and action phases in both conditions, with parametric modulators for choice value (from RL). Outcome onsets were modeled with stick functions, with separate outcome regressors for own and other’s outcomes (OL) and own/no-outcome (IL). Trial-wise PEs (from RL) were included as parametric modulators for own and other’s outcomes. Missed responses and censored motion volumes were modeled as nuisance regressors, and six motion parameters were included. Main contrasts compared PE-related activation for own outcomes in IL (action phase) versus other’s outcomes in OL (observational phase). Condition effects used cluster-level FWE correction at p<0.05 (cluster-forming p<0.001); age differences were tested with FDR cluster correction at q<0.05 (voxel-wise p<0.001). Beta values from significant clusters were extracted (MarsBaR) for brain–behavior analyses.
Brain–behavior analyses: Linear regressions related condition-specific accuracy to the corresponding PE-related activation in regions identified from whole-brain maps (including dmPFC, dlPFC, inferior parietal/TPJ, insula, vmPFC, striatum, parietal), controlling for performance in the other condition and intelligence; age interactions were also tested. FDR correction was applied across multiple ROIs.
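The computational modeling steps described in the methods (dual updating, bounded parameters, BIC, and scaled, mean-centered PEs) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes initial values of 0, outcomes coded +1/−1, the same learning rates for observed and own outcomes, a coarse grid search in place of whatever optimizer the study used, and z-scoring as the scaling step. All names are hypothetical.

```python
import math

def dual_update_nll(params, trials):
    """Negative log-likelihood of the own choices for one stimulus pair.

    params: (alpha_pos, alpha_neg, beta)
    trials: list of dicts with keys
        'obs': (option, outcome) from the observational stage, or None (IL)
        'own': (option, outcome) for the participant's own choice
    Returns (nll, pes); pes holds (observed_pe or None, own_pe) per trial.
    """
    alpha_pos, alpha_neg, beta = params
    q = [0.0, 0.0]  # assumed initial values for the two options
    nll, pes = 0.0, []

    def learn(option, outcome):
        pe = outcome - q[option]
        q[option] += (alpha_pos if pe > 0 else alpha_neg) * pe
        return pe

    for t in trials:
        # Stage 1 (OL only): update from the other's outcome.
        obs_pe = learn(*t['obs']) if t['obs'] is not None else None
        # Stage 2: score the own choice under a softmax, then update.
        choice, outcome = t['own']
        p = 1.0 / (1.0 + math.exp(-beta * (q[choice] - q[1 - choice])))
        nll -= math.log(p)
        pes.append((obs_pe, learn(choice, outcome)))
    return nll, pes

def grid_fit(trials, steps=11):
    """Coarse grid search over alpha in [0, 1] and beta in [0, 5]."""
    best = None
    for i in range(steps):
        for j in range(steps):
            for k in range(steps):
                params = (i / (steps - 1), j / (steps - 1), 5.0 * k / (steps - 1))
                nll, _ = dual_update_nll(params, trials)
                if best is None or nll < best[0]:
                    best = (nll, params)
    return best  # (lowest nll, parameters at that grid point)

def bic(nll, n_params, n_obs):
    """Bayesian information criterion; lower values indicate a better model."""
    return 2.0 * nll + n_params * math.log(n_obs)

def scale_and_center(values):
    """z-score a list of PEs (one assumed reading of 'scaled and mean-centered')."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]
```

A typical use would call `grid_fit` per participant and condition, compare candidate models with `bic`, and then pass the trial-wise PEs through `scale_and_center` before using them as parametric modulators.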
Behavioral performance:
- Overall accuracy above chance in both conditions and age groups; performance correlated between OL and IL (r=0.46, p<0.001).
- Mixed-effects GLM showed main effects: condition (OL>IL; β=0.12, t=5.2, p<0.001) and trial (β=0.04, t=3.0, p=0.005), with condition×trial interaction (β=−0.04, t=−2.2, p=0.026). Post hoc: IL improved across trials (β=0.06, t=6.3, p<0.001); OL performance stable over trials (β=0.004, t=0.43, p=0.671), indicating early benefit from observation.
- Age effects: adults > children (β=0.10, t=3.7, p<0.001) and age×trial (β=0.04, t=2.2, p=0.031). Adults improved across trials (β=0.05, t=4.6, p<0.001); children showed limited improvement (β=0.18, t=1.75, p=0.08). No age×condition interaction (p=0.9), indicating similar OL benefit across ages. Additional analysis equating information exposure confirmed OL>IL (β=0.06, t=5.03, p<0.001), no age×condition interaction (p=0.8).
Computational parameters:
- Learning rates: main effect of valence (αpos > αneg; β=0.75, t=3.4, p<0.001). Medians: αpos=0.78, αneg=0.20. No main effects of age or condition, and no interactions (all p>0.7).
- Inverse temperature (choice specificity): higher in OL than IL (medians 2.14 vs 1.03; β=0.95, t=7.4, p<0.001). Adults > children (medians 2.18 vs 1.13; β=0.91, t=4.5, p<0.001). No age×condition interaction (p=0.9).
Neuroimaging (condition differences):
- IL PEs > OL PEs: stronger activation in vmPFC (pFWE<0.05), left lateral PFC (pFWE<0.05), bilateral striatum, and bilateral parietal cortex (pFWE<0.001). No regions showed OL PEs > IL PEs.
- Age×condition interaction: left TPJ/inferior parietal cortex (pFWE<0.05), with adults showing differential PE coding for self vs other (TPJ activation increased for others’ worse-than-expected outcomes and for own better-than-expected outcomes; t(29)=5.07, p<0.001); this pattern was not observed in children (p=0.15).
Neuroimaging (within-condition PE coding):
- OL condition: PE-related activation in right lateral PFC, right inferior parietal cortex, and right insula (whole-brain F-test, pFWE<0.05); the relationship with PE was negative (more negative PEs were associated with greater activation).
- IL condition: activation correlated positively with PE in vmPFC, striatum, and parietal cortex (whole-brain F-test, pFWE<0.05).
Age differences in PE coding:
- Observational PEs: adults > children in dmPFC, right dlPFC, right inferior parietal cortex, and right insula (qFDR<0.05, voxel-wise p<0.001), showing stronger negative PE coding in adults.
- Individual PEs: no significant age differences.
Brain–behavior relations:
- Observational learning accuracy correlated with dmPFC PE-related activation (β=−1.12, t=−3.79, p<0.001; survives FDR), and with right dlPFC (p=0.037; did not survive FDR). The dmPFC relation was consistent across ages (interaction p=0.558) and specific to OL performance (IL performance included as covariate).
- Individual learning: accuracy was related to left parietal PE activation (β=0.98, p=0.007), but this did not survive multiple-comparison correction; no significant relations in vmPFC/striatum or TPJ.
Overall: Adults exhibited faster learning and more value-driven choices; both age groups benefited behaviorally from observational information. Neural PE coding for observation prominently engaged dmPFC and frontoparietal regions, with stronger negative PE coding in adults; dmPFC PE signals predicted observational learning performance across ages.
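The information-equated comparison reported in the behavioral results (OL trials 1–4 versus IL trials 2, 4, 6, 8) can be motivated by counting how many outcomes a participant has seen for a stimulus pair before each own choice. The sketch below is an inference from the task description rather than a statement from the source; it assumes one observed and one own outcome per OL trial, one own outcome per IL trial, and no missed responses, and the function name is hypothetical.

```python
def outcomes_seen_before_choice(trial, condition):
    """Outcomes seen for a pair before the own choice on a 1-based trial.

    OL: the other's outcome on this and every earlier trial, plus the own
        outcomes of all earlier trials.
    IL: only the own outcomes of all earlier trials.
    """
    if condition == "OL":
        return trial + (trial - 1)
    if condition == "IL":
        return trial - 1
    raise ValueError("condition must be 'OL' or 'IL'")

ol_counts = [outcomes_seen_before_choice(t, "OL") for t in (1, 2, 3, 4)]
il_counts = [outcomes_seen_before_choice(t, "IL") for t in (2, 4, 6, 8)]
```

Under these assumptions both trial subsets correspond to 1, 3, 5, and 7 previously seen outcomes, which is consistent with the selected trials being matched on the number of information samples.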
The study addressed how children and young adults learn from others versus from their own outcomes and which neurocomputational mechanisms underlie these processes. Behaviorally, both children and adults benefited from observational information, with OL performance starting high and remaining stable, while IL improved gradually. Adults overall outperformed children and exhibited greater choice specificity, consistent with developmental improvements in value-based decision-making and cognitive control. Computationally, learning rates were higher for positive than negative outcomes in both conditions and ages, whereas inverse temperature was higher in OL and in adults, indicating more value-driven choices particularly when observing others. Neurally, IL PEs were linked to canonical valuation regions (vmPFC, striatum, parietal), whereas OL PEs recruited dmPFC, dlPFC, insula, and inferior parietal cortex, with negative PE coding prominent during observation. Adults showed stronger observational PE coding than children in dmPFC and frontoparietal regions, suggesting maturation of networks involved in social learning, cognitive control, and social cognition. The TPJ showed condition-sensitive PE coding in adults but not children, consistent with roles in self–other distinction and social prediction. Critically, dmPFC PE responses predicted observational learning performance beyond IL, and this was age-invariant, underscoring dmPFC’s central role in leveraging others’ outcomes for learning. Together, these findings show partly overlapping but also distinct neural substrates for observational versus individual learning and delineate developmental differences particularly in the social-learning-relevant dmPFC/frontoparietal network.
This work integrates reinforcement-learning modeling with fMRI to compare individual and observational learning in children and young adults. Adults learned faster and made more value-driven choices than children, yet both age groups benefited similarly from observing others. Computationally, positive outcomes drove learning more than negative ones, and choice specificity was higher in observational contexts. Neurally, IL PEs engaged vmPFC/striatal circuits, whereas OL PEs prominently engaged dmPFC and frontoparietal regions with stronger negative PE coding, especially in adults. Importantly, dmPFC PE responses predicted observational learning performance across ages, highlighting dmPFC’s functional relevance for social observational learning. Future work should disentangle direct versus indirect observation contexts, equate information content across IL and OL, examine how task difficulty and cognitive load modulate PE signals, and investigate broader developmental trajectories (e.g., adolescence) and the exploration–exploitation trade-off in social learning.
Limitations:
- Task design differences from prior EEG studies (e.g., timing/jitter, condition complexity) may limit direct comparability and might reduce observed age-related differences between OL and IL.
- Social specificity: Although participants believed they observed a peer, the design involved indirect observation and computer-generated choices; effects could reflect general information availability rather than purely social processes.
- Information asymmetry: OL provided more immediate information than IL; although an additional analysis equated information samples across subsets of trials, the main design does not match information amounts perfectly.
- Intelligence differences: Children had higher age-normed IQ than adults; though controlled in behavioral models, residual confounding cannot be ruled out for neural differences.
- Exploration vs stochasticity: Higher stochasticity in children may reflect exploration; the task design does not disentangle exploration from noise.
- Limited age range: Only 8–10-year-olds and 18–20-year-olds were studied; adolescent trajectories and broader developmental changes remain untested.
- Potential task difficulty/cognitive load differences may contribute to activation differences, particularly in IL (greater load possibly increasing activation).
- fMRI constraints (e.g., limited trial numbers after censoring, indirect neural measures) and group-level PE regressors (using age-group medians) may limit precision of individual PE–BOLD mappings.

