Action initiation and punishment learning differ from childhood to adolescence while reward learning remains stable

R. Pauli, I. A. Brazil, et al.

This study by Ruth Pauli and colleagues examines how reward and punishment learning change from childhood to adolescence. Using a large sample and computational modelling, it finds that punishment learning rates increase with age and action initiation bias declines, while reward learning remains stable.

Introduction

Adolescence is a developmental period marked by changes in risk-taking, impulsivity, and sensitivity to reward and punishment. Theoretical accounts propose heightened reward processing and learning during adolescence, yet observed behaviours can reflect multiple underlying mechanisms. Traditional summary measures and questionnaires often cannot disentangle dynamic learning processes from response biases such as an action initiation (go) bias. To clarify which mechanisms show normative developmental change, this study used computational reinforcement learning models to separate valenced learning (from reward vs punishment) from action initiation tendencies in a large, cross-national youth sample, testing whether these processes are separable and how they vary from late childhood through adolescence.

Literature Review

Prior computational work on adolescent learning is less extensive than in adults and often does not separate learning from action biases. Studies using probabilistic and reversal learning tasks have reported heterogeneous findings: adolescent peaks in reward learning, better reward learning than adults, increased punishment learning in adolescents, mid-adolescent dips in punishment learning with later increases in reward learning, or peaks in both around late adolescence. Small sample sizes, diverse task demands, and lack of explicit action-bias assessment likely contribute to inconsistencies. A go/no-go study incorporating both learning and inhibition requirements found adolescents had attenuated go and Pavlovian biases with a generic learning rate unrelated to age, suggesting action initiation may also vary developmentally. Overall, evidence for differential developmental trajectories of reward vs punishment learning is mixed, and existing designs often cannot isolate action initiation biases from learning processes.

Methodology

Design and participants: Cross-sectional study of 742 typically developing youths (491 girls), aged 9–18 years, recruited from 11 European sites within the FemNAT-CD project. Exclusion criteria included psychiatric diagnoses, learning disability, serious physical illness, a history of disruptive behaviour disorders, and failure to meet task performance criteria. Pubertal status was assessed via the Pubertal Development Scale. IQ was measured using Wechsler scales. SES was derived from standardized parental income, education, and occupation measures; missing data were imputed using fully conditional specification.

Task: A deterministic passive avoidance reinforcement learning task with eight abstract stimuli (four reward, four punishment) each presented 10 times (80 trials). Each stimulus had a fixed point value: ±1, ±700, ±1400, ±2000. On each trial, participants chose to respond (go) or withhold (no-go). For reward-associated stimuli, go yielded points; for punishment stimuli, go lost points; no response revealed only the running total for that trial. Stimuli displayed up to 3000 ms; feedback 1000 ms. Participants started at 10,000 points. No practice trials were provided to capture initial learning from trial 1.
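The task structure described above can be sketched in a few lines of Python. This is an illustrative reconstruction from the summary only, not the authors' task code: the function names (`build_trials`, `outcome`, `run`) are hypothetical, and the shuffled trial order is an assumption, since the summary does not say how trials were sequenced.

```python
import random

# Point magnitudes stated in the summary; each occurs once as a reward (+)
# and once as a punishment (-), giving eight stimuli.
MAGNITUDES = [1, 700, 1400, 2000]
N_REPETITIONS = 10
START_POINTS = 10_000


def build_trials(seed=0):
    """Return 80 (stimulus_id, value) pairs: 8 stimuli x 10 repetitions."""
    values = MAGNITUDES + [-m for m in MAGNITUDES]
    trials = [
        (stim_id, value)
        for stim_id, value in enumerate(values)
        for _ in range(N_REPETITIONS)
    ]
    # Assumption: trial order is randomised (not stated in the summary).
    random.Random(seed).shuffle(trials)
    return trials


def outcome(value, go):
    """Deterministic feedback: points change only when the participant responds."""
    return value if go else 0


def run(policy, seed=0):
    """Play all trials with policy(stimulus_id, value) -> bool and return the final total."""
    total = START_POINTS
    for stim_id, value in build_trials(seed):
        total += outcome(value, policy(stim_id, value))
    return total
```

An always-go policy ends where it started, because gains and losses cancel; only a policy that responds to reward stimuli and withholds from punishment stimuli increases the total. This is why a go bias is costly in this design.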

Computational modelling: Seven reinforcement learning (RL) models were fitted via hierarchical maximum a posteriori (MAP) estimation with uninformative Gaussian priors and appropriate link functions. Models varied by: single vs separate reward/punishment learning rates (α or α_r, α_p), initial vs constant action initiation bias (b_i vs b_c), and magnitude sensitivity parameters (p; single or separate for reward/punishment). Response probabilities were computed using a softmax function with temperature β. Model comparison used random-effects Bayesian model selection (exceedance probability), Laplace-approximated log model evidence (LME), and integrated BIC (BIC_int). Parameter recovery and model identifiability were assessed via simulations (10,000 synthetic participants for recovery; repeated fits across the model space for identifiability). Control analyses removed first stimulus presentations (leaving repetitions 2–10) and repeated the model comparison in three age bins (9–12, 13–15, 16–18).
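To make the model family concrete, here is a minimal sketch of one variant: separate learning rates α_r and α_p, a constant action initiation bias b_c, and a temperature β. The summary does not give the equations, so several details are assumptions rather than the authors' specification: values start at 0, outcomes are coded +1/−1 (this variant has no magnitude sensitivity), the bias is added to the value of going, the no-go option has value 0, and the temperature divides the action value. It shows a single participant's likelihood only, not the hierarchical MAP fitting.

```python
import math


def p_go(q, bias, temperature):
    """Two-option softmax: go is worth (q + bias), no-go is worth 0 (assumed form)."""
    return 1.0 / (1.0 + math.exp(-(q + bias) / temperature))


def update(q, outcome, alpha_r, alpha_p):
    """Delta-rule update using the learning rate that matches the outcome's valence."""
    alpha = alpha_r if outcome > 0 else alpha_p
    return q + alpha * (outcome - q)


def neg_log_likelihood(trials, alpha_r, alpha_p, temperature, bias):
    """trials: iterable of (stimulus_id, went, outcome).

    outcome is +1 or -1 (assumed coding) and is ignored when went is False,
    because withholding a response gives no stimulus-specific feedback.
    """
    q = {}  # learned value per stimulus, assumed to start at 0
    nll = 0.0
    for stim, went, outcome in trials:
        p = p_go(q.get(stim, 0.0), bias, temperature)
        nll -= math.log(p if went else 1.0 - p)
        if went:
            q[stim] = update(q.get(stim, 0.0), outcome, alpha_r, alpha_p)
    return nll
```

Under these assumptions, a positive bias raises the probability of responding before anything has been learned, whereas a higher α_p makes the value of a punished stimulus fall faster after each loss. This is the distinction the model comparison is designed to test.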

Statistical analyses: Behavioural learning was tested with GLMMs predicting responses (go/no-go) from age, repetition, valence, and their interactions, with sex and IQ as covariates and random intercepts for site. Pubertal stage analyses paralleled age analyses. Developmental effects on model parameters (α_r, α_p, β, b_c) were examined using robust linear mixed-effects regressions with age (or pubertal stage) as predictor, sex and IQ as covariates, random intercepts for site; quadratic age terms tested nonlinearity. Bayes factors for (null) effects were derived from BIC differences. Associations between model parameters and task performance (overall accuracy) used Spearman correlations.
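The summary says only that Bayes factors were derived from BIC differences. A common way to do this is the approximation BF01 ≈ exp((BIC_alt − BIC_null) / 2), sketched below. Whether the authors used exactly this form is an assumption, and the function names are hypothetical.

```python
import math


def bf01_from_bic(bic_null, bic_alt):
    """Approximate evidence for the null over the alternative from two BIC values."""
    return math.exp((bic_alt - bic_null) / 2.0)


def bf10_from_bic(bic_null, bic_alt):
    """Approximate evidence for the alternative over the null (reciprocal of BF01)."""
    return 1.0 / bf01_from_bic(bic_null, bic_alt)
```

Under this approximation, a model without the age term that has a lower BIC than the model with it gives BF01 > 1, which is how a null effect such as stable reward learning can be positively supported rather than merely not rejected.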

Key Findings
  • Behavioural learning occurred overall and within reward and punishment conditions: repetition improved correct responses (overall OR=1.19 [1.17, 1.21], z=18.56, p<0.001; reward-only OR=1.13 [1.09, 1.16], z=8.48, p<0.001; punishment-only OR=1.28 [1.26, 1.32], z=19.65, p<0.001).
  • Age effects on behaviour: older age predicted greater overall learning (age×repetition OR=1.02 [1.01, 1.04], z=2.49, p=0.01) and more total correct responses (OR=1.08 [1.04, 1.11], z=4.58, p<0.001). Critically, improvement was specific to punishment (age×repetition×valence OR=1.09 [1.05, 1.13], z=4.65, p<0.001), with no age×repetition effect in reward-only analyses (OR=0.98 [0.95, 1.01], p=0.14) and a significant effect in punishment-only analyses (OR=1.07 [1.05, 1.11], z=5.63, p<0.001). Reward learning stability across age was strongly supported (BF01=57.80).
  • Model comparison: A model with separate α_r and α_p, constant action initiation bias b_c, and single magnitude sensitivity initially had the highest exceedance probability (0.99) and LME, but its magnitude parameter showed poor recoverability (r=0.11). The selected winning model therefore included separate α_r and α_p and constant b_c (2αβb_c) without magnitude sensitivity; it showed strong fit, good parameter recovery and identifiability, and won consistently across age bins. Results were unchanged when excluding first presentations.
  • Developmental parameter trajectories:
    • Punishment learning rate increased with age: B=0.10 [0.05, 0.15], z=4.12, p<0.001; decisive evidence (BF10=167.25).
    • Action initiation bias decreased with age: B=−0.20 [−0.28, −0.12], z=−4.91, p<0.001; decisive evidence (BF10=8926.69).
    • Reward learning rate was stable with age: B=0.01 [−0.05, 0.07], z=0.30, p=0.77; strong evidence for null (BF01=13.60).
    • No age effect on temperature β: B=0.002 [−0.07, 0.07], z=0.08, p=0.94; strong evidence for null (BF01=26.11). Quadratic age terms were nonsignificant for all parameters.
  • Pubertal stage yielded similar patterns: higher stage was associated with higher punishment learning (B≈1.20×10^−3, z=3.32, p<0.001), lower action initiation bias (B≈−1.06×10^−3, z=−3.57, p<0.001), and no association with reward learning (p=0.22) or temperature.
  • Parameter-performance associations: Overall accuracy correlated positively with α_r (Spearman r=0.40) and α_p (r=0.68) and negatively with action bias (r=−0.26) and temperature (r=−0.39) (all p<0.001), indicating optimal performance required higher learning rates and lower action initiation bias.
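
The odds ratios above are multiplicative effects on the odds of a correct response, not on its probability. The sketch below shows how an odds ratio moves a baseline probability. It assumes the odds ratio applies per one-unit step of the predictor (for example, one additional repetition); the summary does not state how predictors were scaled, and the function name is hypothetical.

```python
def apply_odds_ratio(p, odds_ratio, steps=1):
    """Return the probability after multiplying the odds of p by odds_ratio, steps times."""
    odds = p / (1.0 - p)
    new_odds = odds * odds_ratio ** steps
    return new_odds / (1.0 + new_odds)
```

For instance, `apply_odds_ratio(0.5, 1.19, steps=9)` gives the accuracy implied after nine further repetitions for a hypothetical participant starting at chance, under the per-repetition assumption stated above.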
Discussion

The study disentangles valenced learning from action initiation during adolescence. Contrary to common assertions of heightened reward learning in adolescence, reward learning rates were stable across ages in this deterministic passive avoidance context, whereas punishment learning rates increased with age and action initiation biases declined. These asymmetric developmental changes suggest that apparent increases in reward-driven behaviour may often reflect reduced impulsive action initiation rather than enhanced reward learning. The findings were robust across chronological age and pubertal stage measures and validated through rigorous model comparison, recovery, and identifiability procedures in a large, multinational sample. The results have implications for theories of adolescent decision-making and for understanding psychopathologies with disrupted reinforcement learning and impulsivity, such as conduct disorder, where deviations from these normative trajectories could contribute to symptoms. They also suggest that interventions targeting action control may reduce risky behaviours that superficially appear reward-driven.

Conclusion

Using computational modelling in a large cross-sectional sample of youths, behaviour was best captured by a model with separate reward and punishment learning rates and a constant action initiation bias. Across adolescence, punishment learning increased and action initiation biases declined, while reward learning remained stable. These findings refine developmental theories that emphasize reward sensitivity by highlighting the importance of action initiation and punishment learning. Future research should employ tasks incorporating full action–valence crossovers (e.g., go-to-avoid-punishment, no-go-to-gain-reward), probabilistic outcomes, and additional parameters (e.g., variable learning rates, choice stickiness, forgetting) to map developmental mechanisms more comprehensively and to test cultural or contextual moderators.

Limitations
  • Task design lacked no-go-to-gain-reward and go-to-avoid-punishment conditions, preventing assessment of Pavlovian action biases.
  • Outcomes were deterministic rather than probabilistic or volatile, which may limit generalizability to other learning contexts.
  • The modelling space did not include variable learning rates, choice stickiness, or forgetting processes; some parameters showed moderate intercorrelations.
  • Cross-sectional design precludes within-person developmental inference.
  • Although multinational, site-specific or cultural effects were not the focus and may warrant dedicated investigation.