Dopamine transients encode reward prediction errors independent of learning rates

A. Mah, C. E. Golden, et al.

Biological models tie dopamine to reward prediction errors (RPEs) scaled by learning rates. Research conducted by Andrew Mah, Carla E.M. Golden, and Christine M. Constantinople shows that in a volatile, semi-observable-state task rats adjust initiation speed and use higher learning rates after state transitions, approximating Bayesian belief updates. Crucially, nucleus accumbens core dopamine encodes RPEs but not learning rates, pointing to dopamine-independent mechanisms for dynamic learning rates.
Introduction

Reinforcement learning (RL) algorithms update state/action values using reward prediction errors (RPEs). While learning rates are often treated as static, behavior across species indicates dynamic learning rates that increase in volatile contexts. Dopamine is thought to convey biological RPEs to the striatum, driving dopamine-dependent plasticity of corticostriatal synapses. A central open question is whether dopamine encodes only the RPE or the RPE scaled by a learning rate. To address this, the authors studied rats in a volatile environment with latent reward blocks, measuring trial initiation times as a continuous readout of estimated state value and recording NAcc dopamine release to determine how learning rate dynamics are represented neurally.

Literature Review

Prior work established dopamine as encoding quantitative RPEs and driving learning via striatal plasticity (Schultz et al., Bayer & Glimcher; Steinberg et al.). Behavioral evidence across humans, non-human primates, and rodents supports dynamic learning rates adapting to environmental volatility (Behrens et al.; Nassar et al.; McGuire et al.). The basal ganglia RL framework posits cortical state representations with values stored in corticostriatal synapses modulated by dopamine. However, whether dopamine carries only RPEs or the product of RPE and learning rate remains unresolved, particularly when learning rates vary. Work on hidden-state inference shows dopamine can reflect beliefs under probabilistic timing (Starkweather et al.), and other neuromodulators (serotonin, norepinephrine) have been implicated in encoding uncertainty and modulating learning rates.

Methodology

Behavioral task: Rats performed a self-paced temporal wagering task with semi-observable reward blocks (low: 5, 10, 20 µL; high: 20, 40, 80 µL; mixed: all five volumes). On each trial, an auditory tone signaled the offered reward; rewards were delivered after variable delays (exponentially distributed, mean ≈ 2.5 s) on 75%–85% of trials and withheld on the remaining 15%–25% (catch trials). Rats could opt out at any time. Trial initiation time (the time from the final reward or opt-out port poke to the start of the next trial) served as a continuous measure inversely proportional to estimated environmental value.
Cohort: 347 Long-Evans rats (215 males, 132 females) contributed behavioral data; photometry was collected in 14 rats.
Behavioral analyses: Trial initiation times were z-scored per rat, satiety effects were regressed out, and the influence of prior rewards was quantified by regressing initiation times on previous log2(reward) offers. Early (first 10) and late (last 10) mixed-block trials were compared to assess learning dynamics around block transitions.
RL modeling: Value was updated as V_{t+1} = V_t + α_t(R_t − V_t), with trial initiation time TI = D/V. Static learning rate models were fit separately to early and late trials; parameters were estimated by constrained optimization (fmincon) under log-normal noise, with cross-validation and held-out validation. Three dynamic learning rate models were tested: (1) Mackintosh, with gain g_t = log2(R_t); (2) Pearce-Hall, with g_t = |RPE_{t−1}|; (3) ΔBelief, with g_t = 1/(1 − |B_t − B_{t−1}|), where B_t is the posterior probability of the mixed block from Bayesian inference over blocks. Model predictions included changes in initiation-time variance within blocks and a trial-by-trial dependence on ΔBelief after controlling for RPE.
Normative comparison: A Bayesian online changepoint detection (BOCPD) model (hazard rate 1/40) estimated run-length posteriors and changepoint probabilities; a truncated implementation limited run lengths to fewer than 75 trials.
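To make the ΔBelief model concrete, here is a minimal Python sketch of the value update with a belief-driven gain. Only the update rule, TI = D/V, the gain formula, and the block contents come from the summary above. Everything else is an assumption made for illustration: offers are uniform within a block, blocks switch with a fixed per-trial probability (`hazard`, borrowed from the BOCPD setting), rewards enter the update in log2 units, and the effective learning rate is `alpha0 * g_t` capped at 1. The function names are hypothetical and this is not the authors' fitting code.

```python
import numpy as np

REWARDS = np.array([5, 10, 20, 40, 80])  # offered volumes in µL
# Likelihood of each offer under each block (rows: low, high, mixed).
# Assumption: offers are uniformly distributed within a block.
LIK = np.array([
    [1/3, 1/3, 1/3, 0.0, 0.0],   # low:   5, 10, 20
    [0.0, 0.0, 1/3, 1/3, 1/3],   # high:  20, 40, 80
    [1/5, 1/5, 1/5, 1/5, 1/5],   # mixed: all volumes
])
MIXED = 2


def belief_update(prior, volume, hazard=1/40):
    """One step of Bayesian inference over blocks; returns the posterior."""
    # Assumption: a block ends with probability `hazard` per trial and is
    # followed by either of the other two blocks with equal probability.
    n = len(prior)
    trans = np.full((n, n), hazard / (n - 1))
    np.fill_diagonal(trans, 1 - hazard)
    predicted = trans.T @ prior
    post = predicted * LIK[:, np.searchsorted(REWARDS, volume)]
    return post / post.sum()


def simulate(volumes, alpha0=0.2, D=1.0):
    """Return one row per trial: (initiation time, learning rate, RPE)."""
    belief = np.full(3, 1/3)   # flat prior over blocks
    V = np.log2(20.0)          # assumption: start at the middle offer
    rows = []
    for vol in volumes:
        new_belief = belief_update(belief, vol)
        d_belief = abs(new_belief[MIXED] - belief[MIXED])
        gain = 1.0 / max(1.0 - d_belief, 1e-6)   # g_t = 1/(1 - |B_t - B_{t-1}|)
        alpha = min(alpha0 * gain, 1.0)          # assumption: alpha_t = alpha0 * g_t
        rpe = np.log2(vol) - V                   # R_t - V_t in log2 units
        rows.append((D / V, alpha, rpe))         # TI = D / V, before the update
        V += alpha * rpe                         # V_{t+1} = V_t + alpha_t * RPE
        belief = new_belief
    return np.array(rows)


# Fifteen low-block offers followed by one offer that only high or mixed can produce.
result = simulate([5, 10, 20] * 5 + [80])
```

In this sketch the learning rate on the final trial rises above its value on the preceding trials, because the 80 µL offer forces a large change in the mixed-block posterior. Swapping the `gain` line for `np.log2(vol)` or the previous trial's `abs(rpe)` would give the Mackintosh and Pearce-Hall variants as defined above.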
Photometry: NAcc dopamine was recorded via fiber photometry using GRABDA2h (AAV9-hSyn-GRAB_DA2h), with an mCherry control (AAV1-CB7-Cl-mCherry) used to correct motion artifacts (TMAC). Signals were z-scored per session, and dopamine was quantified as the area under the curve (AUC) from 0 to 0.5 s after each event. Offer-cue responses were analyzed across reward volumes and blocks; reward history regressions included current and previous trial offers. Dopamine-RPE relationships were assessed by regressing AUC against model-estimated RPEs, separately for positive and negative RPEs, in early vs. late trials. Delay-period dopamine was analyzed by aligning to delay start and end and binning by reward delay, to examine ramps and phasic responses.
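The AUC measure and the sign-split regression described above can be sketched as follows. This is a rough illustration rather than the authors' pipeline: the sampling rate `fs`, the trapezoidal integration, the use of ordinary least squares, and the function names are all assumptions.

```python
import numpy as np


def event_auc(signal, fs, event_idx, window_s=0.5):
    """Trapezoidal area under a z-scored trace from 0 to window_s after an event.

    signal: 1-D array of z-scored fluorescence; fs: samples per second;
    event_idx: sample index of the event (for example, the offer cue).
    """
    n = int(round(window_s * fs))
    seg = np.asarray(signal, dtype=float)[event_idx:event_idx + n + 1]
    return float(np.sum((seg[:-1] + seg[1:]) / 2.0) / fs)


def slopes_by_sign(rpe, auc):
    """Least-squares slope of AUC on RPE, fit separately for RPE > 0 and RPE < 0."""
    rpe = np.asarray(rpe, dtype=float)
    auc = np.asarray(auc, dtype=float)
    slopes = {}
    for name, mask in (("positive", rpe > 0), ("negative", rpe < 0)):
        slope, _intercept = np.polyfit(rpe[mask], auc[mask], 1)
        slopes[name] = slope
    return slopes
```

Running `slopes_by_sign` once on early mixed-block trials and once on late trials gives the two pairs of slopes that the early vs. late comparison is based on.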

Key Findings
  • Behavior: Trial initiation times were inversely related to block value; rats initiated more slowly in low vs. high blocks (p < 0.001, Wilcoxon signed-rank test, N = 347). Regression of initiation times on previous rewards showed larger coefficients for recent rewards decaying with lag, consistent with RL value estimation.
  • Dynamic learning rates: Following transitions into mixed blocks, rats exhibited fast learning early and slower learning later. Early trials integrated over fewer past trials (higher learning rates) than late trials; exponential fits to previous reward coefficients had smaller time constants early vs. late (p < 0.001, paired Wilcoxon signed-rank test, N = 347). Static RL models fit separately to early vs. late trials recovered higher α early (p < 0.001, Wilcoxon signed-rank test, N = 347) and fewer significant previous-trial coefficients; parameter sets generalized best within their respective phases.
  • ΔBelief model: Initiation time variance was higher early vs. late in mixed blocks, consistent with ΔBelief predictions and not with Pearce-Hall or Mackintosh models. For matched RPE bins, changes in initiation time were larger on trials with high ΔBelief than low ΔBelief (N = 347), supporting belief-driven learning-rate modulation.
  • Normative approximation: ΔBelief gain correlated strongly with BOCPD changepoint detection (average trials-to-detect correlation r = 0.97, p << 0.001; N = 200 simulated transitions), with higher gain on inferred changepoint trials. Unsigned RPE and log-reward did not systematically track changepoints.
  • NAcc dopamine encodes RPEs: Offer-cue dopamine responses scaled monotonically with reward volume (dips for small rewards). On 20-µL trials, dopamine increased in low blocks and decreased in high blocks, reflecting inverse scaling with expectations. Reward-history regressions showed positive current-trial coefficients and negative previous-trial coefficients, consistent with RPE encoding. Dopamine AUC correlated with model-estimated RPEs.
  • Independence from learning rate: Early vs. late mixed-block comparisons revealed no significant differences in dopamine’s dependence on previous trials, nor in slopes of dopamine vs. RPE for positive or negative RPEs across sessions (N = 994 sessions), indicating dopamine reflects RPEs independent of dynamic learning rates.
  • Delay-period signals: During reward delays, dopamine exhibited negative ramps (moment-by-moment negative RPEs). At reward-availability cues, dopamine phasic responses scaled with delay duration; baseline-corrected AUC increased with longer delays, reflecting beliefs about probabilistic reward timing, a form of expected (irreducible) uncertainty not captured by model-free TD alone.
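The link between learning rate and the time constant of the previous-reward coefficients can be illustrated with a small simulation. The sketch below regresses a simulated value trace (not real initiation times) on lagged log2 rewards and fits an exponential to the coefficients. The number of lags, the log-linear fit, the learning rates of 0.6 and 0.2, and the function names are assumptions chosen for illustration. With real initiation times the coefficients would be negative, so their magnitudes would be fit instead.

```python
import numpy as np


def lag_coefficients(y, x, n_lags=6):
    """Regress y[t] on x[t-1], ..., x[t-n_lags] plus an intercept."""
    rows = [x[t - n_lags:t][::-1] for t in range(n_lags, len(x))]
    X = np.column_stack([np.ones(len(rows)), np.array(rows)])
    beta, *_ = np.linalg.lstsq(X, y[n_lags:], rcond=None)
    return beta[1:]  # drop the intercept; index 0 is lag 1


def time_constant(coefs):
    """Time constant (in trials) of an exponential fit to positive coefficients."""
    lags = np.arange(1, len(coefs) + 1)
    slope, _intercept = np.polyfit(lags, np.log(coefs), 1)  # log-linear fit
    return -1.0 / slope


def simulated_values(x, alpha):
    """Value entering each trial under V_{t+1} = V_t + alpha * (R_t - V_t)."""
    v = np.empty(len(x))
    v[0] = x.mean()
    for t in range(1, len(x)):
        v[t] = v[t - 1] + alpha * (x[t - 1] - v[t - 1])
    return v


rng = np.random.default_rng(0)
x = np.log2(rng.choice([5, 10, 20, 40, 80], size=5000))  # mixed-block-like offers
tau_fast = time_constant(lag_coefficients(simulated_values(x, 0.6), x))
tau_slow = time_constant(lag_coefficients(simulated_values(x, 0.2), x))
```

Under a fixed learning rate α, the weight on the reward k trials back is α(1 − α)^(k−1), so a higher α yields a shorter time constant. This is the pattern reported above for early versus late mixed-block trials.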
Discussion

The data address whether NAcc dopamine represents the product of RPE and learning rate. Behavior showed robust, uncertainty-driven dynamic learning rates that increase near block transitions and scale with trial-by-trial changes in beliefs, approximating normative Bayesian changepoint detection. In contrast, NAcc dopamine at offer cues encoded RPEs without modulation by learning-rate dynamics: the influence of past trials and the slope of dopamine vs. RPE were similar early and late. Thus, dynamic learning rates likely arise from dopamine-independent mechanisms. During reward delays with probabilistic timing, NAcc dopamine reflected expected uncertainty (negative ramps and larger phasic responses with longer delays), consistent with hidden-state inference over trial timing. These results suggest a model-sensitive view in which dopamine encodes RPEs and certain aspects of state uncertainty (expected, irreducible) but not unexpected uncertainty linked to environmental volatility, implying distinct neuromodulatory substrates for learning-rate control.
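For readers unfamiliar with Bayesian online changepoint detection, the following is a minimal sketch of a run-length recursion. Only the hazard rate of 1/40 and the truncation to run lengths below 75 come from the study as summarized here. The Gaussian observation model with known variance, the prior parameters, and the convention that an observation at a changepoint is scored under the prior are assumptions made for illustration, and the authors' implementation may differ.

```python
import numpy as np


def gauss_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)


def bocpd(x, hazard=1/40, max_run=75, mu0=0.0, var0=4.0, obs_var=1.0):
    """Return, for each trial, the posterior probability that a new run starts there.

    Run length r counts the earlier observations in the current regime.
    Assumption: observations are Gaussian with known variance obs_var and a
    Normal(mu0, var0) prior on each regime's mean.
    """
    post = np.zeros(0)   # P(run length | data so far); empty before trial 0
    mu = np.zeros(0)     # posterior mean of the regime mean, per run length
    var = np.zeros(0)    # posterior variance of the regime mean, per run length
    cp = np.empty(len(x))
    for t, xt in enumerate(x):
        # Growth: the current run continues and xt is predicted from its data.
        grow = (1 - hazard) * post * gauss_pdf(xt, mu, var + obs_var)
        # Changepoint: a new run starts and xt is predicted from the prior.
        new = (hazard if t else 1.0) * gauss_pdf(xt, mu0, var0 + obs_var)
        post = np.concatenate(([new], grow))[:max_run]  # keep run lengths < max_run
        post /= post.sum()
        # Conjugate update of the regime-mean posterior for every run length.
        prior_mu = np.concatenate(([mu0], mu))[:max_run]
        prior_var = np.concatenate(([var0], var))[:max_run]
        var = 1.0 / (1.0 / prior_var + 1.0 / obs_var)
        mu = var * (prior_mu / prior_var + xt / obs_var)
        cp[t] = post[0]
    return cp


# Toy sequence: 40 trials around 0, then 40 trials around 5.
rng = np.random.default_rng(1)
toy = np.concatenate([rng.normal(0, 0.3, 40), rng.normal(5, 0.3, 40)])
cp_prob = bocpd(toy)
```

In this toy sequence the changepoint probability stays low within a regime and rises sharply at the first trial of the new regime. That is the kind of signal against which the ΔBelief gain was compared. The first trial is a run start by construction, so `cp_prob[0]` is 1.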

Conclusion

This study demonstrates that rats use dynamic, belief-driven learning rates to adjust response vigor in a volatile task with hidden reward blocks, and that these dynamics approximate Bayesian online changepoint detection. NAcc dopamine encodes RPEs but not the product of RPE and learning rate, indicating dopamine-independent mechanisms govern learning-rate modulation. Dopamine reflects expected uncertainty during probabilistic reward timing but not unexpected uncertainty about latent blocks. Future research should examine dopamine heterogeneity across striatal subregions, employ cell-type-specific recordings, and investigate other neuromodulators (e.g., serotonin, acetylcholine, norepinephrine) as potential drivers of dynamic learning rates, as well as differences between naive and expert animals in policy vs. state learning.

Limitations

Recordings were confined to the nucleus accumbens core; dopamine activity shows heterogeneity across striatal subregions, necessitating broader sampling to assess modulation by hidden-state inference. More specific techniques (optogenetically tagged recordings, cell-type-specific fluorescence imaging) are needed to parse dynamics across dopamine neuron classes and their behavioral implications.
