Dopamine release in human associative striatum during reversal learning

F. Grill, M. Guitart-Masip, et al.

Simultaneous [11C]Raclopride PET-fMRI reveals dopamine release in associative striatum when reward contingencies flip, with peak receptor occupancy linked to reward prediction errors and sensitivity to mistakes, and overlapping fMRI signals for perseverance errors. This research was conducted by Filip Grill, Marc Guitart-Masip, Jarkko Johansson, Lars Stiernman, Jan Axelsson, Lars Nyberg, and Anna Rieckmann.

Introduction
Learning, unlearning, and relearning action–outcome associations are essential for optimizing behavior in uncertain environments. Probabilistic reversal learning paradigms probe decision flexibility under reinforcement learning, where reward prediction errors (RPEs) act as teaching signals. Positive RPEs reinforce rewarded actions; negative RPEs signal a need to explore alternatives and are tied to perseverance errors after reversals. Animal work shows midbrain dopamine (DA) neuron firing patterns consistent with RPEs, but subpopulations may also encode salience irrespective of valence, with dorsal striatal projections supporting attention, working memory, and motivation. Human studies have shown striatal involvement in RPE processing and modulation by DAergic drugs, with effects on reversal learning. However, direct human in vivo imaging associating spatially and temporally localized striatal DA release with RPEs and individual differences in reversal learning success has been lacking. The primary imaging method for DA function is PET with [11C]raclopride, enabling inference of DA release via receptor occupancy competition. Hybrid PET-MR allows simultaneous measurement of neurochemical and hemodynamic signals. This study tests whether unexpected rule reversals elicit DA release in associative striatum and whether the magnitude relates to absolute RPE (absRPE) and RPE sensitivity, and examines overlap with fMRI correlates of perseverance errors and absRPE, separable from ventral striatal valence responses.
Literature Review
Prior animal and human research implicates DA in reinforcement learning and reversal learning. Canonical findings show DA neuron firing increases to outcomes better than expected and decreases to worse-than-expected outcomes (Schultz et al., 1997), with striatal processing of RPEs observed in human neuroimaging. Evidence suggests DA subpopulations encode salience and unexpected events beyond value, with dorsal striatal projections supporting cognitive control functions (Bromberg-Martin et al., 2010; Matsumoto & Hikosaka, 2009). Rodent midbrain DA neurons projecting to dorsal nucleus accumbens/caudate border signal learning from lack of expected rewards (Ishino et al., 2023). In humans, DAergic drugs modulate RPEs and reversal learning performance (Pessiglione et al., 2006; Chowdhury et al., 2013; Cools et al., 2009; van den Bosch et al., 2022). PET with [11C]raclopride is established for imaging striatal DA release via receptor occupancy competition, with measurable changes over tens of minutes in pharmacological and cognitive paradigms. Hybrid PET-fMRI permits concurrent DA release and BOLD measures, though proportionality between DA release and BOLD magnitude remains unclear.
Methodology
Design: Two-alternative forced-choice reversal-learning task performed during simultaneous dynamic [11C]raclopride PET and BOLD fMRI. A long stable period was followed by a volatile period with repeated, unexpected reversals of reward contingencies.
Participants: 30 recruited; 26 included after exclusions (13 female; mean age 25.73 years, SD 4.57, range 20–36). Ethical approval was obtained and informed consent provided. Compensation: 1000 SEK plus up to 600 SEK based on rewards.
Behavioral task: 250 trials. On each trial, participants guessed whether a number was greater than 5 (index finger) or less than 5 (middle finger). Feedback: reward +3 SEK if correct, +0 if incorrect; visual feedback arrows; cumulative earnings shown. Timing per trial: response window 2 s, fixation 2 s, outcome 2 s, pseudorandom ITI 1–13 s (25 trials ≈ 5 min).
Reward contingencies: During the stable period (first 150 trials), the index finger was rewarded on 80% of trials and the middle finger on 20%. After 150 trials the contingencies reversed without any cue, and then reversed again every 25 trials during the volatile period (last 100 trials). Participants were unaware of the reversals.
Computational modeling: Five candidate RL models were evaluated (rSTAN 2.26.1). The best-performing model estimated two parameters: β (inverse temperature) and α (RPE sensitivity/learning rate). Signed RPEs were converted to unsigned absRPE. Mean absRPE magnitude over the 25 trials after the first reversal quantified unexpectedness and was related via linear regression to perseverance errors and to α. Mean absRPE and α were correlated with peak DA receptor occupancy; α was related to total reward with a quadratic term; robustness checks were run across the model space.
PET acquisition and analysis: Bolus 250 MBq [11C]raclopride; 68-min dynamic TOF acquisition (frame schedule: 6×10 s, 6×20 s, 6×40 s, 9×60 s, 26×120 s). MR-based attenuation correction; reconstruction with OSEM (3 iterations, 28 subsets, 3.0 mm post filter) including decay, randoms, scatter, and attenuation corrections; voxel size 1.56×1.56×2.78 mm³.
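As a quick consistency check on the PET frame schedule quoted above, the short Python sketch below expands the schedule into per-frame start and end times. The function and variable names are illustrative and not taken from the study; only the frame counts and durations come from the text.

```python
# Frame schedule as (number of frames, frame duration in seconds), from the text.
frame_schedule = [(6, 10), (6, 20), (6, 40), (9, 60), (26, 120)]


def frame_edges(schedule):
    """Return lists of start and end times (in seconds) for every frame."""
    starts, ends, t = [], [], 0
    for n_frames, duration in schedule:
        for _ in range(n_frames):
            starts.append(t)
            t += duration
            ends.append(t)
    return starts, ends


starts, ends = frame_edges(frame_schedule)
n_frames = len(starts)          # total number of dynamic frames
total_minutes = ends[-1] / 60   # total acquisition length in minutes

# Mid-frame times (in minutes) are a common x-axis for time-activity curves.
mid_times_min = [(s + e) / 120 for s, e in zip(starts, ends)]

print(n_frames, total_minutes)
```

The schedule adds up to 53 frames over 68 minutes, matching the stated acquisition length.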
PET frames were motion-corrected (FSL MCFLIRT) to the 25th frame, followed by HYPR denoising and temporal smoothing (3-frame Gaussian [0.25 0.50 0.25]). T1w parcellation (FreeSurfer) was used to extract ROI TACs. Voxelwise TACs were modeled using linear parametric neurotransmitter PET (lp-ntPET) to estimate dynamic BPND, with a cerebellar gray matter reference and a multilinear reference tissue model with fixed k2′ (from whole striatum). Five gamma basis functions, chosen a priori to align with the first reversal transition, were fitted; the best-fitting function was interpreted as [11C]raclopride displacement at reversal. Individual parameter maps were thresholded (voxelwise F-statistics > 9.55), normalized to MNI152, and entered into a second-level analysis (FSL randomise, 5000 permutations; one-tailed one-sample t-test; TFCE corrected) to locate coherent DA release during the stable-to-volatile transition. The significant displacement cluster served as an ROI, providing individual TACs for refined lp-ntPET fits that yielded dynamic BPND curves and DA receptor occupancy, computed as Occupancy(%) = ((pre BP_ND − post BP_ND)/pre BP_ND) × 100. A predicted TAC path (no DA release) was computed for visualization.
Control analyses: lp-ntPET fits in anatomical striatal ROIs (caudate, putamen, nucleus accumbens); simulations to assess model bias; testing gamma functions at multiple time points; fitting all reversal events; assessing confounds (e.g., motion); single-subject examples.
fMRI acquisition and analysis: 50-min BOLD fMRI starting 8 min after PET start; parameters: FOV 25.6, matrix 96×96, slice thickness 3.6 mm, TE 30 ms, TR 4000 ms, flip angle 90°, acceleration factor 2.0; voxel size 1.95×1.95×3.9 mm³. Preprocessing (FSL FEAT): motion correction (MCFLIRT), B0 unwarping, slice-timing correction, brain extraction (BET), spatial smoothing (8 mm FWHM), intensity normalization, high-pass filtering (sigma = 25 s), registration to T1w via boundary-based registration, then to MNI152 via FLIRT and FNIRT.
Events defined as whole trials (cue/choice/feedback not separable due to TR). Primary contrast: perseverance error (25 trials after first reversal) > rewarded correct response (25 trials before first reversal), tested within PET-derived DA release ROI using randomise with small-volume TFCE correction. Secondary whole-brain GLM: regressors for reward, no reward, and trial-wise absRPE orthogonalized to valence; absRPE as regressor of interest to identify voxels responding to unexpectedness independent of sign; visualization via Workbench.
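To make the task structure and the two-parameter model described under "Behavioral task", "Reward contingencies", and "Computational modeling" more concrete, here is a minimal Python sketch. It assumes a standard delta-rule value update with a softmax choice rule, rewards coded as 0/1 rather than in SEK, and initial values of 0.5. The summary does not specify the exact form of the winning model, so this should not be read as the authors' implementation. All function names and the example parameter values are hypothetical.

```python
import math
import random

N_TRIALS, STABLE_TRIALS, BLOCK = 250, 150, 25  # trial counts from the text


def reward_prob_index(trial):
    """Reward probability for the index-finger choice on a 0-based trial."""
    if trial < STABLE_TRIALS:
        return 0.8
    # Contingencies flip after trial 150 and then every 25 trials.
    n_reversals = 1 + (trial - STABLE_TRIALS) // BLOCK
    return 0.2 if n_reversals % 2 == 1 else 0.8


def simulate(alpha, beta, seed=0):
    """Simulate one agent; alpha = RPE sensitivity, beta = inverse temperature."""
    rng = random.Random(seed)
    q = [0.5, 0.5]  # assumed initial values for [index, middle]
    choices, abs_rpes = [], []
    for t in range(N_TRIALS):
        # Softmax over two options reduces to a logistic of the value difference.
        p_index = 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))
        choice = 0 if rng.random() < p_index else 1
        p_idx = reward_prob_index(t)
        p_reward = p_idx if choice == 0 else 1.0 - p_idx
        reward = 1.0 if rng.random() < p_reward else 0.0
        rpe = reward - q[choice]      # signed reward prediction error
        q[choice] += alpha * rpe      # delta-rule update
        choices.append(choice)
        abs_rpes.append(abs(rpe))     # unsigned RPE (absRPE)
    return choices, abs_rpes


choices, abs_rpes = simulate(alpha=0.3, beta=5.0)  # illustrative parameters

# Mean absRPE over the 25 trials following the first reversal.
post = abs_rpes[STABLE_TRIALS:STABLE_TRIALS + BLOCK]
mean_abs_rpe_post_reversal = sum(post) / len(post)
print(round(mean_abs_rpe_post_reversal, 3))
```

Varying `alpha` in this sketch shows how RPE sensitivity shapes the post-reversal mean absRPE, which is the quantity the study relates to perseverance errors and to peak DA receptor occupancy.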
Key Findings
- Behavior: Reversal manipulation effective; choice probability adapted to contingencies. Perseverance errors after the first reversal showed large inter-individual variability (mean 5.84 trials, SD 6.79). Mean absRPE magnitude increased at reversals and was associated with perseverance errors over 25 trials after first reversal (F(1,24)=6.78, p=0.0156, R²=0.19): a 0.1 increase in mean absRPE prolonged perseverance by 2.06 trials. Mean absRPE magnitude related to RPE sensitivity (F(1,24)=47.20, p=4.18e-7, R²=0.65): a 0.1 increase in absRPE magnitude associated with 0.14 decrease in RPE sensitivity.
- PET DA release: Voxelwise lp-ntPET showed a bilateral cluster primarily in caudate with significant [11C]raclopride displacement (peak MNI xyz [10,10,10]; peak t(25)=6.30; k=828 voxels; TFCE-corrected p=0.0002), indicating DA release during the stable-to-volatile transition. ROI analysis confirmed BPND decrease in caudate (t=8.17, Bayes Factor=8.40e5); decreases not observed in putamen (t=2.17, BF=1.51) or nucleus accumbens (t=2.19, BF=1.58). DA receptor occupancy peaked at the first reversal for all participants (peak occupancy mean 12.56%, SD 6.32) within ~2 min of transition.
- Behavioral correlations: Peak DA occupancy negatively correlated with mean absRPE magnitude over 25 trials post-reversal (r=−0.57, p=0.003), and positively correlated with RPE sensitivity (r=0.59, p=0.0015). Relationship between RPE sensitivity and total reward suggested an inverted-U (linear term t(23)=−1.76, p=0.09; overall r≈−0.01, p=0.95), consistent across models. Pre- or post-reversal static BPND did not correlate with behavior.
- fMRI BOLD: Perseverance error > reward contrast showed significant BOLD differences within the DA release cluster in right caudate (peak MNI xyz [12,16,14]; t(25)=3.11; small-volume TFCE-corrected p=0.0404), overlapping DA release site. Whole-brain trial-wise absRPE regressors revealed BOLD responses in striatum (peak MNI xyz [−12,20,−2]; t(25)=4.37; TFCE-corrected p=0.0432), largely co-localized with DA release. Cortical activations to absRPE included right anterior insula (xyz [36,22,2]; t=4.75; p=0.0346), right DLPFC (xyz [48,34,22]; t=5.43; p=0.0212), bilateral parietal cortex (xyz [−44,−42,46]; t=6.08; p=0.0082), and occipital cortex (xyz [−28,−88,14]; t=4.98; p=0.031). Valence (reward > no reward) produced canonical ventral striatal responses adjacent/inferior to the DA release/absRPE sites. Individual BOLD–DA occupancy correlations were not significant.
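The peak occupancy values reported above follow from the formula given in the Methodology, Occupancy(%) = ((pre BP_ND − post BP_ND)/pre BP_ND) × 100. The Python sketch below applies that formula to a dynamic BPND curve. The baseline and curve values are made-up illustrative numbers, not data from the study, and the function names are hypothetical.

```python
def occupancy_percent(bp_pre, bp_post):
    """Receptor occupancy (%) from pre- and post-event binding potential."""
    if bp_pre <= 0:
        raise ValueError("pre-event BP_ND must be positive")
    return (bp_pre - bp_post) / bp_pre * 100.0


def peak_occupancy(bp_baseline, bp_curve):
    """Largest occupancy reached along a dynamic BP_ND curve."""
    return max(occupancy_percent(bp_baseline, bp) for bp in bp_curve)


# Hypothetical example: baseline BP_ND of 3.0 and a transient dip after reversal.
baseline_bp = 3.0
dynamic_bp = [3.0, 2.85, 2.64, 2.70, 2.90]

peak = peak_occupancy(baseline_bp, dynamic_bp)
print(f"peak occupancy: {peak:.1f}%")
```

With these illustrative numbers the lowest BPND value (2.64) corresponds to a peak occupancy of roughly 12%, similar in size to the group mean reported above.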
Discussion
Findings provide direct human in vivo evidence that unexpected lack of reward during reversal learning elicits dopamine release in associative striatum, with concomitant hemodynamic signals overlapping this site. The magnitude of DA release relates to smaller absRPE magnitude at reversal and higher RPE sensitivity, indicating faster adaptation following errors. This supports a model in which DAergic responses to unexpected events (surprise) in associative striatum engage mesocortical circuits for cognitive control, distinct from ventral striatal reward valuation signals. The cortical network co-activated with absRPE (insula, DLPFC, parietal) aligns with attention and control systems, suggesting that DA release may signal the need to increase cognitive control when environmental contingencies change. The relationship between RPE sensitivity and performance follows an inverted-U, implying that both overly rigid and overly flexible adaptation can be suboptimal, with medium DA reactivity potentially optimal. Mechanistic coupling between PET-derived DA occupancy and BOLD remains unresolved; BOLD likely reflects mixed neurotransmitter processes and receptor-type specific effects, warranting multi-tracer or pharmacological hybrid PET-fMRI designs to disentangle contributions.
Conclusion
Simultaneous dynamic [11C]raclopride PET and fMRI during a reversal-learning task pinpoint associative striatum as the site of DA release when unexpected events are encountered. DA release magnitude during the transition from a stable to volatile environment is associated with enhanced reversal learning and higher RPE sensitivity, while being separable from ventral striatal reward valence signals. The study advances a human model of reversal learning where DAergic responses to absRPEs activate mesocortical cognitive control networks. Future research should clarify dose–response relationships of DA release across tasks, implement multi-tracer or combined cognitive–pharmacological PET-MR protocols, and optimize fMRI designs to separate cue, choice, and feedback to better link neurochemical and hemodynamic signals.
Limitations
- PET single-scan design may carry methodological biases; however, control analyses (motion, model controls) and convergence with fMRI and behavior reduce concern. Lack of task-free resting PET precludes quantifying model bias.
- The winning cognitive model with a single RPE sensitivity parameter outperformed more complex models allowing trial-by-trial modulation, suggesting limited evidence for dynamic α changes in this paradigm.
- Long TR (4 s) prevented separation of cue/choice from feedback, limiting event-specific fMRI inferences.
- Unclear mechanistic linkage between DA PET signals and BOLD activations; DA may bind to mixed receptor types (D1-like and D2-like), and non-DA neurotransmission contributes to BOLD.
- Exploratory analyses did not find comparable DA release magnitude during later volatile periods, raising questions about context and timing dependence of DA responses.
- Generalization of DA release magnitude across studies/tasks is uncertain; medium optimal DA release remains to be defined.