
Psychology

Dopamine regulates decision thresholds in human reinforcement learning in males

K. Chakroun, A. Wiehler, et al.

Explore the relationship between dopamine and decision-making in this study by Karima Chakroun, Antonius Wiehler, Ben Wagner, and colleagues. Discover how L-dopa and haloperidol reveal dopamine’s role in reinforcement learning and action selection, challenging earlier assumptions about its effects on learning and highlighting its regulation of decision thresholds and response vigor.

~3 min • Beginner • English
Introduction
Dopamine is central to cognitive control, reinforcement learning, and decision-making. Phasic midbrain dopamine neuron activity encodes reward prediction errors that update value estimates in reinforcement learning. Classic accounts emphasize dopamine’s role in learning (value updating) through D1/D2 receptor pathways supporting go/no-go learning, while more recent views highlight its role in performance and action selection, including regulation of response vigor and decision thresholds. Evidence for dopamine’s causal role in human reinforcement learning is mixed, with pharmacological and patient studies yielding heterogeneous findings. The present study addressed two questions: (1) whether increasing dopamine availability (L-dopa) versus antagonizing D2 receptors (haloperidol) affects reward-based learning and neural prediction error signals, and (2) whether dopamine regulates decision thresholds during reinforcement learning, tested with hierarchical Bayesian reinforcement learning drift-diffusion models (RLDDMs). The study was designed in part to replicate Pessiglione et al. (2006) for gain learning and to provide direct causal evidence for dopamine’s role in decision-threshold regulation in humans.
Literature Review
Prior work suggests L-dopa can improve go learning and enhance striatal reward prediction error signals in some studies, though others report null or even punishment-related effects, and blunted prediction error responses in Parkinson’s disease (PD). D2 antagonists sometimes impair reinforcement learning, but effects are inconsistent and dose-dependent; low doses may primarily block presynaptic D2 autoreceptors, increasing striatal dopamine release, potentially improving learning from positive feedback and enhancing prediction error signaling, whereas higher doses may suppress dopamine signaling. Beyond learning, dopamine has been implicated in action selection and response vigor, potentially via modulation of decision thresholds in basal ganglia circuits. Rodent and human pharmacology show patterns consistent with threshold reductions or increased response rates under elevated dopamine. Tyrosine and ropinirole have reduced thresholds in certain tasks, while bromocriptine shows task-dependent null effects. Gambling disorder has been associated with altered threshold adjustment. Direct causal evidence in human reinforcement learning, however, has been limited, motivating the current study’s pharmacological and computational approach.
Methodology
Design and participants: Within-subject, double-blind, counterbalanced pharmacological fMRI study in 31 healthy self-identified male volunteers (age 19–35). Each participant completed three sessions (Placebo, L-dopa 150 mg, Haloperidol 2 mg) one week apart, following an initial behavioral session for working-memory testing and questionnaires.

Drug administration: Upon arrival (t = 0 h), participants received haloperidol 2 mg or placebo; at t = 2 h, they received Madopar (L-dopa 150 mg + benserazide 37.5 mg) or placebo, yielding three conditions: Placebo (placebo/placebo), Haloperidol (haloperidol/placebo), and L-dopa (placebo/L-dopa). About 30 min after the second pill, participants entered the scanner.

Tasks: In each fMRI session, participants first completed a separate restless four-armed bandit task (reported elsewhere), then performed a stationary reinforcement learning task (60 trials: two stimulus pairs, 30 trials per pair). Within each pair, one option was rewarded on 80% of trials (optimal) and the other on 20% (suboptimal). Choices were made within 3 s; binary feedback (1€ coin vs. crossed-out coin) followed; inter-event intervals were jittered (2–6 s). Practice was completed before scanning.

Behavioral preprocessing and model-agnostic analysis: Missed responses were rare. Accuracy, total rewards, and RTs were summarized per drug condition. Bayesian repeated-measures ANOVAs included covariates (working memory capacity, linear and quadratic terms; body weight). The attempted replication of Pessiglione et al.’s behavioral effect used Wilcoxon signed-rank tests (L-dopa vs. Haloperidol) and a two-sample t-test restricted to first-session drug assignments.

Computational modeling: Three models were compared: (i) DDM0 (no learning; constant drift), (ii) RLDDM1 (single learning rate η), and (iii) RLDDM2 (dual learning rates η+ and η−). The drift rate vt was linearly linked to the Q-value difference between the optimal and suboptimal option (vt = vcoeff × (Qoptimal − Qsuboptimal)). Boundary separation α captured the decision threshold; non-decision time τ modeled perceptual/motor latencies; the starting point z was fixed at 0.5. Trial-wise RTs of suboptimal choices were signed negative; the fastest 5% of RTs per participant were excluded. Models were estimated hierarchically in JAGS with Wiener first-passage time (wfpt) likelihoods, using uniform priors for baseline parameters and Gaussian priors for drug shift parameters in a combined model (placebo as baseline with additive L-dopa and Haloperidol effects). Convergence was assessed via the Gelman–Rubin statistic (R̂ ≤ 1.01). Model comparison used the estimated log pointwise predictive density (−elpd). Posterior predictive checks simulated 10,000 datasets to assess learning-related accuracy increases and RT decreases, including checks on RT percentiles and individual fits. Control analyses included RLDDMs with collapsing bounds (HDDM) and separate per-condition models.
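To make the drift linkage concrete, the following Python snippet is a minimal sketch of the RLDDM2 value update and drift computation described above. It is an illustration rather than the authors’ hierarchical JAGS implementation: the parameter values, function name, and toy reward schedule are assumptions, and boundary separation α and non-decision time τ (which complete the DDM) are not simulated here.

```python
import numpy as np

def rlddm2_drift_trace(choices, feedback, eta_pos=0.3, eta_neg=0.1,
                       v_coeff=2.0, q_init=0.5):
    """Trial-wise drift rates under a dual-learning-rate RLDDM (illustrative sketch).

    choices  : 1 if the optimal option was chosen on a trial, 0 otherwise.
    feedback : 1 if the chosen option was rewarded, 0 otherwise.
    """
    q = np.array([q_init, q_init])   # q[0] = optimal option, q[1] = suboptimal option
    drifts = np.empty(len(choices))
    for t, (chose_opt, reward) in enumerate(zip(choices, feedback)):
        # Drift rate on trial t depends on the current Q-value difference:
        # v_t = v_coeff * (Q_optimal - Q_suboptimal)
        drifts[t] = v_coeff * (q[0] - q[1])
        # Prediction error for the chosen option, with separate learning rates
        # for positive (eta_pos) and negative (eta_neg) prediction errors
        idx = 0 if chose_opt else 1
        pe = reward - q[idx]
        q[idx] += (eta_pos if pe > 0 else eta_neg) * pe
    return drifts

# Toy usage: 30 trials of one stimulus pair with an 80%/20% reward schedule
rng = np.random.default_rng(1)
choices = rng.integers(0, 2, size=30)
feedback = np.where(choices == 1, rng.random(30) < 0.8, rng.random(30) < 0.2).astype(int)
print(rlddm2_drift_trace(choices, feedback)[:5])
```

In the fitted models, these trial-wise drift rates enter a Wiener first-passage time likelihood for the observed choices and response times.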
fMRI acquisition and preprocessing: Siemens Trio 3T, 32-channel head coil; one run per session; 40 slices; TR = 2.47 s; TE = 26 ms; 2 × 2 × 2 mm voxels with a 1-mm gap; slices tilted 30° to reduce signal dropout in vmPFC/mOFC. Standard SPM12 preprocessing: realignment/unwarping, slice-time correction, coregistration, normalization (DARTEL), and 8-mm FWHM smoothing. A high-resolution T1 image was acquired after the tasks.

fMRI modeling: First-level GLMs were estimated per condition. GLM1 included decision onset, the chosen-minus-unchosen value (and its square), feedback onset, the model-derived prediction error, and regressors for error trials. GLM2 split feedback into positive vs. negative prediction errors to reproduce the Pessiglione et al. analysis. GLM3 examined average Q-values at choice. Parametric modulators were z-scored; values and prediction errors were computed using condition-specific group-mean RLDDM2 learning rates. Second-level flexible factorial models tested main effects and drug effects. Analyses focused on a meta-analysis-based reward ROI (ventral striatum, vmPFC, posterior cingulate) for small-volume correction; whole-brain FWE-corrected analyses were also performed.

Covariates and control analyses: Working memory capacity (first principal component of listening, operation, and rotation span) and body weight were tested as modulators of drug effects in the hierarchical models; urgency/threshold collapse was tested with linearly collapsing bounds; exploratory whole-brain analyses probed drug effects on the average Q-value at uncorrected p < .0001.
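As a rough illustration of how a z-scored parametric modulator such as the trial-wise prediction error could enter a first-level feedback regressor, the sketch below builds a stick function at feedback onsets scaled by the modulator and convolves it with a canonical double-gamma HRF. This is not the SPM12 pipeline used in the study; the HRF parameterization, the rounding of onsets to the nearest scan, and all names are simplifying assumptions.

```python
import numpy as np
from scipy.stats import gamma

def canonical_hrf(tr, duration=32.0):
    """Double-gamma canonical HRF sampled at the scan TR (illustrative parameterization)."""
    t = np.arange(0.0, duration, tr)
    peak = gamma.pdf(t, 6)            # positive response peaking around 6 s
    undershoot = gamma.pdf(t, 16)     # late undershoot
    hrf = peak - 0.35 * undershoot
    return hrf / hrf.max()

def pe_modulator_regressor(onsets_s, prediction_errors, n_scans, tr=2.47):
    """Feedback-onset regressor parametrically modulated by z-scored prediction errors."""
    pe = np.asarray(prediction_errors, dtype=float)
    pe_z = (pe - pe.mean()) / pe.std()                   # z-score the parametric modulator
    frame_times = np.arange(n_scans) * tr
    sticks = np.zeros(n_scans)
    for onset, weight in zip(onsets_s, pe_z):
        # Round each feedback onset to the nearest scan (a simplification for this sketch)
        sticks[np.argmin(np.abs(frame_times - onset))] += weight
    # Convolve with the HRF and truncate to the run length
    return np.convolve(sticks, canonical_hrf(tr))[:n_scans]

# Toy usage: 60 feedback events spread over a run of 400 scans (TR = 2.47 s)
rng = np.random.default_rng(0)
onsets = np.sort(rng.uniform(0, 400 * 2.47 - 20, size=60))
pes = rng.normal(size=60)
regressor = pe_modulator_regressor(onsets, pes, n_scans=400)
print(regressor.shape)
```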
Key Findings
- Model-agnostic behavior: Participants performed above chance under all drugs (Bayesian signed-rank BF10 > 4000). Accuracy was numerically higher and median RTs numerically slower under Placebo than under both drugs, but Bayesian repeated-measures ANOVAs showed inclusion Bayes factors < 3 for drug and covariates (no credible effects). Descriptive means (M ± SD): accuracy Placebo .871 (.163), L-dopa .843 (.161), Haloperidol .840 (.136); total rewards Placebo 40.839 (7.781), L-dopa 39.710 (6.394), Haloperidol 39.645 (5.395); median RT (s) Placebo .820 (.188), L-dopa .788 (.143), Haloperidol .789 (.126). There was no evidence of meta-learning across sessions (BF01 = 8.41).
- Replication attempt: Total rewards did not differ between L-dopa and Haloperidol (Wilcoxon z = 0.668, p = .510, r = .140; 95% CI [−.263, .501]); the comparison was also null when restricted to first-session drug assignments (t = 0.943, p = .358, Cohen’s d = .412).
- Model comparison: RLDDM2 (dual learning rates) outperformed RLDDM1 and DDM0 in all drug conditions by −elpd (smaller is better). Placebo: RLDDM2 67.3 (SE 48.5), RLDDM1 178.7 (47.7), DDM0 296.5 (52.1); the ranking was similar for L-dopa and Haloperidol.
- Posterior predictive checks: The RLDDMs reproduced learning-related increases in accuracy and decreases in RT over trials; RLDDM2 provided the best fit and captured RT-distribution percentiles and individual distributions.
- Drug effects on RLDDM2 parameters (combined model, placebo baseline): Both L-dopa and Haloperidol reduced boundary separation (decision threshold α), with high posterior probability of a reduction (> 97.5%; see the sketch after this list). Haloperidol also reduced the negative learning rate (η−) with high probability; other parameters showed no credible drug effects. The effects replicated in per-condition models and with collapsing bounds: consistent threshold reductions without credible effects on the rate of threshold collapse.
- Behavior–model links: Individual drug-induced reductions in boundary separation predicted reductions in RTs (slowest third of trials) for both drugs (L-dopa: BFincl = 290.191; Haloperidol: BFincl > 10) and predicted accuracy differences for L-dopa (BFincl = 107.038) but not for Haloperidol (BFincl < 1).
- fMRI replication: Within the reward ROI, robust main effects across drugs for average Q-value (vmPFC/mOFC), chosen-minus-unchosen Q-value (vmPFC/mOFC), and reward prediction error (bilateral ventral striatum, vmPFC/mOFC) survived small-volume correction. No significant drug effects on these neural measures emerged in the ROI or whole brain after correction.
- fMRI exploratory: At uncorrected p < .0001, average Q-value effects were stronger under L-dopa and Haloperidol than under Placebo in the left anterior insula and dorsal anterior cingulate/pre-SMA.
- Overall: No credible evidence for improved gain learning or altered striatal prediction error coding under L-dopa vs. Haloperidol, but strong, consistent evidence that both drugs reduce decision thresholds during reinforcement learning, consistent with dopamine increasing response vigor and modulating action selection mechanisms.
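The "high posterior probability of reduction (> 97.5%)" reported for the boundary-separation effects corresponds to the share of posterior samples of the additive drug shift parameter that falls below zero. A minimal sketch, assuming the MCMC samples are available as a one-dimensional array (the variable names and toy values are placeholders, not the study's posterior):

```python
import numpy as np

def p_threshold_reduction(shift_samples):
    """Posterior probability that the drug lowers boundary separation.

    shift_samples: 1-D array of MCMC samples of the additive drug effect on
    boundary separation (placebo baseline); negative values mean a reduced threshold.
    """
    return float(np.mean(np.asarray(shift_samples) < 0.0))

# Toy posterior: a shift centred below zero yields a probability near 1
toy = np.random.default_rng(0).normal(loc=-0.3, scale=0.12, size=20_000)
print(f"P(threshold reduction) = {p_threshold_reduction(toy):.3f}")
```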
Discussion
Findings did not replicate prior reports of L-dopa improving learning from gains and enhancing striatal positive vs negative prediction error coding compared to Haloperidol. Likely contributors include experimental design differences: only a gain condition was used (omitting loss/neutral), potentially altering initial reward expectations and masking drug effects; slightly higher drug dosages; and the within-subject design, although no session effects were detected. In contrast, computational analyses robustly supported the hypothesis that dopamine regulates decision thresholds in value-based learning. Both L-dopa and Haloperidol reduced boundary separation, aligning with theories that elevated dopamine facilitates action initiation via basal ganglia circuits, reducing the evidence required to commit to choices and increasing response vigor. The similar direction of effects for Haloperidol is consistent with low-dose D2 blockade preferentially acting on presynaptic autoreceptors to increase striatal dopamine release. The modest Haloperidol reduction in negative learning rate observed in the combined model may reflect dampened no-go learning, though evidence was weaker and not consistent across all analyses. The neural data confirmed canonical value and prediction error signals in vmPFC and ventral striatum, respectively, but did not reveal drug modulation within a priori ROIs, leaving open whether striatal computations were directly altered. Exploratory findings in anterior insula and dACC/pre-SMA suggest a potential cortical control circuit contributing to decision-threshold modulation under elevated dopamine, consistent with literature linking these regions to threshold control and their connectivity with dopaminergic systems. Together, the results support a unifying account in which dopamine tunes decision thresholds and response vigor in value-based contexts, potentially bridging basal ganglia action selection models with vigor/effort frameworks.
Conclusion
This study provides causal evidence that elevating dopaminergic transmission reduces decision thresholds during reinforcement learning in healthy males, as shown consistently for both L-dopa and low-dose Haloperidol in hierarchical RLDDMs and control analyses. In contrast, previously reported beneficial effects of L-dopa on gain learning and striatal prediction error modulation were not observed, likely due to design differences. The results support computational accounts positing a role for dopamine in action selection and response vigor, potentially via basal ganglia and cortical control circuits (e.g., dACC/pre-SMA). Future work should: (1) test broader task contexts including gain and loss conditions and varying difficulty; (2) examine dose- and region-dependent pharmacodynamics, including different D2 antagonists/agonists; (3) include female participants and larger samples to assess inter-individual and baseline-dependent effects; (4) integrate concurrent physiological measures (e.g., pupillometry) or direct recordings; and (5) use confirmatory neuroimaging to identify circuit mechanisms underlying threshold modulation.
Limitations
- Sample restricted to males, limiting generalizability across sexes.
- Modest sample size (n = 31) reduces power to detect non-linear or baseline-dependent pharmacological effects.
- Task order and timing: the target task followed another learning task and occurred approximately 60 minutes after L-dopa ingestion, likely past peak plasma levels though within the drug's half-life, potentially influencing learning-related effects.
- The task isolated the gain condition (no loss or neutral conditions), which may have altered initial expectations and masked drug effects on learning and prediction errors; fewer stimulus pairs may have made the task easier than in the reference study.
- Computational constraints: models with non-linear value-to-drift mappings and a variable starting point failed to converge, so only the linear linkage was used.
- Imaging results showed no drug effects within a priori ROIs; exploratory whole-brain findings were uncorrected and require confirmation.
- Potential region- and dose-dependent actions of Haloperidol complicate interpretation of its mechanism (autoreceptor vs. postsynaptic effects).