Dopamine regulates decision thresholds in human reinforcement learning in males

K. Chakroun, A. Wiehler, et al.

Using pharmacological fMRI in 31 male volunteers (within-subjects design: Placebo, 150 mg L-dopa, 2 mg Haloperidol), this study found little evidence for previously reported L-dopa benefits on gain learning or neural prediction-error signals. However, reinforcement learning drift diffusion models revealed consistent decision-threshold reductions under both drugs, supporting a role for dopamine in regulating decision thresholds and linking accounts of action selection with response vigor. Research conducted by Karima Chakroun, Antonius Wiehler, Ben Wagner, David Mathar, Florian Ganzer, Thilo van Eimeren, Tobias Sommer, and Jan Peters.

Introduction
The study investigates how dopamine (DA) contributes to human reinforcement learning and action selection. Classical accounts posit that DA neurons encode reward prediction errors used to update values, with D1-mediated signaling facilitating go learning from positive prediction errors and D2-mediated signaling supporting no-go learning from negative prediction errors. Beyond learning, DA is theorized to regulate action selection, response vigor, and decision thresholds within basal ganglia circuits, in line with sequential sampling models such as the drift diffusion model (DDM), in which the decision threshold (boundary separation) governs the speed–accuracy trade-off: lower thresholds yield faster but less accurate responses. Prior pharmacological and patient studies have yielded mixed evidence for DA effects on human learning, motivating a rigorous within-subject pharmacological fMRI investigation. The study aims to replicate previously reported L-dopa vs. Haloperidol effects on reinforcement learning and prediction-error signals, and to directly test whether DA modulates decision thresholds during reinforcement learning using hierarchical Bayesian reinforcement learning drift diffusion models (RLDDMs).
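To make the threshold concept concrete, the following is a minimal, illustrative sketch (not the authors' code) of a drift diffusion process simulated with Euler-Maruyama steps; all parameter values are arbitrary assumptions, chosen only to show that a lower boundary separation produces faster but less accurate responses.

```python
# Minimal illustrative sketch (not the authors' code): a drift diffusion model
# simulated with Euler-Maruyama steps. Parameter values are arbitrary assumptions.
import numpy as np

def simulate_ddm(drift, alpha, ndt=0.3, dt=0.001, noise=1.0, n_trials=2000, seed=0):
    """Return (mean RT in seconds, proportion of upper-boundary 'correct' choices)."""
    rng = np.random.default_rng(seed)
    rts, correct = [], []
    for _ in range(n_trials):
        x, t = alpha / 2.0, 0.0                  # unbiased starting point (z = 0.5)
        while 0.0 < x < alpha:                   # accumulate until a boundary is hit
            x += drift * dt + noise * np.sqrt(dt) * rng.standard_normal()
            t += dt
        rts.append(t + ndt)                      # add non-decision time
        correct.append(x >= alpha)
    return np.mean(rts), np.mean(correct)

for a in (2.0, 1.2):                             # high vs. reduced decision threshold
    rt, acc = simulate_ddm(drift=1.0, alpha=a)
    print(f"boundary separation {a}: mean RT = {rt:.2f} s, accuracy = {acc:.2f}")
```

With these assumed values, reducing the boundary separation shortens mean RTs and lowers accuracy, which is the pattern the decision-threshold account predicts under dopaminergic stimulation.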
Literature Review
Evidence shows that midbrain DA neurons encode reward prediction errors, with dorsal and ventral striatal activation reflecting these signals in fMRI, and causal animal work links DA neuron signaling to reinforcement learning. Pharmacological human studies are heterogeneous: L-dopa has been reported to improve go learning and amplify neural reward prediction-error signals in some studies, while others found null effects or even punishment-related increases, including blunted prediction-error responses in PD patients. D2 antagonists can impair learning, have no effect, or selectively affect post-learning decision-making, with interpretation complicated by dose-dependent presynaptic autoreceptor engagement that can increase DA release at low doses. DA is also implicated in action selection and response vigor, potentially via lowering of decision thresholds. Rodent and human studies indicate that DA or DA-related agents can reduce decision thresholds or increase response rates, though task domain may moderate these effects (e.g., perceptual vs. value-based decisions). Gambling disorder, a putatively hyperdopaminergic condition, is associated with altered decision-threshold adjustments. Direct causal evidence for DA regulation of decision thresholds in human reinforcement learning had been limited prior to this work.
Methodology
Design: Double-blind, counterbalanced, within-subject pharmacological fMRI study with three sessions (Placebo, L-dopa 150 mg, Haloperidol 2 mg) in healthy self-identified male volunteers (n = 31, age 19–35), with sessions spaced one week apart.

Drug administration: First pill on arrival (2 mg Haloperidol or placebo), second pill 2 hours later (Madopar: 150 mg L-dopa + 37.5 mg benserazide, or placebo). Scanning commenced 30 minutes after the second pill.

Tasks: In the scanner, participants first completed a restless four-armed bandit task (reported elsewhere), then a stationary reinforcement learning task of 60 trials (two pairs of fractal images; 30 trials per pair). For each pair, one stimulus was reinforced at 80% (optimal) and the other at 20% (suboptimal). Stimulus sides were randomized, and feedback was binary (a 1€ coin for reward vs. a crossed-out coin for no reward). The response window was 3 seconds; inter-trial jitters of 2–6 seconds followed choice and feedback. Participants practiced the task prior to scanning.

Behavioral preprocessing: RTs for suboptimal choices were coded as negative; the fastest 5% of trials per participant were excluded to remove implausibly fast responses. Model-agnostic analyses included accuracy, total rewards, and median RTs, compared with Bayesian repeated-measures ANOVAs.

Computational modeling: Three models were fitted using hierarchical Bayesian methods: DDM0 (no learning; constant drift), RLDDM1 (single learning rate with a linear mapping from the Q-value difference to the trial-wise drift rate), and RLDDM2 (dual learning rates for positive vs. negative prediction errors with the same linear mapping). Q-values were initialized at 0.5; learning rates were estimated in standard-normal space and transformed to [0, 1]. DDM parameters included boundary separation (α), non-decision time (τ), and a fixed starting point (z = 0.5). The trial-wise drift rate was v_t = v_coeff · (Q_opt − Q_subopt). Hierarchical Bayesian estimation used JAGS with the Wiener module (2 chains, 100k burn-in samples, thinning of 2, 10k retained samples); convergence was assessed via the Gelman–Rubin statistic (R̂ ≤ 1.01). Model comparison used the estimated log pointwise predictive density (−elpd; lower is better). Posterior predictive checks simulated 10k datasets per model and examined learning-related accuracy increases and RT reductions, RT-distribution percentiles, and individual-participant fits. RLDDM variants with collapsing bounds were also fitted using HDDM. (A schematic simulation of the RLDDM2 appears after this section.)

fMRI acquisition and preprocessing: Siemens Trio 3T with a 32-channel head coil; one run of 60 trials per session (180 trials per participant in total). 40 slices per volume, TR = 2.47 s, TE = 26 ms; slices were tilted 30° to reduce vmPFC/mOFC signal dropout. Preprocessing in SPM12 (realignment/unwarping, slice-timing correction, coregistration, DARTEL normalization, smoothing with an 8 mm FWHM kernel).

fMRI analysis: First-level GLMs modeled choice onset, parametric modulators of the chosen − unchosen Q-value (and its squared term) or the average Q-value, and feedback onset with model-derived prediction errors (including separate positive/negative prediction-error regressors in GLM2). Parametric regressors were z-scored per participant. Second-level flexible factorial analyses examined main effects and drug effects. ROI-based small-volume correction used a reward-valuation mask (vmPFC/mOFC, ventral striatum, posterior cingulate) from meta-analyses; whole-brain FWE-corrected analyses were also performed.

Covariates: Working-memory capacity (PCA across listening, operation, and rotation span) and body weight. Side effects were monitored; none were reported.
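As a concrete illustration of the model described above, here is a minimal generative sketch of RLDDM2: Q-learning with separate learning rates for positive and negative prediction errors, and a trial-wise drift rate v_t = v_coeff · (Q_opt − Q_subopt) driving a diffusion process between the decision boundaries. This is an assumption-laden simulation for illustration only, not the authors' JAGS implementation, and all parameter values are made up.

```python
# Illustrative sketch (not the authors' JAGS code) of the RLDDM2 generative
# process: dual-learning-rate Q-learning, drift rate proportional to the
# Q-value difference, unbiased diffusion between 0 and alpha.
# All parameter values below are arbitrary assumptions.
import numpy as np

def rlddm2_simulate(n_trials=30, p_reward=(0.8, 0.2),
                    eta_pos=0.3, eta_neg=0.1, v_coeff=4.0,
                    alpha=1.5, tau=0.3, dt=0.001, seed=1):
    rng = np.random.default_rng(seed)
    q = np.array([0.5, 0.5])                   # Q-values initialized at 0.5
    trials = []
    for _ in range(n_trials):
        drift = v_coeff * (q[0] - q[1])        # upper boundary = optimal option
        x, t = 0.5 * alpha, 0.0                # fixed starting point z = 0.5
        while 0.0 < x < alpha:                 # diffuse until a boundary is hit
            x += drift * dt + np.sqrt(dt) * rng.standard_normal()
            t += dt
        choice = 0 if x >= alpha else 1        # 0 = optimal, 1 = suboptimal
        reward = float(rng.random() < p_reward[choice])
        pe = reward - q[choice]                # reward prediction error
        eta = eta_pos if pe > 0 else eta_neg   # dual learning rates
        q[choice] += eta * pe
        # RTs of suboptimal choices coded as negative, mirroring the preprocessing
        rt = (t + tau) if choice == 0 else -(t + tau)
        trials.append((choice, round(rt, 3), reward))
    return trials

print(rlddm2_simulate()[:5])                   # first few (choice, RT, reward) triples
```

Lowering alpha in this simulation produces the qualitative signature reported in the findings: faster responses at the cost of more suboptimal choices, especially early in learning when the Q-value difference (and hence the drift rate) is small.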
Key Findings
Behavior: Participants performed above chance under all drugs (Bayesian signed-rank tests, all BF10 > 4000). Accuracy and median RTs were numerically higher under Placebo than under L-dopa or Haloperidol, but Bayesian repeated-measures ANOVAs yielded inclusion Bayes factors < 3 for drug and covariate effects, indicating little evidence for differences. A direct replication test of the primary behavioral effect (total rewards, L-dopa vs. Haloperidol) showed no credible difference (Wilcoxon z = 0.668, p = 0.510; first-session subset t19 = −0.943, p = 0.358).

Model comparison: RLDDM2 (dual learning rates) consistently outperformed RLDDM1 and DDM0 across all drug conditions (−elpd; lower is better): Placebo 67.3 (SE 48.5), L-dopa 195.4 (51.2), Haloperidol 336.8 (50.7).

Posterior predictive checks: RLDDM2 reproduced learning-related increases in accuracy and decreases in RTs, and matched RT-distribution percentiles as well as individual-participant RT trajectories; DDM0 failed to capture these dynamics.

Drug effects on model parameters (combined RLDDM2): Boundary separation (decision threshold, α) was reduced under both drugs relative to Placebo. L-dopa: mean effect −0.114 [95% HDI −0.219, 0.001], P(effect < 0) = 0.977, directional Bayes factor (dBF) = 37.462. Haloperidol: mean effect −0.125 [−0.228, −0.022], P(effect < 0) = 0.988, dBF = 72.841. Haloperidol also reduced the negative learning rate (η−): mean −1.69 [−3.323, −0.125], P(effect < 0) = 0.987, dBF = 78.686; other parameters showed little credible evidence of drug effects. Separate per-condition models and collapsing-bounds variants replicated the threshold reductions, with little evidence for drug effects on the degree of threshold collapse. (A sketch of how such posterior summaries can be computed follows this section.)

Individual differences: Reductions in α under L-dopa were strongly associated with RT differences (BF_incl = 290.191) and accuracy differences (BF_incl = 107.038) between placebo and drug in the slowest third of trials; for Haloperidol, α effects were associated with RT differences (BF_incl > 10) but not accuracy differences (BF_incl < 1).

fMRI: Main effects across drug conditions replicated valuation and prediction-error signals: average Q-value effects in vmPFC/mOFC and reward prediction errors in bilateral ventral striatum (small-volume corrected; e.g., vmPFC T = 5.06, p_SVC = 0.002; left ventral striatum T = 5.95, p_SVC < 0.001). There were no significant drug effects on these signals within the ROIs or at the whole-brain level after correction. A replication of the Pessiglione et al. analysis showed robust prediction-error main effects but no drug-by-sign interaction in ventral striatum. An exploratory uncorrected analysis (p < 0.0001) suggested higher average-value effects under L-dopa and Haloperidol in left anterior insula and dorsal ACC/pre-SMA.
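For readers unfamiliar with the posterior summaries reported above (HDI, P(effect < 0), directional Bayes factor), the following is a minimal sketch of how such quantities can be derived from MCMC samples of a drug-effect parameter, under the assumption that the dBF is the ratio of posterior mass below versus above zero. It is not the authors' code, and the example draws are synthetic.

```python
# Sketch (assumed, not the authors' code): posterior mean, 95% highest density
# interval, P(effect < 0), and a directional Bayes factor
# dBF = P(effect < 0) / P(effect > 0), computed from MCMC samples.
import numpy as np

def summarize_effect(samples, hdi_mass=0.95):
    samples = np.sort(np.asarray(samples))
    n = len(samples)
    k = int(np.floor(hdi_mass * n))              # number of samples spanned by the HDI
    widths = samples[k:] - samples[:n - k]       # widths of all candidate intervals
    i = int(np.argmin(widths))                   # narrowest interval = HDI
    p_neg = float(np.mean(samples < 0))
    dbf = p_neg / max(1.0 - p_neg, 1.0 / n)      # guard against division by zero
    return {"mean": float(samples.mean()),
            "95% HDI": (float(samples[i]), float(samples[i + k])),
            "P(effect < 0)": p_neg,
            "dBF": dbf}

# Synthetic posterior draws for a hypothetical threshold-reduction effect
draws = np.random.default_rng(2).normal(-0.11, 0.055, size=10_000)
print(summarize_effect(draws))
```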
Discussion
Findings provide direct evidence that increasing dopaminergic neurotransmission reduces decision thresholds during reinforcement learning, consistent with DA's role in action selection and with response-vigor accounts. Both L-dopa (a precursor that increases DA synthesis) and low-dose Haloperidol (which likely blocks presynaptic D2 autoreceptors and thereby increases striatal DA release) produced similar threshold reductions. Although the study did not replicate previously reported L-dopa vs. Haloperidol effects on learning and striatal prediction-error signaling, differences in task design (most notably the isolation of the gain condition), dosage differences, and individual variability may explain the discrepancies. RLDDM2 robustly captured learning-related behavior and revealed pharmacological threshold modulation across modeling schemes, with individual differences in threshold changes tracking RT changes. The lack of drug effects on striatal prediction errors suggests that DA's impact in this paradigm was more pronounced in action-selection mechanisms than in value updating, possibly involving cortical control circuits (dACC/pre-SMA and insula) interconnected with dopaminergic and basal ganglia systems. Domain specificity may also matter: DA may modulate decision thresholds more reliably in value-based than in perceptual tasks.
Conclusion
The study demonstrates that pharmacological increases in dopamine reduce decision thresholds during human reinforcement learning, supporting a computational account linking DA to action selection and response vigor. While previously reported L-dopa vs. Haloperidol effects on learning and striatal prediction error coding were not replicated, RLDDM-based analyses consistently revealed threshold reductions under both agents. These findings bridge basal ganglia circuit models of action selection with sequential sampling accounts and suggest cortical–subcortical circuitry may implement DA-driven threshold adjustments. Future research should extend to female participants, larger samples to test non-linear baseline-dependent drug effects, varied task domains and difficulty, and confirmatory neuroimaging to delineate the specific circuits mediating DA-induced threshold changes.
Limitations
Generalizability is limited by the inclusion of only male participants. The sample size (n = 31) may be insufficient to detect non-linear or baseline-dependent pharmacological effects. The task occurred approximately 60 minutes after L-dopa ingestion and after completion of a prior task, potentially missing peak plasma levels, though several considerations, including the robust threshold effects observed, argue against timing as the main explanation for the null learning effects. Isolating the gain condition likely altered initial prediction-error profiles compared to prior designs, possibly masking learning-related drug effects. Dose- and region-dependent complexities of Haloperidol's presynaptic vs. postsynaptic actions complicate interpretation. No significant drug effects on striatal prediction-error signals were observed, leaving open questions about the neural loci of threshold modulation.