Psychology

Disentangling the roles of dopamine and noradrenaline in the exploration-exploitation tradeoff during human decision-making

A. Cremer, F. Kalbe, et al.

This study examines how dopamine and noradrenaline influence the exploration-exploitation tradeoff in human decision-making. Anna Cremer and colleagues used a pharmacological manipulation. They blocked either dopamine D2/D3 receptors (amisulpride) or β-adrenergic receptors (propranolol) while participants performed a patch-foraging task. The results point to a role for dopamine in directed exploration and a role for noradrenaline in the randomness of choices.

Introduction

The study addresses how humans balance exploring new options against exploiting known, rewarding ones. This balance is an essential aspect of adaptive decision-making, and its disruption is implicated in psychiatric conditions. Neural evidence suggests distinct systems for exploitation (vmPFC) and exploration (frontopolar and lateral PFC).

Dopamine and noradrenaline may contribute differently to this balance. Dopamine typically signals reward value and prediction errors, but it has also been linked to directed exploration through uncertainty or novelty bonuses and prefrontal mechanisms. Noradrenaline has been linked to more frequent strategy shifts and to random, value-independent exploration, possibly acting as a reset signal that interrupts ongoing processing.

The research aims to disentangle the contributions of dopamine and noradrenaline to subcomponents of exploration and exploitation. It does so with a virtual patch-foraging task that manipulates reward, depletion, and switching costs (travel time).

Literature Review

Prior work links exploitation to vmPFC mechanisms and exploration to frontopolar and lateral PFC circuits.

Striatal dopamine encodes reward predictions and future rewards. Genetic variation affecting dopamine function (e.g., in COMT) is associated with directed exploration in proportion to uncertainty, potentially via novelty bonuses. Novel stimuli also activate dopaminergic systems.

Noradrenaline is associated with exploratory behavior: it promotes strategy shifts at high levels and perseveration at low levels. It is thought to enhance random exploration by increasing choice stochasticity and by acting as a reset signal that interrupts ongoing processing. Pharmacological and animal studies indicate that noradrenergic manipulations affect choice consistency and exploration.

Despite this accumulating evidence, the distinct roles of the two neurotransmitters in the exploration-exploitation tradeoff remain incompletely understood. This gap motivates the present pharmacological investigation.
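
The distinction between directed and random exploration can be made concrete with a common textbook formalization. This is not the model used in this study.

- Directed exploration adds a bonus proportional to uncertainty to an option's value.
- Random exploration lowers the inverse temperature of a softmax, so choices depend less on value.

The sketch assumes two hypothetical options: a familiar one with mean value 5 and no uncertainty, and a novel one with mean value 4 and uncertainty 2. The parameter names bonus_weight and inverse_temperature are illustrative.

```python
import math


def softmax(values, inverse_temperature):
    """Convert option values into choice probabilities."""
    scaled = [inverse_temperature * v for v in values]
    top = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def choice_probabilities(means, uncertainties, bonus_weight, inverse_temperature):
    """Directed exploration: bonus_weight > 0 adds value in proportion to uncertainty.
    Random exploration: a lower inverse_temperature flattens the softmax."""
    values = [m + bonus_weight * u for m, u in zip(means, uncertainties)]
    return softmax(values, inverse_temperature)


# Hypothetical options: index 0 is familiar, index 1 is novel and uncertain.
means = [5.0, 4.0]
uncertainties = [0.0, 2.0]

exploit = choice_probabilities(means, uncertainties, bonus_weight=0.0, inverse_temperature=2.0)
directed = choice_probabilities(means, uncertainties, bonus_weight=1.0, inverse_temperature=2.0)
random_explore = choice_probabilities(means, uncertainties, bonus_weight=0.0, inverse_temperature=0.2)

for label, probs in [("exploit", exploit), ("directed", directed), ("random", random_explore)]:
    print(f"{label:9s} P(familiar)={probs[0]:.3f}  P(novel)={probs[1]:.3f}")
```

Both manipulations raise the probability of choosing the novel option, starting from about 0.12. The uncertainty bonus raises it to about 0.88, and the lower inverse temperature raises it to about 0.45. Only the bonus does so in proportion to the option's uncertainty.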

Methodology

Design: Double-blind, placebo-controlled, between-subjects experiment. Participants received placebo, 400 mg amisulpride (a D2/D3 receptor antagonist), or 40 mg propranolol (a β-adrenergic receptor antagonist). Because of the drugs' different pharmacokinetics, amisulpride was administered 120 min and propranolol 90 min before task onset. All participants took pills at both time points to maintain blinding.

Participants: 69 healthy volunteers (33 women, 36 men) aged 18–35 years (mean 24.98, SD 3.67). They were assigned to placebo (n=22), amisulpride (n=23), or propranolol (n=24).

Exclusion criteria were:
- current medical conditions or medication;
- a history of neurological or psychiatric disorders;
- drug or tobacco use;
- for women, hormonal contraception.

Participants refrained from caffeine and exercise on the test day, and from food and drink other than water for 2 h before testing. Testing took place in the afternoon or evening, with testing time counterbalanced. The study received ethics approval, and all participants gave informed consent.

Manipulation checks: Blood pressure and heart rate were measured with an OMRON M500 at baseline and at 90, 120, 150, and 180 min after the first pill. The mean of two readings per time point was used. Eye tracking (RED-m) recorded pupil diameter and blink rate during a 60 s fixation period, at baseline (T1) and 90 min after the first pill (T2). Changes were computed as T2 − T1.

Task: Sequential virtual patch-foraging task.
- On each trial, participants decided whether to stay at the current tree (harvest) or switch to the next tree. Repeated harvests depleted a tree's returns.
- The switching cost (travel time) was short (6 s) or long (12 s), and it was constant within a block (orchard).
- The task comprised four 7-min blocks (28 min in total). Travel times alternated between blocks, and the starting condition was counterbalanced.
- Choices were made by keypress within 1 s. A harvest took 3 s, and a switch was followed by the travel-time display (6 or 12 s).
- Each tree's initial richness was drawn from a Gaussian distribution (mean 10, SD 1). The depletion per harvest was drawn from a Beta(14.9, 2.0) distribution.
- Participants were told that richness and depletion varied (equally across orchards) and that only travel time differed between orchards.
- The total number of apples collected was converted into payment.
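
The task's generative structure can be sketched as a short simulation. It uses the parameters stated above: initial richness N(10, 1), Beta(14.9, 2.0) depletion, 3 s harvests, 6 or 12 s travel, and 7-min blocks.

It also makes assumptions that the summary does not spell out:
- a tree's yield is multiplied by a fresh Beta draw after each harvest;
- the 1 s response window is not counted separately;
- the simulated forager follows a hypothetical fixed rule (leave once the last harvest falls below exit_threshold) rather than modelling participants' actual choices.

```python
import random

# Parameters from the task description above.
MEAN_RICHNESS, SD_RICHNESS = 10.0, 1.0  # apples at a fresh tree ~ N(10, 1)
DEPLETION_A, DEPLETION_B = 14.9, 2.0    # per-harvest depletion ~ Beta(14.9, 2.0)
HARVEST_S = 3.0                         # duration of one harvest (s)
BLOCK_S = 7 * 60.0                      # one 7-min block (s)


def run_block(travel_s, exit_threshold, rng):
    """Simulate one block for a hypothetical forager that leaves a tree as soon
    as its last harvest yields fewer than `exit_threshold` apples.

    Returns (total_apples, n_harvests, n_switches).
    """
    t = 0.0
    apples, n_harvests, n_switches = 0.0, 0, 0
    next_yield = rng.gauss(MEAN_RICHNESS, SD_RICHNESS)  # first harvest at a fresh tree
    while t + HARVEST_S <= BLOCK_S:
        t += HARVEST_S
        harvested = next_yield
        apples += harvested
        n_harvests += 1
        if harvested < exit_threshold:
            if t + travel_s > BLOCK_S:
                break  # not enough time left to reach the next tree
            t += travel_s
            n_switches += 1
            next_yield = rng.gauss(MEAN_RICHNESS, SD_RICHNESS)
        else:
            # Assumption: each harvest scales the tree's yield by a fresh Beta draw.
            next_yield = harvested * rng.betavariate(DEPLETION_A, DEPLETION_B)
    return apples, n_harvests, n_switches


rng = random.Random(0)
for travel_s in (6.0, 12.0):
    apples, harvests, switches = run_block(travel_s, exit_threshold=6.0, rng=rng)
    print(f"travel {travel_s:.0f} s: {apples:.0f} apples, {harvests} harvests, {switches} switches")
```

Seeding random.Random makes a run reproducible. The two extreme policies are a useful sanity check on the timing:
- never leaving gives 140 harvests in a 420 s block;
- always leaving with 6 s travel gives 47 harvests and 46 switches.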

Statistical analyses: Mixed ANOVAs tested drug effects on the physiological and eye-tracking measures, with group as the between-subject factor and time as the within-subject factor, followed by post hoc t-tests.

A mixed-effects logistic regression modeled choice (stay = 0, switch = 1) as a function of:
- previous return;
- travel time (short = 0, long = 1);
- depletion rate;
- number of previous stays at the current tree;
- group (placebo as reference);
- the interactions of the first four predictors with group.

The model included a random intercept per participant. Predictors were added incrementally, and models were compared via AIC and likelihood-ratio tests. The models were also fit separately to the first half (blocks 1–2) and the second half (blocks 3–4) of the task. ANOVAs compared total rewards and the proportion of switch choices across groups and environments (short vs long travel time). All analyses were run in R, with mixed models fit using lme4.

MVT analysis: According to the Marginal Value Theorem (MVT), a forager should leave a tree once the expected next harvest falls below the opportunity cost of harvesting. That cost is the environment's average reward rate ρ multiplied by the harvest time h (ρh). Simulating this rule yielded optimal exit thresholds of 6.7 apples for the short travel time and 5.67 apples for the long travel time.

Individual exit thresholds were computed as the average number of apples in the last two harvests before leaving a tree; visits with a single harvest were excluded. These thresholds were compared with the optimal thresholds and across groups.

Computational modeling: An error-driven learning version of the MVT model estimated three parameters: learning rate α (0–1), inverse temperature β (0–∞), and intercept c (choice bias).
- The average reward rate ρ was updated trial by trial from the prediction error δ = r/τ − ρ. Here r is the reward obtained on the trial and τ is its duration (harvest time h for stay, travel time d for switch).
- Choice probability was a logistic function of kr − ρh, with slope β and intercept c. The term kr − ρh is the difference between the expected next harvest and the opportunity cost of harvesting.
- Parameters were estimated per participant by maximum likelihood (R's optim function).
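
A minimal Python sketch of the error-driven MVT learning model follows. It is written for illustration and is not the authors' code.

It makes several assumptions:
- switching occurs with probability σ(c + β(ρh − kr)), so a positive c biases toward switching;
- k is the mean of the Beta(14.9, 2.0) depletion distribution;
- ρ starts at a hypothetical RHO0 = 1.0 and is updated with a plain delta rule, ρ ← ρ + αδ (the study's exact update may differ);
- switch (travel) trials yield 0 apples.

A coarse grid search stands in for R's optim; in Python, scipy.optimize.minimize would be the more usual choice. The trial data are made up.

```python
import itertools
import math

# Assumptions not fixed by the summary (see lead-in).
K = 14.9 / (14.9 + 2.0)  # expected depletion factor: mean of Beta(14.9, 2.0)
HARVEST_S = 3.0          # harvest time h (s)
RHO0 = 1.0               # hypothetical starting reward rate (apples/s)


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def run_model(params, trials, k=K, h=HARVEST_S, rho0=RHO0):
    """Return (negative log-likelihood, rho after each trial) for one participant.

    trials: (shown_reward, switched, reward, tau) tuples, where shown_reward is
    the r available when deciding, switched is a bool, reward is the apples
    obtained on the trial (0 while travelling) and tau is the trial duration.
    """
    alpha, beta, c = params
    rho, nll, rhos = rho0, 0.0, []
    for shown_reward, switched, reward, tau in trials:
        # Switching becomes likely when the opportunity cost rho*h exceeds k*r.
        p_switch = sigmoid(c + beta * (rho * h - k * shown_reward))
        p_choice = p_switch if switched else 1.0 - p_switch
        nll -= math.log(max(p_choice, 1e-12))
        delta = reward / tau - rho  # prediction error on the reward rate
        rho += alpha * delta        # simplified delta-rule update
        rhos.append(rho)
    return nll, rhos


def fit_grid(trials, alphas, betas, cs):
    """Coarse grid-search maximum likelihood, standing in for R's optim."""
    best_params, best_nll = None, math.inf
    for params in itertools.product(alphas, betas, cs):
        nll, _ = run_model(params, trials)
        if nll < best_nll:
            best_params, best_nll = params, nll
    return best_params, best_nll


# Made-up trials for illustration only (not data from the study).
toy_trials = [
    (10.2, False, 9.0, 3.0),
    (9.0, False, 7.9, 3.0),
    (7.9, False, 7.0, 3.0),
    (7.0, True, 0.0, 6.0),
    (10.5, False, 9.3, 3.0),
]
params, nll = fit_grid(toy_trials, alphas=[0.05, 0.1, 0.2, 0.5],
                       betas=[0.0, 0.5, 1.0, 2.0], cs=[-1.0, 0.0, 1.0])
print("best (alpha, beta, c):", params, "NLL:", round(nll, 3))
```

With β = 0 and c = 0, every choice has probability 0.5. The negative log-likelihood then equals the number of trials times ln 2, which is a convenient baseline for checking a fit.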

Key Findings

Manipulation checks:
- Heart rate: It decreased more over time in the propranolol group than in the placebo and amisulpride groups (time×group: F(5.05,164.09)=3.12, p=0.010). Around the time of the task, heart rate was lower under propranolol than under placebo and amisulpride (e.g., post-task vs placebo: t(43)=-2.70, p=0.010; vs amisulpride: t(44)=-2.70, p=0.010).
- Systolic blood pressure: It decreased more in the propranolol group (time×group: F(6.43,208.89)=2.91, p=0.008). It was significantly lower than in the amisulpride group at several time points (e.g., 120 min: t(44)=-2.78, p=0.008).
- Blink rate: It decreased after propranolol compared with placebo (t(39)=-2.29, p=0.027) and with amisulpride (t(37)=-2.89, p=0.006).
- Pupil diameter: It decreased most in the amisulpride group (vs placebo: t(36)=-3.20, p=0.003).
- Physiological changes did not correlate with switch proportions (all r<0.13, p>0.30).

Model selection: The full mixed-effects logistic regression fit best (AIC 14859). It included previous return, travel time, depletion rate, number of previous stays, group, and the interactions of the four choice predictors with group. It improved on the reduced models (e.g., Model 4: AIC 14928; Model 6 vs 5: χ²=86.743, df=8, p<0.001).

Choice behavior (overall):
- Previous return: Higher previous reward reduced switching (β=-0.749, z=-21.259, p<0.001). Amisulpride increased this sensitivity (previous return×amisulpride β=-0.192, z=-3.625, p<0.001), whereas propranolol attenuated it (previous return×propranolol β=0.092, z=1.956, p=0.050).
- Travel time: Long travel time reduced switching (β=-0.691, z=-8.214, p<0.001). The effect was stronger under amisulpride (travel time×amisulpride β=-0.623, z=-4.948, p<0.001). Propranolol did not differ from placebo (β=-0.076, z=-0.644, p=0.520).
- Depletion rate: There was no main effect (β=-0.287, z=-0.376, p=0.701). Amisulpride increased switching at higher depletion rates (depletion rate×amisulpride β=2.685, z=2.298, p=0.022). The propranolol interaction was not significant (β=-0.574, z=-0.531, p=0.595).
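
Placebo is the reference group, so each drug group's slope for a predictor is the main effect plus the corresponding interaction term. Exponentiating a slope gives the multiplicative change in the odds of switching per unit of that predictor, conditional on the participant's random intercept.

The sketch below does this arithmetic on the fixed-effect estimates reported above, assuming the placebo-reference (treatment) coding described in the methods. The summary does not say whether continuous predictors were standardized, so a "unit" of previous return may mean one apple or one standard deviation.

Travel time is coded 0/1, so its odds ratio directly compares long with short travel. It is about 0.50 under placebo and about 0.27 under amisulpride.

```python
import math

# Fixed-effect log-odds estimates reported above (placebo = reference group).
MAIN = {"previous_return": -0.749, "travel_time": -0.691, "depletion_rate": -0.287}
INTERACTION = {
    ("previous_return", "amisulpride"): -0.192,
    ("previous_return", "propranolol"): 0.092,
    ("travel_time", "amisulpride"): -0.623,
    ("travel_time", "propranolol"): -0.076,
    ("depletion_rate", "amisulpride"): 2.685,
    ("depletion_rate", "propranolol"): -0.574,
}


def group_slope(predictor, group):
    """Log-odds slope of switching for one group under placebo-reference coding."""
    if group == "placebo":
        return MAIN[predictor]
    return MAIN[predictor] + INTERACTION[(predictor, group)]


for predictor in MAIN:
    for group in ("placebo", "amisulpride", "propranolol"):
        slope = group_slope(predictor, group)
        print(f"{predictor:15s} {group:11s} slope {slope:+.3f}  odds ratio {math.exp(slope):.3f}")
```

For previous return, the odds ratio is about 0.47 under placebo, 0.39 under amisulpride, and 0.52 under propranolol. This restates the steeper (amisulpride) and shallower (propranolol) reward sensitivity in odds terms.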

Temporal dynamics (first vs second half): The main effects of previous return and travel time replicated in both halves (e.g., first half: previous return β=-0.80, z=-15.49, p<0.001; travel time β=-0.56, z=-4.69, p<0.001). Depletion rate had no effect in the first half (β=0.31, z=0.29, p=0.77). In the second half, lower depletion rates led to more switching (β=-3.16, z=-2.68, p=0.007).

Behavioral biases differed between halves. In the first half, the amisulpride group switched more than the placebo group (β=2.36, z=2.58, p=0.010). In the second half, the propranolol group tended to switch less than the placebo group (β=-1.6, z=-1.88, p=0.061). The amisulpride group's increased sensitivity to previous reward and travel time persisted across both halves, and its depletion-rate effects were significant in both halves.

Performance: Total rewards did not differ by group (F(2,66)=1.68, p=0.19). Amisulpride showed a trend toward higher rewards than placebo (t(43)=1.92, p=0.061) and a nonsignificant difference from propranolol (t(45)=1.65, p=0.11). Rewards were higher in short than in long travel-time environments (F(1,66)=229.85, p<0.0001), with no group×environment interaction (F(2,66)=1.176, p=0.31).

The percentage of switch choices did not differ by group (F(2,66)=0.48, p=0.62). Participants switched less in long travel-time blocks (F(1,66)=36.89, p<0.0001), with no interaction (p=0.36).

MVT analysis: Exit thresholds differed by environment (F(1,66)=47.70, p<0.0001) but not by group (F(2,66)=0.37, p=0.69), with no group×environment interaction (p=0.29). No group deviated from the optimal thresholds (short, 6.7: all p>0.76; long, 5.67: all p>0.85).

Computational modeling: Three outliers in the learning rate α (>3 SD; one per group) were identified. The amisulpride group had a significantly lower α than the propranolol group (t(43)=2.16, p=0.036, d=-0.65). It also showed a trend toward a lower α than the placebo group (t(41)=1.99, p=0.054, d=-0.61). Placebo and propranolol did not differ in α (t(42)=0.28, p=0.78). The groups did not differ in the inverse temperature β (F(2,66)=1.53, p=0.22; all pairwise p>0.12) or in the choice bias c (F(2,66)=0.51, p=0.60; all pairwise p>0.34).
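
The logic behind the optimal thresholds can be illustrated with a mean-field simplification of the MVT. This simplification is an assumption, not the authors' simulation. It proceeds in three steps:
1. Replace each tree's random yields with their means: a first harvest of 10 apples, with each later harvest multiplied by the mean depletion factor k = 14.9/16.9 ≈ 0.88.
2. Find the number of harvests per tree that maximizes the long-run reward rate ρ*.
3. Read off the MVT exit criterion ρ*h.

The sketch also implements the study's individual exit-threshold measure: the mean of the last two harvests before leaving, skipping single-harvest visits.

Under these assumptions the sketch gives ρ*h of about 5.6 apples for 6 s travel and 4.5 apples for 12 s travel. The reported values are 6.7 and 5.67. The sketch therefore illustrates the direction of the effect (shorter travel, higher exit threshold), not the study's numbers.

```python
K = 14.9 / (14.9 + 2.0)  # assumed mean depletion factor per harvest
HARVEST_S = 3.0          # harvest time h (s)
MEAN_RICHNESS = 10.0     # mean first harvest at a fresh tree


def cycle_rate(n_harvests, travel_s):
    """Mean-field apples per second when every tree is harvested n times, then left."""
    total = sum(MEAN_RICHNESS * K**i for i in range(n_harvests))
    return total / (n_harvests * HARVEST_S + travel_s)


def mvt_optimum(travel_s, max_harvests=50):
    """Return (best harvests per tree, optimal rate rho*, MVT exit criterion rho* * h)."""
    best_n = max(range(1, max_harvests + 1), key=lambda n: cycle_rate(n, travel_s))
    rho_star = cycle_rate(best_n, travel_s)
    return best_n, rho_star, rho_star * HARVEST_S


def exit_thresholds(visits):
    """Study-style exit threshold per tree visit: mean of the last two harvests
    before leaving; visits with a single harvest are skipped."""
    return [(seq[-1] + seq[-2]) / 2 for seq in visits if len(seq) >= 2]


for travel_s in (6.0, 12.0):
    n, rho_star, criterion = mvt_optimum(travel_s)
    print(f"travel {travel_s:.0f} s: leave after {n} harvests, "
          f"rho* = {rho_star:.3f} apples/s, rho*h = {criterion:.2f} apples")

# Hypothetical harvest sequences for three tree visits by one participant.
print(exit_thresholds([[10.1, 8.9, 7.7], [9.4], [10.6, 9.2, 8.1, 7.0]]))
```

The brute-force search and the MVT rule agree in this sketch. At the optimal stopping point, the expected next harvest k·r is below ρ*h, while the harvest before it was not.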

Discussion

The findings dissociate dopaminergic and noradrenergic contributions to the exploration-exploitation tradeoff.

Blocking D2/D3 receptors with amisulpride increased sensitivity to three decision-relevant features: high prior rewards, long travel times (i.e., high switching costs), and depletion rate. This pattern is consistent with enhanced directed exploration early in the task and more informed exploitation thereafter. It aligns with accounts in which D2 blockade sharpens PFC representations. One possible route is a shift toward D1-dominated states that promote strong, noise-resistant representations while leaving striatal dopamine signaling intact.

In contrast, β-adrenergic blockade with propranolol reduced the use of value information. This is consistent with increased value-independent randomness in choice. It also fits theories that treat noradrenaline as a reset or urgency signal that alters information gathering and commitment. Reported effects of noradrenergic manipulations on random exploration differ in direction across studies. These discrepancies may reflect tonic versus phasic modes of noradrenergic activity and β-adrenergic receptor-mediated modulation of inhibitory tone.

Overall, dopamine appears to govern directed exploration and sensitivity to task structure. Noradrenaline contributes to choice randomness and to disengagement from current information. Together they shape the exploration-exploitation balance.

Conclusion

This study shows functionally distinct roles of dopamine and noradrenaline in human exploration-exploitation. Amisulpride (D2/D3 blockade) heightened sensitivity to prior reward, switching costs, and depletion rates, supporting dopamine’s role in directed exploration and informed exploitation. Propranolol (β-adrenergic blockade) was associated with reduced reliance on value information and tendencies toward value-independent switching patterns, consistent with noradrenaline’s role in random exploration and decision urgency. These insights advance understanding of neuromodulatory control over exploration strategies and may inform interventions for psychiatric disorders characterized by exploration-exploitation biases. Future work should incorporate measures of fatigue/boredom, baseline task performance, and within-subject designs to strengthen causal inferences and clarify tonic vs phasic noradrenergic contributions.

Limitations

Potential confounds such as tiredness or boredom were not measured and may have affected switching behavior. The between-subjects design provides no baseline performance measures, which limits control over pre-existing group differences; a within-subject design would be stronger. Peripheral drug effects were measured and did not correlate with switching, but residual peripheral influences cannot be entirely excluded. Finally, the model's inverse temperature parameter may be of limited interpretability, because the task design restricts the range of decision values.
