Psychology
Dopamine neurons encode trial-by-trial subjective reward value in an auction-like task
D. F. Hill, R. W. Hickman, et al.
The study investigates whether phasic responses of midbrain dopamine neurons encode subjective reward value on an instantaneous, trial-by-trial basis. Subjective value cannot be directly measured and is typically inferred from aggregate choices, yielding averaged valuations. Prior neurophysiological work shows dopamine signals reflect subjective value modulated by states such as satiety and by constructs like temporal discounting and utility, but these assessments have relied on averages across multiple choices. Because subjective valuation is generated by stochastic brain processes, it likely varies from trial to trial even for identical rewards. To elicit single-trial subjective values, the authors employ the Becker–DeGroot–Marschak (BDM) auction-like mechanism, which is incentive compatible and encourages truthful revelation of instantaneous willingness-to-pay. The research asks whether dopamine neuron responses covary with these trial-by-trial subjective values (bids), even when the physical reward magnitude is held constant, and whether neural activity can predict upcoming bids.
Prior studies demonstrate dopamine signals reflect subjective value derived from aggregate behavior: dopamine concentrations vary with satiety and hunger; subjective value estimated via choice indifference points, temporal discounting functions, and economic utility functions is reflected in dopamine activity. Trial-by-trial regression-based analyses have shown dopamine reward prediction error responses vary with movement reaction time, learning of reward-predicting stimuli, expected reward timing, decision confidence, and licking responses to probabilistic predictors. Human neuroimaging has validated BDM as a tool to capture trial-by-trial fluctuations of subjective value for various rewards (e.g., food, movie trailers). However, no prior work directly tested whether dopamine neurons track intrinsic trial-by-trial subjective value fluctuations despite constant objective reward amounts, motivating the present approach.
Subjects: Two adult male rhesus monkeys (Macaca mulatta; Monkey V, 11 kg; Monkey U, 17.5 kg) were extensively trained over several years and tens of thousands of BDM trials. Ethical approvals and welfare procedures followed UK regulations and University of Cambridge oversight. Headposts and recording chambers were surgically implanted; recording sites in ventral midbrain were verified histologically.
Task (BDM): On each trial, a fractal cue indicated one of three fixed juice volumes (Monkey V: 0.3, 1.0, 1.7 ml; Monkey U: 0.2, 0.45, 0.7 ml). A vertical bid space corresponding to a 1.2 ml water budget appeared. Using a fore–aft joystick, monkeys positioned a cursor to indicate their bid (amount of water they were willing to forgo). After the bid stabilized for 500 ms (within up to 5 s), a computer opponent’s bid (uniformly random) was revealed. If the monkey’s bid ≥ computer bid, the monkey won and received the juice plus the remaining water (1.2 ml minus the computer bid). If the monkey’s bid < computer bid, the monkey lost and received the full 1.2 ml water but no juice. This second-price structure incentivizes truthful bidding and reveals instantaneous subjective value.
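The payoff rule above, and why it rewards truthful bids, can be illustrated with a minimal simulation. This is a sketch under stated assumptions, not the authors' task code: the function names are hypothetical, the computer bid is assumed uniform over the full 0 to 1.2 ml bid space, and `juice_value_ml` is an illustrative stand-in for the animal's subjective value of the juice expressed in ml of water.

```python
import random

BUDGET_ML = 1.2  # water budget per trial, as stated in the task description


def bdm_outcome(bid_ml, computer_bid_ml, budget_ml=BUDGET_ML):
    """Return (won_juice, water_received_ml) for one BDM trial."""
    if bid_ml >= computer_bid_ml:
        # Win: juice is delivered and the price paid is the computer's bid.
        return True, budget_ml - computer_bid_ml
    # Lose: no juice, the full water budget is kept.
    return False, budget_ml


def expected_payoff(bid_ml, juice_value_ml, n_trials=20_000, seed=0):
    """Monte Carlo estimate of the mean payoff in water-equivalent ml.

    juice_value_ml is a hypothetical subjective value (assumption).
    The same seed is reused for every bid so bids are compared on
    identical sequences of computer bids.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        computer_bid = rng.uniform(0.0, BUDGET_ML)  # assumed range
        won, water = bdm_outcome(bid_ml, computer_bid)
        total += water + (juice_value_ml if won else 0.0)
    return total / n_trials


def best_bid(juice_value_ml, step_ml=0.1):
    """Grid-search the bid that maximises the estimated payoff."""
    n_steps = round(BUDGET_ML / step_ml)
    grid = [round(i * step_ml, 10) for i in range(n_steps + 1)]
    return max(grid, key=lambda b: expected_payoff(b, juice_value_ml))
```

Under these assumptions, `best_bid(0.6)` returns 0.6: bidding below the subjective value forgoes some trials in which the juice would have cost less than it is worth, and bidding above it wins some trials in which the juice costs more than it is worth. Because the price paid is the computer's bid and not the animal's own bid, the bid affects only whether the juice is obtained, never its price.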
Behavioral analyses: Three fractal cues were heavily trained (>20,000 trials). Bids were analyzed for rank-ordering across reward magnitudes and coherence over time within and between sessions. A lasso regression (cross-validated; 31 candidate regressors) identified variables contributing to bid variability, followed by mixed-effects models to quantify contributions while accounting for trial and session as random effects. Key variables examined independently of reward magnitude included starting bid, total liquid consumed, previous competing bid (same magnitude), and previous result (win/lose), with additional analysis of win/lose streaks due to collinearity.
Electrophysiology: Single-unit recordings were made from ventral midbrain during task performance. Putative dopamine neurons were identified by wide spike waveforms (>1.8 ms), low baseline rates (<10 Hz), and significant task-related responses; others were categorized as non-dopamine neurons. Analyses focused on the second (value-related) component of cue-evoked responses in an animal-specific window (Monkey V: 180–360 ms; Monkey U: 180–340 ms post-cue), excluding the initial attentional component. Linear regressions assessed correlations with reward magnitude and with bids; movement parameters (velocity, unsigned velocity, absement, unsigned absement) were tested to rule out motor confounds.
Subjective value vs reward magnitude tests: To determine whether neural responses reflected subjective value independent of physical magnitude, responses were examined within each reward magnitude across bid levels, and across different magnitudes for similar bids (within 5% match; comparisons with differing bid distributions were excluded). Population analyses used z-normalized activity.
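The two preprocessing steps named here, z-normalization and bid matching, could look roughly like the sketch below. It rests on assumptions that the summary does not settle: bids are expressed as a fraction of the bid range, "within 5%" is read as an absolute difference of at most 0.05 on that scale, and trials are paired greedily one-to-one. The function names are hypothetical.

```python
from statistics import mean, pstdev


def z_normalize(rates):
    """Z-score one neuron's firing rates across its trials."""
    mu, sigma = mean(rates), pstdev(rates)
    if sigma == 0:
        # A constant response carries no variance to normalise.
        return [0.0 for _ in rates]
    return [(r - mu) / sigma for r in rates]


def match_bids(trials_a, trials_b, tolerance=0.05):
    """Greedily pair trials from two reward magnitudes with similar bids.

    Each trial is a (bid, response) tuple, with bid given as a fraction
    of the bid range (assumption). Each trial is used at most once.
    """
    unused_b = sorted(trials_b)
    pairs = []
    for bid_a, resp_a in sorted(trials_a):
        for j, (bid_b, resp_b) in enumerate(unused_b):
            if abs(bid_a - bid_b) <= tolerance:
                pairs.append((resp_a, resp_b))
                del unused_b[j]  # consume the matched trial
                break
    return pairs
```

The resulting response pairs are the kind of input a paired nonparametric test expects, for example `scipy.stats.wilcoxon`. Whether the original analysis paired individual trials or compared matched bid bins is not stated in this summary.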
Decoding: Support Vector Regression (SVR) decoded continuous bid values from neuronal responses during the value response window. Models were trained on 80% of data (stratified across bid deciles) and tested on 20% using five-fold cross-validation, repeated 300 times with 100 randomly selected trials per iteration. Decoding performance (R^2) was assessed as a function of neuron count for (i) bid-encoding dopamine neurons, (ii) all dopamine neurons, and (iii) non-bid-encoding dopamine neurons, with shuffled controls. Additional analyses added neurons ordered by single-neuron explained variance to estimate upper bounds on decoding accuracy.
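The 80/20 split stratified across bid deciles can be sketched in plain Python as follows. This illustrates the splitting step only and is not the authors' pipeline: the function names are hypothetical, deciles are assigned by bid rank, and the regressor (for instance scikit-learn's `SVR`) would be fit on each `train_idx` and scored on the matching `test_idx`.

```python
import random
from collections import defaultdict


def decile_labels(bids):
    """Assign each trial a decile index (0-9) by the rank of its bid."""
    order = sorted(range(len(bids)), key=lambda i: bids[i])
    labels = [0] * len(bids)
    for rank, i in enumerate(order):
        labels[i] = min(9, rank * 10 // len(bids))
    return labels


def stratified_folds(bids, n_folds=5, seed=0):
    """Split trial indices into folds with balanced bid deciles."""
    rng = random.Random(seed)
    by_decile = defaultdict(list)
    for i, label in enumerate(decile_labels(bids)):
        by_decile[label].append(i)
    folds = [[] for _ in range(n_folds)]
    for label in sorted(by_decile):
        members = by_decile[label]
        rng.shuffle(members)
        # Deal the decile's trials round-robin across the folds.
        for k, i in enumerate(members):
            folds[k % n_folds].append(i)
    return folds


def train_test_splits(bids, n_folds=5, seed=0):
    """Yield (train_idx, test_idx); each fold is the test set once."""
    folds = stratified_folds(bids, n_folds, seed)
    for k, test_idx in enumerate(folds):
        train_idx = [i for j, f in enumerate(folds) if j != k for i in f]
        yield train_idx, test_idx
```

With 100 trials, as in each iteration described above, every decile holds 10 trials, so each of the five folds receives 2 trials per decile, giving 80 training and 20 test trials per split. When a decile's size is not a multiple of the fold count, the round-robin dealing leaves the folds slightly uneven.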
- Monkeys’ BDM bids reflected instantaneous subjective value and were rank-ordered by reward magnitude. Session-level correlations between mean bid and juice volume were strong (overall R^2 = 0.61, p < 0.05; session average R^2 = 0.46, p < 0.05 in 96.9% of sessions; n=227 sessions Monkey V, n=309 Monkey U). Bids fluctuated trial-to-trial and day-to-day with coherent changes across reward magnitudes, consistent with time-varying subjective value.
- Lasso and mixed-effects modeling identified significant predictors of bids: reward magnitude, starting bid, total liquid consumed, previous computer bid (same magnitude), and previous result (win/lose) (adjusted R^2_y = 0.50, R^2 = 0.41). Win/lose streaks influenced bids differentially across monkeys (e.g., Monkey U: win streak β≈0.05, lose streak β≈−0.03, starting bid β≈−0.04, total liquid β≈0.47; Monkey V: win streak β≈0.016, lose streak β≈−0.007 (ns), starting bid β≈−0.10, total liquid β≈0.18; p < 0.05 unless noted).
- Electrophysiology: Of putative dopamine neurons, 65% (n=80/123) in Monkey V and 47% (n=68/145) in Monkey U showed graded value responses to reward-predicting cues. Subsets showed significant correlations with bids (Monkey V: n=41; Monkey U: n=32; p < 0.05). Population responses increased monotonically with both reward magnitude and bids (e.g., bid bins: Monkey V R^2=0.88, p=6.6×10^-12; Monkey U R^2=0.93, p=4.1×10^-14). Movement parameters did not explain neural responses (numbers of neurons modulated by velocity/absement failed to exceed 5% chance level).
- Subjective value encoding within constant magnitude: Within each reward magnitude, dopamine responses varied monotonically with bid levels (e.g., Monkey V mid magnitude R^2≈0.83, p<0.001; Monkey U mid magnitude R^2≈0.74, p<0.001), demonstrating graded subjective value coding independent of physical amount.
- Similar bids across different magnitudes evoked similar dopamine responses: For bids matched within 5% across small–medium, medium–large, and small–large magnitude pairs, no significant differences were observed in value-component responses (two-sided Wilcoxon signed-rank; p > 0.05 for Monkeys V and U), indicating encoding of subjective value rather than magnitude.
- Decoding: SVR predicted bids from dopamine responses. Single-neuron decoding accuracy was low, but accuracy rose rapidly with population size, reaching an R^2 of about 0.6 with ~20 bid-encoding dopamine neurons. Accuracy was lower with all dopamine neurons and lowest with non-bid-encoding neurons; shuffled controls yielded an R^2 near 0. Adding the best-encoding neurons first produced asymptotic performance with relatively few units (10–20), with little gain from adding poorly encoding neurons.
The findings demonstrate that phasic dopamine reward responses track instantaneous subjective reward value on a trial-by-trial basis. Using the incentive-compatible BDM mechanism ensured that each bid reflected the animal’s current subjective valuation without constraints from explicit option sets. Dopamine neurons’ value responses scaled with bids both when reward magnitude varied and when it was held constant, and similar bids across different magnitudes produced similar neural responses, indicating coding of subjective value distinct from physical reward amount. The ability of small populations of dopamine neurons to predict upcoming bids supports a precise and behaviorally relevant neural code for momentary valuation. These results extend prior demonstrations of subjective value encoding (utility, temporal discounting) by moving from aggregate estimates to single-trial measures, linking dopamine reward prediction error signals to real-time fluctuations in valuation that guide behavior.
This study shows that midbrain dopamine neurons encode trial-by-trial subjective reward value, revealed via an incentive-compatible BDM task in highly trained rhesus monkeys. Neural responses in the value-related epoch tracked bids irrespective of physical reward magnitude, and small ensembles allowed accurate prediction of forthcoming bids. These results advance understanding of dopamine signals as precise, instantaneous encoders of subjective value rather than mere physical magnitude. Future work should examine whether similar coding extends to affective and socially relevant stimuli, explore generalization across broader reward types and task contexts, and leverage simultaneous recordings to assess population dynamics underlying rapid valuation.
- Generalizability: Data from two male rhesus monkeys; applicability to other individuals, species, or reward types (e.g., social/affective) remains untested.
- Task design: Only three reward magnitudes were used to increase trial counts; this may limit assessment of nonlinearities and risk attitudes, although fixed goods (not lotteries) minimized risk confounds.
- BDM considerations: While extensive training mitigates typical BDM pitfalls (misunderstanding, framing), residual influences such as previous outcomes (win/loss, streaks) and satiety affected bids.
- Neural sampling: Neurons were recorded non-simultaneously; decoding analyses may be conservative. Single-neuron decoding accuracy was limited; continuous SVR decoding typically yields lower apparent accuracy than binary classifiers.
- Motor confounds were tested and found unlikely, but residual unmeasured factors cannot be fully excluded.