Dissociable neural correlates of uncertainty underlie different exploration strategies

M. S. Tomov, V. Q. Truong, et al.

This fMRI study by Momchil S. Tomov and collaborators examines the neural mechanisms behind different exploration strategies in decision-making. It finds that right rostrolateral prefrontal cortex tracks relative uncertainty, which guides directed exploration, while right dorsolateral prefrontal cortex tracks total uncertainty, which guides random exploration, and that the two signals are combined downstream to inform choice.

Introduction
The study addresses how the brain balances exploration and exploitation by testing the hypothesis that dissociable neural representations of uncertainty drive distinct exploration strategies. Prior work suggests two forms of exploration: directed exploration, which adds an uncertainty bonus to option values (as in upper confidence bound, UCB), and random exploration, which increases choice stochasticity with overall uncertainty (as in Thompson sampling). The authors hypothesize that relative uncertainty between options drives directed exploration and is encoded in right rostrolateral prefrontal cortex (RLPFC), whereas total uncertainty across options drives random exploration and is encoded in right dorsolateral prefrontal cortex (DLPFC). They further hypothesize that these signals are integrated downstream to compute a decision value that determines choice, potentially in motor cortex. The purpose is to test these predictions behaviorally and neurally using a two-armed bandit task with explicit manipulations of option uncertainty.
Literature Review
Earlier research characterized random exploration via softmax decision policies and probability matching, but later work showed humans also guide exploration using uncertainty. Directed exploration is well captured by UCB, which adds a bonus proportional to posterior standard deviation. Random exploration is consistent with Thompson sampling, where choices become more stochastic as total uncertainty increases, accounting for payoff variability effects. Evidence for a hybrid of directed and random exploration includes behavioral demonstrations in two-armed bandits and links to separate uncertainty computations: relative uncertainty for directed exploration and total uncertainty for random exploration. Converging evidence suggests dissociable biological substrates: differential associations with dopamine-related genes, causal modulation of directed (but not random) exploration by right RLPFC TMS, and distinct developmental trajectories. Badre et al. (2012) reported right RLPFC activity correlating with relative uncertainty and right DLPFC with total uncertainty, motivating region-of-interest tests in this study.
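The contrast between the two strategies can be illustrated with a minimal Python sketch for a two-armed Gaussian bandit. It assumes a probit form for both policies; the function names and the parameters `gamma` (size of the uncertainty bonus) and `lam` (choice noise) are hypothetical and introduced here for illustration only. Under UCB, relative uncertainty shifts the choice function (its intercept), whereas under Thompson sampling, total uncertainty flattens it (its slope).

```python
import math


def phi(x: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def p_ucb(q1, q2, sd1, sd2, gamma=1.0, lam=1.0):
    """Directed exploration: each arm gets a bonus of gamma * posterior SD.

    gamma and lam are illustrative assumptions, not fitted values.
    """
    v = q1 - q2        # value difference
    ru = sd1 - sd2     # relative uncertainty
    return phi((v + gamma * ru) / lam)


def p_thompson(q1, q2, sd1, sd2):
    """Random exploration: probability that a posterior sample for arm 1
    exceeds a posterior sample for arm 2 (independent Gaussian posteriors)."""
    v = q1 - q2
    tu = math.sqrt(sd1 ** 2 + sd2 ** 2)  # total uncertainty
    return phi(v / tu)


if __name__ == "__main__":
    # Equal means, arm 1 more uncertain: only UCB is biased toward arm 1.
    print(p_ucb(0.0, 0.0, 4.0, 1.0), p_thompson(0.0, 0.0, 4.0, 1.0))
    # Same value difference, low vs. high total uncertainty: Thompson
    # choices move toward 0.5 as total uncertainty grows.
    print(p_thompson(2.0, 0.0, 1.0, 1.0), p_thompson(2.0, 0.0, 4.0, 4.0))
```

With equal means, the Thompson rule is indifferent regardless of which arm is more uncertain, while the UCB rule favors the more uncertain arm. This is the behavioral dissociation that the RS/SR and RR/SS block types are designed to expose.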
Methodology
Participants: 31 healthy, right-handed adults (17 female; ages 18–35) from the Cambridge community. Compensation included $50 plus a performance-based bonus.
Task: Two-armed bandit with 32 blocks per subject, 10 trials per block, four block types counterbalanced: RS, SR, RR, SS, indicating risky (R) or safe (S) status for each arm. Risky arms yielded Gaussian-distributed rewards with fixed mean within a block (variance 16); safe arms yielded a fixed reward throughout a block. Arm means were sampled from a zero-mean Gaussian with variance 100 at block start. Subjects were informed of the safe/risky status and task statistics and completed four practice blocks. Trial structure included cue/choice (up to 2 s), feedback (1 s), variable ISI and ITI.
Behavioral modeling: An ideal observer Kalman filter tracked posterior means Q_t(k) and variances σ_t^2(k) per arm. Directed and random exploration were modeled using a probit regression capturing a hybrid of UCB and Thompson sampling: P(a=1|w)=Φ(w1·V + w2·RU + w3·V/TU), where V is the value difference (Q1−Q2), RU is relative uncertainty (σ1−σ2), and TU is total uncertainty (sqrt(σ1^2+σ2^2)). Mixed-effects maximum likelihood estimation was used to fit coefficients, with model comparisons against UCB-only, Thompson-only, and softmax variants.
fMRI acquisition: 3T Siemens Prisma, multi-echo MPRAGE anatomical scan; eight functional SMS-EPI runs per subject (TR=2 s, voxel size 1.5 mm isotropic; multiband factor 3; GRAPPA 2). Standard preprocessing in SPM12 (realignment, coregistration, normalization to MNI space, 8-mm FWHM smoothing, high-pass filtering, AR(1) autocorrelation correction). Runs with excessive motion (>2 mm or >2°) were excluded.
Univariate analysis (GLM 1): At trial onset, parametric modulators included |RU|, TU, |V|, and |V|/TU (non-orthogonalized). Additional regressors modeled timeouts, chosen action, button press, and feedback onset. Whole-brain voxelwise threshold p<0.001 uncorrected with cluster-level FWE α=0.05. A priori ROIs (10-mm spheres) from Badre et al. were used: right RLPFC (MNI [36 56 −8]) for RU and right DLPFC (MNI [38 30 34]) for TU.
Decision value analysis (GLM 2): A single trial-onset parametric modulator DV_i = w1·V_i + w2·RU_i + w3·V_i/TU_i, using subject-specific coefficients, with whole-brain inference as above. An ROI was defined in left primary motor cortex (M1) around the peak |DV| effect.
Decoding analyses: Ridge-based inversion of GLM regressors to obtain trial-by-trial neurally decoded estimates of RU_s from right RLPFC and TU_s from right DLPFC, and DV from left M1, accounting for HRF lag. These decoded regressors were added to the behavioral probit model to test whether they improved choice prediction (AIC/BIC/LL/deviance).
Residual variance analysis: Correlated trial-wise residual variance of the DV ROI (left M1) with TU^2 to test a sampling-based mechanism prediction.
Statistical reporting: Group-level t-tests and ANOVAs reported with effect sizes and p-values. Model comparisons used AIC, BIC, and Bayesian model selection (protected exceedance probability).
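The behavioral model described above can be sketched in a few lines of Python: a Kalman filter update for a stationary arm mean, followed by the hybrid probit choice rule. This is a minimal illustration rather than the authors' code. The prior (mean 0, variance 100) and the risky-arm reward variance (16) come from the task description; the small positive noise variance for safe arms is an assumption made here so that TU never reaches zero, and the function names are hypothetical. The weights in the example are the group-level coefficients reported in Key Findings, used purely for illustration.

```python
import math

PRIOR_MEAN = 0.0        # arm means are drawn from a zero-mean Gaussian
PRIOR_VAR = 100.0       # variance of that Gaussian
RISKY_NOISE_VAR = 16.0  # reward variance of a risky arm
SAFE_NOISE_VAR = 1e-5   # assumption: tiny value standing in for "no noise"


def kalman_update(q, var, reward, noise_var):
    """One Kalman-filter step for the chosen arm (stationary mean)."""
    gain = var / (var + noise_var)      # learning rate for this trial
    q_new = q + gain * (reward - q)     # move the mean toward the reward
    var_new = var - gain * var          # posterior variance shrinks
    return q_new, var_new


def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def hybrid_choice_prob(q, var, w1, w2, w3):
    """P(choose arm 1) = Phi(w1*V + w2*RU + w3*V/TU)."""
    v = q[0] - q[1]                                # value difference
    ru = math.sqrt(var[0]) - math.sqrt(var[1])     # relative uncertainty
    tu = math.sqrt(var[0] + var[1])                # total uncertainty
    return phi(w1 * v + w2 * ru + w3 * v / tu)


# Example: an RS block (arm 1 risky, arm 2 safe), one observation per arm.
q = [PRIOR_MEAN, PRIOR_MEAN]
var = [PRIOR_VAR, PRIOR_VAR]
noise = [RISKY_NOISE_VAR, SAFE_NOISE_VAR]
q[0], var[0] = kalman_update(q[0], var[0], reward=5.0, noise_var=noise[0])
q[1], var[1] = kalman_update(q[1], var[1], reward=3.0, noise_var=noise[1])
p_arm1 = hybrid_choice_prob(q, var, w1=0.166, w2=0.175, w3=0.005)
print(q, var, p_arm1)
```

After a single observation the safe arm's variance collapses to nearly zero while the risky arm retains substantial uncertainty, so RU is positive and the model favors the risky arm beyond what the value difference alone would predict.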
Key Findings
Behavioral signatures of directed and random exploration:
- Relative uncertainty manipulation (RS vs. SR) shifted the intercept of the choice function: RS intercept > SR intercept (F(1,9711)=21.0, p=0.000005); RS intercept > 0 (F=10.8, p=0.001); SR intercept < 0 (F=17.9, p=0.00002). Small effect of total uncertainty on intercept (RR vs. SS: F=4.1, p=0.04).
- Total uncertainty manipulation (RR vs. SS) tended to reduce the slope of the choice function, but only at trend level (RR < SS: F(1,9711)=3.4, p=0.07); no effect of relative uncertainty on slope (F=0.06, p=0.8).
Hybrid model of choice (probit):
- Significant positive effects of all regressors: w1=0.166±0.016 (t(9716)=10.34, p<1e−20), w2=0.175±0.021 (t=8.17, p<1e−15), w3=0.005±0.001 (t=4.47, p<1e−5).
- Model comparisons: hybrid > UCB-only or Thompson-only > softmax.
- Simulations showed hybrid achieved higher performance; individual performance correlated positively with sensitivity to RU (r(29)=0.47, p=0.008) and V/TU (r(29)=0.53, p=0.002).
Neural correlates of uncertainty (GLM 1 with a priori ROIs):
- Right RLPFC (MNI [36 56 −8]) tracked relative uncertainty: β_RU significant (t(30)=3.24, p=0.003); no TU effect (t=−0.55, p=0.58); RU > TU contrast significant (t=2.96, p=0.006).
- Right DLPFC (MNI [38 30 34]) tracked total uncertainty: β_TU significant (t(30)=3.36, p=0.002); no RU effect (t=0.71, p=0.48); TU > RU contrast trend (t=1.74, p=0.09).
Downstream decision value (GLM 2):
- |DV| negatively correlated with activity in left primary motor cortex (peak MNI [−38 −8 62]; cluster-level FWE-corrected).
- Left M1 DV-related activity correlated with behavior and model fit across subjects: lower BIC (better neural fit) associated with higher performance (r(29)=−0.44, p=0.01) and higher behavioral model log-likelihood (r(29)=−0.37, p=0.04).
Neural decoding improves choice predictions:
- Adding RU_s decoded from right RLPFC improved BIC (6407 vs. 6410 baseline); adding TU_s from RLPFC did not (6421 vs. 6410).
- Adding TU_s decoded from right DLPFC improved BIC (6359 vs. 6410); adding RU_s from DLPFC did not (6419 vs. 6410).
- Including both RU_s (RLPFC) and V/TU_s (DLPFC) further improved fit versus either alone (AIC 6273 vs. 6287; deviance 6249 vs. 6266; BIC comparable at 6359).
Mechanistic evidence for sampling:
- Residual variance of DV signal in left M1 scaled with squared total uncertainty TU^2 (t(30)=2.06, p=0.05), consistent with a Thompson sampling-like implementation of random exploration.
- Controlling for RT indicated that DV and RT jointly explained left M1 activity, consistent with sequential sampling dynamics.
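Why a sampling mechanism predicts variance that scales with TU^2 can be seen from a short derivation. It assumes independent Gaussian posteriors for the two arms, as in the Kalman filter model; the symbol \tilde{\mu}_k for a sampled arm value is notation introduced here, not taken from the paper.

```latex
\begin{align}
\tilde{\mu}_k &\sim \mathcal{N}\!\left(Q_t(k),\, \sigma_t^2(k)\right), \qquad k \in \{1, 2\} \\
\tilde{\mu}_1 - \tilde{\mu}_2 &\sim \mathcal{N}\!\left(Q_t(1) - Q_t(2),\; \sigma_t^2(1) + \sigma_t^2(2)\right)
  = \mathcal{N}\!\left(V,\, \mathrm{TU}^2\right) \\
P(a = 1) &= P\!\left(\tilde{\mu}_1 > \tilde{\mu}_2\right) = \Phi\!\left(\frac{V}{\mathrm{TU}}\right)
\end{align}
```

If the decision signal reflects the sampled difference rather than its mean, its trial-to-trial variance equals TU^2, which is the relationship the residual variance analysis tests. The same derivation yields the V/TU term in the probit model.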
Discussion
Findings support a dissociation between uncertainty computations and exploration strategies: right RLPFC encodes relative uncertainty that biases choices toward uncertain options (directed exploration), while right DLPFC encodes total uncertainty that increases choice stochasticity (random exploration). Trial-by-trial neural variability in these regions predicts deviations from ideal observer estimates that in turn predict choices, establishing a functional link between neural signals and behavior. A decision value combining value, relative uncertainty, and value scaled by total uncertainty is reflected in left primary motor cortex, suggesting that motor circuits integrate uncertainty and value signals to implement the categorical choice. The negative correlation between |DV| and motor activity, its relation to RT, and the scaling of decision signal variability with TU^2 are consistent with a sequential sampling mechanism and a sampling-based (Thompson) account of random exploration. These results replicate and extend prior reports (e.g., Badre et al.) by orthogonally manipulating relative and total uncertainty and connecting neural signals to formal computational models of exploration, thereby clarifying the neural computations underpinning explore-exploit behavior.
Conclusion
The study demonstrates that humans employ a hybrid of directed and random exploration driven by distinct uncertainty computations encoded in dissociable prefrontal regions. Relative uncertainty in right RLPFC supports directed exploration, total uncertainty in right DLPFC supports random exploration, and these signals are combined with value in motor cortex to compute choice. Decoding neural estimates improves prediction of choices, and variability patterns support a sampling mechanism. Future research could establish causal double dissociations via neuromodulation (e.g., disrupting RLPFC vs. DLPFC to selectively impact directed vs. random exploration), examine effector-specificity by changing response modalities, and further test sampling versus analytic implementations of uncertainty integration.
Limitations
- The relative uncertainty cluster in right RLPFC did not survive whole-brain FWE correction and required a priori ROI analysis.
- The behavioral slope effect for total uncertainty (RR vs. SS) was modest and at trend level (F=3.4, p=0.07), likely due to limited sample size.
- Risk (irreducible uncertainty) was not explicitly disentangled from estimation uncertainty; the task conflates them to some degree.
- Left M1 DV effects are influenced by reaction time; although modeled separately, full dissociation is challenging due to expected coupling under sequential sampling.
- Decoding analyses used ridge regularization without cross-validation (λ fixed), and relied on HRF lag approximations.
- Generalizability beyond right-handed participants and the specific task structure remains to be determined.