Dissociable neural correlates of uncertainty underlie different exploration strategies

M. S. Tomov, V. Q. Truong, et al.

This fMRI study by Momchil S. Tomov and collaborators examines the neural mechanisms behind different exploration strategies in decision-making. It reports that the right rostrolateral prefrontal cortex (RLPFC) and right dorsolateral prefrontal cortex (DLPFC) encode relative and total uncertainty, respectively, and that these signals guide directed and random exploration.

Introduction
The explore-exploit dilemma, the problem of balancing familiar choices (exploitation) against unfamiliar but potentially better ones (exploration), is fundamental to decision-making. Prior research suggests humans use various heuristics to navigate this trade-off, including softmax exploration and more sophisticated strategies guided by uncertainty. Uncertainty-guided exploration comes in two forms: directed exploration, which favors the more uncertain option because it may yield higher gains; and random exploration, which adds randomness to choices so that seemingly less favorable options are sometimes tried. Directed exploration aligns with the Upper Confidence Bound (UCB) algorithm, which adds an uncertainty bonus to option values, while random exploration resembles Thompson sampling, which samples values from posterior distributions and then chooses greedily among the samples. Previous work indicates that humans combine these strategies, with relative uncertainty driving directed exploration and total uncertainty driving random exploration. This study used fMRI to investigate the neural mechanisms underlying these computations.
Literature Review
Earlier work proposed that choice probabilities track expected values (softmax exploration), an idea related to probability matching. Later studies, however, showed that humans use more advanced strategies that take uncertainty into account. Directed exploration favors the option with higher uncertainty, while random exploration adds stochasticity to choices in proportion to overall uncertainty. The Upper Confidence Bound (UCB) algorithm models directed exploration by adding an uncertainty bonus to expected values, while Thompson sampling captures random exploration by sampling values from posterior distributions. Existing evidence suggests humans employ a hybrid of these strategies, possibly with distinct neural correlates. Dopamine gene expression profiles, transcranial magnetic stimulation (TMS) of the right rostrolateral prefrontal cortex (RLPFC), and developmental trajectories all point to a dissociation between directed and random exploration.
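The contrast between the two algorithms can be made concrete with a minimal Python sketch for two arms with Gaussian posteriors. The definitions V = mu1 - mu2 and TU = sqrt(sd1^2 + sd2^2) are assumptions based on a common formulation of these models and are not spelled out in this summary. The function names and the gamma parameter are hypothetical. Under these assumptions, Thompson sampling chooses arm 1 with probability Phi(V/TU), which the sketch checks by simulation, while UCB deterministically picks the arm with the larger mean plus uncertainty bonus.

```python
import math
import random


def normal_cdf(x):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def thompson_choice_prob(mu1, sd1, mu2, sd2):
    """Analytic P(choose arm 1) under Thompson sampling (Gaussian posteriors)."""
    v = mu1 - mu2                          # value difference V (assumed definition)
    tu = math.sqrt(sd1 ** 2 + sd2 ** 2)    # total uncertainty TU (assumed definition)
    return normal_cdf(v / tu)


def thompson_choice_prob_mc(mu1, sd1, mu2, sd2, n=20000, seed=0):
    """Monte Carlo estimate: sample one value per arm, choose the larger sample."""
    rng = random.Random(seed)
    wins = sum(rng.gauss(mu1, sd1) > rng.gauss(mu2, sd2) for _ in range(n))
    return wins / n


def ucb_choice(mu1, sd1, mu2, sd2, gamma=1.0):
    """Deterministic UCB: return 1 or 2, whichever has the larger mean + gamma * sd."""
    return 1 if mu1 + gamma * sd1 >= mu2 + gamma * sd2 else 2


if __name__ == "__main__":
    print(thompson_choice_prob(1.0, 2.0, 0.0, 2.0))
    print(thompson_choice_prob_mc(1.0, 2.0, 0.0, 2.0))
    print(ucb_choice(0.0, 4.0, 1.0, 0.1, gamma=1.0))
```

With gamma set to 0 the UCB rule reduces to greedy choice, which is one way to see that the uncertainty bonus alone is what produces directed exploration in this sketch.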
Methodology
Thirty-one participants underwent fMRI while performing a two-armed bandit task. Each arm was labeled 'safe' (constant reward) or 'risky' (variable reward), which allowed relative and total uncertainty to be manipulated independently. Each trial consisted of a cue/choice phase, an inter-stimulus interval (ISI), feedback, and an inter-trial interval (ITI). Risky options delivered rewards drawn from a Gaussian distribution, while safe options provided constant rewards. The researchers used four trial types: RS (risky-safe), SR (safe-risky), RR (risky-risky), and SS (safe-safe). Subject choices were modeled using a probit regression on value difference (V), relative uncertainty (RU), and value difference scaled by total uncertainty (V/TU), reflecting a hybrid UCB/Thompson sampling model. fMRI data were preprocessed using SPM12, including realignment, coregistration, normalization, and smoothing. Two general linear models (GLMs) were employed: GLM1 assessed neural correlates of RU, TU, V, and V/TU; GLM2 investigated the neural representation of decision value (DV), a linear combination of V, RU, and V/TU. Decoding analyses were conducted to extract subjective estimates of RU and TU from brain activity and integrate them into the choice model.
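To illustrate how the four trial types separate the two kinds of uncertainty, and how the probit regressors combine into a choice probability, here is a minimal Python sketch. The definitions RU = sd1 - sd2 and TU = sqrt(sd1^2 + sd2^2), the regression weights, and the posterior standard deviations assigned to risky and safe arms are all illustrative assumptions, not values from the study.

```python
import math


def normal_cdf(x):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def regressors(mu1, sd1, mu2, sd2):
    """Return (V, RU, TU) for two arms with posterior means and SDs."""
    v = mu1 - mu2                          # value difference
    ru = sd1 - sd2                         # relative uncertainty (assumed definition)
    tu = math.sqrt(sd1 ** 2 + sd2 ** 2)    # total uncertainty (assumed definition)
    return v, ru, tu


def hybrid_choice_prob(mu1, sd1, mu2, sd2, w_v=0.1, w_ru=0.1, w_vtu=0.5):
    """P(choose arm 1) = Phi(w_v*V + w_ru*RU + w_vtu*V/TU); weights are hypothetical."""
    v, ru, tu = regressors(mu1, sd1, mu2, sd2)
    return normal_cdf(w_v * v + w_ru * ru + w_vtu * v / tu)


# Hypothetical posterior SDs: large for a risky arm, small but nonzero for a safe arm.
SD_RISKY, SD_SAFE = 4.0, 0.1
TRIAL_TYPES = {
    "RS": (SD_RISKY, SD_SAFE),
    "SR": (SD_SAFE, SD_RISKY),
    "RR": (SD_RISKY, SD_RISKY),
    "SS": (SD_SAFE, SD_SAFE),
}

if __name__ == "__main__":
    for name, (sd1, sd2) in TRIAL_TYPES.items():
        _, ru, tu = regressors(0.0, sd1, 0.0, sd2)
        p = hybrid_choice_prob(1.0, sd1, 0.0, sd2)
        print(f"{name}: RU={ru:+.2f}  TU={tu:.2f}  P(arm 1 | V=1)={p:.3f}")
```

In this sketch RS and SR trials differ in the sign of RU, while RR and SS trials both have RU = 0 but differ in TU, so the same value difference has a much weaker effect on choice in RR than in SS.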
Key Findings
Behavioral data confirmed that subjects showed a bias towards risky options (consistent with UCB) when relative uncertainty was high, and less sensitivity to value differences when total uncertainty was high (consistent with Thompson sampling). The probit regression analysis revealed significant contributions of V, RU, and V/TU to choices, supporting the hybrid UCB/Thompson model. GLM1 identified right RLPFC as representing relative uncertainty, and right DLPFC as representing total uncertainty, consistent with prior research. Right RLPFC activity significantly predicted variability in directed exploration, and right DLPFC activity predicted variability in random exploration. GLM2 showed that activity in left primary motor cortex (M1) reflected the combined decision value (DV). Decoding analysis showed that including neurally decoded RU from RLPFC and V/TU from DLPFC significantly improved choice predictions. The residual variance in M1's decision value signal positively correlated with the square of total uncertainty, suggesting a sampling mechanism for random exploration.
Discussion
The study's findings support a hybrid model of exploration-exploitation, where directed and random exploration are driven by distinct uncertainty computations implemented in dissociable brain regions. Right RLPFC encodes relative uncertainty and governs directed exploration via UCB, while right DLPFC encodes total uncertainty and drives random exploration through Thompson sampling. These computations are integrated in motor cortex, which computes the final choice, potentially through a sequential sampling mechanism. The results build upon and extend previous research by directly manipulating uncertainty, independently measuring relative and total uncertainty, grounding the exploration strategies in established machine-learning algorithms, and linking uncertainty computations to behavior and specific brain regions. The study provides a comprehensive account of how uncertainty contributes to choice behavior and clarifies the functional roles of RLPFC and DLPFC in this process.
Conclusion
This research demonstrates that humans use a hybrid strategy involving both directed and random exploration guided by distinct neural computations of uncertainty. Relative uncertainty is processed in right RLPFC, influencing directed exploration, while total uncertainty is processed in right DLPFC, affecting random exploration. These signals converge in motor cortex to compute choices, potentially via a sampling mechanism. Future studies could explore causal relationships by selectively disrupting activity in these regions, further validating the proposed computational architecture.
Limitations
The study primarily focused on a specific two-armed bandit task, potentially limiting the generalizability of findings to more complex real-world scenarios. The relatively small sample size (n=31) could also have affected the statistical power of certain analyses. Although several steps were taken to improve model interpretation, some aspects of the model, such as the lack of signal for V/TU in GLM1, warrant further research.