
Neural and computational underpinnings of biased confidence in human reinforcement learning
C. Ting, N. Salem-Garcia, et al.
This summary presents research by Chih-Chung Ting, Nahuel Salem-Garcia, Stefano Palminteri, Jan B. Engelmann, and Maël Lebreton, who used fMRI to show how a VMPFC-centered network encodes a global confidence signal that incorporates contextual biases.
Introduction
Humans and animals constantly assess the accuracy of their decisions, actions, and statements, often expressing this assessment as confidence judgments. These metacognitive judgments are crucial for sequential decision-making, influencing evidence integration, speed-accuracy trade-offs, and changes of mind. Recent research suggests that confidence is also a key variable for understanding human reinforcement learning. Neurobiologically, confidence computation and judgment have been linked to two prefrontal networks: a network whose activity correlates negatively with confidence (dACC, insula, dorsomedial and dorsolateral PFC) and a network whose activity correlates positively with it (ventromedial PFC, pgACC). Meta-analyses link the negative network to uncertainty and error detection and the positive network to affect and valuation, but the precise roles of these networks in confidence computation remain unclear. This study hypothesized a functional dissociation: the negative network represents objective uncertainty, whereas the VMPFC builds subjective confidence by aggregating uncertainty variables with other signals. To test this, the researchers used a reinforcement learning paradigm that manipulated outcome valence (gains vs. losses) and the amount of information (partial vs. complete feedback), exploiting the known valence-induced confidence bias (higher confidence in gain contexts). This design allowed the examination of three confidence-related signals: objective uncertainty (higher with partial feedback), condition-specific confidence (confidence relative to other trials of the same context, tracking learning-related improvement within that context), and task-wide confidence (the overall feeling of confidence, comparable across contexts). The study aimed to identify the brain regions encoding each signal from their activation patterns across the valence and information manipulations. fMRI data were recorded while participants performed the task and provided confidence ratings.
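To make the distinction between task-wide and condition-specific confidence concrete, here is a minimal Python sketch. It assumes, in line with the within-context z-scoring used for condition-specific ratings in the fMRI analysis, that condition-specific confidence can be approximated by z-scoring ratings within each of the four contexts, while task-wide confidence is simply the native rating. The function names and context labels are illustrative, not the authors' code.

```python
from statistics import mean, pstdev


def task_wide(confidence):
    """Task-wide signal: native ratings, left untouched, so between-context
    differences such as a valence bias are preserved."""
    return list(confidence)


def condition_specific(confidence, contexts):
    """Condition-specific signal: ratings z-scored within each context
    (e.g. ("gain", "partial")), which removes between-context differences
    in mean and spread."""
    out = [0.0] * len(confidence)
    for ctx in set(contexts):
        idx = [i for i, c in enumerate(contexts) if c == ctx]
        vals = [confidence[i] for i in idx]
        mu, sd = mean(vals), pstdev(vals)
        for i in idx:
            # Constant ratings within a context carry no relative information.
            out[i] = (confidence[i] - mu) / sd if sd > 0 else 0.0
    return out
```

Because z-scoring removes each context's mean, a pure valence bias (uniformly higher ratings in gain contexts) survives in the task-wide signal but vanishes from the condition-specific one, which is what allows the two signals to be told apart.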
Literature Review
Numerous studies have investigated the neural correlates of confidence and metacognition, consistently implicating prefrontal networks. The dorsal anterior cingulate cortex (dACC) is often identified as a key region for performance monitoring and error detection and, more generally, as part of a network whose activity correlates negatively with confidence judgments. Conversely, activity in the ventromedial prefrontal cortex (VMPFC) and pregenual anterior cingulate cortex (pgACC) correlates positively with confidence and self-performance evaluation across various tasks. However, empirical evidence directly comparing the roles of the positive and negative networks in confidence computation is limited. One hypothesis is that these networks represent different stages of confidence processing, or distinct quantities such as uncertainty and subjective confidence. Uncertainty refers to the probability distributions over the variables underlying choices and confidence, whereas confidence is the probability that a decision is correct given the evidence. Because these two quantities usually covary and have been associated with similar brain regions, previous studies may have confounded them, even though they are theoretically distinct. The current study builds on previous work demonstrating a valence-induced confidence bias: participants are more confident in choices leading to gains than in choices leading to losses, even when choice difficulty and accuracy are similar across contexts. This bias provides a tool for dissociating the neural representations of different confidence signals.
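The difference between uncertainty and confidence can be illustrated with a toy model. The sketch below assumes (this is not the study's model) that the learner holds independent Gaussian beliefs about the values of the chosen and unchosen options. Uncertainty is then the spread of the belief about their difference, while confidence is the probability that the chosen option is actually the better one.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def uncertainty(sd_c, sd_u):
    """Spread of the belief about the value difference (chosen - unchosen);
    it does not depend on which option looks better."""
    return math.sqrt(sd_c ** 2 + sd_u ** 2)


def confidence(mu_c, sd_c, mu_u, sd_u):
    """P(value of chosen option > value of unchosen option), assuming
    independent Gaussian beliefs N(mu_c, sd_c^2) and N(mu_u, sd_u^2)."""
    return normal_cdf((mu_c - mu_u) / uncertainty(sd_c, sd_u))
```

With the same belief spreads, `confidence(0.5, 0.3, 0.5, 0.4)` returns 0.5 while `confidence(1.0, 0.3, 0.5, 0.4)` returns about 0.84, yet `uncertainty(0.3, 0.4)` is 0.5 in both cases: a given level of uncertainty is compatible with very different levels of confidence.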
Methodology
Forty participants completed a probabilistic instrumental learning task inside an fMRI scanner. The task involved repeatedly choosing between pairs of abstract symbols probabilistically associated with monetary gains or losses. Two factors were manipulated: outcome valence (gain vs. loss contexts) and information (partial vs. complete feedback). Participants made choices and rated their confidence on a probabilistic scale (50-100%). Confidence judgments were incentivized using a matching-probabilities mechanism. Decision and response processes were decoupled by introducing a delay between symbol presentation and the response cue, which minimized the correlation between response times and confidence judgments.
fMRI data were acquired using a 3.0-Tesla Philips Achieva scanner. Preprocessing involved realignment, unwarping, co-registration, segmentation, normalization, and smoothing. Five general linear models (GLMs) were used for the fMRI analysis, with GLM2 in two variants. GLM1 modeled cue presentation and outcome separately for each of the four contexts (gain/partial, loss/partial, gain/complete, loss/complete), with trial-by-trial confidence as a parametric modulator. GLM2WID and GLM2SPE concatenated all contexts, using native confidence ratings and condition-specific (within-context z-scored) confidence ratings, respectively. GLM3-5 used latent variables from a computational reinforcement learning model to investigate the neural representation of value-related variables.
The computational models included several variants of Q-learning, incorporating features such as context-dependent learning and asymmetric updating, and were compared using Bayesian Model Selection (BMS). Confidence models were also constructed and compared to capture individual confidence judgments, incorporating terms for difficulty, valence bias, and autocorrelation. Regions of interest (ROIs) were defined based on GLM1 results and independent meta-analyses. Additional whole-brain and conjunction analyses were performed.
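The learning and confidence models described in the methods can be sketched as follows. This is a minimal illustration, assuming a two-option Q-learner with separate learning rates for positive and negative prediction errors (asymmetric updating), an extra update of the unchosen option when feedback is complete, and a softmax choice rule. The context-dependent (relative) learning component of the study's models is omitted, and the linear confidence model, with an ease term |Qc - Qu|, a gain-context bonus, and the previous rating as an autocorrelation term, is a hypothetical form rather than the fitted model. All names and parameter values are illustrative.

```python
import math


def softmax_choice_prob(q_c, q_u, beta):
    """Probability of picking option c over option u under a softmax rule
    with inverse temperature beta (two-option case, i.e. a logistic)."""
    return 1.0 / (1.0 + math.exp(-beta * (q_c - q_u)))


def q_update(q, chosen, r_chosen, alpha_pos, alpha_neg, r_unchosen=None):
    """Return updated values for a two-option list q after one trial.

    Positive prediction errors are scaled by alpha_pos and negative ones by
    alpha_neg (asymmetric updating). If r_unchosen is given (complete
    feedback), the unchosen option is updated as well.
    """
    q = list(q)
    unchosen = 1 - chosen
    delta_c = r_chosen - q[chosen]
    q[chosen] += (alpha_pos if delta_c > 0 else alpha_neg) * delta_c
    if r_unchosen is not None:
        delta_u = r_unchosen - q[unchosen]
        # Assumption of this sketch: the unchosen option uses the same asymmetric rule.
        q[unchosen] += (alpha_pos if delta_u > 0 else alpha_neg) * delta_u
    return q


def predicted_confidence(q_c, q_u, is_gain, prev_conf,
                         b0, b_ease, b_gain, b_auto):
    """Hypothetical linear confidence model on the 50-100 rating scale."""
    raw = (b0
           + b_ease * abs(q_c - q_u)   # ease term: larger value gap = easier choice
           + b_gain * float(is_gain)   # valence bias: offset in gain contexts
           + b_auto * prev_conf)       # autocorrelation with the previous rating
    return min(100.0, max(50.0, raw))  # clip to the rating scale
```

Fitting such a model would typically mean maximizing, for each participant, the likelihood of the observed choices and ratings over the learning rates, `beta`, and the confidence weights; the resulting per-participant model evidences are what a BMS step compares across model variants.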
Bayesian Model Selection (MACS toolbox in SPM) was used for quantitative model comparison of fMRI data.
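At the group level, BMS treats the model that generated each participant's data as a random variable. The summary does not state which BMS variant was used; the sketch below assumes the widely used variational random-effects scheme of Stephan et al. (2009). It is a standalone illustration, not the MACS toolbox's code, and it assumes that a participants-by-models matrix of log model evidences is already available. Exceedance probabilities are estimated by Monte Carlo sampling from the resulting Dirichlet posterior.

```python
import math
import random


def digamma(x):
    """Digamma function for x > 0: upward recurrence, then asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += (math.log(x) - 0.5 / x
               - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252)))
    return result


def random_effects_bms(log_evidence, max_iter=1000, tol=1e-10):
    """Variational random-effects BMS.

    log_evidence[n][k] is the log model evidence of model k for participant n.
    Returns the Dirichlet posterior parameters alpha and the expected model
    frequencies alpha / sum(alpha).
    """
    n_models = len(log_evidence[0])
    alpha0 = [1.0] * n_models          # flat Dirichlet prior over model frequencies
    alpha = list(alpha0)
    for _ in range(max_iter):
        dg_total = digamma(sum(alpha))
        beta = [0.0] * n_models
        for row in log_evidence:
            log_u = [row[k] + digamma(alpha[k]) - dg_total for k in range(n_models)]
            top = max(log_u)           # subtract the max for numerical stability
            u = [math.exp(v - top) for v in log_u]
            norm = sum(u)
            for k in range(n_models):
                beta[k] += u[k] / norm  # posterior probability that participant used model k
        new_alpha = [a0 + b for a0, b in zip(alpha0, beta)]
        converged = max(abs(a - b) for a, b in zip(new_alpha, alpha)) < tol
        alpha = new_alpha
        if converged:
            break
    total = sum(alpha)
    return alpha, [a / total for a in alpha]


def exceedance_probabilities(alpha, n_samples=20000, seed=0):
    """Monte Carlo estimate of P(model k is the most frequent) under Dir(alpha)."""
    rng = random.Random(seed)
    wins = [0] * len(alpha)
    for _ in range(n_samples):
        # Normalized gamma draws are Dirichlet; normalizing does not change the argmax.
        draws = [rng.gammavariate(a, 1.0) for a in alpha]
        wins[draws.index(max(draws))] += 1
    return [w / n_samples for w in wins]
```

For example, with ten participants whose log evidence favors model 1 by 10 nats each (`[[0.0, -10.0]] * 10`), the expected frequency of model 1 comes out close to 11/12 and its exceedance probability close to 1.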
Key Findings
Behavioral analyses replicated the valence-induced confidence bias: confidence was higher in gain than in loss contexts, especially with partial feedback. fMRI analyses revealed two main networks correlating with confidence: a positive network (VMPFC, pgACC, precentral gyrus, middle temporal gyrus) and a negative network (DLPFC, DMPFC, dACC, bilateral insula). Cue-evoked VMPFC activity showed a valence effect that mirrored the behavioral confidence bias. This suggests that the VMPFC encodes task-wide confidence, including the valence-induced bias, while the negative network encodes condition-specific confidence. Model-based fMRI analyses using a computational model (RELASYM) showed that VMPFC activity correlated positively with the value of the chosen option (Qc); importantly, when both variables were included in the same model, VMPFC activity was better explained by confidence than by Qc. In contrast, the negative network correlated with both Qc and the value of the unchosen option (Qu), suggesting a role in option comparison. Together, these results point to a functional dissociation: the VMPFC represents task-wide confidence and integrates affective information, whereas the negative network tracks condition-specific confidence and is potentially involved in option comparison. Further analyses confirmed that confidence signals dominated value signals in the VMPFC, even when ROIs were selected to favor value encoding.
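The behavioral valence bias and the VMPFC valence effect are both statements about a 2x2 pattern of means. The hypothetical helpers below average trial-wise values within each context and then compute the valence and information main effects and their interaction; the values can be confidence ratings or cue-evoked ROI responses (for instance, per-context estimates from a GLM1-style model). The numbers in the usage example are made up for illustration and are not the study's data.

```python
from collections import defaultdict
from statistics import mean


def context_means(values, contexts):
    """Average trial-wise values (ratings or ROI responses) within each context."""
    groups = defaultdict(list)
    for value, ctx in zip(values, contexts):
        groups[ctx].append(value)
    return {ctx: mean(vals) for ctx, vals in groups.items()}


def context_effects(means):
    """Main effects and interaction for the 2x2 valence-by-information design.

    means maps ('gain' | 'loss', 'partial' | 'complete') to a mean value.
    """
    gp, lp = means[("gain", "partial")], means[("loss", "partial")]
    gc, lc = means[("gain", "complete")], means[("loss", "complete")]
    return {
        "valence": ((gp + gc) - (lp + lc)) / 2,      # gain minus loss
        "information": ((gc + lc) - (gp + lp)) / 2,  # complete minus partial
        # Positive when the gain-loss gap is larger with partial feedback.
        "interaction": (gp - lp) - (gc - lc),
    }
```

With per-context means of 80 (gain/partial), 70 (loss/partial), 78 (gain/complete), and 74 (loss/complete), `context_effects` returns a valence effect of 7, an information effect of 1, and an interaction of 6; a positive interaction corresponds to a valence bias that is larger with partial feedback.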
Discussion
The study's findings challenge the prevailing view that the VMPFC primarily encodes option values in reinforcement learning. Instead, they suggest that the VMPFC integrates both value and confidence information, in particular a task-wide confidence signal, and plays a critical role in the valence-induced confidence bias. The functional dissociation between the VMPFC and the negative network highlights the importance of both global and local confidence signals for adaptive behavior: task-wide confidence allows different choice situations to be compared, while condition-specific confidence reflects context-dependent learning. The lack of a brain region specifically sensitive to the information manipulation might be due to a low effect size, implicit uncertainty encoding, or participants inferring unobserved outcomes. Model-based analyses strengthened the argument that the VMPFC encodes confidence signals, not just value signals. Discrepancies between these findings and the existing literature on the VMPFC's role in value representation may reflect an influence of confidence elicitation on valuation processes, or the possibility that the VMPFC jointly represents value and confidence.
Conclusion
This study demonstrates a functional dissociation in the prefrontal cortex during reinforcement learning: the VMPFC encodes task-wide confidence that integrates the valence bias, while the dorsal prefrontal network encodes condition-specific confidence. The findings challenge existing views of the VMPFC's role, suggesting that it encodes both value and confidence. Future research should investigate how different types of confidence signals interact and affect behavior, with particular attention to the clinical relevance of confidence dysfunctions.
Limitations
The study's design, while effective in dissociating different confidence signals, did not directly measure objective uncertainty. The computational model, while providing valuable insights, is a simplification of the complex cognitive processes involved. The relatively small sample size limits the generalizability of the findings. The artificial nature of the task may limit the ecological validity of the results. Finally, the delay introduced to decouple decision and response processes might have inadvertently altered normal neural activity.