Distinct basal ganglia contributions to learning from implicit and explicit value signals in perceptual decision-making

T. Balsdon, M. A. Pisauro, et al.

This study by Tarryn Balsdon, M. Andrea Pisauro, and Marios G. Philiastides examines how people learn from implicit versus explicit feedback in perceptual decision-making. Using simultaneous EEG-fMRI, it identifies distinct neural signatures for the two kinds of feedback.

Introduction

The study investigates whether and how internal confidence estimates (implicit feedback) contribute to learning in perceptual decision-making, especially when explicit feedback is intermittently available. While explicit feedback is known to enhance learning rates across decision and perceptual processes, performance can also improve without explicit feedback, suggesting a role for confidence as a proxy for outcome information. Reinforcement learning (RL) frameworks have been extended to perceptual learning and can, in principle, incorporate both explicit and implicit (confidence-based) value signals. However, it remains unclear whether learning from confidence recruits the same or distinct neural mechanisms as learning from explicit feedback, and how these signals might be integrated. The authors aim to directly compare implicit and explicit feedback within the same task using simultaneous EEG-fMRI, testing whether confidence is used for learning even when explicit feedback is frequent, and mapping their neural signatures and integration within basal ganglia circuitry.

Literature Review

Prior work shows explicit feedback increases learning and performance in perceptual tasks, but learning can also occur without explicit feedback. Confidence provides a probabilistic estimate of decision correctness and may act as an internal reinforcement signal. RL concepts (expected value and prediction error) have been applied to perceptual learning and metacognition, with prior studies implicating dopaminergic and striatal systems in prediction errors, action values, reward anticipation, and confidence-related signals. Cortico-striatal and cortico-basal ganglia-thalamocortical loops support flexible learning via dopaminergic modulation. The functional roles of dorsal versus ventral striatum in learning and value coding are debated, with evidence for distinct contributions based on learning type, phase, and cortical inputs. Previous studies have also suggested integration of confidence with valuation in vmPFC; whether implicit and explicit signals are segregated or integrated within the basal ganglia remained an open question. This work addresses the gap by directly contrasting neural mechanisms for implicit (confidence-based) and explicit outcome value signals on interleaved trials within one task.

Methodology

Participants: 30 recruited; 7 excluded (data issues, low accuracy, or too few bets), yielding N=23 (right-handed; normal/corrected vision). Ethics approved; participants compensated.

Task: Simultaneous EEG-fMRI during a random dot kinematogram (RDK) left/right motion direction discrimination task. Each trial comprised three windows separated by jitter (1–2 s): decision (350 ms RDK; up to 1 s response), bet (up to 1 s to bet/not bet; betting doubled gains/losses), and feedback (a cue indicated explicit feedback vs no feedback). On explicit feedback trials, true points (+/−1; or +/−2 if bet) were shown; on no-feedback trials, a question mark prompted participants to infer points using confidence. Explicit-feedback and no-feedback trials were interleaved 50/50 across six blocks of 50 trials (300 total). Difficulty was adjusted per block to keep accuracy at roughly 55–75%, decreasing motion coherence as performance improved (2-down-1-up rule for initial calibration, followed by blockwise adjustments).

EEG acquisition/preprocessing: 64-channel MR-compatible system at 5000 Hz (later downsampled to 1000 Hz), gradient artifact removal by drifting template, median filter, bandpass 0.5–40 Hz, blink removal via PCA from calibration, conservative BCG removal via PCA on low-pass 4 Hz data, average reference; no baseline correction.

fMRI acquisition/preprocessing: 3T Siemens Trio, EPI (TR=2 s; TE=30 ms; 32 slices; 3×3×3 mm; 8 mm smoothing), field-map unwarping, motion correction, slice timing correction, high-pass filtering, registration to MNI space.

Behavioural analyses: Accuracy (proportion correct), sensitivity (d'), RTs (from stimulus offset), and response perseveration, computed as the normalized probability of repeating responses contingent on stimulus repetition/alternation and feedback condition; statistics via repeated-measures ANOVAs and t-tests.

Computational modelling: Three SDT-based learning models with reinforcement updates to perceptual means (μ) representing sensitivity. The learning update was μ_{t+1} = μ_t + α L_t, where α is the learning rate and L_t is the learning signal on trial t. Models differed in their use of confidence: (1) learning only on explicit feedback trials (prediction error r_t − EV_t), (2) the same, but also using confidence to learn on no-feedback trials, (3) confidence used on all trials (moderating EV_t on feedback trials and substituting for the learning signal on no-feedback trials). Parameters: initial μ_0, betting criterion b, α. Fitting minimized the negative log-likelihood of choices and bet responses; model comparison via BIC and protected exceedance probability.

EEG decoding: Linear Discriminant Analysis (LDA) with sliding 60 ms windows trained to discriminate bet vs no-bet in the decision window (yielding “bet-prediction”) and positive vs negative explicit feedback in the feedback window (yielding “feedback-prediction”). For each participant, robust spatial filters were selected based on maximal Az within group-level significant windows and averaged over adjacent time points. Filters were applied across time within and across windows to test re-emergence of signals.

EEG-informed fMRI GLM: Eight regressors including stimulus, bet-cue (bet/no-bet), feedback cue, and four parametric regressors modulated by single-trial EEG predictions: (1) decision-window bet-prediction (post-decision confidence), (2) decision-window feedback-prediction (expected outcome value), (3) feedback-window feedback-prediction (explicit outcome value; explicit trials only), (4) feedback-window bet-prediction (implicit outcome value; no-feedback trials only). Events were convolved with a double-gamma HRF; motion and other confounds were included; cluster-corrected thresholding (z≥2.57; minimum cluster size from permutation-derived 95th percentile).

Connectivity analyses: ROI time series were extracted from dorsal striatum (implicit value), ventral striatum (explicit value), and external globus pallidus (GPe; voxels jointly significant for confidence and explicit value). Single-trial Pearson correlations (Fisher z) between GPe and striatal ROIs over 10 s post-feedback were examined by condition. Cross-window correlations and subject-wise Granger causality assessed the influence of earlier confidence-related BOLD (e.g., IFG) on later GPe BOLD. Psychophysiological interaction (PPI) used the GPe time course (feedback to next trial) as the physiological regressor and a psychological regressor encoding response perseveration contingent on feedback/bet; the interaction identified regions where coupling predicted learning-related perseveration; analyses were repeated separately for explicit and no-feedback trials.
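The learning update from the computational modelling description, μ_{t+1} = μ_t + α L_t with an explicit-feedback prediction error r_t − EV_t, can be illustrated with a small Python sketch. This is not the authors' implementation. The Gaussian form of the confidence estimate, the way expected value is derived from confidence, the use of that expected value as the learning signal on no-feedback trials, and all function names are assumptions made for illustration. The single `use_confidence` flag only contrasts learning with and without confidence on no-feedback trials; it does not reproduce the exact distinction between the three models.

```python
import math

def confidence(mu, x):
    # ASSUMPTION: evidence x is Gaussian with unit variance and mean +mu or -mu,
    # so the probability that the choice sign(x) is correct is logistic in 2*mu*|x|.
    return 1.0 / (1.0 + math.exp(-2.0 * mu * abs(x)))

def expected_value(conf, bet):
    # Points are +/-1, doubled when betting (payoffs as in the task description).
    stake = 2.0 if bet else 1.0
    return stake * (2.0 * conf - 1.0)

def learning_signal(reward, conf, bet, feedback_shown, use_confidence):
    ev = expected_value(conf, bet)
    if feedback_shown:
        return reward - ev  # prediction error r_t - EV_t
    # ASSUMPTION: without explicit feedback, the confidence-implied value
    # stands in for the outcome; without confidence there is no learning.
    return ev if use_confidence else 0.0

def update_mu(mu, alpha, signal):
    # mu_{t+1} = mu_t + alpha * L_t
    return mu + alpha * signal

def run_trials(trials, mu0, alpha, b, use_confidence):
    """trials: iterable of (evidence x, choice correct?, feedback shown?)."""
    mu = mu0
    history = []
    for x, correct, feedback_shown in trials:
        conf = confidence(mu, x)
        bet = conf > b  # bet when confidence exceeds the betting criterion
        stake = 2.0 if bet else 1.0
        reward = stake if correct else -stake
        signal = learning_signal(reward, conf, bet, feedback_shown, use_confidence)
        mu = update_mu(mu, alpha, signal)
        history.append(mu)
    return history

# Hypothetical trials: (evidence, correct, feedback_shown)
demo = [(0.8, True, True), (0.3, False, True), (1.2, True, False)]
with_conf = run_trials(demo, mu0=1.0, alpha=0.1, b=0.7, use_confidence=True)
without_conf = run_trials(demo, mu0=1.0, alpha=0.1, b=0.7, use_confidence=False)
```

In this sketch the two trajectories are identical on the first two (explicit feedback) trials and diverge only on the third, no-feedback trial, where the confidence-based variant keeps updating μ.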

Key Findings

Behavioural:

Task difficulty increased across blocks (coherence change: −10.43% ± 4.67; t(22)=4.64; p<0.001), with no substantial change in sensitivity (Δd′ = −0.37 ± 0.45; t(22)=1.67; p=0.109; BF10=0.75), consistent with perceptual learning despite decreasing coherence.
Bets indexed confidence: higher accuracy on bet vs no-bet trials (Δd′=0.98 ± 0.28; t(22)=7.35; p<0.001) and faster RTs on correct trials when betting (ΔRT=0.07 s ± 0.02; t(22)=6.63; p<0.001).
Explicit feedback reinforced behaviour: robust main effect of feedback sign on perseveration (F(1,22)=60.04; p<0.001; Bonferroni-corrected). The interaction with feedback magnitude did not survive multiple-comparisons correction (F(1,22)=5.18; p=0.033, uncorrected). Simulations without learning were unlikely to generate the observed effects (p<0.001 for the sign effect; p=0.006 for the interaction).
Confidence also drove perseveration on no-feedback trials (bet vs no-bet), indicating learning from implicit feedback.
Model comparison favoured the model using confidence on all trials (Model 3): BIC3−BIC1=82.21; BIC3−BIC2=32.11; protected exceedance probability=0.94. Simulations learning from explicit feedback alone failed to reproduce bet/no-bet perseveration differences (p=0.046).

EEG decoding:

Decision-window bet-prediction discriminated bet vs no-bet from response to +0.25 s (mean F(1,22)=15.36; cluster-corrected p<0.001) and tracked accuracy (0.10–0.24 s; mean F(1,22)=6.66; p=0.002). It predicted correctness (β=0.22 ± 0.16; t(22)=6.62; p<0.001) and RT on correct trials (β=0.15 ± 0.06; t(22)=5.31; p<0.001), and bet decisions even on correct trials (β=1.53 ± 0.42; t(22)=17.34; p<0.001), supporting a graded confidence signal.
Bet-prediction re-emerged in the feedback-window on no-feedback trials, dissociating bet from no-bet (0.35–0.45 s; t(22)=2.58; p=0.003).
Feedback-prediction in feedback-window showed a main effect of valence (0.35–0.53 s; mean F(1,22)=9.62; p<0.001) and an interaction with magnitude (0.43–0.52 s; F(1,22)=6.81; p=0.006), reflecting overall outcome value. It predicted next-trial response perseveration (β=0.08 ± 0.02; t(22)=8.77; p<0.001). It also dissociated bet vs no-bet on no-feedback trials (0.17–0.26 s; mean F(1,22)=2.84; p=0.002) and was present post-decision (0.17–0.30 s after response; t(22)=2.96; p<0.001), suggesting expected value or early updates.
EEG-informed behavioural modelling: feedback-window EEG predictions explained behaviour comparably to the behaviour-only model (ΔBIC≈1; PXP=0.45) and better than bet-window predictions (BIC difference −141.08; PXP>0.99).

EEG-informed fMRI:

Decision-window confidence correlated with BOLD in bilateral parietal cortex, posterior medial frontal cortex, inferior frontal gyri, left rostrolateral PFC, and external globus pallidus (GPe). Decision-window feedback-prediction (expected value) correlated with parietal operculum/insula.
Feedback-window explicit outcome value correlated with valuation network regions including ventral striatum and frontal lobes. Implicit outcome value (confidence-derived) correlated with left dorsal striatum. A dorsal–ventral striatal gradient emerged: dorsal striatum for implicit value, ventral striatum for explicit value.

Integration and connectivity:

GPe ROI (voxels jointly related to confidence and explicit value) correlated with both dorsal and ventral striatum after feedback: dorsal–GPe mean z=0.498 ± 0.096; ventral–GPe mean z=0.275 ± 0.066; dorsal>ventral (t(22)=8.16; p<0.001). Correlation trends varied with feedback congruency but were not significant by condition (ventral: t(22)=0.48; p=0.63; dorsal: t(22)=1.94; p=0.07).
Cross-window analysis indicated GPe feedback-window BOLD was associated with earlier IFG BOLD (confidence-related). Subject-wise Granger tests supported a lagged IFG→GPe influence in 18/23 participants (median χ2=38.3; median p=2.62e−6; median lag=12 s).
PPI using GPe revealed coupling with thalamus, insula, and rostromedial PFC predicting learning-related response perseveration, similarly for explicit and no-feedback trials.
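
The dorsal–GPe versus ventral–GPe comparison reported above rests on Fisher z-transformed Pearson correlations compared across participants with a paired t-test. The Python sketch below shows those three steps using only the standard library. The input values are invented for illustration and the function names are hypothetical; it does not reproduce the authors' pipeline or data.

```python
import math

def pearson_r(a, b):
    # Pearson correlation between two equal-length sequences.
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def fisher_z(r):
    # Fisher z-transform: makes correlation values closer to normally
    # distributed so they can be averaged and compared with t-tests.
    return math.atanh(r)

def paired_t(x, y):
    # Paired-samples t statistic and its degrees of freedom.
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    return mean_d / math.sqrt(var_d / n), n - 1

# Invented ROI time courses for one trial (NOT real data).
gpe = [0.1, 0.4, 0.3, 0.8, 0.6]
dorsal = [0.2, 0.5, 0.2, 0.9, 0.7]
z_single_trial = fisher_z(pearson_r(gpe, dorsal))

# Invented per-participant mean z values for the two couplings.
dorsal_z = [0.52, 0.47, 0.61, 0.44]
ventral_z = [0.30, 0.21, 0.35, 0.25]
t_stat, df = paired_t(dorsal_z, ventral_z)
```

A positive `t_stat` here would indicate stronger dorsal–GPe than ventral–GPe coupling in the invented sample, which is the direction of the effect reported in the findings.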

Discussion

The findings demonstrate that human observers use internal confidence signals to guide learning even when explicit feedback is frequently available. Behaviourally, confidence (indexed by betting) modulated accuracy, RTs, and response perseveration on no-feedback trials, and model comparison indicated that integrating confidence on all trials best explained behaviour. Neurally, EEG decoding isolated trial-wise signatures of post-decision confidence and explicit feedback, with confidence signals re-emerging at feedback when explicit feedback was absent. EEG-informed fMRI localized implicit (confidence-derived) and explicit outcome value signals to distinct striatal territories along a dorsal–ventral gradient, indicating separable value representations within the same task context. Evidence further suggested that these signals converge in the external globus pallidus (GPe), which correlated with both dorsal and ventral striatal activity and showed influence from earlier cortical confidence signals (IFG). PPI analyses implicated GPe–thalamus/insula/rmPFC coupling in translating integrated feedback into adaptive changes in subsequent decisions (response perseveration), irrespective of feedback source. Together, results support a framework in which confidence-derived implicit value and explicit outcome value are encoded separately within the striatum and integrated subcortically to update cortical decision processes via basal ganglia–thalamocortical loops.

Conclusion

This work provides convergent behavioural, EEG, and fMRI evidence that confidence serves as an implicit reinforcement signal supporting learning alongside explicit feedback. Implicit and explicit value signals are segregated along a dorsal–ventral striatal gradient and appear to be integrated in the external globus pallidus, which likely broadcasts reinforcement updates to cortical decision circuitry via thalamus and insula. The study highlights the importance of metacognitive confidence in shaping learning even when explicit feedback is available and delineates distinct yet interacting basal ganglia mechanisms for implicit versus explicit value coding. Future research should more precisely characterise the computations underlying GPe-mediated integration, refine computational models to capture nuanced feedback–magnitude interactions, and further test the generality of the dorsal–ventral gradient across tasks and modalities.

Limitations

The computational models were intentionally simple and did not capture all behavioural nuances (e.g., the full interaction of feedback magnitude and sign), limiting strong inferences about algorithmic implementation. The Granger-causal influence from IFG to GPe, while significant in most participants, was a relatively small effect and should be interpreted cautiously. EEG-informed fMRI infers deep structure involvement via covariance with EEG-derived regressors and shares limitations of fMRI temporal resolution and correlational connectivity (PPI). Sample size, while typical for EEG-fMRI (N=23 after exclusions), may limit detection of smaller effects and finer anatomical distinctions.
