Response outcomes gate the impact of expectations on perceptual decisions

Psychology

A. Hermoso-Mendizabal, A. Hyafil, et al.

This study examines how expectations influence perceptual decisions, showing how rats adapt to varying stimulus-sequence statistics. Conducted by Ainhoa Hermoso-Mendizabal and colleagues, the research identifies a transition bias in decision-making that is tightly linked to reward feedback and the outcomes of previous trials.
Introduction

The study asks whether and how expectations derived from recent trial history are flexibly combined with current sensory evidence in perceptual decisions, and whether this combination is modulated by the outcome of the preceding trial. In dynamic environments, priors must be updated continuously, giving rise to history-dependent biases. The authors hypothesize that animals form expectations about upcoming stimuli (e.g., tendency to repeat vs. alternate categories) and that the influence of these expectations on choice can be gated depending on recent outcomes (correct vs. error). They examine this in rats performing rapid auditory discriminations with blockwise serial correlations, aiming to jointly characterize expectation buildup and its moment-to-moment use.

Literature Review

Normative theories prescribe optimal integration of priors and sensory evidence and how beliefs should be updated in volatile environments, producing sequential effects in choices. Prior work shows flexible modulation of the impact of expectations, including exploration–exploitation switching and change-point-driven updates. Confidence has been linked to weaker expectation biases after low-confidence responses in the absence of feedback. Sequential effects include lateral (win-stay/lose-switch) and higher-order transition-based biases affecting reaction times, choices, and neural signals in humans and rodents. However, a unified framework capturing both prior formation and its outcome-dependent deployment on a trial-by-trial basis has been lacking.

Methodology

Subjects were male Long-Evans rats (n=25) performing a reaction-time two-alternative forced-choice (2AFC) auditory discrimination task. Stimulus category sequences were generated by a two-state Markov chain whose probability of repeating the previous category changed blockwise: Repetitive blocks had P_rep=0.7 and Alternating blocks P_rep=0.2 (block length ~200 trials). On each self-initiated trial, rats sampled a sound composed of two amplitude-modulated components (frequency discrimination task: 6.5 kHz vs 31 kHz tones; level discrimination task: broadband noise from left vs right speakers). Stimulus strength s∈{0, 0.23, 0.48, 1} set the magnitude of the evidence; s=0 provided no net sensory evidence, although the rewarded side was still determined by the Markov-chain category. Correct responses were rewarded with water; errors were followed by a light and a timeout (typically 5 s, with 1–5 s used in a subset).

Group 1 (n=10) performed the frequency task with correlated sequences (mean ~508 trials/session; ~56k trials/animal). Group 2 (n=6) performed a level-discrimination variant with the same sequence correlations. Group 3 (n=9) was trained first with uncorrelated sequences and later with correlated sequences.

Analyses: psychometric functions of (1) rightward choice vs stimulus evidence and (2) repeating choice vs repeating stimulus evidence quantified the fixed side bias B and the repeating bias b, respectively. Reaction times were analyzed as a function of expectation congruency. Sequential dependencies were quantified with a generalized linear model (GLM) combining the current stimulus frames with up to 10 trials of history; the GLM separated lateral biases arising from rewarded (r+) and unrewarded (r−) responses from transition biases arising from transitions classified by their outcomes (++, +−, −+, −−). Models were fit separately for each rat. To test whether post-error effects reflect a reset or a gating of the transition bias, the authors computed a transfer coefficient quantifying how well the current transition bias predicts future transition bias under different outcome sequences.
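The stimulus-generation and history-regressor steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: categories and responses are coded −1 (left) / +1 (right), blocks simply alternate Repetitive/Alternating, only one lag of history is built (the described GLM uses up to 10), and the function names `markov_categories` and `history_regressors` are hypothetical. Multiplying the ++ transition by the previous response, so that a repeat/alternate expectation maps onto a left/right choice, is also an assumption, chosen to mirror the model's y^T = c_t · z^T · r_{t−1}.

```python
import random


def markov_categories(n_blocks, block_len=200, p_rep=(0.7, 0.2), seed=0):
    """Stimulus categories (-1 = left, +1 = right) from a two-state Markov chain.

    Blocks cycle through the repetition probabilities in p_rep
    (Repetitive P_rep = 0.7, then Alternating P_rep = 0.2, ...).
    Returns the category sequence and the block-type index of each trial.
    """
    rng = random.Random(seed)
    cats, block_type = [], []
    prev = rng.choice((-1, 1))
    for b in range(n_blocks):
        k = b % len(p_rep)
        for _ in range(block_len):
            # Repeat the previous category with probability P_rep, else switch.
            cur = prev if rng.random() < p_rep[k] else -prev
            cats.append(cur)
            block_type.append(k)
            prev = cur
    return cats, block_type


def history_regressors(responses, rewarded):
    """One-lag history regressors for each trial t (hypothetical simplification).

    r_plus / r_minus: previous response (-1/+1) if it was rewarded / unrewarded.
    t_pp: transition between t-2 and t-1 (+1 repeat, -1 alternate), kept only if
          both responses were rewarded (++), multiplied by r_{t-1} so that it
          acts on the left/right choice.
    """
    rows = []
    for t in range(len(responses)):
        r_plus = r_minus = t_pp = 0.0
        if t >= 1:
            r_prev = float(responses[t - 1])
            if rewarded[t - 1]:
                r_plus = r_prev
            else:
                r_minus = r_prev
        if t >= 2 and rewarded[t - 2] and rewarded[t - 1]:
            transition = 1.0 if responses[t - 1] == responses[t - 2] else -1.0
            t_pp = transition * responses[t - 1]
        rows.append((r_plus, r_minus, t_pp))
    return rows
```

Stacking these rows with the current stimulus evidence and fitting a logistic regression per rat would give a one-lag analogue of the GLM; more lags follow by repeating the construction for t−k, and the +−, −+ and −− transitions by changing the outcome condition.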
A compact generative dynamical model was developed with three latent components: (i) transition evidence z^T, which accumulates repetitions vs alternations with a leak that was allowed to depend on the outcome; (ii) lateral evidence z^L, which produces a lateral bias; and (iii) a modulatory gating variable c_t, which scales the product of z^T and the previous response r_{t−1} to yield the transition bias y^T = c_t · z^T · r_{t−1}. Parameters were fit to each rat's choices, and model simulations were compared with empirical biases across trial sequences. Model comparisons tested the necessity of each module (transition, lateral, gating) and compared the full model against the GLM.
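A toy version of the gated transition module described above can make the build-up, reset and rebound concrete. It assumes specific update rules that the summary does not spell out: z^T decays by a fixed leak on every trial (the same after correct and error trials, as the fits reportedly found) and is incremented only by transitions between two rewarded responses, while c_t is set to 0 after an error and relaxes toward 1 after each correct response. The function name and the parameter values (`leak`, `gain`, `recovery`) are hypothetical, and the lateral module and stimulus term are omitted.

```python
def gated_transition_bias(responses, rewarded, leak=0.8, gain=1.0, recovery=0.7):
    """Toy gated transition module (hypothetical update rules and parameters).

    responses: -1 (left) / +1 (right); rewarded: booleans, one per trial.
    Returns, for each trial t, (z_T, c, y_T) as they stand *before* the choice
    on trial t, with y_T = c * z_T * r_{t-1}.
    """
    z, c = 0.0, 1.0
    out = []
    for t in range(len(responses)):
        r_prev = responses[t - 1] if t >= 1 else 0
        out.append((z, c, c * z * r_prev))   # transition bias on trial t
        # --- update after observing trial t's response and outcome ---
        z *= leak                            # same leak after correct and error trials
        if t >= 1 and rewarded[t - 1] and rewarded[t]:
            # only ++ transitions update z: +1 repetition, -1 alternation
            z += gain * (1.0 if responses[t] == responses[t - 1] else -1.0)
        if rewarded[t]:
            c += recovery * (1.0 - c)        # gate reopens after a correct response
        else:
            c = 0.0                          # an error closes the gate; z is kept
    return out
```

For example, `gated_transition_bias([1] * 7, [True, True, True, True, False, True, True])` (0-based trial indices) builds the bias up over trials 2 to 4, gives a bias of exactly 0 on trial 5, just after the error on trial 4, while z^T stays positive, and rebounds on trial 6 once the correct response on trial 5 reopens the gate.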

Key Findings
  • Rats exploited serial correlations by developing a repeating bias b aligned with the block tendency after correct trials, while the fixed side bias B remained block-independent. After error trials, the repeating bias b nearly vanished in both block types, indicating that block-specific expectations barely influenced choices after errors.
  • Reaction times mirrored choice biases: after correct trials, expected trials (block-congruent repeating evidence) had shorter RTs than unexpected ones (ANOVA block×repeating category F(1,126)=134.59, p<1e−6; mean normalized RT difference ≈0.10). After errors, RTs were not modulated by expectation (three-way ANOVA including previous outcome: F(1,264)=26.77, p<1e−6; after-error block×repeating category F(1,126)=0.02, p=0.88).
  • Repeating bias b built up with the length of preceding correct sequences (n≈5–10) but was reset to near-zero after a single error, independent of stimulus strength and ITI; time-out duration (1–5 s vs 5 s) did not account for the reset.
  • Overall, the prior improved accuracy: mean accuracy was higher following a correct trial than following an error (0.76 vs 0.72; p<1e−4), with the benefit largest at low stimulus strengths and present only when the prior was block-congruent.
  • GLM revealed two distinct history components: (i) a lateral bias exhibiting win-stay/lose-switch (positive weights for prior rewarded side r+, negative for prior unrewarded r−), and (ii) a transition bias driven specifically by transitions between two rewarded responses (T++), promoting repetition after ++ repetitions and alternation after ++ alternations. Transitions involving any error (+−, −+, −−) had negligible influence.
  • Critically, after an error the transition kernel weights vanished, indicating a reset of the transition bias, whereas lateral kernels showed only moderate error effects. The transition influence decayed over ~5 trials and was robust across animals and ITIs. Results replicated in the level-discrimination task (Group 2) and were present, though smaller, even with uncorrelated sequences (Group 3), increasing when correlations were later introduced.
  • Transfer-coefficient analysis supported gating rather than a complete reset: when trial t was an error, the transition bias at t did not predict the bias at t+1 (reset), but it strongly predicted the bias at t+2 if t+1 was correct (rebound), indicating that accumulated transition evidence persisted but was temporarily gated off by the error.
  • Generative model fits showed: strong updates of z^T by ++ transitions only; outcome-driven gating variable c_t that collapsed after errors and rapidly recovered after correct responses; similar leak of z^T after correct and error trials (no differential decay), consistent with maintenance of accumulated evidence. The model outperformed reduced variants and the GLM in explaining choices and reproduced build-up, reset, and rebound of b across sequences (Pearson r≈0.96 across 2–6 trial sequences).
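The reset-versus-gating logic behind the transfer-coefficient analysis can be illustrated with a toy simulation. Purely for illustration, the transfer coefficient is taken here to be the least-squares slope of a later transition bias on the current one; the paper's exact estimator is not given in this summary and may differ. Accumulated evidence before trial t is drawn at random, trial t is an error and trial t+1 is correct, and the names and values `transfer`, `slope`, `leak` and `recovery` are hypothetical.

```python
import random


def slope(xs, ys):
    """Least-squares slope of ys on xs (our stand-in for a transfer coefficient)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def transfer(mode, n=2000, leak=0.8, recovery=0.7, seed=1):
    """Slopes (t -> t+1, t -> t+2) when trial t is an error and t+1 is correct.

    mode="gate": the error silences c but z^T is kept (and leaks as usual).
    mode="reset": the error erases z^T.
    The previous response is fixed at +1, so the bias equals c * z.
    """
    rng = random.Random(seed)
    y_t, y_t1, y_t2 = [], [], []
    for _ in range(n):
        z, c = rng.gauss(0.0, 1.0), 1.0  # evidence before trial t; prior trial correct
        y_t.append(c * z)
        # Trial t is an error: the gate closes; z^T either persists or is erased.
        c = 0.0
        z = 0.0 if mode == "reset" else z * leak
        y_t1.append(c * z)
        # Trial t+1 is correct: the gate partly reopens; a -+ transition adds no evidence.
        c += recovery * (1.0 - c)
        z *= leak
        y_t2.append(c * z)
    return slope(y_t, y_t1), slope(y_t, y_t2)
```

Both variants give a zero t→t+1 coefficient, so in this toy only the t→t+2 coefficient separates a gate (positive, here 0.7 · 0.8² = 0.448) from a true reset (zero), which is the contrast the key findings describe.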
Discussion

The findings demonstrate that rats form expectations about response transitions based on recent rewarded repetitions/alternations and combine these expectations with sensory evidence. Crucially, previous outcomes gate the impact of these expectations: after errors, animals transiently enter an expectation-free mode where the transition prior does not influence choice or RT, yet the internal estimate of transition statistics is maintained and becomes effective again after the next correct response. Dissecting the repeating bias into lateral and transition components explains the asymmetries between repetitive and alternating environments. The outcome-dependent reset-and-rebound dynamics of the transition bias were robust across animals and tasks and present even without induced correlations, suggesting a fundamental mechanism for sequence processing. Compared with an ideal observer, this behavior reflects a rapid switch between exploiting an internal model and a sensory-driven mode, potentially linked to confidence and feedback processing. The dynamical model with a reward-driven gating variable provides a compact computational account and points to plausible neuromodulatory or circuit mechanisms that control when priors influence decisions.

Conclusion

This work shows that expectations about stimulus-response transitions are learned and used by rats to bias perceptual choices, but their impact is flexibly gated by recent outcomes: a single error suppresses the expression of the transition prior without erasing the underlying accumulated evidence, which re-engages after a correct response. A generative dynamical model with a reward-driven gating variable quantitatively captures build-up, reset, and rebound dynamics across tasks and conditions, outperforming alternative models. These results unify expectation formation and its outcome-dependent deployment, highlighting a fundamental, general strategy for balancing prior use and sensory evidence in dynamic environments. Future work should identify neural substrates and modulatory mechanisms implementing the gating (e.g., anterior cingulate and neuromodulatory systems), assess generalization to other species and modalities, and test causal manipulations of feedback, confidence, and arousal on the gating dynamics.

Limitations
  • Neural mechanisms were not measured; the proposed gating variable is computational and lacks direct physiological validation.
  • Experiments were conducted in rats and in specific auditory 2AFC tasks; generalization to other species or task structures, while suggested, remains to be directly tested.
  • The design relies on explicit feedback; the role of confidence independent of reward feedback was not isolated within this paradigm.
  • Block structures and volatility were fixed; the integration window (~3–5 trials) may reflect capacity limits, but alternative task structures could yield different dynamics.
  • While the model captures behavior well, it does not uniquely identify the underlying computations (e.g., alternative formulations might also fit), and the interpretation of c_t (confidence vs mode switching) remains ambiguous.