Representational spaces in orbitofrontal and ventromedial prefrontal cortex: task states, values, and beyond

N. Moneta, S. Grossman, et al.

The orbitofrontal cortex (OFC) and ventromedial prefrontal cortex (vmPFC) encode not only expected value but also task states, integrating stimulus, context, and outcome information into complex, mixed-selectivity representations akin to late layers in deep reinforcement-learning models. This research was conducted by Nir Moneta, Shany Grossman, and Nicolas W. Schuck.

Introduction

This review examines how the orbitofrontal and ventromedial prefrontal cortices contribute to value-based decision-making beyond encoding a unitary, common-currency value signal. The authors situate their inquiry within economic and reinforcement learning (RL) frameworks, noting substantial neuroscientific evidence that vmPFC/OFC encode subjective value across species. They highlight conceptual challenges to a pure common-currency account, including context dependence, goal-modulation, range adaptation, and the influence of internal states. The central hypothesis is that OFC/vmPFC integrate contextual and latent task-state information with outcome expectations, yielding representations that support flexible, context-sensitive valuation and choice. The review further proposes that late layers of deep RL models provide a computational analogy for these mixed-selectivity, outcome-predictive representations.

Literature Review

The paper synthesizes evidence across human fMRI, lesion studies, and non-human electrophysiology indicating that vmPFC/OFC encode subjective value but are strongly modulated by context, goals, and internal states. Classic reports of content-independent value in OFC/vmPFC are contrasted with violations of expected-value maximization and context/range normalization in behavior and neural signals. Studies demonstrate that vmPFC value signals generalize across tasks but are shaped by task rules and goals; internal states (e.g., satiety, fatigue) alter valuation and neural responses. Work decoding latent, partially observable task states from medial OFC/vmPFC supports the view that these regions encode a cognitive map of task space necessary for predicting outcomes. Lesion data indicate deficits in ignoring irrelevant options and in flexible credit assignment following OFC damage. Electrophysiology reveals mixed selectivity in OFC neurons for combinations of variables (e.g., flavor × probability; spatial and reward features), with population-level subspace organization allowing largely orthogonal readout of different variables. Context signals co-exist and interact with value signals in vmPFC/OFC, organizing which values influence behavior. Parallel findings in deep RL models show that late hidden layers develop representations reflecting task-relevant abstractions and mixed selectivity, not solely value, suggesting convergence between biological and artificial systems. The review also surveys evidence for schemas and stimulus–stimulus associations in OFC, hippocampal interactions (e.g., replay) supporting state and transition knowledge, and meta-learning perspectives linking OFC plasticity to fast learning within recurrent dynamics.
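
The range normalization mentioned above can be made concrete with a minimal sketch. It assumes the simplest textbook form, in which a value is rescaled by the minimum and maximum of the values available in the current context. The function name `range_normalize` and the example offers are hypothetical illustrations and are not taken from the review.

```python
def range_normalize(values):
    """Rescale each value to [0, 1] using the range of the current context.

    Assumption: a simple min-max form of range adaptation, used here only
    for illustration.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # Degenerate context with a single value level: return the midpoint.
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# The same offer (value 4.0) yields a different normalized signal
# depending on which other offers define the context.
narrow_context = [2.0, 4.0]
wide_context = [2.0, 4.0, 10.0]

print(range_normalize(narrow_context)[1])  # 1.0
print(range_normalize(wide_context)[1])    # 0.25
```

Under this assumption an identical offer produces a strong signal in a narrow context and a weak one in a wide context, which is the kind of context dependence that a pure common-currency account does not predict.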

Key Findings

- vmPFC/OFC encode not only expected value but also contextual and latent task-state information necessary for predicting outcomes, consistent with a cognitive map of task space.
- In human vmPFC, both expected value and current task context are decodable from the same value-responsive region; decoding strengths are positively related. In one reported association, a stronger context signal accompanies a stronger expected value signal, and a weaker vmPFC context–value coupling correlates with a larger behavioral congruency effect (e.g., R = -0.39, p = 0.022 linking neural coupling to reaction-time congruency).
- Behavior and vmPFC signals are influenced by an alternative, task-irrelevant expected value (the maximum value in the uncued context), with congruency-dependent effects on reaction times: higher irrelevant values speed responses on congruent trials and slow responses on incongruent trials. vmPFC representations of relevant and irrelevant values compete, and stronger context encoding attenuates the influence of irrelevant values on vmPFC value signals.
- OFC/vmPFC value signals exhibit context/range normalization across species; range adaptation is reduced during forced-choice contexts, indicating context-dependence of normalization mechanisms.
- Partially observable (latent) task states can be decoded from medial OFC; OFC lesions alter dopamine signaling consistent with a need for OFC to signal latent states for proper reward prediction.
- OFC/vmPFC neurons/voxels display mixed selectivity for multiple variables (e.g., probability, flavor, spatial features), yet population activity can occupy near-orthogonal subspaces for different variables, enabling flexible readout.
- Deep RL models trained for reward maximization develop late-layer representations that multiplex value with abstract task-state features and recent history, mirroring mixed selectivity. Units selective to non-value variables also predict performance; representational geometries reflect both value and strategic/perceptual factors. Recurrent deep RL architectures capture context- and state-dependent dynamics analogous to OFC/vmPFC.
- Evidence suggests OFC/vmPFC compress inputs toward goal-relevant dimensions while retaining latent representations of task-irrelevant features/values that can influence behavior and be rapidly accessed when relevance changes.
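
The combination of single-unit mixed selectivity and near-orthogonal population subspaces can be illustrated with a small simulation. This is a toy sketch under stated assumptions, not the analysis used in any study covered by the review. The population size, noise level, and the two variables (a continuous expected value and a binary context coded as -1/+1) are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons, n_trials = 50, 400

# Hypothetical task variables: expected value and a binary context cue.
value = rng.uniform(0.0, 1.0, n_trials)
context = rng.choice([-1.0, 1.0], n_trials)

# Two orthonormal population axes (columns of Q from a reduced QR).
axes, _ = np.linalg.qr(rng.standard_normal((n_neurons, 2)))
value_axis, context_axis = axes[:, 0], axes[:, 1]

# Each simulated neuron carries a weighted mix of both variables
# (mixed selectivity), plus a small amount of Gaussian noise.
activity = np.outer(value, value_axis) + np.outer(context, context_axis)
activity += 0.05 * rng.standard_normal((n_trials, n_neurons))

# Linear readout: project population activity onto each axis.
value_hat = activity @ value_axis
context_hat = activity @ context_axis

value_r = np.corrcoef(value, value_hat)[0, 1]        # value readout quality
context_acc = np.mean(np.sign(context_hat) == context)  # context decoding
cross_r = np.corrcoef(context, value_hat)[0, 1]      # leakage of context into value

# Fraction of neurons with non-negligible weight on both variables.
mixed_fraction = np.mean(
    (np.abs(value_axis) > 1e-3) & (np.abs(context_axis) > 1e-3)
)

print(f"value readout r = {value_r:.2f}")
print(f"context decoding accuracy = {context_acc:.2f}")
print(f"context leakage into value readout r = {cross_r:.2f}")
print(f"fraction of mixed-selective neurons = {mixed_fraction:.2f}")
```

In this toy setting nearly every neuron responds to both variables, yet a downstream linear readout can recover each variable with little interference because the two coding axes are orthogonal by construction. Real recordings would require the axes to be estimated from data, for example by regression, and orthogonality there is an empirical finding rather than a given.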

Discussion

The findings converge on a view that OFC/vmPFC support decision-making by integrating latent task states, contextual cues, internal states, and outcome expectations, rather than encoding a singular, context-invariant value signal. This integration explains observed context-dependence (e.g., range normalization, goal-modulation) and the influence of hypothetical or future goals on ongoing valuation. The interplay between context and value in vmPFC—where context signals arbitrate competition between relevant and irrelevant value representations—provides a mechanism for flexible, goal-aligned choice. Mixed selectivity at the single-unit level, coupled with population-level separability, reconciles reports of multiplexed neural activity with generalizable value readouts. Deep RL models offer algorithmic parallels: outcome-driven objectives produce late-layer representations that embody task states and values, with recurrent dynamics supporting partial observability and meta-learning. Interactions with hippocampus (e.g., replay) likely supply transition knowledge and long-term memory reinstatement, enabling flexible generalization and planning across tasks.

Conclusion

This review reconceptualizes OFC/vmPFC as hubs that integrate task states and values within a high-dimensional representational space optimized for predicting outcomes in complex, partially observable environments. Rather than a pure common currency, vmPFC/OFC multiplex value with contextual and latent variables, allowing flexible readout for behavior. Alignments with late-layer representations in deep RL models suggest that outcome-maximization naturally yields mixed-selectivity codes of states and values. Future research should: (1) establish tighter correspondences between specific deep RL architectures/layers and OMPFC subregions; (2) test recurrent/meta-RL accounts of partial observability and rapid adaptation; (3) probe determinants of compression versus retention of task-irrelevant information; (4) characterize learning dynamics in OFC/vmPFC across stages of training; (5) disentangle value-based from value-free decision processes; and (6) elucidate hippocampus–OFC interactions during replay and on-task periods.

Limitations

- Direct, quantitative alignment between deep RL late-layer representations and OMPFC neural activity remains limited and mixed; prior work has sometimes failed to find direct representational correspondence.
- Anatomical heterogeneity and species differences within OFC/vmPFC complicate fine-grained regional attribution and cross-study integration.
- Most deep RL accounts discussed are model-free and may underrepresent the brain’s use of transition models; where and how model-based knowledge interfaces with OFC/vmPFC is unresolved.
- Evidence for representation of task-irrelevant features/values raises questions about whether such signals are functional (supporting flexibility) or residual (limits of suppression/compression).
- Many insights derive from specific task designs and fMRI multivariate decoding; generalizability across tasks, modalities, and learning stages needs further testing.
