Interdisciplinary Studies
A unified neural representation model for spatial and conceptual computations
T. Haga, Y. Oseki, et al.
Place cells in the hippocampus (HPC) and grid cells in the entorhinal cortex (EC) represent physical space with localized and hexagonally periodic firing patterns, respectively. Grid-like codes have also been observed for nonspatial conceptual or perceptual spaces, and HPC exhibits conceptual representations, suggesting shared mechanisms for spatial navigation and semantic cognition. From a reinforcement learning (RL) perspective, predictive representations such as the successor representation (SR) and the default representation (DR) have been used to explain spatial codes and flexible planning. However, neurons in HPC/EC also encode nonspatial semantic concepts (concept cells) across modalities and during imagery and memory recall, implying high-level conceptual representations. How spatial representation models such as the SR extend to semantic concepts remains unclear. The authors ask whether a single computational principle can unify spatial navigation and semantic representation learning, and whether such a model can reproduce place-, grid-, and concept-cell-like responses and support inferences in both domains.
Prior work links hippocampal–entorhinal representations to RL and predictive coding. The SR explains place- and grid-cell phenomena and has empirical support. Linear RL and the default representation relate EC codes to flexible behavior. Grid-like codes have been observed in nonspatial domains (visual, olfactory, social, semantic). Models have produced grid-like patterns via nonnegativity/orthogonality constraints or by training recurrent networks for path integration and navigation. Other unified models address spatial and nonspatial cognition, and clustering-based accounts explain place/grid cells. Concept-cell models often rely on sparsity or clustering assumptions; semantic cognition has been modeled via distributed representations and deep-network theories. In NLP, distributional semantics via skip-gram and GloVe relates to factorization of PMI/PPMI matrices. Despite these advances, a principled, biologically interpretable unification that ties value-based spatial navigation to semantic embeddings, and that accounts for concept cells and nongrid EC representations, has been lacking.
Core idea: co-occurrence prediction is a shared principle in RL and word embedding. The authors extend the successor representation (SR) to successor information (SI) and its rectified form positive SI (PSI), connecting RL value functions and NLP information measures.
- Definitions: SR(s,s') is the discounted expected occupancy of state s' starting at s. SI(s,s') = log SR(s,s') − log P(s'), and PSI(s,s') = max(SI, 0). Intuition: log SR captures temporal proximity (reachability); subtracting self-information normalizes by base frequency; rectification discards weak distant relations.
- RL correspondence: In linear RL with a default policy and control costs, in a goal-directed navigation setting (nonterminal states, a goal state connected to a terminal that yields reward; uniform negative step costs), the optimal value function v*(s) is proportional to SI under the default policy for the goal state. Thus SI encodes value for navigation.
- NLP correspondence: Writing SI as the log of a discounted co-occurrence over the product of marginals reveals a formal analogy to pointwise mutual information (PMI), and PSI to positive PMI (PPMI). Therefore, low-rank factorization of PSI should behave like word embeddings.
- Disentangled SI (DSI): Factorize the PSI matrix via constrained nonnegative matrix factorization variants to obtain D-dimensional, nonnegative vectors X(s) and W(s') such that X(s)·W(s') ≈ PSI(s,s'). Two variants: DSI-decorr uses nonnegativity, decorrelation across dimensions, and L2 regularization (motivated by grid emergence); DSI-sparse uses nonnegativity and L1 sparsity (motivated by interpretable semantic features). Biological plausibility: activities are nonnegative; lateral inhibition can decorrelate/sparsify.
- Spatial simulations: 2-D square room (30×30 grid). Generate state sequences by random walk, compute PSI, factorize to 100-dim DSI vectors. Assess spatial units for gridness via standard spatial autocorrelation metrics. Path integration: update representations via movement-conditional recurrent weights to estimate trajectories from self-motion. Navigation: vector-based transition rule approximating value-based decision (using relation between DSI and value functions). Evaluate path lengths relative to shortest paths in various layouts (single room, four rooms, mazes). Analyze dependence on discount γ, dimensionality, and data length.
- Language simulations: Treat each word as a state; sequences from English Wikipedia (124M tokens; 9,376-word vocabulary). Learn 300-dim DSI vectors (both variants). Concept specificity: for each unit, collect the top-10 activating words; compute WordNet-based semantic similarity among them; define a unit as concept-specific if its mean similarity exceeds the 95th percentile of a null distribution (random word pairs); quantify conceptual specificity. Compare against CBOW, skip-gram, GloVe, PPMI-SVD, SR-SVD, and BERT embeddings.
- Semantic structure: Evaluate cosine similarity vs human similarity on WS353 (rank correlation). Visualize category structure via dissimilarity matrices and MDS for 10 semantic categories.
- Analogical inference (words): Standard vector arithmetic (e.g., king − man + woman ≈ queen) evaluated on Mikolov’s analogy set. Test “partial recombination”: restrict arithmetic to only a few dimensions (those with largest positive/negative differences) and compare performance to other embeddings.
- Analogical inference (spatial contexts): Learn DSI for contexts A, B, and Φ (baseline) with different barrier placements. Construct composite vectors for novel context A+B by position-wise arithmetic: X(A+B) = X(A) + X(B) − X(Φ). Evaluate navigation in A, B, and A+B using learned or composite representations. Identify dimensions with largest representational distance between contexts (e.g., B vs Φ), test performance when computing only a limited number of these dimensions, and when restricting to grid vs nongrid units.
- Materials and Methods summary: Optimization objectives for DSI variants; precise definitions of conceptual specificity and representational distance; datasets and code repositories listed.
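The SI–PMI correspondence noted above can be written side by side; expressing the discounted occupancy as a normalized distribution over future states is an assumed convention here:

```latex
\mathrm{SI}(s,s') \;=\; \log \frac{\sum_{t\ge 0} (1-\gamma)\,\gamma^{t}\, P(s_t = s' \mid s_0 = s)}{P(s')}
\qquad \text{vs.} \qquad
\mathrm{PMI}(w,c) \;=\; \log \frac{P(w,c)}{P(w)\,P(c)},
```

```latex
\mathrm{PSI}(s,s') = \max\bigl(\mathrm{SI}(s,s'),\,0\bigr)
\quad \longleftrightarrow \quad
\mathrm{PPMI}(w,c) = \max\bigl(\mathrm{PMI}(w,c),\,0\bigr).
```

Multiplying numerator and denominator of SI by P(s) makes the joint-over-marginals form of PMI explicit, which is why low-rank factorization of PSI is expected to behave like PPMI-based word embeddings.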
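As a concrete illustration of the SR/SI/PSI definitions above, the three quantities can be computed for a small Markov chain. This is a minimal sketch: the ring world, the discount value, and the (1 − γ) normalization of the SR before taking logs are illustrative assumptions, not the paper's exact conventions.

```python
import numpy as np

def successor_matrices(T, gamma=0.9):
    """SR, SI, and PSI for a row-stochastic default-policy transition matrix T."""
    n = T.shape[0]
    # SR(s, s'): discounted expected occupancy of s' starting from s.
    SR = np.linalg.inv(np.eye(n) - gamma * T)
    # Base frequency P(s'): stationary distribution (left Perron eigenvector of T).
    evals, evecs = np.linalg.eig(T.T)
    p = np.real(evecs[:, np.argmax(np.real(evals))])
    p = p / p.sum()
    # log SR captures temporal proximity; subtracting log P normalizes by base rate.
    SI = np.log((1 - gamma) * SR) - np.log(p)[None, :]
    PSI = np.maximum(SI, 0.0)  # rectification discards weak distant relations
    return SR, SI, PSI

# Symmetric random walk on a 5-state ring.
T = np.zeros((5, 5))
for s in range(5):
    T[s, (s - 1) % 5] = T[s, (s + 1) % 5] = 0.5

SR, SI, PSI = successor_matrices(T)
```

On this chain, SI is largest for temporally near states and PSI is nonnegative by construction, matching the intuition stated above.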
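The DSI factorization can be sketched with textbook multiplicative-update NMF plus an L1 penalty, roughly in the spirit of DSI-sparse. The update rule, penalty placement, and dimensionality are illustrative assumptions, and the decorrelation terms of DSI-decorr are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def dsi_sparse_sketch(PSI, d=10, l1=1e-3, n_iter=1000):
    """Nonnegative factorization PSI ≈ X @ W.T with L1 sparsity on both factors."""
    n, m = PSI.shape
    X = rng.random((n, d))
    W = rng.random((m, d))
    eps = 1e-12
    for _ in range(n_iter):
        # Multiplicative updates for ||PSI - X W^T||_F^2 + l1(|X|_1 + |W|_1);
        # the updates keep X and W nonnegative by construction.
        X *= (PSI @ W) / (X @ (W.T @ W) + l1 + eps)
        W *= (PSI.T @ X) / (W @ (X.T @ X) + l1 + eps)
    return X, W

# Synthetic nonnegative low-rank stand-in for a PSI matrix.
M = rng.random((40, 3)) @ rng.random((3, 30))
X, W = dsi_sparse_sketch(M, d=5)
```

Nonnegativity here plays the role the paper assigns to neural activity constraints; lateral inhibition would correspond to the sparsity/decorrelation terms.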
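Path integration via movement-conditional recurrent weights can be sketched on a 1-D ring: for each action a, fit a linear map M_a with X(s + a) ≈ X(s) M_a, then roll it forward from the self-motion sequence alone. The linear maps, ring world, and nearest-neighbor decoding are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 20
X = rng.random((n, n))  # one representation vector per ring position

# Movement-conditional weights: X[(s + a) % n] ≈ X[s] @ M[a].
M = {a: np.linalg.lstsq(X, np.roll(X, -a, axis=0), rcond=None)[0]
     for a in (+1, -1)}

def integrate(start, actions):
    """Update the representation from self-motion alone, then decode position."""
    x, pos = X[start].copy(), start
    for a in actions:
        x = x @ M[a]          # recurrent update driven by movement a
        pos = (pos + a) % n   # ground-truth position for comparison
    decoded = int(np.argmin(np.linalg.norm(X - x, axis=1)))
    return decoded, pos
```

Decoding the final state by nearest neighbor recovers the true position, mirroring the trajectory-estimation test described above.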
- Grid/place-like spatial codes: In 2-D environments, DSI-decorr produced many grid-like units; 27.6 ± 5.6% of X(s) units and 30.6 ± 4.0% of W(s') units classified as grid cells (mean ± SD over 5 seeds). DSI-sparse produced spatially localized, place-like responses.
- Path integration: Using movement-conditional recurrent updates, path estimation accuracy was high: 969.4 ± 8.3 correct out of 1,000 trials for DSI-decorr; 759.2 ± 14.4 for DSI-sparse (10-step random movement sequences).
- Navigation: Vector-based decisions yielded near-optimal paths in single- and multi-room environments; path lengths closely approached shortest-path lengths and generalized to complex mazes. Performance was robust across parameters, except that relatively small γ (≤ 0.98) impaired grid emergence and navigation; larger γ facilitated sensing of long-range spatial structure.
- Concept-specific word units: Both DSI variants yielded many units highly selective for semantic concepts (e.g., a “game cell,” a “president cell”), surpassing skip-gram, GloVe, PPMI-SVD, SR-SVD, and BERT in both the proportion of concept-specific units and average conceptual specificity. Removing nonnegativity substantially reduced conceptual specificity; sparsity was not necessary. CBOW showed comparable specificity in one setting but was not robust across dimensionalities and datasets.
- Semantic structure: DSI vectors’ cosine similarities correlated well with human word similarity (WS353), comparable to standard embeddings; very large γ degraded correlations.
- Analogical inference (words): DSI matched conventional embeddings on Mikolov analogies with full vectors. Critically, DSI maintained performance even when arithmetic was restricted to as few as 2 of 300 dimensions, unlike other methods that degraded markedly. Removing nonnegativity impaired partial-dimension inference. CBOW, despite high conceptual specificity, failed under partial calculations.
- Mechanistic interpretation: DSI word vectors are distributed (largest element ≈5% of sum on average) but factorize semantic factors across axes, enabling analogical inference by switching a few concept-specific units—interpretable as partial recombination of concept-cell assemblies.
- Analogical inference (spatial contexts): Composite vectors X(A)+X(B)−X(Φ) enabled best navigation in novel context A+B compared to using A or B alone. As few as 4 dimensions sufficed to reach asymptotic performance. Dimensions with largest context-dependent changes were nongrid units; restricting to grid-only units eliminated the benefit. Thus, nongrid EC-like units carry crucial context information. Inference failed when barriers interacted strongly (connected), indicating limits of linear composition.
- Parameter sensitivity: Performance and representational properties depended on γ; too small γ hurt spatial coding/navigation, too large γ (≥0.95–0.99, task-dependent) reduced concept specificity and nonnegativity effects.
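The partial-recombination analogy result above can be illustrated on a toy disentangled vocabulary in which one axis encodes royalty and two axes encode gender. The vectors and the top-k dimension-selection rule are illustrative assumptions, not the paper's evaluation setup.

```python
import numpy as np

# Toy nonnegative, disentangled vectors: axes = (royalty, male, female).
vocab = ["king", "queen", "man", "woman"]
V = np.array([[1., 1., 0.],   # king
              [1., 0., 1.],   # queen
              [0., 1., 0.],   # man
              [0., 0., 1.]])  # woman

def partial_analogy(a, b, c, V, k=2):
    """a : b :: c : ?, recombining only the k dimensions where b - a changes most."""
    diff = V[b] - V[a]
    idx = np.argsort(np.abs(diff))[-k:]  # concept-specific units to switch
    query = V[c].copy()
    query[idx] += diff[idx]              # partial vector arithmetic
    sims = V @ query / (np.linalg.norm(V, axis=1) * np.linalg.norm(query) + 1e-12)
    sims[c] = -np.inf                    # exclude the cue word itself
    return vocab[int(np.argmax(sims))]

answer = partial_analogy(vocab.index("man"), vocab.index("woman"),
                         vocab.index("king"), V, k=2)
```

With k equal to the full dimensionality this reduces to standard king − man + woman arithmetic; the point of the toy is that switching just the two gender axes already suffices when axes are disentangled.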
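The spatial-context composition X(A) + X(B) − X(Φ) and its restriction to the most context-sensitive dimensions can be sketched as below; the per-dimension representational-distance ranking is a simplified, illustrative version of the paper's analysis.

```python
import numpy as np

def composite_context(XA, XB, Xphi, k=None):
    """Compose representations for a novel context A+B by position-wise arithmetic.

    XA, XB, Xphi: (n_positions, D) arrays for contexts A, B, and baseline Φ.
    If k is given, update only the k dimensions where B deviates most from Φ
    (largest mean representational distance per dimension).
    """
    if k is None:
        return XA + XB - Xphi            # full position-wise composition
    delta = XB - Xphi
    dist = np.abs(delta).mean(axis=0)    # per-dimension distance between B and Φ
    idx = np.argsort(dist)[-k:]
    Xab = XA.copy()
    Xab[:, idx] += delta[:, idx]         # recompute only the selected dimensions
    return Xab

rng = np.random.default_rng(2)
XA, XB, Xphi = (rng.random((50, 16)) for _ in range(3))
full = composite_context(XA, XB, Xphi)
partial = composite_context(XA, XB, Xphi, k=16)
```

Restricting k to a handful of dimensions corresponds to the finding that a few nongrid, context-sensitive units carry most of the contextual difference.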
The findings support a unified computational account linking spatial navigation and semantic cognition via a single predictive, value-related representation (SI/PSI) whose low-rank, nonnegative factorization (DSI) yields biologically interpretable codes. DSI accounts for grid-like EC and place-like HPC representations in 2-D spaces, supports path integration and near-optimal navigation, and simultaneously produces concept-specific units resembling concept cells that organize semantic structures at the population level.

The model explains analogical inferences in language and extends the same vector arithmetic to infer novel spatial contexts, highlighting complementary roles of grid and nongrid EC units: grid units provide robust spatial coding, whereas nongrid units flexibly encode contextual differences and support compositional inference.

Nonnegativity emerges as a common and critical constraint for both hexagonal grid emergence and concept-specific, functionally useful semantic features, providing a potential normative principle for HPC/EC coding. The work suggests bridges to hippocampal memory theories (e.g., attractor dynamics and recall transitions aligned with semantic similarity) and to predictive processing accounts in language, offering mechanistic hypotheses for how HPC/EC might contribute to semantic computations and decision making.
This study introduces successor information (SI) and disentangled successor information (DSI), unifying value-based spatial navigation and distributional semantic learning. A single representation learning scheme yields grid/place-like spatial codes, concept-specific word units, and supports analogical inference in both semantic and spatial domains via simple vector arithmetic interpretable as partial assembly recombination. Key contributions include: (1) a mathematical link between linear RL value functions and PMI-like measures; (2) a biologically plausible factorization yielding nonnegative, disentangled axes; (3) demonstrations of path integration/navigation and superior concept specificity; and (4) a shared framework for compositional inference relying on nongrid units for contextual modulation. Future work should: (a) develop context-dependent (dynamic) extensions of DSI; (b) integrate DSI with hippocampal memory/attractor and generative models for recall and prediction; (c) test broader context types (sensory cues, rewards) and online, one-shot learning; (d) establish biological learning mechanisms and validate timescales (γ) in vivo; and (e) empirically probe predicted roles of nongrid units in contextual inference and concept-cell population coding.
- Dependence on discount factor γ: too small impairs spatial coding/navigation; too large degrades semantic correlations and diminishes nonnegativity benefits. Biological plausibility and adaptive tuning of γ require validation.
- Linear compositionality limits: spatial context inference fails when barriers strongly interact (e.g., connected), indicating boundaries of simple vector arithmetic.
- Static embeddings: current DSI is context-independent, whereas concept-cell activity is context-sensitive; extensions to dynamic, context-dependent representations are needed.
- Learning mechanisms: biological implementation and synaptic learning rules for DSI remain speculative.
- Generality across contexts: demonstrated contextual inference for barrier layouts; applicability to other contextual factors (colors, odors, reward contingencies) remains to be tested.
- Model assumptions: default-policy linear RL setting and rectified information measure may not capture all task structures; evaluation on broader tasks/datasets is needed.