Predictive learning shapes the representational geometry of the human brain

A. Greco, J. Moser, et al.

Predictive coding tunes the brain: using MEG recordings while participants listened to acoustic sequences of varying regularity, the authors show that the brain reorganizes its representational geometry to cluster predictable, temporally contiguous sounds, an effect that scales with synergistic prediction-error encoding across high-level and sensory networks. Research conducted by Antonino Greco, Julia Moser, Hubert Preissl, and Markus Siegel.
Introduction

The study investigates how predictive learning—minimizing prediction errors to update internal generative models—shapes the representational geometry of sensory cortices. Living organisms benefit from anticipating environmental changes, and humans excel at extracting statistical regularities from sensory inputs (statistical learning). Predictive coding frameworks posit that the brain maintains and updates a generative model via prediction errors, manifesting as increased responses to unexpected inputs or suppressed responses to expected ones. While extensive evidence exists for prediction error encoding (e.g., mismatch negativity in oddball paradigms), it is less clear how these signals update sensory representations. Prior work suggests perceptual learning can alter even low-level sensory codes, and representational similarities can reflect learned statistical dependencies. The central hypothesis here is that prediction error encoding should correlate with updating of neuronal representations. The study uses MEG during exposure to acoustic tone triplet sequences of low versus high regularity to test whether representational geometry aligns with input statistics and whether this alignment correlates with prediction error encoding and its synergistic, distributed nature across cortical networks.

Literature Review

Auditory oddball paradigms demonstrate the mismatch negativity (MMN) as a signature of prediction error when rare deviants violate regular tone sequences. Predictive coding interpretations link the MMN to errors between model expectations and sensory data. Beyond local tone-level violations, cortical responses also reflect global sequence regularities. Perceptual learning studies show sensory cortex plasticity (e.g., improvements in V1 orientation and contrast sensitivity), and representational similarity can be shaped by temporal community structure and statistical dependencies, including hippocampal and cortical involvement in hierarchical sequence learning. Together, this literature indicates that statistical learning may reshape neural representational geometry to mirror sensory input statistics, yet direct evidence linking prediction error signals to representational updates has been lacking. This work builds on RSA methodologies and information-theoretic approaches to bridge that gap, testing both local encoding and distributed network interactions (via partial information decomposition, PID) underpinning prediction error processing.

Methodology

Participants: 24 healthy right-handed volunteers (12 male; age 20–37 years, mean 27.54, SD 9.96) with normal hearing. Ethics: Approved by the Medical Faculty of the University of Tübingen (No. 231/2018B01).

Stimuli: Twelve pure tones spanning 261.63–932.33 Hz (notes C, D, E, F#, G#, A# in the 4th and 5th octaves: 261.63, 293.66, 329.63, 369.99, 415.3, 466.16, 523.25, 587.33, 659.26, 739.99, 830.61, 932.33 Hz). Each tone lasted 300 ms, followed by a 33 ms silent gap (one tone every 333 ms). Sequences comprised 2400 tones (800 triplets; 1 s per triplet) under an octave constraint; each tone appeared 200 times per sequence, with no immediate tone or triplet repeats.
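
To make the stimulus timing concrete, here is a minimal Python sketch that synthesizes one triplet. The audio sampling rate (44.1 kHz) and the 10 ms onset/offset ramps are assumptions not stated in the summary.

```python
import numpy as np

FS = 44100                      # assumed audio sampling rate (not stated above)
TONE_S, GAP_S = 0.300, 0.033    # 300 ms tone + 33 ms gap = 333 ms onset-to-onset

def make_tone(freq_hz, fs=FS, dur_s=TONE_S, gap_s=GAP_S, ramp_s=0.010):
    """One pure tone followed by its silent gap, with ramps against clicks."""
    t = np.arange(int(dur_s * fs)) / fs
    tone = np.sin(2 * np.pi * freq_hz * t)
    n_ramp = int(ramp_s * fs)               # ramp length is an assumption
    env = np.ones_like(tone)
    env[:n_ramp] = np.linspace(0.0, 1.0, n_ramp)
    env[-n_ramp:] = np.linspace(1.0, 0.0, n_ramp)
    return np.concatenate([tone * env, np.zeros(int(gap_s * fs))])

# One illustrative triplet (C4, E4, G#4); a full sequence chains 800 triplets.
triplet = np.concatenate([make_tone(f) for f in (261.63, 329.63, 415.30)])
```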

Conditions: a low-regularity (LR) sequence with constantly changing triplets under the octave constraint, and a high-regularity (HR) sequence containing only four repeating triplet types, with triplet order counterbalanced across participants. Behavioral data (from prior work) showed increased familiarity ratings for HR triplets after exposure, indicating statistical learning.

Procedure: Passive listening in a magnetically shielded room with fixation on a cross; tones delivered through earplugs at 70 dB. Condition order was not counterbalanced: all participants heard the LR sequence first, followed by the HR sequence.

MEG acquisition and preprocessing: 275-sensor whole-head CTF MEG (sampling rate 585.94 Hz). Fourth-order Butterworth band-pass filter, 0.5–40 Hz; epochs 0–330 ms relative to stimulus onset, corrected for a 32 ms auditory delay; resampled to 200 Hz. Noisy channels were rejected via a semi-automatic procedure (visual inspection; RMS > 0.5 pT). Independent component analysis (FastICA; data reduced to 50 components) was used to remove ocular, muscular, and cardiac artifacts identified by visual inspection.
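
The summary does not name the analysis software for these steps; the sketch below reproduces the reported parameters with MNE-Python as a stand-in, with a hypothetical file path and illustrative ICA exclusions.

```python
import mne

raw = mne.io.read_raw_ctf("sub-01.ds", preload=True)  # hypothetical CTF dataset

# Fourth-order Butterworth band-pass, 0.5-40 Hz
raw.filter(l_freq=0.5, h_freq=40.0, method="iir",
           iir_params=dict(order=4, ftype="butter"))

# FastICA with 50 components; artifactual components (ocular, muscular,
# cardiac) are identified by visual inspection and removed.
ica = mne.preprocessing.ICA(n_components=50, method="fastica", random_state=0)
ica.fit(raw)
ica.exclude = [0, 3]              # illustrative indices, chosen after inspection
raw_clean = ica.apply(raw.copy())

# Shift event onsets by the 32 ms auditory delay, epoch 0-330 ms, resample.
events = mne.find_events(raw_clean)
events[:, 0] += int(0.032 * raw_clean.info["sfreq"])
epochs = mne.Epochs(raw_clean, events, tmin=0.0, tmax=0.33,
                    baseline=None, preload=True)
epochs.resample(200.0)
```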

Source reconstruction: Sensors aligned to fsaverage template; single-shell head model (FieldTrip). Co-registration via nasion and preauricular landmarks. LCMV beamformer (regularization 5%) estimated from aggregated sensor-level data across conditions. Dipole orientation fixed by SVD to maximize power; sign-flip corrected. Source space parcellated into 72 Desikan–Killiany regions.
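
Continuing the MNE-Python stand-in (the forward model `fwd` from the fsaverage single-shell setup and the `epochs` object are assumed to exist from the previous sketch), the LCMV step with 5% regularization and max-power orientation might look like this:

```python
import mne
from mne.beamformer import make_lcmv, apply_lcmv_epochs

# Data covariance over the epoch window.
data_cov = mne.compute_covariance(epochs, tmin=0.0, tmax=0.33)

filters = make_lcmv(epochs.info, fwd, data_cov, reg=0.05,
                    pick_ori="max-power")   # SVD-based max-power orientation
stcs = apply_lcmv_epochs(epochs, filters)

# Parcellate into Desikan-Killiany labels; "mean_flip" applies the
# sign-flip correction when averaging source time courses within a parcel.
labels = mne.read_labels_from_annot("fsaverage", parc="aparc")
parcel_tcs = mne.extract_label_time_course(stcs, labels, fwd["src"],
                                           mode="mean_flip")
```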

Representational Similarity Analysis (RSA): Trials were split into 5 non-overlapping blocks across each sequence. Representational dissimilarity matrices (RDMs) were computed for each block using the cross-validated Mahalanobis distance (cvMD) with 10-fold cross-validation and Ledoit–Wolf shrinkage regularization of the noise covariance. RDM rows and columns were ordered by HR triplet membership, so that on-diagonal blocks reflect within-triplet distances. Time-resolved RSA used all parcels as features at each time point; searchlight RSA used each parcel plus its 5 nearest neighbors over defined time windows. On-diagonal and off-diagonal values were averaged to assess within- and between-triplet representational distances; model-based RSA fitted a theoretical RDM (zeros on-diagonal, ones off-diagonal) via Spearman correlation. Linear slopes across blocks were estimated by OLS with block index (1–5) as regressor.
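
A compact sketch of the cross-validated Mahalanobis (crossnobis) distance under the stated choices (10 folds, Ledoit–Wolf shrinkage); a simplified illustration, not the authors' code.

```python
import numpy as np
from sklearn.covariance import LedoitWolf

def crossnobis_rdm(X, y, n_folds=10, seed=0):
    """X: (n_trials, n_features) patterns; y: tone labels (0..11)."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % n_folds
    conds = np.unique(y)
    K = len(conds)

    # Ledoit-Wolf-shrunk noise covariance from within-condition residuals.
    resid = np.vstack([X[y == c] - X[y == c].mean(0) for c in conds])
    prec = np.linalg.inv(LedoitWolf().fit(resid).covariance_)

    rdm = np.zeros((K, K))
    for f in range(n_folds):
        train, test = folds != f, folds == f
        mu_tr = np.stack([X[train & (y == c)].mean(0) for c in conds])
        mu_te = np.stack([X[test & (y == c)].mean(0) for c in conds])
        for i in range(K):
            for j in range(K):
                # Cross-validated quadratic form: unbiased, can go negative.
                rdm[i, j] += (mu_tr[i] - mu_tr[j]) @ prec @ (mu_te[i] - mu_te[j])
    return rdm / n_folds
```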

Ideal observer model and prediction error encoding: A perceptron-like model predicted the next tone from the current tone. Inputs were one-hot vectors x_t ∈ R^{1×12}; the weight matrix W_t ∈ R^{12×12} was initialized uniformly (all entries 1/12). Predictions were computed as z_t = x_t W_t and passed through a softmax to yield ŷ_t, with cross-entropy loss L(x_{t+1}, ŷ_t) and gradient ∂L/∂W_t = x_t^T (ŷ_t − x_{t+1}). A dynamic learning rate ω, set to the Shannon entropy of ŷ_t, yielded the precision-weighted prediction error PE = ω ∂L/∂W_t; parameters were updated as W_{t+1} = W_t − PE. Models were fit separately to the LR and HR sequences, and prediction error trajectories were extracted.
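
The model description maps almost line-for-line onto code. A minimal Python sketch follows; using surprise (the cross-entropy of the upcoming tone) as the scalar prediction-error readout is one plausible choice, not necessarily the authors' exact extraction.

```python
import numpy as np

def ideal_observer(tones, n_states=12):
    """tones: integer sequence in 0..11. Returns per-transition surprise."""
    W = np.full((n_states, n_states), 1.0 / n_states)  # uniform initialization
    pe_trace = []
    for t in range(len(tones) - 1):
        x = np.zeros((1, n_states)); x[0, tones[t]] = 1.0     # one-hot x_t
        z = x @ W                                             # z_t = x_t W_t
        y_hat = np.exp(z) / np.exp(z).sum()                   # softmax
        target = np.zeros((1, n_states)); target[0, tones[t + 1]] = 1.0
        pe_trace.append(-np.log(y_hat[0, tones[t + 1]]))      # cross-entropy loss
        grad = x.T @ (y_hat - target)                         # dL/dW_t
        omega = -(y_hat * np.log(y_hat)).sum()                # entropy as learning rate
        W = W - omega * grad                                  # W_{t+1} = W_t - PE
    return np.array(pe_trace)
```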

Gaussian Copula Mutual Information (GCMI): Brain data (X) and prediction error trajectories (Y) were rank-transformed and mapped through the inverse normal CDF to obtain Gaussian marginals; mutual information was then computed analytically for Gaussian variables. Time-resolved GCMI used all parcels at each time point; searchlight GCMI used each parcel plus its 5 neighbors over time windows. Baseline correction subtracted the pre-tone baseline (−50 to 0 ms).
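
A stripped-down GCMI sketch following the copula-normalization recipe above; the published gcmi toolbox additionally applies a small-sample bias correction that is omitted here.

```python
import numpy as np
from scipy.stats import norm, rankdata

def copnorm(x):
    """Rank-transform each column to standard-normal marginals."""
    x = np.asarray(x, float)
    if x.ndim == 1:
        x = x[:, None]
    return norm.ppf(rankdata(x, axis=0) / (x.shape[0] + 1))

def ent_g(x):
    """Gaussian entropy (nats), dropping constants that cancel in MI."""
    cov = np.atleast_2d(np.cov(x, rowvar=False))
    return 0.5 * np.log(np.linalg.det(cov))

def gcmi(x, y):
    """I(X;Y) in bits for copula-normalized (possibly multivariate) x, y."""
    xn, yn = copnorm(x), copnorm(y)
    return (ent_g(xn) + ent_g(yn) - ent_g(np.hstack([xn, yn]))) / np.log(2)
```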

Partial Information Decomposition (PID): For each pair of parcels (X1, X2) and the prediction error trajectory Y, the joint mutual information was decomposed into redundancy (R), unique (U1, U2), and synergy (S) components. Redundancy was computed via I_min; the remaining terms follow from linear relations among the mutual information quantities. Variables were Gaussianized as above for closed-form computation. Redundancy and synergy adjacency matrices (72×72) were computed and baseline-corrected. Global network measures were obtained by averaging the matrices; node importance was quantified by betweenness centrality.
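
Building on gcmi() from the sketch above, the four PID terms can be illustrated as follows. For simplicity this uses the minimum of the single-source MIs as the redundancy (the MMI shortcut often used with Gaussian data), whereas the paper specifies Williams and Beer's I_min.

```python
import numpy as np

def pid_terms(x1, x2, y):
    """Decompose I({X1,X2}; Y) into redundancy, unique, and synergy parts."""
    i1, i2 = gcmi(x1, y), gcmi(x2, y)          # single-source informations
    i12 = gcmi(np.column_stack([x1, x2]), y)   # joint information
    red = min(i1, i2)                          # MMI redundancy (simplification)
    u1, u2 = i1 - red, i2 - red                # unique contributions
    syn = i12 - red - u1 - u2                  # synergy closes the identity
    return dict(redundancy=red, unique1=u1, unique2=u2, synergy=syn)
```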

Statistics: Group-level random-effects analyses. Mass-univariate cluster-based paired two-tailed permutation tests (α = 0.05; 10,000 iterations; maxsum cluster statistic; parcel neighborhood topology). Paired two-tailed t tests for grand-average redundancy and synergy. Network-based statistics (NBS) for adjacency matrices (α = 0.05; 10,000 iterations; connected component size). Right-tailed Pearson correlations, Bonferroni-corrected, between representational shift and prediction error encoding (GCMI, redundancy/synergy centrality) in selected clusters (parcels passing the initial cluster threshold). Model comparison of representational dynamics: an exponential model y = a·e^(−bx) + c versus a linear model (intercept and slope), fitted by non-linear least squares with 100 starting points; BIC was computed at the subject level and compared via paired t tests.
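
The exponential-versus-linear comparison can be sketched as follows; the random-start distribution and the Gaussian-error BIC form are assumptions about unstated details.

```python
import numpy as np
from scipy.optimize import curve_fit

def bic(y, y_hat, k):
    """BIC up to constants, assuming Gaussian residuals."""
    n = len(y)
    rss = np.sum((y - y_hat) ** 2)
    return n * np.log(rss / n) + k * np.log(n)

def compare_dynamics(x, y, n_starts=100, seed=0):
    """Fit y = a*exp(-b*x) + c and a line; return (BIC_exp, BIC_lin)."""
    rng = np.random.default_rng(seed)
    f = lambda x, a, b, c: a * np.exp(-b * x) + c
    p_best, rss_best = np.array([y[0] - y[-1], 1.0, y[-1]]), np.inf
    for _ in range(n_starts):            # multiple starts vs. local minima
        try:
            p, _ = curve_fit(f, x, y, p0=rng.normal(size=3), maxfev=5000)
        except RuntimeError:
            continue                     # this start failed to converge
        rss = np.sum((y - f(x, *p)) ** 2)
        if rss < rss_best:
            p_best, rss_best = p, rss
    bic_exp = bic(y, f(x, *p_best), k=3)
    slope, icpt = np.polyfit(x, y, 1)
    bic_lin = bic(y, slope * x + icpt, k=2)
    return bic_exp, bic_lin              # lower BIC = preferred model
```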

Key Findings
  • Tones were well decodable in both sequences, with peak decoding ~100 ms post-onset (p < 0.0001, cluster-corrected; peak Cohen’s d = 1.46).
  • Representational geometry aligned with input statistics in HR: within-triplet representational distances decreased across blocks at ~120 ms, with a significantly stronger decrease in HR than in LR (p = 0.008, cluster-corrected; d = 0.79). Between-triplet distances showed a decreasing trend but no significant HR–LR slope difference (p > 0.05).
  • Model-based RSA (within-triplet distances < between-triplet distances) showed an increasing fit across blocks in HR, significantly greater than in LR at ~120 ms (p = 0.038, cluster-corrected; d = 0.84).
  • Searchlight RSA localized strongest within-/between-triplet distance decreases and model-fit increases to bilateral dorsolateral prefrontal cortices and left temporal cortex (110–120 ms).
  • The ideal observer model captured the sequence statistics: prediction errors were lower and prediction accuracy higher for HR, reflecting its greater predictability.
  • Prediction error encoding (GCMI) was significant, peaking ~100 ms post-tone for both sequences (HR: 0–50 ms, p < 0.0001, d = 0.78; LR: 0–40 ms, p = 0.007, d = 0.94; 50–260 ms, p < 0.0001, d = 1.72). There were no significant HR–LR differences in encoding (p > 0.05). Encoding was broadly distributed across frontoparietal and temporal regions, with maximal signals in right temporal cortex.
  • Correlation between prediction error encoding strength and representational shift was significant in left temporal cortex (r = 0.47, p = 0.021, Bonferroni-corrected); not significant in frontal cortices (r = 0.22, p = 0.287).
  • Representational dynamics did not show significant preference for exponential over linear models (whole brain BIC: −30.86 vs −29.99, p = 0.165, d = 0.28; left temporal: −26.40 vs −26.57, p = 0.622, d = 0.09; frontal: −27.08 vs −27.43, p = 0.464, d = 0.14).
  • PID revealed substantially greater synergy than redundancy in prediction error encoding across parcel pairs for both conditions (LR: p = 0.035, d = 0.44; HR: p = 0.038, d = 0.43); no HR–LR differences in synergy or redundancy.
  • Network-based statistics identified large-scale frontoparietal and temporal interactions for both redundancy and synergy (90–120 ms vs baseline), with these regions acting as hubs (betweenness centrality; all p < 0.05, corrected).
  • Synergy centrality in left temporal cortex correlated with representational shift (r = 0.47, p = 0.022, Bonferroni-corrected); redundancy centrality did not (left temporal: r = 0.33, p = 0.113; frontal: r = 0.17, p = 0.434).
Discussion

The findings directly link prediction error encoding to the updating of sensory representational geometry during statistical learning. In predictable tone triplets, neural representations of tones within the same triplet converged (chunking), aligning cortical representational geometry with the sequence statistics. This alignment emerged rapidly (~120 ms), overlapping with pitch encoding, suggesting that once established, representational changes operate without extensive top-down modulation on fast timescales. Prediction error encoding occurred broadly across temporal and frontoparietal cortices at ~100 ms and was context-independent in magnitude across regularity conditions.

Crucially, individuals with stronger prediction error encoding in left temporal cortex exhibited stronger representational shifts, and left temporal cortex functioned as a synergistic hub within a distributed network encoding prediction errors. PID analyses indicated that prediction error processing relies predominantly on synergistic (complementary) interactions rather than shared, redundant encoding, supporting a distributed, network-level computation of error signals.

These results challenge strictly hierarchical, independent-level error computations posited by some predictive coding models and instead support recurrent, feedback-driven network mechanisms. By demonstrating a correlation between prediction error encoding and representational change, the study substantiates a core tenet of predictive coding, namely that prediction errors update generative model representations, and emphasizes the role of sensory areas as both targets of top-down predictions and modulators of representational content.

Conclusion

This work shows that predictive learning reshapes cortical representational geometry to mirror environmental statistics, with tones in predictable triplets becoming more neurally similar over time. Prediction error signals are widely encoded and, particularly in left temporal cortex, their strength and synergistic network centrality predict the magnitude of representational updating. These findings support distributed, synergistic network computations for predictive processing and provide direct evidence linking prediction error encoding to model updating in the human cortex. Future research should refine models to capture individual learning dynamics, investigate the precise temporal form (linear vs. nonlinear) of representational changes, integrate hippocampal contributions, and assess generalization across modalities and tasks.

Limitations
  • Sequence order was not counterbalanced (LR always preceded HR), which may introduce order or carryover effects.
  • No conclusive evidence for nonlinear (exponential) representational dynamics despite the model’s exponential prediction error trajectory; this may reflect limited sensitivity or model mismatch.
  • The ideal observer model is a simplified categorical predictive learner and may not capture individual variability in learning.
  • Passive listening paradigm and relatively small sample (n = 24) may limit generalizability.
  • MEG source reconstruction and parcel-based analyses impose spatial and methodological constraints; cortical and subcortical contributions (e.g., hippocampus) were not directly measured.
  • Context-independence of error encoding (no HR–LR differences) may depend on task design and may not generalize to other forms or levels of volatility.