Psychology
Predictive learning shapes the representational geometry of the human brain
A. Greco, J. Moser, et al.
The study investigates how predictive learning—minimizing prediction errors about future sensory inputs—updates the brain's internal generative model and shapes sensory representations. While extensive evidence shows cortical signals compatible with prediction error encoding (e.g., mismatch negativity in oddball paradigms), it remains unclear how these signals drive changes in representational geometry, particularly the similarity structure of neural population codes. Building on statistical learning literature that demonstrates plasticity in sensory representations and alignment of representational similarities with stimulus dependencies, the authors hypothesize a direct link: stronger neuronal encoding of prediction errors should correlate with greater updating of neural representations to match environmental statistics. The work aims to establish this link using MEG during passive listening to tone sequences with low versus high regularity, testing whether learned predictability induces a representational shift (chunking) and whether prediction error signals, including synergistic large-scale interactions, predict this shift.
Prior research on predictive coding posits that the brain maintains and updates a generative model by computing prediction errors (Friston; Rao & Ballard). In auditory paradigms, oddball sequences elicit mismatch negativity interpreted as prediction error signals. Statistical learning studies have shown that neural representations reflect learned temporal and probabilistic dependencies, implying that representational geometry can be reshaped to mirror input structure (Schapiro et al.; Henin et al.). Perceptual learning demonstrates plasticity even in low-level sensory areas, optimizing processing (Schoups; Hua; Shibata). However, a direct mechanistic link between prediction error encoding and representational updating has been lacking. Emerging work also suggests distributed, synergistic information processing across cortical networks rather than purely modular encoding (Luppi; Gelens; Panzeri; Vinck), motivating analyses of redundancy versus synergy via Partial Information Decomposition in the context of predictive processing.
Participants: 24 healthy right-handed volunteers (12 male; age 20–37 years, mean 27.54, SD 9.96) with normal hearing; ethics approval from the Medical Faculty, University of Tübingen (No. 231/2018B01).
Stimuli: 12 pure tones (261.63–932.33 Hz; notes C, D, E, F#, G#, A# across the 4th–5th octaves) presented for 300 ms with a ~33 ms silent gap (333 ms SOA). Two sequences (each 2400 tones = 800 triplets, 1 s per triplet): High Regularity (HR), with only four repeating triplet types, and Low Regularity (LR), with continuously changing triplets under an octave constraint. Each tone appeared 200 times per sequence; no immediate repetition of tones or triplets; HR triplet order counterbalanced across participants (three variants).
Procedure: Passive listening with fixation; tones delivered via earplugs at 70 dB; condition order not counterbalanced (LR always preceded HR).
MEG acquisition and preprocessing: 275-sensor CTF MEG sampled at 585.94 Hz; band-pass 0.5–40 Hz; epochs 0–330 ms relative to tone onset with 32 ms trigger-delay correction; resampled to 200 Hz; noisy channels rejected (RMS > 0.5 pT); ICA (FastICA, 50 components) to remove ocular, muscle, and cardiac artifacts.
Source reconstruction: Sensors aligned to fsaverage; single-shell head model (FieldTrip); LCMV beamformer with 5% regularization; dipole orientation via SVD; source projection with sign-flip correction; Desikan-Killiany parcellation (72 parcels).
Representational Similarity Analysis (RSA): Trials split into 5 non-overlapping blocks per sequence. RDMs computed using the cross-validated Mahalanobis distance (cvMD) with 10-fold CV and Ledoit-Wolf covariance regularization. Time-resolved and searchlight RSA (each parcel plus its 5 nearest neighbors). RDMs ordered so the diagonal reflects within-triplet tone distances (HR structure). Model-based RSA fit: theoretical RDM with zeros on the diagonal and ones off the diagonal, compared by Spearman correlation. Representational dynamics quantified as slopes across blocks (ordinary least squares).
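The crossnobis RDM computation at the core of the RSA can be sketched in plain numpy. This is a simplified illustration, not the authors' pipeline: fold assignment here is a deterministic round-robin within condition, and the Ledoit-Wolf estimator is replaced by a fixed-shrinkage covariance (the `gamma` parameter is a hypothetical stand-in).

```python
import numpy as np

def shrinkage_cov(X, gamma=0.1):
    """Regularized channel covariance: convex blend with a scaled identity.
    A stand-in for the Ledoit-Wolf estimator (gamma fixed, not optimized)."""
    S = np.cov(X, rowvar=False)
    return (1 - gamma) * S + gamma * (np.trace(S) / S.shape[0]) * np.eye(S.shape[0])

def cv_mahalanobis_rdm(patterns, labels, n_folds=10):
    """Cross-validated Mahalanobis (crossnobis) RDM.

    patterns : (n_trials, n_channels) response patterns
    labels   : (n_trials,) integer condition per trial
    Cross-validating the two pattern differences makes the distance
    estimate unbiased (it can fall below zero under the null).
    """
    conds = np.unique(labels)
    n_cond = len(conds)
    # round-robin fold assignment within each condition keeps folds stratified
    folds = np.zeros(len(labels), dtype=int)
    for c in conds:
        idx = np.flatnonzero(labels == c)
        folds[idx] = np.arange(len(idx)) % n_folds
    rdm = np.zeros((n_cond, n_cond))
    for k in range(n_folds):
        tr, te = folds != k, folds == k
        Xtr, ytr = patterns[tr], labels[tr]
        Xte, yte = patterns[te], labels[te]
        m_tr = np.array([Xtr[ytr == c].mean(0) for c in conds])
        m_te = np.array([Xte[yte == c].mean(0) for c in conds])
        resid = Xtr - m_tr[np.searchsorted(conds, ytr)]   # training residuals
        prec = np.linalg.inv(shrinkage_cov(resid))        # noise precision
        for i in range(n_cond):
            for j in range(i + 1, n_cond):
                d = (m_tr[i] - m_tr[j]) @ prec @ (m_te[i] - m_te[j])
                rdm[i, j] += d / n_folds
                rdm[j, i] += d / n_folds
    return rdm
```

In the paper's pipeline, one such RDM would be computed per trial block and time point, then compared (Spearman) with the theoretical chunking RDM.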
Ideal observer model: A perceptron-like network predicting the next tone from the current tone (one-hot input, 12 categories). Weights W initialized uniformly at n^{-1}; prediction via softmax; cross-entropy loss with gradient ∂L/∂W = x_t^T (y_t − x_{t+1}). A dynamic learning rate ω, equal to the Shannon entropy of the predictive distribution, yields a precision-weighted prediction error PE_t = ω ∂L/∂W; weights are updated by gradient descent, W_{t+1} = W_t − PE_t. Prediction error trajectories were extracted per sequence.
Encoding analysis: Gaussian Copula Mutual Information (GCMI) computed between the prediction error trajectory (Y) and multivariate brain data (X), with variables inverse-normal transformed; baseline-corrected using the −50 to 0 ms window.
Partial Information Decomposition (PID): For pairs of parcels (X1, X2) and the prediction error trajectory (Y), the joint mutual information was decomposed into redundancy (R), unique (U), and synergy (S) terms using Gaussian parametric forms; 72×72 adjacency matrices for redundancy and synergy were baseline-corrected, and global efficiency (average across the matrix) and betweenness centrality were estimated.
Statistics: Mass-univariate cluster-based paired two-tailed permutation tests (α = 0.05, 10,000 iterations, maxsum cluster statistic, parcel neighborhood topology); paired two-tailed t tests for grand averages; Network-Based Statistics (NBS) for adjacency comparisons (α = 0.05, 10,000 iterations, connected-component size); right-tailed Pearson correlations with Bonferroni correction for selected clusters; model fitting (linear vs. exponential, y = ae^{−bx} + c) via nonlinear least squares, with BIC for model comparison and paired t tests on the winning model.
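The ideal observer's update rule is compact enough to sketch directly. The following is a minimal reading of the description above, with two labeled assumptions: entropy is taken in nats, and each precision-weighted error matrix is summarized by its Frobenius norm (the paper's exact scalarization of the PE trajectory is not specified here).

```python
import numpy as np

def ideal_observer(sequence, n_tones=12):
    """Entropy-weighted perceptron predicting the next tone.
    Sketch of the paper's ideal observer; entropy base and the scalar
    summary of PE (Frobenius norm) are assumptions."""
    W = np.full((n_tones, n_tones), 1.0 / n_tones)   # uniform init at n^-1
    pe_norm = []                                      # scalar PE magnitude per step
    for t in range(len(sequence) - 1):
        x = np.zeros(n_tones); x[sequence[t]] = 1.0   # one-hot current tone
        logits = W @ x
        y = np.exp(logits - logits.max()); y /= y.sum()      # softmax prediction
        target = np.zeros(n_tones); target[sequence[t + 1]] = 1.0
        grad = np.outer(y - target, x)                # dL/dW for cross-entropy
        omega = -(y * np.log(y + 1e-12)).sum()        # entropy = dynamic learning rate
        pe = omega * grad                             # precision-weighted prediction error
        W -= pe                                       # gradient-descent update
        pe_norm.append(np.linalg.norm(pe))
    return np.array(pe_norm), W
```

Running this on an HR-like sequence (four repeating triplets) versus an LR-like random sequence reproduces the qualitative result reported below: prediction error decays and settles lower under high regularity, because within-triplet transitions become deterministic and the entropy-scaled learning rate shrinks as predictions sharpen.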
• Robust tone-evoked responses: Peaks ~60 ms post-tone in bilateral auditory cortices and ~110 ms in left temporal cortex; tones were decodable in both sequences, peaking around 100 ms (p < 0.0001, cluster-corrected; peak Cohen's d = 1.46).
• Representational geometry shift (HR > LR): Within-triplet representational distances decreased across blocks around 120 ms post-tone onset, significantly more in HR than LR (p = 0.008, cluster-corrected; d = 0.79). Between-triplet distances showed a trend to decrease but no significant HR–LR slope difference (p > 0.05). Model-based RSA fit increased across blocks in HR near 120 ms and was significantly stronger than in LR (p = 0.038, cluster-corrected; d = 0.84). Searchlight RSA localized the strongest effects to dorsolateral prefrontal and temporal regions; the model-fit increase peaked in bilateral dorsolateral prefrontal and left temporal cortex.
• Prediction error encoding: The ideal observer captured the sequence statistics; HR yielded lower prediction error and higher accuracy. GCMI revealed significant encoding of prediction errors peaking ~100 ms (HR: 0–50 ms, p < 0.0001, d = 0.78; LR: 0–40 ms, p = 0.007, d = 0.94, and 50–260 ms, p < 0.0001, d = 1.72). Encoding strength did not differ between HR and LR (p > 0.05) and was broadly distributed across frontoparietal and temporal cortex, with maximal signals in right temporal cortex.
• Linking error encoding to representational shift: In left temporal cortex, prediction error encoding magnitude correlated with the representational shift across participants (r = 0.47, p = 0.021, Bonferroni-corrected); bilateral frontal cortices showed no significant correlation (r = 0.22, p = 0.287).
• Learning dynamics model comparison: The model's prediction error followed an exponential decay (BIC exponential/linear = −4685.2/−4105.1).
Representational dynamics did not favor exponential over linear fits at the whole-brain level (BIC −30.86/−29.99; p = 0.165; d = 0.28), in the left temporal cluster (BIC −26.40/−26.57; p = 0.622; d = 0.09), or in the frontal cluster (BIC −27.08/−27.43; p = 0.464; d = 0.14).
• Synergistic large-scale encoding: PID revealed substantially synergistic interactions encoding prediction errors across parcel pairs; synergy significantly exceeded redundancy in both LR (p = 0.035; d = 0.44) and HR (p = 0.038; d = 0.43). There were no HR–LR differences in redundancy (p = 0.549) or synergy (p = 0.502). Network-based statistics identified large-scale frontoparietal and temporal networks for both components, with hubs (betweenness centrality) in frontoparietal and temporal cortices (all p < 0.05, cluster-corrected).
• Synergy predicts representational shift: Betweenness centrality of synergy in left temporal cortex correlated with the representational shift (r = 0.47, p = 0.022, Bonferroni-corrected); there was no effect in frontal cortices (r = 0.03, p = 0.902). Redundancy centrality showed no significant correlations (left temporal r = 0.33, p = 0.113; frontal r = 0.17, p = 0.434).
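The synergy-versus-redundancy decomposition can be illustrated with a toy numpy sketch. The paper's exact Gaussian parametric PID is not reproduced here; the version below uses the minimal-mutual-information (MMI) redundancy, a common closed-form choice for Gaussian systems, applied after a Gaussian-copula rank transform in the spirit of GCMI. All function names are illustrative.

```python
import numpy as np
from statistics import NormalDist

def copula_normalize(x):
    """Gaussian-copula step of GCMI: replace values by standard-normal
    scores of their ranks (keeps rank structure, enforces Gaussian margins)."""
    ranks = np.argsort(np.argsort(x))
    u = (ranks + 1) / (len(x) + 1)
    nd = NormalDist()
    return np.array([nd.inv_cdf(v) for v in u])

def gauss_mi(C, ix, iy):
    """MI (nats) between Gaussian variable groups, from joint covariance C:
    I = 0.5 * (log|Cx| + log|Cy| - log|Cxy|)."""
    def ld(ii):
        return np.linalg.slogdet(C[np.ix_(ii, ii)])[1]
    return 0.5 * (ld(ix) + ld(iy) - ld(ix + iy))

def pid_mmi(x1, x2, y):
    """PID of I(X1,X2;Y) with minimal-MI redundancy:
       R = min(I(X1;Y), I(X2;Y)); Ui = I(Xi;Y) - R;
       S = I(X1,X2;Y) - U1 - U2 - R."""
    z = np.vstack([copula_normalize(v) for v in (x1, x2, y)])
    C = np.cov(z)
    i1 = gauss_mi(C, [0], [2])
    i2 = gauss_mi(C, [1], [2])
    i12 = gauss_mi(C, [0, 1], [2])
    r = min(i1, i2)
    return {"redundancy": r, "unique1": i1 - r, "unique2": i2 - r,
            "synergy": i12 - (i1 - r) - (i2 - r) - r}
```

An additive-mixing example (y = x1 + x2 + noise, with independent x1, x2) yields synergy well above redundancy: neither source alone pins down y, but jointly they nearly determine it. This is the kind of parcel-pair signature the results above report as dominant for prediction error encoding.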
Findings demonstrate that statistical learning reorganizes sensory representations to align with environmental regularities, producing a chunking effect whereby tones within predictable triplets become more similar. The representational shift emerges early (~120 ms), overlapping with pitch encoding, suggesting that, once established, the changes do not require sustained top-down control. Prediction error signals are encoded widely across sensory and high-level areas, peaking around 100 ms, consistent with continuous sequence paradigms and overlapping with pitch processing. Crucially, the strength of prediction error encoding, and especially its synergistic network interactions centered on temporal cortex, predicts the magnitude of representational updating, directly linking prediction error computation to representational change—a core premise of predictive coding. The dominance of synergy over redundancy indicates distributed, complementary processing rather than strictly modular encoding, aligning with theories emphasizing large-scale recurrent and feedback interactions. These results challenge strictly hierarchical predictive coding models in which errors are computed independently at each level, instead supporting network-level synergistic computation. The alignment between neural chunking dynamics and representational organization in trained deep networks suggests shared computational principles, potentially driven by efficiency constraints that favor clustering of predictable inputs to streamline processing.
The study provides direct evidence that predictive learning shapes the brain’s representational geometry by clustering predictable, temporally contiguous stimuli, and that this representational shift is linked to the encoding of prediction errors, particularly through synergistic large-scale interactions centered in temporal cortex. By integrating RSA, ideal observer modeling, information-theoretic encoding measures, and PID, the work connects prediction error signals to adaptive representational changes across sensory and high-level regions, supporting distributed network models of predictive processing. Future research should refine models to capture individual learning dynamics and probe the precise temporal form (linear vs. non-linear) of representational updating, as well as the causal roles of network hubs and hippocampal contributions in coordinating chunking across cortical hierarchies.
• Representational dynamics did not show a statistically significant advantage for an exponential model over a linear model, which may reflect limited sensitivity, fundamental differences between error signaling and representational updating, or simplifications in the ideal observer that do not capture individual learning trajectories.
• The experimental order of conditions was not counterbalanced (low regularity always preceded high regularity), which may introduce order or carryover effects.
• The ideal observer model is a minimal categorical, precision-weighted framework and may not fully capture all aspects of human predictive learning or inter-individual variability.