Brains and algorithms partially converge in natural language processing

Computer Science

C. Caucheteux and J. King

Explore how deep learning algorithms are beginning to mirror human brain activity during language processing. Researchers Charlotte Caucheteux and Jean-Rémi King examine the similarities that link computational linguistics and cognitive neuroscience, and show how modern language models can help explain how the brain processes natural language.
Introduction

The study asks whether deep language models process words and sentences like the human brain and, if so, which computational principles drive the similarity. Prior work shows that word embeddings and contextualized transformer activations map linearly onto brain responses, and that model-derived measures such as word surprisal and syntactic parsing correlate with brain signals. However, most studies used few subjects, emphasized spatial over temporal properties, and compared a small set of pretrained models whose properties co-vary, which obscures the causal factors. Here, the authors systematically compare many language transformers and their representations against human brain responses to sentences, recorded with fMRI and MEG in 102 subjects, to determine where and when models map onto brain activity and to disentangle the roles of architecture, training, and language performance in producing brain-like representations.

Literature Review

Previous neuroimaging work indicates partial convergence between NLP models and brain activity. High-dimensional word embeddings trained on lexical co-occurrence map linearly to brain responses elicited by isolated words and narratives. Contextualized activations from transformers improve mapping, especially in prefrontal, temporal, and parietal cortices. Computational measures from deep models, including word surprisal and syntactic constituent parsing, correlate with ERP components and fMRI signals. Nonetheless, past studies often involve small cohorts and mainly address spatial organization rather than temporal dynamics, and they typically compare a few pretrained models whose architecture, objectives, and training data are confounded, limiting identification of principles underlying brain-model similarity.

Methodology

Participants and stimuli: The authors analyzed an open dataset of 204 Dutch speakers and focused on the 102 right-handed subjects who performed a reading task during MEG and, in a separate session, fMRI. Subjects read 400 meaningful, isolated Dutch sentences (9–15 words each), presented one word at a time (mean word duration 351 ms; inter-stimulus interval 300 ms). Approximately 2700 words were presented per subject. 20% of sentences were followed by a yes/no question; these questions were excluded from analyses. Blocks of five sentences served as cross-validation groups, so that words from the same block never appear in both training and test sets, which avoids leakage.
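
The leakage-safe cross-validation can be sketched in Python with scikit-learn's GroupKFold. This is a minimal illustration, not the authors' code: sentence lengths are drawn at random between 9 and 15 words, the features and targets are random placeholders, and each word is simply assigned to the block of five consecutive sentences it belongs to.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)

# Hypothetical stimulus layout: 400 sentences, each 9-15 words long (lengths drawn at random).
n_sentences = 400
sentence_lengths = rng.integers(9, 16, size=n_sentences)  # upper bound is exclusive

# Each word inherits its sentence index; five consecutive sentences form one block (80 blocks).
word_sentence = np.repeat(np.arange(n_sentences), sentence_lengths)
word_block = word_sentence // 5

# Placeholder design matrix and target: one row per word.
X = rng.standard_normal((word_block.size, 16))
y = rng.standard_normal(word_block.size)

# GroupKFold keeps every block entirely in either the training or the test split.
folds = list(GroupKFold(n_splits=5).split(X, y, groups=word_block))
for train_idx, test_idx in folds:
    shared = set(word_block[train_idx]) & set(word_block[test_idx])
    print(f"train={train_idx.size} test={test_idx.size} shared_blocks={len(shared)}")
```

Every fold reports shared_blocks=0: because whole blocks move together, no sentence contributes words to both the fitted mapping and its evaluation.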

Neuroimaging acquisition and preprocessing: fMRI was acquired on a SIEMENS Trio 3T scanner (EPI-BOLD, TR 2.0 s, TE 35 ms, voxel size ~3.5×3.5×3.0 mm, 29 slices). Data were preprocessed with fMRIPrep; time series were detrended, deconfounded (motion, CompCor, and DCT regressors), and projected onto downsampled surface vertices (referred to as voxels). Two subjects were excluded from fMRI analyses (n = 100). MEG (CTF system) signals were band-pass filtered between 0.1 and 40 Hz, Maxwell filtered, clipped to the 0.01–99.99th percentiles, epoched from −500 to +2000 ms relative to word onset, and baseline-corrected; 273 MEG channels remained per subject. Source localization used a single-layer forward model and a dSPM inverse operator; because no empty-room recordings were available, the noise covariance was estimated from the −200 to 0 ms pre-stimulus baseline. Seven subjects were excluded from MEG analyses (n = 95).
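
The clipping, epoching, and baseline steps can be sketched with NumPy as below. This is an illustration under stated assumptions, not the authors' pipeline: band-pass and Maxwell filtering are omitted, percentiles are computed per channel over the continuous recording, the 100 Hz sampling rate and the word onsets are hypothetical, and the baseline window is assumed to be −200 to 0 ms, since the summary only states that epochs were baseline-corrected.

```python
import numpy as np

def clip_percentiles(raw, low=0.01, high=99.99):
    """Clip each channel of a (n_channels, n_times) array to its own percentile range."""
    lo = np.percentile(raw, low, axis=1, keepdims=True)
    hi = np.percentile(raw, high, axis=1, keepdims=True)
    return np.clip(raw, lo, hi)

def epoch_and_baseline(raw, onsets, sfreq, tmin=-0.5, tmax=2.0, baseline=(-0.2, 0.0)):
    """Cut word-locked epochs and subtract each channel's mean over the baseline window."""
    start, stop = int(round(tmin * sfreq)), int(round(tmax * sfreq))
    times = np.arange(start, stop + 1) / sfreq
    # Shape: (n_words, n_channels, n_times)
    epochs = np.stack([raw[:, o + start:o + stop + 1] for o in onsets])
    in_baseline = (times >= baseline[0]) & (times <= baseline[1])
    epochs = epochs - epochs[:, :, in_baseline].mean(axis=2, keepdims=True)
    return epochs, times

# Synthetic recording: 273 channels, 60 s at a hypothetical 100 Hz, with a DC offset.
rng = np.random.default_rng(0)
sfreq = 100.0
raw = rng.standard_normal((273, 6000)) + 5.0
clipped = clip_percentiles(raw)
onsets = np.arange(100, 5600, 65)  # hypothetical word onsets every 650 ms (in samples)
epochs, times = epoch_and_baseline(clipped, onsets, sfreq)
```

The 650 ms onset spacing roughly matches the 351 ms word duration plus 300 ms interval; after baseline correction, the DC offset is gone and each channel averages to zero over the pre-onset window.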

Models and embeddings: The study trained 36 transformer architectures (18 causal language models [CLM], 18 masked language models [MLM]) varying in depth (4, 8, or 12 layers), dimensionality (128, 256, or 512), and number of attention heads (4 or 8). Networks were trained on Dutch Wikipedia (278,386,651 words; Moses tokenization; 50,341-word vocabulary including all experimental words) using the XLM implementation with standard hyperparameters (e.g., GELU activations, Adam with an inverse square-root learning-rate schedule, learning rate 1e-4, dropout 0.1, attention dropout 0.1). Each network was checkpointed at 100 log-spaced training steps, up to ~4.5–5M updates (~35 epochs), with early stopping. This yielded 3600 model checkpoints (“models”) and 32,400 embeddings across layers (the input word embedding plus each contextual layer). Language performance was measured as top-1 word-prediction accuracy on a 180,883-word Dutch Wikipedia test set. Visual features were modeled by feeding rendered word images (100×32 px, Arial on a gray background) to a VGG-based text-recognition CNN pretrained on naturalistic word images, yielding an 888-dimensional “visual embedding.”
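
The model counts follow directly from the architecture grid. The sketch below enumerates the grid and recomputes the totals; the log-spaced checkpoint schedule is illustrative only, assuming steps from 1 to 4.5M updates, since the exact schedule is not given here.

```python
from itertools import product
import numpy as np

tasks = ["CLM", "MLM"]
depths = [4, 8, 12]
dims = [128, 256, 512]
heads = [4, 8]
n_checkpoints = 100

# Full factorial grid: 2 tasks x 3 depths x 3 widths x 2 head counts = 36 architectures.
architectures = list(product(tasks, depths, dims, heads))

# One embedding per contextual layer plus the input word embedding (depth + 1 per network).
embeddings_per_checkpoint = sum(depth + 1 for _, depth, _, _ in architectures)

n_models = len(architectures) * n_checkpoints
n_embeddings = embeddings_per_checkpoint * n_checkpoints

# Illustrative log-spaced checkpoint schedule (endpoints are assumptions, not from the paper).
steps = np.geomspace(1, 4.5e6, n_checkpoints)

print(len(architectures), n_models, n_embeddings)  # → 36 3600 32400
```

Each depth appears in 12 configurations (2 tasks × 3 widths × 2 head counts), so one checkpoint of the full grid yields 12 × (5 + 9 + 13) = 324 embeddings, and 100 checkpoints yield 32,400.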

Brain–brain noise ceiling: A shared response model (SRM) estimated explainable signal by predicting a given subject’s brain responses from the average responses of other subjects reading the same sentences. Ridge regression with nested CV and five GroupKFold splits (by 5-sentence blocks) produced per-voxel (fMRI) and per-sensor/time (MEG) “brain scores” (Pearson’s R between predicted and true responses), with FDR correction for multiple comparisons.
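
The brain–brain noise ceiling can be sketched as follows. This is a simplified stand-in for the authors' procedure, run on synthetic data: the shared response is approximated by the plain average of the other subjects, the inner cross-validation over the ridge penalty is delegated to scikit-learn's RidgeCV (efficient leave-one-out by default), and the block labels are hypothetical.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import GroupKFold

def pearson_per_column(a, b):
    """Pearson's R between matching columns (voxels) of two (n_samples, n_voxels) arrays."""
    a = (a - a.mean(0)) / a.std(0)
    b = (b - b.mean(0)) / b.std(0)
    return (a * b).mean(0)

def noise_ceiling(brains, subject, groups, n_splits=5):
    """brains: (n_subjects, n_samples, n_voxels). Predict one subject from the others' mean."""
    X = np.delete(brains, subject, axis=0).mean(axis=0)
    Y = brains[subject]
    Y_pred = np.zeros_like(Y)
    for train, test in GroupKFold(n_splits=n_splits).split(X, Y, groups):
        ridge = RidgeCV(alphas=np.logspace(-2, 4, 7)).fit(X[train], Y[train])
        Y_pred[test] = ridge.predict(X[test])
    return pearson_per_column(Y, Y_pred)

# Synthetic cohort: a shared signal plus strong subject-specific noise.
rng = np.random.default_rng(0)
n_subjects, n_samples, n_voxels = 10, 400, 20
shared = rng.standard_normal((n_samples, n_voxels))
brains = shared + 2.0 * rng.standard_normal((n_subjects, n_samples, n_voxels))
groups = np.arange(n_samples) // 25  # hypothetical blocks (16 groups)
scores = noise_ceiling(brains, subject=0, groups=groups)
print(scores.mean())
```

Because each subject's responses carry private noise, even the average of other brains predicts them only partially; that ceiling bounds the brain scores any model can reach.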

Network–brain mapping: For each subject, model, and layer, ℓ2-regularized linear mappings were fit from model activations X to brain responses Y. For fMRI, a 5-tap finite impulse response (2–10 s) modeled hemodynamic delays; for MEG, mappings were fit at each time sample independently. Encoders were evaluated with held-out correlations (“brain score” R). Gains ΔR between model levels (e.g., compositional minus lexical) quantified additional variance explained. Statistical significance used Wilcoxon signed-rank tests across subjects with FDR correction.
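
The fMRI encoding step can be sketched with NumPy as follows. Assumptions, all hypothetical: word-level embeddings are summed within each 2 s TR before lagging, the five FIR taps are delays of 1 to 5 TRs (2–10 s), the ridge penalty is fixed rather than cross-validated, and a single train/test split replaces the grouped five-fold scheme.

```python
import numpy as np

TR = 2.0  # seconds

def words_to_tr(word_features, onsets_s, n_trs, tr=TR):
    """Sum the features of all words whose onset falls within each TR (assumed aggregation)."""
    out = np.zeros((n_trs, word_features.shape[1]))
    np.add.at(out, np.floor(np.asarray(onsets_s) / tr).astype(int), word_features)
    return out

def fir_design(x_tr, delays=(1, 2, 3, 4, 5)):
    """Stack copies of x_tr delayed by 1-5 TRs, i.e. 2-10 s at TR = 2 s."""
    n, d = x_tr.shape
    lagged = np.zeros((n, d * len(delays)))
    for k, lag in enumerate(delays):
        lagged[lag:, k * d:(k + 1) * d] = x_tr[:n - lag]
    return lagged

def ridge_fit_predict(X_train, Y_train, X_test, alpha=10.0):
    """Closed-form l2-regularized least squares without intercept."""
    gram = X_train.T @ X_train + alpha * np.eye(X_train.shape[1])
    W = np.linalg.solve(gram, X_train.T @ Y_train)
    return X_test @ W

def pearson_per_column(a, b):
    a = (a - a.mean(0)) / a.std(0)
    b = (b - b.mean(0)) / b.std(0)
    return (a * b).mean(0)

# Synthetic run: 1200 words over 300 TRs, 8-d embeddings, 15 voxels.
rng = np.random.default_rng(0)
n_trs, dim, n_vox = 300, 8, 15
onsets = np.sort(rng.uniform(0, n_trs * TR, size=1200))
embeddings = rng.standard_normal((1200, dim))
x_tr = words_to_tr(embeddings, onsets, n_trs)
X = fir_design(x_tr)
Y = X @ rng.standard_normal((X.shape[1], n_vox)) + 5.0 * rng.standard_normal((n_trs, n_vox))

half = n_trs // 2
Y_hat = ridge_fit_predict(X[:half], Y[:half], X[half:])
R = pearson_per_column(Y[half:], Y_hat)  # one "brain score" per voxel
```

The FIR taps let the linear model learn its own hemodynamic delay profile per voxel instead of assuming a fixed response shape. For MEG, the same ridge mapping would instead be fit separately at each time sample, with no lagging.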

Convergence and feature importance analyses: To relate model properties to brain similarity, the authors computed correlations between brain scores and (i) language performance (top-1 accuracy) or (ii) training step across embeddings, and performed permutation feature importance using a Random Forest to predict average brain scores from features: task (CLM vs MLM), number of heads, layers, dimensionality, training step, language accuracy, and relative layer position (0=input embedding, 1=final layer). Importance was summarized as ΔR, the decrease in Random Forest predictive correlation when shuffling a feature (50 permutations). Region-of-interest analyses used PALS Brodmann and Destrieux atlases. Multiple comparison corrections and error bars followed standard procedures (FDR; 95% CIs across subjects).
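
Permutation feature importance, with ΔR as the drop in predictive correlation, can be sketched with scikit-learn's RandomForestRegressor. The feature table and target below are synthetic and hypothetical (the target is built to depend mostly on accuracy and to peak at middle layers), so the resulting ranking illustrates the method, not the paper's result.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 2000
features = {
    "task_is_mlm": rng.integers(0, 2, n),
    "n_heads": rng.choice([4, 8], n),
    "n_layers": rng.choice([4, 8, 12], n),
    "dim": rng.choice([128, 256, 512], n),
    "log10_step": rng.uniform(0, np.log10(4.5e6), n),
    "accuracy": rng.uniform(0, 0.46, n),
    "rel_layer": rng.uniform(0, 1, n),  # 0 = input embedding, 1 = final layer
}
names = list(features)
X = np.column_stack([features[k] for k in names]).astype(float)

# Hypothetical target: rises with accuracy, peaks at middle layers, plus noise.
acc, rel = X[:, names.index("accuracy")], X[:, names.index("rel_layer")]
y = 0.1 * acc - 0.02 * (rel - 0.5) ** 2 + 0.002 * rng.standard_normal(n)

train, test = slice(0, 1500), slice(1500, n)
forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X[train], y[train])

def pearson(a, b):
    return np.corrcoef(a, b)[0, 1]

r_full = pearson(y[test], forest.predict(X[test]))
importance = {}
for j, name in enumerate(names):
    drops = []
    for _ in range(50):  # 50 permutations per feature
        X_perm = X[test].copy()
        X_perm[:, j] = rng.permutation(X_perm[:, j])
        drops.append(r_full - pearson(y[test], forest.predict(X_perm)))
    importance[name] = float(np.mean(drops))  # ΔR for this feature

for name in sorted(importance, key=importance.get, reverse=True):
    print(f"{name:12s} ΔR = {importance[name]:.3f}")
```

Shuffling a feature on held-out data breaks its link to the target while keeping its marginal distribution, so the resulting ΔR measures how much the fitted forest relies on that feature.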

Key Findings
  • Trained model embeddings significantly map onto brain responses above chance across subjects and modalities (all p < 1e-9). For compositional embeddings, fMRI and MEG brain scores reached R = 0.048 and R = 0.041, respectively, comparable to or exceeding the SRM noise ceiling (fMRI: R = 0.060; MEG: R = 0.020).
  • Spatial hierarchy (fMRI): Visual embeddings best predict V1 (R = 0.022 ± 0.003, p < 1e-11). Lexical embeddings peak in left superior temporal gyrus (R = 0.052 ± 0.004, p < 1e-13), inferior temporal cortex and middle frontal gyrus (R = 0.053 ± 0.003, p < 1e-15) and are significant across the reading/language network. Compositional embeddings significantly outperform lexical embeddings in STG (ΔR = 0.012 ± 0.001, p < 1e-16), angular gyrus (ΔR = 0.010 ± 0.001, p < 1e-16), infero-frontal cortex (ΔR = 0.016 ± 0.001, p < 1e-16), and dorsolateral prefrontal cortex (ΔR = 0.012 ± 0.001, p < 1e-13). Effects are left-lateralized but broadly bilateral (left–right ΔR = 0.010 ± 0.001, p < 1e-14).
  • Temporal dynamics (MEG): Visual embedding scores peak around 100 ms in V1 (R = 0.008 ± 0.002, p < 1e-3), lexical gains emerge around 200 ms in left posterior fusiform and peak ~400 ms across left temporal and frontal cortices. Compositional gains are widespread bilaterally and peak around 1 s after word onset; later periods show feedback-like effects where high-level embeddings predict activity in early visual areas (e.g., V1 visual vs word ΔR = 0.016 ± 0.002, p < 1e-10).
  • Across 36 transformer architectures, brain scores follow an inverted U across layers: middle layers outperform input and output layers (fMRI middle vs output ΔR = 0.011 ± 0.001, p < 1e-18; middle vs input ΔR = 0.031 ± 0.001, p < 1e-18; analogous for MEG).
  • Random (untrained) embeddings still yield significant brain scores (fMRI mean R = 0.019 ± 0.001; MEG mean R = 0.018 ± 0.0008; all p < 1e-16), indicating some fortuitous similarity independent of language proficiency.
  • Brain scores strongly correlate with language performance: MEG average Pearson R = 0.77 ± 0.01; fMRI R = 0.57 ± 0.02 across subjects. Correlations are higher for middle layers (fMRI: 0.81 ± 0.02; MEG: 0.86 ± 0.01) than for input or output layers. Voxel-wise, middle-layer correlations exceed R = 0.85 in superior temporal sulcus, infero-frontal, fusiform, and angular gyri.
  • Peak brain similarity does not occur at maximal language accuracy: for CLM, best brain scores occur at ~43% (MEG) and ~32% (fMRI) word-prediction accuracy, whereas the very best models reach 46% accuracy but yield smaller brain scores.
  • Permutation feature importance identifies language performance as the dominant predictor of brain similarity (ΔR: fMRI 0.56 ± 0.01; MEG 0.51 ± 0.02), surpassing training step and layer position; nevertheless, architecture and training factors (depth, width, heads, task) contribute significantly (p < 1e-16).
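
The subject-level statistics behind these ΔR comparisons (Wilcoxon signed-rank tests across subjects, FDR-corrected) can be sketched with SciPy plus a hand-written Benjamini–Hochberg step. The ROI names, effect sizes, and noise level are synthetic, and Benjamini–Hochberg is assumed as the FDR procedure because the summary does not name one.

```python
import numpy as np
from scipy.stats import wilcoxon

def fdr_bh(pvals, alpha=0.05):
    """Benjamini-Hochberg: return a rejection mask and adjusted p-values."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity: each adjusted p is the minimum over larger ranks.
    adjusted = np.minimum.accumulate(ranked[::-1])[::-1]
    p_adj = np.empty(m)
    p_adj[order] = np.minimum(adjusted, 1.0)
    return p_adj <= alpha, p_adj

# Synthetic per-subject gains (e.g. compositional minus lexical) for 100 subjects x 4 ROIs.
rng = np.random.default_rng(0)
rois = ["STG", "angular", "IFG", "control"]
effects = np.array([0.01, 0.01, 0.01, 0.0])  # hypothetical true mean gains
delta_r = effects + 0.01 * rng.standard_normal((100, len(rois)))

pvals = [wilcoxon(delta_r[:, k]).pvalue for k in range(len(rois))]
reject, p_adj = fdr_bh(pvals)
for roi, p, r in zip(rois, p_adj, reject):
    print(f"{roi:8s} p_FDR = {p:.2e} significant = {r}")
```

The signed-rank test makes no normality assumption about the per-subject ΔR values, which suits small, noisy correlation differences.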
Discussion

The results indicate that deep language models and the human brain share representational structure during sentence processing. Linear mappings from model activations to brain signals reveal a hierarchy: early visual responses are best captured by visual features; around 200–400 ms, lexical representations dominate in fusiform and temporal regions and persist for seconds; later (>800 ms), compositional representations become prominent across bilateral language areas, including inferior frontal and anterior temporal cortices. Middle layers of transformers, which likely balance abstraction and task relevance, best align with brain activity. Crucially, the extent of brain–model similarity scales with a model’s ability to predict words from context, suggesting that optimizing predictive objectives leads models to converge toward brain-like representations. However, the highest-accuracy models can show reduced brain similarity, potentially reflecting overfitting to objectives that diverge from the brain’s multifaceted goals. Observed brain–model correlations are numerically small but statistically robust and near estimated noise ceilings given single-sample, single-voxel/sensor analyses, indicating that measurement noise and analytical granularity limit absolute scores. Differences in architecture (feedforward transformers vs. brain’s recurrent, grounded learning) and training regimes (massive text corpora vs. limited, grounded experiences) likely explain residual discrepancies and point to avenues for developing models that better capture human language processing.

Conclusion

This study provides two key contributions: (1) it demonstrates that activations from modern language models, particularly the middle layers, map significantly and hierarchically onto human brain responses to written sentences across space and time; and (2) it shows that a model’s ability to predict words from context is the primary factor driving brain-like representations, above and beyond architecture or training duration. These findings support the view that predictive training objectives induce partial convergence toward brain-like computations. Future work should improve signal-to-noise with larger and more naturalistic datasets, dissect convergent features into interpretable linguistic components, explore alternative or multi-objective learning schemes (e.g., hierarchical and long-range prediction, grounding), and investigate architectures and training paradigms closer to human learning (e.g., recurrent/interactive systems with limited, grounded input).

Limitations
  • Low absolute brain–model correlation values due to neuroimaging noise and single-sample, single-voxel/sensor analyses; noise ceilings constrain maximum attainable scores.
  • Stimuli were isolated sentences rather than continuous narratives, potentially affecting temporal dynamics and ecological validity.
  • Training–brain comparisons were limited to Dutch text and reading; generalization to other languages/modalities (listening, multimodal grounding) remains untested.
  • The best-performing language models exhibited decreased brain similarity at peak accuracies, suggesting potential objective mismatch or overfitting.
  • Architectural differences (feedforward transformers) and non-grounded large-scale text training differ from the brain’s recurrent, grounded learning, limiting direct comparability.
  • MEG noise covariance was estimated from baseline periods due to lack of empty-room recordings, which may affect source estimates.
  • Potential covariance between low- and high-level features (e.g., positional or frequency cues) may contribute to mappings; while gains ΔR were assessed, full orthogonalization was not the primary approach.