Linguistics and Languages
The structure and statistics of language jointly shape cross-frequency neural dynamics during spoken language comprehension
H. Weissbart and A. E. Martin
This study by Hugo Weissbart and Andrea E. Martin examines how the brain comprehends speech despite acoustic variability. Using MEG recordings, the authors show that both structural and statistical properties of language contribute to neural dynamics during spoken language comprehension.
Introduction
Spoken language comprehension is robust despite acoustic variability from noise and speaker differences. Predictive processing frameworks propose that the brain uses prior linguistic knowledge to anticipate incoming information at multiple levels (from phonemes to words and structures). Human language also exhibits nested syntactic structures over which meaning is computed, raising a longstanding debate about the roles of hierarchical structure versus surface statistics in comprehension. Rather than treating these as opposing accounts, this study tests whether structural (rule-based) and statistical (data-driven) cues jointly contribute to neural responses during naturalistic speech comprehension. The authors hypothesize that both syntactic and statistical features explain variance in MEG data, potentially with overlapping spatio-temporal sources, and that their integration is coordinated via cross-frequency coupling (CFC), with low-frequency phase modulating higher-frequency power. Specifically, the brain’s responses to statistical uncertainty (entropy) and prediction error (surprisal) are expected to interact with syntactic integration signals (tree depth, number of closing constituents), potentially at distinct timescales (delta, theta, beta, gamma), reflecting predictive and integrative operations during continuous listening.
Literature Review
Prior work has linked both hierarchical structure and statistical predictability to neuroimaging signals. Brennan et al. showed that BOLD signals are sensitive to both structure and surprisal, but fMRI's limited temporal resolution constrains what such studies reveal about timing. Studies using MEG/EEG have demonstrated low-frequency neural tracking of the speech envelope and potential roles for delta/theta rhythms in segmentation and linguistic processing, though whether this reflects entrained oscillations or evoked responses remains debated. Ding et al. reported cortical tracking of hierarchical linguistic structures. Information-theoretic metrics (surprisal, entropy) derived from predictive models have been shown to modulate cortical activity during comprehension. Donhauser & Baillet identified distinct roles for theta and delta rhythms in predictive processing (information sampling vs. model updating). Prior studies of syntactic operations used features such as node counts, tree depth, and closing brackets, and have linked syntactic processing to left inferior frontal and anterior temporal regions. However, the interplay of structural and statistical features across frequency bands, and their modulation of phase-amplitude coupling during naturalistic comprehension, had not been examined with MEG.
Methodology
Participants: 25 right-handed native Dutch speakers (18 women; aged 18–58 years) with no reported fluency in French; all gave informed consent (ethics approval: CMO2014/288).
Task and stimuli: Participants listened to Dutch (≈49 min) and French (≈21 min) audiobook stories, presented naturalistically in 5–7 min parts, while maintaining fixation. Comprehension and attention were probed between parts with multiple-choice questions.
MEG acquisition: 275-channel CTF whole-head system, 1200 Hz sampling, magnetically shielded room; head position monitored with HPI coils; T1 MRI acquired for source modeling; head shape digitized for co-registration.
Preprocessing: Data resampled to 200 Hz after anti-aliasing low-pass; noisy/flat channels identified; ICA (1–40 Hz) used to remove EOG/ECG artifacts; 50 Hz notch filter applied.
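A minimal sketch of the resampling and line-noise steps on a single toy channel follows. SciPy's `resample_poly` (which applies its own anti-aliasing FIR before decimating) and a zero-phase `iirnotch` filter with Q = 30 are illustrative choices, not the authors' exact filters; channel rejection and ICA are omitted.

```python
import numpy as np
from scipy.signal import filtfilt, iirnotch, resample_poly

SFREQ_RAW = 1200.0  # MEG acquisition rate given in the text (Hz)
SFREQ_NEW = 200.0   # analysis rate after downsampling (Hz)


def downsample(x, sfreq_in=SFREQ_RAW, sfreq_out=SFREQ_NEW):
    """Downsample along the last axis; resample_poly low-pass filters
    (anti-aliasing FIR) before decimating."""
    factor = int(round(sfreq_in / sfreq_out))  # 1200 -> 200 Hz: factor 6
    return resample_poly(x, up=1, down=factor, axis=-1)


def notch_50hz(x, sfreq, quality=30.0):
    """Zero-phase IIR notch at the 50 Hz mains frequency (Q is an assumption)."""
    b, a = iirnotch(w0=50.0, Q=quality, fs=sfreq)
    return filtfilt(b, a, x, axis=-1)


# Toy channel: 10 Hz rhythm + 50 Hz line noise + white noise, 10 s at 1200 Hz
rng = np.random.default_rng(0)
t = np.arange(int(10 * SFREQ_RAW)) / SFREQ_RAW
raw = (np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 50 * t)
       + 0.1 * rng.standard_normal(t.size))
down = downsample(raw)
clean = notch_50hz(down, SFREQ_NEW)
print(clean.shape)  # (2000,)
```

Downsampling first keeps the notch filter cheap; at 200 Hz the 50 Hz line is still well below the 100 Hz Nyquist frequency.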
Source reconstruction: Data covariance computed; noise-normalized LCMV beamformer used on a cortical grid (7 mm, 3 mm spacing) via MNE-Python.
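The core of an LCMV beamformer is a spatial filter computed from the data covariance and the lead field at each grid point. The NumPy sketch below shows a scalar (fixed-orientation) filter with either unit-gain or unit-noise-gain normalization. The regularization fraction, the random toy lead field, and treating unit-noise-gain as the "noise-normalized" variant are assumptions. In practice, MNE-Python's `mne.beamformer.make_lcmv` handles orientation, regularization and weight normalization.

```python
import numpy as np


def lcmv_weights(leadfield, data_cov, reg=0.05, unit_noise_gain=True):
    """Scalar LCMV spatial filter for one source location.

    leadfield : (n_channels,) field of a fixed-orientation dipole (assumed known)
    data_cov  : (n_channels, n_channels) sensor covariance
    reg       : diagonal loading as a fraction of mean sensor variance (assumed)
    """
    n = data_cov.shape[0]
    cov = data_cov + reg * np.trace(data_cov) / n * np.eye(n)
    cinv_l = np.linalg.solve(cov, leadfield)  # C^{-1} l
    if unit_noise_gain:
        # w = C^{-1} l / sqrt(l^T C^{-2} l), so that w^T w = 1
        return cinv_l / np.sqrt(cinv_l @ cinv_l)
    # w = C^{-1} l / (l^T C^{-1} l), so that w^T l = 1 (unit gain)
    return cinv_l / (leadfield @ cinv_l)


# Toy check: one 4 Hz source seen by 20 sensors plus sensor noise
rng = np.random.default_rng(1)
n_ch, n_t, sfreq = 20, 5000, 200.0
lf = rng.standard_normal(n_ch)
source = np.sin(2 * np.pi * 4 * np.arange(n_t) / sfreq)
sensors = np.outer(lf, source) + 0.5 * rng.standard_normal((n_ch, n_t))
w = lcmv_weights(lf, np.cov(sensors))
estimate = w @ sensors  # beamformer time course at the source location
```

Unit-gain filters pass the source unchanged but amplify noise at deep locations. Unit-noise-gain scaling equalizes projected noise across the grid, which makes values comparable between locations.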
Time-frequency analyses: Morlet wavelets (2–7 cycles; 3–80 Hz, 32 log-spaced steps) computed per word-epoch; extracted power and inter-word phase clustering.
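The sketch below makes the time-frequency step concrete. It builds a complex Morlet wavelet and computes inter-word phase clustering (the length of the mean unit phase vector across word epochs) at one frequency. The log-spaced 3 to 80 Hz grid follows the text. The linear spacing of cycles from 2 to 7 and the unit-energy normalization are assumptions. This is the classic epoch-averaging definition of ITPC; the text also describes a TRF-based estimate of word-locked phase.

```python
import numpy as np

FREQS = np.geomspace(3, 80, 32)   # 3-80 Hz in 32 log-spaced steps (from the text)
N_CYCLES = np.linspace(2, 7, 32)  # 2-7 cycles; linear spacing is an assumption


def morlet(freq, sfreq, n_cycles):
    """Complex Morlet wavelet with unit energy."""
    sigma_t = n_cycles / (2 * np.pi * freq)  # temporal SD (s)
    t = np.arange(-5 * sigma_t, 5 * sigma_t, 1 / sfreq)
    wavelet = np.exp(2j * np.pi * freq * t) * np.exp(-t**2 / (2 * sigma_t**2))
    return wavelet / np.sqrt(np.sum(np.abs(wavelet) ** 2))


def itpc(epochs, freq, sfreq, n_cycles):
    """Inter-word phase clustering at one frequency.

    epochs : (n_words, n_times) word-locked signal; each epoch must be
             longer than the wavelet so that mode="same" keeps n_times.
    Returns (n_times,) values between 0 (random phase) and 1 (identical phase).
    """
    wavelet = morlet(freq, sfreq, n_cycles)
    coefs = np.array([np.convolve(ep, wavelet, mode="same") for ep in epochs])
    unit_phase = np.exp(1j * np.angle(coefs))
    return np.abs(unit_phase.mean(axis=0))


# Toy data: 100 word epochs (-0.4 to 0.9 s at 200 Hz) with a 4 Hz component
rng = np.random.default_rng(2)
sfreq = 200.0
times = np.arange(-80, 180) / sfreq
locked = np.sin(2 * np.pi * 4 * times) + rng.standard_normal((100, times.size))
offsets = rng.uniform(0, 2 * np.pi, (100, 1))  # a different phase on every word
random_phase = (np.sin(2 * np.pi * 4 * times + offsets)
                + rng.standard_normal((100, times.size)))
itpc_locked = itpc(locked, 4.0, sfreq, n_cycles=3)
itpc_random = itpc(random_phase, 4.0, sfreq, n_cycles=3)
```

Induced power follows from the same coefficients as `np.abs(coefs) ** 2`, averaged over words after any baseline correction.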
Stimulus features: Two rule-based syntactic features from constituency parses (Stanford parser): (1) Depth (syntactic tree depth per word), (2) Close (number of closing constituents at the word). Two statistical features from GPT-2 next-word distributions: (3) Surprisal (−ln P(word|context)), (4) Entropy (uncertainty of next-word distribution; expected surprisal). Control regressors: acoustic envelope (half-wave rectified, low-pass FIR, compressive exponent 1/2) and word onset comb.
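The four word-level features can be computed from simple inputs, as sketched below. The sketch assumes the language model's next-word probabilities are already available as a NumPy vector; in the study they come from GPT-2, and here a toy vector stands in. It also assumes the parse is a Penn-Treebank-style bracketed string. Counting Depth as the number of phrasal nodes dominating a word, and Close as the number of phrasal nodes that end right after it, is an assumed operationalization rather than the authors' code.

```python
import re

import numpy as np


def surprisal(next_word_probs, word_index):
    """Surprisal of the word that occurred: -ln P(word | context), in nats."""
    return -np.log(next_word_probs[word_index])


def entropy(next_word_probs):
    """Entropy of the next-word distribution (its expected surprisal), in nats."""
    p = next_word_probs[next_word_probs > 0]
    return -np.sum(p * np.log(p))


def depth_and_close(parse):
    """Per-word (word, depth, close) from a bracketed constituency parse.

    Depth: phrasal nodes open above the word's part-of-speech node.
    Close: phrasal nodes that end right after the word.
    Both definitions are assumed operationalizations.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", parse)
    depth, rows, i = 0, [], 0
    while i < len(tokens):
        if tokens[i] == "(":
            if tokens[i + 2] != "(" and tokens[i + 3] == ")":
                # Preterminal "(TAG word)": emit the word at the current depth
                rows.append([tokens[i + 2], depth, 0])
                i += 4
            else:
                depth += 1  # phrasal node opens: skip "(" and its label
                i += 2
        else:  # ")" closing a phrasal node
            depth -= 1
            rows[-1][2] += 1
            i += 1
    return [tuple(r) for r in rows]


probs = np.array([0.5, 0.25, 0.125, 0.125])  # toy next-word distribution
print(round(float(surprisal(probs, 1)), 3))  # 1.386 (= ln 4)
print(round(float(entropy(probs)), 3))       # 1.213 (= 1.75 bits)
print(depth_and_close("(S (NP (DT the) (NN dog)) (VP (VBD barked)))"))
# [('the', 2, 0), ('dog', 2, 1), ('barked', 2, 2)]
```

In the toy parse, "barked" closes both the VP and the S, so its Close value is 2.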
Encoding models (TRFs): Forward models estimated via ridge regression (Tikhonov), mapping feature time series (with lags) to MEG sensors/sources. Leave-one-story-out cross-validation used; performance evaluated by Pearson correlation between predicted and observed signals. Null models constructed by shuffling feature values across words while preserving onset timings to control for feature count and temporal structure.
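The encoding pipeline can be sketched in NumPy in four steps:

1. Build a lagged design matrix from an impulse train at word onsets, scaled by the feature value.
2. Solve the ridge equations.
3. Score each held-out story by Pearson correlation.
4. Compare the scores against a null model in which feature values are permuted across words while onsets stay fixed.

The lag range, ridge parameter, toy response kernel and simulated "stories" are hypothetical. A real analysis would also tune the ridge parameter and fit many channels at once.

```python
import numpy as np


def lag_matrix(x, lags):
    """Columns are copies of x (n_times,) delayed by each lag (in samples).
    Positive lags model responses after the word; negative lags, anticipation."""
    X = np.zeros((x.size, len(lags)))
    for j, lag in enumerate(lags):
        if lag >= 0:
            X[lag:, j] = x[:x.size - lag]
        else:
            X[:lag, j] = x[-lag:]
    return X


def fit_ridge(X, y, alpha):
    """Tikhonov-regularized least squares: (X^T X + alpha I)^{-1} X^T y."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)


def word_regressor(onsets, values, n_times):
    """Impulse train: the feature value placed at each word-onset sample."""
    x = np.zeros(n_times)
    x[onsets] = values
    return x


def loso_scores(stories, lags, alpha, rng=None):
    """Leave-one-story-out Pearson r. If rng is given, word values are
    shuffled within each story (null model with onset timings preserved)."""
    designs, targets = [], []
    for onsets, values, y in stories:
        vals = values if rng is None else rng.permutation(values)
        designs.append(lag_matrix(word_regressor(onsets, vals, y.size), lags))
        targets.append(y)
    scores = []
    for k in range(len(stories)):
        train = [i for i in range(len(stories)) if i != k]
        w = fit_ridge(np.vstack([designs[i] for i in train]),
                      np.concatenate([targets[i] for i in train]), alpha)
        scores.append(np.corrcoef(designs[k] @ w, targets[k])[0, 1])
    return np.array(scores)


# Simulated data: 4 "stories", one sensor, 200 Hz, true TRF peaking at +100 ms
rng = np.random.default_rng(3)
lags = np.arange(-20, 60)  # -100 to +295 ms
true_trf = np.exp(-((lags - 20) / 8.0) ** 2)
stories = []
for _ in range(4):
    n_times = 4000
    onsets = np.sort(rng.choice(np.arange(100, n_times - 100), 120, replace=False))
    values = rng.standard_normal(120)  # stands in for a z-scored word feature
    x = word_regressor(onsets, values, n_times)
    y = lag_matrix(x, lags) @ true_trf + 0.3 * rng.standard_normal(n_times)
    stories.append((onsets, values, y))

real = loso_scores(stories, lags, alpha=1.0)
null = loso_scores(stories, lags, alpha=1.0, rng=rng)
```

Because the null keeps the same onsets and number of regressors, any gain of `real` over `null` reflects the feature values themselves rather than word timing alone.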
Spectral/coherence measures: Power spectral density (PSD) and cerebro-acoustic coherence (magnitude squared coherence with speech envelope) computed; cluster-based permutation tests used for significance.
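Cerebro-acoustic coherence can be illustrated with SciPy's Welch-based `scipy.signal.coherence`, which returns magnitude-squared coherence. In this toy example the envelope and the "MEG" channel are synthetic, and the filter and segment lengths are arbitrary. Coherence is high only in the band where the envelope carries power.

```python
import numpy as np
from scipy.signal import butter, coherence, sosfiltfilt

rng = np.random.default_rng(4)
sfreq = 200.0
n = int(300 * sfreq)  # 5 minutes of toy data

# Toy "speech envelope": white noise low-passed below 8 Hz
sos = butter(4, 8.0, btype="low", fs=sfreq, output="sos")
envelope = sosfiltfilt(sos, rng.standard_normal(n))

# Toy MEG channel: the envelope delayed by 100 ms plus broadband sensor noise
meg = np.roll(envelope, int(0.1 * sfreq)) + 0.05 * rng.standard_normal(n)

# Magnitude-squared coherence with 4 s Welch segments
freqs, msc = coherence(envelope, meg, fs=sfreq, nperseg=int(4 * sfreq))
delta_theta = msc[(freqs >= 1) & (freqs <= 8)].mean()
beta = msc[(freqs >= 15) & (freqs <= 30)].mean()
```

Coherence is insensitive to a constant lag, so a delayed neural copy of the envelope still yields values near 1. This is one reason high coherence alone cannot separate comprehension from acoustic following.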
Word-locked phase/power: Inter-word phase clustering (ITPC) and induced power modulation around word onsets computed using TRFs fit to band-limited complex signals and power, respectively.
Phase-amplitude coupling (PAC): (1) Global PAC scanned across phase (delta/theta) and amplitude (beta/gamma) frequencies using Tensorpac with surrogate-based z-scoring; tested against the French condition. (2) Novel TRF-based, feature-dependent PAC: compute the complex signal r_high(t) e^{iφ_low(t)}, which combines the high-frequency amplitude envelope r_high with the low-frequency phase φ_low, and fit complex-valued TRFs to assess how specific features modulate PAC over time; validated with simulations. PAC assessed per feature against its shuffled null via cluster-based permutation; source-space projections localized effects (Destrieux parcellation).
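The coupling signal described in (2) can be sketched with SciPy. Band-pass the data, take the Hilbert analytic signal, and combine high-frequency amplitude with low-frequency phase. The sketch computes a mean-vector-length PAC estimate z-scored against circular-shift surrogates. This is a common surrogate scheme, not necessarily the one Tensorpac used here. The sketch also shows how complex-valued ridge weights could be fit to z(t). Filter orders, band edges, the surrogate scheme and the simulated signals are assumptions.

```python
import numpy as np
from scipy.signal import butter, hilbert, sosfiltfilt


def bandpass(x, lo, hi, sfreq, order=4):
    """Zero-phase Butterworth band-pass."""
    sos = butter(order, [lo, hi], btype="band", fs=sfreq, output="sos")
    return sosfiltfilt(sos, x)


def coupling_signal(x, sfreq, phase_band, amp_band):
    """z(t) = r_high(t) * exp(i * phi_low(t)) from band-passed analytic signals."""
    phi_low = np.angle(hilbert(bandpass(x, *phase_band, sfreq)))
    r_high = np.abs(hilbert(bandpass(x, *amp_band, sfreq)))
    return r_high * np.exp(1j * phi_low)


def mvl_zscore(x, sfreq, phase_band, amp_band, n_surr=200, rng=None):
    """Mean vector length |mean z(t)|, z-scored against surrogates in which
    the amplitude envelope is circularly shifted relative to the phase."""
    rng = np.random.default_rng() if rng is None else rng
    z = coupling_signal(x, sfreq, phase_band, amp_band)
    amp, unit_phase = np.abs(z), np.exp(1j * np.angle(z))
    observed = np.abs(np.mean(z))
    shifts = rng.integers(int(sfreq), x.size - int(sfreq), n_surr)
    surr = np.array([np.abs(np.mean(np.roll(amp, s) * unit_phase)) for s in shifts])
    return (observed - surr.mean()) / surr.std()


def fit_complex_trf(X, z, alpha):
    """Ridge regression of complex z(t) on real lagged word features X.
    Complex weights encode both coupling strength and preferred phase.
    (For complex X, X.T would have to become X.conj().T.)"""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ z)


# Toy data: 20 Hz "beta" whose amplitude follows the phase of a 1-4 Hz rhythm
rng = np.random.default_rng(5)
sfreq = 200.0
n = int(120 * sfreq)
delta = bandpass(rng.standard_normal(n), 1, 4, sfreq)
delta /= delta.std()
carrier = np.sin(2 * np.pi * 20 * np.arange(n) / sfreq)
modulation = 1 + 0.8 * np.cos(np.angle(hilbert(delta)))
coupled = delta + 0.5 * modulation * carrier + 0.2 * rng.standard_normal(n)
uncoupled = delta + 0.5 * carrier + 0.2 * rng.standard_normal(n)
z_coupled = mvl_zscore(coupled, sfreq, (1, 4), (15, 30), rng=rng)
z_uncoupled = mvl_zscore(uncoupled, sfreq, (1, 4), (15, 30), rng=rng)
```

In a feature-dependent analysis, X would be a lagged word-feature design matrix, such as the one used for the TRFs, and `fit_complex_trf` would replace the single global statistic.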
Statistics: Cluster-based permutation tests for PSD/coherence/ITPC/power; LMMs and paired t-tests (FDR/Bonferroni corrections) for reconstruction scores; spatio-temporal cluster tests in source space with FDR where applicable.
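Cluster-based permutation testing carries much of the inferential load here, so a minimal one-dimensional version may help. The test works in three steps:

1. Threshold the paired t-values.
2. Sum t within each contiguous supra-threshold run.
3. Compare each cluster's mass with the maximum cluster mass obtained after randomly flipping the sign of each subject's difference.

The threshold, number of permutations and simulated data are assumptions. MNE-Python's `mne.stats.permutation_cluster_1samp_test` implements this family of tests for multidimensional data.

```python
import numpy as np
from scipy import stats


def _runs(mask):
    """(start, stop) index pairs of consecutive True values."""
    edges = np.flatnonzero(np.diff(np.concatenate([[0], mask.astype(int), [0]])))
    return list(zip(edges[::2], edges[1::2]))


def _cluster_masses(t, thresh):
    """Sum of t within each supra-threshold run, separately for each sign."""
    out = []
    for sign in (1, -1):
        for a, b in _runs(sign * t > thresh):
            out.append((a, b, abs(t[a:b].sum())))
    return out


def cluster_perm_test(diff, n_perm=1000, alpha=0.05, rng=None):
    """Paired (one-sample) cluster-based permutation test over time.

    diff : (n_subjects, n_times) condition differences, e.g. Dutch - French.
    Returns a list of (start, stop, p_value) for the observed clusters.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_sub = diff.shape[0]
    thresh = stats.t.ppf(1 - alpha / 2, n_sub - 1)  # cluster-forming threshold

    def tvals(d):
        return d.mean(axis=0) / (d.std(axis=0, ddof=1) / np.sqrt(n_sub))

    observed = _cluster_masses(tvals(diff), thresh)
    null_max = np.zeros(n_perm)
    for k in range(n_perm):
        flips = rng.choice([-1.0, 1.0], size=(n_sub, 1))  # random sign flips
        masses = [m for _, _, m in _cluster_masses(tvals(flips * diff), thresh)]
        null_max[k] = max(masses, default=0.0)
    return [(a, b, (np.sum(null_max >= m) + 1) / (n_perm + 1)) for a, b, m in observed]


# Toy data: 25 subjects, 100 time points, true effect in samples 40-59
rng = np.random.default_rng(6)
diff = rng.standard_normal((25, 100))
diff[:, 40:60] += 1.0
clusters = cluster_perm_test(diff, n_perm=500, rng=rng)
```

Comparing against the maximum cluster mass across the whole time axis is what controls the family-wise error rate without a per-sample correction.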
Key Findings
- Data quality and basic entrainment: MEG PSD showed alpha/beta oscillations, with a marginally significant French vs. Dutch difference in beta-band power (FDR-corrected p=0.06). Cerebro-acoustic coherence indicated delta/theta alignment with the speech envelope; coherence was greater in delta/theta for the (uncomprehended) French condition (p=0.01, cluster-based permutation), indicating that envelope tracking is not specific to comprehension.
- Word-locked modulation: Significant inter-word phase clustering (ITPC) in delta and theta, and induced power modulation in beta after word onsets (cluster-based permutation against baseline). Low-frequency traveling wave patterns suggested fronto-temporal propagation.
- TRF reconstruction and feature contributions: All feature sets (rule-based syntax; statistical surprisal/entropy; joint) significantly improved reconstruction over null models across frequency bands. LMM: feature set significantly affected relative score (χ²(17)=231.65, p<0.001); frequency band did not (χ²(18)=15.53, p=0.63). The joint model (syntax+statistics) outperformed either alone across bands (FDR-corrected). The rule-based model had slightly higher reconstruction than the statistical-only model in delta (paired t-test, p<0.005, FDR-corrected). Time-resolved TRFs (100 ms windows, −400 to 900 ms): syntactic features contributed broadly over earlier and later lags; entropy and close contributed at negative lags (anticipatory effects), and close had larger later-lag coefficients (integration of words into larger structures).
- Global PAC: Significant PAC observed for delta phase coupling with beta/gamma power and theta phase with gamma power; strongest over fronto-temporal regions.
- Feature-dependent PAC (TRF-based): All linguistic features significantly modulated delta–beta PAC; only entropy (precision/uncertainty) and closing nodes significantly modulated theta–gamma PAC. In PAC models, the joint model gave the largest reconstruction improvement for delta–beta, while the rule-based model was stronger for theta–gamma (paired t-tests, FDR-corrected). Source localization highlighted bilateral superior temporal gyri; left inferior frontal gyrus and left anterior temporal lobe (for the Close feature) showed significant clusters, aligning with regions implicated in syntactic processing and semantic composition.
- Interpretation: Delta–beta coupling increased for high surprisal (novel/non-redundant information), consistent with internal model updates. Theta–gamma coupling increased with higher entropy and syntactic integration (closing nodes), consistent with sampling/precision-weighted prediction and compositional integration operating in parallel.
- Overall: Structure- and statistics-based features jointly and complementarily shape MEG dynamics; syntactic features particularly engage delta-band phase-locked responses with temporally broader impact, while statistical features show strong theta-band sensitivity; cross-frequency coupling links their integration during continuous speech.
Discussion
The findings address whether structural (rule-based syntactic) and statistical (information-theoretic) cues jointly contribute to spoken language comprehension. Both feature sets significantly improved MEG reconstruction, and the joint model explained additional variance beyond either alone, challenging a strict structure-versus-statistics dichotomy. Syntactic features elicited broader temporal effects (including later lags consistent with integration) and showed stronger delta-band contributions, while statistical features robustly modulated theta- and cross-frequency dynamics linked to predictive processing. Feature-dependent PAC revealed complementary mechanisms: theta–gamma coupling tracked expected information gain (entropy) and syntactic integration (closing nodes), consistent with theta as a sensory sampling/read-out mechanism for high expected information; delta–beta coupling scaled with surprisal, consistent with encoding non-redundant information and updating internal models via beta-band dynamics. Spatially, effects overlapped in language-relevant regions including bilateral STG, left IFG, and left ATL, supporting parallel, multiplexed predictive and integrative computations. The results integrate predictive coding accounts with syntactic composition mechanisms, suggesting that low-frequency phase orchestrates higher-frequency power to align neural excitability with both statistical predictions and structure-building during continuous speech.
Conclusion
Using naturalistic MEG and forward encoding, the study shows that syntactic structure (depth, closing nodes) and statistical cues (surprisal, entropy) jointly shape neural dynamics during spoken language comprehension. Both feature sets improve reconstruction accuracy across bands, with syntactic features exerting temporally broader influences and stronger delta-band effects. Cross-frequency coupling provides a mechanistic link: theta–gamma coupling reflects precision-weighted sampling and integration, while delta–beta coupling indexes prediction-error-driven model updates. These results support a unified account in which structured and statistical information are processed in parallel and coordinated via cross-frequency coupling as speech becomes language. Future work should more fully integrate lower-level acoustic/phonemic predictors with lexical/syntactic models, further dissociate syntactic from semantic contributions within statistical features, examine language typological differences, and test causal roles of specific frequencies and regions (e.g., IFG, ATL, STG) in predictive and integrative operations.
Limitations
- Statistical features (surprisal, entropy) derived from GPT-2 may implicitly encode some syntactic regularities; they are therefore not fully independent of structure, which complicates attribution of effects.
- Delta/theta tracking can reflect acoustic envelope following or evoked responses; stronger envelope coherence for uncomprehended French indicates low-frequency measures are not purely linguistic.
- Naturalistic design limits experimental control and may include confounds (e.g., prosody, pauses) influencing low-frequency tracking.
- Word-level features do not capture full hierarchical and multi-level sensory-to-linguistic predictive context (phoneme-level, fine-grained acoustics not modeled), making top-down vs. bottom-up assignments ambiguous.
- Sample size is modest (n=25), and some effects were marginal or required multiple-comparisons corrections; generalizability should be tested across languages and tasks.