Language prediction mechanisms in human auditory cortex

K. J. Forseth, G. Hickok, et al.

This research by K. J. Forseth, G. Hickok, P. S. Rollo, and N. Tandon identifies two predictive mechanisms in the auditory cortex that bear on speech perception and production. By showing how distinct areas of auditory cortex contribute to predicting the timing of acoustic events and to responding to them, the study lays a foundation for cognitive models grounded in human neurobiology.
Introduction

The study addresses how human auditory cortex implements predictive mechanisms that support speech perception and production. Given speech’s quasi-periodic, hierarchical structure, temporal prediction can reduce computational load by anticipating salient acoustic events and aligning neuronal excitability. Prior evidence of cortical entrainment to rhythmic inputs and speech suggested a role for oscillatory dynamics in timing prediction, while models of speech production propose predictive coding via motor-to-sensory feedback to anticipate sensory consequences of speech. However, the precise cortical loci and mechanisms of such predictions in early auditory cortex remained unclear. The authors investigated whether early auditory cortex predicts when acoustic events occur (timing) and whether it encodes what is expected (prediction error/suppression) during speech, using intracranial recordings from depth electrodes spanning planum polare, Heschl’s gyrus (HG) and transverse temporal sulcus (TTS), and planum temporale (PT). They designed tasks with rhythmic amplitude-modulated noise to isolate predictive timing signals and natural speech naming to test generalization to language, and used chronometric cortical stimulation to establish causal roles.

Literature Review

The introduction situates the work within frameworks of active sensing and predictive coding, where top-down predictions interact with bottom-up input. Evidence cited includes auditory cortical entrainment to rhythmic stimuli and speech, supporting the idea that cortical oscillations enable temporal prediction. In production, models (e.g., state feedback control, efference copy) require anticipation of sensory consequences, with empirical support for motor-induced suppression and predictive coding in auditory cortex. The debate over whether observed entrainment reflects true oscillatory alignment versus repeated evoked responses is noted. Theoretical accounts propose oscillatory mechanisms segment continuous speech into units via periodic windows of high excitability, facilitating comprehension and reducing temporal uncertainty. Prior surface ECoG work mapped onset versus sustained speech responses on lateral superior temporal gyrus, but lacked access to early auditory cortex in the supratemporal plane; thus, precise anatomical substrates of predictive timing and speech-specific suppression remained to be clarified.

Methodology

Participants: Thirty-seven patients (20 males, 16 females; mean age 33 ± 9; mean IQ 97 ± 15) with intractable epilepsy undergoing intracranial monitoring. Language dominance was determined by Wada (n=5), fMRI laterality index (n=7), cortical stimulation mapping (n=12), or handedness (n=13). Thirty-two were left-dominant; three were right-dominant.

Electrode implantation and recording: 7507 electrodes (6669 depth, 838 grid). Primary analyses focused on depth probes traversing the supratemporal plane (planum polare, HG/TTS, PT). Data acquired at 2000 Hz, 0.1–700 Hz band, NeuroPort NSP. Audio recorded synchronously. Artifact rejection excluded channels with interictal activity, proximity to seizure onset zones, excessive noise/saturation; remaining channels referenced to common average. Trials with epileptiform activity or incorrect/slow responses (>2 s) were removed.

Paradigms: (1) Rhythmic amplitude-modulated white noise: 3 Hz modulation at 80% depth for 3 s, followed by 833 ms of constant-amplitude noise. In 50% of trials, a 1 kHz, 50 ms tone was presented during the unmodulated segment at one of five temporal positions and one of three intensities spanning 12 dB. Patients performed single-interval 2-AFC tone detection (100 trials per patient). Seven patients were also tested at 5 and 7 Hz. (2) Natural speech: auditory-cued naming to definition (mean sentence duration 1.97 ± 0.36 s; 7.7 syllables; 6.9 acoustic edges). Patients named the target noun; accuracy >90% (mean 93%), mean RT 1.08 s. A reversed-speech control required gender identification. Twenty-five patients completed 180 trials.
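As an illustration of the first paradigm, the rhythmic stimulus can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' stimulus code: the audio sampling rate, the sinusoidal envelope shape, its starting phase, the level of the constant-amplitude segment, and the function name `am_noise_stimulus` are all assumptions, and the 1 kHz probe tone is omitted.

```python
import numpy as np

def am_noise_stimulus(fs=44100, mod_hz=3.0, depth=0.8,
                      mod_dur=3.0, flat_dur=0.833, seed=0):
    """Return (signal, envelope): AM white noise, then constant-amplitude noise.

    fs, the sinusoidal envelope, and the flat-segment level are assumptions.
    """
    rng = np.random.default_rng(seed)
    n_mod = int(round(mod_dur * fs))
    n_flat = int(round(flat_dur * fs))
    t = np.arange(n_mod) / fs
    # Sinusoidal envelope with modulation index `depth` around a mean of 1.
    env_mod = 1.0 + depth * np.sin(2 * np.pi * mod_hz * t)
    # Unmodulated segment held at the mean level (assumed).
    env = np.concatenate([env_mod, np.ones(n_flat)])
    # White-noise carrier, uniform in [-1, 1).
    carrier = rng.uniform(-1.0, 1.0, size=env.size)
    return env * carrier, env

signal, envelope = am_noise_stimulus()
```

With the defaults, the modulated segment contains nine full cycles (3 s at 3 Hz), and the 833 ms unmodulated segment spans roughly 2.5 further cycle periods in which a "missing" pulse could be predicted.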

Imaging: Pre-op 3T MRI for cortical surface reconstruction (FreeSurfer), post-op CT co-registered to localize electrodes. Surface-based mixed-effects multilevel analysis (SB-MEMA) produced population activity maps.

Signal processing: Line noise removed with 60 Hz and harmonics band-stops. Analytic signals via frequency-domain bandpass Hilbert filters (2–16 Hz, 50 log-spaced steps; passbands 1–4 Hz). Extracted instantaneous amplitude and phase. Power was baseline-normalized (−300 to −50 ms) to percent change. Low-frequency phase alignment quantified with inter-trial coherence (ITC). Phase-power relationships examined across bands (low 2–15 Hz, beta, high-gamma 65–115 Hz). Temporal lags via cross-correlation between acoustic envelope and bandlimited neural envelopes. Acoustic edges defined as rapid amplitude increases; syllabic onsets annotated for comparison.
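The core of this pipeline (band-limited analytic signal, inter-trial coherence, and baseline normalization) can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' implementation: the brick-wall passband and the function names `bandpass_analytic`, `inter_trial_coherence`, and `percent_change` are hypothetical, and only NumPy's FFT routines are used.

```python
import numpy as np

def bandpass_analytic(x, fs, lo, hi):
    """Analytic signal of x (last axis = time) restricted to [lo, hi] Hz.

    Frequency-domain Hilbert filter: positive frequencies inside the
    passband are doubled, everything else is zeroed, then the spectrum is
    inverted. The hard-edged passband is an assumption made for brevity.
    """
    n = x.shape[-1]
    freqs = np.fft.fftfreq(n, d=1.0 / fs)
    gain = np.where((freqs >= lo) & (freqs <= hi), 2.0, 0.0)
    return np.fft.ifft(np.fft.fft(x, axis=-1) * gain, axis=-1)

def inter_trial_coherence(analytic):
    """ITC across trials (axis 0): length of the mean unit phase vector."""
    unit = analytic / np.maximum(np.abs(analytic), 1e-12)
    return np.abs(unit.mean(axis=0))

def percent_change(power, times, base=(-0.300, -0.050)):
    """Percent change of power relative to the mean of the baseline window."""
    mask = (times >= base[0]) & (times <= base[1])
    baseline = power[..., mask].mean(axis=-1, keepdims=True)
    return 100.0 * (power - baseline) / baseline
```

Instantaneous amplitude and phase follow from the analytic signal as `np.abs(z)` and `np.angle(z)`. ITC ranges from 0 (phases uniformly scattered across trials) to 1 (identical phase on every trial).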

Classification: Nonnegative matrix factorization (NNMF, rank k=2, 1000 replicates) applied separately to high-gamma power and low-frequency ITC time series across supratemporal electrodes to derive sustained vs transient response archetypes and electrode class weights. Class bias defined as difference of weights; magnitude as sum. For articulation, listening-derived archetypes were fixed and weights recomputed.
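To make the classification step concrete, here is a minimal sketch of a rank-2 nonnegative factorization with class bias and magnitude computed from the resulting weights. The solver shown (Lee-Seung multiplicative updates with a handful of random restarts) is a stand-in chosen for brevity, not necessarily the algorithm the authors used, and the names `nmf` and `class_bias_and_magnitude` are hypothetical. Rows of `V` are assumed to be electrodes and columns time points.

```python
import numpy as np

def nmf(V, k=2, n_iter=1000, n_rep=5, seed=0):
    """Rank-k NMF, V ~ W @ H, via multiplicative updates (assumed solver).

    Returns the replicate with the lowest Frobenius reconstruction error.
    Rows of H are response archetypes; rows of W are electrode weights.
    """
    rng = np.random.default_rng(seed)
    eps = 1e-12
    best = None
    for rep in range(n_rep):
        W = rng.random((V.shape[0], k)) + eps
        H = rng.random((k, V.shape[1])) + eps
        for it in range(n_iter):
            H *= (W.T @ V) / (W.T @ W @ H + eps)
            W *= (V @ H.T) / (W @ H @ H.T + eps)
        err = np.linalg.norm(V - W @ H)
        if best is None or err < best[0]:
            best = (err, W.copy(), H.copy())
    _, W, H = best
    # Remove the scale ambiguity: unit-norm archetypes, rescaled weights.
    norms = np.maximum(np.linalg.norm(H, axis=1), eps)
    return W * norms, H / norms[:, None]

def class_bias_and_magnitude(W, sustained=0, transient=1):
    """Bias = difference of the two class weights; magnitude = their sum."""
    bias = W[:, sustained] - W[:, transient]
    magnitude = W[:, sustained] + W[:, transient]
    return bias, magnitude
```

Because NMF returns components in arbitrary order, the caller has to decide which archetype is "sustained" and which is "transient", for example by inspecting which row of `H` remains elevated late in the stimulus. Fixing `H` and re-solving only for `W` would correspond to the articulation analysis described above.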

Stimulation mapping: Clinical bipolar stimulation (50 Hz, 0.3 ms pulses for 3 s; 2–10 mA titrated) during sentence repetition, naming to definition, and picture naming in three patients; language-positive sites defined by articulation arrest/anomia. Chronometric stimulation in two additional patients targeted HG/TTS or PT identified physiologically. Experiment 1: clinical trains at sentence onset (listening) or offset (articulation) during repetition. Experiment 2: single 500 μs pulses triggered either at acoustic edges or uniformly distributed (matched total current) during naming to definition; performance measured as naming accuracy.

Statistics: Wilcoxon signed-rank tests for within-electrode/within-subject comparisons with familywise error correction; bootstrapping (n=1000) for phase–stimulus coupling using Kullback-Leibler divergence vs uniform; correlations via Spearman/Pearson as appropriate; significance thresholds reported (e.g., p<10^-3). Phase-space trajectories plotted with quarter-period delays to visualize dynamical states.
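The phase-coupling statistic can be illustrated with a short sketch: bin the observed phases, compute the Kullback-Leibler divergence of that histogram from a uniform distribution, and compare it against resampled surrogates. The bin count, the surrogate scheme (drawing uniformly distributed phases of the same sample size), and the function names are assumptions made for illustration; the source states only that bootstrapping with n=1000 was used.

```python
import numpy as np

def kl_from_uniform(phases, n_bins=18):
    """KL divergence (nats) of a binned phase histogram from uniform."""
    wrapped = np.mod(phases, 2 * np.pi)
    counts, _ = np.histogram(wrapped, bins=n_bins, range=(0.0, 2 * np.pi))
    p = counts / counts.sum()
    nz = p > 0  # empty bins contribute 0 (0 * log 0 := 0)
    return float(np.sum(p[nz] * np.log(p[nz] * n_bins)))

def surrogate_p_value(phases, n_boot=1000, n_bins=18, seed=0):
    """Observed KL and its p-value against uniform-phase surrogates (assumed null)."""
    rng = np.random.default_rng(seed)
    observed = kl_from_uniform(phases, n_bins)
    null = np.array([
        kl_from_uniform(rng.uniform(0.0, 2 * np.pi, size=len(phases)), n_bins)
        for _ in range(n_boot)
    ])
    # Add-one correction keeps the estimate strictly positive.
    p = (1 + np.sum(null >= observed)) / (1 + n_boot)
    return observed, p
```

With 1000 surrogates, the smallest attainable p-value under this scheme is 1/1001, which is consistent with thresholds reported as p<10^-3.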

Key Findings
  • Sustained multispectral response in early auditory cortex during rhythmic white noise localized to HG/TTS in both language-dominant and non-dominant hemispheres (n=26 with supratemporal depth probes). Lateral STG did not show sustained responses.
  • Frequency-multiplexed encoding in HG/TTS: high-gamma power in-phase with the acoustic envelope; beta power synchronized at the stimulus trough; low-frequency power modulated by the rising slope (acoustic edge). Low-frequency phase reset was constrained to theta and did not occur in beta or high-gamma.
  • Traveling wave in HG/TTS: high-gamma power propagated mediolaterally across HG/TTS at ~0.1 m/s, spanning from ~80 ms before to ~80 ms after each acoustic pulse peak.
  • Behavioral prediction: During low-intensity tone detection, detection accuracy was uniquely enhanced at the time of the first “missing” acoustic pulse (edge), indicating perceptual prediction aligned with neural phase reset.
  • Neural prediction: Low-frequency phase in HG/TTS maintained a sustained, predictive state for one cycle after rhythmic stimulation ceased (significant ITC in first prediction interval, p<10^-3), whereas high-gamma power carried no predictive information after rhythm ended.
  • Natural speech encoding in HG/TTS: Significant correlations between acoustic and neural envelopes with band-specific delays (low-frequency r=-0.0620 at ~135 ms; beta r=-0.0632 at ~95 ms; high-gamma r=0.0738 at ~45 ms; all p<10^-3). Low-frequency ITC increased for 125 ms following acoustic edges (p<10^-3) and more strongly than for syllabic onsets (p=0.0072). Effects persisted with reversed speech, indicating sublexical processing. Electrodes best tracking white noise envelope also best tracked speech envelope; electrodes showing predictive phase effects in noise also showed greater edge-related ITC in speech (both p<10^-3).
  • Distinct transient response localized to planum temporale (PT): A single-pulse, high-magnitude high-gamma spike with broadband low-frequency phase reset at onset that rapidly returned to baseline. NNMF across 349 supratemporal electrodes revealed an anteroposterior gradient from sustained (HG/TTS) to transient (PT) responses (high-gamma r=0.4101, p<10^-4; low-frequency phase r=0.7356, p<10^-16). Classifications from power and phase were correlated (r=0.4188, p<10^-6); discrete classes: sustained (n=74), transient (n=90), mixed (n=9). Transient responses were limited to language-dominant cortex; sustained responses occurred bilaterally.
  • Listening vs speaking dissociation: HG/TTS showed strong high-gamma during both listening and articulation (reduced during articulation), whereas PT responded robustly at ~100 ms after sentence onset during listening but was quiescent during articulation. NNMF showed sustained class preserved from listening to articulation (r=0.6663, p<10^-16) but transient class suppressed/uncorrelated (r=0.1094, p=0.3279); of 37 transient-listening electrodes, only 2 remained transient during articulation.
  • Causal evidence via stimulation: Clinical stimulation of HG/TTS impaired speech comprehension (sentence repetition and naming to definition) without affecting picture naming; evoked buzzing/ringing percepts. Stimulation of PT disrupted articulation in all tasks (including picture naming) and induced auditory hallucinations (e.g., sensation of people talking, echo). Chronometric stimulation: HG/TTS stimulation at sentence onset disrupted repetition, but at sentence offset did not impair articulation; PT stimulation at sentence offset caused articulatory failure. Edge-triggered single-pulse stimulation during naming: HG/TTS stimulation at edges reduced accuracy markedly compared with PT (Patient 1: 32% vs 81%, p<10^-3; Patient 2: 64% vs 95%, p<10^-3) and versus uniform HG/TTS stimulation matched for total current (Patient 1: 32% vs 61%, p=0.0246; Patient 2: 64% vs 86%, p=0.0397).
  • Overall: Two predictive mechanisms were identified in early auditory cortex: (1) temporal prediction of "when", indexed by low-frequency phase in bilateral HG/TTS, which anticipates acoustic edges; and (2) predictive coding of "what" via suppression/error signaling in language-dominant PT, which responds transiently to external sounds and is uniquely suppressed for self-generated speech.
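The band-specific delays reported for natural speech come from cross-correlating the acoustic envelope with band-limited neural envelopes. A minimal sketch of that lag estimate follows, assuming both envelopes are sampled at the same rate and only non-negative lags (neural following acoustic) are searched; the function name `peak_lag`, the 300 ms search window, and the global z-scoring (which makes `r` only approximately a Pearson coefficient) are assumptions.

```python
import numpy as np

def peak_lag(acoustic_env, neural_env, fs, max_lag_s=0.3):
    """Lag (s) and correlation at the largest-magnitude cross-correlation.

    A positive lag means the neural envelope follows the acoustic envelope.
    The sign of r is kept, so anti-correlated bands give negative r.
    """
    a = (acoustic_env - acoustic_env.mean()) / acoustic_env.std()
    b = (neural_env - neural_env.mean()) / neural_env.std()
    max_lag = int(round(max_lag_s * fs))
    lags = np.arange(0, max_lag + 1)
    # Correlate a[t] with b[t + k] for each candidate lag k.
    r = np.array([np.mean(a[: a.size - k] * b[k:]) for k in lags])
    best = int(np.argmax(np.abs(r)))
    return lags[best] / fs, float(r[best])
```

Searching on the absolute value matters here because the reported low-frequency and beta correlations are negative while the high-gamma correlation is positive.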
Discussion

The findings delineate a functional architecture for prediction in human early auditory cortex. HG/TTS exhibits a sustained, multispectral encoding of the acoustic envelope and a low-frequency phase reset at acoustic edges that persists beyond rhythmic input, supporting temporal prediction mechanisms thought to segment continuous input and minimize temporal uncertainty. The same substrates and encodings generalize to natural speech, with stronger phase responses to acoustic edges than to syllabic onsets, indicating sublexical, acoustically driven processing. The observed traveling waves suggest an organized spatiotemporal propagation of activity across HG/TTS. In contrast, PT shows a transient onset response that is absent during self-generated speech, aligning with predictive coding models of production in which efference copies suppress expected sensory outcomes and signal mismatches for externally generated sounds. The anatomical dissociation (sustained HG/TTS vs transient PT), together with the causal stimulation results, indicates that HG/TTS contributes to speech perception/comprehension while PT contributes to articulation and error monitoring during production. These results bridge cognitive models of speech perception and production with neurobiological substrates, clarifying how "when" and "what" predictions are implemented in human auditory cortex.

Conclusion

This work identifies two complementary predictive mechanisms in early auditory cortex: temporal prediction in bilateral Heschl’s gyrus/transverse temporal sulcus via low-frequency phase alignment to acoustic edges, and speech-specific transient responses in language-dominant planum temporale that are suppressed during self-generated speech, consistent with efference copy-based predictive coding. These mechanisms operate for both simple rhythmic sounds and natural speech and are causally linked to perception and production, respectively. Future directions include concurrent thalamocortical recordings to definitively distinguish entrainment from evoked potentials, probing subcortical contributions, testing generalization across languages and prosodic structures, and investigating how these mechanisms interact with attention and higher-order linguistic processing.

Limitations
  • Entrainment versus evoked responses: While the sustained, predictive state persists beyond stimulus offset, definitively separating true oscillatory entrainment from recurring evoked responses likely requires concurrent thalamocortical recordings, which were not performed.
  • Direct evidence for internal predictive models: The study provides strong support for predictive coding during speech production (efference copies) but notes that direct evidence for internal predictive models instantiated in human cortex remains limited.
  • Cohort and data availability: Participants were epilepsy patients undergoing clinical monitoring, which may limit generalizability. Complete raw datasets with audio are not publicly available due to protected patient information; grouped data are available on request.