Language prediction mechanisms in human auditory cortex

K. J. Forseth, G. Hickok, et al.

This groundbreaking research conducted by K. J. Forseth, G. Hickok, P. S. Rollo, and N. Tandon uncovers two predictive mechanisms in the auditory cortex that enhance our understanding of speech perception and production. By revealing how distinct brain areas contribute to timing and response to acoustic stimuli, this study lays the foundation for cognitive models grounded in human neurobiology.

Introduction
Humans effortlessly understand speech despite noisy acoustic signals, segmenting it into meaningful units across diverse voices, accents, and speaking rates. The quasi-periodic and hierarchical nature of speech suggests that temporal prediction reduces the computational burden of decoding: anticipating the arrival of salient acoustic information optimizes neural potentiation and the discretization of the signal into linguistic elements. This 'active sensing' framework involves an interplay between bottom-up sensory input and top-down predictive modulation of neuronal dynamics. Evidence for cortical entrainment—the synchronization of intrinsic neural activity with external quasi-periodic stimuli—during speech perception suggests that cortical oscillations enable temporal prediction. Speech production also relies on predictive mechanisms, with models proposing that the brain anticipates the sensory consequences of its own speech. However, the levels of auditory cortical processing and the cortical locations at which these mechanisms operate remain unclear. This study uses intracranial recordings and stimulation to investigate the neurobiology of prediction in early auditory cortex with two paradigms: amplitude-modulated white noise and spoken naming. In the white noise task, participants detect a near-threshold tone, and prediction should uniquely persist during the constant-amplitude interval after the modulation ends. The speech task examines cortical responses to natural language, testing for the predictive encoding signatures identified in the white noise task. Finally, chronometric stimulation clarifies the causal involvement of specific neuroanatomical substrates during naming.
Literature Review
Prior research has highlighted the importance of temporal prediction in speech processing, suggesting that the brain anticipates the arrival of salient acoustic information to optimize neural processing and segment continuous speech into meaningful units. Studies have shown evidence of cortical entrainment, where neural oscillations synchronize with the rhythm of speech, supporting the role of oscillations in temporal prediction. Furthermore, models of speech production posit that predictive mechanisms are crucial, with the brain anticipating the sensory consequences of its own speech. However, the precise neural substrates and mechanisms underlying these predictive processes in human auditory cortex have remained elusive. This study aims to fill this gap by directly investigating the neurobiological underpinnings of prediction in early auditory cortex using intracranial recordings and stimulation.
Methodology
This study employed intracranial recordings from 37 epilepsy patients (20 males, 17 females; mean age 33 ± 9 years; mean IQ 97 ± 15) with depth electrodes implanted along the anteroposterior extent of the supratemporal plane. Language dominance was assessed with intracarotid sodium amytal injection, fMRI, cortical stimulation mapping, and the Edinburgh Handedness Inventory. Two paradigms were used: 1) amplitude-modulated white noise (3 Hz, 80% modulation depth for 3 s, then constant amplitude for 1 s) paired with a tone detection task to assess temporal prediction; and 2) auditory-cued naming (naming objects described in short sentences) to study responses to natural speech. Electrocorticographic (ECoG) data were recorded at a 2000 Hz sampling rate with a 0.1–700 Hz bandwidth and preprocessed to remove noise and artifacts. Analyses focused on high-gamma power (65–115 Hz) and low-frequency phase (2–15 Hz). Non-negative matrix factorization (NNMF) was used to separate distinct response types (sustained vs. transient). In three patients, direct cortical stimulation was performed to assess the functional roles of the stimulated areas. Two additional patients underwent chronometric stimulation experiments during sentence repetition and auditory-cued naming, with stimulation triggered at either stimulus onset/offset or at acoustic edges.
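The stimulus design and the band-limited power analysis described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the sinusoidal modulator, random seed, and filter order are assumptions chosen only to match the reported parameters (3 Hz modulation at 80% depth for 3 s, then 1 s constant amplitude; 2000 Hz sampling; 65–115 Hz high-gamma band).

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 2000  # sampling rate (Hz), as reported for the ECoG recordings

# 3 s of 3 Hz amplitude-modulated white noise (80% modulation depth),
# followed by 1 s of constant amplitude. The modulator starts at its
# trough so the envelope ramps up from 0.2 toward a peak of 1.8.
t_mod = np.arange(0, 3, 1 / fs)
t_const = np.arange(0, 1, 1 / fs)
envelope = np.concatenate([
    1 + 0.8 * np.sin(2 * np.pi * 3 * t_mod - np.pi / 2),
    np.ones_like(t_const),
])
rng = np.random.default_rng(0)  # fixed seed, for reproducibility only
stimulus = envelope * rng.standard_normal(envelope.size)

def band_amplitude(x, fs, lo, hi, order=4):
    """Band-limited analytic amplitude via zero-phase filtering
    and the Hilbert transform (e.g. high-gamma: 65-115 Hz)."""
    b, a = butter(order, [lo, hi], btype="bandpass", fs=fs)
    return np.abs(hilbert(filtfilt(b, a, x)))

hg = band_amplitude(stimulus, fs, 65, 115)
```

In the study itself, this kind of band-limited amplitude estimate would be computed on the recorded ECoG channels rather than on the stimulus; the sketch applies it to the stimulus only so the example is self-contained.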
Key Findings
The study revealed two distinct predictive mechanisms in early auditory cortex: 1) A sustained response in bilateral Heschl's gyrus (HG/TTS), characterized by frequency-multiplexed encoding of acoustic envelope (low-frequency, beta, and high-gamma power) and acoustic edges (low-frequency phase reset). This sustained response persisted for at least one cycle after the rhythmic stimulus ended, indicating temporal prediction. The same mechanisms were engaged during natural speech listening, with acoustic edges more strongly encoded than syllabic onsets, suggesting sublexical processing. 2) A transient response in the planum temporale (PT) of the language-dominant hemisphere, characterized by a brief high-gamma power increase and low-frequency phase reset following acoustic onset. This response was uniquely suppressed during speech production, consistent with predictive coding. Direct cortical stimulation confirmed these functional dissociations: Heschl's gyrus stimulation disrupted speech comprehension, while planum temporale stimulation disrupted speech production. Stimulation experiments also showed that stimulating HG/TTS at acoustic edges impaired naming performance more than stimulating PT or uniform stimulation, further supporting the specific role of HG/TTS in processing acoustic edges during speech comprehension.
Discussion
The findings demonstrate a functional architecture of prediction in human early auditory cortex, with distinct roles for Heschl's gyrus and planum temporale. Heschl's gyrus is involved in temporal prediction, tracking both acoustic envelope and edges, which are crucial for speech segmentation and comprehension. The planum temporale plays a role in predictive coding during speech production, with its activity suppressed when the sensory input matches the predicted output. These results provide strong neurobiological evidence for the cognitive models of speech perception and production that incorporate predictive mechanisms. The observed frequency-multiplexed encoding in Heschl's gyrus suggests sophisticated information processing, supporting the hypothesis that the brain uses a hierarchical network of distinct computational channels for acoustic processing. The suppression of the transient response in planum temporale during self-generated speech strongly supports the theory of efference copy, where an internal model predicts sensory consequences of motor actions.
Conclusion
This study provides detailed insights into the neural mechanisms underlying speech perception and production, revealing two distinct predictive mechanisms in early auditory cortex. Heschl's gyrus's role in temporal prediction and the planum temporale's involvement in predictive coding during speech production are clearly demonstrated. Future research could investigate the interaction between these two regions and explore the role of subcortical structures in these predictive processes. Further studies could also investigate individual differences in predictive abilities and their correlation with language proficiency.
Limitations
The study is limited by its reliance on a patient population with epilepsy. While efforts were made to exclude data affected by epileptic activity, the possibility of residual effects cannot be entirely ruled out. The relatively small number of patients in the stimulation studies also limits the generalizability of those findings. Further research with larger sample sizes and other methodologies is needed to validate these findings and explore their broader implications.