Introduction
This research addresses the poorly understood interaction between auditory and linguistic processing in speech comprehension. While much is known about brain responses to speech acoustics, less is understood about how these responses feed into broader language processing. Phonology, which bridges speech acoustics and morphosyntax, is crucial for this transformation. However, the high redundancy between phonological and acoustic information has led some to question phonology's relevance in speech processing. This uncertainty was fueled further by studies showing that acoustic features alone can achieve human-like accuracy in word recognition and can match phonological features in neural encoding models. Despite this overlap, acoustic and phonological similarity do diverge in places, suggesting that the two may be neurally dissociable. The study argues that the high degree of overlap calls for careful experimental design to assess the unique contribution of phonological information. Points of acoustic-phonemic divergence, which are common cross-linguistically but differ from language to language, offer a unique window into linguistic abstraction because resolving them requires higher-order knowledge of a specific language. The study focuses on English, examining the phonological opposition between /d/ and /t/ and their contextual acoustic neutralization to [ɾ], a coronal tap. The researchers hypothesized that if phonological context is used to compute phonemic identity even under acoustic neutralization, then sites demonstrating a phonemic 'underlying response' (where /t/ taps are more similar to other /t/ allophones than /d/ taps) should exist alongside sites with an acoustic 'surface response'. Furthermore, the study investigates phonemic-morphemic divergence, focusing on regular past tense and plural formation in English.
Here, the expectation is that sites exhibiting a morphological 'underlying response' (grouping voiced and voiceless forms of plurals and past tenses) should exist alongside sites showing a surface response based on acoustic similarity. Receptive field estimation techniques, specifically maximum noise entropy (MNE) models (linear and quadratic), are employed to assess the interplay between acoustic signals and language representation in the brain. By comparing linear and quadratic models, the study examines whether the neural response is impacted by stimulus covariance.
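The difference between the linear and quadratic MNE models can be made concrete with their usual functional form. The summary does not state the equations, so the following is a sketch under assumptions: it uses the commonly cited logistic formulation of first- and second-order MNE models, with the response normalized to lie between 0 and 1. The symbols a (offset), h (linear filter), J (quadratic kernel), and s (stimulus feature vector) are labels chosen here for illustration.

```latex
% First-order (linear) MNE model: the response depends on the stimulus s
% only through a single linear filter h (matching the stimulus mean).
P(\text{response} \mid \mathbf{s}) = \frac{1}{1 + \exp\left(a + \mathbf{h}^{\top}\mathbf{s}\right)}

% Second-order (quadratic) MNE model: the added term s^T J s lets the
% response depend on pairwise products of stimulus features, i.e. on
% stimulus covariance.
P(\text{response} \mid \mathbf{s}) = \frac{1}{1 + \exp\left(a + \mathbf{h}^{\top}\mathbf{s} + \mathbf{s}^{\top}\mathbf{J}\,\mathbf{s}\right)}
```

Under this formulation, setting J to zero recovers the linear model, so any gain in fit from the quadratic model is attributable to the covariance term.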
Literature Review
Previous research has extensively explored the brain's response to the acoustic properties of speech, revealing spatially organized tuning to spectrotemporal features in auditory areas. However, the neural processing of phonological features often appears to mirror acoustic results, raising questions about phonology's cognitive relevance. The significant overlap between acoustic and phonological information makes it difficult to isolate the unique contribution of phonological processing. Studies have demonstrated that word recognition models based solely on acoustic features can achieve accuracy comparable to humans. Similarly, neural encoding models based solely on acoustic information perform identically to those incorporating phonological information, emphasizing the richness of acoustic information. Despite these findings, well-established divergences exist between acoustic and phonological similarity, hinting at the possibility of neural dissociation. Some studies have shown that models incorporating both spectrotemporal and categorical phonological features outperform models using only one feature type, suggesting non-identical information captured by these feature sets. However, the precise relationship between these two information types remained unclear. The current study addresses this gap by focusing on language-specific points of acoustic-phonemic divergence to isolate the neural signatures of purely phonological processes and understand their relationship with sensory processing and language cognition.
Methodology
Ten patients with implanted intracranial stereo-EEG electrodes participated. They were native English speakers with no prior Catalan experience. The study involved a passive listening task with excerpts of conversational American English from the Buckeye Corpus interspersed with short passages of Catalan. For English excerpts, participants answered content questions to ensure attention. For Catalan excerpts, they pressed a key when they heard an embedded English word. Intracranial EEG data were pre-processed (artifact removal, filtering, downsampling). Speech responsiveness was determined by comparing neural responses to speech and silence using a sliding-window t-test. Three linguistic comparisons were used: (1) coronal stop-tap neutralization (/d/ and /t/ neutralizing to [ɾ]), (2) regular past tense, and (3) regular plural. For each comparison, electrodes were classified as exhibiting either a 'surface' response (reflecting acoustic similarity) or an 'underlying' response (reflecting phonemic or morphological similarity). Linear mixed-effects (LME) models were fitted to examine the relative contributions of spectrographic and phonemic features to neural responses across different frequency bands (delta, theta, alpha, beta, gamma, high-gamma, and broadband LFP). MNE models (linear and quadratic) were used to identify the stimulus features driving neural responses, assessing the role of phonemic information and stimulus covariance. Models were fitted with jackknife resampling to prevent overfitting. To assess the language-specificity of phonemic information, the analysis was repeated on the Catalan data. The Akaike information criterion (AIC) was used for model selection.
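The surface versus underlying classification described above can be sketched in a few lines. This is not the study's procedure, which the summary does not specify and which presumably relied on statistical testing. It is a minimal illustration, assuming each condition is summarized by a mean response vector and that similarity is measured by Euclidean distance. It also adopts one reading of the criterion, in which a site is 'underlying' when /t/ taps sit closer to other /t/ allophones than to /d/ taps. The function name `classify_site` and all input values are hypothetical.

```python
import math


def classify_site(t_tap, d_tap, t_other):
    """Label one recording site from mean response vectors (hypothetical inputs).

    t_tap   : mean response to taps derived from /t/
    d_tap   : mean response to taps derived from /d/
    t_other : mean response to non-tap allophones of /t/
    """
    # Distance from /t/ taps to their own phoneme's other allophones.
    to_phoneme = math.dist(t_tap, t_other)
    # Distance from /t/ taps to the acoustically similar /d/ taps.
    to_other_tap = math.dist(t_tap, d_tap)

    if to_phoneme < to_other_tap:
        return "underlying"  # groups by phonemic identity
    if to_other_tap < to_phoneme:
        return "surface"     # groups by acoustic similarity
    return "unclear"         # exact tie; no preference either way


# Illustrative values only; these are not data from the study.
surface_site = classify_site([1.0, 2.0, 3.0], [1.1, 2.1, 2.9], [4.0, 5.0, 6.0])
underlying_site = classify_site([4.1, 5.0, 6.2], [1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

A real analysis would compare these distances against a null distribution rather than use a bare inequality, since small differences between noisy mean responses are not meaningful on their own.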
Key Findings
More sites sensitive to phonemic identity were observed than expected by chance. Electrodes sensitive to phonemic underlying responses were identified in the coronal stop-tap comparison, providing evidence for phonological abstraction. In LME models, lower-frequency bands were best fit by models including both spectrographic and phonemic features, indicating a significant contribution of phonemic information, particularly in the delta, theta, and alpha bands. However, higher-frequency bands (gamma and high-gamma) were mainly driven by acoustic features. The broadband LFP responses were mainly explained by models using both phonemic and spectrographic features. MNE models revealed that the stimulus covariance structure enhances prediction accuracy when phonemic information is available: the second-order (quadratic) MNE model significantly improved model fit only when phonemic labels were used. Importantly, phonemic label features did not improve model fits for Catalan speech, a language-specific effect indicating that the contribution of phonemic information depends on the participants' linguistic knowledge of the language being spoken. Both surface and underlying patterns of activity were identified for the past tense and plural comparisons, indicating early morphological processing. The temporal dynamics of both acoustic and phonemic patterns suggest simultaneous but largely separate processing in the gamma and high-gamma bands.
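The statement that a band is "best fit" by one feature set over another rests on a model selection criterion, which the Methodology names as AIC. The sketch below shows how such a comparison works in general, using the standard definition AIC = 2k - 2 ln L, where k is the number of parameters and L the maximized likelihood. The log-likelihoods and parameter counts are invented for illustration and are not results from the study. The names `aic`, `best_model`, and `fits` are hypothetical.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: lower values indicate a better trade-off
    between goodness of fit and model complexity."""
    return 2 * n_params - 2 * log_likelihood


def best_model(fits):
    """Return the name of the candidate with the lowest AIC.

    fits maps a model name to a (log_likelihood, n_params) pair.
    """
    return min(fits, key=lambda name: aic(*fits[name]))


# Made-up values for one frequency band (NOT data from the study).
fits = {
    "spectrographic": (-1250.0, 12),
    "phonemic": (-1300.0, 8),
    "spectrographic+phonemic": (-1220.0, 20),
}

winner = best_model(fits)
```

In this toy case the combined model wins despite its extra parameters, because its gain in log-likelihood outweighs the complexity penalty. This is the pattern the summary reports for the lower-frequency bands.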
Discussion
The findings challenge the assumption that language-specific grammar only operates at the syntactic level, demonstrating that language-specific phonological grammar significantly shapes neural responses to speech at the phoneme level. The presence of phonological underlying sites implies an abstraction between surface acoustic forms and prelexical representations. This abstraction is language-specific, reflecting both phonemic inventory differences and language-specific sound alternations. The results support the classical understanding of the phoneme as a psychological entity crucial for sublexical linguistic processing. Furthermore, the observation of more surface and morphological underlying sites than expected by chance in the past tense and plural comparisons provides evidence for early morphological processing and supports the idea that morphological identity is abstracted over phonologically distinct alternants in a structured, language-specific way. The use of LME and MNE models helped to ensure that the results reflect phonological, rather than lexical or semantic, information by accounting for the variance in the neural response that is not explained by acoustic features. The finding that phonemic information did not improve Catalan model fits underlines the importance of language-specific knowledge in phonological processing.
Conclusion
This study provides strong evidence for the psychological reality of phonemes and morphemes as units of linguistic processing. Language-specific phonological and morphological knowledge guides neural responses to speech at a fine-grained level. The interplay between acoustic features and phonemic categories is crucial for linguistic abstraction, particularly in lower frequency bands. Future research could explore the neural mechanisms underlying these processes in greater detail and investigate the role of individual differences in shaping the neural response to speech.
Limitations
The study uses a relatively small sample size (ten participants). The use of intracranial recordings limits generalizability to the broader population. The passive listening task may not fully capture the complexity of natural speech comprehension. The analysis primarily focuses on regular morphological patterns, while irregular forms could provide additional insights. The study uses only two languages for comparisons of model accuracy, potentially reducing the scope and generalizability of conclusions that can be drawn.