Introduction
The human capacity for speech comprehension relies on a complex neural system whose foundations are laid during the first few years of life. Behavioral studies demonstrate robust word recognition in infants as young as 4-6 months, suggesting that the neural underpinnings of speech processing grow increasingly sophisticated over this period. While the infant brain effectively tracks the speech envelope, previous cortical tracking studies have not definitively shown phonetic feature encoding. This study aimed to address this gap by investigating the emergence of cortical encoding of phonetic features in a longitudinal cohort of infants, measuring electrophysiological responses to nursery rhymes at 4, 7, and 11 months and comparing them to adult responses. The primary hypothesis was that acoustically invariant phonetic feature encoding would emerge during the first year of life. To test this, the study applies neural tracking analysis, specifically the multivariate Temporal Response Function (TRF), to electroencephalography (EEG) data. TRFs model the relationship between a continuous stimulus (here, nursery rhymes) and the neural activity recorded while it plays. The study sought to determine whether speech sounds are encoded as categorical units in the infant brain and when this acoustically invariant encoding emerges.
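For readers unfamiliar with TRFs, the forward model is commonly written as a linear convolution of the stimulus features with a set of time-lagged weights. The summary does not give the equation, so the notation below is an assumption based on the standard formulation, not necessarily the exact form used in the study: r(t, n) is the EEG at time t and channel n, s(t, f) is stimulus feature f, w(τ, f, n) is the TRF weight at lag τ, and ε(t, n) is the residual.

```latex
\[
r(t, n) = \sum_{f} \sum_{\tau} w(\tau, f, n)\, s(t - \tau, f) + \varepsilon(t, n)
\]
```

Under this view, a feature set is said to be "tracked" when the fitted weights predict held-out EEG better than chance.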
Literature Review
A substantial body of research, primarily from behavioral studies, documents the developmental progression of speech processing in infants, including neonates. These studies, using methods such as head-turn preference procedures, have revealed effects of speaker familiarity, phonetic discrimination abilities, and sensitivity to native versus non-native contrasts. However, this understanding is limited by its reliance on simple behavioral measures. Neurophysiological studies are needed to complement it, focusing on the neural encoding of phonological information during natural listening tasks with continuous speech. Methodological challenges in infant neurophysiology have historically limited studies to discrete stimuli and evoked potentials (e.g., the mismatch negativity, MMN, or mismatch response, MMR). Although there have been advances, such as measuring responses to multiple vowel sounds simultaneously, these stimuli are still simplified and far removed from natural speech. The neural basis of phonetic category encoding (e.g., /b/ vs. /p/) in continuous natural speech in infants remains largely unexplored, prompting this investigation.
Methodology
This longitudinal study re-analyzed EEG data from a previous study involving 50 infants (24 male, 26 female) and 17 adults. Infants participated in three EEG recording sessions at 4, 7, and 11 months of age. Adults performed the same task. Stimuli consisted of 18 nursery rhymes (vocals only), presented audio-visually. EEG signals were recorded using a Geodesic Sensor Net. Data preprocessing included low-pass filtering at 8 Hz, high-pass filtering at 0.1 Hz, artifact removal using Artifact Subspace Reconstruction (ASR), and interpolation of noisy channels. Four stimulus representations were extracted: an 8-band acoustic spectrogram (S), a half-wave rectified broadband envelope derivative (D), a 14-dimensional phonetic feature vector (F) representing voicing, manner, and place of articulation, and a visual motion regressor (V). Multivariate TRF analysis was performed to model the relationship between these features and the EEG signals. Model quality was assessed as the EEG prediction correlation (Pearson's r) under leave-one-out cross-validation. To isolate acoustically invariant phonetic encoding, EEG prediction gains (F-S) were calculated by subtracting the prediction correlation of the acoustic-only TRFs from that of the acoustic-phonetic TRFs. Statistical analyses included repeated measures ANOVA and post-hoc tests.
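The prediction-gain logic described above can be illustrated with a minimal sketch on synthetic data. This is not the study's pipeline: the ridge penalty `lam`, the 16-sample lag window, the trial counts, and the simulated signals are all assumptions chosen for illustration, and the helper names (`lag_matrix`, `fit_trf`, `loo_prediction_r`) are hypothetical. The sketch fits a forward TRF by ridge regression on time-lagged features, scores it with leave-one-trial-out cross-validation, and computes the gain as the difference between the acoustic-phonetic and acoustic-only prediction correlations.

```python
import numpy as np

def lag_matrix(stim, lags):
    """Stack time-shifted copies of stim (time x features), one per non-negative lag."""
    n_t, n_f = stim.shape
    X = np.zeros((n_t, n_f * len(lags)))
    for i, lag in enumerate(lags):
        # Row t of this column block holds the stimulus at time t - lag.
        X[lag:, i * n_f:(i + 1) * n_f] = stim[:n_t - lag]
    return X

def fit_trf(X, eeg, lam):
    """Ridge-regression TRF weights: (X'X + lam*I)^-1 X'y (ridge is an assumption)."""
    XtX = X.T @ X
    return np.linalg.solve(XtX + lam * np.eye(XtX.shape[0]), X.T @ eeg)

def loo_prediction_r(stims, eegs, lags, lam=1.0):
    """Leave-one-trial-out: train on the other trials, then correlate the
    prediction with the held-out EEG (Pearson's r, averaged over channels and folds)."""
    Xs = [lag_matrix(s, lags) for s in stims]
    fold_r = []
    for k in range(len(Xs)):
        X_train = np.vstack([X for i, X in enumerate(Xs) if i != k])
        y_train = np.vstack([e for i, e in enumerate(eegs) if i != k])
        pred = Xs[k] @ fit_trf(X_train, y_train, lam)
        r = [np.corrcoef(pred[:, c], eegs[k][:, c])[0, 1] for c in range(pred.shape[1])]
        fold_r.append(np.mean(r))
    return float(np.mean(fold_r))

# Synthetic demo (all sizes and signals are made up for illustration).
rng = np.random.default_rng(0)
lags = list(range(16))                      # hypothetical lag window in samples
n_trials, n_t, n_ch = 6, 2000, 4
S = [rng.standard_normal((n_t, 8)) for _ in range(n_trials)]                # 8-band "spectrogram"
F = [(rng.random((n_t, 14)) < 0.3).astype(float) for _ in range(n_trials)]  # 14 binary "features"
w_s = rng.standard_normal((8 * len(lags), n_ch))
w_f = rng.standard_normal((14 * len(lags), n_ch))
# Simulated EEG depends on both S and F, plus noise.
eeg = [lag_matrix(s, lags) @ w_s + lag_matrix(f, lags) @ w_f
       + 10 * rng.standard_normal((n_t, n_ch)) for s, f in zip(S, F)]

r_s = loo_prediction_r(S, eeg, lags)                                          # acoustic-only
r_fs = loo_prediction_r([np.hstack([f, s]) for f, s in zip(F, S)], eeg, lags)  # acoustic-phonetic
gain = r_fs - r_s                                                             # the "F-S" gain
print(f"r_S={r_s:.3f}  r_FS={r_fs:.3f}  gain={gain:.3f}")
```

Because the simulated EEG contains a component driven by F that S cannot explain, the gain comes out positive here. On real data, a gain near zero would indicate that the phonetic features add nothing beyond the acoustics.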
Key Findings
Multivariate TRF analysis revealed that EEG tracking of phonetic features (F), but not sound acoustics (S), significantly increased with age in the delta (1-4 Hz) and theta (4-8 Hz) bands. Four-month-olds showed the strongest acoustic tracking, while statistically significant phonetic feature encoding emerged from 7 months of age. The analysis of acoustically invariant phonetic encoding (F-S) showed a significant age effect in both the delta and theta bands, with gains greater than zero emerging from 7 months. This pattern was consistent across multiple EEG channels, with centro-frontal electrodes showing larger correlations. Topographic patterns became progressively more similar to those of adults with age. A control analysis showed no significant effect of signal-to-noise ratio (SNR) on the results.
Discussion
This study provides longitudinal neurophysiological evidence of increasing phonetic encoding in the human cortex during the first year of life, consistent with the primary hypothesis. Invariant encoding emerged from 7 months in both the delta and theta bands, in line with previous adult studies. The strongest gains were in the delta band, consistent with the rhythmic and stress patterns of nursery rhymes. This contrasts with studies using simplified stimuli, in which younger infants show phonetic discrimination that may not generalize to natural speech. Nursery rhymes are natural, ecologically valid stimuli that provide a rich phonological environment for studying phonetic category learning. The study used continuous neurophysiological measurement and a forward TRF framework to evaluate phonetic category encoding directly, rather than relying on sound discrimination metrics. The findings improve our understanding of typical speech processing development.
Conclusion
This study demonstrates the emergence of acoustically invariant phonetic category encoding in infants during natural speech listening, using a novel methodology with ecologically valid stimuli. The findings provide critical insights into the development of speech processing in typical infants, offering a more nuanced perspective than previous studies using simplified stimuli. Future research should explore cross-linguistic comparisons, investigating the relationship between language characteristics and phonetic encoding development, as well as the relationship between speech TRFs and other aspects of cognition in infants and those with language learning disorders.
Limitations
The study's reliance on a specific type of continuous speech (nursery rhymes) may limit the generalizability of the findings to other speech contexts. The use of audio-visual stimuli, while ecologically valid, may have influenced the results. Although visual motion was controlled for with a dedicated regressor, visual input may still have affected the neural responses. Further research is needed to disentangle the auditory and visual components.