
Emergence of the cortical encoding of phonetic features in the first year of life
G. M. Di Liberto, A. Attaheri, et al.
Discover how infants, even before uttering their first words, develop sophisticated speech processing skills! This enlightening study conducted by Giovanni M. Di Liberto and collaborators investigates cortical phonetic feature encoding in infants, revealing neurophysiological evidence of pre-verbal phonetic category learning.
Introduction
The study addresses when and how the infant cortex begins to encode phonetic feature categories during naturalistic continuous speech listening. Prior infant research largely relied on behavioural (e.g., head-turn) and neural (e.g., MMN/MMR) paradigms using discrete stimuli, which do not capture neural encoding during continuous speech and often focus on a limited set of contrasts. Although infants show early sensitivity to speech rhythm and some phonetic discrimination, it remains unclear when acoustically invariant, categorical phonetic encoding emerges in the brain during natural listening. The authors hypothesised that phonetic feature encoding, invariant to acoustic variability, would emerge across the first year of life, becoming evident from around 6–7 months, while cortical tracking of basic acoustics (e.g., envelope/spectrogram) would not necessarily increase with age. Using EEG and temporal response function (TRF) analyses with nursery rhymes, the study charts the developmental trajectory of cortical phonetic feature encoding at 4, 7, and 11 months, compared with adults.
Literature Review
Neural studies of infant speech processing have been constrained by methods requiring discrete stimuli (e.g., MMN/MMR), limiting ecological validity and phonological richness. Cross-sectional EEG work has examined multiple vowels in isolation and suggested early development of perceptual vowel spaces, but these stimuli omitted consonants and broader phonology. In adults and older children, neural tracking has revealed encoding beyond acoustics, including phonetic features, phonotactics, and semantic measures, with phonetic encoding linked to phonemic awareness and second-language proficiency. Intracranial recordings in adults localized phonetic feature encoding in superior temporal regions. However, infant work demonstrating envelope tracking in naturalistic contexts had not established phonetic feature encoding. This gap motivated testing the emergence and acoustic invariance of phonetic category encoding in infants during continuous, ecologically valid speech (nursery rhymes).
Methodology
Design: Longitudinal EEG study re-analyzing an existing dataset. Fifty full-term, typically developing infants (24 male, 26 female) were recorded at 3 time points: 4 months (mean 115.6 ± 5.3 days), 7 months (212.5 ± 7.2 days), and 11 months (333.0 ± 5.5 days). Seventeen monolingual English-speaking adults (after exclusions) completed the same task. Ethics approval: University of Cambridge Psychology Research Ethics Committee; written informed consent obtained.
Stimuli and task: Participants watched videos of a female native speaker of British English singing or chanting 18 nursery rhymes in an infant-directed style. Adults heard the full set 3 times (total ~20 min 33 s); infants heard at least two repetitions (~13 min 42 s). A metronome used during recording (absent from the final audio) kept the rhythm consistent. Infant gaze was monitored with an eye tracker (Tobii TX300).
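The two reported durations are consistent with each other. A quick check, assuming every pass through the 18-rhyme set lasts the same time, recovers the length of one pass and the minimum infant exposure:

```python
# Reported adult total (~20 min 33 s) covers 3 passes through the 18 rhymes.
ADULT_TOTAL_S = 20 * 60 + 33
ADULT_REPS = 3
INFANT_MIN_REPS = 2  # infants heard at least two passes


def fmt(seconds: int) -> str:
    """Format a duration in whole seconds as 'M min S s'."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes} min {secs} s"


one_pass_s = ADULT_TOTAL_S // ADULT_REPS      # one pass through all 18 rhymes
infant_min_s = one_pass_s * INFANT_MIN_REPS   # minimum infant listening time

print(fmt(one_pass_s), "|", fmt(infant_min_s))
```

One pass works out to 6 min 51 s, and two passes to 13 min 42 s, matching the infant figure.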
EEG acquisition and preprocessing: EEG was recorded with an EGI Geodesic Sensor Net (infants: 64 channels; adults: 128 channels, later spline-interpolated to 64 to match the infant montage). Four facial electrodes were excluded; the remaining 60 channels were band-pass filtered at 0.1–8 Hz (Butterworth, order 2) and downsampled to 50 Hz. Artifact Subspace Reconstruction (EEGLAB clean_asr) removed artifacts. Noisy channels (probability or kurtosis >3 SD) were interpolated (spherical). Data were re-referenced to linked mastoids and the mastoid channels were then removed, leaving 58 channels. Repeated trials were averaged. Three infants were excluded because of excessive noise in at least one session. For evaluation, a "ground-truth" EEG trace per age group and band (0.1–1 Hz; 1–4 Hz delta; 4–8 Hz theta) was computed by averaging across channels and participants within each group.
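The filtering, downsampling and re-referencing steps can be sketched with SciPy. This is a minimal illustration, not the authors' EEGLAB pipeline: the 500 Hz acquisition rate, the zero-phase (forward-backward) filtering and the mastoid channel indices are assumptions, and ASR and bad-channel interpolation are omitted.

```python
import numpy as np
from scipy.signal import butter, resample_poly, sosfiltfilt

FS_RAW = 500  # Hz; ASSUMED acquisition rate (not given in the summary)
FS_OUT = 50   # Hz; rate after downsampling, as reported
BANDS = {"0.1-1 Hz": (0.1, 1.0), "1-4 Hz": (1.0, 4.0), "4-8 Hz": (4.0, 8.0)}


def bandpass(x, lo, hi, fs, order=2):
    """Zero-phase Butterworth band-pass along the time axis (axis 0)."""
    sos = butter(order, [lo, hi], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x, axis=0)


def rereference_linked_mastoids(eeg, mastoid_idx):
    """Subtract the mean of the mastoid channels, then drop them."""
    ref = eeg[:, mastoid_idx].mean(axis=1, keepdims=True)
    keep = np.setdiff1d(np.arange(eeg.shape[1]), mastoid_idx)
    return (eeg - ref)[:, keep]


def preprocess(eeg, mastoid_idx):
    """eeg: (samples, 60) array, i.e. after dropping the 4 facial electrodes."""
    x = bandpass(eeg, 0.1, 8.0, FS_RAW)                    # 0.1-8 Hz
    x = resample_poly(x, up=FS_OUT, down=FS_RAW, axis=0)   # -> 50 Hz
    # ASR (clean_asr) and spherical interpolation of noisy channels would go here.
    return rereference_linked_mastoids(x, mastoid_idx)    # -> 58 channels


def split_bands(x):
    """Split 50 Hz EEG into the three analysis bands."""
    return {name: bandpass(x, lo, hi, FS_OUT) for name, (lo, hi) in BANDS.items()}
```

With a 60 s random test signal at the assumed rate and two hypothetical mastoid indices, `preprocess` returns a (3000, 58) array and `split_bands` returns three arrays of that shape.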
Feature extraction: Acoustic spectrogram (S): 8 frequency bands between 0.25 and 8 kHz, spaced according to the Greenwood equation (roughly logarithmic). Broadband envelope: computed by summing across bands; its half-wave rectified derivative (D) was used as a nuisance regressor. Phonetic features (F): 14 articulatory categories (voiced, unvoiced, plosive, fricative, nasal, strident, labial, coronal, dorsal, anterior, front, back, high, low). Phoneme timings were derived from the transcripts and syllable rate/onsets, aligned in TextGrid format, and manually adjusted in Praat. Visual motion (V): frame-to-frame change in mean luminance, used as a nuisance regressor.
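A sketch of the three feature types, under stated assumptions: the Greenwood constants are the standard human values (A = 165.4, a = 2.1, k = 0.88), the phoneme-to-feature table is a hypothetical fragment covering only four phonemes, and each feature is coded as 1 for the full duration of any phoneme that carries it (one common choice; onset-only coding is another).

```python
import numpy as np

FEATURES = ["voiced", "unvoiced", "plosive", "fricative", "nasal", "strident",
            "labial", "coronal", "dorsal", "anterior", "front", "back", "high", "low"]

# HYPOTHETICAL, incomplete phoneme-to-feature table, for illustration only.
PHONEME_FEATURES = {
    "b": {"voiced", "plosive", "labial"},
    "m": {"voiced", "nasal", "labial"},
    "s": {"unvoiced", "fricative", "strident", "coronal", "anterior"},
    "i": {"voiced", "front", "high"},
}


def greenwood_edges(f_lo=250.0, f_hi=8000.0, n_bands=8, A=165.4, a=2.1, k=0.88):
    """Band edges equally spaced in cochlear position (human Greenwood map)."""
    def place(f):  # frequency (Hz) -> relative cochlear position
        return np.log10(f / A + k) / a
    x = np.linspace(place(f_lo), place(f_hi), n_bands + 1)
    return A * (10 ** (a * x) - k)  # position -> frequency (Hz)


def envelope_derivative(spectrogram):
    """Half-wave rectified derivative of the broadband envelope.
    spectrogram: (samples, bands) -> (samples,)"""
    env = spectrogram.sum(axis=1)
    return np.maximum(np.diff(env, prepend=env[0]), 0.0)


def phonetic_feature_matrix(alignment, n_samples, fs=50):
    """alignment: list of (phoneme, start_s, end_s), e.g. read from a TextGrid.
    Returns a binary (n_samples, 14) matrix."""
    F = np.zeros((n_samples, len(FEATURES)))
    for phone, start, end in alignment:
        i0, i1 = int(round(start * fs)), int(round(end * fs))
        for feat in PHONEME_FEATURES.get(phone, ()):
            F[i0:i1, FEATURES.index(feat)] = 1.0
    return F
```

`greenwood_edges()` returns 9 increasing edges from 250 Hz to 8 kHz, i.e. 8 bands.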
TRF modeling: Multivariate Temporal Response Functions (mTRF toolbox) modeled the forward mapping from stimulus features to EEG in a lag window of −100 to 500 ms, using Tikhonov regularization (lambda grid 0.01–10) with leave-one-out cross-validation across nursery rhyme trials. Prediction performance was quantified as Pearson’s r between each participant’s predicted EEG and the group ground-truth EEG, averaged across channels.
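The forward TRF idea (a regularised linear map from time-lagged stimulus features to each EEG channel) can be sketched in NumPy. This is not the mTRF-Toolbox, which is a MATLAB package; the sketch assumes a ridge penalty (Tikhonov with an identity matrix), four log-spaced λ values across the 0.01–10 range, no intercept, and that each model is trained on one participant's EEG and scored against a separate target trace such as the group ground truth. For brevity the same leave-one-trial-out loop both picks λ and reports r; a nested loop would avoid the resulting optimistic bias.

```python
import numpy as np

FS = 50  # Hz, sampling rate after downsampling
# Lags from -100 ms to 500 ms in samples: -5 ... 25 (31 lags)
LAGS = np.arange(int(round(-0.1 * FS)), int(round(0.5 * FS)) + 1)


def lag_matrix(stim, lags):
    """(T, n_feat) -> (T, n_lags * n_feat); block j holds stim delayed by lags[j]."""
    T, n_feat = stim.shape
    out = np.zeros((T, len(lags) * n_feat))
    for j, lag in enumerate(lags):
        shifted = np.zeros_like(stim, dtype=float)
        if lag >= 0:
            shifted[lag:] = stim[:T - lag]   # EEG at t sees stimulus at t - lag
        else:
            shifted[:lag] = stim[-lag:]      # negative lag: stimulus after EEG
        out[:, j * n_feat:(j + 1) * n_feat] = shifted
    return out


def fit_trf(X, y, lam):
    """Ridge solution W = (X'X + lam*I)^-1 X'y for y of shape (T, n_channels)."""
    XtX = X.T @ X
    return np.linalg.solve(XtX + lam * np.eye(XtX.shape[0]), X.T @ y)


def pearson_per_channel(a, b):
    """Pearson r between matching columns of a and b."""
    a = (a - a.mean(0)) / a.std(0)
    b = (b - b.mean(0)) / b.std(0)
    return (a * b).mean(0)


def loo_cv(stims, eegs, targets, lambdas):
    """Leave-one-trial-out CV over per-trial lists of arrays.
    Trains on `eegs`, scores predictions against `targets`;
    returns (best lambda, channel-averaged r at that lambda)."""
    Xs = [lag_matrix(s, LAGS) for s in stims]
    scores = []
    for lam in lambdas:
        rs = []
        for k in range(len(Xs)):
            X_tr = np.vstack([X for i, X in enumerate(Xs) if i != k])
            y_tr = np.vstack([y for i, y in enumerate(eegs) if i != k])
            W = fit_trf(X_tr, y_tr, lam)
            rs.append(pearson_per_channel(Xs[k] @ W, targets[k]).mean())
        scores.append(np.mean(rs))
    best = int(np.argmax(scores))
    return lambdas[best], scores[best]
```

A typical call would be `loo_cv(stims, eegs, ground_truth, np.logspace(-2, 1, 4))`, where `stims` holds one feature matrix (e.g. S or F) per nursery rhyme.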
Analyses:
- Separate TRFs for the acoustic spectrogram (S) and phonetic features (F) assessed age effects on tracking in the low (0.1–1 Hz), delta (1–4 Hz), and theta (4–8 Hz) bands.
- Categorical, acoustically invariant phonetic encoding was isolated by comparing an acoustic-phonetic model (S+D+V+F) with an acoustic-only model (S+D+V). The EEG prediction gain (FS−S) was computed by subtracting the acoustic-only prediction r from the acoustic-phonetic prediction r, and was examined in the bands where F showed age effects (delta, theta).
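- Topographies, individual trajectories, and SNR control: spatial distributions of prediction r were examined per age group, individual developmental trajectories were inspected, and a Friedman test checked for SNR differences across ages (none found).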
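The logic of the prediction gain is that F can only raise held-out accuracy if the EEG contains variance that the acoustic regressors S, D and V cannot explain. The toy simulation below illustrates this with hypothetical data and deliberately simplified models (lag-0 ridge regression and a single train/test split instead of lagged TRFs with leave-one-trial-out cross-validation):

```python
import numpy as np


def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression weights."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)


def holdout_r(X, y, split, lam=1.0):
    """Fit on samples [:split]; Pearson r between prediction and y on [split:]."""
    w = ridge_fit(X[:split], y[:split], lam)
    return np.corrcoef(X[split:] @ w, y[split:])[0, 1]


def prediction_gain(X_acoustic, X_full, y, split):
    """EEG prediction gain: r(acoustic + phonetic) minus r(acoustic only)."""
    return holdout_r(X_full, y, split) - holdout_r(X_acoustic, y, split)


# Simulated stand-ins (HYPOTHETICAL data, no time lags)
rng = np.random.default_rng(1)
T, split = 2000, 1000
ones = np.ones((T, 1))                          # intercept column
S = rng.standard_normal((T, 8))                 # "spectrogram" bands
D = rng.standard_normal((T, 1))                 # "envelope derivative"
V = rng.standard_normal((T, 1))                 # "visual motion"
F = (rng.random((T, 14)) < 0.2).astype(float)   # binary "phonetic features"
X_acoustic = np.hstack([ones, S, D, V])         # S+D+V
X_full = np.hstack([X_acoustic, F])             # S+D+V+F

noise = rng.standard_normal(T)
eeg_acoustic_only = S.sum(axis=1) + noise              # no phonetic component
eeg_phonetic = S.sum(axis=1) + F.sum(axis=1) + noise   # extra phonetic component

gain_null = prediction_gain(X_acoustic, X_full, eeg_acoustic_only, split)
gain_phonetic = prediction_gain(X_acoustic, X_full, eeg_phonetic, split)
print(f"gain without phonetic component: {gain_null:+.3f}")
print(f"gain with phonetic component:    {gain_phonetic:+.3f}")
```

When the simulated EEG has no phonetic component, the gain stays near zero; adding one produces a clearly positive gain, which is the pattern the one-sample tests on FS−S look for.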
Statistics: Repeated measures ANOVA across infant ages with corrections (Greenhouse–Geisser when sphericity violated); nonparametric Friedman tests when normality violated; Wilcoxon signed-rank tests for post hoc and one-sample tests; FDR correction applied where appropriate. Descriptives as mean ± SE.
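The summary does not say which FDR procedure was used. Assuming the common Benjamini–Hochberg method, the correction can be sketched as follows (the p-values in the usage note are illustrative, not taken from the study):

```python
import numpy as np


def fdr_bh(pvals, alpha=0.05):
    """Benjamini-Hochberg FDR.
    Returns (reject mask, adjusted p-values), both in the input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    adj_sorted = np.minimum.accumulate(ranked[::-1])[::-1]
    adj_sorted = np.minimum(adj_sorted, 1.0)
    adjusted = np.empty(m)
    adjusted[order] = adj_sorted
    return adjusted <= alpha, adjusted
```

For example, `fdr_bh([0.01, 0.04, 0.03, 0.20])` rejects only the first hypothesis: its adjusted p-value is 0.04, while the next two share an adjusted value of about 0.053.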
Key Findings
- Phonetic feature tracking (TRF_F) increased with age in infants:
  - Delta (1–4 Hz): F(2,92)=4.6, p=0.013, ηp²=0.091.
  - Theta (4–8 Hz): F(2,92)=5.8, p=0.004, ηp²=0.112.
  - No significant age effect in the low band (0.1–1 Hz): F(1.95,89.7)=0.9, p=0.427.
- Acoustic spectrogram tracking (TRF_S) showed a different pattern:
  - Strongest tracking at 4 months in the low and delta bands:
    - Low: F(1.55,71.2)=6.9, p=0.002, ηp²=0.131.
    - Delta: F(1.74,80.2)=6.8, p=0.002, ηp²=0.129.
  - No significant age effect in theta: F(2,92)=0.5, p=0.580.
- Categorical, acoustically invariant phonetic encoding (EEG prediction gain FS−S) emerged from 7 months:
  - Age effect in delta: F(2,92)=6.0, p=0.003, ηp²=0.115; gains >0 from 7 months (4 mo: p=0.554; 7 mo: p=0.044; 11 mo: p=0.002; adults: p=0.038). Post hoc: 4→7 mo p=0.041; 4→11 mo p=0.004; 7→11 mo n.s. (p=0.220).
  - Age effect in theta: F(2,92)=5.0, p=0.009, ηp²=0.098; gains >0 from 7 months (4 mo: p=0.722; 7 mo: p=0.006; 11 mo: p=0.007; adults: p=0.723, n.s.). Post hoc: 4→7 mo p=0.033; 4→11 mo p=0.022; 7→11 mo n.s. (p=0.849).
- Topographies: delta-band phonetic tracking showed centro-frontal maxima at all ages, and infant topographies became progressively more similar to the adult map (mean bootstrap correlation with adults: r=0.44 at 4 mo, 0.58 at 7 mo, 0.60 at 11 mo).
- Control: no SNR differences across ages (Friedman χ²(2)=0.2, p=0.917).
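The reported effect sizes can be cross-checked against the F statistics, since for a one-way repeated-measures ANOVA ηp² = F·df1 / (F·df1 + df2). A minimal sketch, assuming all 47 analysed infants (50 recruited minus the 3 excluded) entered every ANOVA:

```python
def partial_eta_squared(F, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return F * df_effect / (F * df_effect + df_error)


# Uncorrected dfs for a one-way repeated-measures ANOVA: 3 ages, 47 infants.
n_infants, n_ages = 50 - 3, 3
df_effect = n_ages - 1                        # 2
df_error = (n_ages - 1) * (n_infants - 1)     # 92

# (F, df1, df2, reported eta_p^2); fractional dfs are Greenhouse-Geisser corrected.
reported = {
    "TRF_F 1-4 Hz":   (4.6, 2, 92, 0.091),
    "TRF_F 4-8 Hz":   (5.8, 2, 92, 0.112),
    "TRF_S 0.1-1 Hz": (6.9, 1.55, 71.2, 0.131),
    "TRF_S 1-4 Hz":   (6.8, 1.74, 80.2, 0.129),
    "gain 1-4 Hz":    (6.0, 2, 92, 0.115),
    "gain 4-8 Hz":    (5.0, 2, 92, 0.098),
}
for name, (F, d1, d2, eta) in reported.items():
    print(f"{name}: {partial_eta_squared(F, d1, d2):.3f} (reported {eta})")
```

Each recomputed value agrees with the reported ηp² to within rounding, and the uncorrected degrees of freedom come out as (2, 92). The Greenhouse–Geisser rows keep df2/df1 ≈ 46 because the correction multiplies both degrees of freedom by the same ε.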
Discussion
Findings provide longitudinal neurophysiological evidence that cortical encoding of phonetic feature categories strengthens across the first year of life during naturalistic nursery rhyme listening. Importantly, the increase pertains to phonetic category encoding rather than general acoustic tracking, which was strongest at 4 months in the low and delta bands. The emergence of acoustically invariant phonetic encoding from 7 months aligns with developmental accounts of increased selectivity for native contrasts during the latter half of the first year. The results suggest that behavioural syllable discrimination in very young infants may sometimes reflect acoustic differences or simplified task demands rather than robust categorical encoding during continuous speech. Encoding in both the delta and theta bands is consistent with the adult TRF literature, with the delta band particularly salient for infant-directed nursery rhymes, which emphasize rhythm and stress patterns. Using forward TRFs with continuous, ecologically valid stimuli overcomes the limitations of discrete MMN/MMR paradigms and enables direct assessment of phonetic category encoding during natural listening. The developmental trajectory identified here offers a platform for investigating how phonetic encoding relates to other cognitive processes (attention, prediction) and for studying atypical trajectories (e.g., dyslexia, developmental language disorder).
Conclusion
This study demonstrates that the infant cortex progressively acquires acoustically invariant phonetic feature encoding during the first year of life, with robust emergence from around 7 months, as measured by EEG TRFs during naturalistic nursery rhyme listening. The work advances methods for assessing phonetic category encoding in pre-verbal infants using continuous speech, distinguishing phonetic processing from acoustic tracking. Future research should test generalization to auditory-only continuous speech, examine cross-linguistic differences and bilingual development, and relate infant phonetic TRFs to later language outcomes and cognitive factors such as attention and prediction. Applications include early identification and mechanistic understanding of developmental language disorders.
Limitations
- Stimuli were audio-visual nursery rhymes with strong rhythmic structure; results may differ for auditory-only or less rhythmic natural speech.
- Infant EEG topographies change with age due to anatomical development, limiting channel-wise correspondence; analyses averaged across channels, a conservative approach that may understate localized effects.
- The study re-analyzed an existing dataset; although SNR controls indicated no group differences, infant EEG inherently has variable SNR.
- Generalizability to other languages and to non–infant-directed speech remains to be established.
- While invariant encoding emerged from 7 months, mechanisms underlying earlier behavioural discrimination in simplified tasks were not directly tested here.