The cortical representation of language timescales is shared between reading and listening

C. Chen, T. Dupré la Tour, et al.

This study offers insight into how our brains process language, showing that reading and listening share cortical representations across different language timescales. The research, conducted by Catherine Chen, Tom Dupré la Tour, Jack L. Gallant, Daniel Klein, and Fatma Deniz, sheds light on the cognitive mechanisms underlying language integration.

Introduction
Human language comprehension involves integrating sensory input into a hierarchy of representations, progressing from low-level features (e.g., phonemes, letterforms) to higher-level features (e.g., syntax, semantics, narrative arc). Previous research has explored brain representations at different levels of this hierarchy, but it remains unclear whether those representations are shared between reading and listening. While low-level processing differs between modalities (visual cortex for reading, auditory cortex for listening), many cortical areas process both written and spoken language. This study asks whether representations of higher-level language components are organized similarly for both modalities or whether they are overlapping yet independent. Existing research lacks a direct comparison of the cortical organization of high-level language components between reading and listening: most studies compare brain responses in general terms without specifying which features are represented; others examine only a limited set of components (low-level sensory features, word semantics, phonemes) and so do not differentiate the language hierarchy in detail; and studies that do differentiate levels typically focus on a single modality. This research therefore addresses the question of whether the cortical organization of the language hierarchy is shared between reading and listening by operationalizing levels of the hierarchy as "language timescales" (spectral components of the language stimulus that vary over specific numbers of words) and comparing their cortical representation in both modalities.
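To make this operationalization concrete, the sketch below (Python) illustrates how a "timescale," defined as a range of word counts, corresponds to a band of temporal frequencies measured over the word sequence. The band edges listed here are illustrative assumptions consistent with the 2–4 word to 256+ word range described in the text, not the study's exact specification.

```python
import numpy as np

# Hypothetical timescale bands, each defined by the number of words over which
# a linguistic feature varies (its period, in words). Edges chosen for illustration.
period_bands_in_words = [(2, 4), (4, 8), (8, 16), (16, 32),
                         (32, 64), (64, 128), (128, 256), (256, np.inf)]

# A component that varies over P words completes one cycle every P words, so its
# frequency is 1 / P cycles per word. Longer timescales therefore correspond to
# lower-frequency bands of the stimulus representation.
for low_words, high_words in period_bands_in_words:
    f_high = 1.0 / low_words                                  # shortest period -> highest frequency
    f_low = 0.0 if np.isinf(high_words) else 1.0 / high_words  # open-ended band reaches down to 0
    print(f"{low_words}-{high_words} words  ->  {f_low:.4f}-{f_high:.4f} cycles/word")
```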
Literature Review
Prior research on language comprehension supports a hierarchical processing model in which low-level sensory input is integrated into increasingly complex representations, and fMRI studies have mapped the topographic organization of these representations in the cortex during spoken language comprehension. However, the extent to which these representations are shared across reading and listening has not been thoroughly investigated. While differences exist in early sensory processing (visual vs. auditory cortex), many cortical regions process both modalities, and the question of whether higher-level language components are represented similarly or independently across modalities remains unanswered. Existing studies often lack explicit descriptions of the represented features or focus on a limited set of components, hindering a comprehensive understanding of hierarchical language processing across modalities. This research addresses that gap with a new approach to analyzing language timescales, enabling a detailed comparison across modalities.
Methodology
Functional magnetic resonance imaging (fMRI) data were collected from nine participants (six males, three females; ages 24–36) while they read and listened to the same English narrative stories. Custom-made head molds minimized head motion, and the stories were divided into training and test datasets. In the listening condition, stories were played through headphones; in the reading condition, words were presented using a rapid serial visual presentation (RSVP) method. fMRI data were preprocessed using FSL, including motion correction and removal of low-frequency voxel response drift. To characterize language timescales, a contextual language model (BERT) was used to extract contextual embeddings of the narratives, and linear filters were applied to separate these embeddings into timescale-specific feature spaces (eight timescales, ranging from 2–4 words to 256+ words). Voxelwise encoding models, which predict BOLD activity from the timescale-specific features, were then used to estimate the timescale selectivity of each voxel separately for reading and listening. Prediction performance was quantified with Pearson correlation coefficients on held-out test data. Language-selective voxels were identified using a one-sided permutation test (P<0.05, FDR corrected). Timescale selectivity, reflecting the average timescale to which a voxel is selective, was computed for each voxel. Finally, the cortical organization of timescale selectivity was compared between reading and listening, using Pearson correlation coefficients and permutation tests to assess significance. Sensory-level feature spaces (spectrotemporal features for the auditory stimuli, motion energy features for the visual stimuli) were also constructed to separate linguistic processing from low-level sensory processing.
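The analysis logic described above can be sketched as follows. This is a minimal, illustrative implementation assuming FFT-based band-pass filtering of precomputed embeddings, ridge regression for the encoding models, and a performance-weighted average for timescale selectivity; the band edges, regularization strength, function names, and geometric band centers are assumptions, and steps such as downsampling to fMRI timepoints and modeling hemodynamic delays are omitted.

```python
import numpy as np
from numpy.fft import rfft, irfft, rfftfreq
from sklearn.linear_model import Ridge
from scipy.stats import pearsonr

# Illustrative timescale bands as periods in words (the open-ended band is capped here).
BANDS = [(2, 4), (4, 8), (8, 16), (16, 32), (32, 64), (64, 128), (128, 256), (256, 512)]

def bandpass(features, low_period, high_period):
    """Keep only spectral components of a (timepoints x dims) feature matrix whose
    period (in samples) lies between low_period and high_period."""
    n = features.shape[0]
    freqs = rfftfreq(n, d=1.0)                                 # cycles per sample
    keep = (freqs >= 1.0 / high_period) & (freqs <= 1.0 / low_period)
    spectrum = rfft(features, axis=0)
    spectrum[~keep] = 0.0
    return irfft(spectrum, n=n, axis=0)

def band_scores(feats_train, bold_train, feats_test, bold_test):
    """Fit one ridge encoding model per timescale band and return the test-set
    Pearson correlation of each voxel for each band (n_bands x n_voxels)."""
    scores = []
    for low, high in BANDS:
        X_tr, X_te = bandpass(feats_train, low, high), bandpass(feats_test, low, high)
        pred = Ridge(alpha=1.0).fit(X_tr, bold_train).predict(X_te)
        scores.append([pearsonr(pred[:, v], bold_test[:, v])[0]
                       for v in range(bold_test.shape[1])])
    return np.array(scores)

def timescale_selectivity(scores):
    """Average timescale (in words) to which each voxel is selective: a prediction-
    performance-weighted average of each band's geometric-mean period."""
    centers = np.array([np.sqrt(lo * hi) for lo, hi in BANDS])  # geometric band centers
    w = np.clip(scores, 0.0, None)                              # ignore negative correlations
    return (w * centers[:, None]).sum(axis=0) / (w.sum(axis=0) + 1e-8)
```

Fitting these models once per modality yields one timescale-selectivity value per voxel for reading and one for listening, which is the quantity compared in the findings below.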
Key Findings
Comparisons of timescale representations between reading and listening revealed a strong positive correlation in timescale selectivity across language-selective voxels for each participant (P<0.001 for all participants). Visual inspection of cortical maps confirmed a similar spatial organization of timescale selectivity in both modalities. Both modalities showed spatial gradients from intermediate to long timescales running from superior to inferior temporal cortex and from posterior to anterior prefrontal cortex, and the medial parietal cortex was selective for long timescales in both. These findings were robust to variations in the feature extraction method. Low-level sensory features showed modality-specific activation (early visual cortex for reading, auditory cortex for listening), in contrast to the widespread, modality-independent linguistic processing across temporal, parietal, and prefrontal cortices. The order of presentation (reading first or listening first) influenced timescale selectivity, with longer timescales slightly more prominent in the first modality presented; nevertheless, the overall cortical organization remained consistent across modalities. Group-level analyses confirmed the strong correlation in timescale selectivity between reading and listening (r=0.48). Analyses of the full timescale selectivity profile (selectivity for each timescale separately) likewise showed consistent correlations between modalities, and the cortical distribution of selectivity for each individual timescale was highly similar across modalities: short timescales were represented in posterior prefrontal cortex and superior temporal cortex, intermediate timescales in temporal, prefrontal, and medial parietal cortices, and long timescales in prefrontal cortex, precuneus, temporoparietal junction, and inferior temporal cortex.
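The central comparison reported here, correlating each participant's reading and listening selectivity values across language-selective voxels and assessing significance with a permutation test, can be sketched as below. The simple shuffle-based permutation scheme and the synthetic example data are assumptions for illustration, not the study's exact procedure.

```python
import numpy as np
from scipy.stats import pearsonr

def compare_selectivity_maps(sel_reading, sel_listening, n_permutations=10_000, seed=0):
    """Correlate reading and listening timescale-selectivity values across language-
    selective voxels and estimate a one-sided permutation p-value for the correlation."""
    rng = np.random.default_rng(seed)
    observed, _ = pearsonr(sel_reading, sel_listening)

    null = np.empty(n_permutations)
    for i in range(n_permutations):
        shuffled = rng.permutation(sel_listening)       # break the voxel correspondence
        null[i], _ = pearsonr(sel_reading, shuffled)

    p_value = (np.sum(null >= observed) + 1) / (n_permutations + 1)
    return observed, p_value

# Synthetic usage example: two noisy copies of the same underlying selectivity map
# should yield a strong positive correlation and a small p-value.
true_map = np.random.default_rng(1).uniform(2, 256, size=5_000)
r, p = compare_selectivity_maps(true_map + np.random.default_rng(2).normal(0, 20, 5_000),
                                true_map + np.random.default_rng(3).normal(0, 20, 5_000))
print(f"r = {r:.2f}, one-sided p = {p:.4f}")
```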
Discussion
The findings demonstrate a striking similarity in the cortical organization of language timescale representations between reading and listening. This suggests that, after initial sensory processing, the integration of linguistic information proceeds similarly regardless of input modality, supporting the hypothesis that higher-level language processing relies on common neural mechanisms across modalities. The observed spatial gradients in timescale selectivity within established cortical networks support the idea of a continuous gradient of language processing rather than discrete, specialized networks. The study builds upon previous findings by directly comparing reading and listening representations within individual participants and by analyzing a finer range of timescales. The use of the same naturalistic narrative stimuli for both modalities addresses a limitation of previous research, which used different stimulus types (isolated sentences vs. full narratives) and may explain inconsistencies in earlier findings. This highlights the importance of ecologically valid, naturalistic stimuli for revealing shared neural mechanisms in higher-level language processing. By explicitly modeling linguistic features, the study also helps differentiate linguistic processing from modality-specific sensory processing and from high-level control processes, offering a more refined understanding of brain activity during language comprehension.
Conclusion
This study introduces a novel methodology for investigating language timescales in the brain and provides compelling evidence that the cortical representation of these timescales is shared between reading and listening. The strong correlation in timescale selectivity between modalities highlights the robustness of this finding. Future research could explore how other variations in language processing (e.g., changes in presentation speed or task demands) affect timescale representations, potentially using techniques with higher temporal resolution (e.g., EEG, ECoG). Investigating alternative language models might also yield more refined estimates of timescale selectivity.
Limitations
The temporal resolution of the fMRI data (TR of 2 seconds) may limit the detection of very fine-grained distinctions in timescale selectivity, and the removal of low-frequency voxel response drift during preprocessing may have discarded information about very long timescales. Future studies using techniques with higher temporal resolution (EEG, ECoG), or preprocessing that preserves slow signal components, are needed to address these limitations. The reliance on current language model embeddings is a further limitation, as these embeddings may not capture all stimulus features; improved language models could yield more accurate estimates of timescale selectivity.