
Linguistics and Languages
German in childhood and Latin in adolescence: On the bidialectal nature of lexical access in English
A. E. Hernandez, J. Ronderos, et al.
This study by Arturo E. Hernandez and colleagues examines how the etymology of words affects lexical processing in English. Native speakers process Germanic words faster than Latin-based ones, whereas non-native speakers find Latin-based words easier.
Introduction
The study examines whether English lexical processing reflects a bidialectal pattern rooted in etymology: Germanic-origin words forming the base of early vocabulary and Latin-origin words increasing with later learning. The authors combined English Lexicon Project (ELP) items with Age of Acquisition (AoA) norms and etymological classifications (Germanic vs. Latin) for over 20,000 words, and incorporated English Crowdsourcing Project (ECP) data from native and L2 English speakers. They hypothesized that early-learned words would typically have Germanic roots, while later-learned words would be Latin-based. They further predicted that etymology would affect reaction times and accuracy in word recognition tasks even when controlling for AoA, frequency, and length. Finally, they anticipated that L2 English speakers would show relatively better performance for Latin-based words, reflecting an acquisition path emphasizing advanced, Latin-derived vocabulary.
Literature Review
Prior research has robustly documented AoA effects across tasks such as reading, lexical decision, picture naming, and eye-tracking, beyond influences of frequency and length (e.g., Barry et al., 2006; Juhasz, 2005; Zevin & Seidenberg, 2002, 2004; Juhasz et al., 2019; Dirix & Duyck, 2017). The network plasticity hypothesis and related mapping theories suggest early-learned words gain richer, more central representations, with later words integrating into established networks (Lambon Ralph, 2006; Zevin & Seidenberg, 2002; Brysbaert & Ellis, 2016; Chang & Lee, 2020). English exhibits a developmental shift from Germanic-origin vocabulary early to Latin-based vocabulary in formal, academic registers later (Bar-Ilan & Berman, 2007). Skilled readers are sensitive to graphotactic regularities tied to etymology (Treiman et al., 2018). Reilly and colleagues reported Latin-origin words are less imageable and that etymology relates to AoA, though earlier studies used smaller corpora and limited AoA ranges (Reilly & Kean, 2007; Reilly et al., 2007). This background motivates testing whether etymology uniquely contributes to lexical processing beyond AoA, frequency, and length, using a substantially larger word set spanning adolescence and adulthood.
Methodology
Data sources: The authors compiled 20,339 words by taking the items shared between the English Lexicon Project (ELP; 40,481 words/nonwords; Balota et al., 2007) and the AoA norms (Kuperman et al., 2012), and then added English Crowdsourcing Project (ECP) data for native speakers (Mandera et al., 2020) and L2 speakers (Brysbaert, 2020). Variables included AoA, (log) frequency, length, letters, phonemes, syllables, and performance (reaction times and accuracy) for naming and lexical decision in ELP/ECP.
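The overlap step described above amounts to an inner join on the word itself. The following is a minimal sketch of that idea, not the authors' pipeline: the three dictionaries, their field names, and every value in them are invented for illustration.

```python
# Toy stand-ins for the three resources. All words and values are invented;
# the field name "ld_rt" is a hypothetical label for a lexical decision RT.
elp = {"house": {"ld_rt": 560.0}, "edifice": {"ld_rt": 720.0}, "blorp": {"ld_rt": 810.0}}
aoa = {"house": 3.5, "edifice": 12.1, "dog": 2.8}
etymology = {"house": "Germanic", "edifice": "Latin", "dog": "Germanic"}


def overlap(elp, aoa, etymology):
    """Keep only words present in every source (an inner join on the word)."""
    shared = sorted(set(elp) & set(aoa) & set(etymology))
    return [
        {"word": w, "aoa": aoa[w], "etymology": etymology[w], **elp[w]}
        for w in shared
    ]


rows = overlap(elp, aoa, etymology)
for row in rows:
    print(row)
```

Words missing from any one source ("blorp" and "dog" in this toy data) drop out, which is why the combined set is smaller than any single resource.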
Etymology coding: Ten undergraduate raters each classified batches of ~2000 words using the Online Etymology Dictionary. They identified each word's roots (a single root vs. a compound with two roots) and categorized each root, along with any prefixes or suffixes present, as Germanic, Latin (including Romance via Latin), or neither (coded 0; e.g., acronyms, proper names, onomatopoeia, other origins). Each batch was cross-rated by a second rater; uncertain cases were adjudicated by the authors; three authors conducted final checks.
Statistical analysis: (1) AoA and etymology distributions were examined via linear regression (for etymology effects on AoA across single and compound words using Root 1 and Root 2) and chi-square tests on truncated AoA bins to assess distributional shifts in Germanic vs. Latin words across development. (2) For word recognition, separate linear regression models predicted accuracy and reaction times in ELP naming, ELP lexical decision, ECP lexical decision (monolingual), and ECP lexical decision (L2). Covariates were AoA, log frequency, and length. Etymology factors included Root 1 and Root 2 (Germanic vs. Latin vs. single-word baseline). (3) Planned contrasts (simple effects) following Fox & Weisberg (2018) were conducted to interpret interactions and compare specific etymological combinations (Germanic-Germanic, Germanic-Latin, Latin-Germanic, Latin-Latin, and corresponding single-root classes). The datasets are available at OSF: https://osf.io/fkr2j/?view_only=b8fedcffd19a4327ae5780412fd77163.
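To make the logic of step (2) concrete, here is a minimal sketch of a regression that estimates an etymology effect while holding AoA, log frequency, and length constant. It is an illustration under stated assumptions, not the authors' model: the items are invented, etymology is reduced to a single hypothetical 0/1 `is_latin` indicator rather than the Root 1 by Root 2 design, and the reaction times are generated from a known rule so that the fit can be checked.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Invented toy items: (AoA in years, log frequency, length in letters, is_latin)
items = [
    (3.0, 4.5, 3, 0), (4.5, 4.0, 5, 0), (6.0, 3.2, 4, 0), (8.0, 2.9, 6, 0),
    (7.5, 3.0, 7, 1), (10.0, 2.5, 8, 1), (12.0, 2.0, 9, 1), (13.5, 1.4, 10, 1),
]
# Reaction times built from a known rule (an assumption for this demo only):
# RT = 600 + 8*AoA - 20*logfreq + 5*length + 15*is_latin
rts = [600 + 8 * a - 20 * f + 5 * n + 15 * lat for a, f, n, lat in items]

# Design matrix: intercept, three covariates, and the etymology indicator.
design = [(1.0, a, f, n, float(lat)) for a, f, n, lat in items]
intercept, b_aoa, b_freq, b_len, b_latin = ols(design, rts)
print(round(b_latin, 3))
```

Because the covariates sit in the same model, `b_latin` is the difference between Latin and Germanic items at equal AoA, frequency, and length; here it recovers the value used to generate the toy data, up to floating-point error.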
Key Findings
AoA and etymology (ELP): A regression showed a robust relationship between AoA and etymology (F(5, 20,292) = 722.6, p < 0.001, Adj. R^2 = 0.151). Germanic single words were acquired earliest (M = 8.45 years), while Latin-Latin compounds were acquired latest (M = 11.34 years). Germanic single words were learned earlier than Latin single words (b = −2.61, p < 0.001). For compounds with Germanic Root 1, AoA did not differ by Root 2 (b = −0.03, p = 0.84). For compounds with Latin Root 1, Latin-Germanic compounds were learned earlier than Latin-Latin compounds (b = −1.65, p < 0.001). A chi-square test on truncated AoA categories confirmed distributional shifts: overall χ^2(15, N = 20,338) = 2679.95, p < 0.001, with Germanic words predominating early and Latin-based words increasing in adolescence. Compound proportions: Germanic-Germanic > Germanic-Latin across ages (χ^2(15, N = 2335) = 27.55, p < 0.025); Latin-Germanic > Latin-Latin across ages (χ^2(15, N = 1070) = 199.59, p < 0.001). Single words (N = 16,933) showed a strong shift from Germanic to Latin with age (χ^2 = 2762.98, p < 0.001).
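The reported F statistic and adjusted R^2 can be cross-checked against each other. The sketch below assumes the F value is the overall test of a model with an intercept, in which case R^2 = F·df1 / (F·df1 + df2); the function names are illustrative.

```python
def r_squared_from_f(f_stat, df_model, df_resid):
    """R^2 implied by an overall F test: R^2 = F*df1 / (F*df1 + df2)."""
    return f_stat * df_model / (f_stat * df_model + df_resid)


def adjusted_r_squared(r2, df_model, df_resid):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1), with n - p - 1 = df_resid."""
    n_minus_1 = df_model + df_resid  # p + (n - p - 1)
    return 1 - (1 - r2) * n_minus_1 / df_resid


# Values taken from the regression reported above: F(5, 20,292) = 722.6
r2 = r_squared_from_f(722.6, 5, 20292)
adj = adjusted_r_squared(r2, 5, 20292)
print(round(r2, 3), round(adj, 3))
```

Under that assumption both values round to 0.151, matching the reported adjusted R^2: etymology category accounts for roughly 15% of the variance in AoA.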
ELP naming: Covariates (AoA, log frequency, length) were significant. Etymology effects and Root1×Root2 interactions were significant for both accuracy and RT. Contrast highlights: Germanic-Germanic faster than Germanic single (ΔRT ≈ −29.1 ms, p < 0.001) and faster than Latin-Germanic (ΔRT ≈ −11.4 ms, p = 0.012). Germanic single slower than Latin-Germanic (ΔRT ≈ +17.7 ms, p < 0.001). For Latin Root 2, Germanic-Latin fastest (M ≈ 683.1 ms), faster than Latin single (ΔRT ≈ −44.6 ms, p < 0.001) and Latin-Latin (ΔRT ≈ −12.7 ms, p < 0.001). Mean RTs (approx.): Germanic-Germanic 679.5 ms; Germanic single 708.6 ms; Germanic-Latin 690.0 ms; Latin single 727.7 ms; Latin-Latin 740.4 ms. Accuracy was highest for compounds containing a Germanic root; Latin single and Latin-Latin were lowest.
Lexical decision (ELP vs. ECP monolingual): In ELP, etymology (Root 1 and Root1×Root2) significantly affected accuracy and RT; in ECP monolinguals, etymology did not affect accuracy but did affect RT. In the contrasts below, GG = Germanic-Germanic, GL = Germanic-Latin, LG = Latin-Germanic, LL = Latin-Latin, GS = Germanic single, and LS = Latin single. ELP accuracy means: Germanic-Germanic 88.0%, Germanic-Latin 87.7%, Germanic single 84.0%, Latin single 84.7%, Latin-Latin 82.6% (all key pairwise differences significant as reported). ELP RT means: Germanic-Germanic 752.3 ms (fastest), Latin-Latin 794.2 ms (slowest), with significant contrasts (e.g., GG vs GS b = 26.62 ms, p < 0.0001; GS vs GL b = 19.53 ms, p < 0.0001). ECP monolingual RTs: Germanic-Germanic 974.0 ms (fastest), Germanic single 1031.1 ms (slowest); significant differences mirrored ELP (e.g., GG vs GS b = −57.11 ms, p < 0.0001; GS vs LG b = 55.70 ms, p < 0.0001).
Lexical decision (ECP L2): Regression showed significant interactions (Root1×Root2) for both accuracy and RT (Adjusted R^2 = 0.564 for accuracy; 0.613 for RT). L2 accuracy means favored Latin-based words: Latin-Latin 77.3%, Latin single 77.0%, Germanic-Latin 75.9%, Latin-Germanic 75.4%, Germanic-Germanic 74.7%, Germanic single 71.6%. Significant differences within Germanic Root 2: Germanic-Germanic > Germanic single (b = 0.031, p < 0.001); Latin-Germanic > Germanic single (b = −0.038 for GS vs LG, p < 0.001). RT means: Latin single 1258 ms and Latin-Latin 1259 ms (fastest), others slower (Germanic single 1315 ms; Germanic-Latin 1318 ms; Germanic-Germanic 1321 ms; Latin-Germanic 1336 ms). Significant contrasts included GS vs LG (b = −20.43 ms, p = 0.0389) and, for Latin Root 2, GL slower than LS and LL by ~59–60 ms (p < 0.001). Overall, native speakers were faster and more accurate with Germanic-based items, while L2 speakers showed the opposite pattern, favoring Latin-based items.
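The reported L2 contrasts can be read against the condition means with simple subtraction. The sketch below uses only the means listed above; note that it computes raw differences of rounded means, whereas the reported b values are model-based contrasts, so the two should agree only approximately.

```python
# L2 lexical decision means as reported above.
# GG = Germanic-Germanic, GL = Germanic-Latin, LG = Latin-Germanic,
# LL = Latin-Latin, GS = Germanic single, LS = Latin single.
accuracy = {"LL": 77.3, "LS": 77.0, "GL": 75.9, "LG": 75.4, "GG": 74.7, "GS": 71.6}  # percent
rt_ms = {"LS": 1258, "LL": 1259, "GS": 1315, "GL": 1318, "GG": 1321, "LG": 1336}


def diff(table, a, b):
    """Difference between two condition means (a minus b), rounded to one decimal."""
    return round(table[a] - table[b], 1)


print("GG - GS accuracy (points):", diff(accuracy, "GG", "GS"))
print("GS - LG accuracy (points):", diff(accuracy, "GS", "LG"))
print("GS - LG RT (ms):", diff(rt_ms, "GS", "LG"))
print("GL - LS RT (ms):", diff(rt_ms, "GL", "LS"))
print("GL - LL RT (ms):", diff(rt_ms, "GL", "LL"))
```

The accuracy gaps of 3.1 and −3.8 percentage points line up with the reported b = 0.031 and b = −0.038 on the proportion scale, and the 59 to 60 ms gaps match the Latin Root 2 contrasts.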
Discussion
Findings support a bidialectal account of English lexical processing: Germanic-origin words constitute an early, core lexical base learned in childhood, whereas Latin-origin vocabulary grows markedly during adolescence and adulthood, especially in formal and academic registers. In native speakers, etymology was related to AoA and also predicted processing performance (accuracy and speed) after controlling for AoA, frequency, and length; items built entirely from Latin roots tended to be slower and less accurate, with Latin-Latin compounds the most demanding. In contrast, L2 English speakers showed better accuracy and faster decisions for Latin-based words, consistent with an acquisition pathway emphasizing Latin-derived academic vocabulary.
These results align with and extend AoA frameworks that integrate neuroplasticity and representational richness: early-learned words occupy privileged positions and richer semantic networks; later-learned, abstract Latin-origin forms integrate into a system anchored by Germanic roots. The pattern also dovetails with evidence that readers are sensitive to orthographic-graphotactic regularities tied to etymology and with prior observations that Latin-based vocabulary tends to be less imageable and more abstract. The findings imply that English word recognition involves managing two partially distinct etymological systems, with implications for education, assessment, and theories of lexical access in diverse linguistic populations.
Conclusion
By assembling a large etymology-coded lexicon spanning over 20,000 words and linking it to AoA and behavioral performance, the study demonstrates that etymology robustly predicts when words are learned and, above and beyond frequency, length, and AoA, how efficiently they are processed. Native English speakers show an advantage for Germanic-origin items, whereas L2 speakers favor Latin-based items, supporting a bidialectal view of English lexical access wherein Germanic forms scaffold early development and Latin forms expand the lexicon during adolescence and adulthood. The work highlights the importance of considering etymology in models of word recognition and in educational contexts reliant on academic, Latin-based vocabulary. Future research should examine how first language, dialectal variation, socioeconomic and educational factors, and age of English acquisition modulate these effects, and whether Latin-based vocabulary poses barriers to academic attainment.
Limitations
Participants and AoA ratings primarily reflected native English-speaking undergraduates, so the results may not generalize across socioeconomic, educational, or dialectal backgrounds. The study did not fully disentangle effects of L1 background among L2 learners or the timing of English acquisition. Potential variability across dialects (e.g., African American Vernacular English, AAVE) and regional/social varieties was not assessed. While L2 analyses revealed clear differences, the influence of learners’ first languages (e.g., Romance vs. Germanic) on etymology effects remains to be systematically explored. The educational impact of Latin-based vocabulary on access to higher education warrants targeted study.