Expansion by migration and diffusion by contact is a source to the global diversity of linguistic nominal categorization systems

M. Allassonnière-Tang, O. Lundgren, et al.

Explore how geographical proximity influences language features! This research by Marc Allassonnière-Tang and colleagues uncovers the dynamics of linguistic patterns through diffusion and expansion, revealing that gender systems expand while classifiers diffuse. A fascinating look into the mechanics of language evolution, this study reshapes our understanding of linguistic connection.

Introduction

The study investigates how two dispersal mechanisms—feature diffusion via contact and language expansion via migration—shape the global distribution of nominal categorization systems (grammatical gender, noun classes, and classifiers). While prior typological models often conflate horizontal diffusion and vertical inheritance, the authors hypothesize that more grammaticalized systems (gender, noun classes) predominantly spread through language expansion, whereas less grammaticalized systems (classifiers) spread more through feature diffusion. The work aims to clarify the roles of these mechanisms in cross-linguistic distributions, addressing the importance of grammaticalization, stability, and areal effects in understanding global language diversity.

Literature Review

The paper reviews work on stability and diffusibility of linguistic features, noting that grammatical elements tend to be more stable and less borrowable than lexical items (Nichols, 1992; Dediu & Cysouw, 2013; Thomason & Kaufman, 1988; Matras, 2009; Matras & Sakel, 2007). Frequency correlates with stability in both grammar and lexicon. Gender and noun class are typically more stable than classifiers (Nichols, 2003; Greenhill et al., 2017; Allassonnière-Tang & Dunn, 2020), and grammatical gender rarely arises through contact (Stolz & Levkovych, 2021). Prior databases (WALS; Gil, 2013; Corbett, 2013) provided smaller samples for classifiers and gender/noun class. The authors also discuss theoretical expectations that classifiers diffuse horizontally more readily, whereas gender and noun classes are more robustly inherited, and highlight cognitive learnability and preferences associated with more grammaticalized systems (Bentz & Winter, 2013).

Methodology

- Data compilation: Constructed a database of 3,077 languages annotated for presence/absence of grammatical gender, noun classes, and classifiers. Initial data were extracted from grammars and sketches via a lightweight keyword-extraction technique, followed by manual verification using the Gramfinder tool under precise coding criteria.
- Areal and phylogenetic cohesion: Employed Delaunay neighbors (geographic neighbors) and phylogenetic neighbors to assess geographic (areal) and phylogenetic cohesion of each system. Statistical comparisons used Wilcoxon rank-sum tests with continuity correction.
- Latitudinal distribution: Compared distributions of latitude across languages with each system to explore potential alignment with east–west migration patterns.
- Family diversity within grids: Divided the world map into 3,267 grids; for each grid and feature, counted the number of language families represented by languages with the feature to estimate family density, testing whether classifier systems occur across more families locally.
- Geographic coverage of families: Computed normalized mean pairwise geographic distances among languages within each family to estimate family geographic spread.
- Environmental factors: As proxies for mid-Holocene conditions (likely period of major spreads), analyzed variance in three environmental variables per feature: standard deviation of elevation, distance to water bodies (rivers), and precipitation (wettest quarter), using mid-Holocene projections. Statistical assessments included quantile dispersion, Levene tests, and Conover tests to compare variances across features.
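The neighbor-based cohesion idea in the list above can be illustrated with a minimal sketch. It assumes cohesion is the share of a language's neighbors that also have the feature, which is one plausible reading and not necessarily the authors' exact definition. The function name `neighbor_cohesion`, the language labels, and the toy neighbor lists are all hypothetical. In practice the geographic neighbor lists might be derived from a Delaunay triangulation of language coordinates (for example with `scipy.spatial.Delaunay`); here they are passed in directly so the example needs only the standard library.

```python
from statistics import mean


def neighbor_cohesion(has_feature, neighbors):
    """Share of each feature-bearing language's neighbors that also have the feature.

    has_feature: dict mapping language -> bool (feature present or absent)
    neighbors:   dict mapping language -> list of neighboring languages
                 (geographic or phylogenetic; assumed precomputed)
    """
    scores = {}
    for lang, present in has_feature.items():
        if not present:
            continue  # cohesion is only scored for languages that have the feature
        nbrs = neighbors.get(lang, [])
        if not nbrs:
            continue  # a language without neighbors gives no information
        scores[lang] = sum(has_feature[n] for n in nbrs) / len(nbrs)
    return scores


# Hypothetical toy data: five languages, three of them with classifiers.
has_classifier = {"A": True, "B": True, "C": False, "D": True, "E": False}
geo_neighbors = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "E"],
    "D": ["B", "E"],
    "E": ["C", "D"],
}

scores = neighbor_cohesion(has_classifier, geo_neighbors)
mean_cohesion = mean(scores.values())
print(scores, round(mean_cohesion, 3))
```

The same function could be reused for phylogenetic cohesion by swapping in neighbor lists taken from a family tree. The per-language scores for two features could then be compared with a rank-based test such as the Wilcoxon rank-sum test named in the methodology.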

Key Findings

- Prevalence: Of 3,077 languages, 26.5% (814) have classifiers, 20.6% (634) have gender, 10.3% (317) have noun classes, and 46.6% (1,434) have none of the three systems. Classifiers are mainly in Asia; gender is prominent in Europe; Africa shows both noun classes and gender; languages with both gender and classifiers occur mainly in South America and Papua New Guinea.
- Areal and phylogenetic cohesion: Classifiers have significantly lower geographic cohesion (mean ~0.5) than gender (~0.6–0.7; w=207,207, p<0.001) and noun classes (~0.7–0.8; w=79,545, p<0.001). Gender has significantly lower geographic cohesion than noun classes (w=82,972, p<0.001). Phylogenetic cohesion: classifiers (~0.6–0.7) are significantly lower than gender (~0.8–1; w≈195,193, p<0.001) and noun classes (~0.9–1; w=79,472, p<0.001); gender is significantly lower than noun classes (w=84,686, p<0.001).
- Latitudinal patterns: Noun classes are concentrated within a narrower latitude band (likely reflecting Africa), whereas classifiers and gender span broader latitude ranges.
- Family density in local areas: Classifier languages occur across more language families within grid cells than gender (w=294,410, p<0.001) and noun classes (w=171,006, p<0.001). Gender shows higher family density than noun classes (w=117,264, p<0.001).
- Family geographic coverage: Indo-European has the largest normalized geographic coverage (1.00), followed by Austronesian (0.62), Eskimo-Aleut (0.51), Afro-Asiatic (0.48), Turkic (0.46), Atlantic-Congo (0.41), Tungusic (0.40), Mongolic-Khitan (0.34), Athabaskan–Eyak–Tlingit (0.34), and Pama–Nyungan (0.33). Several of these families with broad expansion are associated with gender and/or noun class systems.
- Environmental variance: Variance in distance to rivers, precipitation (wettest quarter), and standard deviation of elevation is largest for classifier languages, intermediate for gender, and smallest for noun class languages (consistent across multiple dispersion/variance tests).
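The normalized family coverage values in the list above can be approximated with a short sketch. It assumes great-circle (haversine) distances between language coordinates and normalization by the largest family mean, so that the widest family scores 1.00. Both choices are assumptions made for illustration, as are the function names and the toy family coordinates; the summary does not state the authors' exact distance measure or normalization.

```python
from itertools import combinations
from math import asin, cos, radians, sin, sqrt
from statistics import mean

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def mean_pairwise_distance(coords):
    """Mean distance over all pairs of languages; 0.0 for fewer than two languages."""
    pairs = list(combinations(coords, 2))
    if not pairs:
        return 0.0
    return mean(haversine_km(a, b) for a, b in pairs)


def normalized_coverage(families):
    """Scale each family's mean pairwise distance by the largest family mean."""
    raw = {name: mean_pairwise_distance(coords) for name, coords in families.items()}
    largest = max(raw.values())
    if largest == 0:
        return {name: 0.0 for name in raw}
    return {name: value / largest for name, value in raw.items()}


# Hypothetical coordinates (latitude, longitude), not real language locations.
families = {
    "FamilyWide": [(0.0, 0.0), (0.0, 60.0), (0.0, 120.0)],
    "FamilyNarrow": [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)],
    "Isolate": [(45.0, 45.0)],
}

coverage = normalized_coverage(families)
for name, value in coverage.items():
    print(f"{name}: {value:.2f}")
```

Under this scheme a single-language family gets a coverage of 0, and every other value is relative to the most widely spread family in the sample, so the scores are not comparable across samples with different families.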

Discussion

Findings support the hypothesis that gender and noun class systems exhibit stronger vertical heritability (higher phylogenetic cohesion) than classifiers. Contrary to expectations that more grammaticalized features diffuse less, gender and noun class also show higher geographic cohesion than classifiers, suggesting that mechanisms beyond local contact—specifically language expansion through migration and associated language shift—have shaped their distributions. Classifiers display higher family diversity within geographic grids and greater environmental variance, consistent with broad areal diffusion across different families and environments. Latitudinal and family coverage patterns further align with large-scale expansions (e.g., Indo-European in Eurasia, Bantu within Niger-Congo in Africa) influencing the spread of gender/noun class systems. The results imply that standard quantitative assessments of diffusibility and stability can be confounded by language expansion, and that distinguishing feature diffusion from language expansion is necessary for accurate interpretation of global typological patterns.

Conclusion

The study contributes a large database (3,077 languages) of nominal categorization systems and demonstrates that global distributions are shaped by both feature diffusion and language expansion. More grammaticalized systems (gender, noun class) appear to have spread primarily via language expansion, while classifiers spread more via areal diffusion. Recognizing these distinct mechanisms clarifies apparent contradictions between diffusibility and stability measures. The authors encourage extending this framework to other linguistic domains (phonology, syntax, semantics) and developing evolutionary models that jointly incorporate linguistic factors (e.g., grammaticalization) and non-linguistic factors (e.g., migration, environment) to more accurately model feature spread.

Limitations

- The study does not directly model or control family-specific dynamics or universal cognitive preferences that may inflate or diminish certain systems’ distributions; due to data and methodological limitations, such factors are not accounted for.
- Language contact processes are influenced by complex, sparsely documented social factors, limiting precise inference of diffusion pathways.
- The timing of spreads is largely unknown; environmental proxies rely on mid-Holocene projections and associated assumptions.
- Measures of geographic cohesion do not fully disentangle family effects (features may diffuse within families), and areal measures cannot conclusively separate diffusion from expansion.
