
Global predictors of language endangerment and the future of linguistic diversity
L. Bromham, R. Dinnage, et al.
Language contact is not the main driver of language endangerment, according to this study by Lindell Bromham and colleagues. Instead, higher road density and more years of schooling are associated with greater endangerment, and without intervention language loss could triple within 40 years. The authors call for urgent investment in language documentation and revitalization.
Introduction
The study addresses the global crisis in linguistic diversity: nearly half of the world's approximately 7,000 languages are considered endangered, yet prior predictions of language loss have lacked statistically rigorous, global analyses. The authors aim to identify general, macroecological drivers of language endangerment that operate alongside language- and region-specific historical, social, and political influences. They analyze 6,511 languages using 51 predictors across demographic, educational, socioeconomic, environmental, connectivity, and policy domains. The goal is to model current patterns and forecast future endangerment while overcoming analytical challenges such as phylogenetic non-independence, spatial autocorrelation, and covariation among variables. Revealing broad-scale correlates of endangerment can help focus conservation and revitalization efforts, much as macroecological approaches do in biodiversity research.
Literature Review
Prior work has highlighted extensive language endangerment and suggested rapid loss rates, sometimes projecting up to 90% loss within a century, but often without comprehensive statistical controls. Comparative studies have drawn analogies between species and language extinction risks, noted spatial associations with biodiversity, and suggested roles for colonization, globalization, and socioeconomic change. However, earlier analyses typically included fewer predictors and did not jointly control for spatial clustering and relatedness among languages, risking spurious associations. The Expanded Graded Intergenerational Disruption Scale (EGIDS) is a widely used framework capturing domains of use and intergenerational transmission and underpins this study due to its broad coverage. The authors position their work as extending and rigorously testing hypothesized drivers (e.g., contact, development, land use, education policy) at a global scale with robust statistical controls and forward projections.
Methodology
Data: The dataset comprised 6,511 spoken L1 languages with ISO 639-3 codes. Nine world languages were considered only as contextual national-level factors, and signed languages were excluded due to data limitations. Endangerment was measured via EGIDS, aggregated into seven ordered levels (EGIDS 1–6a grouped as Stable, followed by each level from 6b to 10). The 51 predictors spanned 10 categories: language-level factors (e.g., L1 population, area, official status, documentation), neighbourhood diversity and local language ecology (e.g., number/evenness of neighbouring languages, local proportion endangered), education (recognition as language of education, national years of schooling, minority education policy, education spending), socioeconomic (GDP per capita, Gini, life expectancy at 60), land use (population density, cropland, built environment, pasture, human footprint), environment (growing season, mean annual temperature, temperature/precipitation seasonality), biodiversity loss (threatened species, proportion threatened), connectivity (road and navigable waterway distance scores, landscape roughness, altitudinal range), shift variables (changes in urbanization, population density, human footprint, croplands, pasture, built environment), and national world-language official status. Depending on the variable, values were computed within each language polygon, as national averages weighted by polygon overlap, or within a 10,000 km² neighbourhood around the language polygon. Transformations (log, square, square root, signed square root, cube) were applied according to a predefined scheme.
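The seven-level response variable described above can be sketched as a small lookup. This is an illustrative encoding only, not the authors' code: the function names are hypothetical, and it assumes the standard EGIDS sub-levels 8a and 8b are kept as separate levels, which is what yields seven levels from "Stable, then 6b to 10".

```python
# Ordered response levels: EGIDS 1-6a collapse to "Stable" (level 0),
# then one level each for 6b, 7, 8a, 8b, 9, 10 (levels 1..6).
# Assumption: 8a and 8b are treated as distinct levels.
STABLE_CODES = {"1", "2", "3", "4", "5", "6a"}
ENDANGERED_CODES = ["6b", "7", "8a", "8b", "9", "10"]  # increasing severity


def endangerment_level(egids: str) -> int:
    """Return 0 for Stable and 1..6 for EGIDS 6b..10."""
    code = egids.strip().lower()
    if code in STABLE_CODES:
        return 0
    if code in ENDANGERED_CODES:
        return ENDANGERED_CODES.index(code) + 1
    raise ValueError(f"unexpected EGIDS code: {egids!r}")


def is_endangered(egids: str) -> bool:
    """Threatened or above (EGIDS 6b-10)."""
    return endangerment_level(egids) >= 1


def is_sleeping(egids: str) -> bool:
    """Sleeping (EGIDS 9-10)."""
    return endangerment_level(egids) >= 5


if __name__ == "__main__":
    for code in ["3", "6a", "6b", "8b", "9"]:
        print(code, endangerment_level(code), is_endangered(code), is_sleeping(code))
```

Keeping the levels as ordered integers, rather than unordered labels, is what allows an ordinal model to be fitted to them.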
Statistical modeling: The dependent variable (endangerment level) was modeled using an autoregressive ordinal probit regression accounting for three sources of autocorrelation: (1) phylogenetic relatedness (taxonomy-based matrix with scaled branch lengths), (2) spatial proximity (exponential function of great-circle distance between language polygon centroids), and (3) contact (a binary indicator based on polygon overlap or a 100 km buffer). Each matrix was row-normalized with zero diagonals and assigned its own weight, estimated by maximum likelihood. Region-specificity was addressed via regional intercepts and interactions between each predictor and region (12 regions), with non-varying interactions removed. Variable selection: predictors were grouped by pairwise correlations; a stepwise selection on a 2/3 training set iteratively added or removed variables by likelihood improvement within groups to form candidate models. Predictive performance was evaluated on a 1/3 test set, and the best model comprised predictors appearing in over one-third of top-performing candidate models (those not significantly worse than the top model). Final coefficients were estimated on the full dataset. The final model explained 34% of variation in endangerment.
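As a rough illustration of the spatial autocorrelation matrix described above, the following sketch builds a row-normalized, zero-diagonal weight matrix from an exponential function of great-circle distance between centroids. It is not the authors' implementation: the function names, the `decay_km` value, and the example coordinates are all assumptions made for illustration, and the summary does not state the decay parameter actually used.

```python
import math

EARTH_RADIUS_KM = 6371.0


def great_circle_km(a, b):
    """Haversine distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def spatial_weights(centroids, decay_km=500.0):
    """Row-normalized exp(-d / decay_km) weights with a zero diagonal.

    decay_km is a hypothetical value chosen for illustration only.
    """
    n = len(centroids)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:  # the diagonal stays at zero
                d = great_circle_km(centroids[i], centroids[j])
                w[i][j] = math.exp(-d / decay_km)
        row_sum = sum(w[i])
        if row_sum > 0:
            w[i] = [x / row_sum for x in w[i]]
    return w


def neighbour_average(w, values):
    """Weighted mean of the other languages' values, one entry per language."""
    return [sum(wij * v for wij, v in zip(row, values)) for row in w]


if __name__ == "__main__":
    # Hypothetical centroids (lat, lon): two close together, one far away.
    centroids = [(-25.0, 135.0), (-24.0, 134.0), (60.0, 100.0)]
    w = spatial_weights(centroids)
    print(neighbour_average(w, [0, 3, 6]))
```

Row normalization means each language's autoregressive term is a weighted average of its neighbours' values, so the weight estimated for each matrix is comparable across languages with many or few neighbours.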
Future prediction: Using current EGIDS to infer intergenerational transmission and demographic structure, the authors projected L1 speaker declines for languages with reduced transmission, generating expected shifts in endangerment levels at 40 years (circa 2060) and 80 years (circa 2100). Repeated sampling (1,000 iterations) yielded distributions of endangered (EGIDS 6b–10) and Sleeping (EGIDS 9–10) languages overall and per hex grid (~415,000 km²). Additional projections incorporated plausible future changes in climate and land-use variables based on mid-range climate models and recent land-use change rates, while acknowledging uncertainty. The approach assumes that currently stable languages remain stable and that no revitalization interventions occur; treating stable languages as secure makes the projected losses conservative.
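The repeated-sampling idea can be sketched with a toy simulation. This is not the authors' demographic model: the per-step decline probability `p_decline`, the two-step horizon, and the toy distribution of languages are hypothetical placeholders. The sketch only mirrors two features stated above, namely that stable languages are held fixed and that 1,000 iterations produce a distribution of Sleeping-language counts.

```python
import random
import statistics

MAX_LEVEL = 6      # EGIDS 10 on the seven-level scale (0 = Stable)
SLEEPING_FROM = 5  # levels 5 and 6 correspond to EGIDS 9-10


def project_once(levels, steps, p_decline, rng):
    """Advance each non-stable language by up to `steps` levels.

    Stable languages (level 0) never change, mirroring the stated assumption.
    p_decline is a hypothetical per-step probability of moving up one level.
    """
    out = []
    for level in levels:
        if level > 0:
            for _ in range(steps):
                if level < MAX_LEVEL and rng.random() < p_decline:
                    level += 1
        out.append(level)
    return out


def simulate_sleeping(levels, steps, p_decline=0.6, iterations=1000, seed=0):
    """Return the count of Sleeping languages from each sampled projection."""
    rng = random.Random(seed)
    counts = []
    for _ in range(iterations):
        projected = project_once(levels, steps, p_decline, rng)
        counts.append(sum(level >= SLEEPING_FROM for level in projected))
    return counts


if __name__ == "__main__":
    # Toy input: 100 languages spread over the seven levels (made-up numbers).
    levels = [0] * 60 + [1] * 15 + [2] * 10 + [3] * 6 + [4] * 4 + [5] * 3 + [6] * 2
    counts = simulate_sleeping(levels, steps=2)
    print("mean:", statistics.mean(counts), "min:", min(counts), "max:", max(counts))
```

Reporting the spread of `counts` rather than a single number is the point of the repeated sampling: it conveys how uncertain the projected totals are under the chosen assumptions.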
Key Findings
- Current status: Of 6,511 languages, 37% are threatened or above (EGIDS 6b–10), and 13% of these are already Sleeping (no L1 speakers). Regions with the highest proportion endangered include Australia, North China, Siberia, North Africa and Arabia, North America, and parts of South America.
- Best model performance: Explains 34% of variation in endangerment globally.
- Consistent global and regional predictors (five):
1) L1 speaker population size (strongest predictor; fewer L1 speakers → higher endangerment).
2) Bordering language richness (more neighbouring autochthonous languages in contact → lower endangerment; multilingual contact itself is not a driver of loss).
3) Road density in the neighbourhood (higher road density → higher endangerment), likely via increased movement, commerce, and centralized governance influence rather than contact per se.
4) Average years of schooling (higher national average schooling → higher endangerment), independent of other socioeconomic indicators.
5) Number/proportion of endangered languages in the neighbourhood (more endangered neighbours → higher endangerment, a pattern of regional clustering).
- Non-predictors or contrary findings: Island status does not confer protection; barriers to movement (landscape roughness, altitudinal range, waterways) show no consistent protective association; GDP per capita and life expectancy do not predict endangerment; minority education policy presence is not globally associated with lower endangerment (likely due to heterogeneous implementation).
- Regional patterns: In Africa, greater pasture/cropland area is associated with higher endangerment (possibly reflecting language shift linked to changes in subsistence). In Europe, temperature seasonality correlates with higher endangerment (reflecting language erosion in Arctic regions).
- Future projections without intervention: Language loss is expected to at least triple within 40 years. By 2100, a nearly five-fold increase in Sleeping languages is projected, with at least 1,500 languages ceasing to be spoken. Hotspots of absolute loss include the west coast of North America, Central America, the Amazon, West Africa, the north coast of New Guinea, and northern Australia; later additions include Borneo, southwest China, and regions around the Caspian Sea. The highest proportional losses are projected for the Arctic, the interior plains of North America, southern Chile (temperate), and the Sahara.
- Documentation risk: About one-third of languages projected to become Sleeping have little or no documentation, despite many currently having living L1 speakers.
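To make the direction of the associations listed above concrete, here is a minimal ordinal probit sketch showing how a linear predictor shifts probability across the seven endangerment levels. The cutpoints and the coefficient on log population are invented for illustration and are not estimates from the study; the sketch also omits the autoregressive terms and regional interactions of the full model.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ordinal_probit_probs(eta, cutpoints):
    """P(level = k) for k = 0..len(cutpoints), given linear predictor eta.

    cutpoints must be strictly increasing; a higher eta shifts probability
    toward higher (more endangered) levels.
    """
    if any(b <= a for a, b in zip(cutpoints, cutpoints[1:])):
        raise ValueError("cutpoints must be strictly increasing")
    cdf = [normal_cdf(c - eta) for c in cutpoints] + [1.0]
    probs = [cdf[0]]
    probs += [cdf[k] - cdf[k - 1] for k in range(1, len(cdf))]
    return probs


if __name__ == "__main__":
    # Hypothetical values: six cutpoints give seven levels (0 = Stable).
    cutpoints = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
    beta_log_pop = -0.3  # made-up negative coefficient: fewer speakers, higher risk
    for population in (100, 1_000_000):
        eta = beta_log_pop * math.log10(population)
        probs = ordinal_probit_probs(eta, cutpoints)
        print(population, [round(p, 3) for p in probs])
```

With a negative coefficient on log population, the smaller population receives less probability on the Stable level, which is the qualitative pattern reported for L1 speaker numbers.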
Discussion
The findings demonstrate that while language-specific histories matter, broad-scale extrinsic factors significantly structure global endangerment patterns. Crucially, routine multilingual contact among Indigenous languages is associated with lower endangerment, challenging the notion that contact per se erodes vitality. Instead, infrastructural connectivity, particularly road density, emerges as a strong correlate of increased endangerment, likely because roads facilitate movement in both directions, integration into national economies, and the spread of lingua francas and governance languages. Education, specifically higher national average years of schooling, also correlates with greater endangerment. This points to the role of formal education systems in reducing intergenerational transmission of minority languages, especially where bilingual education is limited or oriented toward transition to a dominant language. The lack of association with GDP and other development metrics suggests that modernization in general does not predict endangerment beyond specific mechanisms such as schooling and infrastructure. Regional analysis highlights context-dependent drivers (e.g., land use in Africa, climate in Europe), suggesting a need for targeted, region-specific research and interventions. Forecasts indicate a substantial extinction debt due to interrupted transmission: without revitalization, many languages will lose L1 speakers over the next 80 years, with global hotspots aligning with areas of high diversity and rapid infrastructural expansion. These insights suggest actionable leverage points: prioritizing bilingual education implementation, community-based revitalization, and documentation efforts, and monitoring regions of rapid road expansion as red flags for both linguistic and biological diversity loss.
Conclusion
This study provides a rigorous global model of language endangerment, accounting for phylogenetic, spatial, and contact autocorrelation, and evaluating a broad suite of 51 predictors. It challenges the assumption that language contact itself drives loss, instead identifying road density, schooling, small L1 populations, and regional endangerment context as key correlates. The model projects at least a tripling of language loss within 40 years and at least 1,500 languages becoming Sleeping by 2100 if no interventions occur. The work underscores urgent needs: invest in documentation (especially poorly documented threatened languages), implement and resource effective bilingual education that sustains L1 transmission, and support community-led revitalization programs. Future research should develop finer-grained, regionally resolved measures of schooling and policy implementation, integrate dynamic projections of education and infrastructure growth, and conduct targeted regional analyses (e.g., land-use effects in Africa, climate impacts in Europe and Central/East Asia) to refine predictions and guide interventions.
Limitations
- Data coverage and granularity: Global consistency constraints exclude many locally important drivers (e.g., historical policies, conflict, disease). Education and socioeconomic variables are largely national averages, masking within-country heterogeneity.
- Historical processes: Past events (colonization waves, forced assimilation, population collapses) are not directly captured, potentially producing extinction filter effects where current predictors miss past drivers.
- Model scope: Focuses on L1 speakers and intergenerational transmission; does not model L2 use or revitalization dynamics. Future projections assume no new interventions and stability for currently stable languages (conservative but possibly underestimating future risks for some languages).
- Classification and data uncertainty: Language/dialect distinctions, speaker counts, and EGIDS ratings have uncertainties and regional biases; some regions/families are under-represented or need expert revision.
- Signed languages excluded due to insufficient global data.
- Coarse treatment of policy: Presence of minority education policy does not capture quality, scope, or implementation variability at subnational scales.
- Autocorrelation controls rely on taxonomy-derived phylogeny, centroid distances, and polygon overlap; these may not fully reflect on-the-ground contact dynamics.