Universal attractors in language evolution provide evidence for the kinds of efficiency pressures involved

I. A. Seržant and G. Moroz

This study examines how efficiency pressures shape language evolution, using verbal person-number subject indexes in 383 languages as its test case. Ilja A. Seržant and George Moroz identify a universal attractor state, characterized by specific index lengths and a preference for cumulative coding, towards which these indexes tend to evolve.

Introduction
Efficiency is crucial for successful communication, and it shows up both online (in context, at the level of the individual speaker) and offline (in conventionalized structures, at the level of the population). Online effects manifest as reductions in spontaneous speech, while offline effects arise when efficient variants become conventionalized. Offline linguistic structures are dynamic, constantly changing due to semantic shifts and sociolinguistic factors. Efficiency pressures operate on different stages of production (articulation, processing, planning), which creates trade-offs among them. Existing theories do not integrate these diverse efficiency effects with the mechanisms by which they are conventionalized. This study proposes that universal attractors, states that related states tend to develop into but not away from, are an essential component of such a theory. The paper uses subject indexing on verbs as the grammatical domain in which to test for an attractor state and to describe its properties.
Literature Review
Previous research has highlighted the importance of efficiency in language, particularly the Zipfian effect, whereby word length correlates inversely with frequency. This reflects the conventionalization of more efficient forms, which often obscures their origins (e.g., "pants" from "pantaloons"). Information-theoretic approaches focus primarily on articulatory efficiency (message length) and neglect processing and planning efficiency. Yet minimizing articulatory effort requires taking communicative success and pre-planning into account, which incurs processing costs. Ambiguities created by efficient cues demand more processing, which leads languages to develop context-independent cues. Efficient cues thus arise from trade-offs between processing, planning, and articulatory efficiency; offline cues emerge through the selection and conventionalization of online variants, a process influenced by social factors. The paper builds on this work by introducing the concept of universal attractors to account for cross-linguistic patterns in efficiency.
Methodology
The study employed a cross-linguistic perspective, analyzing data from 383 languages representing 53 families and all six world macro-areas. The focus was on intransitive verbs' subject indexes (1SG, 2SG, 3SG, 1PL, 2PL, 3PL), excluding duals. Data were manually collected, with 15 families contributing 10-50 languages to control for family effects. Two large families (Nuclear Trans New Guinea and Afroasiatic) were divided into subfamilies, and the Bantu subfamily was used to represent the Atlantic-Congo family. Proto-language forms were included where available (15 in total) to conduct diachronic analysis. Length was measured by segment count (proxied by letter count, adjusting for long segments). A Poisson mixed effects model (index length ~ person * number + (1|clade)) was used to model attractor lengths, with person and number as fixed effects and clade as a random effect. To analyze evolution towards the attractor, proto- and modern forms were compared, determining whether the length difference between the modern form and the attractor was smaller than that between the proto-form and attractor. A logistic mixed effects model (movement towards attractor ~ person * number + (1|clade)) was used to model this diachronic pressure. Preference for cumulative coding was tested by categorizing languages based on changes in compositionality from proto- to modern forms, using a logistic mixed effects model (compositionality ~ person * proto-language compositionality + (1|clade)).
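The diachronic criterion described above (is the modern form closer in length to the attractor than the proto-form was?) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the function name `classify_movement`, the separate "same distance" category for ties, and all the numbers in the usage example are hypothetical. The summary does not say how ties, such as a form that stays at the attractor, were coded in the study.

```python
def classify_movement(proto_len: float, modern_len: float, attractor_len: float) -> str:
    """Compare the distance to the attractor length before and after the change.

    Returns "towards" if the modern form is strictly closer to the attractor
    than the proto-form was, "away" if it is strictly farther, and
    "same distance" otherwise (tie handling is an assumption of this sketch).
    """
    before = abs(proto_len - attractor_len)   # proto-form vs. attractor
    after = abs(modern_len - attractor_len)   # modern form vs. attractor
    if after < before:
        return "towards"
    if after > before:
        return "away"
    return "same distance"


# Hypothetical lengths in segments, for illustration only (not data from the study).
examples = [
    ("short proto-form lengthens", 0, 1, 1.5),
    ("long proto-form shortens", 4, 2, 1.5),
    ("form drifts away", 2, 4, 1.5),
]
for label, proto, modern, attractor in examples:
    print(f"{label}: {classify_movement(proto, modern, attractor)}")
```

In the study, a binary outcome of this kind (movement towards the attractor or not) is what the logistic mixed effects model takes as its response, with person and number as fixed effects and clade as a random effect.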
Key Findings
Index lengths showed minimal variation across languages. The Poisson regression model supported the attractor hypothesis, revealing statistically significant effects of person and number on index length. The model predicted attractor lengths for each person-number combination (Figure 2). Diachronic analysis indicated that modern forms tend to evolve towards or remain within the attractor state (Figure 3), regardless of initial proto-language lengths. Languages with initially shorter forms tended to lengthen them, while those with initially longer forms shortened them. The model confirmed a high probability of adherence to attractor lengths across person-number combinations, especially in singular forms. Analysis of compositionality revealed a strong preference for non-compositional (cumulative) coding (Figures 4 and 5). The logistic mixed effects model predicted a high probability of non-compositional coding for each person. Table 1 shows person-number frequencies in a Russian corpus, illustrating higher frequencies for third person and singular forms.
Discussion
Despite language-specific processes (reduction, reanalysis, analogy), universal pressures channel the development of subject indexes. The attractor’s properties (specific lengths and cumulative coding) point to two dominant efficiency pressures: reduced processing effort and reduced articulatory effort, with the latter overridden by the need for a constant information flow. Lexicon complexity and memory costs are weaker pressures because these grammatical elements are highly frequent. The non-zero coding of even the most frequent form, the third-person singular, suggests that processing and planning efficiency outweigh articulatory ease. The fact that the indexes are obligatory rather than optional likewise points to planning efficiency being prioritized over articulatory efficiency, even at the cost of redundancy. Plural forms are longer than singular forms, which keeps the information flow constant, and the choice of segments prioritizes distinguishability. Cumulative coding is preferred despite its higher lexicon complexity and memory costs, suggesting that processing ease trumps lexicon simplicity for high-frequency items. This aligns with research showing a preference for cumulative coding in culturally significant domains (Kemp et al., 2018; Xu et al., 2020).
Conclusion
The study establishes a universal attractor state for subject indexing, revealing two dominant efficiency pressures: reduced processing and articulatory effort (though the latter is constrained by information density). Lower lexicon complexity and memory costs are weaker pressures for this high-frequency category. Future research could investigate these findings experimentally.
Limitations
The study relies on cross-linguistic comparative data. Experimental evidence would strengthen the conclusions. The reliance on existing proto-language reconstructions might introduce biases due to uncertainties in these reconstructions. Further investigation into the specific interaction between various types of efficiency pressures is necessary to refine the current model.