Linguistics and Languages
Early humans out of Africa had only base-initial numerals
O. Her, Y. Liang, et al.
The paper addresses why languages differ in the order of elements within multiplicative numerals (base-initial vs. base-final) and tests the hypothesis that early human languages in Africa used base-initial numerals. Using a new global survey of 4099 languages, the authors quantify the worldwide distribution of base orders and propose that base-initial ordering was ancestral, with base-final orders outside Africa likely emerging later, potentially influenced by numerical notations and writing systems. They outline convergent lines of evidence to evaluate this claim: (i) present-day dominance of base-initial systems in Africa, (ii) a communicative-efficiency account favoring base-initial order at the emergence of multiplication (shortest distance principle), (iii) typological harmonization with head-initial noun phrase order, and (iv) phylogenetic reconstructions on a global language tree (with an Africa subset).
Prior typological work has focused on word order but largely overlooked base order in numerals, with some claims that the ordering is irrelevant (Comrie, 2013). Numeral systems are highly endangered and often converge to decimal base-final systems (Comrie, 2007; Freitas & Shell-Gellasch, 2012). Chrisomalis (2010) observes that numerical notation systems are universally base-final, suggesting a potential influence on spoken numeral order. Foundational work establishes addition as a prerequisite for multiplication in numeral systems and documents a cross-linguistic preference for larger-plus-smaller ordering in additive numerals (Greenberg, 1978; Hurford, 2007; Chrisomalis, 2010; Liu & Xu, 2019). Word-order typology indicates consistent head-modifier relations across phrases (Dryer, 2006, 2013), and previous studies argue for harmonization between numeral bases, classifiers, and noun phrase order (Allassonnière-Tang & Her, 2020). Broader context comes from genetic and migration studies supporting Out-of-Africa and back-to-Africa scenarios (Liu et al., 2006; Mellars, 2006; Nielsen et al., 2017; Haber et al., 2019; Hodgson et al., 2014; Schlebusch & Jakobsson, 2018), and from work on worldwide linguistic diversification and phylogenetic methods (Bouckaert et al., 2022; Gell-Mann & Ruhlen, 2011; Campbell & Poser, 2008; Atkinson, 2011).
Data collection combined manual and automatic surveys of grammatical descriptions to code base order for each language in Glottolog. Sources included WALS (Comrie, 2013), WACL (Her et al., 2022), Allassonnière-Tang & Her (2020), Eugene Chan's Numeral Systems of the World's Languages, and automatic searches in the DReaM Corpus (Virk et al., 2020). For 4099 languages with sufficient data, the team determined: whether multiplicative bases are used; if so, whether order is consistent; and if consistent, whether base-initial or base-final. Languages were categorized as initial, final, split, or not_used. Minimal criteria for multiplicative bases required at least one base with at least two different multipliers. A subset analysis identified 182 languages with additive numerals but no multiplicative numerals. Statistical analyses tested alignment between base order and adjective-noun order using: (i) Chi-square tests and logistic regression; (ii) generalized linear mixed models (GLMMs) with language family and macroarea as random effects; and (iii) conditional inference trees (decision trees using permutation tests). Predictive performance was compared to a majority baseline (base-final = 0.48). For phylogenetics, the authors used the Bouckaert et al. (2022) world language tree, pruned to languages with both base order and adjective order. They also analyzed an Africa-only subset (filtered by Glottolog coordinates). Ancestral character estimation (equal-rates model with discrete characters) and reversible jump MCMC (RJ-MCMC) were used to infer root states and transition rates. RJ-MCMC parameters: 1,000,000 generations, 50% burn-in, sampling every 1000 iterations (500 samples), stepping-stone sampler with 100 stones and 1000 iterations per stone to estimate marginal likelihood. Correlated evolution between base order and adjective order was assessed by combining states (base-final/base-initial × N-final/N-initial) and comparing independent vs. dependent models via Bayes factors.
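The four-way coding scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' actual coding procedure: the function name `classify_base_order`, the `(base, multiplier, order)` tuple format, and the choice to judge consistency only over bases that meet the minimal criterion are all assumptions made for the example.

```python
from collections import defaultdict

def classify_base_order(observations):
    """Return "initial", "final", "split", or "not_used" for one language.

    `observations` is a hypothetical input format: a list of
    (base, multiplier, order) tuples, where order is "initial" when the
    base precedes its multiplier and "final" when it follows it.
    """
    multipliers_by_base = defaultdict(set)
    for base, multiplier, order in observations:
        if order not in ("initial", "final"):
            raise ValueError(f"unknown order: {order!r}")
        multipliers_by_base[base].add(multiplier)

    # Minimal criterion from the text: at least one base that occurs
    # with at least two different multipliers.
    qualifying = {b for b, ms in multipliers_by_base.items() if len(ms) >= 2}
    if not qualifying:
        return "not_used"

    # Assumption: consistency is judged over the qualifying bases only.
    orders = {order for base, _, order in observations if base in qualifying}
    if len(orders) == 1:
        return orders.pop()
    return "split"
```

For example, a language attested only with "two tens" and "three tens" (multiplier before base) would be coded `final`, while a language with a single multiplicative form would fall under `not_used` because the two-multiplier criterion is not met.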
- Global distribution (n = 4099): base-final 1981 (48%), base-initial 1593 (39%), split 183 (4%), not_used 342 (8%).
- Africa (n = 1222): base-final 113 (9%), base-initial 1035 (85%), split 68 (6%), not_used 6 (<1%).
- Additive-only languages (n = 182): base-initial 153 (84%), base-final 5 (3%), both 4 (2%), hard to determine 20 (11%); only 6 of these are in Africa.
- Communicative efficiency: The shortest distance principle predicts preference for expressing the element closest to the target value first, favoring base-initial order for both additive and multiplicative numerals; base-initial mitigates ambiguity by keeping simple numerals unambiguous and encourages overt marking of addition (observed in 121/153, 79%, of additive-only base-initial languages).
- Typological alignment: Strong harmonization between base order and adjective–noun order globally and in Africa.
• Chi-square: χ² = 217.98, df = 3, p < 0.001; Cramér’s V = 0.47.
• Logistic regression: N-initial languages are ~18× more likely to be base-initial.
• GLMM (family and macroarea as random effects): model with adjective order outperforms null (AIC 551 vs. 641, p < 0.001); N-initial positively associated with base-initial (Est. 3.3514, SE 0.4293, p < 0.001, R² = 0.6).
• Conditional inference trees: accuracy 0.61 vs. 0.50 baseline; with family and area, accuracy 0.79 vs. 0.50 baseline.
- Phylogenetic results:
• Root state: RJ-MCMC indicates higher mean probability for base-initial at root than base-final worldwide and in Africa (p = 0.251 for base-initial), significantly higher than base-final via t-test with Bonferroni correction; ancestral character estimation is less decisive but consistent.
• Correlated evolution: Dependent model favored over independent.
  - Worldwide: logML dependent = -522.3302, independent = -548.5822; Bayes factor = 52.50392 (very strong evidence).
  - Africa: logML dependent = -127.5091, independent = -134.2548; Bayes factor = 13.49136 (strong evidence).
• Root combination more likely base-initial & N-initial (world mean p = 0.261; Africa mean p = 0.253).
• Transition tendencies: base and adjective–noun orders tend to harmonize; in Africa, base-final & N-final is less stable than base-initial & N-initial; N-final tends to shift to N-initial; once harmonized as base-initial & N-initial, change is less likely.
- Historical interpretation: African dominance of base-initial aligns with Niger-Congo prevalence and reconstructions (e.g., PNC additive base-initial patterns). The distribution of base orders in Africa may reflect early Eurasian back-migration introducing base-final orders and later Bantu expansion spreading base-initial orders; outside Africa, base-final orders likely spread later, potentially influenced by universally base-final numerical notation systems.
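The Bayes factors reported for correlated evolution are consistent with the common log Bayes factor convention of twice the difference between the log marginal likelihoods of the two models. The short sketch below recomputes them from the reported logML values; the function name `log_bayes_factor` is introduced here for illustration, and the formula is inferred from the reported numbers rather than quoted from the authors.

```python
def log_bayes_factor(logml_dependent, logml_independent):
    """Log Bayes factor as 2 * (logML of dependent model - logML of independent model).

    Assumption: this is the convention behind the reported values; it
    reproduces them up to rounding of the inputs.
    """
    return 2.0 * (logml_dependent - logml_independent)

# Log marginal likelihoods as reported in the findings above.
world = log_bayes_factor(-522.3302, -548.5822)
africa = log_bayes_factor(-127.5091, -134.2548)

print(f"worldwide: {world:.2f}")
print(f"africa:    {africa:.2f}")
```

Both results agree with the reported 52.50392 and 13.49136 to within the rounding of the four-decimal logML inputs; a positive value favors the dependent model.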
The findings support the hypothesis that early human languages employed base-initial numerals. Present-day African languages overwhelmingly exhibit base-initial order, and additive-only systems strongly favor base-initial, aligning with the shortest distance principle that prioritizes communicating the closest approximation to the target numerosity first. Typological analyses demonstrate robust harmonization between numeral base order and noun phrase head order, with N-initial languages much more likely to have base-initial numerals. Phylogenetic reconstructions, despite uncertainty, preferentially infer base-initial (and N-initial) states at the root for both global and Africa-focused trees, and correlated evolution between base order and adjective–noun order is strongly supported. These convergent lines of evidence suggest that base-initial numerals are ancestral, with the widespread base-final patterns outside Africa likely arising later, possibly facilitated by the universal base-final character of numerical notation and subsequent diffusion. Historical population movements (back-to-Africa migrations and Bantu expansion) further help explain the current African distribution of base orders. Overall, the evidence addresses the research question by showing that cognitive-communicative efficiency, typological synchronization, and phylogenetic trends jointly favor base-initial origins.
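As a toy illustration of the shortest distance principle, the sketch below picks, from the elements of a numeral, the one whose value lies closest to the target and treats it as the element to express first. The function name and the example numerals are hypothetical, and the sketch assumes a decimal base that is larger than the other element; it is a simplification of the account, not the authors' formalization.

```python
def first_element_by_shortest_distance(target, elements):
    """Return the element whose value is closest to the target value.

    Under the shortest distance principle as summarized above, this is
    the element predicted to be expressed first.
    """
    return min(elements, key=lambda value: abs(target - value))

# Additive numeral: 13 = 10 + 3; the base 10 is closer to 13 than 3 is.
additive_first = first_element_by_shortest_distance(13, [10, 3])

# Multiplicative numeral: 20 = 10 x 2; the base 10 is closer to 20 than 2 is.
multiplicative_first = first_element_by_shortest_distance(20, [10, 2])

print(additive_first, multiplicative_first)
```

In both cases the base is selected, which corresponds to base-initial order.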
The study proposes that base-initial ordering in additive and multiplicative numerals is the likely ancestral state of human languages. This is supported by a communicative-efficiency account (shortest distance principle) and by synchronic and diachronic evidence of harmonization between numeral base order and head directionality in noun phrases. Global surveys, statistical modeling, and phylogenetic analyses (with Africa-focused subsets) consistently point to base-initial and head-initial combinations as more stable and likely at the root. The dominance of base-final orders in many non-African languages is plausibly a later development, potentially linked to numerical notation and writing. The work contributes to understanding the origins and evolution of language structure and suggests future research directions, including experimental validation of processing advantages and deeper investigation of cultural-technological influences on numeral order.
The authors acknowledge that some claims are speculative and require substantial further research. Psycholinguistic experiments directly testing the processing advantages of base-initial order are lacking and are proposed for future work. The phylogenetic signal is not strong because of uncertainty in the world tree, which limits the decisiveness of root-state inferences. The conditional inference trees were not evaluated with explicit cross-validation; the permutation tests were considered to serve that purpose. Data coverage depends on the availability and quality of grammatical descriptions; some languages show limited or ambiguous numeral data, and classification requires only the minimal criterion of at least one base with two multipliers. Historical interpretations (e.g., the influence of numerical notation and migration events) remain hypotheses requiring additional corroboration.