Ancestral Dravidian languages in Indus Civilization: ultraconserved Dravidian tooth-word reveals deep linguistic ancestry and supports genetics
B. A. Mukhopadhyay
The paper addresses the long-standing question of which languages were spoken in the Indus Valley Civilization (IVC), a vast Chalcolithic civilization spanning Pakistan, Afghanistan, and northwest India. Given the undeciphered Indus script and the region’s historical multilingualism, prior proposals have ranged from Proto-Indo-Aryan to Proto-Dravidian and Austroasiatic, as well as an unidentified ‘Language X’. Dravidian languages, though concentrated today in southern India, have a historical presence in the northwest (e.g., Brahui) and elsewhere, suggesting a broader prehistoric distribution. Quantitative linguistic studies date Proto-Dravidian to the IVC time frame. Archaeogenetic studies (e.g., Narasimhan et al., 2019; Shinde et al., 2019) reconstruct IVC-related ancestries contributing to present South Asians and entertain the hypothesis that Proto-Dravidian may have spread with IVC-derived ancestries. However, genetics does not directly identify language. The paper proposes to establish the existence of ancestral Dravidian in IVC by identifying proto-words whose origins most plausibly lie in IVC, whose referents were prevalent there, whose etymologies trace to an Indian language family by strict historical-linguistic criteria, and which belong to the ultraconserved, non-borrowable core lexicon (notably ‘tooth’). The study focuses on the Near Eastern elephant/ivory terms ‘pīru/pīri’ and Old Persian ‘pîruš’, hypothesizing they derive from Indic ‘pīlu’ (elephant), itself related to the Proto-Dravidian tooth-word *pal with alternate forms *pīl/*pil/*pel. By correlating archaeological evidence for IVC ivory exports, linguistic correspondences in Dravidian (elephant and tooth terms), and the ancient phytonym ‘pīlu’ for Salvadora persica (toothbrush tree) endemic to the Indus basin, the paper argues that a significant IVC population used a Proto-Dravidian tooth-word in everyday speech, implying ancestral Dravidian presence in IVC. The study does not exclude other language groups in the likely multilingual IVC.
The work reviews diverse hypotheses on IVC languages. Witzel initially favored a Para-Munda substrate, later keeping the question open pending improved reconstructions and substrate studies; Southworth supported Para-Munda while also documenting prehistoric Dravidian influence across western India; other scholars oppose Austroasiatic for IVC and either propose Proto-Indo-Aryan/Early Indo-European or defend Proto-Dravidian (e.g., Parpola). Parpola suggested Dravidian etymologies for Vedic substrate words, posited Dravidian readings of Indus signs, and noted Dravidian social institutions in Indo-Aryan-speaking regions. Masica’s ‘Language X’ proposal highlighted unknown substrate elements in Indo-Aryan agricultural vocabulary, though he later judged Dravidian a strong contender for Harappan languages. The review also considers archaeogenetic findings: Narasimhan et al. (2019) infer an IVC-related ancestry cline contributing to both Ancestral North Indian (ANI) and Ancestral South Indian (ASI) components and suggest Proto-Dravidian could have spread either with IVC-related ancestry southward or independently from peninsular South Asia; Shinde et al. (2019) report IVC-era genomes with Iranian farmer–related ancestry but little if any Steppe ancestry. Despite these advances, the lack of deciphered Harappan texts and the non-linear relation between genes and languages leave the linguistic identity unresolved, motivating the present approach via ultraconserved lexemes linked to IVC material culture.
An interdisciplinary approach synthesizing archaeological, historical, linguistic, and genetic evidence is employed. The method targets proto-words that: (i) likely originated in IVC; (ii) denote objects widely used or produced there; (iii) can be rigorously traced etymologically to a South Asian language family; (iv) belong to a language family attested in the subcontinent during the IVC era; (v) belong to a language family whose present speakers are genetically linked to IVC populations; and (vi) belong to the stable, non-borrowable core vocabulary (e.g., ‘tooth’). The study searches Near Eastern texts for loanwords that plausibly name IVC commodities (e.g., ivory) and correlates them with Indic forms. It analyzes morphophonemics (e.g., l/r alternation in Iranian languages), Dravidian derivational patterns (enunciative -u, consonant doubling), and semantic fields (tooth/tusk, split/crush) to connect ‘pīlu’ (elephant) to the Proto-Dravidian tooth-word *pal with alternate forms *pīl/*pil/*pel. Archaeological and archaeobotanical records (ivory use in IVC; Salvadora spp. in Indus basin) and historical/toponymic attestations (e.g., ‘Pilusāra’, ‘pilu’ phytonym) are integrated. No experimental or statistical analyses were conducted beyond critical synthesis.
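The six screening criteria above can be read as a simple checklist. The sketch below is only an illustration of that reading, not the author's procedure: the field names are paraphrases invented here, the flags for ‘pīlu’ encode the paper's own claims rather than any independent verification, and the second entry is a purely hypothetical counterexample.

```python
from dataclasses import dataclass

# Field names paraphrase criteria (i)-(vi); they are illustrative labels,
# not terminology taken from the paper.
CRITERIA = (
    "originated_in_ivc",                # (i)
    "denotes_ivc_object",               # (ii)
    "traceable_to_south_asian_family",  # (iii)
    "family_attested_in_ivc_era",       # (iv)
    "speakers_genetically_linked",      # (v)
    "core_vocabulary",                  # (vi)
)


@dataclass(frozen=True)
class CandidateWord:
    form: str
    originated_in_ivc: bool
    denotes_ivc_object: bool
    traceable_to_south_asian_family: bool
    family_attested_in_ivc_era: bool
    speakers_genetically_linked: bool
    core_vocabulary: bool


def unmet_criteria(word: CandidateWord) -> list[str]:
    """Return the names of the criteria that the candidate fails."""
    return [name for name in CRITERIA if not getattr(word, name)]


def qualifies(word: CandidateWord) -> bool:
    """A candidate qualifies only if it satisfies all six criteria."""
    return not unmet_criteria(word)


# Flags follow the paper's argument for 'pīlu' (assumed, not verified here).
pilu = CandidateWord(
    form="pīlu",
    originated_in_ivc=True,
    denotes_ivc_object=True,
    traceable_to_south_asian_family=True,
    family_attested_in_ivc_era=True,
    speakers_genetically_linked=True,
    core_vocabulary=True,
)

# Hypothetical candidate that names an IVC commodity but has no
# core-vocabulary base, so it fails criterion (vi).
hypothetical_trade_term = CandidateWord(
    form="(hypothetical)",
    originated_in_ivc=True,
    denotes_ivc_object=True,
    traceable_to_south_asian_family=True,
    family_attested_in_ivc_era=True,
    speakers_genetically_linked=True,
    core_vocabulary=False,
)

print(qualifies(pilu))
print(unmet_criteria(hypothetical_trade_term))
```

Treating the criteria as a conjunction reflects how the summary states them; the paper itself weighs the evidence qualitatively rather than with boolean flags.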
• Near Eastern elephant/ivory terms and chronology: Akkadian ‘pīru’ (elephant) attested since the Old Babylonian period (ca. 2000–1600 BC) and used in later Assyrian and Babylonian sources; a Hurrian form occurs in an Amarna letter (ca. 1400 BC); Old Persian uses ‘pîruš’ for ivory at Susa (6th century BC), with Elamite ‘pi-hi-ra-um’; certain Seleucid cuneiform texts (ca. 300 BC) show ‘pilu’. Middle Persian and Parthian use ‘pil’ for elephant.
• Source and transmission: Archaeological and textual evidence identifies IVC as the main Bronze Age source of elephant ivory for the Near East: earliest worked tusk at Mehrgarh (ca. 5500 BC); ivory artifacts at Mundigak (ca. 3000 BC); abundant ivory objects at Harappa, Mohenjo-daro, Chanhu-daro, Lothal, Surkotda; Ur III texts record ‘ivory birds of Meluhha’; extensive ivory trade via Magan and Dilmun, which themselves imported from IVC; after IVC trade disruption, Mesopotamian ivory evidence becomes sparse. ‘Syrian elephants’ are shown to be Asian (Elephas maximus) and likely non-indigenous, appearing later (ca. 1700–700 BC).
• Linguistic correspondences: The Indic ‘pīlu’ (elephant) is attested in Pali and various Dravidian languages (Tamil, Telugu, Kannada) alongside forms like ‘palla’, ‘pallava’, ‘piļļuvam’, ‘pīluru’, and the widespread Dravidian female-elephant ‘pidi’ (Proto-Dravidian *pid-i). Dravidian l~r correspondences and the tendency for Iranian languages to render /l/ as /r/ explain Akkadian/Old Persian ‘pīru’ from Indic ‘pīlu’. The Dravidian tooth-word is reconstructed as *pal and widely preserved (pal/pallu/palu/pel etc.) across North, Central, and South Dravidian. Dravidian verbal roots ‘pil-/piḷ-’ meaning split/tear/crush support a semantic pathway to ‘tooth/tusk’ and derivational formation of ‘pīlu/pillu’ (elephant).
• Phytonym evidence: ‘Pīlu’ is an ancient and common Indic name for Salvadora persica (toothbrush tree) and Salvadora oleoides, with Atharvaveda mention and sustained usage across Indo-Aryan and regional languages in the Indus region. Mahābhārata associates ‘pilu’ forests with the Indus basin; archaeobotanical records show Salvadora spp. as characteristic Indus flora frequently exploited by IVC populations. Travelogues (e.g., Sung-Yun) and toponyms (e.g., ‘Pilusāra’) further corroborate antiquity and regionality.
• Core lexicon argument: ‘Tooth’ is an ultraconserved, low-borrowability item appearing in Swadesh, Leipzig-Jakarta, ASJP, and Dolgopolsky lists; Pagel et al. (2013) classify Proto-Dravidian *pal among ultraconserved forms. Widespread IVC usage of ‘pilu’ derivatives for elephants and a toothbrush tree implies active use of a Proto-Dravidian tooth-root within the IVC speech community, indicating ancestral Dravidian languages in IVC.
• Corroborative genetics and ethnolinguistic links: Brahui genetics indicate an ancient Dravidian substrate shared with Pakistani populations, inconsistent with a recent south-to-north migration; this, along with timing of IVC-related ancestry flows, supports the presence of Dravidian languages in IVC and probable north-to-south dispersal.
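The l-to-r rendering invoked under the linguistic correspondences can be illustrated with a toy string substitution. This is a minimal sketch under strong assumptions: the function name is invented here, the rule is applied blindly to every ‘l’, and real sound correspondences are conditioned and far less regular (the summary itself notes that Seleucid texts show ‘pilu’ and that Middle Persian and Parthian keep ‘pil’).

```python
def render_l_as_r(form: str) -> str:
    """Toy rule: replace every 'l' with 'r'.

    Stands in for the tendency, described in the summary, for Iranian
    intermediaries to render /l/ as /r/. It is not a phonological model.
    """
    return form.replace("l", "r")


indic_form = "pīlu"          # Indic 'elephant' form cited in the summary
akkadian_form = "pīru"       # Akkadian form cited in the summary

derived = render_l_as_r(indic_form)
print(derived)                    # pīru
print(derived == akkadian_form)   # True
```

The match shows only that the cited forms differ by this single substitution; it says nothing about the direction or date of borrowing, which the paper argues from the trade and textual evidence.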
By tracing Near Eastern ‘pīru/pîruš’ to Indic ‘pīlu’ and linking ‘pīlu’ etymologically to the Proto-Dravidian tooth-word (*pal/*pīl/*pil/*pel), the study connects IVC commodities (ivory), fauna (elephant), flora (Salvadora persica), and a core body-part term (‘tooth’) across archaeology, linguistics, and historical sources. Because ‘tooth’ belongs to ultraconserved, minimally borrowed vocabulary, its presence as the base for multiple culturally salient IVC terms implies that a substantial IVC population used Proto-Dravidian lexemes in everyday speech. The phonological pathway from l to r in Iranian intermediaries explains Near Eastern forms, while archaeological trade routes from IVC to Magan and Dilmun clarify lexical diffusion. The findings align with archaeogenetic models in which IVC-related ancestry contributes to ASI, and they support a scenario in which Proto-Dravidian languages were present in IVC and likely spread southward post-IVC. The paper explicitly refrains from excluding other language groups in the multilingual IVC and acknowledges that genetics and language shifts are not strictly coupled. Nevertheless, the convergence of ultraconserved lexicon evidence with material culture, botanicals, toponyms, and regional textual attestations provides a robust argument for ancestral Dravidian languages in IVC.
The paper argues that ‘pīlu’-based forms used for ivory, elephant, and the toothbrush tree in and around IVC ultimately derive from the Proto-Dravidian tooth-word (*pal/*pīl), demonstrating the presence of ancestral Dravidian languages in IVC. It offers new etymologies resolving inconsistencies in epigraphy (e.g., ‘pilupati/mahāpīlupati’) and clarifies the Indian origin of Persian/Old Persian ‘pil/pîruš’. The results strengthen archaeogenetic inferences of north-to-south movements of IVC-related groups and suggest Proto-Dravidian likely migrated from the Indus region to South India after IVC’s decline. Future research could expand the corpus of fossilized Near Eastern and South Asian loanwords tied to IVC commodities, further integrate quantitative phylogenetics of Dravidian with archaeological timelines, and explore additional ultraconserved lexemes that might corroborate multilingual dynamics within IVC.
The study relies on indirect evidence due to the undeciphered Indus script and conducts no experimental or statistical analyses. Genetic ancestries and languages are not linearly correlated, limiting genetic data’s power to determine language identity. The argument centers on etymological, morphophonemic, archaeological, and archaeobotanical correlations and acknowledges alternative dispersal hypotheses for Proto-Dravidian. The author explicitly refrains from excluding the presence of other language groups in IVC.