On the Chinese resistance to lexical borrowing: a writing-driven self-purification system

Linguistics and Languages

L. Zhang

In this paper, Liulin Zhang examines why the Chinese language resists lexical borrowing and how its ideographic writing system shapes language ideology. The study explores the relationship between writing and speech and challenges the conventional view that writing is secondary to speech.

Introduction
The paper addresses why Chinese is notably resistant to lexical borrowing. Data from the World Loanword Database show Mandarin Chinese with only 25 probable or clear loanwords among 1460 basic meanings (about 1.2%), lower than any other surveyed language. Extending beyond basic vocabulary, an analysis of the 8821-word HSK list finds only 0.75% strict loanwords. The author hypothesizes that the morphosyllabic, ideographic writing system, in which each character represents a meaning-bearing morpheme, impedes the survival of transliterations when the meanings of pre-existing characters interfere with the borrowed meanings. Prior work often treats orthography as a site of ideological struggle (e.g., etymological vs. phonemic approaches) and presumes that writing is secondary to speech. This study instead probes writing’s causal role in shaping language ideology and loanword assimilation in Chinese. It proceeds in four parts: a historical review, a quantitative study of loanword retention over the last century, a discussion of the link between ideographic writing and linguistic purism, and broader implications for writing and language ideology.
Literature Review
The history of lexical borrowing in Chinese is organized into three major waves: (1) Buddhist transmission (post-Han dynasty), (2) the New Culture Movement (1910s), which brought Japanese and European terms, and (3) the post-1978 Reform and Opening-up, with its English influence. In early periods, new characters were sometimes created for foreign concepts (e.g., 僧 ‘priest’ for Sanskrit Samgha), but more often translators used pre-existing characters, either for transliteration (e.g., 夜叉 for yakṣa, 般若 for prajñā) or for meaning-based strategies (calques, meaning extensions, compromises). Multiple renditions often co-existed for a single source word, with the competition resolving over time (e.g., 智慧 replacing 般若). In the 19th and early 20th centuries, Sino-Japanese formations (kanji compounds) modernized terminology (e.g., 民主, 经济); many of these were later reborrowed into Chinese and read with Chinese pronunciations. Since 1978, diverse borrowing strategies have continued, including transliterations, calques, loanblends, meaning-pronunciation compromises, and graphic loans. Explicit purist stances have also emerged against alphabetic graphic loans (e.g., official discouragement or bans of English abbreviations such as GDP, WTO, NBA). There is regional variation as well: Hong Kong and Taiwan exhibit more transliterations (often Cantonese-based) and follow different standards for proper names. Selective adoption is central: when multiple variants compete, transliterations written with pre-existing characters are usually disfavored if analyzable native-morpheme options or newly created characters exist.
Quantitatively, Zhang (2017) classifies the borrowed items on the HSK list into the following types: newly created character transliterations (0.17%), graphic loans from kanji (4.48%), pre-existing character transliterations (0.26%), meaning-pronunciation compromises (0.05%), loanblends (0.29%), calques/loan-based creations (4.50%), and meaning extensions (1.46%). Together these make up 11.21% of the list. By strict analyzability criteria, however, only the small subset of unanalyzable forms counts as loanwords, which explains the unusually low loanword ratio in Chinese.
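The percentages above can be tallied directly to confirm the stated total. The following is a minimal sketch that uses only the figures quoted in this summary; the dictionary keys and variable names are illustrative labels, not identifiers from the paper. `Decimal` is used so that the two-decimal percentages add up exactly.

```python
from decimal import Decimal

# Shares of the 8821-word HSK list, in percent, as quoted in this summary.
# The keys are descriptive labels chosen for this sketch.
shares = {
    "newly created character transliterations": Decimal("0.17"),
    "graphic loans from kanji": Decimal("4.48"),
    "pre-existing character transliterations": Decimal("0.26"),
    "meaning-pronunciation compromises": Decimal("0.05"),
    "loanblends": Decimal("0.29"),
    "calques / loan-based creations": Decimal("4.50"),
    "meaning extensions": Decimal("1.46"),
}

# Exact decimal sum of all borrowing types.
total = sum(shares.values(), Decimal("0"))

# Print the types from largest to smallest share, then the total.
for name, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<45}{pct:>6}%")
print(f"{'total':<45}{total:>6}%")
```

Sorting by share makes the pattern visible at a glance: the three largest classes are all analyzable or kanji-based, while every transliteration-type class stays below 0.3%.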
Methodology
The study tests the hypothesis that transliterations have lower retention than analyzable items. Data source: Lu Xun’s essay collection Fen ‘Tomb’ (《坟》), with essays dating from 1907–1925, approximately 180,000 characters. Extraction: 766 words denoting borrowed concepts (373 of them proper nouns) were identified in the text; distinct written forms for the same concept were counted as different words. Classification: 314 transliterations (304 proper nouns), 405 analyzable items (morpheme-by-morpheme calques, loan-based creations, or meaning extensions; 30 proper nouns), and 47 loanblends (partially analyzable; 39 proper nouns). Many analyzable forms derive from Sino-Japanese morpheme compounds. Retention assessment: each word was compared with the most commonly used form for the same concept in the Balanced Modern Mandarin Corpus (over 100 million characters; 1919–present). Ninety-one words (mostly uncommon proper nouns) did not occur in the corpus and were excluded. For the remaining 675 words, the counts by type were 242 transliterations (unanalyzable), 41 loanblends, and 392 fully analyzable items. Retention versus replacement was computed for each type, and replacements were categorized by the type of the replacing form (another transliteration, a loanblend, or a fully analyzable item). Examples of replacements were documented (e.g., Cantonese- or Shanghainese-influenced transliterations replaced by standardized Mandarin-based forms or by analyzable forms).
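The bookkeeping in this paragraph can be checked mechanically. The sketch below uses only the counts reported above; the dictionary layout and names are assumptions made for illustration. The per-type exclusion counts are derived here by subtraction and are not figures stated in the summary.

```python
# Counts per borrowing type as reported in the Methodology paragraph.
extracted = {
    "transliteration": {"words": 314, "proper_nouns": 304},
    "analyzable": {"words": 405, "proper_nouns": 30},
    "loanblend": {"words": 47, "proper_nouns": 39},
}

# Words that also occur in the reference corpus and so could be compared.
comparable = {"transliteration": 242, "analyzable": 392, "loanblend": 41}

total_words = sum(v["words"] for v in extracted.values())
total_proper = sum(v["proper_nouns"] for v in extracted.values())

# Derived, not reported: how many words of each type were excluded
# because they never occur in the corpus.
excluded = {k: extracted[k]["words"] - comparable[k] for k in extracted}

print("extracted words:", total_words)
print("proper nouns:", total_proper)
print("comparable words:", sum(comparable.values()))
print("excluded by type:", excluded)
print("excluded total:", sum(excluded.values()))
```

The totals reproduce the reported 766 extracted words, 373 proper nouns, 675 comparable words, and 91 exclusions. Most of the derived exclusions fall on transliterations, which is consistent with the remark that the missing words were mostly uncommon proper nouns.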
Key Findings
- Overall retention: Of 675 comparable items, 412 (61.04%) were retained; 263 (38.96%) were replaced.
- By type (Table 3):
  - Transliterations (unanalyzable; n=242): Retained 32.64%; Replaced 67.36%. Among replacements: by other transliterations 63.22%; by loanblends 1.24%; by fully analyzable items 2.89%. Examples: variants for ‘Galilei’, ‘Faust’, ‘Warsaw’ replaced by other transliterations; items like 之不拉 ‘zebra’ replaced by 斑马; 费厄泼赖 ‘fair play’ replaced by 公平竞争.
  - Loanblends (partially analyzable; n=41): Retained 41.46%; Replaced 58.54% (by other loanblends 56.10%; by fully analyzable items 2.44%; by transliterations 0%).
  - Fully analyzable items (n=392): Retained 80.61%; Replaced 19.39% (all by other fully analyzable items; none by transliterations or loanblends).
- Proper nouns were predominantly treated via transliteration in the early data; non-proper nouns favored analyzable (native-morpheme) strategies.
- Intra-author variation (Lu Xun): Over time, forms shifted from Cantonese/Shanghainese-influenced transliterations to standardized or analyzable forms (e.g., ‘America’ from 亚美利加 to 美国; multiple variants for ‘Ibsen’ with later stabilization).
- Supporting background statistics (HSK list; Zhang 2017): strict loanword classes are rare: pre-existing character transliterations 0.26%, meaning-pronunciation compromises 0.05%, newly created character transliterations 0.17%, graphic loans from alphabet letters 0%. Larger shares are analyzable items: calques/loan-based creations 4.50%, graphic loans from kanji 4.48%, meaning extensions 1.46%, loanblends 0.29% (total 11.21% across types).
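The retention rates by type can be reproduced with a few lines of arithmetic. In the sketch below, the sample sizes are those reported above, but the retained counts (79, 17, 316) are an assumption: they are back-computed from the reported percentages and are not quoted in this summary. The names `counts` and `pct` are illustrative.

```python
# Sample sizes are reported; retained counts are back-computed from the
# reported percentages (an assumption of this sketch).
counts = {
    "transliterations": {"n": 242, "retained": 79},
    "loanblends": {"n": 41, "retained": 17},
    "fully analyzable": {"n": 392, "retained": 316},
}


def pct(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


# Retained vs. replaced share for each borrowing type.
rates = {}
for kind, c in counts.items():
    replaced = c["n"] - c["retained"]
    rates[kind] = (pct(c["retained"], c["n"]), pct(replaced, c["n"]))
    print(f"{kind:<18} n={c['n']:<4} retained {rates[kind][0]}%  replaced {rates[kind][1]}%")

# Pooled figures across all three types.
total_n = sum(c["n"] for c in counts.values())
total_retained = sum(c["retained"] for c in counts.values())
overall = pct(total_retained, total_n)
print(f"overall            n={total_n}  retained {overall}%")
```

Under these assumed counts, the per-type rates and the pooled figure (412 of 675 retained) match the percentages listed above, so the reported table is internally consistent.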
Discussion
The findings support the view that Chinese writing exerts a conservative, purist effect on lexical borrowing. Because Chinese characters encode stable graphic-semantic units, with many homophones and only a loose graphic-phonetic linkage, transliterations face two barriers to community-wide conventionalization: (i) arbitrariness in character choice due to extensive homophony, which leads to multiple competing transliterations of the same source word, and (ii) dialectal variation, since the same characters are pronounced differently across Sinitic varieties. Early 20th-century contact centers (Shanghai, Hong Kong) funneled foreign terms through local pronunciations, producing regionally divergent transliterations (e.g., 沙发 vs 梳化 for ‘sofa’), which later competed with one another once a Beijing-based standard (Modern Mandarin) became dominant. Government-led standardization in the mainland further favored Mandarin-based forms and discouraged alphabetic graphic loans. In contrast, analyzable items (calques, loan-based creations, meaning extensions) rely on stable graphic-semantic associations and thus travel robustly across dialect zones, yielding markedly higher retention. This writing-driven selection amounts to a self-purification mechanism that implicitly resists direct phonetic borrowing. Parallels elsewhere in the Sinosphere suggest similar dynamics wherever Chinese characters were historically used: ideographic writing stabilized semantic units and discouraged ateji-like transliterations. Japanese, for example, later shifted to katakana for phonetic representation while keeping kanji for semantic compounds.
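Barrier (i) can be made concrete with a toy enumeration. The sketch below is purely illustrative and is not taken from the paper: the candidate lists are a hypothetical, heavily truncated sample of characters whose Mandarin readings match each syllable of ‘sofa’ (tones ignored), and the names `candidates` and `possible_forms` are made up for this example.

```python
from itertools import product

# Hypothetical, truncated candidate lists: a few characters read as
# "sha" or "fa" in Mandarin (tones ignored). Real inventories are larger.
candidates = {
    "sha": ["沙", "莎", "砂", "纱"],
    "fa": ["发", "法", "伐"],
}


def possible_forms(syllables):
    """Enumerate every written form obtained by picking one candidate
    character per syllable."""
    pools = [candidates[s] for s in syllables]
    return ["".join(chars) for chars in product(*pools)]


forms = possible_forms(["sha", "fa"])
print(len(forms), "candidate written forms for a two-syllable word")
print(forms)
```

Even with this tiny sample there are 4 × 3 = 12 candidate spellings for a two-syllable word, and the number multiplies with each extra syllable and each extra homophone. A meaning-based rendering, by contrast, is constrained by the established meanings of the characters, which leaves far less room for competing variants.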
Conclusion
Chinese has borrowed lexical items across millennia, yet direct transliterations using pre-existing characters are typically filtered out when analyzable native-morpheme alternatives exist. Quantitative evidence from a century-scale corpus comparison shows markedly lower retention for transliterations than for analyzable items. The ideographic nature of Chinese characters, with their stable graphic-meaning links and weak, variable graphic-phonetic ties, gives rise to an implicit linguistic purism that shapes lexical assimilation and is observable across the Sinosphere. This challenges the conventional assumption that writing is secondary to speech: in ideographic systems, writing can actively steer language ideology and lexical outcomes. In contrast to alphabetic ideologies (where writing aims to represent speech accurately and identity centers on the spoken language), ideographic systems stabilize semantic units and require novel uses to align with established graphic-semantic correspondences. Hence, ideographic writing inherently fosters a writing-driven self-purification in loanword assimilation.
Limitations
The study focuses on Chinese and ideographic writing, calling for broader comparative work across writing types (syllabaries, abjads, abugidas, alphabets). It remains to be seen whether implicit purism linked to ideographic writing persists in places that abandoned Chinese characters (e.g., Korea, Vietnam). Other cultural, social, and political factors clearly interact with writing (e.g., Hong Kong and Taiwan retain more transliterations than the mainland). The analysis concentrates on implicit purism inferred from usage; explicit policy positions and public attitudes were not systematically examined. Finally, while writing plays a central role, it does not preclude other factors affecting loanword assimilation, and interactions between implicit and explicit ideologies warrant further study.