Linguistics and Languages
Dissecting The Analects: an NLP-based exploration of semantic similarities and differences across English translations
L. Yang and G. Zhou
The Analects, compiled by Confucius’ disciples during the Warring States Period, comprise 20 books of concise passages whose ethical teachings have profoundly influenced Chinese history. Because the text is written in Classical Chinese, non-Chinese readers must rely on translations, and the reliability of those translations is therefore a basic expectation. More than 110 English translations (1691–2022) have expanded interpretive possibilities but also introduced distortion and incomprehensibility in places (e.g., rendering 天 as “Heaven” with Judeo‑Christian connotations). Linguistic and cultural distance complicates translation and comprehension, and prior comparative studies attribute differences to translators’ backgrounds, purposes, and strategies, often with subjective interpretations. Corpus-based quantitative work exists but tends to emphasize theory over practical reader guidance. With advances in NLP, objective analysis of textual features in digital humanities becomes feasible. This study asks how semantically similar different high-acceptability English translations of The Analects are at the sentence level, and which linguistic elements (e.g., core concepts, personal names) drive similarities or differences. It aims to guide readers and inform translators through empirical, NLP-based assessment.
Research on retranslation suggests translators critique and improve upon predecessors to meet target culture needs. Prior scholarship on Analects translations explains differences via translators’ life experiences, academic backgrounds, Sinology expertise, bilingual proficiency, purposes, and strategies, but often relies on subjective interpretation. Some corpus-based studies quantify macro-linguistic features across translations, offering partial objectivity but limited practical guidance. NLP has been applied across tasks including text generation, data mining, phonetics, sentiment analysis, and semantic similarity. Integrating NLP with translation studies of The Analects promises more empirical, unbiased insights. This study builds on such work by using sentence-level semantic similarity metrics to compare five widely read English translations.
Sample selection: Building on prior acceptability metrics (reviews/downloads/readership via major platforms), five high-acceptability English translations were selected: D. C. Lau, James Legge, William Jennings, Edward Slingerland, and Burton Watson.
Corpus building: High-resolution PDFs were obtained; preprocessing removed special symbols and lowercased text. Cleaned translations were sentence-aligned into a parallel corpus. The original text was segmented into 503 natural sections and further subdivided using punctuation and a line-based principle to preserve complete meaning. Each translated sentence was aligned with its source segment. Where a translation omitted a sentence, the placeholder "None" was used to maintain alignment. The corpus totals 136,171 English words, with 890 aligned sentence positions per translation.
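The alignment step above can be sketched as follows. This is a minimal illustration, not the study's code: the segment IDs and sentences are invented, and only the placeholder convention ("None" filling positions a translator omitted) is taken from the article.

```python
# Map each translation onto the same ordered list of source segments,
# padding any omitted sentence with the placeholder "None" so that every
# translation has one entry per aligned position.

def align_translation(segment_ids, translated, placeholder="None"):
    """Return one entry per source segment, padding omitted sentences."""
    return [translated.get(seg_id, placeholder) for seg_id in segment_ids]

segment_ids = ["1.1a", "1.1b", "1.2a"]  # hypothetical segment labels
lau = {
    "1.1a": "the master said to learn and then practice it is a pleasure",
    "1.2a": "yu zi said filial piety is the root of benevolence",
}  # "1.1b" deliberately missing to mimic an untranslated sentence

aligned = align_translation(segment_ids, lau)
print(aligned[1])  # the omitted segment appears as the placeholder "None"
```

Keeping omissions as explicit placeholders, rather than dropping them, is what preserves the fixed 890-position grid across all five translations.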
Semantic similarity modeling: Three models—Word2Vec, GloVe, and BERT—were used to compute sentence-level semantic similarity across all translator pairs. Word2Vec and GloVe provide static embeddings; BERT provides contextual (dynamic) embeddings. Python 3.6 implementation used re, pandas, streamlit, numpy, and model libraries. Translations were organized in Excel sheets per translator; a UI facilitated operation. Pairwise comparisons among the five translations yield 10 pairings; with 890 aligned sentences, each algorithm produces 8,900 pairwise similarity scores (total 26,700 across three algorithms). For analysis and ranking, averages of the three algorithms’ scores were used. Code and data are available on figshare (Attachments referenced in the article).
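The pairwise bookkeeping can be sketched as below. A toy bag-of-words cosine stands in for the Word2Vec/GloVe/BERT sentence embeddings actually used in the study (those require trained model files); the pair and score counts, however, follow directly from the article's setup of five translators and 890 aligned sentences.

```python
# C(5, 2) = 10 translator pairs; 10 pairs x 890 aligned sentences
# = 8,900 scores per algorithm, 26,700 across the three algorithms.
from itertools import combinations
from collections import Counter
import math

def cosine(u, v):
    """Cosine similarity between two word-count vectors."""
    dot = sum(c * v[w] for w, c in u.items())
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def sentence_similarity(s1, s2):
    """Toy stand-in for an embedding-based similarity score in [0, 1]."""
    return cosine(Counter(s1.lower().split()), Counter(s2.lower().split()))

translators = ["Lau", "Legge", "Jennings", "Slingerland", "Watson"]
pairs = list(combinations(translators, 2))
print(len(pairs))        # 10 translator pairings
print(len(pairs) * 890)  # 8,900 similarity scores per algorithm
print(sentence_similarity("the master said", "the master said"))  # ~1.0
```

In the study, the three algorithms' scores for each sentence pair are then averaged for ranking and analysis.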
- Algorithm agreement: Despite differences in absolute values, Word2Vec, GloVe, and BERT displayed consistent trends across sentence pairs (e.g., sentences low by one model tend to be low by others), supporting robustness of overall judgments (Fig. 1).
- Volume of comparisons: 890 sentences per translation; 10 translator pairs; 8,900 similarity scores per algorithm; 26,700 total across three algorithms.
- Overall distribution (Table 2): The majority of sentence-pair similarities fall in the 80–90% interval (5,507 pairs). Pairs above 80% total 6,927 (≈78%), indicating that, broadly, the translations capture similar semantics. Pairs below 80% total 1,973 (≈22%), representing semantically divergent cases of special interest.
- Abnormal results: 33 “abnormal” outcomes arose where one side of a pair was "None" (untranslated). Such pairs received very low scores from Word2Vec/BERT and no score from GloVe. These were excluded from subsequent analyses.
- High-similarity examples (Table 4): Several sentence pairs between Slingerland and Watson achieved 100% similarity across all three models (e.g., nos. 461, 590, and 616), typically with identical wording. Many other high-similarity pairs differed only slightly in word choice or phrasing.
- Low-similarity subset: 1,940 sentence pairs (21.8%) exhibited ≤80% similarity (Table 5), i.e., the 1,973 below-80% pairs minus the 33 abnormal cases. These highlight where translations diverge in lexis and structure, often tied to the rendering of core conceptual terms and personal names.
- Translator-level patterns (Table 6): Jennings had fewer sentences in the highest similarity intervals (e.g., 95–100%: 1%; 90–95%: 14%), reflecting structural rearrangements (inversions/combining) to improve readability and reduce repetition (e.g., “The Master said”). Slingerland and Watson had higher shares in top intervals (e.g., Slingerland 95–100%: 30%, 90–95%: 24%; Watson 95–100%: 34%), consistent with close alignment to prior translations and, for Slingerland, extensive paratext.
- Lexical drivers of divergence: High-frequency analyses (Tables 7–8) show core concepts (e.g., 君子 Jun Zi, 小人 Xiao Ren, 仁 Ren, 道 Dao, 礼 Li) and personal names (e.g., Zi/Tsz/Tzu variants, Lu, Yu, Kung) dominate low-similarity contexts. Core conceptual terms map to multiple English choices across translators (Table 9), and ancient Chinese multi-name conventions (formal/style/nickname) lead to inconsistent rendering across translations (Table 10), both materially affecting semantic similarity.
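The interval tallies behind the distribution findings above can be sketched as a simple binning pass. This is an illustrative reconstruction, not the study's code: the band edges follow the intervals reported in the results, abnormal ("None"-induced) pairs are skipped first, and the sample scores are invented.

```python
# Drop each similarity score into a band; None marks an abnormal pair
# (one side untranslated), which is excluded before tallying.
def tally_bands(scores):
    bands = {"<80%": 0, "80-90%": 0, "90-95%": 0, "95-100%": 0}
    for s in scores:
        if s is None:            # abnormal pair: exclude from the tally
            continue
        if s < 0.80:
            bands["<80%"] += 1
        elif s < 0.90:
            bands["80-90%"] += 1
        elif s < 0.95:
            bands["90-95%"] += 1
        else:
            bands["95-100%"] += 1
    return bands

sample = [0.97, 0.85, 0.62, None, 0.93]  # illustrative scores only
print(tally_bands(sample))
```

Applied per translator pair, the same tally yields the per-interval shares reported for Jennings, Slingerland, and Watson.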
The NLP-based similarity profiles demonstrate that while the five translations are predominantly semantically congruent (>80% in ~78% of pairs), meaningful divergences cluster where translators must render core conceptual terms and personal names. The consistent trend across Word2Vec, GloVe, and BERT strengthens confidence in comparative judgments.
- Similarities: High similarity reflects a shared capture of the Analects’ main semantic content. Slingerland and Watson show many very-high-similarity pairs, aligning with their reliance on prior translations (Watson cites 11) and, for Slingerland, a thick-translation approach with extensive paratext aiding clarity without altering core semantics.
- Differences: Jennings’ lower presence in the top similarity bands stems from structural strategies (inversion/combining) to mitigate repetitious formulae (e.g., “The Master said”), an innovative 1895 choice aimed at readability that slightly reduces sentence-aligned similarity but does not undermine overall semantic fidelity. Divergences often rest on lexical choices for core concepts (e.g., Jun Zi as “gentleman,” “superior man,” “noble-minded man”) and treatment of personal names within the complex Chinese naming system. Translators who preserve original naming variations (Legge, Lau, Watson) retain cultural specificity but may demand more reader background; those who standardize/simplify (Jennings, Slingerland) improve readability at some cost to nuance. Implications: For readers, understanding core concepts and Chinese naming conventions enhances comprehension; consulting multiple translations and paratext is recommended. For translators, compensatory strategies (consistent renderings plus paratextual explanations) can balance fidelity, clarity, and cultural nuance.
This study constructed a sentence-aligned parallel corpus of five high-acceptability English translations of The Analects and quantified cross-translation semantic similarity using Word2Vec, GloVe, and BERT. Results show strong overall semantic agreement, with most sentence pairs exceeding 80% similarity, and reveal that core conceptual words and personal names are the principal drivers of cross-translation divergence. Translator-specific strategies explain similarity patterns: Jennings’ structural reorganizations reduce alignment similarity while prioritizing readability; Slingerland and Watson yield more highly similar sentences, aided by paratext and reference to prior translations. The study offers practical guidance for readers (master key concepts and naming conventions; consult multiple translations) and recommendations for translators (use compensatory strategies with paratext to convey core concepts and names faithfully yet accessibly). The NLP similarity model and workflow are transferable to other translated texts for comparative analysis and quality assessment.
The analysis focused on three main factors—macro-structure, core vocabulary (core conceptual terms), and personal names—while other influences (e.g., syntactic variation beyond structure, discourse-level cohesion, translator ideology, commentary framing) were not systematically modeled. Future work should incorporate additional perspectives, broaden text coverage, and deepen multi-factor analyses to enhance support for readers and translators.