Do translation universals exist at the syntactic-semantic level? A study using semantic role labeling and textual entailment analysis of English-Chinese translations

L. Wang and Y. Jiang

Letao Wang and Yue Jiang report that English-Chinese translations exhibit patterns of explicitation, simplification, and levelling out at the syntactic-semantic level. The study also examines how language systems and socio-cultural influences interact to shape translation universals.

Introduction
The study investigates whether translation universals exist at the syntactic-semantic level and what syntactic-semantic features characterize translated texts. Building on the concepts of “the third language” (Duff, 1981) and “the third code” (Frawley, 2000), and Baker’s translation universal hypothesis, the authors argue that prior research has focused largely on lexical and grammatical features, while key universals like explicitation and simplification may be more salient at the semantic and informational levels. Two main hurdles have limited such inquiry: the lack of automated semantic analysis tools for large corpora and the difficulty of extracting semantic features across corpora without topic interference. To address these, the study employs shallow semantic methods—semantic role labeling (SRL) and recognizing textual entailment (RTE)—to compare English source texts (ES), Chinese translations (CT), and Chinese original texts (CO). Research questions: (1) Do translation universals exist at the syntactic-semantic level, and if so, what features are typical of translated texts? (2) What factors contribute to distinct features observed in translations at the syntactic-semantic level?
Literature Review
The translation universal hypothesis (Baker, 1993) has been refined into S-universals (source-to-translation) and T-universals (translation-to-target) (Chesterman, 2004), with sub-hypotheses including simplification, explicitation, normalization, levelling out, and the unique item hypothesis. Explicitation, initially cohesion-focused (Blum-Kulka, 1986) and later broadened to informational explicitation (Baker, 1996), suggests translators make implicit information explicit, implying the need to assess beyond syntax to syntactic-semantic levels. Simplification and levelling out at lexical/syntactic levels may induce semantic deviations and alter informational structure, calling for semantic-level parameters. Two NLP tasks motivate the approach: SRL, which captures predicate-argument structures and adjuncts (A0/A1/A2; ADV, MNR, DIS), integrating syntax and semantics; and RTE, which models informational inclusion via semantic/syntactic subsumption. Prior work supports WordNet-based shallow semantic similarity for entailment. The study posits that SRL and RTE, being less content/topic-sensitive than deep semantic models, are suitable for identifying semantic universals in translated texts.
Methodology
Corpora: For S-universals, ES are compared with CT using the Yiyan English-Chinese Parallel Corpus (500 English-Chinese text pairs; ~1M English words; ~1.6M Chinese characters; balanced across 4 genres). For T-universals, CT are compared with CO using the Lancaster Corpus of Mandarin Chinese (LCMC; ~1M words; balanced, non-translated Mandarin, genre-comparable to Yiyan Chinese). In total: 500 parallel ES–CT pairs and 500 comparable CT–CO pairs. Texts were manually cleaned for annotation.

Tools: Chinese SRL via N-LTP (multi-task, pretrained framework); English SRL via AllenNLP’s BERT-based SRL model. The analysis focuses on six frequent roles: core arguments A0 (agent/experiencer), A1 (patient/recipient), A2 (range/other core), and adjuncts ADV (adverbial), MNR (manner), DIS (discourse marker).

Procedures:
1) Perform SRL on all sentences.
2) Conduct textual entailment-based semantic-depth estimation of predicates and syntactic subsumption analysis of roles.

Two modifications to standard RTE:
(a) For each sentence treated as hypothesis (H), construct a text (T) by replacing the predicate with its root hypernym to ensure entailment; approximate extra information I(E) as semantic distance between predicate and its root hypernym, quantified as 1 − similarity (Wu-Palmer or Lin). Lower similarity implies greater explicitness (deeper semantic depth).
(b) Use WordNet and Open Multilingual WordNet via NLTK; Lin similarity uses Brown IC (ic-brown.dat).

Indices for syntactic subsumption: average number of semantic roles per verb (ANPV), per sentence (ANPS), and average role length (ARL). ANPV reflects clause-level complexity; ANPS reflects sentence-level semantic richness; ARL reflects information per role. For cross-linguistic ARL comparability, ARL was also standardized by sentence length.

Statistics: Levene’s tests for variance homogeneity; due to non-normality/heteroscedasticity, Mann–Whitney U tests were applied.
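The Mann–Whitney U comparison mentioned above can be sketched with the standard library alone. This is a minimal illustration, not the authors' code: it uses the normal approximation with tie correction and no continuity correction, and it adds the common effect-size convention r = |Z| / √N, which this summary does not confirm the study used. The function names and the sample values are hypothetical.

```python
import math


def rank_with_ties(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_z(x, y):
    """Return (U for x, Z, r) using the tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = rank_with_ties(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2

    # Tie correction: sum of (t^3 - t) over groups of tied values.
    counts = {}
    for value in pooled:
        counts[value] = counts.get(value, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())

    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        raise ValueError("all observations are identical; Z is undefined")
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    r = abs(z) / math.sqrt(n)  # assumed effect-size convention
    return u1, z, r


if __name__ == "__main__":
    # Hypothetical per-text index values for two corpora (not study data).
    corpus_a = [2.1, 2.3, 2.0, 2.2, 2.4]
    corpus_b = [2.6, 2.8, 2.5, 2.7, 2.9]
    print(mann_whitney_z(corpus_a, corpus_b))
```

A negative Z here means the first sample tends to have lower ranks than the second. For real analyses a tested library routine is preferable to this sketch.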
Distributions of similarity measures were also inspected to assess levelling out.
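The semantic-depth measure (1 − similarity between a predicate and its root hypernym) can be illustrated on a toy taxonomy. The sketch below is an assumption-laden illustration rather than the study's pipeline: the hypernym tree is hand-made and is not WordNet data, the helper names are hypothetical, and the root is counted as depth 1, a convention chosen for this sketch. Only the Wu-Palmer variant is shown; Lin similarity needs information-content counts and is omitted.

```python
# Hypothetical toy hypernym tree (child -> parent); NOT real WordNet data.
TOY_HYPERNYMS = {
    "move": None,      # root of one verb tree
    "travel": "move",
    "walk": "travel",
    "stroll": "walk",
    "be": None,        # a verb that is its own root
}


def path_to_root(word):
    """Return [word, parent, ..., root] following the toy hypernym links."""
    path = [word]
    while TOY_HYPERNYMS[path[-1]] is not None:
        path.append(TOY_HYPERNYMS[path[-1]])
    return path


def depth(word):
    """Depth in the tree, counting the root as depth 1 (sketch convention)."""
    return len(path_to_root(word))


def wu_palmer(a, b):
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    ancestors_b = set(path_to_root(b))
    # The first shared node on a's path is the least common subsumer.
    lcs = next((node for node in path_to_root(a) if node in ancestors_b), None)
    if lcs is None:
        return 0.0  # the two words sit in different trees
    return 2 * depth(lcs) / (depth(a) + depth(b))


def semantic_depth(predicate):
    """1 - similarity between a predicate and its own root hypernym."""
    root = path_to_root(predicate)[-1]
    return 1.0 - wu_palmer(predicate, root)


if __name__ == "__main__":
    for verb in ("be", "move", "walk", "stroll"):
        print(verb, round(semantic_depth(verb), 3))
```

In this toy tree a root-level verb such as "be" has a semantic depth of 0, while verbs further from their root score higher. That matches the reading in the text that lower similarity to the root hypernym indicates a more explicit predicate.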
Key Findings
S-universals (ES vs CT):
- Semantic subsumption (predicate explicitness): CT verbs are more explicit (lower similarity to root hypernyms) than ES. Means (Table 2): Wu-Palmer Similarity ES 0.66 vs CT 0.50 (Z = −27.13, p < 0.001); Lin Similarity ES 0.83 vs CT 0.78 (Z = −22.59, p < 0.001). Major cause: denominalization and reduced use of English “be” predicates in CT, increasing predicate specificity and semantic depth.
- Syntactic subsumption (role configuration): ANPS approximately equal (ES 7.75 vs CT 7.75; Z = −0.45, p = 0.651), but CT has fewer roles per verb and shorter roles: ANPV ES 2.64 vs CT 2.17 (Z = −26.64, p < 0.001); ARL ES 4.45 vs CT 2.77 (Z = −24.05, p < 0.001). Indicates simplification/unpacking in CT: long/nested English roles are split into shorter roles or separate argument structures.
- Specific roles (Table 3): CT shows higher agent frequency (explicitation) and more discourse markers (coherence explicitation):
  • ANPV: A0 ES 0.52 vs CT 0.59; A1 ES 0.92 vs CT 0.68; A2 ES 0.33 vs CT 0.07; ADV ES 0.11 vs CT 0.49; MNR ES 0.09 vs CT 0.02; DIS ES 0.05 vs CT 0.12 (all p < 0.001).
  • ANPS: A0 ES 1.55 vs CT 2.08; A1 ES 2.70 vs CT 2.39; A2 ES 0.98 vs CT 0.24; ADV ES 0.32 vs CT 1.76; MNR ES 0.25 vs CT 0.08; DIS ES 0.14 vs CT 0.45 (all p < 0.001).
  • ARL: A0 ES 2.47 vs CT 2.17; A1 ES 5.38 vs CT 4.20; A2 ES 6.20 vs CT 3.98; ADV ES 6.02 vs CT 1.17; MNR ES 4.20 vs CT 4.48; DIS ES 1.32 vs CT 1.13 (most p < 0.001).
  ARL differences persist after standardization by sentence length (Z = −24.79, p < 0.001).
- Illustrative examples show divide translation reduces nested structures and adds logical markers (e.g., 由于 “because of”, 而 “yet”), supporting simplification and explicitation.

T-universals (CT vs CO):
- Semantic subsumption: Similar averages but different distributions. Means (Table 5): Wu-Palmer CO 0.49 vs CT 0.50 (Z = −4.68, p < 0.001); Lin CO 0.77 vs CT 0.78 (Z = −2.87, p = 0.004; small effect ≈ 0.092). Distributional inspection shows CT values more centralized (levelling out), while CO shows greater variability.
- Syntactic subsumption: CT exhibits higher complexity/density: ANPV CO 2.09 vs CT 2.17 (Z = −9.97, p < 0.001); ANPS CO 6.92 vs CT 7.75 (Z = −6.85, p < 0.001); ARL CO 2.67 vs CT 2.77 (Z = −3.33, p = 0.001). Role-level comparisons indicate CT has significantly higher ANPV and ANPS for core arguments; DIS ANPS is notably higher in CT (effect size ≈ 0.241). Some features suggest partial source-language “shining through,” retaining complex structures in a subset of CT.
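The role-based indices compared above (ANPV, ANPS, ARL) can be computed from SRL output along the following lines. This is a rough sketch under stated assumptions: the nested-list input format is hypothetical and is not the native output of N-LTP or AllenNLP, role length is counted in tokens, and counts are pooled over the whole sample, none of which this summary confirms for the study.

```python
def srl_indices(sentences):
    """Compute ANPV, ANPS and ARL from SRL frames.

    `sentences` is a hypothetical structure: a list of sentences, each a
    list of predicate frames, each a list of (role_label, tokens) pairs.
    """
    n_sentences = len(sentences)
    n_verbs = sum(len(frames) for frames in sentences)
    role_lengths = [
        len(tokens)
        for frames in sentences
        for frame in frames
        for _label, tokens in frame
    ]
    n_roles = len(role_lengths)
    if n_sentences == 0 or n_verbs == 0 or n_roles == 0:
        raise ValueError("need at least one sentence, predicate and role")
    return {
        "ANPV": n_roles / n_verbs,           # roles per predicate
        "ANPS": n_roles / n_sentences,       # roles per sentence
        "ARL": sum(role_lengths) / n_roles,  # tokens per role (assumed unit)
    }


# Hypothetical annotated sample (not study data).
SAMPLE = [
    [  # sentence 1: one predicate frame
        [("A0", ["the", "committee"]),
         ("A1", ["the", "proposal"]),
         ("ADV", ["quickly"])],
    ],
    [  # sentence 2: two predicate frames
        [("A0", ["she"]), ("A1", ["that", "he", "left"])],
        [("A0", ["he"])],
    ],
]

if __name__ == "__main__":
    print(srl_indices(SAMPLE))
```

Restricting the inner loop to a single label (for example only A0 or only DIS) would give per-role figures of the kind listed for Table 3.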
Discussion
Findings support the presence of translation universals at the syntactic-semantic level. CT differs significantly from both ES and CO, consistent with the notion of a translational variety (“third language/code”). The results show an eclectic profile: relative to ES, CT evidences explicitation (more specific verbs via denominalization; more A0 and DIS) and simplification/unpacking (fewer/shorter roles, fewer nestings). Relative to CO, CT shows greater structural density/complexity (higher ANPV/ANPS/ARL) and levelling out (more centralized semantic depth distributions). This bidirectional pattern aligns with Halverson’s Gravitational Pull Hypothesis: the magnetism effect of the target language drives denominalization and divide translation; the gravitational pull of the source induces residual complex structures (“shining through”); and connectivity effects (high-frequency translation equivalents) can preserve source-like nominalizations and predicates in CT. Additionally, the consistent increases in agents and discourse markers across S- and T-comparisons suggest socio-cultural and translator-driven factors (e.g., clarity, coherence norms) contribute to explicitation beyond purely linguistic system influences, reinforcing translation as a complex, dynamic equilibrium shaped by multiple forces.
Conclusion
Using SRL and textual entailment, the study demonstrates that translation universals manifest at the syntactic-semantic level. CT differs markedly from ES and CO, with substantial evidence for explicitation, simplification, and levelling out. Overall syntactic-semantic features of CT are eclectic: CT simplifies relative to ES yet is denser and more complex relative to CO, indicating distinct S- and T-oriented effects arising from different forces (source vs target language influences). Consistent explicitation in specific roles (A0, DIS) across both comparisons points to additional socio-cultural/translator factors. The study extends translation-universal research beyond lexical and grammatical levels, highlighting translation as a complex system shaped by interacting forces. Future work should test other language pairs, investigate role interactions at sentence-level dynamics, expand feature sets (e.g., nestification indices, contextual semantics), and improve semantic analysis tools/models.
Limitations
Findings are limited to one language pair (English–Chinese) and may not generalize to others, especially those with similar information structures. The study focuses on corpus-level indices, with limited analysis of sentence-level interactions among roles. Additional features (e.g., contextual semantic indices, explicit measures of nestification) could refine analyses. The work relies on existing SRL/RTE tools without model refinement; enhancing and adapting these tools could improve semantic labeling and analysis fidelity.