Performance and perception: machine translation post-editing in Chinese-English news translation by novice translators

Linguistics and Languages

Y. Yang, R. Liu, et al.

Discover how machine translation can support novice translators in news translation, as explored by Yanxia Yang, Runze Liu, Xingmin Qian, and Jiayue Ni. This study reveals the strengths and weaknesses of MT, particularly in handling cultural nuances and structural coherence, while highlighting its appeal and potential in translator training programs.

Introduction

The study asks whether machine translation post-editing (MTPE) effectively supports novice translators in Chinese-to-English news translation. In the context of globalization and the growing need for rapid, culturally sensitive news dissemination, translators must manage linguistic, cultural, and stylistic complexities while meeting speed demands. Although machine translation (MT) has improved and aids cross-cultural communication, its suitability for news translation by learners remains unclear. The study compares performance and perceptions between manual human translation (HT) and MTPE among translation learners, with the aim of informing translator training curricula and optimizing the use of MT in news contexts.

Literature Review

The paper reviews two relevant strands.

1) Peculiarities of news translation: News translation is considered a process of rewriting that involves linguistic transformations shaped by organizational, agent, and textual factors. Research includes product-oriented comparisons of source and target news across contexts and process-oriented studies of translation behavior (e.g., eye-tracking). Translating news is challenging at lexical, syntactic, and textual levels, and comprehensive analyses often require triangulated methodologies.

2) Machine translation post-editing: MT has progressed from rule-based and statistical methods to neural systems, yet raw outputs still contain errors (e.g., missing words and errors in word order, lexis, syntax, and punctuation, which affect acceptability or adequacy). Post-editing, ranging from light to full, aims to raise MT outputs to acceptable or professional quality. Prior studies indicate that MT affects translators' cognitive processes and that experience shapes attitudes toward MTPE. Productivity gains from MTPE vary with system, text type, and experience; some professionals prefer HT for creative tasks, while MTPE can enhance productivity in technical texts and potentially in literary contexts. MT has been tentatively adopted in news translation, where MTPE tasks include terminology correction, ambiguity avoidance, and cultural/ideological adjustments. However, news complexity and MT limitations challenge learners, underscoring the need to study novice performance and perceptions in MTPE for news.

Methodology

Design: A mixed-methods study combining quantitative analyses of performance (quality scores, error identification and correction rates, workload metrics) with qualitative analysis of post-task questionnaires on perceptions.

Research questions: (1) How well does Google Translate perform in Chinese–English news translation from linguistic and cultural perspectives? (2) How do novice translators perform in MTPE versus manual translation? (3) How do novices perceive Google Translate use in Chinese–English news translation?

Participants: 24 third-year translation learners, all Chinese L1 speakers learning English as a foreign language (EFL), with about 6 months of translation training. English proficiency: 50% CET-6, 37% CET-4, remainder CATTI Level-3. Computer literacy was mostly moderate. None had formal MTPE training, although all received MTPE instructions before testing.

Materials: Two Chinese excerpts from the Report on China's Policies and Actions for Addressing Climate Change (2019), treated as news-type material. Text 1 was assigned to HT; Text 2 was assigned to MTPE (Google Translate raw output). The comparability of text complexity was assessed using CTAP (lexical and syntactic features). Key comparability metrics (Text 1 vs Text 2) included: number of characters (265 vs 271), sentences (7 vs 6), average sentence length (37 vs 45), TTR (0.49 vs 0.50), verb lexical variation (10.77 vs 10.73), characters occurring once (53 vs 67), mean length of prepositional phrases (10.5 vs 13.0), mean noun phrases per simple clause (4.9 vs 1.28), mean verb phrases per simple clause (2.7 vs 2.14), and mean simple clauses per sentence (1.43 vs 2.33).

Procedure: Participants first translated Text 1 manually, then post-edited the Google Translate output of Text 2. Online resources were permitted; screen recordings captured the process and served to prevent plagiarism. Students followed explicit MTPE guidelines to identify and correct MT errors. Immediately after the tasks, a post-test questionnaire collected self-assessments of quality and NASA-TLX workload ratings.

Measures and analysis: Translation quality (acceptability and adequacy) was assessed by two trained raters following Daems et al. (2013), with inter-rater reliability measured by Pearson correlation. Descriptive statistics were computed for HT and MTPE mean scores, and Wilcoxon signed-rank tests (chosen because of the small sample) were used to test differences. Error identification and correction rates were documented by error type for MTPE. Perception data included self-assessed quality for MT, HT, and MTPE; perceived difficulties in HT; attention distribution during MTPE; and NASA-TLX workload (time, physical, and mental demands, and frustration). SPSS 17.0 was used for analysis. Screen recordings provided approximate task duration data.
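The two statistics named in the analysis plan can be illustrated with a short, standard-library-only sketch. It is not the authors' SPSS procedure: the function names `pearson_r` and `wilcoxon_z` are hypothetical, the scores and ratings below are invented for illustration, and the Wilcoxon statistic uses the usual normal approximation with a tie correction, which is assumed here to resemble the asymptotic Z that statistical packages commonly report.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two raters' scores for the same texts."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def wilcoxon_z(before, after):
    """Wilcoxon signed-rank test via the normal approximation.

    Zero differences are dropped and tied absolute differences share
    their average rank. Returns (W_plus, z, two_sided_p).
    """
    diffs = [a - b for a, b in zip(after, before) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        # Extend j over the run of tied absolute differences.
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean_w = n * (n + 1) / 4
    var_w = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return w_plus, z, p


# Hypothetical data, NOT taken from the study.
rater_a = [82, 85, 88, 79, 90, 84]
rater_b = [80, 86, 85, 81, 91, 83]
print("inter-rater r:", round(pearson_r(rater_a, rater_b), 2))

ht_time_demand = [4, 4, 5, 4, 3, 4, 5, 4]    # 1-5 ratings after HT
mtpe_time_demand = [3, 4, 4, 3, 3, 3, 4, 4]  # same participants after MTPE
w, z, p = wilcoxon_z(ht_time_demand, mtpe_time_demand)
print("W+ =", w, "z =", round(z, 2), "p =", round(p, 3))
```

A negative z in this sketch means that ratings in the second condition tend to be lower than in the first. With samples as small as 24, an exact signed-rank test is a common alternative to the normal approximation.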

Key Findings

MT output analysis: Predominant errors were lexical and syntactic. Lexical mistranslations were common (e.g., 基本国策 rendered as “basic national policy” instead of “basic state policy”; literal, context-insensitive rendering of 问题), and structural issues involved difficulty with conjunctions and logical relations in complex sentences. Tense/modality transfer was often inappropriate because Chinese lacks grammaticalized tense/modality (e.g., present tense used where present perfect better conveyed the intended meaning).

Translation quality: Inter-rater reliability (Pearson r) was 0.66 for HT (Text 1) and 0.40 for MTPE (Text 2), indicating acceptable but limited agreement. Mean quality scores: HT M = 84.20 (SD = 4.66); MTPE M = 84.58 (SD = 2.39). No significant difference was detected between HT and MTPE quality, with MTPE slightly higher on average.

MTPE error handling by novices: Identification rates: lexical 61.29%, syntactic 38.71%, grammatical 19.35%, style 12.90%. Correction rates: syntactic 29.03%, lexical 19.35%, grammatical 16.13%, style 6.45%. Overall, identification and correction abilities were limited, especially for style and grammar.

Perceptions and preferences: 70.83% rated MT quality as average (3/5) and 29.17% as 4/5; overall, MT quality was considered acceptable. Self-assessed MTPE performance: 20.83% rated 2/5, 54.17% 3/5, 25% 4/5. Self-assessed HT performance: 25% 2/5, 66.67% 3/5, 8.33% 4/5. Preference: 96% preferred MTPE over HT. Reported HT difficulties: lexical 33%, semantic expression 29%, terminology 17%, background information 17%, structural 5%. MTPE attention distribution: structural cohesion 92%, semantic expression 84%, grammatical and lexical issues 68%, punctuation 36%.

Workload and time: NASA-TLX means (HT vs MTPE): time demand 4.12 (SD 0.64) vs 3.71 (0.81); physical demand 3.04 (0.95) vs 2.71 (0.91); mental demand 4.13 (0.61) vs 3.83 (0.92); frustration 3.33 (0.76) for HT, reported as lower for MTPE. Wilcoxon tests showed significant differences for time demand (Z = -2.65, p < 0.05) and physical demand (Z = -2.53, p = 0.01), and marginal differences for mental demand (Z = -1.94, p = 0.05) and frustration (Z = -1.89, p = 0.06). Screen recordings indicated approximate task durations of 30 minutes for HT and 20 minutes for MTPE.

Overall, MTPE reduced processing time and workload without compromising quality relative to HT, and was the preferred method for learners.
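Because the sample is small, the questionnaire percentages are easier to interpret as head counts. The sketch below converts the reported shares back to numbers of participants, assuming every one of the 24 participants answered each item (an assumption; the summary does not state response counts). The helper name `to_count` is hypothetical.

```python
N_PARTICIPANTS = 24  # sample size reported in the Methodology


def to_count(percent, n=N_PARTICIPANTS):
    """Convert a reported percentage to the nearest whole number of participants."""
    return round(percent * n / 100)


# Percentages as reported for each questionnaire item (rating -> share).
reported = {
    "MT quality": {"3/5": 70.83, "4/5": 29.17},
    "MTPE self-assessment": {"2/5": 20.83, "3/5": 54.17, "4/5": 25.0},
    "HT self-assessment": {"2/5": 25.0, "3/5": 66.67, "4/5": 8.33},
}

counts = {
    item: {rating: to_count(share) for rating, share in shares.items()}
    for item, shares in reported.items()
}

for item, by_rating in counts.items():
    # Each item should account for all participants if the assumption holds.
    print(item, by_rating, "total:", sum(by_rating.values()))
```

Under this assumption each item sums to 24 participants, and the reported 96% preference for MTPE corresponds to 23 of 24 learners once rounding is taken into account.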

Discussion

The study’s findings address the research questions by showing that Google Translate struggles with culturally and semantically nuanced phrases, complex sentence logic, and tense/modality realization in Chinese-to-English news contexts. Despite these limitations, novice translators’ post-edited outputs were of comparable quality to manual translations, with slightly higher mean scores for MTPE but no significant difference. Learners reported difficulty with lexical/semantic content and terminology in HT, while MTPE appeared to ease lexical comprehension, allowing attention to shift toward structural cohesion and semantic refinements. MTPE significantly reduced perceived time and physical demands and tended to reduce mental demand and frustration, reflecting efficiency gains when starting from MT output. However, novices exhibited low rates of error identification and correction, especially for style and grammar, indicating a need for targeted MTPE training (error detection strategies, stylistic coherence, and tense/modality handling). These results suggest MTPE can be a viable workflow for novice translators in news translation settings to increase efficiency without degrading quality, provided adequate training mitigates persistent MT shortcomings and learner weaknesses.

Conclusion

The study contributes evidence that, for Chinese–English news translation by novice translators, MT post-editing can reduce processing time and workload and yield translation quality comparable to manual translation. Google Translate’s output exhibits notable lexical, structural, and tense/modality issues, and learners’ MTPE skills in error identification/correction remain limited, particularly for style and grammar. Learners generally accept MT quality and prefer MTPE over HT. Implications include integrating MTPE modules into translator training to improve error detection, editing strategies, critical thinking, and handling of cultural and stylistic nuances. Future research should replicate with larger and more diverse samples, include broader and more representative news genres beyond government report excerpts, compare multiple MT systems, and examine longitudinal training effects on MTPE competence.

Limitations

Primary limitations include: small sample size and a single-institution cohort, limiting generalizability; use of government report excerpts as news-type materials, which may not capture the full range of news genres and features; results may be sensitive to the specific excerpts chosen; reliance on a single MT system (Google Translate), so findings cannot be extrapolated to other MT engines; only short-term performance was measured, with no longitudinal assessment of MTPE skill development; inter-rater reliability was only acceptable (r = 0.66 for HT; r = 0.40 for MTPE), potentially affecting precision of quality comparisons.
