What are the differences? A comparative study of generative artificial intelligence translation and human translation of scientific texts

Linguistics and Languages

L. Fu and L. Liu

This study by Linling Fu and Lei Liu examines the linguistic differences between generative artificial intelligence translation (GenAIT) and human translation (HT) of scientific texts. It shows that GenAIT and human translators have distinct strengths and points to ways of improving translation training and technology development.

Introduction
The paper examines how generative AI translation (GenAIT), represented by ChatGPT 3.5, compares with human translation (HT) when rendering English scientific texts into Chinese. Prior research on neural machine translation (NMT) reports strengths in accuracy and fluency alongside well-documented shortcomings, yet little work has compared the linguistic features of GenAIT with those of HT, particularly for scientific texts and for Chinese. The study hypothesizes that GenAIT and HT exhibit distinct lexical and syntactic characteristics and that combining their complementary strengths can improve translation quality. It focuses on scientific texts because of their role in disseminating technical knowledge, and it poses two research questions: (1) What lexical and syntactic features do HT and GenAIT, respectively, exhibit in translated scientific texts? (2) Based on these features, how can effective interaction between GenAIT and HT be sustained?
Literature Review
Prior work comparing MT and HT falls into two strands: translation studies and linguistic analysis. Translation-studies research has explored translation universals (e.g., simplification, explicitation, convergence), translation quality measured with metrics such as BLEU and TER, and translationese. Findings include evidence of explicitation in NMT output and mixed results on adequacy differences between MT and HT. Linguistic analyses have examined lexical diversity and consistency, morphology, cohesion, syntax, and figurative language and style. Their results indicate that HT often exhibits greater lexical tightness and cohesion, whereas MT lags in lexical diversity and coherence and struggles with figurative language. Two gaps remain: comparative analyses of GenAIT and HT for scientific texts are scarce, and Chinese is underrepresented. Because LLMs such as ChatGPT may serve as alternatives to NMT, and because Chinese is a major global language, comparing GenAIT and HT in English–Chinese scientific translation is warranted.
Scientific texts are informative, objective, lexically dense (terminology, abbreviations, figures and tables), and syntactically complex (long sentences, passive voice, a focus on noun phrases). Accuracy and objectivity are paramount but hard to achieve because of technical vocabulary, specialized senses of common words, ellipsis, long noun phrases, and complex sentences. Prior studies of scientific translation cover MT systems for scientific domains, cognitive sources of difficulty, the quality of human–machine composite translation, and fine-grained MT error typologies, but few address the linguistic characteristics of GenAIT versus HT in scientific texts.
Methodology
Design: An empirical comparison of GenAIT (ChatGPT 3.5) and human translations (HT) of an English scientific text into Chinese. Nineteen MTI (Master of Translation and Interpreting) students, admitted in 2022 and without professional translation experience, produced the HTs under timed conditions; the GenAIT output was generated with ChatGPT 3.5.
Materials and tasks: The 243-word English source text (ST) was extracted from validated teaching material for scientific and technical translation, chosen as appropriate in topic and difficulty. Participants completed two translation assignments (EN→ZH and ZH→EN) in 1.5 hours; only the EN→ZH outputs were analyzed. Students worked on computers but could consult only printed dictionaries (no online or offline tools). The translations were graded to encourage best effort. The researchers also collected a Reference Translated Text (RTT) from the same source material. The final corpus comprised the ST, the RTT, one GenAIT output, and 19 HTs.
GenAIT generation: ChatGPT 3.5 was prompted on June 7, 2023, with “please translate the following passage into Chinese,” followed by the same ST used for HT.
Analytical framework: Two levels, lexical and syntactic.
- Lexical metrics: tokens, types, standardized type–token ratio (STTR), terminology, and part of speech (POS: nouns, adjectives, numerals, conjunctions). STTR was computed on 100-token segments with Wordless 2.3.0. Terms were selected by frequency and contextual importance, and their renderings were compared across RTT, GenAIT, and HT.
- Syntactic metrics: count of sentences (CS), mean sentence length in tokens (SLT), passive voice (PV: explicit “be + V_ed” and implicit “V_ed + by”), and subordinate clauses (object clauses [OC] and restrictive attributive clauses [AC]). PV and clauses were identified manually.
Tools: Wordless 2.3.0 (tokens, types, STTR, CS, SLT), CLAWS-5 (POS tagging of the ST), Corpus WordParser 3.0 (Chinese word segmentation and POS tagging of the RTT, GenAIT, and HT), and AntConc 4.1.2 (POS retrieval). Terminology, PV, and clause structures were annotated manually.
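STTR averages the type–token ratio over fixed-size chunks so that texts of different lengths can be compared. The Python below is a minimal sketch of that standard definition, not Wordless's own implementation. It assumes the text is already tokenized (for Chinese, segmented as Corpus WordParser would do) and that a trailing segment shorter than 100 tokens is dropped, which may differ from how Wordless 2.3.0 handles the remainder. The function names `ttr` and `sttr` are hypothetical.

```python
from typing import Sequence


def ttr(tokens: Sequence[str]) -> float:
    """Type-token ratio: number of distinct tokens divided by total tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def sttr(tokens: Sequence[str], segment_size: int = 100) -> float:
    """Standardized TTR: mean TTR over consecutive full segments.

    Assumption: a trailing segment shorter than segment_size is dropped;
    Wordless may treat that remainder differently.
    """
    segments = [
        tokens[i:i + segment_size]
        for i in range(0, len(tokens) - segment_size + 1, segment_size)
    ]
    if not segments:
        # Text shorter than one segment: fall back to the plain TTR.
        return ttr(tokens)
    return sum(ttr(seg) for seg in segments) / len(segments)


# Toy inputs (hypothetical): 250 distinct tokens, and 200 tokens of two alternating types.
print(sttr([str(i) for i in range(250)]))
print(sttr(["a", "b"] * 100))
```

For texts of roughly 200 to 250 tokens, as in this study, only two full segments exist, so under this assumption the last few dozen tokens do not affect the score.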
Data analysis: Quantitative comparison of metrics (with HT mean and dispersion where relevant) and qualitative analysis of selected examples (10 terminology cases, two PV structures, one OC, one AC).
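The quantitative comparison reduces to simple ratios and summary statistics. The sketch below (Python, standard library only) shows how SLT follows from token and sentence counts, reproducing the reported 27.33 for the ST and 23.11 for GenAIT, and how an HT mean and standard deviation could be computed. Several pieces are assumptions. The sentence splitter that counts 。！？ is a naive stand-in for Wordless's segmentation. The example sentence and per-translator values are hypothetical, because the individual HT figures are not given here. The code uses the sample standard deviation, although the study does not say which form it reports.

```python
import re
import statistics
from typing import List, Tuple


def count_sentences_zh(text: str) -> int:
    """Naive Chinese sentence count: split on full-width sentence-final marks."""
    return len([s for s in re.split(r"[。！？]", text) if s.strip()])


def mean_slt(token_count: int, sentence_count: int) -> float:
    """Mean sentence length in tokens (SLT) = tokens / sentences."""
    return token_count / sentence_count


def summarize(values: List[float]) -> Tuple[float, float]:
    """Group mean and sample standard deviation (n - 1 denominator)."""
    return statistics.mean(values), statistics.stdev(values)


# Token and sentence totals reported in the study.
for name, tokens, sentences in [("ST", 246, 9), ("GenAIT", 208, 9)]:
    print(f"{name}: SLT = {mean_slt(tokens, sentences):.2f}")

# Hypothetical sentence and per-translator CS values, for illustration only.
print(count_sentences_zh("电流增加。电阻不变！"))
mean_cs, sd_cs = summarize([10, 12, 13])
print(f"CS mean = {mean_cs:.2f}, sd = {sd_cs:.2f}")
```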
Key Findings
- Lexical size and diversity: On average, HTs had more tokens and types than GenAIT but a slightly lower STTR. Tokens: ST 246, RTT 229, GenAIT 208, HT (mean) 236.68. Types: ST 152, RTT 152, GenAIT 137, HT (mean) 153.32. STTR: ST 0.62, RTT 0.66, GenAIT 0.66, HT (mean) 0.65. About 68.42% of HTs (13/19) had a lower STTR than GenAIT.
- Terminology accuracy: GenAIT matched the RTT on 5 of the 10 selected terms; HTs matched on 6 of 10 on average, with variation by term. Humans did better on abbreviations (e.g., 10 of 19 HTs expanded “IT” to 信息技术), while GenAIT did better on context-sensitive polysemy (e.g., it correctly rendered “current” as 电流, which some HTs mistranslated).
- POS patterns: Nouns dominated in all outputs. GenAIT (counts): N 78 (75.73%), Adj 8 (7.77%), Num 7 (6.79%), Conj 10 (9.71%), total 103. HT (means): N 85.32 (73.95%), Adj 10.68 (9.26%), Num 7.79 (6.75%), Conj 11.58 (10.04%), total 115.37. HTs generally used more nouns, adjectives, numerals, and conjunctions than GenAIT.
- Sentence segmentation and readability: On average, HTs produced more sentences with a shorter mean length than GenAIT. CS: ST 9, RTT 10, GenAIT 9, HT (mean) 11.68 (sd 1.97). SLT: ST 27.33, RTT 22.60, GenAIT 23.11, HT (mean) 20.71 (sd 3.42). Some students struggled to segment long sentences (e.g., HT03 had the fewest sentences and the longest SLT).
- Passive voice: For explicit PV (“be + V_ed”), the RTT and all HTs except HT17 converted the construction to active voice in Chinese, whereas GenAIT retained a passive-like form with 被. For implicit PV (“V_ed + by”), the examples show that GenAIT and most HTs did not explicitly convert the passive relation, though one HT did; the conclusion nevertheless reports that HTs shifted implicit PV into active voice (AV) while GenAIT seemed to ignore the passive relation.
- Subordinate clauses: For an object clause introduced by “that,” the RTT and most HTs split the sentence into two clauses (often with a comma), while GenAIT kept the clause intact. For a restrictive AC, GenAIT inserted the causal connector 因此 (“thus”), making the relation between clauses explicit; some HTs followed the source structure without an explicit connector.
- Overall: HTs tended toward longer outputs with slightly lower lexical diversity, stronger sentence segmentation (more, shorter sentences), and more frequent PV→AV shifts. GenAIT showed higher lexical diversity, handled certain terms well, and inserted logical connectors automatically, but produced fewer, longer sentences.
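The POS percentages and the 68.42% figure above are plain ratios that can be checked directly. In this Python sketch, each share is computed over the four tracked categories only, which matches the reported totals (78 + 8 + 7 + 10 = 103). Rounding 7/103 to two decimals gives 6.80% rather than the reported 6.79%, so the source appears to have truncated that one value; the other shares agree. The function name `pos_shares` is hypothetical.

```python
from typing import Dict


def pos_shares(counts: Dict[str, float]) -> Dict[str, float]:
    """Percentage of each POS category within the tracked categories only."""
    total = sum(counts.values())
    return {pos: 100 * n / total for pos, n in counts.items()}


# Counts reported in the study (GenAIT totals, HT means).
genait = {"N": 78, "Adj": 8, "Num": 7, "Conj": 10}
ht_mean = {"N": 85.32, "Adj": 10.68, "Num": 7.79, "Conj": 11.58}

for label, counts in (("GenAIT", genait), ("HT (mean)", ht_mean)):
    shares = pos_shares(counts)
    print(label, {pos: f"{p:.2f}%" for pos, p in shares.items()})

# Share of the 19 HTs whose STTR fell below GenAIT's (13 of 19).
print(f"{100 * 13 / 19:.2f}%")  # 68.42%
```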
Discussion
Findings support the hypothesis that GenAIT and HT exhibit distinct and complementary linguistic profiles in scientific translation. HTs’ larger sentence counts and shorter sentence lengths reflect training in decomposing long, complex English sentences into digestible Chinese clauses, improving readability. HTs’ frequent PV→AV transformations align with Chinese stylistic preferences and information structure, while GenAIT often preserves English-like passives in Chinese. HTs’ higher use of conjunctions and stable numerals reflects attentiveness to cohesion and accuracy; however, students showed occasional terminology errors under time pressure. GenAIT’s higher STTR and stronger performance on context-sensitive polysemy likely arise from extensive pretraining and conversational fine-tuning, enabling broader lexical variation and insertion of logical connectors (e.g., 因此) to clarify relations across clauses. Yet GenAIT tended to produce fewer, longer sentences, potentially reducing readability in Chinese and sometimes failing to split complex structures. Together, the results suggest a synergistic workflow: humans can segment and structure complex content and correct stylistic or voice issues, while GenAIT can supply diverse lexical options, background knowledge, and assist with cohesion via connectors. This complementary interaction can enhance overall translation quality and efficiency in scientific domains.
Conclusion
The study demonstrates clear, complementary differences between ChatGPT 3.5 and human student translators when translating English scientific texts into Chinese. Lexically, HTs produced longer outputs with more types but slightly lower STTR, while GenAIT showed higher lexical diversity and stronger accuracy on certain terms; HTs were better with abbreviations. Syntactically, HTs generated more sentences with shorter average length and frequently converted explicit passives to active; GenAIT retained passive-like constructions and produced fewer, longer sentences, though it sometimes improved clause connectivity via added logical markers. The authors propose combining the strengths of GenAIT (lexical diversity, terminology support, connective insertion) and HT (sentence segmentation, stylistic appropriateness, PV→AV handling) to sustain effective interaction and improve translation quality. Future work should broaden metrics and contexts to further optimize human–AI collaboration in translation.
Limitations
The study analyzes only the lexical and syntactic levels; textual-level features were omitted because the source and target texts are short, and other dimensions may reveal additional differences. The participant pool is limited to 19 MTI students without professional experience, which may limit generalizability. The materials come from a single scientific teaching source, and the study covers a single language direction (EN→ZH), a single GenAIT system (ChatGPT 3.5), and a single prompt. Future research should (1) include textual- and discourse-level analyses; (2) expand the number of participants and their range of proficiency; (3) test additional language pairs and genres (e.g., legal, literary); and (4) explore varied prompts and GenAI systems.