Processing Chinese formulaic sequences in sentence context: a comparative study of native and non-native speakers

K. Chen, L. Gu, et al.

This study by Ken Chen, Lei Gu, and Qiaoyan Bai examines how native and non-native speakers process Chinese formulaic sequences within sentences. Focusing on response times and the influence of sentence context, it highlights the role that contextual effects play in second language teaching and learning.

Introduction
The paper investigates how formulaic sequences (FSs), fixed and frequent multiword units that are processed and used as holistic semantic units, are recognized and processed by native speakers (NSs) and non-native speakers (NNSs) of Chinese within sentence contexts. FSs are central to fluent, accurate, and native-like language production and can serve as indicators of second language (L2) development. Despite extensive research in English, evidence in other languages and among L2 learners remains mixed, especially regarding whether NNSs process FSs holistically like NSs and how sentence context modulates processing. The study addresses two questions: (1) Do participants employ holistic processing for Chinese FSs? (2) Does sentence context influence FS processing? The authors hypothesize holistic processing across groups and stronger contextual effects for NNSs due to their limited L2 knowledge.
Literature Review
The review defines FSs as pre-constructed, holistically stored and retrieved strings whose meanings are not compositional (Wray 2002, 2008). FSs arise from frequent language use and pragmatic constraints and contrast with analytically processed, grammar-driven combinations. While children’s L1 acquisition shows clear FS roles, evidence for adult L2 acquisition is less conclusive. L2 FS research spans receptive and productive perspectives, examining factors such as FS type, frequency, statistical properties, L1–L2 congruency, instruction duration, proficiency, vocabulary size, and pedagogy (e.g., Ding & Reynolds 2019; Nguyen & Webb 2017). Psycholinguistic work contrasts holistic vs analytical vs hybrid processing, generally showing a processing advantage (shorter latencies) for FSs in NSs across tasks (lexical decision, self-paced reading, eye-tracking), but findings for NNSs are inconsistent. The review also details contextual facilitation in word recognition across tasks and modalities (e.g., Becker 1980; Grosjean 1980; Sereno et al. 2003), positing that sentence context may similarly modulate FS processing. Prior work with Chinese L2 learners (Zheng et al. 2016) showed contextualized FSs improved reading speed and accuracy, motivating the present investigation with quantified sentence-context measures and cross-proficiency comparisons.
Methodology
Design: The experiment used a self-paced, word-by-word moving-window (masking) paradigm implemented in E-Prime. The dependent measures were response times (RTs) to each word and accuracy on post-sentence Yes/No comprehension questions.
Participants: Four groups of 20 took part: NSs (11 males, 9 females; age 18–23, M=20.45, SD=1.39) and NNSs at three Chinese proficiency levels defined by the HSK: Elementary (7 males, 13 females; age 18–28, M=21.65, SD=2.37), Intermediate (9 males, 11 females; age 20–28, M=23.10, SD=2.51), and Advanced (5 males, 15 females; age 23–29, M=26.70, SD=2.00). All NNSs studied in Chinese colleges and had passed their respective HSK levels.
Screening: NNSs completed a character recognition test built from the characters used in the experiment; inclusion required ≥90% recognition. During the task, participants answered comprehension questions; datasets with <70% accuracy were excluded and replaced.
Materials: There were 40 target sequences: 20 FSs and 20 matched non-FSs. FSs and non-FSs were selected or constructed using frequency and mutual information (MI) from the BCC Mandarin corpus so that the two sets differed significantly on both measures (frequency: t=3.76, df=19, p=0.001; MI: t=6.38, df=19, p<.001). Non-FSs were created by substituting one or two characters in FSs. Stroke count differences were nonsignificant (t=0.483, df=19, p=0.635, d=0.108). An example pair is FS 不一定 ‘uncertain’ vs non-FS 不充分 ‘insufficient’. Materials were cross-checked against HSK vocabulary to ensure character familiarity. Three linguistic researchers (NSs) evaluated the items, and five NS students provided psycholinguistic judgments.
Sentence embedding: Each FS and its matched non-FS were embedded in the same grammatical, meaningful sentence, counterbalanced across lists so that an FS and its non-FS counterpart never appeared in the same list. The final set comprised 20 sentences with FSs, 20 with non-FSs, and 20 filler sentences containing neither. Readability was piloted with NSs and NNSs who did not take part in the main experiment, and sentences were revised as needed.
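The mutual information criterion used in selecting the materials can be illustrated with a small sketch. It assumes the common pointwise definition, MI = log2(P(xy) / (P(x)P(y))), applied to a sequence split into two parts; the summary does not state which MI variant or corpus interface the authors used, and the function name `pointwise_mi` and all counts below are invented for illustration rather than taken from the BCC corpus.

```python
import math


def pointwise_mi(count_xy, count_x, count_y, total):
    """Pointwise mutual information (in bits) of a two-part sequence.

    count_xy: corpus frequency of the whole sequence
    count_x, count_y: corpus frequencies of its two parts
    total: corpus size in tokens
    All arguments are hypothetical inputs, not values from the study.
    """
    if min(count_xy, count_x, count_y, total) <= 0:
        raise ValueError("all counts must be positive")
    # P(xy) / (P(x) * P(y)) simplifies to count_xy * total / (count_x * count_y)
    return math.log2(count_xy * total / (count_x * count_y))


CORPUS_SIZE = 1_000_000  # invented corpus size

# A sequence whose parts co-occur far more often than chance predicts
mi_formulaic = pointwise_mi(512, 1_000, 2_000, CORPUS_SIZE)

# A sequence whose parts co-occur exactly as often as chance predicts
mi_chance = pointwise_mi(2, 1_000, 2_000, CORPUS_SIZE)

print(f"MI (strongly associated): {mi_formulaic:.2f} bits")
print(f"MI (chance-level):        {mi_chance:.2f} bits")
```

Under this definition, a candidate FS would be expected to show both a higher raw frequency and a higher MI than its matched non-FS, which is the contrast the reported t-tests evaluate.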
Quantifying sentence context: Contextual numerical information was computed from n-gram conditional probabilities (Bayesian formulation; log-transformed). Sentences containing FSs and those containing non-FSs differed significantly in quantified context (t=7.98, df=19, p<.001, d=1.78). This contextual measure served as the covariate in the ANCOVA analyses.
Procedure: Sentences were presented word by word; Chinese multi-character words and FSs were each treated as a single word. Participants advanced through the words by keypress, and RTs were recorded. After each sentence, a Yes/No comprehension question probed key content.
Statistical analysis: Two approaches were applied to RTs from correctly answered trials. (1) A two-way repeated-measures ANOVA (4 participant groups × 2 stimulus sets: FS vs non-FS) without the context covariate. (2) A two-way repeated-measures ANCOVA with the same factors plus quantified sentence context as the covariate. Follow-up one-way ANOVAs compared FS vs non-FS within each group (without the covariate), and one-way ANCOVAs (with the context covariate) assessed the impact of context on FS vs non-FS processing within each group.
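The quantified context measure can be sketched as a sum of log-transformed n-gram conditional probabilities. The following is a minimal illustration, not the authors' procedure: it assumes a bigram model with add-one smoothing, and the toy corpus, the `<s>` start marker, and the function names are all hypothetical. The summary does not specify the n-gram order, the smoothing method, or how the corpus was queried.

```python
import math
from collections import Counter

START = "<s>"  # hypothetical sentence-start marker


def train_bigram_counts(corpus):
    """Count left contexts and bigrams in a list of tokenized sentences."""
    context_counts = Counter()
    bigram_counts = Counter()
    vocab = set()
    for sentence in corpus:
        tokens = [START] + sentence
        vocab.update(sentence)
        context_counts.update(tokens[:-1])  # tokens that precede another token
        bigram_counts.update(zip(tokens, tokens[1:]))
    return context_counts, bigram_counts, vocab


def context_log_prob(sentence, context_counts, bigram_counts, vocab, alpha=1.0):
    """Sum of log P(word | previous word) with add-alpha smoothing.

    Smoothing is a simplification: words outside the training vocabulary
    still receive alpha mass, so probabilities are only approximate.
    """
    tokens = [START] + sentence
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        numerator = bigram_counts[(prev, word)] + alpha
        denominator = context_counts[prev] + alpha * len(vocab)
        total += math.log(numerator / denominator)
    return total


# Toy corpus (invented); multi-character words and FSs are single tokens,
# mirroring how the experiment treated them.
toy_corpus = [
    ["我", "不一定", "去"],
    ["他", "不一定", "来"],
    ["我", "去"],
]
counts = train_bigram_counts(toy_corpus)

score_fs = context_log_prob(["我", "不一定", "去"], *counts)
score_non_fs = context_log_prob(["我", "不充分", "去"], *counts)

print(f"FS sentence:     {score_fs:.3f}")
print(f"non-FS sentence: {score_non_fs:.3f}")
```

In this toy setting the sentence containing the FS receives a higher (less negative) log probability than its non-FS counterpart, which is the direction of the difference the study reports between the two sentence sets. A per-sentence score of this kind is what would enter an ANCOVA as the covariate.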
Key Findings
- Overall FS advantage: Across analyses, FSs were processed faster than matched non-FSs in all groups.
- Two-way ANOVA (no context covariate): Significant interaction of group × stimulus set, F(3,76)=24.9, p<.001, η²=0.002. Post hoc: FSs < non-FSs in RTs (p<.001). Group differences: NNSs slower than NSs; RTs decreased with higher proficiency (p<.001).
- Two-way ANCOVA (with quantified context covariate): Significant group × stimulus interaction, F(3,74)=24.49, p<.001, η²=0.002. FSs remained faster than non-FSs (p<.001). NSs vs NNSs differences remained significant (p<.001).
- One-way ANOVAs (FS vs non-FS by group, no covariate): Elementary: F(1,19)=125, p<.001, η²=0.319; Intermediate: F(1,19)=89.0, p<.001, η²=0.303; Advanced: F(1,19)=80.8, p<.001, η²=0.244; Native: F(1,19)=49.0, p<.001, η²=0.101.
- One-way ANCOVAs (with context covariate): Elementary: F(1,37)=7.19, p=0.011, η²=0.147; Intermediate: F(1,37)=6.19, p=0.017, η²=0.130; Advanced: F(1,37)=4.46, p=0.042, η²=0.075; Native: F(1,37)=4.362, p=0.044, η²=0.105.
- Context effects: Quantified sentence context significantly modulated processing, especially for elementary and intermediate NNSs. Context had weaker (marginal) effects for advanced NNSs and NSs. Context facilitated FS processing more than non-FS processing.
- Proficiency effects: NNS RTs decreased with higher proficiency, approaching NS performance at advanced levels.
Discussion
Findings support the holistic processing hypothesis for Chinese FSs: consistent RT advantages for FSs over matched non-FSs appeared in both NSs and NNSs, whether or not quantified sentence context was statistically controlled. This indicates that FS representations are accessed and processed more efficiently than compositional sequences, aligning with theories that FSs are pre-constructed units with fewer internal parsing demands. Contextual information, quantified via corpus-derived conditional probabilities, facilitated processing across groups, most strongly for lower-proficiency NNSs, who likely relied on contextual cues to compensate for limited lexical or structural knowledge. For advanced NNSs and NSs, context effects were smaller, consistent with richer lexical and pattern knowledge and greater automaticity. The graded impact of context across proficiency underscores its role as a scaffold in L2 processing and suggests that FS facilitation is robust but can be further enhanced by supportive contexts, particularly for learners still consolidating form-meaning mappings. The results extend the predominantly English-focused FS literature to Chinese, providing converging evidence for holistic FS processing and demonstrating a practical method for quantifying sentence context as a psycholinguistically relevant predictor of RTs.
Conclusion
The study demonstrates a reliable processing advantage for Chinese FSs over matched non-FSs in both native and non-native readers, supporting holistic processing accounts. Quantified sentence context facilitates processing, with pronounced effects in elementary and intermediate L2 learners and weaker effects for advanced learners and NSs. Pedagogically, FSs should be taught and practiced as holistic units within rich sentence contexts to leverage contextual scaffolding, especially for lower-proficiency learners. Theoretically, the findings contribute cross-linguistic evidence for FS holistic processing and highlight the value of corpus-based contextual metrics in modeling reading times. Future research should expand to multiple target languages, examine different FS types and task paradigms (e.g., eye-tracking, ERP), and explore how instructional designs optimize contextual support for FS acquisition and processing across proficiency levels.
Limitations
- Language scope: The study focuses on Chinese L2 learners and Chinese FSs only; it does not test learners across multiple target languages, limiting cross-linguistic generalizability.
- Task paradigm: Self-paced masking may not capture all real-time processing dynamics observable in eye-tracking or neurophysiological measures.
- Context quantification: Although corpus-based n-gram probabilities reduce subjectivity, they may not fully capture all dimensions of semantic/pragmatic predictability or discourse-level context.
- Sample size: While adequate, group sizes (n=20 each) may limit detection of small effects and the precision of effect size estimates, particularly in covariate analyses.