The role of auditory processing in L2 vowel learning: evidence from recasts

W. Zhang and Y. Liao

Wei Zhang and Yi Liao examine how recasts help Chinese speakers learn the English vowel contrast /i/-/ɪ/. The study links individual differences in auditory processing to the size of learners' gains and reports significant improvement in both perception and production.

Introduction

The study investigates whether recasts (a corrective feedback technique in which instructors reformulate a learner’s erroneous utterance) facilitate Chinese learners’ acquisition of the English vowel contrast /i/-/ɪ/, and whether individual differences in auditory processing explain variability in gains. Prior work shows that corrective feedback helps L2 speech learning, but outcomes vary with learners’ cognitive abilities. Auditory processing, the capacity to encode and proceduralize the spectral and temporal characteristics of sounds, has emerged as a key factor. The research questions were: (1) Do recasts facilitate /i/-/ɪ/ learning for Chinese native speakers across test times and lexical items? (2) Are gains from recasts tied to individual differences in auditory processing (perceptual acuity and audio-motor integration)? The authors predicted that recasts would significantly improve perception and production across trained and untrained items, with perceptual acuity relating to perception gains and audio-motor integration relating to production gains.

Literature Review

Recasts in L2 speech learning: Over five decades, diverse instructional approaches (auditory exposure, discrimination training, high variability phonetic training, explicit articulatory instruction, awareness training, combined methods) have shown benefits, but generalizability to authentic communication can be limited. Focus-on-form in communicative settings aims to integrate attention to meaning and form, aiding proceduralization. Recasts are frequent in communicative classrooms and have been linked to improved phonological perception and production; however, effectiveness varies with learner and target characteristics.

Individual differences: Cognitive factors (phonemic coding, music aptitude, motivation) influence L2 speech outcomes; aptitude-treatment interactions have been shown in lexicogrammar-focused recasts with roles for attention control, language analytic ability, working memory, and cognitive style. Extending this to L2 speech suggests learners with different cognitive profiles may benefit differentially from recasts.

Auditory processing: Comprises perceptual acuity (encoding spectral/temporal details) and audio-motor integration (proceduralizing patterns). In L1, deficits predict language impairments. In L2, domain-general auditory processing correlates moderately to strongly with speech learning.

Gaps: Prior studies often assessed only perception or production, lacked delayed tests and generalization items, and focused on form-oriented rather than meaning-oriented instruction. The present study addresses these by testing both perception and production, using immediate and delayed posttests, including untrained items, and examining recasts (meaning-oriented).

Methodology

Design: Longitudinal pretest–posttest with control group over 7 weeks. Week 1: background questionnaire, IELTS proficiency, auditory processing tests. Weeks 2–3: pretests, then 10 consecutive daily treatment sessions (90 min each). Immediate posttests followed the final session. Week 7: delayed posttests.

Participants: 68 first-year Computer Science students were recruited at Hainan University (China); 8 were excluded (early English exposure 5; professional music experience 2; incomplete training 1). N=60 Mandarin L1 speakers (Haikou-born/raised; no residence abroad), normal hearing (pure-tone audiometry 250–8000 Hz at 20 dB). Random assignment: recast group (RG) n=30 (15F/15M; mean age 18.7) and control group (CG) n=30 (15F/15M; mean age 18.4). All were enrolled in a twice-weekly 90-min English course. IELTS was used to control proficiency (overall M=5.5, SD=0.7; no group differences across skills). Compensation: $10. Ethics approved.

Instructor: A female native speaker of American English teaching in China (>5 years teaching; MA in education), emphasizing communicative skills; familiar with participants to reduce anxiety.

Native judges: Six monolingual native English listeners (Ottawa, Canada; mean age 38.2) with normal hearing; two aided stimulus preparation.

Target stimuli: English vowels /i/ vs /ɪ/, a difficult contrast for Mandarin learners per PAM/PAM-L2, SLM, and NRV frameworks. Acoustic cues: spectral (F1 lower for /i/ ~342–437 Hz vs /ɪ/ ~427–483 Hz) and duration (/i/ longer ~243–306 ms vs /ɪ/ ~192–237 ms). Chinese learners tend to over-rely on duration.

Auditory processing tests (GORILLA online):

  • Perceptual acuity: Adaptive three-alternative oddity discrimination for pitch (F0), duration, amplitude rise time, and formant (F2) using complex tones. 70 trials or 8 reversals; threshold averaged after the third reversal. Lower thresholds indicate better acuity. Stimulus parameters: F0 330.3–360 Hz in 0.3 Hz steps; duration 252.5–500 ms in 2.5 ms steps; amplitude rise time 178–300 ms in 2.85 ms steps; formant stimuli with F1=500 Hz, F3=2500 Hz, F2 target 1502–1700 Hz in 2 Hz steps.
  • Audio-motor integration: Melody reproduction (10 seven-note melodies from a 5-note scale, 220–329.6 Hz, 300 ms notes; participants clicked note boxes to reproduce the heard sequences) and rhythm reproduction (10 patterns of 16×200-ms segments with nine hits, reproduced via spacebar presses). Accuracy scored as percentage; audio-motor score is the mean of melody and rhythm accuracy.

Treatment sessions (10×90 min): Meaning-focused tasks designed to elicit target vowels, with recasts provided in RG; CG completed the same tasks without recasts. Tasks: picture description (e.g., “Every Monday, Jessica sits on a sofa, and Howard cleans the seats of his car.”). The instructor provided immediate, partial, one-word recasts with falling intonation, followed by a brief pause for self-repair. Debating/public speaking tasks elicited /i/-/ɪ/ words (e.g., “Is it cheap to eat fried chips outside?”). Emphasis was on communication first, accuracy second; recasts were delivered without disrupting flow.

Outcome measures (administered pre, immediate post, delayed post; order: spontaneous production, controlled production, perception):
  • Perception: Forced-choice identification of 44 minimal pairs (/i/-/ɪ/; 22 target pairs: 14 trained, 8 untrained; 22 distractor pairs varying onset or nucleus). Tokens recorded by 1 male and 1 female native English speaker in “I said …”, validated by a third native speaker; acoustic properties within norms. Presented individually via E-Prime at ~70 dB SPL with practice trials.
  • Controlled production: Carrier-sentence reading (“I said _____”) with 7 minimal pairs (4 trained, 3 untrained) plus distractors; first production taken.
  • Spontaneous production: Picture narrative with memory of 4 critical words per set; 16 pictures/words per version (8 targets evenly split /i/ and /ɪ/, 4 trained and 4 untrained; 8 distractors). No planning time; first production taken.

Recording and scoring: Speech recorded via DELL UC 350 headset (44.1 kHz/16-bit). Tokens segmented with Praat. Six native judges identified the intended word among three options (target /i/ word, target /ɪ/ word, or neither) and rated pronunciation goodness on a 9-point scale. Production accuracy per token = mean of the six judges’ scores if identification matched the intended word; 0 if mismatch. Reliability: ICC/Cronbach’s alpha ~0.74 overall (controlled 0.76; spontaneous 0.73).

Statistical analysis: Group comparability checked via one-way ANOVAs on pretests (no differences). Repeated-measures ANOVAs tested effects of Group (RG, CG), Lexical context (trained, untrained), and Time (pre, immediate, delayed). Bonferroni-adjusted post hoc tests reported. Correlations (Pearson) examined relationships between auditory processing (perceptual acuity thresholds; audio-motor accuracy) and gain scores (immediate-pre and delayed-pre averages), with assumptions checked.
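
The scoring rules described above can be sketched in Python. This is a minimal illustration, not the authors' scoring script. The function names and sample values are hypothetical. The threshold function assumes averaging starts at the third reversal (the summary does not say whether the third reversal itself is included). The token score assumes each judge contributes 0 on a mismatch before the six judges are averaged.

```python
from statistics import mean


def staircase_threshold(reversal_levels, first_counted=3):
    """Average the stimulus differences at reversals, starting from
    reversal number `first_counted` (assumption: 3, i.e. inclusive)."""
    counted = reversal_levels[first_counted - 1:]
    if not counted:
        raise ValueError("not enough reversals to estimate a threshold")
    return mean(counted)


def audio_motor_score(melody_pct, rhythm_pct):
    """Composite score: mean of melody and rhythm reproduction accuracy (%)."""
    return (melody_pct + rhythm_pct) / 2


def token_accuracy(intended, judgments):
    """Score one produced token.

    `judgments` holds one (identified_word, rating) pair per judge, with
    ratings on the 9-point goodness scale. Assumption: a judge whose
    identification does not match the intended word contributes 0.
    """
    scores = [rating if word == intended else 0 for word, rating in judgments]
    return mean(scores)


# Hypothetical data, for illustration only.
reversals = [30.0, 18.0, 24.0, 15.0, 21.0, 12.0, 18.0, 15.0]
judgments = [("sheep", 6), ("sheep", 5), ("ship", 4),
             ("sheep", 7), ("neither", 2), ("sheep", 6)]

threshold = staircase_threshold(reversals)
composite = audio_motor_score(80.0, 60.0)
accuracy = token_accuracy("sheep", judgments)
```

Lower `threshold` values indicate better acuity, while higher `composite` and `accuracy` values indicate better performance, which is why the two auditory measures carry opposite signs in the correlations reported later.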
Key Findings
  • Group comparability: No significant pretest differences between RG and CG across perception and production (p > 0.05).
Perception (accuracy %; values are mean (SD) at pretest → immediate posttest → delayed posttest):
  • RG trained: 28.71 (11.60) → 65.38 (11.18) → 61.89 (10.69); untrained: 26.14 (10.04) → 58.69 (8.21) → 54.55 (10.73).
  • CG trained: 26.59 (12.86) → 27.12 (10.42) → 27.27 (11.12); untrained: 26.74 (9.60) → 30.83 (9.86) → 28.79 (10.51).
  • ANOVA: Significant effects for Group × Time F(2,57)=80.865, p<0.001; Lexis × Group F(1,58)=16.117, p<0.001; Lexis F(1,58)=4.412, p=0.040; Group F(1,58)=247.203, p<0.001; Time F(2,57)=102.961, p<0.001.
  • Bonferroni: RG improved significantly at immediate and delayed posttests for trained (d=3.218, 2.975) and untrained (d=3.549, 2.734) items; CG showed no significant change.
Controlled production (9-point scale):
  • RG trained: 1.73 (0.52) → 5.00 (0.79) → 4.33 (0.66); untrained: 1.50 (0.57) → 4.23 (0.68) → 3.80 (0.61).
  • CG trained: 1.63 (0.61) → 1.77 (0.68) → 1.73 (0.74); untrained: 1.67 (0.55) → 1.77 (0.63) → 1.70 (0.53).
  • ANOVA: Group × Time F(2,57)=214.540, p<0.001; Lexis × Group F(1,58)=16.303, p<0.001; Group F(1,58)=569.685, p<0.001; Time F(2,57)=247.115, p<0.001.
  • RG improved significantly at both posttests for trained (d=4.890, 4.376) and untrained (d=4.351, 3.896) items; CG did not.
Spontaneous production (9-point scale):
  • RG trained: 1.13 (0.35) → 3.77 (0.43) → 3.70 (0.53); untrained: 1.27 (0.45) → 3.27 (0.52) → 3.23 (0.63).
  • CG trained: 1.20 (0.41) → 1.30 (0.47) → 1.27 (0.45); untrained: 1.33 (0.48) → 1.37 (0.49) → 1.37 (0.56).
  • ANOVA: Significant Lexis × Group × Time F(2,57)=3.536, p=0.036; Lexis × Time F(2,57)=4.972, p=0.010; Group × Time F(2,57)=333.128, p<0.001; Lexis × Group F(1,58)=10.496, p=0.002; Group F(1,58)=680.223, p<0.001; Time F(2,57)=333.128, p<0.001.
  • RG improved significantly at both posttests; CG did not.
Auditory processing and gains (Pearson r, p):
  • Perception gains vs perceptual acuity (lower threshold = better, so negative r means better acuity goes with larger gains): trained r=-0.364, p=0.048; untrained r=-0.416, p=0.022 (small-to-medium and medium effects). No significant relations with audio-motor integration.
  • Controlled production gains vs audio-motor integration: trained r=0.385, p=0.036; untrained r=0.438, p=0.015 (small-to-medium and medium effects). No significant relations with perceptual acuity.
  • Spontaneous production gains vs audio-motor integration: trained r=0.374, p=0.042; untrained r=0.405, p=0.026 (small-to-medium effects). No significant relations with perceptual acuity.
  • The two auditory constructs were not significantly correlated with each other (r=-0.215, p=0.131).
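
The effect sizes and correlation tests reported above can be checked with a few lines of Python. This is a sketch under stated assumptions, not the authors' analysis code. The summary does not give the formula for d, so the root-mean-square of the two SDs is assumed as the standardizer; that choice reproduces the reported values to within rounding. The correlation test assumes n = 30 (the recast group only), which is consistent with the reported p values but is not stated in the summary. Function names are hypothetical.

```python
import math


def cohens_d(mean_pre, sd_pre, mean_post, sd_post):
    """Standardized mean difference.

    Assumption: the standardizer is the root-mean-square of the
    pretest and posttest SDs.
    """
    pooled_sd = math.sqrt((sd_pre ** 2 + sd_post ** 2) / 2)
    return (mean_post - mean_pre) / pooled_sd


def pearson_t(r, n):
    """t statistic for testing a Pearson r against zero (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Recast group, trained items, perception: pretest vs immediate posttest.
d_immediate = cohens_d(28.71, 11.60, 65.38, 11.18)

# Recast group, trained items, perception: pretest vs delayed posttest.
d_delayed = cohens_d(28.71, 11.60, 61.89, 10.69)

# Perception gains vs perceptual acuity, trained items (assumed n = 30).
t_acuity = pearson_t(-0.364, 30)

print(f"d immediate = {d_immediate:.2f}, d delayed = {d_delayed:.2f}")
print(f"t(28) = {t_acuity:.2f}")
```

Under these assumptions the two d values come out at about 3.22 and 2.97, against the reported 3.218 and 2.975. The t statistic is about -2.07, just beyond the two-tailed 5% critical value of 2.048 for 28 degrees of freedom, which fits the reported p=0.048.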
Discussion

Recasts substantially improved Chinese learners’ perception and production of the English /i/-/ɪ/ contrast, with gains evident immediately and retained four weeks later, and generalizing to untrained lexical items. In contrast, meaning-focused interaction without corrective feedback did not yield significant gains, underscoring the added value of recasts in communicative instruction. Gains tended to be larger for trained than untrained items, but significant generalization indicates learners abstracted phonological knowledge beyond practiced words.

Critically, individual differences in domain-general auditory processing predicted who benefited most: better perceptual acuity (fine-grained encoding of spectral/temporal cues) related to larger perception gains, while stronger audio-motor integration (remembering and reproducing auditory patterns) related to larger production gains in both controlled and spontaneous tasks. These patterns align with the Auditory Precision Hypothesis for L2, suggesting auditory processing underpins the effective use of input and feedback.

Mechanistically, recasts provide positive evidence (native models) and negative evidence (error signals) plus opportunities for self-modified output. Learners with sharper acuity may better detect cue contrasts (formant, duration, amplitude, pitch) in the models, enhancing perception; those with stronger audio-motor integration may more efficiently map auditory patterns to articulatory actions, enhancing production. Correlations were often stronger for untrained items, perhaps because novel items demand deeper encoding and integration of cues where auditory processing advantages are more consequential. These findings clarify why some learners benefit more from recasts: auditory processing profiles mediate treatment effectiveness.

Pedagogically, assessing learners’ auditory profiles could inform individualized emphasis: enhancing perceptual focus for those with weaker acuity and providing more production-focused, imitation and rhythm/melody-based practice for those with weaker audio-motor skills.

Conclusion

The study demonstrates that recasts in meaning-focused tasks significantly improve Chinese L2 learners’ perception and production of the English /i/-/ɪ/ contrast, with durable gains and generalization to new lexical items. It further shows an aptitude–treatment interaction: perceptual acuity predicts perception gains, and audio-motor integration predicts production gains. These contributions deepen understanding of how domain-general auditory processing shapes responsiveness to corrective feedback in communicative instruction.

Future research directions proposed: (1) incorporate electrophysiological measures (e.g., brainwave indices) alongside behavioral protocols of auditory processing; (2) examine additional target features (lexical tones, word stress, intonation) beyond monosyllabic /i/-/ɪ/ minimal pairs; (3) compare multiple corrective feedback types (e.g., prompts vs recasts) and their interaction with auditory processing; (4) model interactions among auditory processing, experiential factors (age, learning context, proficiency), and other cognitive factors (working memory, inhibitory and attention control) in shaping recast effectiveness.

Limitations
  • Auditory processing was measured only with behavioral tasks; no electrophysiological indices were included.
  • Target features were limited to the English /i/-/ɪ/ contrast in monosyllabic minimal pairs; generalizability to other segmental and suprasegmental features remains to be tested.
  • Only one corrective feedback type (recasts) was examined; classroom practice involves varied CF types, which may interact differently with auditory processing.
  • The exploratory design did not model broader experience and cognitive factors (e.g., age, learning context, proficiency, working memory, attention control) jointly with auditory processing.
  • Although generalization to untrained items was tested, all assessments used the same token sets across sessions (albeit reordered), which could allow minor test–retest familiarity effects (mitigated by the control group).