L2 writer engagement with automated written corrective feedback provided by ChatGPT: A mixed-method multiple case study

Linguistics and Languages


D. Yan and S. Zhang

This fascinating mixed-method study by Da Yan and Shuxian Zhang explores how L2 writers interact with ChatGPT as an automated written corrective feedback provider. Discover the intricate dynamics of language proficiency, technological competence, and affective engagement in this innovative learning environment.

~3 min • Beginner • English
Introduction
The study addresses the shift in L2 writing research from focusing solely on feedback effects to examining students’ engagement with written feedback across behavioral, cognitive, and affective dimensions. While automated written corrective feedback (AWCF) has been used increasingly, comprehensive, multidimensional investigations of student engagement, especially with cutting-edge systems based on generative AI (GAI) such as ChatGPT, remain scarce. Given early evidence that ChatGPT can outperform prior tools on grammatical error correction, along with its interactive, iterative feedback affordances, the study examines how learners engage with ChatGPT-provided feedback in authentic classroom contexts. Research question: How do L2 writers with varied language proficiency and technological competence behaviorally, cognitively, and affectively engage with AWCF provided by ChatGPT? The purpose is to reconceptualize engagement with GAI-generated feedback, document actual engagement processes, and inform pedagogy in AI-enhanced L2 writing environments.
Literature Review
The review positions AWCF as a prominent innovation in L2 pedagogy, highlighting benefits such as reducing teacher and peer workload, empowering student revision, and providing timely feedback, while noting mixed evidence on efficacy relative to human feedback and the limitations of traditional corpus-based systems. AI-based tools (e.g., Grammarly, QuillBot) improve uptake and revision quality over earlier systems, and emerging work suggests ChatGPT further advances grammatical error correction by leveraging vast pretraining, interactivity, and iterative feedback generation. Risks include hallucinations, feedback quality that varies with user prompting, and the need for AI literacy and ethical use. Engagement is conceptualized as behavioral (feedback-seeking actions and revision strategies), cognitive (use of cognitive and metacognitive strategies), and affective (emotional/attitudinal responses). Prior studies link engagement with proficiency, strategy use, trust in AWCF, and feedback explicitness, yet also report contradictions (e.g., positive affect alongside insufficient cognitive engagement). Gaps include limited attention to revision processes and a predominance of outcome-centric approaches. For GAI contexts, individual differences should include technological competence alongside language proficiency because of the centrality of prompt-based interaction. The study proposes a model in which individual differences and contextual factors (ChatGPT, the L2 writing environment) shape behavioral (prompting and revision), cognitive (metacognitive/cognitive strategy use), and affective engagement.
Methodology
Design: Mixed-method multiple case study with convergent design and cross-case comparison. A case is defined as an individual learner’s behavioral, cognitive, and affective engagement with ChatGPT-generated AWCF.
Participants and setting: Undergraduate EFL program at a Chinese university with three writing courses emphasizing formative assessment and technology-enhanced feedback. From an initial voluntary pool of 14, four focal participants were purposefully sampled to represent varied language proficiency and technological competence (based on averages from four prior L2 writing assessments and two digital humanities course assessments), interest/trust in AWCF, and faculty recommendations. Broader practicum involved 68 students.
Procedures: Five-week L2 writing practicum with two weekly instructor-led sessions (usage and scaffolded exploration of ChatGPT) and four self-directed sessions. Each week students drafted, sought ChatGPT feedback, revised, and submitted work.
Data sources: weekly reflective learning journals (including multimodal artifacts), task worksheets (drafts, revisions, final products), classroom observations with keylogging and screen recording, and immediate post-session interviews (10–15 minutes, audio-recorded and transcribed).
Reliability/trustworthiness: interobserver agreement established; member checking and investigator triangulation conducted.
Analysis: (1) Quantified document analysis to derive time on feedback processing, number of ChatGPT prompts, interaction time, and feedback uptake; prompt-writing patterns coded using a prior scheme (Fleiss’ κ=0.86, 95% CI [0.78, 0.91]). (2) Lag sequential analysis (LSA) with GSEQ 5.1 on coded metacognitive and cognitive strategies from observations; significant transitions identified (Z>1.96); inter-rater reliability for observation coding was Cohen’s κ=0.72 (95% CI [0.65, 0.84]). (3) Thematic analysis (six-step Braun & Clarke) of interviews by multiple coders; disagreements resolved via discussion. Findings across methods were triangulated.
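To make the significance criterion in the lag sequential analysis concrete, the sketch below shows how adjusted residuals (the Z statistics compared against 1.96) are commonly computed from a lag-1 transition count matrix in GSEQ-style analyses. This is an illustration only: the toy counts, behavior labels, and function name are hypothetical and are not drawn from the study’s data.

```python
import numpy as np

def adjusted_residuals(transitions: np.ndarray) -> np.ndarray:
    """Adjusted residuals (Z scores) for a lag-1 transition count matrix.

    transitions[i, j] counts how often behavior j immediately followed
    behavior i. Cells with |Z| > 1.96 indicate transitions that occur more
    (or less) often than chance would predict at p < .05.
    """
    n = transitions.sum()
    row = transitions.sum(axis=1, keepdims=True)  # totals per antecedent behavior
    col = transitions.sum(axis=0, keepdims=True)  # totals per subsequent behavior
    expected = row @ col / n                      # expected counts under independence
    variance = expected * (1 - row / n) * (1 - col / n)
    return (transitions - expected) / np.sqrt(variance)

# Hypothetical counts for three coded strategies:
# 0 = feedback elicitation, 1 = monitoring, 2 = refinement
counts = np.array([
    [ 4.0, 12.0,  3.0],
    [10.0,  2.0,  9.0],
    [ 2.0,  8.0,  5.0],
])

print(np.round(adjusted_residuals(counts), 2))
```

Transitions whose adjusted residual exceeds 1.96 would be the kind reported as significant strategy sequences (e.g., feedback elicitation followed by monitoring) in the findings below.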
Key Findings
Behavioral engagement:
- Feedback seeking: Over 5 weeks, high-proficiency learners Emma and Sophia each produced over 2,000 ChatGPT prompts; Robert produced 1,670; Mia 1,238. Emma and Sophia increasingly used sophisticated prompt techniques (e.g., providing background, task requirements, persona, tone, specificity, narrowing focus, quality/affective evaluations) rather than repeatedly regenerating feedback. Robert and Mia relied more on regeneration and showed different temporal trends.
- Revision operations: Average AWCF per task: Emma ~11 items; Sophia ~12.4; Robert and Mia >22. Correct revision rates: Emma 74.55%; Sophia 74.19%; Robert 61.74%; Mia 60.71%. Substitution rates: Emma 14.55%; Sophia 19.35%; Robert 4.35%; Mia 6.25%. Incorrect revision: Robert 16.52%; Mia 12.50%. Deletion: Robert 10.43%; Mia 10.71%. Lower-proficiency learners rejected or deleted more suggestions and achieved fewer correct/adopted revisions.
Cognitive engagement (LSA):
- High-proficiency learners (Emma, Sophia) showed stronger integration between metacognitive regulation (monitoring, evaluation) and feedback elicitation/refinement, with significant bidirectional transitions (e.g., feedback elicitation ↔ monitoring; monitoring ↔ refinement), especially for Emma; Sophia’s monitoring during refinement was weaker (more unidirectional monitoring).
- Lower-proficiency learners (Robert, Mia) showed limited or severed integration of metacognitive and cognitive strategies: many one-off activities (e.g., evaluation → decision; decision → revision) and reduced links between monitoring/evaluation and feedback processes; Mia displayed the weakest integration.
Affective engagement:
- Overall positive attitudes: students found ChatGPT feedback beneficial, interesting, and mostly trustworthy; iterative prompting improved perceived quality.
- Cognitive load/time: feedback seeking and refinement were competence-demanding and time-consuming; most reported mental effort, though Robert reported low stress owing to high digital competence.
- Negative feedback acceptance: students found it easier to accept harsh or critical feedback from ChatGPT than from teachers or peers (reduced “losing face”).
- Intent to continue: all expressed willingness to continue using ChatGPT, though lower-proficiency/competence learners worried about keeping up with advanced usage.
Data quality: coding reliabilities were good (Cohen’s κ=0.72; Fleiss’ κ=0.86).
Discussion
Findings directly address how learners with varied language proficiency and technological competence engage with ChatGPT-provided AWCF. Behaviorally, all cases actively sought feedback and revised, but higher proficiency and greater technological competence were associated with more sophisticated prompting, more effective refinement, and more accurate revisions. Cognitively, only the higher-proficiency learners effectively integrated metacognitive monitoring and evaluation with feedback elicitation and refinement, indicating that successful engagement with GAI feedback requires both linguistic and self-regulatory capacities. Notably, stronger technological competence appeared to partially compensate for weaker metacognitive regulation by enabling persistent, creative interaction with the chatbot. Affective responses were predominantly positive, with reduced social anxiety around negative feedback and strong intentions to continue use, but iterative prompting and processing imposed additional time and mental effort. Collectively, the results suggest ChatGPT can enhance engagement and outcomes beyond earlier AWCF systems, while underscoring the need for scaffolding that develops AI literacy, prompting skills, and metacognitive strategy use to maximize benefits and manage cognitive load.
Conclusion
This mixed-method multiple case study of four EFL undergraduates found that: (1) students were behaviorally engaged with ChatGPT-generated feedback, but feedback-seeking and revision outcomes were closely tied to language proficiency and technological competence; (2) only the higher-proficiency learners showed effective cognitive engagement through metacognitive regulation; and (3) ChatGPT was perceived as a powerful, affectively engaging AWCF provider, albeit one that is competence-demanding and time-consuming. Pedagogically, the study supports integrating GAI tools in L2 writing while providing targeted teacher scaffolding for prompt engineering, feedback processing, and metacognitive strategy development. It advocates a balanced, realistic stance toward GAI, emphasizing multi-competence building for students and instructors and instructional redesign to leverage conversational AI. Future research should use larger samples and longer durations, incorporate peer and instructor feedback modes (collaborative scaffolding), and examine impacts across writing genres to determine long-term learning effects and generalizability.
Limitations
- Small, purposefully sampled multiple-case design limits generalizability beyond the focal cases.
- Short duration (five weeks) with only five tasks may not capture longer-term effects.
- Feedback sources emphasized self-directed interaction with ChatGPT; roles of peers and instructors in processing feedback were not examined.
- Genre coverage was limited; effects across multiple writing genres were not studied.
- Participant recruitment showed relatively high drop-out, suggesting current variability in AI competence and domain knowledge among students.