Improving Assessment of Programming Pattern Knowledge through Code Editing and Revision

Computer Science


S. Nurollahian, A. N. Rafferty, and E. Wiese

This study by Sara Nurollahian, Anna N. Rafferty, and Eliane Wiese examines how well code-writing tasks assess students' knowledge of programming patterns and anti-patterns. The findings show that analyzing initial code writing alone may understate what students know, since many can revise their code to expert structure when prompted to attend to style. Discover how combining multiple coding tasks leads to a richer understanding of student knowledge!

Introduction
The paper investigates whether common assessments based solely on code writing accurately reflect students' knowledge of programming patterns and anti-patterns, particularly discourse rules for code structure. While code functionality can be achieved with both expert and alternative structures, expert structure enhances readability and maintainability; a concrete example follows the research questions below. Prior practice often interprets anti-patterns in students' written code as misunderstandings, but students may not be incentivized to attend to structure or may fail to activate relevant knowledge during production. The study evaluates multiple assessment lenses (writing, editing, and revising) to provide a more comprehensive measure of knowledge of code structure choices. The research questions are:
- RQ1: To what extent do anti-patterns indicate knowledge gaps regarding target code structure?
- RQ2: How useful is code writing as a predictor of success in editing and revision?
- RQ3: What additional facets of knowledge (e.g., identifying expert code, readability preferences) improve prediction of success in editing and revising?
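To make the structural distinction concrete, the following minimal Java sketch (illustrative only, not code taken from the study's instruments) shows two functionally equivalent methods: one wrapping boolean literals in an if/else (an anti-pattern) and one returning the boolean expression directly (the expert structure).

    public class StructureExample {
        // Anti-pattern: the if/else adds no behavior; it only wraps boolean literals.
        static boolean isSevenNonExpert(int num) {
            if (num == 7) {
                return true;
            } else {
                return false;
            }
        }

        // Expert structure: return the boolean expression itself.
        static boolean isSevenExpert(int num) {
            return num == 7;
        }

        public static void main(String[] args) {
            // Both versions agree on every input, so only readability differs.
            System.out.println(isSevenNonExpert(7) == isSevenExpert(7)); // true
            System.out.println(isSevenNonExpert(3) == isSevenExpert(3)); // true
        }
    }

Both methods pass the same functional tests; only the structure, and therefore the readability, differs.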
Literature Review
The prior work highlights challenges in teaching and assessing code structure.
A) Writing well-structured code is difficult because discourse rules are often implicit and violations may not affect functionality, leading students and graders to prioritize functionality over structure; scalable feedback is needed.
B) Professional static analyzers (e.g., PMD, FindBugs, SonarQube) often miss pedagogically relevant issues and produce hard-to-act-on messages; educational analyzers target relevant violations but cover a limited subset, may overload students with detailed hints, and offer only sparse evidence of impact.
C) Dedicated instruction for code structure, such as faded Parsons problems and refactoring tutors, gives direct practice with patterns and anti-patterns; standalone activities ensure all students receive support, not only those flagged in assignments.
D) Learning theories (Ohlsson's learning from errors; dual-process theory) suggest students may revise or evaluate correctly without additional instruction when prompted to attend to structure; prior tutoring studies show that support needs vary across structures.
This motivates assessing knowledge through editing and revising, not just writing or counting violations.
Methodology
Design: An online, self-paced survey adapted from the RICE instrument measured five areas: code writing, style/readability preferences, comprehension, code editing, and code revising. The present paper focuses on writing, editing, and revising.
Participants: 328 consenting students from two intermediate CS courses (CS2 and the subsequent course) at one institution received extra credit for completion; the study was IRB-approved (protocol 00124175).
Task topics: Seven control-structure patterns were targeted overall; three topics (T1–T3) were included in the revising tasks because their anti-patterns are simpler to detect automatically. T1: returning a boolean expression built with an operator vs. returning literals (e.g., return num == 7 vs. an if/else returning true/false). T2: returning a boolean expression from a method call vs. returning literals (e.g., return word.startsWith("A")). T3: unique vs. repeated code within if/else (shared code factored outside the conditional vs. duplicated within branches). The sketch at the end of this section illustrates T2 and T3.
Survey flow: For writing, students completed short methods (under 10 lines). Preference tasks asked students to select the most readable and the most expert-styled code among 3–4 blocks. Editing presented functional but alternatively structured code; students were asked to improve its style without changing functionality. Revising presented students' own previously flagged non-expert code with progressive support:
- Step 1: A prompt to improve style, without guidance.
- Step 2: If the code was still flagged, a hint suggesting structure-level changes (e.g., remove the if).
- Step 3: If students indicated they could not follow the hint, they saw an isomorphic worked example contrasting non-expert and expert versions, then attempted a final revision.
Randomization: Writing and preference tasks were order-randomized (forward vs. reverse) to control for ordering effects; editing and revising items were ordered from low to high expected difficulty.
Detection and coding: Writing responses for T1–T2 were flagged as non-expert if they included if-statements; for T3, flags were triggered by the presence of an else or multiple return statements (a less accurate heuristic). Code was compiled and auto-tested for functionality where applicable; limited manual edits fixed minor compile errors per pre-registered rules. Structure was evaluated only for responses that met completion thresholds: T1–T2 required compilation; T3 required addressing the specified method requirements and was human-coded by two raters, with disagreements resolved via discussion.
Analyses: Chi-square tests and logistic regressions related writing style and functionality, editing success, revising success, and preference selections.
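The Java sketch below is consistent with the topic descriptions above but is not the study's actual task code; the method names and surrounding strings are illustrative. It contrasts non-expert and expert versions for T2 and T3 (T1 follows the same shape as the example in the Introduction).

    public class TopicExamples {
        // T2 non-expert: wraps the result of word.startsWith("A") in an if/else
        // that returns boolean literals.
        static boolean startsWithANonExpert(String word) {
            if (word.startsWith("A")) {
                return true;
            } else {
                return false;
            }
        }

        // T2 expert: return the boolean result of the method call directly.
        static boolean startsWithAExpert(String word) {
            return word.startsWith("A");
        }

        // T3 non-expert: the greeting construction is duplicated in both branches.
        static String greetNonExpert(boolean formal, String name) {
            if (formal) {
                String greeting = "Hello, " + name + ".";
                return greeting + " How do you do?";
            } else {
                String greeting = "Hello, " + name + ".";
                return greeting + " What's up?";
            }
        }

        // T3 expert: the shared code is factored outside the conditional, leaving
        // only the branch-specific part inside.
        static String greetExpert(boolean formal, String name) {
            String greeting = "Hello, " + name + ".";
            if (formal) {
                return greeting + " How do you do?";
            }
            return greeting + " What's up?";
        }

        public static void main(String[] args) {
            // Each pair of versions returns the same values; only structure differs.
            System.out.println(startsWithANonExpert("Apple") == startsWithAExpert("Apple")); // true
            System.out.println(greetNonExpert(true, "Sam").equals(greetExpert(true, "Sam"))); // true
        }
    }

Under the detection rules described above, the non-expert T2 version would be flagged because it contains an if-statement, and the non-expert T3 version because it contains an else. Note that even the expert T3 version here would trip the multiple-returns heuristic, which illustrates why the paper describes the T3 detector as less accurate.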
Key Findings
RQ1 (Do anti-patterns indicate knowledge gaps?): For the returning-boolean topics, many students who wrote non-expert structure could revise correctly at the first prompt (no guidance beyond asking to improve style): T1 first-prompt success was 57% (84/148); T2 first-prompt success was 69% (92/133). After additional supports, overall revision success among those flagged was very high: for T1, 91% (135/148) revised successfully, yielding 95% (313/328) expert by the end; for T2, 93% (124/133) revised successfully, yielding 97% (318/328) expert by the end. For T3 (unique vs. repeated code), far fewer succeeded: only 6.5% (11/169) revised at the first prompt, and overall only 31% (52/169) of those flagged revised successfully, resulting in 54% (178/328) expert by the end.
RQ2 (Is writing predictive of editing/revision?): Editing success rates differed by initial writing style. T1 editing: 69% of those who wrote expert code vs. 27% of those who wrote non-expert code edited to expert; chi-square for writing style vs. editing, p < .0001; writing functionality vs. editing, p = .817; a logistic regression with writing style predicts editing but weakly (β = 1.9073, p < .001, pseudo-R^2 = .14). T2 editing: 67% (expert writers) vs. 33% (non-expert writers); writing style vs. editing, p < .0001; writing functionality vs. editing, p = .040; a logistic regression including style and functionality remains weak (functionality β = .7034, style β = 1.2764, pseudo-R^2 = .15). T3 editing: 14% (expert writers) vs. 6% (non-expert writers); style vs. editing, p = .00258; functionality vs. editing, p = .158; logistic regression (style only), β = 1.4428, p = .012, pseudo-R^2 = .08. Writing functionality did not reliably correlate with revising success for any topic.
RQ3 (Which additional facets predict success?): Logistic regressions predicting editing that included identification of expert style and selection of expert code as most readable improved explanatory power. For T1, selecting expert code as most readable (p = .005, β = 0.9859) and identifying expert style (p = .044, β = 0.6342) predicted editing (pseudo-R^2 = .2594). For T2, writing style (p < .001, β = 1.2311), identifying expert style (p < .001, β = 1.4967), and selecting expert code as most readable (p = .001, β = 0.9845) predicted editing (pseudo-R^2 = .2029). For T3, no predictors were significant due to low success. Predicting initial writing style across topics: selecting expert code as most readable, writing functionality, and editing success were significant (e.g., for T1, readability choice β = 1.4712, p = .001; functionality β = 2.7587, p < .001; editing success β = 1.5613, p < .001; pseudo-R^2 = .2313). Predicting first-revision success: editing success and selecting expert code as most readable were significant for T1 (e.g., editing β = 3.2566, p < .001), and editing success was significant for T2–T3; second-revision models had no significant predictors.
Additional observations: Over 25% of students who initially wrote non-expert code edited correctly for T1–T2, while more than 30% of expert writers still failed the editing tasks across topics. Detector accuracy was high for T1–T2 but lower for T3, complicating interpretation. Error patterns among non-revisers included misuse of loops, incorrect returns, copying examples without adaptation, or structural changes that harmed functionality.
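For readers interpreting the regression coefficients above: assuming the standard binary logistic model (the paper's exact covariate coding and intercepts are not reproduced in this summary), the probability of success is modeled as

    P(\text{success}) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots)}}

so each coefficient corresponds to a multiplicative change in the odds. For example, the T1 writing-style coefficient β = 1.9073 corresponds to an odds ratio of e^{1.9073} ≈ 6.7: students who wrote expert-style code had roughly 6.7 times the odds of editing success, yet the pseudo-R^2 of .14 indicates that writing style alone explains relatively little of the variation.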
Discussion
Findings indicate that anti-patterns in students' written code do not uniformly reflect conceptual gaps in understanding code structure. For the returning-boolean patterns, many students revised to expert structure when merely prompted to attend to style, consistent with theories suggesting that more knowledge is available during evaluation than during production and that deliberate attention influences performance. However, the unique vs. repeated code topic demands broader refactoring over longer code sequences, which imposes greater cognitive load and yields lower success rates; this aligns with prior work identifying the simplification of complex control structures as challenging. Code writing, editing, and revising measure related but distinct facets of knowledge and skill: writing style is only a weak predictor of editing, and functionality does not predict revision success. Incorporating additional measures, such as recognition of expert style and readability preferences, improves prediction but still leaves substantial unexplained variance, reinforcing that multiple assessment modes are necessary. Educators and tool designers should avoid inferring deep misconceptions solely from anti-pattern counts and should consider targeted support where structural refactoring is more complex (e.g., removing duplication across branches).
Conclusion
Assessing 328 intermediate CS students across writing, editing, and revising tasks on three code-structure topics shows that code writing alone provides an incomplete picture of students' knowledge. For the returning-boolean patterns, many students who initially wrote alternatively structured code revised it successfully without additional instruction, and a substantial minority of non-expert writers edited others' code to expert structure. Writing style only weakly predicts editing success, while recognition of expert style and readability judgments add predictive value. A combination of tasks (writing, editing, revising, and preference/identification) yields a more accurate assessment of students' understanding of programming patterns. Future work should extend to additional structures and contexts, refine automated detection of anti-patterns (especially duplication within branches), explore instructional sequencing that prioritizes difficult refactorings, and evaluate ecologically valid settings where students can compile and run code.
Limitations
The study covered only three control-structure topics selected for ease of automatic detection; results may not generalize to other patterns. Expert structure definitions were based on unanimous agreement of three instructors and may be subjective or context-dependent. The regex-based detector, especially for unique vs. repeated code, had false positives/negatives, complicating analyses. The survey environment prevented compiling/running during the task, possibly discouraging structural edits due to inability to validate functionality. Extra credit irrespective of correctness and the optional, self-paced online setting may have reduced effort or altered behaviors. The first revision prompt signaled that something was wrong, which may not reflect authentic code review scenarios. Short, simple methods may not predict performance on larger, more complex programs. The study cannot disambiguate lack of knowledge from lack of effort or attention.