Growth in Knowledge of Programming Patterns: A Comparison Study of CS1 vs. CS2 Students

Computer Science


S. Nurollahian, N. Brown, et al.

This study by Sara Nurollahian, Noelle Brown, Anna N. Rafferty, and Eliane Wiese examines how students' understanding of code structure evolves from introductory to intermediate programming courses, revealing gaps that call for targeted instructional strategies.
Introduction

The study investigates how novice programmers’ knowledge of code structure evolves from CS1 to CS2 and where students struggle. The context is that well-structured, idiomatic code supports transparency, readability, and maintainability, yet explicit teaching and scalable assessment of code structure are challenging and often deprioritized in CS1. The authors focus on two common, language-independent structures frequently violated by students: S1 (returning boolean expressions directly instead of via conditionals returning literals) and S2 (factoring unique vs. repeated code within if/else). They assess multiple dimensions of knowledge with tasks targeting identification of expert patterns, readability judgments, comprehension, writing, and editing. Research questions: RQ1: For each task, how does CS2 knowledge differ from CS1 for S1 and S2, and in which areas do students struggle? RQ2: How well do non-writing measures of code structure knowledge predict students’ code-writing performance?
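The two structures can be illustrated with minimal Java sketches (illustrative only; these are not the study's actual survey items):

```java
public class StructureExamples {
    // S1 anti-pattern: a conditional that returns boolean literals
    static boolean isPositiveAnti(int x) {
        if (x > 0) {
            return true;
        } else {
            return false;
        }
    }

    // S1 expert pattern: return the boolean expression directly
    static boolean isPositiveExpert(int x) {
        return x > 0;
    }

    // S2 anti-pattern: the logging line is repeated in both branches
    static String greetAnti(boolean formal, String name) {
        if (formal) {
            System.out.println("Logging greeting"); // repeated
            return "Dear " + name;
        } else {
            System.out.println("Logging greeting"); // repeated
            return "Hi " + name;
        }
    }

    // S2 expert pattern: repeated code factored out of the conditional,
    // which keeps only the code unique to each branch
    static String greetExpert(boolean formal, String name) {
        System.out.println("Logging greeting");
        String salutation = formal ? "Dear " : "Hi ";
        return salutation + name;
    }

    public static void main(String[] args) {
        System.out.println(isPositiveExpert(3));      // true
        System.out.println(greetExpert(true, "Ada")); // Dear Ada
    }
}
```

Both versions of each method are functionally identical; only the structure differs, which is what the survey tasks probe.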

Literature Review

Teaching and assessing code structure (quality, patterns vs. anti-patterns) is difficult due to time constraints and the scalability of assessment; automated tools are not always aligned with pedagogical needs and often emphasize functionality over structure. Prior work shows students frequently violate S1 and S2. De Ruvo et al. found about half of CS1 submissions violated S1 and fewer violated S2, hypothesizing S1 reflects knowledge gaps (e.g., what can be returned) and S2 reflects reluctance or lack of confidence to factor repetition. Whalley et al., using the SOLO taxonomy, observed widespread redundancy (S2) as a direct translation of specs, indicating lower-level understanding. Keuning et al. reported S1 and S2 among the most frequent violations, with less than 20% fixed over time, suggesting lack of awareness or motivation. Other work suggests disagreement with expert readability judgments may drive anti-pattern use; Wiese et al. found many intermediate students returned boolean literals (S1) and often rated anti-patterns as more readable, with readability judgments predicting writing.

Comparative studies across CS1 and CS2 are sparse. Breuker et al. found similar S2 violation rates in CS1 and CS2 projects. Wicklund and Östlund observed an increase in various quality issues over time, potentially due to increasing assignment complexity and opportunities for errors. The present study controls for assignment variation by using a common survey and examining multiple tasks per structure.

Methodology

Design: A cross-sectional, online survey compared CS1 and CS2 students’ knowledge of two code structures: S1 (returning boolean expressions directly) and S2 (unique vs. repeated code within if/else). Knowledge was assessed via five tasks: identification of expert pattern, judgment of readability, code comprehension, code writing, and code editing. Language: Java.

Participants: 354 undergraduates from the University of Utah: 149 CS1 (two courses: an accelerated OOP course for students with prior exposure and an OOP course following CS0) and 205 CS2 (Data Structures and Algorithms). All were 18+ and consented; participants skipped no more than one question in writing, editing, and identification/readability. IRB approval: IRB_00162580.

Procedure and timing: Courses lasted 16 weeks. The survey ran for CS2 in weeks 12–13 and CS1 in weeks 13–14. The self-paced survey took ~1 hour and was open for two weeks with extra credit for completion. To reduce order effects, students were randomly assigned to one of two task orders: (1) writing → editing → readability/identification → comprehension; or (2) writing → readability/identification → comprehension → editing. Question order within each task was independently randomized (forward or reverse).

Tasks and materials:

  • Identification of expert pattern and readability judgment: For each topic, students saw 3–4 functionally equivalent code blocks; they selected which had expert style (identification) and which was most readable to them (readability). One or two blocks were expert patterns; others were common anti-patterns. For S2, one CS1 class (35/149) received alternative questions due to lack of Map familiarity.
  • Code comprehension: Multiple-choice tracing for both the expert pattern and a common anti-pattern per topic; students predicted outputs (true/false/error). Example S1 pattern comprehension involved directly returning a conjunction of three boolean expressions.
  • Code writing: Given a method signature and behavioral specification, students implemented the method body, e.g., S1 required returning true when a string meets three conditions (starts with "A", length ≥ 6, ends with "b"), else false.
  • Code editing: Given functional anti-pattern code, students were asked to refactor for style without changing functionality. Prompt asked whether the code already had the best style; those indicating it could be improved proceeded to edit.
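The S1 writing task described above admits both structures. A sketch of the expert-pattern solution next to the most common anti-pattern reported in the findings (the method name `checkString` is our placeholder, not the study's actual signature):

```java
public class S1Writing {
    // Expert pattern: return the conjunction of the three conditions directly
    static boolean checkString(String s) {
        return s.startsWith("A") && s.length() >= 6 && s.endsWith("b");
    }

    // Common anti-pattern: a single if returning boolean literals
    static boolean checkStringAnti(String s) {
        if (s.startsWith("A") && s.length() >= 6 && s.endsWith("b")) {
            return true;
        } else {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(checkString("Applesb")); // true
        System.out.println(checkString("Apb"));     // false (too short)
    }
}
```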

Instrument redesign: The S1 items combined returning boolean expressions with operators and with method calls to test generalization to more complex expressions. S2 identification/readability and comprehension items were revised to mirror authentic violations observed by an instructor.

Data processing and coding: Writing and editing submissions were compiled and tested via automated tests; limited, pre-registered manual fixes for trivial compile errors (e.g., missing semicolons) were allowed. Analyses focused on submissions that compiled (with allowed fixes) and passed all tests; results were consistent across compiled vs. fully functional, and the paper reports the fully functional subset. Qualitative coding used pre-registered, fine-grained categories; expert vs. anti-pattern codes were derived from these. For S1, code was labeled expert if it directly returned at least one expression; double-coding achieved ≥90% agreement on new responses for S1 and was applied to all S2.

Analyses: Group comparisons (CS1 vs. CS2) via chi-square tests; logistic regressions examined relationships between writing structure and other measures (editing success, readability judgments, identification, comprehension). An exploratory analysis regressed writing on the summed non-writing scores to address collinearity.

Key Findings

Overall RQ1 (CS1 vs. CS2): CS2 students outperformed CS1 in identifying expert patterns, in readability judgments, and in editing for both S1 and S2. However, CS2 advantages in comprehension and writing were topic-dependent: CS2 comprehension was better only for S2, and CS2 writing used the expert structure more often only for S1. Performance was far from ceiling on most tasks (the exception being S1 pattern comprehension).

Identification and readability:

  • S1 (Returning booleans): Identification expert: CS1 54% (80/149) vs. CS2 70% (143/205), χ²=9.55, p=0.001. Readability agreement with experts: CS1 29% (44/149) vs. CS2 44% (90/205), χ²=7.57, p=0.005. Many CS2 students still preferred anti-patterns (e.g., single if returning literals or nested ifs).
  • S2 (Unique vs. repeated): Identification expert: CS1 47% (70/149) vs. CS2 62% (127/205), χ²=7.83, p=0.005. Readability agreement: CS1 36% (54/149) vs. CS2 57% (116/205), χ²=14.30, p<0.001. Even among CS2, 38% chose repeated-code variants as expert code. Students more often identified expert code than rated it most readable (S1: χ²(1)=44.750, p<0.0001; S2: χ²(1)=4.124, p=0.045). Identification and readability were correlated (S1: β=1.495, z=5.697, p<0.001; S2: β=1.699, z=7.069, p<0.001).

Comprehension:

  • S1: High pattern comprehension for both groups (CS1 83% (124/149) vs. CS2 87% (178/205), χ²=0.89, p=0.34). Low anti-pattern comprehension (<60%; CS1 56% (83/149) vs. CS2 58% (119/205), χ²=0.19, p=0.660), potentially due to nested ifs and modulus use without functional naming cues. Readability judgments did not predict comprehension (β=0.206, z=0.660, p=0.509).
  • S2: CS2 outperformed CS1 on both pattern (CS1 59% (88/149) vs. CS2 71% (147/205), χ²=6.18, p=0.012) and anti-pattern (CS1 64% (96/149) vs. CS2 78% (161/205), χ²=8.63, p=0.003) comprehension. Readability judgments did not reliably predict comprehension (β=0.299, z=1.295, p=0.195).
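The S1 anti-pattern comprehension difficulty attributed to nested ifs and modulus use can be illustrated with a plausible reconstruction (hypothetical code in the style described, not the actual survey item):

```java
public class S1Comprehension {
    // Anti-pattern style: nested ifs returning literals, a modulus check,
    // and a generic method name that gives no cue about the method's purpose
    static boolean check(int n) {
        if (n % 2 == 0) {
            if (n > 10) {
                return true;
            } else {
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(check(12)); // true: even and greater than 10
        System.out.println(check(8));  // false: even but not greater than 10
        System.out.println(check(13)); // false: odd
    }
}
```

Tracing code in this shape requires keeping both the nesting and the fall-through return in mind, which may explain the sub-60% accuracy on anti-pattern items.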

Writing:

  • S1: Fully functional solutions: CS1 44% (66/149) vs. CS2 65% (133/205), χ²=14.85, p<0.001. Among fully functional, expert pattern usage: CS1 11% (7/66) vs. CS2 24% (32/133), χ²=5.06, p=0.024. 93% correctly conjoined conditions, but 72% used a single if returning literals; 9% used sequential/nested ifs returning literals.
  • S2: Fully functional solutions were lower overall: CS1 17% (25/149) vs. CS2 41% (84/205), χ²=23.71, p<0.001. Among fully functional, expert pattern usage was similar: CS1 68% (17/25) vs. CS2 70% (59/84), χ²=0.04, p=0.830. Many failures were due to difficulties computing transformed string lengths.

Editing:

  • Editing success was low overall. S1 editing performance was similar to S1 writing; many students removed nested ifs but still returned literals (among CS2 functional edits, 85% used a single if with conjoined conditions returning a literal). CS2 students were more likely than CS1 to edit to the expert pattern for S1.
  • S2 editing was particularly challenging: only 2% of CS2 fully removed all repeated code/logic; with a relaxed criterion (no identical lines across branches), 10% of CS2 succeeded vs. 2% of CS1 (χ²(1)=8.230, p=0.004). Over 40% of CS2 indicated the given S2 code already had the best style.
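The kind of edit the S2 task demands can be sketched as follows (hypothetical code; the actual survey item differed). Note that factoring out the repeated lines also changes where they execute relative to the branch, which is part of what makes this edit hard:

```java
public class S2Editing {
    // Before: both branches repeat the increment and the print
    static int beforeEdit(int count, boolean reset) {
        if (reset) {
            count = 0;
            count++;
            System.out.println("count = " + count);
        } else {
            count++;
            System.out.println("count = " + count);
        }
        return count;
    }

    // After: only the unique code (the reset) stays inside the conditional;
    // the repeated increment and print are factored out below it
    static int afterEdit(int count, boolean reset) {
        if (reset) {
            count = 0;
        }
        count++;
        System.out.println("count = " + count);
        return count;
    }

    public static void main(String[] args) {
        System.out.println(beforeEdit(5, true));  // 1
        System.out.println(afterEdit(5, true));   // 1 (same behavior)
        System.out.println(afterEdit(5, false));  // 6
    }
}
```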

Predictors of writing (RQ2):

  • Editing success predicted writing structure for both S1 and S2.
  • S1: Student level (B=0.905, z=2.178, p=0.029) and readability judgment (β=1.239, z=3.562, p<0.001) also predicted writing. Summed non-writing scores predicted writing (β=0.590, z=3.146, p=0.002).
  • S2: Comprehension of the expert pattern predicted writing (β=0.801, z=2.170, p=0.030), in addition to relaxed editing success. Summed non-writing scores showed a non-significant trend (β≈0.236, z=1.885, p=0.059).
Discussion

Findings address RQ1 by showing that while CS2 students generally make gains in recognizing expert patterns, agreeing with experts on readability, and refactoring for style, improvements in comprehension and writing are structure-dependent: CS2 students wrote the expert structure more often only for S1, while comprehension improved only for S2. Performance far from ceiling (except S1 pattern comprehension) underscores the need for targeted instruction. Differences across tasks indicate partially distinct underlying skills; students can identify expert code more readily than they consider it most readable, and readability judgments do not reliably predict comprehension.

For RQ2, editing success consistently predicts expert-pattern writing, linking refactoring ability to writing structure. For S1, alignment with expert readability also predicts writing, consistent with the hypothesis that disagreement with expert readability contributes to anti-pattern writing in S1. For S2, comprehension predicts writing, aligning with the hypothesis that structural violations are tied to sense-making and reasoning about code reorganization. Editing was not uniformly easier than writing, suggesting that lack of motivation is not the sole driver of violations; task demands (e.g., scope changes and logic reorganization in S2) matter.

Educational significance: A one-size-fits-all approach to teaching code structure is unlikely to be effective. S1 may benefit from explicit instruction on returning boolean expressions and exercises that build comfort with reading expert patterns; S2 may require activities that develop planning, abstraction, and refactoring skills (e.g., identifying and factoring duplication, adjusting scope). The persistent gap between identification and perceived readability suggests the value of bridging exercises that connect expert rationale to student intuitions about readability.

Conclusion

The study provides a multi-faceted comparison of CS1 vs. CS2 knowledge of two common code structures (S1 returning boolean expressions; S2 factoring unique vs. repeated code in conditionals). CS2 students outperformed CS1 in identifying expert patterns, agreeing on readability, and editing for both structures. However, only S1 showed improved expert-pattern writing, and only S2 showed improved comprehension. Except for S1 pattern comprehension, performance remained well below ceiling, indicating substantial room for growth.

Implications include the need for targeted, structure-specific supports: explicit instruction and readability-alignment tasks for S1, and planning, tracing, and refactoring activities for S2. Editing success predicts writing quality across both structures, highlighting the value of integrating code review and refactoring into early curricula. Future work should evaluate interventions tailored to each structure, expand to additional patterns and languages, improve instrument validation and authenticity, and examine longitudinal learning trajectories and the roles of motivation and task complexity.

Limitations

Incentivizing participation with extra credit not contingent on correctness may have reduced some students’ effort, which cannot be distinguished from lack of knowledge. Prior knowledge was not collected. Using a survey allowed controlled comparisons across tasks and topics but may introduce differential cognitive load for CS1 vs. CS2 and reduce authenticity relative to homework. Although students were provided an online IDE link, the survey environment itself did not enable compiling/running code, which may have deterred edits. The instrument was not externally validated (though related versions have been used), and expert patterns were agreed upon by instructors, which may not align perfectly with professional developer preferences. One CS1 class received alternate S2 items due to Map familiarity, though results were robust to excluding these students.
