A RCT for assessment of active human-centred learning finds teacher-centric non-human teaching of evolution optimal

L. Buchan, M. Hejmadi, et al.

Research by Loredana Buchan, Momna Hejmadi, Liam Abrahams, and Laurence D. Hurst challenges the belief that student-centred learning is always superior. A randomized controlled trial (RCT) with 2,657 primary school students, run in two tranches, found that a teacher-centric approach can be more effective.

Introduction
The study interrogates the prevailing assumption that active, human-centred learning is superior to teacher-centred approaches for conceptual change, specifically in primary school evolution education. Prior work has emphasized active learning benefits largely in university settings and via parallel A vs. not-A comparisons, rarely considering sequences of lessons or interaction effects between pedagogical components. The authors argue that evolution’s recent inclusion in the UK primary curriculum creates an opportunity for large-scale, in situ, sequential testing with non-specialist teachers who are receptive to training and resources. Research questions: (a) Can 10–11-year-old students effectively learn evolution with effect sizes above implementation thresholds? (b) Which teaching modes are most effective? (c) Do teaching modes interact across serial lessons? The study leverages a genetics-first sequence and a 2×2 factorial design to test interactions between Lesson 2 (student-centred vs teacher-centred) and Lesson 4 (human-centred vs non-human). The purpose is to produce robust, replicated evidence in realistic classrooms, accounting for teacher confidence and other covariates.
Literature Review
The paper reviews evidence and assumptions that active, student-centred and human-relevant (human/primate) materials drive engagement and learning in evolution education. Although some studies suggest benefits of active learning for undergraduates, there is limited evidence for primary-aged pupils in authentic classroom contexts. Existing research often isolates single activities and uses parallel comparisons, ignoring potential interaction effects across lesson sequences. There is also a recognized scarcity of quantitative assessment tools for evolution understanding in primary students, with most research focusing on secondary levels. Prior work supports a genetics-first sequence for improving understanding of evolution and genetics, and various age-appropriate activities have been proposed (e.g., natural selection via peppered moths, homology via pentadactyl limbs). However, gaps remain regarding serial lesson effectiveness, the role of human-centred contexts, and generalizability to primary classrooms taught by non-specialists.
Methodology
Design: Large-scale in situ randomized controlled trial with replication, using a 2×2 factorial design of four Schemes of Work (SoW), each comprising four lessons taught in the same order under a genetics-first framework. Lessons 1 (variation/inheritance) and 3 (geological deep time) were constant. Lesson 2 (natural selection) varied by learning mode: (a) student-centred hands-on ‘moth hunting’ vs (b) teacher-centred PowerPoint with scaffolded writing. Lesson 4 (homology/common ancestry) varied by context: (a) human-centred pentadactyl limb vs (b) non-human trilobites. This gives four SoW: 1) student-centred moths + trilobites; 2) student-centred moths + pentadactyl limb; 3) teacher-centred moths + trilobites; 4) teacher-centred moths + pentadactyl limb.
Participants and setting: UK primary and middle schools in Southwest England. Tranche 1: N=1152 students, 17 schools, 40 classes; Tranche 2 (replicate): N=1505 students, 28 schools, 56 classes. Students were ~10–11 years old (Key Stage 2). Non-specialist classroom teachers delivered lessons after standardized training. Recruitment achieved ~10% uptake and ~90% completion.
Allocation: One SoW was assigned per class by the principal researcher. Assignment was not fully randomized but was done blind to covariates, to balance school type and sample size across SoW. Schools participating in both tranches received the ‘opposite’ SoW in tranche 2.
Teaching resources and training: Fully differentiated, low-cost materials (<£10 per class; ~£0.10 per student) were supplied. A standardized 1-hour teacher training session covered lesson delivery, classroom management, common misconceptions, and resource use. All SoW used identical starters and plenaries to embed the varied main activities within a common conceptual framework.
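The 2×2 layout described above can be written out explicitly. The following is a minimal sketch, not the authors' code: the names `LESSON2_MODES`, `LESSON4_CONTEXTS`, and `build_schemes` are hypothetical, and only the factor levels, SoW numbering, and tranche sizes come from the summary.

```python
from itertools import product

# Factor levels as described in the summary; variable names are illustrative.
LESSON2_MODES = ["student-centred", "teacher-centred"]
LESSON4_CONTEXTS = ["trilobites (non-human)", "pentadactyl limb (human-centred)"]


def build_schemes():
    """Return {SoW number: (Lesson 2 mode, Lesson 4 context)} for the 2x2 design."""
    cells = product(LESSON2_MODES, LESSON4_CONTEXTS)
    return {number: cell for number, cell in enumerate(cells, start=1)}


schemes = build_schemes()
for number, (mode, context) in schemes.items():
    print(f"SoW{number}: Lesson 2 = {mode}; Lesson 4 = {context}")

# Sample sizes reported for the two tranches.
tranche_sizes = {"tranche 1": 1152, "tranche 2": 1505}
total_students = sum(tranche_sizes.values())
print(f"Total students: {total_students}")
```

Crossing the two factors in this order reproduces the SoW numbering used in the summary, and the two tranche sizes sum to the 2,657 students quoted in the overview.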
Assessment instrument: A 15-item multiple-choice test adapted from AAAS items across five domains (homology/common ancestry, natural selection, variation, fossils/geological time, extinction), optimized for primary readability and delivered via teacher read-aloud; students recorded answers on response grids. Testing took place at three timepoints using the same instrument: pre, post (~1 week after teaching), and retention (3–6 months later). The study also collected student gender, date of birth, and teacher-rated science ability (high/middle/low). Teacher questionnaires assessed acceptance/understanding of evolution, experience, training, and perceived confidence change. School-level data were derived from Ofsted reports and indices of deprivation.
Statistical analysis: Reliability and validity checks for the instrument (e.g., Cronbach’s alpha, discrimination indices, fatigue). Primary analyses used Wilcoxon signed-rank/rank-sum tests, with effect sizes via Cliff’s d. To control for ceiling effects and pre-score differences, change in scores was modeled using LOESS regressions, and the residuals were used for subsequent comparisons. Heterogeneity among SoW was tested by Kruskal–Wallis with Dunn post-hoc tests and Bonferroni adjustment. Interaction effects (IE) for the 2×2 design were estimated per Sevdalis & Jacklin using mean LOESS residuals, with significance assessed via 10,000 randomization simulations comparing true vs simulated absolute interaction effect sums and per-SoW IEs. Multivariate models included gender, age, teacher-rated ability, SoW, and (where relevant) teacher confidence increases. Replicability was evaluated across tranches via combined P-values, replication of significance, and correlations of effect sizes.
Ethics and reporting: Departmental ethical approval (University of Bath; EIRA). CONSORT-aligned reporting. No blinding was feasible. No harms reported. Data and R code available: https://github.com/edmllb/GEVO2teach.
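For readers unfamiliar with the effect-size measure named in the statistical analysis, Cliff's d (often called Cliff's delta) is the probability that a score from one group exceeds a score from the other, minus the probability of the reverse. The sketch below uses the standard textbook definition in plain Python; the function name `cliffs_delta` and the example scores are hypothetical and are not taken from the study's data or its R code.

```python
def cliffs_delta(xs, ys):
    """Cliff's delta: P(x > y) - P(x < y) over all (x, y) pairs; ties count as zero."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


# Hypothetical test scores out of 15, invented purely for illustration.
pre_scores = [4, 5, 6, 7, 8]
post_scores = [6, 7, 8, 9, 10]

delta = cliffs_delta(post_scores, pre_scores)
print(f"Cliff's delta (post vs pre): {delta:.2f}")
```

The statistic ranges from -1 to 1, with 0 meaning the two groups overlap completely. The all-pairs loop is quadratic in sample size, which is fine for illustration but slow for thousands of students; rank-based formulas give the same value more efficiently.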
Key Findings
- Instrument reliability/validity: The assessment tool was accessible, reliable, and discriminatory across ability levels, with no significant fatigue effects.
- Overall effectiveness: Significant gains from pre to post across all students. Tranche 1: mean increase 2.44/15 marks (16.27%), P<2.2×10^-16, Cliff’s d=0.55 (95% CI 0.52–0.59). Tranche 2: mean increase 2.37 marks (15.80%), P<2.2×10^-16, d=0.49 (0.45–0.52). Paired analyses mirrored these results (Tranche 1: +2.42 marks, d=0.55; Tranche 2: +2.32 marks, d=0.48). Gains exceeded the 0.40 implementation threshold; retention testing showed partial long-term retention with some waning.
- Each SoW effective: All four SoW improved understanding significantly (all P<2.2×10^-16). Effect sizes (Cliff’s d) by SoW in Tranche 1: SoW1 0.44 (moderate), SoW2 0.55 (large), SoW3 0.66 (large), SoW4 0.52 (large). In Tranche 2: SoW1 0.38 (moderate), SoW2 0.66 (large), SoW3 0.51 (large), SoW4 0.42 (moderate).
- Mode-specific contrasts weak/inconsistent: Lesson 2 student-centred vs teacher-centred yielded small, inconsistent effects (Tranche 1 favored teacher-centred; Tranche 2 negligible). Lesson 4 human-centred vs non-human showed negligible effects (Tranche 2 P=0.04 uncorrected; both tranches negligible effect sizes).
- Heterogeneity among SoW: Significant heterogeneity replicated (Tranche 1: χ²=37.53, P=3.56×10^-8; Tranche 2: χ²=40.91, P=6.84×10^-9). Observed ranks were 1<4<2<3 in Tranche 1 and 1<4<3<2 in Tranche 2. Contrary to predictions, SoW3 (teacher-centred moths + trilobites) was most or second-most effective; SoW1 (student-centred + non-human) was consistently least effective.
- Robust interaction effects: Absolute interaction effect sums (AIS) exceeded all 10,000 simulants in both tranches (empirical one-tailed P≈1.0×10^-4). Per-SoW true IEs were significantly different from simulated distributions, with positive IEs for SoW2 and SoW3 and negative IEs for SoW1 and SoW4 (replicated across tranches). This indicates meaningful, replicable interactions between Lesson 2 mode and Lesson 4 context.
- Covariates: Student science ability (teacher-rated) positively predicted gains; age had no effect; gender effects were weak and non-replicable. At teacher level, increased teacher confidence to teach evolution was the only replicable predictor of student improvement; other teacher characteristics showed poor replicability. School-level predictors were not robustly significant, though effect-size patterns replicated across tranches.
- Adjusting for teacher confidence bias: In tranche 2, where perceived confidence increases differed by SoW, multivariate models controlling for teacher confidence yielded SoW rank order 3 > 2 > 1 > 4, reinforcing SoW3’s superiority.
- Engagement and acceptability: Teacher feedback endorsed all activities as engaging and practical; students were fascinated by trilobites and fossils. All resources were low-cost and easily implemented.
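The interaction-effect result above can be illustrated with a small sketch. It assumes the common 2×2 decomposition in which each cell's interaction effect is its mean minus the row and column means plus the grand mean; whether this matches the authors' exact procedure is an assumption, and the study's own R code is the authoritative source. The function names, the synthetic scores, and the `(hits + 1) / (n_sim + 1)` P-value estimator are all hypothetical choices made for illustration. The data are invented to mimic only the reported sign pattern (positive for SoW2 and SoW3, negative for SoW1 and SoW4).

```python
import random
from statistics import mean


def interaction_effects(cells):
    """For a 2x2 design, return each cell's mean minus its row and column
    means plus the grand mean (means taken over cell means, unweighted)."""
    cell_means = {key: mean(values) for key, values in cells.items()}
    rows = sorted({row for row, _ in cells})
    cols = sorted({col for _, col in cells})
    row_means = {r: mean(cell_means[(r, c)] for c in cols) for r in rows}
    col_means = {c: mean(cell_means[(r, c)] for r in rows) for c in cols}
    grand = mean(cell_means.values())
    return {
        (r, c): cell_means[(r, c)] - row_means[r] - col_means[c] + grand
        for r in rows
        for c in cols
    }


def absolute_interaction_sum(cells):
    """Sum of absolute interaction effects across the four cells."""
    return sum(abs(value) for value in interaction_effects(cells).values())


def randomization_p(cells, n_sim=1000, seed=0):
    """One-tailed empirical P: share of label shuffles whose absolute
    interaction sum is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = absolute_interaction_sum(cells)
    keys = list(cells)
    sizes = [len(cells[key]) for key in keys]
    pooled = [score for key in keys for score in cells[key]]
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(pooled)  # reassign scores to cells at random
        shuffled, start = {}, 0
        for key, size in zip(keys, sizes):
            shuffled[key] = pooled[start:start + size]
            start += size
        if absolute_interaction_sum(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_sim + 1)


# Synthetic residual scores, invented for illustration only.
pattern = [-1.0, -0.5, 0.0, 0.5, 1.0] * 8
cells = {
    ("student-centred", "non-human"): [x - 1.0 for x in pattern],      # SoW1
    ("student-centred", "human-centred"): [x + 1.0 for x in pattern],  # SoW2
    ("teacher-centred", "non-human"): [x + 1.0 for x in pattern],      # SoW3
    ("teacher-centred", "human-centred"): [x - 1.0 for x in pattern],  # SoW4
}

effects = interaction_effects(cells)
p_value = randomization_p(cells)
for key, value in effects.items():
    print(key, round(value, 3))
print("empirical P:", p_value)
```

Under this estimator, an observed statistic that exceeds every one of 10,000 shuffles gives P = 1/10,001, which is consistent with the reported P≈1.0×10^-4. In any 2×2 table the four interaction effects sum to zero and share one magnitude, so opposite signs on the two diagonals, as reported for SoW2/SoW3 versus SoW1/SoW4, are the expected shape of a crossover interaction.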
Discussion
The study demonstrates that primary pupils (9–11 years) can attain substantial conceptual gains in evolution when taught via simple, standardized, low-cost lessons delivered by their regular teachers. Crucially, analyzing lessons in sequence revealed strong, replicable interaction effects between pedagogical components, challenging conclusions drawn from parallel one-off activity comparisons. Contrary to prevailing educational discourse, the most effective overall scheme in replication was teacher-centred natural selection paired with a non-human-focused homology activity (SoW3). Direct, isolated comparisons of ‘active’ vs ‘teacher-centred’ or ‘human-centred’ vs ‘non-human’ activities were weak or inconsistent, yet certain combinations performed synergistically. These findings imply that engagement and conceptual change can be achieved through various routes, not exclusively via student-centred or human-relevant activities, and that some lesson pairings may function as primers for subsequent conceptual consolidation. The results argue for evaluating teaching as sequences, not isolated events, and for reconsidering dichotomous active/passive or human/non-human prescriptions in policy. Student ability consistently predicted gains, and teacher confidence emerged as a repeatable lever, underscoring the importance of teacher training. Despite limited replicability of many teacher- and school-level predictors, the robust SoW-level effects and interaction terms provide actionable guidance for curriculum design and evaluation frameworks.
Conclusion
This large-scale, replicated, in situ RCT provides robust evidence that: (1) primary students can learn evolution concepts with substantial effect sizes; (2) all four low-cost schemes are effective, with SoW2 and especially SoW3 performing best; (3) serial interaction effects between lesson components are strong and replicable, meaning that parallel, single-activity tests can be misleading; and (4) boosting teacher confidence is a practical, repeatable pathway to improved student outcomes. The work contributes a validated primary-level assessment tool, openly available resources, and a demonstration of CONSORT-informed educational RCTs. Future research should investigate mechanisms behind interaction effects, extend testing to diverse populations and contexts, compare additional lesson sequences and topics, examine long-term retention more deeply, and integrate concept-based curricular approaches that leverage sequencing and priming across lessons.
Limitations
- Generalizability: Schools primarily in Southwest England with predominantly White British/European students; results may not extend to other ethnic, cultural, or highly religious contexts.
- Recruitment and allocation: Volunteer schools may be more motivated; SoW allocation was balanced and blind to covariates but not fully randomized; potential residual selection biases.
- Temporal/sequencing constraints: Unlike drug trials, simultaneous delivery across all settings wasn’t feasible; temporal/ordering effects cannot be fully disentangled beyond the fixed sequence design.
- Blinding/adherence: No blinding of teachers/students; small levels of non-compliance were noted but appeared not to differ by SoW; adherence not independently verified beyond feedback.
- Assessment design: Using the same test at pre/post/retention may introduce testing effects, although mitigations were implemented; potential negative suggestion effects cannot be excluded; retention sample sizes were smaller than pre/post.
- Measurement of ability: Teacher-rated ability may include bias, though meta-analyses support moderate-to-strong correspondence with actual performance.
- Teacher/class-level analyses: Smaller Ns at the teacher level limited power; many teacher- and school-level predictors showed poor replicability.
- Cost/training dependency: Positive outcomes may partly depend on provided materials and standardized training, which may vary in real-world rollouts.