Why Meta-Analyses of Growth Mindset and Other Interventions Should Follow Best Practices for Examining Heterogeneity: Commentary on Macnamara and Burgoyne (2023) and Burnette et al. (2023)

Psychology

Elizabeth Tipton, Christopher Bryan, Jared Murray, Mark McDaniel, Barbara Schneider, and David S. Yeager

Traditional yes-or-no meta-analyses can obscure where interventions truly work. Comparing two recent meta-analyses of growth-mindset interventions, this commentary shows that modern, heterogeneity-attuned, multilevel methods reveal meaningful effects in focal (at-risk) groups, in contrast with the conclusions of an aggregation-focused approach.

Introduction
This commentary addresses how meta-analyses in psychology should prioritize examining heterogeneity of effects rather than focusing solely on an overall average effect. Using two recent meta-analyses of growth mindset interventions as a case study—Macnamara and Burgoyne (traditional approach) and Burnette et al. (heterogeneity-attuned approach)—the authors ask why the two reached different conclusions and what best practices should guide research synthesis. The purpose is to demonstrate that modern, multilevel, moderator-focused methods better capture when and for whom interventions work, aligning with theory and helping avoid dichotomous, all-or-nothing conclusions about intervention efficacy. The article emphasizes the importance of modeling effect variation, testing theory-driven moderators simultaneously, and adjusting appropriately for biases to advance theory and practice in heterogeneous social science literatures.
Literature Review
The paper situates its argument within a body of methodological and substantive literature showing substantial heterogeneity in psychological meta-analyses (e.g., τ≈0.35), where effects often vary widely across contexts, populations, and outcomes. It reviews prior calls for heterogeneity-attuned approaches (e.g., Gelman, McShane, Tipton) and reporting standards (PRISMA; CONSORT-SPI), and documents that many published meta-analyses underreport or underanalyze heterogeneity and moderators. Substantively, it reviews the growth mindset literature, including large-scale, pre-registered RCTs (e.g., the U.S. National Study of Learning Mindsets and independent verification by MDRC) showing meaningful effects for lower-achieving or at-risk students, with context-dependent variability (e.g., school mindset-supportive culture). It also reviews recent debates (e.g., nudge meta-analyses) where focusing on average effects obscured wide effect distributions, underscoring the need to interpret heterogeneous literatures through moderator analysis and prediction intervals rather than single summary effects.
Methodology
The article is a commentary that contrasts the methodological choices of the two meta-analyses and reports an exploratory re-analysis. Its best-practice recommendations are to: (1) frame research questions around effect distributions and theory-driven heterogeneity; (2) include all relevant within-study effect sizes and model their dependence with multilevel models; (3) test multiple, theory-driven moderators simultaneously through meta-regression while adjusting for confounders (e.g., study quality); and (4) assess and adjust for publication bias with appropriate methods (e.g., selection models) rather than with funnel-plot asymmetry tests, which can mislead in heterogeneous settings.

Exploratory re-analysis: The authors re-analyzed Macnamara and Burgoyne's dataset using the multilevel meta-analytic framework employed by Burnette et al.: a correlated, hierarchical effects (CHE) meta-analysis including all available within-study effect sizes, with robust variance estimation to guard against model misspecification. Analyses used the R packages metafor (multilevel models) and clubSandwich (robust standard errors). The models partition variance into within-study (ω²) and between-study (τ²) components, estimate overall effects and prediction intervals, and include meta-regressions with moderators such as student risk status while adjusting for study-quality measures and potential bias. Where Macnamara and Burgoyne had averaged or excluded multiple effect sizes, the re-analysis included all 122 effect sizes, supplementing with subgroup information from Burnette et al. where necessary and documenting all decisions (osf.io/mr3yx). For publication bias, the authors highlight the selection-model assessment (Vevea–Hedges) used by Burnette et al., contrasting it with traditional Egger and trim-and-fill approaches that can be misleading in heterogeneous literatures.
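To make this workflow concrete, here is a minimal sketch of a correlated-hierarchical-effects fit with robust variance estimation in R, using the metafor and clubSandwich packages named above. The data frame dat and its columns (yi, vi, study, es_id, at_risk) are hypothetical placeholders rather than the authors' actual coding, and the assumed within-study correlation r = 0.8 is a conventional default, not a value reported in the commentary.

```r
# Minimal sketch of the CHE + RVE workflow described above (hypothetical data).
library(metafor)      # multilevel meta-analysis
library(clubSandwich) # cluster-robust (sandwich) standard errors

# dat: one row per effect size, with columns
#   yi      - standardized mean difference
#   vi      - sampling variance of yi
#   study   - study identifier (effect sizes are nested in studies)
#   es_id   - unique effect-size identifier
#   at_risk - 1 if the effect comes from an at-risk (focal) subgroup, else 0

# Impute a block-diagonal covariance matrix for effect sizes from the same
# study, assuming a common within-study correlation (r = 0.8 is a common
# default when the true correlation is unknown).
V <- impute_covariance_matrix(vi = dat$vi, cluster = dat$study, r = 0.8)

# Correlated hierarchical effects (CHE) model: random intercepts for studies
# (between-study variance tau^2) and for effect sizes within studies
# (within-study variance omega^2).
che_fit <- rma.mv(yi, V, random = ~ 1 | study / es_id, data = dat)

# Robust variance estimation guards against misspecification of V.
coef_test(che_fit, vcov = "CR2")

# Meta-regression testing a theory-driven moderator (risk status),
# again with cluster-robust inference.
mod_fit <- rma.mv(yi, V, mods = ~ at_risk,
                  random = ~ 1 | study / es_id, data = dat)
coef_test(mod_fit, vcov = "CR2")
```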
Key Findings
- Traditional versus heterogeneity-attuned approaches: Macnamara and Burgoyne reported small average effects (e.g., ~0.05 SD) and emphasized potential bias, relying on one effect per study (or averaged effects) and one-at-a-time moderator tests. Burnette et al. modeled within- and between-study variation, prominently reported prediction intervals, tested moderators simultaneously, and found theoretically consistent effects for focal (at-risk) groups.
- Re-analysis results applying Burnette et al.'s methods to Macnamara and Burgoyne's data (including all 122 effect sizes):
  - Overall mean effect: 0.09 SD, p < .001.
  - Estimated SD of true effects: √(τ² + ω²) = 0.16 (vs. 0.07 under the traditional approach), yielding a 95% prediction interval of −0.22 to 0.40 SD, aligning with Burnette et al.'s interval (−0.08 to 0.35 SD).
  - At-risk (focal) groups: mean effect 0.15 SD, p < .001.
  - Moderation by risk: B = −0.08, p < .05 (effects larger for at-risk students), consistent with theory and prior large RCTs.
- Burnette et al. reported an overall mean effect of 0.09 SD and 0.16 SD for targeted (at-risk) students, with a 95% prediction interval of −0.08 to 0.35 SD.
- Prediction intervals from Macnamara and Burgoyne's own random-effects models (not emphasized by those authors): e.g., overall effect 0.05 ± 1.96·√0.005 ≈ (−0.09, 0.19), indicating meaningful heterogeneity even under their aggregation approach (see the worked arithmetic after this list).
- Illustrative study examples show how averaging across theoretically heterogeneous subgroups masks effects (e.g., the NSLM was designed to detect effects in lower-achieving students only; a Peru trial with 10 subgroup effects found 0.23–0.35 SD in high-poverty schools vs. ~0 in low-poverty/high-achieving schools).
- Study quality and confounding: dichotomized, ad hoc quality criteria and one-at-a-time subgroup analyses can be underpowered and sensitive to arbitrary cut-points. Correcting coding errors and modeling quality continuously yields significant effects even among higher-quality studies (e.g., overall 0.05 SD, p = .004; at-risk 0.12 SD, p = .01).
- Publication bias: traditional methods (Egger, trim-and-fill, PET-PEESE) can falsely indicate bias in heterogeneous literatures; selection models (Vevea–Hedges) are preferable and did not indicate publication bias in Burnette et al.
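To show where these prediction intervals come from, the following R sketch reproduces the arithmetic from the estimates reported above. The simplified formula (mean effect ± 1.96 × SD of true effects) ignores estimation uncertainty in the mean, as is common for illustration; the numeric inputs are the values quoted in the text.

```r
# Approximate 95% prediction interval: mu_hat +/- 1.96 * sqrt(tau^2 + omega^2)
# (simplified form that ignores uncertainty in mu_hat, for illustration).
pred_interval <- function(mu, sd_true) {
  c(lower = mu - 1.96 * sd_true, upper = mu + 1.96 * sd_true)
}

# Re-analysis of Macnamara and Burgoyne's data (all 122 effect sizes):
# mean effect 0.09 SD, SD of true effects sqrt(tau^2 + omega^2) = 0.16.
round(pred_interval(mu = 0.09, sd_true = 0.16), 2)
#> lower upper
#> -0.22  0.40

# Macnamara and Burgoyne's own random-effects model: mean 0.05, tau^2 = 0.005.
round(pred_interval(mu = 0.05, sd_true = sqrt(0.005)), 2)
#> lower upper
#> -0.09  0.19
```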
Discussion
The findings demonstrate that focusing solely on the average effect obscures substantial and theoretically meaningful heterogeneity in growth mindset intervention outcomes. By modeling all within-study effects, quantifying both within- and between-study variance, and testing moderators simultaneously, the heterogeneity-attuned approach reveals that growth mindset interventions produce meaningful benefits for theoretically targeted groups (e.g., at-risk or lower-achieving students) and in supportive contexts. This directly addresses the research question of how effects vary across populations, procedures, and contexts, and aligns with prior theory and large-scale pre-registered trials. The commentary shows that differences in conclusions across the two meta-analyses stem primarily from methodological choices: traditional software constraints that force single-effect summaries and one-at-a-time moderators can underestimate heterogeneity and mask or reverse moderation patterns, leading to misleading ‘yes-or-no’ judgments. In contrast, modern multilevel meta-regression, robust variance estimation, validated study-quality frameworks, and selection-model approaches to publication bias enable a principled interpretation of heterogeneous literatures, advancing mechanism-focused theory and practical guidance for when and for whom interventions work.
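For the publication-bias step, the Vevea–Hedges selection-model approach discussed above can be fit with metafor's selmodel() on a standard random-effects model. The sketch below is illustrative only: the data frame dat and the single one-tailed cut-point at p = .025 are assumptions for demonstration, not the specification used by Burnette et al.

```r
# Illustrative Vevea-Hedges step-function selection model (hypothetical data).
library(metafor)

# Standard random-effects model (selection models in metafor are fit to
# rma.uni objects, so this uses one effect size per study).
re_fit <- rma(yi, vi, data = dat)

# Step-function selection model with one cut-point at p = .025 (one-tailed),
# allowing "significant" results to be published at a different rate than
# nonsignificant ones; the fitted weights and bias-adjusted mean indicate
# whether selective publication plausibly distorts the average effect.
sel_fit <- selmodel(re_fit, type = "stepfun", steps = c(0.025, 1))
summary(sel_fit)
```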
Conclusion
Growth mindset interventions can yield meaningful, scalable benefits for targeted students, especially those who are academically at risk, though effects vary by context and implementation. More broadly, meta-analyses in psychology should embrace heterogeneity-focused best practices: formulate questions about effect distributions; include all relevant within-study effect sizes; use multilevel models and robust variance estimation; plan and test theory-driven moderators simultaneously while adjusting for confounders; and assess publication bias with selection models. Adopting these practices will produce more accurate, theory-informative syntheses and help avoid boom-or-bust narratives that undermine scientific progress. Future research should extend moderator measurement, pre-register meta-analytic plans (especially moderators), improve standardization of study-quality metrics, and further develop robust methods for bias adjustment in highly heterogeneous literatures.
Limitations
The re-analysis is exploratory and intended to illustrate methodological implications rather than provide definitive parameter estimates; results depend on available coding and supplemental subgroup information (some taken from Burnette et al.) and specific modeling choices documented at osf.io/mr3yx. The commentary notes that study-quality measures used in prior meta-analyses vary in standardization; some criteria may be anachronistic or ad hoc, complicating interpretation. Traditional publication-bias tests are problematic in heterogeneous literatures, and while selection models are preferable, bias detection remains an evolving methodological area. As with all moderator analyses, potential measurement error, correlated moderators, and limited power for small moderator effects can constrain inference.