
A meta-analysis of the effects of design thinking on student learning
Q. Yu, K. Yu, et al.
This meta-analysis examines the effect of design thinking on student learning and finds significant positive effects across a range of contexts and conditions. Conducted by Qing Yu, Kun Yu, and Rongri Lin from Fudan University, it also identifies the conditions under which design thinking is most effective.
Introduction
The study investigates whether and to what extent design thinking (DT) improves student learning, addressing inconsistent findings in prior empirical research. DT, introduced by Rowe (1987) and applied in education since 2005, is described as an analytic and creative process enabling experimentation, prototyping, feedback, and redesign. It is widely used across educational stages and contexts, emphasizes learner-centeredness, and aims to foster competencies such as problem-solving, creativity, collaboration, empathy, and metacognition. Despite its promise, prior studies report positive, null, and even negative effects on learning, and educators face challenges implementing DT due to its complexity and open-ended nature. This meta-analysis aims to provide quantitative, comprehensive evidence on DT’s effects on student learning and to identify conditions (moderators) under which DT is most effective. Research questions: RQ1 describes characteristics of empirical DT studies (e.g., design, class size, grade level, duration, subject, team size, DT model, region/countries); RQ2 estimates the overall effect of DT on student learning; RQ3 examines moderator effects (learning outcome, class size, grade level, duration, subject, team size, DT model, region).
Literature Review
Conceptual framework: DT is commonly defined (Razzouk & Shute, 2012) as an analytic and creative process involving experimentation, prototyping, feedback, and redesign. In education, DT functions as a student-centered instructional method that promotes deep learning, engagement, and improved performance by addressing real-world problems.
DT models: Multiple models exist with varying stages and complexity, including Simon’s analysis–synthesis–evaluation; Stanford’s EDIPT (empathize, define, ideate, prototype, test); IDEO’s discovery–interpretation–ideation–experimentation–evolution; extensions for K–12; Brown’s inspiration–ideation–implementation; and the Design Council’s Double Diamond (discover–define–develop–deliver). Model choice should align with learner needs and instructional goals, and different models may yield different outcomes.
Prior findings and gaps: Empirical results on DT’s effectiveness are mixed (significant positive, non-significant, and negative findings). Key gaps include limited guidance for classroom implementation (e.g., optimal class size, team size, duration, model), limited systematic assessment of DT’s effectiveness, and the absence of prior meta-analyses in education. This study addresses these gaps via meta-analysis and moderator examination.
Methodology
Design: Meta-analysis conducted following Field & Gillett (2010).
Databases and search: Web of Science (Core Collection), Scopus, and Google Scholar. Search terms: “Design Thinking” AND (“Learning Performance” OR “Learning Outcomes” OR “Academic Achievement” OR “Academic Performance”). Time frame: January 2005–June 2023.
Study flow: 1204 records identified; 1059 remained after removing duplicates; 296 were retained after title/abstract screening; 84 full texts were assessed; 25 peer-reviewed English-language studies were included in the quantitative synthesis (42 effect sizes).
Inclusion criteria: (1) reports the relationship between DT and student learning performance; (2) empirical (experimental, quasi-experimental, correlational); (3) participants received a DT instructional intervention; (4) provides data for effect size calculation (e.g., N, mean, SD, t, p); (5) peer-reviewed English publication.
Quality and bias control: Multiple databases were used to reduce search bias, and clear inclusion criteria were applied to reduce selection bias. Methodological quality was assessed using the Downs & Black 27-item checklist; all selected studies were rated high quality (scores 18–21).
Moderators coded:
- Background moderators: learning outcome (academic achievement, self-efficacy, learning motivation, problem-solving ability, creative thinking, learning engagement); treatment duration (<1 month, 1–3 months, >3 months); class size (1–30, 31–50, 51–100, >100; analyzed as ≤30, 31–50, ≥51); grade level (kindergarten, primary, junior high, high school, university); subject (STEM, non-STEM, multidiscipline); region (Asia, America, Australia, Europe, Africa).
- Method moderators: DT model (3IE; UOPIPT; EDIPT; EDEIPT; OSIP; PAS; 2UPPI; CTC; LAUNCH); team size (1–4, 5–7, ≥8; analyzed as 1–4 and 5–7).
Data analysis: Comprehensive Meta-Analysis (CMA) 3.0 was used. Pearson’s r was selected as the effect size; Fisher’s Z-transformation with sample-size weighting was used to compute the pooled r and 95% CIs. Heterogeneity was tested via Q and I²; a random-effects model was applied because heterogeneity was substantial. Publication bias was assessed using a funnel plot, classic fail-safe N, and trim-and-fill. Sensitivity was assessed via one-study-removed analysis.
Descriptive characteristics (for RQ1):
- Publication years: 2015–2023 (1 in 2015; 1 in 2017; 3 in 2020; 8 in 2021; 6 in 2022; 6 in 2023).
- Study designs: 2 correlational and 23 experimental (pre-, quasi-, or true-experiments).
- Grade level: kindergarten (N=1), primary (N=3), junior high (N=2), high school (N=9), university (N=10).
- Class size: 0–30 (N=9), 31–50 (N=10), ≥51 (N=6).
- Duration: 0–1 month (N=8), 1–3 months (N=7), ≥3 months (N=10).
- Subject: STEM (N=16), non-STEM (N=6), multidiscipline (N=3).
- DT model: EDIPT (N=14), and one each for 3IE, UOPIPT, LAUNCH, OSIP, PAS, 2UPPI (noted as PPI2U in text), EDEIPT, CTC; Unknown (N=3).
- Team size (when reported): 1–4 (N=7), 5–7 (N=6).
- Region: Asia (N=21), America (N=1), Australia (N=1), Europe (N=1), Africa (N=1).
- Countries: China (N=12), Thailand (N=2), Australia (N=1), Austria (N=1), Philippines (N=2), Saudi Arabia (N=2), Nigeria (N=1), America (N=1), Indonesia (N=1), Jordan (N=1), Turkey (N=1).
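The pooling steps named in the data analysis (Fisher's z-transformation, sample-size weighting, Q and I², random-effects model) can be sketched in plain Python. This is a minimal illustration, not the CMA 3.0 implementation: the DerSimonian-Laird estimator for between-study variance is an assumption about the random-effects method, and the function name `pool_correlations` and the four (r, n) pairs are hypothetical, not data from the included studies.

```python
import math


def pool_correlations(rs, ns, z_crit=1.96):
    """Pool Pearson correlations via Fisher's z under a random-effects model.

    Assumes at least two studies, each with n > 3. The between-study
    variance uses the DerSimonian-Laird estimator (an assumption).
    """
    zs = [math.atanh(r) for r in rs]      # Fisher's z-transformation
    vs = [1.0 / (n - 3) for n in ns]      # sampling variance of each z
    w = [1.0 / v for v in vs]             # sample-size weights (n - 3)
    k = len(zs)
    df = k - 1

    # Fixed-effect mean, needed to compute the heterogeneity statistic Q.
    z_fixed = sum(wi * zi for wi, zi in zip(w, zs)) / sum(w)
    q = sum(wi * (zi - z_fixed) ** 2 for wi, zi in zip(w, zs))

    # I^2: share of total variation attributed to between-study differences.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # DerSimonian-Laird estimate of the between-study variance tau^2.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)

    # Random-effects weights add tau^2 to each study's sampling variance.
    w_re = [1.0 / (v + tau2) for v in vs]
    z_re = sum(wi * zi for wi, zi in zip(w_re, zs)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))

    # Back-transform the pooled z and its confidence limits to the r scale.
    return {
        "r": math.tanh(z_re),
        "ci": (math.tanh(z_re - z_crit * se), math.tanh(z_re + z_crit * se)),
        "Q": q,
        "I2": i2,
        "tau2": tau2,
    }


# Hypothetical correlations and sample sizes, for illustration only.
result = pool_correlations([0.20, 0.45, 0.60, 0.35], [40, 60, 35, 120])
print(result)
```

The confidence interval is built on the z scale and only then converted back to r, which is why intervals reported for a pooled r are generally not symmetric around the point estimate.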
Key Findings
Publication bias: The funnel plot suggested some asymmetry; the classic fail-safe N (Nfs = 9179) far exceeded the tolerance level (5K + 10 = 220 for K = 42); trim-and-fill imputed 5 studies on the right, indicating no material publication bias.
Heterogeneity and sensitivity: Heterogeneity was high (Q = 554.908, p < 0.001; I² = 92.611%), warranting random-effects modeling. One-study-removed analyses yielded pooled r from 0.418 to 0.467, indicating robustness.
Overall effect (RQ2): DT had an upper-medium positive effect on student learning: r = 0.436, 95% CI [0.338, 0.525], p < 0.001.
Moderator analyses (RQ3):
- Learning outcomes (Qbetween = 21.847, p = 0.001): learning engagement r = 0.740; learning motivation r = 0.608; academic achievement r = 0.450; problem-solving ability r = 0.447; creative thinking r = 0.329; self-efficacy r = 0.230.
- Class size (Qbetween = 0.856, p = 0.652; not significant): ≤30 r = 0.609; 31–50 r = 0.422; ≥51 r = 0.389.
- Treatment duration (Qbetween = 16.324, p < 0.001): ≥3 months r = 0.535; ≤1 month r = 0.456; 1–3 months r = 0.245.
- Grade level (Qbetween = 14.678, p = 0.005): high school r = 0.538; university r = 0.463; junior high r = 0.443; primary r = 0.222 (ns, p = 0.075); kindergarten r = 0.174.
- Subject (Qbetween = 2.130, p = 0.345; not significant): multidiscipline r = 0.604; non-STEM r = 0.470; STEM r = 0.393.
- DT model (Qbetween = 55.147, p < 0.001): OSIP r = 0.766; EDIPT r = 0.522; 2UPPI r = 0.346; PAS r = 0.301; UOPIPT r = 0.297; 3IE r = 0.222; EDEIPT r = 0.174; CTC r = 0.191 (ns, p = 0.544); LAUNCH r = 0.066 (ns, p = 0.562).
- Team size (Qbetween = 0.224, p = 0.885; not significant): 1–4 r = 0.477; 5–7 r = 0.441.
- Region (Qbetween = 50.576, p < 0.001): Africa r = 0.690; Asia r = 0.435; Australia r = 0.355; Europe r = 0.346; America r = 0.066 (ns, p = 0.562).
Descriptives (RQ1): Most studies applied EDIPT, focused on STEM, and were conducted with high school and university students in Asia. Publication activity increased markedly from 2020 onward.
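Several of the reported statistics can be cross-checked from the other numbers in this section. The sketch below recomputes the fail-safe tolerance level, I² from Q and K, and the p-values for four moderator tests. It rests on assumptions not stated in the summary: that I² is computed as (Q − df)/Q with df = K − 1, and that each Qbetween follows a chi-square distribution with df equal to the number of subgroups minus one. The helper names are hypothetical, and the closed-form tail probability used here is valid only for even df, so moderators with an odd df are left out.

```python
import math

K = 42             # number of effect sizes (reported)
Q_TOTAL = 554.908  # overall heterogeneity statistic (reported)
FAILSAFE_N = 9179  # classic fail-safe N (reported)


def failsafe_tolerance(k):
    """Tolerance level 5K + 10 that the fail-safe N is compared against."""
    return 5 * k + 10


def i_squared(q, k):
    """I^2 in percent, assuming I^2 = (Q - df) / Q with df = k - 1."""
    df = k - 1
    return max(0.0, (q - df) / q) * 100


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    # Sum of (x/2)^j / j! for j = 0 .. df/2 - 1
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


print("tolerance:", failsafe_tolerance(K), "fail-safe N:", FAILSAFE_N)
print("I^2 (%):", round(i_squared(Q_TOTAL, K), 3))

# (Qbetween, assumed df = number of subgroups - 1)
moderators = {
    "class size": (0.856, 2),
    "treatment duration": (16.324, 2),
    "subject": (2.130, 2),
    "grade level": (14.678, 4),
}
for name, (q_between, df) in moderators.items():
    print(name, "p =", round(chi2_sf_even_df(q_between, df), 3))
```

Under these assumptions the recomputed values agree with the reported ones: a tolerance of 220, I² of 92.611%, and p-values of 0.652 (class size), below 0.001 (treatment duration), 0.345 (subject), and 0.005 (grade level).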
Discussion
The meta-analysis demonstrates that DT yields an upper-medium positive effect on student learning across educational levels and contexts, addressing prior inconsistencies by providing aggregate quantitative evidence. The findings suggest DT is particularly effective for enhancing learning engagement, motivation, academic achievement, and problem-solving, supporting the premise that DT’s iterative, hands-on, learner-centered processes foster deep learning and key 21st-century competencies. Moderator results clarify when and how DT is most effective: smaller classes show larger effects (though class size did not significantly moderate effects statistically), likely due to the need for close facilitation and feedback. Longer treatment durations (≥3 months) are most beneficial, while 1–3 months shows the smallest effect—possibly due to the fading novelty effect and the time needed to master DT processes. DT appears most impactful at the secondary and university levels, consistent with developmental readiness for complex, collaborative tasks. Though subject area did not significantly moderate effects, multidiscipline implementations were associated with larger effects, aligning with DT’s interdisciplinary nature. Model choice matters: OSIP and EDIPT produced the strongest effects, suggesting clearer, actionable stages support learning outcomes; some models showed negligible or non-significant effects, warranting careful selection and adaptation. Team size did not significantly moderate effects; groups of 2–7 members were generally effective, with composition and facilitation likely more influential than size alone. Regional differences indicate stronger effects in Africa and Asia, potentially reflecting cultural and educational system factors such as collectivist orientations; however, uneven study distributions call for cautious interpretation. 
Overall, the results reinforce DT as a valuable pedagogical approach when implemented with appropriate duration, facilitation, and model selection, contributing actionable guidance to educators and researchers.
Conclusion
This meta-analysis of 25 studies (42 effect sizes) provides quantitative evidence that design thinking (DT) has an upper-medium positive effect on student learning (r≈0.44). DT significantly improves key outcomes including learning engagement, motivation, academic achievement, problem-solving, creative thinking, and self-efficacy, with especially strong effects on engagement, motivation, and achievement. Moderator analyses indicate that learning outcome type, grade level, treatment duration, DT model, and region significantly influence effectiveness. DT tends to be more effective with secondary and university students, longer durations (≥3 months), and when using OSIP or EDIPT models; multidiscipline contexts also show relatively higher effects. Practical guidance includes keeping classes as small as practical, planning sufficient time for DT cycles, selecting effective DT models (notably EDIPT/OSIP), and forming teams of up to seven members. Future research directions include: expanding studies in underrepresented regions (Americas, Africa, Australia, Europe), levels (kindergarten, junior high), outcomes (learning engagement, self-efficacy), and DT models beyond EDIPT; examining DT effectiveness in larger classes and non-STEM/multidisciplinary contexts; exploring optimal team compositions and the mechanisms underlying duration effects, particularly the 1–3 month window; and complementing meta-analytic evidence with systematic reviews addressing broader qualitative insights.
Limitations
- Uneven distribution of studies across regions (predominantly Asia), grade levels (few in kindergarten and junior high), and DT models (EDIPT dominates), limiting generalizability of moderator findings.
- Language restriction to English may omit relevant studies in other languages.
- Substantial heterogeneity (I² ≈ 92.6%) indicates variability across studies; additional moderators (e.g., learning environment, implementation fidelity) may be unmeasured.
- Limited number of included studies and effect sizes for some subgroups/models (e.g., OSIP, LAUNCH, CTC, regions outside Asia) reduce precision; some subgroup effects should be interpreted cautiously.
- Meta-analysis may not capture all contextual and process nuances of DT implementation; complementary systematic reviews are recommended.