Education
Inadequate foundational decoding skills constrain global literacy goals for pupils in low- and middle-income countries
M. Crawford, N. Raheel, et al.
A recent study analyzing reading assessment data from over half a million pupils in 48 low- and middle-income countries uncovers a startling lack of basic decoding skills during the first three years of schooling, helping to explain why most children in these countries cannot read with comprehension by age 10. Conducted by Michael Crawford, Neha Raheel, Maria Korochkina, and Kathleen Rastle, the research points to a critical need for systematic phonics instruction and for better assessment of decoding skills.
~3 min • Beginner • English
Introduction
The paper addresses why the majority of children in low- and middle-income countries (LMICs) are unable to read with basic comprehension by age 10 despite large investments and high primary enrolment. Prior work in LMICs has emphasized systemic and governance factors; here the authors frame the problem through the science of reading. They posit that failure to develop foundational decoding skills—mapping letters to sounds and using those mappings to decode printed words—prevents children from progressing to fluent, meaningful reading. They argue that systematic phonics in the early years is essential for building decoding, which in turn enables vocabulary-based access to word meaning and sufficient practice to become proficient readers. The study’s purpose is to assess whether pupils in LMICs acquire these foundational decoding subskills in the first three instructional years and how those skills relate to reading comprehension.
Literature Review
The authors contrast practices in high-income countries—where early reading instruction typically emphasizes decoding with systematic phonics and where decoding is frequently assessed (e.g., DIBELS, state early screening laws)—with LMIC frameworks such as the Global Proficiency Framework (GPF), which focus largely on comprehension and give limited attention to decoding and phonics. They highlight the paucity of decoding-focused assessments in LMIC monitoring systems. The Early Grade Reading Assessment (EGRA), modeled on DIBELS and designed for LMIC contexts, offers a theoretically grounded suite of foundational reading tasks enabling assessment of decoding subskills. Unlike centralized cross-country assessments (PIRLS, PISA), EGRA is decentralized but maintains design standards allowing comparable assessment of foundational skills across languages, with known limitations. Prior psychological research supports decoding as foundational to reading comprehension, and systematic phonics as an effective instructional approach, suggesting a disconnect between evidence and LMIC policy targets centered on comprehension.
Methodology
The authors assembled and curated a large EGRA database from 254 publicly available reports, covering 347 EGRA surveys (2007–2021) across 62 LMICs, 32,800 schools, 726,680 pupils, and 115 languages. For comparability with US benchmarks, the analyses in this article were restricted to alphabetic writing systems. The focal analytic sample comprised 230 EGRA surveys with 694 subsurveys from 48 countries, 96 languages, and 22,656 schools, totaling 526,862 pupils assessed, primarily in a language of instruction, across the first three instructional years.

Four decoding subskills with DIBELS analogues were analyzed: letter name identification, letter sound identification, non-word (pseudoword) reading, and oral reading fluency (ORF, words correct per minute). Scores reflect items read or produced accurately in one minute. Where available, reading comprehension (literal and inferential questions on the ORF passages) and listening comprehension were used for correlational analyses, with comprehension tasks restricted to those containing 4–6 items for comparability. EGRA protocols were reviewed against the EGRA Toolkit to flag deviations; four subskill scores (from letter-name and ORF tasks) were excluded because they were not reported per minute. EGRA results were benchmarked against DIBELS 8th Edition substantial-risk and severe-risk thresholds, with each benchmark averaged across the beginning, middle, and end of the school year.

Statistical analyses used Bayesian estimation (the BEST package in R, with JAGS MCMC), which provides posterior distributions for means, standard deviations, and effect sizes. Two analyses were conducted: (1) progress across instructional years, a Bayesian two-group comparison with weakly informative priors; and (2) deviation from the substantial-risk benchmark, a one-sample comparison with priors centered on the DIBELS benchmark distribution for each task and year. Effect sizes were reported as Cohen's d computed from posterior draws, with 95% highest density intervals (HDIs). Robustness checks restricted the analyses to nationally representative samples and to English-only EGRAs; results were comparable. A supplementary analysis using a skewed-t likelihood yielded similar estimates, indicating limited impact of skewness.
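To make the two Bayesian analyses concrete, here is a minimal sketch in R using the BEST package (which fits Kruschke's t-distribution model via JAGS). The subsurvey score vectors, the number of MCMC steps, and the prior standard deviation are placeholders for illustration; they are not the authors' data, settings, or code. The 23.3 wpm value is simply the Year 2 ORF substantial-risk benchmark quoted in the Key Findings below.

```r
library(BEST)        # BESTmcmc(): Bayesian two-group / one-group estimation via JAGS
library(HDInterval)  # hdi(): highest density intervals

# Placeholder subsurvey-level mean ORF scores (words correct per minute)
# for instructional Years 2 and 3; replace with the curated EGRA values.
orf_year2 <- c(8, 12, 9, 14, 7, 11, 10, 13, 6, 12)
orf_year3 <- c(20, 25, 18, 27, 21, 24, 19, 26, 22, 23)

# Analysis 1: progress across instructional years (two-group comparison;
# the package's default broad priors stand in here for the weakly
# informative priors described in the article).
fit_progress <- BESTmcmc(orf_year3, orf_year2, numSavedSteps = 2e4)

# Cohen's d computed from the posterior draws, with a 95% HDI.
d_draws <- (fit_progress$mu1 - fit_progress$mu2) /
  sqrt((fit_progress$sigma1^2 + fit_progress$sigma2^2) / 2)
c(d = mean(d_draws), hdi(d_draws, credMass = 0.95))

# Analysis 2: deviation from the substantial-risk benchmark (one-group
# comparison with a prior centered on the DIBELS benchmark; the prior SD
# of 10 is an illustrative guess, not the authors' setting).
fit_benchmark <- BESTmcmc(orf_year2,
                          priors = list(muM = 23.3, muSD = 10),
                          numSavedSteps = 2e4)

# Standardized deviation of the posterior mean from the benchmark value.
d_gap <- (fit_benchmark$mu - 23.3) / fit_benchmark$sigma
c(d = mean(d_gap), hdi(d_gap, credMass = 0.95))
```

In both cases, hdi() returns the 95% highest density interval reported alongside each effect size; when that interval excludes zero, the difference is treated as credible, which is the criterion used in the benchmark comparisons summarized below.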
Key Findings
- Descriptive scope: 230 EGRA surveys, 694 subsurveys, 48 countries, 96 languages, 22,656 schools, and 526,862 pupils (alphabetic writing systems) across the first three instructional years.
- Decoding performance vs benchmarks: The vast majority of average subskill scores fell below both DIBELS substantial- and severe-risk benchmarks. By year 3, 64–99% of subskill scores were below the substantial-risk benchmark and 55–96% below the severe-risk benchmark.
- Table 1 percentages below benchmarks:
- Substantial risk: Year 1—Letter names 81%, Letter sounds 85%, Non-words 66% (no ORF benchmark in Year 1). Year 2—Letter names 80%, Letter sounds 93%, Non-words 72%, ORF 82%. Year 3—Letter sounds 99%, Non-words 64%, ORF 99% (no letter-name benchmark in Year 3).
- Severe risk: Year 1—Letter names 79%, Letter sounds 69%, Non-words 47%. Year 2—Letter names 79%, Letter sounds 88%, Non-words 57%, ORF 60%. Year 3—Letter sounds 96%, Non-words 55%, ORF 92%.
- Progress across years (Bayesian estimates, Table 2): Mean subskill scores generally improved from Year 1 to Year 2 and from Year 2 to Year 3, with medium to large posterior effect sizes:
- Letter names: Y2–Y1 d=0.47 [0.16, 0.78]; Y3–Y2 d=0.77 [0.51, 1.04].
- Letter sounds: Y2–Y1 d=0.70 [0.45, 0.96]; Y3–Y2 d=−0.13 [−0.37, 0.10] (no credible improvement).
- Non-words: Y2–Y1 d=0.76 [0.56, 0.95]; Y3–Y2 d=0.65 [0.45, 0.86].
- ORF: Y2–Y1 d=0.82 [0.64, 0.99]; Y3–Y2 d=0.67 [0.50, 0.85]. Despite effect sizes, absolute levels remained very low.
- Widening gaps vs substantial-risk benchmarks (Table 3): Posterior effect sizes for all task-year comparisons were large and negative (95% HDIs did not include 0), indicating average performance was credibly below substantial-risk benchmarks in all cases analyzed. Absolute effect-size magnitudes increased with instructional year for all tasks except non-word reading, implying pupils fell further behind expected trajectories over time.
- Absolute performance illustration: ORF increased from about 10 words correct per minute (wpm) in Year 2 to 22 wpm in Year 3 (d≈0.67), yet remained far below the benchmark trajectory (≈23.3 wpm in Year 2; ≈73.7 wpm in Year 3).
- Predicted means from benchmark comparison (Table 3 examples): Letter names—Year 1 posterior mean ≈22.4 vs prior mean 35; Year 2 ≈31.6 vs 53. Letter sounds—Year 1 ≈9.1 vs 21.7; Year 2 ≈15.8 vs 45.7; Year 3 ≈14.4 vs 64.7. Non-words—Year 1 ≈0.48 vs 3.7; Year 2 ≈7.7 vs 11.3; Year 3 ≈16.3 vs 19. ORF—Year 2 ≈10.3 vs 23.3; Year 3 ≈22.5 vs 73.7.
- Relationship to comprehension (Fig. 2): Strong positive correlations between decoding subskills and reading comprehension: Letter names r=0.737 (N=266), Letter sounds r=0.624 (N=452), Non-words r=0.755 (N=556), ORF r=0.769 (N=730), all P<0.001. Listening comprehension correlated far more weakly with reading comprehension (r≈0.26) and with the decoding subskills (all r<0.18; the correlation with letter names was not significant), suggesting decoding deficits are the primary constraint on comprehension in these samples (the sketch after this list illustrates how such subsurvey-level correlations are computed).
- Robustness: Similar patterns held when restricting to nationally representative samples and to English-language EGRAs, indicating results are not artifacts of sampling or language-benchmark mismatch.
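The correlations above are computed over subsurvey averages rather than individual pupils, since pupil-level data were not available (see Limitations). A minimal sketch of that kind of calculation in R, with invented subsurvey means standing in for the real EGRA values:

```r
# Placeholder subsurvey-level averages: mean ORF (words correct per minute)
# and mean reading-comprehension score (proportion correct) per subsurvey.
# These values are made up for illustration only.
orf_mean   <- c(5, 12, 8, 30, 22, 3, 17, 40, 9, 25)
rcomp_mean <- c(0.05, 0.20, 0.10, 0.55, 0.40, 0.02, 0.30, 0.70, 0.12, 0.45)

# Pearson correlation across subsurveys, with a test against zero.
cor.test(orf_mean, rcomp_mean, method = "pearson")
```

Because each data point is a subsurvey average, the resulting coefficient describes a system-level association between decoding and comprehension, not a pupil-level one.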
Discussion
Findings indicate that pupils in LMICs are not acquiring foundational decoding skills—letter names and sounds, accurate decoding of novel letter strings, and fluent oral reading—during the first three instructional years. Although mean scores improve year by year, pupils fall increasingly below substantial-risk benchmarks, implying they are diverging from trajectories needed for proficient reading. Because fluent decoding reduces cognitive load and enables integration across words and sentences, deficits in decoding directly constrain reading comprehension. The strong correlations between decoding subskills and comprehension align with psychological theories that decoding is prerequisite for reading for meaning. The work underscores that large effect sizes from low baselines can mask the practical reality that absolute performance remains far below minimum proficiency, so progress should be tracked with absolute metrics (e.g., words correct per minute). The analysis also suggests that reliance on frameworks emphasizing comprehension (e.g., GPF) without ensuring antecedent decoding mastery may misdirect instructional efforts and monitoring. While language-of-instruction and multilingual contexts can affect comprehension, the data indicate decoding deficits themselves are a primary driver of poor outcomes. Benchmarks based on English (a relatively opaque orthography) are a pragmatic choice; language-specific benchmarks are desirable but unlikely to overturn the central conclusion: early decoding skills are not being established at scale.
Conclusion
The study provides large-scale evidence that foundational decoding skills are not being mastered by most early-grade pupils in LMICs, and that gaps relative to proficiency benchmarks widen with each instructional year, undermining reading comprehension and broader learning. To meet global literacy goals, policy and practice must pivot to the earliest point of failure by ensuring mastery of decoding by the end of the third instructional year. Recommended responses include implementation of rigorous systematic phonics programs, explicit instruction in letter–sound mappings, and regular, curriculum-linked assessments of decoding to guide instruction and intervention. The authors call for the reading research community to adopt a global perspective, developing language- and script-appropriate benchmarks and instructional tools, and expanding research on reading acquisition in diverse writing systems, multilingual environments, and low-resource contexts.
Limitations
- EGRA is not centrally overseen; survey administration and adherence to protocols can vary across contexts, introducing heterogeneity.
- Sampling approaches differ; not all surveys are nationally representative. Analyses aggregate across countries, languages, instructional contexts, and socio-economic conditions.
- Available data are at the subsurvey average level; individual pupil data were not accessible, limiting inferences to system-level performance and preventing individual-level analyses.
- Benchmarks used are from DIBELS in English; while pragmatic and likely conservative for decoding tasks, they may not perfectly transfer across languages with different orthographic transparency or morphological/orthographic structures.
- Multilingual and language-of-instruction issues in LMICs could affect comprehension; however, the study suggests these are less likely to explain basic decoding shortfalls. Listening comprehension benchmarks comparable to DIBELS were unavailable.
- Potential distributional skewness was examined; results were robust, but residual unmodelled heterogeneity across systems may persist.