Exploring the genetic prediction of academic underachievement and overachievement
K. Kawakami, F. Procopio, et al.
The study addresses how to identify and characterize academic underachievement early in schooling, when intervention may be most effective. Traditional approaches use prior achievement or intelligence (g) to predict performance, but these are limited early in life: very early IQ weakly predicts later intelligence, and prior achievement requires waiting until students have already fallen behind. Advances in genomics enable prediction of educational outcomes from genome-wide polygenic scores (GPS), which are stable across the lifespan and not influenced by practice or testing conditions. The EA4 GPS for educational attainment is currently the strongest predictor of education-related traits. The authors propose a genomically predicted achievement discrepancy index (GPAA), defined as observed achievement minus GPS-predicted achievement, to quantify under- and overachievement at age 7 and examine whether students’ achievement trajectories from ages 7 to 16 regress toward their genomic predictions. They compare GPAA with a traditional discrepancy index using g (cogA) and explore domain-specific under/overachievement in English and mathematics.
Prior work shows that previous achievement is the strongest predictor of future achievement, while early-life intelligence has limited predictive value for later intelligence and achievement. Genome-wide association studies have identified many variants with small effects that can be aggregated into GPS, which currently explain up to ~16% of variance in educational achievement and offer potential for predictive enrichment. Meta-analyses of interventions for underachievement report heterogeneous effects, with earlier interventions more effective and interventions for gifted underachievers often unsuccessful. The literature also recognizes multiple determinants of underachievement, including motivation, self-regulation, conscientiousness, and family environment, and shows increasing interest in domain-specific underachievement (e.g., math, reading). GPS-based prediction increases with age and may capture both cognitive and non-cognitive contributors to educational attainment. Prior reviews note ethical, equity, and ancestry-related limitations of GPS prediction, including reduced predictive power in non-European ancestries.
Design and sample: Longitudinal analysis of the Twins Early Development Study (TEDS), a UK cohort of twins born 1994–1996. The final analytic sample comprised N=4175 genotyped individuals with at least three achievement observations across ages 7, 9, 12, and 16; participants were of self-identified European ancestry and had no serious medical conditions. Ethical approval was granted by the King’s College London Ethics Committee.

Genotyping and GPS: DNA was extracted from saliva/buccal swabs and genotyped on Affymetrix GeneChip 6.0 or Illumina HumanOmniExpressExome arrays. QC removed samples on call rate (<0.98), non-European ancestry, heterozygosity, and non-DZ relatedness; SNPs were excluded for MAF<0.5%, missingness>2%, HWE p<1e-5, non-autosomal markers, indels, and array/batch effects (p<1e-4). Post-QC N=10,346 (7026 unrelated; 3320 DZ co-twins). GPS were computed using LDPred2 and residualized for chip, batch, and the first ten genetic PCs. The primary GPS came from the EA4 educational attainment GWAS (summary statistics excluding 23andMe; N≈765,283). Additional GPS for the multi-polygenic analyses: IQ, cognitive performance, self-reported math ability, and math attainment.

Measures: Teacher-assessed UK National Curriculum (NC) levels at ages 7, 9, and 12 in English, math, and science (science was not assessed at age 7), on 5-point scales at ages 7 and 9 and a nine-point scale at age 12. GCSE grades at age 16 (core subjects English, math, science) were scaled 4 (G) to 11 (A*). Scores were standardized and averaged to form a composite general achievement measure; at age 7 the composite was formed from English and math. Teacher ratings correlate highly with exam data; GCSE grades from TEDS correlate >0.95 with National Pupil Database (NPD) records. General cognitive ability (g) at age 7 was measured with four telephone-administered tests (Conceptual Grouping, Similarities, Vocabulary, Picture Completion), standardized and averaged. Family SES was derived from parental education and employment and maternal age at first birth, then standardized.
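As a rough illustration of the composite achievement measure described above, the sketch below standardizes each subject's scores and averages them per child. The NC levels are made-up values and the function names `zscore` and `composite` are hypothetical. Skipping missing subjects when averaging is an assumption: the source does not say how missing scores were handled or whether the composite was re-standardized afterwards.

```python
from statistics import fmean, pstdev


def zscore(values):
    """Standardize to mean 0 and SD 1 (population SD), leaving None as None."""
    observed = [v for v in values if v is not None]
    mu, sd = fmean(observed), pstdev(observed)
    return [None if v is None else (v - mu) / sd for v in values]


def composite(*subjects):
    """Average each child's standardized subject scores.

    Assumption: a missing subject is simply skipped for that child.
    """
    z_columns = [zscore(column) for column in subjects]
    result = []
    for row in zip(*z_columns):
        present = [z for z in row if z is not None]
        result.append(fmean(present) if present else None)
    return result


# Hypothetical age-7 NC levels (5-point scale) for five children.
english = [2, 3, 3, 4, 5]
maths = [1, 3, 4, 4, 5]

# At age 7 the composite uses English and math only.
age7_achievement = composite(english, maths)
```

Because every subject is z-scored before averaging, subjects measured on different scales (for example the nine-point age-12 levels and the 4 to 11 GCSE scale) contribute equally to the composite at each age.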
Under/overachievement indices: GPAA = standardized observed achievement at age 7 minus standardized GPS (EA4, or multi-GPS in ancillary analyses). Negative GPAA indicates underachievement (achievement below genomic expectation); positive GPAA indicates overachievement. For comparison, cogA = standardized achievement minus standardized g at age 7. Groups for descriptive extremes: relative underachievers/overachievers (GPAA ≤−1 SD or ≥+1 SD) and absolute underachievers/overachievers (the subsets of those groups whose observed age-7 achievement was also ≤−1 SD or ≥+1 SD).

Achievement trajectories: For each participant, an achievement slope was computed by linear regression of standardized achievement on age (7, 9, 12, 16); slopes were then standardized (mean=0, SD=1). Slope outliers were removed using the median absolute deviation (MAD) rule (values more than 3×MAD from the median).

Analyses: Correlations between GPAA (age 7) and achievement slopes; regression models predicting slopes from GPAA with and without covariates (sex, age, SES); SES moderation tested via an interaction term. Decile analyses among participants with age-7 achievement within ±0.5 SD of the mean. A multi-polygenic score approach using 10-fold cross-validation (3 repeats) to predict age-7 achievement from five GPS, with multicollinearity assessed via VIF. Domain-specific analyses constructed English- and math-specific GPAA and slopes. Univariate twin ACE modeling (OpenMx) estimated heritability (A), shared environment (C), and non-shared environment (E) for GPAA, achievement at 7, subject-specific GPAA/achievement, and achievement slopes. The study was preregistered (OSF link); analyses beyond the preregistration are reported as exploratory.
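The index and trajectory definitions above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names (`gpaa`, `ols_slope`, `mad_keep`) and all numbers are hypothetical, inputs are assumed to be z-scores already, and the MAD is used unscaled because the source does not say whether a consistency constant was applied.

```python
from statistics import fmean, median

AGES = (7, 9, 12, 16)  # assessment ages named in the source


def gpaa(z_achievement, z_gps):
    """Discrepancy index: standardized achievement minus standardized GPS.

    Negative values mean achievement below genomic expectation.
    """
    return z_achievement - z_gps


def ols_slope(ages, scores):
    """Least-squares slope of achievement on age for one participant.

    Waves with a missing score (None) are skipped.
    """
    pairs = [(a, s) for a, s in zip(ages, scores) if s is not None]
    mean_age = fmean([a for a, _ in pairs])
    mean_score = fmean([s for _, s in pairs])
    sxy = sum((a - mean_age) * (s - mean_score) for a, s in pairs)
    sxx = sum((a - mean_age) ** 2 for a, _ in pairs)
    return sxy / sxx


def mad_keep(values, k=3.0):
    """Return True for values within k * MAD of the median (assumed unscaled MAD)."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    return [abs(v - med) <= k * mad for v in values]


# Hypothetical child: achievement 1 SD below the mean, GPS 0.5 SD above it.
child_gpaa = gpaa(-1.0, 0.5)

# Hypothetical standardized achievement at ages 7, 9, 12, 16 for that child.
child_slope = ols_slope(AGES, [-1.0, -0.4, -0.2, 0.1])

# Hypothetical sample of slopes with one extreme value to be dropped.
keep = mad_keep([0.10, 0.05, -0.02, 0.00, 0.03, 2.50])
```

In this toy case the child has a negative GPAA and a positive slope, which is the direction of association the study reports at the sample level. After filtering, the retained slopes would be standardized across participants before correlating them with GPAA.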
- GPAA at age 7 strongly predicted longitudinal achievement trajectories (slopes from ages 7–16): r = −0.456 (95% CI −0.480 to −0.432, p<0.001), explaining 20.8% of variance in slopes. Negative correlation indicates regression toward genomic predictions: underachievers improved and overachievers declined relative to peers.
- Magnitude of regression: Using a trimmed mean approach, participants on average regressed to 38.4% of their genomically predicted achievement levels by age 16. Group-specific regressions toward genomic predictions: absolute overachievers 45.3%; relative overachievers 38.9%; relative underachievers 28.4%; absolute underachievers 30.6%. Most change occurred by age 9.
- Decile analysis among children with near-average age-7 achievement (±0.5 SD): the lowest GPAA decile improved by nearly +0.5 SD by age 16; the highest GPAA decile declined by nearly −0.5 SD.
- Covariate adjustment: Adding sex, age, and SES reduced GPAA’s semi-partial R² from 20.8% to 18.5%; the full model explained significantly more variance overall (ΔR²=0.038; F(3,3926)=66.949, p<0.001). SES did not significantly moderate the GPAA–slope association (β interaction = −0.021; 95% CI −0.043 to 0.000; p=0.134).
- Multi-GPS vs EA4-only: Multi-GPS GPAA–slope correlation r=−0.430 (95% CI −0.453 to −0.404, p<0.001) vs EA4-only r=−0.456 (95% CI −0.480 to −0.432, p<0.001); overlapping CIs indicate no improvement from multi-GPS.
- Domain-specific results: English GPAA–English slope r=−0.416 (95% CI −0.440 to −0.390, p<0.001); Math GPAA–Math slope r=−0.478 (95% CI −0.501 to −0.454, p<0.001). Magnitudes similar to general achievement, slightly stronger for math than English.
- Twin ACE estimates: Heritability (A) for GPAA (age 7) was very high at 85% (higher than general achievement at age 7: 67%); mathematics GPAA 87% and English GPAA 85% (higher than math achievement 71% and English achievement 68%). Achievement slope heritability was 57% (slightly lower than achievement at age 7).
- Comparison with traditional discrepancy (cogA = achievement − g): GPAA and cogA correlated 0.377. Heritability of cogA was 34.3% (95% CI 24.0%–44.4%). After adjusting for SES, sex, and age, GPAA semi-partial R²=0.185 (95% CI 0.164–0.206, p<0.001) and cogA semi-partial R²=0.145 (95% CI 0.122–0.169, p<0.001); their CIs overlapped, indicating similar predictive power. Combined in a multiple regression, GPAA and cogA jointly explained R²=0.261 of slope variance, and both contributed independently (standardized β GPAA=−0.351; β cogA=−0.262).
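The headline correlation reported in the findings above can be sanity-checked with simple arithmetic: squaring r gives the share of slope variance explained, and a Fisher z-transformation gives an approximate 95% confidence interval. The sample size used here is an assumption (roughly 3,930, inferred from the residual degrees of freedom in F(3,3926)); the source does not state the N for this correlation, and the authors may have computed their interval differently.

```python
import math


def fisher_ci(r, n, z_crit=1.96):
    """Approximate confidence interval for a Pearson r via Fisher's z-transform."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


r_gpaa_slope = -0.456  # reported GPAA-slope correlation
n_assumed = 3930       # assumption: inferred from F(3,3926), not reported directly

variance_explained = r_gpaa_slope ** 2
ci_low, ci_high = fisher_ci(r_gpaa_slope, n_assumed)

print(f"r^2 = {variance_explained:.3f}")
print(f"approx. 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```

Under that assumed N, the squared correlation rounds to the reported 20.8% and the interval lands within about 0.001 of the reported −0.480 to −0.432.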
The findings indicate that discrepancies between achievement and genomic prediction at age 7 are meaningfully related to how students’ performance changes through schooling: those underachieving relative to their GPS tend to improve, and those overachieving tend to decline, moving toward their genomically predicted levels. This pattern is consistent with regression to the mean, but the study quantifies its magnitude across the distribution and over key school years, showing substantial regression by age 9 and continued alignment by age 16. The predictive power of GPS for later achievement likely increases with age because later measures (e.g., GCSEs) more closely approximate the target phenotype of EA4 (educational attainment), which captures cognitive and non-cognitive contributors. GPAA and cogA represent complementary constructs: GPAA reflects a broader set of education-related influences embedded in the educational attainment GPS, while cogA isolates cognitive ability. Their modest correlation (0.377) and additive prediction (26% joint variance) indicate that each contributes distinct information for forecasting achievement trajectories. Practically, GPAA may serve as an early-warning indicator to identify children who are underachieving relative to their genetic propensity and could benefit from targeted support, potentially improving the cost-effectiveness of interventions by directing them according to genetic propensity. However, intervention efficacy for GPAA-identified underachievers versus other underachievers requires direct testing. SES neither accounted for nor significantly moderated the GPAA–trajectory relationship, suggesting the phenomenon generalizes across SES levels within this cohort. The similar results for English and math suggest GPAA captures broad educational processes rather than domain-specific nuances alone.
The study introduces and evaluates the Genomically-Predicted Achievement Discrepancy (GPAA), showing that children who deviate from genomic predictions at age 7 tend to regress substantially toward those predictions through ages 9, 12, and 16. GPAA predicts achievement trajectories as well as a traditional ability-based discrepancy (cogA), and the two measures together improve prediction. High heritability estimates for GPAA motivate future GWAS targeting GPAA directly, which could enable earlier identification and intervention even before school achievement is measured. Future research should: conduct GWAS of GPAA; test whether interventions tailored to GPAA-identified underachievers outperform standard approaches; investigate antecedents and sequelae of GPAA trajectories; and assess generalizability across ancestries and educational systems.
- Attrition across nine years of data collection; analytic inclusion required at least three of four time points, which may introduce selection effects.
- Inability to fully control classroom and school effects; reliance on teacher assessments, though these align well with standardized exams and NPD records.
- Ancestry limitation: analyses restricted to participants of European ancestry; GPS predictive power is attenuated in other ancestries, limiting generalizability and potentially exacerbating disparities.
- Practicality: GPAA currently requires both genotyping and early achievement data; without a dedicated GPAA GWAS, early-life prediction before schooling is not feasible.
- Potential interpretational constraints due to regression-to-the-mean phenomena inherent in discrepancy-based definitions of under/overachievement.
- The EA4 summary statistics used excluded 23andMe participants; although large, this may slightly affect GPS calibration relative to full EA4.
- Exploratory analyses beyond preregistration; results should be replicated in independent cohorts.