Multi-polygenic score prediction of mathematics, reading, and language abilities independent of general cognitive ability

Psychology

F. Procopio, W. Liao, et al.

This study by Francesca Procopio and colleagues examines the heritability of specific cognitive abilities (mathematics, reading, and language) independent of general cognitive ability. Using twin and DNA data, it estimates how strongly genetic differences influence these skills and how well they can be predicted from DNA. It then considers what the findings mean for understanding cognitive strengths and weaknesses beyond general intelligence.

Introduction
The study addresses whether specific cognitive abilities (SCA)—mathematics, reading, and language—retain substantial heritability independent of general cognitive ability (g), and whether multiple polygenic scores (multi-PGS) can predict these g-independent components (SCA.g). Although SCA correlate with each other and with g, prior twin evidence indicates non-unity genetic correlations and suggests meaningful trait-specific genetic influences. The authors aim to quantify heritability of SCA and SCA.g via twin and SNP-based methods, and to evaluate the predictive power of multi-PGS for SCA and SCA.g in a large, well-characterized twin cohort. This work is important for developing genetic profiles of strengths and weaknesses in cognitive abilities beyond g, potentially informing early identification and targeted interventions.
Literature Review
Decades of behavioral genetic research show substantial heritability for both g and SCA (~50–56%). Genetic correlations among SCA are around 0.5, indicating both shared and specific genetic factors. Limited prior work on SCA.g (SCA phenotypically corrected for g) suggested similarly high heritability (~53%). Early GWAS of SCA were underpowered (often <10,000 participants), yielding few genome-wide significant hits and weak PGS prediction. Larger GWAS of related broad traits have been run for educational attainment (N≈3,000,000) and for self-reported math/education-related traits (N≈500,000). These produce PGS that explain 6–12% of the variance in educational outcomes, but the PGS are highly g-loaded. Only one prior GWAS directly examined SCA.g (Donati et al.). It found significant SNP-heritability for maths and science but, owing to its small N, no genome-wide significant SNPs. Multi-PGS strategies that aggregate diverse PGS can boost prediction and may capture both cognitive and noncognitive influences relevant to SCA and SCA.g.
Methodology
Design and sample: Data come from the Twins Early Development Study (TEDS), a longitudinal UK twin cohort that initially included over 16,000 twin pairs born 1994–1996; over 8,000 families remain active. Genomic data are available for more than 10,000 twins. Exclusions included serious medical conditions, perinatal complications, and missing key background data. Analyses combined same- and opposite-sex DZ twins because sex differences were minimal. Participants were predominantly of white ancestry; only white participants were genotyped and included in the genomic analyses.

Cognitive measures (age 12): A battery of 14 tests was administered online or by telephone. Composite scores were created for:
- reading (mean of 4 tests);
- mathematics (mean of 3 tests);
- language (mean of 3 tests);
- g (mean of 4 independent tests).

All measures were standardized and corrected for age and sex. The tests were:
- Reading: two comprehension tests (PIAT adaptation; GOAL Key Stage 3) and two fluency tests (Woodcock-Johnson III Reading Fluency adaptation; TOWRE, administered by telephone).
- Mathematics: three NFER subtests (Understanding Numbers; Non-numerical Processes; Computation and Knowledge), drawn from booklets 6–14.
- Language: online tests of syntax (TOAL-3 Listening Grammar), semantics (TLC Figurative Language, Level 2), and pragmatics (TLC Making Inferences, Level 2).
- General cognitive ability (g): measured independently of the SCA tests above. It used two verbal tests (WISC-III-PI Multiple Choice Information; Vocabulary Multiple Choice) and two non-verbal tests (WISC-III-UK Picture Completion; Raven’s Standard/Advanced Progressive Matrices). A 14-test g factor was also computed for comparison.

Construction of SCA.g: For each SCA composite, g was regressed out, and the standardized residuals served as SCA.g indices. The main analyses used the 4-test g factor, and they were repeated with the 14-test g factor. The SCA.g indices correlated strongly with their corresponding SCA (0.76–0.81) and weakly with the other, uncorrected SCA (0.22–0.29).
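The SCA.g construction described above (regress g out of each composite, keep the standardized residuals) can be sketched in a few lines. This is a minimal illustration, not the study's code. It assumes a simple OLS regression on a single g score, and the six children's scores and the helper names `residualize` and `pearson` are hypothetical.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def residualize(sca, g):
    """Regress an SCA composite on g by OLS and return standardized residuals (an SCA.g-style index)."""
    m_s, m_g = mean(sca), mean(g)
    slope = (sum((s - m_s) * (x - m_g) for s, x in zip(sca, g))
             / sum((x - m_g) ** 2 for x in g))
    intercept = m_s - slope * m_g
    resid = [s - (intercept + slope * x) for s, x in zip(sca, g)]
    m_r, sd_r = mean(resid), pstdev(resid)
    return [(r - m_r) / sd_r for r in resid]


# Hypothetical standardized scores for six children (not study data)
g = [-1.4, -0.6, -0.1, 0.3, 0.7, 1.1]
reading = [-0.9, -1.0, 0.4, -0.2, 1.1, 0.6]
reading_g = residualize(reading, g)

r_g = pearson(reading, g)
# The first value is zero up to floating-point error: residuals are uncorrelated with g
print(f"corr(reading.g, g)       = {pearson(reading_g, g):.3f}")
print(f"corr(reading.g, reading) = {pearson(reading_g, reading):.3f}")
print(f"sqrt(1 - r^2)            = {(1 - r_g ** 2) ** 0.5:.3f}")
```

For OLS residuals, corr(SCA.g, SCA) equals sqrt(1 - r²), where r is the SCA-g correlation. As a rough check, and only if the residualization was a simple regression like this one, the reported 0.76–0.81 would correspond to SCA-g correlations of roughly 0.59–0.65.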
Twin analyses: Univariate ACE models estimated additive genetic (A), shared environmental (C), and non-shared environmental (E) components. Initial estimates from Falconer's formulas were complemented by maximum-likelihood structural equation modeling in OpenMx, which provided ACE estimates with 95% CIs.

SNP-based heritability: SNP-heritability (h2_SNP) was estimated with GCTA-REML using the Zaitlen et al. method, which allows DZ twin data to be included. The kinship matrix was derived from the GRM with off-diagonal entries <0.05 set to 0. The first 10 genetic principal components and sex were included as covariates.

Polygenic scores (PGS): PGS had previously been constructed with LDpred or LDpred2-auto, using all SNPs, and adjusted for 10 PCs, batch, and chip. The starting pool was 327 available PGS (as of Oct 1, 2023). Three groups were excluded:
- PGS from discovery GWAS with N<10,000;
- PGS from discovery GWAS that included TEDS;
- PGS from discovery GWAS that included 23andMe summary statistics.

This left 230 PGS. For each SCA, PGS correlating at |r|≥0.03 with the target SCA were retained: 57 for reading, 52 for mathematics, and 50 for language.

Multi-PGS modeling: Elastic net penalized regression (glmnet/caret in R) predicted each SCA and SCA.g from the retained PGS. Data were split into an 80% training set and a 20% hold-out set. Within the training set, 10-fold cross-validation repeated 100 times selected the hyperparameters that minimized RMSE. Out-of-sample R² was computed in the hold-out set. Standard multiple regression analyses were run in parallel for comparison. Sensitivity analyses used one genotyped individual per DZ pair.
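Falconer's formulas mentioned above derive ACE proportions directly from MZ and DZ twin correlations:
- A = 2(rMZ - rDZ)
- C = 2rDZ - rMZ
- E = 1 - rMZ

The sketch below implements these formulas alongside the GRM thresholding step described for the Zaitlen et al. method. The twin correlations and the 3×3 GRM are hypothetical toy values. The helper names `falconer` and `threshold_grm` are illustrative and do not come from GCTA, OpenMx, or the study's code.

```python
def falconer(r_mz, r_dz):
    """Falconer's formulas: ACE variance proportions from MZ and DZ twin correlations."""
    return {
        "A": 2 * (r_mz - r_dz),  # additive genetic
        "C": 2 * r_dz - r_mz,    # shared environment
        "E": 1 - r_mz,           # non-shared environment (including measurement error)
    }


def threshold_grm(grm, cutoff=0.05):
    """Copy a genetic relationship matrix, zeroing off-diagonal entries below `cutoff`."""
    n = len(grm)
    return [[grm[i][j] if i == j or grm[i][j] >= cutoff else 0.0 for j in range(n)]
            for i in range(n)]


# Hypothetical twin correlations (not the study's values)
ace = falconer(r_mz=0.75, r_dz=0.48)
print({k: round(v, 2) for k, v in ace.items()})

# Hypothetical GRM: a DZ pair (relatedness ~0.5) plus one unrelated child
grm = [[1.00, 0.49, 0.01],
       [0.49, 1.02, -0.02],
       [0.01, -0.02, 0.98]]
print(threshold_grm(grm))
```

Falconer estimates are a quick first pass, and the study relied on OpenMx maximum-likelihood fits for its reported estimates and CIs. As we understand the Zaitlen et al. approach, the thresholded matrix is fitted together with the full GRM in a REML model so that close-relative sharing is separated from SNP-based sharing. That REML fit is not shown here.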
Key Findings
- Twin heritability: Average h2_twin was 53% for uncorrected SCA and 40% for SCA.g; the differences were not statistically significant. The ordering by domain was consistent (highest for reading, then mathematics, then language). Shared environmental estimates were lower for SCA.g than for SCA.
- SNP-heritability: Average h2_SNP was 35% for SCA and 26% for SCA.g. Overlapping SEs indicated non-significant differences between each SCA and its g-corrected counterpart. Notably, mathematics.g showed slightly higher h2_SNP (37.1%) than uncorrected mathematics (33.2%).
- Multi-PGS prediction: Elastic net multi-PGS explained on average 11.1% of the variance in SCA and 4.4% in SCA.g. Using one DZ twin per pair yielded similar values (SCA 9.6%; SCA.g 4.3%), as did standard multiple regression (SCA 10.1%; SCA.g 4.0%).
- Domain-specific SCA.g prediction: Reading.g had the highest multi-PGS prediction (6.9%), followed by mathematics.g (3.6%) and language.g (2.5%).
- Most predictive PGS: Educational attainment (EA4) was the strongest predictor for all SCA and for language.g. The intelligence PGS best predicted reading.g, and the cognitive performance PGS best predicted mathematics.g. The multi-PGS approach improved prediction only slightly beyond the single top PGS.
- PGS retained: Elastic net retained 32, 33, and 22 PGS for reading, mathematics, and language (uncorrected), and 23, 17, and 19 for reading.g, mathematics.g, and language.g, respectively. The largest independent contribution of any single PGS (its coefficient squared) corresponded to up to ~4% of the variance.
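The retention counts above arise because the elastic net's L1 penalty can shrink some coefficients exactly to zero, while its L2 penalty stabilizes correlated predictors. The following is a minimal sketch of the multi-PGS pipeline: an 80/20 split, elastic net fitted by cyclic coordinate descent, then hold-out R². It assumes the glmnet-style objective and uses simulated, hypothetical "PGS" data.

Two simplifications apply. Unlike the study, it uses fixed hyperparameters (`lam`, `alpha`) instead of 10-fold cross-validation repeated 100 times. It also computes out-of-sample R² as the squared observed-predicted correlation, which is one common definition and an assumption here. It is not the study's glmnet/caret code.

```python
import random
from statistics import mean, pstdev


def standardize_columns(X_train, X_test):
    """Scale each column to mean 0, SD 1 using training-set statistics only."""
    p = len(X_train[0])
    mus = [mean(row[j] for row in X_train) for j in range(p)]
    sds = [pstdev([row[j] for row in X_train]) for j in range(p)]

    def scale(X):
        return [[(row[j] - mus[j]) / sds[j] for j in range(p)] for row in X]

    return scale(X_train), scale(X_test)


def soft_threshold(z, t):
    """Lasso shrinkage operator: move z toward 0 by t, clipping at 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def elastic_net(X, y, lam, alpha, max_iter=100, tol=1e-6):
    """Cyclic coordinate descent for the glmnet-style objective
    (1/2n)||y - Xb||^2 + lam * (alpha * ||b||_1 + (1 - alpha) / 2 * ||b||_2^2).
    Assumes columns of X are standardized (mean 0, population SD 1) and y is centered."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # current residual y - X b (b starts at 0)
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            # (1/n) * x_j . partial residual, where the partial residual excludes feature j
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + beta[j]
            new = soft_threshold(rho, lam * alpha) / (1.0 + lam * (1.0 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def r_squared(obs, pred):
    """Squared Pearson correlation between observed and predicted values."""
    mo, mp = mean(obs), mean(pred)
    cov = sum((a - mo) * (b - mp) for a, b in zip(obs, pred))
    var_o = sum((a - mo) ** 2 for a in obs)
    var_p = sum((b - mp) ** 2 for b in pred)
    return 0.0 if var_p == 0 else cov * cov / (var_o * var_p)


# Hypothetical simulated data: 15 "PGS", of which only the first three carry signal
rng = random.Random(2023)
n, p = 600, 15
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [0.25 * row[0] + 0.15 * row[1] + 0.10 * row[2] + rng.gauss(0, 1) for row in X]

cut = int(0.8 * n)  # 80% training / 20% hold-out split
X_tr, X_te = standardize_columns(X[:cut], X[cut:])
y_tr, y_te = y[:cut], y[cut:]
y_bar = mean(y_tr)

# Fixed hyperparameters for illustration; the study tuned them by repeated cross-validation
beta = elastic_net(X_tr, [v - y_bar for v in y_tr], lam=0.05, alpha=0.5)
pred = [y_bar + sum(b * x for b, x in zip(beta, row)) for row in X_te]
retained = sum(1 for b in beta if b != 0.0)
print(f"retained {retained} of {p} scores; hold-out R^2 = {r_squared(y_te, pred):.3f}")
```

Because predictors and outcome are on standardized scales, a squared coefficient gives a rough sense of one score's independent contribution. This is the kind of quantity the "up to ~4%" figure above summarizes.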
Discussion
The findings show that mathematics, reading, and language abilities retain substantial heritability after removing variance shared with g, reinforcing that SCA are not merely reflections of g. Both twin and SNP-based heritabilities for SCA.g were approximately three-quarters of those for uncorrected SCA, and the differences were non-significant. In contrast, multi-PGS prediction decreased more markedly from SCA to SCA.g. This is likely because the most powerful available PGS (educational attainment, intelligence, cognitive performance) are highly g-loaded, so their predictive power diminishes once g variance is removed. Nevertheless, multi-PGS still predicted all SCA.g, with the strongest performance for reading.g. This is consistent with twin-based evidence suggesting that reading may be relatively less g-dependent than mathematics and language, although prior genetic correlations with g remain high across domains. The results also show that adding PGS from more specific cognitive traits (e.g., executive function) contributes modest incremental prediction even when general cognitive PGS are included. Overall, the study provides proof-of-principle that SCA.g can be predicted from DNA, albeit modestly with current PGS resources. It underscores the need for large-scale GWAS of SCA and SCA.g to develop more specific and powerful predictors.
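The contrast drawn above can be checked from the average estimates reported in the Key Findings: heritability is largely retained for SCA.g, while prediction drops further. The snippet simply divides each SCA.g average by its SCA average. It ignores sampling uncertainty, so it is a back-of-the-envelope check rather than a formal test.

```python
# Average estimates from the Key Findings, in percent of variance: (SCA, SCA.g)
estimates = {
    "twin heritability": (53.0, 40.0),
    "SNP heritability": (35.0, 26.0),
    "multi-PGS prediction": (11.1, 4.4),
}

ratios = {name: sca_g / sca for name, (sca, sca_g) in estimates.items()}
for name, ratio in ratios.items():
    print(f"{name}: SCA.g estimate is {ratio:.0%} of the SCA estimate")
```

Both heritability ratios come out at roughly 0.75, matching "approximately three-quarters". The prediction ratio is roughly 0.40, which quantifies the "more marked" drop attributed to g-loaded PGS.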
Conclusion
SCA independent of g (SCA.g) show substantial heritability (twin h2 ≈ 40%; SNP h2 ≈ 26%) and can be predicted from DNA using multi-PGS (up to 6.9% for reading.g), demonstrating that the heritability of SCA is not solely due to g. The larger drop in PGS-based prediction from SCA to SCA.g reflects the current dominance of g-loaded predictors (educational attainment, intelligence). Future research should prioritize large GWAS of SCA and especially SCA.g—potentially via brief, scalable assessments or GWAS-by-subtraction approaches—to create powerful, domain-specific PGS that can inform early, targeted educational interventions by profiling cognitive strengths and weaknesses independent of g.
Limitations
- Twin design assumptions (equal environments, additivity) and standard limitations of PGS and GCTA (focus on additive effects of common SNPs; incomplete tagging of causal variants).
- Ancestry and generalizability: The sample was predominantly white and from the UK, and only white participants were genotyped. Findings may not generalize to other ancestries or populations; diverse-ancestry GWAS are needed.
- PGS limitations: The current strongest PGS are highly g-loaded, and domain-specific SCA GWAS are relatively small, limiting SCA.g prediction. Proprietary restrictions (e.g., 23andMe) reduced the usable SNP sets for some PGS (e.g., highest math class), possibly attenuating predictive power.
- Measurement considerations: Although the composites were robust, residual confounding and potential over- or undercorrection for g remain concerns. However, analyses with a 14-test g factor yielded similar inferences.