Shared genetic architectures of educational attainment in East Asian and European populations
T. Chen, J. Kim, et al.
This study reports the first large-scale genome-wide association study (GWAS) of educational attainment in East Asian individuals and finds a high genetic correlation with European populations. Led by researchers including Tzu-Ting Chen and Jaeyoung Kim, it highlights the value of ancestrally diverse samples for understanding the genetic factors associated with education.
Introduction
Educational attainment (EduYears), measured as years of education, is a behavioral trait linked to varied social, economic, and health outcomes. It correlates phenotypically and genetically with cognitive function and with diseases including cardiovascular and metabolic diseases, psychiatric disorders, and Alzheimer’s disease, making it a useful proxy phenotype in large population studies. Heritability is moderate (~40% from twin studies; ~20% SNP-based). Prior GWASs have largely targeted individuals of European ancestry, including a meta-analysis of about 3 million individuals that identified 3,952 loci, leaving uncertainty about generalizability to non-European populations. This study conducts the first large-scale EduYears GWAS in East Asian (EAS) populations and a cross-ancestry meta-analysis with European (EUR) data to: (1) identify loci in cross-population samples, (2) investigate biological bases in EAS, (3) test shared genetic architecture between EAS and EUR, and (4) assess cross-population gains in polygenic prediction and fine-mapping.
Literature Review
Previous GWAS meta-analyses and replication efforts identified numerous variants associated with educational attainment, culminating in a European-ancestry meta-analysis of ~3 million individuals reporting 3,952 independent genome-wide significant loci. EduYears exhibits moderate heritability (~40% from twin studies) and SNP-heritability around 20%. However, the literature has focused almost exclusively on European ancestries, limiting evidence for transferability and generalizability to non-European populations and raising concerns of health and social disparities stemming from underrepresentation.
Methodology
Study cohorts: EduYears and genotype data were obtained from Taiwan Biobank (TWB; N genotyped 107,493; analyzed 104,722 after QC) and the Korean Genome and Epidemiology Study (KoGES; N genotyped 72,294; analyzed 71,678 after QC). Education was harmonized via ISCED categories mapped to years of schooling for comparability across cohorts. Ethics approvals were obtained (TWB EC1090402-E, EC1110608-E; KoGES X-2107-699-902).
Genotyping, QC, imputation: Pre-imputation QC included sample and variant call rate >0.98, removal of duplicates, monomorphic and mis-mapped variants. Genetic ancestry was inferred with a random forest using top six PCs and 1000 Genomes (1KG) Phase 3; EAS samples required predicted probability >0.8. PCs were computed after LD pruning; outliers (>6 SD on any of top 10 PCs) and sex mismatches were excluded. Post-QC variants required call rate >0.98 and HWE P ≥ 1e-10. Imputation used Eagle v2.4 (pre-phasing) and Minimac4 with 1KG Phase 3 EAS reference.
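The PC-based outlier rule above (more than 6 SD on any of the top 10 PCs) can be illustrated with a minimal Python sketch. The function name `pc_outliers`, the input layout, and the use of the sample standard deviation are assumptions made for illustration; the summary does not say how the study implemented this step.

```python
import statistics

def pc_outliers(pc_matrix, n_sd=6.0):
    """Return sorted indices of samples lying more than n_sd standard
    deviations from the mean on any principal component.

    pc_matrix: list of per-sample lists of PC scores (hypothetical layout).
    """
    flagged = set()
    n_pcs = len(pc_matrix[0])
    for j in range(n_pcs):
        column = [row[j] for row in pc_matrix]
        mu = statistics.mean(column)
        sd = statistics.stdev(column)  # sample SD; an assumption
        if sd == 0:
            continue  # a constant PC cannot define outliers
        for i, value in enumerate(column):
            if abs(value - mu) > n_sd * sd:
                flagged.add(i)
    return sorted(flagged)
```

Flagged indices would then be dropped before association testing, alongside the sex-mismatch exclusions.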
Association analyses: Primary GWAS used Regenie v2.2.4 two-step whole-genome regression accounting for relatedness and population structure. Covariates: birth year (BY), BY^2, BY^3, sex, BY×sex, BY^2×sex, BY^3×sex, and top 10 PCs. For LDSC-based analyses (heritability and genetic correlation), linear regression in PLINK v2.0 was performed in unrelated individuals (relatedness filtered with kinship thresholds). Variant filters for meta-analyses: imputation INFO >0.6 and MAF >0.5% for Regenie; INFO >0.8 and MAF >1% for PLINK.
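As an illustration of the covariate set listed above, the following Python sketch builds one covariate row per individual. The function name `covariate_row`, the 0/1 coding of sex, and the use of unscaled birth year are assumptions; the summary specifies only which terms were included.

```python
def covariate_row(birth_year, sex, pcs):
    """Build the covariate vector for one individual:
    BY, BY^2, BY^3, sex, BY x sex, BY^2 x sex, BY^3 x sex, then the PCs.

    sex is assumed to be coded 0/1; pcs is a list of the top 10 PC scores.
    """
    by = float(birth_year)
    by_powers = [by, by ** 2, by ** 3]
    interactions = [term * sex for term in by_powers]
    return by_powers + [float(sex)] + interactions + list(pcs)
```

With 10 PCs this yields 17 covariates per individual. In practice birth year is often centered or rescaled before taking powers to keep the cubic term numerically manageable, but whether the study did so is not stated here.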
Meta-analyses: Within EAS, TWB batches were combined, then TWB and KoGES were meta-analyzed using inverse-variance weighted (IVW) fixed effects in METAL; heterogeneity assessed via Cochran’s Q. Cross-ancestry meta-analysis combined EAS with publicly available EUR summary statistics (Lee et al. 2018; n=766,345; ~10.1M variants) using METAL with genomic control per ancestry. Multi-ancestry meta-analysis (MAMA) modeled ancestry-specific LD, allele frequency, and effect size differences to provide EAS- and EUR-specific meta-analytic results.
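The inverse-variance weighted fixed-effects estimate and Cochran's Q used above follow standard formulas, sketched here in Python for a single variant. This is a generic illustration of the statistics, not METAL's code; the function name `ivw_fixed_effect` is hypothetical.

```python
import math

def ivw_fixed_effect(betas, ses):
    """Combine per-cohort effect estimates for one variant.

    Each cohort is weighted by 1 / SE^2. Returns the pooled effect,
    its standard error, and Cochran's Q heterogeneity statistic.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    return beta, se, q
```

Under the null hypothesis of no heterogeneity, Q follows a chi-square distribution with k − 1 degrees of freedom for k cohorts, so a two-cohort comparison such as TWB versus KoGES has one degree of freedom.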
Functional and enrichment analyses: eQTL and gene mapping used FUMA v1.3.7 with GTEx v8 brain tissues (cis-eQTLs; FDR<5%). Gene-based and gene set analyses used MAGMA within FUMA (18,123 genes; 10,678 gene sets from MSigDB v6.2; Bonferroni correction). Partitioned heritability used stratified LDSC with 97 baseline-LD annotations (Gazal et al.) and LDSC-SEG for tissue/cell-type enrichment leveraging GTEx, Franke lab data, Roadmap Epigenomics, ENCODE, and Cahoy cell-type expression datasets.
Cross-ancestry genetic correlation: S-LDXR estimated trans-ancestry genetic correlation (EAS–EUR), using 1KG Phase 3 ancestry-specific LD scores.
Transferability: Power-adjusted transferability (PAT) ratio was computed using 246 EUR loci (Lee et al. 2018), defining credible sets by r^2≥0.8 within 50 kb and P < 100×P_lead using 1KG EUR LD; a locus was transferable if any credible-set variant was associated in EAS at P<0.05 with concordant direction. Expected transferable counts were power-adjusted; PAT ratio = observed/expected.
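The transferability rule and the PAT ratio described above can be sketched in Python as follows. The tuple layout and function names are hypothetical, and the computation of the power-adjusted expected count is not shown because the summary does not detail it.

```python
def is_transferable(credible_set, alpha=0.05):
    """Decide whether one EUR locus transfers to EAS.

    credible_set: list of (eur_beta, eas_beta, eas_p) tuples, one per
    credible-set variant (hypothetical layout). The locus is transferable
    if any variant has EAS P < alpha with the same effect direction.
    """
    return any(
        eas_p < alpha and eur_beta * eas_beta > 0
        for eur_beta, eas_beta, eas_p in credible_set
    )

def pat_ratio(observed, expected):
    """Power-adjusted transferability ratio: observed / expected."""
    return observed / expected
```

Using the counts reported under Key Findings, `pat_ratio(95, 153)` rounds to 0.62.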
Fine-mapping: Within- and cross-population fine-mapping employed SuSiEx (Sum of Single Effects model extension) using 1KG Phase 3 ancestry-matched LD. Loci were extended to at least 1 Mb (±250 kb if needed) and 95% credible sets were identified (max 10 causal signals). LD consistency between reference and discovery was evaluated with SuSiE-RSS diagnostics (s metric, z-score comparisons).
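To make the idea of a 95% credible set concrete, here is a simplified Python sketch for a single causal signal whose per-variant posterior probabilities sum to one. It shows the general construction only and is not SuSiEx's implementation; the function name and the variant IDs in the usage note are hypothetical.

```python
def credible_set(pips, coverage=0.95):
    """Return the smallest set of variants whose posterior probabilities
    add up to at least `coverage`, taking variants in descending order.

    pips: dict mapping variant ID to posterior probability for one signal.
    """
    ranked = sorted(pips.items(), key=lambda item: item[1], reverse=True)
    chosen, cumulative = [], 0.0
    for variant, prob in ranked:
        chosen.append(variant)
        cumulative += prob
        if cumulative >= coverage:
            break
    return chosen
```

For example, probabilities of 0.6, 0.3, 0.06, and 0.04 for variants rsA through rsD give a three-variant set (rsA, rsB, rsC). A smaller set with a higher top probability is what "more refined" means in the fine-mapping results.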
Cross-trait genetic correlations: LDSC v1.0.1 estimated rg between EduYears and 82 EAS-available socioeconomic and health traits (and 64 in EUR), applying FDR control.
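The summary says only that FDR control was applied across the tested traits. A common choice is the Benjamini-Hochberg step-up procedure, sketched below in Python as an assumption rather than the study's confirmed method.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected
    at false discovery rate `alpha` under the Benjamini-Hochberg rule."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, index in enumerate(order, start=1):
        # Keep the largest rank whose p-value is under its threshold.
        if pvalues[index] <= alpha * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    for index in order[:cutoff_rank]:
        rejected[index] = True
    return rejected
```

Applied to the P values of the 82 EAS genetic correlations, a procedure of this kind would return the subset of traits declared significant at FDR < 5%.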
Polygenic prediction: PGSs were constructed using PRS-CS and PRS-CSx with 1KG ancestry-matched LD references (EAS n=504, EUR n=503), global shrinkage parameter 0.01. Testing cohorts: three EAS cohorts (EMCIT, Korean-based cohort, UK Biobank Chinese) and one EUR cohort (NIA-LOAD). Prediction accuracy was assessed via partial R^2 adjusting for BY, sex, BY×sex, and top 10 PCs; 95% CIs via bootstrap (1,000 replicates).
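Partial R^2 and its bootstrap confidence interval can be sketched in plain Python as below. The sketch computes partial R^2 as the squared correlation between EduYears and the PGS after both are residualized on the covariates, and uses a percentile bootstrap; the function names and the percentile method are assumptions, since the summary states only that bootstrap with 1,000 replicates was used.

```python
import math
import random

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def residualize(values, covariates):
    """OLS residuals of `values` on an intercept plus covariate columns,
    computed by Gram-Schmidt orthogonalization (illustrative only)."""
    basis = []
    for column in [[1.0] * len(values)] + [list(c) for c in covariates]:
        for b in basis:
            coef = _dot(column, b)
            column = [x - coef * y for x, y in zip(column, b)]
        norm = math.sqrt(_dot(column, column))
        if norm > 1e-12:  # skip numerically collinear columns
            basis.append([x / norm for x in column])
    residual = list(values)
    for b in basis:
        coef = _dot(residual, b)
        residual = [x - coef * y for x, y in zip(residual, b)]
    return residual

def partial_r2(y, pgs, covariates):
    """Share of covariate-adjusted variance in y explained by the PGS."""
    ry, rp = residualize(y, covariates), residualize(pgs, covariates)
    denom = _dot(ry, ry) * _dot(rp, rp)
    return _dot(ry, rp) ** 2 / denom if denom > 0 else float("nan")

def bootstrap_ci(y, pgs, covariates, n_boot=1000, seed=0):
    """Percentile 95% CI for partial R^2 from resampled individuals."""
    rng = random.Random(seed)
    n = len(y)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stat = partial_r2([y[i] for i in idx], [pgs[i] for i in idx],
                          [[c[i] for i in idx] for c in covariates])
        if not math.isnan(stat):
            stats.append(stat)
    stats.sort()
    lower = stats[int(0.025 * len(stats))]
    upper = stats[min(len(stats) - 1, int(0.975 * len(stats)))]
    return lower, upper
```

Here `covariates` would hold one column each for BY, sex, BY×sex, and the top 10 PCs. Real analyses would typically use a linear-algebra library instead of this hand-rolled residualization.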
Key Findings
EAS GWAS and within-EAS architecture: After QC and imputation, the EAS meta-analysis (TWB and KoGES) included 6,951,085 autosomal variants. Seven genome-wide significant loci (P<5×10^−8) with 11 independent SNPs were identified in EAS; all were within ±500 kb of previously reported EUR lead SNPs. SNP-based heritability (LDSC) was 9.7% (SE 0.8%) in TWB, 8.7% (SE 0.9%) in KoGES, 9.0% (SE 0.6%) in the EAS meta-analysis, and 10.7% (SE 0.3%) in EUR. Within-EAS genetic correlation between TWB and KoGES was rg=0.871 (SE 0.073). Mean Fst between TWB and KoGES was 0.005.
Heterogeneity at ALDH2: A locus near ALDH2 on chr12 was significant in KoGES but not TWB, with PheWAS indicating strongest association with total alcohol consumption. In KoGES, alcohol drinking had significant negative global genetic correlation with EduYears (rg = −0.193, SE 0.063, P=0.002) and strong local rg within ALDH2 (≈ −0.82, P=7.4×10^−8). Stratified GWAS showed ALDH2 associated among drinkers (P=2.4×10^−22) but not in non-drinkers (P=0.032), suggesting gene–environment interaction.
Functional enrichment and tissues: eQTL mapping identified 13 genes across 13 GTEx brain tissues; rs12936234 mapped to DCAKD, NMT1, and C1QL1 across ten brain tissues. Stratified LDSC found significant enrichment in EAS for six annotations (e.g., H3K4me1 peaks; FDR<5%); in EUR, 17 annotations including conserved primate phastCons46way were enriched. MAF-dependent enrichment showed more common bins enriched in both ancestries. LDSC-SEG indicated enrichment of EduYears SNP heritability in brain tissues (parietal lobe, putamen) and CNS chromatin marks (fetal brain, dorsolateral prefrontal cortex, inferior temporal lobe). Neuron-specific enrichment exceeded astrocytes/oligodendrocytes in EAS.
Pathways: GSA-SNP2 identified 16 enriched pathways in EAS and 27 in EUR (q<0.05), with 14 shared between ancestries.
Cross-ancestry meta-analysis and MAMA: The cross-ancestry meta-analysis combined 942,745 individuals and 12,232,310 variants, identifying 315 lead SNPs across 102 genome-wide significant loci; all had been previously reported in EUR studies. MAMA EAS-specific results found 94 independent genome-wide significant SNPs, including two previously unreported for EduYears (rs2881903 near FAM81B and rs16930687 near RP11-224P11.1), whereas MAMA EUR-specific results identified 357 independent SNPs, all previously reported.
Trans-ancestry genetic correlation and transferability: Cross-population genetic correlation was high (S-LDXR rg=0.873, SE 0.042). The power-adjusted transferability ratio of EUR loci to EAS was 0.62 (95 observed transferable of 153 expected), indicating relatively high transferability.
Fine-mapping: Cross-population SuSiEx fine-mapping yielded more refined credible sets and higher maximum PIPs than EAS-only fine-mapping. For a chr17 locus near 43.18 Mb, EAS-only maximum PIP=0.20 (rs12936234), EUR showed two signals (e.g., rs2867316 PIP=0.87), and cross-population fine-mapping increased maximum PIPs (e.g., rs2867316 PIP=0.90; rs12948326 PIP=0.71; rs11871429 PIP=0.51). LD consistency diagnostics (SuSiE-RSS) indicated good agreement (s metrics 0.007–0.027).
Cross-trait genetic correlations: In EAS, 12 traits were significantly correlated with EduYears (FDR<5%); strongest positive was income (rg=0.93, P=9.87×10^−25) and strongest negative was pulse pressure (rg=−0.24, P=6.20×10^−10). In EUR, strongest positive was income (rg=0.80, P=2.33×10^−70) and strongest negative was type 2 diabetes (rg=−0.33, P=4.24×10^−7). Directions were largely concordant across ancestries.
Polygenic prediction: In EAS testing cohorts, cross-population PGSs (PRS-CSx integrating EAS and EUR GWAS) explained up to 4.0% of phenotypic variance in EduYears and outperformed single-ancestry PGSs. In the EUR NIA-LOAD cohort, cross-population PGS explained 6.1% of variance, similar to EUR-only PGS, suggesting limited added value from incorporating smaller non-matched GWAS. Under matched sample sizes, cross-population PGSs still outperformed EUR-derived PGSs in EAS targets, indicating gains from ancestral diversity.
Discussion
Findings demonstrate a shared and highly similar genetic architecture of educational attainment between East Asian and European populations, evidenced by comparable SNP-heritability and high cross-ancestry genetic correlation. Despite fewer genome-wide significant loci in EAS owing to smaller sample size, cross-ancestry analyses improved locus discovery, fine-mapping resolution, and polygenic prediction performance. The heterogeneity at the ALDH2 locus highlights context-specific gene–environment interactions (alcohol consumption) affecting EduYears associations within EAS cohorts. Functional enrichment analyses consistently implicate central nervous system tissues and neurons across ancestries, and pathway analyses reveal largely shared biological pathways. Cross-trait genetic correlations with socioeconomic and health-related traits broadly replicate across ancestries, reinforcing shared pleiotropic architecture. Polygenic prediction benefits from multi-ancestry integration, especially for EAS targets, underscoring the value of diverse cohorts. Nevertheless, the authors emphasize that while EduYears is a useful proxy, the results do not justify clinical applications such as direct-to-consumer PGS usage. Overall, combining diverse population cohorts enhances gene discovery, fine-mapping precision, and predictive utility for educational attainment and informs related health outcomes.
Conclusion
This work presents the first large-scale EduYears GWAS in East Asian populations and a cross-ancestry meta-analysis with European cohorts, revealing high cross-population genetic correlation, shared functional enrichments, and improved fine-mapping and prediction through multi-ancestry integration. Although EAS-specific GWAS identified fewer loci due to current sample sizes, cross-population analyses uncovered additional signals and refined causal sets, including two EAS-specific genome-wide significant SNPs not previously reported. The study demonstrates that leveraging diverse ancestries increases transferability and predictive performance while elucidating shared biology. Future work should expand EAS sample sizes, include additional diverse cohorts, refine phenotype harmonization across educational systems, and deepen functional characterization of identified loci and pathways.
Limitations
EduYears was derived from categorical self-reported educational attainment mapped to years, which may not precisely reflect actual schooling duration (e.g., broad categories like elementary school spanning 0–6 years). Differences in compulsory education between Taiwan and South Korea may constrain phenotypic variation. The EAS GWAS sample size was modest relative to EUR studies, reducing power and yielding fewer genome-wide significant loci, all overlapping with previously reported EUR signals. Fine-mapping relies on external LD reference panels (1KG Phase 3), which can bias results if LD mismatches exist; diagnostics suggested good consistency but residual bias is possible. Environmental differences (e.g., UK Biobank Chinese vs. local EAS cohorts) may affect PGS performance.