Educational attainment (EA) is a socioeconomic factor closely linked to health behaviors and outcomes. Previous genome-wide association study (GWAS) meta-analyses of EA used samples of ~1.1 million individuals. This study reports an updated meta-analysis with a nearly three times larger sample (N = 3,037,499), driven mainly by the expansion of the 23andMe sample from ~365,000 to ~2.3 million individuals. The primary analysis is a GWAS of autosomal single-nucleotide polymorphisms (SNPs), which identifies substantially more approximately uncorrelated genome-wide significant SNPs than before (3,952 vs. 1,271). The larger sample also yields more accurate effect size estimates for constructing a genome-wide polygenic index (PGI), which serves as a predictor of EA and related phenotypes.
Literature Review
The study builds on earlier work on the genetic basis of educational attainment. Prior GWAS meta-analyses, although informative, were limited by smaller sample sizes. This study addresses that limitation by substantially increasing the sample size and by adding analyses of dominance deviations and within-family effects. It also investigates the role of assortative mating in shaping the genetic architecture of EA.
Methodology
The study conducted a meta-analysis of three datasets: publicly available data excluding 23andMe and UK Biobank, new association results from 23andMe, and new results from a UK Biobank GWAS. The analysis focused on autosomal SNPs and included rigorous quality control procedures. Lead SNPs were identified using an iterative clumping algorithm with a pairwise r² cutoff of 0.1. The sensitivity of the findings was assessed using a conditional and joint (COJO) multiple-SNP analysis. In addition to the additive GWAS, the study performed a GWAS of dominance deviations and an X-chromosome GWAS. Polygenic prediction was assessed using three European-ancestry holdout samples (Add Health, HRS, WLS). Prediction accuracy was measured using incremental R² for quantitative phenotypes and incremental Nagelkerke's R² for binary outcomes. Within-family analyses, employing data with genotyped siblings and parents, were performed to estimate the direct and population effects of the PGI. Assortative mating was studied using data on genotyped mate pairs from UK Biobank and Generation Scotland.
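The incremental R² used above can be illustrated with a minimal sketch. It assumes the usual definition: the gain in R² when the PGI is added to a regression of the phenotype on covariates alone. The function names, the simulated data, and the choice of covariates are hypothetical and are not taken from the study's code.

```python
import numpy as np

def r_squared(y, X):
    """R^2 from an ordinary least squares regression of y on X (intercept added here)."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return 1.0 - resid.var() / y.var()

def incremental_r2(y, pgi, covariates):
    """Gain in R^2 from adding the PGI to a covariates-only regression."""
    base = r_squared(y, covariates)
    full = r_squared(y, np.column_stack([covariates, pgi]))
    return full - base

# Simulated toy data (illustrative only; not the study's data or effect sizes).
rng = np.random.default_rng(0)
n = 5000
covariates = rng.normal(size=(n, 3))   # stand-ins for e.g. age, sex, ancestry PCs
pgi = rng.normal(size=n)               # standardized polygenic index
y = 0.2 * covariates[:, 0] + 0.35 * pgi + rng.normal(size=n)

delta = incremental_r2(y, pgi, covariates)
print(f"incremental R^2: {delta:.3f}")
```

For binary outcomes the same comparison would be made between two logistic models using Nagelkerke's pseudo-R², which is not shown here.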
Key Findings
The study identified 3,952 approximately uncorrelated SNPs associated with EA at genome-wide significance. The resulting PGI explained 12-16% of EA variance, depending on the dataset. Direct genetic effects accounted for approximately half of the PGI's predictive power for EA and other phenotypes, highlighting the substantial influence of indirect effects and gene-environment correlations. The correlation between mate-pair PGIs was significantly larger than predicted by phenotypic assortment alone, indicating additional assortment on factors correlated with the PGI. Dominance effects of common SNPs were found to be negligible, as indicated by the absence of genome-wide significant SNPs in the dominance GWAS. An X-chromosome GWAS identified 57 significant SNPs. The EA PGI demonstrated significant predictive power for ten common diseases, with higher EA PGI values associated with lower relative risks for these diseases. In samples of African genetic ancestry, the PGI had much lower predictive power than in European-ancestry samples, a gap that differences in allele frequency and linkage disequilibrium (LD) alone do not explain.
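The comparison with "phenotypic assortment alone" can be made concrete with a small sketch. It assumes a standard path-tracing argument: if mates sort only on the phenotype, their PGIs are correlated only through it, so the expected mate-pair PGI correlation is the mate-pair phenotypic correlation times the squared PGI-phenotype correlation. This may not be the exact calculation used in the study, and the input values below are hypothetical, not the study's estimates.

```python
import math

def expected_mate_pgi_corr(r_mates_phenotype, r_pgi_phenotype):
    """Expected mate-pair PGI correlation if assortment acts only on the phenotype.

    Path: PGI_1 -> EA_1 <-> EA_2 <- PGI_2, so the correlations multiply.
    """
    return r_mates_phenotype * r_pgi_phenotype ** 2

# Hypothetical inputs chosen for illustration only.
r_mates_ea = 0.5   # assumed correlation of EA between mates
r_pgi_ea = 0.4     # assumed correlation between a person's PGI and their own EA

expected = expected_mate_pgi_corr(r_mates_ea, r_pgi_ea)
print(f"expected mate-pair PGI correlation: {expected:.3f}")
```

An observed mate-pair PGI correlation well above this benchmark is what would point to assortment on additional factors correlated with the PGI.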
Discussion
The study significantly advances our understanding of the genetic architecture of EA. The large sample size allowed for more precise estimation of genetic effects and a comprehensive analysis of various genetic architectures, revealing the substantial contribution of both direct and indirect effects. The findings challenge models that assume substantial dominance variance, zero gene-environment correlation, or purely phenotype-based assortative mating. The PGI's ability to predict various phenotypes and diseases highlights the importance of EA as a complex trait influencing multiple aspects of health and well-being. The lower predictive power in African ancestry samples emphasizes the need for more research in diverse populations.
Conclusion
This study, with its unprecedented sample size, provides a comprehensive understanding of the genetic architecture of EA. The identification of thousands of associated SNPs and the development of a highly predictive PGI offer valuable insights into the complex interplay of genetic and environmental factors influencing EA and related phenotypes and diseases. Future research should focus on resolving the remaining unexplained variance, examining gene-environment interactions, and replicating these findings in more diverse populations. Larger samples will also power additional analyses, such as estimating SNP effect size differences across phenotypes or populations, and exploring the effects of epistatic interactions.
Limitations
The study primarily focused on individuals of European genetic ancestry. The generalizability of the findings to other populations is limited, requiring further investigation. The reliance on self-reported EA may introduce some measurement error. While the study considered several factors, some gene-environment correlations and gene-environment interactions might remain unaccounted for, potentially affecting the interpretation of results.