Multivariate genome-wide covariance analyses of literacy, language and working memory skills reveal distinct etiologies

C. Y. Shapland, E. Verhoef, et al.

This study by Chin Yang Shapland and colleagues examines the genetic underpinnings of literacy, phonological awareness, oral language, and phonological working memory in UK youth. The findings reveal genetic variation shared among these traits alongside trait-specific genetic influences, particularly for oral language and working memory.

Introduction
Reading comprehension is a complex skill that relies on several interconnected abilities. The Simple View of Reading posits that it is the product of decoding and language comprehension. Other models incorporate cognitive resources such as phonological working memory (PWM), assessed through non-word repetition tasks. Extensive research using twin studies and genome-wide association studies (GWAS) has shown moderate to strong heritability for reading, spelling, phonological awareness, language comprehension, and PWM. This points to the existence of 'generalist genes' that contribute to shared cognitive functions and possibly increase liability to developmental disorders such as dyslexia.
However, a complete map of genetic covariance structures, covering both broad and trait-specific genetic relationships, is lacking. There is some evidence of differentiation: the genetic links between oral language and reading comprehension are stronger than those between oral language and word reading fluency. This suggests that genetic factors might differentiate meaning-based from code-based abilities, consistent with the two core components of the Simple View of Reading. Existing knowledge comes primarily from twin analyses, which rely on assumptions such as equal environments for monozygotic and dizygotic twins.
This study takes an independent approach: it analyzes data from unrelated individuals in the Avon Longitudinal Study of Parents and Children (ALSPAC) cohort with genome-wide single nucleotide polymorphism (SNP) information to investigate the multivariate genetic architecture of these traits. Using genetic-relationship-matrix structural equation modeling (GSEM), it models the multivariate genetic covariance across literacy (reading fluency and spelling), phonological awareness, oral language, and PWM.
GSEM adapts multivariate twin models to analyses based on genetic relationship matrices (GRMs). It partitions phenotypic variation into additive genetic variance (A) and residual variance (E), which allows estimation of SNP-based heritability and genetic correlations and identification of the underlying genetic factor structure. A novel combined Independent Pathway/Cholesky (IPC) model makes the genetic factor structures easier to interpret.
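The A/E decomposition described here can be written compactly. The block below is a sketch in standard quantitative-genetics notation; the symbols (Σ for covariance matrices, Λ for factor loadings, i and j for trait indices) are assumptions introduced for illustration and are not taken from the study itself.

```latex
\begin{align}
  \Sigma_P &= \Sigma_A + \Sigma_E
    && \text{phenotypic = additive genetic + residual covariance} \\
  h^2_{\mathrm{SNP},i} &= \frac{\sigma^2_{A,i}}{\sigma^2_{A,i} + \sigma^2_{E,i}}
    && \text{SNP-based heritability of trait } i \\
  r_{g,ij} &= \frac{\sigma_{A,ij}}{\sqrt{\sigma^2_{A,i}\,\sigma^2_{A,j}}}
    && \text{genetic correlation between traits } i \text{ and } j \\
  \Sigma_A &= \Lambda_A \Lambda_A^{\top}
    && \Lambda_A \text{ lower triangular in a Cholesky decomposition}
\end{align}
```

Under this notation, the Cholesky, independent pathway, and IPC models differ in which entries of the loading matrices are freely estimated; the data they aim to reproduce are the same.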
Literature Review
The study draws on extensive previous research into the genetic and phenotypic relationships among literacy-related skills. Twin studies and GWAS have consistently demonstrated significant heritability for reading, spelling, phonological awareness, language comprehension, and working memory. They also support pleiotropic effects, in which some genes influence multiple related cognitive abilities. The Simple View of Reading and other models provide theoretical frameworks for understanding the interplay between decoding skills, language comprehension, and other cognitive factors in reading. However, the existing literature lacks a comprehensive account of the multivariate genetic covariance structure across these domains. Most studies rely on twin designs, which raise concerns about the equal environments assumption and about how well twin samples represent the general population. The authors therefore emphasize the importance of independent methodologies, such as analyses of unrelated individuals with genome-wide SNP data, for validating findings from twin studies.
Methodology
The study utilizes data from the Avon Longitudinal Study of Parents and Children (ALSPAC), a UK population-based birth cohort. The sample included 6453 unrelated children and adolescents aged 8-13 years. Eleven measures were collected, covering reading fluency (non-word and word reading speed and accuracy, passage reading speed and accuracy), spelling accuracy, phonemic awareness, listening comprehension, and non-word repetition (PWM). Due to the computational demands of multivariate genetic variance analyses, a two-stage approach was used. Stage 1 involved fitting smaller multivariate models focusing on literacy measures to identify proxy measures of reading fluency and spelling. Stage 2 incorporated these proxies along with phonemic awareness, listening comprehension, and non-word repetition into multivariate genetic models. Three GSEM submodels (saturated Cholesky decomposition, independent pathway, and IPC) were compared for each multivariate model (except for spelling, which used a Cholesky model) based on AIC, BIC, and likelihood ratio tests. Genetic relationship matrices (GRMs) were constructed using genome-wide SNP data. Phenotype scores were adjusted for age, sex, and population stratification before rank transformation to improve model fit. GSEM models were used to estimate SNP-based heritability (SNP-h²), genetic correlations (rg), factorial co-heritabilities, and bivariate heritabilities. Sensitivity analyses were performed using GCTA software for univariate and bivariate GREML analyses to validate GSEM results.
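Two of the preparation steps above, building a GRM from SNP data and adjusting phenotype scores before rank transformation, can be sketched as follows. This is a minimal illustration on simulated data, not the study's pipeline: the function names, the allele-frequency standardisation, the use of ordinary least squares for covariate adjustment, and the Blom offset in the rank transform are all assumptions, and the covariates here omit the population-stratification terms the study adjusted for.

```python
import numpy as np
from statistics import NormalDist


def genetic_relationship_matrix(genotypes):
    """GRM from an (individuals x SNPs) array of 0/1/2 allele counts.

    Assumed standardisation: A = Z Z^T / M, with each SNP column centred by
    2p and scaled by sqrt(2p(1-p)), where p is the sample allele frequency.
    Monomorphic SNPs are dropped to avoid division by zero.
    """
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0
    keep = (p > 0.0) & (p < 1.0)
    g, p = g[:, keep], p[keep]
    z = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    return z @ z.T / z.shape[1]


def adjust_and_rank_transform(score, covariates):
    """Residualise a score on covariates, then map ranks to normal scores."""
    y = np.asarray(score, dtype=float)
    x = np.column_stack([np.ones(len(y)), np.asarray(covariates, dtype=float)])
    beta, *_ = np.linalg.lstsq(x, y, rcond=None)
    resid = y - x @ beta
    ranks = resid.argsort().argsort() + 1        # 1..n, ties broken arbitrarily
    quantiles = (ranks - 0.375) / (len(y) + 0.25)  # Blom offset (assumption)
    inv_cdf = NormalDist().inv_cdf
    return np.array([inv_cdf(q) for q in quantiles])


# Simulated example data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
n, m = 200, 500
freqs = rng.uniform(0.1, 0.9, size=m)
geno = rng.binomial(2, freqs, size=(n, m))
grm = genetic_relationship_matrix(geno)

age = rng.uniform(8, 13, size=n)
sex = rng.integers(0, 2, size=n)
raw_score = 2.0 * age + rng.normal(size=n)
z_score = adjust_and_rank_transform(raw_score, np.column_stack([age, sex]))
```

The resulting `grm` and `z_score` are the kinds of inputs that GRM-based variance-component methods take; fitting the models themselves requires dedicated software and is not shown.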
Key Findings
Univariate analyses confirmed moderate heritability for all five traits (reading fluency, spelling, phonemic awareness, listening comprehension, and non-word repetition), with SNP-h² ranging from 30% to 50%. All traits were phenotypically and genetically correlated. The multi-domain analyses (using IPC models) revealed a major shared genetic factor across all five traits, explaining 13-45% of the phenotypic variance of each trait and a high proportion of their SNP-h². This highlights extensive pleiotropy. However, substantial trait-specific genetic variance was also evident, particularly for listening comprehension and non-word repetition, indicating unique genetic influences beyond the shared factor. Notably, strong genetic correlations between oral language (listening comprehension) and literacy/phonological awareness contrasted sharply with near-zero residual correlations. This suggests that while these abilities share a substantial genetic basis, the environmental factors shaping them are distinct. The results were robust to the choice of reading fluency proxy measure (passage reading accuracy or word reading speed). Sensitivity analyses confirmed that bivariate genetic correlations estimated with GSEM and GCTA were highly consistent, and multivariate SNP-h² estimates were consistent with univariate estimates.
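To make the reported quantities concrete, the sketch below derives SNP-h², genetic and residual correlations, the phenotypic variance explained by a shared factor, and factorial co-heritabilities from a set of factor loadings. The loadings are made-up numbers for three hypothetical traits, not estimates from the study, and the variable names are illustrative. Factorial co-heritability is taken here to mean the share of a trait's genetic variance attributable to one genetic factor, which is an assumed reading of the term.

```python
import numpy as np

# Hypothetical loadings: rows are traits, columns are factors.
# Lower-triangular layout, as in a Cholesky decomposition.
# These values are invented for illustration only.
lambda_a = np.array([
    [0.60, 0.00, 0.00],
    [0.50, 0.30, 0.00],
    [0.40, 0.10, 0.40],
])
lambda_e = np.array([
    [0.80, 0.00, 0.00],
    [0.30, 0.75, 0.00],
    [0.00, 0.20, 0.80],
])

sigma_a = lambda_a @ lambda_a.T      # genetic covariance matrix
sigma_e = lambda_e @ lambda_e.T      # residual covariance matrix
sigma_p = sigma_a + sigma_e          # phenotypic covariance matrix

var_a = np.diag(sigma_a)
var_e = np.diag(sigma_e)
var_p = np.diag(sigma_p)

snp_h2 = var_a / var_p                                    # per-trait heritability
r_g = sigma_a / np.sqrt(np.outer(var_a, var_a))           # genetic correlations
r_e = sigma_e / np.sqrt(np.outer(var_e, var_e))           # residual correlations

# Share of each trait's phenotypic variance explained by the first (shared) factor.
shared_factor_share = lambda_a[:, 0] ** 2 / var_p

# Share of each trait's genetic variance attributable to each genetic factor.
factorial_coheritability = lambda_a ** 2 / var_a[:, None]

print("SNP-h2:", np.round(snp_h2, 3))
print("r_g:\n", np.round(r_g, 3))
print("r_e:\n", np.round(r_e, 3))
print("Shared-factor share of phenotypic variance:", np.round(shared_factor_share, 3))
print("Factorial co-heritabilities:\n", np.round(factorial_coheritability, 3))
```

In this toy example the first and third traits have a sizeable genetic correlation but a residual correlation of exactly zero, which mirrors in miniature the kind of discordance described above.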
Discussion
The findings strongly support the presence of a major pleiotropic genetic factor influencing literacy, phonological awareness, oral language, and PWM. This is consistent with the concept of 'generalist genes' impacting broad cognitive functions. The shared genetic variance between oral language and literacy is noteworthy, extending previous twin study findings. The significant trait-specific genetic variance, particularly for oral language and PWM, indicates the presence of unique genetic influences on these abilities. The marked difference between genetic and residual correlations for oral language and literacy/phonological awareness suggests distinct environmental factors influencing these domains, pointing to different levels of trait modifiability. These findings emphasize the complexity of cognitive development and highlight the importance of considering both shared and specific genetic influences when designing educational interventions.
Conclusion
This study demonstrates the extensive shared genetic basis for literacy, phonological awareness, oral language, and PWM, while also revealing substantial trait-specific genetic effects. The discordance between genetic and residual correlations for oral language and literacy suggests differences in environmental influences and trait modifiability. Future research should investigate the specific genes involved and how they interact with environmental factors to shape these complex skills. Longitudinal studies assessing these abilities across different developmental stages are also crucial to fully understand the complex interplay of genetics and environment.
Limitations
The study's cross-sectional design limits the ability to infer causal relationships and track developmental changes. The reliance on a single cohort (ALSPAC) reduces the generalizability of the findings. Population-level effects like assortative mating might inflate genetic correlations. The computationally intensive nature of the analysis also limited the ability to use a larger, combined model of all 11 measures simultaneously; a two-stage approach was used instead.