The role of rare non-coding variation in complex human phenotypes remains largely unknown. Previous studies have primarily focused on common variants using genotyping arrays or on rare coding variants using exome sequencing. However, the vast majority of inherited human genetic variation is both rare and located in non-coding regions. Whole-genome sequencing (WGS) offers the potential to identify rare non-coding variants associated with complex traits, which could reveal new gene regulatory mechanisms and enhance our understanding of human biology and disease. While WGS has successfully identified rare non-coding causes of monogenic diseases, few studies have used it to investigate the role of rare non-coding variation in complex phenotypes. This study aims to identify novel rare variant associations with height, a model complex trait, using large-scale WGS data from the UK Biobank, All of Us, and TOPMed cohorts.
Literature Review
Genome-wide association studies (GWAS) using genotyping arrays have successfully identified common variants associated with complex traits like height, explaining a large proportion of the common variant heritability. However, the identification of rarer variation, potentially with larger effects, has been limited. Exome sequencing studies have identified rare coding variants associated with specific diseases, such as loss-of-function variants in *GIGYF1* associated with diabetes. Recent studies using WGS data from TOPMed have shown suggestive associations with rare variants in non-coding regions influencing lipid levels and blood pressure. This study builds upon these findings by investigating rare non-coding variants' contribution to height variation on a much larger scale.
Methodology
This study used WGS data from three datasets: UK Biobank (N = 200,003), TOPMed (N = 87,652), and All of Us (N = 45,445). The analysis focused on single-variant and aggregate testing of rare (minor allele frequency < 0.1%) non-coding variants in regulatory regions. Variants were annotated using the Ensembl Variant Effect Predictor and categorized into gene-centric (coding and splicing; proximal regulatory) and non-gene-centric (intergenic and intronic) regions. Aggregate testing weighted variants using three published scores representing in silico predicted deleteriousness (CADD), conservation (GERP), and non-coding constraint (JARVIS). Both single-variant and genomic aggregate association testing used REGENIE: single-variant tests were conditioned on previously reported height variants, and aggregate tests employed burden, SKAT, and ACAT tests. Replication analyses were conducted in the TOPMed and All of Us cohorts. Statistical significance thresholds were determined using simulations.
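As a rough illustration of the aggregate testing described above, the following Python sketch shows two of the underlying ideas: a weighted burden score restricted to rare variants, and the Cauchy combination used by ACAT to merge several p-values into one. It is not REGENIE's implementation and not the study's pipeline. The function names, the genotype layout, and the toy weights (standing in for scores such as CADD, GERP, or JARVIS) are assumptions made for illustration only.

```python
import math

MAF_THRESHOLD = 0.001  # 0.1% minor allele frequency, the cutoff quoted in the text


def minor_allele_frequency(genotypes):
    """Frequency of the counted allele for one variant.

    Assumption: each entry is the minor-allele count (0, 1, or 2)
    of one diploid individual.
    """
    return sum(genotypes) / (2 * len(genotypes))


def burden_scores(genotype_matrix, weights, maf_threshold=MAF_THRESHOLD):
    """Weighted count of rare alleles per individual.

    genotype_matrix: list of variants, each a list of per-individual counts.
    weights: one hypothetical annotation weight per variant.
    Variants at or above the MAF threshold are skipped.
    """
    n_individuals = len(genotype_matrix[0])
    scores = [0.0] * n_individuals
    for genotypes, weight in zip(genotype_matrix, weights):
        if minor_allele_frequency(genotypes) >= maf_threshold:
            continue  # not rare: excluded from the aggregate
        for i, count in enumerate(genotypes):
            scores[i] += weight * count
    return scores


def acat_combine(p_values, weights=None):
    """Cauchy combination of p-values (the transformation behind ACAT).

    Each p-value must lie strictly between 0 and 1.
    """
    if weights is None:
        weights = [1.0] * len(p_values)
    total_weight = sum(weights)
    # Map each p-value to a standard Cauchy variate, then take the weighted mean.
    statistic = sum(
        w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, p_values)
    ) / total_weight
    # Convert the combined statistic back to a p-value via the Cauchy tail.
    return 0.5 - math.atan(statistic) / math.pi
```

In a real analysis the burden score would then be regressed against the phenotype with covariates and relatedness adjustment, which this sketch leaves out. Note that the Cauchy combination is dominated by the smallest p-values, which is why it suits regions where only a few variants carry the signal.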
Key Findings
The study identified 29 independent rare and low-frequency single variants associated with height (P<6×10<sup>-10</sup>) after conditioning on previously reported variants. These variants had effect sizes ranging from -7.25 cm to +4.71 cm. Three of these variants showed robust replication in TOPMed and All of Us. Additionally, the study identified and replicated three rare non-coding regions associated with height based on aggregate tests. These regions included those proximal to *HMGA1*, *C17orf49* (overlapping *MIR497HG*), and *GH1*. The *HMGA1* region showed multiple rare variants forming an allelic series with substantial effects on height, including a variant altering the transcription start site. The *C17orf49* aggregate association was primarily driven by variants in *MIR195* and *MIR497*. The *GH1* aggregate association included a rare variant previously reported in clinical cohorts for idiopathic short stature. The analysis also revealed a 47,543 bp structural deletion downstream of *SHOX*, associated with lower height and previously reported in clinical cohorts with Leri-Weill dyschondrosteosis.
Discussion
This study's findings demonstrate the importance of rare non-coding variation in influencing complex traits like height. The identification of multiple independent variants and aggregate associations highlights the polygenic nature of height and the contribution of non-coding regulatory elements. The replication of findings across different cohorts strengthens the validity of these associations. The results underscore the potential of WGS to uncover novel genetic mechanisms underlying complex traits. The identified genes and regulatory regions provide valuable insights into the biological pathways involved in human growth, opening avenues for future research into the genetic basis of growth disorders and related conditions.
Conclusion
This large-scale WGS analysis revealed novel non-coding single variants and genomic aggregate loci associated with human height. The study's approach provides a valuable template for future rare-variant analyses of other complex phenotypes. Future studies with larger sample sizes and improved functional annotation will further illuminate the genetic architecture of height and other complex traits.
Limitations
The study's sample size, while large, might still be limited for detecting extremely rare variants. The analysis primarily focused on individuals of European ancestry, limiting the generalizability of the findings to other populations. The lack of high-quality tissue-based functional data for non-coding regions hinders a complete understanding of the functional consequences of the identified variants.