Introduction
Mendelian randomization (MR) uses genetic variants as instrumental variables (IVs) to infer causal relationships between exposures and outcomes. However, MR's core assumptions (relevance, exchangeability, and exclusion restriction) can be violated by pleiotropy (one genetic variant affecting multiple traits) and confounding, and its estimates are further complicated by heterogeneous causal effects. Horizontal pleiotropy in particular, where a genetic variant affects the outcome through pathways other than the exposure, biases estimates. Confounding, such as parental traits influencing both offspring BMI and education, similarly violates MR assumptions. Heterogeneous causal effects arise when different exposure subtypes or biological pathways affect the outcome differently. Existing MR methods often treat these as separate issues. This paper introduces PheWAS-driven clustering of instrumental variables (PWC-MR), an approach designed to investigate these biases simultaneously by grouping IVs according to their pleiotropic effects. The study focuses on the relationship between BMI and educational attainment (EDU), a complex association previously shown to be potentially confounded by parental factors. By clustering BMI-associated genetic instruments according to their associations with a wide range of traits (a phenome-wide association study, PheWAS), the researchers aim to uncover distinct mechanisms underlying the observed BMI-EDU relationship and improve the accuracy of causal effect estimation.
Literature Review
Genome-wide association studies (GWAS) have identified numerous genetic variants associated with multiple complex phenotypes, facilitating the annotation of SNPs and their functions, as well as the identification of putative causal genes. As GWAS sample sizes increase, more SNP associations are revealed, improving downstream analyses including polygenic score prediction and causal inference using Mendelian Randomization (MR). MR uses genetic variants as instrumental variables (IVs) to estimate causal effects, offering protection against unmeasured confounding and reverse causality. However, pleiotropy, where a genetic variant affects multiple traits, is a major challenge in MR. Phenome-wide association studies (PheWAS) demonstrate the widespread pleiotropy of genetic instruments. Vertical pleiotropy, where a primary associated trait mediates all other associations, is less problematic than horizontal pleiotropy, which violates the exclusion restriction assumption. Methods like MR-Egger attempt to mitigate bias from horizontal pleiotropy under certain assumptions. Heritable confounders can introduce correlated pleiotropy, significantly biasing causal estimates. Family-based designs, like sibling-pair studies, offer a solution by mitigating biases from population stratification and other confounders. Heterogeneous causal effects, arising from distinct biological mechanisms or exposure subtypes, further complicate MR analyses. Existing methods largely treat horizontal pleiotropy, confounding, and heterogeneous effects as separate issues. The authors highlight the importance of addressing these challenges simultaneously to obtain accurate causal effect estimates.
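To make the contrast between a standard estimate and MR-Egger concrete, the sketch below compares a fixed-effect inverse-variance weighted (IVW) estimate with an MR-Egger regression on toy summary statistics. It is an illustration only, not the authors' code: the function names `ivw` and `mr_egger` and all numbers are hypothetical, and the toy data are built with a true slope of -0.1 plus a constant pleiotropic offset of 0.02 on every SNP-outcome effect.

```python
import math


def ivw(bx, by, se_y):
    """Fixed-effect IVW: weighted regression of by on bx through the origin."""
    w = [1.0 / s ** 2 for s in se_y]
    den = sum(wi * x * x for wi, x in zip(w, bx))
    est = sum(wi * x * y for wi, x, y in zip(w, bx, by)) / den
    return est, math.sqrt(1.0 / den)


def mr_egger(bx, by, se_y):
    """MR-Egger: the same weighted regression, but with a free intercept.

    SNPs are first oriented so that every SNP-exposure effect is positive.
    The intercept captures average directional pleiotropy; the slope is the
    causal estimate (valid only under MR-Egger's own assumptions).
    """
    sign = [1.0 if x >= 0 else -1.0 for x in bx]
    x = [s * v for s, v in zip(sign, bx)]
    y = [s * v for s, v in zip(sign, by)]
    w = [1.0 / s ** 2 for s in se_y]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Toy summary statistics (hypothetical): by = -0.1 * bx + 0.02
bx = [0.10, 0.20, 0.30, 0.40]
by = [-0.1 * x + 0.02 for x in bx]
se_y = [0.01] * len(bx)

ivw_est, ivw_se = ivw(bx, by, se_y)
egger_intercept, egger_slope = mr_egger(bx, by, se_y)
print(f"IVW estimate:   {ivw_est:.3f}")
print(f"Egger slope:    {egger_slope:.3f}")
print(f"Egger intercept:{egger_intercept:.3f}")
```

On these toy inputs the Egger slope recovers the constructed -0.1 and the intercept absorbs the 0.02 offset, whereas the IVW estimate is pulled toward zero because it forces the regression line through the origin.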
Methodology
The PWC-MR approach involves three main steps. (1) Instrumental variable (IV) selection and PheWAS: the study selected 324 genome-wide significant SNPs associated with BMI in the UK Biobank and ran a PheWAS of these SNPs across the 407 traits that remained after filtering on sample size and genetic correlation with BMI. (2) K-means clustering: the standardised effect matrix from the PheWAS was clustered with K-means, with the number of clusters chosen using the AIC score; six clusters were identified. (3) Enrichment analysis and cluster-specific MR: enrichment ratios were calculated to identify the traits most strongly associated with each cluster, and MR was then run separately on the IVs in each cluster to obtain cluster-specific estimates of the causal effect of BMI on educational attainment. Several sensitivity analyses complemented these steps. Within-sibling MR estimates from Howe et al. (2022) were used to assess robustness to confounding. Childhood BMI, a measure less prone to confounding, was also used as the exposure. In addition, educational attainment was replaced as the outcome by systolic blood pressure (SBP), a phenotype less likely to be affected by confounding. A systematic confounder search used bidirectional MR to identify traits influencing both BMI and educational attainment, and a stepwise multivariable MR (MVMR) then assessed the impact of these potential confounders on the BMI-EDU causal effect. Finally, a tissue-specific colocalisation analysis (using eQTL data from adipose and brain tissue) explored potential tissue-specific mechanisms underlying the different clusters.
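The clustering step can be sketched as follows. This is a minimal, standard-library illustration under stated assumptions, not the authors' implementation: the AIC is taken to be the within-cluster residual sum of squares plus 2·k·d (one common formulation for K-means; the summary does not give the exact formula used), the initialisation is a deterministic farthest-point rule chosen for reproducibility, and the 12 × 2 `effects` matrix is toy data standing in for the 324 × 407 standardised SNP-by-trait matrix. The names `kmeans` and `choose_k_by_aic` are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(rows, k, n_iter=100):
    """Lloyd's algorithm with deterministic farthest-point initialisation."""
    centres = [rows[0]]
    while len(centres) < k:
        # Next centre: the row farthest from all centres chosen so far.
        centres.append(max(rows, key=lambda r: min(sq_dist(r, c) for c in centres)))
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(r, centres[j])) for r in rows]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old centre if a cluster empties
                centres[j] = tuple(sum(col) / len(members) for col in zip(*members))
    rss = sum(sq_dist(r, centres[lab]) for r, lab in zip(rows, labels))
    return labels, rss


def choose_k_by_aic(rows, k_max):
    """Fit K-means for k = 1..k_max and keep the k with the lowest AIC.

    Assumed formulation: AIC = RSS + 2 * k * d, with d the number of traits.
    """
    d = len(rows[0])
    aic, fits = {}, {}
    for k in range(1, k_max + 1):
        labels, rss = kmeans(rows, k)
        aic[k] = rss + 2 * k * d
        fits[k] = labels
    best_k = min(aic, key=aic.get)
    return best_k, aic, fits[best_k]


# Toy "SNP x trait" matrix (hypothetical): three well-separated groups of 4 SNPs.
effects = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
    (0.0, 10.0), (0.0, 11.0), (1.0, 10.0), (1.0, 11.0),
]
best_k, aic, labels = choose_k_by_aic(effects, k_max=6)
print("chosen number of clusters:", best_k)
print("cluster label per SNP:", labels)
```

The penalty term is what stops the procedure from always preferring more clusters: the residual sum of squares can only shrink as k grows, so without it the largest k would always win.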
Key Findings
The PWC-MR analysis revealed six distinct clusters of BMI-associated SNPs, each with its own pattern of trait enrichment. Cluster 2 was enriched for lean mass traits and yielded a small negative causal effect estimate of BMI on education (-0.09), consistent with within-family studies. Cluster 4, strongly enriched for socioeconomic position (SEP) indicators, yielded a substantially larger negative estimate (-0.49). Sensitivity analyses supported these findings. Within-sibling MR gave a small negative effect (-0.05), consistent with Cluster 2. Childhood BMI showed a negligible effect (-0.03), with homogeneous causal estimates across clusters, suggesting less confounding. Replacing the outcome with SBP also gave homogeneous causal effects across clusters, suggesting that the heterogeneity observed for educational attainment is due to confounding rather than true biological heterogeneity. A systematic confounder search identified 19 candidate confounders. A stepwise MVMR restricted to the candidates with a significant effect on EDU substantially attenuated the estimated causal effect of BMI on education (to -0.045), in line with the Cluster 2 estimate. Comparison with MR-Clust, another IV clustering method, showed some overlap but also important differences, highlighting the value of using external data (PheWAS) to define clusters. The colocalization analysis did not reveal a strong association between clusters and tissue-specific gene expression, potentially due to limitations in the data.
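How cluster-specific estimates and their heterogeneity might be computed can be sketched as below. This is an illustration under assumptions, not the study's pipeline: it uses a fixed-effect IVW estimator within each cluster and Cochran's Q across the cluster estimates as one possible heterogeneity statistic (the summary does not say which statistic was used). The cluster labels, SNP effects, and function names are all hypothetical toy values; they are not the study's data.

```python
import math
from collections import defaultdict


def ivw(bx, by, se_y):
    """Fixed-effect inverse-variance weighted estimate and its standard error."""
    w = [1.0 / s ** 2 for s in se_y]
    den = sum(wi * x * x for wi, x in zip(w, bx))
    est = sum(wi * x * y for wi, x, y in zip(w, bx, by)) / den
    return est, math.sqrt(1.0 / den)


def cluster_estimates(snps):
    """Run IVW separately on the SNPs assigned to each cluster."""
    groups = defaultdict(list)
    for label, bx, by, se in snps:
        groups[label].append((bx, by, se))
    # zip(*rows) turns a list of (bx, by, se) rows into three columns.
    return {label: ivw(*zip(*rows)) for label, rows in groups.items()}


def cochran_q(estimates):
    """Cochran's Q for heterogeneity across cluster-specific estimates."""
    w = {k: 1.0 / se ** 2 for k, (_, se) in estimates.items()}
    pooled = sum(w[k] * estimates[k][0] for k in estimates) / sum(w.values())
    q = sum(w[k] * (estimates[k][0] - pooled) ** 2 for k in estimates)
    return q, len(estimates) - 1  # statistic and degrees of freedom


# Toy rows (hypothetical): (cluster label, SNP-exposure, SNP-outcome, outcome SE)
snps = [
    ("A", 0.10, -0.010, 0.01), ("A", 0.20, -0.020, 0.01), ("A", 0.30, -0.030, 0.01),
    ("B", 0.10, -0.050, 0.01), ("B", 0.20, -0.100, 0.01), ("B", 0.30, -0.150, 0.01),
]
estimates = cluster_estimates(snps)
q_stat, q_df = cochran_q(estimates)
for label, (est, se) in sorted(estimates.items()):
    print(f"cluster {label}: estimate {est:.3f} (SE {se:.3f})")
print(f"Cochran's Q = {q_stat:.1f} on {q_df} df")
```

A large Q relative to its degrees of freedom indicates that the clusters disagree about the causal effect, which is the pattern reported for educational attainment; roughly equal cluster estimates, as reported for SBP, would give a small Q.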
Discussion
The study demonstrates that the conventional MR estimate of the BMI-EDU causal effect is inflated in magnitude (biased away from zero) by confounding. The PWC-MR approach identified a lean-mass-related cluster (Cluster 2) that provides a more plausible, attenuated causal effect estimate. The large negative estimate in Cluster 4 is likely driven by IVs associated with SEP, which acts as a confounder of the BMI-EDU relationship. The sensitivity analyses strengthened these findings: within-sibling MR results mirrored the Cluster 2 estimate, childhood BMI analyses showed negligible effects, and the SBP analysis showed homogeneous effects across clusters. The systematic confounder search and subsequent MVMR further supported the confounding explanation. These findings highlight the importance of considering potential confounding and heterogeneity when interpreting MR results.
Conclusion
The PWC-MR method offers a valuable tool for improving the accuracy and interpretation of MR analyses by explicitly accounting for pleiotropy and confounding. The study revealed a more accurate, attenuated estimate of the BMI-EDU causal relationship, highlighting the importance of addressing confounding and heterogeneity in causal inference. Future research could explore the application of PWC-MR to other exposure-outcome pairs and investigate potential improvements to the clustering algorithms or confounder identification steps. Further investigation into the specific genetic mechanisms underlying each cluster could provide deeper biological insights.
Limitations
The study's limitations include the restricted set of traits for which PheWAS data were available and the treatment of binary traits as continuous. The genetic correlation threshold used to filter traits is arbitrary, as are, to some extent, several p-value thresholds and the choice of specific MR methods. The identified confounders are probably proxies for the true confounders, which are more likely earlier versions of these same traits. It can be difficult to decide which cluster provides the most reliable causal estimate, especially as the number of clusters increases. The colocalization analysis may have been limited by high false negative rates and small eQTL sample sizes.