Introduction
The *PAX6* gene encodes a highly conserved transcription factor that plays a vital role in eye development across species, including humans. Genetic variants in *PAX6* are associated with a spectrum of ophthalmic disorders. The most prevalent is aniridia, which results from *PAX6* haploinsufficiency caused by heterozygous loss-of-function variants. Missense variants are generally linked to milder phenotypes but have also been implicated in more severe conditions such as microphthalmia and anophthalmia. Accurately predicting the effect of these missense variants remains challenging, and a substantial proportion are classified as variants of uncertain significance (VUS) under established criteria such as the ACMG/AMP guidelines. Computational tools offer a valuable way to assess variant pathogenicity, using diverse algorithms that consider evolutionary conservation, protein structure, and other features; some (meta-predictors) integrate the outputs of multiple tools to enhance predictive power. However, previous studies have shown that the predictive accuracy of these tools varies across genes, highlighting the need for gene-specific optimization. This study addresses the lack of evaluation and optimization of computational tools for *PAX6* missense variants, aiming to improve the reliability and accuracy of variant classification.
Literature Review
The literature extensively documents the importance of *PAX6* in eye development and the diverse ophthalmic disorders associated with its genetic variants. Studies of genotype-phenotype correlations for *PAX6* mutations have observed ocular anomalies ranging from mild to severe. The difficulty of classifying missense variants as pathogenic or benign is well established, with a significant proportion remaining VUS. *In silico* prediction tools for variant interpretation have gained traction; they employ different algorithms and incorporate features such as evolutionary conservation and protein structure. However, the accuracy of these tools varies considerably across genes, prompting research into gene-specific threshold optimization to improve their performance. Previous studies have demonstrated the success of this approach in other genes, suggesting its potential applicability to *PAX6*. The lack of a dedicated study on *PAX6* variant prediction motivated the current research.
Methodology
This study comprised a primary and a secondary analysis. For the primary analysis, *PAX6* missense variants were collected from publicly available databases, namely gnomAD (versions 2.1.1 and 3.1.1), LOVD (versions 2.0 and 3.0), HGMD, and ClinVar (all accessed in February 2023), and from a PubMed literature search (2021-2023). Variants classified as VUS or flagged as of questionable pathogenicity (DM?) were excluded. The remaining variants were categorized into "Primary Dataset Disease" (pathogenic variants) and "Primary Dataset Neutral" (benign variants).
Ten commonly used computational tools (AlphaMissense, BayesDel, CADD, ClinPred, Eigen, MutPred2, PolyPhen-2, REVEL, SIFT4G, and VEST4) were evaluated on these datasets. The dbNSFP resource (version 4.1) provided the pathogenicity scores, with AlphaMissense scores obtained separately. Each tool's default threshold was first used to classify variants as "predicted pathogenic" or "predicted benign". Performance was assessed using sensitivity, specificity, accuracy, precision (positive predictive value, PPV), and the Matthews correlation coefficient (MCC). Receiver operating characteristic (ROC) curves were then used to determine, for each tool, the gene-specific threshold that maximized the MCC. The study also explored combining the top three tools using a majority rule.
The secondary analysis validated the findings on an independent set: variants classified as "likely pathogenic" or "pathogenic" in a local database from the Manchester Centre for Genomic Medicine (MCGM) ("Secondary Dataset Disease") and presumed benign variants from the BRAVO database ("Secondary Dataset Neutral"). Fivefold cross-validation was used to assess the robustness of AlphaMissense's performance. Finally, a set of VUS from the MCGM database was analyzed to further evaluate the tools' predictive capabilities. The distribution of variants along the PAX6 protein sequence was visualized using a lolliplot generated by cBioPortal (version 5.4.5).
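The evaluation and threshold-optimization steps described above can be illustrated with a minimal Python sketch. It is not the study's code: the function names, the `higher_is_pathogenic` flag (added because some tools, such as SIFT4G, assign lower scores to more damaging variants), and the toy scores and labels are all assumptions made for illustration. Instead of reading the cutoff off an ROC curve, the sketch simply tries every observed score as a candidate threshold and keeps the one with the highest MCC.

```python
from math import sqrt


def confusion_counts(scores, labels, threshold, higher_is_pathogenic=True):
    """Return (tp, fp, tn, fn) for one cutoff; labels: True = disease, False = neutral."""
    tp = fp = tn = fn = 0
    for score, is_disease in zip(scores, labels):
        if higher_is_pathogenic:
            predicted_pathogenic = score >= threshold
        else:
            predicted_pathogenic = score <= threshold
        if predicted_pathogenic and is_disease:
            tp += 1
        elif predicted_pathogenic:
            fp += 1
        elif is_disease:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, accuracy, precision (PPV), and MCC from the four counts."""
    def ratio(numerator, denominator):
        # Report 0.0 when a metric is undefined (zero denominator).
        return numerator / denominator if denominator else 0.0

    mcc_denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": ratio(tp, tp + fp),
        "mcc": ratio(tp * tn - fp * fn, mcc_denominator),
    }


def best_threshold(scores, labels, higher_is_pathogenic=True):
    """Try each observed score as a cutoff and return (cutoff, metrics) with the highest MCC."""
    best = None
    for candidate in sorted(set(scores)):
        counts = confusion_counts(scores, labels, candidate, higher_is_pathogenic)
        result = metrics(*counts)
        if best is None or result["mcc"] > best[1]["mcc"]:
            best = (candidate, result)
    return best


# Toy data only (NOT from the study): ten made-up scores, higher = more pathogenic.
toy_scores = [0.05, 0.10, 0.20, 0.55, 0.45, 0.60, 0.70, 0.85, 0.90, 0.95]
toy_labels = [False, False, False, False, True, True, True, True, True, True]

cutoff, result = best_threshold(toy_scores, toy_labels)
print(f"optimized threshold: {cutoff}")
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

For a tool whose scores run in the opposite direction, the same functions can be called with `higher_is_pathogenic=False`. A cutoff tuned this way is fitted to the dataset it was scanned on, which is why a held-out check such as the study's cross-validation and secondary dataset matters.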
Key Findings
The primary analysis included 241 *PAX6* missense variants (167 disease, 74 neutral). Using default thresholds, SIFT4G, AlphaMissense, and MutPred2 showed the highest MCC scores. Threshold optimization significantly improved performance, particularly specificity, across all tools. After optimization, AlphaMissense achieved the highest MCC (0.81), followed by SIFT4G and REVEL. Combining the top three tools did not outperform AlphaMissense alone. Fivefold cross-validation confirmed AlphaMissense's robustness. The secondary analysis (17 variants from MCGM and 65 from BRAVO) corroborated the superior performance of AlphaMissense and SIFT4G with optimized thresholds. Analysis of seven VUS yielded consistent pathogenic predictions for six, but one variant (PAX6 c.926T>G, p.(Phe309Cys)) showed discordant predictions, with AlphaMissense and SIFT4G predicting it to be benign. Variants deemed pathogenic tended to cluster near the paired domain (PD) and homeodomain (HD) of PAX6, while benign variants were found outside these domains. VUS showed no clear clustering pattern.
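The majority-rule combination of three tools mentioned above can be sketched as follows. This is an illustrative assumption about how such a vote might be implemented, not the study's code; the helper names are hypothetical, and the cutoffs in `example_rules` are placeholders, not the optimized *PAX6* thresholds, which this summary does not report.

```python
def call_pathogenic(score, threshold, higher_is_pathogenic=True):
    """Turn one tool's score into a True/False pathogenic call."""
    if higher_is_pathogenic:
        return score >= threshold
    return score <= threshold


def majority_vote(tool_scores, tool_rules):
    """Return True when more than half of the tools call the variant pathogenic.

    tool_scores: {tool_name: score} for one variant.
    tool_rules:  {tool_name: (threshold, higher_is_pathogenic)}.
    """
    votes = sum(
        call_pathogenic(tool_scores[tool], *tool_rules[tool])
        for tool in tool_rules
    )
    return votes * 2 > len(tool_rules)


# Placeholder cutoffs for illustration only; SIFT4G is treated as
# "lower score = more damaging", the other two as "higher = more damaging".
example_rules = {
    "AlphaMissense": (0.5, True),
    "SIFT4G": (0.05, False),
    "REVEL": (0.5, True),
}

# Made-up scores for a single hypothetical variant.
example_variant = {"AlphaMissense": 0.90, "SIFT4G": 0.01, "REVEL": 0.20}
print(majority_vote(example_variant, example_rules))
```

With three tools, the vote needs at least two agreeing calls, so one strong predictor can be overruled by two weaker ones. That is one plausible reason a combination might fail to beat its best member.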
Discussion
This study demonstrates that optimizing gene-specific thresholds for computational tools significantly enhances their accuracy in predicting the pathogenicity of *PAX6* missense variants. With default thresholds, most tools identify pathogenic variants well, but their specificity is limited. AlphaMissense's superior performance, especially after threshold optimization, highlights the benefits of deep learning models that incorporate protein structure information. The high conservation of *PAX6* makes tools that rely on evolutionary conservation, such as SIFT4G, effective. The lack of improvement from combining the top tools suggests that AlphaMissense alone is sufficient. The discordant prediction for PAX6 c.926T>G, p.(Phe309Cys), highlights the importance of considering factors beyond evolutionary conservation and emphasizes AlphaMissense's ability to identify functionally crucial sites. This study underscores the necessity of gene-specific threshold optimization to enhance the accuracy of computational variant interpretation, leading to more reliable clinical diagnoses.
Conclusion
This study provides valuable insights into optimizing computational tools for *PAX6* missense variant interpretation. Using gene-specific thresholds, particularly for AlphaMissense, significantly improves prediction accuracy. This approach can enhance clinical variant interpretation, facilitating more precise and timely diagnoses for individuals with *PAX6*-related disorders. Future studies should explore a broader range of computational methods, including tools that evaluate splicing and 3D protein structure, and advanced AI algorithms. Integrating multiple lines of evidence, including functional assays and segregation analysis, remains crucial for robust variant classification.
Limitations
The study's limitations include the relatively small number of presumed pathogenic variants, a consequence of the rarity of *PAX6*-related diseases. The possibility that some variants used in this study were part of the training datasets of some tools cannot be fully excluded. Some presumed neutral variants might be associated with undetected phenotypes or incomplete penetrance. In addition, the study did not cover all potential consequences of missense variants and did not combine conventional methods with approaches that analyze splicing or gene expression. Further research should address these limitations by incorporating larger datasets, exploring the impact of potential dataset contamination, and considering these other mechanisms of variant impact.