Introduction
Single nucleotide variations (SNVs) are a leading cause of genetic disease. CRISPR/Cas9 technology and base editors (BEs) offer promising therapeutic avenues, but building enough cell disease models carrying pathogenic SNVs for research and therapy development remains a significant challenge. Manual methods are time-consuming, expensive, and error-prone, and the large number of disease-associated SNVs makes a manual approach impractical at scale. In addition, existing AI models for predicting base editing performance often rely on integrated editing data, which lacks in situ information about the chromosomal environment even though that environment significantly affects editing efficiency. This research addresses both problems by developing an automated platform for high-throughput genome editing, coupled with a machine learning model that incorporates in situ editing data and chromatin accessibility to predict base editing efficiency. The aim is to make the generation of cell disease models for gene therapy research faster and more accurate.
Literature Review
CRISPR/Cas9 technology, particularly base editors (BEs), has shown great potential for treating genetic diseases by directly correcting base mutations without causing double-stranded DNA breaks. Three main types of base editors exist: cytosine base editors (CBEs), adenine base editors (ABEs), and glycosylase base editors (GBEs). While BEs offer a solution for many pathogenic SNVs, the construction of mammalian cell disease models for BE optimization and gene therapy applications remains a major bottleneck. Previous attempts to develop AI-based prediction models for base editing efficiency have utilized integrated editing data, neglecting the in situ chromosomal environment. However, studies have shown a strong correlation between editing efficiency and chromatin accessibility. Previous works have demonstrated that gene editing is more efficient in euchromatin than in heterochromatin and that nucleosomes directly impact Cas9 binding and cleavage. This study builds upon these findings by developing an automated platform to generate a large-scale in situ editing dataset, which is then used to train a more accurate predictive model that incorporates chromatin accessibility.
Methodology
The researchers developed an automated high-throughput platform comprising four modules: (1) computerized gRNA design targeting endogenous SNVs, (2) automated gRNA plasmid construction, (3) automated base editing in mammalian cells (HEK293T cells), and (4) machine learning model development for predicting CBE performance. Module 1 utilized the ClinVar database and bioinformatic analysis to select gRNAs targeting 1210 genes. Module 2 employed an acoustic liquid handler and automated colony picker to construct gRNA plasmids at a high throughput (576 plasmids/day). Module 3 used automated liquid handlers for cell seeding, transfection, medium exchange, and sample collection. The entire process, from gRNA design to editing result analysis, was automated. A total of 1210 sets of gRNAs and BE4max plasmids were co-transfected into HEK293T cells within 6 hours, with subsequent medium exchange in 2 hours. Edited cells were harvested for PCR analysis and Sanger sequencing after 5 days of cultivation. The CAELM machine learning model, using XGBoost Regressor, was trained using the large in situ dataset obtained from the automated platform. The model inputs included the 20 bp protospacer sequence and the DNA accessibility value, retrieved from ENCODE. Model performance was evaluated using Pearson's correlation coefficient. To expand the model's applicability, the researchers further trained CAELM on additional CBE types (Anc-BE4max and hyA3-BE4max) and cell lines (HepG2). The relative contribution of DNA accessibility versus sequence context to the editing prediction was also assessed using feature importance scores.
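The model inputs and evaluation metric described above can be illustrated with a minimal sketch. The summary does not state how CAELM encodes the protospacer, so the one-hot encoding here is an assumption, and `encode_guide` and `pearson_r` are hypothetical helper names rather than anything from the study. The resulting vectors are the kind of input a gradient-boosted regressor such as XGBoost could be trained on.

```python
import math

BASES = "ACGT"


def encode_guide(protospacer, accessibility):
    """Build one feature vector: one-hot 20 bp protospacer + accessibility.

    Assumption: one-hot encoding (4 values per base); the summary does not
    specify the encoding actually used.
    """
    seq = protospacer.upper()
    if len(seq) != 20 or any(base not in BASES for base in seq):
        raise ValueError("expected a 20 bp sequence over A/C/G/T")
    features = []
    for base in seq:
        # Four indicator values per position, in A, C, G, T order.
        features.extend(1.0 if base == b else 0.0 for b in BASES)
    # Append the DNA accessibility value as the final feature.
    features.append(float(accessibility))
    return features


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

Each guide becomes an 81-value vector (20 positions times 4 bases, plus accessibility), and `pearson_r` compares predicted against measured editing efficiencies in the same way the reported r values do.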
Key Findings
At 32 genomic loci, the automated platform achieved base editing efficiencies comparable to or higher than those of manual methods. In a large-scale experiment targeting 1210 disease-associated SNVs with BE4max, 823 targets showed 10-50% editing efficiency and 248 showed >50% efficiency. The CAELM model's predictions correlated with measured BE4max editing efficiencies in HEK293T cells (Pearson's r = 0.64), outperforming the BE-Hive prediction tool (r = 0.53). Extending CAELM to other CBEs and to HepG2 cells maintained good predictive accuracy (Pearson's r ranging from 0.42 to 0.87). Feature importance analysis showed that DNA sequence context was the major determinant of editing efficiency, while DNA accessibility made a smaller but measurable contribution (less than 16% relative contribution). Finally, the automated platform, combined with FACS cell sorting, generated 9 disease-associated SNV cell models, demonstrating that it can produce homogeneous cell populations for detailed study and for optimizing ABE-mediated correction of pathogenic SNVs. Correction efficiencies in these models reached up to 98%, further validating the automated system and the cell models it generates.
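As a quick check on these figures, the counts can be turned into shares of the 1210 targets. This sketch assumes the two reported bins do not overlap and that every remaining target fell below 10% efficiency, which the summary implies but does not state. `relative_contribution` shows one plausible way a "relative contribution" could be derived from per-feature importance scores; it is a hypothetical helper, not the study's method.

```python
TOTAL_TARGETS = 1210   # disease-associated SNVs targeted with BE4max
MID_EFFICIENCY = 823   # targets with 10-50% editing efficiency
HIGH_EFFICIENCY = 248  # targets with >50% editing efficiency


def summarize_bins(total, mid, high):
    """Return (count, percent of total) for each efficiency bin."""
    # Assumption: everything not in the two reported bins is below 10%.
    low = total - mid - high
    counts = {"<10%": low, "10-50%": mid, ">50%": high}
    return {
        name: (count, round(100 * count / total, 1))
        for name, count in counts.items()
    }


def relative_contribution(importances, group_indices):
    """Share of total feature importance held by one group of features."""
    total = sum(importances)
    if total <= 0:
        raise ValueError("importances must sum to a positive value")
    return sum(importances[i] for i in group_indices) / total


if __name__ == "__main__":
    bins = summarize_bins(TOTAL_TARGETS, MID_EFFICIENCY, HIGH_EFFICIENCY)
    for name, (count, percent) in bins.items():
        print(f"{name}: {count} targets ({percent}%)")
```

Under that assumption, roughly 68% of targets fall in the 10-50% bin, about 20% exceed 50%, and the remaining 139 targets (about 11%) fall below 10%.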
Discussion
This study developed a highly efficient automated high-throughput platform for in situ genome editing. The associated machine learning model (CAELM), trained on the large in situ dataset generated by the platform, incorporates both sequence context and chromatin accessibility and predicts base editing efficiency more accurately than existing models. The higher prediction accuracy is attributed to the use of in situ editing data, which avoids the biases of the integrated editing data used previously. The platform's capacity to generate large numbers of homogeneous cell lines carrying pathogenic SNVs within about a week significantly accelerates research in gene therapy and disease modeling. The adaptability of CAELM to various CBEs and cell types extends its usability and potential impact on the field. The quantitative assessment of the relative contributions of sequence context and chromatin accessibility provides valuable insights for future optimization of base editors and gRNA design strategies.
Conclusion
This research presents a significant advancement in genome editing technology. The automated platform provides a scalable and efficient method for generating cell disease models, overcoming the limitations of manual methods. The CAELM model offers a powerful tool for predicting base editing efficiency, considering both sequence context and chromatin accessibility. This integrated approach accelerates the development and optimization of base editor-based gene therapies. Future research could explore the integration of additional genomic features and expand the model's applicability to other genome editing tools and organisms.
Limitations
While the automated platform significantly improves efficiency, the success rate of editing varied across target loci. Some targets showed low editing efficiencies despite the automated process, highlighting the inherent complexities of base editing. Furthermore, the CAELM model's predictive accuracy could be improved with a larger and more diverse dataset encompassing various genomic contexts, base editors, and cell types. The current model focuses mainly on CBEs; additional data from ABEs and GBEs could improve its predictive capacity and generality. Finally, validation of the generated cell models was limited to a relatively small subset; a larger-scale validation would strengthen the study's conclusions.