Introduction
Base editors are powerful tools for precise genome editing, enabling programmable single-nucleotide conversions in mammalian genomes. These fusion proteins combine a Cas9 nickase (Cas9 D10A) with a nucleobase deaminase. Guided by a single-guide RNA (sgRNA), the Cas9 D10A-sgRNA complex binds to a target DNA sequence, displacing a short stretch of single-stranded DNA. The deaminase then modifies this exposed strand within a small window near the 5' end of the target sequence. Two main types exist: cytidine base editors (CBEs), which convert C:G to T:A, and adenine base editors (ABEs), which convert A:T to G:C. While base editors have been applied successfully in various organisms, a critical limitation is the induction of off-target mutations, particularly Cas9-dependent ones arising from tolerance of mismatches between the gRNA and the target sequence. Although tools exist for predicting on-target efficiency, accurate prediction of Cas9-dependent off-targets remains a significant challenge, and existing methods for experimentally evaluating off-targets are laborious and time-consuming, necessitating efficient in silico prediction tools. This research builds on previous high-throughput gRNA-target library screening for SpCas9 activity, extending the approach to develop predictive models specifically for ABE and CBE off-targets. The goal is a robust computational tool that predicts off-target editing efficiencies, thereby enhancing the safety and precision of base editing applications.
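To make the editing-window idea concrete, the short Python sketch below marks the cytosines a CBE would be expected to convert. The protospacer sequence and the positions 4-8 window are illustrative assumptions for demonstration only, since the exact window varies between editor variants.

```python
# Illustrative sketch only: flag cytosines inside a commonly cited CBE
# editing window (protospacer positions 4-8, counting the PAM-distal
# end as position 1). Sequence and window are example assumptions.
protospacer = "GACCGTACTTGACCTGAGCA"  # hypothetical 20-nt target, PAM omitted

window = range(4, 9)  # 1-based positions 4 through 8
editable = [pos for pos in window if protospacer[pos - 1] == "C"]
print(editable)  # [4, 8] -> positions where C:G-to-T:A conversion is expected
```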
Literature Review
Several studies have explored the prediction of base editing efficiency. Arbab et al. (2020) used machine learning to analyze determinants of base editing outcomes from target library data. Song et al. (2020) developed sequence-specific prediction models for ABE and CBE efficiencies. Xiang et al. (2021) improved gRNA efficiency prediction using deep learning and data integration. However, these studies primarily focused on on-target efficiency prediction, with limited exploration of Cas9-dependent off-target effects. Research on off-target effects has largely relied on experimental techniques like Digenome-seq (Kim et al., 2017, 2019; Liang et al., 2019), which, while sensitive, are resource-intensive and time-consuming. The need for accurate in silico prediction of off-target effects to complement experimental approaches is evident. Existing models struggle with the wide range of mutation types (mismatches, insertions, deletions) and their positional effects on off-target efficiency. This study aims to overcome these limitations by using a deep learning approach tailored to the unique characteristics of ABE and CBE off-target behavior.
Methodology
This study employed a high-throughput screening strategy to generate large datasets of ABE and CBE off-target editing efficiencies. Two gRNA-off-target pair libraries were designed, each covering a variety of mutation types (mismatches, insertions, and deletions) at different positions within the target sequence. These libraries were transduced into human cells stably expressing the optimized ABEmax and AncBE4max base editors via the Sleeping Beauty transposon system. After five days, genomic DNA was extracted and deep sequenced to determine editing efficiencies, yielding 54,663 and 55,727 off-target efficiencies for ABEs and CBEs, respectively. Deep learning models, ABEdeepoff and CBEdeepoff, were developed using a fusion embedding-based architecture in which a single embedding matrix is shared between the gRNA and off-target sequences to improve efficiency and generalization. A bidirectional LSTM network performed feature extraction, with an attention mechanism focusing the model on the most relevant sequence positions. The models were trained using a 10-fold GroupKFold cross-validation strategy to ensure robustness. Performance was evaluated with Spearman correlation and compared against conventional machine learning algorithms (linear regression, ridge regression, multilayer perceptron, and XGBoost). Model explainability was assessed with LayerIntegratedGradients to quantify feature contributions at each nucleotide position. External datasets from the literature served as independent validation. Finally, an integrated web server (BEdeepoff) was developed to provide user-friendly access to the trained models.
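The architecture described above can be made concrete with a minimal PyTorch sketch of a fusion-embedding BiLSTM regressor with additive attention. All layer sizes, the vocabulary (four bases plus a padding index and a gap token for indels), and the input encoding are illustrative assumptions, not the published BEdeepoff hyperparameters.

```python
import torch
import torch.nn as nn

class OffTargetRegressor(nn.Module):
    """Minimal sketch of a fusion-embedding BiLSTM regressor for
    gRNA/off-target pairs. Layer sizes are illustrative, not the
    published BEdeepoff hyperparameters."""

    def __init__(self, vocab_size=6, embed_dim=32, hidden_dim=64):
        super().__init__()
        # One embedding matrix shared by the gRNA and the off-target
        # sequence (A/C/G/T plus padding and a gap token for indels).
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        # Additive attention over the BiLSTM outputs.
        self.attn = nn.Linear(2 * hidden_dim, 1)
        self.head = nn.Linear(2 * hidden_dim, 1)

    def forward(self, grna_ids, offtarget_ids):
        # Embed both sequences with the same matrix, then concatenate
        # them along the length axis before feature extraction.
        x = torch.cat([self.embedding(grna_ids),
                       self.embedding(offtarget_ids)], dim=1)
        h, _ = self.lstm(x)                     # (batch, length, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)  # attention weights per position
        context = (w * h).sum(dim=1)            # attention-weighted summary
        return self.head(context).squeeze(-1)   # predicted off:on ratio

# Toy forward pass: a batch of two integer-encoded 23-nt sequence pairs.
model = OffTargetRegressor()
grna = torch.randint(1, 6, (2, 23))
offt = torch.randint(1, 6, (2, 23))
print(model(grna, offt).shape)  # torch.Size([2])
```

Likewise, a hedged sketch of the evaluation loop: GroupKFold with groups set per gRNA (an assumption about the grouping variable) keeps every off-target of a guide in a single fold, so no guide leaks between training and validation, and Spearman correlation scores the held-out predictions. Random arrays stand in for real encodings and a trained model.

```python
import numpy as np
from sklearn.model_selection import GroupKFold
from scipy.stats import spearmanr

# Hypothetical arrays: one row per gRNA/off-target pair.
rng = np.random.default_rng(0)
n_pairs = 1000
features = rng.random((n_pairs, 46))        # placeholder sequence encodings
efficiencies = rng.random(n_pairs)          # measured off:on ratios
grna_ids = rng.integers(0, 100, n_pairs)    # which guide each pair uses

scores = []
for train_idx, val_idx in GroupKFold(n_splits=10).split(
        features, efficiencies, groups=grna_ids):
    # ... train the model on features[train_idx] here ...
    preds = rng.random(len(val_idx))        # stand-in for model predictions
    rho, _ = spearmanr(preds, efficiencies[val_idx])
    scores.append(rho)

print(f"Spearman: {np.mean(scores):.3f} +/- {np.std(scores):.3f}")
```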
Key Findings
The study generated extensive datasets of ABE and CBE off-target editing efficiencies, comprising 54,663 and 55,727 valid data points, respectively. Analysis of these datasets revealed that all mutation types reduced the off:on-target ratio, with deletions exerting a stronger effect than insertions and mismatches. Mutations at positions 1-10 were generally better tolerated than those at positions 11-20. The deep learning models, ABEdeepoff and CBEdeepoff, exhibited strong predictive performance: in 10-fold cross-validation, CBEdeepoff achieved a Spearman correlation of 0.863 ± 0.012, and evaluation on external datasets yielded Spearman correlations ranging from 0.710 to 0.859. While the models performed well on datasets with 1-3 bp mismatches, 1-2 bp insertions, and 1-2 bp deletions, performance was weaker for more extensive mutations, particularly off-targets identified with the more sensitive Digenome-seq method. LayerIntegratedGradients analysis confirmed that mutations contribute negatively to the off:on-target ratio, with mutations at positions 1-10 generally having a smaller impact than those at positions 11-20. The developed BEdeepoff web server supports both single off-target and genome-wide off-target prediction.
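To clarify how the off:on-target ratio reported above is interpreted, here is a toy calculation with hypothetical deep-sequencing read counts; all numbers are invented for illustration. A ratio near 1 means the mutation is well tolerated by the editor, while a ratio near 0 means the mismatched site is strongly discriminated against.

```python
# Hypothetical per-site read counts from deep sequencing. The off:on
# ratio normalizes an off-target's editing efficiency by the matched
# guide's on-target efficiency.
on_target = {"edited_reads": 8200, "total_reads": 10000}
off_target = {"edited_reads": 1230, "total_reads": 10000}

on_eff = on_target["edited_reads"] / on_target["total_reads"]      # 0.82
off_eff = off_target["edited_reads"] / off_target["total_reads"]   # 0.123
print(f"off:on ratio = {off_eff / on_eff:.2f}")                    # 0.15
```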
Discussion
The findings demonstrate the successful development of accurate deep learning models for predicting base editor off-target efficiencies. Strong predictive performance, validated by both internal cross-validation and external datasets, highlights the potential of these models to substantially improve the design and optimization of base editing experiments. The high-throughput screening approach, combined with the fusion embedding-based deep learning architecture, offers a robust and efficient solution to the challenge of off-target prediction. The web server provides a valuable resource for the broader research community, facilitating the application of base editors with improved safety and precision. The insights into the positional effects of mutations, together with the models' explainability, contribute to a deeper understanding of base editor specificity. The models' weaker performance on external datasets generated with Digenome-seq, which detects lower-efficiency off-targets, underscores the limits of relying solely on in silico predictions and the importance of integrating experimental validation in practical applications.
Conclusion
This study presents ABEdeepoff and CBEdeepoff, deep learning models that accurately predict base editor off-target efficiencies. Their strong performance, established on extensive datasets and confirmed by external validation, makes them valuable tools for improving the design and application of base editing experiments, and the user-friendly BEdeepoff web server places these models within easy reach of the wider research community. Future work could expand the models to a broader range of base editor variants and improve prediction accuracy for low-efficiency off-targets. Further investigation of the mechanisms underlying off-target effects could likewise lead to more refined prediction models and improved base editor designs.
Limitations
The models' predictive accuracy was lower for datasets generated by Digenome-seq, particularly those with multiple or more complex mutations. This suggests limitations in predicting low-efficiency off-targets, emphasizing the need for experimental validation. The gRNA-off-target pair library did not include sequences without editable nucleotides, potentially limiting the models' generalizability to such cases. Further refinement of the models and expansion of the training datasets may be needed to address these limitations.