Introduction
The fitness landscape of a biomolecule maps each sequence in its sequence space to an activity. Molecular evolution can be viewed as an adaptive walk along this landscape, so the landscape's topography shapes which paths evolution can take. Empirical exploration of these landscapes is challenging because the combinatorial space of biomolecular sequences is vast, but recent advances in high-throughput sequencing and DNA synthesis have made larger-scale experimental analysis feasible. RNA enzymes (ribozymes) are valuable models for studying molecular evolution, and numerous studies have mapped their fitness landscapes. Many of these studies place wild-type ribozymes near the top of isolated fitness peaks, where only a few mutational steps cause a significant loss of fitness. These findings contradict theoretical work that uses predicted RNA secondary structures as fitness proxies and suggests extensive neutral networks. A neutral network connects genotypes with equivalent phenotypes, allowing large mutational distances to be traversed without fitness loss. This study aims to experimentally verify the existence of extensive neutral networks in ribozymes and to assess how well these networks can be predicted using deep learning.
Literature Review
Previous empirical studies on RNA fitness landscapes often revealed wild-type ribozymes located on or near isolated fitness peaks, implying that evolution away from local optima is difficult. These studies showed rugged fitness landscapes with sparsely distributed fitness peaks and extensive fitness valleys. This contrasts with earlier theoretical studies that, using predicted RNA secondary structures, suggested the existence of extensive neutral networks. A neutral network consists of genotypes connected by single mutations that share the same phenotype (e.g., structure or catalytic activity). These networks allow for significant mutational change without fitness loss, facilitating evolutionary exploration. The current work aims to reconcile these contradictory views through experimental investigation.
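The definition of a neutral network given above (genotypes linked by single mutations that share a phenotype) can be illustrated with a small graph search. The following is a minimal sketch, not the study's analysis code: the function name `neutral_component`, the toy sequences, and their activity values are all hypothetical, and the 0.2 cutoff is borrowed from the neutral/deleterious threshold described in the Methodology.

```python
from collections import deque

ALPHABET = "ACGU"  # RNA bases


def neutral_component(start, activity, threshold=0.2):
    """Collect every genotype reachable from `start` by single substitutions
    while only stepping through genotypes with activity >= threshold.

    `activity` maps sequence -> measured activity; unmeasured sequences are
    treated as deleterious (an assumption made for this sketch).
    """
    if activity.get(start, 0.0) < threshold:
        return set()
    seen = {start}
    queue = deque([start])
    while queue:
        seq = queue.popleft()
        for i, base in enumerate(seq):
            for alt in ALPHABET:
                if alt == base:
                    continue
                neighbor = seq[:i] + alt + seq[i + 1:]
                if neighbor not in seen and activity.get(neighbor, 0.0) >= threshold:
                    seen.add(neighbor)
                    queue.append(neighbor)
    return seen


# Hypothetical toy landscape: AAA and CCC differ at all three positions
# but are joined by a chain of neutral single-mutation steps.
toy_activity = {
    "AAA": 1.0, "AAC": 0.5, "ACC": 0.3, "CCC": 0.9,
    "AAG": 0.1,  # deleterious neighbor
    "GGG": 0.8,  # active but isolated from the network
}
component = neutral_component("AAA", toy_activity)
```

In this toy case `GGG` is active but unreachable, which mirrors the "isolated peak" picture, whereas `AAA` and `CCC` sit on the same neutral network despite differing at every position.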
Methodology
The researchers used a deep learning-guided evolutionary algorithm to identify neutral genotypes in an RNA ligase ribozyme. They measured the activities of all 2<sup>16</sup> variants connecting two active ribozymes that differ by 16 mutations, and used high-throughput sequencing to analyze over 120,000 ribozyme sequences. The experimental workflow involved iterative cycles of library preparation, in vitro transcription, ligation reactions, RNA extraction, reverse transcription, PCR amplification, and Illumina sequencing. Relative activity (RA) was calculated by dividing the fraction ligated (FL) of each mutant by that of the wild type. PAGE analysis was used to validate the sequencing-based RA values.
Successive generations of ribozyme variants were designed by combining experimental screening with in silico selection, recombination, and mutation. Selection used repeated tournaments, in each of which the variant with the highest RA was chosen from a random subset of the population. Genetic diversity was generated by recombination (one-point crossover), mutation (a single random substitution), or a combination of both. A multilayer perceptron (MLP) model was trained on data from previous generations to classify variants as neutral (RA ≥ 0.2) or deleterious (RA < 0.2). The MLP then guided the in silico evolution that produced generation 8.
Epistasis analysis was performed using a log-additive model, and the Walsh-Hadamard transform was used to analyze higher-order epistasis in the neutral network. Mutational robustness was assessed by calculating the decay parameter (α) in the directional epistasis model ω(n) = e<sup>−αn</sup>, where ω(n) is the average fitness of variants carrying n mutations.
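The in silico steps described above (relative activity, tournament selection, one-point crossover, and single random substitution) can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' implementation: all function names, the tournament size `k`, the equal-probability choice among the three operators, and the toy sequences and RA values are hypothetical.

```python
import random


def relative_activity(fl_mutant, fl_wild_type):
    """RA = fraction ligated of the mutant / fraction ligated of the wild type."""
    return fl_mutant / fl_wild_type


def tournament_select(population, ra, k, rng):
    """Draw k distinct variants at random and return the one with the highest RA."""
    contenders = rng.sample(population, k)
    return max(contenders, key=lambda seq: ra[seq])


def one_point_crossover(a, b, rng):
    """Join a prefix of `a` to the matching suffix of `b` (equal lengths >= 2)."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def point_mutation(seq, rng, alphabet="ACGU"):
    """Replace one randomly chosen position with a different base."""
    i = rng.randrange(len(seq))
    alt = rng.choice([base for base in alphabet if base != seq[i]])
    return seq[:i] + alt + seq[i + 1:]


def next_generation(population, ra, size, k=3, seed=0):
    """Build `size` children by selection followed by recombination,
    mutation, or both. The operator is picked uniformly at random here,
    which is an assumption of this sketch."""
    rng = random.Random(seed)
    children = []
    while len(children) < size:
        parent1 = tournament_select(population, ra, k, rng)
        parent2 = tournament_select(population, ra, k, rng)
        operator = rng.choice(["recombine", "mutate", "both"])
        child = parent1
        if operator in ("recombine", "both"):
            child = one_point_crossover(parent1, parent2, rng)
        if operator in ("mutate", "both"):
            child = point_mutation(child, rng)
        children.append(child)
    return children


# Hypothetical toy population with made-up RA values.
toy_population = ["ACGUACGU", "ACGUACGA", "CCGUACGU", "ACGGACGU"]
toy_ra = {"ACGUACGU": 1.0, "ACGUACGA": 0.6, "CCGUACGU": 0.3, "ACGGACGU": 0.05}
children = next_generation(toy_population, toy_ra, size=10)
```

In the workflow summarized above, candidate children would then be screened (experimentally, or by the trained MLP in later generations) before forming the next population; that filtering step is omitted here.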
Key Findings
The study identified an extensive neutral network linking two ribozymes that differ by 16 mutations. Lower-order mutational interactions effectively predicted neutral paths within this network. In silico genetic processes, especially recombination, significantly increased the probability of finding neutral mutants. Machine-learning models, particularly the MLP, effectively learned epistatic information and predicted neutral mutants in distant regions of the fitness landscape. The in silico evolutionary algorithm identified neutral mutants carrying up to 17 mutations. A mutant (F1*U<sup>TM</sup>) evolved through this algorithm exhibited enhanced mutational robustness in the P5 stem region compared to the wild type, indicating that the robustness is localized. The neutral network between F1<sup>U</sup> and F1<sup>*U</sup> was relatively smooth compared to the surrounding sequence space, suggesting that fitness within this network is largely determined by lower-order mutational interactions. Background-averaged first- and second-order epistatic terms could explain over half of the fitness effects in the neutral network. The MLP model achieved high prediction accuracy by learning these lower-order interactions.
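The statement that first- and second-order background-averaged terms account for much of the fitness variation can be made concrete with a Walsh-Hadamard decomposition of a complete binary landscape. The sketch below assumes one common convention (coefficients equal to the Hadamard transform divided by the number of genotypes, applied to log-transformed activity) and hypothetical function names and toy effect sizes; the study's exact normalization and data are not given in this summary.

```python
def walsh_hadamard(values):
    """Fast Walsh-Hadamard transform in natural (Hadamard) order, scaled by 1/N.

    `values[g]` is the (log) fitness of genotype g, where bit i of g says
    whether mutation i is present. The coefficient at index s is the
    background-averaged term for the set of sites whose bits are set in s.
    """
    coeffs = list(values)
    n = len(coeffs)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = coeffs[i], coeffs[i + h]
                coeffs[i], coeffs[i + h] = a + b, a - b
        h *= 2
    return [c / n for c in coeffs]


def variance_explained_by_order(coeffs, max_order):
    """Share of fitness variance carried by terms of order 1..max_order.

    By Parseval's identity, the variance equals the sum of squared
    coefficients over all non-constant terms.
    """
    total = sum(c * c for s, c in enumerate(coeffs) if s != 0)
    low = sum(
        c * c
        for s, c in enumerate(coeffs)
        if 0 < bin(s).count("1") <= max_order
    )
    return low / total if total else 0.0


# Hypothetical 3-site landscape in log space: additive effects plus one
# pairwise interaction between sites 0 and 1.
effects = [-0.1, -0.2, -0.4]
landscape = []
for g in range(2 ** len(effects)):
    value = sum(e for i, e in enumerate(effects) if (g >> i) & 1)
    if (g & 0b011) == 0b011:
        value += 0.3  # pairwise epistasis (made-up magnitude)
    landscape.append(value)

coeffs = walsh_hadamard(landscape)
share_first_order = variance_explained_by_order(coeffs, 1)
share_up_to_second = variance_explained_by_order(coeffs, 2)
```

For the 16-site network described above the same transform would run over 2<sup>16</sup> values; in this toy case the second-order share reaches 1 because no third-order term was built into the landscape.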
Discussion
The findings challenge previous observations that long evolutionary paths are often blocked by deleterious mutants, and they highlight the potential role of neutral networks in facilitating adaptation. The extensive neutral network discovered here shows that evolutionary paths can span large mutational distances. The study also illustrates the usefulness of combining experimental and in silico approaches to explore complex fitness landscapes. The localized mutational robustness observed in the evolved mutant suggests that evolutionary innovation might be more likely to occur through the expansion of small contiguous motifs. The finding that lower-order background-averaged interaction terms significantly predict fitness underscores the potential for extrapolating from small-scale sampling in fitness landscape analysis. The algorithm's ability to predict distant genotypes using information from lower-order mutants offers a potentially simpler approach to understanding early evolution.
Conclusion
This study provides strong experimental evidence supporting the existence of extensive neutral networks in RNA ligase ribozymes and demonstrates the power of a deep learning-guided evolutionary algorithm in exploring such networks. The findings highlight the importance of neutral networks in facilitating evolutionary adaptation and suggest that evolutionary innovation might occur through the expansion of small contiguous motifs. Future research could explore the generality of these findings across different ribozyme families and investigate the potential for using this approach to predict other parts of the fitness landscape.
Limitations
The study focused on a specific RNA ligase ribozyme and a specific region within the molecule. The generalizability of the findings to other ribozymes or other regions of the same ribozyme requires further investigation. The model might have overfitted to the neutral network, and further statistical tests could improve model performance. The experimental assays may contain global noise that might affect the prediction accuracy.