Experimental exploration of a ribozyme neutral network using evolutionary algorithm and deep learning

Biology


R. Rotrattanadumrong and Y. Yokobayashi

In this paper, Rachapun Rotrattanadumrong and Yohei Yokobayashi analyze neutral networks in an RNA ligase ribozyme using a deep learning-guided evolutionary algorithm. Assaying over 65,000 variants, they show that lower-order mutational interactions can predict neutral paths, clarifying the structure of complex fitness landscapes.

Introduction
The fitness landscape of a biomolecule maps genotype to phenotype and frames molecular evolution as an adaptive walk via stepwise mutations. Empirical exploration of such landscapes is challenging because of the vast combinatorial sequence space, though high-throughput methods now enable broader sampling. RNA enzymes (ribozymes) are key models with implications for the RNA world hypothesis. Many empirical RNA landscapes place wild types near isolated peaks, with fitness dropping after only a few mutations; these landscapes appear rugged and sparsely populated with functional genotypes, limiting adaptive paths. This contrasts with theoretical predictions based on RNA secondary structure, which suggest extensive neutral networks connecting genotypes with similar phenotypes. Whether and how neutral networks facilitate the accessibility and predictability of evolution is the central question addressed here.
Literature Review
Prior large-scale empirical mappings of ribozymes generally indicated rugged, multi-peaked landscapes where functional genotypes are rare and separated by deep valleys, restricting long adaptive walks. Comprehensive maps for certain RNA functions (e.g., GTP binding and self-aminoacylation) revealed sparse peaks and high ruggedness, implying strong constraints on evolution. Theoretical studies using predicted RNA secondary structures suggested extensive neutral networks of genotypes sharing phenotypes, enabling drift across sequence space. Epistasis, especially reciprocal sign epistasis, has been identified as a key determinant of ruggedness and path accessibility. The discrepancy between theory and experiment motivates experimental tests of neutral network extent and their role in evolvability.
Methodology
The study used an RNA ligase ribozyme (F1U) and quantified activity via a deep-sequencing ligation assay. Libraries comprising all 105 single mutants, all 5355 double mutants, and 4540 random triple mutants of the catalytic core were constructed and assayed. Relative activity (RA) was computed as the fraction ligated normalized to wild type; each library was measured in duplicate, and variants with low read counts were filtered. Epistasis was initially assessed using a log-additive model comparing observed and expected ln(RA) for double and triple mutants. An iterative pipeline combined in silico genetic operations (tournament selection, one-point crossover recombination, and random point mutation) with experimental screening to generate successive generations (1–8) of variants, prioritizing novel sequences each round. Early generations (2–5) always coupled recombination with a random substitution; from generation 6 onward, the algorithm was modified to generate a larger fraction of pure recombinants (without forced substitution) to better preserve function. Machine learning classifiers (logistic regression, linear SVM, k-NN, gradient-boosted decision trees, and a multilayer perceptron, MLP) were trained on one-hot encoded sequences from generations 1–6 to predict neutral (RA ≥ 0.2) versus deleterious variants, with precision and recall used to select a model. The MLP (three dense layers with ReLU, batch norm, dropout; Adam optimizer) provided the best recall and was incorporated into the design of generations 7b–8. Generation 7a increased pure recombinants; 7b and 7c applied MLP filtering to offspring (7c additionally used high-rate shuffling). Generation 8 was produced by 100 rounds of fully in silico evolution (selection, recombination, mutation, MLP classification) starting from generation 7, then experimentally assayed. 
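The in silico genetic operations described above (tournament selection, one-point crossover, and random point mutation) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: function names, the tournament size `k`, and the `p_mutate` parameter are assumptions, and fitness is taken to be a precomputed lookup of measured relative activity. Setting `p_mutate` below 1 produces the "pure recombinants" favored from generation 6 onward.

```python
import random

ALPHABET = "ACGU"

def tournament_select(population, fitness, k=3):
    """Pick the fittest of k randomly drawn individuals."""
    contenders = random.sample(population, k)
    return max(contenders, key=lambda s: fitness[s])

def one_point_crossover(a, b):
    """Recombine two parent sequences at a random cut point."""
    cut = random.randint(1, len(a) - 1)
    return a[:cut] + b[cut:]

def point_mutate(seq):
    """Substitute one random position with a different nucleotide."""
    i = random.randrange(len(seq))
    new = random.choice([c for c in ALPHABET if c != seq[i]])
    return seq[:i] + new + seq[i + 1:]

def next_generation(population, fitness, n_offspring, p_mutate=1.0):
    """Produce offspring by selection + recombination, each mutated
    with probability p_mutate (p_mutate < 1 yields pure recombinants)."""
    offspring = []
    while len(offspring) < n_offspring:
        a = tournament_select(population, fitness)
        b = tournament_select(population, fitness)
        child = one_point_crossover(a, b)
        if random.random() < p_mutate:
            child = point_mutate(child)
        offspring.append(child)
    return offspring
```

In the actual pipeline each round of such in silico offspring generation was followed by experimental screening, with novel sequences prioritized.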
To directly probe a putative neutral network, a combinatorial library spanning the 16 positions differing between F1U (WT) and an evolved mutant F1U^TM (16 mutations) was synthesized and assayed exhaustively (2^16 = 65,536 variants). Ruggedness and epistasis were quantified by counting reciprocal sign epistasis in four-genotype (2×2) subgraphs and by decomposing landscape contributions into background-averaged epistatic terms via the Walsh–Hadamard transform, reconstructing ln(RA) with terms of increasing epistasis order. Mutational robustness was modeled by fitting the decay of the neutral fraction with Hamming distance n to a directional epistasis model w(n) = e^{-αn^β}, where α sets the overall rate of fitness decay and β quantifies directional epistasis (β > 1 indicating excess negative epistasis).
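The background-averaged decomposition can be illustrated on a complete binary landscape of n sites, where each genotype is indexed by a bitmask over the mutated positions. The sketch below uses the standard Walsh–Hadamard construction; it is not the authors' implementation, and for n = 16 a fast transform would be used in place of the dense matrix shown here.

```python
import numpy as np

def hadamard(n):
    """Walsh–Hadamard matrix for n binary sites (2^n x 2^n)."""
    H = np.array([[1.0]])
    base = np.array([[1.0, 1.0], [1.0, -1.0]])
    for _ in range(n):
        H = np.kron(H, base)
    return H

def epistasis_coefficients(ln_ra, n):
    """Background-averaged epistatic terms of a complete 2^n landscape.
    ln_ra[g] is ln(RA) of the genotype whose set bits mark which of the
    n positions carry the mutant state."""
    return hadamard(n) @ ln_ra / 2 ** n

def reconstruct(coeffs, n, max_order):
    """Rebuild ln(RA) keeping only terms up to a given epistasis order
    (the order of a term is the number of set bits in its index)."""
    orders = np.array([bin(i).count("1") for i in range(2 ** n)])
    truncated = np.where(orders <= max_order, coeffs, 0.0)
    return hadamard(n) @ truncated
```

Comparing `reconstruct(..., max_order=k)` against the measured ln(RA) for increasing k is the sense in which lower-order terms are said to "capture" the landscape: a purely additive landscape is recovered exactly from first-order terms, while epistatic landscapes need higher orders.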
Key Findings
- The initial local landscape around WT was relatively smooth: 75.5% of double/triple mutants were predictable by a log-additive model (low epistasis).
- Shifting the algorithm to favor pure recombination (generation 6 onward) substantially increased neutral fractions; recombination preserved function better than random substitution.
- Machine learning improved discovery of neutral mutants at greater distances: the MLP trained on generations 1–6 achieved recall 0.93 and precision 0.77, outperforming the other models in recall, especially at higher Hamming distances.
- Experimental fractions of neutral mutants: 7a = 0.74 (with 80% pure recombinants), 7b = 0.89 (MLP-selected offspring), 7c = 0.89 (high recombination with MLP selection), and 8 = 0.28 despite an average Hamming distance of 13. Neutral mutants with up to 17 mutations were identified; many had RA comparable to WT across 16 steps.
- Generation 8 accuracy dropped due to extrapolation beyond training diversity, yet the neutral fraction remained much higher than in early generations at smaller distances.
- The evolved mutant F1U^TM (16 mutations) retained high activity (RA = 0.63 by sequencing; 0.72 by PAGE) and a predicted secondary structure similar to WT; its mutations concentrated in the P5 stem loop.
- Directional epistasis fits showed similar overall mutational robustness (α, β) between WT and F1U^TM, with β > 1 indicating excess negative epistasis; however, the P5 region exhibited higher local robustness in F1U^TM.
- The comprehensive 2^16 WT/Mut library revealed an extensive neutral network: at RA ≥ 0.2, the neutral fraction was 0.60 vs 0.11 in generation 1's local space. Nearly 10% of 10^6 randomly sampled paths connecting WT to F1U^TM were fully neutral at threshold 0.2; 39 paths maintained RA > 0.6.
- The neutral network region exhibited lower ruggedness: a smaller fraction of reciprocal sign epistasis across Hamming distances compared with sequences sampled in generations 1–8.
- Background-averaged lower-order epistatic terms captured much of the landscape: reconstructing ln(RA) from 1st–2nd order terms yielded R^2 = 0.54 while comprising only ~0.2% of all terms; near-perfect prediction required terms up to 7th order. Terms up to 3rd order achieved performance similar to the MLP (R^2 ≈ 0.72).
- MLP predictions on the WT/Mut library (without retraining) achieved accuracy 0.71, exceeding the null accuracy of 0.60, with balanced precision and recall (F1 = 0.77), indicating that the learned lower-order interactions generalize within the neutral network.
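The path-sampling result above (random direct paths from WT to F1U^TM checked against an RA threshold) can be sketched as follows. The names are illustrative, and `ra` is assumed to be a lookup from a bitmask over the n differing positions to measured relative activity, as in the 2^16 library.

```python
import random

def path_is_neutral(order, ra, threshold=0.2):
    """Walk WT -> full mutant, introducing mutations in the given order;
    the path is neutral if every intermediate keeps RA >= threshold."""
    g = 0
    for pos in order:
        g |= 1 << pos
        if ra[g] < threshold:
            return False
    return True

def neutral_path_fraction(ra, n, n_samples, threshold=0.2, seed=0):
    """Estimate the fraction of fully neutral direct paths by sampling
    random mutation orders."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        order = list(range(n))
        rng.shuffle(order)
        if path_is_neutral(order, ra, threshold):
            hits += 1
    return hits / n_samples
```

With n = 16 there are 16! possible orderings, so sampling (here 10^6 paths in the study) is the practical way to estimate how richly the two genotypes are connected.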
Discussion
The study demonstrates that combining in silico selection, recombination, and mutation with deep learning can efficiently navigate RNA fitness landscapes to find distant neutral mutants. By shifting to recombination-heavy generation strategies and training an MLP on early-generation data, the pipeline discovered high fractions of neutral genotypes even at larger Hamming distances, overcoming the typical scarcity of functional genotypes in holey landscapes. Direct mapping of the combinatorial space between two functionally similar ribozymes uncovered an extensive neutral network with abundant neutral paths, providing strong empirical evidence that such networks can enhance both accessibility and predictability of evolution. Analyses of ruggedness and epistasis showed that the neutral network is smoother than nearby sampled space and that much of the fitness variation can be encoded by lower-order, background-averaged interactions. This explains the effectiveness of the MLP trained largely on lower-order mutants and suggests that evolutionary processes and learning algorithms can exploit sparsity in epistatic structure. Locally increased robustness in the P5 region of the evolved ribozyme illustrates that neutrality-driven walks can concentrate robustness in specific structural modules, potentially facilitating adaptation. Overall, the findings reconcile some theoretical expectations of extensive neutral networks with experimental evidence by identifying contiguous regions where neutral paths and lower-order interactions dominate, enabling practical navigation and prediction within complex landscapes.
Conclusion
This work provides experimental evidence that an RNA ligase ribozyme fitness landscape contains an extensive neutral network connecting genotypes 16 mutations apart, with many accessible neutral paths. A recombination-focused evolutionary algorithm, augmented by a neural network classifier trained on early-generation data, efficiently identified neutral mutants at substantial sequence distances. Landscape analysis showed reduced ruggedness within the neutral network and that lower-order, background-averaged epistasis captures much of the fitness variation, explaining the predictive success of the MLP. These insights suggest that neutral networks can increase the accessibility and predictability of biomolecular evolution and that machine learning can leverage sparse interaction structures to extrapolate beyond local data. Future work should test whether similar neutral networks and lower-order epistatic sparsity are prevalent across other ribozymes and proteins, optimize model architectures and hyperparameters to reduce overfitting at large distances, and explore how localized robustness can seed evolutionary innovation or functional shifts.
Limitations
- Sampling bias: Generations 1–8 represent a biased, partial sampling shaped by the algorithm; ruggedness comparisons beyond the WT/Mut neutral network are limited by coverage.
- Generalizability: The comprehensive combinatorial mapping was confined to 16 positions; conclusions may not extend to other regions, ribozyme families, or more structurally divergent genotypes.
- Model extrapolation: MLP accuracy degraded at high Hamming distances (generation 8), indicating limits when extrapolating beyond training diversity and potential overfitting within the neutral network.
- Experimental noise: Global nonlinearities and measurement errors can contribute to apparent epistasis; although mitigated by replicates and averaging, residual noise may affect metrics.
- Hyperparameter optimization: Machine learning models were largely used with default settings; better tuning might improve performance and reduce overfitting.
- Neutrality threshold context-dependence: RA ≥ 0.2 as the neutrality criterion is context-specific and may not reflect neutrality in all evolutionary scenarios.