
Self-driving laboratories to autonomously navigate the protein fitness landscape
J. T. Rapp, B. J. Bremer, et al.
Jacob T. Rapp, Bennett J. Bremer, and Philip A. Romero present SAMPLE, a platform in which intelligent agents autonomously design new proteins and rigorously test them on automated robotic hardware, with the aim of making protein engineering and scientific exploration more efficient.
Introduction
Protein engineering has broad potential across many fields, yet creating proteins with improved or novel functions remains inefficient and labor-intensive. Traditional protein engineering relies on a cyclical process of hypothesis generation, experimental design, wet-lab execution, and data interpretation that can take years to complete. Although this iterative process has succeeded in many cases, it is slow, repetitive, and dependent on human intervention. This research addresses these limitations with a fully automated system that builds on advances in robotic scientists and self-driving laboratories. Such systems combine automated learning, reasoning, and experimentation to accelerate scientific discovery, and they can outperform human researchers at learning from diverse data sources, making decisions under uncertainty, operating continuously, and generating highly reproducible data. Autonomous systems have been applied to gene identification, chemical synthesis, and materials discovery, but applying them to protein engineering has been difficult because of the complexity of biological phenotypes, the high dimensionality of genomic search spaces, and the difficulty of automating error-prone experimental steps. Existing automated workflows often still require some human input, which prevents full autonomy. This study aims to overcome these challenges by developing a fully autonomous protein engineering platform that eliminates human intervention and subjectivity.
Literature Review
Existing literature highlights the potential of automated systems to accelerate scientific discovery, particularly in fields such as synthetic biology. Studies have demonstrated automated workflows for gene identification, chemical synthesis, and the discovery of new materials. Fully autonomous systems for protein engineering, however, remain rare. Previous automated workflows for synthetic biology often require some human input and manual sample processing, falling short of complete autonomy. The complexity of biological systems, including nonlinear relationships between sequence and function, and the difficulty of automating wet-lab experiments have hindered the development of fully autonomous protein engineering platforms. This paper builds on that work by developing a system that addresses these challenges and achieves a high degree of autonomy.
Methodology
The Self-driving Autonomous Machines for Protein Landscape Exploration (SAMPLE) platform integrates an intelligent agent with a fully automated robotic system. The agent uses Bayesian optimization (BO) with a Gaussian process (GP) model to learn sequence-function relationships and design new proteins. The GP model jointly predicts whether a sequence is active or inactive and a continuous property of interest (e.g., thermostability). Two heuristic BO methods, UCB positive and Expected UCB, were developed to focus sampling on functional sequences. The robotic system automates gene assembly by Golden Gate cloning, cell-free protein expression, and biochemical characterization. Because the entire process, from sequence design to data acquisition, is automated, the platform can run design-test-learn cycles without interruption, and multiple layers of exception handling and data quality control ensure reliability.
A combinatorial glycoside hydrolase sequence space was designed, comprising natural, Rosetta-designed, and evolution-designed sequences. Four independent SAMPLE agents, seeded with the same initial sequences, were deployed to optimize the thermostability of these enzymes. Experiments were conducted on both an in-house Tecan liquid-handling system and the Strateos Cloud Lab to improve scalability. The agents' performance was evaluated by monitoring their optimization trajectories and analyzing their internal perception of the landscape. Finally, researchers manually characterized the top-performing enzyme sequences using standard molecular biology and protein engineering protocols to validate SAMPLE's findings.
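The agent's model-and-acquisition step can be sketched in a few lines of pure Python. This minimal sketch assumes that chimeric sequences are encoded as tuples of parent choices per block, uses a simple exponential kernel on Hamming distance for the thermostability GP, and takes each sequence's probability of being active as an input from a separate classifier that is not shown. The two scoring rules are one reading of the heuristics named above: UCB positive restricts the usual upper confidence bound to sequences predicted active (probability above 0.5), and Expected UCB weights each sequence's UCB by its probability of being active. The encoding, kernel, exploration weight `beta`, the 0.5 cutoff, and all function names are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import NamedTuple

# Hypothetical encoding: a chimeric sequence is a tuple of parent choices, one per
# block, e.g. (0, 1, 1) = block 1 from parent 0, blocks 2 and 3 from parent 1.


def kernel(a, b, length_scale=1.0):
    """exp(-Hamming / length_scale): an RBF kernel on one-hot encodings (valid PSD kernel)."""
    d = sum(x != y for x, y in zip(a, b))
    return math.exp(-d / length_scale)


def cholesky(A):
    """Lower-triangular L with A = L L^T for a symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L


def solve_lower(L, b):
    """Forward substitution: solve L x = b."""
    x = []
    for i in range(len(L)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def solve_upper_t(L, b):
    """Back substitution: solve L^T x = b."""
    n = len(L)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_posterior(X, y, X_new, noise_var=0.1, length_scale=1.0):
    """GP regression with a constant mean; returns (mean, std) for each x in X_new."""
    n = len(X)
    mean_y = sum(y) / n
    K = [[kernel(X[i], X[j], length_scale) + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, [v - mean_y for v in y]))  # K^-1 (y - m)
    out = []
    for x in X_new:
        k_star = [kernel(x, xi, length_scale) for xi in X]
        mu = mean_y + sum(k * a for k, a in zip(k_star, alpha))
        v = solve_lower(L, k_star)
        var = kernel(x, x, length_scale) - sum(vi * vi for vi in v)
        out.append((mu, math.sqrt(max(var, 1e-12))))
    return out


class Candidate(NamedTuple):
    seq: object
    mean: float      # predicted T50 (°C) from the GP above
    std: float       # predictive standard deviation
    p_active: float  # P(active) from a separate classifier (assumed, not shown)


def ucb(c, beta=2.0):
    """Standard upper confidence bound: mean + beta * std."""
    return c.mean + beta * c.std


def ucb_positive(cands, beta=2.0):
    """Max UCB among sequences predicted active; falls back to all if none are."""
    pool = [c for c in cands if c.p_active > 0.5] or list(cands)
    return max(pool, key=lambda c: ucb(c, beta))


def expected_ucb(cands, beta=2.0):
    """Max of P(active) * UCB, down-weighting probably-inactive sequences."""
    return max(cands, key=lambda c: c.p_active * ucb(c, beta))


if __name__ == "__main__":
    X = [(0, 0, 0), (1, 1, 1), (0, 1, 0)]  # tested chimeras (toy data)
    y = [50.0, 60.0, 55.0]                 # measured T50 in °C (toy data)
    pool = [(1, 0, 1), (1, 1, 0), (0, 0, 1)]
    p_active = [0.3, 0.8, 0.6]             # stand-in classifier outputs (toy data)
    preds = gp_posterior(X, y, pool)
    cands = [Candidate(s, m, sd, p) for s, (m, sd), p in zip(pool, preds, p_active)]
    print("UCB positive picks", ucb_positive(cands).seq)
    print("Expected UCB picks", expected_ucb(cands).seq)
```

With toy candidates `a` (mean 60, std 1, P(active) 0.2), `b` (55, 1, 0.9), and `c` (50, 5, 0.6) and `beta = 2`, plain UCB would pick `a`, UCB positive picks `c`, and Expected UCB picks `b`. Both heuristics steer away from a sequence that is probably inactive, which is how they focus sampling on functional sequences.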
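The exception handling and data quality control mentioned in the methodology could be organized as a retry loop around each measurement. The sketch below is hypothetical throughout: `design_batch`, `run_assay`, the `ExperimentError` fault type, and the replicate-spread QC rule with its 2 °C threshold are assumptions chosen for illustration, not details taken from the paper.

```python
class ExperimentError(Exception):
    """Raised by the (hypothetical) robot layer when a protocol step fails."""


def passes_qc(replicates, max_spread=2.0):
    """Hypothetical QC rule: at least 2 replicate T50 values (°C) agreeing within max_spread."""
    return len(replicates) >= 2 and max(replicates) - min(replicates) <= max_spread


def run_round(design_batch, run_assay, batch_size=3, max_retries=2):
    """Run one design-test cycle with retries and QC; returns (accepted, rejected).

    design_batch(batch_size) -> list of sequences to test (the "design" step).
    run_assay(seq) -> list of replicate T50 values, or raises ExperimentError.
    Each sequence gets up to 1 + max_retries attempts; a robot fault or a QC
    failure triggers a re-run, and sequences that never pass are set aside.
    """
    accepted, rejected = {}, []
    for seq in design_batch(batch_size):
        for _ in range(max_retries + 1):
            try:
                reps = run_assay(seq)
            except ExperimentError:
                continue  # robot fault: retry this sequence
            if passes_qc(reps):
                accepted[seq] = sum(reps) / len(reps)  # mean T50 feeds the "learn" step
                break
            # QC failure: loop around and re-run the assay
        else:
            rejected.append(seq)  # every attempt failed; flag it instead of guessing
    return accepted, rejected
```

Re-running on QC failure rather than accepting a noisy value is a design choice here; an agent could instead record the failure and let the model's uncertainty absorb it.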
Key Findings
Four independent SAMPLE agents engineered glycoside hydrolase enzymes with enhanced thermal tolerance, finding enzymes at least 12 °C more stable than the initial sequences while exploring less than 2% of the full sequence space. Although experimental noise caused individual differences in search behavior, with trajectories diverging early on, all four agents ultimately converged on the same global fitness peak, highlighting the robustness of the system. The multi-output GP model showed strong predictive ability, achieving 83% accuracy in active/inactive classification and a Pearson correlation of 0.84 for thermostability prediction. In simulated experiments, the UCB positive and Expected UCB methods outperformed standard UCB and random sampling, requiring significantly fewer samples to reach thermostable sequences. The automated experimental pipeline was highly reproducible, with a thermostability measurement error of less than 1.6 °C. Analysis of the agents' internal landscape perception showed that their models of the fitness landscape were refined over time and that they found thermostable sequences even before they had a complete picture of the landscape. The agents prioritized thermostability predictions, exploring uncertain regions early on and placing more weight on the probability of activity later. Manual characterization of the top sequences confirmed the substantial thermostability gains achieved by the SAMPLE agents, with some designs improved by nearly 10 °C. The kinetic properties of the designed enzymes were comparable to those of the wild-type enzymes, indicating that the thermostability improvements did not come at the expense of catalytic activity.
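The two predictive-accuracy figures correspond to standard metrics. A minimal sketch of how they could be computed from model predictions and measurements, assuming activity calls are made by thresholding the predicted probability at 0.5 (the evaluation split and threshold used in the study are not stated here, and any inputs are toy values rather than the paper's data):

```python
import math


def classification_accuracy(p_active, observed_active, threshold=0.5):
    """Fraction of sequences whose thresholded activity prediction matches the observation."""
    hits = sum((p > threshold) == obs for p, obs in zip(p_active, observed_active))
    return hits / len(observed_active)


def pearson_r(x, y):
    """Pearson correlation between predicted and measured values (e.g. T50 in °C)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

For example, predicted activity probabilities of 0.9, 0.2, 0.6, and 0.4 against observed labels active, inactive, inactive, inactive give an accuracy of 0.75.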
A 20-round optimization took six months because of unforeseen delays; with better planning and continuous operation, this duration is estimated to drop to two months.
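Assuming roughly 30-day months, the two timelines translate into the following average time per round; this is a back-of-the-envelope conversion, not a figure reported by the authors:

```latex
\frac{6 \times 30\ \text{days}}{20\ \text{rounds}} = 9\ \text{days per round},
\qquad
\frac{2 \times 30\ \text{days}}{20\ \text{rounds}} = 3\ \text{days per round}
```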
Discussion
The SAMPLE platform represents a significant advance in autonomous protein engineering. The successful engineering of thermostable glycoside hydrolase enzymes, with improvements surpassing those achieved by other methods, validates the integration of the intelligent agent with the automated robotic system. Because the platform runs autonomously, it can achieve much higher throughput than traditional methods, accelerating the protein engineering process. Although this demonstration focused on thermostability, the general approach applies to other protein engineering goals, including enzyme activity, specificity, and engineering enzymes for new chemical reactions. The primary bottleneck is experimental throughput, currently limited by the robotic system's speed and potential downtime. The platform's modularity allows more advanced analytical instruments to be integrated, expanding the range of protein functions that can be engineered. Running on the Strateos Cloud Lab improves the platform's accessibility and cost-effectiveness. Finally, the four identical agents followed different search strategies yet converged on the same optimal solution, which suggests that coordinating multiple agents could further improve efficiency.
Conclusion
SAMPLE is a fully autonomous protein engineering platform that successfully engineered thermostable glycoside hydrolase enzymes. Its high throughput and scalability, combined with the potential to integrate advanced analytical tools, offer significant advantages over traditional methods. Future research could focus on scaling up the combinatorial sequence space, exploring more conservative design algorithms, and developing strategies for coordinating multiple agents to improve protein engineering efficiency. The platform's modular design allows continuous improvement and expansion, paving the way for faster advances in protein engineering and synthetic biology.
Limitations
The current implementation of SAMPLE is limited by the size of its combinatorial sequence space and the complexity of the biochemical assays it can run. The robotic system's throughput is a major bottleneck, although faster liquid-handling systems and better error handling could improve it. Unforeseen issues, such as shipping delays, can lengthen optimization runs. The current system requires a relatively simple assay; more complex assays would require more advanced instrumentation and potentially more sophisticated data analysis. Finally, the study focused on a specific set of enzymes and a single property (thermostability), so the generalizability of the results to other protein types and properties remains to be evaluated.