Self-driving laboratories to autonomously navigate the protein fitness landscape


J. T. Rapp, B. J. Bremer, et al.

Jacob T. Rapp, Bennett J. Bremer, and Philip A. Romero present the SAMPLE platform, which uses intelligent agents to autonomously design and test new proteins.

Introduction

The study addresses the inefficiency of traditional, human-driven biological discovery cycles—hypothesis generation, experimental design, wet-lab execution, and interpretation—which often require many iterative cycles over long timelines. The research question is whether a fully autonomous, closed-loop system integrating machine learning and laboratory automation can efficiently explore protein sequence–function landscapes to design proteins with improved properties without human intervention. The context includes growing interest in self-driving laboratories that integrate automated reasoning and experimentation. The purpose is to introduce and evaluate the SAMPLE platform, an intelligent agent coupled to automation, for autonomous protein engineering. The importance lies in accelerating protein design across applications in chemistry, energy, and medicine, and overcoming challenges in biological systems such as nonlinear phenotypes, high-dimensional sequence spaces, and complex, error-prone experimental workflows.

Literature Review

Prior autonomous and semi-autonomous systems have demonstrated successes across domains: gene identification in yeast; autonomous discovery and optimization in synthetic chemistry; and discovery of photocatalysts, photovoltaics, adhesives, and thin-film materials. In biology and synthetic biology, levels of autonomy have varied, with earlier design-build-test-learn pipelines requiring human input and manual steps and lacking full closed-loop autonomy. The literature also highlights challenges unique to biological systems: complex, nonlinear fitness landscapes with inactive regions (holes), high-dimensional search spaces, and multi-step wet-lab processes prone to error and difficult to automate. Bayesian optimization and Gaussian process models have been used to navigate protein fitness landscapes, but naïve approaches can be inefficient when many sequences are inactive. This work builds on these foundations by incorporating multi-output GP modeling and BO heuristics tailored to active/inactive classification and continuous property optimization.

Methodology

The SAMPLE platform comprises: (1) an intelligent agent that models the protein sequence–function landscape and selects sequences to test; (2) a fully automated laboratory pipeline that assembles genes, expresses proteins, and measures biochemical properties; and (3) closed-loop data flow to iteratively refine models and decisions.

Modeling and decision-making: The agent poses protein engineering as a Bayesian optimization (BO) problem. It employs a multi-output Gaussian Process (GP) model that jointly classifies active vs inactive sequences and regresses a continuous property (thermostability), addressing landscape holes. Benchmarked on cytochrome P450 data (518 sequences: 331 inactive, 187 active with thermostability), the classifier achieved 83% accuracy and, for active sequences, thermostability prediction Pearson r = 0.84 via ten-fold cross-validation. To improve sample efficiency, two BO heuristics incorporate the GP’s active probability P_active: (a) UCB positive (selects among sequences with P_active > 0.5 using UCB on predicted thermostability), and (b) Expected UCB (UCB score scaled by P_active after baseline shifting). Simulations (10,000 trials) compared random, standard UCB, UCB positive, and Expected UCB. Batch selection variants sequentially add sequences while updating the GP with predicted means.

Automated experimental pipeline: Designed sequences are sent to a robotic workflow implementing Golden Gate gene assembly (using pre-synthesized fragments with defined overhangs), PCR amplification (Phusion), verification of double-stranded DNA via EvaGreen fluorescence, cell-free protein expression (T7-based, AccuRapid E. coli extract), and thermostability assays using fluorogenic substrates. The thermostability metric T50 is defined as the temperature causing 50% irreversible inactivation in 10 minutes, derived by fitting a shifted sigmoid to residual activity vs temperature.
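
The T50 definition above can be illustrated with a minimal sketch. It assumes residual activity is already normalized to the range 0 to 1 and follows a two-parameter sigmoid; the function names, the grid-search fit, and the synthetic data are illustrative assumptions, not the authors' fitting code, which may use additional amplitude or baseline terms.

```python
import math

def sigmoid_activity(temp, t50, slope):
    """Fraction of activity remaining after heating at `temp` (assumed model)."""
    return 1.0 / (1.0 + math.exp(slope * (temp - t50)))

def fit_t50(temps, activities, slopes=(0.25, 0.5, 1.0, 2.0), step=0.1):
    """Grid-search least-squares fit; returns (t50, slope).

    A real pipeline would more likely use a continuous optimizer; the
    coarse grid keeps this sketch dependency-free.
    """
    low, high = min(temps), max(temps)
    n_steps = int(round((high - low) / step))
    best = None
    for slope in slopes:
        for i in range(n_steps + 1):
            t50 = low + i * step
            sse = sum(
                (sigmoid_activity(t, t50, slope) - a) ** 2
                for t, a in zip(temps, activities)
            )
            if best is None or sse < best[0]:
                best = (sse, t50, slope)
    return best[1], best[2]

# Synthetic, noise-free data generated from the assumed model
# (T50 = 52.0 °C, slope = 0.5); these are not measurements from the study.
temps = [40.0, 44.0, 48.0, 52.0, 56.0, 60.0, 64.0]
activities = [sigmoid_activity(t, 52.0, 0.5) for t in temps]

fitted_t50, fitted_slope = fit_t50(temps, activities)
print(f"fitted T50 = {fitted_t50:.1f} °C, slope = {fitted_slope}")
```

The fitted T50 is the temperature at which the modeled residual activity crosses 50%. A quality-control step like the one described in the source could additionally reject curves whose fit error is large.
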
Throughput: ~1 h gene assembly, 1 h PCR, 3 h expression, 3 h thermostability measurement (~9 h total per design-test-measure cycle).

Data quality control: automated checks for gene assembly/PCR success, curve shapes and sigmoid fit quality, and activity above background; inconclusive or failed steps are retried or sequences are re-queued.

Combinatorial sequence space: A GH1 glycoside hydrolase sequence space was constructed via a DNA assembly graph combining fragments from natural sequences, Rosetta-designed segments, and evolution-designed segments. The space includes 1,352 unique sequences with broad diversity (average 116 mutations apart, minimum 16), sampling up to six amino acids per site across a TIM barrel fold.

Cloud implementation and autonomous runs: The pipeline was deployed on the Strateos Cloud Lab. Four independent SAMPLE agents were seeded with the same six natural GH1 sequences. Using Expected UCB, each agent proposed three sequences per round across 20 rounds, with full autonomy including exception handling and metadata tracking.

Human validation: Top designs per agent were expressed in E. coli and evaluated via lysate-based thermostability assays and Michaelis-Menten kinetics to validate autonomous findings under standard protocols.
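
The two acquisition heuristics described in this section can be sketched as follows. This is a simplified illustration under stated assumptions: the GP outputs are supplied as plain numbers rather than computed by a model, the exploration weight `beta` and the `baseline` value are hypothetical choices, and the candidate names and predictions are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    name: str
    p_active: float  # classifier probability that the sequence is active
    mean: float      # predicted thermostability (°C)
    std: float       # predictive uncertainty (°C)

def ucb(pred, beta=2.0):
    """Standard upper confidence bound; beta is an assumed exploration weight."""
    return pred.mean + beta * pred.std

def ucb_positive(preds, beta=2.0, threshold=0.5):
    """UCB positive: rank by UCB, but only among sequences with P_active > threshold."""
    candidates = [p for p in preds if p.p_active > threshold]
    if not candidates:
        return None
    return max(candidates, key=lambda p: ucb(p, beta))

def expected_ucb(preds, baseline, beta=2.0):
    """Expected UCB: shift UCB by a baseline, then scale by P_active.

    The baseline (for example, the lowest thermostability observed so far)
    is an assumption of this sketch.
    """
    return max(preds, key=lambda p: p.p_active * (ucb(p, beta) - baseline))

# Invented predictions for three hypothetical candidates.
preds = [
    Prediction("seq_a", p_active=0.9, mean=50.0, std=1.0),
    Prediction("seq_b", p_active=0.3, mean=58.0, std=3.0),
    Prediction("seq_c", p_active=0.6, mean=53.0, std=2.0),
]

standard_pick = max(preds, key=ucb)
positive_pick = ucb_positive(preds)
expected_pick = expected_ucb(preds, baseline=40.0)
print(standard_pick.name, positive_pick.name, expected_pick.name)
```

With these invented numbers the three criteria choose different candidates: standard UCB favors the uncertain, probably inactive `seq_b`, while the two activity-aware heuristics favor sequences more likely to yield a usable measurement. Batch selection, which the source describes as adding sequences one at a time while updating the GP with predicted means, is not shown here because it requires a fitted GP.
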

Key Findings
  • Modeling and BO benchmarking: The multi-output GP classifier achieved 83% active/inactive accuracy; for active sequences, thermostability predictions had Pearson r = 0.84. In simulations on P450 data (10,000 trials), UCB positive and Expected UCB found thermostable variants with a median of 26 measurements, requiring 3–4× fewer samples than standard UCB and random. Smaller experimental batch sizes provided slight performance benefits.
  • Automated pipeline performance: Reproducible thermostability measurements on four diverse GH1 enzymes had measurement error <1.6 °C. End-to-end autonomous cycle time ~9 h. About 9% of experiments failed (likely liquid handling), with automated QC and retries implemented; two erroneous thermostability assignments to inactive sequences were intentionally left uncorrected to assess agent recovery.
  • Autonomous exploration outcomes: All four agents rapidly converged on thermostable GH1 enzymes at least 12 °C more stable than the initial natural seed sequences while evaluating <2% of the 1,352-member space. Agents broadly explored before converging on the same global fitness peak; each top discovered sequence was unique.
  • Decision-making dynamics: Agents’ landscape perceptions evolved with data. Correlation to a unified landscape model (trained on all agents’ data) increased over rounds, with thermostable designs typically found by rounds 11–12 despite moderate landscape correlations (~0.5). Agents initially disagreed on landscape structure (sometimes negative correlations), reflecting exploration of different regions; agreement increased later. Expected UCB decisions prioritized predicted thermostability throughout, with early emphasis on uncertainty and later increased emphasis on P_active; Agent 3 emphasized P_active earlier due to early discovery of thermostable sequences.
  • Fragment preferences: Thermostable designs commonly contained P6F0, P1F2 or P5F2, and P1F3 fragments, suggesting stabilizing regions. Designs incorporating Rosetta/evolution fragments were rarely active (2 of 7 active, with low thermostability).
  • Human validation: Lysate-based thermostability (T50, °C): WT Bgl3 44.7; Agent 1 54.6 (Δ9.9); Agent 2 53.0 (Δ8.3); Agent 3 50.9 (Δ6.2); Agent 4 54.6 (Δ9.9). Kinetics (kcat s^-1, KM μM, kcat/KM μM^-1 s^-1) were similar in magnitude to wild type, indicating maintained catalytic activity despite stability engineering, though some parameter shifts were observed.
  • Throughput and costs: A 20-round run with batch size 3 is estimated at US$5,200 (DNA fragments US$2,400; reagents US$1,300; cloud lab US$1,500). With improved logistics, 20 cycles could be completed in ~2 months on the cloud lab.
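
As a quick arithmetic check on the human-validation and cost figures listed above, the following sketch recomputes the ΔT50 values and the run total from the reported numbers. The per-round cost is a derived figure, not one stated in the source, and the variable names are illustrative.

```python
# Reported lysate-based T50 values (°C) from the human validation.
wild_type_t50 = 44.7
agent_t50 = {"Agent 1": 54.6, "Agent 2": 53.0, "Agent 3": 50.9, "Agent 4": 54.6}

# Improvement over wild type, rounded to one decimal as in the summary.
delta_t50 = {name: round(t50 - wild_type_t50, 1) for name, t50 in agent_t50.items()}

# Reported cost components (US$) for a 20-round run with batch size 3.
costs = {"DNA fragments": 2400, "reagents": 1300, "cloud lab": 1500}
total_cost = sum(costs.values())
cost_per_round = total_cost / 20  # derived here, not reported in the source

print(delta_t50)
print(total_cost, cost_per_round)
```

The recomputed values agree with the reported ΔT50 figures and the US$5,200 total.
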
Discussion

The findings demonstrate that a fully autonomous, closed-loop system can efficiently navigate a protein fitness landscape to engineer improved properties, here thermostability in GH1 enzymes. By integrating multi-output GP modeling with BO criteria that account for activity likelihood, SAMPLE avoids inactive regions and prioritizes informative, high-value experiments. Convergence to a shared global optimum across independent agents, despite divergent early trajectories due to measurement noise and stochasticity, underscores robustness and sample efficiency. The work shows that self-driving labs can outperform traditional manual cycles in speed and reproducibility, and that multi-agent deployments can explore complementary regions and specialize in different modeling tasks (classification vs property prediction), suggesting opportunities for coordinated multi-agent strategies. The approach is generalizable to other protein functions (activity, specificity, new reactions) provided appropriate assays and instrument integration are available. Implementation on a cloud lab highlights scalability and accessibility. Comparisons to prior semi-autonomous pipelines emphasize SAMPLE’s higher autonomy enabling many more design–test–learn cycles without human intervention.

Conclusion

This work introduces SAMPLE, a self-driving laboratory platform that autonomously learns sequence–function relationships, designs proteins, executes fully automated experiments, and iteratively optimizes toward engineering objectives. In GH1 thermostability engineering, four independent agents discovered thermostable enzymes >12 °C above starting sequences by sampling <2% of a 1,352-member combinatorial space. Tailored BO heuristics and multi-output GP modeling yielded strong predictive performance and sample-efficient optimization, validated by human assays showing substantial T50 improvements with broadly maintained kinetics. SAMPLE is a generalizable framework poised to accelerate protein engineering and synthetic biology. Future directions include integrating advanced analytical instruments (e.g., LC–MS, NMR) to expand assayable functions; scaling combinatorial spaces via large oligo pools; coordinating multiple agents; refining fragment design with conservative mutations and modern design tools (e.g., ProteinMPNN, CADENZ); and improving reliability and throughput on cloud labs to further shorten timelines.

Limitations
  • Demonstration limited to a relatively small combinatorial space (1,352 sequences) and a single objective (thermostability), which is comparatively tractable.
  • Dependence on fluorescence/colorimetric assays and available instruments constrained the range of measurable protein functions.
  • Experimental throughput was the bottleneck; practical runs experienced downtime, robotic malfunctions, reagent restocking delays, and ~9% experimental failures.
  • Measurement noise influenced early decisions and led to divergent search paths; quality filters missed two faulty data points, though agents recovered.
  • Rosetta/evolution-designed fragments often reduced activity or stability, likely due to overly aggressive sequence changes; design strategies may require more conservative mutations.
  • Results are specific to GH1 enzymes and cell-free expression/assay conditions; transferability to other protein families and assay contexts requires validation.