Introduction
Designing artificial proteins for specific functions is a key goal of synthetic biology. Recent advances in machine learning (ML)-based generative models have significantly improved the design of proteins with individual functionalities (e.g., catalytic activity, binding). However, designing proteins with emergent functions, that is, complex functionalities arising from interactions within a biological system, remains challenging. These emergent functions underpin many cellular processes (e.g., cell division, migration) and are difficult to predict computationally. This study tackles this challenge by developing a combined computational and experimental pipeline for screening ML-generated proteins. The pipeline focuses on the MinDE system in E. coli, whose spatiotemporal oscillations are crucial for cell division and which serves as a well-studied model for emergent functions. The research aims to demonstrate that combined in silico and in vitro screening can effectively identify functional protein variants designed for higher-order functions, bridging the gap between computational design and experimental validation.
Literature Review
The field of protein design has seen significant progress with the introduction of ML-based generative models. These models have been successfully applied to design proteins with individual functions. However, the design of proteins with emergent functions, which depend on complex interactions within a biological system, lags behind. Existing methods for sequence generation, such as conditional generative models and diffusion models, are advancing, but tailored screening methods, both computational and experimental, are lacking for emergent functions. Computational prediction of protein function, especially higher-order functions, remains a significant challenge. Experimental screening is further complicated by the need for specific cellular environments to observe emergent functions. Existing in vitro screening systems are inadequate for such specialized functions. The MinDE system in E. coli, which involves ATP-driven reaction-diffusion dynamics to determine the division site, serves as an ideal model for developing a screening pipeline for emergent functions due to its well-characterized behavior and in vitro reconstitution potential.
Methodology
The study employed a multi-stage pipeline:

1. **Sequence Generation:** A Multiple Sequence Alignment-based Variational Autoencoder (MSA-VAE) was trained on MinE sequences to generate 4000 variants. Sequences with >60% identity to the wild-type E. coli MinE were removed, and the remaining sequences were clustered at 60% identity, with one representative selected per cluster.
2. **In silico Screening:** A divide-and-conquer approach was used, estimating three sub-functions crucial for MinE's oscillation: membrane binding (predicted by Protein-Sol Patches), MinD interaction (AlphaFold2 Multimer Predicted Aligned Error, PAE), and homodimerization (AlphaFold2 Multimer PAE). Solubility in E. coli was also predicted (Protein-Sol). These scores were normalized and combined into a Function Score, which was used to rank the variants. The top 24 and bottom 24 variants were selected for further experimental analysis.
3. **In vitro Screening:** A cell-free protein synthesis system (PURE system) was used for rapid expression of the selected variants. The proteins were then encapsulated in lipid droplets with EGFP-MinD and ATP. Light microscopy was used to observe spatiotemporal patterns, identifying 14 positive variants.
4. **In vivo Screening:** The 14 positive variants were introduced into a ΔminDE E. coli strain. Cell morphology (normal, minicell, filamentous) and Min oscillations were assessed.
5. **Functional Analysis of synMinEv25:** The best-performing variant (synMinEv25) was further characterized in vivo (growth rate, cell size distribution, oscillation period) and in vitro (membrane binding, MinD ATPase stimulation, oligomerization).

The study also included a post-hoc analysis of the scoring system and compared the results with traditional methods such as sequence similarity and HMM profiles.
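The in silico step combines several per-variant predictions into a single Function Score and then takes the extremes of the ranking. The summary does not say how the scores were normalized or weighted, so the sketch below is only one plausible reading: it assumes z-score normalization, equal weights, and that lower PAE and higher patch and solubility scores are favorable. The function names, metric keys, and numbers are all hypothetical and are not taken from the study.

```python
from statistics import mean, pstdev


def zscores(values):
    """Z-score normalize a list of numbers (assumed normalization scheme)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All variants identical on this metric: it carries no ranking signal.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def function_scores(variants, higher_is_better):
    """Sum sign-adjusted z-scores per variant (equal weights are an assumption).

    variants: dict mapping variant name -> dict of metric name -> raw value.
    higher_is_better: dict mapping metric name -> True if larger is favorable.
    """
    names = list(variants)
    totals = {name: 0.0 for name in names}
    for metric, larger_is_good in higher_is_better.items():
        z = zscores([variants[name][metric] for name in names])
        for name, z_i in zip(names, z):
            # Flip the sign for metrics where a lower value is favorable (e.g. PAE).
            totals[name] += z_i if larger_is_good else -z_i
    return totals


def select_extremes(scores, k):
    """Return the k highest- and k lowest-scoring names (overlap if 2k > len)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], ranked[-k:]


# Hypothetical toy inputs, not data from the study.
variants = {
    "v1": {"membrane_patch": 0.8, "mind_pae": 5.0, "dimer_pae": 4.0, "solubility": 0.6},
    "v2": {"membrane_patch": 0.2, "mind_pae": 20.0, "dimer_pae": 18.0, "solubility": 0.3},
    "v3": {"membrane_patch": 0.5, "mind_pae": 10.0, "dimer_pae": 9.0, "solubility": 0.5},
}
directions = {
    "membrane_patch": True,   # assumed: higher patch score is favorable
    "mind_pae": False,        # lower predicted aligned error is better
    "dimer_pae": False,
    "solubility": True,       # assumed: higher predicted solubility is favorable
}

scores = function_scores(variants, directions)
top, bottom = select_extremes(scores, k=1)
print(top, bottom)
```

With the study's numbers, `k` would be 24 for both ends of the ranking. Selecting low-scoring variants alongside high-scoring ones makes it possible to check experimentally whether the score separates functional from non-functional designs.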
Key Findings
The study successfully generated MinE variants using an MSA-VAE. The divide-and-conquer in silico screening approach effectively predicted the functional potential of the variants. Cell-free protein expression and encapsulation in lipid droplets provided a rapid in vitro screening platform, identifying 14 promising variants. In vivo analysis demonstrated that seven of the high-scoring variants induced Min oscillations in E. coli, and that synMinEv25 fully substituted for wild-type MinE, restoring normal cell growth and morphology. The post-hoc analysis showed that the initial Function Score effectively distinguished between functional and non-functional variants and outperformed traditional methods based on sequence similarity. In vitro characterization showed that synMinEv25 had functional properties similar to those of wild-type MinE in assays for membrane binding, MinD ATPase stimulation, and oligomerization. The study also revealed a correlation between specific in vitro functional characteristics and specific in vivo phenotypes (minicell vs. filamentous). The synMinEv25 variant shares less than 50% sequence identity and less than 70% sequence similarity with wild-type MinE, showing that the pipeline can yield a functional homolog that is distant in sequence from the natural protein.
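Sequence identity figures like the one quoted for synMinEv25 depend on how identity is counted. As a minimal sketch, the function below computes percent identity for two already-aligned sequences, using the number of gap-free columns as the denominator. That convention is an assumption, since the summary does not state which one the authors used. The function name and the toy sequences are hypothetical and are not MinE sequences. Sequence similarity would additionally need a substitution matrix and is not covered here.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy aligned pair: 4 of the 5 gap-free columns match.
print(percent_identity("MKV-LA", "MRVQLA"))  # prints 80.0
```

Other common denominators are the full alignment length or the length of the shorter sequence. They can give noticeably different percentages for the same pair, so thresholds such as 50% or 60% are only comparable when the convention is fixed.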
Discussion
This study demonstrates a successful integration of ML-based protein design and a combined in silico and in vitro screening pipeline for evaluating emergent functions. The divide-and-conquer approach effectively breaks down the complex emergent function into simpler, measurable sub-functions. The use of synthetic cell mimics and cell-free protein expression significantly accelerated the experimental screening process. The successful in vivo substitution of the wild-type MinE gene by a designed variant underscores the potential of this strategy for engineering cellular functions. The findings suggest that this integrated approach, combining computational predictions with experimental validation, is a viable strategy for designing proteins with complex emergent behaviors. The study also provides insights into the relationships between specific protein properties and cellular phenotypes, which can inform future design efforts.
Conclusion
This work provides a proof-of-concept for designing and screening proteins with emergent functions using a combination of machine learning, in silico analysis, and in vitro/in vivo testing. The successful creation of a functional MinE homolog that completely replaces the wild-type gene in E. coli demonstrates the pipeline's effectiveness. Future research could explore the application of this pipeline to design proteins with other emergent functions, including those involved in cell motility, signaling, and other complex cellular processes. Adapting the approach to diverse biological systems and refining the in silico scoring methods will further enhance its power and versatility.
Limitations
The study focused on the MinDE system, which might limit the generalizability of the findings to other systems. The in silico scoring relies on the availability of information on the sub-functions necessary for the emergent function. This limits the applicability to systems with less well-characterized emergent functions. The accuracy of AlphaFold2 predictions might influence the reliability of the in silico scoring, although structural features were used directly for assessing the sub-functions in post-hoc analysis. The in vitro screening system, while efficient, does not fully replicate the complexity of the cellular environment, which could lead to some discrepancies between in vitro and in vivo observations.