Introduction
Antimicrobial resistance poses a significant global health threat, with projections of 10 million deaths annually by 2050. The overuse and misuse of antibiotics have fueled the emergence and rapid spread of drug-resistant bacteria like MRSA, creating an urgent need for new antibiotics. The high cost and lengthy development time of new antibiotics (estimated at $1.5 billion and over 10 years) highlight the critical need for innovative approaches. Antimicrobial peptides (AMPs), composed of 10-50 amino acids, are considered promising alternatives due to their unique mechanism of action, which involves membrane disruption. This mechanism offers the potential for broad-spectrum activity and reduced resistance development compared to traditional antibiotics. However, traditional experimental methods for AMP identification are time-consuming and expensive, necessitating the development of faster and more efficient technologies. This study addresses this challenge by leveraging the power of advanced machine learning models.
Literature Review
Existing machine learning models have shown promise in peptide discovery, but limitations remain. Virtual screening methods, while useful, are highly dependent on data quality and feature extraction techniques, often lacking in generalization and adaptability. De novo design methods, although capable of generating novel molecules, can be hampered by challenges in peptide synthesis and high experimental costs. Furthermore, the relatively small size of existing experimental AMP datasets compared to those in natural language processing (NLP) presents a challenge for traditional deep learning models. This study aims to overcome these limitations by applying the successful strategies of NLP language models to the challenge of AMP discovery.
Methodology
The researchers developed deepAMP, a peptide language model designed to address the limitations of existing AMP discovery methods. The framework consists of a pre-training phase and a fine-tuning phase. In the pre-training phase, a generalized peptide generative model (deepAMP-general) was trained on a large dataset of 300,000 peptide sequences from UniProt. This unsupervised training allows the model to learn the general rules of peptide sequence syntax and to generate diverse, novel peptide sequences.
To overcome the limitations imposed by the small number of available AMP sequences, a sequence degradation approach was employed. deepAMP-general was used to transform existing highly bioactive peptides into multiple low-activity peptides, creating AMP pairs to expand the training dataset. This yielded 1009 AMP pairs for the subsequent fine-tuning phase.
In the two-stage fine-tuning process, the AMP optimization model (deepAMP-AOM) was fine-tuned on the generated AMP pairs to learn the activity characteristics that distinguish low-activity from high-activity AMPs. Additionally, a penetratin optimization model (deepAMP-POM) was fine-tuned to optimize for both broad-spectrum activity and cell permeability, using a set of 29 penetratin sequences that was augmented by sequence degradation to create 1000 pairs. Finally, deepAMP-predict was used to screen candidate AMPs prior to experimental validation.
The performance of deepAMP was compared with several existing methods, including random mutation, baseline optimization (Baseline-T and Baseline-G), HydroAMP, and PepVC, across multiple iterations. The optimization was assessed using established scoring metrics and deepAMP-predict, which uses a support vector machine classifier trained on a dataset of 6760 positive and 6760 negative samples.
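The pairing idea behind sequence degradation can be illustrated with a minimal sketch. This is not the authors' implementation: in the study the degraded sequences come from the deepAMP-general generative model, whereas the hypothetical `degrade` function below substitutes random residues as a stand-in, and random substitution does not by itself guarantee lower activity. The function names, parameters, and the example peptide are all assumptions made for illustration.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def degrade(sequence, n_mutations, rng):
    """Return a copy of `sequence` with `n_mutations` distinct positions substituted.

    Stand-in for the generative model: residues are replaced at random.
    """
    residues = list(sequence)
    positions = rng.sample(range(len(residues)), k=min(n_mutations, len(residues)))
    for pos in positions:
        # Always pick a residue different from the current one.
        choices = [aa for aa in AMINO_ACIDS if aa != residues[pos]]
        residues[pos] = rng.choice(choices)
    return "".join(residues)


def build_pairs(active_peptides, variants_per_peptide=3, n_mutations=2, seed=0):
    """Build (degraded, original) training pairs from known active peptides."""
    rng = random.Random(seed)
    pairs = []
    for peptide in active_peptides:
        seen = set()
        for _ in range(variants_per_peptide):
            variant = degrade(peptide, n_mutations, rng)
            if variant not in seen:  # skip duplicate variants
                seen.add(variant)
                pairs.append((variant, peptide))
    return pairs


# Hypothetical peptide, used only to show the shape of the output.
for source, target in build_pairs(["KLLKKLLKKLLK"]):
    print(source, "->", target)
```

Each pair places the degraded sequence in the source position and the original active peptide in the target position, so a model fine-tuned on such pairs sees examples of mapping a weaker sequence to a stronger one.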
Following the computational optimization, a library of 92 peptide sequences was synthesized and experimentally validated using minimum inhibitory concentration (MIC) and minimum bactericidal concentration (MBC) assays against various bacterial strains, including Gram-positive, Gram-negative, and antibiotic-resistant strains. Hemolysis and MTT assays were conducted to evaluate the safety and cytotoxicity of the candidate AMPs. Further mechanistic studies, including propidium iodide staining, scanning electron microscopy (SEM), membrane depolarization assays, and outer membrane permeability assays, were used to investigate the mode of action of the AMPs. Finally, in vivo efficacy was assessed using a mouse wound infection model.
Key Findings
The study yielded several key findings:
1. deepAMP successfully identified 29 AMPs (18 T1-AMPs and 11 T2-AMPs) with superior antimicrobial activity compared to existing AMPs. More than 90% of the designed AMPs exhibited better antibacterial activity than the original AMPs.
2. Among the designed AMPs, T2-9 demonstrated the strongest antibacterial activity, comparable to FDA-approved antibiotics.
3. Selected AMPs (T1-5, T1-6, and T2-10) showed significantly less resistance development in S. aureus than ciprofloxacin and were effective against P. aeruginosa wound infection in a mouse model.
4. Mechanistic studies indicated that deepAMP-designed AMPs act by disrupting bacterial cell membranes.
5. The deepAMP-identified AMPs showed low hemolysis and cytotoxicity.
6. In a mouse wound infection model, the AMPs T1-2, T1-5, and T2-10 significantly reduced bacterial load compared to control.
The MIC values of various AMPs against multiple bacterial strains are detailed in Table 1, showing several AMPs with MIC values significantly lower than those of existing antibiotics. UMAP analysis revealed the distribution of optimized sequences in chemical space, demonstrating the model's capability to generate chemically novel AMPs. The authors also calculated several physicochemical properties of the AMPs to investigate structure-activity relationships, finding moderate correlations between these properties and MIC values.
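To make the structure-activity analysis more concrete, here is a minimal sketch of two physicochemical properties that are commonly computed for peptides. The summary does not state which properties the authors calculated, so net charge and mean hydropathy are assumed here as typical examples. The charge calculation is a deliberate simplification that counts only Lys/Arg and Asp/Glu side chains and ignores histidine and the termini. The hydropathy values are the standard Kyte-Doolittle scale, and the example peptide is hypothetical.

```python
# Kyte-Doolittle hydropathy scale (higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def net_charge(sequence):
    """Approximate net charge near neutral pH: (K + R) minus (D + E).

    Simplification: histidine and the N/C termini are ignored.
    """
    positive = sum(sequence.count(aa) for aa in "KR")
    negative = sum(sequence.count(aa) for aa in "DE")
    return positive - negative


def mean_hydropathy(sequence):
    """Average Kyte-Doolittle hydropathy over all residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


# Hypothetical peptide, not taken from the study.
peptide = "KLLKKLLKKLLK"
print(peptide, net_charge(peptide), round(mean_hydropathy(peptide), 3))
```

Properties computed this way for each peptide could then be compared against measured MIC values, for example with a rank correlation, to look for the kind of moderate relationships the authors report.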
Discussion
This study demonstrates the application of a large-scale language model to the discovery of potent AMPs with low resistance potential. The deepAMP framework addresses limitations of existing computational methods by combining pre-training with fine-tuning on a dataset augmented by a sequence degradation approach. The ability to generate diverse, novel peptide sequences and to learn complex relationships between sequence and activity is a significant advantage. The in vitro and in vivo results demonstrate the potency and efficacy of the identified AMPs against multiple bacterial strains, including antibiotic-resistant strains. The mechanistic studies further support a membrane-disrupting mechanism of action, suggesting a potential for overcoming antibiotic resistance. The low hemolysis and cytotoxicity indicate a good safety profile. While the current study focused on bacterial infections, this approach has the potential to be extended to the discovery of AMPs with broader activity.
Conclusion
This study successfully demonstrates the application of a language model-based framework, deepAMP, for discovering potent and broad-spectrum AMPs with low resistance potential. The identified AMPs demonstrated high efficacy against multiple bacterial strains, including drug-resistant ones. Future research should focus on validating the efficacy of lead candidates in additional animal models and exploring the potential for clinical translation. Further development of deepAMP to incorporate 3D structural information and improve model interpretability is also warranted.
Limitations
The study's limitations include a relatively small number of experimentally validated candidate AMPs in the initial phase, which is common in initial screening experiments; further large-scale experimental validation is needed. The in vivo study was limited to a single mouse model, so studies in additional animal models and clinical trials would help establish the broader efficacy and safety of the AMPs. The peptide language model is a black-box model, which limits the interpretability of the results and hinders feature characterization of the peptides.