Efficient evolution of human antibodies from general protein language models

B. L. Hie, V. R. Shanker, et al.

This groundbreaking research by Brian L. Hie, Varun R. Shanker, Duo Xu, Theodora U. J. Bruun, Payton A. Weidenbacher, Shaogeng Tang, Wesley Wu, John E. Pak, and Peter S. Kim showcases an innovative method where general protein language models effectively evolve human antibodies, achieving significant improvements in binding affinities and demonstrating broad applicability across protein families.

Introduction
Directed protein evolution in the lab is challenging because the sequence space that must be explored to find rare beneficial mutations is vast. Natural evolution relies on random mutation and recombination, but replicating that search experimentally is demanding, and artificial evolution campaigns often expend substantial effort on weakly active or non-functional variants. To improve efficiency, this study explores the concept of 'evolutionary plausibility' – general sequence properties that enhance protein stability and evolvability across protein families. The key question is whether general evolutionary information, learned from patterns in sequence variation, suffices for efficient evolution under a specific selection pressure (e.g., higher binding affinity to a particular antigen). The authors investigate whether protein language models, trained on general protein sequences, can predict fitness-enhancing mutations without any task-specific training data. Their hypothesis was that evolutionary information alone is sufficient to direct evolution towards higher fitness under specific selection pressures, with antibody affinity maturation as the test case.
Literature Review
The study cites previous work demonstrating that language models can predict natural evolution despite lacking knowledge of specific selection pressures, although this prediction was retrospective. The authors aimed to demonstrate that a language model could utilize only a wild-type sequence to suggest a manageable set of variants for experimental testing—a general approach without requiring structural information or task-specific data. Prior research on protein language models and their application in directed protein evolution is also reviewed, highlighting the use of models trained on large, non-redundant protein sequence datasets to learn general evolutionary rules. The potential and limitations of antibody-specific models and models trained with binding affinity data are also discussed, setting the stage for the current study’s focus on general protein language models and their application in antibody affinity maturation.
Methodology
Seven human IgG antibodies targeting antigens from coronavirus, ebolavirus, and influenza A virus were selected for evolution. Evolutionarily plausible mutations were identified using general protein language models (ESM-1b and an ESM-1v ensemble) trained on broad protein sequence datasets (UniRef50 and UniRef90) with no antibody-specific bias. These models scored the likelihood of every single-residue substitution in the antibody variable regions (VH and VL), and substitutions assigned higher likelihood than the wild-type residue by a consensus of the six models were selected. Evolution proceeded in two rounds. In round 1, the binding affinity of single-substitution variants was measured by biolayer interferometry (BLI); in round 2, variants combining substitutions chosen on the basis of round 1 results were measured. For clinically relevant antibodies (with high initial affinity), the Kd of the monovalent Fab region was measured; for unmatured antibodies, the apparent Kd of the bivalent IgG and the Kd of high-avidity Fab fragments were measured. Evolved antibodies were further characterized for thermostability (Tm), polyspecificity (non-specific binding to membrane proteins), and predicted immunogenicity, and pseudovirus neutralization assays were conducted. Finally, the study compared the general protein language model approach with other sequence-based methods (abYsis, UniRef90, AbLang, and Sapiens) in terms of their ability to suggest avidity-enhancing substitutions in unmatured antibodies.
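The consensus-selection step described above can be sketched as follows. This is a minimal illustration, not the study's actual pipeline: the substitution names and toy scores below are hypothetical, and in the real workflow the per-model scores are log-likelihood ratios computed by six ESM language models (ESM-1b plus an ESM-1v ensemble) over the antibody variable-region sequence.

```python
from typing import Dict, List

def consensus_substitutions(
    llr_per_model: Dict[str, List[float]],
    min_agreeing_models: int = 6,
) -> List[str]:
    """Return substitutions (named e.g. 'S31Y') whose mutant-vs-wild-type
    log-likelihood ratio is positive in at least `min_agreeing_models`
    of the language models, i.e. substitutions the model consensus
    considers more evolutionarily plausible than wild type."""
    selected = []
    for sub, llrs in llr_per_model.items():
        n_agree = sum(1 for llr in llrs if llr > 0.0)
        if n_agree >= min_agreeing_models:
            selected.append(sub)
    return sorted(selected)

# Toy log-likelihood ratios (mutant minus wild type) from six models.
scores = {
    "S31Y": [0.8, 0.5, 1.1, 0.3, 0.9, 0.4],    # all six models favour the mutant
    "G55A": [0.2, -0.1, 0.6, 0.4, 0.3, 0.5],   # one model disagrees
    "T73K": [-0.4, -0.2, 0.1, -0.6, 0.0, -0.3],
}

print(consensus_substitutions(scores, min_agreeing_models=6))  # ['S31Y']
print(consensus_substitutions(scores, min_agreeing_models=5))  # ['G55A', 'S31Y']
```

Requiring unanimous (or near-unanimous) agreement across models is what keeps the candidate set small enough that only a handful of variants need experimental testing per round.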
Key Findings
All seven antibodies showed improved affinity after screening 20 or fewer variants each. Clinically relevant antibodies improved up to sevenfold; unmatured antibodies improved up to 160-fold. Many variants also showed improved thermostability (Tm > 70°C) and maintained or improved viral neutralization activity. The general protein language models effectively guided evolution across diverse protein families and selection pressures beyond antibodies (e.g., antibiotic resistance, enzyme activity), and consistently outperformed antibody-specific language models and other sequence-based methods at suggesting avidity-enhancing substitutions. The computational pipeline was highly efficient, requiring less than 1 second per antibody for prediction. About half of the affinity-enhancing substitutions fell in framework regions, which are typically not targeted in conventional affinity maturation, and some involved rare residues, indicating that the models learned both common and more complex evolutionary rules.
Discussion
The study's findings demonstrate the potential of general protein language models as a powerful tool for antibody engineering. Their efficiency and ability to suggest beneficial mutations from minimal input data offer a significant advantage over traditional methods. The success across various protein families and selection pressures indicates that evolutionary information alone serves as a potent prior for directing evolution towards higher fitness: when mutations adhere to general evolutionary rules, a substantial fraction of them improve fitness. This approach may also provide insight into natural evolutionary mechanisms. The authors note that the method is better suited to improving existing functionality than to generating novel functions, but its speed makes it attractive wherever rapid development is needed.
Conclusion
General protein language models provide a highly efficient approach to affinity maturation, surpassing traditional methods and other sequence-based approaches. This approach is particularly valuable for preclinical antibody development, due to its speed and ability to identify improved variants with minimal input. Future research could explore the integration of this unsupervised approach with supervised methods for even greater efficiency and the application of this method to proteins with less natural selection.
Limitations
The affinity improvements observed, while significant, are lower than those seen in *in vivo* evolution, which explores a much larger mutational space. The study focuses on improving existing baseline functions, and the strategy’s effectiveness may be limited when dealing with unnatural selection pressures or when wild-type sequences are already at a fitness peak. Furthermore, computational prediction of immunogenicity remains a challenge.