Efficient evolution of human antibodies from general protein language models

B. L. Hie, V. R. Shanker, et al.

This groundbreaking research by Brian L. Hie, Varun R. Shanker, Duo Xu, Theodora U. J. Bruun, Payton A. Weidenbacher, Shaogeng Tang, Wesley Wu, John E. Pak, and Peter S. Kim showcases an innovative method in which general protein language models efficiently evolve human antibodies, achieving significant improvements in binding affinity and demonstrating broad applicability across protein families.

Introduction
The study addresses whether general evolutionary information, learned from large corpora of natural protein sequences, is sufficient to guide efficient directed evolution toward higher fitness under specific selection pressures without task-specific data. The context is that conventional directed evolution requires high-throughput exploration of vast mutational spaces, much of which yields non-functional variants. The authors propose constraining search to evolutionarily plausible mutations—those consistent with patterns in natural protein sequences—to improve efficiency. They hypothesize that if plausibility correlates with fitness within this constrained regime, then protein language models, trained without antigen-specific or structural information, can suggest beneficial antibody mutations for affinity maturation.
Literature Review
Prior work shows that protein language models trained on millions of sequences capture evolutionary constraints and can correlate with experimental phenotypes and evolutionary dynamics, but mostly retrospectively or across entire landscapes. Conventional directed evolution and deep mutational scanning have revealed that stability and general biophysical constraints shape evolvability and fitness, and framework mutations can contribute substantially to antibody affinity maturation. Antibody-specific language models and site-independent frequency models exist (e.g., abYsis, AbLang, Sapiens), but their ability to generalize without task-specific supervision is limited compared to general protein LMs in this study. The authors position their approach as leveraging unsupervised, general evolutionary rules to guide efficient experimental selection with minimal measurements.
Methodology
- Models and training data: general masked protein language models ESM-1b (trained on UniRef50, ~27M sequences) and the five-model ESM-1v ensemble (trained on UniRef90, ~98M sequences) were used. These datasets contain only a few thousand antibody-related sequences and predate SARS-CoV-2 antibodies and variants of concern.
- Variant proposal (consensus scheme): for each antibody heavy (VH) and light (VL) chain, compute mutant-to-wild-type likelihood ratios for all single-residue substitutions with each language model. Select substitutions whose likelihood exceeds wild-type (α=1) in at least k models, additionally requiring that each substitution have the highest likelihood at its site in at least one model; k was typically chosen to yield ~10 single-site variants per antibody in round 1. The consensus used wild-type marginal likelihoods (scoring each substitution against the unmasked wild-type sequence) to increase stringency. A code sketch of this scheme follows the list.
- Experimental evolution: seven human IgG antibodies targeting influenza HA, ebolavirus GP, and SARS-CoV-2 Spike/RBD were affinity matured: MEDI8852 (matured), MEDI8852 UCA (unmatured), mAb114 (matured), mAb114 UCA (unmatured), S309 (matured), REGN10987 (matured), and C143 (unmatured). Round 1: 8–14 single-site variants per antibody were tested via biolayer interferometry (BLI) for binding to the target antigen (Fab KD for matured antibodies; apparent IgG KD for UCAs, then Fab KD for top variants). Round 2: beneficial or neutral substitutions from round 1 were combined into multi-mutation variants (1–11 per antibody; see the combination sketch below) and binding was remeasured. In total, 122 variants were designed; all but one expressed successfully.
- Biophysical and functional assays: thermostability by differential scanning fluorimetry (Tm); polyspecificity via the PolySpecificity Particle assay (non-specific binding to soluble membrane proteins); immunogenicity risk assessed computationally via predicted HLA class I/II peptide binders; functional neutralization by pseudovirus assays (Ebola GP, SARS-CoV-2 D614G and Beta), with IC50 from n=4 replicates.
- Benchmarking against sequence-only baselines: the LM consensus was compared with (i) site-independent frequency models (abYsis curated antibody alignments; UniRef90 MSA-derived frequencies) and (ii) antibody-specific LMs (AbLang, Sapiens). Each baseline recommended the same number of single substitutions as round 1 for three unmatured antibodies (MEDI8852 UCA, mAb114 UCA, C143), with IgG avidity measured via BLI.
- Generality tests beyond antibodies: the same consensus algorithm (α=1, k=2; loosened to k=1 or α=0.5 when the consensus set A contained five or fewer substitutions, |A|≤5) was applied to eight additional protein families across diverse selection pressures (e.g., β-lactamase antibiotic resistance, HA infectivity, PafA enzyme kinetics, GPCR ADRB2, MAPK1, P53, HIV Env, infA), with enrichment validated against published deep mutational scanning datasets via one-sided hypergeometric tests (see the enrichment sketch below).
- Computational efficiency: the pipeline predicts in <1 s per antibody (VH+VL) on a GPU and generated recommendations for 742 therapeutic antibodies (Thera-SAbDab) in ~3 minutes.
- Statistical analyses: Spearman correlations between affinity changes and neutralization; simulations to assess significance versus a site-independent UniRef90 null; parameter sweeps over α and k to assess trade-offs between stringency and enrichment.
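To make the consensus scheme concrete, here is a minimal sketch assuming the open-source fair-esm package (the model loaders named below are its published ones); wt_marginal_scores and consensus_substitutions are hypothetical helper names, and the α/k logic follows the description above rather than the authors' exact code.

```python
import math

import torch
import esm  # fair-esm package; provides ESM-1b and the ESM-1v ensemble

ALPHA = 1.0  # likelihood-ratio threshold (alpha = 1: mutant must beat wild type)
K = 2        # a substitution must pass the threshold in at least K models
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def wt_marginal_scores(model, alphabet, seq):
    """Log-likelihood ratio of every single substitution versus wild type,
    using wild-type marginals: log p(x_i = m | x) - log p(x_i = wt_i | x)."""
    batch_converter = alphabet.get_batch_converter()
    _, _, tokens = batch_converter([("wt", seq)])
    with torch.no_grad():
        log_probs = torch.log_softmax(model(tokens)["logits"], dim=-1)[0]
    scores = {}
    for i, wt in enumerate(seq):
        pos = i + 1  # offset for the BOS token prepended by the tokenizer
        wt_lp = log_probs[pos, alphabet.get_idx(wt)]
        for m in AMINO_ACIDS:
            if m != wt:
                scores[(i, wt, m)] = (log_probs[pos, alphabet.get_idx(m)] - wt_lp).item()
    return scores

def consensus_substitutions(seq, model_loaders):
    """Keep substitutions scoring above log(ALPHA) in at least K models and
    ranked top at their site in at least one model."""
    per_model = []
    for load in model_loaders:
        model, alphabet = load()
        model.eval()
        per_model.append(wt_marginal_scores(model, alphabet, seq))
    cutoff = math.log(ALPHA)
    # For each model, the best-scoring substitution at every site.
    best_at_site = [
        {site: max((s for s in scores if s[0] == site), key=scores.get)
         for site in range(len(seq))}
        for scores in per_model
    ]
    accepted = []
    for sub in per_model[0]:
        n_pass = sum(scores[sub] > cutoff for scores in per_model)
        is_top = any(best[sub[0]] == sub for best in best_at_site)
        if n_pass >= K and is_top:
            accepted.append(sub)
    return sorted(accepted)

# ESM-1b plus the five-model ESM-1v ensemble, as described above.
MODEL_LOADERS = [esm.pretrained.esm1b_t33_650M_UR50S] + [
    getattr(esm.pretrained, f"esm1v_t33_650M_UR90S_{i}") for i in range(1, 6)
]
```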
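Round-2 design then reduces, computationally, to enumerating compatible combinations of round-1 hits. A minimal sketch under the same (site, wild-type, mutant) encoding; combine_substitutions is a hypothetical helper, and in the study some combinations were manually curated rather than exhaustively enumerated.

```python
from itertools import combinations

def combine_substitutions(seq, hits, max_order=4):
    """Enumerate multi-mutation variants from beneficial or neutral round-1
    substitutions, skipping combinations that collide at the same site."""
    variants = []
    for order in range(2, max_order + 1):
        for combo in combinations(hits, order):
            sites = [site for site, _, _ in combo]
            if len(set(sites)) < len(sites):
                continue  # two substitutions at one site are incompatible
            mutant = list(seq)
            for site, wt, new in combo:
                assert mutant[site] == wt, "substitution disagrees with sequence"
                mutant[site] = new
            variants.append(("".join(mutant), combo))
    return variants
```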
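The enrichment statistic for the deep mutational scanning benchmarks is a standard one-sided hypergeometric test, sketched here with scipy; the argument names and the example counts in the comment are illustrative, not taken from the paper.

```python
from scipy.stats import hypergeom

def enrichment_pvalue(n_total, n_high_fitness, n_recommended, n_overlap):
    """One-sided hypergeometric test: probability of drawing at least
    n_overlap high-fitness variants when n_recommended variants are sampled
    without replacement from n_total, of which n_high_fitness are high-fitness."""
    return hypergeom.sf(n_overlap - 1, n_total, n_high_fitness, n_recommended)

# Illustrative only: 4 of 10 recommendations high-fitness against a 7% background.
# enrichment_pvalue(n_total=5000, n_high_fitness=350, n_recommended=10, n_overlap=4)
```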
Key Findings
- Efficient affinity maturation with minimal screening: across seven antibodies, 71–100% of first-round single-site Fab variants retained sub-micromolar binding, and 14–71% improved affinity (≥1.1-fold KD improvement). Many round-2 combinations further improved binding. Notably, 36/76 LM-recommended single substitutions (18/32 affinity-improving) fell in framework regions.
- Matured antibodies achieved further gains despite high starting affinities:
• MEDI8852: best variant improved Fab KD up to 7-fold for H7 HK17 (0.21 nM to 0.03 nM) and broadly across HAs.
• mAb114: best variant improved Fab KD 3.4-fold for Ebola GP.
• REGN10987: 1.3-fold improvement vs Beta S-6P; another design achieved a 5.1-fold improvement vs Omicron BA.1 RBD.
• S309: the best variant surpassed sotrovimab (VH N55Q), with Fab KD fold changes vs wild-type of 1.3 (Wuhan-Hu-1 S-6P), 1.7 (Beta S-6P), and 0.93 (Omicron RBD), compared with 1.1, 1.3, and 0.82 for sotrovimab.
- Unmatured antibodies showed larger gains:
• MEDI8852 UCA: best Fab KD improved 2.6-fold vs H1 Solomon; acquired breadth with 23-fold (H4 Hubei) and 5.4-fold (H7 HK17) improvements.
• mAb114 UCA: best Fab KD improved 160-fold vs Ebola GP; even excluding substitutions and sites found in matured mAb114, up to a 33-fold improvement (VH G88E/VL V43A).
• C143: best designs improved 13-fold (Beta S-6P) and 3.8-fold (Omicron RBD).
- Thermostability: of 31 LM-recommended affinity-enhancing variants tested, 21 had higher Tm than wild-type, and all remained thermostable (Tm >70°C). For example, the best S309 design had Tm 72.8°C vs 72.5°C for wild-type, whereas sotrovimab VH N55Q decreased Tm to 69.6°C.
- Polyspecificity and immunogenicity: no substantial changes in polyspecificity across the seven antibodies; all variants remained within the therapeutically viable range. No significant increase in predicted HLA class I/II peptide binders (one-sided binomial P>0.05).
- Neutralization: significant IC50 improvements were observed (Bonferroni-corrected one-sided t-test P<0.05, n=4). Best examples: 32-fold improvement for C143 vs SARS-CoV-2 Beta pseudovirus; 19-fold for C143 VL T33N-G53V vs D614G; ~2-fold for REGN10987 vs Beta; 1.5-fold for mAb114 vs Ebola. Fold change in KD correlated with fold change in IC50 (Spearman r=0.82; n=15 variants; see the sketch below).
- Originality of beneficial substitutions: ~16% (5/32) of affinity-enhancing substitutions introduced residues that are rare in natural antibody repertoires (e.g., MEDI8852 UCA VL G95P, replacing glycine, found at that site in 99% of repertoire sequences, with proline, found in <1%), indicating the model can propose non-intuitive yet beneficial changes.
- Outperformance of baselines: general protein LMs consistently outperformed abYsis, UniRef90 site-independent frequencies, AbLang, and Sapiens at recommending avidity-enhancing substitutions to UCAs (simulation-based P=0.0085 vs UniRef90). Site-independent models missed high-fitness substitutions such as C143 VL T33N/G53V and MEDI8852 UCA VL G95P.
- Generality across proteins: LM-recommended substitutions were significantly enriched for high-fitness variants in 6/9 datasets. Examples of enrichment over background: β-lactamase high ampicillin resistance, 7% (background) vs 40% (LM-guided); HA high infectivity, 7% vs 31%; PafA improved kinetics, 3% vs 20%. Even at stringent cutoffs, top LM-selected variants often exceeded the 99th percentile of fitness.
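As a worked illustration of the fold-change arithmetic and the affinity-potency correlation above: fold improvement is the wild-type value divided by the variant value, and the reported relationship is a Spearman rank correlation. A minimal sketch using scipy; the variant lists passed to the correlation would come from the paper's measurements.

```python
from scipy.stats import spearmanr

def fold_improvement(wt_value, variant_value):
    """Fold change in KD or IC50 (same units, e.g. nM);
    values >1 mean the variant binds or neutralizes better than wild type."""
    return wt_value / variant_value

# e.g. MEDI8852 vs H7 HK17: 0.21 nM -> 0.03 nM is a 7-fold improvement.
assert round(fold_improvement(0.21, 0.03)) == 7

def affinity_potency_correlation(fold_kd, fold_ic50):
    """Spearman rank correlation between fold changes in binding (KD) and
    neutralization (IC50); the paper reports r = 0.82 over 15 variants."""
    r, p = spearmanr(fold_kd, fold_ic50)
    return r, p
```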
Discussion
The findings demonstrate that evolutionary plausibility learned by general protein language models is a strong prior for efficient directed evolution, enabling improvements in specific fitness measures (e.g., antigen-binding affinity) without any task-specific supervision, structural data, or antigen information. By constraining mutations to those favored by evolution, the search space is biased toward functional and evolvable regimes, substantially increasing the hit rate among a small number of tested variants. The approach yielded affinity gains even for highly matured antibodies and larger gains for unmatured antibodies, with many beneficial substitutions occurring in framework regions, consistent with nature’s use of distal or stability/geometry-modulating mutations. Improvements in binding translated to better neutralization potency, and models generalized to other protein families and selection pressures. The results suggest that general evolutionary rules can guide laboratory evolution efficiently and may mirror natural mechanisms (e.g., somatic hypermutation biases) that accelerate adaptation.
Conclusion
This work shows that consensus recommendations from general protein language models enable rapid, low-throughput affinity maturation of human antibodies, achieving up to 7-fold improvements for matured and up to 160-fold improvements for unmatured antibodies in only two rounds, screening ≤20 variants per antibody. The method preserves desirable biophysical properties, improves neutralization, and outperforms antibody-specific and site-independent baselines. The same strategy generalizes across diverse proteins and fitness definitions, enriching for high-fitness variants with minimal measurements. Practical implications include using LM-guided priors to replace or complement random mutagenesis, reallocating experimental capacity to combinatorial variants, and serving as a baseline for future supervised or structure-informed models. Future work could integrate supervised fine-tuning over iterative rounds, explore larger combinatorial designs, refine antibody-specific language models, and investigate the mechanistic bases of framework-mediated affinity gains.
Limitations
- The magnitude of affinity gains is generally smaller than that achieved by extensive in vivo affinity maturation, which explores far larger mutational spaces.
- The strategy improves existing functions rather than conferring entirely new ones; performance may degrade when selection pressures are highly unnatural or when the wild-type is already at a local fitness peak.
- Some evolved variants (e.g., for MEDI8852 and its UCA) showed modest decreases in Tm despite remaining thermostable.
- Round-2 combinatorial selection was manually curated for some antibodies owing to combinatorial explosion, potentially missing optimal combinations.
- The approach relies on priors learned from natural protein sequences; rare functional innovations not reflected in natural data may be underexplored.