Medicine and Health
MAIVESS: streamlined selection of antigenically matched, high-yield viruses for seasonal influenza vaccine production
C. Gao, F. Wen, et al.
Seasonal influenza vaccines must be regularly updated because hemagglutinin (HA), the main antigenic target, undergoes antigenic drift allowing immune escape. Selecting candidate vaccine viruses (CVVs) that match circulating strains and grow to high yield in eggs or cell culture is resource- and time-intensive within the WHO GISRS pipeline. Conventional approaches, including egg/cell adaptation and reassortment with high-yield donor strains, can take months and risk undesirable antigenic changes. Existing computational models infer antigenic variants from sequence but do not jointly predict antigenic match and growth yield directly from clinical sequences. This study proposes MAIVESS, a machine-learning framework to predict antigenicity and yield phenotypes from HA sequences to enable rapid identification of naturally circulating, antigenically matched, high-yield strains for vaccine production, demonstrated here for A(H1N1)pdm09.
Prior work has shown that egg or cell adaptation can introduce HA mutations that alter antigenicity and that reassortment does not always improve yield. Computational models have been developed to map antigenic evolution and predict antigenic variants from sequence for influenza A (e.g., H3N2, H1N1), but these do not directly identify strains that both antigenically match vaccine targets and exhibit high yield in eggs or cells. Studies have also documented neuraminidase (NA) antigenic drift and host-dependent glycosylation effects on HA receptor binding. Together, the literature highlights the need for integrated approaches that consider antigenicity and growth phenotypes without relying on prolonged adaptation.
MAIVESS was designed to learn sequence features of three HA-linked phenotypes: antigenicity, growth yield in cells and eggs, and glycan receptor binding. A random mutant library targeting the HA receptor binding site (RBS; H1 residues 119–241; H3 126–244) of A/California/04/2009 (CA/04) was generated by error-prone PCR. From 822 HA mutant plasmids (each with 1–7 mutations), 189 viable reassortant viruses (mutant HA, CA/04 NA, PR8 internal genes) were rescued after up to three passages; substitutions within the RBS pocket were not viable. Phenotyping included: (1) antigenicity via hemagglutination inhibition (HAI) with ferret antisera; (2) yield quantified as TCID50 in MDCK cells and embryonated chicken eggs (high-yield defined as >10-fold TCID50 increase vs WT 6:2 rgCA/04 on same substrate); and (3) glycan binding profiling by microarray of 75 glycans grouped into 27 substructure features, with select validation by biolayer interferometry (BLI). Sequence-serology integration combined the 189-mutant HAI data with archival HAI for seasonal H1 (1977–2009) and A(H1N1)pdm09 (2009–2016). Feature engineering included amino-acid residues and N-linked glycosylation sites, with surface exposure derived from HA structure (PDB 3LZG). For antigenicity and glycan-binding, a multi-task learning group-guided sparse learning model (MTL-GGSL) was used to jointly learn across HAI datasets (tasks) and feature groups (amino acids and glycosylation). For yield, a generalized hierarchical sparse model (GHSM) captured single features and up to third-order interactions (3468 second-order and 12,337 third-order interactions). Predictive models estimated antigenic distance y = x^T(w_global + (1 − μ)w_local), with μ=0.4, and yield scores via GHSM-derived weighted feature sums. MAIVESS was then applied to 11,424 A(H1N1)pdm09 HA sequences (2009–2020) from GISAID to predict antigenic clusters and growth phenotypes (HYcell, HYegg, HYboth). 
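Two of the steps above lend themselves to a short illustration: detecting N-linked glycosylation sites as sequence features, and evaluating the antigenic-distance formula y = x^T(w_global + (1 − μ)w_local). The Python sketch below is illustrative only and is not the authors' implementation. The function names, the toy feature vector, and the weights are hypothetical. The sequon rule (N-X-S/T with X not proline) is the standard definition and is assumed here to be what the study used.

```python
import re

# N-linked glycosylation sequon: N-X-S/T, where X is any residue except proline.
# The lookahead makes the search report overlapping sequons as well.
SEQUON = re.compile(r"(?=N[^P][ST])")


def glycosylation_sites(ha_sequence):
    """Return 1-based positions of the asparagine in each N-X-S/T sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(ha_sequence)]


def antigenic_distance(x, w_global, w_local, mu=0.4):
    """Evaluate y = x^T (w_global + (1 - mu) * w_local) for one feature vector.

    x        : feature vector (hypothetical encoding of sequence features)
    w_global : weights shared across all HAI datasets (tasks)
    w_local  : weights specific to one dataset
    mu       : mixing parameter; the summary reports mu = 0.4
    """
    if not (len(x) == len(w_global) == len(w_local)):
        raise ValueError("x, w_global and w_local must have the same length")
    return sum(
        xi * (wg + (1.0 - mu) * wl)
        for xi, wg, wl in zip(x, w_global, w_local)
    )


if __name__ == "__main__":
    # Toy peptide, not a real HA sequence.
    print(glycosylation_sites("ANGSNPTNAT"))

    # Toy weights, chosen only to show the arithmetic.
    x = [1, 0, 1]
    w_global = [0.5, 0.2, 1.0]
    w_local = [0.5, 0.0, -0.5]
    print(antigenic_distance(x, w_global, w_local))
```

In this toy peptide the motif N-P-T is skipped because proline occupies the middle position, which is the behavior the sequon rule requires.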
Experimental validation synthesized HA/NA from four MAIVESS-selected epidemic strains (rgSP/16, rgCQ/17, rgBRU/19, rgMAS/20) on PR8 backbone to test antigenicity (antigenic cartography from HAI) and yield (TCID50 in cells and eggs). Structural modeling (based on CA/04 HA complexes PDB 3UBN/3UBQ) evaluated the effects of N159K, K166Q, and S206T on receptor binding pocket conformation.
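The yield labels used throughout (HYcell, HYegg, HYboth) follow from the stated threshold of a more than 10-fold TCID50 increase over WT on the same substrate. A minimal Python sketch of that labeling rule follows. The function names and example titers are hypothetical, and treating the three labels as mutually exclusive is an assumption based on how the counts are reported.

```python
def fold_change(titer, wt_titer):
    """Ratio of a virus titer to the wild-type titer on the same substrate."""
    if titer <= 0 or wt_titer <= 0:
        raise ValueError("titers must be positive")
    return titer / wt_titer


def yield_label(cell_titer, wt_cell_titer, egg_titer, wt_egg_titer,
                threshold=10.0):
    """Assign a yield label from TCID50 titers in cells and eggs.

    A substrate counts as high-yield when the fold change over WT is
    strictly greater than `threshold` (10-fold in the summary).
    """
    high_in_cells = fold_change(cell_titer, wt_cell_titer) > threshold
    high_in_eggs = fold_change(egg_titer, wt_egg_titer) > threshold
    if high_in_cells and high_in_eggs:
        return "HYboth"
    if high_in_cells:
        return "HYcell"
    if high_in_eggs:
        return "HYegg"
    return "not high-yield"


if __name__ == "__main__":
    # Hypothetical titers (TCID50/mL), used only to exercise the rule.
    print(yield_label(1.5e7, 1.5e5, 8.0e8, 1.0e6))
    print(yield_label(2.0e6, 1.5e5, 2.0e6, 1.0e6))
```

Comparing each titer with WT grown on the same substrate matters because baseline titers in MDCK cells and in eggs differ, so a single absolute cutoff would not be comparable across the two systems.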
- Antigenicity: Of 189 mutants, only 5 showed ≥4-fold HAI titer reduction vs homologous antisera; triple mutant D131E-S193T-A198S escaped neutralization by CA/04 ferret antisera. MAIVESS identified 30 HA residues associated with antigenic change, largely within/near Sa, Sb, Ca1/Ca2, and Cb sites; position 225 overlapped with reported egg-adaptive sites. Two antigenic clusters were predicted among A(H1N1)pdm09: CA/09-like and WI/19-like, with N159K driving drift to WI/19-like.
- Yield: 14 HYcell and 29 LYcell mutants were identified; the N159D-K166I mutant reached 1.52×10^7 TCID50/mL (~100-fold above WT in cells). In eggs, three mutants (D131E-S193T-A198S, N159D-K166I, I169F-D225G) achieved ~800-fold higher titers than WT and were HYboth. MAIVESS associated 38 residues with yield, predominantly on HA surface near the RBS; outcomes were residue- and chemistry-specific (e.g., at 142, nonpolar favored yield; polar/charged reduced).
- Glycan binding: HYcell mutants increased binding to 6′SLN (Neu5Acα2-6Galβ1-4GlcNAc); HY and HYboth expanded binding to 3′SLN and sLe^x, with some increased Neu5Gc-terminated glycan binding. BLI confirmed broadened specificity for certain HYboth candidates. Structural modeling supported N159K-mediated stabilization of 3′SLN binding via interactions with Q192 and Q196 and potential effects of K166Q (130-loop) and S206T (220-loop).
- Population predictions: Among 11,424 HA sequences (2009–2020), MAIVESS predicted 155 HYcell, 433 HYegg, and 761 HYboth strains. Of HYboth, 294 were CA/09-like and 467 WI/19-like; HYboth prevalence rose markedly after WI/19-like emergence (2019: 256/2198, 11.65%; 2020: 386/895, 43.13%). WI/19 was predicted to yield ~105-fold (cells) and ~23-fold (eggs) higher than CA/04.
- Experimental validation: Four MAIVESS-selected reassortants (rgSP/16, rgCQ/17, rgBRU/19, rgMAS/20) showed antigenic properties matching predictions (two CA/04-like, two WI/19-like) and reached >10^8 TCID50/mL in both eggs and cells, at least 100-fold above WT.
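The population-level figures reported in the results can be checked with simple arithmetic. The short Python sketch below recomputes the yearly HYboth prevalence and the cluster split from the counts given in the summary; the variable and function names are illustrative only.

```python
def prevalence_pct(n_hits, n_total):
    """Percentage of sequences carrying a phenotype, rounded to 2 decimals."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return round(100.0 * n_hits / n_total, 2)


# Counts as reported in the summary: (predicted HYboth, sequences that year).
hyboth_by_year = {2019: (256, 2198), 2020: (386, 895)}

# Cluster split of the predicted HYboth strains.
hyboth_by_cluster = {"CA/09-like": 294, "WI/19-like": 467}

if __name__ == "__main__":
    for year, (n_hyboth, n_total) in sorted(hyboth_by_year.items()):
        print(year, prevalence_pct(n_hyboth, n_total))
    # The two clusters together should account for all HYboth predictions.
    print(sum(hyboth_by_cluster.values()))
```

The recomputed percentages agree with the reported 11.65% and 43.13%, and the two cluster counts add up to the 761 HYboth strains.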
MAIVESS addresses the bottleneck in influenza vaccine seed selection by predicting, from HA sequence alone, strains that simultaneously match circulating antigenicity and achieve high yields in eggs and cells. The approach learned mechanistic sequence features underlying antigenic drift and growth, identifying key residues near the RBS that modulate glycan binding breadth and yield. The framework recapitulated known antigenic transitions (e.g., N159K-driven drift to WI/19-like) and revealed that diversification or enhanced avidity in glycan binding (notably expanded 3′SLN recognition) can facilitate high-yield phenotypes without laboratory adaptation. At the population level, MAIVESS detected increasing frequencies of HYboth strains after 2018–2019, suggesting natural selection of variants compatible with both cell- and egg-based production. Experimental validation confirmed that MAIVESS-selected strains retained antigenic match to target clusters and achieved substantially improved yields, supporting the utility of sequence-based preselection to shorten timelines. While HA was the focus, the authors note that NA antigenicity and additional viral and host factors also influence antigenicity and growth, motivating future model extensions that integrate NA and human serological responses. Overall, MAIVESS demonstrates a practical path to accelerate CVV identification directly from surveillance sequences while preserving antigenic fidelity and ensuring manufacturability.
This work introduces MAIVESS, a machine-learning framework that integrates sparse learning and multi-task modeling to predict antigenic distances and growth yields of influenza A(H1N1)pdm09 directly from HA sequences. Using a targeted mutant library and public serology/sequence datasets, MAIVESS identified antigenicity- and yield-associated residues, mapped antigenic clusters, and prioritized high-yield, antigenically matched candidates. Large-scale predictions uncovered a rising prevalence of HYboth strains in recent seasons. Prospective validation with four reassortant viruses confirmed both antigenic match and yield gains of at least 100-fold over WT in eggs and cells. According to the authors, MAIVESS can shorten the selection of vaccine candidates from months to days and can be adapted to other influenza subtypes. Future research should expand training across subtypes, incorporate NA and comprehensive human immune datasets, and further probe the mechanistic links between HA mutations, glycan specificity, and production yield.
- Current predictions are based on HA sequence; neuraminidase antigenicity and other gene segments that affect fitness and yield are not yet included.
- Antigenicity training data rely on ferret antisera, which may not fully reflect human adult immune histories with complex priming.
- Host factors (e.g., innate responses, cell line or egg glycome variability) and viral factors beyond HA-receptor binding can impact yield, potentially limiting generalizability.
- Some high-yield epidemic strains (e.g., WI/19) bind only 6′SLN, indicating additional glycans or mechanisms are involved that are not fully captured.
- Rescue failures for mutations within the RBS pocket constrain exploration of certain sequence space.
- Yield was proxied by TCID50 titers; although correlated with protein/HA yields, it is not a direct HA content measurement used in manufacturing.
- Glycan microarray and BLI employed a limited panel of glycans and analogs, which may not fully represent in vivo receptor repertoires.