Medicine and Health

Algorithm for optimized mRNA design improves stability and immunogenicity

H. Zhang, L. Zhang, et al.

LinearDesign is an algorithm for designing mRNA coding sequences, developed by a team of researchers including He Zhang and Liang Zhang. It jointly optimizes structural stability and codon usage. The resulting mRNAs show improved chemical stability, higher protein expression and stronger immune responses, with potential implications for mRNA-based vaccines and therapies.

Introduction

The study addresses the challenge that mRNA vaccines, while effective and scalable, suffer from chemical instability and rapid degradation, limiting storage, distribution, and in vivo efficacy. Prior evidence links increased RNA secondary structure (lower minimum free energy, MFE) and optimized codon usage to extended mRNA half-life and improved translation. However, designing coding sequences (CDS) that jointly optimize structural stability and codon optimality is hard due to an astronomically large synonymous design space (e.g., ~2.4 × 10^632 sequences for the 1,273-aa SARS-CoV-2 spike). Conventional codon optimization improves codon usage but leaves most highly stable designs unexplored and only marginally affects stability (especially given the GC-biased codon preference in humans). The research question is whether a principled algorithm can efficiently search this space to find mRNA sequences that optimize stability and codon usage simultaneously, thereby improving chemical stability, protein expression, and immunogenicity.
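
Because the codon choice for each amino acid is independent of the others, the size of the synonymous design space is the product of the number of synonymous codons for each amino acid. The following is a minimal Python sketch, assuming the standard genetic code (NCBI translation table 1) and ignoring the stop codon; the helper names count_designs and log10_designs are illustrative, and the spike figure above is not recomputed here.

```python
from collections import Counter
from math import log10, prod

BASES = "UCAG"
# Standard genetic code (NCBI table 1) listed in UCAG order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
# Number of synonymous codons per amino acid (e.g. 6 for Leu, 1 for Met).
DEGENERACY = Counter(CODON_TABLE.values())


def count_designs(protein):
    """Exact number of coding sequences that translate to `protein`."""
    return prod(DEGENERACY[aa] for aa in protein)


def log10_designs(protein):
    """log10 of the design-space size, convenient for long proteins."""
    return sum(log10(DEGENERACY[aa]) for aa in protein)


print(count_designs("MKLEF"))  # 1 * 2 * 6 * 2 * 2 = 48
```

For a protein of realistic length the product grows exponentially, which is why the log10 form is the practical way to report it.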

Literature Review

Prior work established that mRNA structure correlates with protein expression via changes in functional half-life (e.g., Mauger et al., PNAS 2019). Codon optimization is widely used to enhance expression but does not sufficiently improve structural stability and may correlate with GC content biases in vertebrates. Earlier dynamic-programming approaches (e.g., Cohen & Skiena 2003; CDSfold, Terai et al. 2016) targeted most-stable CDS design but could not incorporate codon optimality into the objective. UTR engineering strongly affects translation, but most design tools treat CDS and UTRs separately. Recent studies (e.g., Leppek et al. 2022) explored combinatorial optimization of mRNA properties. The present work connects mRNA design to lattice parsing in computational linguistics, enabling efficient joint optimization of structure and codon usage.

Methodology

The LinearDesign algorithm formulates CDS design as an optimization over the space of all synonymous mRNA sequences encoding a target protein. Objectives:

Stability: minimize folding minimum free energy (MFE) using a standard nearest-neighbour RNA energy model.
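Joint optimization: combine stability and codon optimality, measured by the codon adaptation index (CAI). The objective to minimize is MFE − λ p log(CAI), where p is the protein length in codons and λ ≥ 0 is a tunable weight. Because CAI ≤ 1, log(CAI) ≤ 0, so the second term is a non-negative penalty for poor codon usage; λ = 0 yields stability-only design.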
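
A minimal Python sketch of the joint objective, assuming the usual definition of CAI as the geometric mean of per-codon relative adaptiveness w(c) in (0, 1]. The MFE is passed in as a number (computing it requires an RNA folding model, not shown here), and the w values below are made up for illustration. The sketch also checks that summing −log w(c) over the p codons gives −p log(CAI).

```python
from math import exp, log


def cai(codons, w):
    """Codon adaptation index: geometric mean of relative adaptiveness w(c)."""
    return exp(sum(log(w[c]) for c in codons) / len(codons))


def codon_cost(codons, w):
    """Sum of per-codon costs -log w(c); equals -p * log(CAI) for p codons."""
    return sum(-log(w[c]) for c in codons)


def joint_objective(mfe, codons, w, lam):
    """MFE - lam * p * log(CAI): lower is better; the CAI term is a non-negative penalty."""
    p = len(codons)
    return mfe - lam * p * log(cai(codons, w))


# Hypothetical relative adaptiveness values, for illustration only.
w = {"CUG": 1.0, "CUA": 0.2, "AAG": 1.0, "AAA": 0.8}
design = ["CUG", "AAA"]
mfe = -5.0  # placeholder value; a real MFE would come from a folding model
print(round(cai(design, w), 3))
print(joint_objective(mfe, design, w, lam=0.0))  # lam = 0: stability only
print(joint_objective(mfe, design, w, lam=1.0))
```

Raising λ makes designs with rarer codons (smaller w) pay a larger penalty, which is how a single parameter moves designs between stability-only and codon-optimal extremes.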

Key algorithmic components:

DFA (lattice) representation of synonymous design space: A deterministic finite-state automaton (DFA) encodes all codon choices for each amino acid as a compact graph; the per-amino-acid DFAs are concatenated to represent the full CDS. Each path from start to end corresponds to one candidate mRNA.
Lattice parsing with SCFG: RNA folding is modeled via a stochastic context-free grammar (SCFG). Lattice parsing intersects the SCFG (stability scoring) with the mRNA DFA (candidate set), folding all candidates simultaneously to find the optimal sequence/structure. The exact algorithm has worst-case cubic time in length but shows quadratic scaling empirically for practical sequence lengths (<10,000 nt).
Weighted DFAs for CAI: Each codon edge carries a cost −log w(c), where w(c) is the relative adaptiveness of codon c. Summed along a path of p codons, these costs equal −p log(CAI), so weighted SCFG–DFA intersection optimizes the joint objective with optimality guarantees.
Expressiveness: The DFA framework can represent alternative genetic codes, coding constraints, and modified nucleotides.
Linear-time approximation: A beam search variant (beam size b) yields linear-time approximate design with small energy gaps, inspired by LinearFold.
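
The lattice-parsing idea can be illustrated at toy scale. The Python sketch below is not LinearDesign: it swaps the nearest-neighbour energy model for Nussinov-style base-pair maximization (each Watson–Crick or G–U pair scores 1, and a pair must enclose at least 3 unpaired nucleotides), and its codon table covers only five amino acids of the standard code. It builds one codon trie per amino acid, concatenates them into a DFA, and runs the folding recursion over pairs of DFA nodes rather than pairs of sequence positions, so every synonymous sequence is folded at once. A brute-force check over all sequences of a short peptide confirms the result.

```python
from functools import lru_cache
from itertools import product

# Subset of the standard genetic code (RNA alphabet); enough for a toy peptide.
CODONS = {
    "M": ["AUG"],
    "K": ["AAA", "AAG"],
    "E": ["GAA", "GAG"],
    "F": ["UUU", "UUC"],
    "L": ["UUA", "UUG", "CUU", "CUC", "CUA", "CUG"],
}
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # a pair must enclose at least 3 unpaired nucleotides


def build_dfa(protein):
    """Concatenate one codon trie per amino acid into a single DFA.

    Node (k, prefix) is amino acid k after reading `prefix` of its codon;
    edges[node] lists (nucleotide, next_node) transitions."""
    edges = {}
    for k, aa in enumerate(protein):
        for codon in CODONS[aa]:
            for d in range(3):
                src = (k, codon[:d])
                dst = (k, codon[: d + 1]) if d < 2 else (k + 1, "")
                out = edges.setdefault(src, [])
                if (codon[d], dst) not in out:
                    out.append((codon[d], dst))
    return edges, (0, ""), (len(protein), "")


def pos(node):
    """Nucleotide position of a DFA node."""
    return 3 * node[0] + len(node[1])


def lattice_fold(protein):
    """Return (max base pairs, sequence) over all synonymous sequences at once."""
    edges, start, end = build_dfa(protein)
    layers = {}
    for node in list(edges) + [end]:
        layers.setdefault(pos(node), []).append(node)

    @lru_cache(maxsize=None)
    def best(u, v):
        """Best (pairs, sequence) over paths u -> v, or None if v is unreachable."""
        if u == v:
            return (0, "")
        if pos(u) >= pos(v):
            return None
        options = []
        for a, u1 in edges.get(u, []):
            rest = best(u1, v)  # nucleotide a left unpaired
            if rest is not None:
                options.append((rest[0], a + rest[1]))
            # nucleotide a pairs with nucleotide b read on a later edge x -> y
            for p in range(pos(u1) + MIN_LOOP, pos(v)):
                for x in layers.get(p, []):
                    for b, y in edges.get(x, []):
                        if (a, b) not in PAIRS:
                            continue
                        inner, outer = best(u1, x), best(y, v)
                        if inner is not None and outer is not None:
                            options.append(
                                (1 + inner[0] + outer[0], a + inner[1] + b + outer[1])
                            )
        return max(options) if options else None

    return best(start, end)


def nussinov(seq):
    """Plain Nussinov base-pair maximization on a single sequence."""
    n = len(seq)
    dp = [[0] * (n + 1) for _ in range(n + 1)]  # dp[i][j]: best score on seq[i:j]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            score = dp[i + 1][j]
            for k in range(i + MIN_LOOP + 1, j):
                if (seq[i], seq[k]) in PAIRS:
                    score = max(score, 1 + dp[i + 1][k] + dp[k + 1][j])
            dp[i][j] = score
    return dp[0][n]


protein = "MKLEF"
pairs, seq = lattice_fold(protein)
brute = max(nussinov("".join(c)) for c in product(*(CODONS[aa] for aa in protein)))
assert pairs == brute == nussinov(seq)
print(pairs, seq)
```

The brute-force line enumerates every synonymous sequence and is only feasible for tiny peptides; the lattice version never enumerates sequences, which is the property that lets the real algorithm scale to proteins such as spike.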

Experimental design and constraints:

For spike designs A–G (LinearDesign) and H (OptimumGene codon optimization baseline), all use an identical amino acid sequence, natural nucleotides, and shared UTRs. To avoid translation inhibition from structured 5′ leaders, the first 5 amino acids were excluded from LinearDesign optimization and their first 15 nt were chosen heuristically. To mitigate potential innate immune activation from long helices, designs avoided excessively long stems. Interactions between the CDS and UTRs were also assessed computationally; LinearDesign CDSs formed fewer base pairs with common UTRs.

Computational benchmarking:

Runtime scaling was evaluated on UniProt proteins in exact and beam-search modes, with and without CAI integration (λ > 0 vs λ = 0).
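
One common way to quantify empirical scaling (quadratic, linear, and so on) is the slope of a least-squares fit of log runtime against log sequence length. A minimal Python sketch; the timings below are synthetic, generated from t proportional to n², and are not measurements from the study.

```python
from math import log


def scaling_exponent(lengths, seconds):
    """Least-squares slope of log(time) vs log(length), i.e. k in time ~ c * length**k."""
    xs = [log(n) for n in lengths]
    ys = [log(t) for t in seconds]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


# Synthetic timings for illustration only (t = 1e-6 * n**2), not data from the paper.
lengths = [500, 1000, 2000, 4000]
seconds = [1e-6 * n**2 for n in lengths]
print(round(scaling_exponent(lengths, seconds), 3))
```

A slope near 2 indicates quadratic behaviour and a slope near 1 linear behaviour; real timings are noisy, so the fitted exponent is only an empirical summary over the tested length range.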

Key Findings

Computational results:

Design space and efficiency: The spike CDS has ~2.4 × 10^632 possible sequences, far too many to enumerate exhaustively. LinearDesign finds an optimal spike design in ~11 minutes (exact), with empirical quadratic scaling. CAI-integrated exact search (λ = 4) is ~15% slower than MFE-only search. Beam search (b = 500) achieves linear scaling; for spike, MFE-only design takes ~2.7 min with a ~1.2% energy gap relative to the exact solution.
Stability–codon tradeoff: Under human GC-biased codon preference, conventional codon optimization slightly improves stability but is largely orthogonal to stability optimization. LinearDesign traces the feasibility boundary by varying λ (0 → ∞), enabling access to previously unexplored high-stability regions.
Structural outcomes: For spike, the wild-type CDS has an MFE of −967.8 kcal/mol (63.4% paired) versus −2,487.3 kcal/mol (83.6% paired) for the optimally stable design, an MFE ~2.6 times as negative. For VZV gE, the optimal-CAI design has an MFE of −690.7 kcal/mol (63.3% paired), the λ = 4 design −1,043.8 kcal/mol (78.1% paired), and the optimally stable design −1,251.7 kcal/mol (83.7% paired), up to ~1.8 times as negative as the CAI-optimal design.
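
The fold changes quoted for structural outcomes are ratios of MFE values. A quick Python check using only the reported numbers; the helper name fold_lower is illustrative.

```python
# MFE values (kcal/mol) as reported in the summary.
spike = {"wild type": -967.8, "optimally stable": -2487.3}
vzv_ge = {"optimal CAI": -690.7, "lambda = 4": -1043.8, "optimally stable": -1251.7}


def fold_lower(reference, design):
    """How many times more negative the design's MFE is than the reference's."""
    return design / reference


print(round(fold_lower(spike["wild type"], spike["optimally stable"]), 2))
print(round(fold_lower(vzv_ge["optimal CAI"], vzv_ge["optimally stable"]), 2))
print(round(fold_lower(vzv_ge["optimal CAI"], vzv_ge["lambda = 4"]), 2))
```

The spike ratio (about 2.57) rounds to the ~2.6× stated in the text, and the optimally stable gE design gives about 1.81, matching the ~1.8× figure.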

Experimental results for SARS-CoV-2 spike mRNA (HEK293, mice):

Chemical stability: Non-denaturing gel mobility correlates with lower MFE across A–H. In 10 mM Mg²⁺ at 37°C, half-life t1/2: A = 20.0 h vs H = 3.9 h; in 20 mM Mg²⁺: A = 12.6 h vs H = 3.3 h.
Protein expression (HEK293, 48 h): All A–G (LinearDesign) exceed H; D and G ~2.3× higher than H; A (lowest MFE) ~2.9× higher.
Immunogenicity (C57BL/6, 2-dose IM): A–D near the optimal boundary elicit 57× to 128× higher anti-spike IgG titres and 9× to 20× higher neutralizing antibody titres than H; robust IFNγ T cell responses induced only by LinearDesign mRNAs.
Comparison with BNT-like control (unmodified nucleotides, matched UTRs): A and C degrade more slowly and yield higher protein expression than BNT; they also elicit higher binding and neutralizing antibody responses than both H and BNT.

Experimental results for VZV gE mRNA:

Chemical stability: gE-A (lowest MFE) t1/2 = 66.5 h (10 mM Mg²⁺) and 50.7 h (20 mM Mg²⁺) vs gE-Ther 10.9 h and 5.9 h, respectively.
Protein expression (HEK293): gE-B/C/D/E significantly higher than gE-Ther and gE-WT at 48 h (and 24 h). Best performers (gE-B/C/D) lie in a “sweet spot” balancing low MFE and adequate CAI; gE-A (lowest CAI) underperforms despite high stability; gE-E (highest MFE) underperforms.
Immunogenicity (mice): gE-B/C/E induce significantly higher anti-gE IgG titres than gE-Ther or gE-WT.
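
Half-lives are easier to compare when converted into the fraction of intact mRNA remaining after a fixed time. A minimal Python sketch, assuming simple first-order (exponential) decay and using the 10 mM Mg2+ half-lives reported for spike and VZV gE; the helper name fraction_remaining is illustrative.

```python
def fraction_remaining(hours, half_life):
    """Fraction of intact mRNA after `hours`, assuming first-order decay."""
    return 0.5 ** (hours / half_life)


# Half-lives (h) in 10 mM Mg2+ at 37 C, as reported in the summary.
half_lives = {"spike A": 20.0, "spike H": 3.9, "gE-A": 66.5, "gE-Ther": 10.9}
for name, t_half in half_lives.items():
    print(f"{name}: {fraction_remaining(24, t_half):.1%} intact after 24 h")
```

Under this assumption and the same assay conditions, roughly 44% of spike design A would remain intact after 24 h versus about 1% of H, which illustrates how a ~5-fold difference in half-life compounds over time.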

Discussion

The findings demonstrate that explicitly optimizing RNA secondary structure stability alongside codon usage overcomes the limitations of traditional codon optimization, which cannot access highly stable regions of the design space. The lattice parsing formulation enables efficient global search across exponentially many synonymous CDSs, yielding sequences with substantially lower MFE (more extensive base pairing), which correlates with increased chemical stability, higher protein expression, and markedly enhanced immunogenicity in vivo. The improved antibody and neutralization titres for spike, and the superior stability, expression, and immunogenicity for VZV gE, validate the approach across two antigens and different UTR contexts. The results support the hypothesis that lower MFE and appropriate codon usage synergize to improve mRNA performance. The DFA framework’s generality suggests applicability to alternative genetic codes, modified nucleotides, and additional design constraints, and the reduced interaction with UTRs indicates robustness across UTR designs.

Conclusion

This work introduces LinearDesign, a principled, efficient algorithm that reframes mRNA CDS design as lattice parsing, enabling joint optimization of structural stability and codon usage. It computes optimal or near-optimal designs for long proteins (e.g., spike) in minutes and delivers mRNAs with improved in-solution stability, higher cellular expression, and dramatically increased immunogenicity in mice—up to 128-fold higher binding antibody titres compared to a codon-optimized benchmark. The approach opens access to previously unreachable, highly stable sequence regions and is robust across different UTR pairs. Future directions include integrating modified nucleotide energy models, combining with UTR engineering, incorporating additional biological constraints (e.g., innate immune sensing motifs, translation initiation), and extending to a broad class of mRNA therapeutics encoding diverse proteins.

Limitations

Scope is limited to coding region optimization; UTRs were not optimized by the algorithm (though compatibility was assessed computationally and in VZV experiments with different UTRs).
Designs avoided very long helices to mitigate potential innate immune activation; thus the absolute lowest-MFE designs near the boundary were not pursued experimentally.
The nearest-neighbour thermodynamic model was used for standard nucleotides; modified nucleotide chemistry was not modeled, and designs used unmodified nucleotides.
Experimental validation focused on HEK293 cells and mouse models; clinical efficacy and generalizability to other cell types/species were not evaluated.
The objective balances MFE and CAI; other determinants of translation and immunogenicity (e.g., RNA–protein interactions, decay elements) were not explicitly optimized.
Approximate beam search introduces small energy gaps relative to exact solutions (though these gaps were shown to be minimal for the tested lengths).
