AGILE platform: a deep learning powered approach to accelerate LNP development for mRNA delivery

Y. Xu, S. Ma, et al.

AGILE is a platform for developing ionizable lipid nanoparticles (LNPs) for mRNA delivery. This research by a team from the University of Toronto shows how deep learning and combinatorial chemistry can enhance LNP customization for diverse therapeutic applications.

Introduction

Messenger RNA (mRNA) has broad biomedical applications but requires effective delivery systems due to instability and nuclease susceptibility. Ionizable LNPs enable efficient mRNA encapsulation, endosomal escape, and reduced toxicity at physiological pH. Existing FDA-approved RNA LNPs use distinct ionizable lipids, underscoring their critical role. Despite advances in rational lipid design and combinatorial chemistry, exploring sufficiently large chemical spaces remains challenging and costly. There is a need for strategies that accelerate discovery of ionizable lipids tailored for specific cell types and tissues. Deep learning can extract structure-activity relationships from large datasets, offering a data-driven route to navigate vast molecular spaces. This work proposes AGILE, a platform that combines self-supervised pre-training on a large virtual library and supervised fine-tuning on high-throughput wet-lab data to predict mRNA transfection potency (mTP) across cell types and rapidly identify potent, cell-selective ionizable lipids.

Literature Review

Prior work used multi-component combinatorial chemistry (e.g., Ugi three-component reactions (3-CR), Michael addition) to generate diverse ionizable lipid libraries that could be synthesized and screened rapidly, with successes such as STING-activating lipids for vaccines and libraries of more than 700 lipids yielding lung-optimized lipids. However, synthesizing and testing hundreds of thousands of candidates is time-consuming and expensive, which limits exploration of chemical space. Deep learning has shown promise in chemical discovery by learning from large molecular datasets and generalizing to unobserved molecules. Earlier ML-assisted LNP efforts lacked extensive pre-training and integration of structural encoders; AGILE addresses this gap by combining a GNN-based graph encoder, molecular descriptors, self-supervised pre-training (warm-started from MolCLR, which was trained on >10M molecules), and supervised fine-tuning with high-throughput mTP data.

Methodology

AGILE is a three-stage platform.
Stage 1: Virtual library design and self-supervised pre-training. A 60,000-member virtual ionizable lipid library (head groups with ionizable amines; two tails with varied chain lengths, saturation, and ester positions, spanning C6–C26) was constructed using Ugi combinatorial chemistry rules via the ChemAxon Markush editor and exported as SMILES. A graph encoder (GIN-based GNN warm-started from MolCLR) was further pre-trained on the virtual lipids via contrastive learning, using graph augmentations (atom masking and bond deletion) to learn lipid-structure representations.
Stage 2: High-throughput synthesis, screening, and supervised fine-tuning. An automated liquid-handling HTS platform executed one-pot Ugi 3-component reactions to synthesize 1,200 ionizable lipids (20 head groups; 12 ester-linked tails; 5 isocyanide tails). LNPs were formulated at standard molar ratios with helper lipids (e.g., DOPE, cholesterol, PEG-lipid) and loaded with firefly luciferase mRNA (mFLuc). mTP was defined as the log2 of the luminescence intensity ratio between transfected and untreated cells at 24 h. HeLa and RAW 264.7 cell lines were used for in vitro screening. The AGILE model was fine-tuned via supervised regression on mTP using both GNN structural embeddings and Mordred molecular descriptors (reduced to 813 descriptors after preprocessing and feature selection), with an 80/10/10 scaffold-based train/validation/test split.
Stage 3: Candidate library prediction and diversity-aware ranking. A 12,000-member candidate library was filtered from the virtual set by: retaining tertiary amine-containing lipids; excluding tails C18; and ensuring reagent commercial availability. An ensemble of the top 5 fine-tuned models (selected by RMSE and PCC) predicted mTP; for each molecule, the score was the mean minus the standard deviation of the ensemble predictions.
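The mTP definition from Stage 2 and the ensemble score from Stage 3 can be illustrated with a minimal Python sketch. The function names `mtp` and `ensemble_score` are hypothetical, and the use of the population standard deviation is an assumption: the summary does not say which variant the authors used.

```python
import math
import statistics
from typing import Sequence


def mtp(lum_transfected: float, lum_untreated: float) -> float:
    """mTP as log2 of the luminescence ratio (transfected / untreated)."""
    if lum_transfected <= 0 or lum_untreated <= 0:
        raise ValueError("luminescence readings must be positive")
    return math.log2(lum_transfected / lum_untreated)


def ensemble_score(predictions: Sequence[float]) -> float:
    """Mean minus standard deviation of the ensemble's predicted mTP.

    ASSUMPTION: population standard deviation (pstdev); the source only
    says "mean minus standard deviation".
    """
    return statistics.mean(predictions) - statistics.pstdev(predictions)


# An 8-fold luminescence increase over untreated cells gives mTP = 3.
print(mtp(8000.0, 1000.0))

# Two hypothetical lipids with the same mean prediction (5.0):
# the one the five models agree on receives the higher score.
agreed = [5.0, 5.0, 5.0, 5.0, 5.0]
disputed = [3.0, 4.0, 5.0, 6.0, 7.0]
print(ensemble_score(agreed), ensemble_score(disputed))
```

Subtracting the standard deviation penalizes candidates on which the ensemble members disagree, so the ranking favors lipids that are predicted to be potent with low model uncertainty.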
Candidates were ranked within head groups and tail combinations to enhance structural diversity; the top five head groups and the top three tail sets per head group were chosen (15 per cell-type screen) for synthesis and validation.
Experimental details: Automated synthesis/formulation and luciferase assays were performed on liquid-handling robots; properties of robot-formulated LNPs matched manual methods. Post-HTS, lead formulations underwent Design of Experiments (Box–Behnken) optimization. In vivo validation included intramuscular injection of mRNA-LNPs in mice with IVIS imaging, biodistribution using Cy5-labeled mRNA, and Cre mRNA in mTmG reporter mice. Benchmarks included MC3 (DLin-MC3-DMA) and ALC-0315. Cytotoxicity and stability tests, encapsulation efficiency (Ribogreen), size/PDI/zeta potential (DLS), and pKa (TNS assay across pH 2–11) were performed.
Model architecture: Five-layer GIN with ReLU; average pooling to 512-dim, then an MLP to a 256-dim structure embedding; a descriptor encoder MLP to 100-dim; concatenation and a two-layer MLP for prediction. Pre-training used Adam, weight decay 1e-5, temperature 0.1, batch size 512, 100 epochs. Fine-tuning used Adam, weight decay 1e-5, batch size 128, 30 epochs.
Baseline comparisons: Ridge, Lasso, Gradient Boosting, and SVM, trained on descriptors only.
Model interpretation: Integrated Gradients (Captum) for descriptor saliency; candidate similarity networks via cosine similarity of learned embeddings; ExMol for counterfactual regions in structures.
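The diversity-aware selection in Stage 3 (top five head groups, top three tail sets per head group, 15 candidates in total) can be sketched as below. The `Candidate` type and `select_diverse` function are hypothetical names, and ranking head groups by their best-scoring candidate is an assumption, since the summary does not state how head groups were ordered.

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple


class Candidate(NamedTuple):
    head: str     # head-group identifier
    tails: str    # identifier for the tail combination
    score: float  # ensemble score (mean minus std of predicted mTP)


def select_diverse(candidates: Iterable[Candidate],
                   n_heads: int = 5,
                   n_tails: int = 3) -> List[Candidate]:
    """Pick the best tail sets within each of the best head groups."""
    by_head = defaultdict(list)
    for cand in candidates:
        by_head[cand.head].append(cand)

    # Rank the tail combinations inside every head group, best first.
    for group in by_head.values():
        group.sort(key=lambda c: c.score, reverse=True)

    # ASSUMPTION: a head group is ranked by its best candidate's score.
    top_heads = sorted(by_head,
                       key=lambda h: by_head[h][0].score,
                       reverse=True)[:n_heads]

    return [c for head in top_heads for c in by_head[head][:n_tails]]


# Synthetic library: 7 head groups x 4 tail sets, made-up scores.
library = [Candidate(f"H{h}", f"T{t}", 10.0 * h + t)
           for h in range(7) for t in range(4)]
picked = select_diverse(library)
print(len(picked), sorted({c.head for c in picked}))
```

Capping the number of picks per head group keeps one dominant head group from filling the whole shortlist, which a plain global top-15 by score would allow.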

Key Findings

AGILE prediction performance and ranking: On the 1,200-lipid HeLa dataset, AGILE outperformed Ridge, Lasso, Gradient Boosting, and SVM (R²=0.249; PCC=0.573). Predicted vs actual mTP stratified well by head groups and tails; lipids predicted in the top 16% had a 0.41 probability of being in the top tier in vitro.
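For readers comparing the reported metrics, the sketch below computes PCC, R², and RMSE in plain Python. It assumes R² is the coefficient of determination (1 minus the ratio of residual to total sum of squares); the summary does not state the definition used. Under that assumption R² need not equal PCC squared (0.573² ≈ 0.328), because R² also penalizes systematic offsets that correlation ignores. The function names and toy data are illustrative only.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient (PCC)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
                     / len(y_true))


# Toy data: predictions are perfectly correlated with the truth but
# shifted by +1, so PCC is 1 while R^2 is only 0.2.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]
print(pearson(actual, predicted), r_squared(actual, predicted),
      rmse(actual, predicted))
```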
Muscle-targeted candidate H9: From a 12,000-member candidate set (HeLa-tuned model), AGILE selected top 15 lipids; H9 showed markedly superior in vitro mTP versus initial library leads and consistently surpassed ALC-0315 in HeLa for both DOPE- and DSPC-containing formulations. After DoE optimization, H9 LNPs delivered mRNA to mouse muscle by IM injection with 7.8-fold higher efficiency than MC3 and comparable to ALC-0315. H9 exhibited significantly lower off-target liver expression than MC3 and ALC-0315. In mTmG Cre-reporter mice, H9 and ALC-0315 produced similar GFP at the muscle injection site, but H9 showed significantly lower liver GFP. Biodistribution with Cy5-mRNA indicated reduced liver accumulation for H9 vs ALC-0315, supporting muscle specificity via decreased non-target distribution. In vaccination with mOVA, H9 induced anti-OVA IgG comparable to ALC-0315 and showed lower ALT/AST, suggesting reduced hepatotoxicity.
HeLa as screening surrogate for IM: Correlation of in vitro mTP with IM in vivo was similar in HeLa and C2C12 cells (PCC 0.78 vs 0.756), supporting HeLa for initial HTS for IM delivery.
Macrophage-targeted candidate R6: After fine-tuning on RAW 264.7 data, AGILE identified 15 macrophage candidates; 11 exceeded MC3 in vitro. R6 was optimized (including DOTAP helper lipid) and showed superior mRNA delivery in RAW 264.7 vs H9 and MC3; for GFP-mRNA, R6 achieved ~5-fold higher transfection than H9 and MC3. R6 underperformed in muscle vs H9 in vivo IM, consistent with cell-type specificity.
Model insights: Key descriptors for HeLa included VSA_EState3 and SssNH; for RAW 264.7, SpDiam_Dzi and VR3_D were influential, indicating cell-type-specific physicochemical preferences. In RAW 264.7 predictions, Tail 1 carbon length exhibited a non-monotonic relationship with mTP: increasing from C10 to C12 improved predicted potency, while further increases reduced it. Variance in predicted mTP increased for Tail 1 >C12. Correlations (PCC) between predicted mTP and chain length were −0.58 (top head group) and −0.39 (all head groups) for Tail 1, versus −0.15 for Tail 2. For HeLa, the chain-length correlation was weaker (PCC −0.22). Mid-ranked (31–45) candidates did not exceed ALC-0315, and bottom-ranked candidates showed minimal mFLuc expression, consistent with AGILE predictions.

Discussion

AGILE integrates self-supervised representation learning with HTS-derived supervised fine-tuning to decode structure–activity relationships for ionizable lipids and accelerates discovery of potent, cell-selective LNPs. Pre-training on lipid-like structures confers generalizable molecular understanding that improves prediction accuracy, as supported by ablation. Ensemble prediction with diversity-aware ranking enables efficient triage of large libraries. Experimentally, AGILE identified previously unreported ionizable lipids (H9 and R6) with strong, cell-specific performance: H9 provided IM muscle-selective delivery with reduced liver exposure and favorable safety signals versus a clinical benchmark, and R6 enabled efficient macrophage transfection. Model interpretation revealed cell-dependent descriptor importance and an asymmetric role of tail lengths, especially Tail 1 in macrophage delivery, providing actionable design rules. These findings demonstrate the feasibility of rapidly customizing LNPs to target distinct cell types, potentially broadening mRNA therapy applications while reducing development time and cost.

Conclusion

AGILE is a deep learning–combinatorial chemistry platform that reduces ionizable lipid discovery timelines from months/years to days by coupling virtual library pre-training, wet-lab fine-tuning, and diversity-aware candidate selection. It successfully discovered muscle- and macrophage-optimized lipids (H9, R6) that match or exceed benchmark performance with improved specificity and safety profiles. The platform offers interpretable insights into key molecular features (e.g., descriptor importance, tail-length effects) that guide rational LNP design. Future work will expand pre-training and fine-tuning datasets (including in vivo and human-derived data), incorporate additional combinatorial chemistries and biophysical/functional measurements, and potentially integrate generative models (e.g., diffusion) to propose novel, application-specific ionizable lipids.

Limitations

Training data quality: HTS used crude ionizable lipids without purification during screening; lack of dialysis/filtration and full physicochemical characterization may introduce noise. Purifying lipids and high-throughput characterization would likely improve model accuracy.
Fixed formulation during HTS: Using a standard formulation ratio across thousands of compounds may overlook lipids requiring specific formulations; DoE optimization is applied only post-selection.
Generalizability: The fine-tuned model was trained on 1,200 Ugi 3-CR lipids; performance on out-of-distribution chemotypes may be limited. Pre-training mitigates but does not eliminate this.
In vitro–in vivo gap: Reliance on cell-based assays can miss vectors with weak in vitro but strong in vivo activity; broader inclusion of in vivo and human-derived data is needed.
Non-generative model: Current AGILE ranks existing structures; it does not generate new lipids, limiting exploration to library composition.
