Accelerating the prediction and discovery of peptide hydrogels with human-in-the-loop
T. Xu, J. Wang, et al.
Explore how peptide hydrogel formation can be predicted, through research by Tengyan Xu, Jiaqi Wang, Shuang Zhao, Dinghao Chen, Hongyue Zhang, Yu Fang, Nan Kong, Ziao Zhou, Wenbin Li, and Huaimin Wang. The study combines machine learning and experiments in a human-in-the-loop workflow that predicts tetrapeptide hydrogels with an 87.1% hit rate among its top 8,000 ranked candidates, and it showcases a de novo-designed peptide hydrogel that boosts immune responses as a vaccine adjuvant.
Introduction
Hydrogels are water-immobilizing soft materials formed by self-assembled matrices and have broad relevance in nature and applications in materials science, biomedicine, and semiconductors. Peptidic hydrogels are particularly attractive due to high biocompatibility, low immunogenicity, and similarity to extracellular matrices. However, current discovery and design of peptide hydrogels often rely on sequences derived from natural proteins, expert intuition, or serendipity, limiting efficiency and scope. While coarse-grained molecular dynamics (CGMD) can model peptide self-assembly and inform design rules for short peptides, brute-force simulations across the vast peptide sequence space are intractable, especially for longer peptides. There is a critical need for accurate prediction of hydrogel formation and de novo design strategies to broaden the library of hydrogel-forming peptides. This study addresses that need by integrating CGMD, machine learning (ML), and iterative experimental feedback to create an improved score function for predicting tetrapeptide hydrogelation and to validate functional applications.
Literature Review
Prior work has used CGMD to model peptide self-assembly, providing insights into aggregation and morphology. Ulijn and Tuttle's groups developed approaches to derive design rules for di- and tripeptide aggregation/self-assembly, partially overcoming reliance on serendipity. Other computational studies have attempted gelation prediction using physicochemical descriptors and molecular dynamics-based descriptors. However, MD applied to selected peptides primarily informs derivatives of original sequences and does not scale to the enormous peptide sequence space. Systematic prediction and de novo design of peptidic hydrogels remain less explored and challenging. This work builds upon these foundations by coupling CGMD-derived aggregation propensity with ML predictions and experimental classification to efficiently traverse the full space of 160,000 natural tetrapeptides.
Methodology
Overview: The study integrates CGMD simulations to compute aggregation propensity (AP), ML regression to predict AP across the full tetrapeptide space, and iterative human-in-the-loop experimental validation with ML classification to correct gelation predictions via a gelation corrector (Cg). This yields an updated score, AP_HC, that ranks peptides for hydrogel formation.
Computational AP generation: Latin hypercube sampling was used to select 10,000–15,000 tetrapeptide sequences from the total space (20^4 = 160,000). CGMD simulations were run in GROMACS with the Martini 2 force field. Each system comprised 300 zwitterionic tetrapeptides randomly placed and solvated in a 13 nm cubic box (~18,700 water beads), neutralized with Na+/Cl−, and held at 300 K and 1 bar with Berendsen coupling. Energy minimization was followed by a production run of 5×10^6 steps with a 25 fs time step (125 ns). For morphology analysis, longer 1,250 ns simulations were performed and averaged over 8 runs. Aggregation propensity was defined as AP = SASA_initial / SASA_final, the ratio of the solvent-accessible surface area at the start of the simulation to that at the end.
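The numbers in this setup can be checked with a few lines of arithmetic. The sketch below computes the size of the tetrapeptide space, the simulated time of the production run, and AP from two SASA values. The function names are illustrative, and the SASA inputs are hypothetical placeholders rather than values from the study.

```python
# Sketch of the arithmetic behind the CGMD setup and the AP definition.
# SASA inputs used below are hypothetical placeholders, not data from the study.

N_AMINO_ACIDS = 20
PEPTIDE_LENGTH = 4


def sequence_space_size(n_aa: int = N_AMINO_ACIDS, length: int = PEPTIDE_LENGTH) -> int:
    """Number of distinct natural peptides of the given length."""
    return n_aa ** length


def simulated_time_ns(n_steps: int, dt_fs: float) -> float:
    """Total simulated time in nanoseconds (1 ns = 1e6 fs)."""
    return n_steps * dt_fs / 1e6


def aggregation_propensity(sasa_initial: float, sasa_final: float) -> float:
    """AP = SASA_initial / SASA_final; AP > 1 means exposed surface shrank (aggregation)."""
    if sasa_final <= 0:
        raise ValueError("final SASA must be positive")
    return sasa_initial / sasa_final


if __name__ == "__main__":
    print(sequence_space_size())                  # size of the tetrapeptide space
    print(simulated_time_ns(5_000_000, 25))       # production-run length in ns
    print(aggregation_propensity(1200.0, 600.0))  # hypothetical SASA values
```

An AP near 1 indicates peptides that stayed dispersed, while larger values indicate that self-assembly buried a growing share of the initially exposed surface.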
ML regression for AP prediction: Sequences were featurized with an 80-bit one-hot encoding (20 amino acids × 4 positions). A Support Vector Machine regressor (SVM, RBF kernel; default hyperparameters C=1, gamma='auto', i.e., 1/80) trained on 10,000 CGMD-labeled sequences achieved MAE = 0.095 and R^2 = 0.928 on the training set and MAE = 0.092 and R^2 = 0.933 on the test set. For AP_sim > 1.5, the error between AP_prd and AP_sim was below 2.5%.
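To make the featurization concrete, here is a minimal sketch of an 80-bit one-hot encoding and the RBF kernel an SVM would apply to it with gamma = 1/80. The residue ordering and the function names are assumptions for illustration, not the authors' code.

```python
import math

# Residue order is an assumption; any fixed ordering of the 20 amino acids works.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq: str) -> list[int]:
    """80-bit one-hot vector: 4 positions x 20 residues, position-major layout."""
    if len(seq) != 4:
        raise ValueError("expected a tetrapeptide")
    vec = [0] * (len(AMINO_ACIDS) * len(seq))
    for pos, aa in enumerate(seq):
        vec[pos * len(AMINO_ACIDS) + AA_INDEX[aa]] = 1
    return vec


def rbf_kernel(x: list[int], y: list[int], gamma: float = 1 / 80) -> float:
    """RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


if __name__ == "__main__":
    a, b = one_hot("YAWF"), one_hot("YAWY")
    print(sum(a))            # one bit set per position
    print(rbf_kernel(a, b))  # sequences differ at one position
```

Because each mismatched position flips two bits, the squared distance is twice the number of differing positions, so with gamma = 1/80 the kernel reduces to exp(-mismatches/40). In practice such vectors would be passed to a library regressor, for example scikit-learn's `SVR(kernel="rbf", C=1.0, gamma="auto")`; recent scikit-learn versions default to `gamma="scale"`, so `"auto"` has to be set explicitly to get 1/n_features.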
Score functions: The initial hydrogelation score AP_H combined predicted AP and hydrophobicity: AP_H = AP_prd^2 × logP'^0.5, where logP' is normalized hydrophobicity derived from Wimley-White whole-residue hydrophobicities (ΔG_wat-oct/β, with β=0.5). The corrected score adds an experimental gelation corrector: AP_HC = AP_prd^2 × logP'^0.5 × Cg (in general form, AP_prd^α × logP'^β × Cg with α=2, β=0.5), where Cg is produced by an ML classification model trained on experimental gelation labels.
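A minimal sketch of the two score functions follows, assuming AP_prd and logP' have already been computed and that Cg is a number in [0, 1], such as a classifier's predicted gelation probability. The form of Cg is an assumption (the source only says it comes from an ML classifier), and the function names and inputs are hypothetical.

```python
def ap_h(ap_prd: float, logp_norm: float, alpha: float = 2.0, beta: float = 0.5) -> float:
    """Initial score AP_H = AP_prd**alpha * logP'**beta."""
    if logp_norm < 0:
        raise ValueError("normalized logP' must be non-negative")
    return ap_prd ** alpha * logp_norm ** beta


def ap_hc(ap_prd: float, logp_norm: float, cg: float,
          alpha: float = 2.0, beta: float = 0.5) -> float:
    """Corrected score AP_HC = AP_H * Cg (Cg assumed to lie in [0, 1] here)."""
    if not 0.0 <= cg <= 1.0:
        raise ValueError("Cg is assumed to be in [0, 1] in this sketch")
    return ap_h(ap_prd, logp_norm, alpha, beta) * cg


if __name__ == "__main__":
    # Hypothetical (AP_prd, logP', Cg) values for two placeholder sequences.
    candidates = {"seq_a": (2.0, 0.25, 0.9), "seq_b": (2.2, 0.25, 0.2)}
    by_ap_h = sorted(candidates, key=lambda s: ap_h(*candidates[s][:2]), reverse=True)
    by_ap_hc = sorted(candidates, key=lambda s: ap_hc(*candidates[s]), reverse=True)
    print(by_ap_h, by_ap_hc)  # the corrector can reverse the ranking
```

In this toy case seq_b has the higher AP_H, but its low Cg drops it below seq_a under AP_HC, which is the kind of re-ranking the corrector is meant to produce.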
Human-in-the-loop iterative loops: Three experiment-ML loops were conducted. Loop 1: 55 peptides were synthesized (26 of them ranked within the top 8,000 by AP_H) and tested for gelation by vial inversion at pH 6.5–8.5 and varying concentrations. A classifier trained on these 55 binary labels yielded Cg1 (average accuracy 0.735) and the score AP_HC,1. Loop 2: another 55 peptides were selected (30 of them within the top 8,000 by AP_HC,1), synthesized, and tested; retraining on all 110 labels produced Cg2 (accuracy 0.746) and AP_HC,2. Loop 3: 55 more peptides were tested; retraining on all 165 labels yielded Cg3 (accuracy 0.767) and the final score AP_HC,3 (= AP_HC). In each round, selection performance was assessed within the top 8,000 ranked sequences.
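The per-round selection metric can be sketched as a top-k hit rate: rank all sequences by the current score, keep the experimentally tested ones that fall in the top k, and compute the fraction that gelled. This is an illustrative reading of "hit rate within the top 8,000"; the function name and the toy scores and labels below are hypothetical.

```python
def top_k_hit_rate(scores: dict[str, float], tested: dict[str, bool],
                   k: int) -> tuple[int, int, float]:
    """Return (gelators, tested peptides in top k, hit rate) for one scoring round."""
    ranked = sorted(scores, key=scores.get, reverse=True)  # highest score first
    top_k = set(ranked[:k])
    in_top = [seq for seq in tested if seq in top_k]
    if not in_top:
        return 0, 0, 0.0
    hits = sum(tested[seq] for seq in in_top)  # True counts as 1
    return hits, len(in_top), hits / len(in_top)


if __name__ == "__main__":
    # Hypothetical scores and gelation labels for placeholder sequences.
    scores = {"seq_a": 3.0, "seq_b": 2.8, "seq_c": 2.5, "seq_d": 0.2, "seq_e": 0.1}
    tested = {"seq_a": True, "seq_c": False, "seq_d": False, "seq_b": True}
    print(top_k_hit_rate(scores, tested, k=3))
```

Tested peptides outside the top k (seq_d here) are excluded, which is why each loop reports a denominator smaller than the 55 peptides synthesized.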
Peptide synthesis and characterization: Tetrapeptides were synthesized via Fmoc solid-phase peptide synthesis on 2-chlorotrityl chloride resin. Standard deprotection (20% piperidine/DMF), HBTU/DIPEA couplings, cleavage (TFA:TIS:H2O 95:2.5:2.5), precipitation (cold diethyl ether), purification by RP-HPLC (C18), and verification by MS (positive ion mode) and 1H NMR (DMSO-d6) were performed.
Hydrogel formation assay: Purified peptides were dissolved in water (initially 30 mM). pH was adjusted (6.5–8.5) with 1N NaOH, with brief ultrasonication after adjustments. If no gel formed, concentration was increased stepwise (up to at least 120 mM). Gelation was defined as a self-supporting, non-flowing gel by vial inversion; samples rested 48 h for complete gelation.
Materials characterization: Morphologies were examined by TEM with negative staining (uranyl acetate). Mechanical properties were assessed by rheology (frequency sweeps of G′ and G″). FTIR spectra in the amide I region (bands at 1620–1648 cm−1) were used to assess secondary structure.
Biological application: A de novo-designed tetrapeptide (YAWF; AP_HC rank 1661) that forms nanofibrous hydrogels was chosen as an adjuvant. C57BL/6 mice (6–8 weeks old) received three immunizations (days 0, 7, and 14) with the SARS-CoV-2 receptor-binding domain (RBD) alone, Alum+RBD, or YAWF hydrogel+RBD. On day 21, sera and splenocytes were collected. RBD-specific IgG and its subclasses were measured by ELISA, and splenocytes were restimulated with RBD to quantify IL-5 and IFN-γ. Activation of bone marrow-derived dendritic cells (BMDCs; CD83, CD80, CD86) and their cytokine secretion (IL-6, TNF-α) were assessed by flow cytometry and ELISA.
Key Findings
Machine learning AP prediction: An SVM trained on 10,000 CGMD-labeled tetrapeptides predicted AP with MAE≈0.095 (train), 0.092 (test) and R^2≈0.93; for AP_sim>1.5, prediction error was <2.5%.
Corrected score performance: Across three human-in-the-loop rounds (total 165 synthesized peptides), gelation hit rates within the top-8000 ranked sequences improved from 61.5% (AP_H; loop 1) to 76.7% (AP_HC,1; loop 2) to 81.6% (AP_HC,2; loop 3). The final corrected score AP_HC (with Cg3; classifier accuracy 0.767) achieved an 87.1% gelation hit rate among the top 8,000 sequences, versus ~66% for AP_prd or AP_H.
Experimental outcomes: Of 165 synthesized tetrapeptides, 100 formed hydrogels. Morphology statistics showed hydrogel-formers predominantly yielded fibers, sheets, or hybrid structures (~70%), while non-hydrogel peptides formed aggregates, spheres, or particulates (~86%). FTIR indicated β-sheet signatures in representative hydrogels. Rheology showed G′>G″ with weak frequency dependence (0.01–100 Hz), consistent with gels.
Hydrophobicity window and features: Gelating peptides had normalized hydrophilicity logP′ between ~0.05 and 0.4; peptides with lower values tended to precipitate, while those with higher values remained in solution. AP_HC could prioritize true gelators with lower AP_prd that AP_H missed, substantially improving their ranks: for example, WVII and IMVV were up-ranked while WPYY and WWCP were down-ranked, consistent with experimental results.
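The reported window suggests a simple pre-screen on logP′ before ranking. The sketch below is a hypothetical filter, not a step the study describes: it maps a normalized logP′ value to the behavior observed on either side of the ~0.05–0.4 window and keeps only in-window candidates. The boundaries are approximate observations rather than hard cutoffs, and the names and inputs are illustrative.

```python
# Approximate window reported for gelators (soft boundaries, not hard cutoffs).
LOGP_WINDOW = (0.05, 0.4)


def expected_behavior(logp_norm: float,
                      window: tuple[float, float] = LOGP_WINDOW) -> str:
    """Map normalized logP' to the behavior observed outside or inside the window."""
    low, high = window
    if logp_norm < low:
        return "precipitate"        # below the window: tended to precipitate
    if logp_norm > high:
        return "solution"           # above the window: stayed dissolved
    return "candidate gelator"


def prefilter(candidates: dict[str, float],
              window: tuple[float, float] = LOGP_WINDOW) -> list[str]:
    """Keep sequences whose normalized logP' falls inside the window."""
    return [seq for seq, lp in candidates.items()
            if expected_behavior(lp, window) == "candidate gelator"]


if __name__ == "__main__":
    # Hypothetical logP' values for placeholder sequences.
    print(prefilter({"seq_x": 0.01, "seq_y": 0.2, "seq_z": 0.6}))
```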
Sequence-position rules: Aromatic residues (F, Y), especially at positions 3–4 (C-terminus proximal), most strongly favored gelation; W contributed less due to excessive hydrophobicity causing precipitation. Hydrophobic residues (I, L, V, M) aided gelation, with position-dependent preferences. Polar N, Q were rarely present in gelators (especially at positions 1–2), while S at positions 1, 2, 4 and T at positions 1, 2 were beneficial. C at position 1 could promote disulfide-stabilized assemblies. Proline at position 1 favored gelation via kink formation; Gly had minimal positive contribution. Charged residues (D, E, R, K) generally disfavored gelation, though N-terminal K could assist via electrostatic interactions. Doublet/triplet analyses confirmed synergistic aromatic–aromatic and aromatic–hydrophobic motifs as strong contributors.
Vaccine application: The YAWF hydrogel (60 mM) as adjuvant boosted RBD-specific IgG by 41.6-fold versus RBD alone, outperforming aluminum adjuvant (20.7-fold). IgG1 increased significantly; IgG2b increased ~9.7-fold versus alum; IgG2c titers remained high relative to alum and control. Splenocytes showed elevated IL-5 and IFN-γ versus alum. BMDCs treated with YAWF+RBD exhibited elevated activation markers (CD83 72.0%, CD80 71.1%, CD86 50.5%) and increased IL-6 and TNF-α.
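Because both fold changes are expressed relative to the same RBD-alone group, they can be compared directly. Assuming identical baselines, the ratio below shows the YAWF hydrogel roughly doubling the RBD-specific IgG response obtained with alum:

```latex
\frac{\mathrm{IgG}_{\mathrm{YAWF+RBD}}}{\mathrm{IgG}_{\mathrm{Alum+RBD}}}
= \frac{\mathrm{IgG}_{\mathrm{YAWF+RBD}} / \mathrm{IgG}_{\mathrm{RBD}}}
       {\mathrm{IgG}_{\mathrm{Alum+RBD}} / \mathrm{IgG}_{\mathrm{RBD}}}
= \frac{41.6}{20.7} \approx 2.0
```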
Discussion
The study addressed the challenge of accurately predicting peptide hydrogel formation across a large sequence space by integrating CGMD-derived aggregation metrics, ML prediction, and iterative experimental feedback. Incorporating experimental gelation outcomes via the gelation corrector Cg into the score function (AP_HC) significantly improved selection precision, boosting the gelation hit rate among top-ranked candidates from ~66% (AP_prd or AP_H) to 87.1%. This demonstrates that hydrophobicity and aggregation propensity alone are insufficient; experimentally informed corrections capture additional determinants (e.g., isoelectric effects, specific interactions) relevant to gelation. The derived sequence-position rules, especially the importance of aromatic residues near the C-terminus and aromatic–hydrophobic motifs, align with the observed nanostructures (fibers/sheets) and β-sheet signatures, offering practical design guidance. Functionally, the de novo-designed hydrogel (YAWF) served as an effective vaccine adjuvant, enhancing humoral and cellular responses to SARS-CoV-2 RBD and illustrating the translational potential of the prioritized peptide library. Overall, the human-in-the-loop framework efficiently narrows a vast search space, balances computation and experiment, and yields design rules generalizable to short peptide materials.
Conclusion
An efficient human-in-the-loop framework combining CGMD, ML regression and classification, and iterative experiments produced an improved hydrogelation score (AP_HC) for 160,000 natural tetrapeptides. With 165 experimental validations (100 hydrogels), the approach achieved an 87.1% hit rate among the top 8,000 candidates and revealed robust sequence-position rules favoring aromatic-enriched C-terminal motifs. A de novo-designed tetrapeptide hydrogel (YAWF) demonstrated strong vaccine adjuvant performance against SARS-CoV-2 RBD in mice. The framework and resulting top-8,000 library provide a valuable resource for designing peptide-based soft materials and bioapplications. Future directions include automating synthesis and testing via robotics to accelerate iterative loops and extending the strategy to other peptide-based functional materials (e.g., terminal-modified hydrogels, peptide electronics, probes, and energy devices).
Limitations
The study focused on natural tetrapeptides under specific experimental conditions (aqueous solution, pH 6.5–8.5, concentrations up to at least 120 mM), which may limit generalizability to other peptide lengths, chemistries, or environments. The ML models rely on CGMD-derived labels and 165 experimental gelation outcomes; while effective, the final classifier accuracy (~0.77) indicates residual misclassification risk. Hydrophobicity was represented with a specific scale (Wimley-White) and specific score exponents (α=2, β=0.5), which may influence rankings under different physicochemical models. Biological validation was demonstrated with one peptide (YAWF) and one antigen (SARS-CoV-2 RBD) in a single mouse model, so broader immunological generalization requires further study.