
Real-time detection of 20 amino acids and discrimination of pathologically relevant peptides with functionalized nanopore
M. Zhang, C. Tang, et al.
Ming Zhang and colleagues present a copper(II)-functionalized *Mycobacterium smegmatis* porin A (MspA) nanopore that identifies all 20 proteinogenic amino acids. A random forest classifier reaches 99.1% accuracy on the 30.9% of signals that pass its prediction-probability threshold. The system quantifies amino acids at nanomolar levels and analyzes peptide hydrolysates, including peptides relevant to Alzheimer's disease and cancer, as a step toward peptide sequence inference.
Introduction
The study addresses a central challenge in proteomics: direct identification and quantification of all 20 proteinogenic amino acids at single-molecule resolution to enable protein/peptide analysis and potential sequencing. Proteoform diversity driven by alternative splicing and post-translational modifications (PTMs) cannot be inferred solely from transcriptomics, and proteins lack amplification strategies analogous to PCR, limiting mass-spectrometry-based detection of low-abundance species. Existing fluorescence labeling approaches target a subset of residues and face scalability issues for labeling all amino acids, while label-free electrical methods have thus far detected only a subset (≤12) of amino acids. Nanopore technologies, proven in DNA sequencing, have detected and distinguished peptides by mass, length, PTMs, and single-residue changes; yet deconvolving signals from 20 residue types within complex peptide contexts remains difficult, and real-time detection of cleaved amino acids during peptide hydrolysis had not been demonstrated. The purpose of this work is to engineer a functionalized biological nanopore that, together with machine learning, can (1) directly identify all 20 amino acids, (2) quantify them sensitively, (3) detect select PTMs and an unnatural amino acid, and (4) enable real-time analysis of peptide hydrolysates to discriminate disease-relevant peptides and infer sequence order trends.
Literature Review
Prior strategies include: (1) Fluorophore-based approaches (e.g., selective labeling of Cys/Lys and Edman degradation or single-molecule FRET readouts) providing positional information but limited by the difficulty of labeling all 20 amino acids; engineered fluorescent amino-terminal recognizers yield repetitive binding signals but still face coverage constraints. (2) Label-free electrical methods such as tunneling currents and molecular junctions, reaching up to 12 amino acids, insufficient for sequencing needs. (3) Nanopores have distinguished peptides by size, length, PTMs, and single-residue substitutions; controlled translocation via ClpX, engineered electro-osmotic flow, and DNA–peptide ratcheting has produced segment- or sequence-dependent signals, but decoding 20-amino-acid alphabets remains challenging. Aerolysin pores with poly-Arg carriers discriminated 13/20 amino acids; Cu2+-modified α-hemolysin and MoS2 pores detected underivatized amino acids; and an MspA-NTA Ni2+-modified pore unambiguously discriminated all 20 amino acids and PTMs. Exopeptidase-assisted nanopore methods using peptide probes also emerged. However, a gap persisted in real-time detection of cleaved amino acids during peptide hydrolysis, impeding peptide sequencing development. This study builds on these advances by introducing a Cu2+-coordinating MspA mutant enabling comprehensive amino acid detection and real-time hydrolysate analysis.
Methodology
Pore engineering and sensing principle: An octameric MspA mutant (MspA-N91H) was constructed by substituting Asn91 with His in each subunit, creating Cu2+ coordination sites at the constriction in conjunction with Asn90. The hypothesized coordination complex involves two adjacent His91 residues and one Asn90 coordinating a Cu2+ ion, which then chelates the α-amine and α-carboxyl groups of an amino acid, producing characteristic current blockades.
Experimental setup: Single-channel recordings used a vertical planar lipid bilayer (DPhPC; 150 µm aperture) separating cis (grounded) and trans chambers (1 M KCl, 10 mM MOPS, pH 7.5; 23 ± 2 °C). MspA was added to cis (60–90 ng ml−1) and inserted under +300 mV; measurements were made at +50 mV. CuCl2 was added to trans to 200 µM (20 µM for peptide hydrolysis) to saturate binding sites (~87.8 ± 3.1% of the time in the stable state). L-amino acids were typically added to cis at 100 µM (exceptions: H 5 µM; P 200 µM; C 2 µM). Controls verified that both the N91H mutation and Cu2+ are necessary for sensing, and showed that acetylated and amidated leucine were not detected, indicating that coordination occurs via the α-amine and α-carboxyl groups.
Signal characterization: The open-pore current (Io) defines the baseline; state 0 denotes the Cu2+-bound baseline, and state 1 indicates one amino acid bound. Signals were characterized by blockade and dwell time. Events were extracted by change-point detection and thresholding (blockade >0.1 relative to baseline). For each event, blockade, dwell time, and s.d. were computed, and 1,000 normalized current density features (X0001–X1000) were extracted.
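The thresholding step and the per-event features can be sketched as follows. This is a minimal illustration in Python, not the authors' pipeline: it omits change-point detection, assumes the blockade is the fractional current drop (baseline − I) / baseline, and assumes the density features are a normalized histogram. The function names, the sampling interval `dt_ms`, and the histogram range are hypothetical.

```python
from statistics import mean, pstdev

def extract_events(trace, baseline, threshold=0.1, dt_ms=0.1):
    """Group contiguous samples whose fractional blockade exceeds
    `threshold` into events and summarize each one.
    Assumption: blockade = (baseline - current) / baseline."""
    events, run = [], []
    # A trailing baseline sample acts as a sentinel that closes an open run.
    for current in list(trace) + [baseline]:
        blockade = (baseline - current) / baseline
        if blockade > threshold:
            run.append(blockade)
        elif run:
            events.append({
                "blockade": mean(run),          # mean fractional blockade
                "dwell_ms": len(run) * dt_ms,   # event duration
                "sd": pstdev(run),              # spread within the event
            })
            run = []
    return events

def density_features(samples, n_bins=1000, lo=0.0, hi=1.0):
    """Histogram normalized current values into n_bins bins and scale the
    counts to sum to 1 (an assumed form of the X0001-X1000 features)."""
    counts = [0] * n_bins
    for x in samples:
        if lo <= x <= hi:
            idx = min(int((x - lo) / (hi - lo) * n_bins), n_bins - 1)
            counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts
```

Each event then contributes one row to a feature matrix: three summary values plus the density vector.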
Machine learning: A feature matrix (blockade, dwell time, s.d., and 1,000 density features) was used to train multiple classifiers: random forest (RF), naive Bayes (NB), neural network (NNet), k-nearest neighbors (KNN), bagged CART, and AdaBoost, using caret in R with 10-fold cross-validation. For each amino acid, 1,000 events were used for training where available (with upsampling for classes with <1,000). Data were split into training, testing, and independent validation sets. Only state 1 events were used for classification because of their higher feature importance and to avoid multilevel and noise events. Prediction probability thresholds were applied to trade coverage for accuracy; events with predicted probability >0.95 were retained for robust identification.
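The accuracy-versus-coverage trade-off from probability thresholding can be illustrated with a short sketch. It is written in Python for illustration only (the study used caret in R), and it assumes each event's classifier output is available as a mapping from class label to predicted probability. The function name and data layout are hypothetical.

```python
def accuracy_at_threshold(probs, labels, threshold=0.95):
    """Keep events whose top predicted probability exceeds `threshold`.

    probs  : list of dicts, class label -> predicted probability
    labels : list of true class labels, same order as probs
    Returns (accuracy over retained events, fraction of events retained).
    """
    kept = correct = 0
    for p, truth in zip(probs, labels):
        best = max(p, key=p.get)        # most probable class
        if p[best] > threshold:
            kept += 1
            correct += (best == truth)
    coverage = kept / len(labels) if labels else 0.0
    accuracy = correct / kept if kept else float("nan")
    return accuracy, coverage
```

Sweeping `threshold` from 0 toward 1 traces the curve along which accuracy rises while the share of usable signals falls.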
Quantification and LOD: For representative amino acids (Gly, Arg, Asp), signal frequency was measured across concentrations to establish linear relationships (Pearson R > 0.99; adjusted R2 > 0.97). LOD was defined as the minimum concentration yielding ≥5 detected events within 10 min.
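A calibration of this kind can be sketched as an ordinary least-squares fit of event frequency against concentration, plus a check of the event-count criterion for the LOD. The Python below is an illustrative sketch under those assumptions. The function names are hypothetical, and no measured values from the study are reproduced here.

```python
from math import sqrt

def linear_fit(conc, freq):
    """Ordinary least-squares fit of freq = slope * conc + intercept.
    Returns (slope, intercept, Pearson r)."""
    n = len(conc)
    mx, my = sum(conc) / n, sum(freq) / n
    sxx = sum((x - mx) ** 2 for x in conc)
    syy = sum((y - my) ** 2 for y in freq)
    sxy = sum((x - mx) * (y - my) for x, y in zip(conc, freq))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / sqrt(sxx * syy)

def empirical_lod(events_by_conc, min_events=5):
    """Lowest tested concentration whose 10-min recording contains at
    least `min_events` detected events, or None if none qualifies.

    events_by_conc : dict, concentration -> event count in 10 min
    """
    passing = [c for c, n in events_by_conc.items() if n >= min_events]
    return min(passing) if passing else None
```

An unknown concentration could then be read off the fitted line as (frequency − intercept) / slope, within the calibrated range.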
PTM and unnatural amino acids: Serine phosphorylation (O-phosphoryl-L-serine, P-S), lysine acetylation (Nε-acetyl-L-lysine, Ac-K), and an unnatural cysteine derivative (S-carboxymethyl-L-cysteine, CMC) were tested alongside their native counterparts. Cysteine caused unstable Cu2+ binding; CMC mitigated sulfhydryl interference.
Peptide hydrolysis and detection: Real-time hydrolysis used carboxypeptidase A1 added directly to cis with peptides (e.g., EAFNL, LNFAE). For Aβ(17–27) peptides, leucyl aminopeptidase hydrolyzed peptides from the N-terminus ex situ (37 °C, 15.5 h), followed by heat inactivation and 10 kDa ultrafiltration; filtrates were measured. Other peptides (neoantigens, angiotensin I, α-bag cell peptide 1–9, ACTH 18–39) were incubated with carboxypeptidase A1 at 37 °C for 15 min and measured without ultrafiltration. Signals were classified by the RF model with probability thresholding (>0.95). To estimate hydrolysate composition, counts were normalized by mean capture rates of individual amino acids. Peptide similarity was assessed via Euclidean distances between density-feature distributions and visualized by classical multidimensional scaling.
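The capture-rate normalization and the distance measure can be sketched as below. This Python is illustrative only: rescaling the normalized counts to fractions that sum to 1 is an assumption, the function names are hypothetical, and the multidimensional scaling step is not shown.

```python
from math import sqrt

def normalized_composition(counts, capture_rates):
    """Correct per-residue event counts for unequal capture rates.

    counts        : dict, residue -> number of classified events
    capture_rates : dict, residue -> mean capture rate of the free amino acid
    Returns dict, residue -> estimated fraction of the hydrolysate
    (assumption: fractions are rescaled to sum to 1).
    """
    weighted = {aa: n / capture_rates[aa] for aa, n in counts.items()}
    total = sum(weighted.values())
    return {aa: w / total for aa, w in weighted.items()}

def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

A pairwise matrix of `euclidean` values between peptide profiles is the input a classical MDS routine would need.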
Key Findings
- Engineered Cu2+-functionalized MspA-N91H directly detects all 20 proteinogenic amino acids. Signals are well-defined state 1 blockades with dwell times typically 1–10 ms; histidine shows two populations (His1 with prolonged dwell time 42.7 ± 17.1 ms).
- Blockade correlates with amino acid volume (Pearson r up to 0.97 when excluding charged residues, Pro, and Cys), consistent with volume exclusion for most residues.
- Capture frequencies vary substantially among amino acids; residues with polar side chains show significantly higher frequencies than those with non-polar side chains (Wilcoxon P = 0.04 and 1.73×10−7).
- Machine learning classification: RF outperformed other models (AUC 0.990 initially). With 1,000 signals per amino acid, RF AUCs were 0.996 (training), 0.993 (testing), and 0.989 (independent validation). Applying probability thresholds yielded 95.2% accuracy using 43.1% of signals, and 99.1% accuracy using 30.9% of signals. Confusion matrices show most amino acids are well distinguished.
- Quantification: Strong linear relationships between signal frequency and concentration for Gly, Arg, and Asp (Pearson R > 0.99; adjusted R2 > 0.97), enabling quantification in the micromolar-to-nanomolar regime. LODs: Gly <100 nM, Asp 250 nM, Arg 1 µM within a 10-min measurement.
- PTMs and unnatural amino acid: Distinct blockade clusters for S vs P-S (0.132 ± 0.0033 vs 0.295 ± 0.0093) and K vs Ac-K (0.171 ± 0.0026 vs 0.233 ± 0.0071). CMC produced blockades clearly distinct from cysteine and avoided Cu2+ binding instability.
- Real-time peptide hydrolysis detection: For peptides EAFNL and LNFAE, hydrolysis from the C-terminus produced opposite abundance trends of identified amino acids, consistent with sequence reversal. Normalized abundance increased toward the C-terminus (Spearman ρ = 0.87 for EAFNL; ρ = −0.80 for LNFAE, apart from an anomalous N signal), suggesting that sequence order can be inferred from abundance gradients.
- Pathology-relevant peptides: Neoantigen vs normal peptide hydrolysates showed expected amino acid compositions and were distinguishable via blockade distributions. Aβ(17–27) wild-type and single-residue mutants were discriminated; mutant hydrolysates exhibited clear G or K signals relative to wild-type. Commercial peptides (angiotensin I, α-bag cell peptide 1–9, ACTH 18–39) produced correct C-terminal compositions and expected carboxypeptidase A1 stop points.
- Unsupervised peptide profiling: MDS on Euclidean distances of blockade-density features grouped related peptides (e.g., three AD-associated peptides clustered together; peptides sharing sequence elements clustered closely), indicating peptide composition/sequence-reflective profiles.
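The abundance-gradient finding for EAFNL and LNFAE rests on a rank correlation between an ordering of residues and their normalized abundance. A minimal Spearman implementation in Python is sketched below for illustration. How the authors ordered residues to compute the reported ρ values is not stated here, so the choice of inputs is left to the caller, and the function names are hypothetical.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5
```

Passing residue positions as `x` and normalized abundances as `y` gives a positive value when abundance rises along the sequence and a negative value when it falls.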
Discussion
The work demonstrates that a Cu2+-coordinated MspA-N91H nanopore can exploit the universal α-amine/α-carboxyl chelation of amino acids to produce informative current signatures, enabling comprehensive identification of all 20 residues. By combining these signals with a tailored RF classifier that emphasizes state 1 features, the platform achieves high-accuracy classification with controllable precision–yield trade-offs and supports sensitive quantification through linear calibration of event frequency versus concentration. Compared with prior Ni2+-modified nanopores, the Cu2+-MspA approach attains substantially improved LODs (e.g., Gly <100 nM vs ~50 µM), moving toward cellularly relevant concentrations. The method’s ability to resolve PTMs (phosphorylation, acetylation) and an unnatural amino acid with the same pore further broadens applicability to proteoform analysis. Crucially, real-time detection of amino acids liberated during exopeptidase digestion reveals positional biases (higher abundance near the cleavage front), providing a route to infer sequence order trends and to distinguish peptides differing by a single residue, including clinically relevant neoantigens and Aβ variants. Collectively, these results suggest a viable pathway toward nanopore-enabled single-molecule protein sequencing and diagnostic peptide profiling, while highlighting the importance of pore engineering, ion coordination chemistry, and data-driven signal interpretation.
Conclusion
This study introduces a copper(II)-functionalized MspA-N91H nanopore that, with machine learning, identifies all 20 proteinogenic amino acids, quantifies them with high sensitivity (down to sub-100 nM for glycine), and discriminates PTMs and an unnatural amino acid. It further enables real-time analysis of peptide hydrolysates, distinguishing disease-relevant peptides differing by a single residue and indicating sequence-order trends via abundance gradients. These advances improve sensitivity over prior approaches and couple identification with quantification, supporting applications in proteoform analysis and diagnostics. Future work should aim to: (1) enhance stability and specificity for challenging residues (e.g., sulfur-containing), (2) increase usable signal yield while maintaining accuracy, (3) better control translocation kinetics and event multiplicity, (4) generalize PTM coverage and unnatural amino acid panels, and (5) integrate exopeptidase-driven readouts into workflows for de novo peptide/protein sequencing.
Limitations
- Cysteine perturbs Cu2+ binding due to strong thiol–copper interactions, causing baseline instability; analysis required using a derivative (CMC) rather than native Cys.
- Signal overlap exists among several amino acids (e.g., Lys/Arg; Met/Leu; Pro/Phe; Thr/Asn; Cys/Tyr), necessitating machine learning and probabilistic filtering that reduces usable event yield (e.g., 99.1% accuracy at 30.9% signal recovery).
- Capture rates vary widely by residue, complicating direct abundance interpretation; normalization is required for hydrolysate composition estimates.
- Real-time sequence inference relies on positional abundance trends rather than deterministic residue order and is influenced by exopeptidase specificity and stop residues (e.g., R, K, P).
- Quantification relationships and LODs were established for a subset of amino acids and under specific buffer/voltage conditions; broader calibration and matrix effects in complex samples remain to be validated.
- Multilevel signals and potential co-binding/noise require conservative event selection (state 1 only), potentially discarding informative events.
- PTM coverage was demonstrated for two modifications and one unnatural amino acid; generalization to diverse PTMs and chemistries requires further study.