Introduction
Precise identification and quantification of amino acids are crucial for numerous biological applications, particularly single-molecule protein sequencing. Current methods face limitations: fluorophore-based techniques require complex labeling, while label-free techniques such as tunneling current measurement can detect only a limited number of amino acids. Nanopore technology, successful in DNA sequencing, presents an attractive alternative for amino acid detection and protein sequencing. While previous nanopore studies have shown success in distinguishing peptides based on various properties, identifying all 20 proteinogenic amino acids remains a challenge due to the complexity of signal deconvolution. This research aims to address this limitation by developing a novel nanopore-based system for real-time, sensitive detection and quantification of amino acids, along with the ability to discriminate between peptides with subtle differences.
Literature Review
Existing methods for amino acid detection and protein sequencing have their drawbacks. Fluorophore-based techniques require complex chemical labeling of each amino acid, while label-free methods struggle to identify all 20 proteinogenic amino acids with sufficient accuracy. Nanopore technology has emerged as a promising alternative due to its ability to detect individual molecules, but the deconvolution of signals from the 20 amino acids presents a challenge. Previous work has shown that nanopores can distinguish peptides that differ in molecular weight, length, post-translational modifications (PTMs), or single-amino-acid substitutions. Unfoldases, electro-osmotic flow, and ratcheting motion have aided in controlling peptide translocation, but real-time detection of individual amino acids during peptide hydrolysis, and discrimination among many amino acids, have remained limited. This paper builds upon these previous efforts to develop a more comprehensive and sensitive method.
Methodology
The researchers engineered a copper(II)-functionalized MspA nanopore by introducing histidine substitutions (N91H) in the constriction region. This modification creates copper-binding sites that facilitate reversible coordination with the α-amine and α-carboxyl groups of amino acids. A single-channel recording setup was used, with amino acids added to the cis chamber and copper ions to the trans chamber. The binding events generate characteristic current blockades. Three control experiments validated the sensing mechanism. A machine-learning algorithm, specifically a Random Forest classifier, was trained using features extracted from the current traces (blockade, dwell time, standard deviation, and normalized signal density). The algorithm distinguished the 20 proteinogenic amino acids, along with two PTM amino acids (O-phosphoryl-L-serine and Nε-acetyl-L-lysine) and one unnatural amino acid (S-carboxymethyl-L-cysteine). For peptide analysis, exopeptidases were used to hydrolyze peptides, and the released amino acids were detected in real time. The methodology included detailed steps for nanopore preparation, amino acid detection, peptide hydrolysis, electrophysiology recording, and signal analysis. The researchers also employed various statistical methods to analyze the data, including linear regression and Spearman's rank correlation.
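The feature-extraction step described above can be illustrated with a minimal sketch. It is not the researchers' pipeline: the threshold-based event detection, the function name extract_event_features, and its parameters are assumptions made for illustration, and currents are assumed to be positive values in pA. The sketch computes three of the four named features (blockade, dwell time, standard deviation); normalized signal density is omitted because its definition is not given here.

```python
from statistics import mean, pstdev


def extract_event_features(trace, open_level, threshold, sample_rate_hz):
    """Return one feature dict per blockade event in a current trace.

    Hypothetical helper: an event is taken to be a run of consecutive
    samples whose current falls below `threshold` (an assumption, not
    the published event-detection method).
    """
    samples = list(trace)
    events = []
    start = None
    # Append one open-pore sample so an event ending at the last sample is closed.
    for i, current in enumerate(samples + [open_level]):
        blocked = current < threshold
        if blocked and start is None:
            start = i  # event begins
        elif not blocked and start is not None:
            segment = samples[start:i]
            events.append({
                # Fractional blockade relative to the open-pore current.
                "blockade": (open_level - mean(segment)) / open_level,
                # Dwell time in seconds from the number of samples in the event.
                "dwell_time_s": len(segment) / sample_rate_hz,
                # Population standard deviation of the current within the event.
                "std": pstdev(segment),
            })
            start = None
    return events


# Toy trace with two events; the numbers are invented for illustration only.
toy_trace = [100, 100, 60, 62, 58, 100, 100, 70, 70, 100]
toy_events = extract_event_features(toy_trace, open_level=100,
                                    threshold=80, sample_rate_hz=1000)
```

Each resulting dictionary corresponds to one row of a feature table that a classifier such as a Random Forest could be trained on.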
Key Findings
The copper(II)-functionalized MspA nanopore successfully identified all 20 proteinogenic amino acids with 99.1% accuracy using a machine-learning algorithm. The limit of detection reached the nanomolar range (e.g., <100 nM for glycine), significantly improving upon previous nanopore-based methods. The system also effectively discriminated between amino acids with post-translational modifications (PTMs) and an unnatural amino acid. Real-time detection of amino acids during peptide hydrolysis was demonstrated, revealing a trend of higher abundance of amino acids closer to the C-terminus. This trend was exploited to differentiate between peptides with single amino acid substitutions, including clinically relevant peptides associated with Alzheimer's disease and cancer neoantigens. The analysis of ten different peptides further validated the method's robustness and generalizability. The results suggest the potential to infer peptide sequences based on the abundance pattern of amino acids during hydrolysis. The developed Random Forest model showed high performance with AUC above 0.99.
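One way to quantify the reported trend toward higher abundance near the C-terminus is a rank correlation between residue position and the number of detected events for that residue. The sketch below implements Spearman's rank correlation in plain Python. Applying it to position versus event count is an assumption made for illustration, not necessarily how the researchers used the statistic, and the counts shown are invented, not data from the study.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    spread_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    return cov / (spread_x * spread_y)


# Hypothetical example: residue positions counted from the N-terminus and
# invented event counts that rise toward the C-terminus.
positions = [1, 2, 3, 4, 5]
event_counts = [3, 8, 15, 40, 90]
rho = spearman_rho(positions, event_counts)
```

A value of rho close to 1 would indicate that event counts increase monotonically toward the C-terminus. Because the statistic depends only on ranks, it is insensitive to the absolute scale of the counts.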
Discussion
This study successfully addresses the challenge of real-time detection and quantification of all 20 proteinogenic amino acids using a functionalized nanopore. The high accuracy and sensitivity of the method, along with its ability to detect PTMs and unnatural amino acids, offer significant advancements in single-molecule protein sequencing. The observed trend in amino acid abundance during peptide hydrolysis provides new avenues for peptide sequence inference, opening possibilities for improved diagnostics and therapeutics. The application to clinically relevant peptides from Alzheimer's disease and cancer highlights the translational potential of this technology. Future work could focus on improving the speed of peptide hydrolysis and further optimizing the machine-learning algorithms for enhanced accuracy and throughput.
Conclusion
This research demonstrates a novel copper(II)-functionalized MspA nanopore capable of real-time identification and quantification of all 20 proteinogenic amino acids, along with PTMs and an unnatural amino acid, at nanomolar sensitivity. The ability to distinguish peptides differing by a single amino acid during hydrolysis opens exciting possibilities for peptide sequencing and downstream applications in disease diagnostics and therapeutic development. Future research directions include improving the temporal resolution of the method and exploring its applicability to a broader range of proteins and peptides.
Limitations
The study primarily focused on relatively short peptides. The applicability of the method to longer peptides and complex protein mixtures needs further investigation. While the machine-learning algorithm showed high accuracy, potential biases from the training dataset cannot be fully excluded. Further optimization and validation on a larger and more diverse dataset would strengthen the generalizability of the findings. The observed bias toward C-terminal amino acids during hydrolysis also requires further investigation before it can be used to refine sequence inference.