DeepRTAlign: toward accurate retention time alignment for large cohort mass spectrometry data analysis

Medicine and Health

Y. Liu, Y. Yang, et al.

Discover the groundbreaking DeepRTAlign, a deep learning-based retention time alignment tool that enhances identification sensitivity in LC-MS analyses without sacrificing quantitative accuracy. This innovative research reveals its effectiveness in predicting hepatocellular carcinoma recurrence, showcasing its potential in biological analyses. Conducted by Yi Liu, Yun Yang, Wendong Chen, Feng Shen, Linhai Xie, Yingying Zhang, Yuanjun Zhai, Fuchu He, Yunping Zhu, and Cheng Chang.

Introduction
The study addresses the challenge of retention time (RT) alignment in LC-MS-based proteomics and metabolomics, particularly in large cohorts where analyte RTs shift due to matrix effects and instrument performance. Traditional approaches include warping-based methods (e.g., XCMS, MZmine, OpenMS), which assume monotonic shifts, and direct matching methods (e.g., RTAlign, MassUntangler, Peakmatch), which compare signals between runs without a warping function; neither class handles both monotonic and non-monotonic RT shifts well. Identification-based match-between-runs (MBR) approaches increase identifications but depend on identified peptides, limiting biomarker discovery from unidentified precursors. Before this work, no deep learning-based RT alignment algorithm existed for LC-MS data. The authors propose DeepRTAlign, which combines a coarse (pseudo-warping) alignment with a deep neural network-based direct matching classifier to robustly align features across runs, including unidentified ones, addressing a bottleneck in large-cohort analyses.
Literature Review
The paper contrasts warping-function methods (e.g., XCMS, MZmine, OpenMS), which cannot correct non-monotonic shifts because their warping functions are monotonic by construction, with direct matching tools (RTAlign, MassUntangler, Peakmatch), which underperform owing to MS signal uncertainty. Identification-dependent MBR functions in tools such as MaxQuant, MSFragger, and DIA-NN can transfer IDs across runs but remain limited for ID-free feature analysis and discovery. Prior deep learning work applied Siamese networks to GC-MS alignment, but no DL-based alignment had been applied to LC-MS before this study. This context motivates a deep learning approach that handles both monotonic and non-monotonic RT shifts and enables ID-free alignment.
Methodology
DeepRTAlign consists of a training phase and an application phase. Training proceeds in the following steps:

1. Precursor detection and feature extraction: XICFinder (similar to Dinosaur) detects isotope patterns across spectra and merges them into features (10 ppm mass tolerance).
2. Coarse alignment (pseudo-warping): RTs in all samples are linearly scaled to a fixed range (e.g., 80 min). For each m/z, the highest-intensity feature per sample is selected; with the first sample as the anchor, every other sample is divided into RT windows (1 min). Within each window, features are compared with the anchor (mass tolerance 0.01 Da) and unmatched features are ignored. The average RT shift per window is computed and added to all features in that window.
3. Binning and optional filtering: Features are grouped into m/z bins defined by bin_width (0.03 Da) and bin_precision (2 decimals); only features within the same m/z bin are aligned. Optionally, per sample and per bin, only the highest-intensity feature within a user-defined RT range is retained (off by default).
4. Input vector construction: For each candidate feature pair (feature n in sample 1, feature m in sample 2, within the same m/z bin), a 5×8 input is constructed representing the RT and m/z of the target feature and its two RT-adjacent neighbors on each side in both samples. Parts 1 and 4 store the original RT and m/z values; parts 2 and 3 store the between-sample differences. Original values are normalized by base2 [80, 1500] and difference values by base1 [5, 0.03].
5. Deep neural network classifier: A feedforward DNN with three hidden layers (5000 neurons each) predicts whether two features should be aligned (binary classification). Training data comprised 400,000 feature pairs from HCC-T: 200,000 positive pairs (same peptide; ±10 ppm, peptide RT constrained within the precursor feature's RT range) and 200,000 negative pairs (different peptides within an m/z tolerance of 0.03 Da).
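A minimal Python sketch of the coarse (pseudo-warping) alignment described above, assuming simple (rt, mz, intensity) feature tuples; the per-m/z highest-intensity selection is omitted for brevity, and the nearest-m/z matching strategy is an illustrative assumption rather than the tool's exact implementation:

```python
# Coarse (pseudo-warping) alignment sketch: scale RTs to a fixed range,
# then shift each 1-min window by its average RT offset from the anchor run.
# Feature tuples (rt_min, mz, intensity) are a simplifying assumption.

def coarse_align(anchor, sample, rt_max=80.0, window=1.0, mass_tol=0.01):
    """Return `sample` features with per-window average RT shifts applied."""
    # 1. Linearly scale both runs onto the common RT range [0, rt_max].
    def scale(feats):
        hi = max(f[0] for f in feats)
        return [(f[0] * rt_max / hi, f[1], f[2]) for f in feats]

    anchor, sample = scale(anchor), scale(sample)

    # 2. For each RT window, match sample features to anchor features
    #    by m/z (within mass_tol) and collect their RT differences.
    n_win = int(rt_max / window)
    shifts = [[] for _ in range(n_win)]
    for rt_s, mz_s, _ in sample:
        w = min(int(rt_s / window), n_win - 1)
        best = min(anchor, key=lambda a: abs(a[1] - mz_s))
        if abs(best[1] - mz_s) <= mass_tol:      # unmatched features ignored
            shifts[w].append(best[0] - rt_s)

    # 3. Add the average shift of each window to all features in that window.
    avg = [sum(s) / len(s) if s else 0.0 for s in shifts]
    return [(rt + avg[min(int(rt / window), n_win - 1)], mz, inten)
            for rt, mz, inten in sample]
```

With a sample run shifted uniformly relative to the anchor, the per-window averages recover the offset and the corrected RTs land on the anchor's scaled RTs.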
Identification results from Mascot/MaxQuant/MSFragger serve solely as ground truth for training and benchmarking; identifications are not required at application time.

6. Training details: BCELoss loss function; sigmoid activations; PyTorch default kaiming_uniform initialization; Adam optimizer (betas=(0.9, 0.999), eps=1e-08, weight_decay=0); initial learning rate 0.001 with step decay ×0.1 every 100 epochs; batch size 500. The model was trained for 400 epochs after the loss was observed to stabilize by 100–300 epochs.
7. Parameter evaluation: Hyperparameters were selected by 10-fold cross-validation on HCC-T and verified on independent test sets to rule out overfitting.

Quality control: In each m/z window, a sample is randomly selected as a target and a decoy sample is built from it; since decoy features should not align, the FDR is computed from alignments involving these decoys.

Application phase: DeepRTAlign accepts feature lists from Dinosaur, MaxQuant, OpenMS, and XICFinder (or other tools via txt/csv), performs coarse alignment and vector construction, then uses the trained DNN to output aligned feature lists.

Comparative ML baselines: RF, KNN, SVM, and LR were trained on the same inputs, with parameters optimized by 10-fold cross-validation.

Dataset simulation for generalizability: From 14 real-world datasets, 24 simulated variants per dataset were generated by adding normally distributed RT shifts (means µ = 0, 5, 10 min; standard deviations σ = 0, 0.1, 0.3, 0.5, 0.7, 1, 3, 5 min) to OpenMS-extracted featureXML files; the variants were aligned to the originals with DeepRTAlign and OpenMS, and precision/recall were computed.
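The decoy-based FDR check used for quality control can be sketched as follows. The summary does not specify how decoys are constructed, so shifting each decoy feature's m/z outside the match tolerance is an illustrative assumption:

```python
import random

# Decoy-based FDR sketch for alignment quality control.
# ASSUMPTION: decoys are built by pushing each feature's m/z outside any
# plausible true match; the real tool's decoy construction may differ.

def estimate_fdr(samples, align, decoy_mz_offset=0.5, seed=0):
    """Fraction of decoy features the aligner (wrongly) accepts."""
    rng = random.Random(seed)
    target = rng.choice(samples)                  # random target sample
    # Decoy features keep real RT/intensity but get an m/z that should
    # never align to a genuine feature.
    decoy = [(rt, mz + decoy_mz_offset, inten) for rt, mz, inten in target]
    hits = sum(1 for f in decoy if align(f))      # false alignments
    return hits / len(decoy) if decoy else 0.0
```

With a toy aligner that matches features to a reference m/z list within 0.01 Da, shifted decoys produce an FDR of 0, while unshifted "decoys" all match, giving an FDR of 1.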
Key Findings
- Model performance: The DNN achieved the highest AUC among DNN, RF, KNN, SVM, and LR across independent test sets. Among 2000 negative pairs correctly predicted by the DNN but not by RF, RT differences were similar (~4.89 vs. 5.03 min) but m/z differences were much closer (0.002 Da vs. 0.011 Da), indicating the DNN excels at discriminating the rare negative pairs with very close m/z values (<0.002 Da, ~10% of the data). Feature importance analysis showed m/z-difference features were most informative; the DNN assigned relatively higher importance to RT-difference features than RF/LR did.
- Ablation: Including coarse alignment improved the DNN's AUC over a model without it; the choice of anchor sample had a negligible effect.
- Comparison to MS1-only tools: Across multiple proteomic datasets, DeepRTAlign improved both precision and recall relative to MZmine 2 and OpenMS. On a metabolomic standard dataset (SM1100), all combinations performed well; DeepRTAlign combined with various feature extractors (MZmine 2/OpenMS/Dinosaur) achieved precision/recall comparable to the others.
- Comparison to MS/MS-informed Quandenser: On Benchmark-FC, DeepRTAlign was comparable in aligned peptide counts and quantification accuracy, while aligning many more features overall thanks to its ID-free design.
- Comparison to ID-based MBR workflows: On Benchmark-FC, combining MaxQuant feature extraction, DeepRTAlign alignment, and MSFragger identification yielded 150% more peptides than the original MaxQuant workflow (feature extraction + alignment + identification) without loss of quantification accuracy. DeepRTAlign aligned 7% and 14.5% more peptides than OpenMS and MSFragger's MBR, respectively, with comparable quantification accuracy. On Benchmark-MV, DeepRTAlign aligned more peptides than MSFragger's MBR, again with comparable accuracy. On single-cell DIA data, considering peptides present in at least two cells, DeepRTAlign aligned on average 39 (6.33%) more peptides per cell than DIA-NN with MBR; the number of aligned MS features was ~42.3× the number of aligned peptides, indicating rich ID-free signal for downstream analyses.
- Generalizability boundary (simulations): DeepRTAlign generally achieved higher precision and recall than OpenMS on proteomic datasets. Both methods declined as the RT-shift standard deviation increased: for proteomics, precision/recall dropped notably when σ > 1 min; for metabolomics, similar declines occurred when σ > 0.3 min, especially in recall. Changing the mean RT shift had little effect. On metabolomic datasets with many features of very close m/z (NCC19, SO, GUS), DeepRTAlign underperformed OpenMS at FDR < 1%, likely because the decoy design is incompatible with such data; relaxing the FDR threshold to 100% made performance comparable. OpenMS also performed poorly on these datasets, pointing to a broader limitation of MS1-only alignment.
- Clinical application: Using HCC tumor data (HCC-T), a classifier trained on the top-200 aligned MS features achieved a 5-fold CV AUC of 0.998, outperforming classifiers based on the top-200 peptides (AUC 0.931) and proteins (AUC 0.757). Mapping to an independent test set HCC-R (C2, N=11) yielded 15 features, 55 peptides, and 56 proteins; an SVM trained on the 15 features outperformed the peptide- and protein-based models on C2. In a further independent cohort (HCC-R2, C3, N=23) validated by scheduled PRM with Skyline quantification, the 15-feature classifier achieved an AUC of 0.833. Of the 15 features, eight mapped to identifications in HCC-T; seven remained unidentified yet were validated by PRM.
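The simulation protocol behind these generalizability results (normally distributed RT shifts with mean µ and standard deviation σ added to extracted features, scored by precision/recall against the known ground truth) can be sketched as follows; the (rt, mz) tuple representation and the pair-set scoring are simplifying assumptions:

```python
import random

# Sketch of the RT-shift simulation used to probe the generalizability
# boundary: perturb every feature's RT with Gaussian noise, then score an
# aligner against the known original-to-shifted correspondence.

def simulate_shift(features, mu, sigma, seed=0):
    """Add a normally distributed RT shift (mean mu, std sigma) to each feature."""
    rng = random.Random(seed)
    return [(rt + rng.gauss(mu, sigma), mz) for rt, mz in features]

def precision_recall(true_pairs, predicted_pairs):
    """Score an alignment given sets of (original_index, shifted_index) pairs."""
    tp = len(true_pairs & predicted_pairs)
    precision = tp / len(predicted_pairs) if predicted_pairs else 0.0
    recall = tp / len(true_pairs) if true_pairs else 0.0
    return precision, recall
```

Sweeping µ over {0, 5, 10} min and σ over {0, 0.1, 0.3, 0.5, 0.7, 1, 3, 5} min as in the paper yields the 24 simulated variants per dataset.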
Discussion
DeepRTAlign addresses limitations of existing RT alignment approaches by combining a coarse alignment step to handle global monotonic shifts with a deep neural network classifier capable of resolving non-monotonic, local shifts and ambiguous cases with close m/z. It aligns MS features in an ID-free manner, thereby exploiting information from unidentified precursors and benefiting downstream analyses, including biomarker discovery and clinical prediction tasks. Benchmarks demonstrate improved alignment coverage without compromising quantification accuracy relative to both MS-only tools and ID-dependent MBR strategies in DDA and DIA contexts. The method shows robustness across different feature extraction tools and datasets. The observed decline in performance under large RT shift variability underscores practical limits and suggests maintaining controlled RT stability in large-scale studies. The strong performance of the feature-based HCC early recurrence classifier highlights the hidden discriminative information within aligned MS features beyond peptides/proteins and supports ID-free alignment as a powerful complement to traditional pipelines.
Conclusion
DeepRTAlign is a deep learning-based RT alignment tool that integrates coarse alignment and a DNN classifier to accurately align LC-MS features across large cohorts, handling both monotonic and non-monotonic RT shifts. It outperforms established MS-only alignment tools and ID-based MBR workflows in alignment coverage while maintaining quantification accuracy, is compatible with multiple feature extractors, and generalizes across diverse datasets. Its ID-free alignment enables discovery and exploitation of informative features for downstream applications, exemplified by robust prediction of early HCC recurrence. Future work will focus on jointly optimizing feature extraction and alignment to further improve quantification accuracy and extend applicability to challenging datasets with dense, closely spaced m/z features.
Limitations
- Performance sensitivity to RT-shift variability: Precision and recall decline as the standard deviation of RT shifts increases; notable drops occur when σ > 1 min in proteomics and σ > 0.3 min in metabolomics.
- Decoy design and MS1-only limitation: Datasets with many features of very close m/z (e.g., NCC19, SO, GUS) challenge the current decoy-based FDR estimation; at stringent FDR thresholds (<1%) performance can lag, though it becomes comparable at relaxed thresholds. OpenMS also struggles on these datasets, suggesting broader limitations of MS1-only alignment.
- Dependence on feature extraction quality: Although alignment affects quantification less than feature extraction in theory, overall quantitative accuracy remains constrained by upstream feature extraction; the authors note the need to co-optimize extraction and alignment.
- Training data specificity: The DNN was trained on a single proteomic dataset (HCC-T), though tested broadly; while no overfitting was observed, performance under novel acquisition conditions may vary and could benefit from retraining or transfer learning.