DIAMetAlyzer allows automated false-discovery rate-controlled analysis for data-independent acquisition in metabolomics

O. Alka, P. Shanthamoorthy, et al.

DIAMetAlyzer is an open-source workflow for targeted metabolomics from Oliver Alka, Premy Shanthamoorthy, Michael Witting, Karin Kleigrewe, Oliver Kohlbacher, and Hannes L. Röst. It reduces false discoveries when quantifying metabolites from DIA data and maintains quantification accuracy even at low concentrations.

Introduction
Mass spectrometry (MS) offers both untargeted and targeted data acquisition methods. Untargeted approaches aim for broad metabolite detection, while targeted methods prioritize accurate quantification of a smaller subset. Targeted techniques like Multiple Reaction Monitoring (MRM) or Parallel Reaction Monitoring (PRM) offer precise quantification but limited analyte coverage. Untargeted approaches often use data-dependent acquisition (DDA), which selects precursors for fragmentation based on their MS1 signal, typically the most intense ions. Data-independent acquisition (DIA), conversely, cycles through predetermined mass ranges (DIA or SWATH windows) and fragments everything within each window. This yields high-resolution MS2 spectra with improved reproducibility, but the spectra are highly multiplexed and of lower quality. DIA excels in quantitative precision and MS2 spectrum coverage, while DDA provides superior MS2 spectrum quality. DIA metabolomics data analysis employs either untargeted strategies (deconvolution, followed by spectral library search or targeted extraction) or targeted strategies (predefined compounds, which require assay libraries and XIC processing). The targeted route is accurate but requires manual curation and specialized expertise, which is a significant bottleneck. This research addresses these limitations by presenting a workflow that automates assay library generation and implements FDR estimation for DIA metabolomics, integrated into the OpenMS software suite.
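To make the idea of DIA cycling through predetermined mass ranges concrete, the following is a minimal sketch of how a precursor m/z maps to an isolation window. The window scheme (100 to 500 m/z in 25 m/z steps) and the function names are assumptions chosen for illustration; they are not taken from the study.

```python
from bisect import bisect_right


def make_windows(start_mz, end_mz, width):
    """Return contiguous (lower, upper) isolation windows covering [start_mz, end_mz)."""
    windows = []
    lower = start_mz
    while lower < end_mz:
        upper = min(lower + width, end_mz)
        windows.append((lower, upper))
        lower = upper
    return windows


def window_index(windows, precursor_mz):
    """Return the index of the window containing precursor_mz, or None if uncovered."""
    lowers = [lower for lower, _ in windows]
    i = bisect_right(lowers, precursor_mz) - 1
    if i >= 0 and windows[i][0] <= precursor_mz < windows[i][1]:
        return i
    return None


# Hypothetical acquisition scheme: 16 windows of 25 m/z each.
windows = make_windows(100.0, 500.0, 25.0)

# Every precursor inside a window is fragmented together, which is why
# DIA MS2 spectra are multiplexed.
idx = window_index(windows, 162.11)
```

In this sketch a precursor at 162.11 m/z falls into the third window (150 to 175 m/z), together with any other co-eluting ion in that range.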
Literature Review
Existing algorithms for DIA metabolomics data analysis largely rely on untargeted strategies. These methods often involve deconvolution to produce pseudo-MS2 spectra which are then used either for identification via spectral library search or for quantification via targeted extraction. While tools like MS-DIAL, DecoMetDIA, and SWATHtoMRM have been developed to address aspects of DIA analysis, they either lack automated assay library generation or rigorous FDR control. The creation of assay libraries, which require knowledge of retention times, precursor masses, and fragment masses, and the subsequent processing of XICs, are largely manual processes in metabolomics. In proteomics, target-decoy approaches are established for FDR estimation, but their adaptation to metabolomics, where fragment annotation is more complex, has been limited. This study builds upon existing methodologies by leveraging target-decoy strategies and integrating them into a fully automated workflow.
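The information an assay library has to carry (retention time, precursor mass, fragment masses) can be pictured as a small data structure. The sketch below is an assumption about how such an entry might be modelled, not the OpenMS file format; the class names, field names, and numeric values are hypothetical and purely illustrative.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Transition:
    """One precursor -> fragment pair to extract from DIA data."""
    precursor_mz: float
    product_mz: float
    is_decoy: bool = False


@dataclass
class AssayEntry:
    """Hypothetical assay library entry for a single compound."""
    compound: str
    retention_time_s: float
    precursor_mz: float
    fragments: list = field(default_factory=list)  # (product_mz, intensity) pairs
    is_decoy: bool = False

    def transitions(self, max_transitions=3):
        """Keep the most intense fragments as transitions."""
        ranked = sorted(self.fragments, key=lambda f: f[1], reverse=True)
        return [
            Transition(self.precursor_mz, product_mz, self.is_decoy)
            for product_mz, _ in ranked[:max_transitions]
        ]


# Made-up values, not measured data.
entry = AssayEntry(
    compound="compound_A",
    retention_time_s=312.5,
    precursor_mz=250.10,
    fragments=[(85.03, 400.0), (132.08, 1500.0), (190.05, 900.0), (60.04, 120.0)],
)
top = entry.transitions()
```

Ranking fragments by intensity is only one possible selection rule; it is used here because it is simple to state.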
Methodology
The DIAMetAlyzer workflow uses an experiment-specific target-decoy assay library derived from DDA data, which enables targeted extraction and scoring of transitions from DIA data with FDR control. The workflow comprises four steps: (1) candidate identification from DDA data (feature detection, adduct grouping, accurate mass search); (2) library construction, with fragment annotation by SIRIUS, which uses a compositional fragmentation tree approach, and decoy generation by Passatutto, which re-roots fragmentation trees to reduce bias; (3) targeted extraction from DIA data using a modified version of OpenSWATH; (4) statistical validation using a semi-supervised machine learning approach and PyProphet for FDR estimation. The workflow is implemented using OpenMS, pyOpenMS, SIRIUS, and Passatutto, and is accessible via a KNIME workflow. For library generation, the AssayGeneratorMetabo tool (implemented in OpenMS) performs precursor correction, filtering, feature mapping, and fragment annotation using SIRIUS. Decoy generation uses Passatutto's fragmentation tree re-rooting method, adding -CH2 mass to overlapping decoy transitions as a fallback. To assess FDR calibration, peak groups were validated manually: co-elution and chromatographic profiles were inspected visually to determine true positives. The comparison with MS-DIAL used the MTBLS1108 dataset, with the assay library converted to a spectral library. The comparison with MetaboDIA used the MTBLS417 dataset: libraries were generated with both tools, followed by targeted extraction and statistical validation. Statistical analysis used LIMMA with Benjamini-Hochberg correction for multiple testing.
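The target-decoy principle behind the statistical validation step can be illustrated with a deliberately simplified sketch. It is not PyProphet's algorithm, which additionally learns the score with a semi-supervised model; here the scores are assumed to be given, the FDR at each threshold is estimated as decoys divided by targets, and the function name and example scores are hypothetical.

```python
def target_decoy_qvalues(scored):
    """Estimate q-values for target peak groups.

    scored: list of (score, is_decoy) pairs; higher scores are better.
    Returns (score, q_value) pairs for targets, in descending score order.
    """
    ranked = sorted(scored, key=lambda item: item[0], reverse=True)

    # FDR estimate at each score threshold: decoys accepted / targets accepted.
    targets = decoys = 0
    fdrs = []
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: lowest FDR at which this peak group would still be accepted,
    # obtained by taking a running minimum from the worst score upwards.
    qvalues = []
    running_min = 1.0
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qvalues.append(running_min)
    qvalues.reverse()

    return [
        (score, q)
        for (score, is_decoy), q in zip(ranked, qvalues)
        if not is_decoy
    ]


# Made-up scores: True marks a decoy peak group.
example = [
    (9.0, False), (8.0, False), (7.0, True), (6.0, False),
    (5.0, False), (4.0, True), (3.0, True),
]
results = target_decoy_qvalues(example)
accepted = [score for score, q in results if q <= 0.05]
```

With these example scores, only the two targets that outscore every decoy pass a 5% cutoff. The estimate is only as good as the decoys: it assumes decoy scores behave like the scores of false target matches.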
Key Findings
DIAMetAlyzer significantly reduced false positive peak groups: 91% reduction at 5% FDR and 98% at 1% FDR, with only a 12% and 28% reduction in true positives, respectively. Assay library generation from reference mixes showed 77% coverage with three transitions. Simulations demonstrated that using three transitions significantly improved unique compound identification compared to MS1-only or MRM-based analyses. Analysis of 30 DIA samples (APM spiked-in human blood plasma) showed that the FDR estimated by DIAMetAlyzer was slightly conservative, especially at lower collision energies. The approach achieved an AUC of 0.96 in the precision-recall curve. The quantification performance of DIAMetAlyzer matched or exceeded manual analysis across dilution steps, with a median coefficient of variation below 0.2 in technical replicates. Compared to MS-DIAL, DIAMetAlyzer detected almost twice as many compounds in a targeted setting. Compared to MetaboDIA, DIAMetAlyzer nearly doubled the number of quantified features and identified additional differentially expressed features in an AMD dataset. The AMD analysis revealed potential biomarkers for AMD within various compound classes (glycerophospholipids, organic heterocyclic compounds, etc.), including upregulated carnitines and potential biomarkers such as oleoylcarnitine and L-palmitoylcarnitine. Further, previously reported biomarkers like hypoxanthine and compounds related to oxidative stress (dityrosine) were also identified. Significant upregulation of EPA and DHA, potentially linked to Omega-3 rich diets, was also observed.
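The coefficient of variation reported for technical replicates is the standard deviation of a compound's intensities divided by their mean. A minimal sketch of that calculation follows; the compound names and intensities are invented for illustration and are not values from the study.

```python
from statistics import mean, median, stdev


def coefficient_of_variation(intensities):
    """Sample standard deviation divided by the mean of replicate intensities."""
    m = mean(intensities)
    if m == 0:
        raise ValueError("mean intensity is zero; CV is undefined")
    return stdev(intensities) / m


# Hypothetical intensities from three technical replicates per compound.
replicates = {
    "compound_A": [100.0, 110.0, 90.0],
    "compound_B": [50.0, 50.0, 50.0],
    "compound_C": [200.0, 260.0, 140.0],
}

cvs = {name: coefficient_of_variation(values) for name, values in replicates.items()}
median_cv = median(cvs.values())
```

A median CV below 0.2 means that, for at least half of the compounds, replicate intensities scatter by less than 20% of their mean.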
Discussion
DIAMetAlyzer addresses the need for a reliable and automated workflow for DIA metabolomics analysis. Its strengths lie in the accurate FDR control, achieved through the innovative combination of target-decoy library generation and a robust statistical validation approach. The automated nature of the workflow significantly reduces manual effort and increases reproducibility. The comparison with existing tools highlights DIAMetAlyzer's superior performance in both targeted and untargeted settings, demonstrating its ability to identify a greater number of compounds and biomarkers. The identification of potential biomarkers for AMD, including some not previously reported, underscores the workflow's potential for advancing disease understanding. Future research could focus on expanding the compound databases used for annotation, exploring more sophisticated machine learning models for FDR estimation, and validating the identified biomarker candidates in larger independent cohorts.
Conclusion
DIAMetAlyzer offers a significant advance in DIA metabolomics analysis by providing an automated, open-source workflow with accurate FDR control. Its superior performance compared to existing tools and its ability to identify novel biomarkers highlight its potential for widespread adoption. Future research should focus on validating the newly identified potential biomarkers and refining the workflow further.
Limitations
The workflow requires both DDA and DIA data acquisition. Although DDA data for reference standards is acquired only once for library construction, it represents an additional experimental step. The performance of the fragment annotation depends on the capabilities of SIRIUS, which might have limitations for very large molecules. The runtime of the workflow, particularly the SIRIUS annotation step, can be substantial for complex samples. The biomarker identifications are putative (level 3 identification) and require further validation.