A machine learning-based chemoproteomic approach to identify drug targets and binding sites in complex proteomes
I. Piazza, N. Beaton, et al.
LiP-Quant is a machine learning-based pipeline for drug target deconvolution that combines limited proteolysis with mass spectrometry. In this study, Ilaria Piazza and colleagues use it to identify small-molecule targets, approximate their binding sites, and uncover the target of a previously uncharacterized fungicide.
Introduction
The study addresses the challenge of deconvoluting the direct protein targets and binding sites of small molecules in complex proteomes without prior modification of the compound or target. Existing chemoproteomic approaches (e.g., chemical probe-based enrichment, TPP, SPROX, DARTS) either require labeling that can perturb interactions or lack peptide-level binding site resolution and can miss low-abundance targets. Prior limited proteolysis (LiP) approaches mapped metabolite–protein interactions in microbial lysates but were not validated in complex eukaryotic proteomes. The authors propose LiP-Quant, integrating drug dose titrations with machine learning analysis of LiP-MS data to improve specificity for true targets, provide peptide-level information to approximate binding sites, and estimate apparent EC50s directly in lysates. The goal is to robustly identify targets across diverse drug classes and species, including human cells, and to discover targets of compounds with unknown mechanisms.
Literature Review
- Chemical probe-based methods can enrich and identify targets and sometimes interaction sites, but labeling may perturb native interactions and introduce bias.
- Label-free approaches: Thermal Proteome Profiling (TPP) monitors drug-induced thermal stability changes; SPROX assesses oxidation rate changes; DARTS leverages proteolytic stability shifts upon ligand binding. These can map interactions without modifications but may miss low-abundance targets due to lack of enrichment and generally lack peptide-level binding site information.
- Prior LiP-SMap demonstrated proteome-wide detection of ligand-induced structural changes in microbes by LC–MS but had not been adapted to complex eukaryotic proteomes.
- Comparative benchmarks exist for kinase inhibitor profiling using TPP and kinobeads, highlighting complementary strengths and the need for orthogonal, label-free, proteome-wide target deconvolution methods.
Methodology
Overview: LiP-Quant combines limited proteolysis (proteinase K) of proteomes with DIA-MS across a drug concentration series, followed by machine learning to rank peptides/proteins likely to be direct drug targets. It provides peptide-level signals to infer binding site proximity and estimates apparent EC50 values from dose–response behavior.
Experimental design:
- Systems: Primarily HeLa cell lysates; also live HeLa cells, Saccharomyces cerevisiae lysates, and Botrytis cinerea for case studies. For membrane target tests, crude lysates and plasma membrane-enriched fractions were used.
- Compounds: Positive controls with known targets (rapamycin, FK506, selumetinib, staurosporine, fostriecin, calyculin A), membrane-targeting proscillaridin A; unknown fungicide BAYE-004.
- Workflow: For each compound, lysate aliquots are incubated across a titration (vehicle to high micromolar), subjected to limited proteolysis with proteinase K, then fully digested and analyzed by DIA-MS. Differential peptide abundance is assessed relative to vehicle at each dose.
Mass spectrometry:
- DIA-MS was run on Q Exactive HF/HF-X instruments with 2 h LC gradients; Spectronaut X was used for DIA analysis (1% FDR at the precursor and protein levels). DDA runs were used to generate spectral libraries with SpectroMine; the human UniProt database (2018-07-01) and iRT standards were used. Yeast samples were analyzed on an Orbitrap Q Exactive Plus.
Statistical pre-filtering:
- For each dataset, Spectronaut's statistical testing (one-sample, two-sided t-test with Storey correction) was applied at the modified-peptide level using fragment ions. Peptides were filtered at q-value < 0.01 and |log2FC| > 0.58 (or > 0.46 for the automated ranking step), generating the candidate list for LiP-Quant analysis.
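The pre-filter above amounts to two cutoffs applied per peptide. The sketch below is only an illustration in Python, not the authors' Spectronaut or R workflow; the record layout, peptide names, and intensities are invented. For orientation, |log2FC| > 0.58 corresponds to roughly a 1.5-fold change, and 0.46 to roughly a 1.38-fold change.

```python
import math

# Thresholds quoted in the summary; everything else here is illustrative.
Q_MAX = 0.01
LOG2FC_MIN = 0.58          # about a 1.5-fold change, since log2(1.5) is ~0.585
LOG2FC_MIN_RANKING = 0.46  # looser cutoff quoted for the automated ranking step


def log2_fold_change(treated, vehicle):
    """log2 ratio of treated to vehicle peptide intensity."""
    return math.log2(treated / vehicle)


def prefilter(peptides, log2fc_min=LOG2FC_MIN, q_max=Q_MAX):
    """Keep peptides that pass both the q-value and the effect-size cutoff."""
    return [
        p for p in peptides
        if p["q_value"] < q_max and abs(p["log2fc"]) > log2fc_min
    ]


# Toy records (invented values, not data from the study).
peptides = [
    {"peptide": "PEPTIDEA", "q_value": 0.002, "log2fc": log2_fold_change(300.0, 100.0)},
    {"peptide": "PEPTIDEB", "q_value": 0.002, "log2fc": log2_fold_change(140.0, 100.0)},
    {"peptide": "PEPTIDEC", "q_value": 0.200, "log2fc": log2_fold_change(20.0, 100.0)},
]

strict = [p["peptide"] for p in prefilter(peptides)]
loose = [p["peptide"] for p in prefilter(peptides, log2fc_min=LOG2FC_MIN_RANKING)]
print(strict, loose)
```

In this toy input, the second peptide (a 1.4-fold change) passes only the looser ranking cutoff, and the third fails on q-value despite its large fold change.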
Machine learning classifier (LiP-Quant score):
- Training: Six ground-truth drug datasets split into training sets A (calyculin A, rapamycin, staurosporine) and B (FK506, selumetinib, fostriecin). Linear Discriminant Analysis (LDA) trained five times per set with resampling of negatives (400 background peptides) and positives from known target peptides (95 in A; 33 in B). Features scaled to [0,1]. Weights averaged and normalized to a maximum composite score of 6; weights from training set A used for analyses.
- Features (four components):
I. Dose–response sigmoidal fit correlation (R²) across concentrations (dominant; ~69% weight).
II. Protein Frequency Library (PFL): down-weights proteins frequently appearing as non-specific in other LiP datasets (contaminant frequency).
III. Multiple high-quality peptides per protein: count of peptides in top 10% by q-value within the protein.
IV. Statistical significance (q-value) of peptide regulation at concentrations above known EC50 (or comparable range).
- Thresholding: LiP-Quant scores are bimodal; putative targets defined as peptides with score > 1.5 (median non-target + 3 SD), yielding an average PPV ~30% across positive controls; stricter selection (e.g., top 10 peptides) increases PPV to ~70%.
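One way to picture the composite score and its threshold is as a weighted sum of the four scaled features, compared against a cutoff derived from the non-target score distribution. The Python sketch below rests on explicit assumptions: only the ~69% share of the dose–response feature comes from the summary, the split of the remaining weight is a placeholder, and the linear rescaling to a maximum of 6 is one plausible reading of the normalization. The published LDA weights are not reproduced here.

```python
import statistics

# Only the ~69% dose-response share is taken from the summary. The other
# three weights are placeholders, NOT the published LDA weights.
WEIGHTS = {
    "dose_response_r2": 0.69,
    "pfl": 0.11,            # hypothetical; higher value = rarer in other LiP datasets
    "multi_peptide": 0.10,  # hypothetical
    "significance": 0.10,   # hypothetical
}
MAX_SCORE = 6.0  # maximum composite score quoted in the summary


def composite_score(features):
    """Weighted sum of [0, 1]-scaled features, rescaled to the 0..6 range."""
    for name, value in features.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be scaled to [0, 1]")
    return MAX_SCORE * sum(WEIGHTS[k] * features[k] for k in WEIGHTS)


def score_threshold(non_target_scores, n_sd=3.0):
    """Median of non-target scores plus n_sd sample standard deviations."""
    return (statistics.median(non_target_scores)
            + n_sd * statistics.stdev(non_target_scores))


# Toy inputs (invented values).
candidate = {"dose_response_r2": 0.95, "pfl": 0.9,
             "multi_peptide": 0.5, "significance": 0.8}
background = [0.2, 0.4, 0.6, 0.8, 1.0]

score = composite_score(candidate)
cutoff = score_threshold(background)
print(round(score, 2), round(cutoff, 2), score > cutoff)
```

With real data the cutoff would be computed from the scores of background peptides; the summary reports that this procedure gave a threshold of 1.5.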
Automated ranking and EC50 estimation:
- In-house R scripts compute sub-scores and final LiP-Quant scores per peptide; best peptide score represents the protein. EC50 values for each LiP-Quant peptide derived using the drc package from the dose–response curve of relative peptide intensity changes.
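The authors fit dose–response curves in R with the drc package. As a rough stand-in, the Python sketch below fits a four-parameter sigmoid by grid search over EC50 and Hill slope, solving the lower plateau and span by ordinary least squares at each grid point, and reports R² as the goodness of fit. The function names, grid settings, and doses are assumptions made for illustration; this is not the drc algorithm.

```python
import math


def hill_fraction(dose, ec50, hill):
    """Fraction of maximal response at a dose (0 at dose 0, 0.5 at EC50)."""
    if dose <= 0:
        return 0.0
    return dose ** hill / (ec50 ** hill + dose ** hill)


def fit_dose_response(doses, responses, n_grid=200,
                      hills=(0.5, 1.0, 1.5, 2.0, 3.0)):
    """Grid-search EC50 and Hill slope; fit bottom and span by least squares."""
    positive = [d for d in doses if d > 0]
    lo, hi = math.log10(min(positive)), math.log10(max(positive))
    n = len(doses)
    mean_y = sum(responses) / n
    sst = sum((y - mean_y) ** 2 for y in responses)
    best = None
    for i in range(n_grid):
        ec50 = 10 ** (lo + (hi - lo) * i / (n_grid - 1))
        for hill in hills:
            g = [hill_fraction(d, ec50, hill) for d in doses]
            mean_g = sum(g) / n
            var_g = sum((v - mean_g) ** 2 for v in g)
            if var_g == 0:
                continue
            # The model is linear in (bottom, span) once EC50 and hill are fixed.
            span = sum((v - mean_g) * (y - mean_y)
                       for v, y in zip(g, responses)) / var_g
            bottom = mean_y - span * mean_g
            sse = sum((bottom + span * v - y) ** 2
                      for v, y in zip(g, responses))
            if best is None or sse < best["sse"]:
                best = {"ec50": ec50, "hill": hill, "bottom": bottom,
                        "span": span, "sse": sse}
    best["r2"] = 1.0 - best["sse"] / sst if sst > 0 else 0.0
    return best


# Noise-free toy curve with a true EC50 of 50 (arbitrary concentration units).
doses = [0, 1, 10, 100, 1000, 10000]
responses = [hill_fraction(d, 50.0, 1.0) for d in doses]
fit = fit_dose_response(doses, responses)
print(round(fit["ec50"], 1), fit["hill"], round(fit["r2"], 4))
```

The R² returned here plays the role of the dose–response feature in the score, and the fitted EC50 is the apparent value reported per peptide. A production analysis would use a proper nonlinear optimizer rather than a grid.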
Binding site approximation:
- For candidate targets with multiple high-scoring peptides, the center of mass (CoM) of atoms in the top 3 LiP-Quant peptides (typically among top-15 proteome-wide) is computed on available protein structures to estimate proximity to the ligand-binding site. Distances compared to van der Waals shells to assess overlap/proximity.
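The center-of-mass calculation can be sketched in a few lines of Python. This is a minimal illustration, assuming atoms are supplied as (element, x, y, z) tuples with coordinates in ångströms, for example after parsing a structure file; the helper names and the toy coordinates are hypothetical, and no specific distance cutoff from the paper is implied.

```python
import math

# Approximate standard atomic masses (Da) for elements common in proteins.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06}


def center_of_mass(atoms):
    """Mass-weighted mean position of (element, x, y, z) tuples."""
    total = sum(ATOMIC_MASS[el] for el, *_ in atoms)
    return tuple(
        sum(ATOMIC_MASS[el] * xyz[axis] for el, *xyz in atoms) / total
        for axis in range(3)
    )


def min_distance_to_ligand(point, ligand_atoms):
    """Shortest distance from a point to any ligand atom."""
    return min(math.dist(point, (x, y, z)) for _, x, y, z in ligand_atoms)


# Toy coordinates (invented): atoms pooled from the top-scoring peptides
# of one protein, plus a single ligand atom.
peptide_atoms = [("C", 0.0, 0.0, 0.0), ("C", 2.0, 0.0, 0.0)]
ligand_atoms = [("O", 1.0, 3.0, 0.0)]

com = center_of_mass(peptide_atoms)
distance = min_distance_to_ligand(com, ligand_atoms)
print(com, distance)
```

In practice the atom list would pool all atoms of the top three LiP-Quant peptides, and the resulting distance would be judged against the ligand's van der Waals shell as described above.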
Benchmarking and complementary assays:
- Compared LiP-Quant with TPP (melting point fits and non-parametric analysis) and kinobeads for staurosporine. Deep LiP-Quant used extended LC gradients (4 h) to increase sequence coverage.
- For fungicide BAYE-004, target engagement validated by CETSA in B. cinerea and enzymatic inhibition measured by LANCE Ultra kinase assay. Homology modeling used to map putative binding sites.
Membrane protein enrichment:
- To mitigate loss of insoluble proteins, crude lysates without centrifugation and a plasma membrane enrichment protocol were applied, increasing identification of membrane-associated proteins and enabling detection of ATP1A1 as proscillaridin A target.
Key Findings
- Single-dose LiP in HeLa identified many candidates (52 in lysates; 37 in live cells) including true target FKBP1A for rapamycin, but lacked specificity, motivating dose–response LiP-Quant.
- LiP-Quant score integrates four features; scores show bimodal distribution enabling target prioritization with threshold >1.5 (average PPV ~30%; top-10 peptides PPV ~70%).
- Rapamycin and FK506 in HeLa lysates: multiple top-scoring LiP-Quant peptides mapped to FKBP1A; results were consistent across compounds with shared peptides showing similar regulation and scores.
- Cross-species applicability: In S. cerevisiae, top-scoring peptides mapped to FPR1 (rapamycin’s yeast target). Additional candidates (ARI1, SYEC) were evaluated in TOR-impaired strains, suggesting ARI1 as a likely secondary target.
- Benchmarking with staurosporine:
  - LiP-Quant effectively ranked true kinase targets among top candidates; it detected 21 kinases vs. 49 for TPP, while kinobeads captured many more kinases (~190), highlighting complementarity.
  - Receiver operating characteristic AUCs: standard LiP-Quant 0.76; Deep LiP-Quant (4 h gradient) 0.81; TPP non-parametric 0.85 (TPP replicate fits: 0.74 and 0.80), indicating competitive performance with deeper MS.
  - Successful LiP-Quant detection correlated with higher protein sequence coverage; deeper acquisition increased kinase target identification from 21 to 36 (15 more kinases, about a 71% increase over the standard run).
- Membrane proteins: Standard LiP-Quant quantified ~200 plasma membrane proteins (~4% of total); crude lysates >300; membrane enrichment ~400. Using enrichment, LiP-Quant identified ATP1A1 as the known proscillaridin A target.
- Selectivity and EC50 estimates:
  - Selumetinib: LiP-Quant identified MAP2K1/MAP2K2 (known targets) and NQO2 (known off-target for some kinase inhibitors). LiP-Quant EC50s for MAP2K1: 48.5–101 nM (literature ~41 nM).
  - Calyculin A: Identified PP2A/B (12 peptides) and PP1 (16 peptides) with median EC50s of 18 nM and 63 nM respectively (~10x higher than in vitro), but preserving the relative ~3.5-fold affinity difference (PP2A vs PP1).
  - Phosphatase selectivity captured: calyculin A mapped to PP1 and PP2A/B; fostriecin to PP2A/B, PP4, PP6 but not PP1.
- Binding site approximation: For seven protein–drug complexes, the center-of-mass of top LiP-Quant peptides generally localized within van der Waals distance of the ligand-binding site (e.g., FKBP1A with FK506/rapamycin; kinases with staurosporine; MAP2K1 with selumetinib). Calyculin A and fostriecin were proximal but not overlapping, still near the binding cleft.
- Target discovery for an uncharacterized fungicide (BAYE-004):
  - Identified B. cinerea kinase Bcin06g02870 (casein kinase I homolog) as primary target with LiP-Quant EC50 ~6 nM; CETSA confirmed thermal stabilization; enzymatic assay showed IC50 12.5 nM, and LiP-Quant peptide CoM mapped to the ATP-binding site.
  - Secondary candidate Bcin16g04330 (GSK3β-like) showed much weaker apparent affinity; LiP-Quant peptides localized near a known allosteric region, suggesting allosteric/secondary effects.
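The ROC AUC values quoted in the staurosporine benchmark can be read as the probability that a randomly chosen true target scores higher than a randomly chosen non-target. The Python sketch below computes that quantity directly from pairwise comparisons; the scores and labels are invented, and the function is a generic illustration rather than the authors' evaluation code.

```python
def roc_auc(scores, labels):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win, which matches the area under the ROC curve.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy ranking (invented values): 1 marks a known kinase target.
scores = [5.1, 4.0, 3.2, 1.0, 0.8, 0.5]
labels = [1, 1, 0, 1, 0, 0]

auc = roc_auc(scores, labels)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for the 0.76 to 0.85 range reported above.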
Discussion
LiP-Quant effectively deconvolutes direct drug–protein interactions in complex proteomes without compound modification, complementing TPP and affinity enrichment methods. By integrating dose–response behavior, contaminant frequency, multi-peptide support, and statistical significance, the LiP-Quant score enriches true targets among top-ranked candidates. The peptide-level resolution provides a practical proxy for binding site localization and enables estimation of apparent EC50s directly in lysates, which often exceed recombinant protein measurements but may better reflect physiological context (competition, PTMs, complexes, membranes). Comparative analyses show LiP-Quant’s performance is comparable to TPP for early-ranked targets and improves with deeper MS acquisition and targeted enrichment (e.g., membrane proteins). The approach profiles selectivity within closely related protein families (kinases, phosphatases) and can discover targets and likely binding modes of compounds with unknown mechanisms, as demonstrated for BAYE-004 targeting casein kinase I in B. cinerea. Overall, LiP-Quant adds an orthogonal, label-free, proteome-wide tool to the chemoproteomics toolkit, particularly valuable for mapping target engagement and binding sites.
Conclusion
The study introduces LiP-Quant, a machine learning-based limited proteolysis–mass spectrometry pipeline for proteome-wide, label-free drug target identification with peptide-level binding site approximation and EC50 estimation. It accurately identifies known targets across species, distinguishes selectivity among homologous proteins, complements TPP and kinobeads, and reveals targets and binding sites of an uncharacterized fungicide. Improvements in proteome coverage (longer gradients, fractionation, enrichment) enhance sensitivity and target detection. Future directions include broader application to intact cells and tissues, integration with orthogonal assays (e.g., TPP, CETSA), methodological enhancements for membrane proteins, and refined modeling to translate lysate-derived EC50s to in vivo affinity metrics.
Limitations
- Sensitivity depends on proteome sequence coverage; lower coverage reduces target detection, especially for kinases; requires deeper MS or fractionation for improvement.
- Bias against membrane proteins in standard lysate preparations due to removal of insoluble proteins; mitigated by crude lysates or membrane enrichment.
- Apparent EC50s measured in lysates are typically higher (~10-fold) than in vitro values with purified proteins, reflecting competition and cellular context; absolute values may not directly translate but relative affinities are robust.
- Single-dose LiP generates many false positives in complex proteomes; dose–response LiP-Quant mitigates but still yields a modest PPV at the 1.5 score threshold (~30%), requiring stringent ranking and validation.
- Some targets may not exhibit detectable LiP peptide changes (e.g., limited protease accessibility or sampling), and detection in live cells is more challenging than in lysates.
- Binding site localization is approximate, relying on the center-of-mass of top peptides and availability of structural/homology models.
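The trade-off between the score threshold and a stricter top-ranked selection comes down to positive predictive value (PPV), the fraction of called peptides that belong to a true target. The Python sketch below shows the calculation on invented data, assuming each peptide carries a score and a known-target flag; the numbers are not from the study.

```python
def positive_predictive_value(calls):
    """Fraction of called peptides that belong to a known target."""
    if not calls:
        raise ValueError("no peptides were called")
    return sum(1 for _, is_target in calls if is_target) / len(calls)


# Toy (score, is_known_target) pairs; invented values.
scored = [
    (5.2, True), (4.8, True), (4.1, False), (3.0, True), (2.2, False),
    (1.9, False), (1.7, False), (1.6, False), (1.2, True), (0.4, False),
]

ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
above_threshold = [pair for pair in ranked if pair[0] > 1.5]  # score cutoff
top_three = ranked[:3]                                        # stricter selection

ppv_threshold = positive_predictive_value(above_threshold)
ppv_top = positive_predictive_value(top_three)
print(ppv_threshold, round(ppv_top, 3))
```

As in the study, restricting the call set to the highest-ranked peptides raises PPV in this toy example, at the cost of missing lower-scoring true targets.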