Drug ranking using machine learning systematically predicts the efficacy of anti-cancer drugs
H. Gerdes, P. Casado, et al.
DRUML is a machine learning approach that predicts the efficacy of anti-cancer drugs from omics data. This research, from a team led by Henry Gerdes and others at Barts Cancer Institute, reports accurate within-sample drug ranking across independent datasets, with potential to improve cancer treatment selection.
Introduction
The study addresses the challenge of predicting effective cancer therapies in the context of high inter- and intra-tumor heterogeneity that leads to variable treatment responses. Traditional companion diagnostics, often based on single genetic alterations (e.g., HER2, PIK3CA, FLT3), have limited predictive power because cancer phenotypes arise from complex, compensatory signaling networks. The authors aim to determine whether large-scale proteomics and phosphoproteomics can be leveraged via machine learning to accurately predict and rank the efficacy of anti-cancer drugs for individual samples without requiring reference controls. They introduce DRUML, an ML framework that integrates empirical markers of drug response into an internally normalized distance metric to predict within-sample drug rankings, potentially improving precision oncology beyond genetics-based approaches.
Literature Review
Prior work has focused largely on genomic and transcriptomic predictors of drug response (e.g., GDSC, CTD2, DepMap), with mixed clinical success due to pathway redundancy and compensatory signaling. Protein biomarkers have guided targeted therapies (e.g., ER, HER2) for decades, and emerging evidence suggests proteomic and phosphoproteomic signals can outperform genomic features in predicting drug responses. Historically, large-scale proteomics was limited by throughput and reliance on relative quantification via labeling, hindering clinical translation. Recent advances in label-free LC-MS/MS, high-throughput phosphoproteomics, and public drug response datasets (PharmacoDB/GDSC) enable systematic ML modeling with proteomic inputs. The study builds on findings that kinase activity (phosphorylation) and protein-level changes reflect mechanisms underlying drug sensitivity and resistance, and that kinase inhibitors are promiscuous, complicating single-marker approaches.
Methodology
Design: DRUML is an ensemble ML framework that ranks drugs for a given sample using omics-derived composite features. The pipeline includes: (1) generating empirical markers of drug response (EMDRs), (2) computing per-drug distance metrics (D) within each sample, (3) selecting top D features correlated with responses, and (4) training ML regressors to predict area above the dose–response curve (AAC) for each drug.
Input datasets: In-house label-free LC-MS/MS proteomics and phosphoproteomics were acquired for 48 cancer cell lines (26 AML, 10 esophageal, 12 hepatocellular), each in triplicate, quantifying 22,804 phosphopeptides and 6,455 proteins. Drug responses (AAC) were obtained from PharmacoDB and scaled within cell lines. Only drugs with sufficient variability in response (IQR > 0.15 AAC units) were retained, leaving 466 drugs. RNA-seq from DepMap was used for benchmarking.
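The variability filter described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names, the toy AAC values, and the choice of the "inclusive" quartile convention are all assumptions, since the summary does not state how the IQR was computed.

```python
from statistics import quantiles

def iqr(values):
    """Interquartile range using inclusive quartiles (an assumed convention)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1

def filter_drugs_by_iqr(aac_by_drug, threshold=0.15):
    """Keep drugs whose AAC IQR across cell lines exceeds the threshold."""
    return {drug: aac for drug, aac in aac_by_drug.items() if iqr(aac) > threshold}

# Hypothetical AAC values (one value per cell line) for two made-up drugs.
aac_by_drug = {
    "drug_A": [0.05, 0.10, 0.40, 0.55, 0.70],  # responses vary widely
    "drug_B": [0.20, 0.21, 0.22, 0.23, 0.24],  # nearly flat responses
}
kept = filter_drugs_by_iqr(aac_by_drug)
print(sorted(kept))
```

A drug with nearly identical responses across cell lines carries little signal for a regressor to learn, which is presumably why such drugs were excluded before modeling.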
Dimensionality reduction (EMDRs and D metric): For each drug, tenfold cross-validation was run on the 80% training samples, with cell lines split into sensitive and resistant groups at the median AAC. Using Limma with repeated resampling (100 randomized resistant–sensitive comparisons), features significantly and consistently altered in sensitive or resistant samples (fold ±0.8, BH-adjusted p < 0.05 in ≥80% of repeats) were retained as EMDRs. For each sample and drug, a distance D was computed from the EMDR distributions: $D_{d,b} = (S_{Q2} - R_{Q2}) + (S_{Q3} - R_{Q3})$, where S and R denote the expression quantiles of the sensitivity and resistance markers, respectively, and Q2 and Q3 are the median and third quartile. D is internally normalized, robust to missing values, and can be calculated from any omics type providing EMDRs.
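As a rough illustration of the distance metric, the sketch below computes D for one sample and one drug from the median and third quartile of its sensitivity and resistance marker values. The marker names, the toy intensities, the use of `None` for missing values, and the inclusive quartile convention are assumptions made for this example; it is not the DRUMLR implementation.

```python
from statistics import median, quantiles

def q2_q3(values):
    """Median (Q2) and third quartile (Q3) of the observed values.

    Missing values (None) are skipped; at least two observed values are needed.
    """
    observed = [v for v in values if v is not None]
    q3 = quantiles(observed, n=4, method="inclusive")[2]
    return median(observed), q3

def distance_d(sample, sensitivity_markers, resistance_markers):
    """D = (S_Q2 - R_Q2) + (S_Q3 - R_Q3) for a single sample."""
    s_q2, s_q3 = q2_q3([sample.get(m) for m in sensitivity_markers])
    r_q2, r_q3 = q2_q3([sample.get(m) for m in resistance_markers])
    return (s_q2 - r_q2) + (s_q3 - r_q3)

# Hypothetical marker intensities for one sample; p4 was not quantified.
sample = {"p1": 2.0, "p2": 3.0, "p3": 4.0, "p4": None,
          "p5": 1.0, "p6": 1.5, "p7": 2.0}
d_value = distance_d(sample, ["p1", "p2", "p3", "p4"], ["p5", "p6", "p7"])
print(d_value)
```

Because D compares two marker sets measured in the same sample, it needs no external reference, and an unquantified marker only shrinks the set used for the quantiles rather than invalidating the calculation.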
Feature selection for models: For each drug, Spearman correlations between its AAC and all available D values (for that and other drugs) were computed across training cell lines. The top positively and negatively correlated D features (7–30 per direction, total 14–60, p < 0.05) were selected.
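The correlation-based selection step might look like the following sketch. It ranks D features by Spearman correlation with a drug's AAC and keeps the strongest ones in each direction. The feature names and toy values are hypothetical, and the p < 0.05 filter from the study is deliberately omitted to keep the example free of third-party dependencies.

```python
def ranks(values):
    """Average 1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return 0.0 if vx == 0 or vy == 0 else cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman correlation computed as Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def select_d_features(aac, d_features, per_direction=7):
    """Return the top positively and negatively correlated D feature names."""
    rho = {name: spearman(vals, aac) for name, vals in d_features.items()}
    ordered = sorted(rho, key=rho.get)  # most negative first
    negative = [f for f in ordered if rho[f] < 0][:per_direction]
    positive = [f for f in reversed(ordered) if rho[f] > 0][:per_direction]
    return positive, negative

# Hypothetical training data: AAC for one drug across five cell lines,
# and D values for three drugs in the same cell lines.
aac = [0.1, 0.3, 0.5, 0.7, 0.9]
d_features = {
    "D_drugX": [1, 2, 3, 4, 5],
    "D_drugY": [5, 4, 3, 2, 1],
    "D_drugZ": [2, 1, 4, 3, 5],
}
positive, negative = select_d_features(aac, d_features, per_direction=1)
print(positive, negative)
```

Drawing on D values computed for other drugs lets the model for one compound use markers of related or opposing mechanisms as predictors.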
Model training and validation: Separate models were built for AML and solid tumor groups. ML algorithms included random forest (rf), cubist, Bayesian GLM (bglm), partial least squares (pls), principal component regression (pcr), support vector machine (svm), neural network (nnet), and deep learning (dl via h2o). Data were split at the cell-line level (80/20) using caret, features normalized 0–1, with hyperparameter tuning by repeated 10-fold CV (3 repeats) using RMSE loss; pls/pcr used LOOCV. Performance was assessed on validation sets via absolute error, RMSE, and Spearman rank between predicted and measured AAC within samples.
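The data handling around model training can be illustrated with a small sketch. The study used the R package caret; the Python below is only an assumed analogue showing three ideas from the paragraph above: splitting at the cell-line level so replicates stay together, min-max scaling with parameters learned from training data, and the RMSE metric. All names and values are hypothetical.

```python
import random

def split_cell_lines(cell_lines, train_fraction=0.8, seed=0):
    """Split unique cell lines so replicates never straddle the two sets."""
    unique = sorted(set(cell_lines))
    random.Random(seed).shuffle(unique)
    n_train = round(train_fraction * len(unique))
    return set(unique[:n_train]), set(unique[n_train:])

def fit_min_max(training_column):
    """Return a scaler mapping the training range onto 0-1."""
    lo, hi = min(training_column), max(training_column)
    span = hi - lo
    return lambda v: 0.0 if span == 0 else (v - lo) / span

def rmse(predicted, measured):
    """Root mean squared error between predicted and measured AAC."""
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return (sum(squared) / len(squared)) ** 0.5

# Ten hypothetical cell lines, each measured in triplicate.
replicate_labels = [f"line_{i}" for i in range(10) for _ in range(3)]
train_lines, validation_lines = split_cell_lines(replicate_labels)

scale = fit_min_max([2.0, 4.0, 6.0])
error = rmse([0.1, 0.3], [0.2, 0.1])
print(len(train_lines), len(validation_lines), scale(4.0), error)
```

Splitting by cell line rather than by replicate matters because replicates of the same line in both sets would leak information and make validation error look better than it is.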
Verification with independent datasets: Models were applied without retraining to external label-free datasets: (i) phosphoproteomics for 8 colorectal cancer cell lines (Piersma et al.; PRIDE PXD001550; 12,197 phosphopeptides quantified) and (ii) proteomics for 47 cell lines across 8 solid tumor pathologies from 12 labs (Jarnuczak et al.; PRIDE PXD013455; iBAQ values). EMDR-derived D values were computed from these datasets and fed to saved DRUML models corresponding to the appropriate tumor group. Predictions were compared to public AAC data using absolute error, MSE, and Spearman rank within lines. Ranking accuracy for all drugs and top-20 subsets was evaluated.
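One way to read the ranking-accuracy evaluation is as the fraction of drugs whose predicted rank falls within k positions of the measured rank in the same cell line. The sketch below follows that reading. The function names and toy values are hypothetical, and defining the top-n subset by measured rank is an assumption, since the summary does not say how the top-20 subsets were chosen.

```python
def rank_positions(scores):
    """Map each drug to its rank position (1 = highest AAC)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {drug: i + 1 for i, drug in enumerate(ordered)}

def fraction_within(predicted, measured, k, top_n=None):
    """Fraction of drugs whose predicted rank is within k of the measured rank.

    If top_n is given, only the top_n drugs by measured rank are scored
    (an assumed definition of the "top-n" subset).
    """
    predicted_rank = rank_positions(predicted)
    measured_rank = rank_positions(measured)
    drugs = list(measured_rank)
    if top_n is not None:
        drugs = [d for d in drugs if measured_rank[d] <= top_n]
    hits = sum(abs(predicted_rank[d] - measured_rank[d]) <= k for d in drugs)
    return hits / len(drugs)

# Hypothetical measured and predicted AAC for five drugs in one cell line.
measured = {"a": 0.9, "b": 0.7, "c": 0.5, "d": 0.3, "e": 0.1}
predicted = {"a": 0.8, "b": 0.2, "c": 0.6, "d": 0.4, "e": 0.1}

overall = fraction_within(predicted, measured, k=1)
top_two = fraction_within(predicted, measured, k=1, top_n=2)
print(overall, top_two)
```

Scoring the top-ranked subset separately is useful because a clinician choosing a therapy cares mostly about whether the best candidates are placed correctly, not about the ordering of ineffective drugs.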
Clinical relevance assessment: Using phosphoproteomics from 36 primary AML samples (Casado et al.; PRIDE PXD005978), cytarabine D values and DRUML-predicted cytarabine AAC were correlated with overall survival (OS). Kaplan–Meier analyses compared OS between high vs low predicted responders using mean predicted AAC as cutoff, both in patients achieving complete remission (CR) and in the full cohort.
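The survival comparison can be sketched with a basic Kaplan–Meier estimator. This is an illustrative, dependency-free version with made-up data. Assigning patients exactly at the mean to the high group is an assumption, and the significance test used to compare the curves is not reproduced here.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.

    events[i] is 1 if the patient died at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve

def median_survival(curve):
    """First time at which estimated survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached

def split_by_mean(predicted_aac):
    """Indices of patients at or above vs below the mean predicted AAC."""
    cutoff = sum(predicted_aac) / len(predicted_aac)
    high = [i for i, v in enumerate(predicted_aac) if v >= cutoff]
    low = [i for i, v in enumerate(predicted_aac) if v < cutoff]
    return high, low

# Hypothetical follow-up times (years) and event flags for five patients.
curve = kaplan_meier([1, 2, 3, 4, 5], [1, 1, 1, 0, 1])
median_os = median_survival(curve)
high, low = split_by_mean([0.2, 0.8, 0.5, 0.9])
print(median_os, high, low)
```

In practice each group's follow-up data would be passed to `kaplan_meier` separately, and the two resulting median OS values compared.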
Experimental details: Comprehensive LC-MS/MS sample preparation, TiO2 phosphopeptide enrichment, instrument settings (Q Exactive Plus), identification (Mascot with FDR ≤1%), and label-free quantification via XICs are provided. Data normalization included centering/scaling for proteomic data and quantile normalization for RNA-seq. Ontology/pathway and kinase substrate enrichment of EMDRs used hypergeometric tests with FDR correction. Drug similarity analyses computed Pearson correlations of pathway enrichment delta scores between drugs. Code and EMDR resources are available in the DRUMLR R package and GitHub repositories.
Key Findings
- DRUML resource: EMDRs identified for 466 drugs; robust predictive models trained for 411–412 drugs using proteomics, phosphoproteomics, and RNA-seq derived D features; total 16,760 ML models constructed.
- Validation performance (internal): Deep learning achieved lowest validation RMSE across datasets (< 0.1). Within-sample ranking showed strong concordance: mean Spearman rho ≈ 0.88 (q < 0.002). For AML validation with DL models, RMSE between predicted and measured AAC was 0.078 (phosphoproteomics), 0.040 (proteomics), and 0.13 (RNA-seq).
- Verification performance (independent datasets):
• CRC phosphoproteomics (6 lines with AAC overlap): Significant correlations between predicted and measured AAC across drugs per cell line; Spearman rho ranging ~0.68–0.89 with p from 2.1e-06 to 1.4e-45. Random forest performed best overall: mean rho 0.70 ± 0.077 (n = 6), >85% of responses with absolute error < 0.15 AAC units. For within-line ranking, 88% of predictions within 50 ranks and 86% within 20; for top-20 drugs, 51% within 50 and 45% within 20 positions.
• Multi-lab proteomics (47 lines across 8 pathologies): High associations between predictions and measurements by RF across lines (mean rho ~0.64 with p ≤ 1e-05), MSE < 0.1 for all lines. >85% of responses had absolute error < 0.15 and 95% < 0.25. For within-line rankings, 51% within 20 positions; for top-20 drugs per sample, 76% within 20 positions.
- BYL-719 (alpelisib) case study: D values derived from proteomics, phosphoproteomics, and RNA-seq correlated with BYL-719 responses; D values for drugs targeting PI3K/AKT/mTOR, upstream RTKs, and downstream nodes positively correlated with BYL-719 sensitivity; anti-correlation with HDAC inhibitors aligned with known synergistic biology.
- Biological relevance: EMDR enrichment analyses grouped drugs by mode of action; pathways such as Class I PI3K signaling enriched in sensitivity markers for PI3K/mTOR/AKT inhibitors; kinase substrate signatures consistent with expected mechanisms.
- Clinical relevance: In 25 CR AML patients, cytarabine D correlated with OS (Spearman p = 0.014) and DRUML-predicted cytarabine AAC correlated with OS (p = 0.04). Kaplan–Meier: CR subgroup median OS 3.4 vs 1.1 years for high vs low predicted response (p = 0.0049; n = 15 vs 10). Entire cohort median OS 1.64 vs 1.0 years (p = 0.044; n = 22 vs 14).
Discussion
The findings demonstrate that large-scale proteomics and phosphoproteomics can effectively inform ML models to rank drugs by predicted efficacy within individual cancer samples. DRUML’s use of EMDR-derived distance metrics provides biologically meaningful, internally normalized features robust to noise and missing values, enabling predictions without requiring reference cohorts. The consistent performance across independent datasets from multiple laboratories and across diverse tumor types indicates generalizability of the approach. EMDR pathway and kinase substrate enrichments recapitulate drug mechanisms, and cross-drug D correlations capture pathway relationships and potential synergistic/antagonistic interactions (e.g., PI3K and HDAC inhibitors). Clinically, DRUML’s cytarabine predictions were prognostic in AML, supporting translational potential for therapy prioritization. While deep learning excelled in internal validation, random forest and PCR generalized better to external data, suggesting model selection should consider dataset size and heterogeneity. Overall, DRUML addresses the need for robust, sample-centric drug ranking tools in precision oncology by integrating proteomic signaling states that capture functional pathway activity beyond genomic alterations.
Conclusion
This work introduces DRUML, an ML framework that integrates proteomic and phosphoproteomic information via EMDR-based distance metrics to accurately rank anti-cancer drugs within a sample. Trained on 48 cell lines and verified on 53 independent cellular models profiled by multiple laboratories, DRUML achieved low errors (MSE < 0.1) and strong rank correlations, and its predictions of cytarabine response were prognostic in AML patients. The approach is biologically grounded, robust to missing data, and applicable across tumor types. Future work should expand training to include additional cancer types and clinically relevant drugs as new pharmacogenomic data become available, incorporate larger proteomic cohorts to better leverage deep learning, and evaluate prospective clinical utility for treatment selection and combination therapy design.
Limitations
- Drug coverage is restricted to compounds with available public response data; many modeled agents are research probes rather than approved therapies.
- Models were trained primarily on immortalized cell lines, which may not fully recapitulate in vivo tumor microenvironments and heterogeneity.
- Deep learning models showed overfitting on external datasets; generalization may require larger, more diverse training cohorts and harmonized acquisition protocols.
- Cross-platform and cross-lab variability, while mitigated by D metrics, may still influence performance; prospective clinical validation is needed.
- Predictions for tumor types not represented in training could be less accurate, necessitating retraining or transfer learning as new datasets emerge.