Learning chemical sensitivity reveals mechanisms of cellular response

Medicine and Health

W. Connell, K. Garcia, et al.

ChemProbe is a deep learning model developed by William Connell, Kristle Garcia, Hani Goodarzi, and Michael J. Keiser. It predicts cellular sensitivity to molecular probes and drugs from transcriptomic and chemical structure data, with applications to precision cancer treatment and to uncovering the molecular mechanisms of response.
Introduction

Chemical probes are potent small molecules that selectively target known mechanism-of-action proteins and have been crucial for dissecting biological processes and diseases, often serving as starting points for drug development. Beyond therapeutics, drugs can act as probes in complex diseases like cancer, where heterogeneity necessitates precision strategies informed by mechanisms of resistance and sensitivity. Large-scale experimental screening of many chemicals across disease models, engineered cell lines, and patient samples is resource-prohibitive. Traditional machine-learning approaches (e.g., SVMs, random forests, MLPs) have predicted drug response using single omics modalities, with improvements from multimodal integration (e.g., chemical structure and pharmacological features). Deep learning offers flexible representation learning and integration of diverse inputs (e.g., graph-based chemical encoders, VAEs, attention). Interpreting predictive models can illuminate biology via feature attributions and by incorporating biological priors, though priors may limit discovery of novel combinations. The authors hypothesized that a conditional deep-learning model can learn to combine gene expression with chemical structure to predict cellular sensitivity and that interpretation of such a model would reflect known pharmacology and reveal mechanisms of response. They introduce ChemProbe to predict sensitivity from transcriptomes and chemical structures and to generate interpretable gene features relevant to compound mechanisms, enabling in silico screening and mechanistic insight.

Literature Review

Prior work has applied traditional ML (SVMs, RFs, MLPs) to predict drug response from single modalities (mutations, expression), with notable gains from multimodal integration including chemical structure and pharmacology. Deep learning facilitates representation learning across modalities, leveraging encoders, transfer learning via VAEs, graph neural networks for chemical structures, and cross-attention for feature integration. Model interpretability spans ensemble confidence, direct inspection of parameters (e.g., attention matrices), and gradient-based attribution. Incorporating biological priors (gene ontologies, pathways) can constrain models to interpretable features but may limit discovery of novel gene combinations and systems mechanisms. Few studies examine how diverse feature sets are integrated or whether highly predictive features reflect expected biological relationships in drug sensitivity tasks.

Methodology

Data: Drug sensitivity data came from CTRP v1/v2 (864 cell lines; 481 single compounds and 64 compound pairs), providing viability across concentrations. Compound structures were encoded as 512-bit Morgan fingerprints (radius 2) computed with RDKit from SMILES and concatenated with the micromolar concentration to form 513-length compound feature vectors. Basal transcriptomes for the matched CTRP cell lines were taken from CCLE, using standardized protein-coding gene expression (19,144 features). After matching, the dataset comprised 545 compounds and compound pairs (481 + 64) and 860 cell lines, totaling 366,710 unique cell line–compound pairs and 5,849,340 examples across concentrations.
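The input encoding described above can be sketched as follows. This is a minimal illustration, not the authors' code: the random bit vector stands in for an RDKit Morgan fingerprint (in practice something like `AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=512)` would produce it), and z-scoring each gene across cell lines is an assumption about what "standardized" expression means here.

```python
import numpy as np

N_BITS = 512  # fingerprint length given in the summary

def compound_features(fp_bits, conc_um):
    """Concatenate a 512-bit fingerprint with the micromolar dose -> 513 values."""
    bits = np.asarray(fp_bits, dtype=np.float32)
    if bits.shape != (N_BITS,):
        raise ValueError(f"expected {N_BITS} bits, got shape {bits.shape}")
    return np.concatenate([bits, np.array([conc_um], dtype=np.float32)])

def standardize_expression(expr):
    """Z-score each gene (column) across cell lines (rows); an assumed convention."""
    expr = np.asarray(expr, dtype=float)
    mu = expr.mean(axis=0)
    sd = expr.std(axis=0)
    sd[sd == 0] = 1.0  # constant genes: avoid division by zero
    return (expr - mu) / sd

# Toy usage. The random bits are a stand-in for an RDKit Morgan fingerprint.
rng = np.random.default_rng(0)
bits = rng.integers(0, 2, size=N_BITS)
x_c = compound_features(bits, conc_um=1.5)
expr = rng.normal(5.0, 2.0, size=(10, 19144))  # 10 hypothetical cell lines x 19,144 genes
x_g = standardize_expression(expr)
```

Each training example then pairs one row of standardized expression with one 513-value compound vector at a given dose.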

Model: ChemProbe is a conditional neural network predicting cellular viability y = f(x | n), where x is standardized RNA abundance and n encodes chemical structure and concentration. Separate encoders embed gene expression (layers [2048, 512, 256] to g=128) and compound features (layers [256, 128] to c=128). A FiLM generator maps compound embeddings to scale (γ) and shift (β) parameters of length g; FiLM layers apply affine transformations to the gene expression embedding. Two FiLM layers are followed by linear blocks (linear, ReLU, batch norm, dropout), outputting a scalar viability prediction optimized by mean squared error. Alternative integration strategies evaluated included simple concatenation, and ablations learning only scale (β=0) or only shift (γ=1). A structural ablation replaced fingerprints with randomized identifiers. Models were implemented in PyTorch with hyperparameter optimization via Optuna.
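The conditioning step can be illustrated with a small NumPy sketch of one FiLM modulation, gamma * h + beta, together with the scale-only (beta = 0), shift-only (gamma = 1), and concatenation alternatives. The weights are random, and the single linear FiLM generator and the gamma bias of 1 are hypothetical choices. The published model is a trained PyTorch network with two FiLM layers followed by linear blocks, which this sketch does not reproduce.

```python
import numpy as np

G = 128  # gene-expression embedding width g
C = 128  # compound embedding width c

rng = np.random.default_rng(0)
# Hypothetical linear FiLM generator: one map for gamma, one for beta.
W_gamma = rng.normal(scale=0.05, size=(C, G))
b_gamma = np.ones(G)  # bias of 1 keeps gamma near 1 initially (an assumed choice)
W_beta = rng.normal(scale=0.05, size=(C, G))
b_beta = np.zeros(G)

def film(h, c, mode="film"):
    """Modulate gene embeddings h (batch, G) using compound embeddings c (batch, C)."""
    gamma = c @ W_gamma + b_gamma
    beta = c @ W_beta + b_beta
    if mode == "scale":    # scale-only ablation: beta fixed at 0
        beta = np.zeros_like(beta)
    elif mode == "shift":  # shift-only ablation: gamma fixed at 1
        gamma = np.ones_like(gamma)
    elif mode != "film":
        raise ValueError(f"unknown mode: {mode}")
    return gamma * h + beta

def concat(h, c):
    """Baseline integration: concatenate the two embeddings."""
    return np.concatenate([h, c], axis=1)

h = rng.normal(size=(4, G))  # stand-in gene-expression embeddings
c = rng.normal(size=(4, C))  # stand-in compound embeddings
out_film = film(h, c)
out_scale = film(h, c, "scale")
out_shift = film(h, c, "shift")
out_cat = concat(h, c)
```

FiLM keeps the output in the gene-embedding space (width g) and lets the compound decide how each gene-derived feature is rescaled and offset, whereas concatenation leaves any interaction to later layers.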

Training and evaluation: Five-fold cross-validation was stratified by cell line to prevent leakage. Five independently trained models underwent 20 rounds of hyperparameter optimization each. Performance was reported as R² (mean ± SE) across folds. An ensemble of five FiLM models (mean predictions) was used for downstream tasks.
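Holding out whole cell lines, rather than individual examples, is what prevents leakage. The sketch below assumes "stratified by cell line" means each cell line's examples fall into a single fold. It shows per-fold R², the mean ± SE summary, and ensemble averaging; all names and the toy data are hypothetical.

```python
import numpy as np

def cell_line_folds(cell_lines, n_folds=5, seed=0):
    """Assign every example to a fold so that each cell line lands in exactly one fold."""
    rng = np.random.default_rng(seed)
    unique = np.array(sorted(set(cell_lines)))
    rng.shuffle(unique)
    fold_of = {line: i % n_folds for i, line in enumerate(unique)}
    return np.array([fold_of[line] for line in cell_lines])

def r2(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def mean_se(values):
    """Mean and standard error (sample SD / sqrt(n)) across folds."""
    v = np.asarray(values, dtype=float)
    return v.mean(), v.std(ddof=1) / np.sqrt(v.size)

def ensemble_predict(member_preds):
    """Average predictions of several models; rows are models, columns are examples."""
    return np.asarray(member_preds, dtype=float).mean(axis=0)

# Toy usage: 200 examples from 20 hypothetical cell lines.
rng = np.random.default_rng(1)
lines = [f"line{i % 20}" for i in range(200)]
y = rng.uniform(0, 1, size=200)
folds = cell_line_folds(lines)
scores = []
for k in range(5):
    test = folds == k
    y_hat = y[test] + rng.normal(scale=0.1, size=test.sum())  # stand-in model output
    scores.append(r2(y[test], y_hat))
mean_r2, se_r2 = mean_se(scores)
```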

Dose–response modeling: For each cell line–compound, predicted viabilities were generated at 32 concentrations (1e-3 µM–300 µM). Quality control removed anomalous high-viability tails; a minimum of 16 points was required. Four-, then three-, then two-parameter log-logistic models were fit using SciPy; curves with undetermined parameters or EC50 outside [1e-3, 300] µM were filtered. From fitted curves, IC50/ED50 and pharmacodynamic features were derived. For in vitro relative potency, the R package drc fit four-parameter log-logistic models, computing ED50 contrasts (EDcomp) with t- and p-values.
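The fallback fitting scheme can be sketched with `scipy.optimize.curve_fit`. This is an illustrative version under stated assumptions. The curve follows the log-logistic form used by drc's LL.4, with EC50 fitted on the log scale for numerical stability. The reduced models fix the asymptotes at 0 (and 1). The code falls back to a simpler model when a fit fails to converge or its covariance matrix has non-finite entries, and it discards the curve if EC50 falls outside the tested range. The high-viability tail QC is not shown.

```python
import numpy as np
from scipy.optimize import curve_fit

CONCS = np.logspace(-3, np.log10(300), 32)  # 32 doses from 1e-3 to 300 µM

def ll4(x, b, c, d, log_e):
    """Four-parameter log-logistic: c + (d - c) / (1 + exp(b * (log x - log EC50)))."""
    return c + (d - c) / (1.0 + np.exp(b * (np.log(x) - log_e)))

def ll3(x, b, d, log_e):
    """Three-parameter variant with the lower asymptote fixed at 0."""
    return ll4(x, b, 0.0, d, log_e)

def ll2(x, b, log_e):
    """Two-parameter variant with asymptotes fixed at 0 and 1."""
    return ll4(x, b, 0.0, 1.0, log_e)

def fit_dose_response(conc, viab, lo=1e-3, hi=300.0, min_points=16):
    """Fit LL.4, falling back to LL.3 then LL.2; return (n_params, params, ec50) or None."""
    conc, viab = np.asarray(conc, float), np.asarray(viab, float)
    if conc.size < min_points:
        return None  # too few points left after quality control
    attempts = [(ll4, [1.0, 0.0, 1.0, 0.0]), (ll3, [1.0, 1.0, 0.0]), (ll2, [1.0, 0.0])]
    for model, p0 in attempts:
        try:
            params, cov = curve_fit(model, conc, viab, p0=p0, maxfev=10000)
        except RuntimeError:
            continue  # no convergence: try the simpler model
        if not np.all(np.isfinite(cov)):
            continue  # parameters could not be determined: try the simpler model
        ec50 = float(np.exp(params[-1]))
        return (len(params), params, ec50) if lo <= ec50 <= hi else None
    return None  # every model failed

# Toy usage: noisy synthetic curve with EC50 = 2 µM.
rng = np.random.default_rng(0)
viab = ll4(CONCS, 1.2, 0.05, 1.0, np.log(2.0)) + rng.normal(scale=0.01, size=CONCS.size)
result = fit_dose_response(CONCS, viab)
```

Fitting log EC50 rather than EC50 keeps the optimizer from stepping to negative concentrations, where the logarithm is undefined.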

Retrospective clinical analysis (I-SPY2): Microarray expression (GEO GSE194040) and clinical response (pathologic complete response, pCR) for 988 I-SPY2 patients were processed by matching 90% of the training gene features, mean-imputing the remaining 10%, and z-scoring. Distributional shifts relative to CCLE RNA-seq were assessed via PCA. ChemProbe predicted dose–response AUCs at 32 concentrations for the five drugs shared by I-SPY2 and the training set (paclitaxel, neratinib, MK2206, veliparib, carboplatin), and AUCs were scaled per drug to [0, 1]. ROC curves compared ChemProbe predictions with trial outcomes; ChemProbe-based responder classification used a concentration-derived decision threshold.
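The per-drug scoring can be sketched as below. The sketch assumes that responders (pCR) should receive low predicted AUCs, so 1 minus the scaled AUC serves as the responder score. auROC is computed from ranks (the Mann-Whitney formulation), and the macro average is the unweighted mean over drugs. Gene matching, imputation, and the concentration-derived classification threshold are not shown. The drug names and values are made up.

```python
import numpy as np

def minmax_scale(values):
    """Scale one drug's predicted dose-response AUCs to [0, 1]."""
    v = np.asarray(values, dtype=float)
    span = v.max() - v.min()
    return np.zeros_like(v) if span == 0 else (v - v.min()) / span

def auroc(scores, labels):
    """Rank-based auROC: Mann-Whitney U / (n_pos * n_neg), with average ranks for ties."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(scores, kind="mergesort")
    sorted_scores = scores[order]
    ranks = np.empty(scores.size)
    i = 0
    while i < scores.size:
        j = i
        while j + 1 < scores.size and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0  # shared 1-based rank for a tie block
        i = j + 1
    n_pos, n_neg = labels.sum(), (~labels).sum()
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)

def responder_auroc(pred_auc, pcr):
    """Lower predicted AUC means more sensitive, so responders are scored by 1 - scaled AUC."""
    return auroc(1.0 - minmax_scale(pred_auc), pcr)

# Toy usage with hypothetical drugs and made-up predictions (1 = pCR).
per_drug = {
    "drug_A": responder_auroc([10, 20, 30, 40], [1, 1, 0, 0]),
    "drug_B": responder_auroc([0.1, 0.4, 0.35, 0.8], [1, 1, 0, 0]),
}
macro = np.mean(list(per_drug.values()))
```

Because min-max scaling is monotonic, it does not change a single drug's auROC. It matters only when scores are compared or thresholded across drugs.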

Prospective wet-lab validation: Two breast cancer lines (HCC1806-Par, MDA-MB-231-Par) were tested. ChemProbe predicted IC50s and differential potency; six compounds were selected among the top predicted differences with complete in silico curves: neratinib, ceranib-2, CAY10618, AZD7762, 1S,3R-RSL-3, ML162. Dose–response assays (12-point serial dilutions; 72 h treatment; CellTiter-Glo 2.0) determined ED50s; experiments were performed in quadruplicate wells and repeated, with quality control excluding a small number of wells exhibiting spurious death.
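One way to shortlist compounds by predicted differential potency is to rank them by the absolute log ratio of predicted IC50s between the two lines, keeping only compounds with complete curves in both. The log-ratio criterion, the function name, and the toy values are assumptions for illustration. The summary does not specify how "top predicted differences" were scored.

```python
import math

def rank_differential_potency(ic50_a, ic50_b, top_k=6):
    """Rank compounds by |log10(IC50_a / IC50_b)|.

    ic50_a, ic50_b: dicts mapping compound -> predicted IC50 in µM, or None when no
    complete curve was fitted. A negative log ratio means the compound is predicted
    to be more potent in cell line A.
    """
    ranked = []
    for cpd in sorted(ic50_a.keys() & ic50_b.keys()):
        a, b = ic50_a[cpd], ic50_b[cpd]
        if a is None or b is None:
            continue  # skip compounds without complete in silico curves in both lines
        ranked.append((cpd, math.log10(a / b)))
    ranked.sort(key=lambda item: abs(item[1]), reverse=True)
    return ranked[:top_k]

# Toy usage with hypothetical compounds and made-up IC50s.
pred_a = {"cpd1": 0.1, "cpd2": 20.0, "cpd3": 1.0, "cpd4": None}
pred_b = {"cpd1": 1.0, "cpd2": 0.5, "cpd3": 1.2, "cpd4": 2.0}
top = rank_differential_potency(pred_a, pred_b, top_k=2)
```

Ranking on the log scale treats a 10-fold difference the same in either direction, which matches how the ED50 ratios are reported for both directions of sensitivity.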

Model interpretation: Integrated gradients (Captum) with zero baselines and 50 interpolation steps computed input attributions, evaluated at each pair’s predicted IC50 to yield gene attribution vectors. Per-cell-line z-scoring produced adjusted attribution vectors that mitigate correlation with expression magnitudes. Sanity checks tested whether attributions depend on learned parameters and labels by comparing attributions from the trained model with those from randomly initialized and label-permuted models.
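The attribution step can be illustrated without Captum. The sketch approximates integrated gradients with a midpoint Riemann sum from a zero baseline over 50 steps. It uses a toy model F(z) = tanh(w · z) whose gradient is known in closed form, so the completeness property (attributions sum to F(x) - F(baseline)) can be checked. The per-cell-line adjustment assumes z-scoring each gene across all compounds profiled in that cell line. This reading, the toy model, and all names are assumptions.

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline=None, steps=50):
    """Midpoint Riemann approximation of integrated gradients.

    grad_fn(z) returns the gradient of the model output with respect to input z.
    """
    x = np.asarray(x, dtype=float)
    base = np.zeros_like(x) if baseline is None else np.asarray(baseline, dtype=float)
    alphas = (np.arange(steps) + 0.5) / steps  # midpoints of [0, 1]
    grad_sum = np.zeros_like(x)
    for alpha in alphas:
        grad_sum += grad_fn(base + alpha * (x - base))
    return (x - base) * grad_sum / steps

def zscore_within_cell_line(attr):
    """attr: (n_compounds, n_genes) attributions for ONE cell line.

    Z-scores each gene across compounds, removing the gene-wise offset shared by all
    compounds in that cell line (for example, one driven by expression magnitude).
    """
    mu = attr.mean(axis=0, keepdims=True)
    sd = attr.std(axis=0, keepdims=True)
    sd[sd == 0] = 1.0
    return (attr - mu) / sd

# Toy model standing in for ChemProbe at a fixed compound and dose:
# F(z) = tanh(w . z), whose gradient is (1 - tanh(w . z)**2) * w.
rng = np.random.default_rng(0)
n_genes = 20
x = rng.normal(size=n_genes)                 # one cell line's standardized expression
W = rng.normal(scale=0.2, size=(8, n_genes))  # one weight vector per hypothetical compound

def make_grad(w):
    return lambda z: (1.0 - np.tanh(w @ z) ** 2) * w

raw = np.stack([integrated_gradients(make_grad(w), x) for w in W])
adjusted = zscore_within_cell_line(raw)
```

With a zero baseline, each raw attribution is the gene's input value times an averaged gradient, which is why raw attributions track expression magnitude and why a within-cell-line adjustment helps.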

Attribution analyses: For compounds with shared MOA (control compound set; 28 classes with ≥2 compounds across 7 profiled cell lines), K-means clustering of attribution vectors was compared to nominal MOA labels via adjusted mutual information (AMI). Attribution similarity was contrasted with clustering from compound fingerprints and with random/permuted-model attributions. Nominal target attribution enrichment used two-sided Wilcoxon tests with BH FDR correction. Network analyses leveraged STRING (score > 0.7) to assess connectivity of cluster-defined nominal target sets versus random targets and random genes; enrichment for PPI density and functional terms (GO, KEGG, Reactome) was tested via hypergeometric tests. Differential attribution analysis (DAA) applied Wilcoxon tests within attribution clusters to identify top genes; clusters were hierarchically organized by DAA profiles. Ferroptosis-related attribution clusters were examined for known genes (GPX4, SCD, SLC7A11, FSP1, LRP8). Differential expression between MDA-MB-231-Par and HCC1806-Par (RNA-seq in triplicate) was analyzed with DESeq2.
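The statistical tests above can be sketched with SciPy. The sketch assumes "Wilcoxon test" refers to the two-sample rank-sum test (`scipy.stats.mannwhitneyu`) for comparing attributions inside versus outside a cluster. It applies Benjamini-Hochberg correction to the per-gene p-values and shows the upper-tail hypergeometric p-value used for enrichment. K-means and AMI are omitted (scikit-learn provides `adjusted_mutual_info_score` for the latter), and the toy data are made up.

```python
import numpy as np
from scipy.stats import hypergeom, mannwhitneyu

def bh_fdr(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value.
    q_sorted = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.minimum(q_sorted, 1.0)
    return q

def differential_attribution(attr, in_cluster):
    """Two-sided rank-sum test per gene: attributions inside vs outside one cluster.

    attr: (n_vectors, n_genes); in_cluster: boolean mask over rows.
    """
    in_cluster = np.asarray(in_cluster, dtype=bool)
    pvals = np.array([
        mannwhitneyu(attr[in_cluster, g], attr[~in_cluster, g],
                     alternative="two-sided").pvalue
        for g in range(attr.shape[1])
    ])
    return pvals, bh_fdr(pvals)

def hypergeom_enrichment(k, M, n, N):
    """P(X >= k) when drawing N genes from M, of which n carry the annotation."""
    return hypergeom.sf(k - 1, M, n, N)

# Toy usage: gene 0 is strongly attributed within the first 20 vectors.
rng = np.random.default_rng(0)
attr = rng.normal(size=(40, 5))
in_cluster = np.arange(40) < 20
attr[in_cluster, 0] += 5.0
pvals, qvals = differential_attribution(attr, in_cluster)
```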

Key Findings
  • Conditional integration improves prediction: Conditioning gene expression embeddings on chemical features via FiLM, shift-only, or scale-only transformations outperformed simple concatenation for viability prediction. Cross-validated R² (mean ± SE): Concatenation 0.6066 ± 0.0165; Shift 0.7060 ± 0.0304; Scale 0.7113 ± 0.0081; FiLM 0.7089 ± 0.0040. Structural ablation with randomized fingerprints dropped performance to 0.3016 ± 0.0304, indicating dependence on chemical structure. The 5-model FiLM ensemble achieved R² 0.7173 ± 0.0052.
  • Learned parameters reflect structure and dose: t-SNE of conditioning parameters showed distinct encoding of chemical identity (scaling γ) and concentration (shifting β); concentration correlated with the first principal component of β (p = 1.72e−55).
  • Retrospective clinical generalization (I-SPY2): Despite the assay modality shift (microarray vs RNA-seq), ChemProbe predicted lower scaled AUCs for responders in 4/5 drugs. Per-drug auROC ranged from 0.60 (paclitaxel, neratinib) to 0.73 (veliparib); macro-average auROC was 0.65. A binary ChemProbe-based responder classifier significantly improved accuracy over I-SPY2 biomarker-based allocation (p < 0.05), with a markedly lower false positive rate (0.37 vs 0.70), and therefore a higher true negative rate (0.63 vs 0.30), at the cost of a lower true positive rate (0.21 vs 0.30).
  • Prospective validation in cell lines: In silico, HCC1806-Par was predicted more sensitive than MDA-MB-231-Par for 88.16% (201/228) of compounds with fitted curves. Six compounds were tested in vitro. ED50 ratio (HCC1806/MDA-MB-231), t, p: neratinib 0.4946 ± 0.2426, t = −2.0830, p = 4.02e−2; ceranib-2 0.5165 ± 0.1943, t = −2.4887, p = 1.47e−2; CAY10618 0.2089, t = −42.3233, p ≈ 1.16e−59; AZD7762 0.5639 ± 0.1004, t = −4.3430, p = 3.75e−5; 1S,3R-RSL-3 2.1123 ± 0.4446, t = 2.5244, p = 1.34e−2; ML162 3.008 ± 0.7041, t = 2.8521, p = 5.41e−3. Observed dose–response differences aligned with predictions; concentration range-finding and follow-up assays were consistent, with one experimental outlier noted (p = 0.096).
  • Attribution soundness: Raw attribution vectors correlated with expression and with control models, failing sanity checks; adjusted (cell line z-scored) attributions did not correlate with random/permuted baselines, indicating dependence on learned parameters and labels rather than artifacts.
  • Mechanism alignment: Clustering adjusted attribution vectors yielded higher AMI with MOA labels than clustering by compound structure or by random/permuted-model attributions, indicating attribution captures pharmacology. Within MOA classes, nominal targets often had significantly higher attributions than other targets.
  • Network biology: Target sets derived from attribution clusters formed PPI subgraphs with significantly higher connectivity than random target or random gene sets. Ten of 26 attribution-defined modules of action showed significant PPI enrichment and functional enrichment (GO/KEGG/Reactome), reflecting systems-level mechanisms.
  • Ferroptosis insights: Attribution clusters segregated ferroptosis-inducing compounds and cell line sensitivities. DAA placed ferroptosis-associated genes (GPX4, SCD, SLC7A11, FSP1, LRP8) among the most highly attributed genes. Differential expression analysis (DEA) between MDA-MB-231-Par and HCC1806-Par identified few differentially expressed ferroptosis genes beyond GPX4, suggesting that the attributions capture non-trivial dependencies. ChemProbe predicted that LRP8 knockout increases sensitivity to ferroptosis inducers (ML210, 1S,3R-RSL-3, ML162, CIL56), consistent with the literature. Top differentially attributed genes were enriched for lipid transport and fatty acid metabolic processes adjacent to lipid peroxidation and ferroptosis.
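The percentages and ratios in the findings above can be sanity-checked with a few lines of arithmetic. The script only re-derives quantities implied by the reported numbers: the 201/228 percentage, which cell line each tested compound was more potent in (an ED50 ratio below 1 means a lower ED50 in HCC1806-Par), and the true negative rates, which follow from TNR = 1 - FPR.

```python
# Figures copied from the findings above.
n_more_sensitive, n_fitted = 201, 228
pct_more_sensitive = 100 * n_more_sensitive / n_fitted

# ED50(HCC1806-Par) / ED50(MDA-MB-231-Par) for the six tested compounds.
ed50_ratio = {
    "neratinib": 0.4946, "ceranib-2": 0.5165, "CAY10618": 0.2089,
    "AZD7762": 0.5639, "1S,3R-RSL-3": 2.1123, "ML162": 3.008,
}
# A ratio below 1 means a lower ED50 (higher potency) in HCC1806-Par.
more_potent_in = {
    cpd: "HCC1806-Par" if ratio < 1 else "MDA-MB-231-Par"
    for cpd, ratio in ed50_ratio.items()
}

# I-SPY2 classifier comparison: true negative rate = 1 - false positive rate.
fpr = {"ChemProbe": 0.37, "I-SPY2 biomarkers": 0.70}
tnr = {name: round(1 - rate, 2) for name, rate in fpr.items()}
```

The two compounds more potent in MDA-MB-231-Par, 1S,3R-RSL-3 and ML162, are the ferroptosis inducers discussed in the last finding.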
Discussion

The study demonstrates that conditioning transcriptomic embeddings on chemical features substantially improves prediction of cellular response, supporting the hypothesis that integrated biological and chemical representations capture determinants of drug sensitivity. The learned conditioning parameters decomposed interpretable aspects of compound identity and concentration, yielding a mechanistically meaningful inductive bias. ChemProbe generalized beyond its training domain: it stratified patient responses in a heterogeneous clinical trial (I-SPY2) despite assay shifts, particularly improving specificity (true negative rate), and it correctly predicted differential potency across independently sourced breast cancer cell lines validated in vitro. Model interpretation via adjusted integrated gradients linked highly attributed genes to known pharmacology and protein network modules, indicating that predictions rely on biologically coherent patterns rather than spurious correlations. Attribution-driven analyses revealed modules of action and pinpointed ferroptosis-related dependencies, including alignment with LRP8-mediated ferroptosis resistance. Together, these findings indicate ChemProbe can both prioritize compounds for particular cellular contexts and generate hypotheses about mechanisms of sensitivity and resistance, aiding precision oncology and target discovery.

Conclusion

ChemProbe is an interpretable conditional deep-learning framework that integrates transcriptomes with chemical structure and dose to predict cellular viability across hundreds of compounds. It outperforms concatenation baselines (cross-validated R² of 0.71 vs 0.61), generalizes to clinical and independent cellular contexts, and supports mechanistic inference through attribution analyses that align with known targets, interaction networks, and pathways. Prospective validations confirm predicted differential sensitivities, and attribution-led analyses uncover ferroptosis-related gene dependencies, illustrating how the model can guide genetic and pharmacologic hypothesis generation. Future work includes expanding chemical and biological coverage, integrating self-supervised foundation models for molecules and transcriptomes to improve out-of-distribution generalization, and prospectively testing attribution-derived mechanisms across broader systems. Open-source code and pretrained models facilitate adoption in research on precision medicine, engineered cell line screening, and mechanistic discovery.

Limitations
  • Training coverage: The model was trained on a limited set of cell lines (860) and about 500 compounds and compound pairs (545), potentially constraining generalization across broader biological and chemical spaces.
  • Attribution causality: High attributions may reflect correlates rather than causal drivers of response; empirical attribution methods require prospective biological validation.
  • Chemical structure generalization: With limited structural diversity in training, learned chemical features may not generalize to distant chemotypes.
  • Assay/domain shift: Differences between training (RNA-seq) and application (microarray, independent labs) introduce distributional shifts that can degrade performance; while ChemProbe showed robustness, such shifts remain a risk.
  • Data/model artifacts: Unadjusted attribution vectors correlated with input magnitudes and with random/permuted models, necessitating normalization and sanity checks to avoid misleading interpretations.
  • Resource constraints in validation: Prospective wet-lab validation covered a small set of compounds and cell lines due to practical limitations, leaving broader validation for future work.