
Medicine and Health
Learning chemical sensitivity reveals mechanisms of cellular response
W. Connell, K. Garcia, et al.
ChemProbe is a deep learning model developed by William Connell, Kristle Garcia, Hani Goodarzi, and Michael J. Keiser. It predicts cellular sensitivity to molecular probes and drugs from transcriptomic and chemical structural data, with applications to precision cancer treatment and to the study of molecular mechanisms.
Introduction
Chemical probes, potent small molecules targeting specific proteins, are crucial for understanding biological processes and diseases. They are particularly valuable in cancer research, where heterogeneity necessitates precision medicine approaches. Ideally, comprehensive chemical screens across diverse biological models would be conducted; however, this is resource-intensive. Machine learning offers a potential solution. Previous approaches have used single feature sets (e.g., mutation status or gene expression) or combined multimodal information (chemical structure and pharmacological features). Deep learning methods, capable of integrating diverse feature sets effectively, have emerged as promising tools. This study focuses on developing a deep learning model to predict cellular sensitivity to a panel of chemicals, along with a framework for interpreting the implicated gene features. The researchers aim to improve drug sensitivity prediction by integrating biological and chemical features, and assess the utility of model interpretation for biological discovery. They hypothesize that a deep neural network can learn to combine cellular transcriptomes and chemical structures to accurately predict cellular sensitivity, offering insights into underlying biological mechanisms.
Literature Review
Existing research on predicting drug response has explored various machine-learning methods, including support vector machines (SVMs), random forests (RFs), and multi-layer perceptrons (MLPs). Early methods often relied on single cellular feature sets. However, significant improvements resulted from incorporating multimodal information, such as chemical structure and pharmacological features. Deep learning has emerged as a powerful tool for representing and integrating diverse feature sets, often employing separate feature encoders before integration. Variational autoencoders (VAEs) have been used for pretraining and transfer learning. Neural networks' adaptability to various input types (e.g., graph representations for chemical structures) and composability (feature integration techniques like cross-attention) make them well-suited for this task. Model interpretation is also crucial, with ensemble models providing confidence scores and methods like attention matrices or gradient-based attribution helping identify features driving predictions. While incorporating biological priors can enhance interpretability, it might limit the discovery of novel gene combinations and mechanisms. This research aims to address the limitations of previous approaches by developing a model that integrates diverse data without relying heavily on biological priors.
Methodology
The researchers developed ChemProbe, a conditional deep-learning model predicting cellular sensitivity. They used publicly available data from the Cancer Therapeutics Response Portal (CTRP) and the Cancer Cell Line Encyclopedia (CCLE), combining compound structures, concentrations, and protein-coding gene transcriptomes to create a dataset of approximately 5.8 million labeled examples. The prediction task was formulated as a conditional model: y = f(x|n), where y is cellular viability, x is the transcriptome, n is the chemical features, and f is the neural network. ChemProbe modulates gene expression representations using chemical features through linear transformations. Several methods for combining cellular and chemical information were tested, including feature concatenation, scaling, shifting, and linear modulation of gene expression using FiLM (feature-wise linear modulation) layers. Models were trained, validated, and hyperparameter-optimized using five-fold cross-validation. A feature ablation experiment using randomized chemical fingerprints demonstrated the importance of compound structural information. The best-performing models used FiLM layers. To evaluate generalizability to *in vivo* contexts, the researchers tested ChemProbe's ability to predict drug response in clinical tumor samples from the I-SPY2 trial. The I-SPY2 data, obtained using microarrays instead of RNA sequencing, presented a challenge due to differences in data modality. ChemProbe's predictions were compared to the original I-SPY2 treatment allocations and evaluated using scaled-AUC and ROC curves. A prospective evaluation of ChemProbe's predictive power was performed using two primary breast cancer cell lines, HCC1806-Par and MDA-MB-231-Par. Dose-response curves were generated *in silico* and compared to *in vitro* experiments. Integrated gradient saliency mapping was used to interpret the model's predictions, identifying highly attributed gene features. 
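The conditional formulation y = f(x|n) and the FiLM-style modulation described above can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the layer sizes, the random untrained weights, and the names `linear`, `film`, `expression_hidden`, `compound_a`, and `compound_b` are all assumptions made for illustration. The idea shown is only that the chemical features produce a per-feature scale (gamma) and shift (beta) that are applied to a hidden representation of the transcriptome.

```python
import random

def linear(weights, bias, vec):
    """Affine map W @ vec + bias, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]

def film(hidden, condition, w_gamma, b_gamma, w_beta, b_beta):
    """Scale and shift each hidden feature using the conditioning vector."""
    gamma = linear(w_gamma, b_gamma, condition)  # per-feature scale
    beta = linear(w_beta, b_beta, condition)     # per-feature shift
    return [g * h + b for g, h, b in zip(gamma, hidden, beta)]

rng = random.Random(0)
N_HIDDEN, N_CHEM = 4, 3  # hypothetical sizes, far smaller than a real model

def random_matrix(rows, cols):
    """Random weights in [-1, 1); stands in for trained parameters."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

# Untrained, randomly initialised conditioning weights (illustration only).
w_gamma = random_matrix(N_HIDDEN, N_CHEM)
w_beta = random_matrix(N_HIDDEN, N_CHEM)
b_gamma = [1.0] * N_HIDDEN  # with a zero condition the scale is exactly 1
b_beta = [0.0] * N_HIDDEN   # with a zero condition the shift is exactly 0

expression_hidden = [0.5, -1.2, 0.3, 2.0]  # made-up encoded transcriptome x
compound_a = [1.0, 0.0, 1.0]               # made-up chemical features n
compound_b = [0.0, 1.0, 0.0]

out_a = film(expression_hidden, compound_a, w_gamma, b_gamma, w_beta, b_beta)
out_b = film(expression_hidden, compound_b, w_gamma, b_gamma, w_beta, b_beta)
print(out_a)
print(out_b)
```

The same transcriptome representation yields different modulated features for the two compounds, which is how a single network can express compound-specific responses. In the simpler alternatives named above, scaling corresponds to using gamma alone and shifting to using beta alone.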
Soundness checks were performed to ensure that the attributions were not artifacts of the data or model architecture. Further analysis explored the relationship between highly attributed gene features, known compound pharmacology, and network biology using techniques like K-means clustering, adjusted mutual information (AMI), and STRING database analysis of protein-protein interactions. Differential attribution analysis (DAA) was employed to identify genes potentially driving responses to ferroptosis-inducing compounds. The impact of LRP8 knockout on ferroptosis sensitivity was also investigated.
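The integrated-gradients attribution used for interpretation can be sketched on a toy model. This is an illustration under stated assumptions, not ChemProbe's code: the one-layer logistic model, its weights, the all-zero baseline, and the function names are hypothetical. The sketch approximates the path integral of the gradient from a baseline input to the actual input with a midpoint Riemann sum, then checks the completeness property, namely that the attributions sum to the difference between the model output at the input and at the baseline.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def model(x, w):
    """Toy stand-in for a viability predictor: sigmoid of a weighted sum."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

def gradient(x, w):
    """Analytic gradient of the toy model with respect to its input x."""
    p = model(x, w)
    return [p * (1.0 - p) * wi for wi in w]

def integrated_gradients(x, baseline, w, steps=200):
    """Midpoint-rule approximation of integrated gradients along a straight path."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(gradient(point, w)):
            total[i] += g
    # Average gradient along the path, scaled by the input difference.
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]

weights = [1.5, -2.0, 0.0]     # hypothetical; the third "gene" is ignored by the model
expression = [1.0, 0.5, 2.0]   # made-up expression values
baseline = [0.0, 0.0, 0.0]     # assumed reference input

attributions = integrated_gradients(expression, baseline, weights)
output_change = model(expression, weights) - model(baseline, weights)
print(attributions)
print(sum(attributions), output_change)
```

A feature the model never uses receives zero attribution here, which is the kind of behaviour a soundness check on attributions looks for. Real deep models obtain the gradients by automatic differentiation rather than a closed-form expression.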
Key Findings
ChemProbe, a conditional deep-learning model, successfully predicts cellular drug sensitivity by integrating transcriptomic and chemical structural data. Various model architectures were compared, with FiLM layers significantly outperforming simple feature concatenation. The model accurately predicted drug response in diverse contexts: cell lines, tumor samples from the I-SPY2 clinical trial, and prospectively in new cell lines. In a retrospective analysis of the I-SPY2 trial, ChemProbe stratified responders and non-responders and significantly outperformed the trial's original treatment allocations, reducing the false positive rate while maintaining a low false negative rate. Prospective validation in two primary breast cancer cell lines (HCC1806-Par and MDA-MB-231-Par) confirmed ChemProbe's accuracy in predicting differential drug sensitivity. Model interpretation using integrated gradients revealed that highly attributed gene features often reflected known compound mechanisms of action (MOA) and network biology. In a control compound set, attribution vectors were significantly more similar among compounds sharing a known protein target than among compounds with different targets, and more similar than attributions obtained from random models. Furthermore, analysis using the STRING database demonstrated that highly attributed genes within clusters formed significantly interconnected protein-protein interaction networks. Differential attribution analysis (DAA) successfully identified gene sets related to ferroptosis, a form of cell death. The model accurately predicted increased ferroptosis sensitivity in an LRP8 knockout cell line, consistent with prior research. Functional enrichment analysis of highly attributed genes for ferroptosis-inducing compounds revealed enrichment of terms related to lipid transport, fatty acid metabolism, lipid peroxidation, and ferroptosis itself.
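The comparison of false positive and false negative rates can be made concrete with a small sketch. The labels below are invented for illustration and are not I-SPY2 data; `error_rates`, `observed`, `allocated`, and `predicted` are hypothetical names. The sketch treats every patient allocated to a treatment arm as an implicit "responder" call and contrasts that with a model that calls only some patients responders.

```python
def error_rates(called_responder, observed_responder):
    """Return (false positive rate, false negative rate) for binary calls."""
    pairs = list(zip(called_responder, observed_responder))
    false_pos = sum(1 for call, obs in pairs if call and not obs)
    false_neg = sum(1 for call, obs in pairs if obs and not call)
    negatives = sum(1 for _, obs in pairs if not obs)
    positives = sum(1 for _, obs in pairs if obs)
    return false_pos / negatives, false_neg / positives

# Invented outcomes for eight patients in one arm (1 = responded).
observed = [1, 1, 1, 0, 0, 0, 0, 0]
allocated = [1, 1, 1, 1, 1, 1, 1, 1]  # everyone in the arm was treated
predicted = [1, 1, 0, 1, 0, 0, 0, 0]  # hypothetical model calls

print(error_rates(allocated, observed))
print(error_rates(predicted, observed))
```

Treating everyone yields no false negatives but counts every non-responder as a false positive. A useful stratifier lowers the false positive rate while missing as few true responders as possible.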
Discussion
ChemProbe addresses the challenge of predicting cellular drug sensitivity by effectively integrating transcriptomic and chemical structural data. Its superior performance compared to simpler methods and its accurate prediction in various contexts (cell lines, tumor samples, and prospective tests) highlight its potential as a tool in drug discovery and precision medicine. The ability to predict differential drug sensitivity, especially after genetic alterations such as LRP8 knockout, is particularly valuable for understanding mechanisms of drug resistance and exploring potential therapeutic targets. The interpretability of ChemProbe, demonstrated by the alignment of highly attributed genes with known compound MOAs and network biology (including ferroptosis), opens avenues for mechanistic insights and the discovery of novel disease-gene relationships. The model's ability to predict from microarray data after training on RNA sequencing data suggests robustness to differences in measurement modality.
Conclusion
ChemProbe represents a significant advancement in predicting and understanding cellular drug sensitivity. Its ability to integrate diverse data types, predict drug response accurately across different contexts, and provide interpretable insights into underlying mechanisms makes it a powerful tool for drug discovery, precision medicine, and systems biology research. Future work could focus on expanding the scope of ChemProbe by incorporating a larger dataset representing a broader range of chemical structures and cellular contexts. Integrating ChemProbe with pre-trained foundation models could further enhance its capabilities and expand its applicability.
Limitations
The study focused on a limited set of cell lines and compounds, and model interpretation reflected a limited set of biological factors. Gene features with high model attribution may reflect correlations rather than direct causal relationships. Because the training set covers a limited diversity of chemical structures, the model may not be learning generalizable structural features. Deep learning model attribution methods are primarily empirical, so prospective biological testing is required to validate the inferred compound mechanisms of action. Expanding ChemProbe's predictions to a larger chemical space requires significantly more biological screening data.