Introduction
Accurate prediction of anti-cancer drug responses in patients is crucial for improving treatment outcomes. While machine learning (ML) offers promise, current approaches often fail to identify robust translational biomarkers from preclinical models. The main obstacles are differences in complexity between preclinical models and human tumors, limited training data, and input feature heterogeneity. This research addresses these challenges by integrating pharmacogenomic data from three-dimensional (3D) organoid culture models with network-based analyses that leverage protein-protein interaction (PPI) networks. Organoid models closely resemble human tumors at the molecular and phenotypic levels and therefore provide a valuable preclinical platform. Network-based methods offer a framework for feature selection that reduces biological heterogeneity and improves ML model performance. Previous studies have shown the potential of gene modules and network analysis for drug response prediction. This study aims to develop a systematic framework that combines organoid pharmacogenomic data with network-based computational approaches to obtain robust drug biomarkers.
Literature Review
Existing methods for drug response prediction frequently rely on molecular signatures such as transcriptomic data, but often struggle to predict drug sensitivity in human tumors accurately. Preclinical models such as organoids offer an advantage because they resemble human tumors more closely, but translating findings from these models to clinical settings remains a challenge. Studies applying machine learning to predict patient drug response from preclinical data have shown some success, but frequently fail to identify robust, generalizable biomarkers. Network-based approaches, which incorporate protein-protein interaction networks, have emerged as a means of reducing biological complexity and improving predictive modeling in various biological contexts. Genes associated with similar phenotypes often cluster in PPI networks, suggesting that drug-response biomarkers may cluster in the same way. Network analysis has already been applied to drug repurposing and drug response prediction, with promising results for predictive modeling.
Methodology
This study integrated pharmacogenomic data from 3D colorectal and bladder cancer organoid models with a network-based machine learning (ML) framework. The process involved several key steps:
1. **Network Analysis:** A PPI network (STRING database) was used to identify biological pathways proximal to the known targets of each drug (5-fluorouracil and cisplatin). The distance between each pathway and the drug targets was computed as the average shortest path length, and pathways closer than expected by chance (z-score ≤ -1.2816, the 10th percentile of a standard normal distribution) were selected as potential biomarkers.
2. **Machine Learning:** The expression profiles of the selected pathways from organoid models were used as input features to train ML models (ridge regression, linear regression, and support vector regression). Threefold cross-validation was used to optimize the hyperparameters of the ridge regression model.
3. **Biomarker Identification:** Pathways that contributed most to the ML model's predictions, as measured by the magnitude of their regression coefficients, were identified as drug-response biomarkers.
4. **Clinical Validation:** The identified biomarkers were validated by comparing overall survival in colorectal (114 patients, 5-fluorouracil) and bladder cancer patients (77 patients, cisplatin) classified as responders or non-responders based on the expression levels of the biomarkers.
5. **Independent Validation:** The biomarkers were further validated using independent transcriptomic datasets of drug-sensitive and -resistant isogenic cancer cell lines.
6. **Concordance Analysis:** The predicted biomarkers were compared with independently identified somatic mutation-based biomarkers to assess concordance between different molecular layers of information.
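The proximity filter in step 1 can be sketched as follows. This is a minimal illustration, not the study's implementation: the chain-shaped toy network, the gene names, and the null model (uniformly sampled gene sets of the same size as the pathway) are assumptions, since the text does not specify how the random expectation was constructed. Only the Python standard library is used.

```python
import random
import statistics
from collections import deque


def shortest_path_lengths(graph, source):
    """Breadth-first search: hop distance from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def average_distance(graph, pathway_genes, target_genes):
    """Mean shortest path length over all reachable (pathway gene, target) pairs."""
    lengths = []
    for target in target_genes:
        dist = shortest_path_lengths(graph, target)
        lengths.extend(dist[g] for g in pathway_genes if g in dist)
    return statistics.mean(lengths)


def proximity_z_score(graph, pathway_genes, target_genes, n_random=1000, seed=0):
    """Z-score of the observed distance against random gene sets of equal size.

    Assumption: the null distribution comes from uniform sampling of nodes.
    """
    rng = random.Random(seed)
    observed = average_distance(graph, pathway_genes, target_genes)
    nodes = sorted(graph)
    null = [
        average_distance(graph, rng.sample(nodes, len(pathway_genes)), target_genes)
        for _ in range(n_random)
    ]
    return (observed - statistics.mean(null)) / statistics.stdev(null)


# Hypothetical toy network: a 20-gene chain g0 - g1 - ... - g19.
genes = [f"g{i}" for i in range(20)]
toy_graph = {g: set() for g in genes}
for a, b in zip(genes, genes[1:]):
    toy_graph[a].add(b)
    toy_graph[b].add(a)

# Threshold quoted in the text; close to statistics.NormalDist().inv_cdf(0.10).
Z_CUTOFF = -1.2816

# Hypothetical pathways: one adjacent to the drug target g0, one far from it.
z_near = proximity_z_score(toy_graph, ["g1", "g2"], ["g0"])
z_far = proximity_z_score(toy_graph, ["g18", "g19"], ["g0"])
selected = [name for name, z in [("near", z_near), ("far", z_far)] if z <= Z_CUTOFF]
print(selected)
```

On a real PPI network the random gene sets are often matched to the pathway's degree distribution, because hub genes are close to everything; that refinement is omitted here for brevity.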
The study also included comparative analyses against other feature selection methods (network centrality, direct neighbors of drug targets, and the method of Bolis et al.) and a deep-learning approach (the method of Sharifi-Noghabi et al.) to benchmark the proposed network-based approach. Bootstrapping was performed to assess the statistical significance of the results. Statistical analyses included Kaplan-Meier survival analysis, log-rank tests, and Student's t-tests. Patient drug response was inferred from pathway expression levels and the corresponding regression coefficients learned from the organoid models.
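The modeling and inference steps can be sketched in the same spirit. The example below fits a ridge regression on hypothetical organoid data, selects the regularization strength by threefold cross-validation, and scores patients as the coefficient-weighted sum of their pathway expression levels. Several choices are assumptions not taken from the text: the synthetic data, the candidate `alphas`, the absence of an intercept (features and responses are assumed centered), and the median split into two groups.

```python
import random
import statistics


def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fit_ridge(X, y, alpha):
    """Closed-form ridge without intercept: solve (X^T X + alpha I) w = X^T y."""
    p = len(X[0])
    gram = [[sum(row[i] * row[j] for row in X) + (alpha if i == j else 0.0)
             for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve_linear_system(gram, xty)


def predict(X, w):
    """Coefficient-weighted sum of features for each sample."""
    return [sum(xi * wi for xi, wi in zip(row, w)) for row in X]


def select_alpha_threefold(X, y, alphas, seed=0):
    """Return the alpha with the lowest mean squared error over three folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::3] for k in range(3)]

    def cv_error(alpha):
        total = 0.0
        for fold in folds:
            held_out = set(fold)
            train = [i for i in idx if i not in held_out]
            w = fit_ridge([X[i] for i in train], [y[i] for i in train], alpha)
            preds = predict([X[i] for i in fold], w)
            total += sum((pred - y[i]) ** 2 for pred, i in zip(preds, fold))
        return total / len(X)

    return min(alphas, key=cv_error)


# Hypothetical organoid data: 30 organoids, 2 pathway-expression features.
rng = random.Random(1)
organoid_X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(30)]
organoid_y = [2.0 * x1 - 1.0 * x2 + rng.gauss(0, 0.1) for x1, x2 in organoid_X]

alphas = [0.01, 0.1, 1.0, 10.0]  # assumed candidate grid
best_alpha = select_alpha_threefold(organoid_X, organoid_y, alphas)
weights = fit_ridge(organoid_X, organoid_y, best_alpha)

# Hypothetical patient expression profiles for the same two pathways.
patient_X = [[1.0, 0.0], [0.5, -0.5], [0.2, 0.1],
             [-0.3, 0.4], [-1.0, 0.2], [0.1, 1.0]]
scores = predict(patient_X, weights)
cutoff = statistics.median(scores)  # assumed split point
high_group = [i for i, s in enumerate(scores) if s > cutoff]
low_group = [i for i, s in enumerate(scores) if s <= cutoff]
print(best_alpha, high_group, low_group)
```

Which group counts as responders depends on how drug response is measured in the organoids (for example, a lower predicted IC50 would indicate sensitivity), so the sketch labels the groups only as high and low. The two groups would then be compared with a Kaplan-Meier analysis and a log-rank test.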
Key Findings
The network-based ML approach predicted patient drug responses accurately, whereas conventional ML methods trained on whole-genome or whole-pathway transcriptomic features were less effective.
In colorectal cancer, the "activation of BH3-only proteins" pathway was identified as a biomarker for 5-fluorouracil response. Patients with high expression levels of this pathway showed significantly improved overall survival. This finding is consistent with previous studies linking BH3-only protein expression to 5-fluorouracil sensitivity.
In bladder cancer, the "amino acid synthesis and interconversion" pathway was identified as a biomarker for cisplatin response. Patients with high expression levels of this pathway also showed improved overall survival, a finding consistent with prior reports on the role of amino acid metabolism in cisplatin sensitivity.
Validation in independent datasets of isogenic cancer cell lines confirmed the differential expression patterns of the identified biomarkers between drug-sensitive and -resistant cells.
Concordance analysis showed a strong correlation between the biomarkers predicted from the transcriptomic data and known somatic mutation-based biomarkers (BRAF V600E and ERCC2). Specifically, the BRAF V600E mutation was associated with cetuximab resistance and ERCC2 mutations with cisplatin sensitivity, consistent with the predictions of the network-based model. This supports concordance between the transcriptomic and mutational layers of information.
Comparative analyses showed that the network-based approach outperformed ML models that used no feature selection or alternative feature selection strategies, including a deep learning model. Bootstrapping analysis indicated that the identified biomarkers were unlikely to be due to chance.
Discussion
This study demonstrates the potential of integrating network analysis into machine learning for improved prediction of cancer drug response. The network-based approach enhanced the interpretability of the ML models by providing insights into the biological mechanisms underlying drug response. The findings highlight the value of pathway-level analysis over individual-gene analysis for capturing robust predictive signals. The successful identification and validation of biomarkers in both colorectal and bladder cancer underscore the translational potential of this approach. The results support the use of organoid models as effective preclinical platforms for identifying relevant biomarkers and predicting clinical responses to anti-cancer drugs. The approach's concordance with known clinical biomarkers, such as BRAF V600E and ERCC2, further supports its robustness.
Conclusion
This research provides a novel framework that integrates network-based analysis with machine learning to identify robust biomarkers for predicting anti-cancer drug responses. The superior predictive performance of this approach compared to conventional ML methods highlights its potential to improve personalized cancer treatment. Future research could integrate additional molecular data types to further improve predictive accuracy and extend the framework to other cancer types and drug therapies.
Limitations
While this study provides strong evidence supporting the efficacy of the developed method, several limitations should be considered. The organoid models used, while highly valuable, might not perfectly recapitulate the full complexity of the human tumor microenvironment. The study focused on two specific cancer types and drugs; therefore, the generalizability to other cancer types and drugs needs further investigation. Larger and more diverse patient cohorts would strengthen the clinical validation and enhance generalizability. The study primarily used transcriptomic data; future research could benefit from integrating other molecular data, like proteomic and genomic data, to develop more comprehensive and precise predictive models.