Network-based machine learning in colorectal and bladder organoid models predicts anti-cancer drug efficacy in patients

J. Kong, H. Lee, et al.

JungHo Kong, Heetak Lee, Donghyo Kim, Seong Kyu Han, Doyeon Ha, Kunyoo Shin, and Sanguk Kim present a machine-learning framework that identifies robust drug biomarkers through network-based analysis of organoid pharmacogenomic data. The framework is used to predict drug response in colorectal and bladder cancer patients, and its predictions are validated against external datasets.


Introduction

Identifying molecular biomarkers that classify cancer patients by drug sensitivity is critical to improve outcomes, yet clinical trials to discover such markers are costly and slow. Preclinical pharmacogenomic screens have helped, and machine-learning models trained on these data can predict clinical responses. However, traditional preclinical models and ML approaches often fail to translate due to biological complexity and limited training data relative to high-dimensional features (input heterogeneity). Network-based methods can reduce complexity and enable biologically informed feature selection because genes associated with similar phenotypes cluster in protein-protein interaction (PPI) networks. Prior studies showed drug-disease proximity in PPI networks relates to therapeutic effects and that gene modules can predict drug response. In parallel, 3D organoid culture models better recapitulate tumor transcriptomes and drug sensitivities compared with conventional models and are being developed for high-throughput screening. There remains a need to systematically identify biomarkers from organoids that robustly predict patient responses. In this study, the authors integrate organoid-derived pharmacogenomics with PPI network-based feature selection and ML to identify pathway-level biomarkers proximal to drug targets and use these to predict clinical outcomes in colorectal cancer (5-fluorouracil) and bladder cancer (cisplatin), validating predictions in independent datasets and against known mutational biomarkers.

Literature Review

The paper situates its work within several lines of research: (1) ML models trained on preclinical data (cell lines/organoids) can predict clinical drug responses, though translation is inconsistent; (2) network medicine demonstrates that phenotype-associated genes cluster in PPIs and that drug-disease proximity can infer therapeutic potential; (3) organoid models closely mirror patient tumors molecularly and phenotypically and can recapitulate treatment responses; (4) prior feature selection and deep learning methods (e.g., centrality-based selection, direct target neighborhoods, correlation-based selection per Bolis et al., and multi-omics deep learning per Sharifi-Noghabi et al.) have limitations when applied solely to transcriptomics for drug-response prediction. These motivate a network-proximity, pathway-level feature selection grounded in organoid pharmacogenomics.

Methodology

Data: Gene expression and drug-response (IC50) data were collected for colorectal (19 organoids) and bladder (9 organoids) cancer organoids (van de Wetering et al.; Lee et al.). TCGA patient data (COAD, BLCA) provided expression (FPKM-UQ, transformed as log2(FPKM-UQ+1)), mutation, treatment, and survival information. Pathways were taken from Reactome (MSigDB C2: REACTOME) and drug targets from DrugBank. The human PPI network came from STRING v11 (confidence score > 700), restricted to the largest connected component (13,824 proteins; 323,774 interactions). Genes and targets were mapped to UniProt IDs. Pathway activity per sample was computed using ssGSEA (NES), and features were z-score standardized across samples.

Feature selection by network proximity: For each drug, proximity between its target genes (set T) and the genes of a pathway (set S) was computed on the PPI network as the average, over targets, of the shortest distance from each target to its nearest pathway gene: d(S,T) = (1/|T|) Σ_{t∈T} min_{s∈S} d(s,t). Significance was assessed by bootstrapping degree-matched random gene sets (1000 iterations) to form a reference distribution, yielding a z-score. Pathways with z ≤ -1.2816 (α=0.10; lower 10% tail) were deemed proximal features.

Model training in organoids: ssGSEA NES profiles of the proximal pathways were used to train regression models that predict IC50 values. The primary model was ridge regression (sklearn RidgeCV) with threefold CV to select α (0.1 to 1.0 in 0.1 steps); linear regression and linear-kernel SVR were also evaluated. Pathways were ranked by the absolute value of their regression coefficients (predictive performance). For robustness, organoid data were split into train (60%), validation (10%), and test (30%) sets; α was tuned on the validation set by RMSE, and performance was measured as R^2 between observed and predicted IC50 in the test set. Sampling all validation set combinations showed high correlations (COAD R^2 = 0.98; BLCA R^2 = 0.89).
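
To make the proximity measure above concrete, here is a minimal standard-library sketch. It is not the authors' implementation (the study used the emreg00/toolbox code). It assumes the PPI network is given as an adjacency dict over a connected graph, that both the target set and the pathway set are randomized in each iteration, and that random genes are matched on exact degree rather than on bins of similar degree. All function names are hypothetical.

```python
import random
import statistics
from collections import Counter, deque

# Lower 10% tail of the standard normal, about -1.2816 (the cutoff quoted above).
PROXIMAL_Z = statistics.NormalDist().inv_cdf(0.10)


def bfs_distances(graph, source):
    """Unweighted shortest-path lengths from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def closest_distance(graph, targets, pathway):
    """d(S,T): mean over targets t of the distance to the nearest pathway gene."""
    total = 0
    for t in targets:
        dist = bfs_distances(graph, t)
        # Assumes a connected graph, so every pathway gene is reachable.
        total += min(dist[s] for s in pathway)
    return total / len(targets)


def degree_matched_sample(graph, genes, by_degree, rng):
    """Draw, without repeats, one random node of equal degree per input gene."""
    counts = Counter(len(graph[g]) for g in sorted(set(genes)))
    sample = []
    for degree, k in counts.items():
        sample.extend(rng.sample(by_degree[degree], k))
    return sample


def proximity_z(graph, targets, pathway, n_iter=1000, seed=0):
    """Observed d(S,T) and its z-score against degree-matched random sets."""
    rng = random.Random(seed)
    by_degree = {}
    for node in sorted(graph):
        by_degree.setdefault(len(graph[node]), []).append(node)
    observed = closest_distance(graph, targets, pathway)
    null = []
    for _ in range(n_iter):
        rand_t = degree_matched_sample(graph, targets, by_degree, rng)
        rand_s = degree_matched_sample(graph, pathway, by_degree, rng)
        null.append(closest_distance(graph, rand_t, rand_s))
    mu = statistics.mean(null)
    sigma = statistics.pstdev(null)
    z = (observed - mu) / sigma if sigma > 0 else 0.0
    return observed, z
```

On a real network, `graph` would hold the STRING largest connected component, `targets` the DrugBank targets of one drug, and `pathway` the genes of one Reactome pathway. Under the rule described above, a pathway would be kept as a proximal feature when the returned z is at or below `PROXIMAL_Z`.
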

Inferring patient-specific drug resistance: For each patient, a drug resistance score was computed as the weighted sum of pathway expression and preclinical regression coefficients, Score_patient = Σ_p Exp_patient,p × β_preclinical,p, using the top-ranked predictive pathway(s). Patients were median-split into predicted responders and non-responders. Clinical validation used Kaplan–Meier overall survival and log-rank tests in treated cohorts; untreated or unknown-treatment cohorts served as controls.

Comparators:
(1) No feature selection, using whole-transcriptome or whole-pathway features.
(2) Network-centrality-based feature selection (degree, betweenness, closeness), matching the number of proximal pathways.
(3) Direct neighbors of drug targets (first to third degree).
(4) Correlation-based feature selection (Bolis et al.) via 10 iterations of leave-half-out cross-validation, selecting genes with significant average Spearman correlation (P<0.05) with IC50.
(5) Deep learning (Sharifi-Noghabi et al., MOLI-like) trained on organoid transcriptomics with cross-validated hyperparameters and a combined loss (triplet + binary cross-entropy), then applied to patients.

External validations:
(a) Isogenic sensitive vs resistant cell lines: COAD (GSE81008) for 5FU; BLCA dataset from Yeon et al. for cisplatin. Pathway activity was computed via ssGSEA, and unpaired two-tailed t-tests compared sensitive vs resistant lines.
(b) Bootstrapping feature selection: 10,000 iterations, each selecting random pathways matching the count of proximal pathways, training ridge regression on organoids, and recording the rank of the top survival-predictive and resistance-predictive pathway. Empirical P-values were computed as the frequency of random ranks equal to or better than the observed rank.
(c) Concordance with known biomarkers: Predicted resistance scores were compared against mutation status of BRAF V600E (cetuximab in COAD) and ERCC2 (cisplatin in BLCA) via one-sided Mann–Whitney U tests, testing whether known resistance mutations associate with higher predicted resistance and known sensitivity mutations with lower predicted resistance.

Batch effects were monitored via PCA, and z-score standardization was applied across datasets.

Software and code: Python (pandas, numpy, scipy, scikit-learn, lifelines, gseapy, matplotlib). Network proximity code came from https://github.com/emreg00/toolbox. Source code is available at https://github.com/billy-kong/organoid_biomarker_detection.
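
The patient-level resistance score and median split described in this section can be sketched in a few lines. This is an illustrative sketch, not the authors' code. It assumes that a higher score means higher predicted IC50 and therefore predicted resistance, and that patients exactly at the median are labeled responders. The summary states neither tie-breaking nor labeling direction, so both are assumptions, and the function names are hypothetical.

```python
import statistics


def resistance_score(pathway_expr, coefs):
    """Weighted sum over pathways p of Exp[patient, p] * beta[p]."""
    return sum(pathway_expr[p] * beta for p, beta in coefs.items())


def median_split(scores):
    """Label patients by comparing each score with the cohort median.

    Assumption: scores above the median mean predicted resistance
    (non-responder); scores at or below it mean predicted response.
    """
    cutoff = statistics.median(scores.values())
    return {
        pid: "non-responder" if score > cutoff else "responder"
        for pid, score in scores.items()
    }
```

Here `pathway_expr` would map pathway names to a patient's standardized ssGSEA activity, and `coefs` would hold the coefficients learned in organoids for the top-ranked pathway or pathways. The two resulting groups are what a Kaplan–Meier and log-rank comparison would then be run on.
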

Key Findings

- Network-proximity feature selection yielded focused pathway sets proximal to drug targets: for 5-fluorouracil (5FU) in COAD organoids, 37 proximal Reactome pathways; for cisplatin in BLCA organoids, 30 proximal pathways.
- Top predictive biomarkers:
  • COAD/5FU: “Activation of BH3-only proteins” pathway showed highest predictive performance against IC50 in organoids.
  • BLCA/cisplatin: “Amino acid synthesis and interconversion” pathway was top predictive.
- Clinical validation (Kaplan–Meier OS) using TCGA treated cohorts:
  • COAD, 5FU-treated: Predicted responders vs non-responders showed significant survival difference (log-rank P = 0.014; n=57 per group). No significant separation in patients without known 5FU treatment (P = 0.16; n=149 per group).
  • BLCA, cisplatin-treated: Significant survival separation (P = 0.01; responders n=39, non-responders n=38). No significant separation in no-treatment cohort (P = 0.066; n=147 per group).
- Alternative ML and feature selection strategies underperformed: Whole-transcriptome and whole-pathway ridge regression did not predict survival (COAD: P=0.69 and P=0.81; BLCA: P=0.79 and P=0.82). Network-centrality and drug-target neighbor features were not significant (e.g., COAD P=0.96; BLCA P=0.37). Correlation-based selection (Bolis et al.) showed weak/non-significant results (COAD P=0.15; BLCA P=0.96). Deep learning (Sharifi-Noghabi et al.) was non-predictive (COAD P=0.91; BLCA P=0.82).
- External isogenic validations:
  • COAD 5FU-sensitive vs resistant cell lines: BH3-only activation pathway activity higher in sensitive vs resistant (t-test P=0.0085; n=3 vs 9).
  • BLCA cisplatin-sensitive vs resistant cell lines: Amino acid synthesis and interconversion pathway activity lower in resistant vs sensitive (P=0.00022; n=3 vs 3).
- Bootstrapping feature selection significance: Empirical P-values indicated non-randomness of identified biomarkers (COAD/5FU P=0.0012; BLCA/cisplatin P=0.014), and pathway size was not a confounder.
- Concordance with known mutation biomarkers:
  • COAD cetuximab: Predicted resistance scores (from a proximal pathway, “Gastrin-CREB signaling via PKC and MAPK”) were higher in BRAF V600E mutants vs wild-type (one-sided Mann–Whitney P=0.037).
  • BLCA cisplatin: Predicted resistance scores were lower in ERCC2-mutated tumors vs wild-type (P=0.002), consistent with increased sensitivity.
- Organoid model predictive performance against IC50 in internal validation: High correlations in test sets (COAD R^2=0.98; BLCA R^2=0.89).
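
For readers unfamiliar with empirical P-values such as those in the bootstrapping result above, the following minimal sketch shows the calculation as the Methodology describes it: the fraction of random feature sets whose top pathway ranks equal to or better than the observed one. It assumes that a lower rank number is better; the function name is hypothetical, and this is not the authors' code.

```python
def empirical_p(observed_rank, random_ranks):
    """Fraction of random draws ranked equal to or better than observed.

    Assumption: rank 1 is best, so "equal to or better" means <= observed.
    """
    hits = sum(1 for rank in random_ranks if rank <= observed_rank)
    return hits / len(random_ranks)
```

Under this definition, an empirical P of 0.0012 over 10,000 iterations would correspond to 12 random draws matching or beating the observed rank.
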

Discussion

Embedding PPI network structure into ML feature selection at the pathway level reduced biological heterogeneity and yielded interpretable, robust biomarkers that translate from organoids to patients. The approach outperformed standard ML baselines (whole-transcriptome and whole-pathway features, network centrality, target neighbors) and a deep-learning method when applied to transcriptomics alone. The BH3-only protein activation pathway likely links 5FU-induced DNA damage to apoptosis, aligning with prior observations that reduced BH3-only activity contributes to 5FU resistance. For cisplatin in bladder cancer, amino acid synthesis/interconversion pathways associate with response, consistent with reports that amino acid and polyamine metabolism is epigenetically downregulated in resistant cells. The biomarkers’ differential expression in isogenic sensitive/resistant lines and their concordance with known mutation biomarkers (BRAF V600E–cetuximab resistance; ERCC2–cisplatin sensitivity) further support biological validity and cross-omic consistency. Given organoids’ molecular similarity to patient tumors and ability to recapitulate treatment responses, organoid pharmacogenomics combined with network-informed ML can improve clinical response prediction while offering mechanistic insight. Interpretability is emphasized for high-stakes clinical decisions, and the pathway-level biomarkers provide testable hypotheses for therapeutic strategies.

Conclusion

This work introduces a systematic, interpretable framework that integrates organoid pharmacogenomic data with PPI network-based, pathway-level feature selection and ML to identify robust, translational drug-response biomarkers. The method stratified colorectal (5FU) and bladder (cisplatin) cancer patients by survival, was validated in isogenic models, and aligned with known mutation-based biomarkers. Future directions include integrating multi-omic layers (e.g., mutations, methylation, proteomics), leveraging pre- and post-treatment perturbation datasets (e.g., LINCS L1000) to refine causal links, expanding and standardizing organoid datasets with microenvironmental and immune components, and exploring therapeutic combinations such as amino acid metabolism targeting to enhance cisplatin efficacy.

Limitations

- Limited organoid sample sizes (19 COAD, 9 BLCA) may constrain model generalizability; although internal validation showed high R^2, larger cohorts are desirable.
- Deep learning optimized for multi-omics may underperform on transcriptomics-only inputs; broader data types were not integrated here.
- Cancer cell line validations may not fully represent primary tumors; careful selection is required, and organoids are preferred.
- Lack of paired pre-/post-treatment molecular data in organoids limits causal inference; predictions were based on baseline expression.
- Mechanistic links (e.g., BH3-only activity in COAD 5FU response; amino acid metabolism in BLCA cisplatin resistance) require further experimental elucidation.
- Potential batch effects between datasets were mitigated via standardization and assessed by PCA, but residual confounding cannot be fully excluded.
- Feature selection depends on PPI network coverage and confidence thresholds; incomplete interactome and pathway annotations could bias proximity estimates.
