Radiogenomics and machine learning predict oncogenic signaling pathways in glioblastoma

A. B. Ahanger, S. W. Aalam, et al.

Abdul Basit Ahanger and colleagues combine radiogenomics and machine learning to non-invasively predict alterations in oncogenic signaling pathways in glioblastoma. The approach uses radiomic features from post-operative MRI scans and reports associations that could inform personalized cancer therapy.

Introduction
The study addresses the challenge of non-invasively identifying deregulation of key oncogenic signaling pathways in glioblastoma (GBM)—notably RTK-RAS, PI3K, TP53, NOTCH, and WNT—from routine MRI. GBM is a WHO grade 4 tumor with poor outcomes (median survival ~15–20 months; ~6.9% 5-year survival). While MRI aids diagnosis and assessment, targeted therapies require genetic profiling often obtained via invasive procedures. Radiogenomics offers a non-invasive strategy to infer genetic alterations from imaging phenotypes. The research question is whether radiomic features from multi-parametric MRI, combined with ML, can predict alterations in oncogenic pathways implicated in gliomagenesis, enabling personalized treatment selection and improved outcomes.
Literature Review
Prior work demonstrates links between imaging phenotypes and tumor genotypes (radiogenomics) and supports using AI/ML for pattern discovery in large imaging-genomic datasets. Several key pathways drive GBM growth, survival, and therapy resistance: PI3K (often activated via PTEN loss), RTK-RAS (RTK overactivation), TP53 (frequent disruption), NOTCH (interacts with TP53, influences apoptosis), and WNT (dysregulated in glioma stem cells, GSCs). Studies have shown pathway crosstalk (e.g., Ras-ERK and PI3K-mTOR), co-alterations in GBM, and the feasibility of associating radiomic features with molecular markers. Conventional identification of pathway alterations via expression/genomic profiling is resource-intensive, which motivates radiomics-based predictive approaches.
Methodology
Data: Post-operative multi-parametric MRI (T1w, contrast-enhanced T1w/T1c, T2w, FLAIR) with expert-verified segmentation labels (ET, NET, ED) from BRATS-19 (TCGA n=167; CPTAC n=19). Scans underwent skull-stripping, co-registration, and resampling to 1 mm³. Patient identifiers were mapped to TCGA/CPTAC via CBICA-provided mappings.
Pathway alterations: Oncogenic pathway alteration data were manually scraped from cBioPortal (TCGA PanCancer Atlas GBM and LGG; CPTAC GBM; Firehose Legacy for missing entries). Nine pathways were initially collected; NRF2 and TGF-β had no alterations and were excluded. Class imbalance motivated focusing on five pathways with less severe imbalance: PI3K, TP53, RTK-RAS, NOTCH, and WNT.
Radiomics: Using PyRadiomics (IBSI-compliant), 1284 features derived from 107 base features were extracted across modalities and regions, including first-order, shape, and texture families (GLCM, GLDM, GLRLM, GLSZM, NGTDM). Features were standardized and normalized to [0,1].
Feature engineering: To reduce dimensionality and multicollinearity, features with correlation to target pathway >0.1 and inter-feature correlation <0.9 were retained. Feature importance was computed using Random Forest (sklearn, default params) on training folds to select top 10 features per pathway (reported per dataset variant).
Data balancing and splits: SMOTE addressed class imbalance in training. Three dataset strategies were created: (i) over_split: validation combined CPTAC with selected TCGA samples to balance classes based on defined rules using majority/minority ratios; (ii) under_split: validation set used only CPTAC balanced via under-sampling (majority limited to 2x minority), with remaining majority cases returned to training; (iii) under_split_pure: validation set only CPTAC balanced via under-sampling with excess discarded.
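The feature-engineering step can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the authors' code. Only the two thresholds (0.1 and 0.9), the scaling to [0, 1], and the top-10 Random Forest ranking come from the description above. The function name `select_features`, the use of absolute Pearson correlation, the greedy order in which redundant features are dropped, and the fixed `random_state` are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import MinMaxScaler


def select_features(X, y, target_thr=0.1, redundancy_thr=0.9, top_k=10, seed=0):
    """Return column indices of up to `top_k` features, most important first."""
    X = MinMaxScaler().fit_transform(X)  # scale every feature to [0, 1]

    # 1) Keep features whose |Pearson r| with the binary label exceeds target_thr.
    yc = y - y.mean()
    Xc = X - X.mean(axis=0)
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    r_target = np.abs(Xc.T @ yc / np.where(denom == 0, np.inf, denom))
    candidates = [j for j in np.argsort(-r_target) if r_target[j] > target_thr]

    # 2) Greedily drop any candidate correlated >= redundancy_thr with a kept one
    #    (assumed tie-breaking rule: the feature closer to the label wins).
    kept = []
    for j in candidates:
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) < redundancy_thr for k in kept):
            kept.append(j)
    if not kept:
        return []

    # 3) Rank the survivors by Random Forest importance and keep the top_k.
    forest = RandomForestClassifier(random_state=seed).fit(X[:, kept], y)
    order = np.argsort(-forest.feature_importances_)[:top_k]
    return [int(kept[i]) for i in order]


# Synthetic stand-in for a radiomic feature matrix (NOT study data).
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
informative = y[:, None] + rng.normal(scale=0.5, size=(200, 3))  # columns 0-2
duplicate = informative[:, [0]] * 2.0 + 1.0                      # column 3, collinear with 0
noise = rng.normal(size=(200, 20))                               # columns 4-23
X = np.hstack([informative, duplicate, noise])

selected = select_features(X, y)
print(selected)
```

In this toy matrix, column 3 is an affine copy of column 0, so the redundancy filter keeps only one of the two before the forest ranks what remains. In a real workflow the selection would be fitted on training folds only, as the description states, to avoid leaking validation information.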
Models and training: Five supervised classifiers were trained per pathway: Logistic Regression (LRC), Support Vector Classifier (SVC, RBF kernel), Random Forest (RFC), AdaBoost (ABC), and K-Nearest Neighbors (KNN). Hyperparameters were tuned via grid search. Performance was estimated with 5-fold cross-validation on the training sets. The best-tuned models were then evaluated on the held-out validation/test set of each split. Metrics were accuracy and ROC AUC for training cross-validation, and accuracy, precision, recall, and F1-score on validation/test. Additional analyses covered cohort-level pathway alteration distributions and intersection patterns (UpSet plot).
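A hedged sketch of the tuning loop for one pathway is shown below, using scikit-learn's `GridSearchCV` with 5-fold cross-validation. The data are synthetic, and the parameter grids are hypothetical because the summary does not list the values that were searched. SMOTE oversampling of the training data is omitted to keep the example dependency-free.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in for 10 selected radiomic features and one pathway label.
X, y = make_classification(n_samples=200, n_features=10, n_informative=5, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hypothetical grids, chosen only for illustration.
search_space = {
    "LRC": (LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}),
    "SVC": (SVC(kernel="rbf"), {"C": [0.1, 1.0, 10.0], "gamma": ["scale", 0.1]}),
    "RFC": (RandomForestClassifier(random_state=0), {"n_estimators": [50, 100]}),
    "ABC": (AdaBoostClassifier(random_state=0), {"n_estimators": [25, 50]}),
    "KNN": (KNeighborsClassifier(), {"n_neighbors": [3, 5, 7]}),
}

results = {}
for name, (estimator, grid) in search_space.items():
    # 5-fold CV on the training set; the best setting is refit on all of it.
    search = GridSearchCV(estimator, grid, cv=5, scoring="roc_auc")
    search.fit(X_train, y_train)
    results[name] = {
        "best_params": search.best_params_,
        "cv_auc": search.best_score_,
        # Accuracy of the refit best model on the held-out split.
        "val_accuracy": search.best_estimator_.score(X_val, y_val),
    }

for name, summary in results.items():
    print(name, summary)
```

Keeping the held-out split out of the grid search is the point of the design: the cross-validated AUC guides model choice, while the validation accuracy gives a separate estimate for the chosen model.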
Key Findings
- Cohort pathway alteration rates (n≈186 across TCGA and CPTAC): TP53 76.05%, RTK-RAS 61.68%, PI3K 39.52%, NOTCH 25.75%, WNT 19.76% (HIPPO 14.37%, MYC 8.38%; NRF2 and TGF-β ~1–2% and excluded from modeling). Only TP53 and RTK-RAS had more positives than negatives.
- Intersections: RTK-RAS, PI3K, and TP53 frequently co-occurred (e.g., RTK-RAS+TP53 in 31 cases; TP53+PI3K in 14; all three in 30), whereas NOTCH and WNT more often appeared independently (9 and 16 cases respectively). All four (RTK-RAS, PI3K, TP53, NOTCH) were present together in 7/167 cases.
- Radiomic associations: Most targeted pathways showed positive associations with MRI-derived radiomic features across modalities/regions.
- Best AUCs on test data: RTK-RAS 0.70, PI3K 0.80, TP53 0.75, NOTCH 0.40 (indicating strong predictability for PI3K/TP53, moderate for RTK-RAS, poor for NOTCH; WNT AUC not reported).
- Cross-validation accuracies (means across folds) were generally highest for RFC across pathways and dataset variants (e.g., WNT up to ~0.88, TP53 up to ~0.85; see Table 4). SVC and KNN also performed competitively depending on pathway and split.
- Validation behaviors varied with class imbalance and dataset composition: certain splits (e.g., under_split) led to degenerate predictions for some pathways (e.g., all-positive predictions affecting precision/recall), reflecting sensitivity to imbalance and small validation sizes.
- Feature subsets: Top discriminative features per pathway often involved textural GLCM/GLDM/GLSZM metrics and first-order statistics from NET and ED regions, with notable contributions from T2 and FLAIR sequences for RTK-RAS and WNT and from T1c/T1 for PI3K/TP53.
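To see why all-positive predictions distort precision and recall on a small validation set, consider the arithmetic sketch below. The counts (4 altered and 8 unaltered cases, mirroring a 2x majority cap) are hypothetical and chosen only for illustration; they are not taken from the study.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical validation set: 4 altered (1) and 8 unaltered (0) cases.
y_true = [1] * 4 + [0] * 8
# Degenerate classifier: predicts "altered" for every case.
y_pred = [1] * len(y_true)

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Here recall is 1.0 because no altered case is missed, while precision and accuracy both fall to 4/12 (about 0.33) and F1 is 0.5. A high recall alone therefore says little about whether the model has learned anything from the radiomic features.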
Discussion
The models demonstrate that radiomic phenotypes from standard MRI can non-invasively capture alterations in key oncogenic signaling pathways relevant to GBM biology and therapy. The strongest predictability was observed for PI3K and TP53, supporting the hypothesis that pathway deregulation manifests in imaging textures and intensities across tumor subregions (ET, NET, ED). Moderate predictability for RTK-RAS and poor predictability for NOTCH suggest differing radiographic expressivity and potential confounding by pathway crosstalk. The UpSet plot confirmed frequent co-alterations among RTK-RAS, PI3K, and TP53, aligning with known gliomagenesis mechanisms and potentially complicating single-pathway classification due to shared phenotypes. Ensemble methods (RFC) offered robust performance across splits, indicating value as baseline models in radiogenomic tasks. However, validation performance varied with class imbalance and small CPTAC-derived test sets, highlighting the importance of balanced, adequately sized, and externally validated cohorts. Overall, the findings support radiogenomics as a viable adjunct to invasive profiling, with potential to inform targeted therapy selection and precision oncology workflows.
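The co-alteration pattern that an UpSet plot displays amounts to counting, for each patient, the exact set of altered pathways. A minimal sketch with made-up patients follows; the patient IDs and alteration sets are hypothetical and are not study data.

```python
from collections import Counter

# Hypothetical per-patient alteration sets, NOT data from the study.
alterations = {
    "P01": {"RTK-RAS", "TP53"},
    "P02": {"RTK-RAS", "TP53", "PI3K"},
    "P03": {"TP53", "PI3K"},
    "P04": {"WNT"},
    "P05": {"RTK-RAS", "TP53"},
    "P06": {"NOTCH"},
    "P07": set(),
}

# Exclusive intersections: each patient is counted once, under the exact
# combination of pathways altered in that patient (as in an UpSet plot).
exclusive = Counter(frozenset(paths) for paths in alterations.values())

# Per-pathway totals: a patient contributes to every pathway they carry.
totals = Counter(p for paths in alterations.values() for p in paths)

for combo, n in exclusive.most_common():
    print(" + ".join(sorted(combo)) or "(none)", n)
print(dict(totals))
```

The two counters answer different questions: `exclusive` shows which combinations dominate, while `totals` gives the marginal alteration rate per pathway. When a few combinations account for most patients, single-pathway labels are strongly correlated, which is one reason separate per-pathway classifiers may pick up shared imaging phenotypes.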
Conclusion
Integrating radiomic features from multi-parametric MRI with machine learning enables non-invasive prediction of deregulation in oncogenic signaling pathways in GBM. The approach achieved promising discrimination for PI3K and TP53 pathways and moderate performance for RTK-RAS, suggesting clinical utility for pathway-aware treatment planning and patient stratification. Future work should: (1) expand cohorts to better capture GBM heterogeneity and reduce imbalance; (2) perform external validation on independent datasets; (3) move toward end-to-end multi-label deep learning models to predict multiple pathways simultaneously; and (4) evaluate generalization to additional cancer types and prospective clinical settings.
Limitations
- Reliance on public datasets (BRATS-19, TCGA, CPTAC) may not capture full GBM heterogeneity.
- Limited sample sizes for several pathways led to class imbalance; even with SMOTE, synthetic samples may not reflect true biological complexity, affecting generalization.
- Some validation/test splits contained very small CPTAC-derived sets, yielding unstable estimates and degenerate classifier behavior (e.g., all-positive predictions).
- Absence of external, independent validation reduces certainty of clinical applicability.
- Manual scraping of pathway data may introduce selection or mapping errors; pathway definitions across sources may vary.