Prognosis Individualized: Survival predictions for WHO grade II and III gliomas with a machine learning-based web application


M. Karabacak, P. Jagtiani, et al.

This research by Mert Karabacak, Pemla Jagtiani, Alejandro Carrasquilla, Isabelle M. Germano, and Konstantinos Margetis harnesses machine learning to predict survival outcomes for glioma patients, delivering personalized analytics through a user-friendly web application to support clinical decision-making. The predictive models, built with LightGBM and Random Forest, achieve strong discrimination (AUROC above 0.8 across all endpoints), supporting more data-driven, individualized neuro-oncology practice.

Introduction
The study addresses the challenge of prognostication in WHO grade II and III gliomas, which exhibit heterogeneous biology and variable survival outcomes. Traditional statistical tools can struggle with large, high-dimensional, and heterogeneous data, and often require strict assumptions. The authors hypothesize that machine learning (ML) methods—capable of modeling non-linear relationships and incorporating a broad array of clinical, molecular, and imaging variables—can provide more accurate and individualized survival predictions. The purpose is to develop and deploy ML models that predict mortality at 12, 24, 36, and 60 months post-diagnosis for grade II and III gliomas, and to make these predictions accessible via a user-friendly web application to support personalized clinical decision-making.
Literature Review
Prior work on glioma survival prediction spans statistical models and ML approaches. Zhao et al. used Cox models, SVM, and Random Forest on 3,462 patients with common clinical variables, reporting c-indices of 0.757–0.771 but without a clear clinical deployment pathway. Gittleman et al. created a clinicopathologic nomogram with an online calculator for grade II/III gliomas, but its generalizability is limited by a small sample (n=238 from TCGA and OBTS). Beyond clinicopathologic factors, radiomics approaches (e.g., Li et al.) and integrated radiomics-clinical models (e.g., Xu et al.) show promise but face translational barriers, such as undefined pathways for computing the required signatures in routine practice. Numerous genomics-based models propose gene signatures for prognosis, yet their adoption is limited because comprehensive genomic profiling is infrequently used in standard care. Tree-based ensemble methods (Random Forest and gradient boosting such as LightGBM) have repeatedly shown strong performance in clinical prediction tasks. This study builds on that literature by leveraging a large national registry (NCDB) and providing deployable, interpretable models (via SHAP) through an accessible web application.
Methodology
Ethics: The study used de-identified NCDB data and was deemed exempt by the Icahn School of Medicine at Mount Sinai IRB.
Data source: The 2020 NCDB, a national registry drawing from more than 1,500 CoC-accredited institutions and covering approximately 70% of US cancer diagnoses.
Cohort: Adults (≥18 years) with histologically confirmed cranial WHO grade II or III gliomas diagnosed 2010–2017, identified via ICD-O-3 histologic and topographic codes: diffuse astrocytoma (9400, grade II), anaplastic astrocytoma (9401, grade III), pleomorphic xanthoastrocytoma (9424, grade II), pilomyxoid astrocytoma (9425, grade II), oligodendroglioma (9450, grade II), anaplastic oligodendroglioma (9451, grade III), oligoastrocytoma (9382, grade II), anaplastic oligoastrocytoma (9382, grade III); topographic codes C71.0–C71.9.
Predictors: Sociodemographics (age, sex, ethnicity, Spanish/Hispanic origin, payor, facility type/location); clinical presentation (Charlson-Deyo score, Karnofsky Performance Scale); diagnostics (preoperative diagnostic biopsy, laterality, location, focality, tumor size as an ordinal variable, histology); molecular markers (1p19q co-deletion, MGMT methylation, Ki-67); and treatments (resective surgery, extent of resection, radiation, chemotherapy, immunotherapy). Missing values for categorical variables were labeled 'Unknown/Other'; age had no missing values.
Outcomes: Binary mortality at 12, 24, 36, and 60 months post-diagnosis, defined using Vital Status and Last Contact/Death (months from diagnosis). Patients alive but with last follow-up before a given timepoint were excluded from that timepoint's analysis; those with missing vital status or follow-up were excluded entirely.
Modeling: Five supervised algorithms were evaluated: TabPFN, TabNet, XGBoost, LightGBM, and Random Forest. Hyperparameters were optimized with Optuna for all except TabPFN (search spaces in Supplementary Table 3). Data were split per outcome into train/validation/test sets (60/20/20), and class imbalance was handled with SMOTE applied to the training sets. Performance assessment included ROC and PR curves; metrics were sensitivity, specificity, accuracy, AUPRC, AUROC, and calibration via the Brier score, with confusion matrices generated. The top model for each outcome was selected by AUROC for deployment. Interpretability was assessed with SHAP (global and local feature importance) and partial dependence plots (PDPs). A minimal illustrative sketch of this pipeline follows this section.
Web application: The top-performing model per outcome, with fixed hyperparameters (Supplementary Table 4), was deployed on a Hugging Face Space; the code is available and the app is demonstrated via video. URL: https://huggingface.co/spaces/MSHS-Neurosurgery-Research/G2G3-Glioma.
Statistics: Descriptive statistics were chosen per variable distribution; grade II vs. III comparisons used t-tests/Welch/Mann-Whitney and chi-squared tests; normality was assessed by Shapiro-Wilk and variance homogeneity by Levene's test; significance was set at p<0.001.
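To make the modeling workflow concrete, the following is a minimal sketch of a comparable pipeline for a single outcome (e.g., 12-month mortality), assuming a preprocessed, numerically encoded feature matrix X and binary label y. The search space, random seeds, and 0.5 decision threshold are illustrative assumptions, not the authors' exact configuration (their search spaces are in Supplementary Table 3).

```python
# Minimal sketch: 60/20/20 split, SMOTE on the training fold, Optuna-tuned
# LightGBM, and the test-set metrics reported in the study. Illustrative only.
import lightgbm as lgb
import optuna
from imblearn.over_sampling import SMOTE
from sklearn.metrics import (average_precision_score, brier_score_loss,
                             confusion_matrix, roc_auc_score)
from sklearn.model_selection import train_test_split


def split_60_20_20(X, y, seed=42):
    """Stratified 60/20/20 train/validation/test split."""
    X_train, X_tmp, y_train, y_tmp = train_test_split(
        X, y, test_size=0.4, stratify=y, random_state=seed)
    X_val, X_test, y_val, y_test = train_test_split(
        X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=seed)
    return X_train, X_val, X_test, y_train, y_val, y_test


def tune_lightgbm(X_train, y_train, X_val, y_val, n_trials=50):
    """Optuna search maximizing validation AUROC (hypothetical search space)."""
    # Oversample only the training fold so validation data stay untouched.
    X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)

    def objective(trial):
        params = {
            "n_estimators": trial.suggest_int("n_estimators", 100, 1000),
            "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
            "num_leaves": trial.suggest_int("num_leaves", 15, 255),
            "min_child_samples": trial.suggest_int("min_child_samples", 5, 100),
        }
        model = lgb.LGBMClassifier(**params, random_state=42)
        model.fit(X_res, y_res)
        return roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])

    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=n_trials)
    best = lgb.LGBMClassifier(**study.best_params, random_state=42)
    best.fit(X_res, y_res)
    return best


def evaluate(model, X_test, y_test):
    """Test-set metrics analogous to those reported in the study."""
    proba = model.predict_proba(X_test)[:, 1]
    pred = (proba >= 0.5).astype(int)  # illustrative threshold
    tn, fp, fn, tp = confusion_matrix(y_test, pred).ravel()
    return {
        "AUROC": roc_auc_score(y_test, proba),
        "AUPRC": average_precision_score(y_test, proba),
        "Brier": brier_score_loss(y_test, proba),
        "Sensitivity": tp / (tp + fn),
        "Specificity": tn / (tn + fp),
        "Accuracy": (tp + tn) / (tp + tn + fp + fn),
    }
```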
Key Findings
Study population: 10,001 grade II and 11,456 grade III cranial gliomas from the NCDB. Mean age: grade II 42±23 years; grade III 51±27 years. Female proportion: grade II 44.1%; grade III 44.5%.
Per-timepoint included counts: grade II: 12 mo 9,748; 24 mo 9,462; 36 mo 8,938; 60 mo 6,763. Grade III: 12 mo 11,161; 24 mo 10,943; 36 mo 10,572; 60 mo 9,095.
Top algorithms: LightGBM and Random Forest achieved the best discrimination across outcomes (AUROC > 0.8).
Grade II performance highlights:
- 12-month mortality (Random Forest): AUROC 0.888 (95% CI 0.856–0.912); Sens 0.838; Spec 0.814; Acc 0.816; AUPRC 0.383; Brier 0.054.
- 24-month mortality (LightGBM): AUROC 0.859 (0.804–0.867); Sens 0.712; Spec 0.839; Acc 0.816; AUPRC 0.523; Brier 0.083.
- 36-month mortality (LightGBM): AUROC 0.813 (0.777–0.835); Sens 0.653; Spec 0.836; Acc 0.803; AUPRC 0.564; Brier 0.111.
- 60-month mortality (Random Forest): AUROC 0.846 (0.815–0.863); Sens 0.684; Spec 0.835; Acc 0.787; AUPRC 0.748; Brier 0.142.
Grade III performance highlights:
- 12-month mortality (LightGBM): AUROC 0.876 (0.857–0.889); Sens 0.768; Spec 0.811; Acc 0.800; AUPRC 0.725; Brier 0.119.
- 24-month mortality (Random Forest): AUROC 0.855 (0.839–0.870); Sens 0.722; Spec 0.810; Acc 0.796; AUPRC 0.775; Brier 0.153.
- 36-month mortality (Random Forest): AUROC 0.878 (0.857–0.885); Sens 0.763; Spec 0.827; Acc 0.874; AUPRC 0.794; Brier 0.146.
- 60-month mortality (LightGBM): AUROC 0.860 (0.834–0.870); Sens 0.816; Spec 0.748; Acc 0.930; AUPRC 0.795; Brier 0.142.
Feature importance (SHAP): Age was the most important predictor for nearly all outcomes; histology and extent of resection were also influential.
Overall, the models showed good to excellent discrimination and were incorporated into a web tool for individualized predictions.
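As a rough illustration of how a tuned per-timepoint model could be exposed as a web form on a Hugging Face Space, here is a minimal Gradio sketch. The framework, file name, and input features are hypothetical; the published application may differ in both interface and required inputs.

```python
# Hypothetical deployment sketch: serve one saved per-timepoint classifier
# behind a simple web form. Feature set and encodings are illustrative.
import gradio as gr
import joblib
import pandas as pd

# Hypothetical artifact: the tuned classifier for 12-month mortality.
model_12mo = joblib.load("lightgbm_grade2_12mo.joblib")


def predict_12mo_mortality(age, karnofsky, tumor_size, extent_of_resection):
    """Return the predicted probability of death within 12 months of diagnosis."""
    row = pd.DataFrame([{
        "age": age,
        "karnofsky": karnofsky,
        "tumor_size": tumor_size,
        "extent_of_resection": extent_of_resection,
    }])
    return float(model_12mo.predict_proba(row)[:, 1][0])


demo = gr.Interface(
    fn=predict_12mo_mortality,
    inputs=[
        gr.Number(label="Age at diagnosis (years)"),
        gr.Number(label="Karnofsky Performance Scale"),
        gr.Number(label="Tumor size (ordinal category)"),
        gr.Dropdown([0, 1, 2], label="Extent of resection (encoded)"),
    ],
    outputs=gr.Number(label="Predicted 12-month mortality probability"),
)

if __name__ == "__main__":
    demo.launch()
```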
Discussion
The models demonstrate that tree-based ensemble methods (Random Forest and LightGBM) are well-suited to capture complex, non-linear relationships in heterogeneous clinical datasets for glioma prognosis. Differences in the optimal algorithm across time horizons and tumor grades suggest differing prognostic patterns between grade II and III tumors: for grade II, Random Forest performed best at the shortest and longest horizons (12 and 60 months) and LightGBM at the intermediate ones, whereas for grade III the pattern was reversed, with LightGBM best at 12 and 60 months and Random Forest at 24 and 36 months. SHAP analyses bolster interpretability: age consistently emerged as the strongest predictor, aligning with prior evidence that older age correlates with worse outcomes. Histology-specific survival differences and extent of resection also contributed substantially, concordant with literature emphasizing their prognostic relevance. The web application offers an avenue for clinicians to generate patient-specific survival estimates at 12, 24, 36, and 60 months, potentially aiding shared decision-making, risk stratification, and resource prioritization. Integration of global and local SHAP explanations fosters transparency and supports clinical trust by enabling clinicians to contextualize predictions with domain knowledge.
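The global and local SHAP explanations described above could be generated along the following lines, assuming `model` is a fitted tree-based classifier (such as the tuned LightGBM model from the pipeline sketch) and `X_test` is a pandas DataFrame of held-out patients; the specific plots and file names are illustrative.

```python
# Illustrative SHAP sketch: global feature importance across the test set and
# a local explanation for a single patient.
import matplotlib.pyplot as plt
import shap

explainer = shap.TreeExplainer(model)          # exact SHAP values for tree ensembles
shap_values = explainer.shap_values(X_test)    # one row of attributions per patient
expected_value = explainer.expected_value
# Some SHAP versions return per-class outputs for binary classifiers;
# keep the attributions for the positive (mortality) class.
if isinstance(shap_values, list):
    shap_values = shap_values[1]
    expected_value = expected_value[1]

# Global view: mean impact of each feature across the test set
# (age, histology, and extent of resection dominated in the study).
shap.summary_plot(shap_values, X_test, show=False)
plt.savefig("shap_global_summary.png", bbox_inches="tight")
plt.close()

# Local view: how each feature pushed one patient's predicted risk
# above or below the baseline expectation.
shap.force_plot(expected_value, shap_values[0, :], X_test.iloc[0, :],
                matplotlib=True, show=False)
plt.savefig("shap_local_patient0.png", bbox_inches="tight")
```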
Conclusion
This work presents high-performing ML models for individualized survival prediction at multiple time points for WHO grade II and III gliomas, implemented in an accessible web application. By leveraging large-scale NCDB data and interpretable modeling (SHAP, PDPs), the approach advances beyond generalized population-based estimates toward personalized prognostication. The models achieved AUROC values consistently above 0.8 across all endpoints, indicating strong discrimination. Future research should focus on external validation, incorporation of additional clinical, imaging, and molecular predictors (e.g., IDH status), and impact analyses to determine clinical utility and effects on decision-making and outcomes in real-world settings.
Limitations
Key limitations stem from the retrospective registry data: several important variables are absent (notably molecular markers such as IDH status for much of the study period, imaging details, the method used to determine extent of resection, comprehensive performance status, symptomatology, and eloquent area involvement). Outcomes were limited to overall survival, with no progression-free survival or malignant transformation analyses. Treatment data reflect initial regimens only, without accounting for subsequent therapies. Potential selection bias exists because the NCDB includes only CoC-accredited facilities (roughly 30% of US hospitals, though about 70% of cancer diagnoses), and mortality is all-cause rather than disease-specific. The 2010–2017 analysis window was chosen to ensure adequate follow-up; including more recent diagnoses would have introduced systematic missingness in follow-up and could have biased results. Finally, the models have not undergone external validation, so future studies are needed to assess their generalizability.
Listen, Learn & Level Up
Over 10,000 hours of research content in 25+ fields, available in 12+ languages.
No more digging through PDFs, just hit play and absorb the world's latest research in your language, on your time.
listen to research audio papers with researchbunny