
Hyperspectral imaging benchmark based on machine learning for intraoperative brain tumour detection
R. Leon, H. Fabelo, et al.
This study combines intraoperative hyperspectral imaging with machine learning to detect brain tumours during surgery. It presents an inter-patient benchmark intended to support real-time decision-support tools in neurosurgery; the best-performing model reached a median macro F1-Score of 70.2% on the test set.
Introduction
The study addresses the intraoperative challenge of accurately identifying and delineating brain tumour margins to maximize safe resection and preserve neurological function. Existing guidance modalities (IGS, iMRI, ultrasound, and fluorescence agents like 5-ALA and fluorescein) have limitations such as brain shift, infrastructure demands, artifacts, learning curve, and inconsistent sensitivity (especially for low-grade gliomas). Hyperspectral imaging offers label-free, non-contact, near real-time guidance by acquiring dense spectral information per pixel that reflects tissue biochemical composition. The objective is to evaluate and benchmark a machine learning-based framework using intraoperative in-vivo hyperspectral imaging for identification and delineation of brain tumours (primary high- and low-grade and secondary) with robust inter-patient validation, aiming toward real-time decision support in neurosurgery.
Literature Review
The paper situates hyperspectral imaging (HSI) within a growing body of work across oncology, pathology, ophthalmology, dermatology, and gastroenterology, where AI and increased computational power have enabled promising diagnostic performance. Prior intraoperative brain HSI studies often used small datasets, focused on high-grade tumours, and employed intra-patient or leave-one-patient-out methodologies that can overestimate performance. Reported overall accuracies (OAs) ranged widely (60–99%), frequently on limited images/patients. Fluorescence-guided surgery with 5-ALA, indocyanine green (ICG), or fluorescein sodium (FS) provides real-time visualization but exhibits variable sensitivities (e.g., ~71% for 5-ALA in malignant gliomas; ~96–97% for ICG in select settings; ~63–81% for FS), may not detect most low-grade gliomas, and requires contrast administration with potential side effects. The present work contributes a larger, diverse inter-patient database (61 images, 34 patients; grades 1–4; primary and secondary) and a three-way partitioned, 5-fold cross-validated benchmark, comparing multiple ML/DL and unmixing methods and integrating spatial-spectral processing.
Methodology
Study population and data: Adults (>18 years) undergoing brain surgery at University Hospital of Gran Canaria Doctor Negrín (Spain) across three campaigns (2015–2019) with informed consent and ethics approval. Final dataset: 61 intraoperative HS images from 34 patients (28 primary; 6 secondary), with demographics and tumour characteristics recorded.
Acquisition: Custom VNIR pushbroom HSI system (Hyperspec VNIR A-Series, 400–1000 nm, 826 channels; 2–3 nm spectral resolution; max 741×1004 px). Illumination via 150 W QTH lamp through fiber cold light; working distance 40 cm; pixel size 128.7 μm; max acquisition time 60 s.
Labelling and classes: Regions of interest manually cropped. Ground-truth maps created with a semi-automatic tool based on the Spectral Angle Mapper (SAM), guided by surgeons and neuropathologists. Classes: tumour tissue (TT), normal tissue (NT), blood vessel (BV), background (BG). Only high-confidence pixels labelled; some images lack TT labels.
Pre-processing: Radiometric calibration using white and dark references: CI = (RI−DI)/(WI−DI), where RI is the raw image, WI the white reference, DI the dark reference, and CI the calibrated image. Noise reduction via 5-point moving average. Removal of first 56 and last 126 bands due to sensor noise, yielding 645 bands (440.5–909.1 nm). Spectra converted to absorbance for hemoglobin comparisons as A(λ) = −log(R(λ)), with R(λ) the calibrated reflectance. Dimensionality reduction by spectral decimation to 128 bands (optimal 3.61 nm interval). Per-spectrum min–max normalization to [0,1].
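The pre-processing chain above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names, the (n_pixels, n_bands) array layout, the zero-division guards, the index-based decimation, and the base-10 logarithm are all assumptions. With an 826-band input, the slice used here keeps 644 bands, whereas the text reports 645, so the exact boundary convention is also assumed.

```python
import numpy as np

def preprocess(raw, white, dark, first=56, last=126, n_out=128):
    """Calibrate, smooth, trim, decimate and normalise spectra.

    raw, white, dark: arrays of shape (n_pixels, n_bands).
    """
    raw = np.asarray(raw, dtype=float)
    white = np.asarray(white, dtype=float)
    dark = np.asarray(dark, dtype=float)

    # Radiometric calibration: CI = (RI - DI) / (WI - DI).
    # Assumption: clamp the denominator to avoid division by zero.
    cal = (raw - dark) / np.maximum(white - dark, 1e-12)

    # 5-point moving average along the spectral axis. mode="same" zero-pads
    # the ends, but those edge bands are discarded by the trim below.
    kernel = np.ones(5) / 5.0
    smooth = np.apply_along_axis(
        lambda s: np.convolve(s, kernel, mode="same"), 1, cal
    )

    # Drop the noisy bands at both ends of the sensor range.
    trimmed = smooth[:, first:smooth.shape[1] - last]

    # Decimate to n_out evenly spaced bands (assumed index-based selection).
    idx = np.linspace(0, trimmed.shape[1] - 1, n_out).round().astype(int)
    reduced = trimmed[:, idx]

    # Per-spectrum min-max normalisation to [0, 1].
    lo = reduced.min(axis=1, keepdims=True)
    hi = reduced.max(axis=1, keepdims=True)
    return (reduced - lo) / np.maximum(hi - lo, 1e-12)

def absorbance(reflectance):
    """A = -log(R). Base 10 is an assumption; the text does not state it."""
    return -np.log10(np.clip(reflectance, 1e-6, None))
```

Normalising each spectrum independently removes overall brightness differences between pixels, so the classifier sees spectral shape rather than absolute intensity.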
Training data reduction: To reduce computational cost while preserving representativeness, per-class K-means (K=100) over labelled training pixels; select n most similar pixels to each centroid via SAM, producing reduced training sets of 1000, 2000, or 4000 pixels per class (n∈{10,20,40}). Validation indicated no significant performance loss; 1000/class used for efficiency.
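A rough sketch of the reduction step for a single class follows. It runs a plain K-means, then keeps the n pixels with the smallest spectral angle to each centroid. Several details are assumptions rather than facts from the source: the Euclidean K-means assignment, the random initialisation, the fixed iteration count, the choice to pick candidates only from within each cluster, and the helper names spectral_angle and reduce_class.

```python
import numpy as np

def spectral_angle(x, refs):
    """Spectral angle (radians) between each row of x and each row of refs."""
    xn = x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), 1e-12)
    rn = refs / np.maximum(np.linalg.norm(refs, axis=1, keepdims=True), 1e-12)
    return np.arccos(np.clip(xn @ rn.T, -1.0, 1.0))

def _assign(pixels, centroids):
    """Index of the nearest centroid (squared Euclidean) for every pixel."""
    d = ((pixels ** 2).sum(axis=1)[:, None]
         - 2.0 * (pixels @ centroids.T)
         + (centroids ** 2).sum(axis=1)[None, :])
    return d.argmin(axis=1)

def reduce_class(pixels, k=100, n=10, iters=20, seed=0):
    """Return sorted indices of up to k*n representative pixels of one class."""
    pixels = np.asarray(pixels, dtype=float)
    rng = np.random.default_rng(seed)
    # Assumption: initialise centroids from k distinct random pixels.
    centroids = pixels[rng.choice(len(pixels), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = _assign(pixels, centroids)
        for j in range(k):
            members = pixels[labels == j]
            if len(members):  # leave an empty cluster's centroid unchanged
                centroids[j] = members.mean(axis=0)
    labels = _assign(pixels, centroids)

    chosen = []
    for j in range(k):
        members = np.flatnonzero(labels == j)
        if members.size:
            # Rank the cluster's own pixels by spectral angle to its centroid.
            ang = spectral_angle(pixels[members], centroids[j:j + 1])[:, 0]
            chosen.extend(members[np.argsort(ang)[:n]].tolist())
    return np.array(sorted(chosen), dtype=int)
```

With k=100 and n=10 this yields at most 1000 pixels per class, matching the smallest configuration described above; clusters with fewer than n members simply contribute fewer pixels.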
Algorithms: Supervised ML/DL—SVM with linear (SVM-L) and RBF kernels (SVM-RBF; LIBSVM), Random Forest (RF), k-NN with Euclidean (KNN-E) and Cosine (KNN-C) distances, and a 1D two-hidden-layer DNN (ReLU activations, batch norm, learning rate 0.1, 300 epochs; hidden layer size optimized). Unmixing—linear EBEAE and nonlinear NEBEAE with class-specific endmember counts (NT 2, TT 2, BV 1, BG 3); similarity (p) and entropy weight (γ) tuned; BV endmember as class mean.
Spatial-spectral framework: PCA to first component for structure; supervised per-pixel probability maps; spatial KNN filtering (λ=1, K=40, window 8 rows, Euclidean) to smooth using PCA + probabilities; unsupervised hierarchical k-means (HKM) segmentation (K=24); majority voting (MV) to assign cluster labels. Three maximum density (TMD) maps combine cluster-wise class proportions (R=TT, G=NT, B=BV) for visualization.
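The last two stages, majority voting and the density-style colour map, can be illustrated as follows. This is a simplified sketch: it assumes the cluster map (from the unsupervised segmentation) and the per-pixel class map are already available as integer arrays, it assumes the class coding 0=NT, 1=TT, 2=BV, 3=BG, and it colours each cluster directly by its class proportions. The exact TMD colour rule is not given in the text, so density_rgb is a hypothetical stand-in.

```python
import numpy as np

# Assumed integer coding of the four classes.
NT, TT, BV, BG = 0, 1, 2, 3

def majority_vote(cluster_map, class_map, n_classes=4):
    """Give every pixel the most frequent supervised class of its cluster."""
    out = np.empty_like(class_map)
    for c in np.unique(cluster_map):
        mask = cluster_map == c
        counts = np.bincount(class_map[mask], minlength=n_classes)
        out[mask] = counts.argmax()
    return out

def density_rgb(cluster_map, class_map, n_classes=4):
    """Colour each cluster by its class proportions (R=TT, G=NT, B=BV)."""
    rgb = np.zeros(cluster_map.shape + (3,))
    for c in np.unique(cluster_map):
        mask = cluster_map == c
        p = np.bincount(class_map[mask], minlength=n_classes) / mask.sum()
        rgb[mask] = (p[TT], p[NT], p[BV])  # background share stays dark
    return rgb
```

The contrast between the two functions shows why a proportion-based map can be more informative than a hard vote: a cluster that is 55% normal and 45% tumour becomes entirely normal under majority voting, but keeps a visible red component in the colour map.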
Data partition and validation: Inter-patient three-way split at patient level per fold: training (60%), validation (20%), test (20%). Five folds with distinct patient assignments. Hyperparameters optimized on the validation set via coarse search maximizing macro F1-Score (excluding BG). Performance metrics: macro F1-Score (mean over NT, TT, BV), overall accuracy (OA), sensitivity, specificity. Statistical analysis: paired two-sided Wilcoxon Rank Sum tests at the 5% significance level for spectral differences between classes/grades.
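To make the partitioning and the headline metric concrete, here is a small standard-library sketch. It splits at the patient level, so that no patient contributes pixels to more than one subset, and computes a macro F1-Score over NT, TT and BV only. The function names, the rounding of subset sizes, the seed-based fold generation, and the treatment of BG pixels (they can still count as false positives or false negatives for the other classes) are assumptions.

```python
import random

def patient_split(patient_ids, fractions=(0.6, 0.2, 0.2), seed=0):
    """Split unique patient IDs into train/validation/test lists."""
    ids = sorted(set(patient_ids))
    random.Random(seed).shuffle(ids)
    n_train = round(fractions[0] * len(ids))
    n_val = round(fractions[1] * len(ids))
    return (ids[:n_train],
            ids[n_train:n_train + n_val],
            ids[n_train + n_val:])

def macro_f1(y_true, y_pred, classes=("NT", "TT", "BV")):
    """Unweighted mean of per-class F1 over the given classes (BG excluded)."""
    scores = []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        denom = 2 * tp + fp + fn
        # Assumption: a class with no true or predicted samples scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Example: 34 hypothetical patient IDs, one split per seed.
train, val, test = patient_split(range(34), seed=0)
```

Because the macro average weights each class equally, a weak tumour-class score pulls the metric down even when overall accuracy is high, which is consistent with the gap between OA and macro F1 reported in the findings.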
Interpretability: Post-hoc LIME (Local Interpretable Model-agnostic Explanations) applied to RF, KNN-E, KNN-C, DNN (fold 1 models) to identify the top-10 influential spectral bands per class.
Key Findings
Spectral characterization:
- Significant spectral differences (Wilcoxon, p<0.05) across all bands for TT vs NT and TT vs BV after pre-processing and normalization. Interpatient variability yields broad standard deviations; acquisition constraints (non-flat surfaces, focus/illumination) noted.
- Absorbance analysis aligns with hemoglobin features: increased absorbance 500–600 nm (HbO2 peaks ~540 and ~575 nm; deoxyHb ~555 nm). A distinct ~760 nm deoxyHb-related peak/valley present across tumour types; strongest deoxyHb contribution in BV, lower in TT, absent in NT—consistent with tumour hypoxia.
- Tumour subtypes: Secondary tumours show lower SD (likely fewer cases), yet significant median differences at 440–599, 602–756, and 769–909 nm. High- vs low-grade primary tumours differ significantly at 466–510, 522–549, 559–572, and 580–909 nm. Grade-wise: G1 vs G2 differ across all bands; G3 vs G4 differ at 440–460, 578–644, 745–764, 779–909 nm.
Validation (spectral only):
- Hyperparameters stabilized across folds; no significant differences among training set sizes (1000/2000/4000 per class). Chosen: 1000/class for efficiency.
- Unmixing methods (EBEAE/NEBEAE) underperformed vs ML/DL. Best median macro F1-Score by SVM-RBF: 78.4 ± 5.1%. Highest OA: 91.5 ± 4.7% (SVM-RBF). Highest TT sensitivity: 65.9 ± 13.1% (DNN). Specificities generally >90% for ML/DL.
Validation (spatial-spectral):
- Including spatial info (KNN filtering) increased median macro F1-Score by 0.4–7.7% and reduced SDs by 0.2–3.7% across most algorithms; Majority Voting alone reduced performance, likely due to cluster-majority assignment effects.
- At Spatial/Spectral stage, SVM-RBF achieved highest OA (92.3 ± 4.6%); DNN achieved best TT sensitivity (68.9 ± 14.3%), closely followed by SVM-L (67.7 ± 19.3%).
Test set (spatial-spectral):
- Slight performance drop vs validation (≈0.5–1%). Best median macro F1-Score on test: 70.2 ± 7.9% (DNN; spectral+spatial). OA similar for SVM-L (86.6 ± 5.5%) and DNN (86.8 ± 3.4%). TT sensitivity: SVM-L 57.8 ± 23.7%, DNN 54.7 ± 21.9%. Specificities generally >90% across classes.
Qualitative outcomes:
- TMD maps delineated glioblastoma margins (red) and highlighted hypervascular regions (blue) consistent with surgical expectation, including infiltrative patterns. Low-grade tumours (e.g., G2 oligodendroglioma, G1 ganglioglioma) and secondary metastases (e.g., breast carcinoma) were also identified.
Interpretability:
- LIME identified influential bands corresponding to physiological features (HbO2 ~540, ~575 nm; deoxyHb ~555, ~760 nm). Extreme bands (<464 nm, >835 nm) rarely among top-10 important features.
Comparative context:
- Prior HSI studies often small and intra-patient; reported OAs up to ~99% not generalizable. Present inter-patient, k-fold framework provides more realistic benchmark performance.
- Fluorescence imaging shows sensitivities ~63–97% depending on agent and cohort but requires exogenous contrast; HSI is label-free and non-contact.
Discussion
The study demonstrates that intraoperative VNIR hyperspectral imaging, combined with a spatial-spectral ML/DL pipeline, can identify and delineate brain tumour tissue in vivo across primary (both high- and low-grade) and secondary tumours. Spectral distinctions between TT, NT, and BV reflect underlying hemodynamics and oxygenation (notably deoxyhemoglobin), supporting the biological plausibility of HSI-based classification. Incorporating spatial context mitigates pixel-level noise and reduces false positives, improving robustness. The inter-patient three-way partition with 5-fold cross-validation yields realistic performance estimates, addressing the overoptimism of prior intra-patient and leave-one-patient-out (LOPO) designs. The achieved test macro F1-Score (~70%), together with high OA and strong specificity, indicates potential as a decision-support tool for surgical guidance, with qualitative TMD maps aligning with surgeon expectations and revealing infiltrative patterns. Interpretability analysis further links model decisions to physiologic spectral features, enhancing trust and clinical translatability.
Conclusion
This work provides a benchmark and validated processing framework for intraoperative in-vivo brain tumour detection and delineation using hyperspectral imaging and machine learning. On a diverse dataset (61 images, 34 patients; grades 1–4; primary and secondary), the spatial-spectral approach achieved a best median test macro F1-Score of 70.2 ± 7.9%, high overall accuracy, and strong specificity, while qualitatively delineating tumour margins and vascular features. Contributions include comprehensive spectral characterization across tissue and tumour types, rigorous inter-patient validation across eight algorithms, and an interpretable pipeline with TMD visualization.
Future work: optimize acquisition (e.g., snapshot sensors, integration into surgical microscopes) to improve focus and reduce acquisition time; expand to multi-centre clinical trials with larger cohorts; perform thorough pathological correlation at tumour margins and MRI correlation to assess infiltration detection; evaluate impact on surgical outcomes, safety, and workflow; and further develop real-time deployment on GPU platforms.
Limitations
- Acquisition constraints: pushbroom system sensitivity and focusing challenges, particularly on non-flat or deep-layer brain surfaces, led to suboptimal illumination/focus in some images (notably >700 nm), causing decreased reflectance and misclassifications.
- Sensor spectral sensitivity: reduced performance in extreme bands necessitated band trimming; IR-range sensitivity issues affected certain cases (e.g., Op55, Op56).
- Labelling limitations: only high-confidence pixels were labelled; some images lacked tumour labels; ground truth is limited to accessible surface regions and biopsy guidance.
- Dataset composition: fewer secondary tumour cases and variable image quality contribute to heterogeneity and wider performance variability (notably TT sensitivity SDs).
- The majority voting approach can degrade performance by oversimplifying cluster labels; the TMD visualization mitigates this by conveying class mixtures.
- Clinical validation: results are preclinical/observational; prospective randomized or controlled studies comparing against standard modalities are needed for clinical efficacy and utility assessment.