Medicine and Health

Prediction of recurrence risk in endometrial cancer with multimodal deep learning

S. Volinsky-Fremond, N. Horeweg, et al.

This study by Sarah Volinsky-Fremond, Nanda Horeweg, and colleagues introduces HECTOR, a multimodal deep learning prognostic model that predicts distant recurrence of endometrial cancer more accurately than the current gold standard. Trained and validated on histopathology images and tumor stage from more than 2,000 patients, HECTOR is intended to support personalized treatment for patients with endometrial cancer.

Introduction

Endometrial cancer (EC) is the most common gynecologic malignancy in high-income countries, and 10–20% of patients develop distant recurrences after surgery, which are typically incurable. Adjuvant chemotherapy can lower recurrence risk but causes toxicity, and current recommendations rely on complex combinations of clinicopathological risk factors and molecular classification (POLEmut, MMRd, NSMP, p53abn). These approaches face challenges including interobserver variability, cost, turnaround time, and incomplete capture of the prognostic information present in histology slides. Prior deep learning (DL) models have predicted molecular alterations, tumor composition, and prognosis from hematoxylin and eosin (H&E) whole-slide images (WSIs), with newer self-supervised and attention-based architectures showing promise. However, many multimodal approaches require non-routine assays (e.g., multiplex immunofluorescence (IF), genomic/transcriptomic data). The study's aim was to develop and validate a clinically deployable multimodal DL model (HECTOR) that predicts distant recurrence risk using only routine inputs (H&E WSIs and FIGO stage), thereby enabling personalized adjuvant treatment decisions.

Literature Review

Recent DL advances in pathology include prediction of molecular alterations, cell composition, and survival from H&E WSIs using attention-based MIL, graph models, and transformers. Multimodal DL can outperform unimodal image-only models, but prior EC studies often required complex profiling (e.g., multiplex IF, genomics/transcriptomics) and achieved modest survival discrimination (C-index ~0.63–0.69). The authors’ prior im4MEC model predicted EC molecular class from H&E WSIs and was prognostic. Existing clinicopathological-molecular risk stratification is effective but limited by complexity and resource needs. There remained a need for a model using routine diagnostic data to predict distant recurrence and potential chemotherapy benefit.

Methodology

Study design and cohorts: Tumor-containing H&E-stained WSIs and clinicopathological, molecular, and distant recurrence data were collected for 2,072 stage I–III EC patients across eight cohorts (including the PORTEC-1/2/3 trials). Two external population-based cohorts were held out for testing: UMCG (n=160) and LUMC (n=151; up to three WSIs per case). The remaining cases were split into an internal training set (n=1,408; fivefold cross-validation) and an internal test set (n=353). Patients receiving adjuvant chemotherapy (n=225, largely from PORTEC-3) were excluded from training and testing but used for the downstream chemotherapy benefit analysis. For self-supervised representation learning, the training set was augmented with TCGA-UCEC and with cases lacking outcome data or with stage IV disease, totaling 1,862 WSIs.

WSI processing and self-supervised learning: Tissue was segmented via Otsu thresholding and tiled into non-overlapping patches of 180 µm, which were resized to 256×256 px. Each WSI yielded ~10,185 patches on average, compressed by spatial/semantic averaging to ~1,723. A modified EsVIT (Swin transformer-based) self-supervised model was trained on 3,702,447 patches from 1,862 WSIs, and patch features were extracted from the last eight transformer blocks (feature dimension 3,456). Training used DINO heads, AdamW, cosine schedulers, and data augmentation as per EsVIT.

HECTOR architecture: A multimodal, three-arm model for time-to-event prediction. Inputs: (1) H&E WSI patch features processed with attention-based multiple instance learning (AttentionMIL) to produce a slide-level embedding; (2) image-based molecular class (im4MEC-predicted categorical class: imPOLEmut, imMMRd, imNSMP, imp53abn) embedded via learnable embeddings; (3) categorical FIGO 2009 stage I–III embedded similarly. A gating-based attention mechanism using a bilinear product weights each modality's contribution; the embeddings are fused via a Kronecker product, followed by fully connected layers leading to a discretized survival head trained with a negative log-likelihood loss over four time intervals.
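
The three building blocks named above (attention pooling of patch features, Kronecker-product fusion, and a discrete-time survival loss) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names are hypothetical, the attention scores stand in for the output of a learned scoring network, the modality gating step is omitted, and the loss uses one common hazard-based formulation that the summary does not confirm.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(patch_features, attention_scores):
    """Collapse a bag of patch feature vectors into one slide-level embedding.

    attention_scores is a placeholder for a learned scoring network's output.
    """
    weights = softmax(attention_scores)
    dim = len(patch_features[0])
    return [
        sum(w * feats[d] for w, feats in zip(weights, patch_features))
        for d in range(dim)
    ]


def kronecker(a, b):
    """Kronecker product of two vectors: all pairwise products, a-major order."""
    return [x * y for x in a for y in b]


def fuse(slide_emb, class_emb, stage_emb):
    """Fuse three modality embeddings into one joint vector."""
    return kronecker(kronecker(slide_emb, class_emb), stage_emb)


def discrete_survival_nll(logits, interval, event):
    """Negative log-likelihood for one patient under a discrete hazard model.

    logits:   one raw output per time interval (four in the summary above).
    interval: index of the interval in which follow-up ended.
    event:    True if distant recurrence was observed, False if censored.
    Assumed formulation: hazard_j = sigmoid(logit_j).
    """
    hazards = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    # Log-probability of staying recurrence-free through all earlier intervals.
    log_surv_before = sum(math.log(1.0 - h) for h in hazards[:interval])
    if event:
        return -(log_surv_before + math.log(hazards[interval]))
    # Censored: the patient also survived the final observed interval.
    return -(log_surv_before + math.log(1.0 - hazards[interval]))
```

The fused vector's length is the product of the three embedding lengths, which is why Kronecker fusion is usually applied to small embeddings before the fully connected layers.
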
Dropout (0.25) and ReLU activations were used. Ablations compared unimodal H&E, two-arm (H&E+im4MEC), and three-arm (HECTOR) models; alternative architectures (graph networks, transformers) and multitask learning were also tested.

Training and evaluation: Fivefold cross-validation on the training set guided architecture selection by mean C-index (tau=10 years; scikit-survival). The final HECTOR model was retrained on the full training data and evaluated on the internal test set and the two external test sets. Additional metrics were the cumulative time-dependent AUC and the integrated Brier score. For LUMC, which has multiple WSIs per patient, experiments aggregated patient-level risk by random single-WSI selection (repeated 100×), by mean or median across 2–3 WSIs, or by combining WSIs into a single bag. Risk group thresholds were set by training set quantiles (low: <median; intermediate: median–Q3; high: >Q3) and applied across datasets.

Explainability and correlates: Integrated Gradients (IG) quantified WSI contributions to risk; contributions of im4MEC class and stage were assessed via counterfactual risk score differences. Regions with the highest positive and negative contributions (top 5%) were reviewed by an expert pathologist. Quantitative image analyses measured inflammatory cell density (Hover-Net-based), mitotic figures (DL detector fine-tuned on EC), and tumor nuclei size. Spatial analysis examined overlap with tumor vs invasive border areas. Genomic/transcriptomic correlates were analyzed in TCGA-UCEC (n=381 stage I–III) for driver mutation frequencies (OncoKB-annotated MC3), immune cell subsets (CIBERSORT), and differential expression (DESeq2), including adjustments for molecular class and tumor mutational burden.

Statistical analyses: C-indices were computed for DL and Cox proportional hazards (CPH) models (the latter using clinicopathological/molecular inputs). Kaplan–Meier estimates and log-rank tests were used for distant recurrence-free probabilities, and univariable/multivariable CPH models for hazard ratios (HRs), including a combined clinical risk score.
Interaction analyses tested HECTOR (continuous and categorical) by treatment (external beam radiotherapy alone, EBRT, vs EBRT plus chemotherapy, EBRT+CT) in PORTEC-3, with comparisons to established high-risk factors (serous histology, stage III, p53abn). Two-sided P<0.05 was considered significant.
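
As a rough illustration of two evaluation steps described in this section, the sketch below derives risk-group thresholds from training-set quantiles and computes a concordance index. It is a simplified stand-in under stated assumptions: the function names are hypothetical, the C-index is plain Harrell's C without the 10-year truncation or the scikit-survival routines the study reports using, and pairs with tied follow-up times are simply skipped.

```python
import statistics


def risk_thresholds(train_scores):
    """Return (median, Q3) of the training-set risk scores."""
    _, median, q3 = statistics.quantiles(train_scores, n=4, method="inclusive")
    return median, q3


def risk_group(score, median, q3):
    """Map a risk score to low (<median), intermediate (median..Q3) or high (>Q3)."""
    if score < median:
        return "low"
    if score <= q3:
        return "intermediate"
    return "high"


def harrell_c_index(times, events, risks):
    """Fraction of comparable patient pairs ranked correctly by risk.

    A pair is comparable when the patient with the shorter follow-up had an
    observed event; it is concordant when that patient also has the higher
    predicted risk. Tied risks count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

Thresholds are computed once on the training scores and then reused unchanged on the test cohorts, which mirrors the statement above that the cut-offs were applied across datasets.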

Key Findings

- Performance: Fivefold cross-validation mean C-index: unimodal H&E 0.775 (95% CI 0.748–0.802); two-arm (H&E+im4MEC) 0.782 (0.759–0.805); HECTOR (H&E+im4MEC+stage) 0.795 (0.768–0.822). Internal test set C-index 0.789; UMCG external test set C-index 0.828; LUMC external test set C-index up to 0.815 (median aggregation across up to 3 WSIs).
- Risk stratification: Internal test 10-year distant recurrence-free probabilities: low 97.0% (95% CI 93.0–98.8%; n=175), intermediate 77.7% (67.0–85.4%; n=82), high 58.1% (46.9–67.7%; n=96); log-rank P=1.78×10⁻¹⁰. Continuous HECTOR HRs: training HR=5.06 (95% CI 4.35–5.89; P=9.00×10⁻¹¹), internal HR=2.69 (2.07–3.49; P=1.31×10⁻¹⁵), UMCG HR=5.84 (3.06–11.14; P=8.37×10⁻⁷). UMCG 5-year distant recurrence-free probabilities: low 93.9% (85.9–97.4%; n=102), intermediate 91.4% (75.6–97.2%; n=44), high 19.0% (0.97–55.3%; n=14); log-rank P=5.56×10⁻¹⁰; HR high vs low 20.42 (95% CI 5.92–70.50; P=2.00×10⁻⁶).
- Comparison to standards: HECTOR outperformed CPH models using comparable inputs (base CPH with H&E-defined features: C-index 0.681; +stage: 0.716; +molecular class: 0.762). In multivariable analyses (n=1,254), HECTOR remained independently prognostic (HR=5.26; 95% CI 4.21–6.56; P=2.30×10⁻⁴⁸); among the other factors, only FIGO stage III retained significance (HR=1.50; 95% CI 1.05–2.14; P=0.026). Other factors (e.g., LVSI, POLEmut, p53abn) lost prognostic value when HECTOR was included.
- External robustness with multiple WSIs (LUMC): Random single-WSI selection mean C-index 0.802; aggregation across 2 WSIs: 0.810; 3 WSIs: 0.813–0.815; combining WSIs into one bag: 0.805. 5-year distant recurrence-free probabilities (median aggregation): low 98.4% (95% CI 89.1–99.8%; n=70), intermediate 74.8% (53.4–87.4%; n=44), high 52.6% (32.3–69.4%; n=37); log-rank P=1.00×10⁻⁹. Continuous HECTOR HR=3.73 (95% CI 2.34–5.96; P=3.17×10⁻¹⁰).
- Chemotherapy benefit (PORTEC-3): Significant interaction between treatment and HECTOR risk (continuous P for interaction=0.014; categorical P for interaction=0.064). In the HECTOR high-risk group (n=173), EBRT+CT improved 5-year distant recurrence-free probability vs EBRT alone: 62.2% (95% CI 51.1–71.5%) vs 42.0% (31.1–52.6%); log-rank P=0.007; HR=0.561 (95% CI 0.366–0.862; P=0.008). No benefit was seen in the low- or intermediate-risk groups. Predictive utility exceeded that of serous histology, FIGO stage III, and p53abn.
- Explainability and correlates: H&E WSIs contributed positively to risk, with higher contributions in grade 3 EEC, non-EEC, and LVSI. Low-risk features: smooth luminal borders, inflamed stroma, intraepithelial lymphocytes/neutrophils, abundant compact myometrium. High-risk features: ragged luminal surface (hobnailing), LVSI, solid growth with marked nuclear atypia, desmoplastic reaction, mitotic figures. Quantitatively, low-risk regions had higher inflammatory cell density (P=0.011); high-risk regions had higher mitotic density and larger nuclei (both P<0.001). Genomics (TCGA-UCEC): ARID1A, CTCF, CTNNB1, FGFR2, KRAS, and PTEN mutations were enriched in low risk (all P<0.005); PPP2R1A and TP53 mutations were enriched in high risk (P=2.19×10⁻³ and P=2.81×10⁻⁷, respectively). Transcriptomics: higher HECTOR scores correlated with memory B cells and activated dendritic cells, and inversely with CD8+ T cells, Tfh, Tregs, and NK activation (independent of molecular class and TMB). High-risk tumors upregulated L1CAM and CLDN6; low-risk tumors upregulated hormone signaling genes (e.g., C1orf64, OVGP1).
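
The LUMC comparison above contrasts several ways of turning multiple per-slide risk scores into one patient-level score. A minimal sketch of the mean, median, and random single-slide strategies follows; the function names, the seed parameter, and the example scores are hypothetical. The "single bag" strategy is not shown because it requires running the model on the pooled patches of all slides.

```python
import random
import statistics


def aggregate_patient_risk(slide_risks, method="median"):
    """Combine per-slide risk scores for one patient into a single score."""
    if method == "median":
        return statistics.median(slide_risks)
    if method == "mean":
        return statistics.mean(slide_risks)
    raise ValueError(f"unknown aggregation method: {method}")


def random_single_slide_risks(slide_risks, repeats=100, seed=0):
    """Draw one slide's score at random, repeated `repeats` times.

    Averaging a metric over these draws mimics the 'random single-WSI
    selection (repeated 100x)' experiment; the fixed seed is only there
    to make the sketch reproducible.
    """
    rng = random.Random(seed)
    return [rng.choice(slide_risks) for _ in range(repeats)]


# Illustrative scores for one patient with three slides (made-up values).
patient_slides = [0.2, 0.9, 0.4]
patient_median = aggregate_patient_risk(patient_slides, "median")
```

Median aggregation is less sensitive to a single outlying slide than the mean, which is one plausible reason to prefer it when slides from the same tumor disagree.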

Discussion

HECTOR addresses the need for clinically deployable, accurate prediction of distant recurrence in EC using routine diagnostics. By integrating WSI-derived morphology, image-based molecular class, and FIGO stage, HECTOR outperformed clinicopathological/molecular standards and provided powerful risk stratification validated internally and in two external cohorts. The model also predicted which high-risk patients benefit from adjuvant chemotherapy in a randomized trial setting, outperforming current selection criteria, thus enabling more precise treatment escalation or de-escalation. Explainability analyses aligned with known EC biology: immune infiltration associated with favorable prognosis, adverse morphology (LVSI, high-grade atypia, desmoplasia, mitoses) associated with higher risk, and genomic/transcriptomic profiles consistent with low vs high-risk phenotypes. These findings suggest HECTOR captures complex, nonlinear morphological and clinico-molecular interactions beyond categorical pathology assessment and may reveal therapeutic targets (e.g., CLDN6).

Conclusion

The study presents HECTOR, a multimodal DL model predicting distant recurrence risk in stage I–III EC from routine H&E WSIs and FIGO stage, with superior discrimination to current gold standards and validated across multiple cohorts. HECTOR stratifies patients into clinically meaningful risk groups and predicts benefit from adjuvant chemotherapy in high-risk cases, supporting personalized adjuvant decisions and potential reduction of unnecessary treatment. Future work includes prospective validation (e.g., PORTEC-4a), testing in more diverse populations, integrating additional routine modalities (e.g., multi-section WSIs, radiology, IHC), refining multimodal fusion, and exploring targeted therapies informed by HECTOR-derived biomarkers.

Limitations

- The MIL-based architecture is not explicitly spatially aware and was not designed to leverage cross-slide context; while tested alternatives did not outperform it here, future context-aware or multi-WSI fusion may improve performance.
- Some patients lacked full surgical lymphadenectomy staging, potentially introducing noise into the stage input and contributing to residual prognostic value of stage III.
- Rare POLEmut cases may have risk overestimated by HECTOR given their low propensity for metastasis.
- Not all morphological correlates (e.g., certain structural changes) were quantifiable due to limited labeled datasets for EC-specific image analysis tools.
- External cohorts were largely of European ancestry; broader validation and prospective trials are needed to confirm generalizability.
- The therapeutic landscape is evolving; the optimal systemic therapy for HECTOR high-risk patients requires ongoing validation.
