Multi-omic machine learning predictor of breast cancer therapy response
S. Sammut, M. Crispin-Ortuzar, et al.
Neoadjuvant therapy (chemotherapy with or without targeted therapy) is increasingly used in breast cancer to improve breast-conserving surgery rates and survival. However, many patients have suboptimal responses. Prior predictors have relied on clinical, molecular, or digital pathology features, often pooling patients who received heterogeneous treatments and relying on single-platform profiling, an approach that fails to capture the complexity of tumour ecosystems. The authors hypothesize that improved prediction requires modeling tumours as complex ecosystems of malignant clones within stromal, vascular and immune microenvironments present before therapy. The study’s purpose is to comprehensively profile pre-treatment biopsies across multiple modalities and integrate these features via machine learning to predict therapy response, with external validation.
Existing work has identified associations between clinical factors and neoadjuvant response, and developed predictors, but these often lack multi-omic integration and may not reflect the tumour microenvironment’s role. Tumour ecosystems and microenvironmental organization are known to influence therapy response, yet most predictors in untreated tumours have not incorporated these elements. Prior reports connected genomic features (e.g., TP53 status, proliferation-related signals) and immune infiltration with response, and suggested that neoadjuvant efficacy, particularly in HER2+ disease, may involve mechanisms beyond proliferation alone. This study builds on that literature by combining clinical, genomic, transcriptomic, and digital pathology features to model the baseline tumour ecosystem and its relationship to response.
Study design and cohort: A prospective, multi-centre neoadjuvant study (TransNEO) enrolled 180 women with locally advanced breast cancer. Fresh-frozen pre-treatment core tumour biopsies were collected; 168 cases had DNA profiling and 162 had RNA data. Diagnostic H&E slides were reviewed for 166 cases. Paired post-surgery core biopsies were obtained from 31 tumours for residual disease scoring concordance.
Therapies and response assessment: Patients received neoadjuvant chemotherapy, with some receiving HER2-targeted therapy (e.g., trastuzumab) depending on receptor status. Treatment regimens included combinations such as anthracycline/cyclophosphamide and carboplatin/cyclophosphamide. Response was assessed at surgery using the Residual Cancer Burden (RCB) classification, including pathologic complete response (pCR) versus residual disease (RCB I–III).
Multi-omic profiling: DNA and RNA were extracted from pre-treatment biopsies. Genomic profiling included shallow whole-genome sequencing (n=168) and whole-exome sequencing (n=168) to identify somatic mutations and copy-number alterations, assess tumour mutation burden (TMB), mutational signatures (e.g., APOBEC, HRD), and chromosomal instability. RNA sequencing (n=162) profiled gene expression, with differential expression analysis (FDR<0.05), gene set enrichment analysis using MSigDB Hallmarks and Reactome, gene set variation analysis with Genomic Grade Index and other metagenes, and derivation of immune-related scores (e.g., cytolytic activity, cell-type signatures). Digital pathology quantified lymphocyte density on scanned H&E sections.
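The differential expression step relies on a false discovery rate threshold (FDR<0.05). The source does not say which FDR procedure was used; the sketch below assumes the common Benjamini-Hochberg adjustment, and the p-values are hypothetical values chosen only for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-gene p-values (not taken from the study).
p_values = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
q_values = benjamini_hochberg(p_values)
significant = [q < 0.05 for q in q_values]
print(significant)
```

Note that two genes with raw p-values below 0.05 (0.039 and 0.041) no longer pass once the adjustment accounts for the number of tests.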
Feature discovery: Associations between individual features and pCR/RCB were tested using logistic and ordinal logistic regression, including univariable screens and stratified analyses (e.g., by HER2 status). Clinical factors (grade, ER status, nodal status), genomic alterations (e.g., TP53, PIK3CA), TMB, copy-number profiles, HLA class I locus loss of heterozygosity (LOH), and transcriptomic/immune features were assessed.
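For a single binary feature (for example, mutated versus wild-type), the odds ratio from a univariable logistic regression equals the odds ratio of the corresponding 2×2 table, so the kind of estimate reported here can be illustrated without a model-fitting library. This is a minimal sketch under that simplification; it does not cover the ordinal regression against RCB class, and the counts are hypothetical.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% confidence interval from a 2x2 table.

    a: feature present, pCR        b: feature present, residual disease
    c: feature absent, pCR         d: feature absent, residual disease
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio (Woolf's method).
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts (not taken from the study).
or_value, ci_low, ci_high = odds_ratio_with_ci(a=20, b=30, c=10, d=40)
print(f"OR {or_value:.2f}, CI {ci_low:.2f}-{ci_high:.2f}")
```

A confidence interval that excludes 1 indicates an association at the chosen level; multiple-testing correction would still apply when many features are screened.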
Machine learning modeling: An ensemble framework integrated multi-omic features to predict binary response (pCR vs residual disease). A total of 314 candidate features spanning clinical, DNA, RNA, digital pathology, and treatment sequence were considered. Within each modeling pipeline: (1) univariable selection and dimensionality reduction were applied; (2) an unweighted ensemble of three classifiers—elastic net logistic regression, support vector machine, and random forest—was trained; (3) the three model scores were averaged for the final predictor. Five-fold cross-validation optimized hyperparameters. Feature importance was summarized via average z-scores across modeling procedures.
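Step (3) of the pipeline, the unweighted averaging of the three classifier scores, can be sketched as follows. The per-model probabilities are hypothetical placeholders standing in for the outputs of trained models; feature selection, training, and hyperparameter tuning are outside the scope of this sketch.

```python
from statistics import mean


def ensemble_score(model_scores):
    """Unweighted average of the per-model predicted pCR probabilities."""
    return mean(list(model_scores.values()))


# Hypothetical predicted pCR probabilities for two patients (illustration only).
patients = {
    "patient_A": {"elastic_net": 0.82, "svm": 0.74, "random_forest": 0.90},
    "patient_B": {"elastic_net": 0.15, "svm": 0.30, "random_forest": 0.21},
}

final_scores = {pid: ensemble_score(s) for pid, s in patients.items()}
for pid, score in final_scores.items():
    print(pid, round(score, 2))
```

Averaging with equal weights avoids fitting an additional set of combination weights, which can matter when the training cohort is small.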
External validation and simulations: Trained models were evaluated on an independent external cohort of 75 patients who received neoadjuvant therapy (including ARTemis trial control-arm cases within the PRECISION programme). Receiver operating characteristic AUCs were computed for models with increasing feature integration. A simulation illustrated potential clinical workflow impact under varying false-negative constraints in a hypothetical cohort of 100 candidates for neoadjuvant therapy.
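The AUC used to compare models can be read as the probability that a randomly chosen responder receives a higher score than a randomly chosen non-responder. The sketch below computes it directly from that definition; the labels and scores are hypothetical, and the pairwise loop is chosen for clarity rather than speed.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = pCR, 0 = residual disease.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.8, 0.3, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```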
- Clinical predictors (univariable): tumour grade (OR 4.2, CI 1.8–11.5, FDR=0.009), ER-negative receptor status (OR 4.2, CI 2.1–9.1, FDR=0.002), and absence of nodal involvement (OR 3.1–4.6, FDR=0.01) were associated with pCR. In multivariable analysis, only ER-negative status remained associated (OR 3.8, CI 1.6–9.74, FDR=0.009), with substantial response heterogeneity.
- Genomics (n=168): 16,134 somatic mutations identified; frequent drivers included TP53, PIK3CA, GATA3, MAP3K1. TP53 mutations associated with pCR (OR 2.9, CI 1.3–6.6, p=0.01). PIK3CA mutations associated with residual disease (OR 2.1, CI 1.3–3.4, p=0.002). TMB higher in pCR tumours (median per Mb 2.3 vs 1.4; p=0.0005) and associated with RCB class (p=0.004). Predicted mutation counts were higher in pCR overall (median 17; p=0.009), predominantly in HER2+ tumours (p=0.004). APOBEC and HRD mutational signatures were enriched in pCR, with HRD showing a monotonic association with response (p=0.0005). Greater copy-number alteration burden and chromosomal instability associated with lower RCB (better response) (p=0.002). Integrative genomic clusters: IC10 (largely triple-negative) showed strongest association with pCR; some clusters (e.g., IC3/7) showed low likelihood of pCR.
- Immune evasion: HLA class I LOH (29 cases) was associated with residual disease independently of global LOH and copy-number instability (p<0.05). These LOH events were predicted to impair presentation of ~30% of neoantigens.
- Transcriptomics: 2,439 genes overexpressed and 2,071 underexpressed in pCR tumours (FDR<0.05). Drivers associated with response included overexpression of EGFR, CCNE1, MYC and underexpression of ESR1 and ZNF703; proliferation and immune activation pathways were strongly enriched in pCR. Genomic Grade Index and embryonic stem-cell metagenes correlated with histological grade and showed monotonic associations with RCB, with notable effects in HER2+ tumours. A tumour response metagene (proliferation−immune difference) was higher in HER2 tumours achieving pCR.
- Tumour microenvironment: Digital pathology lymphocyte density predicted pCR (p=0.006). Cytolytic activity (CYT) score monotonically associated with response (p=0.001) and correlated with lymphocyte density (R²=0.04, p=1×10^-4). HER2 tumours with residual disease showed higher T-cell dysfunction at diagnosis (p=0.006) and enrichment of inhibitory NK CD56dim cells (p=0.014) and regulatory T cells (p=0.02). Across the cohort, T-cell exclusion was higher in residual disease (p=0.02), with enrichment of cancer-associated fibroblasts (p=0.009) and tumour-associated macrophages (p=0.0009).
- Machine learning performance (external validation, n=75): AUCs—0.70 (clinical), 0.80 (clinical+DNA), 0.86 (clinical+RNA), 0.86 (clinical+DNA+RNA), 0.85 (multifactorial), 0.87 (fully integrated: clinical+DNA+RNA+digital pathology+treatment). Feature importance emphasized age, lymphocyte density, and expression of PGR, ESR1, ERBB2, along with proliferation, immune activation, and immune evasion markers. Predictor scores correlated with RCB class (training p=3×10^-10; validation p=1×10^-5).
- Clinical simulation (per 100 candidates): With no false negatives allowed, the clinical model would identify 15 non-responders; the fully integrated model, 31. Allowing two false negatives increased identified non-responders to 24 (clinical) and 52 (fully integrated).
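The clinical simulation in the last finding can be read as a thresholding exercise: flag as many true non-responders as possible while letting at most a fixed number of true responders be flagged by mistake (false negatives). The source does not describe how the simulation was implemented, so the following is one plausible reading, with hypothetical labels and scores.

```python
def identified_non_responders(labels, scores, max_false_negatives):
    """Count true non-responders flagged under a false-negative cap.

    labels: 1 = responder (pCR), 0 = non-responder (residual disease).
    Patients scoring strictly below the threshold are flagged as predicted
    non-responders; at most `max_false_negatives` responders may be flagged.
    """
    responder_scores = sorted(s for y, s in zip(labels, scores) if y == 1)
    if max_false_negatives >= len(responder_scores):
        threshold = float("inf")
    else:
        # The (k+1)-th lowest responder score: at most k responders lie below it.
        threshold = responder_scores[max_false_negatives]
    return sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)


# Hypothetical cohort (illustration only).
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.35, 0.6, 0.9, 0.1, 0.2, 0.3, 0.5, 0.7]

for k in (0, 1, 2):
    print(k, identified_non_responders(labels, scores, k))
```

Relaxing the cap raises the threshold, so more true non-responders are identified at the cost of mislabelling a few responders, which is the trade-off the reported numbers describe.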
Findings support that response to neoadjuvant therapy is largely determined by baseline tumour ecosystem characteristics. The tumour microenvironment—particularly lymphocyte infiltration, cytotoxic activity, and absence of immune evasion features such as HLA class I LOH—was a key determinant of response. Genomic correlates of response included TP53 mutations, higher TMB, APOBEC/HRD signatures, and chromosomal instability, with nuances by subtype (e.g., in HER2+ disease, response appeared independent of proliferation). Clonal diversity likely contributes to selection of resistant subclones. The similarity between features predicting response to cytotoxic therapy and those linked to immune checkpoint inhibitor benefit suggests shared mechanisms of tumour cell killing, potentially involving chemotherapy-induced immunogenic cell death in immune-engaged tumours. Integrating clinical, molecular, and digital pathology features via an ensemble machine learning framework significantly improved predictive performance over clinical variables alone and generalized to an external cohort. This approach could inform trial enrichment and therapy selection, prioritizing alternative strategies for predicted non-responders.
This study demonstrates that multi-omic characterization of pre-treatment breast tumour biopsies, combined with digital pathology, can be integrated via an ensemble machine learning framework to accurately predict neoadjuvant therapy response. The models performed well on external validation and aligned with continuous measures of residual disease. The work highlights the central role of the tumour ecosystem—genomic context, proliferation, immune activation, and immune evasion—in shaping response. The framework is adaptable and could be extended to other cancers and clinical settings, including adjuvant therapy. Future research should prospectively evaluate clinical utility, refine feature sets (e.g., through newer assays), and investigate mechanistic links between immune contexture, genomic features, and therapy response across subtypes.