Non-invasive early detection of cancer four years before conventional diagnosis using a blood test


X. Chen, J. Gole, et al.

Early cancer detection could substantially improve survival. In this study, a blood test based on cell-free DNA methylation (PanSeer) detected five cancer types in asymptomatic individuals up to four years before conventional diagnosis, a result with significant implications for non-invasive cancer screening.

Introduction
The study addresses the pressing need for effective early cancer detection, as late-stage cancers often lack effective treatment and have substantially lower five-year survival than early-stage cancers. While some screening modalities exist (e.g., colonoscopy, PSA testing, mammography, cervical cytology), their efficacy is debated and many cancers lack non-invasive screening options. Demonstrating true early detection requires longitudinal cohorts with samples collected years before diagnosis. Circulating tumor DNA (ctDNA) is a promising biomarker for non-invasive detection and monitoring, but sensitivity is limited by the low abundance of ctDNA and the vast mutational heterogeneity. DNA methylation (5-methylcytosine) offers a more stable biomarker, yet bisulfite-based methods can reduce DNA yield, impacting sensitivity. The authors introduce PanSeer, a blood-based assay targeting cancer-specific methylation signatures, and test whether multiple cancer types can be detected up to four years prior to conventional diagnosis in a large retrospective longitudinal cohort (Taizhou Longitudinal Study).
Literature Review
The paper situates PanSeer within the context of prior screening tests and ctDNA research. Existing screening tools (colonoscopy, PSA, mammography, cervical cytology) cover only a subset of cancers, with controversies about benefits and adherence. ctDNA has shown utility for non-invasive cancer detection, therapy guidance, and minimal residual disease monitoring, with some reports of pre-diagnostic detection in limited cancers. However, mutation-based approaches face challenges from low ctDNA fractions and extensive heterogeneity. Methylation markers have been proposed as more stable and potentially ubiquitous cancer signals, with prior plasma-based assays targeting select loci (e.g., SEPT9, SHOX2) and newer methods expanding targeted regions at high depth. PanSeer builds on TCGA and other public datasets, literature-curated cancer regions, and internal RRBS/WGBS data to define a broad methylation panel, aiming for multi-cancer detection regardless of tissue-of-origin.
Methodology
Study population and design: The Taizhou Longitudinal Study (TZL) enrolled 123,115 initially healthy adults (ages 25–90) from 2007–2014 in Taizhou, China, collecting baseline blood and following participants via cancer registry and health insurance linkages (mean follow-up 8.1 years through 2017). Among these, 575 individuals were later diagnosed within four years with one of five cancers (stomach, esophagus, colorectal, lung, liver). A subset of 191 pre-diagnosis (asymptomatic at blood draw) cases meeting inclusion/exclusion and QC criteria were selected (stomach 45; esophagus 41; colorectal 35; lung 47; liver 29). Additional post-diagnosis cancer patients were obtained from local hospital biobanks (reported as 223 in the main text), along with healthy controls from the TZL cohort (healthy participants not diagnosed with cancer for at least five years post blood draw). Matching considered time of collection, sex, age group, and other factors. In total, plasma and tissue materials included: 191 pre-diagnosis plasma samples, approximately 223 post-diagnosis plasma samples, 414 healthy plasma controls, and 200 tissue DNA samples (160 cancers, 40 healthy). Ethical approval and informed consent were obtained.
Assay development and targeting: Using TCGA microarray/WGBS data, literature-reported cancer-related genomic regions, and internal RRBS from multiple cancer tissues, the team compiled 595 regions (11,787 CpGs initially; later focusing on 10,613 CpGs). To address DNA loss inherent to bisulfite workflows and low ctDNA abundance, the PanSeer assay employs a semi-targeted PCR library construction that uses a single ligation step and a single primer amplification, enabling higher molecular recovery and single-molecule counting from a median input of ~12 ng cfDNA per sample, targeting ~2 million reads per sample and ≥200,000 unique mapped molecules.
Marker selection and annotation: From 160 cancer and 40 normal tissue samples (BioChain), differentially methylated regions (DMRs) were identified via t-tests with Benjamini–Hochberg correction, retaining 472–477 regions (associated with 657 genes and 10,613 CpGs) showing significant differences. Genes enriched included families with known cancer relevance (FOX, HK, NK1, PAX, TBK) and previously used plasma markers (SEPT9, SHOX2). GO analysis highlighted DNA binding/transcription factor activity, consistent with dysregulated epigenetic control across cancers. The panel was designed to capture a core epigenetic signature shared across tumor types.
Plasma/tissue processing and sequencing: Plasma samples (typically 1 mL aliquots available from storage) were processed to extract cfDNA, which underwent bisulfite conversion (EpiTect) and targeted library preparation. Sequencing was performed on Illumina NextSeq with paired-end reads, yielding ~2 million reads/sample and filtering for ≥200,000 unique molecules; samples below this threshold were excluded. Tissue DNA was fragmented to ~150 bp to mimic cfDNA.
Analytical performance (spike-in): Fragmented HT-29 cancer DNA was spiked into pooled healthy plasma to assess the limit of detection, demonstrating detectability down to 0.1% tumor fraction across targets using region-wise cutoffs derived from baseline controls.
Feature engineering: For each targeted region, the average methylation fraction (AMF) was computed. Regions showing concordant differential methylation between tissue and plasma were also analyzed in secondary analyses.
Classifier development: An ensemble logistic regression (LR) classifier was trained using AMF features to distinguish cancer vs healthy.
Training used a subset of samples (e.g., 207 healthy, 110 post-diagnosis, 93 pre-diagnosis) that was randomly split 50/50 into model-building and validation subsets, with the split repeated 100 times; model scores for each sample were averaged across 1000 LR fits to reduce overfitting. LASSO regularization and internal cross-validation (scikit-learn LogisticRegression/LogisticRegressionCV) tuned penalty parameters. The LR outputs an estimated probability P = 1/(1 + e^{-(β^T X + B)}), where X is the vector of AMF features, β the fitted coefficients, and B the intercept. Performance metrics (sensitivity, specificity, CIs) were computed; ROC/AUC analyses and subgroup analyses by years-before-diagnosis, stage, and tissue-of-origin were performed on held-out test sets.
Statistics: Confidence intervals for sensitivity/specificity were computed; resampling methods assessed consistency; nonparametric tests evaluated score distributions across groups. Limit of detection thresholds were defined as mean+3 SD over baseline for each region.
Data availability: Methylation matrices and supplementary data are provided in the paper/supplements and a GitHub repository (NCMONS-20-1056-1); raw sequencing data were not consented for public release.
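The feature and scoring steps described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names are hypothetical, AMF is assumed to be the pooled ratio of methylated to total CpG observations within a region (the summary does not give the exact formula), and the sample standard deviation is assumed for the mean+3 SD cutoff. Fitting the LASSO-penalised models is left out; the sketch only shows how a probability is obtained from fitted coefficients and how scores from many fits are averaged.

```python
import math
from statistics import mean, stdev


def average_methylation_fraction(methylated_counts, total_counts):
    """AMF for one region, ASSUMED here to be the pooled ratio
    (methylated CpG observations) / (all CpG observations)."""
    total = sum(total_counts)
    if total == 0:
        raise ValueError("region has no covered CpG observations")
    return sum(methylated_counts) / total


def lod_cutoff(baseline_amfs, n_sd=3.0):
    """Region-wise cutoff: mean + n_sd * SD of baseline (healthy) AMFs.
    Uses the sample SD, which is an assumption."""
    return mean(baseline_amfs) + n_sd * stdev(baseline_amfs)


def lr_probability(features, weights, intercept):
    """P = 1 / (1 + exp(-(beta^T x + B))), evaluated in a form that
    avoids overflow for large |z|."""
    z = sum(w * x for w, x in zip(weights, features)) + intercept
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def ensemble_score(features, fitted_models):
    """Average the LR probability over many fitted (weights, intercept) pairs."""
    return mean(lr_probability(features, w, b) for w, b in fitted_models)


# Hypothetical numbers, for illustration only.
amf = average_methylation_fraction([8, 2, 5], [10, 10, 10])
cutoff = lod_cutoff([0.01, 0.02, 0.03])
score = ensemble_score([amf], [([2.0], -1.0), ([1.5], -0.5)])
print(f"AMF={amf:.2f}, cutoff={cutoff:.3f}, ensemble score={score:.3f}")
```

Averaging scores over many fits, rather than relying on a single fitted model, is the step the summary credits with reducing overfitting.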
Key Findings
- PanSeer detected five common cancers (stomach, esophagus, colorectal, lung, liver) in post-diagnosis patients with 88% sensitivity (95% CI: 80–93%) at 96% specificity (95% CI: 93–98%).
- Among asymptomatic individuals later diagnosed within four years of blood draw, PanSeer detected cancer in 95% (95% CI: 89–98%) of cases in the retrospective longitudinal cohort.
- Performance remained high across time intervals up to four years pre-diagnosis (with strong performance in 0–1, 1–2, 2–3, and 3–4 year strata on test sets; ROC curves and AUCs presented in figures).
- Analytical sensitivity: spike-in experiments demonstrated detection down to 0.1% tumor DNA fraction.
- The classifier leveraged a common methylation signature across cancers, showing detection independent of tissue-of-origin, and maintained performance across stages, including early-stage disease (test-set analyses shown in Fig. 2 and Fig. 3).
- The assay successfully targeted 10,613 CpGs across 477 genomic regions at high molecular recovery from ~1 mL plasma, enabling robust performance despite limited input DNA.
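Sensitivity and specificity figures like those above are proportions, so their confidence intervals depend on how many samples were in each test set. The summary does not say which interval method the authors used; the sketch below uses the Wilson score interval as one common choice, and the counts in the example are hypothetical rather than taken from the paper.

```python
import math
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion successes / n."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Hypothetical counts: 88 detected out of 100 true cancer cases.
low, high = wilson_interval(88, 100)
print(f"sensitivity 88/100, 95% CI: {low:.3f} to {high:.3f}")
```

With the same point estimate, a smaller test set gives a wider interval, which is why the pre-diagnosis result should be read together with its cohort size.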
Discussion
The findings demonstrate that a targeted cfDNA methylation assay can non-invasively detect multiple cancer types up to four years prior to conventional clinical diagnosis, addressing a major unmet need in cancer screening. By focusing on a core set of aberrantly methylated regions conserved across tumor types and employing a high-recovery semi-targeted bisulfite workflow, PanSeer achieved high sensitivity and specificity in both post-diagnosis and pre-diagnosis cohorts. The robustness across tumor types and stages suggests that epigenetic dysregulation is an early, shared hallmark exploitable for screening independent of tissue-of-origin. In a potential clinical pathway, PanSeer could serve as a first-line screen; positives could then receive reflex tests for localization and imaging, followed by confirmatory pathology. While encouraging, the retrospective nature of pre-diagnostic detection warrants validation in large prospective studies to confirm real-world performance, impact on outcomes, and cost-effectiveness.
Conclusion
PanSeer provides a preliminary proof-of-concept that multi-cancer early detection is feasible years before standard diagnosis using cfDNA methylation profiling and machine learning. The assay achieved high sensitivity and specificity across five common cancers and detected the majority of cases up to four years pre-diagnosis, using a compact panel and modest plasma input. Future work should include large-scale prospective trials to verify clinical utility, optimize workflows (e.g., larger blood volumes, improved plasma preservation), assess generalizability across populations and cancer spectra, and integrate reflex testing for tissue localization, with the ultimate goal of reducing cancer mortality and healthcare costs through earlier detection.
Limitations
- Retrospective analysis within a longitudinal cohort; the study enriched/matched cancer and healthy samples for model development, so the impact on patient outcomes and population-level performance remains unproven and requires prospective validation.
- Sample handling constraints from the historical TZL biobank: only ~1 mL plasma available per sample and variability in preservation contributed to genomic DNA contamination and higher sample failure rates, potentially limiting sensitivity.
- Cancer spectrum in the TZL cohort may not match the broader Chinese population due to local environmental, lifestyle, or genetic factors, affecting generalizability.
- Limited number of pre-diagnosis cases and potential selection biases in included patient characteristics.
- Input DNA quantity and quality limited; performance may improve with larger blood draws and optimized pre-analytical workflows.
- The article reports differing counts in some cohorts across sections (e.g., abstract vs main text), reflecting sample availability and QC filtering; precise real-world performance estimates require standardized prospective sampling.