Terminal modifications independent cell-free RNA sequencing enables sensitive early cancer detection and classification
J. Wang, J. Huang, et al.
SLiPiR-seq is a cell-free RNA (cfRNA) sequencing method developed by Jun Wang and colleagues that detects early-stage tumor signals with high sensitivity and robustness. The approach shows potential for cancer detection and classification using cfRNA biomarkers.
Introduction
The study addresses the need for sensitive, comprehensive profiling of plasma cell-free RNAs (cfRNAs) to enable early cancer detection and classification. While liquid biopsy assays based on cell-free DNA achieve high specificity, adding transcriptomic information could improve sensitivity, especially in early-stage disease when circulating tumor DNA may be scarce. Prior cfRNA investigations have focused largely on microRNAs, which have limited tissue or disease specificity, whereas messenger RNAs can be tissue-, subtype-, and cell-of-origin specific. However, cfRNA profiling faces two key barriers: low recovery, which necessitates large plasma volumes, and severe fragmentation, which produces diverse terminal modifications that confound conventional adapter ligation methods able to capture only fragments with 5′-phosphate and 3′-hydroxyl ends. The purpose of this study is to develop and validate SLiPiR-seq, a phosphate-independent cfRNA sequencing method that works with small plasma volumes, to generate a broader, more accurate cfRNA landscape and to assess its clinical utility for early cancer detection and multi-cancer classification.
Literature Review
Previous work on cfRNAs has concentrated on miRNAs, which, despite clinical relevance, show limited tissue/disease specificity. More recently, plasma mRNAs have been shown to exhibit tissue and subtype specificity and utility in distinguishing pre-malignant conditions and neurologic pathology, and cfRNA signatures have shown value in obstetrics (e.g., preeclampsia prediction). Nonetheless, miRNAs and mRNAs represent a small fraction of the cfRNA repertoire. Existing small RNA library methods (e.g., adapter ligation kits such as NEBNext) generally require 5′-phosphate and 3′-OH termini and often large plasma inputs, limiting comprehensive profiling. Several protocol optimizations (e.g., splint ligation approaches, dephosphorylation/remodeling methods like PANDORA-seq and Phospho-RNA-seq) improved aspects of small RNA detection but lacked systematic case-control clinical validation for early cancer detection. This study builds on splint ligation strategies by performing reverse transcription before ligation, which enables 5′-phosphate independence, and couples the method to comprehensive clinical assessment across multiple cfRNA classes.
Methodology
Technology and protocol: The authors developed Splint Ligation and Phosphate-independent RNA Sequencing (SLiPiR-seq), based on an optimized S-Poly(T) Plus approach. Steps include: (1) one-step 3′ polyadenylation and reverse transcription with a custom RT primer containing oligo(dT), an 8-nt sample barcode, and sequencing adapter; (2) exonuclease I treatment to deplete excess RT primer; (3) splint ligation of a double-stranded DNA adapter with a 3′ overhanging degenerate sequence to the 3′ end of cDNA (corresponding to RNA 5′ end); (4) removal of the adapter’s blocking strand using USER enzyme; (5) library PCR amplification and single-tube workflow to minimize losses; (6) Illumina NovaSeq 6000 sequencing with at least 10 million paired-end reads per sample.
Optimization and robustness: Input plasma volumes from 12.5 to 400 µl were benchmarked, establishing 100 µl as the lower limit for reliable results. Pre-analytical variables were also evaluated (blood standing for 3, 6, or 9 h before processing; repeated freeze-thaw cycles). cfRNA recovery was stable across 3–9 h of standing and after one freeze-thaw cycle, whereas informative reads decreased after multiple cycles.
Technology assessment: Compared SLiPiR-seq to quantitative PCR for selected cfRNAs (high concordance) and to NEBNext small RNA library prep using synthetic RNAs with/without 5′-phosphate to demonstrate 5′-end phosphorylation independence.
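The concordance check against qPCR can be illustrated with a plain Pearson correlation. This is a minimal sketch, not the authors' analysis code: the function name `pearson_r` is hypothetical, and the summary does not state which scale (for example log-transformed RPM versus Ct-derived values) was compared, so the inputs are simply two paired lists of numbers.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two paired numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of squared deviations and of cross-products around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)
```

Called with per-transcript sequencing estimates and the matching qPCR estimates, it returns a value between -1 and 1; values near 1 indicate that the two assays rank and scale transcripts similarly.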
Sample cohorts: Clinical feasibility tests included 165 lung cancer (LC), 30 breast cancer (BRC), 37 colorectal cancer (CRC), 55 gastric cancer (GC), 15 hepatocellular carcinoma (HCC), and 133 cancer-free donors across multiple hospitals. For LC detection, the discovery cohort (N=245: NOR_SZBA and LC_SZDE) was repeatedly partitioned 80/20 into training and test sets (100 times); an independent validation cohort (N=53: NOR_SZDW and LC_SZBU) was held out for unbiased validation.
Library prep details: PolyA/RT at 37°C for 30 min with RT primer (barcode+oligo(dT)); ExoI cleanup; denaturation; splint ligation with T4 DNA ligase, PEG, ATP, degenerate adapter; heat steps; USER digestion; PCR with KAPA HiFi; bead cleanup; pooling.
Bioinformatics pipeline: R1 reads were trimmed (adapters/barcodes/polyA) with Cutadapt/Trimmomatic; barcodes were extracted from R2 reads, which were then discarded due to low quality. Samples were demultiplexed by custom scripts, and processing was parallelized with GNU Parallel. Read calling: miRNA (miRBase) and piRNA (piRNABank) via Bowtie2; lncRNA/mRNA/snRNA/snoRNA via Bowtie2 to GRCh38 and featureCounts (GENCODE v41); tsRNA via MINTbase reference; rsRNA and ysRNA via custom references built from human rRNA (28S, 18S, 5.8S, 5S) and Y RNAs (RNY1/3/4/5), using perfect-match counting. Length filters applied to reduce misclassification (e.g., <19 nt for miRNA; <23 nt for mRNA/lncRNA). RPM normalization and QC metrics were computed, and exclusion criteria were applied (clean read ratio <20%, clean reads <2M, rsRNA ratio >30%, lncRNA+mRNA ratio >30%).
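The exclusion criteria and RPM normalization described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' pipeline: the names `SampleQC`, `passes_qc`, and `rpm` are hypothetical, the clean read ratio is assumed to be clean reads over raw reads, the rsRNA and lncRNA+mRNA ratios are assumed to be fractions of clean reads, and the RPM denominator is assumed to be the sample's total assigned reads.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SampleQC:
    raw_reads: int          # reads before trimming
    clean_reads: int        # reads surviving trimming
    rsrna_reads: int        # reads assigned to rRNA-derived small RNAs
    lncrna_mrna_reads: int  # reads assigned to lncRNA or mRNA


def passes_qc(sample: SampleQC) -> bool:
    """Return True if the sample survives all four exclusion criteria."""
    if sample.raw_reads <= 0 or sample.clean_reads <= 0:
        return False
    # ASSUMPTION: denominators below are not specified in the summary.
    clean_ratio = sample.clean_reads / sample.raw_reads
    rsrna_ratio = sample.rsrna_reads / sample.clean_reads
    long_rna_ratio = sample.lncrna_mrna_reads / sample.clean_reads
    return (
        clean_ratio >= 0.20                  # exclude if clean read ratio < 20%
        and sample.clean_reads >= 2_000_000  # exclude if clean reads < 2M
        and rsrna_ratio <= 0.30              # exclude if rsRNA ratio > 30%
        and long_rna_ratio <= 0.30           # exclude if lncRNA+mRNA ratio > 30%
    )


def rpm(counts: Dict[str, int], total_reads: int) -> Dict[str, float]:
    """Scale raw counts to reads per million of total_reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return {name: count * 1_000_000 / total_reads for name, count in counts.items()}
```

Keeping the thresholds in one function makes it easy to audit which samples were dropped and why; a real pipeline would likely also log the failing criterion for each excluded sample.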
Differential expression: DESeq2 negative binomial modeling with BH-FDR control (0.1) was applied to LC vs controls and to one-vs-rest comparisons for each cancer type. Candidate selection criteria included upregulation, mean counts >10, and |log2FC|>0.8.
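To make the selection step concrete, the sketch below applies a Benjamini-Hochberg adjustment and then the candidate criteria listed above. It does not reproduce DESeq2, which fits the negative binomial model and reports adjusted p-values itself; the function names and the record keys (`name`, `pvalue`, `base_mean`, `log2fc`) are hypothetical, chosen only for illustration.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def select_candidates(results, fdr=0.1, min_mean=10.0, min_log2fc=0.8):
    """Keep upregulated features passing FDR, mean-count, and effect-size cutoffs.

    `results` is a list of dicts with hypothetical keys:
    name, pvalue, base_mean, log2fc.
    """
    padj = bh_adjust([r["pvalue"] for r in results])
    return [
        r["name"]
        for r, q in zip(results, padj)
        if q < fdr and r["base_mean"] > min_mean and r["log2fc"] > min_log2fc
    ]
```

Because candidates are required to be upregulated, the effect-size filter is written as `log2fc > 0.8` rather than as an absolute value.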
Feature selection and machine learning: Three strategies were applied per RNA type: Top N filtering by significance; Boruta (RF-based importance); and LASSO logistic regression (glmnet). Selected features were verified with Ridge-regularized LR, RF, and linear-kernel SVM across 100 random train-test partitions. Performance was reported as AUC and risk scores. Combined panels were tested across RNA types (mRNA, miRNA, snRNA, snoRNA, tsRNA), and top-performing combinations were prioritized for specificity in screening contexts.
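The repeated-partition evaluation can be sketched in plain Python. This is a minimal illustration, not the study's code: the models actually used were Ridge LR, RF, and linear SVM, whereas the `centroid_fit` scorer here is a deliberately simple stand-in, and all function names, the unstratified shuffling, and the seed are assumptions.

```python
import random
from statistics import median


def auc(labels, scores):
    """AUC as the probability that a case outscores a control (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def centroid_fit(X_train, y_train):
    """Toy stand-in model: score = projection onto the class-mean difference."""
    def mean_vec(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    mu_case = mean_vec([x for x, y in zip(X_train, y_train) if y == 1])
    mu_ctrl = mean_vec([x for x, y in zip(X_train, y_train) if y == 0])
    weights = [a - b for a, b in zip(mu_case, mu_ctrl)]
    return lambda x: sum(w * v for w, v in zip(weights, x))


def repeated_holdout_auc(X, y, fit, n_repeats=100, test_frac=0.2, seed=0):
    """Median test AUC over repeated random train/test partitions."""
    rng = random.Random(seed)
    indices = list(range(len(y)))
    n_test = max(1, round(test_frac * len(y)))
    aucs = []
    for _ in range(n_repeats):
        rng.shuffle(indices)
        test, train = indices[:n_test], indices[n_test:]
        score = fit([X[i] for i in train], [y[i] for i in train])
        y_test = [y[i] for i in test]
        if len(set(y_test)) < 2:
            continue  # AUC is undefined when a split holds a single class
        aucs.append(auc(y_test, [score(X[i]) for i in test]))
    return median(aucs)
```

Reporting the median over many partitions, rather than a single split, reduces the dependence of the estimate on one lucky or unlucky division of a modest-sized cohort. Any callable with the same `fit(X_train, y_train) -> scorer` shape can replace `centroid_fit`.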
Additional analyses: Fragment length distributions and gene body coverage compared between SLiPiR-seq and NEBNext; cross-analysis of tissue-elevated genes (Human Protein Atlas) intersected with detected mRNAs; construction of tsRNA/rsRNA/ysRNA references and coverage profiling; t-SNE visualization and one-vs-rest classification panels for cancer typing.
Key Findings
- SLiPiR-seq sensitivity and robustness: Reliable with as little as 100 µl plasma input. High concordance with qPCR quantification (Pearson R=0.86). Pre-analytical assessment showed stable cfRNA recovery up to 9 h pre-processing and after one freeze-thaw; informative reads declined with additional freeze-thaw cycles (p<0.001).
- 5′-phosphate independence: Using synthetic RNAs with and without 5′-P, SLiPiR-seq produced libraries regardless of 5′-end phosphorylation, whereas NEBNext failed without 5′-P. Sequencing read counts by SLiPiR-seq showed no difference between 5′-P vs non-5′-P groups given equal input.
- Expanded transcriptome coverage vs NEBNext: With 20M clean reads, SLiPiR-seq detected 65,204 total RNA species vs 17,696 for NEBNext (3.68×). Detected mRNAs: 17,932 (4.37× NEBNext); lncRNAs: 12,236 (14.34× NEBNext). Higher proportions of mRNA (7.9% vs 1.9%) and lncRNA (8.4% vs 0.7%). Many detected mRNAs overlapped tissue-elevated genes from Human Protein Atlas. Fragment size distributions for mRNA/lncRNA were broader and smoother with SLiPiR-seq (e.g., mRNA Gamma fit α=10.22, β=0.24 vs NEBNext α=3.32, β=0.10). miRNA expression correlated strongly between methods (R=0.938).
- Discovery of underexplored small RNAs: SLiPiR-seq revealed abundant tsRNAs (14.4%), rsRNAs (6.0%), and ysRNAs (9.6%). Custom rsRNA and ysRNA references identified 45,397 rsRNA and 2,664 ysRNA unique sequences. SLiPiR-seq captured higher depth and 3′-end coverage (e.g., 3′-tRHs and 3′-tRFs), revealing signals undetectable by NEBNext (e.g., 3′ ends of 5.8S rRNA, RNY4, RNY5).
- Lung cancer (LC) differential expression: Transcriptome-wide LC vs control correlation was high (R=0.987), indicating reproducibility. Identified 17,622 DE cfRNAs (11,550 up, 6,072 down; BH-FDR<0.1, |log2FC|>0.8), dominated by rsRNAs (73.4%), tsRNAs (8.3%), and piRNAs (8.2%). Cumulative DE cfRNA expression was elevated in early-stage LC vs controls (p=7.10×10^-3) with no difference early vs late stage (p=0.321), suggesting early detectability.
- LC detection models (validation cohort N=53, 26 cases/27 controls): Among single RNA types, median AUCs included miRNA (LR AUC=0.905 [IQR 0.895–0.912]), snRNA (SVM AUC=0.903 [0.893–0.911]), mRNA (LR AUC=0.846 [0.823–0.860]), snoRNA (LR AUC=0.788 [0.772–0.798]), tsRNA (SVM AUC=0.741 [0.721–0.765]). rsRNA (SVM AUC=0.819 [0.786–0.843]) and ysRNA (SVM AUC=0.793 [0.747–0.829]) also performed well, although RF models did not consistently agree for these two types.
- Combined panels improved accuracy: Top combinations (median AUC in test and validation) were "mi+sn+sno" (AUC=0.979), "m+mi+sn+sno" (AUC=0.970), and "mi+sn+sno+ts" (AUC=0.970). Emphasizing specificity for screening, the "m+sn+sno+ts" panel achieved 100% specificity and 99.28% sensitivity in the discovery cohort, and 95.24% specificity with 76.92% sensitivity in the validation cohort.
- Pan-cancer classification: Identified type-specific panels by LASSO: BRC (21 cfRNAs), CRC (33), GC (36), HCC (33), LC (30). t-SNE showed clear separation by cancer type. A set of 65 cfRNAs was commonly upregulated across all five cancers. One-vs-rest models achieved high AUCs in held-out tests. Validation cohort LC patients showed high cancer and LC-specific risk scores while remaining low in non-corresponding panels, and cancer-free individuals had low scores in all panels (with some false positives mitigated by a proposed two-step confirmation strategy).
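Sensitivity and specificity figures such as those quoted for the screening-oriented panel come from thresholding per-sample risk scores. The sketch below shows that calculation; the function name, the `>=` decision rule, and the threshold value in the usage lines are assumptions for illustration, and the example labels and scores are made up rather than taken from the study.

```python
def sensitivity_specificity(labels, scores, threshold):
    """Return (sensitivity, specificity) when score >= threshold means 'cancer'."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both cases and controls are required")
    return tp / (tp + fn), tn / (tn + fp)


# Illustrative inputs only: 1 = cancer, 0 = cancer-free.
example_labels = [1, 1, 1, 0, 0]
example_scores = [0.9, 0.7, 0.3, 0.4, 0.1]
sens, spec = sensitivity_specificity(example_labels, example_scores, threshold=0.5)
```

Raising the threshold trades sensitivity for specificity, which is why a panel tuned for screening can report very high specificity with lower sensitivity on an independent cohort.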
Discussion
The research question was whether a phosphate-independent, low-input cfRNA sequencing method could robustly profile the plasma transcriptome and enable sensitive early cancer detection and accurate cancer classification. SLiPiR-seq addresses technical barriers by decoupling profiling from 5′-end phosphorylation status and reducing input volume requirements. It outperformed an adapter ligation-based method (NEBNext) by detecting substantially more RNA species, providing broader mRNA/lncRNA coverage, and revealing underexplored small RNAs (tsRNA, rsRNA, ysRNA). The strong concordance with qPCR and the close agreement of miRNA quantification between the two library methods support its accuracy.
In case-control studies, SLiPiR-seq detected widespread cfRNA alterations in lung cancer, including early-stage patients, indicating that transcriptomic dysregulation is observable in plasma early in tumorigenesis. Machine-learning models trained on DE cfRNA signatures demonstrated robust performance for early-stage LC detection and multi-cancer classification, with combinations of RNA types outperforming single-type models. Panels prioritizing specificity (notably those including tsRNAs) are promising for screening to reduce false positives. The results suggest cfRNA signatures can complement cfDNA-based approaches, potentially increasing sensitivity in early disease.
These findings demonstrate that comprehensive cfRNA profiling is feasible and informative for oncology applications and motivate larger, prospective studies to validate clinical utility, refine feature panels, and integrate cfRNA with other liquid biopsy analytes.
Conclusion
The study introduces SLiPiR-seq, a sensitive, 5′-phosphate-independent cfRNA sequencing method operable with 100 µl plasma that substantially expands detectable cfRNA species compared to adapter ligation, accurately reflects transcript abundance, and uncovers underexplored small RNAs. Using SLiPiR-seq, the authors identified extensive DE cfRNA signatures in lung cancer detectable at early stages and developed machine-learning models that achieved high accuracy, particularly when combining multiple RNA classes. They further established cancer type-specific panels that discriminated among five cancers and a common cancer panel.
Future directions include: (1) prospective, large-scale validation across diverse populations to determine real-world screening performance; (2) improved handling of RNA modifications to reduce RT truncation and enhance tsRNA annotation; (3) expansion to additional cancer types and benign conditions; (4) integration of cfRNA with cfDNA/protein markers; and (5) standardized pre-analytic protocols for clinical adoption.
Limitations
- Biochemical constraints: Requires a 3′-OH for polyadenylation; RNA modifications can hinder reverse transcription and cause truncated reads, complicating accurate annotation (notably for tsRNAs). Circular RNAs are not captured due to design tailored to linear RNAs.
- Study design: Retrospective case-control design; the reported accuracy may overestimate performance in general population screening.
- Cohort and confounding: Cases and controls collected at different sites with imperfect age/sex matching; limited clinical covariates (e.g., smoking) prevented confounder analyses.
- Sample size: Pan-cancer cohorts were relatively small, limiting comprehensive classification assessment.
- Pre-analytics: Informative reads decrease with multiple freeze-thaw cycles; standardized handling is crucial.

