InPACT: a computational method for accurate characterization of intronic polyadenylation from RNA sequencing data

X. Liu, H. Chen, et al.

InPACT is a computational method that characterizes intronic polyadenylation (IPA) from RNA-seq data, developed by Xiaochuan Liu, Hao Chen, Zekun Li, and colleagues. It uncovers numerous unannotated IPA transcripts, highlights temporally coordinated IPA events during monocyte activation, and identifies context-specific IPA isoforms in fetal bone marrow single-cell RNA-seq data.

Introduction
Intronic polyadenylation (IPA), a form of alternative polyadenylation (APA) occurring within introns, generates alternative last exon isoforms that can produce noncoding RNAs or truncated proteins lacking C-terminal domains. IPA is implicated in immune regulation, development, and cancer, with examples including tumor suppressor truncations (e.g., DICER, FOXN3, MGA) and altered subcellular RNA localization. Despite its importance, comprehensive IPA annotation and analysis remain limited due to the scarcity of 3'-end sequencing datasets. Conventional RNA-seq data are abundant, but existing tools either quantify APA without discovering novel IPA isoforms or focus on 3'UTR APA. The objective of the study is to develop an accurate, sample-wise computational approach that identifies and quantifies IPA events from standard RNA-seq, reconstructs IPA isoforms (distinguishing skipped and composite events), and assesses their biological relevance across bulk and single-cell contexts.
Literature Review
Specialized 3'-end sequencing methods (A-seq, 3P-seq, 3′READS, PAS-seq, PolyA-seq) capture polyA sites but are not widely available. Tools such as MISO, QAPA, and LABRAT quantify APA but cannot discover novel IPA isoforms. DaPars, Aptardi, APAtrap, and TAPAS infer APA sites from RNA-seq but are limited to 3'UTR APA. IPAFinder identifies IPA from read coverage changepoints but requires high depth and may yield imprecise sites. APAIQ combines sequence and coverage to predict APA sites, including some IPA, but cannot assemble IPA isoforms or distinguish skipped versus composite IPA events. Deep-learning polyA predictors (DeepPASS, APARENT, DeepPASTA) model sequence features but are not tailored for IPA discovery in RNA-seq. This methodological landscape motivates a tool that integrates sequence priors with read-alignment features to sensitively and precisely identify and quantify IPA isoforms.
Methodology
InPACT consists of two modules that integrate sequence-based prediction with RNA-seq alignment features to identify IPA sites and reconstruct isoforms.
- Sequence module: A convolutional neural network (CNN) scans intronic regions using 201 nt windows centered on candidate sites, one-hot encoded (4×201). The model is trained on RefSeq polyA sites versus matched negatives and achieves AUROC 0.954 on GENCODE, 0.920 on PolyA_DB 3, and 0.794 on PolyASite 2.0, with performance comparable or superior to DeepPASS, APARENT, and DeepPASTA. The trained model scans non-overlapping introns to yield candidate IPA sites.
- Read module: Using candidate sites and sample-specific RNA-seq BAM files, InPACT constructs putative intronic terminal exons and trains classifiers to validate them. Two event types are handled: composite terminal exons (from upstream donor to intronic polyA site) and skipped terminal exons (novel exon ending at intronic polyA with a new acceptor). Putative skipped exons require at least five uniquely mapped spliced reads with 3' ends in the candidate region; composite exons require at least ten uniquely mapped unspliced reads crossing the closest upstream splice site. Features capture spliced/unspliced read patterns at 5' and 3' boundaries, region length, normalized expression, variability, and entropy metrics. Random forest classifiers are trained per sample on annotated terminal exons, internal exons, and background regions (80/20 split; ensemble of 10 subsamples). On HEK293 replicates, AUROC is ≈0.998–0.999 for skipped and ≈0.997–0.998 for composite terminal exons.
- Isoform assembly and quantification: InPACT assembles novel IPA isoforms by augmenting reference annotations, annotates putative coding sequences by locating the first in-frame stop codon, and outputs a GTF. Transcript abundances are quantified with Salmon, and IPA usage is computed as the isoform's TPM divided by the sum of TPMs for all gene isoforms.
- Validation and benchmarking: Authenticity is assessed by enrichment of canonical polyA signals (e.g., AAUAAA), conservation (PhyloP), 3'-RACE validation, and ribosome profiling (footprint density around stop codons and translational efficiency). Comparative benchmarks use matched A-seq, 3P-seq, PolyA-seq, long-read Iso-seq, and simulated RNA-seq (10×–50× coverage). Applications include analysis of LPS-activated monocytes (differential transcript usage via DRIMSeq) and full-length SMART-seq2 scRNA-seq from human fetal bone marrow with a pooling strategy to discover events and estimate single-cell IPA usage.
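Two of the steps described above are simple enough to illustrate directly: the 4×201 one-hot encoding of a window centered on a candidate site, and the IPA usage ratio (isoform TPM over the summed TPM of all isoforms of the gene). The following is a minimal Python sketch and not InPACT's actual code. The function names, the A/C/G/T row order, and the choice to leave out-of-range or ambiguous bases as all-zero columns are assumptions made for illustration.

```python
from typing import List, Mapping

BASES = "ACGT"   # row order of the one-hot matrix (assumed for this sketch)
WINDOW = 201     # window length stated in the text; the site is the middle column


def one_hot_window(seq: str, center: int, window: int = WINDOW) -> List[List[int]]:
    """Return a 4 x window one-hot matrix centered on `center` (0-based).

    Columns that fall outside `seq`, or that hold a non-ACGT base such as N,
    are left all-zero. That padding rule is an assumption of this sketch.
    """
    half = window // 2
    matrix = [[0] * window for _ in BASES]
    for col, pos in enumerate(range(center - half, center + half + 1)):
        if 0 <= pos < len(seq):
            row = BASES.find(seq[pos].upper())
            if row >= 0:
                matrix[row][col] = 1
    return matrix


def ipa_usage(tpm_by_isoform: Mapping[str, float], ipa_isoform: str) -> float:
    """IPA usage: TPM of the IPA isoform divided by the gene's total TPM."""
    total = sum(tpm_by_isoform.values())
    if total == 0:
        return float("nan")  # gene not expressed, so usage is undefined
    return tpm_by_isoform[ipa_isoform] / total
```

For example, a gene with a reference isoform at 30 TPM and an IPA isoform at 10 TPM gives `ipa_usage({"ref": 30.0, "ipa": 10.0}, "ipa")` equal to 0.25. In practice the TPM values would come from a quantifier such as Salmon, as described above.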
Key Findings
- Discovery and validation in HEK293: In two RNA-seq replicates, InPACT identified 471 (319 skipped, 152 composite) and 393 (287 skipped, 106 composite) novel IPA sites, respectively, with 218 overlapping (182 skipped, 36 composite). 3'-RACE validated all 15 tested sites, 10 within 10 nt and 5 within ~40 nt of predictions. Identified IPA sites showed strong enrichment of canonical polyA signals (AAUAAA most frequent, followed by AUUAAA), expected nucleotide profiles, shorter terminal exon lengths than annotated ones, and moderate evolutionary conservation. Expression of novel IPA isoforms was reproducible between replicates (Pearson r=0.94); annotated isoforms showed r=0.98.
- Translation evidence: Ribo-seq footprints peaked around stop codons for both annotated and InPACT IPA isoforms. Intronic terminal exons had higher translational efficiency than intronic background and slightly lower than annotated terminal exons. Examples include translated IPA isoforms for PIGL (skipped) and CHRNA5 (composite).
- U1 telescripting perturbation: In HeLa cells, U1 AMO treatment yielded 767 novel IPA events vs 151 in control, with significantly increased IPA usage (Wilcoxon P<2.2e-16), consistent with telescripting biology.
- Benchmarking against APAIQ and IPAFinder:
  • A-seq and 3P-seq ground truths (HEK293): >60% of InPACT sites within 50 nt of ground truth vs ~30% for APAIQ and ~20% for IPAFinder; among true positives, >80% of InPACT sites were within <10 nt of ground truth.
  • Long-read Iso-seq (human small airway epithelial cells): ~20% of InPACT novel IPA isoforms validated vs ~2% APAIQ and ~5% IPAFinder; InPACT had more precise genomic positions relative to Iso-seq sites.
  • Simulated RNA-seq (10×–50×): InPACT achieved higher sensitivity and precision than APAIQ and IPAFinder across depths, identifying ~90% of IPA sites at 50× coverage and yielding lower error in IPA usage quantification.
- Monocyte activation: Across untreated and LPS-activated monocytes, 2,977 novel IPA sites were found (1,105 skipped; 1,872 composite). PCA of IPA usage separated conditions; 204 IPA events were significantly differential (DRIMSeq). Enriched GO terms included neutrophil degranulation/activation, defense response, and innate immune response. Changes in IPA had weak correlation with gene expression changes (Spearman rho≈-0.079, P=0.0049), suggesting largely independent regulation. ARHGAP24 IPA usage increased with LPS; the truncated isoform lacks the RhoGAP domain. 3'-RACE validated IPA in ARHGAP24, RALA, and PDCD6IP; SDHD did not validate, likely due to low expression.
- Single-cell FBM (SMART-seq2): Identified 2,635 novel IPA events in 2,157 genes (599 skipped; 2,076 composite). Estimated per-cell IPA usage revealed 533 cell type-specific IPA events with GO enrichments matching cell functions (e.g., vesicle trafficking and IL-6 signaling in B cells). SCARB2 showed higher IPA usage in plasmacytoid dendritic cells, consistent with roles in IFN production.
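The positional benchmark above reports the share of predicted sites that fall within a fixed distance (50 nt or 10 nt) of a ground-truth polyA site. A minimal Python sketch of that kind of metric follows. It is not the authors' evaluation code: the function names are hypothetical, and it assumes all coordinates are on the same chromosome and strand and that the distance cutoff is inclusive.

```python
from bisect import bisect_left
from typing import List, Sequence


def nearest_distance(pos: int, sorted_truth: List[int]) -> int:
    """Distance from `pos` to the closest entry of a non-empty sorted list."""
    i = bisect_left(sorted_truth, pos)
    # Only the neighbours on either side of the insertion point can be closest.
    neighbours = sorted_truth[max(i - 1, 0): i + 1]
    return min(abs(pos - t) for t in neighbours)


def fraction_within(predicted: Sequence[int], truth: Sequence[int],
                    max_dist: int = 50) -> float:
    """Fraction of predicted sites lying within `max_dist` nt of any true site."""
    if not predicted or not truth:
        return 0.0
    sorted_truth = sorted(truth)
    hits = sum(nearest_distance(p, sorted_truth) <= max_dist for p in predicted)
    return hits / len(predicted)
```

With hypothetical predicted sites `[100, 540, 1000]` and true sites `[105, 580, 2000]`, the nearest distances are 5, 40, and 420 nt, so two of three predictions count at the 50 nt cutoff and one of three at 10 nt. A real comparison would apply this per chromosome and strand.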
Discussion
The study addresses the need for accurate IPA characterization from standard RNA-seq by introducing InPACT, which integrates sequence-informed candidate discovery with sample-specific read-alignment features to reconstruct and quantify IPA isoforms. Multiple orthogonal validations (motif enrichment, conservation, 3'-RACE, ribosome profiling) support the authenticity and translational competence of identified isoforms. Systematic benchmarking across diverse datasets demonstrates superior positional accuracy, sensitivity, precision, and quantification accuracy relative to APAIQ and IPAFinder. Biological applications reveal dynamic IPA regulation during monocyte activation, affecting immune-related pathways largely independently of overall gene expression, and highlight cell type-specific IPA programs in human fetal bone marrow at single-cell resolution. These results underscore IPA as a pervasive regulatory layer shaping protein domains and RNA localization, and establish InPACT as a robust approach for IPA discovery and analysis in bulk and single-cell transcriptomics.
Conclusion
InPACT enables precise, sample-wise identification, reconstruction, and quantification of intronic polyadenylation from conventional RNA-seq by combining a CNN-based sequence module with a read-informed classification module. It outperforms existing tools in site accuracy and quantification, reveals translated IPA isoforms, and uncovers dynamic and cell type-specific IPA regulation in immune contexts. The method outputs augmented annotations suitable for downstream analyses and is applicable to full-length scRNA-seq. Future work includes scaling to large transcriptomic resources (e.g., TCGA, GTEx, HCA) to build a comprehensive atlas of IPA across tissues, conditions, and cell types, and integrating sample-specific genomes to further improve precision.
Limitations
InPACT currently scans candidate sites on a common reference genome and may overlook sample-specific genomic variants unless user-supplied genomes are provided. Performance depends on RNA-seq data quality; low RNA integrity and insufficient read coverage can reduce accuracy for identifying and quantifying novel IPA events. For single-cell applications, InPACT requires full-length protocols (e.g., SMART-seq2); 3' tag-based data are better handled by specialized polyA site methods and cannot be used to assemble IPA isoforms.