InPACT: a computational method for accurate characterization of intronic polyadenylation from RNA sequencing data

X. Liu, H. Chen, et al.

InPACT is a computational method for accurately characterizing intronic polyadenylation (IPA) from RNA-seq data, developed by Xiaochuan Liu, Hao Chen, Zekun Li, and colleagues. The method uncovers numerous unannotated IPA transcripts, reveals temporally coordinated IPA events during monocyte activation, and identifies context-specific IPA isoforms in single-cell RNA-seq data from human fetal bone marrow.

Introduction

Alternative polyadenylation (APA) is a post-transcriptional regulatory mechanism that generates multiple RNA transcripts from a single gene through the use of different polyadenylation (polyA) sites. APA is implicated in diverse biological processes, including immune responses, stem cell differentiation, and cancer progression. Intronic polyadenylation (IPA) is a type of APA in which the polyA site lies within an intron, producing non-coding transcripts or transcripts with truncated coding regions. IPA events are classified as composite, in which an internal exon is extended into the intron and becomes a 3' terminal exon, or skipped, in which an intronic 3' terminal exon that would otherwise be skipped is used. The cell-type specificity of IPA is illustrated by the immunoglobulin M heavy chain (IGHM) locus: mature B cells produce full-length IGHM, whereas plasma cells produce an IPA isoform lacking the transmembrane domain, which yields secreted IgM antibodies. In B-cell leukemia, aberrant IPA generates truncated proteins that inactivate tumor suppressor genes, and IPA is also elevated in solid tumors. IPA further alters 3'UTR content, which influences RNA subcellular localization. Despite this significance, IPA remains incompletely annotated in the genome and its biological roles are not fully understood. Several high-throughput sequencing techniques and computational tools exist for APA analysis, but they either cannot identify novel IPA isoforms or lack precision. This study addresses the gap by introducing InPACT.
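To make the composite/skipped distinction concrete, here is a minimal Python sketch. It is an illustrative simplification, not InPACT's own classification logic: it assumes a plus-strand gene, half-open 0-based coordinates, and that the candidate terminal exon's end coordinate is the intronic polyA site. The names `Interval` and `classify_ipa_terminal_exon` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Half-open genomic interval [start, end) on the plus strand."""
    start: int
    end: int


def classify_ipa_terminal_exon(upstream: Interval, downstream: Interval,
                               terminal: Interval) -> str:
    """Label an intronic terminal exon as 'composite' or 'skipped'.

    Hypothetical simplification: plus strand only, the intron is the gap
    between `upstream` and `downstream`, and `terminal.end` is the polyA site.
    """
    intron = Interval(upstream.end, downstream.start)
    # The polyA site must fall inside the intron for this to be an IPA event.
    if not (intron.start < terminal.end <= intron.end):
        return "not_intronic"
    # Composite: the upstream internal exon reads through its 5' splice site
    # into the intron, so the terminal exon shares that exon's start.
    if terminal.start == upstream.start:
        return "composite"
    # Skipped: a new exon lying wholly inside the intron, entered through its
    # own 3' splice site; the full-length isoform splices over it.
    if terminal.start > intron.start:
        return "skipped"
    return "unclassified"


if __name__ == "__main__":
    upstream, downstream = Interval(100, 200), Interval(1000, 1100)
    print(classify_ipa_terminal_exon(upstream, downstream, Interval(100, 450)))  # composite
    print(classify_ipa_terminal_exon(upstream, downstream, Interval(500, 650)))  # skipped
```

The rule keys only on the terminal exon's 5' boundary, which mirrors the summary's definitions; real data would also need strand handling and tolerance for imprecise exon boundaries.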

Literature Review

Existing methods for analyzing APA from RNA-seq data, such as MISO, QAPA, and LABRAT, cannot identify novel IPA isoforms, and methods such as DaPars, Aptardi, APAtrap, and TAPAS focus solely on 3'UTR APA. Two tools have been proposed specifically for IPA analysis: IPAFinder relies on fluctuations in read coverage and therefore requires high sequencing depth, while APAIQ leverages both DNA sequence and RNA-seq read coverage but cannot assemble IPA isoforms or distinguish skipped from composite events. The lack of a comprehensive and accurate method for identifying and quantifying IPA from conventional RNA-seq data has hindered a complete understanding of its biological role.

Methodology

InPACT (Intronic PolyAdenylation Characterization Tool) combines two modules: a sequence module and a read module. The sequence module uses a convolutional neural network (CNN) to identify potential polyA sites within introns. The CNN is trained on annotated polyA sites from RefSeq, GENCODE, PolyA_DB 3, and PolyASite 2.0 and shows high accuracy compared with existing deep learning models (DeepPASS, APARENT, and DeepPASTA). The read module then applies a sample-specific classifier, trained on features derived from RNA-seq read alignments (spliced and unspliced reads around the 5' and 3' boundaries of candidate exons), to identify novel terminal exons in introns. The classifier distinguishes terminal exons from internal exons and background regions, and separate classifiers are trained for skipped and composite terminal exons. Finally, InPACT reconstructs IPA isoforms and quantifies their expression with Salmon.

Predicted IPA sites were validated by 3'-RACE, and ribosome profiling data were used to assess the translation efficiency of IPA isoforms. InPACT was benchmarked against IPAFinder and APAIQ using 3'-end sequencing data (A-seq, 3P-seq, and polyA-seq), long-read Iso-seq data, and simulated RNA-seq data. The evaluation covered polyA signal enrichment, nucleotide profiles, conservation scores (PhyloP), and metrics including sensitivity, precision, and error rate.
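Site-level sensitivity and precision can be illustrated with a tolerance-window matching sketch in Python. This is an assumption about how such metrics are commonly computed, not the paper's evaluation code: a reference site counts as recovered if any predicted site on the same chromosome and strand lies within `window` nt, and a predicted site counts as correct if it lies within `window` nt of some reference site. The 10-nt default is borrowed from the 3'-RACE tolerance in Key Findings; the function names are hypothetical, and error rate is omitted because its definition is not given here.

```python
from bisect import bisect_left
from collections import defaultdict

Site = tuple[str, str, int]  # (chromosome, strand, polyA position)


def _index(sites: list[Site]) -> dict[tuple[str, str], list[int]]:
    """Group positions by (chromosome, strand) and sort them for binary search."""
    by_key: dict[tuple[str, str], list[int]] = defaultdict(list)
    for chrom, strand, pos in sites:
        by_key[(chrom, strand)].append(pos)
    for positions in by_key.values():
        positions.sort()
    return by_key


def _has_match(index, chrom: str, strand: str, pos: int, window: int) -> bool:
    """True if any indexed site on the same chrom/strand is within +/- window nt."""
    positions = index.get((chrom, strand), [])
    i = bisect_left(positions, pos - window)  # first position >= pos - window
    return i < len(positions) and positions[i] <= pos + window


def site_level_metrics(predicted: list[Site], reference: list[Site],
                       window: int = 10) -> tuple[float, float]:
    """Return (sensitivity, precision) under a hypothetical window-matching rule."""
    pred_idx, ref_idx = _index(predicted), _index(reference)
    matched_ref = sum(_has_match(pred_idx, c, s, p, window) for c, s, p in reference)
    matched_pred = sum(_has_match(ref_idx, c, s, p, window) for c, s, p in predicted)
    sensitivity = matched_ref / len(reference) if reference else 0.0
    precision = matched_pred / len(predicted) if predicted else 0.0
    return sensitivity, precision


if __name__ == "__main__":
    # Toy sites for illustration only, not data from the study.
    reference = [("chr1", "+", 1000), ("chr1", "+", 5000), ("chr2", "-", 300)]
    predicted = [("chr1", "+", 1008), ("chr1", "+", 7000), ("chr2", "+", 300)]
    print(site_level_metrics(predicted, reference))  # both values are 1/3 here
```

Sorting per chromosome and strand keeps each lookup logarithmic, which matters when comparing genome-wide site sets; the strand check prevents a site on the opposite strand from counting as a match.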

Key Findings

InPACT identifies IPA sites and reconstructs IPA isoforms accurately and reproducibly. In 3'-RACE experiments, the majority of tested InPACT-predicted IPA sites were located within 10 nt of their predicted positions, and ribosome profiling showed that many InPACT-identified IPA isoforms are translated. InPACT outperforms IPAFinder and APAIQ in identifying and quantifying IPA across all three benchmarks (3'-end sequencing data, Iso-seq data, and simulated RNA-seq data).

Applied to monocyte activation, InPACT revealed temporally coordinated IPA events associated with immune response pathways. Differential transcript usage analysis identified 204 significantly changed IPA events, many of which lead to truncated proteins or to transcripts lacking the entire coding region. These IPA changes are largely independent of changes in gene expression. For example, increased IPA usage of *ARHGAP24* in LPS-activated monocytes produces a truncated protein lacking the RhoGAP domain, which may affect Rac1 activity and contribute to monocyte activation.

Applied to single-cell RNA-seq data from human fetal bone marrow, InPACT revealed cell-type-specific IPA events enriched for the biological functions of the corresponding cell types, including cell-type-specific IPA usage of *SRP68*, *HMGCL*, and *SCARB2*.
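One way to check whether IPA changes are independent of gene expression changes is sketched below in Python. It is a hypothetical illustration, not the study's differential transcript usage method: IPA usage is taken as the IPA isoform's share of IPA plus full-length isoform TPM (for example, values read from Salmon output), gene expression is approximated by the sum of those two isoforms, and the Pearson correlation between the change in IPA usage and the log2 fold change in expression is reported. A correlation near zero would be consistent with independence. All function names and the toy TPM values are assumptions.

```python
import math


def ipa_usage(ipa_tpm: float, full_length_tpm: float) -> float:
    """Fraction of (IPA + full-length) expression contributed by the IPA isoform."""
    return ipa_tpm / (ipa_tpm + full_length_tpm)


def pearson(x: list[float], y: list[float]) -> float:
    """Plain Pearson correlation; NaN if fewer than 2 points or zero variance."""
    n = len(x)
    if n < 2:
        return float("nan")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def usage_vs_expression(genes: dict[str, tuple[float, float, float, float]],
                        pseudocount: float = 1.0):
    """genes maps a gene name to hypothetical TPMs
    (ipa_ctrl, full_ctrl, ipa_treated, full_treated).

    Returns per-gene change in IPA usage, per-gene log2 fold change of the
    summed (assumed) gene expression, and their Pearson correlation.
    """
    delta_usage, log2fc = [], []
    for ipa_c, full_c, ipa_t, full_t in genes.values():
        if ipa_c + full_c == 0 or ipa_t + full_t == 0:
            continue  # usage is undefined without expression in both conditions
        delta_usage.append(ipa_usage(ipa_t, full_t) - ipa_usage(ipa_c, full_c))
        log2fc.append(math.log2((ipa_t + full_t + pseudocount)
                                / (ipa_c + full_c + pseudocount)))
    return delta_usage, log2fc, pearson(delta_usage, log2fc)


if __name__ == "__main__":
    # Toy values for illustration only, not data from the study.
    genes = {
        "GENE_A": (10.0, 90.0, 50.0, 50.0),
        "GENE_B": (20.0, 80.0, 40.0, 160.0),
        "GENE_C": (5.0, 45.0, 30.0, 20.0),
    }
    print(usage_vs_expression(genes))
```

A genome-wide analysis would use all isoforms per gene and a proper statistical test for differential usage; this sketch only shows why usage ratios can shift while total expression stays flat.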

Discussion

By accurately identifying and quantifying IPA from conventional RNA-seq data, InPACT fills a significant gap left by existing methods. Its application to monocyte activation and to single-cell RNA-seq data reveals dynamic, context-specific IPA regulation, pointing to broader roles for IPA in biological processes and disease. The finding that many IPA isoforms are translated shows that IPA has functional consequences beyond altering gene expression levels. Future studies should explore the specific functions of these IPA isoforms in different contexts. Applying InPACT at scale to existing transcriptomic datasets such as TCGA, GTEx, and the Human Cell Atlas (HCA) could greatly expand our understanding of IPA's role in human health and disease.

Conclusion

InPACT is a novel computational tool that accurately and efficiently identifies and quantifies intronic polyadenylation events from RNA-seq data. Its advantage over existing methods and its successful application to several datasets make it a promising tool for studying the biological significance of IPA in diverse processes and diseases. Future directions include applying InPACT to large-scale datasets and integrating it with other omics data for a more holistic view of gene regulation.

Limitations

InPACT does not currently account for sample-specific genomic variants and relies on the reference genome. Its accuracy depends on the quality and coverage of the RNA-seq data, so low RNA integrity or insufficient coverage may reduce it. Because InPACT relies on full-length sequencing, its single-cell analysis is restricted to full-length scRNA-seq data.
