Isoform-level transcriptome-wide association uncovers genetic risk mechanisms for neuropsychiatric disorders in the human brain
A. Bhattacharya, D. D. Vo, et al.
This summary covers isoTWAS, a framework that integrates genetics with isoform-level expression for neuropsychiatric trait association studies. The research, conducted by Arjun Bhattacharya, Daniel D. Vo, Connor Jops, Minsoo Kim, Cindy Wen, Jonatan L. Hervoso, Bogdan Pasaniuc, and Michael J. Gandal, highlights the value of isoform-level resolution for understanding complex traits.
Introduction
Recently, the number of genetic associations with complex traits identified by genome-wide association studies (GWAS) has increased considerably. However, translating these associations into concrete molecular mechanisms remains challenging, as most GWAS hits lie in non-coding regions within large linkage disequilibrium blocks, complicating the prioritization of causal variant(s) and their target genes. Integrative methods such as transcriptome-wide association studies (TWAS) impute the cis-component of gene expression into association cohorts to prioritize candidate genes. Prior work has largely focused on total gene expression rather than the distinct transcript isoforms generated by alternative splicing—a regulatory mechanism present in approximately 90% of human genes and particularly complex in brain, where genes are longer, contain more exons, and show extensive splicing. Brain isoform-level changes show greater enrichment for schizophrenia heritability than gene-level or local splicing changes, suggesting isoforms capture critical biology. Yet, integrating isoform quantifications with GWAS requires methods that jointly model highly correlated isoforms of the same gene. Here, the authors introduce isoTWAS, a flexible isoform-level TWAS that jointly models genetic effects on isoforms and aggregates associations. Simulations and analyses using GTEx and PsychENCODE demonstrate that isoTWAS improves prediction for >80% of isoforms (median ~1.8–2.4-fold increase in adjusted R²), enhances total gene expression prediction by 25–70%, and increases the number of testable genes. Across 15 neuropsychiatric traits, isoTWAS increases discovery of gene-level trait associations and uncovers associations at ~60% more GWAS loci than gene-level TWAS, including isoform-specific associations for AKT3, CUL3, HSPD1, and PCLO. 
These findings motivate focusing on transcript isoforms to better uncover transcriptomic mechanisms underlying genetic associations with complex traits, especially in brain.
Methodology
isoTWAS is a three-step framework for isoform-centric complex trait mapping.
Step 1 (Model training): For each gene, isoform-level expression (log-scale TPM) is jointly modeled from all SNPs within a 1 Mb cis-window using multivariate penalized regression frameworks that leverage inter-isoform correlation. Four multivariate methods are implemented: multivariate elastic net (group-lasso penalty across isoforms), multivariate LASSO with covariance estimation (MRCE), multivariate elastic net with stacked generalization (joinet), and sparse partial least squares (SPLS). As a baseline, isoforms can be modeled independently using univariate methods (elastic net, BLUP, or SuSiE), selecting the best by 5-fold cross-validated adjusted R². The method with maximal adjusted R² per gene is retained. Total gene expression prediction from predicted isoforms is learned via elastic net trained within the same CV folds to prevent leakage. Models are trained in large reference QTL datasets: GTEx v8 across 48 tissues (13 brain) and brain cortex panels (adult cortex N=2,115 from PsychENCODE/AMP-AD; developmental prefrontal cortex N=205). Only models with CV R² ≥ 0.01 proceed to mapping.
Step 2 (Trait mapping): Using either individual-level genotypes or GWAS summary statistics, imputed isoform expression is associated with traits. With individual-level data, imputed isoform expression is a linear combination of SNP dosages and model weights; association is tested via regression. With summary statistics, isoform-trait association uses a weighted burden test with ancestry-matched LD (in-sample LD from the reference QTL panels).
Step 3 (Stepwise hypothesis testing and locus control): To address multiple testing across correlated isoforms and local LD, isoform-level statistics within each gene are aggregated to a gene-level P value using the Aggregated Cauchy Association Test (ACAT). Gene-level false discovery is controlled across all genes via Benjamini–Hochberg (or Bonferroni).
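The ACAT aggregation and Benjamini–Hochberg adjustment that open Step 3 can be sketched as follows. This is a minimal illustration, not the isoTWAS software: it assumes equal ACAT weights across isoforms, and the function names and toy P values are hypothetical.

```python
import math

def acat(pvalues, weights=None):
    """Combine isoform-level P values into one gene-level P value (ACAT).

    Each P value is mapped to a Cauchy variate, the variates are averaged,
    and the average is mapped back to a P value.
    Equal weights are an assumption of this sketch.
    """
    k = len(pvalues)
    if weights is None:
        weights = [1.0 / k] * k
    t = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, pvalues))
    return 0.5 - math.atan(t / sum(weights)) / math.pi

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Toy input: isoform-level P values for three hypothetical genes.
isoform_p = {
    "geneA": [0.0004, 0.20, 0.75],
    "geneB": [0.30, 0.55],
    "geneC": [0.01, 0.02, 0.04, 0.60],
}
genes = list(isoform_p)
gene_p = [acat(isoform_p[g]) for g in genes]
gene_q = benjamini_hochberg(gene_p)
for g, p, q in zip(genes, gene_p, gene_q):
    print(g, round(p, 5), round(q, 5))
```

Genes whose adjusted omnibus P value falls below 0.05 would then move on to the within-gene isoform-level testing described next.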
For genes passing the omnibus test (adjusted P<0.05), within-gene isoform-level family-wise error rate is controlled using Shaffer’s modified sequentially rejective Bonferroni. To control type I error within GWAS loci and LD-driven coincidences, a locus-level permutation test permuting SNP-to-isoform weights 10,000 times generates a null specific to the locus architecture; permutation-based P values quantify signal added by predictive models. For significant loci, isoform-level Bayesian fine-mapping (FOCUS) yields minimal credible sets and posterior inclusion probabilities for isoforms.
Simulation framework: Using 1000 Genomes European LD, the authors simulate isoform expression under varied architectures (2–10 isoforms, causal isoQTL proportion 0.001–0.05, shared isoQTL proportion 0–1, non-cis shared variance 0.1–0.25) and traits under three scenarios: (1) only total gene expression affects the trait, (2) a single effect isoform, and (3) two effect isoforms with varied effect ratios (including opposite directions). False positive rate (FPR), power, and fine-mapping performance are benchmarked.
Data processing: RNA-seq reads were quantified with Salmon (decoy-aware GRCh38, GENCODE v38) with 50 bootstraps and imported with tximeta, followed by TMM normalization and variance-stabilizing transformation, residualization for covariates (clinical/demographic, genotype PCs, PEER/HCP), batch correction with ComBat in adult cortex, and sample QC via WGCNA connectivity. Genotypes from WGS/arrays were imputed to TOPMed (Eagle/Minimac4), filtered (MAF, HWE, imputation R²>0.8), harmonized (HRC-1000G checks), and restricted to HapMap3 for training. Association mapping was performed for 15 brain-related GWAS using summary statistics and in-sample LD. Significant features require FDR-adjusted omnibus P<0.05 and within-locus permutation P<0.05; fine-mapping was performed in regions with multiple associations.
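The summary-statistic burden test from Step 2 and the locus-level permutation null from Step 3 can be sketched together. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses the standard TWAS burden form Z = w'z / sqrt(w'Rw), permutes the weights across SNPs by simple shuffling, and adds 1 to the numerator and denominator of the permutation P value. The function names and toy numbers are hypothetical.

```python
import math
import random

def burden_z(weights, gwas_z, ld):
    """Weighted burden Z score for one isoform.

    weights: SNP-to-isoform prediction weights (w)
    gwas_z:  GWAS Z scores for the same SNPs (z)
    ld:      SNP-by-SNP LD correlation matrix (R), as nested lists
    """
    n = len(weights)
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    variance = sum(weights[i] * ld[i][j] * weights[j]
                   for i in range(n) for j in range(n))
    return numerator / math.sqrt(variance)

def permutation_p(weights, gwas_z, ld, n_perm=10000, seed=0):
    """Permutation P value: how often shuffled weights match or beat the
    observed |Z|, conditional on the GWAS signal and LD at the locus."""
    rng = random.Random(seed)
    observed = abs(burden_z(weights, gwas_z, ld))
    shuffled = list(weights)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(burden_z(shuffled, gwas_z, ld)) >= observed:
            hits += 1
    # The +1 correction (an assumption of this sketch) avoids P = 0.
    return (hits + 1) / (n_perm + 1)

# Toy locus with three SNPs (all values are made up for illustration).
ld = [[1.0, 0.3, 0.1],
      [0.3, 1.0, 0.2],
      [0.1, 0.2, 1.0]]
weights = [0.4, -0.1, 0.2]
gwas_z = [3.1, 1.2, -0.5]

print("burden Z:", burden_z(weights, gwas_z, ld))
print("permutation P:", permutation_p(weights, gwas_z, ld, n_perm=1000))
```

A small permutation P value indicates that the specific assignment of weights to SNPs adds signal beyond what the GWAS association and LD at the locus would produce on their own.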
Key Findings
- Prediction improvements: In simulations, multivariate elastic net often yielded the best isoform prediction. Aggregating predicted isoforms improved total gene expression prediction over TWAS, especially at sparse isoQTL architectures, with median absolute increases in adjusted R² of 0.6–3.5%. Empirically across 48 GTEx tissues, multivariate modeling trained 2.3–2.5× more isoform models at CV R²>0.01 than univariate approaches and improved prediction for 79–82% of isoforms with a median ~1.8–2.4-fold increase in adjusted R². IsoTWAS outperformed TWAS in predicting total gene expression with a median 25–70% CV improvement and increased the number of genes predicted at CV R²>0.01 by 50–80%. Out-of-sample, training in GTEx and testing in PsychENCODE/AMP-AD yielded a 15.2% median increase in adjusted R²; training in PsychENCODE/AMP-AD and testing in GTEx yielded 23.9%.
- Power and calibration: Simulations showed controlled false positive rate (~0.05) for the ACAT-based gene-level test. When trait effects differ across isoforms (one effect isoform or two divergent effects), isoTWAS had higher power than TWAS; when only total gene expression affects the trait or effect isoforms have similar effects, power was similar or favored TWAS. Fine-mapping sensitivity of 90% credible sets decreased and set size increased with more shared isoQTLs, reflecting horizontal pleiotropy challenges.
- Trait mapping across 15 neuropsychiatric traits: Using adult (N=2,115) and developmental (N=205) cortex models, isoTWAS identified more gene-trait associations than TWAS: adult 2,595 vs 1,589; developmental 4,062 vs 890. Across both panels, isoTWAS detected 3,436 unique genes and 5,377 unique isoform-trait associations; among 1,335 genes with multiple isoform associations, 661 showed opposing directions across isoforms. Across 1,149 independent GWAS loci, isoTWAS prioritized associations in 323 loci vs 201 with TWAS (~60% increase). For schizophrenia (SCZ) across 287 loci, isoTWAS prioritized genes in 70 (adult) and 86 (developmental) loci vs 56 and 29 with TWAS. Standardized effect sizes for significant associations were highly correlated between methods (r=0.84).
- Biological plausibility and enrichment: IsoTWAS recovered 96% (193/201) of TWAS associations and prioritized many constrained genes (pLI≥0.9): adult 724 vs 106 with TWAS (Fisher’s exact P=0.048); developmental 385 vs 200 (P=1.23×10⁻¹⁰). Test statistic inflation estimates showed no significant differences between methods; isoTWAS increased approximate effective sample size by ~10–20% (mean % increase in χ²). Compared to splicing-event-based mapping in the same developmental panel, isoTWAS prioritized ~40% more GWAS loci (167 vs 119), with 108 overlapping.
- Locus-level examples: Isoform-specific associations undetectable at the gene level were highlighted: AKT3 isoform ENST00000492957 with SCZ (strong isoQTL P<10⁻⁵⁰ but weak eQTL), CUL3 isoform ENST00000409096 with SCZ, HSPD1 isoform ENST00000678969 with SCZ, and PCLO isoform ENST00000423517 with the cross-disorder GWAS (CDG). Distinct AKT3 isoforms were implicated in SCZ and brain volume, suggesting isoform-specific mechanisms.
Discussion
The findings demonstrate that modeling transcript isoforms jointly captures genetic regulation not evident at the gene level, substantially improving imputation accuracy and increasing the number of testable features. Power gains arise from: (1) more imputable features via isoform-level modeling (2.3–2.5× more isoform models than univariate approaches and 50–80% more genes at CV R²>0.01), (2) improved prediction of total gene expression (median 25–70% improvement in cross-validation), and (3) joint capture of expression and splicing mechanisms within a unified framework. The stepwise testing procedure, which combines ACAT aggregation, FDR and FWER control, and locus-specific permutation, yields well-calibrated discoveries despite an increased number of tests. IsoTWAS recovers almost all TWAS findings while adding many isoform-specific associations, including at constrained genes enriched for disease relevance. Empirical increases in approximate effective sample size and concordant null calibrations support that discovery gains reflect true signal rather than inflation. Fine-mapping at the isoform level remains challenging due to horizontal pleiotropy and correlated isoform regulation, but credible set sizes were comparable to or smaller than TWAS at the gene level. Collectively, isoform-centric modeling refines mechanistic hypotheses at GWAS loci, particularly in brain, where splicing complexity is high, and prioritizes specific transcript isoforms that may mediate genetic risk.
Conclusion
This work introduces isoTWAS, an isoform-centric TWAS that integrates multivariate isoform prediction, stepwise association testing, locus-level permutation, and isoform fine-mapping to uncover genetic risk mechanisms for neuropsychiatric traits. IsoTWAS substantially increases discovery over gene-level TWAS, identifies isoform-specific associations invisible at the gene level, and prioritizes biologically plausible, constrained genes across many GWAS loci. Publicly available software and models across GTEx and brain datasets facilitate broad adoption. Future directions include leveraging long-read sequencing to improve transcript annotations and isoform quantification, incorporating RNA-seq inferential replicates to model prediction uncertainty, extending to genetically regulated transcript usage with appropriate compositional modeling, and developing fine-mapping methods that explicitly account for horizontal pleiotropy across isoforms and shared SNP effects.
Limitations
- Isoform quantifications from short-read RNA-seq are maximum-likelihood estimates dependent on reference annotations and affected by sequencing depth, read length, and library preparation; incomplete annotations can bias estimates. Long-read sequencing will improve annotations and quantification, benefiting isoTWAS.
- Inferential replicates capturing technical uncertainty are not incorporated into prediction models; exploiting them to estimate SNP effect standard errors or prediction intervals could improve mapping.
- Extensions to transcript usage (compositional data) may require specialized modeling to handle compositional constraints.
- Horizontal pleiotropy of SNPs across isoforms can reduce power, inflate false positives, and decrease fine-mapping sensitivity; methods explicitly modeling pleiotropy and shared effects across isoforms are needed.