Integrating human endogenous retroviruses into transcriptome-wide association studies highlights novel risk factors for major psychiatric conditions
R. R. R. Duarte, O. Pain, et al.
Explore the intriguing link between human endogenous retroviruses (HERVs) and psychiatric disorders! This research conducted by Rodrigo R. R. Duarte and colleagues reveals 26 significant HERV expression signals in a study of 792 post-mortem brain samples, shedding light on their potential implications for conditions like schizophrenia and major depressive disorder.
Introduction
Psychiatric disorders including schizophrenia, bipolar disorder, major depressive disorder, attention deficit hyperactivity disorder, and autism spectrum conditions have substantial genetic components. Genome-wide association studies (GWAS) reveal highly polygenic architectures, with many predominantly non-coding variants presumed to impact local gene regulation. Transcriptome-wide association studies (TWAS) integrate GWAS with expression data to identify gene expression signatures associated with susceptibility, offering insights into aetiology and potential therapeutic targets. However, TWAS to date have largely overlooked repetitive elements such as human endogenous retroviruses (HERVs). HERVs comprise ~8% of the human genome, originated from ancient germline retroviral infections, and are thought to modulate local gene regulation via long terminal repeats, with some remnants of viral genes potentially exerting additional functions. Prior studies implicating HERVs in psychiatric disorders often predated comprehensive genomic annotation, grouped expression at the family level, and were underpowered, risking confounding by environmental factors. Here, the authors integrate precise, locus-specific HERV expression into a TWAS framework—termed a retrotranscriptome-wide association study (rTWAS)—to identify HERV expression signatures associated with psychiatric disorder risk while mitigating limitations of traditional case–control designs.
Literature Review
Previous GWAS/TWAS work has successfully identified canonical genes and processes associated with major psychiatric disorders, but repetitive elements like HERVs have been underexplored. Earlier HERV studies used techniques such as Western blotting, RT-qPCR, or microarrays, typically aggregating signals across HERV families and analysing small cohorts, limiting power and specificity. These designs risk capturing secondary expression changes due to environmental exposures (e.g., medication, smoking) rather than primary genetic risk mechanisms. Comprehensive annotation now identifies 14,968 HERV transcriptional units across 60 families, enabling locus-specific expression quantification. This study builds upon advances in RNA-seq quantification (e.g., Telescope) and TWAS to investigate cis-regulated HERV expression linked to psychiatric risk across well-powered GWAS datasets.
Methodology
Design and cohorts: RNA sequencing and genotype data from the CommonMind Consortium (CMC) dorsolateral prefrontal cortex (DLPFC) were analysed. After ancestry inference using 1000 Genomes references, European (N=563) and African (N=229) subsets were selected to construct TWAS weights. Samples included unaffected individuals and cases (schizophrenia, bipolar disorder, affective disorders) to increase power for detecting cis-regulatory effects; sensitivity analyses assessed potential biases.
Genotyping: Genotype data (CMC1 hg19; CMC3 hg38) were harmonised (PLINK, bcftools), lifted to hg19 for GWAS compatibility, phased (Eagle v2.4), and imputed on the Michigan Imputation Server using 1000 Genomes Phase 3 v5. Variants retained were non-ambiguous autosomal SNPs with MAF>0.05, Hardy–Weinberg P>5×10^-6, and missingness <0.05. Samples with excess heterozygosity, relatedness (pihat>0.2), high missingness, or sex mismatches were removed.
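The variant-level filters can be illustrated with a minimal Python sketch. It is not the authors' pipeline (which used PLINK and bcftools); the `Variant` record, the `passes_qc` helper, and the toy variants are hypothetical. It also assumes the Hardy–Weinberg threshold acts as an exclusion cutoff, so that variants with very small HWE P-values are dropped, as is conventional in genotype QC.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical per-variant summary used only for this illustration."""
    rsid: str
    ref: str
    alt: str
    chrom: str
    maf: float
    hwe_p: float
    missing_rate: float


# A/T and C/G pairs are strand-ambiguous: the strand cannot be told from alleles alone.
AMBIGUOUS = {("A", "T"), ("T", "A"), ("C", "G"), ("G", "C")}
AUTOSOMES = {str(c) for c in range(1, 23)}


def passes_qc(v, min_maf=0.05, min_hwe_p=5e-6, max_missing=0.05):
    """Keep non-ambiguous autosomal SNPs that pass MAF, HWE and missingness cutoffs."""
    is_snp = len(v.ref) == 1 and len(v.alt) == 1
    return (
        is_snp
        and (v.ref, v.alt) not in AMBIGUOUS
        and v.chrom in AUTOSOMES
        and v.maf > min_maf
        and v.hwe_p > min_hwe_p          # assumption: low HWE P-values are excluded
        and v.missing_rate < max_missing
    )


# Toy variants (made up) covering each reason for exclusion.
variants = [
    Variant("rs_a", "A", "G", "1", 0.21, 0.40, 0.01),  # passes every filter
    Variant("rs_b", "A", "T", "2", 0.30, 0.50, 0.00),  # strand-ambiguous
    Variant("rs_c", "C", "T", "X", 0.25, 0.90, 0.00),  # not autosomal
    Variant("rs_d", "C", "T", "3", 0.01, 0.70, 0.00),  # too rare
    Variant("rs_e", "G", "A", "4", 0.15, 1e-9, 0.00),  # strong HWE deviation
    Variant("rs_f", "G", "A", "5", 0.15, 0.30, 0.10),  # too much missing data
]
kept = [v.rsid for v in variants if passes_qc(v)]
print(kept)
```

Only the first toy variant survives, since each of the others trips exactly one filter.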
RNA-seq processing: For CMC1, BAMs were converted to FASTQ (samtools); for CMC3, Picard SamToFastq was used. Trimming was done with Trimmomatic (quality and length thresholds). HERV expression was quantified by mapping reads to hg38 with Bowtie2 (parameters: --very-sensitive-local -k 100 --score-min L,0,1.6) and Telescope v1.0.2 using HERV annotation v2 (hg38), which assigns ambiguously mapped reads probabilistically to source loci. Canonical gene expression was quantified via kallisto pseudoalignment (hg38), imported with tximport (lengthScaledTPM), and canonical genes selected via biomaRt. Features were considered expressed with counts ≥6 and TPM ≥0.1 in ≥20% of samples. Coordinates were lifted to hg19 for GWAS compatibility. HOMER was used to classify genomic categories of HERV loci.
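The expression filter can be sketched in a few lines of Python. This is an illustration rather than the authors' code: the `is_expressed` helper and the toy values are hypothetical, and it assumes that both thresholds must be met within the same sample for that sample to count towards the 20%.

```python
def is_expressed(counts, tpm, min_count=6, min_tpm=0.1, min_fraction=0.20):
    """Return True if enough samples meet both the count and the TPM threshold.

    counts and tpm are per-sample values for one feature (gene or HERV locus).
    Assumption: a sample counts only if it passes both thresholds at once.
    """
    if not counts or len(counts) != len(tpm):
        raise ValueError("counts and tpm must be non-empty and of equal length")
    passing = sum(1 for c, t in zip(counts, tpm) if c >= min_count and t >= min_tpm)
    return passing / len(counts) >= min_fraction


# Toy feature measured in 10 samples: exactly 2 samples pass both thresholds.
counts_a = [10, 8, 0, 1, 2, 0, 3, 0, 1, 0]
tpm_a = [0.5, 0.2, 0.0, 0.0, 0.05, 0.0, 0.05, 0.0, 0.0, 0.0]

# Toy feature with high counts but TPM below 0.1 in every sample.
counts_b = [10] * 10
tpm_b = [0.05] * 10

print(is_expressed(counts_a, tpm_a))  # 2 of 10 samples pass, which meets the 20% cutoff
print(is_expressed(counts_b, tpm_b))  # no sample passes the TPM threshold
```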
Normalisation and covariate adjustment: Expression data (genes + HERVs) were TMM-normalised within ancestry. Using limma, expression was adjusted for institution, case–control status, RNA integrity number, sex, post-mortem interval, age bins, top 10 genetic PCs, and surrogate variables (30 for N=150–250; 60 for N>350) via sva, following GTEx-informed practices.
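The idea behind covariate adjustment is to keep only the part of each feature's expression that the covariates do not explain. The sketch below is a deliberately simplified illustration with a single numeric covariate and ordinary least squares; the study itself used limma in R with many covariates at once. The `residualise` helper and the toy values are hypothetical.

```python
def residualise(expression, covariate):
    """Remove the linear effect of one covariate from one feature's expression.

    Fits expression = intercept + slope * covariate by ordinary least squares
    and returns the residuals, which are centred on zero.
    """
    n = len(expression)
    if n != len(covariate) or n < 2:
        raise ValueError("need two equally long vectors with at least 2 samples")
    mean_x = sum(covariate) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in covariate)
    if sxx == 0:
        raise ValueError("covariate is constant, so there is nothing to regress out")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(covariate, expression))
    slope = sxy / sxx
    return [(y - mean_y) - slope * (x - mean_x) for x, y in zip(covariate, expression)]


# Toy data: a made-up RNA integrity number and normalised expression per sample.
rin = [6.0, 7.0, 8.0, 9.0]
expr = [1.0, 3.0, 2.0, 5.0]
adjusted = residualise(expr, rin)
print(adjusted)
```

By construction the residuals sum to zero and are uncorrelated with the covariate, which is what allows downstream models to attribute the remaining variation to other sources such as cis-genetic effects.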
TWAS weight construction (rTWAS): FUSION-based SNP weights were built separately in European and African subsets within 1 Mb cis-windows using ancestry-matched 1000 Genomes LD references. Predictive models included BLUP, BSLMM, lasso, elastic net, and top-SNP. Likelihood ratio tests identified features with significant cis-heritable expression (nominal P<0.01). rTWAS integrated these SNP weights with ancestry-matched GWAS summary statistics for schizophrenia, bipolar disorder, major depressive disorder, ADHD, and autism spectrum conditions. Bonferroni correction accounted for the number of tested features per trait.
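To make the association step concrete, here is a minimal sketch of the standard summary-statistic TWAS test, in which per-SNP GWAS Z-scores z are combined using expression weights w and an LD correlation matrix Σ as Z = (w · z) / sqrt(w' Σ w). It is an illustration under stated assumptions, not the FUSION implementation: the function names, the toy numbers, the feature count, and the family-wise alpha of 0.05 are all hypothetical.

```python
import math


def twas_z(weights, gwas_z, ld):
    """Combine per-SNP GWAS Z-scores into one expression-trait Z-score.

    weights: SNP weights predicting expression of one feature
    gwas_z:  GWAS Z-scores for the same SNPs, in the same order
    ld:      SNP-by-SNP correlation matrix (list of lists) from a reference panel
    """
    m = len(weights)
    if len(gwas_z) != m or len(ld) != m or any(len(row) != m for row in ld):
        raise ValueError("weights, gwas_z and ld must have matching dimensions")
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    variance = sum(weights[i] * ld[i][j] * weights[j] for i in range(m) for j in range(m))
    if variance <= 0:
        raise ValueError("predicted expression has no variance under this LD matrix")
    return numerator / math.sqrt(variance)


def bonferroni_threshold(n_features, alpha=0.05):
    """Per-test P-value threshold when n_features are tested for one trait."""
    return alpha / n_features


# Toy example with two SNPs in moderate LD (all values are made up).
toy_z = twas_z(weights=[0.5, -0.2], gwas_z=[3.0, -1.0], ld=[[1.0, 0.3], [0.3, 1.0]])
print(round(toy_z, 3))
print(bonferroni_threshold(n_features=8000))  # hypothetical number of tested features
```

Because the threshold depends on how many features are tested for each trait, the same Z-score can be significant for one trait and not for another.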
Secondary analyses: Conditional/joint analyses (FUSION) evaluated independence of expression signals within loci and their ability to explain GWAS associations relative to neighbouring features. Fine-mapping (FOCUS) computed posterior inclusion probabilities (PIP) for putatively causal expression signals (PIP>0.5 prioritized). Cross-ancestry analyses used European-derived weights with non-European GWAS (African American, Latino, Asian for schizophrenia; East Asian for MDD), and African American weights with African American schizophrenia GWAS.
Co-expression and functional inference: Weighted Gene Co-expression Network Analysis (WGCNA) on TMM-normalised expression defined modules of co-expressed genes and HERVs (signed network; β=12; modules assigned colours; grey for unassigned). Module function was inferred via Gene Ontology (anRichment), Bonferroni-corrected. Genomic context of high-confidence HERVs was examined using Integrative Genomics Viewer and UCSC Browser (including predicted enhancers). Pfam assessed potential protein motifs.
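In a signed WGCNA network, the correlation r between two features is turned into a connection strength a = ((1 + r) / 2)^β, so that with β=12 only strong positive correlations retain appreciable weight. The sketch below illustrates this transformation only; it is not the WGCNA R package, and the helper names and toy vectors are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long numeric vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    return max(-1.0, min(1.0, r))  # guard against floating-point overshoot


def signed_adjacency(x, y, beta=12):
    """Signed-network adjacency: maps r in [-1, 1] to ((1 + r) / 2) ** beta."""
    return ((1 + pearson(x, y)) / 2) ** beta


# Toy expression profiles across four samples (made-up values).
herv = [1.0, 2.0, 3.0, 4.0]
gene_up = [2.0, 4.0, 6.0, 8.0]       # perfectly correlated with herv
gene_down = [8.0, 6.0, 4.0, 2.0]     # perfectly anti-correlated
gene_flat = [1.0, -1.0, -1.0, 1.0]   # uncorrelated with herv

print(signed_adjacency(herv, gene_up))
print(signed_adjacency(herv, gene_down))
print(signed_adjacency(herv, gene_flat))
```

An uncorrelated pair receives an adjacency of 0.5^12, roughly 0.00024, which shows how strongly the soft-threshold power suppresses weak relationships before modules are defined.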
Key Findings
Expression and cis-regulation:
- Europeans: 4594 HERVs expressed; 4289 autosomal (93%); 1238 (27%) with cis-heritable expression. Canonical genes: 15,017 expressed; 14,459 autosomal (96%); 6956 (46%) cis-heritable.
- Africans: 4645 HERVs expressed; 4343 autosomal (93%); 852 (18%) cis-heritable. Canonical genes: 15,015 expressed; 14,546 autosomal (97%); 5464 (36%) cis-heritable.
- Of 4645 HERVs expressed in Africans, 4463 (96%) also expressed in Europeans; of 852 cis-heritable HERVs in Africans, 534 (63%) were cis-heritable in Europeans, suggesting ancestry-specific regulation.
rTWAS associations (Europeans):
- Schizophrenia: 163 Bonferroni-significant expression signatures; 15 (9%) HERVs (9 positive, 6 negative). Top HERVs: ERV316A3_6p22.1b (MHC locus; Z=-8.75, P=2.05×10^-15) and ERV316A3_2q33.1g (Z=-7.13, P=1.03×10^-12). Canonical gene signals replicated known TWAS hits (e.g., NAGA, SNAP91, TAOK2, SF3B1, MAPK3, FURIN).
- Bipolar disorder: 47 significant signatures; 2 (4%) HERVs: MER4_20q13.13 (Z=5.04, P=4.73×10^-7); PRIMA41_9q34.3 (Z=4.61, P=4.07×10^-6). MER4_20q13.13 was also significant in schizophrenia (same direction; Z=9.95, P=8.15×10^-21).
- Major depressive disorder: 29 significant signatures; 9 (31%) HERVs (five on 1p31, two on 9p23, one on 3p21, one on 14q24).
- ADHD: 7 significant signatures; none HERVs. Autism spectrum conditions: 1 significant signature; none HERVs.
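The Z-scores and P-values reported for these associations are linked through the standard normal distribution: a two-sided P-value is the probability of a standard normal deviate at least as extreme as |Z|. The following minimal sketch shows the conversion; the helper name is hypothetical, and small differences from reported values are expected because the published Z-scores are rounded.

```python
import math


def two_sided_p(z):
    """Two-sided P-value for a Z-score under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Bipolar disorder signal for MER4_20q13.13 (Z=5.04 in the list above).
p_mer4 = two_sided_p(5.04)
print(f"{p_mer4:.2e}")
```

For Z=5.04 this gives approximately 4.7×10^-7, in line with the P-value listed for that signal.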
Conditional analyses (independent signals):
- Schizophrenia: 91 conditionally independent signals; 6 HERVs, including MER4_20q13.13 (TWAS P=9.90×10^-21; joint P=1.00×10^-20), ERV316A3_2q33.1g (TWAS P=1.00×10^-12; joint P=2.90×10^-11), ERV316A3_5q14.3j (TWAS and joint P=5.50×10^-10).
- Bipolar disorder: 30 independent signals; 2 HERVs, notably MER4_20q13.13 (TWAS and joint P=4.70×10^-7).
- Major depressive disorder: 12 independent signals; 2 HERVs, including ERVLE_1p31.1c (TWAS and joint P=2.90×10^-18).
- ADHD and autism: independent signals pertained to canonical genes only.
Fine-mapping (FOCUS):
- Schizophrenia: 11 HERV signals had PIP>0.5; three overlapped conditional independence (high-confidence): ERV316A3_2q33.1g (PIP=1.00), ERV316A3_5q14.3j (PIP=0.98), MER4_20q13.13 (PIP=1.00).
- Bipolar disorder: 2 HERVs with PIP>0.5; one high-confidence (MER4_20q13.13, PIP=0.99).
- Major depressive disorder: 4 HERVs with PIP>0.5; one high-confidence (ERVLE_1p31.1c, PIP=0.68).
- ADHD: two HERVs on 3p24 with PIP>0.5 (HARLEQUIN_3p24.3 PIP=0.79; HML3_3p24.3 PIP=0.97), but not conditionally significant. Autism: none with PIP>0.5.
Sensitivity analyses:
- Constructing weights from unaffected-only Europeans (N=242) vs the full cohort (N=563) showed high concordance (schizophrenia rTWAS Z-scores r=0.95, P<2.2×10^-16) but fewer significant associations (137 vs 163 in the full cohort; ~16% reduction). Adding cases increased detection of cis-heritable HERVs by 85%. Ten HERVs were significant in the control-only analysis, nine overlapping with full-cohort Bonferroni-significant HERVs; the remaining one was only nominally significant in the full cohort (HERVL18_6p22.1c, P=0.002).
Cross-ancestry:
- Using European weights with Asian schizophrenia GWAS, MER4_20q13.13 showed nominal association (Z=2.14, P=0.03) but did not survive Bonferroni correction; no other high-confidence HERV signals replicated. African American weights with African American schizophrenia GWAS (N=6152 cases, 3918 controls) yielded no Bonferroni-significant signals; top rTWAS features: QSOX1 (Z=-3.89, P=9.92×10^-4) and HERVS71_7p14.3 (Z=-3.56, P=3.67×10^-3).
Genomic context and co-expression:
- Expressed HERVs were predominantly intergenic or intronic (~98%). ERV316A3_2q33.1g overlaps the 3′ UTR of FTCDNL1 transcripts; ERV316A3_5q14.3j lies in an ADGRV1 promoter, suggesting isoforms influenced by HERV retrotransposition. MER4_20q13.13 is antisense to PTGIS; ERVLE_1p31.1c is intergenic (nearest gene NEGR1), consistent with non-coding RNAs. Predicted distal enhancers overlap several high-confidence HERV loci (e.g., ERVLE_1p31.1c, ERV316A3_2q33.1g, MER4_20q13.13). Pfam detected no known protein motifs.
- WGCNA identified 16 modules; all contained HERVs. The ‘turquoise’ module was most HERV-enriched (3815 HERVs, 73%; 1398 genes, 27%) and included all four high-confidence HERVs, enriched for signal transduction GO terms (e.g., GPCR activity; detection of chemical stimulus). Parallel analyses (covariate-adjusted; African ancestry subset) yielded similar module compositions and GO enrichments.
Discussion
Integrating locus-specific HERV expression into TWAS revealed multiple HERV expression signatures associated with psychiatric risk, including high-confidence signals for schizophrenia (ERV316A3_2q33.1g, ERV316A3_5q14.3j), a shared signature across schizophrenia and bipolar disorder (MER4_20q13.13), and a major depressive disorder signature (ERVLE_1p31.1c). These differ from previously implicated HERV families identified by family-aggregated assays, likely reflecting the advantage of genomic precision and focus on genetic regulation rather than secondary case–control differences. The findings indicate that a subset of brain-expressed HERVs are cis-regulated and tied to disorder susceptibility, consistent with direct contributions to aetiology rather than solely environmentally induced changes.
Co-expression networks place many HERVs within modules enriched for neurobiological functions such as signal transduction, suggesting potential roles in neuronal signalling pathways. Mechanistically, HERV expression could influence biology via dsRNA formation, activation of antiviral/inflammatory cascades (e.g., TLR pathways; induction of cytokines like TNFα, IL-6), antisense RNA interactions, or via regulatory RNAs/proteins acting in trans. Genomic context analyses suggest that some high-confidence HERVs may form parts of specific canonical gene isoforms (e.g., FTCDNL1, ADGRV1), highlighting the impact of retrotransposition on transcript diversification, while others likely represent independent non-coding RNAs (e.g., ERVLE_1p31.1c, MER4_20q13.13). The lack of robust cross-ancestry replication likely reflects limited GWAS power and LD mismatches, underscoring the need for larger, diverse datasets.
Conclusion
This study demonstrates extensive HERV expression and cis-regulation in the adult human cortex and identifies high-confidence HERV expression signatures associated with risk for schizophrenia, bipolar disorder, and major depressive disorder. By extending TWAS to the retrotranscriptome (rTWAS), the work uncovers novel, locus-specific HERV contributions to psychiatric disorder aetiology and provides functional context through co-expression networks. Future research should: (1) expand rTWAS to additional brain regions, developmental stages, and tissues; (2) investigate trans-regulatory mechanisms affecting and mediated by HERVs; (3) employ long-read RNA sequencing to resolve chimeric and repetitive element-derived transcripts; (4) include whole-genome sequencing to capture non-reference HERV insertions; and (5) perform functional studies to delineate how specific HERVs influence neuronal biology and pathophysiology.
Limitations
- Tissue and context: Analyses were limited to dorsolateral prefrontal cortex; other brain regions, developmental time points, and tissues may reveal additional associations.
- Regulatory scope: rTWAS focused on cis-genetic regulation; trans-regulatory effects on HERVs and trans effects of HERV expression were not assessed.
- Functional inference: WGCNA-based module function provides associative, not causal, insight; experimental validation is needed to define HERV roles in neurobiology and electrophysiology.
- Annotation and transcript resolution: Only HERVs annotated in the reference genome were analysed; non-reference insertions were not captured. Short-read RNA-seq limits resolution of chimeric and repetitive transcripts; long-read approaches are needed.
- Possible tagging of canonical transcripts: Some HERV signals may represent uncharacterised isoforms of nearby genes; though less likely for antisense/intergenic cases (e.g., MER4_20q13.13, ERVLE_1p31.1c), experimental validation is required.
- Cross-ancestry power: Non-European GWAS datasets analysed were underpowered, limiting detection and replication of HERV associations; LD differences further complicate cross-ancestry inference.

