Introduction
Major psychiatric disorders, including schizophrenia, bipolar disorder, major depressive disorder (MDD), attention deficit hyperactivity disorder (ADHD), and autism spectrum conditions (ASC), have a significant genetic component. Genome-wide association studies (GWAS) have revealed a polygenic architecture, with many loci contributing to risk. Because most associated variants are non-coding and likely affect gene regulation, transcriptome-wide association studies (TWAS) were developed to identify gene expression signatures associated with susceptibility. TWAS leverage large genetic association studies to test whether risk variants are associated with the expression of nearby genes in relevant tissues, while controlling for linkage disequilibrium. Although TWAS have identified genes and biological processes involved in these disorders, they have largely ignored the expression of repetitive elements such as human endogenous retroviruses (HERVs). HERVs comprise approximately 8% of the human genome and originate from ancient retroviral infections of germ cells. They are not currently retrotransposing, and most insertions occurred over 1.2 million years ago. HERVs are hypothesized to regulate nearby genes via long terminal repeats (LTRs) that act as promoters, although some contain viral genes with other functions (e.g., syncytin-1 and syncytin-2 in placental formation). HERVs have been linked to psychiatric conditions, but most prior studies aggregated expression at the family level and used small samples, leaving them underpowered to study complex polygenic traits and open to confounding by environmental factors such as smoking or treatment. This study uses a TWAS approach, termed 'retrotranscriptome-wide association study' (rTWAS), to examine the association between HERV expression in the brain (estimated at precise genomic locations) and psychiatric disorders, mitigating limitations of previous case-control studies.
Literature Review
Previous research has implicated HERVs in major psychiatric conditions, but these studies suffered from methodological limitations. Many relied on aggregated family-level expression data from techniques like Western blotting, RT-qPCR, or microarrays, often with small sample sizes, limiting their statistical power to detect associations with complex polygenic traits like psychiatric disorders. The use of case-control study designs further complicated interpretation, as expression changes might be due to environmental factors associated with diagnosis rather than intrinsic disease mechanisms. Furthermore, these studies preceded comprehensive genomic annotation of HERV sequences. This study addresses these limitations by applying a novel rTWAS approach.
Methodology
This study analyzed the CommonMind Consortium (CMC) dataset, focusing on dorsolateral prefrontal cortex (DLPFC) samples. The dataset included RNA sequencing and genotype data from 792 individuals (563 of European ancestry and 229 of African ancestry). RNA sequencing reads were trimmed of low-quality bases using Trimmomatic and mapped to the human genome (hg38) with Bowtie2. HERV expression was quantified at individual genomic locations using Telescope 1.0.2, which assigns ambiguously mapped reads to the most probable source transcript using a Bayesian statistical model. Canonical gene expression was quantified using Kallisto 0.44.0. Expression data were normalized using the trimmed mean of M values (TMM) method. SNP weights were generated using FUSION, using expression data from both males and females to increase power, and were adjusted for several confounding variables (institution, case-control status, RNA integrity number, sex, post-mortem interval, age, population covariates, and surrogate variables). rTWAS was then conducted using FUSION, integrating these SNP weights with GWAS summary statistics from European cohorts for schizophrenia, bipolar disorder, MDD, ADHD, and ASC. Bonferroni correction was applied for multiple testing. Conditional analyses (FUSION) and fine-mapping analyses (FOCUS) were used as sensitivity analyses to identify independent HERV expression associations and to pinpoint the most likely causal expression signals within linkage disequilibrium blocks. Cross-ancestry validation was attempted using GWAS data from non-European cohorts but was limited by available data. Finally, weighted gene co-expression network analysis (WGCNA) was performed to identify modules of co-expressed genes and HERVs, and gene ontology (GO) enrichment analysis was used to infer their potential biological functions.
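The association step described above can be illustrated with the standard summary-statistic TWAS test, in which SNP weights for a feature are combined with GWAS z-scores and scaled by the linkage disequilibrium among the SNPs. The following is a minimal sketch of that statistic and of a Bonferroni threshold, not FUSION's own implementation. The function names, the toy weights, z-scores, LD matrix, and the number of tested features are all hypothetical values chosen for illustration.

```python
import math

import numpy as np


def twas_z(weights, gwas_z, ld):
    """Summary-statistic TWAS z-score: (w . z) / sqrt(w' LD w)."""
    w = np.asarray(weights, dtype=float)
    z = np.asarray(gwas_z, dtype=float)
    ld = np.asarray(ld, dtype=float)
    variance = float(w @ ld @ w)
    if variance <= 0.0:
        raise ValueError("w' LD w must be positive (check weights and LD matrix)")
    return float(w @ z) / math.sqrt(variance)


def two_sided_p(z):
    """Two-sided p-value for a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Toy example (all values hypothetical, not taken from the study).
snp_weights = [0.3, -0.2, 0.1]          # expression weights for one HERV
gwas_z_scores = [4.0, -3.5, 1.0]        # GWAS z-scores for the same SNPs
ld_matrix = [
    [1.0, 0.2, 0.0],
    [0.2, 1.0, 0.1],
    [0.0, 0.1, 1.0],
]
n_features_tested = 1000                # hypothetical number of tested features

z = twas_z(snp_weights, gwas_z_scores, ld_matrix)
p = two_sided_p(z)
threshold = bonferroni_threshold(n_features_tested)
print(f"z = {z:.3f}, p = {p:.3g}, significant = {p < threshold}")
```

In this sketch the LD matrix plays the role that "controlling for linkage disequilibrium" plays in the text: correlated SNPs inflate the variance of the weighted sum, so the denominator grows accordingly.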
Key Findings
The study identified 1238 HERVs with cis-regulated expression in European samples. The rTWAS analysis revealed 26 HERV expression signatures associated with psychiatric disorder susceptibility. For schizophrenia, 15 HERV expression signatures (9% of Bonferroni-significant features) were found; for bipolar disorder, two (4%); for MDD, nine (31%). No HERV associations were found for ADHD or ASC. Conditional analysis identified 10 conditionally independent HERV associations, and fine-mapping analysis revealed five high-confidence associations involving four risk HERVs. Two were schizophrenia-specific (ERV316A3_2q33.1g, ERV316A3_5q14.3j), one was shared between schizophrenia and bipolar disorder (MER4_20q13.13), and one was associated with MDD (ERVLE_1p31.1c). Genomic context analysis showed that some high-confidence risk HERVs were located within or near canonical genes (e.g., ERV316A3_2q33.1g overlaps the 3'UTR of FTCDNL1 and ERV316A3_5q14.3j lies in the promoter of ADGRV1), suggesting potential isoform effects. Others appeared intergenic (e.g., MER4_20q13.13 and ERVLE_1p31.1c), potentially producing novel non-coding RNAs. WGCNA revealed 16 co-expression modules, all of which contained HERVs, suggesting diverse biological roles. The most HERV-enriched module ('turquoise') included all four high-confidence risk HERVs and was enriched for GO terms related to signal transduction. Cross-ancestry analyses were limited by available data and statistical power, but showed nominal associations in some cases.
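The counts reported above can be cross-checked with simple arithmetic. The sketch below tallies the per-disorder associations, counts the high-confidence HERVs both as distinct elements and as HERV-disorder pairs, and estimates the total number of Bonferroni-significant features implied by each percentage. The implied totals are only approximate, since the percentages in the text are rounded, and the variable names are illustrative.

```python
# Per-disorder HERV associations and their stated share of
# Bonferroni-significant features, as given in the text.
herv_associations = {"schizophrenia": 15, "bipolar disorder": 2, "MDD": 9}
share_of_significant = {"schizophrenia": 0.09, "bipolar disorder": 0.04, "MDD": 0.31}

total_associations = sum(herv_associations.values())

# Approximate number of Bonferroni-significant features per disorder
# implied by the rounded percentages (an estimate, not a reported figure).
implied_totals = {
    disorder: round(count / share_of_significant[disorder])
    for disorder, count in herv_associations.items()
}

# High-confidence risk HERVs and the disorders each is associated with.
high_confidence = {
    "ERV316A3_2q33.1g": ["schizophrenia"],
    "ERV316A3_5q14.3j": ["schizophrenia"],
    "MER4_20q13.13": ["schizophrenia", "bipolar disorder"],
    "ERVLE_1p31.1c": ["MDD"],
}
n_distinct_hervs = len(high_confidence)
n_herv_disorder_pairs = sum(len(d) for d in high_confidence.values())

print("total HERV associations:", total_associations)
print("implied significant features:", implied_totals)
print("distinct high-confidence HERVs:", n_distinct_hervs)
print("HERV-disorder pairs:", n_herv_disorder_pairs)
```

Counting MER4_20q13.13 once per disorder gives five HERV-disorder pairs across four distinct elements, which is one way to read the two figures quoted for the high-confidence set.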
Discussion
This study provides novel evidence for the involvement of HERVs in the etiology of major psychiatric disorders. The use of a more precise HERV expression quantification method, the larger sample size and the focus on genetic risk associations helped to overcome limitations of previous research. The identification of high-confidence risk HERVs associated with specific disorders and their genomic contexts suggests that these elements may have diverse functional roles, including affecting the expression of nearby genes or producing novel non-coding RNAs. The co-expression network analysis further supports the involvement of HERVs in various biological processes. The results challenge the hypothesis that HERV expression in psychiatric cases is solely a byproduct of immune responses, suggesting instead a direct contribution to disease mechanisms. However, more research is needed to elucidate how specific HERVs influence cell biology, gene expression, and neuronal function in relation to psychiatric disorder risk.
Conclusion
This study demonstrates a significant association between specific HERV expression signatures and genetic risk for major psychiatric disorders, particularly schizophrenia and MDD. The findings highlight the need for future research focusing on the functional characterization of these HERVs, investigation of their roles in neuronal processes, and exploration of HERV expression in other brain regions and developmental stages. Long-read RNA sequencing could help to more fully elucidate the transcripts originating from these repetitive elements. Further work exploring the interactions between HERVs and canonical genes, as well as the role of trans-regulatory mechanisms in HERV expression, is warranted.
Limitations
The study's findings are primarily based on analyses of DLPFC samples from individuals of European ancestry, limiting generalizability to other brain regions, developmental stages, and populations. The reliance on cis-regulatory effects may have missed important trans-regulatory influences. The biological functions inferred by WGCNA rest on co-expression patterns and require experimental validation. Finally, the study did not capture non-reference HERVs.