Genomic variants affecting homoeologous gene expression dosage contribute to agronomic trait variation in allopolyploid wheat
F. He, W. Wang, et al.
The study addresses how whole-genome duplication (WGD) and the resulting regulatory complexity in allopolyploid wheat shape population-scale variation in homoeologous gene expression and agronomic traits. Polyploidy broadens regulatory interactions among redundant homoeologs and contributes to crop fitness and adaptation. Regulatory variants, both cis and trans, influence gene expression and complex traits, yet the genomic distribution and impact of these variants on homoeolog regulation in polyploid crops are not well characterized. The authors hypothesize that both demographic history (polyploidization bottlenecks, gene flow) and selection have shaped the balance of cis- and trans-acting regulation, and that variation in relative homoeolog expression dosage contributes to yield-related trait diversity. They aim to map eQTL using diverse wheat accessions, partition expression variance by genome, integrate chromatin features, and link homoeolog dosage variation to agronomic traits via GWAS-eQTL integration.
Prior work shows WGD confers evolutionary advantages and underlies many crops, including wheat. Polyploid genomes permit novel regulatory interactions and robustness, enabling accumulation of variants. Studies in polyploid cotton highlighted the importance of trans-regulatory evolution for domestication. In wheat, several domestication/adaptive loci (e.g., Ppd, VRN-1, miR172/AP2 pathways) are associated with regulatory changes that alter homoeolog expression. However, a comprehensive, population-scale understanding of cis- versus trans-eQTL contributions to homoeolog expression and phenotypes in polyploid crops has been lacking. Gene expression is often under purifying selection in diploids, but how redundancy alters selection on homoeolog expression remains unclear.
- Plant materials and RNA-seq: 204 wheat lines (198 retained after QC) representing global diversity were grown; total RNA from 2-week-old seedlings was sequenced (2×100 bp, mean 65.7M reads). An additional dataset from 90 lines' spikes at double-ridge stage was analyzed. Reads were mapped to IWGSC RefSeq v1.0 (HISAT2); TPMs quantified with Kallisto; expression normalized and PEER residuals computed to reduce confounding.
- Genotyping: Genotypes were combined from multiple sources: regulatory region capture resequencing, the 90K SNP array, complexity-reduced sequencing (GBS), and RNA-seq-derived SNPs. Variant calling followed GATK best practices; the datasets were merged and imputed with Beagle using the 1000 wheat exomes panel as reference. After filtering on minor allele frequency (MAF > 0.05), missingness, and heterozygosity, 2,021,936 SNPs (seedling panel) and 227,922 SNPs (spike panel) were retained.
- Genetic variance partitioning: Using GCTA, SNPs were grouped by A, B, D genomes to construct GRMs and partition gene expression variance into cis-genomic (same genome as gene) and trans-genomic (other genomes) components for highly variable genes.
- eQTL mapping: Associations were tested with Matrix eQTL using a linear model with the first three SNP principal components as covariates. Associations were called significant at FDR ≤ 1e-5, and significant SNPs in LD (r² ≥ 0.2) and less than 100 kb apart were merged to define a single eQTL. cis-eQTL were defined as those within ±1 Mb of the target gene, and trans-eQTL as those on different chromosomes or genomes. False discovery was assessed by phenotype permutations.
- Functional/epigenomic annotation: eQTL enrichment tested across coding variant classes (deleterious, missense, synonymous), regulatory/open chromatin annotations (MNase, DNaseI, epigenetic states), and histone marks (H3K27ac, H3K4me3, H3K36me3, H3K4me1, H3K27me3, H3K9me2).
- 3D chromatin analysis: Hi-C data (Chinese Spring) processed with Juicer Tools; contact frequencies evaluated for eQTL–gene pairs vs randomized pairs; relationships between -log10 p-values and Hi-C contact frequencies assessed.
- Comparative analyses across genomes: Counted trans-eQTL directional flows among A, B, D; evaluated syntenic co-localization and shared eQTL across homoeologs; related shared eQTL proportion to expression correlation.
- Selection inference: Assessed correlations between cis-eQTL minor allele frequency and effect size for homoeologs versus singletons; Fisher’s z-test for differences.
- Homoeolog expression configurations: Grouped homoeolog pairs by eQTL configurations (none; shared trans only; cis-only; cis of one also trans for the other) and compared expression correlations.
- Joint eQTL–GWAS integration: GWAS was performed with GCTA's mixed linear model association analysis (MLMA) on 14 productivity traits in two populations (a 400-line greenhouse panel and a ~800-line field panel from the 1000 exomes project). Summary-data-based Mendelian randomization (SMR) with the HEIDI heterogeneity test was applied to link gene expression to traits using the seedling and spike eQTL, and an eQTL-based gene co-expression/regulatory network (GCN) was constructed around the SMR candidates.
- Case studies: Investigated the TaSPL14 module and the TaElf3 homoeologs (Elf3-B1, Elf3-D1), including LD patterns and the effects of presence/absence variation (PAV) and deletions.
- Biased homoeolog dosage and traits: Identified 59 negatively correlated homoeologs (Spearman correlation coefficient, SCC < -0.4). For each line, low-expressing alleles were counted using TPM thresholds, and the counts were correlated with trait BLUPs. Ridge regression with 10-fold nested cross-validation was used to predict traits from the expression of these 59 genes, with random gene sets as a comparison. Associations were validated in the independent 1000 exomes panel using counts of cis-eQTL alleles linked to low expression.
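- Homoeolog expression patterns: Homoeolog pairs generally show positively correlated expression across the 198 lines, significantly higher than random gene pairs. Downregulation of one homoeolog reduces total triplet expression, and compensation by the other homoeologs is incomplete: the mean ratio of total expression between groups is below 1:1 (what full compensation would give) but above 2:3 (roughly what silencing one of three equally expressed homoeologs would give with no compensation).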
- Variance partitioning: For top expressed genes, all SNPs explained on average 40.4% of expression variance. D genome contributed less (7.7%) than A (19.1%) and B (13.6%). Cis-genomic SNPs in A/B explained more variance (21.7%, 28.7%) than trans-genomic SNPs (5–17%), whereas in D genome cis (12%) was only slightly higher than trans from A (11%) and B (9%). 6,173 of 34,691 genes with ≥20% explained variance had <1% from cis, indicating predominant trans regulation; 47.8% of these were in the D genome, consistent with diversity loss from the polyploidization bottleneck.
- eQTL discovery: Seedlings data yielded 8,568 cis-eQTL (8,315 genes/8,837 transcripts) and 14,645 trans-eQTL (8,255 genes/8,500 transcripts); spikes data yielded 3,172 cis-eQTL (3,476 transcripts) and 9,891 trans-eQTL (7,250 transcripts). cis-eQTL density peaks were further from TSS in D genome, consistent with lower diversity and longer LD.
- Functional enrichment: Both cis- and trans-eQTL are most strongly enriched among deleterious coding variants, followed by missense and synonymous variants. They are also enriched in regulatory/open chromatin regions and in regions carrying active histone marks (H3K27ac, H3K4me3, H3K36me3, H3K4me1), and depleted in regions carrying the repressive marks H3K27me3 and H3K9me2 and in MNase-resistant regions.
- 3D genome: Stronger eQTL associations correspond to higher Hi-C contact frequencies; trans eQTL–gene pairs show elevated contacts (log10 mean 1.24) versus random (0.92), with 15% of high-contact trans pairs in homoeologous chromosome regions.
- Genome asymmetry in regulation: A and B genomes harbor 4.0× and 3.6× more trans-eQTL targeting D-genome genes than the reverse, mirroring diversity differences. eQTL and targets tend to co-localize in syntenic homoeologous regions; 23% of homoeolog triplets share at least one eQTL, and expression correlation increases with shared eQTL proportion.
- Selection on expression: cis-eQTL minor allele frequency negatively correlates with effect size for both homoeologs (SCC = -0.23, p < 2.2e-16) and singletons (SCC = -0.20, p = 0.002), with no significant difference between them, indicating purifying selection on expression levels of both gene classes.
- Homoeolog dosage impact of eQTL configurations: Mean homoeolog expression correlation (SCC) is high with no eQTL (0.57) or shared trans only (0.59, bimodal), drops with cis-only regulation (0.17), and becomes negative when one cis-eQTL also acts as trans on the other homoeolog (-0.26), indicating cis variants disproportionately drive dosage imbalance.
- GWAS–eQTL integration: cis-eQTL are enriched near trait-associated SNPs; 70 marker-trait associations (MTAs) overlapped cis-eQTL within ±1 kb, versus at most 33 in randomizations. SMR identified 971 (seedlings) and 424 (spikes) candidates at p < 1e-4, of which 329 and 95, respectively, passed the HEIDI test. TaSPL14 (TraesCS5B01G512800) is associated with spike compactness, grain length, and harvest weight, and is connected to FAR1 homologs in the eQTL network. TaElf3-B1 is associated with heading date and spikelet number per spike; a trans-eQTL on chromosome 1D (position 493,768,787 bp) is in strong LD (r² > 0.8) with the Elf3-D1 cis-eQTL, implicating a D-genome PAV/deletion. The Elf3 homoeologs are negatively correlated (SCC = -0.18), with higher Elf3-B1 expression in lines lacking Elf3-D1 (p = 2e-4). Regions around negatively correlated homoeologs show ~2× enrichment of high interchromosomal LD (r² > 0.4) relative to random regions, suggesting non-random allele combinations that are likely due to selection.
- Biased homoeolog expression and traits: Across the 59 negatively correlated homoeologs (SCC < -0.4), each line carried on average 8 low-expressing alleles (range 1–31), and the associated cis-eQTL were common variants (mean MAF 0.30). Only 2 of the 59 low-expression cases were linked to PAV, indicating that regulatory variation predominates. The count of low-expressing alleles correlates positively with grain length (0.26), width (0.41), and weight (0.39), and negatively with heading date (-0.29), spikelet number per spike (-0.35), spike length (-0.19), and plant height (-0.18). Ridge regression on the expression of these 59 genes predicts several productivity traits with correlations of 0.25–0.37, outperforming random gene sets (often in the 99th percentile). Validation in the independent panel links higher counts of low-expressing alleles with increased grain yield and earlier heading.
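The allele-count analysis in the last item can be illustrated with a small sketch. It counts, for each line, the genes whose TPM falls below a cutoff and rank-correlates that count with a trait value. The cutoff (`LOW_TPM`), the gene and line names, the toy expression and grain-width values, and the use of a Spearman correlation here are all assumptions for illustration; the source mentions TPM thresholds and trait BLUPs but does not give these details.

```python
from statistics import mean

LOW_TPM = 1.0  # hypothetical cutoff; the source does not state the threshold


def rank(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


def count_low_expressing(tpm_by_gene, threshold=LOW_TPM):
    """Number of genes in one line with TPM below the threshold."""
    return sum(1 for tpm in tpm_by_gene.values() if tpm < threshold)


# Toy data (invented): TPM per gene for four lines, and one trait value each.
expression = {
    "line1": {"geneA": 0.2, "geneB": 5.0, "geneC": 0.1},
    "line2": {"geneA": 3.0, "geneB": 4.0, "geneC": 6.0},
    "line3": {"geneA": 0.5, "geneB": 0.3, "geneC": 0.0},
    "line4": {"geneA": 2.0, "geneB": 0.4, "geneC": 7.0},
}
grain_width = {"line1": 3.4, "line2": 3.0, "line3": 3.3, "line4": 3.1}

line_ids = sorted(expression)
counts = [count_low_expressing(expression[l]) for l in line_ids]
traits = [grain_width[l] for l in line_ids]
rho = spearman(counts, traits)
print(counts, round(rho, 2))
```

In this toy example a higher count of low-expressing alleles goes with wider grain, the same direction of effect the summary reports for grain width.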
The study demonstrates that genomic variants shaping homoeolog expression dosage are widespread in wheat and are enriched in functional regulatory contexts. Both demography (notably the D-genome bottleneck and gene flow into A/B genomes) and selection have influenced the balance of cis and trans regulation: the D genome shows reduced cis diversity and increased reliance on trans-acting regulation from A/B. Despite redundancy, expression levels of both homoeologs and singletons are under purifying selection, as shown by the negative MAF–effect size relationship. Cis-regulatory variants are primary drivers of dosage imbalance between homoeologs; shared trans regulation tends to maintain coordinated expression across homoeologs, offering partial robustness. Importantly, biased homoeolog dosage—particularly accumulation of low-expression alleles at specific homoeologs—predicts key agronomic traits and reflects trade-offs (grain size vs number/spikelet traits). Elevated interlocus LD and non-random allele combinations among negatively correlated homoeologs suggest selection has acted on dosage combinations, exemplified by the Elf3 homoeologs impacting heading date and spike development. Integrating eQTL with GWAS via SMR identifies plausible causal genes and networks (e.g., TaSPL14 module), offering mechanistic links from regulatory variation to complex trait outcomes.
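The comparison behind the purifying-selection argument, whether the MAF–effect size correlation differs between homoeologs and singletons, can be sketched with Fisher's z-test for two independent correlations. The correlation values below come from the summary, but the sample sizes (`n_homoeologs`, `n_singletons`) are hypothetical placeholders, since the source does not report them. Applying the Fisher transformation to Spearman coefficients is itself an approximation.

```python
from math import atanh, sqrt
from statistics import NormalDist


def fisher_z_test(r1, n1, r2, n2):
    """Two-sided test for a difference between two independent correlations.

    Each correlation is mapped to z = atanh(r); the difference is scaled
    by its standard error sqrt(1/(n1-3) + 1/(n2-3)).
    """
    z1, z2 = atanh(r1), atanh(r2)
    se = sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


# Correlations reported in the summary; sample sizes are ASSUMED for illustration.
n_homoeologs = 5000  # hypothetical
n_singletons = 240   # hypothetical
z_stat, p_value = fisher_z_test(-0.23, n_homoeologs, -0.20, n_singletons)
print(f"z = {z_stat:.2f}, p = {p_value:.2f}")
```

With these placeholder sample sizes the difference is far from significant, which is the pattern the summary describes; the actual test statistic depends on the real numbers of cis-eQTL in each class.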
This work provides a genome-wide map of cis- and trans-eQTL affecting homoeolog expression in hexaploid wheat and shows that cis variants disproportionately drive homoeolog dosage imbalance, which in turn contributes to variation in productivity traits. Demographic history and selection shaped the relative contributions of A, B, and D genomes to expression variance, with the D genome showing reduced cis diversity and greater trans regulation. eQTL are enriched in active regulatory chromatin and align with 3D chromatin contacts. Joint eQTL–GWAS analyses implicate candidate genes and modules (e.g., TaSPL14, TaElf3) underlying agronomic traits. The findings highlight the breeding relevance of manipulating homoeolog expression dosage—through selection or genome editing—to optimize trade-offs between grain size and number and to tailor varieties to environments. Future research should improve resolution of regulatory interactions (higher-resolution Hi-C/ATAC-seq), expand tissues and developmental stages, validate causal regulatory variants functionally, and explore environment-specific selection on homoeolog dosage.
- Hi-C data resolution for wheat was low, limiting precise mapping of regulatory contacts underlying eQTL.
- Population structure affected test statistics for a subset of genes; although permutation-based FDR estimates were low, residual structure could influence some associations.
- eQTL were mapped in two tissues (seedlings and developing spikes); regulatory effects may be tissue- and stage-specific and not fully captured.
- SMR/HEIDI distinguishes pleiotropy from linkage statistically but cannot definitively prove causality without functional validation.
- Presence/absence variation was evaluated using available pangenome resources, but undetected structural variants could still influence expression.
- The conservative cis definition (±1 Mb) and trans definition (different chromosome) may miss some proximal/distant regulatory effects within chromosomes.
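The cis and trans definitions referred to in the last limitation can be written down as a small classifier, which also makes the gap visible: a variant on the same chromosome but outside the window falls into neither class. This is a minimal sketch, not the authors' code. The `Locus` type, the function name, the gene coordinates, and the choice to measure the ±1 Mb window from the gene boundaries (rather than, say, the transcription start site) are assumptions.

```python
from dataclasses import dataclass

CIS_WINDOW_BP = 1_000_000  # ±1 Mb, as stated in the summary


@dataclass(frozen=True)
class Locus:
    chrom: str
    pos: int


def classify_eqtl(snp, gene_chrom, gene_start, gene_end):
    """Label an eQTL SNP relative to its target gene.

    'trans'        : SNP on a different chromosome than the gene
    'cis'          : same chromosome, within the window around the gene
    'unclassified' : same chromosome, outside the window
    """
    if snp.chrom != gene_chrom:
        return "trans"
    if gene_start - CIS_WINDOW_BP <= snp.pos <= gene_end + CIS_WINDOW_BP:
        return "cis"
    return "unclassified"


# The 1D SNP position is taken from the summary; the gene coordinates on
# chromosome 1B are hypothetical placeholders.
gene = ("1B", 685_000_000, 685_004_000)
print(classify_eqtl(Locus("1D", 493_768_787), *gene))  # different chromosome
print(classify_eqtl(Locus("1B", 685_500_000), *gene))  # within 1 Mb
print(classify_eqtl(Locus("1B", 700_000_000), *gene))  # same chromosome, far away
```

The third case is the one the limitation describes: distant regulatory effects within a chromosome are not captured by either definition.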

