Agriculture
Origin and adaptation to high altitude of Tibetan semi-wild wheat
W. Guo, M. Xin, et al.
Bread wheat (Triticum aestivum, AABBDD) originated ~10,000 years ago through hybridization between domesticated tetraploid wheat and Aegilops tauschii. Although it is a temperate crop, it is cultivated on the Tibetan Plateau (~4,268 m a.s.l.), where high UV-B radiation, low temperature, and hypoxia prevail. Such conditions are expected to drive adaptive genomic changes, yet their molecular basis in wheat has remained unclear. A unique Tibetan semi-wild hexaploid wheat (T. aestivum ssp. tibetanum) resembles local landraces phenotypically but has a brittle rachis, suggesting possible de-domestication. Known contributors to rachis brittleness include loci on group 3 chromosomes (especially 3D) and the Q locus on 5A, but the population-scale genomic changes underlying the Tibetan semi-wild traits had not been defined. To address these gaps, the authors assembled a de novo genome of the Tibetan semi-wild accession Zang1817 and re-sequenced 245 diverse wheat accessions (analyzed together with 73 published accessions) to investigate high-altitude adaptation and the origin of Tibetan semi-wild wheat.
Prior work established: (1) the domestication history of wheat and its origin in the Fertile Crescent; (2) that environmental extremes at high altitude can drive adaptive evolution, with stress-tolerance and DNA-repair pathways enriched in other high-altitude plants (e.g., Eutrema spp., Crucihimalaya himalaica, and maca); (3) that Tibetan semi-wild accessions resemble local landraces except for their brittle rachis, a phenotypic hint of de-domestication that still lacked genetic evidence; (4) that rachis brittleness is linked to group 3 chromosomes (notably 3D) and to allelic variation at Q on 5A in wheat, and to Btr1/2 in barley; and (5) that extensive introgressions have shaped the wheat A and B subgenomes, with large-scale resequencing resources available. Collectively, these studies frame expectations for candidate pathways (light, cold, hypoxia, DNA repair, photosynthesis) and for loci involved in spike shattering.
- De novo genome assembly: Sequenced Tibetan semi-wild wheat Zang1817 to >240× coverage from Illumina PCR-free paired-end and mate-pair libraries; assembled with DeNovoMAGIC3 and used 10X Genomics Chromium data for scaffold validation and extension. The assembly comprises 384,307 scaffolds (14.71 Gb; scaffold N50 37.62 Mb). Anchored scaffolds to 21 pseudomolecules using IWGSC RefSeq v1.0 (Chinese Spring, CS). Assessed completeness with BUSCO (plant set).
- Genome annotation: Identified 118,078 high-confidence protein-coding genes using an integrated pipeline (homology-based, ab initio, and RNA-seq–supported). Annotated repeats with combined TE libraries and RepeatMasker; quantified repeat content. Evaluated collinearity and structural variation vs CS using MCScanX, Quota Align, MUMmer, and dot plots. Identified PAVs via read-depth comparison between Zang1817 and CS across 5-kb windows; defined gene-level PAVs similarly.
- Resequencing and variant calling: Re-sequenced 245 accessions (109 Tibetan: 74 semi-wild, 35 landraces; 136 worldwide landraces/cultivars) at ~6.07× average coverage and combined them with 73 published accessions; the joint-calling set comprised 308 samples. Mapped reads to IWGSC v1.0 with BWA-MEM, filtered reads, and removed duplicates; called SNPs/INDELs with GATK HaplotypeCaller/GenotypeGVCFs; applied stringent QC; annotated variants with SnpEff. Final set: 46,431,479 filtered SNPs.
- Population genetics: Built NJ trees (PLINK distances; ape), PCA (EIGENSOFT), and ADMIXTURE models (K = 2–6) from pruned SNPs (MAF ≥ 5%, missingness ≤ 10%, LD r² < 0.4). Focused on 364,856 high-confidence D-subgenome SNPs, with A. tauschii as the outgroup, because introgressions complicate inference from the A and B subgenomes.
- High-altitude adaptation scans: Classified accessions into high-altitude (HA; Tibetan Plateau) and low-altitude (LA; low-altitude regions worldwide) sets. Computed FST in 100-kb windows, treating the A/B and D subgenomes separately, and validated thresholds against empirical distributions from shuffled HA/LA labels. Performed GO enrichment on genes in the top 5% most divergent windows. Combined the 308 resequenced lines with 1,026 whole-exome capture accessions to define HA- and LA-enriched haplotypes of candidate genes and mapped their geographic distributions.
- Flowering time genes: Targeted TaPPD1 homeologs (2A, 2D) for divergence and haplotype analyses between HA and LA.
- Origin of Tibetan semi-wild wheat: Performed phylogenetic, PCA, and ADMIXTURE analyses focused on the D subgenome and computed nucleotide diversity (π). Inferred demography with dadi using 4-fold degenerate SNPs from Tibetan semi-wild (TS) and Tibetan landrace (TL) accessions in subcluster I, comparing eight models and selecting the best fit by log-likelihood.
- Rachis brittleness phenotyping and association: Phenotyped rachis brittleness in the field (Beijing) by manually fracturing spikes at maturity. Ran GWAS with an MLM in TASSEL that included population structure (PCs) and kinship; Bonferroni threshold P ≤ 2.85×10⁻¹⁰. Defined de-domestication (DE, brittle) and domestication (DO, non-brittle) groups for divergence metrics (FST, π, Pi-ratio) and developed a CNV-index (normalized coverage difference) to detect deletions. Identified a 0.8-Mb deletion on chr3D (55.5–56.3 Mb) and scored its presence across accessions. Searched for wheat homologs of barley Btr1/2. Genotyped each accession for a 161-bp TE insertion in TaQ-5A (chr5A:650,129,563) using soft-clipped reads and reads perfectly matching the TE sequence.
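Scaffold N50, reported above as 37.62 Mb, is the length at which the longest scaffolds, taken in decreasing order, first cover at least half of the assembly. A minimal Python sketch of that definition, using made-up toy lengths rather than the Zang1817 scaffolds:

```python
def n50(lengths):
    """Return the N50 of a list of sequence lengths.

    Scaffolds are sorted from longest to shortest and their lengths summed;
    N50 is the length at which the running total first reaches half of the
    total assembly size.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no lengths given")


# Toy scaffold lengths in bp (hypothetical, not from the assembly).
toy = [100, 80, 50, 40, 30]
print(n50(toy))  # 100 + 80 = 180 >= 150, so N50 is 80
```

Because N50 is driven by the longest scaffolds, an assembly can have a high N50 even when most of its 384,307 scaffolds are short.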
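Both the 5-kb PAV scan (Zang1817 vs CS) and the CNV-index used to find the chr3D deletion compare normalized read depth between samples. The summary does not give the exact normalization or cutoff, so this sketch assumes median normalization per sample, defines the index as normalized sample depth minus normalized reference depth in each window, and uses a hypothetical cutoff of -0.8; the window depths are toy values.

```python
from statistics import median


def normalize(depths):
    """Scale per-window read depths by the sample's median window depth.

    This puts samples sequenced to different coverage (e.g. ~6x resequencing)
    on a common scale where a typical single-copy window is about 1.0.
    """
    m = median(depths)
    if m == 0:
        raise ValueError("median depth is zero; cannot normalize")
    return [d / m for d in depths]


def cnv_index(sample_depths, reference_depths):
    """Hypothetical CNV-index: normalized sample depth minus normalized
    reference depth in each window. Values near -1 mean the sample has
    almost no reads where the reference has typical coverage."""
    s = normalize(sample_depths)
    r = normalize(reference_depths)
    return [si - ri for si, ri in zip(s, r)]


def candidate_deletions(sample_depths, reference_depths, cutoff=-0.8):
    """Indices of windows whose CNV-index is at or below the (assumed) cutoff."""
    index = cnv_index(sample_depths, reference_depths)
    return [i for i, value in enumerate(index) if value <= cutoff]


# Toy read depths for six consecutive windows (hypothetical values).
sample = [6, 7, 0, 1, 6, 5]
reference = [6, 6, 6, 7, 5, 6]
print(candidate_deletions(sample, reference))  # [2, 3]
```

Runs of adjacent flagged windows, rather than isolated ones, are what would suggest a contiguous deletion such as the 0.8-Mb chr3D segment.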
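The SNP filters used for the NJ, PCA, and ADMIXTURE analyses (MAF ≥ 5%, missingness ≤ 10%, pairwise LD r² < 0.4) can be illustrated with a simplified greedy pruner. This is not PLINK's `--indep-pairwise` algorithm: the window of 50 previously kept SNPs is a hypothetical choice, genotypes are coded as allele dosages 0/1/2 with `None` for missing calls, and r² is the squared Pearson correlation of dosages over accessions called at both SNPs.

```python
from statistics import fmean


def passes_filters(dosages, min_maf=0.05, max_missing=0.10):
    """MAF and missingness filters; dosages are 0/1/2 or None (missing)."""
    called = [g for g in dosages if g is not None]
    if not called:
        return False
    missing = 1 - len(called) / len(dosages)
    p = sum(called) / (2 * len(called))
    return min(p, 1 - p) >= min_maf and missing <= max_missing


def r_squared(a, b):
    """Squared correlation of dosages over accessions called at both SNPs."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if len(pairs) < 2:
        return 0.0
    xs, ys = zip(*pairs)
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def prune(snps, r2_max=0.4, window=50):
    """Keep a filtered SNP only if r^2 < r2_max with each of the last
    `window` SNPs already kept (SNPs must be in chromosome order)."""
    kept = []
    for snp_id, dosages in snps:
        if not passes_filters(dosages):
            continue
        if all(r_squared(dosages, g) < r2_max for _, g in kept[-window:]):
            kept.append((snp_id, dosages))
    return [snp_id for snp_id, _ in kept]


# Toy dosages for ten inbred accessions (mostly homozygous: 0 or 2).
snps = [
    ("s1", [0, 2, 2, 0, 2, 0, 0, 2, 2, 0]),
    ("s2", [0, 2, 2, 0, 2, 0, 0, 2, 2, 0]),        # perfect LD with s1: pruned
    ("s3", [0, 0, 2, 2, 0, 2, 0, 2, 0, 2]),        # r^2 = 0.04 with s1: kept
    ("s4", [0, 0, 0, 0, 0, 0, 0, 0, 0, 2]),        # MAF 0.10: kept
    ("s5", [0] * 10),                              # monomorphic: fails MAF
    ("s6", [0, None, None, 2, 2, 0, 2, 0, 2, 0]),  # 20% missing: fails
]
print(prune(snps))  # ['s1', 's3', 's4']
```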
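The HA vs LA scan summarizes FST per 100-kb window, keeps the top 5% of windows, and checks the threshold against shuffled labels. The summary does not name the FST estimator, so this sketch assumes Hudson's estimator with numerators and denominators summed across SNPs before dividing, treats each largely homozygous accession as one 0/1 allele call per SNP, and scores a threshold by the fraction of windows from label-shuffled data that reach it. Each group needs at least two accessions; the A/B and D subgenomes would be scanned by separate calls. Positions, genotypes, and the permutation count are toy values.

```python
import random
from collections import defaultdict


def hudson_terms(p1, p2, n1, n2):
    """Per-SNP numerator and denominator of Hudson's FST.

    p1, p2: allele frequencies in the two groups; n1, n2: allele calls per group.
    """
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num, den


def window_fst(snps, labels, window=100_000):
    """Return {window_start: FST} for 0/1 allele calls grouped as 'HA' vs 'LA'.

    snps: list of (position, calls) with one call per accession, in label order.
    Numerators and denominators are summed within a window before dividing.
    """
    ha = [i for i, lab in enumerate(labels) if lab == "HA"]
    la = [i for i, lab in enumerate(labels) if lab == "LA"]
    sums = defaultdict(lambda: [0.0, 0.0])
    for pos, calls in snps:
        p1 = sum(calls[i] for i in ha) / len(ha)
        p2 = sum(calls[i] for i in la) / len(la)
        num, den = hudson_terms(p1, p2, len(ha), len(la))
        start = (pos // window) * window
        sums[start][0] += num
        sums[start][1] += den
    # Windows in which every SNP is monomorphic in both groups carry no information.
    return {w: n / d for w, (n, d) in sums.items() if d > 0}


def top_fraction_cutoff(values, fraction=0.05):
    """Smallest value among the top `fraction` of values (at least one value)."""
    ordered = sorted(values, reverse=True)
    k = max(1, round(len(ordered) * fraction))
    return ordered[k - 1]


def shuffled_exceedance(snps, labels, cutoff, n_perm=200, seed=1):
    """Fraction of window FSTs from label-shuffled data that are >= cutoff."""
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = total = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        for value in window_fst(snps, shuffled).values():
            hits += value >= cutoff
            total += 1
    return hits / total if total else 0.0


# Toy data: 4 HA and 4 LA accessions, one SNP in each of two 100-kb windows.
labels = ["HA"] * 4 + ["LA"] * 4
snps = [
    (10_000, [1, 1, 1, 1, 0, 0, 0, 0]),   # fully differentiated
    (150_000, [1, 0, 1, 0, 1, 0, 1, 0]),  # same frequency in both groups
]
fst = window_fst(snps, labels)
cutoff = top_fraction_cutoff(list(fst.values()))
print(fst, cutoff, shuffled_exceedance(snps, labels, cutoff))
```

Hudson's estimator can be negative when two groups differ less than sampling alone would predict, as in the second toy window.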
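Nucleotide diversity π is the average number of pairwise differences per site. A minimal sketch, assuming 0/1 allele calls per accession (one haplotype per largely homozygous line), windows in which invariant sites contribute zero, and a Pi-ratio oriented as π(DO)/π(DE); the summary does not state the orientation. As a quick arithmetic check on the genome-wide values reported for the two Tibetan groups, 5.67×10⁻⁴ / 5.38×10⁻⁴ ≈ 1.05, so landraces carry about 5% more diversity than semi-wild wheat.

```python
def site_pi(calls):
    """Unbiased pairwise diversity at one biallelic site.

    calls: 0/1 allele calls, one per sampled sequence. With n sequences and
    k copies of allele 1, the fraction of differing pairs is k(n-k) / C(n, 2).
    """
    n = len(calls)
    k = sum(calls)
    return 2 * k * (n - k) / (n * (n - 1))


def window_pi(variable_sites, window_length):
    """Per-bp diversity in a window; invariant sites add 0 to the sum."""
    return sum(site_pi(c) for c in variable_sites) / window_length


def pi_ratio(do_sites, de_sites, window_length):
    """Hypothetical orientation: pi in non-brittle (DO) over pi in brittle (DE).
    Values well above 1 would flag windows where the brittle group lost diversity."""
    return window_pi(do_sites, window_length) / window_pi(de_sites, window_length)


# Toy 1-kb window, four sequences per group, one SNP each.
do_sites = [[1, 1, 0, 0]]  # site pi = 2*2*2/12
de_sites = [[1, 0, 0, 0]]  # site pi = 2*1*3/12
print(window_pi(de_sites, 1000), pi_ratio(do_sites, de_sites, 1000))
```

A window with no variation in the DE group would make the ratio undefined, so a real scan would need a rule for such windows.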
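The TaQ-5A TE genotyping looks for reads that align up to the insertion point (chr5A:650,129,563) and are soft-clipped there, with the clipped bases matching the TE exactly. This sketch parses SAM-style CIGAR strings in plain Python instead of using a BAM library; the ±5-bp junction tolerance, the 10-base minimum clip length, the check of both TE orientations, and the toy TE and read sequences are all assumptions, since the 161-bp TE sequence is not given here.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance along the reference


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def soft_clips(pos, cigar, seq):
    """Yield (junction, clipped_bases) for each soft clip of one aligned read.

    pos is the 1-based reference position of the first aligned base (SAM POS).
    A leading clip joins the reference at pos; a trailing clip joins it at
    the first reference base after the aligned block. Hard clips are ignored
    because their bases are not stored in the read sequence.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    if not ops:
        return
    if ops[0][1] == "S":
        yield pos, seq[: ops[0][0]]
    if len(ops) > 1 and ops[-1][1] == "S":
        ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
        yield pos + ref_len, seq[-ops[-1][0]:]


def supports_insertion(read, site, te_seq, tolerance=5, min_clip=10):
    """True if a clip of >= min_clip bases near `site` matches the TE exactly."""
    pos, cigar, seq = read
    for junction, clip in soft_clips(pos, cigar, seq):
        if len(clip) >= min_clip and abs(junction - site) <= tolerance:
            if clip in te_seq or revcomp(clip) in te_seq:
                return True
    return False


# Toy example (hypothetical sequences and coordinates).
te = "TTGACCATGGAACTTGCAGT"
flank = "ACGT" * 7 + "AC"  # 30 bp of genomic sequence
reads = [
    (1000, "12S30M", "TTGACCATGGAA" + flank),  # clip matches the TE start
    (950, "30M12S", flank + "G" * 12),         # clip does not match the TE
    (990, "42M", "A" * 42),                    # no clip at all
]
support = sum(supports_insertion(r, 1000, te) for r in reads)
print(support)  # 1
```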
- High-quality genome assembly of Tibetan semi-wild Zang1817: 14.71 Gb total; scaffold N50 37.62 Mb; 21 pseudomolecules covering 95.51% (14.05 Gb); 118,078 genes; repeats 82.74% (retrotransposons 63.91%); BUSCO completeness 99.51% overall (94.38–96.81% per subgenome). Identified 22,782,409 SNPs in syntenic blocks vs CS; 184,913 SNPs in coding regions with 95,734 nonsynonymous and 1,634 stop-gain changes.
- Structural and presence/absence variation vs CS: 345.73 Mb Zang1817-specific segments (1,875 genes; enriched in polysaccharide binding and serine-type endopeptidase inhibitor activity); 389.29 Mb segments absent in Zang1817 (2,540 genes; enriched in photosynthesis-related terms). PAVs concentrated at chromosome ends. Detected a 6D inversion (366–375 Mb). Marked expansion of α-gliadin gene clusters, e.g., 34 α-gliadins in a 2.4 Mb region of 6A vs 9 in CS.
- HA adaptation signals: Between HA and LA, 1,905 highly divergent genomic windows (top 5% FST) encompass 3,847 genes, enriched in serine-type peptidase activity, photosynthesis (chloroplast thylakoid, chlorophyllide a oxygenase), and DNA repair.
- TaHY5-like (TraesCS2A02G142800), an HY5 homolog on 2A, shows two missense variants (c.1049 G>A, p.Arg350Gln; c.1034 G>T, p.Gly345Val) distinguishing HA (AT haplotype) and LA (GC) groups; >86% of Tibetan wheat carry AT; LA predominantly GC. Nepal high-altitude lines share HA haplotypes at TaHY5-like and downstream targets.
- Additional candidates that diverge between HA and LA include TaTDP1 (TraesCS2B02G244300; tyrosyl-DNA phosphodiesterase), an ATG10 ortholog (TraesCS2B02G371600), TaERF4 (TraesCS3B02G357500; cold acclimation), TaCHLH (TraesCS2A02G134000; chlorophyll synthesis), and TraesCS2A01G145300 and TraesCS7B02G178700 (photosynthesis). The promoters of TaERF4, TaPCO1, TaCHLH, and TaTDP1 contain HY5-binding motifs (CACGTG), supporting selection on a regulatory module centered on TaHY5-like.
- Flowering-time adaptation: TaPPD1-2A (TraesCS2A02G081900; c.1202 A>G, p.Asp401Gly) and TaPPD1-2D (TraesCS2D02G079600; c.1018 C>T, p.Arg340*) show strong divergence; the HA-favored alleles predominate in HA (87.61% combined) but are less frequent in LA (33.33%). The HA haplotype combination occurs only on the Tibetan Plateau (51% of the HA group), consistent with delayed heading helping plants complete their life cycle at high altitude.
- Origin of Tibetan semi-wild wheat: D-subgenome phylogeny, PCA, and ADMIXTURE cluster Tibetan semi-wild wheat with Tibetan landraces, apart from other Chinese and non-Chinese groups. Within Clade III, many semi-wild accessions group closely with Tibetan landraces, indicating a feral (de-domesticated) origin. Genetic diversity is slightly lower in semi-wild wheat (π = 5.38×10⁻⁴) than in landraces (π = 5.67×10⁻⁴). Demographic modeling supports de-domestication from Tibetan landraces followed by a bottleneck without migration.
- Rachis brittleness genetics (de-domestication footprint):
  - A 0.8-Mb deletion on chr3D (55.5–56.3 Mb) explains ~20.51% of brittleness variation. The region harbors two BRITTLE RACHIS-LIKE genes (TraesCS3D02G103200, TraesCS3D02G103400) homologous to barley Btr1/2; these D-subgenome homologs are lost in Zang1817, whereas the A and B homeologs are intact.
  - A 161-bp TE insertion in exon 5 of TaQ-5A (TraesCS5A02G473800) on chr5A causes a frameshift and premature stop and accounts for ~28.96% of brittleness variation; the insertion was validated previously and is present in Zang1817.
  - Some accessions carrying either variant are non-brittle, implying that additional, interacting loci influence the rachis phenotype.
- Overall, high-altitude conditions are associated with extensive genomic reshaping and with selection on a TaHY5-like-centered pathway that integrates light, DNA repair, cold, hypoxia, and photosynthesis; Tibetan semi-wild wheat follows a de-domestication trajectory marked by specific structural and allelic changes.
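The variant descriptions use HGVS-style coding coordinates (c.) and protein changes (p.). A short consistency check, assuming only the standard genetic code: it maps each c. position to its codon and lists which reference codons would make the stated nucleotide change produce the stated amino-acid change. The actual reference codons are not given in this summary, so the output lists compatible candidates rather than the wheat sequence.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
THREE_TO_ONE = {"Arg": "R", "Gln": "Q", "Gly": "G", "Val": "V", "Asp": "D", "Ter": "*"}


def codon_of(c_pos):
    """Map a 1-based CDS position to (codon number, position within codon)."""
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1


def compatible_codons(c_pos, ref, alt, ref_aa, alt_aa):
    """Codons for ref_aa with `ref` at the variant position that become
    alt_aa when that base is changed to `alt`."""
    codon_no, offset = codon_of(c_pos)
    i = offset - 1
    hits = []
    for codon, aa in CODE.items():
        if aa != THREE_TO_ONE[ref_aa] or codon[i] != ref:
            continue
        mutated = codon[:i] + alt + codon[i + 1:]
        if CODE[mutated] == THREE_TO_ONE[alt_aa]:
            hits.append(codon)
    return codon_no, hits


variants = [
    (1049, "G", "A", "Arg", "Gln"),  # TaHY5-like, p.Arg350Gln
    (1034, "G", "T", "Gly", "Val"),  # TaHY5-like, p.Gly345Val
    (1202, "A", "G", "Asp", "Gly"),  # TaPPD1-2A, p.Asp401Gly
    (1018, "C", "T", "Arg", "Ter"),  # TaPPD1-2D, p.Arg340*
]
for v in variants:
    print(v[0], compatible_codons(*v))
```

All four c. positions map to the codon numbers named in their p. descriptions (350, 345, 401, 340), and each change has at least one compatible codon; for p.Arg340* the only compatible reference codon is CGA.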
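The HY5-binding motif reported in the promoters of TaERF4, TaPCO1, TaCHLH, and TaTDP1 is CACGTG, which is its own reverse complement, so a single-strand scan already finds sites on both strands. A minimal scanner, assuming promoter sequences have already been extracted as strings; the toy sequence is hypothetical, and the promoter length used in the study is not stated here.

```python
def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def motif_sites(promoter, motif="CACGTG"):
    """0-based start positions of every (possibly overlapping) motif match.

    If the motif is not palindromic, matches of its reverse complement are
    reported as well so that both strands are covered.
    """
    seq = promoter.upper()
    hits = set()
    for pattern in {motif, revcomp(motif)}:
        start = seq.find(pattern)
        while start != -1:
            hits.add(start)
            start = seq.find(pattern, start + 1)
    return sorted(hits)


# Hypothetical upstream sequence (not from a wheat gene).
toy_promoter = "ttgaCACGTGacgtacacgtgA"
print(motif_sites(toy_promoter))  # [4, 15]
print(revcomp("CACGTG") == "CACGTG")  # True: the motif is palindromic
```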
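The variance explained by the chr3D deletion (~20.51%) and the TaQ-5A TE insertion (~28.96%) comes from the GWAS, whose mixed linear model also accounts for population structure and kinship. As a simplified illustration only, a single-marker R² from ordinary least squares equals the squared Pearson correlation between variant presence and the brittleness score; the toy genotypes and scores are made up, and include one non-brittle carrier, echoing the observation above. Because each R² is computed one locus at a time, the two percentages need not add up to the variance the two variants explain jointly.

```python
from statistics import fmean


def single_marker_r2(marker, phenotype):
    """Squared Pearson correlation between a 0/1 marker and a phenotype.

    For one predictor plus an intercept this equals the OLS R^2. It ignores
    kinship and population structure, unlike a mixed linear model.
    """
    mx, my = fmean(marker), fmean(phenotype)
    sxy = sum((x - mx) * (y - my) for x, y in zip(marker, phenotype))
    sxx = sum((x - mx) ** 2 for x in marker)
    syy = sum((y - my) ** 2 for y in phenotype)
    if sxx == 0 or syy == 0:
        raise ValueError("marker or phenotype has no variation")
    return sxy * sxy / (sxx * syy)


# Toy data: 1 = carries the variant / brittle rachis, 0 = not.
deletion = [1, 1, 1, 0, 0, 0, 0, 0]
brittle = [1, 1, 0, 0, 0, 1, 0, 0]
print(round(single_marker_r2(deletion, brittle), 3))  # 0.218
```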
The study clarifies how wheat adapted to the Tibetan Plateau’s combined stresses (high light/UV-B, cold, hypoxia) by revealing selection on a TaHY5-like-centered regulatory network and on photoperiod genes (TaPPD1). The convergence of haplotype patterns across HA regions (Tibet and Nepal) supports altitude-driven selection rather than local drift. Functional targets downstream of HY5 (e.g., ERF4, CHLH, TDP1, PCO1) show coordinated haplotype divergence and carry HY5-binding motifs, suggesting a co-evolved module that optimizes DNA repair, chlorophyll synthesis, cold acclimation, and hypoxia responses. Population analyses indicate that Tibetan semi-wild wheat arose through de-domestication from Tibetan landraces, followed by a bottleneck. The genetic architecture of rachis brittleness in the semi-wild form involves a 0.8-Mb chr3D deletion encompassing Btr-like homologs and a TE insertion in TaQ-5A, which together explain a substantial portion of phenotypic variance, while additional loci likely modulate the trait. These findings address the core questions of high-altitude adaptation mechanisms and the origin of Tibetan semi-wild wheat, and highlight genomic targets for breeding HA-resilient wheat.
This work delivers a high-quality genome assembly for Tibetan semi-wild wheat and, to date, the largest resequencing analysis of altitude adaptation in Tibetan and worldwide wheat. It demonstrates that high-altitude environments have reshaped the wheat genome, with strong selection on a TaHY5-like-mediated pathway integrating responses to light, UV/DNA repair, cold, and hypoxia, and on photoperiod genes fine-tuning flowering time. It provides genomic evidence that Tibetan semi-wild wheat is a feral, de-domesticated form of local landraces, with rachis brittleness associated with a 0.8-Mb chr3D deletion harboring Btr-like genes and a TE insertion in TaQ-5A. These insights supply candidate haplotypes and loci for breeding wheat adapted to extreme environments. Future work should functionally validate the identified HA haplotypes and regulatory interactions, dissect additional contributors to rachis brittleness, and explore the translational use of HA-adapted alleles in elite germplasm.
- Causality of HA-adapted haplotypes (e.g., TaHY5-like and downstream targets) is inferred from population genetics and haplotype enrichment; functional validation in wheat is still needed.
- Although the chr3D deletion and the TaQ-5A TE insertion explain a substantial share of the variance, they do not fully determine rachis brittleness, indicating that additional loci and gene interactions remain to be identified.
- Potential confounding due to complex population structure and historical introgressions, especially in A/B subgenomes, was mitigated but cannot be fully excluded.
- Phenotyping for brittleness was conducted in a single environment/year, which may limit detection of genotype-by-environment effects.