Multiple wheat genomes reveal global variation in modern breeding
S. Walkowiak, L. Gao, et al.
Wheat is a staple crop grown worldwide, and its production must increase by more than 50% by 2050 to meet demand. Progress in wheat genomics has been impeded by the crop's very large, complex, and repetitive genome, and by the scarcity of high-quality genome assemblies representing within-species diversity. Although chromosome-level assemblies for tetraploid and hexaploid wheat have recently become available, they do not capture the breadth of genomic variation needed for crop improvement. Prior comparative studies relied chiefly on exome capture, low-coverage sequencing, or scaffolded assemblies, which limited resolution. To address this gap, the authors generated multiple reference-quality assemblies from globally important bread wheat lines to characterize genomic diversity shaped by breeding and to provide resources that accelerate functional discovery and breeding.
The study builds on prior wheat reference assemblies and pangenome efforts, highlighting challenges in assembling large, repeat-rich polyploid genomes. Previous comparative analyses in bread wheat have largely used exome capture, low-coverage resequencing, or scaffolded assemblies, which limited insights into structural variation, introgressions, and gene content diversity. Recent advances produced chromosome-level assemblies for both tetraploid and hexaploid wheat and related species (including barley), but a lack of multiple high-quality assemblies within bread wheat hindered comprehensive within-species variation analyses. The authors position their work as extending these resources by delivering multiple reference-quality assemblies to capture global breeding diversity.
- Generated ten reference-quality pseudomolecule assemblies (RQAs) and five scaffold-level assemblies of hexaploid bread wheat lines from global breeding programs.
- De novo assembly produced contigs (contig N50 > 48 kb) that were combined into scaffolds (scaffold N50 > 10 Mb) spanning >14.2 Gb per RQA. More than 94% of scaffolds were ordered and oriented into 21 chromosome pseudomolecules using 10X Genomics linked reads and Hi-C data. Assemblies were curated and validated, including independent validation of scaffold placement and orientation (e.g., for CDC Landmark) with Oxford Nanopore long-read sequencing.
- Assessed completeness using BUSCO, identifying >97% of expected gene content in each genome.
- Projected ~107,000 high-confidence gene models from the Chinese Spring reference onto each RQA to assess gene content and orthology; analyzed synonymous diversity and Tajima’s D across homeologues.
- Identified presence/absence variation (PAV), tandem duplications, and gene copy number variation (CNV); investigated restorer of fertility (Rf) gene families, discovering a previously undescribed mTERF clade.
- Performed de novo annotation of NLR (NB-ARC-LRR) loci across genomes; redundancy analyses estimated shared and unique NLR signatures and saturation with increasing numbers of genomes.
- Annotated transposable elements; identified 81.6% TE content (69% LTR retrotransposons, 12.5% DNA transposons). Called 1.22×10^6 full-length LTR retrotransposons (fl-LTRs); used fl-LTR polymorphisms and density, especially RLC-Angela elements, to detect introgressed chromosomal segments.
- Traced introgression sources using pedigree information and whole-genome resequencing of putative donor species/accessions (Thinopyrum ponticum, Triticum timopheevii, Aegilops ventricosa) and alignment to RQAs.
- Mapped centromeres using CENH3 ChIP-seq, determining positions and sizes, and identifying shifts/inversions relative to consensus.
- Detected large-scale structural variants via pairwise genome alignments, Hi-C directionality bias, long-read sequencing, and cytological karyotyping; characterized a Robertsonian translocation between chromosomes 5B and 7B.
- Assessed translocation frequency in a panel of 538 UK wheat lines; evaluated recombination and synteny around the Ph1 locus.
- Developed haplotype visualizations and quantified haplotype blocks along chromosomes; applied haplotype-based mapping to clone Sm1 (orange wheat blossom midge resistance) using high-resolution genetic mapping anchored to CDC Landmark, refining the locus to a 587-kb interval.
- Validated the Sm1 candidate gene structure with Oxford Nanopore reads; assessed expression via cDNA; generated two EMS-induced loss-of-function mutants in an Sm1 carrier background; developed a KASP marker to discriminate resistant from susceptible alleles.
- Used genotyping-by-sequencing to track the Ae. ventricosa 2NS introgression across panels and assess association with grain yield.
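The contig and scaffold N50 values quoted for the assemblies follow a standard definition: sort the sequences from longest to shortest and report the length of the sequence at which the running total first reaches half of the assembly size. A minimal sketch of that calculation is given here; the function name `n50` and the toy lengths are illustrative assumptions, not part of the authors' pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths in kb (not data from the study).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # prints 70: 80 + 70 = 150, half of the 300 kb total
```

A higher N50 means that fewer, longer sequences account for half of the assembly, which is why the jump from contig N50 (tens of kb) to scaffold N50 (tens of Mb) reflects the contribution of the scaffolding step.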
- Produced ten reference-quality pseudomolecule assemblies and five scaffold-level assemblies of hexaploid wheat capturing global breeding diversity; the assemblies exhibited high collinearity with Chinese Spring and >97% BUSCO completeness, and >94% of scaffolds were placed into 21 pseudomolecules.
- Gene content: projected gene counts ranged from 118,734 to 120,967 per line; ~73.5% of genes per cultivar were in complete orthologous groups across lines; ~12% of genes showed presence/absence variation; ~26% of projected genes were in tandem duplications, indicating substantial CNV.
- Diversity across homeologues: low correlations in synonymous nucleotide diversity (π, r=0.11–0.29) and Tajima’s D (r=0.02–0.06) across subgenome homeologues, consistent with polyploidy expanding targets of selection.
- Rf gene analysis: identified a previously undescribed mTERF clade with evolutionary patterns similar to Rf-like PPR proteins, advancing resources for hybrid wheat breeding.
- NLR repertoire: ~2,500 NLR-signature loci per genome; only 31–34% of NLR signatures shared across all genomes; unique NLR signatures per cultivar ranged from 22 to 192. Saturation analysis indicated 90% of the NLR complement is captured with 8–11 genomes (depending on 95–100% identity). Total unique NLR signatures across all lines ranged from 5,905 (98% identity) to 7,780 (100% identity), underscoring extensive immune receptor diversity.
- Transposable elements: overall TE content 81.6% (69% LTR, 12.5% DNA transposons). Annotated 1.22×10^6 full-length LTRs; unique fl-LTRs (n≈147,450) were younger (median ~0.9 Myr) and enriched distally, whereas shared fl-LTRs were older (median ~1.3 Myr) and more pericentric.
- Introgressions identified and validated via TE signatures and donor alignments:
  - Thinopyrum ponticum introgression spanning ~60 Mb on chromosome 3D in LongReach Lancer (carrying Lr24/Sr24).
  - Triticum timopheevii material aligning across ~427 Mb of chromosome 2B in LongReach Lancer (carrying Sr36).
  - Aegilops ventricosa 2NS segment (~33 Mb) on 2A present in Jagger, Mace, SY Mattis, CDC Stanley; region contained 535 high-confidence genes, >10% defense-related including NLRs; ~60 cytochrome P450 genes identified. The 2NS frequency has increased in breeding germplasm and is associated with higher grain yield.
  - In total, 341 segments >20 Mb with unique/rare fl-LTR patterns were detected; 273 unique to a single genome; most numerous in spelt accession PI190962.
- Centromere dynamics: CENH3 ChIP-seq mapped single active centromeres per chromosome (sizes ~7.5–9.6 Mb). Observed pericentric inversions (e.g., on 4B, 5B) causing centromere position shifts and a ~25 Mb shift of Cen4D in Chinese Spring without a structural event, suggesting a shift to a non-homologous site.
- Large structural variation: discovered a Robertsonian translocation between 5B and 7B in ArinaLrFor, SY Mattis, and Claire, yielding recombined chromosomes of ~488 Mb (5BS/7BS) and ~993 Mb (7BL/5BL; the largest wheat chromosome). Breakpoints mapped within a ~5-kb GAA microsatellite (7BL/5BL). The translocation was present in 66% of a panel of 538 UK lines and was selectively neutral; the recombined chromosomes pair and recombine freely with 5B/7B, and the Ph1 region remained syntenic.
- Haplotype-based cloning of Sm1 (OWBM resistance): identified a shared 7.3-Mb resistant haplotype on 2B in CDC Landmark, Robigus, and Paragon; fine-mapped Sm1 to a 587-kb interval; candidate is an NB-ARC-LRR gene with integrated serine/threonine kinase and major sperm protein (MSP) domains (NB-ARC present in resistant lines; susceptible lines lacked NB-ARC). Two independent EMS mutations (G182R in NB-ARC; W98* truncation) conferred susceptibility, supporting causality. Developed a low-cost KASP marker that perfectly discriminates resistant vs susceptible lines.
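The NLR saturation result above (90% of the complement captured with 8 to 11 genomes) can be illustrated with a simple accumulation curve: add genomes one at a time in random order, track the union of signatures seen so far, and record how many genomes are needed to reach a target fraction. The sketch below is an assumption-laden illustration rather than the authors' method; the function `genomes_needed`, its parameters, and the toy signature sets are all hypothetical, and signatures are treated as already clustered at a fixed identity threshold.

```python
import random


def genomes_needed(signatures_by_genome, fraction=0.9, permutations=200, seed=0):
    """Mean number of genomes needed to capture `fraction` of all distinct signatures.

    signatures_by_genome maps a genome name to a set of signature IDs.
    Genome order is shuffled `permutations` times and the results averaged.
    """
    if not signatures_by_genome:
        raise ValueError("need at least one genome")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    names = list(signatures_by_genome)
    total = len(set().union(*signatures_by_genome.values()))
    target = fraction * total
    counts = []
    for _ in range(permutations):
        rng.shuffle(names)
        seen = set()
        for k, name in enumerate(names, start=1):
            seen |= signatures_by_genome[name]
            if len(seen) >= target:
                counts.append(k)
                break
    return sum(counts) / len(counts)


# Hypothetical signature sets for four genomes (not data from the study).
toy = {
    "genome_A": {1, 2, 3, 4},
    "genome_B": {3, 4, 5},
    "genome_C": {5, 6},
    "genome_D": {1, 6, 7, 8, 9, 10},
}
print(genomes_needed(toy, fraction=0.9))
```

Averaging over many random orders matters because the first genome added always looks entirely novel; a single ordering would overstate the contribution of whichever line happened to come first.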
By assembling multiple reference-quality wheat genomes representing diverse global breeding programs, the study captures within-species genomic variation previously inaccessible with single references or reduced-representation approaches. The findings reveal extensive structural variation, introgressions from wild relatives, and considerable diversity in gene content, including immune receptor (NLR) repertoires and fertility restoration candidates, demonstrating how polyploidy and CNV shape adaptation and agronomic traits. Transposable element signatures proved powerful for detecting and validating introgressions, enabling precise delineation of beneficial segments such as the Ae. ventricosa 2NS region associated with disease resistance and yield. Centromere mapping clarified dynamic centromere behavior and corrected prior misconceptions about multiple active centromeres. Discovery of a widespread 5B/7B translocation and its neutrality provides insights into chromosome evolution in breeding populations. Haplotype-aware mapping allowed cloning of Sm1, directly translating the multi-genome resource into a breeder-relevant trait, and marker development (KASP) enables immediate deployment. Collectively, these resources and insights directly address the need for comprehensive genomic tools to accelerate wheat improvement.
The authors deliver ten reference-quality chromosome-scale assemblies and five scaffolded assemblies of bread wheat that collectively reveal global genomic variation shaped by modern breeding. They characterize SNPs, PAV, CNV, introgressions, centromere shifts, and large structural variants, and demonstrate practical applications via detailed NLR catalogs, discovery of an Rf-related mTERF clade, TE-based introgression mapping, and haplotype-guided cloning of the insect resistance gene Sm1 with an associated diagnostic marker. These resources enable high-resolution manipulation of genomic segments, facilitate functional gene discovery, and support marker development using haplotype blocks. Future work includes functional validation of complex resistance gene architectures (e.g., NB-ARC-LRR-kinase-MSP in Sm1), cloning and characterization of additional breeding targets within introgressed segments (e.g., rust and blast resistance loci), and expanding multi-genome analyses to further capture global diversity for accelerated wheat breeding.
While Sm1 was mapped and a strong candidate with NB-ARC-LRR-kinase-MSP architecture was identified and supported by EMS loss-of-function mutations, additional research is needed to functionally validate the integrated domains and their roles in OWBM resistance. The study also represents a first step toward characterizing causal genes within several introgressed segments; comprehensive functional dissection of these regions remains to be completed.