
Multiple wheat genomes reveal global variation in modern breeding
S. Walkowiak, L. Gao, et al.
This study uses advances in genomics to characterize the genomic diversity of hexaploid wheat across multiple cultivars and highlights key genes associated with disease and insect resistance, with the aim of supporting improved, more resilient wheat cultivars.
Introduction
Wheat (*Triticum* spp.) is a staple food globally, and increasing its production is crucial to meeting future food demand. To keep pace with population growth, wheat production needs to increase by more than 50% by 2050. Achieving this requires comprehensive genomic resources from global breeding programs, so that within-species allelic diversity can be identified and allele combinations optimized for superior cultivars. Two wheat species dominate global production: durum wheat (*Triticum turgidum* ssp. *durum*), used for pasta and couscous, and bread wheat (*Triticum aestivum*), used for bread and noodles. These are allotetraploid (AABB) and allohexaploid (AABBDD), respectively, with the A, B, and D subgenomes derived from three ancestral diploid species that diverged 2.5 to 6 million years ago. The large genome size (16 Gb for bread wheat), the high sequence similarity between subgenomes, and the abundance of repetitive elements (approximately 85%) have historically hindered wheat genome assembly. Chromosome-level assemblies are now available for both tetraploid and hexaploid wheat, but they do not fully capture the within-species genomic variation that is crucial for crop improvement. Comparative genomic data from multiple individuals are needed to accelerate bread wheat research and breeding, yet previous comparative studies were limited to exome-capture sequencing, low-coverage sequencing, and whole-genome scaffolded assemblies. This research addresses the gap by generating multiple reference-quality genome assemblies and analyzing the genome variation among bread wheat lines that has resulted from past breeder selection. The analysis shows that these lines differ substantially from one another, which has direct consequences for wheat breeding.
Literature Review
Previous research on wheat genomics has been hampered by the complexity of the large and highly repetitive wheat genome. Early efforts produced draft sequences, but chromosome-level assemblies have only recently become available for both tetraploid and hexaploid wheat (references 1, 11, 12, 18, 20, 49). These resources provide valuable insights but lack the breadth of genomic variation across diverse wheat lines that effective breeding requires. Studies have explored wheat exomes (reference 14) and used low-coverage sequencing (reference 13) and scaffolded assemblies (references 15-17), but each of these approaches captures only part of the variation, so comprehensive characterization of genomic variation across multiple wheat lines remained a significant challenge. The studies in references 4 and 5 investigated the genetic diversity and ancestry of modern wheat, highlighting the role of wild-relative introgression. Reference 19 profiles the genetic diversity of Australian wheat, and reference 20 characterizes haplotype variation. Despite these advances, a robust comparative genomic resource based on high-quality reference genomes representing the diversity of global wheat breeding programs was lacking, which motivated the current study.
Methodology
This study generated ten reference-quality pseudomolecule assemblies (RQAs) and five scaffold-level assemblies of hexaploid wheat. For each RQA, contigs were assembled de novo and then joined into scaffolds using 10X Genomics linked reads and Hi-C sequencing; the Hi-C data were also used to order and orient the scaffolds. Assembly completeness was assessed using BUSCO analysis, and Oxford Nanopore long-read sequencing independently validated scaffold placement and orientation for CDC Landmark. The new assemblies were combined with existing datasets to place the lines in a global context. Genetic relationships were determined using PCA, phylogenetic tree construction based on SNP data, and Jaccard similarity based on presence/absence variation (PAV). Gene content was evaluated by projecting high-confidence gene models onto the RQAs and analyzing SNPs, indels, PAVs, and copy number variations (CNVs). Detailed analyses were carried out on specific gene families, including restorer of fertility (Rf) genes and nucleotide-binding leucine-rich repeat (NLR) proteins. Transposable element content and composition were characterized, and unique and shared full-length LTR retrotransposons were analyzed to identify potential introgressions from wild relatives. Centromere positions were identified using CENH3 ChIP-seq. Large structural variants were identified through pairwise genome alignments, Hi-C data, and Oxford Nanopore long-read sequencing, and were further validated by cytological karyotyping. Haplotype variation was analyzed, and a high-resolution genetic mapping approach was used to identify the *Sm1* gene responsible for resistance to the orange wheat blossom midge (OWBM). Genotyping by sequencing and Kompetitive allele-specific PCR (KASP) assays were performed to validate and expand the findings.
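As an illustration of the PAV-based comparison mentioned above, the following is a minimal sketch of pairwise Jaccard similarity, assuming each line's PAV calls have already been reduced to the set of gene IDs scored as present. The line names, gene IDs, and function names are hypothetical and are not taken from the study; the study's actual pipeline may differ.

```python
from itertools import combinations


def jaccard_similarity(present_a, present_b):
    """Jaccard index: shared present genes divided by genes present in either line."""
    union = present_a | present_b
    if not union:
        # Convention chosen for this sketch: two empty sets count as identical.
        return 1.0
    return len(present_a & present_b) / len(union)


def pairwise_jaccard(pav_calls):
    """Return {(line_1, line_2): similarity} for every unordered pair of lines."""
    return {
        (a, b): jaccard_similarity(pav_calls[a], pav_calls[b])
        for a, b in combinations(sorted(pav_calls), 2)
    }


# Hypothetical toy data: each line maps to the genes called "present".
pav_calls = {
    "line_A": {"g1", "g2", "g3", "g4"},
    "line_B": {"g1", "g2", "g3", "g5"},
    "line_C": {"g1", "g6"},
}

similarities = pairwise_jaccard(pav_calls)
for pair, value in sorted(similarities.items()):
    print(pair, round(value, 2))
```

In this toy example, line_A and line_B share three of the five genes present in either line, so they score higher than either does against line_C. A matrix of such values could then feed a clustering or tree-building step.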
Key Findings
The study generated fifteen high-quality wheat genome assemblies, including ten reference-quality pseudomolecule assemblies (RQAs) and five scaffold-level assemblies. Comparative analysis of these assemblies revealed extensive genomic variation, including:
1. **Extensive Structural Rearrangements:** The wheat lines differ by numerous large structural rearrangements, including inversions and translocations. A notable example is a translocation between chromosomes 5B and 7B, observed in several lines.
2. **Introgressions from Wild Relatives:** Analysis of transposable elements, specifically full-length LTR retrotransposons, identified numerous introgressions from wild relatives such as *Triticum timopheevii* and *Thinopyrum ponticum*. These introgressions often carry important genes conferring resistance to diseases or pests.
3. **Gene Content Variation:** The study identified considerable variation in gene content among the wheat lines, including presence/absence variation (PAV), copy number variations (CNVs), and single nucleotide polymorphisms (SNPs). This variation was especially significant in genes associated with disease resistance and fertility restoration.
4. **NLR Gene Family Expansion:** The researchers observed significant expansion of the nucleotide-binding leucine-rich repeat (NLR) protein gene family, which plays a critical role in disease resistance. The analysis identified thousands of unique NLR signatures across the different wheat lines, highlighting the vast diversity in disease resistance mechanisms.
5. **Centromere Dynamics:** Analysis of CENH3 ChIP-seq data revealed variation in centromere positions, with some lines displaying shifts in centromere locations compared to the reference genome. These shifts were often associated with structural rearrangements, but also occurred without apparent structural events.
6. **Identification of *Sm1* Gene:** The study identified and characterized the *Sm1* gene, which is responsible for resistance to the orange wheat blossom midge (OWBM). The gene encodes an NLR protein with integrated kinase and major sperm protein (MSP) domains, a type of insect resistance gene not previously described.
7. **Haplotype Block Analysis:** Analysis of haplotype blocks revealed the presence of large regions of linked genes inherited together. This information can be utilized to efficiently integrate desirable genes into improved cultivars.
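The gene content categories in the list above (PAV and CNV) can be made concrete with a small sketch. It assumes a table of per-line copy counts for each gene, where 0 means the gene is absent; the gene IDs, line names, counts, and the labels "invariant" and "absent" are hypothetical choices for this illustration, not the study's own classification scheme.

```python
def classify_gene(copy_counts):
    """Classify one gene from its copy count in each line (0 = absent)."""
    counts = list(copy_counts.values())
    if any(count < 0 for count in counts):
        raise ValueError("copy counts must be non-negative")
    if all(count == 0 for count in counts):
        return "absent"      # not found in any line
    if any(count == 0 for count in counts):
        return "PAV"         # present in some lines, missing in others
    if len(set(counts)) > 1:
        return "CNV"         # present everywhere, but copy number differs
    return "invariant"       # same copy number in every line


# Hypothetical toy table: gene -> {line: copy count}.
copy_table = {
    "gene_1": {"line_A": 1, "line_B": 1, "line_C": 1},
    "gene_2": {"line_A": 1, "line_B": 0, "line_C": 1},
    "gene_3": {"line_A": 2, "line_B": 1, "line_C": 4},
}

classes = {gene: classify_gene(counts) for gene, counts in copy_table.items()}
print(classes)
```

A gene missing from at least one line is labeled PAV here even if its copy number also varies among the lines that carry it; a real analysis might track both properties separately.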
Discussion
This study significantly advances our understanding of wheat genomic diversity and its implications for breeding. The generation of multiple high-quality reference genomes, representing the broad diversity of global wheat breeding programs, provides an invaluable resource for identifying and characterizing genes underlying important agronomic traits. The findings highlight the significant impact of structural variations, introgressions from wild relatives, and CNVs on wheat genome evolution and adaptation. The detailed analysis of specific gene families, such as NLR and Rf genes, demonstrates the power of comparative genomics to understand the molecular mechanisms underlying disease resistance and fertility restoration. The successful cloning of the *Sm1* gene, responsible for insect resistance, demonstrates the potential of using these genomic resources for identifying genes of interest for crop improvement. The identification of haplotype blocks offers new opportunities to efficiently incorporate multiple beneficial genes into breeding programs. The study’s findings have immediate implications for improving wheat varieties through marker-assisted selection and gene editing, contributing significantly to efforts to meet the future global food demand.
Conclusion
This research delivered a comprehensive resource of fifteen high-quality wheat genome assemblies, highlighting extensive structural variation, introgressions, and gene family expansion. The study's detailed analysis of specific gene families and the successful cloning and characterization of the *Sm1* gene underscore the valuable implications for accelerating wheat breeding. The study's findings provide robust tools for marker-assisted selection and gene editing technologies, opening avenues for enhanced crop improvement strategies and increased food security. Future research could focus on functional characterization of identified genes and exploring the role of other genomic variations in different wheat lines and under diverse environmental conditions.
Limitations
Although the study assembled fifteen wheat lines, this set might not capture the full breadth of global wheat genetic diversity. The focus on specific gene families and traits might overlook other genetic variation that contributes to overall wheat improvement. Further research is needed to fully validate the functions of the genes identified through this analysis, particularly the *Sm1* gene. The study relied primarily on genomic data, and additional phenotypic data from field trials would strengthen the interpretation of the results.