Agriculture
A haplotype-led approach to increase the precision of wheat breeding
J. Brinton, R. H. Ramirez-Gonzalez, et al.
The study addresses how to more precisely define and exploit haplotypes—the co-inherited blocks of genetic variation that underpin complex agronomic traits—in hexaploid bread wheat to accelerate genetic gains in breeding. Traditional marker-assisted approaches often rely on a few non-causal SNPs, which can misclassify individuals due to limited information about surrounding sequences. Given wheat’s constrained genetic diversity from domestication and pure-line breeding, the authors sought a genome-scale method to rigorously delineate identical-by-state haplotype blocks, distinguish them from near-identical sequences, and demonstrate how such haplotype definitions can guide targeted introgression and trait improvement.
Existing haplotype definition methods typically use genotyping data (SNP arrays or resequencing) and delimit haplotypes by linkage disequilibrium or fixed window sizes, tolerating 1–3% diversity to account for genotyping errors. Prior work suggests this tolerance is not stringent enough in crops like wheat. The Green Revolution RHT genes show that causal variants can reside within broader haplotypes of very high sequence identity (~99.96%), highlighting the need to discriminate near-identical regions from truly identical-by-state ones. Gene-centric SNP arrays and exome-based datasets are biased toward distal, genic regions and may not capture full haplotype structure in genomic regions with reduced recombination (e.g., centromeric regions). Studies have also documented narrow genetic bases in modern cultivars and the importance of recombination landscapes and epistasis in shaping trait architecture.
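To make the gap between these thresholds concrete, the short Python sketch below converts a percentage identity into an approximate number of differing bases over an interval. It is only an illustration: the function name is hypothetical, the ~300 Kbp interval length is the RHT interval size quoted in the findings, and the calculation assumes every difference is counted as a single mismatched base.

```python
def expected_differences(length_bp: int, identity_pct: float) -> int:
    """Approximate number of differing bases implied by a % identity.

    Illustrative assumption: each difference is one mismatched base,
    so indels and alignment gaps are ignored.
    """
    return round(length_bp * (100.0 - identity_pct) / 100.0)


INTERVAL_BP = 300_000  # ~300 Kbp, the RHT interval size quoted in the summary

# Identity levels discussed in the text: RHT haplotypes, the identical-by-state
# threshold, and the 1-3% diversity tolerated by common genotyping approaches.
for label, identity in [
    ("RHT mutant vs wild type", 99.96),
    ("identical-by-state threshold", 99.99),
    ("1% tolerated diversity", 99.0),
    ("3% tolerated diversity", 97.0),
]:
    print(f"{label}: ~{expected_differences(INTERVAL_BP, identity)} differences")
```

Under these assumptions, two haplotypes at ~99.96% identity differ at roughly 120 positions across 300 Kbp, far fewer than the thousands of differences a 1–3% tolerance would allow, so such a tolerance would group them together.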
- Materials: Genome assemblies of 15 bread wheat lines representing modern diversity: 9 chromosome-scale assemblies (ArinaLrFor, Jagger, Julius, Lancer, Landmark, Mace, Norin61, Stanley, SY-Mattis), the Chinese Spring RefSeq v1.0 assembly, and 5 scaffold-level assemblies (Cadenza, Claire, Paragon, Robigus, Weebill). For analyses excluding early materials, pre-1950 lines (Chinese Spring, Norin61) were identified.
- NUCmer whole-chromosome alignments: Pairwise chromosome alignments across all 15 cultivars (excluding scaffold-to-scaffold comparisons) were generated with NUCmer from MUMmer 3.23 using --mum anchors. Alignments were filtered with delta-filter (-l 20000 -r -q) to retain one-to-one mappings and exclude short retrotransposon matches. Percentage sequence identity was computed per alignment and summarized in 5-Mbp bins across chromosomes. Identical-by-state haplotypes were defined as bins with median sequence identity ≥99.99%; adjacent bins were stitched together, allowing up to two consecutive bins below the threshold before a block was split.
- Gene-based BLAST approach for scaffold-level and complementary calls: Projected RefSeq v1.1 gene models (consistent with expected chromosomes) were aligned pairwise with BLASTn including ±2,000 bp flanking sequences; alignments containing Ns were removed. Sliding windows of 25 consecutive genes were used; the lowest 10% of alignments by identity within each window were discarded and the mean identity of the remainder calculated. Windows with 100% mean identity were considered identical-by-state. Parameters (±2 kb flanks, 25-gene windows) were chosen via precision-recall analysis using NUCmer blocks as ground truth, balancing precision and recall.
- Integration of methods: BLAST-derived blocks overlapping NUCmer blocks were merged, removing redundant smaller BLAST blocks. Haplotype blocks were also derived at finer bin sizes (2.5 and 1 Mbp) for increased resolution.
- Coordinate conversion and visualization: RefSeq v1.1 projected genes served as anchors for converting block coordinates among assemblies. A Ruby on Rails/MySQL-backed visualization platform with D3.js was developed (http://www.crop-haplotypes.com/). Algorithms grouped adjacent projected genes (allowing gaps up to 20 genes) to convert interval boundaries between assemblies.
- Genome-wide characterisation: Block length and gene content were computed after converting to RefSeq coordinates. Chromosomal compartments (R1, R2a, C, R2b, R3) were analyzed with sampling every 500 Kbp and positions scaled by chromosome length. Haplotype sharing coverage across cultivars was quantified to identify ‘highly conserved’ regions (shared by ≥5 other cultivars).
- Case study on chromosome 6A: Identified a large ‘highly conserved’ region containing many productivity-related QTL/GWAS hits. Defined a 258 Mbp minimum haplotype block (MHB) without recombination among sequenced cultivars, encompassing TaGW2-A and 2,167 additional genes. Sequenced-cultivar haplotypes (H1–H7) were determined.
- Recombinant mapping: In Spark (H2) × Rialto (H3), 189 independent recombinants (between 75 and 496 Mbp) were identified; 38 were evaluated in multi-year field trials to assess grain-size effects relative to parental haplotypes.
- Genotyping datasets and marker development: Public and new datasets from 15K iSelect, 35K Axiom breeders’ array, and exome-capture were integrated (N=592 cultivars). Haplotype-informed KASP markers (17 assays) were designed to discriminate H1–H7 and additional haplotypes, selected to provide redundancy and even distribution across the 6A MHB.
- Landrace discovery: The Watkins landrace panel (n=806) was genotyped with the 35K array and the haplotype-informed markers to refine haplotype assignments and identify Watkins-specific haplotypes. Paragon (H2) × Watkins biparental populations were used to detect QTL for thousand grain weight (TGW).
- Statistics: Precision/recall/F1 used for parameter selection; Wilcoxon tests with Benjamini-Hochberg correction for block length and gene number comparisons; Chi-square tests for haplotype frequency changes over time; ANOVAs with Dunnett’s tests for RIL evaluations; CIM for QTL mapping.
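The two block-calling rules described in the list above (bin stitching for NUCmer alignments and the 25-gene window filter for BLAST alignments) can be sketched in Python as follows. This is a minimal sketch and not the authors' code: the function names are hypothetical, bins without alignments are assumed to count as below threshold, gaps of up to two consecutive low bins are assumed to be bridged while a third splits the block, and the 10% discard is assumed to round down (2 of 25 genes).

```python
from statistics import mean

BIN_SIZE = 5_000_000   # 5-Mbp bins, as in the text
IDENTITY_MIN = 99.99   # median % identity a bin needs to pass
MAX_GAP_BINS = 2       # assumption: gaps of <= 2 low bins are bridged


def call_blocks(bin_identity, bin_size=BIN_SIZE):
    """Return (start_bp, end_bp) blocks from per-bin median identities.

    bin_identity[i] is the median % identity of bin i, or None when the
    bin has no alignment (assumed here to count as below threshold).
    """
    blocks = []
    start = None       # first passing bin of the currently open block
    last_pass = None   # most recent passing bin
    for i, ident in enumerate(bin_identity):
        if ident is not None and ident >= IDENTITY_MIN:
            if start is None:
                start = i
            last_pass = i
        elif start is not None and i - last_pass > MAX_GAP_BINS:
            # A third consecutive low bin closes the block at the last passing bin.
            blocks.append((start * bin_size, (last_pass + 1) * bin_size))
            start = None
    if start is not None:
        blocks.append((start * bin_size, (last_pass + 1) * bin_size))
    return blocks


def window_is_identical(identities, drop_fraction=0.10):
    """Discard the lowest 10% of gene alignments, then require a 100% mean."""
    ranked = sorted(identities)
    n_drop = int(len(ranked) * drop_fraction)  # assumption: round down
    return mean(ranked[n_drop:]) == 100.0


def identical_windows(gene_identities, window=25):
    """Start indices of sliding gene windows called identical-by-state."""
    return [
        i
        for i in range(len(gene_identities) - window + 1)
        if window_is_identical(gene_identities[i:i + window])
    ]


# Toy data: two low bins are bridged, three consecutive low bins split the block.
bins = [100.0, 99.995, 99.5, 99.5, 99.99, 99.0, 99.0, 99.0, 100.0]
print(call_blocks(bins))

# Toy data: 30 genes at 100% identity except three imperfect alignments.
genes = [100.0] * 30
for idx in (3, 10, 12):
    genes[idx] = 99.5
print(identical_windows(genes))
```

In the toy gene data, windows containing all three imperfect alignments fail because only two can be discarded, which shows how the 10% discard tolerates a few poor alignments without relaxing the 100% requirement for the rest of the window.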
- Precision haplotype definition: Using stringent criteria (median identity ≥99.99% in 5-Mbp bins for whole-chromosome alignments, and 100% mean identity across 25-gene windows with ±2 kb flanks for gene-based alignments), the study distinguished identical-by-state haplotypes from near-identical sequences (~99.90–99.98%), demonstrating that common genotyping thresholds (1–3% divergence) are insufficient to accurately classify wheat haplotypes.
- Genome-wide haplotype landscape: Identified 4,485 pairwise haplotype blocks genome-wide with 5-Mbp bins (7,578 with 2.5-Mbp; 17,693 with 1-Mbp). Median block length was 9.34 Mbp with a median of 196 genes per block. Blocks were significantly larger in centromeric regions (C: 221.04 Mbp) than distal regions (R1: 15.43 Mbp, R3: 24.38 Mbp; p < 2e-16), and contained more genes on average (C: 2,601; R1: 467; R3: 623; p < 2e-16).
- Shared haplotypes among cultivars: On average, 59.3 ± 4.6% of each cultivar’s genome was shared with at least one other cultivar; excluding pre-1950 lines increased this to 65.6 ± 1.8%. Many haplotypes were common across cultivars from different continents, reflecting a narrow modern genetic base.
- Highly conserved regions: Defined regions shared with ≥5 other cultivars (6.1 ± 0.9% of the genome) that likely reflect breeder selection (e.g., RHT-B1b on 4B) and/or low diversity regions. These serve as targets for targeted introgression.
- RHT case study: The Green Revolution RHT mutant and wild-type haplotypes in ~300 Kbp intervals exhibited ~99.96% identity with >99% alignment breadth, underscoring the need for high-stringency thresholds to differentiate haplotypes.
- Chromosome 6A haplotypes: Identified seven haplotypes (H1–H7) across a large conserved region including TaGW2-A. Common marker arrays (15K, 35K) could not resolve all haplotypes; exome capture provided partial resolution; haplotype-informed markers were necessary for full discrimination.
- Breeding patterns and environment: H3 is dominant in modern European germplasm and has increased over time (χ² = 13.6; df = 1; p < 0.001), with the H3 block extending to ~68% of 6A in UK Recommended List cultivars (421.8 Mbp; 4,731 genes). Alternative haplotypes dominate in Australia, USA, and CIMMYT lines, suggesting genotype-by-environment effects.
- Recombination within extended blocks: Despite reduced recombination in pericentromeric regions, 189 independent recombinants were identified across 6A (75–496 Mbp). Field-tested recombinants (n=11) showed intermediate grain size when the H3 haplotype was disrupted, consistent with additive/epistatic effects maintained in intact haplotypes.
- Landrace-driven discovery: Haplotype-informed KASP markers (17 assays) increased resolution in Watkins landraces from 21 to 40 groups (31 Watkins-specific) and reassigned 243 lines previously grouped with modern haplotypes to Watkins-specific types. Three Watkins-specific haplotypes were associated with significantly increased TGW (mean increase 8.2 ± 0.8%; p < 0.05), providing novel variation absent in modern germplasm.
- Practical outputs: An interactive haplotype visualization platform was developed (http://www.crop-haplotypes.com/) to facilitate exploration and application in breeding.
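The notion of a 'highly conserved' region (shared with ≥5 other cultivars) used in the findings above can be illustrated with a small Python sketch. It assumes pairwise blocks have already been converted to a common coordinate system and are given as (cultivar_a, cultivar_b, start_bp, end_bp) tuples; that tuple layout, the function names, the cultivar labels and the use of a 500-Kbp sampling step here are all assumptions made for illustration.

```python
def sharing_counts(blocks, focal, chrom_length, step=500_000):
    """Count distinct partner cultivars sharing a block with `focal`.

    blocks: iterable of (cultivar_a, cultivar_b, start_bp, end_bp) tuples in
    a common coordinate system (hypothetical layout; end is exclusive).
    Returns {position_bp: number_of_partners} at each sampled position.
    """
    counts = {}
    for pos in range(0, chrom_length, step):
        partners = set()
        for a, b, start, end in blocks:
            if focal in (a, b) and start <= pos < end:
                partners.add(b if a == focal else a)
        counts[pos] = len(partners)
    return counts


def highly_conserved(counts, min_partners=5):
    """Sampled positions shared with at least `min_partners` other cultivars."""
    return [pos for pos, n in counts.items() if n >= min_partners]


# Toy example with placeholder cultivar labels: cultivar "A" shares the first
# 2 Mbp with five partners and a later interval with one more.
toy_blocks = [("A", p, 0, 2_000_000) for p in "BCDEF"]
toy_blocks.append(("G", "A", 1_000_000, 3_000_000))

counts = sharing_counts(toy_blocks, focal="A", chrom_length=3_000_000)
print(highly_conserved(counts))
```

Counting distinct partners rather than blocks avoids double counting when a cultivar pair contributes overlapping blocks at the same position.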
The study demonstrates that precise, genome-anchored haplotype definitions markedly improve the ability to identify, track, and selectively introgress functional genetic variation in wheat. By showing that near-identical sequences can mask critical differences (e.g., at RHT loci) and that gene-centric arrays underrepresent true haplotype structure, the work validates the need for high-stringency criteria (≥99.99% identity) and inclusion of flanking non-genic sequences for haplotype calling. The genome-wide characterization reveals extensive haplotype sharing among modern cultivars, reflecting a narrow genetic base, and identifies highly conserved regions likely shaped by historical selection. The chromosome 6A case study illustrates that extended blocks can be maintained due to additive or epistatic interactions, and that recombination within these blocks can modulate phenotypes. Environment-specific selection patterns (e.g., dominance of H3 in UK vs alternative haplotypes in other regions) highlight the importance of genotype-by-environment interactions in haplotype choice. Applying haplotype-informed markers to landraces uncovers novel, beneficial haplotypes (e.g., TGW-increasing Watkins haplotypes), providing actionable targets for breeding and demonstrating the utility of a haplotype-led approach for diversifying and improving elite germplasm.
This work establishes a rigorous, haplotype-led framework for wheat breeding that leverages chromosome-scale assemblies and high-stringency identity thresholds to define identical-by-state haplotype blocks. It shows that (i) large, gene-rich haplotype blocks are common and often conserved across modern cultivars, (ii) standard genotyping arrays and genic sequences alone are insufficient to resolve haplotypes, and (iii) haplotype-informed markers enable targeted discovery and introgression of novel beneficial variation from landraces. The authors provide practical tools, including an online visualization platform, and demonstrate application through the 6A case study, identifying landrace haplotypes associated with increased grain weight. Future directions include integrating haplotype frameworks with gene editing and targeted recombination to fine-tune allelic combinations, expanding sequencing of cultivars carrying rare haplotypes adapted to specific environments, and retrospectively mining historical trial data to optimize haplotype assemblies for predictable performance.
- Representation of diversity: Analyses are based on 15 assemblies capturing much, but not all, global diversity; rare or region-specific haplotypes may be underrepresented.
- Assembly heterogeneity: Inclusion of scaffold-level assemblies required gene-projection-based BLAST, which may miss or misplace haplotypes in low-quality or N-rich regions; Ns and assembly gaps reduce recall at larger flanking windows.
- Threshold generalization: The ≥99.99% identity threshold was optimized for wheat; applicability to other species with different mutation rates and assembly qualities may require recalibration.
- Genotyping platform bias: Public array and exome datasets are gene-centric and biased toward distal regions, limiting resolution in pericentromeric regions; haplotype resolution in external panels may be incomplete without haplotype-informed markers.
- Phenotypic validation scope: While recombinants and selected biparental populations were field-tested, broader multi-environment validation of specific haplotypes (e.g., beyond 6A) is needed to generalize environment-specific effects and epistasis.
- Complex trait architecture: Extended haplotype effects likely involve multiple linked loci and epistasis; pinpointing causal variants within large blocks remains challenging and may require further functional genomics.