Discovery of beneficial haplotypes for complex traits in maize landraces
M. Mayer, A. C. Hölker, et al.
Harnessing the allelic diversity of genetic resources is considered essential for overcoming the challenges of climate change and for meeting future demands on crop production1,2. For most traits of agronomic importance, modern breeding material captures only a fraction of the available diversity within crop species1. In the case of maize (Zea mays L.), today’s elite germplasm went through several bottlenecks, first by geographical dispersion from its center of origin3,4, second through the selection of only a few key ancestors sampled from a small number of landraces to establish heterotic groups5,6, and third through decades of advanced cycle breeding with high selection intensities7,8. For traits that were not targets of selection in the past, but are important today, like abiotic stress tolerance and resource-use efficiency, this might have resulted in the loss of favorable alleles during the breeding process. In addition, unfavorable alleles might have become fixed during the selection process due to drift and/or hitchhiking effects10–12.
Impressive examples exist where introgression of alleles from genetic resources has improved mono- or oligogenic traits13–15, but for broadening the genetic diversity of complex traits, such as yield or abiotic stress tolerance, successful examples are scarce2. To date, the genomic characterization of genetic resources has been based predominantly on sampling individuals across a wide range of accessions, maximizing the level of diversity in the genetic material under study2,16–20. Such diverse samples are characterized by high variation in adaptive traits and strong population structure, leading to spurious associations and limited power for detecting associations with nonadaptive traits of agronomic importance21,22. Furthermore, alleles that are locally common but globally rare are likely to remain undetected in broad, species-wide samples, whereas in a more targeted approach they might reach frequencies high enough for detection22.
Here, we propose a genome-based strategy (Supplementary Fig. 1) for making the native diversity of maize landraces accessible for improving quantitative traits that show limited genetic variation in elite germplasm, such as cold tolerance and early plant development23–25. Capitalizing on low levels of linkage disequilibrium (LD), we map haplotype-trait associations at high resolution in ~1000 doubled-haploid (DH) lines derived from three European flint maize landraces. The genetic material has been preselected for adaptation to target environments to avoid confounding effects of strong adaptive alleles, as suggested by Mayer et al.26. We assess promising haplotypes genotypically by quantifying their frequency in a diverse panel of 65 European flint breeding lines. Phenotypically, we evaluate the direction and magnitude of haplotype effects relative to a subset of breeding lines. Many of the discovered haplotypes show stable trait associations across populations and environments. In addition, most of them do not exhibit undesired trait associations, making them ideal for introgression into elite germplasm. We show that our strategy of comprehensively sampling individuals from a limited set of preselected landraces is successful in linking molecular variation to meaningful phenotypes, and in identifying alleles for quantitative traits that will enrich the genetic diversity of our crops.
Plant materials: More than 1000 doubled-haploid (DH) lines were generated from three European maize landraces, Kemater Landmais Gelb (KE), Lalin (LL), and Petkuser Ferdinand Rot (PE), which had been preselected on the basis of phenotypic variation in cold-related traits and population genetic analyses. A panel of 66 European flint breeding lines (released 1950–2010) was compiled; after quality control, 65 remained, including prominent founder lines.
Genotypic data: 1015 DH lines were genotyped with the 600k Affymetrix Axiom Maize Array. After stringent filtering, 941 DH lines (KE=501, LL=31, PE=409) and 501,124 SNPs mapped to B73 AGPv4 remained. Heterozygous calls were set to missing (DH lines are expected to be fully homozygous) and imputed per landrace with Beagle 5.0. Of the 66 breeding lines, 64 were genotyped on the same array; for the other two (EZ5, F64), overlapping SNPs were taken from whole-genome sequences. After analogous filtering, which removed one line, 65 breeding lines remained; heterozygous calls were again set to missing and imputed with Beagle 5.0. Population structure was summarized by principal coordinate analysis (PCoA) based on modified Rogers' distances. LD was measured as pairwise r² between SNPs within 1 Mb, and LD decay distances were estimated.
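The pairwise r² statistic mentioned above can be illustrated with a minimal sketch. It assumes fully homozygous lines with each SNP scored 0/1 per line, so allele and two-locus haplotype frequencies can be counted directly. The function name and the toy data are hypothetical and not taken from the study.

```python
def r_squared(snp_a, snp_b):
    """Squared correlation (r^2) between two biallelic SNPs.

    Each argument is a list with one 0/1 allele call per homozygous line.
    """
    n = len(snp_a)
    if n == 0 or n != len(snp_b):
        raise ValueError("SNP vectors must be non-empty and of equal length")
    p_a = sum(snp_a) / n                                  # frequency of allele 1 at SNP A
    p_b = sum(snp_b) / n                                  # frequency of allele 1 at SNP B
    p_ab = sum(a * b for a, b in zip(snp_a, snp_b)) / n   # frequency of the 1-1 haplotype
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return float("nan")                               # undefined for a monomorphic SNP
    d = p_ab - p_a * p_b                                  # disequilibrium coefficient D
    return d * d / denom


# Toy data for eight hypothetical DH lines
snp_1 = [0, 0, 1, 1, 0, 1, 1, 0]
snp_2 = [0, 0, 1, 1, 0, 1, 0, 1]
print(r_squared(snp_1, snp_2))
```

In this toy case both alleles have frequency 0.5 and the 1-1 haplotype has frequency 0.375, so D = 0.125 and r² = 0.25.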
Phenotypic data: 958 DH lines were phenotyped for 25 traits in replicated field trials across up to 11 environments (Germany and northern Spain; 2017–2018). Designs were 10×10 lattices with two replicates per line per environment. Checks comprised 15 breeding lines and the original landraces in 2017, and four breeding lines in 2018. After genotype-based filtering, 899 DH lines (KE=471, LL=26, PE=402) remained for analysis. Best linear unbiased estimates (BLUEs) were computed across and within environments using mixed models (genotypes fixed; environment and design factors random). A subset of 14 breeding lines had phenotypes in six 2017 environments for comparative analyses.
Haplotype construction: For both DH and breeding lines, nonoverlapping windows of ten SNPs defined haplotypes (presence/absence coded as 0/2). The SNP density mirrored recombination rate; the median physical window size was 13.5 kb (mean 37.8 kb), corresponding to 0.006 cM (mean 0.026 cM). Haplotype inventories and frequencies were compared between DH and breeding panels. Diversity metrics (PIC, gene diversity He, πR, minimum number of recombination events nR) were calculated per SNP/window and averaged.
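The windowing step can be sketched as follows. This is a minimal illustration and not the authors' pipeline: it assumes each line's phased, fully homozygous genotype is available as a string with one allele character per SNP, drops trailing SNPs that do not fill a window, and computes gene diversity as the uncorrected 1 − Σp². All function names and the toy genotypes are hypothetical.

```python
from collections import Counter

WINDOW = 10  # SNPs per window, as stated in the text


def window_haplotypes(genotypes, window=WINDOW):
    """Cut every line's SNP string into nonoverlapping windows.

    genotypes: dict mapping line id -> allele string (equal length for all lines).
    Returns a list with one dict (line id -> haplotype string) per window.
    """
    lengths = {len(g) for g in genotypes.values()}
    if len(lengths) != 1:
        raise ValueError("all lines need the same number of SNPs")
    n_snps = lengths.pop()
    return [
        {line: g[start:start + window] for line, g in genotypes.items()}
        for start in range(0, n_snps - window + 1, window)
    ]


def code_presence(window_calls):
    """Code each distinct haplotype of one window as 0 (absent) or 2 (present)."""
    lines = sorted(window_calls)
    return {
        hap: [2 if window_calls[line] == hap else 0 for line in lines]
        for hap in sorted(set(window_calls.values()))
    }


def gene_diversity(window_calls):
    """Gene diversity of one window: 1 minus the sum of squared haplotype frequencies."""
    counts = Counter(window_calls.values())
    n = sum(counts.values())
    return 1 - sum((c / n) ** 2 for c in counts.values())


# Four hypothetical DH lines typed at 20 SNPs (two windows)
toy = {
    "DH1": "AAAAAAAAAA" + "CCCCCCCCCC",
    "DH2": "AAAAAAAAAA" + "CCCCCCCCCG",
    "DH3": "AAAAAAAAAT" + "CCCCCCCCCC",
    "DH4": "AAAAAAAAAA" + "CCCCCCCCCG",
}
windows = window_haplotypes(toy)
coding = code_presence(windows[0])
print(coding, gene_diversity(windows[0]))
```

Each distinct haplotype becomes its own 0/2 variable, which is why a window typically contributes several tested haplotypes rather than a single marker.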
GWAS and region definition: Haplotypes occurring fewer than three times among the 899 DH lines were excluded, and where haplotypes were in perfect LD (r² = 1) only one was retained, yielding 154,104 haplotypes (mean 5.73 per window) for GWAS. GWAS used a univariate linear mixed model in GEMMA (y = Wa + xβ + Zu + e) for single- and across-environment BLUEs, including a SNP-based genomic relationship matrix (Astle-Balding). Significance was assessed by likelihood ratio tests at a 15% FDR. Significant haplotypes less than 1 Mb apart and in high LD (r² ≥ 0.8) were merged into one trait-associated genomic region; the most significant haplotype in each region was designated the focus haplotype.
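To make the 15% FDR threshold concrete, the sketch below applies the Benjamini-Hochberg step-up procedure to a list of P values. The text states only the FDR level, so the choice of Benjamini-Hochberg is an assumption here, and the function name and P values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.15):
    """Return one True/False flag per P value (True = significant at the given FDR).

    Assumes the Benjamini-Hochberg step-up rule: find the largest rank k with
    p_(k) <= k / m * fdr and declare the k smallest P values significant.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[idx] = True
    return significant


# Five hypothetical haplotype-trait tests
flags = benjamini_hochberg([0.001, 0.20, 0.03, 0.04, 0.60])
print(flags)
```

Because the rule is step-up, a P value that misses its own rank threshold can still be declared significant when a larger P value further down the sorted list passes.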
Multi-locus multi-environment modeling: Candidate focus haplotypes entered a multi-locus, multi-environment mixed model (ASReml-R) with stepwise backward elimination (Wald test; remove if P ≥ 0.01) to estimate effects and classify effect stability. Effect stability across landraces was evaluated by including haplotype×environment×landrace interactions. The proportion of genetic variance explained was estimated by reduction in genetic variance when including haplotypes.
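The variance-explained estimate described above amounts to a relative reduction in the genetic variance component. A minimal sketch, assuming the two variance components have already been estimated from mixed-model fits without and with the haplotypes (the function name and numbers are hypothetical):

```python
def variance_explained(var_g_null, var_g_with_haplotypes):
    """Proportion of genetic variance explained, as a relative reduction.

    var_g_null: genetic variance from the model without haplotype effects.
    var_g_with_haplotypes: genetic variance after adding the haplotypes.
    """
    if var_g_null <= 0:
        raise ValueError("the baseline genetic variance must be positive")
    return (var_g_null - var_g_with_haplotypes) / var_g_null


# Hypothetical variance components
share = variance_explained(4.0, 3.0)
print(share)
```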
Multi-trait analyses: For early development haplotypes, bivariate models tested pleiotropic effects on other traits; significance required 95% CIs of effects excluding zero for both traits. The explained genetic variance per trait was estimated via reduction in the genetic variance-covariance matrix.
Comparison with breeding lines: Frequencies of favorable and unfavorable focus haplotypes in 65 breeding lines were compared to 500 random haplotypes (Mann–Whitney tests). Recombination risk for haplotype breakup was evaluated using physical/genetic haplotype length, similarity (1−Hhap), and nR within windows. Phenotypic contrasts compared DH lines carrying a focus haplotype versus breeding lines lacking it (and among alternative haplotypes within a window), with permutation tests for significance.
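A permutation test for a phenotypic contrast of this kind can be sketched as below. The exact permutation scheme used in the study is not described here, so this assumes a simple two-sided test on the difference in group means with random relabelling of lines. The function name, the seed, and the plant heights are hypothetical.

```python
import random
from statistics import fmean


def permutation_test(group_a, group_b, n_perm=2000, seed=1):
    """Two-sided permutation test on the difference in means (group_a minus group_b).

    Returns the observed difference and a permutation P value.
    """
    rng = random.Random(seed)
    observed = fmean(group_a) - fmean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                      # random relabelling of lines
        diff = fmean(pooled[:n_a]) - fmean(pooled[n_a:])
        if abs(diff) >= abs(observed) - 1e-12:   # tolerance for float rounding
            extreme += 1
    # Add one to numerator and denominator so the P value is never exactly zero
    return observed, (extreme + 1) / (n_perm + 1)


# Hypothetical early plant heights in cm
carriers = [62.0, 65.5, 61.0, 66.0, 63.5]        # DH lines carrying the focus haplotype
checks = [55.0, 57.5, 54.0, 58.0, 56.5, 53.0]    # breeding lines lacking it
diff, p_value = permutation_test(carriers, checks)
print(round(diff, 2), p_value)
```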
Resolution metrics: LD decay distances within DH libraries: LL=203 kb, PE=484 kb, KE=973 kb; combined set ~201 kb. Breeding panel average r² decay distance: 107 kb. Median associated region size was 92 kb with median three annotated genes; in silico fine-mapping at tb1 identified a ten-SNP window overlapping tb1 and its upstream regulatory region.
Diversity and haplotypes:
- PCoA separated landrace-derived DH lines and breeding lines along geographic origin; KE and PE separated from breeding lines on PC2.
- The DH landrace set and breeding panel comprised 356,724 and 363,290 ten-SNP haplotypes (mean 7.12 and 7.25 per window), respectively.
- Haplotype frequencies correlated between panels (Pearson r=0.74, P<2.2e-16). 26.2% of landrace haplotypes were absent in breeding lines, indicating untapped variation. Only 2.7% of those absent haplotypes occurred in all three landraces; 82.8% occurred in only one.
- The landrace panel captured 72.4% of haplotypes present in breeding lines.
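The sharing percentages above come down to set comparisons between the two haplotype inventories. A minimal sketch, assuming each haplotype is identified by its window index and allele string (the function name and toy inventories are hypothetical):

```python
def sharing_summary(landrace_haps, breeding_haps):
    """Compare two haplotype inventories given as sets of (window, haplotype) pairs."""
    shared = landrace_haps & breeding_haps
    return {
        # share of landrace haplotypes never seen in the breeding panel
        "landrace_absent_in_breeding": len(landrace_haps - breeding_haps) / len(landrace_haps),
        # share of breeding-panel haplotypes also found in the landraces
        "breeding_captured_by_landraces": len(shared) / len(breeding_haps),
    }


# Toy inventories over three windows
landrace = {(0, "A"), (0, "B"), (1, "A"), (1, "C")}
breeding = {(0, "A"), (1, "A"), (1, "D"), (2, "E"), (2, "F")}
summary = sharing_summary(landrace, breeding)
print(summary)
```

The two proportions use different denominators, which is why 26.2% of landrace haplotypes can be absent from the breeding lines while the landraces still capture 72.4% of the breeding-line haplotypes.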
GWAS and variance explained:
- Significant haplotype-trait associations were identified for all nine traits studied, with many associations for early vigor (EV_V4/V6) and early plant height (PH_V4/V6).
- Haplotypes explained from 2% (female flowering time, FF) to 57% (lodging, LO) of total genetic variance per trait. Median associated region size was 92 kb (median three genes).
- Example locus: teosinte branched1 (tb1) for tillering (TILL); fine-mapped to a ten-SNP window overlapping tb1 and its regulatory region.
Effect stability and classification:
- Many favorable haplotypes were identified for early development traits, whereas the undesirable traits LO and TILL had many unfavorable haplotypes.
- Table 1 counts (selected traits): EV_V4 favorable 16 (29%), unfavorable 29 (53%), interacting 10 (18%); EV_V6 favorable 14 (26%), unfavorable 26 (49%), interacting 13 (25%); PH_V4 favorable 15 (41%), unfavorable 15 (41%); PH_V6 favorable 20 (42%), unfavorable 22 (46%); LO favorable 11 (22%), unfavorable 35 (70%); TILL favorable 11 (31%), unfavorable 23 (66%).
- Across landraces KE and PE, 30% of shared PH_V6 haplotype-environment associations were significant in both with identical effect signs; among those significant in only one landrace, 90% had matching effect signs.
Frequencies in breeding lines and potential for improvement:
- For early development, 53 nonredundant favorable haplotypes had increased mean frequency in breeding lines versus random (0.20 vs 0.16; P<0.01), but 6 favorable haplotypes (11.3%) were absent in breeding lines, representing novel improvement opportunities.
- Of 80 unfavorable haplotypes for early development, 27.5% were common in breeding lines (frequency above the upper quartile of the random haplotypes, i.e. >0.231), suggesting potential for targeted replacement.
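The "common in breeding lines" criterion can be sketched as a comparison against the upper quartile of the random-haplotype frequencies. This is a minimal illustration: the quartile convention (Python's default "exclusive" method) is an assumption, and the function name and frequencies are hypothetical.

```python
from statistics import quantiles


def share_above_upper_quartile(random_freqs, candidate_freqs):
    """Return the upper quartile of random_freqs and the share of candidates above it."""
    _, _, q3 = quantiles(random_freqs, n=4)   # three cut points: Q1, median, Q3
    n_common = sum(1 for f in candidate_freqs if f > q3)
    return q3, n_common / len(candidate_freqs)


# Hypothetical breeding-line frequencies
random_haps = [0.02, 0.05, 0.08, 0.10, 0.15, 0.20, 0.25, 0.40]
unfavorable = [0.00, 0.10, 0.30, 0.50]
threshold, share = share_above_upper_quartile(random_haps, unfavorable)
print(threshold, share)
```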
Phenotypic validation examples:
- Chromosome 3 focus haplotype (PH_V6): window explained 4.8% of variance; focus haplotype frequency 4.1% in DHs; absent in breeding lines; significantly superior to six of eight alternatives within window. DH carriers averaged +6.06 cm over breeding lines (P>0.056 overall; significant at cooler sites OLI, EIN, ROG with P<0.044), indicating temperature-dependent advantage.
- Chromosome 9 focus haplotype (PH_V6): window explained 1.7% variance (low DH frequency 0.4%); absent in breeding panel; DH carriers showed +15.1 cm over breeding lines (P<0.009), with strongest effects in cooler environments.
- Unfavorable examples: tb1-window focus haplotype increased TILL by +1.51 scores vs breeding lines lacking it (P<0.0001); a chromosome 5 TILL region (6.6% variance) focus haplotype increased TILL by +1.69 scores (P<0.0004). For an EV_V4 region on chromosome 1 (5.1% variance), breeding lines carrying the unfavorable focus haplotype were 0.875 scores worse than those without (P<0.039).
Pleiotropy/undesired effects:
- Of 53 favorable early-development haplotypes, 20 affected at least one of PH_final, FF, MF, LO, or TILL; only three increased LO or TILL, four slightly decreased them; several advanced flowering and/or increased PH_final. Many had limited effects on other traits, supporting their suitability for introgression.
- Of 80 unfavorable early-development haplotypes, 48 affected other traits; 14 decreased TILL; 40 decreased PH_final and/or delayed flowering, mostly with moderate effects.
Mapping resolution and LD:
- LD decay distances within DH libraries: LL=203 kb, PE=484 kb, KE=973 kb; combined=201 kb; breeding panel=107 kb. Median associated region size 92 kb; in <5% of regions, resolution was poor (>100 genes).
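One crude way to obtain an LD decay distance is to bin pairwise r² values by physical distance and report the first bin whose mean falls below a threshold. The sketch below does that. The threshold of 0.2, the 50 kb bin width, and the binning approach itself are assumptions for illustration, since the study may have fitted a decay curve instead. The function name and data are hypothetical.

```python
def decay_distance(pairs, threshold=0.2, bin_kb=50):
    """Midpoint (kb) of the first distance bin whose mean r^2 is below the threshold.

    pairs: iterable of (distance_kb, r2) for SNP pairs. Returns None if the
    mean r^2 never drops below the threshold.
    """
    bins = {}
    for dist, r2 in pairs:
        b = int(dist // bin_kb)
        total, count = bins.get(b, (0.0, 0))
        bins[b] = (total + r2, count + 1)
    for b in sorted(bins):
        total, count = bins[b]
        if total / count < threshold:
            return (b + 0.5) * bin_kb
    return None


# Hypothetical (distance in kb, r^2) pairs
toy_pairs = [(10, 0.9), (30, 0.7), (60, 0.5), (90, 0.3),
             (120, 0.25), (140, 0.19), (160, 0.15), (190, 0.1)]
print(decay_distance(toy_pairs))
```

Here the bin means are 0.8, 0.4, 0.22, and 0.125, so the first bin below 0.2 is 150–200 kb and the reported distance is its midpoint.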
The study addresses how to harness native allelic diversity from maize landraces to improve quantitative traits with limited variation in elite germplasm. By comprehensively sampling lines from three preselected landraces adapted to target environments, conducting high-density genotyping (600k SNPs), and multi-environment phenotyping, the authors mapped haplotype-trait associations with high resolution and evaluated their stability across environments and genetic backgrounds. The strategy identified numerous favorable haplotypes for early vigor and early plant height, many of which are rare or absent in elite breeding lines, indicating untapped beneficial variation. Effect signs were largely consistent across landraces and environments, suggesting limited background dependency. Haplotype-based tracking enabled direct comparison between landrace-derived and elite materials, and phenotypic contrasts demonstrated that specific favorable haplotypes can outperform elite lines, particularly under cooler conditions relevant to early development. The limited pleiotropic penalties observed for most favorable haplotypes (few increases in lodging or tillering) further support their practical utility. The approach mitigates common GWAS challenges in diverse panels (population structure, LD phase inconsistency) and provides compact associated regions (median 92 kb, few genes) that are amenable to candidate-gene identification and functional validation. Overall, the findings show that targeted, haplotype-centric discovery in landraces is a promising route to enrich elite germplasm for complex traits, with potential extension to other germplasm groups and allogamous crops.
This work establishes a genome-based, haplotype-centric strategy to discover and evaluate beneficial alleles for quantitative traits from maize landraces. By linking dense haplotype inventories to multi-environment phenotypes in large DH libraries, the study uncovers numerous favorable haplotypes for early development, many absent in elite breeding lines, and demonstrates stable effects across environments and genetic backgrounds with limited adverse pleiotropy. The approach delivers high mapping resolution, enabling candidate-gene prioritization. Future directions include: (i) proof-of-concept crosses introgressing favorable landrace haplotypes into elite germplasm; (ii) fine-mapping and functional validation of candidate haplotypes/genes; (iii) targeted allele mining and replacement of unfavorable haplotypes; (iv) leveraging gene editing to improve incompatible or disadvantageous haplotypes; and (v) integrating validated associations as fixed effects to enhance genome-based prediction in pre-breeding. The strategy is generalizable to other maize germplasm groups and allogamous crop species to translate genebank genomic information into improved plant performance.
- The proportion of genetic variance explained by haplotypes may be overestimated despite large sample size, necessitating cautious interpretation.
- As with any GWAS, some detected associations may be spurious, although sequential significance determination and multi-locus modeling were used to minimize false positives.
- Beneficial haplotypes absent from breeding germplasm tended to be of low frequency in landraces, affecting power and variance explained.
- Haplotype construction using fixed ten-SNP windows involves a trade-off between the number of haplotypes per window and the risk that recombination breaks a haplotype apart; optimal windowing and block definitions warrant further research. In a few regions, mapping resolution was suboptimal (>100 genes).
- The study focused on three preselected landraces and a European flint breeding panel; generalizability to other germplasm requires further validation.
- Phenotypic comparisons with elite lines used 14 breeding checks in six environments (2017); broader testing and direct crosses with elite material are needed for definitive proof-of-concept.
- Final validation of utility requires introgression and evaluation in elite backgrounds.

