Four chromosome scale genomes and a pan-genome annotation to accelerate pecan tree breeding



J. T. Lovell, N. B. Bentley, et al.

Discover how genome-enabled biotechnologies are paving the way for breakthroughs in pecan breeding! This research led by a diverse team of experts reveals significant insights into the genomic structure of pecan and its adaptive capabilities. Uncover candidate genes for pest and pathogen resistance while tackling the complexities of highly heterozygous genomes.

Introduction
The study addresses how genome-enabled tools can accelerate breeding in long-lived, outcrossing tree crops like pecan, which face challenges including long generation times, high heterozygosity, presence–absence gene variation, and historical interspecific hybridization. Traditional breeding has yielded only modest gains in pecan compared to annual crops, and the genetic diversity across pecan and related species suggests that important trait loci may be absent from single reference genomes. The authors propose constructing multiple outbred, diploid genome assemblies and a synteny-constrained pan-genome to capture gene content diversity, resolve interspecific introgressions, and identify candidate genes for key traits such as pest and pathogen resistance, thereby enabling marker-assisted and genomic selection in pecan.
Literature Review
The paper situates pecan breeding within broader observations that perennial specialty crops retain substantial genetic diversity and often have complex histories including whole-genome duplications and interspecific hybridization. Prior assemblies typically relied on single inbred references or collapsed haplotypes, which are insufficient for highly heterozygous tree genomes. Juglandaceae experienced a whole-genome duplication ∼60 Mya, with prior walnut genome work documenting extensive conserved synteny. Historical records indicate use of interspecific pedigrees (e.g., with Carya cordiformis) in pecan breeding, and morphological and archaeological evidence suggests ancient admixture with other Carya species. These contexts motivate multi-genome, haplotype-resolved assemblies and pan-genome approaches for functional and comparative genomics.
Methodology
- Genotypes: Four outbred pecan genotypes selected to capture diversity: cultivated 'Pawnee', 'Lakota', 'Elliott', and a wild Mexico collection '87MX3-2.11' ('Oaxaca').
- Sequencing and assembly: 'Oaxaca', 'Elliott', and 'Lakota' assembled using PacBio RS II/Sequel long reads (55–108 Gb; 78.9x–135.3x) plus Illumina short reads (50–60x), with Hi-C scaffolding and synteny to construct 16 chromosome pseudomolecules. Alternative haplotypes assembled as contigs capturing 64.5–76.1% of main sequence. 'Pawnee' assembled with PacBio HiFi (CCS) reads (52.1x), enabling haplotype-aware assembly and a highly contiguous alternative haplotype (∼89.5% of primary; alt contig N50 2.9 Mb). Assembly validation via conserved paralogous (homeologous) synteny across Juglandaceae.
- Annotation: Each genome annotated using RNA-seq-supported and homology-based pipelines (PASA, RepeatModeler/RepeatMasker, FGENESH, GenomeScan), with BUSCO completeness 94.4–97%.
- Pan-genome: Constructed a synteny-constrained orthologous pan-genome across four pecan genomes with walnut as outgroup using GENESPACE and OrthoFinder, masking paralogs and condensing tandem arrays; computed Ka/Ks on single-copy syntenic orthologs.
- Presence–absence variation (PAV): Compared annotations and syntenic unannotated regions to classify gene absences as unannotated but similar, diverged, or absent by sequence.
- Introgression mapping: Resequenced 30 samples (55x median), including relatives and outgroups (C. cordiformis, C. aquatica, C. myristiciformis). Mapped to 'Oaxaca'; called SNPs; inferred local ancestry using Ancestry_HMM with parental allele frequencies; defined ancestry blocks ≥500 variants.
- Differential expression: In 'Desirable' leaves inoculated with V. effusa isolate De-Tif-11 vs mock, RNA-seq at 24 h post inoculation (3 biological replicates). Reads aligned to 'Pawnee'; DE by DESeq2; GO enrichment by topGO.
- QTL mapping: Pseudo-testcross F1 mapping population ('Lakota' × 'Oaxaca', n=143). Resequencing, SNP genotyping, phased to 'Mahan' vs 'Major' haplotypes, linkage map in R/qtl2 with LOCO kinship; permutation-based significance; 95% Bayesian credible intervals. Candidate gene prioritization within QTL by comparing primary vs alternative 'Lakota' haplotypes for PAV and protein identity, and annotating immune-related domains (LRR).
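The introgression step above defines ancestry blocks as spanning at least 500 variants. The Python sketch below illustrates one way such a filter could work. It assumes each variant already carries a single hard ancestry label (however that label was derived from the local-ancestry output) and that a block is an uninterrupted run of identical labels. The summary does not state either detail, and `AncestryBlock`, `call_blocks`, and the toy labels are hypothetical names, not part of the authors' pipeline.

```python
from itertools import groupby
from typing import List, NamedTuple, Sequence


class AncestryBlock(NamedTuple):
    ancestry: str    # ancestry label shared by every variant in the run
    start_bp: int    # position of the first variant in the run
    end_bp: int      # position of the last variant in the run
    n_variants: int  # number of consecutive variants in the run


def call_blocks(
    positions: Sequence[int],
    states: Sequence[str],
    min_variants: int = 500,  # threshold quoted in the summary
) -> List[AncestryBlock]:
    """Collapse per-variant ancestry labels into runs and keep the long ones.

    Assumes `positions` is sorted along one chromosome and that `states`
    holds one hard ancestry label per variant (an illustrative assumption).
    """
    if len(positions) != len(states):
        raise ValueError("positions and states must have the same length")

    blocks: List[AncestryBlock] = []
    idx = 0  # index of the first variant in the current run
    for state, run in groupby(states):
        n = sum(1 for _ in run)
        if n >= min_variants:
            blocks.append(
                AncestryBlock(state, positions[idx], positions[idx + n - 1], n)
            )
        idx += n
    return blocks


# Toy example with a lowered threshold so the behaviour is visible.
toy_positions = [100, 200, 300, 400, 500, 600, 700]
toy_states = ["pecan", "pecan", "aquatica", "aquatica", "aquatica", "pecan", "pecan"]
toy_blocks = call_blocks(toy_positions, toy_states, min_variants=3)
print(toy_blocks)
```

With the toy input, only the three-variant "aquatica" run (300 to 500 bp) passes the lowered threshold. The function keeps long runs of any label, so a real analysis would then select the non-pecan labels of interest.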
Key Findings
- Assemblies: Four diploid, outbred pecan genomes generated. 'Pawnee' HiFi assembly achieved contig N50 26.5 Mb with 100% primary sequence in chromosomes; other genomes had contig N50 3.7–4.4 Mb with 95.5–98% in chromosomes. Alternative haplotype assemblies captured 64.5–89.5% of primary sizes (alt contig N50 0.10–2.90 Mb). Annotations highly complete (BUSCO 94.4–97%).
- Conserved genome structure: Exceptional homeologous synteny reflecting Juglandaceae WGD ∼60 Mya; estimated one rearrangement every 6.7 Myr in pecan vs ∼0.8 Myr in poplar and ∼0.49 Myr in maize.
- Pan-genome: 42,416 orthogroups; 21,196 single-copy across all four genomes. Coding divergence among single-copy orthologs was low (mean Ks ≈ 0.0017; Ka ≈ 0.0042). However, 38.7% of orthogroups (13,010) exhibited PAV. Identified 3,889 blocks of ≥5 consecutive genes absent in one or more genomes; 8,655 absent genes had no similar sequence in syntenic regions, indicating true absences in many cases.
- Introgressions: High-confidence interspecific introgressions detected. C. aquatica ancestry accounted for 6.6–20.6 Mb (1.04–3.23%) per genome, broadly distributed, suggesting ancient admixture. A >7.5 Mb C. cordiformis block on chromosome 8 traced to cultivar 'Major' was retained in several descendants ('Lakota', 'Kanza', 'Osage'), consistent with recent introgression under positive selection. The narrower 1.41 Mb interval in 'Lakota' contained 46 orthogroups, including 8 private to 'Lakota' (17.4%; Fisher's exact test OR=4.232, P=0.0012), and multiple candidate defense-related genes (e.g., SNF1-related kinases, LRR receptors). Additional regions included C. myristiciformis introgression on chromosome 5 ('Elliott') enriched for receptor kinases and cell wall defense genes, and a C. aquatica introgression on 'Pawnee' chromosome 16 containing nine LRR receptor serine/threonine kinase genes across five orthogroups.
- Pathogen response: In scab-susceptible 'Desirable', 194 genes were differentially expressed 24 h after V. effusa inoculation (|log2FC| ≥ 1.5, FDR<0.05). Enriched GO terms highlighted 'response to chitin' (upregulated) and 'response to wounding' (downregulated), implicating canonical fungal defense pathways and redox-related processes.
- Phylloxera resistance QTL: A major QTL on chromosome 16 (peak at 2.021 Mb in 'Oaxaca' coordinates) with LOD 14.8 explained strong segregation of leaf gall incidence in 'Lakota' × 'Oaxaca'. Individuals inheriting the 'Mahan' haplotype at the peak were largely gall-free; highly susceptible individuals inherited the 'Major' haplotype. Within the 95% credible interval (1.48–2.62 Mb), prioritization based on 'Lakota' haplotype differences identified 22 genes present only in the alternative assembly and 12 genes with <98% peptide identity between haplotypes, including 13 LRR-motif-related genes varying in copy number, nominating LRR-mediated mechanisms as candidates for phylloxera resistance.
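The haplotype-based prioritization in the phylloxera QTL finding can be pictured as a two-rule filter: flag genes found only in the alternative haplotype assembly, and flag genes present in both haplotypes whose peptides are less than 98% identical. The Python sketch below applies those two rules to a toy gene list. The record layout, the function name `prioritize`, and the gene identifiers are all hypothetical; only the two criteria and the 98% cutoff come from the text above, and the authors' actual procedure may differ.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class GeneInInterval:
    gene_id: str                       # hypothetical identifier
    in_primary: bool                   # annotated in the primary haplotype
    in_alternative: bool               # annotated in the alternative haplotype
    peptide_identity: Optional[float]  # percent identity between the two
                                       # copies; None if only one copy exists


def prioritize(
    genes: List[GeneInInterval],
    identity_cutoff: float = 98.0,  # cutoff quoted in the summary
) -> List[Tuple[str, str]]:
    """Return (gene_id, reason) pairs for genes that differ between haplotypes."""
    candidates: List[Tuple[str, str]] = []
    for g in genes:
        if g.in_alternative and not g.in_primary:
            # Presence-absence variation between the two haplotypes.
            candidates.append((g.gene_id, "alt-only"))
        elif (
            g.in_primary
            and g.in_alternative
            and g.peptide_identity is not None
            and g.peptide_identity < identity_cutoff
        ):
            # Present in both haplotypes but with diverged protein sequence.
            candidates.append((g.gene_id, "diverged"))
    return candidates


# Toy records, invented purely to exercise the filter.
toy_genes = [
    GeneInInterval("geneA", in_primary=True, in_alternative=True, peptide_identity=99.6),
    GeneInInterval("geneB", in_primary=False, in_alternative=True, peptide_identity=None),
    GeneInInterval("geneC", in_primary=True, in_alternative=True, peptide_identity=91.2),
    GeneInInterval("geneD", in_primary=True, in_alternative=False, peptide_identity=None),
]
print(prioritize(toy_genes))
```

In this toy list, `geneB` is flagged as alt-only and `geneC` as diverged. A gene present only in the primary haplotype (`geneD`) is not flagged, because the summary names only alternative-only genes in its first criterion.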
Discussion
The work demonstrates that multiple, haplotype-resolved genomes are essential to capture the extensive gene content variation, tandem arrays, and structural diversity present in outbred tree crops like pecan. Conserved homeologous synteny validates assembly quality and provides a framework for comparative genomics and pan-genome construction. The prevalence of interspecific introgressions, often enriched for defense-related genes, suggests adaptive gene flow has been leveraged historically and recently in breeding, contributing to traits like disease and insect resistance. The major phylloxera resistance QTL highlights how PAV and haplotype variation within a single parent can underpin large-effect loci, with LRR gene clusters as strong candidates. The scab inoculation transcriptomics identifies responsive pathways and candidate genes for susceptibility. Collectively, these resources enable marker-assisted and genomic selection, particularly for biotic stress resistance, and illustrate that relying on a single reference would miss key breeding-relevant variation.
Conclusion
This study provides four chromosome-scale, diploid pecan genome assemblies, including a highly contiguous PacBio HiFi assembly for both 'Pawnee' haplotypes, and a synteny-constrained pan-genome capturing extensive presence–absence variation. It maps and characterizes adaptive interspecific introgressions contributing to defense-related gene content and identifies a major QTL for phylloxera resistance with LRR-rich candidate genes. The integrated comparative, population, and quantitative genomics framework advances functional and breeding genomics in pecan and similar outbred perennials. Future work should expand the pan-genome with additional genotypes, experimentally validate candidate resistance genes and haplotypes (e.g., LRR clusters), refine introgression mapping across broader germplasm, and develop robust markers for deployment in marker-assisted and genomic selection programs.
Limitations
- Pan-genome construction constrained to syntenic blocks may miss small translocations (<0.4% of the genome), slightly reducing precision for some ortholog assignments.
- Introgression inference relies on sampled relatives and three focal outgroup species; unsampled species may contribute ancestry, and local ancestry estimates depend on allele frequency estimates and model assumptions.
- The scab differential expression experiment used a single susceptible cultivar ('Desirable'), one isolate, and a single 24 h time point; it lacks a corresponding genome assembly for 'Desirable' and may not capture temporal dynamics or cultivar-pathotype specificity.
- The phylloxera QTL mapping used a single environment and time point with a moderate F1 population size (n=143); while the QTL signal is strong, causal genes are not definitively identified and require functional validation.
- Some genomic regions in non-HiFi assemblies have reduced alt-haplotype contiguity due to homozygosity/repeats, which may limit resolution of PAV within those intervals.