The complex polyploid genome architecture of sugarcane
A. L. Healey, O. Garsmeur, et al.
The study addresses the long-standing lack of a reference-quality genome for modern sugarcane hybrids, whose breeding progress has plateaued because of limited genetic diversity, long breeding cycles, and extreme genomic complexity. Modern cultivars originated from interspecific crosses of ‘sweet’ Saccharum officinarum with ‘wild’ S. spontaneum, followed by backcrossing that introduced variable subgenome dosage, polysomic/disomic pairing, recombination, and frequent aneuploidy. The research aims to generate a highly contiguous, polyploid reference genome for the representative cultivar R570 that captures unique haplotypes across its ~12× ploidy, to resolve the architecture and ancestry of its chromosomes, and to identify candidate genes for key agronomic traits, particularly the single-dose brown rust resistance locus Bru1. This resource is intended to empower molecular breeding, marker development, and functional genomics in sugarcane.
Prior knowledge of sugarcane genome architecture has largely come from molecular cytogenetics, genetic mapping, BAC libraries, and a mosaic monoploid genome assembly, which provide only a collapsed, single-copy view of the chromosomes. Extensive work documented complex meiotic behavior, variable pairing affinities, interspecific recombination, aneuploidy, and the double-genome structure of modern hybrids. Several draft assemblies for other cultivars and the R570 monoploid path existed but lacked polyploid resolution. Previous studies mapped the durable brown rust resistance locus Bru1 to a small genomic region using map-based cloning and BAC walking but were constrained by an unsized gap and a haplotype-specific insertion. Work in a related grass, Sorghum bicolor, established strong synteny with S. officinarum, whereas S. spontaneum exhibits chromosome fusions that reduce its basic chromosome number. Prior QTL studies implicated discrete loci affecting sugar content, but interpretation was limited by collapsed assemblies and differing genetic backgrounds. Collectively, these studies highlighted the need for a chromosome-scale, polyploid assembly that captures haplotypic diversity to enable trait dissection in sugarcane.
Plant material: Sugarcane hybrid cultivar R570, maintained at CIRAD.
Sequencing and data generation: Multiple platforms and library types were used, namely Illumina short reads (X10, HiSeq2500, NovaSeq 6000; paired-end 150–250 bp; PCR-free and tight insert libraries); PacBio continuous long reads (CLR) and HiFi circular consensus sequencing (Sequel/Sequel II; SMRTbell v2.0; 1–18 kb insert sizes); extensive RNA-seq (poly(A)-selected, stranded, NovaSeq 2×150 bp); Hi-C (~558 Gb, ~56× coverage); Bionano Saphyr Direct Label and Stain (DLS; DLE-1 enzyme) optical maps from ultra-HMW DNA; flow-sorted chromosome sequencing libraries; and cytogenetics with FISH using S. spontaneum-specific probes.
Assembly and scaffolding: Contigs were assembled with hifiasm and polished with Racon; optical maps were assembled (Bionano Solve/Access) with and without CMPR cuts. Hybrid scaffolds combining HiFi contigs and optical maps were constructed (hybridScaffold DLE-1). Chromosome-scale scaffolding required a custom pipeline integrating five evidence streams: (1) optical maps for long-range ordering; (2) linkage groups from ~1.8 million simplex markers genotyped in 96 R570 S1 progeny; (3) markers from single flow-sorted chromosome libraries; (4) synteny to Sorghum bicolor gene models to resolve overlaps and redundancies; (5) manual verification and correction using Hi-C contact maps. Misjoins were identified and broken iteratively. The resulting primary assembly (5.04 Gb; contig N50 12.6 Mb) encompassed 67 chromosomes with additional optical and unanchored scaffolds; the alternate assembly (3.73 Gb; contig N50 2.1 Mb) captured nearly identical haplotypes, which were aligned to the primary chromosomes by best unique placement.
Haplotype collapse and identical-by-descent (IBD) assessment: PacBio HiFi reads were mapped back to the assembly (minimap2/pbmm2), read depth was summarized in 10 kb windows to classify regions into depth classes x0–x4, and NucFreq was used to detect non-identical collapsed haplotypes after excluding high-depth repeats and outliers. The representation of IBD collapsed regions and heterozygous single-copy regions was quantified.
Genome annotation: The PERTRAN pipeline integrated ~3.7 billion RNA-seq read pairs and ~31 million Iso-Seq reads. Evidence from transcriptome assembly (GSNAP, PASA), protein homology (17 genomes, Swiss-Prot), and ab initio and evidence-based predictors (FGENESH+, FGENESH_EST, EXONERATE, AUGUSTUS) was combined; the resulting gene models were filtered for TE domains and low-confidence models and curated intragenomically to balance subgenome representation. Repeats were modeled with RepeatModeler. Completeness was assessed with BUSCO.
Comparative and structural genomics: Synteny and orthology networks were built with GENESPACE (MCScanX, OrthoFinder) versus S. bicolor, S. spontaneum (AP85-441), Setaria viridis, and the R570 monoploid. Pairwise peptide identities within progenitor-assigned orthogroups were computed (Biostrings/MAFFT). Structural variants among homeologous groups were detected by aligning homeologs to the ‘A’ chromosome with minimap2 and calling variants with SyRI. Resistance gene analogues (RGAs) were predicted using NLR-Annotator and assigned to progenitor blocks. Neutral divergence (Ks) between progenitors was estimated from 45,000 ortholog pairs.
Bru1 fine-mapping and candidate identification: Building on prior high-resolution genetic and BAC physical maps (2,383 S1 progeny; flanking markers 0.14–0.28 cM; two gaps remaining), the new genome closed a ~100 kb, previously unsized gap within the 309 kb (0.42 cM) target haplotype on Chr 3D: 5,944,326–6,253,115. Manual curation produced 13 gene models, which were evaluated via domain annotation, phylogenetics, and homology to known resistance genes. Two tandem genes encoding a receptor-like kinase and a pseudokinase (RLK-PELLE-DSLV family) were prioritized as Bru1 candidates (TKP7 and TKP8).
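The depth-based classification of collapsed haplotypes described above can be illustrated with a minimal sketch. It assumes that a window's copy class is the rounded ratio of its mean HiFi depth to the depth expected for a single-copy haplotype, and that windows above x4 are set aside as likely repeats. The function names, the example depths, and the single-copy depth of 30 are hypothetical and are not taken from the study's pipeline.

```python
def classify_windows(depths, single_copy_depth, max_copies=4):
    """Assign each window a class x0..x{max_copies} from its mean read depth.

    Assumption: copy class = round(depth / single-copy depth). Windows above
    max_copies are labeled "high" so they can be excluded as likely repeats.
    """
    classes = []
    for depth in depths:
        copies = round(depth / single_copy_depth)
        classes.append("high" if copies > max_copies else f"x{copies}")
    return classes


def summarize(classes, window_bp=10_000):
    """Total base pairs per depth class, for fixed-size windows."""
    totals = {}
    for label in classes:
        totals[label] = totals.get(label, 0) + window_bp
    return totals


# Hypothetical mean depths for eight 10 kb windows (not real data).
depths = [31.0, 29.5, 61.2, 0.4, 88.9, 30.8, 119.5, 305.0]
classes = classify_windows(depths, single_copy_depth=30.0)
totals = summarize(classes)

# Fraction of retained (non-"high") sequence that looks collapsed (x2 to x4).
retained = sum(bp for label, bp in totals.items() if label != "high")
collapsed = sum(totals.get(label, 0) for label in ("x2", "x3", "x4"))
print(classes)
print(f"collapsed fraction: {collapsed / retained:.2f}")
```

Depth alone cannot separate identical collapsed haplotypes from divergent ones, which is why the study pairs this step with NucFreq to flag non-identical collapses.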
- Genome assembly: Primary assembly 5.04 Gb (contig N50 12.6 Mb; 0.1% gaps; LAI 22.82), alternate assembly 3.73 Gb (contig N50 2.1 Mb). Combined assembled sequence 8.72 Gb closely matches expectations after accounting for collapsed identical haplotypes (~12.5% inbreeding; represented genome size 9.32 Gb).
- Annotation: 194,593 protein-coding genes in the primary assembly (plus 105,138 alternative isoforms); BUSCO completeness 99.8% total (99.3% duplicated). Mean ploidy of the primary assembly is ~6.78× based on syntenic block coverage, reflecting roughly half of the ~12× biological ploidy.
- Progenitor composition and synteny: Of the primary genome, 3.66 Gb (73%) assigned to S. officinarum and 1.37 Gb (27%) to S. spontaneum, consistent with cytogenetics. S. officinarum chromosomes (x=10) are directly syntenic to S. bicolor; S. spontaneum (typical x=8) shows characteristic fusions (observed as Chr5_9A, Chr6_9A, Chr7_10A, Chr8_10A). Thirteen interspecific recombinant chromosomes were identified; a unique translocation (S. spontaneum Chr5 with S. officinarum Chr8) was confirmed in R570.
- Haplotype structure and IBD: Approximately 50.4% of the genome is identical-by-descent and collapsed among 2–4 haplotypes; 49.6% has sufficient heterozygosity for unique HiFi read placement. S. spontaneum regions are mostly heterozygous single-copy (86%), while only ~48% of S. officinarum regions are heterozygous, with the majority collapsed. 87% of duplicated sequence across primary+alternate derives from S. officinarum. Only ~0.1% of primary chromosome bases appear to be non-identically collapsed.
- Marker bias: The most common breeding markers (simplex; 3:1 in S1) are enriched in S. spontaneum-derived, heterozygous regions (45% of markers; Fisher’s exact test enrichment ×3.25, P<0.0001; 98% in heterozygous haplotypes), rendering large inbred S. officinarum regions underrepresented in genetic mapping.
- Gene diversity: 68% (n=132,618) of gene models were assigned to S. officinarum and 31% (n=61,197) to S. spontaneum. Non-synonymous variation was present in 87% (S. officinarum) and 95% (S. spontaneum) of homeologous gene copies (note: some genes in collapsed regions may be under-represented). Mean pairwise peptide identity among homeologs: S. officinarum 86%, S. spontaneum 83% (Mann–Whitney U=3.5×10^8, P<0.0001). Genes impacted by structural variants were enriched in S. officinarum regions (n=5,090; 94% of impacted genes; Fisher’s exact test OR=9.03, P<0.0001). Novel gene content showed slight enrichment from S. officinarum (OR=1.24, P<0.0001), but notable S. spontaneum-specific expansions included a 9-gene LRR tandem duplication on Chr7_10A.
- RGA enrichment: Resistance gene analog motifs were significantly enriched in S. spontaneum-derived genome blocks (OR=2.14, P<0.0001), especially on chromosomes 3, 6, and 7 (enrichment ×4.81, ×3.35, ×4.11 respectively, P<0.0001).
- Sugar transport gene conservation: Candidate sugar transport genes underlying the sweet sorghum ‘rio’ phenotype were highly conserved across genomes (median PID ~94% in R570 vs S. bicolor), though some R570 alleles carry frameshifts that likely affect function.
- Bru1 locus resolution: The assembly closed a ~100 kb unsized gap within the ~309 kb (0.42 cM) Bru1 region on Chr 3D (5,944,326–6,253,115). Thirteen curated genes were confirmed; two tandem kinase-pseudokinase genes (TKP7: functional kinase with all 12 subdomains; TKP8: pseudokinase with domains I–VII) within the haplotype-specific insertion were identified as top candidates for the Bru1 durable brown rust resistance gene, consistent with known TKP resistance mechanisms (e.g., barley RPG1, wheat Yr15).
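Several of the findings above are reported as odds ratios with Fisher’s exact test P values. The sketch below shows how such an enrichment statistic is computed from a 2×2 table, using only the Python standard library. The counts are hypothetical: they were chosen so that the odds ratio lands near the reported RGA value of 2.14, and they are not the study's data. The one-sided test shown here is also an assumption, since the summary does not state which alternative was used.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Odds ratio for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test: P(X >= a) with all margins fixed.

    X follows a hypergeometric distribution: drawing (a + c) items from
    (a + b + c + d), of which (a + b) belong to the first row.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, col1)


# Hypothetical counts of genome blocks (NOT the study's data):
#                      with RGA   without RGA
# S. spontaneum            30          70
# S. officinarum           50         250
a, b, c, d = 30, 70, 50, 250
print(f"odds ratio = {odds_ratio(a, b, c, d):.2f}")
print(f"one-sided P = {fisher_exact_greater(a, b, c, d):.4g}")
```

An odds ratio above 1 means the feature is over-represented in the first row, here the hypothetical S. spontaneum blocks.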
By delivering a chromosome-scale, polyploid reference for R570 that captures unique haplotypes across an ~12× genome, this study overcomes the limitations of prior monoploid assemblies and fragmented resources. The assembly clarifies the genomic contributions and rearrangements of the S. officinarum and S. spontaneum progenitors, quantifies extensive IBD-driven collapse (primarily from S. officinarum), and exposes a large reservoir of heterozygous, S. spontaneum-derived variants enriched for disease resistance motifs. These insights explain biases in traditional marker-based mapping (favoring heterozygous S. spontaneum regions) and indicate why large, inbred S. officinarum segments have been refractory to QTL discovery, despite their relevance to sugar content via dosage effects. The genome enables precise synteny-aware exploration of gene families, dosage, structural variants, and progenitor ancestry, facilitating copy-number-aware genotyping and marker-assisted selection. Crucially, closure of the gap in the Bru1 region and identification of a tandem kinase-pseudokinase pair as candidate causal genes provide a mechanistic hypothesis for the durable brown rust resistance widely used in breeding. The resource advances sugarcane genetics by enabling fine-scale haplotype analysis, comparative genomics, and functional studies that can accelerate trait discovery and breeding in this highly complex crop.
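The marker bias discussed above follows from how simplex markers are identified: a single-dose allele is expected to segregate 3:1 (present:absent) among selfed (S1) progeny, which can only be observed where a haplotype is distinguishable from the others. The sketch below tests that expectation with a chi-square goodness-of-fit statistic. It assumes the single-dose allele reaches half of the gametes, so a selfed plant lacks it with probability 1/4. The progeny counts are hypothetical and use the panel size of 96 mentioned in the methods; the function names are illustrative.

```python
# Critical value of the chi-square distribution, 1 degree of freedom, alpha = 0.05.
CHI2_CRITICAL_DF1 = 3.841


def chi_square_3_to_1(present, absent):
    """Chi-square statistic for observed counts against a 3:1 expectation."""
    n = present + absent
    expected_present, expected_absent = 0.75 * n, 0.25 * n
    return ((present - expected_present) ** 2 / expected_present
            + (absent - expected_absent) ** 2 / expected_absent)


def fits_simplex(present, absent):
    """True if the counts do not deviate significantly from 3:1."""
    return chi_square_3_to_1(present, absent) <= CHI2_CRITICAL_DF1


# Hypothetical marker scores across 96 S1 progeny (not real data).
for present, absent in [(70, 26), (90, 6)]:
    stat = chi_square_3_to_1(present, absent)
    print(present, absent, round(stat, 3), fits_simplex(present, absent))
```

A marker scored as present in 90 of 96 progeny deviates strongly from 3:1, as expected for an allele carried in more than one dose; such markers are not usable as simplex markers.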
This work presents the first highly contiguous, polyploid reference genome and comprehensive annotation for the modern sugarcane cultivar R570, integrating long-read sequencing, optical maps, Hi-C, genetic maps, and single-chromosome data into chromosome-scale assemblies. It quantifies haplotype collapse and heterozygosity patterns, reveals extensive S. officinarum-driven IBD and S. spontaneum-enriched resistance gene content, and refines the structural and evolutionary context of hybrid sugarcane. Closure of the Bru1 locus and prioritization of a tandem kinase-pseudokinase pair as candidate causal genes offer immediate targets for validation and breeding. The strategies and resources developed here will support copy-number-aware genotyping, improved marker design, and transcriptomic analyses to dissect complex traits (e.g., sucrose accumulation) under polyploid dosage and epistasis. Future research should: (1) functionally validate the TKP7/TKP8 candidate genes for Bru1 via transgenic or gene-editing approaches; (2) expand to a sugarcane pan-genome to capture broader haplotype diversity; (3) perform allele-specific and dosage-aware expression profiling across tissues and environments; and (4) translate genome-enabled insights into marker-assisted and genomic selection pipelines for disease resistance and sugar yield improvement.
- Alternate assembly anchoring is based on sequence similarity to primary chromosomes and does not guarantee biological pairing with true meiotic homologues.
- Extensive identical-by-descent regions, especially from S. officinarum, remain collapsed, limiting full haplotype resolution and potentially under-representing gene copies and allelic diversity in these segments.
- Despite high contiguity (0.1% gaps), the assembled sequence (8.72 Gb) plus inferred collapsed content does not fully enumerate all 12 copies for all chromosomes; perfectly duplicated regions cannot be confidently placed without artificial duplication.
- Marker discovery and genetic mapping remain biased toward heterozygous S. spontaneum regions; large inbred blocks from S. officinarum may require alternative strategies (e.g., copy-number-aware genotyping) and larger populations for segregation-based analyses.
- The study focuses on a single cultivar (R570); broader generalization to diverse germplasm awaits pan-genomic analyses.
- Some gene models in collapsed regions may be under-represented; functional predictions (e.g., TKP mechanism) require experimental validation.