Agriculture
A high-quality genome assembly highlights rye genomic characteristics and agronomically important genes
G. Li, L. Wang, et al.
This study reports the genome sequence of Weining rye, an elite Chinese variety. It reveals significant findings about gene duplications and starch biosynthesis, and provides a resource for breeding studies in rye and related crops.
Introduction
Rye (Secale cereale) is closely related to wheat and barley but exhibits unique agronomic traits, including strong abiotic stress tolerance, disease resistance and adaptability to poor soils. Rye chromosome arm 1RS has provided key resistance genes to wheat, and rye is critical for developing triticale. The rye genome is large (~7.9–8.0 Gb) and consists of ~90% transposable elements (TEs), yet a high-quality reference genome was lacking, which hindered insights into TE-driven expansion and comparative genomics. In contrast, reference assemblies exist for wheat, barley and the wheat progenitors. Weining is an elite early-flowering Chinese rye variety with broad-spectrum resistance to powdery mildew and stripe rust. To elucidate the genetic basis of rye traits and enable genomics-assisted breeding, the authors generated and analyzed a chromosome-scale reference assembly for Weining rye.
Literature Review
Prior work estimated rye genome size and repeat content and highlighted its value for wheat improvement through 1RS introgressions. Chromosome conformation capture-based assemblies and reference genomes have been completed for barley, bread wheat and its diploid progenitors (Triticum urartu and Aegilops tauschii), as well as wild emmer and durum wheat. However, the contributions of specific transposable element (TE) families to rye genome expansion remained unresolved, and rye lacked a high-quality reference sequence. Previous studies identified seed storage protein (SSP) loci (Sec-1 to Sec-4) in rye, whose structures were not fully elucidated, and mapped QTL for heading date on rye chromosomes 2R, 5R and 6R. Comparative studies in Triticeae have documented TE impacts on genome evolution and characterized domestication loci such as Btr in wheat and barley.
Methodology
Plant materials: The Weining rye line was selfed for 18 generations and karyotyped by FISH; Jingzhou rye was used for comparisons. An F2 population (Weining × Jingzhou) was developed for genetic mapping and QTL analyses.
Sequencing: The authors generated 430 Gb of Illumina paired-end short reads (13 libraries, ~270 bp inserts) and 497 Gb of PacBio long reads (120 SMRT cells; 10–50 kb fragments). Six Hi-C libraries were constructed (five DpnII, one HindIII; 560 Gb raw data). BioNano optical maps were generated for additional validation of the assembly.
Assembly: PacBio subreads were corrected (Canu, correctedErrorRate=0.045) and assembled with wtdbg, FALCON and MECAT; assemblies were merged with Quickmerge. Polishing was performed with Illumina reads using Pilon (≥3 iterations), correcting SNPs and indels. Hi-C data (processed with Cutadapt, BWA, HiC-Pro) were used to detect and split misjoins (2,249 contact points) and scaffold contigs with LACHESIS into seven chromosome-scale scaffolds. Gaps were filled using corrected PacBio reads; scaffolds were evaluated with a genetic linkage map (2,662 SNPs; 843.8 cM) and BioNano data.
Annotation: Repeats were identified using a combined de novo and homology-based pipeline (RepeatScout, LTR-FINDER, MITE-Hunter, PILER-DF; REPET/CLARITE; RepeatMasker; LTR_retriever). Protein-coding genes were annotated by integrating de novo prediction, homology, and extensive transcriptome evidence: 25 Illumina RNA-seq datasets (leaf, stem, root, spike, grains 10–40 DAA) assembled with HISAT/StringTie/Cufflinks and two PacBio Iso-Seq datasets (IsoSeq3) merged via PASA/TransDecoder; models were integrated with EvidenceModeler and filtered into high- and low-confidence sets. Noncoding RNAs (miRNA, lncRNA, tRNA, snoRNA) were annotated. Assembly quality was assessed using LAI and BUSCO.
Comparative genomics and evolution: Single-copy orthologs (2,517) were identified with OrthoMCL; phylogeny and divergence times were inferred using BEAST with fossil-calibrated priors. Synteny with rice and wheat subgenomes was analyzed using BLASTP and MCScanX to infer ancestral grass karyotype segment arrangements.
Gene duplication analyses: MCScanX duplicate_gene_classifier categorized duplicated genes (segmental/whole-genome, tandem, proximal, dispersed). Transposed duplicated genes (TrDGs) were identified using DupGen_finder with barley as outgroup to assign parental vs transposed copies.
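The duplication categories above can be illustrated with a toy classifier. This is a minimal sketch, not the MCScanX or DupGen_finder implementation: the gene IDs, the priority order (collinear first, then tandem, proximal, dispersed) and the `proximal_max` rank-distance cutoff are assumptions made for illustration, and transposed duplicates (which need an outgroup) are not handled.

```python
def classify_pairs(gene_order, homolog_pairs, collinear_pairs=frozenset(), proximal_max=10):
    """Label each homologous gene pair with a duplication mode.

    gene_order      : dict mapping gene -> (chromosome, rank along the chromosome)
    homolog_pairs   : iterable of (gene_a, gene_b) tuples
    collinear_pairs : set of frozenset({a, b}) pairs found in collinear blocks
    proximal_max    : assumed maximum rank distance for a 'proximal' pair
    """
    labels = {}
    for a, b in homolog_pairs:
        chrom_a, rank_a = gene_order[a]
        chrom_b, rank_b = gene_order[b]
        same_chrom = chrom_a == chrom_b
        distance = abs(rank_a - rank_b)
        if frozenset((a, b)) in collinear_pairs:
            labels[(a, b)] = "segmental/WGD"
        elif same_chrom and distance == 1:
            labels[(a, b)] = "tandem"      # directly adjacent genes
        elif same_chrom and distance <= proximal_max:
            labels[(a, b)] = "proximal"    # nearby but not adjacent
        else:
            labels[(a, b)] = "dispersed"   # everything else
    return labels


# Hypothetical toy genes, not taken from the rye annotation.
gene_order = {
    "g1": ("1R", 1), "g2": ("1R", 2), "g7": ("1R", 7),
    "g40": ("1R", 40), "h5": ("2R", 5), "h6": ("3R", 6),
}
homolog_pairs = [("g1", "g2"), ("g1", "g7"), ("g1", "g40"), ("g2", "h5"), ("g7", "h6")]
collinear_pairs = {frozenset(("g7", "h6"))}

labels = classify_pairs(gene_order, homolog_pairs, collinear_pairs)
for pair, mode in labels.items():
    print(pair, mode)
```

Working on gene rank rather than base-pair distance keeps the sketch independent of gene density, which varies widely in a repeat-rich genome such as rye.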
Trait gene analyses: Starch biosynthesis-related genes (SBRGs) were identified by BLASTN searches of the rye genome using wheat SBRG sequences; their expression was quantified from RNA-seq data (TopHat/Cufflinks) and visualized with pheatmap. Secalin loci (seed storage protein (SSP) genes, Sec-1 to Sec-4) were defined through BLAST searches using wheat and barley SSPs, manual curation, Iso-Seq validation, SDS-PAGE, phylogenetic analysis (MUSCLE/MEGA X) and microsynteny analysis (MCScan python).
TFs and resistance genes: Transcription factors (TFs) were predicted with iTAK; disease-resistance-associated (DRA) genes were enumerated and mapped.
Heading date analyses: Heading date-related genes were profiled for differential expression in Weining vs Jingzhou at 4, 7 and 10 DAS by RNA-seq, and ScFT1, ScFT2 and ScPpd1 were assayed by RT-qPCR across timepoints. ScFT protein accumulation and phosphorylation were assessed by immunoblotting and Phos-tag SDS-PAGE. ScFT2 variants generated by site-directed mutagenesis (dephosphomimic and phosphomimic substitutions at S76 and T132) were ectopically expressed in tobacco using a PVX-based vector; effects on growth and flowering were quantified, and protein accumulation was assayed by immunoblotting.
QTL and selection sweeps: Heading date QTL were mapped in the F2 population. Selection sweeps related to domestication were detected using a published genotyping-by-sequencing SNP dataset (101 accessions), filtered to 127,826 high-quality SNPs. DRI, FST and XP-CLR were computed in sliding windows; top 5% signals with ≥10 SNPs were retained; overlapping sweeps were merged. Candidate loci within sweeps were inferred via synteny to rice/barley and functional annotations.
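The window-filtering steps described above (keep the top 5% of windows, require at least 10 SNPs, merge overlapping sweeps) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' pipeline: the order of the two filters (SNP count first, then top fraction), the window size and step, and all scores are made up, and `score` stands in for any one of the three statistics.

```python
def top_windows(windows, top_fraction=0.05, min_snps=10):
    """Keep the highest-scoring fraction of windows that carry enough SNPs."""
    eligible = [w for w in windows if w["n_snps"] >= min_snps]
    if not eligible:
        return []
    n_keep = max(1, int(len(eligible) * top_fraction))
    ranked = sorted(eligible, key=lambda w: w["score"], reverse=True)
    return ranked[:n_keep]


def merge_overlapping(windows):
    """Merge windows on the same chromosome whose coordinates overlap."""
    merged = []
    for w in sorted(windows, key=lambda w: (w["chrom"], w["start"])):
        last = merged[-1] if merged else None
        if last and last["chrom"] == w["chrom"] and w["start"] <= last["end"]:
            last["end"] = max(last["end"], w["end"])
            last["score"] = max(last["score"], w["score"])
        else:
            merged.append({"chrom": w["chrom"], "start": w["start"],
                           "end": w["end"], "score": w["score"]})
    return merged


# Synthetic sliding windows (100 kb wide, 50 kb step) on one chromosome.
windows = []
for i in range(41):
    start = i * 50_000
    windows.append({"chrom": "6R", "start": start, "end": start + 100_000,
                    "n_snps": 12, "score": 1.0})
windows[10]["score"] = 5.0
windows[11]["score"] = 4.0
windows[20].update(n_snps=3, score=9.0)  # high score but too few SNPs, so dropped

sweeps = merge_overlapping(top_windows(windows))
print(sweeps)
```

In this toy data the two adjacent high-scoring windows collapse into a single candidate region, which is the behavior the merging step is meant to provide.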
Key Findings
Assembly and quality: The Weining rye genome was assembled to 7.74 Gb (98.47% of the 7.86 Gb estimate), with 7.25 Gb (93.67%) assigned to seven chromosomes (1R–7R). Scaffold N50 was 1.04 Gb; contig N50 was 480.35 kb; the longest contig was 9.02 Mb. GC content was 45.89%. Mapping rates were high: 99.77% of 2.77 billion Illumina reads mapped; previous Lo7 pyrosequencing reads mapped at 97.45% with 97.71% identity. Heterozygosity was 0.26%. LAI was 18.42, exceeding that of wheat and barley; BUSCO recovered 1,393/1,440 (96.74%) conserved genes.
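For readers unfamiliar with the metrics, the sketch below shows how N50 is conventionally computed and re-derives the reported percentages from the rounded figures quoted above. The contig lengths are toy values, not rye data, and the function is a generic illustration rather than the tool the authors used.

```python
def n50(lengths):
    """Return the length L such that sequences of length >= L
    together cover at least half of the total assembly length."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


toy_contigs = [9, 7, 5, 3, 2, 1]       # arbitrary toy lengths
toy_n50 = n50(toy_contigs)

assembled_vs_estimate = percent(7.74, 7.86)   # assembly size vs. estimated genome size (Gb)
anchored_vs_assembled = percent(7.25, 7.74)   # chromosome-assigned vs. assembled (Gb)
busco_complete = percent(1393, 1440)          # conserved genes recovered

print(toy_n50, assembled_vs_estimate, anchored_vs_assembled, busco_complete)
```

The recomputed values agree with the reported 98.47%, 93.67% and 96.74% to two decimals.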
Repeats: Repeats comprised 6.99 Gb (90.31%) of the assembly; LTR retrotransposons (LTR-RTs) dominated (76.29% of the genome; 84.49% of annotated TEs). CACTA elements accounted for 10.55%. Rye had substantially more LTR-RTs than barley (~2.52 Gb more), explaining ~85.4% of the 2.95 Gb genome size difference. Three Gypsy LTR families (Daniela 5.03%, Sumaya 3.61%, Sumana 1.82%) showed marked expansion relative to Triticum urartu (Tu), Aegilops tauschii (Aet) and barley (Hordeum vulgare, Hv). Intact LTR-RT insertion times showed a bimodal distribution with peaks at ~0.5 Ma and ~1.7 Ma; Copia elements had very recent bursts (~0.3 Ma), while Gypsy dynamics shaped the bimodal pattern.
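Insertion times of intact LTR-RTs are commonly estimated from the sequence divergence K between an element's two LTRs as T = K / (2r), where r is a substitution rate. The sketch below applies that standard formula and re-derives the LTR-RT share of the rye-barley size difference. The summary does not state which formula or rate the authors used, so the default rate of 1.3e-8 substitutions per site per year (a value often used for grasses) and the example divergence are assumptions.

```python
def insertion_time_ma(ltr_divergence, subs_rate_per_site_per_year=1.3e-8):
    """Estimate insertion time in millions of years as T = K / (2r).

    ltr_divergence              : divergence K between the 5' and 3' LTRs
    subs_rate_per_site_per_year : assumed substitution rate r (not from the source)
    """
    years = ltr_divergence / (2.0 * subs_rate_per_site_per_year)
    return years / 1e6


def share_of_difference(extra_ltr_gb, genome_size_difference_gb):
    """Percentage of a genome size difference accounted for by extra LTR-RT sequence."""
    return 100.0 * extra_ltr_gb / genome_size_difference_gb


example_time = insertion_time_ma(0.013)        # K = 0.013 is a made-up divergence
ltr_share = share_of_difference(2.52, 2.95)    # figures quoted in the text, in Gb

print(round(example_time, 2), round(ltr_share, 1))
```

Under the assumed rate, a divergence of 0.013 corresponds to about 0.5 Ma, and the extra 2.52 Gb of LTR-RTs accounts for about 85.4% of the 2.95 Gb difference, matching the figure above.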
Genes: The annotation comprised 86,991 protein-coding genes, including 45,596 high-confidence (HC) genes with 84,179 transcripts. Weining HC genes exhibited the longest average intron length among 11 grasses; exon/CDS sizes were similar across grasses.
Comparative genomics: Divergence times were estimated at ~15 Ma for barley vs wheat lineage split and ~9.6 Ma for rye vs diploid wheats. Using rice as ancestral reference, rye chromosomes were reconstructed from AGK segments: 3R largely from AGK1; 1R from a nested insertion of AGK10 into AGK5; 2R from nested AGK7 into AGK4; 4R/5R/6R/7R from complex fusions/translocations. Rye showed extensive collinearity with wheat subgenomes: 1R, 2R, 3R wholly collinear with wheat groups 1, 2, 3; other chromosomes exhibited segmental collinearity and known translocations.
Gene duplications: Among chromosomal HC genes, there were 7,077 tandem and 6,659 proximal duplicates, more than in Tu, Aet, Hv, Brachypodium distachyon (Bd) or rice (Oryza sativa, Os). The analysis identified 10,357 transposed duplicated genes (TrDGs) in rye (vs. 7,145 in Tu and 7,351 in Aet), 5,926 of which were specific to rye. SBRGs displayed multiple duplication types (transposed, tandem, proximal, dispersed) with expression divergence among duplicates; for example, the parental and transposed copies of ScSuSy2 showed tissue-specific expression differences.
Seed storage proteins: Defined structures of secalin loci: Sec-1 (~12 Mb; clusters of γ- and ω-secalins), Sec-4 (~591 kb; one γ- and one ω-secalin), Sec-3 (~38 kb; HMW-1Rx and HMW-1Ry), Sec-2 (~33 kb; three 75k γ-secalins). SDS-PAGE confirmed accumulation of corresponding secalins. Sec-1 and Sec-4 were syntenic to wheat γ/ω-gliadin and barley γ/C-hordein regions; no LMW-GS or B-hordein orthologs were detected in rye, indicating deletions of those segments. α-gliadin genes were absent, consistent with their recent evolution in wheat.
TFs and resistance genes: In rye, 28 of 65 TF families were expanded, notably AP2-ERF. A total of 1,989 DRA genes were predicted (more than in Tu, Aet, Hv, Bd, Os or the wheat subgenomes), with the highest counts on 2R–4R.
Heading date mechanisms: Weining headed 10–12 days earlier than Jingzhou under long days, with faster SAM development. Two FT genes (ScFT1, ScFT2) were more highly expressed in Weining at 7 and 10 DAS. ScFT protein was detected at ~29 kDa and shown to be phosphorylated by Phos-tag assays. Mutational analysis of ScFT2 revealed that dephosphomimic mutants (S76A, T132A, S76A+T132A) enhanced tobacco growth/flowering and accumulated to high levels, whereas phosphomimic mutants accumulated poorly and did not promote flowering, indicating phosphorylation affects FT stability/function. ScPpd1 expression peaked earlier in Weining (2 DAS) than in Jingzhou (4 DAS). QTL mapping identified Hd2R (LOD 8.19; 12.16% variance) near ScPpd1, plus Hd5R and Hd6R; together they explained 33.63% of variance with Weining alleles conferring earliness.
Domestication sweeps: Selection sweep analyses (top 5%) identified 86 (DRI), 56 (FST), and 65 (XP-CLR) signals, with 11 shared. Candidate domestication-related loci included ScBC1, ScBtr, ScGW2, ScMOC1, ScID1, and ScWx. A key sweep on 6RS (detected by all three methods; DRI=2.55, FST=0.18, XP-CLR=2.59) contained tandemly duplicated ScID1 paralogs (ScID1.1/ScID1.2), unique to Weining rye. These genes were more highly expressed in Weining young leaves, and F2 genotyping showed that homozygotes for the Jingzhou allele (ScID1 JZ/JZ) headed later than plants carrying one or two Weining alleles (ScID1 JZ/WN or ScID1 WN/WN).
Discussion
The chromosome-scale Weining rye assembly fills a critical gap in Triticeae genomics, enabling detailed analyses of genome structure, evolution, and agronomically important genes. The study demonstrates that rye genome expansion is largely driven by recent bursts of specific LTR retrotransposon families (notably Gypsy elements Daniela, Sumaya, Sumana) around 0.3–0.5 Ma and an older wave at ~1.7 Ma. Elevated TE activity correlates with increased transposed gene duplications, contributing to functional diversification, as evidenced in starch biosynthesis genes where duplicates exhibit distinct tissue-specific expression. Clarifying the architecture and content of secalin loci refines understanding of rye seed storage proteins and their divergence from wheat and barley, informing end-use quality improvement. Comparative synteny with rice and wheat reveals the chromosomal rearrangements shaping rye karyotype and supports precise cross-species genomic inferences. The integration of expression, protein biochemistry and QTL mapping links early heading in Weining to elevated ScFT1/ScFT2 and earlier ScPpd1 expression, and uncovers FT phosphorylation as a previously unreported regulatory layer influencing protein stability and flowering. Selection sweep analyses identify candidate domestication loci, notably a unique tandem duplication at ScID1 associated with earlier heading, suggesting selection on flowering-time genes during rye domestication. Collectively, these findings address the original aims by elucidating rye genomic characteristics and pinpointing genes and regions relevant to breeding.
Conclusion
This work delivers a high-quality, chromosome-scale reference genome for rye and reveals key genomic features: massive LTR-RT content with recent family-specific expansions, elevated gene duplication (especially transposed duplications), and clarified structures of complex secalin loci. It connects gene duplication and regulatory variation to traits, including starch biosynthesis and heading date, identifies phosphorylation-dependent regulation of FT, and highlights candidate domestication sweeps such as ScID1. The Weining assembly provides a robust foundation for comparative cereal genomics and for accelerating molecular breeding in rye, wheat, and triticale. Future research should functionally validate domestication candidates (for example, ScID1 and ScBtr regions), dissect the mechanistic impact of TE-driven duplications on gene networks, refine QTL-to-gene resolutions for heading date and other agronomic traits, and exploit the assembly for targeted introgressions and genome editing.
Limitations
The ScID1-associated selection sweep spans a large region (~12 Mb), and functional causality for heading date requires further fine mapping and experimental validation. Although 93.67% of the assembly is assigned to chromosomes, a fraction remains unanchored. Trait associations beyond the studied Weining and Jingzhou lines may require validation across broader germplasm and environments.