Chromosome-scale assembly and analysis of biomass crop *Miscanthus lutarioriparius* genome


J. Miao, Q. Feng, et al.

This study presents a chromosome-scale assembly of the *Miscanthus lutarioriparius* genome, built with Oxford Nanopore sequencing and Hi-C scaffolding. The work, by a team including Jiashun Miao and Qi Feng, reports significant gene family expansions linked to disease resistance and stress response, which may help explain the plant's distinctive traits.

Introduction
The genus Miscanthus (~20 species) comprises rhizomatous perennial C4 grasses with high biomass yield, strong stress tolerance, cold adaptation, and tolerance to heavy metals, making them valuable for bioenergy and phytoremediation. Miscanthus lutarioriparius, endemic to the Yangzi River region in China, can reach ~7 m in height, shows the highest biomass among major Miscanthus species, and exhibits favorable papermaking properties, high photosynthetic rates, high water-use efficiency, and tolerance to drought and salinity on marginal lands. However, self-incompatibility, high heterozygosity, variable ploidy, large genome size, and abundant repeats have prevented high-quality genome assemblies from short reads, limiting genomic understanding and molecular breeding. Building on advances in long-read sequencing and Hi-C scaffolding, this study aims to generate a chromosome-level reference genome for M. lutarioriparius, to elucidate its genome evolution and the genomic basis of its distinctive traits, and to facilitate the use of Miscanthus genetic resources.
Methodology
Plant material: A M. lutarioriparius individual was sampled from Honghu Lake (Hubei, China), verified for genome size and diploidy by karyotyping and flow cytometry, clonally propagated, and grown for tissue collection.
Sequencing: High-molecular-weight DNA was extracted from young leaves (CTAB, Qiagen Genomic-tip). Oxford Nanopore libraries (LSK-109) with 20–50 kb size selection (BluePippin) were sequenced on PromethION (R9.4.1), generating 307.71 Gb raw data (280.84 Gb clean). Three Illumina paired-end DNA libraries (PE250 and PE150) yielded 205.74 Gb raw reads (172.52 Gb clean) for polishing. Hi-C libraries (DpnII digestion) from the same plant were sequenced on HiSeq4000, yielding 347.76 Gb clean data. RNA-seq from nine tissues (leaf, spikelet, root, upper/middle/lower internode, rhizome, lateral bud, seedling) produced 95.12 Gb raw reads.
Genome size and heterozygosity: 17-mer counts (Jellyfish) analyzed with GCE estimated the genome size (~2.19 Gb) and repeat content (~67.3%); flow cytometry estimated 2.15 Gb.
Assembly: ONT reads were self-corrected with Canu (v1.8), and the longest 40x (112.78 Gb) were used for SMARTdenovo assembly. Polishing included three rounds of Racon with ONT reads and three rounds of Pilon with Illumina reads. Assembly quality was monitored with BUSCO and BAC-to-contig alignments (MUMmer, minimap2 coverage).
Hi-C scaffolding: Hi-C reads were processed with fastp and mapped (BWA/HiC-Pro, Juicer). Two pipelines (3d-dna and LACHESIS) were tested with multiple parameter sets; LACHESIS produced the final 19 pseudochromosomes. Hi-C maps were visualized with Juicebox. Collapsing redundant heterozygous sequences during scaffolding reduced the assembly length from 2.25 Gb to ~2.075 Gb.
Assembly evaluation: Completeness and accuracy were assessed with BUSCO (embryophyta_odb10), LAI (LTR_retriever), Illumina DNA and RNA read mapping rates (BWA-MEM, HISAT2), and synteny with Sorghum bicolor (MCScan2).
Repeat annotation: A de novo TE library (RepeatModeler) plus RepBase was used with RepeatMasker. Intact LTR-RTs were identified (LTR_FINDER, LTR_retriever) and their insertion times estimated. LINEs/SINEs, DNA transposons, MITEs (MITE Tracker), and tandem repeats (TRF) were annotated; centromere and telomere locations were inferred from tandem repeats.
Gene prediction and annotation: The repeat-masked genome was annotated by integrating ab initio prediction (FGENESH, AUGUSTUS), protein homology (Exonerate with rice/maize/sorghum/S. spontaneum proteomes), and RNA-seq evidence (Trinity de novo, StringTie genome-guided; PASA). EvidenceModeler integrated the models with weighted scores; MAKER applied QC filters. Functional annotation used InterProScan, eggNOG-mapper, and KEGG KOALA; TFs were identified via PlantTFDB; ncRNAs via Rfam/Infernal, tRNAscan-SE, and barrnap.
Comparative genomics and WGD: Synteny with sorghum and self-synteny were analyzed with MCScan/jcvi. Duplicate gene classification (MCScanX) identified WGD/segmental, tandem, proximal, and dispersed duplicates as well as singletons. Ks distributions (KaKs_calculator, YN model) were used to date the recent WGD with r = 6.5e-9 substitutions/site/year.
Gene family evolution: Orthogroups across eight species (OrthoFinder) informed the phylogeny (MAFFT, RAxML; divergence dating with BEAST using TimeTree calibrations) and the CAFE analysis of expansion/contraction; KinFin aided functional summaries. GO/KEGG enrichments used clusterProfiler with BH FDR correction.
Trait-related families: NBS-LRR R-genes were identified via PRGdb DRAGO2, HMMER (NB-ARC PF00931), BLASTP, and domain curation (CD-Search), and classified into CNL/TNL/RLP/RLK/others. CAZymes were annotated with dbCAN2; CesA/Csl families were identified by BLAST and HMM (PF03552/PF00535), followed by phylogeny (IQ-TREE) and expression profiling. C4 pathway genes were identified by homology to sorghum orthologs, and their expression was assessed across tissues.
Chloroplast genome and phylogeny: The plastome was assembled from WGS Illumina reads (MITObim) using Miscanthus junceus and Saccharum spontaneum baits and annotated with GeSeq. Genus-level plastome phylogenies were reconstructed (MAFFT, GBLOCKS, IQ-TREE, MrBayes, MEGA X) and time-calibrated with BEAST.
Population genetic analysis: Public RNA-seq data from 79 individuals across 10 populations were processed (fastp) and variants called with the GATK RNA-seq pipeline; PCA (GCTA), an NJ tree (PHYLIP), admixture analysis (ADMIXTURE), and genetic distances (VCF2Dis) characterized population structure.
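The Ks-based dating step above is commonly done with the relation T = Ks / (2r), where Ks is the position of the peak in the synonymous-substitution distribution and r is the substitution rate. The sketch below illustrates that arithmetic with the rate given in the methodology (r = 6.5e-9 substitutions/site/year). The summary does not state the Ks peak value or the exact formula the authors applied, so the formula choice, the `wgd_age_years` helper, and the example Ks of 0.08 are assumptions for illustration only.

```python
def wgd_age_years(ks_peak: float, rate: float = 6.5e-9) -> float:
    """Estimate the age of a duplication event from a Ks peak.

    Uses T = Ks / (2 * r): both duplicate copies accumulate
    substitutions, hence the factor of 2.
    """
    if rate <= 0:
        raise ValueError("substitution rate must be positive")
    if ks_peak < 0:
        raise ValueError("Ks cannot be negative")
    return ks_peak / (2.0 * rate)


# Hypothetical Ks peak (not reported in the summary), chosen because it
# is consistent with an age of roughly 6 million years at this rate.
ks_example = 0.08
age_ma = wgd_age_years(ks_example) / 1e6
print(f"Ks = {ks_example} -> {age_ma:.2f} Ma")
```

Because the estimate scales linearly with Ks and inversely with r, any uncertainty in the assumed substitution rate carries over proportionally to the inferred age.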
Key Findings
- Assembly: 2,074.80 Mb total in 919 scaffolds; scaffold N50 113.46 Mb; contig N50 1.71 Mb; longest scaffold 150.81 Mb; 94.30% of sequence anchored and oriented on 19 pseudochromosomes; the assembly covers 96.64% of the ~2,147 Mb flow-cytometry genome size. LAI = 12.11; BUSCO complete = 97.4% (1339/1375); Illumina DNA read mapping >99.7%; RNA-seq mapping 82.6–95.4%.
- Centromeres/telomeres: Centromeric sequences were assembled for all 19 chromosomes; centromeric satellites are 137-bp monomers forming two distinct types, supporting an allotetraploid origin and recombination between centromeric types. Telomeric TTTAGGG repeats were detected on Chr10 (1917 repeats).
- Genome content: 68,328 gene models (mean CDS 1215 bp; mean exons per gene 4.77); 4031 TFs; ncRNAs: 1164 tRNAs, 257 rRNAs, 521 miRNAs, 970 snoRNAs, 98 snRNAs. InterProScan annotated 92.31% of proteins; eggNOG 93.07%; KEGG 33.52%; GO 57.33%.
- GC landscape: Genome GC 45.46%; CDS GC 56.40%; GC and GC3s distributions are bimodal; strong positive correlation between GC content and gene density (Pearson r = 0.95, FDR = 8.25e-10). Chromosomes 9 and 10 have lower GC content and gene density.
- Repeats: 64.39% interspersed repeats (1.34 Gb); LTR-RTs 46.78%, with Gypsy ~35.2% (centromere-enriched) and Copia ~11.6%; 8848 intact LTR-RTs with a burst ~1–2 Ma. LINEs 1.21%, SINEs 0.16%; DNA transposons 9.64%, enriched on the arms of chromosomes 9 and 10. Tandem repeats total 517,973 (3.74% of the genome); 19,062 MITEs (0.23%).
- Synteny and WGD: Predominant 2:1 syntenic depth relative to sorghum; 87.05% of sorghum genes have two syntenic copies in M. lutarioriparius; only 0.68% of sorghum genes lack synteny. Large inversions on MlChr07 and MlChr08; fusion of ancestral SbChr04 and SbChr07 into MlChr07 after the recent WGD; ends of MlChr09/10 and MlChr14/15 are highly collinear. The recent WGD is dated to ~6.15 Ma (small Ks peak); there is also evidence of the older WGD shared among grasses.
- Duplicate gene origin: WGD/segmental duplicates dominate at 63.96% (43,704 genes); dispersed 18.59% (12,700); tandem 6.39% (4365); proximal 6.24% (4267). Chromosomes 9 and 10 show the highest proportion of proximal and tandem duplicates. Tandem duplicates have the highest GC3s and lowest ENC, and are enriched for stress responses and cell wall biosynthesis; WGD/segmental duplicates are enriched for redox, transport, signaling, metal ion transport, and photosynthesis; proximal duplicates are enriched for hexosyl transfer and polysaccharide binding; dispersed duplicates are enriched for DNA repair and chromatin functions.
- Gene family evolution: Across Panicoideae, 21,515 orthogroups include 57,710 M. lutarioriparius genes; 144 gene families unique to M. lutarioriparius include NB-ARC and cytochrome P450 families; enriched GO terms include peroxidase activity and response to oxidative stress. CAFE identified 9509 expanded and 3228 contracted families; 211 rapidly expanded families are stress-related (e.g., NB-ARC: 21 families/334 genes; xylanase inhibitors: 4 families/56 genes; WRKY, P450, thaumatin, terpene synthase). Expanded families are enriched for metal ion transport and cell wall biosynthesis.
- Disease resistance: 547 NBS-LRR genes (more than in rice, sorghum, maize, or S. spontaneum), predominantly CC-NBS-LRR; 42.8% are clustered on chromosomes 9 and 10; 17.4% are tandem duplicates (vs 6.39% genome-wide), indicating tandem-driven expansion.
- Cell wall and CAZymes: 2919 CAZyme genes (4.3% of genes), the most among the 12 species analyzed; GTs ~46%, GHs ~32%, CEs 4.6%. The CesA family has 27 genes, many highly expressed in the middle internode; the Csl family has 90 genes (more than in maize, rice, or Arabidopsis). Tandem, WGD/segmental, and proximal duplications expanded CesA/Csl. Lignin biosynthesis: 333 genes in 10 families (more than sorghum's 141 or rice's 155), with clustering potentially enhancing pathway efficiency.
- C4 photosynthesis: 55 putative NADP-ME C4 pathway genes (CA 10, PEPC 12, PPCK 6, PPDK 3, PPDK-RP 6, NADP-ME 13, NADP-MDH 3, RbcS 2). Many expanded via WGD and further proximal/tandem/segmental duplication; C4 isoform duplicates show high leaf expression; some non-C4 isoforms show tissue-specific expression, suggesting neofunctionalization.
- Chloroplast genome and phylogeny: The plastome is 142,989 bp with 123 protein-coding, 71 tRNA, and 8 rRNA genes. The Miscanthus plastome phylogeny supports three major groups; M. lutarioriparius clusters closely with tetraploid M. sacchariflorus (Type II) and the M. × giganteus maternal lineage; the results support a taxonomic distinction between diploid and tetraploid M. sacchariflorus.
- Population diversity: Transcriptomes from 79 individuals (10 populations) yielded 3,209,041 SNPs and 279,810 indels; population structure and PCA support two genetic groups, with greater within-population than between-population variation in some cases.
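The assembly statistics above (scaffold and contig N50, fraction of the estimated genome covered) follow standard definitions. As a minimal sketch, assuming the usual definition of N50 (the length L such that sequences of length at least L contain at least half of the total assembled bases), the example below computes N50 on a toy set of lengths and re-derives the reported coverage from the two figures given in the findings. The `n50` helper and the toy lengths are illustrative and are not taken from the study's pipeline.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths given")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # First length at which the cumulative sum reaches half the total
        if running >= half_total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Toy scaffold lengths (hypothetical, for illustration only).
toy_lengths = [50, 40, 30, 20, 10]
print("toy N50:", n50(toy_lengths))

# Figures reported in the findings: assembly size and flow-cytometry estimate (Mb).
assembly_mb = 2074.80
flow_cytometry_mb = 2147.0
coverage_pct = 100 * assembly_mb / flow_cytometry_mb
print(f"assembly covers {coverage_pct:.2f}% of the estimated genome")
```

On the toy input the cumulative sum reaches half of the 150 total at the second-longest sequence, so the N50 is 40; the coverage calculation reproduces the 96.64% stated above.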
Discussion
The chromosome-scale assembly provides a high-quality reference for M. lutarioriparius, enabling robust synteny-based comparisons and revealing a recent WGD and an allotetraploid origin defined by two distinct centromeric satellite types. Macrosynteny with sorghum and shared rearrangements with M. sinensis indicate a recent divergence and conserved genome architecture, including a post-WGD chromosome fusion and multiple inversions. The duplication landscape shows WGD/segmental events as the primary source of gene expansion for core processes, while tandem and proximal duplications disproportionately contribute to genes related to stress responses and cell wall biosynthesis, aligning with the species’ remarkable environmental adaptability and high lignocellulosic biomass. The pronounced expansion of R-genes (NBS-LRR), CAZymes, CesA/Csl, and metal ion transport-related families provides genomic bases for disease resistance, efficient cell wall formation, and heavy metal tolerance. Expression patterns of duplicated C4 genes suggest retained or specialized functions underpinning efficient C4 photosynthesis in cooler conditions, a hallmark of Miscanthus. Plastome phylogenetics clarifies interspecific relationships within Miscanthus and supports distinct lineages associated with ploidy differences, with M. lutarioriparius closely allied to tetraploid M. sacchariflorus and M. × giganteus maternal ancestry.
Conclusion
This study delivers a reference-quality, chromosome-scale genome assembly for Miscanthus lutarioriparius and comprehensive analyses of repeats, centromeres, synteny, WGD history, gene duplication modes, gene family evolution, and trait-associated gene families. The findings identify genomic foundations for stress tolerance, disease resistance, high lignocellulosic biomass, metal tolerance, and efficient C4 photosynthesis, thereby providing critical resources for comparative genomics, functional studies, and genome-assisted breeding in Miscanthus and related grasses. Future work should target pan-genomic sampling across Miscanthus species and ploidy levels, deeper functional validation of expanded/tandemly duplicated gene families (R-genes, CesA/Csl, CAZymes, metal transporters), and integrative multi-omics to link genotype to key bioenergy traits under diverse environments.
Limitations
- Some uncertainty remains in scaffold orientation and local inversions: comparisons between LACHESIS and 3d-dna revealed small inversions at chromosome ends (Chr09, Chr13, Chr15) and relatively lower sequence identity for Chr15 and Chr19 between methods, suggesting regions that could benefit from additional validation.
- Heterozygosity and polyploid complexity required collapsing redundant sequences during scaffolding, which may obscure allelic variation and subgenome-specific features.
- Population genetic inferences used transcriptome-derived variants rather than whole-genome resequencing, potentially biasing estimates toward expressed regions.
- Chloroplast-based phylogenies capture maternal lineages; resolving complex reticulate evolution and ploidy variation in Miscanthus will require more accessions, geographic context, nuclear genomes, and karyotype data.
- Telomeric sequences were fully assembled only for Chr10; other telomeres may remain incomplete.