Agriculture
Long-read sequencing reveals genomic structural variations that underlie creation of quality protein maize
C. Li, X. Xiang, et al.
Maize endosperm storage proteins (zeins) are deficient in essential amino acids, making protein quality a breeding target. The opaque2 (o2) mutation reduces zein accumulation, thereby increasing non-zein proteins and doubling lysine content, but produces soft, opaque kernels that are agronomically inferior. Quality protein maize (QPM) restores vitreous (hard) endosperm to o2 backgrounds via multiple unlinked modifier loci (Mo2s), notably including duplication at the 27-kD γ-zein (γ27) locus. However, the broader genetic and molecular bases of endosperm modification remain unclear because the maize genome is highly repetitive and rich in structural variation. This study aims to generate a high-contiguity genome assembly of a QPM line (K0326Y) with long-read sequencing, systematically compare structural and genetic variation with reference inbreds (B73, Mo17), map Mo2 QTLs in a K0326Y × W64Ao2 cross, and integrate RNA-seq to identify candidate genes and mechanisms (e.g., enhanced glycolysis and unfolded protein response) underlying vitreous endosperm formation in QPM.
Prior work established that multiple unlinked Mo2 QTLs underlie endosperm modification in QPM, with a major effect associated with tandem duplication of the γ27 zein gene that stabilizes protein body formation. Earlier mapping in K0326Y identified Mo2 QTLs on chromosomes 1, 7, and 9, and microarray studies highlighted upregulation of Pfpa (PFPa subunit of pyrophosphate-dependent phosphofructokinase) and γ27 as correlates of vitreous phenotype. Long-read assemblies of other maize lines (B73, Mo17, SK) revealed extensive structural variation, presence/absence variation, and improved resolution of repetitive regions, but a complete QPM genome sequence enabling dissection of Mo2 architecture was lacking. The o2 mutation impacts protein folding and energy metabolism; previous data suggested increased activity of non-ATP-requiring glycolytic enzymes (PFPa, PPDK2) could compensate for ATP limitation in QPM endosperm.
- Plant materials: QPM line K0326Y sequenced; F2 population generated from K0326Y × W64Ao2 for QTL mapping; additional QPM lines and populations used for marker association.
- Sequencing and assembly: PacBio SMRT long reads (28.35 million; N50 read length 16.6 kb; ~139× coverage) assembled with Falcon; polishing with 132.5 Gb PacBio consensus and 217.5 Gb Illumina paired-end reads (BWA-MEM, Pilon). BioNano optical maps (389.3 Gb molecules) integrated using BioNano Solve HybridScaffold; gaps filled with PBJelly. Scaffolds oriented to chromosomes via synteny with B73 (NUCmer) to generate pseudomolecules.
- Assembly evaluation: Mapping of ~4.4 million GBS tags to assess order/orientation; Illumina read mapping (~100×) for accuracy; BUSCO (Embryophyta) for completeness.
- Repeat and centromere annotation: RepeatModeler and RepeatMasker with Repbase; LTRharvest/LTRdigest for intact LTRs; HelitronScanner for Helitrons; additional pipelines for SINEs, LINEs, TIRs, MITEs; centromeric CRM and CentC defined via BLASTN and TRF.
- Gene annotation: PacBio Iso-Seq across nine tissues produced 1,618,691 HQ-FLNC reads collapsed into 247,616 non-redundant transcripts; integrated with RNA-seq assemblies (Trinity/TGICL) and protein homology in MAKER-P using Augustus/FGENESH; functional assignment via Swiss-Prot, TrEMBL, InterPro, nr, KEGG, KOG, GO.
- Comparative genomics: Whole-genome alignments K0326Y vs B73/Mo17 (MUMmer/NUCmer) to define synteny, inversions, SNPs/InDels (show-snp), structural variants from long-read mapping (NGMLR/Sniffles), PAVs by window alignment (BWA-MEM) and CDS mapping (GMAP). Gene duplication classes computed with BLASTP/MCScanX.
- QTL mapping (BSA-seq): Extreme vitreous and opaque F2 bulks (165 vitreous, 160 opaque) with fixed qγ27 allele selected; Illumina sequencing (2×150 bp, ~80× depth per bulk); reads mapped by BWA-MEM; variants called with GATK; SNP-index and G' statistic computed in 4-Mb windows to identify QTLs.
- Transcriptomics: RNA-seq on 16 DAP endosperm from K0326Y, W64Ao2, CM105Mo2, CM105o2 (Illumina HiSeq, 125-bp PE); trimming with Trimmomatic; alignment with HISAT2; counts via HTSeq; DE genes via DESeq2 (adjusted P ≤ 0.05; fold change > 1.5). Validation of Pfpa expression by qRT-PCR.
- Marker association: PCR to detect Helitron insertion in Pfpa promoter in QPM inbreds, GWAS populations, and F2 individuals; segregation analysis to associate with vitreous phenotype.
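The BSA-seq step above relies on per-SNP allele frequencies in the two bulks. The following is a minimal Python sketch of that idea, not the authors' pipeline: it computes a SNP-index per bulk, the difference between bulks (ΔSNP-index), and a per-SNP G statistic, then averages them in fixed 4-Mb bins. The function names, the input tuple layout, and the use of non-overlapping bins are assumptions made for illustration; the published G' statistic is a kernel-smoothed G, for which a plain window mean is substituted here.

```python
import math
from collections import defaultdict

WINDOW_BP = 4_000_000  # 4-Mb windows, as stated in the methods


def snp_index(alt_reads, total_reads):
    """Fraction of reads at a SNP that carry the alternate allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def g_statistic(ref_a, alt_a, ref_b, alt_b):
    """G statistic for the 2x2 table of allele counts (bulk A vs bulk B)."""
    observed = [ref_a, alt_a, ref_b, alt_b]
    total = sum(observed)
    row_a, row_b = ref_a + alt_a, ref_b + alt_b
    col_ref, col_alt = ref_a + ref_b, alt_a + alt_b
    expected = [
        row_a * col_ref / total,
        row_a * col_alt / total,
        row_b * col_ref / total,
        row_b * col_alt / total,
    ]
    # Cells with zero observed reads contribute nothing to the sum.
    return 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


def window_means(snps):
    """Average delta SNP-index and G per (chromosome, window).

    snps: iterable of (chrom, pos, ref_vit, alt_vit, ref_opq, alt_opq),
    a hypothetical layout of read counts in the vitreous and opaque bulks.
    Returns {(chrom, window_number): (mean_delta_snp_index, mean_G)}.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for chrom, pos, ref_v, alt_v, ref_o, alt_o in snps:
        delta = snp_index(alt_v, ref_v + alt_v) - snp_index(alt_o, ref_o + alt_o)
        g = g_statistic(ref_v, alt_v, ref_o, alt_o)
        acc = sums[(chrom, pos // WINDOW_BP)]
        acc[0] += delta
        acc[1] += g
        acc[2] += 1
    return {key: (d / n, g / n) for key, (d, g, n) in sums.items()}
```

Windows in which the vitreous bulk is enriched for one parental allele show a ΔSNP-index far from zero and a large mean G; windows unlinked to the trait stay near zero for both.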
- High-contiguity QPM genome: K0326Y assembly size 2,161 Mb with 438 gaps; contig N50 7.77 Mb and scaffold N50 22.78–27.98 Mb (hybrid stage), exceeding B73 (contig N50 1.25 Mb) and Mo17 (1.47 Mb). 97.74% of assembled sequence anchored to chromosomes; BUSCO completeness 95.8% (1380/1440 genes).
- Repeats and genes: 83.32% of the genome is repetitive (retrotransposons 77.38%; DNA transposons 4.72%); 136,191 intact LTRs identified. Total annotated genes: 38,238 with 60,475 transcripts; 69% supported by full-length cDNAs (CDS coverage >50%).
- Structural variation: Two large inversions unique to K0326Y: 8.5-Mb pericentric inversion on chromosome 1 and 5.8-Mb paracentric inversion near the centromere of chromosome 4. O2 in K0326Y carries a 4,958-bp rbg transposon insertion 249 bp upstream of ATG.
- Polymorphisms and PAVs: Versus B73: 10,205,511 SNPs and 1,397,901 InDels (<100 bp); vs Mo17: 9,655,364 SNPs and 1,458,329 InDels. Insertions (>100 bp)/deletions affecting thousands of genes (e.g., vs B73: 19,778 insertions affecting 6,538 genes; 39,931 deletions affecting 10,463 genes). K0326Y-specific PAVs: 39,479 segments (154.7 Mb) absent in B73 and 37,906 segments (149.5 Mb) absent in Mo17, impacting 3,568 genes; 631 PAV genes upregulated in QPM endosperm enriched in starch metabolism, ATPase activity, auxin biosynthesis, sulfur transport.
- Mo2 QTLs: BSA-seq identified QTLs on chromosomes 1, 7, and 9, consistent with prior studies; an additional sharp QTL on chromosome 6 coincides with a K0326Y-specific insertion but may be a false positive due to limited F2 recombination.
- Differential expression: 1,791 DEGs overlapped between two QPM vs o2 comparisons (926 up, 865 down). Upregulated genes enriched in chaperone binding, unfolded protein binding, protein folding, and heat/temperature response. Forty-three HSPs and HSP transcription factors significantly upregulated in QPM.
- Candidate genes within QTLs and with structural changes:
  - o10 (chromosome 1): an 85-bp promoter deletion in K0326Y with elevated expression in QPMs; o10 regulates zein deposition and PB organization.
  - qγ27 duplication: Expanded contiguous 28-kb sequence covering the ~15-kb tandem duplication including γ27 and ARID4. The first ARID4 copy has a 1,923-bp 3' deletion (missing four exons); a diagnostic PCR marker spanning this deletion can assist selection. Comparative analysis suggests single-copy γ27 alleles may derive from rearrangement of a duplicated ancestor allele.
  - Pfpa (chromosome 9): K0326Y carries a 983-bp Helitron insertion in the promoter and a 2,485-bp intronic insertion; B73/Mo17 carry distinct large insertions (e.g., 6,181-bp CACTA and LTR retrotransposons of 10,685 bp and 6,037 bp in introns) absent from K0326Y. Pfpa transcript abundance (whole gene and per exon) is higher in QPM lines (K0326Y, CM105Mo2) than in non-QPM (W64Ao2, CM105o2, CM105+); differences are statistically significant (e.g., p-values ≤ 0.003 across comparisons). The Helitron promoter insertion is present in 65% of QPM lines and in 95% of vitreous F2 kernels, strongly associating with the vitreous phenotype.
  - SR45a (chromosome 9): 399-bp hAT DNA transposon in the 5th intron; expression increased 2–28× in QPM; insertion allele shows ~61% linkage with vitreous trait in F2s.
  - ERDJ3A: A 26,022-bp retrotransposon downstream; expression elevated 5–8× in QPM endosperm.
- Mechanistic model: Data support that increased non-ATP-requiring glycolytic capacity (e.g., PFPa, ENO) and enhanced unfolded protein response (HSPs, ERDJ3A) mitigate ATP limitation and protein folding stress in o2 endosperm, facilitating protein body formation, starch biosynthesis coordination, and vitreous endosperm development.
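The contiguity figures above (contig N50, scaffold N50) and the read-length N50 in the methods all use the same definition. As a brief illustration, here is a minimal Python sketch of how N50 is computed from a list of sequence lengths; the function name and the toy lengths are hypothetical and are not taken from the K0326Y assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Toy contig lengths in Mb, invented for illustration only.
    toy_contigs = [2, 3, 4, 5, 6]
    print("N50 of toy contigs:", n50(toy_contigs))
```

Because the largest sequences are counted first, a higher N50 means that fewer, longer contigs or scaffolds account for half of the assembly, which is why the metric is used to compare K0326Y with B73 and Mo17.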
By generating a highly contiguous long-read assembly of a QPM inbred and integrating comparative genomics, QTL mapping, and transcriptomics, the study pinpoints genomic structural variants and expression changes associated with endosperm modification. The QTLs on chromosomes 1, 7, and 9 align with prior genetic data, strengthening confidence in these regions. Candidate genes within these intervals show compelling structural features—promoter and intronic transposon insertions, tandem duplications, and promoter deletions—linked to altered expression and the vitreous phenotype. In particular, the Helitron insertion in the Pfpa promoter is strongly associated with vitreous kernels, consistent with a role for increased non-ATP-dependent glycolysis in alleviating energy deficits in o2 endosperm. Elevated HSP and co-chaperone expression suggests a strengthened unfolded protein response aiding protein folding and ER function, which, together with stabilized protein body formation (via γ27 duplication) and potentially improved zein deposition (o10 upregulation), contributes to vitreous endosperm. The discovery of large K0326Y-specific inversions and extensive PAVs underscores the importance of structural variation in maize phenotypic diversity and provides markers and targets for breeding. Overall, the findings address the central question of how Mo2s act at the molecular level and offer a framework for marker-assisted selection and functional validation in QPM breeding.
This work delivers a high-quality, long-read-based genome assembly of the QPM line K0326Y and reveals extensive structural variation relative to B73 and Mo17. Through BSA-QTL mapping and RNA-seq, it identifies candidate Mo2 genes and mechanisms underlying vitreous endosperm formation, including γ27 tandem duplication, a Pfpa promoter Helitron associated with elevated expression and vitreous kernels, and stress/UPR components (HSPs, ERDJ3A). The integrated model proposes that enhanced non-ATP-requiring glycolysis and unfolded protein responses compensate for o2-associated energy and protein folding deficits, supporting coordinated protein body and starch development. Future work should include fine-mapping with high-resolution populations (e.g., RILs), functional validation of candidate variants (CRISPR, transgenics), dissection of the chromosome 6 QTL signal, and broader surveys of structural variants across diverse QPM germplasm to inform breeding.
- The chromosome 6 QTL peak may represent a false positive due to limited recombination in the F2 population; higher-resolution mapping is needed.
- Association of structural variants (e.g., Pfpa Helitron, SR45a hAT, ERDJ3A downstream retrotransposon) with vitreous phenotype is correlative; causal effects require functional validation.
- Many DEGs outside QTL intervals may be downstream effects; their direct roles in Mo2-mediated modification are unresolved.
- Although assembly contiguity is high, repetitive regions and complex SVs can still harbor unresolved sequence or phasing ambiguities.
- Generalizability across all QPM backgrounds remains to be tested, as structural variation is extensive among maize inbreds.