Transcriptome and metabolome analyses reveal regulatory networks associated with nutrition synthesis in sorghum seeds

A. Khan, R. Tian, et al.

This research paper explores the intricate molecular mechanisms behind sorghum seed development in the inbred line ‘BTx623’. Researchers, including Adil Khan and Ran Tian, unveil crucial developmental stages and dynamic gene expression pathways, establishing a foundation for improving sorghum grain quality and yield.

Introduction
Sorghum is a versatile, climate-smart cereal essential to global food and feed security. Meeting rising demand under climate change requires improving both grain quantity and quality, which depends on understanding molecular, biochemical, and physiological mechanisms of seed development. Sorghum seeds comprise genetically distinct tissues (diploid embryo, triploid endosperm, diploid maternal tissues). Development from fertilization to maturity typically spans 40–45 days, with limited growth initially (3–5 dpa), followed by endoreduplication and starch accumulation that in sorghum begins around 5 dpa (earlier than maize). Storage protein (kafirin) and starch biosynthesis pathways, controlled by enzymes such as AGPase, SSs, SBEs, and DBEs, determine grain quality. Despite sorghum’s importance, there has been limited tissue- and stage-resolved transcriptomic and metabolomic characterization, particularly distinguishing embryo and endosperm and integrating metabolite profiles. The study addresses this gap by dissecting early whole seed, embryo, and endosperm across development and integrating transcriptome and metabolome data to identify regulatory networks governing carbon allocation to starch and protein.
Literature Review
Prior work in multiple species (Arabidopsis, rice, wheat, maize, barley, oat, soybean, Brassica, Medicago, Paeonia) has used transcriptomics to elucidate seed spatiotemporal expression and regulation, identifying genes/TFs for starch, oil, and protein accumulation and integrating metabolomics to reveal biosynthetic pathways (e.g., amylose/amylopectin in maize, oil/protein hubs in soybean, anthocyanin modification in rice). In sorghum, studies identified kafirin gene families and protein properties, and profiled developing seeds without tissue resolution. However, comprehensive tissue-specific transcriptome and metabolome networks governing starch-protein tradeoffs throughout sorghum seed development have been lacking. This study builds on those foundations by integrating tissue-resolved transcriptomes and metabolomes in sorghum and mapping co-expression networks and hub genes for starch and kafirin biosynthesis.
Methodology
Plant material and field experiments: Sorghum bicolor cultivar BTx623 was grown under field conditions (Quaker Research Farm, Lubbock, TX; semi-arid climate; irrigated 1 inch/week) in summer 2022. Panicles were bagged to prevent cross-pollination; post-pollination mesh bags prevented bird damage. Seeds were sampled daily from pollination to 30 dpa; for molecular analyses, tissues were dissected on ice. For RNA-seq, samples covered early whole seed (1–9 dpa), endosperm (6–25 dpa), and embryo (10–25 dpa). Sampling was done in the morning to minimize circadian effects.
Kafirin analysis: Seeds at 5, 10, 15, 20, and 25 dpa were processed. Kafirin 1 (non-reducing) and kafirin 2 (reducing) fractions were extracted following Da Silva et al. with modifications (50 mg tissue, 0.5 mL solvent), followed by reduction with β-mercaptoethanol (BME) and alkylation with 4-vinylpyridine (4-VP) as per Bean et al. Both fractions were analyzed by RP-HPLC (C3 columns) to quantify kafirin fractions and total protein.
Metabolomics: Untargeted LC-MS profiling was performed on a Waters 2777c UPLC coupled to a Thermo Q Exactive HF. Metabolites were extracted from 50 mg tissue in methanol:water (7:3) with an internal standard, followed by homogenization, cold sonication, precipitation at −20 °C, centrifugation, and filtration (0.22 µm). QC pools were prepared. Peaks were detected and annotated with Compound Discoverer v3.3 using the bmdb, mzCloud, and ChemSpider databases; KEGG and HMDB were used for pathway annotation. Statistical analysis was done with MetaboAnalyst (PCA/PLS-DA; univariate t-tests), with DEM criteria of VIP > 1, P < 0.05, and |log2FC| ≥ 1.5. Five developmental stages were profiled (5, 10, 15, 20, 25 dpa), with five biological replicates.
RNA-seq: Total RNA was isolated (RiboPure kit); QC by gel and Nanodrop; RIN ≥ 7. Libraries were prepared using DNBSEQ eukaryotic transcriptome protocols, with PE150 sequencing by Innomics. Raw reads were filtered with SOAPnuke and checked with FastQC. Reads were aligned to sorghum reference genome v3.3.1 with STAR, and expression was quantified as FPKM using StringTie.
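The DEM criteria stated in the metabolomics description above (VIP > 1, P < 0.05, |log2FC| ≥ 1.5) can be illustrated with a minimal Python sketch. This is not the authors' MetaboAnalyst workflow. The `MetaboliteStat` record, the function names, and the assumption that fold change is a ratio of group means (later stage over reference stage) are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from math import log2


@dataclass
class MetaboliteStat:
    """Summary statistics for one metabolite in one stage comparison (hypothetical record)."""
    name: str           # illustrative identifier, not a real annotation
    vip: float          # VIP score from PLS-DA
    p_value: float      # P value from the univariate t-test
    fold_change: float  # assumed: mean(later stage) / mean(reference stage)


def is_dem(m, vip_min=1.0, p_max=0.05, log2fc_min=1.5):
    """Apply the three thresholds quoted in the text; all three must hold."""
    if m.fold_change <= 0:
        return False  # log2 is undefined for non-positive ratios
    return (
        m.vip > vip_min
        and m.p_value < p_max
        and abs(log2(m.fold_change)) >= log2fc_min
    )


def classify(m):
    """Label a metabolite as 'up', 'down', or 'ns' (not significant)."""
    if not is_dem(m):
        return "ns"
    return "up" if m.fold_change > 1 else "down"


if __name__ == "__main__":
    # Made-up values, only to exercise the thresholds.
    examples = [
        MetaboliteStat("m1", vip=1.4, p_value=0.01, fold_change=4.0),
        MetaboliteStat("m2", vip=1.2, p_value=0.02, fold_change=0.25),
        MetaboliteStat("m3", vip=1.6, p_value=0.001, fold_change=2.0),
        MetaboliteStat("m4", vip=0.8, p_value=0.001, fold_change=8.0),
    ]
    for m in examples:
        print(m.name, classify(m))
```

Note that a two-fold change (log2FC = 1) does not pass the |log2FC| ≥ 1.5 cutoff, which corresponds to roughly a 2.8-fold change in either direction.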
Expressed genes were defined as those with average FPKM ≥ 1 in at least one sample and ≥2 mapped reads in each of two replicates. Two biological replicates were used per sample (45 samples total). qRT-PCR validation was performed for selected genes (PP2A reference gene) with three biological replicates.
Data analysis: PCA via prcomp; hierarchical clustering via k-means (pheatmap), with the elbow method used to select the number of clusters; expression was transformed as log2(FPKM+1) and normalized to the maximum per gene for plots. Functional enrichment was done via KEGG/ShinyGO (hypergeometric test, FDR < 0.05).
Co-expression networks: STRING database; fuzzy c-means clustering (Mfuzz v2.42) with 12 clusters and fuzzifier 2.01; genes with membership ≥ 0.5 were used for enrichment. Tissue-specific (TS) genes were identified using 42 non-seed sorghum RNA-seq datasets and a TS scoring algorithm (TS score > 0.5).
Statistical reproducibility: RNA-seq, two replicates; metabolomics, five; qPCR, three; GO analyses via Fisher’s Exact test with FDR correction.
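The expressed-gene definition and the plotting transform described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' pipeline: the function names and the tuple layout are hypothetical, and the sketch assumes that the read-count condition is checked in the same sample whose two-replicate mean FPKM reaches 1, which the summary does not spell out.

```python
from math import log2


def is_expressed(samples, min_fpkm=1.0, min_reads=2):
    """Return True if any sample meets both criteria.

    samples: list of (fpkm_rep1, fpkm_rep2, reads_rep1, reads_rep2) tuples,
    one tuple per sample (hypothetical layout, two replicates each).
    """
    for fpkm1, fpkm2, reads1, reads2 in samples:
        mean_fpkm = (fpkm1 + fpkm2) / 2
        if mean_fpkm >= min_fpkm and reads1 >= min_reads and reads2 >= min_reads:
            return True
    return False


def scaled_profile(mean_fpkm_by_sample):
    """Transform as log2(FPKM + 1), then divide by the gene's maximum value."""
    logged = [log2(v + 1) for v in mean_fpkm_by_sample]
    peak = max(logged)
    if peak == 0:
        return [0.0] * len(logged)  # gene is silent everywhere
    return [v / peak for v in logged]


if __name__ == "__main__":
    # Made-up numbers for one gene across three samples.
    gene = [(0.4, 0.6, 5, 7), (1.5, 0.7, 3, 2), (5.0, 6.0, 1, 9)]
    print(is_expressed(gene))
    print(scaled_profile([0.0, 1.0, 3.0]))
```

Scaling each gene to its own maximum puts every profile on a 0 to 1 range, so genes with very different absolute abundance can share one heatmap or line plot.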
Key Findings
- Morphology and storage onset: Starch granules were first visible at 5 dpa by cryo-SEM, increasing in number and size thereafter. Kafirin 1 and 2 appeared at low levels between 5–10 dpa and increased from 15–25 dpa. At 25 dpa, kafirin 1 accounted for ~84% of total protein and kafirin 2 for ~16%.
- Metabolome overview: Of 7,959 detected peaks, 2,073 metabolites were identified; 955 mapped to pathways across 13 categories. Top enriched categories: secondary metabolite biosynthesis (20.23%), amino acid metabolism (18.75%), lipid metabolism (13.98%), carbohydrate metabolism (13.09%). Pathway dynamics indicated starch biosynthesis initiates at 5 dpa, with a shift to protein biosynthesis/degradation after 15 dpa. Across development, 1,495 compounds showed differential accumulation. Early up-regulated metabolites (10 vs 5 dpa) involved fatty acid, linoleic acid, sugar metabolism, lysine biosynthesis; down-regulated included flavanol biosynthesis and pentose phosphate pathway. Later comparisons (20, 25 vs 5 dpa) emphasized alanine, aspartate, glutamate, flavonoid, and linoleic acid biosynthesis. 189 metabolites were consistently up-regulated and 234 consistently down-regulated across stages.
- Transcriptome landscape: 218.8 million high-quality reads (avg 23.78M per replicate) with strong replicate correlation (avg R^2 = 0.976). 21,971 genes were expressed (FPKM ≥ 1). More genes expressed in early whole seed and early endosperm than at later stages. Embryo exhibited higher median expression and average FPKM (10–25 dpa) than endosperm. Tissue-specific genes: 2,049 specific to early whole seed (1–9 dpa), 795 embryo-specific (enriched in embryogenesis pathways), and 397 endosperm-specific (enriched in metabolic and MAPK signaling). PCA separated tissues and developmental phases; embryo clusters reflected morphogenesis then maturation; endosperm clusters aligned with milk, soft dough, and hard dough phases.
- Programmed cell death and ethylene: Ethylene biosynthesis-related genes (e.g., SAM genes Sobic.003G151600, Sobic.009G033600) peaked early (6–10 dpa) then were downregulated, suggesting ethylene as a negative regulator of grain filling and PCD. Maize PCD regulator orthologs (e.g., ZmDEK40, ZmDEK664, ZmATR, ZmATM) showed similar downregulation trends.
- Storage compound gene expression: Starch biosynthesis was most active between 5–15 dpa; protein (kafirin) biosynthesis predominated 15–25 dpa. Kafirin transcripts comprised 44.77% of total endosperm transcripts (6–25 dpa), rising from 24.67% (6–15 dpa) to 62.16% (16–25 dpa). By type across endosperm development: α-kafirins 34.20% of transcripts, γ-kafirins 6.99%, β-kafirins 3.29%, δ-kafirins 0.277%. Expression timing: β- and δ-kafirins largely 20–25 dpa; α-kafirins 15–25 dpa.
- Network modules and hub genes: Fuzzy c-means clustering of 20,491 expressed endosperm genes (FPKM ≥ 1) yielded 12 modules. Modules 8 and 12 were enriched (FDR < 0.05) for starch biosynthesis genes; GO terms included TCA cycle, ribosome biogenesis, oxidative phosphorylation, DNA replication, starch/sucrose metabolism. 361 hub genes (degree ≥ 5; module membership > 0.8) were identified, including SDHB (Sobic.007G023400) and starch branching genes SbPHOL (Sobic.001G083900) and SbPHOH (Sobic.003G358600). Modules 4 and 10 were enriched for kafirin genes (FDR < 0.05): module 4 contained 15 α-kafirin genes; module 10 had six α- and two γ-kafirins; β- and δ-kafirin genes appeared in modules 2 and 5, respectively. From modules 4 and 10, 207 hub genes linked to kafirin biosynthesis were identified, enriched in lipid metabolism, fatty acid degradation, amino acid biosynthesis/degradation, MAPK signaling, carotenoid biosynthesis, and hormonal signaling. Notable hub categories included extra-large GTP-binding proteins and bZIP transcription factors.
- Tissue-specific (TS) genes and TFs: 499 TS genes (including 41 TFs) were identified across early whole seed, embryo, and endosperm. Counts: embryo 127 (14 TFs), endosperm 71 (6 TFs), early whole seed 79 (12 TFs). Early whole seed TS genes were linked to cell wall biosynthesis/structure; embryo TS genes predominated in later stages; endosperm TS genes were expressed throughout and related to RNA processing/regulation for storage programming. Many TS genes/TFs belonged to seed regulators (WOX, NF-YB, NAC, ERF, AP2, MYB), implicating regulatory roles in seed maturation and ABA-mediated responses.
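The hub-gene thresholds quoted in the network findings above (degree ≥ 5; module membership > 0.8) can be illustrated with a short Python sketch. It assumes that degree means the number of distinct interaction partners in an undirected network and that module membership is a per-gene score between 0 and 1. Neither detail is specified in this summary, and the gene names below are placeholders, not sorghum identifiers.

```python
from collections import Counter


def node_degrees(edges):
    """Count distinct partners per gene in an undirected edge list."""
    # Deduplicate (a, b) vs (b, a) and drop self-loops before counting.
    unique = {tuple(sorted(edge)) for edge in edges if edge[0] != edge[1]}
    degrees = Counter()
    for a, b in unique:
        degrees[a] += 1
        degrees[b] += 1
    return degrees


def hub_genes(edges, membership, min_degree=5, min_membership=0.8):
    """Return genes with degree >= min_degree and membership > min_membership."""
    degrees = node_degrees(edges)
    return sorted(
        gene
        for gene, score in membership.items()
        if degrees[gene] >= min_degree and score > min_membership
    )


if __name__ == "__main__":
    # Toy network: geneA and geneG each have five partners.
    partners = ["geneB", "geneC", "geneD", "geneE", "geneF"]
    edges = [("geneA", p) for p in partners] + [("geneG", p) for p in partners]
    edges.append(("geneB", "geneA"))  # duplicate of an existing edge, ignored
    membership = {"geneA": 0.93, "geneG": 0.80, "geneB": 0.95}
    print(hub_genes(edges, membership))
```

In this toy case only `geneA` qualifies: `geneG` has enough partners but its membership is not strictly above 0.8, and `geneB` has high membership but only two partners.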
Discussion
The study delineates stage- and tissue-specific molecular programs underlying sorghum seed development and storage reserve accumulation. Integrating metabolomics with transcriptomics confirmed that starch accumulation begins by 5 dpa, preceding a shift toward protein synthesis after ~15 dpa, consistent with SEM and kafirin quantification. Endosperm-centered co-expression networks and hub genes clarify the regulatory architecture coordinating energy metabolism (TCA cycle, oxidative phosphorylation), cell cycle/replication, and biosynthetic pathways for starch and storage proteins. The temporal increase of kafirin transcript abundance and distinct module partitioning of kafirin gene types provide mechanistic insight into protein accumulation and crosslinking phases, which influence end-use quality. Downregulation of ethylene biosynthesis genes and PCD regulators across endosperm development suggests ethylene as a negative regulator of grain filling, aligning with known roles in cereal PCD. The identification of 361 starch- and 207 protein-associated hub genes, along with 499 tissue-specific genes (41 TFs), offers candidate targets for breeding and functional studies to modulate carbon partitioning and improve grain quality in sorghum.
Conclusion
This work provides a comprehensive spatiotemporal map of transcriptome and metabolome dynamics in sorghum seed development for the reference line BTx623, defining key transitions from early cellular growth to storage reserve accumulation. It identifies regulatory modules and 568 hub genes associated with starch and kafirin biosynthesis in the endosperm, validates developmental timing of starch (≈5 dpa) and protein (≈15–25 dpa) accumulation, and catalogs tissue-specific genes and TFs likely controlling seed maturation. These resources establish a baseline for leveraging natural genetic variation and future pangenome analyses to improve grain quality traits. Future research should examine diverse sorghum genotypes and environments to connect allelic variation in hub genes to differential carbon allocation, metabolite profiles, and adaptation, and functionally validate candidate regulators for breeding climate-resilient, high-quality grain.
Limitations
Findings are based on a single sorghum inbred line (BTx623) grown in one field environment, which may limit generalizability across diverse germplasm and conditions. RNA-seq analyses used two biological replicates per timepoint, potentially reducing power for detecting subtle effects. Tissue separation became challenging at 25 dpa, constraining late-stage sampling. Functional roles of identified hub genes and TFs were inferred from co-expression and enrichment rather than experimentally validated; causal relationships remain to be tested. Metabolomics was untargeted and limited to five developmental stages, which may miss transient metabolite dynamics.