Introduction
Precise modulation of plant traits is crucial for crop improvement. While biallelic variations (SNPs, Indels, PAVs) partially explain heritability, multiallelic variants, like tandem repeats (TRs), are a significant yet often overlooked source of complex trait heritability. TRs are DNA stretches with tandemly repeated nucleotide sequences, classified as STRs (microsatellites) or VNTRs (minisatellites) based on repeat unit length. TR variations are abundant, highly unstable, and typically multiallelic. Studies in various species, including humans, pigs, and plants, demonstrate their role in mediating gene expression. In rice, TR variations have been identified near genes like *OsSPL13*, *IPA1*, and *FZP*, influencing grain length, stem width, panicle branching, and chilling tolerance. However, systematic genome-wide analyses of TR variations and their functional contributions are lacking due to technical limitations in accurately identifying genome-wide TR polymorphisms, particularly those absent from the reference genome. Traditional TR genotyping methods, relying on short-read mapping to a reference genome, suffer from reference bias and difficulty in accurately determining repeat numbers in repetitive regions. Long-read sequencing technologies and high-quality genome assemblies provide a unique opportunity to overcome these limitations and systematically study TR polymorphisms at a population scale.
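To make the STR/VNTR distinction and the idea of a multiallelic locus concrete, here is a minimal Python sketch. The 6 bp cutoff is an assumption (a commonly used convention, not a threshold stated in this summary), and the sequences, accession names, and helper functions are hypothetical.

```python
import re

# Assumption: repeat units of 1-6 bp are treated as STRs, longer units as VNTRs.
STR_MAX_UNIT = 6

def classify_tr(unit: str) -> str:
    """Classify a tandem repeat by the length of its repeat unit."""
    return "STR" if len(unit) <= STR_MAX_UNIT else "VNTR"

def longest_run(seq: str, unit: str) -> int:
    """Return the largest number of consecutive copies of `unit` in `seq`."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(seq)]
    return max(runs, default=0)

# Hypothetical alleles of one locus in three accessions (toy sequences).
alleles = {
    "acc1": "GGATATATATCC",
    "acc2": "GGATATATATATATCC",
    "acc3": "GGATATCC",
}
copies = {name: longest_run(seq, "AT") for name, seq in alleles.items()}

# More than two distinct copy numbers means the locus is multiallelic.
is_multiallelic = len(set(copies.values())) > 2
```

A biallelic encoding (reference versus alternate) cannot represent this locus without collapsing distinct copy numbers, which is why TRs are usually modelled by repeat length or copy number instead.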
Literature Review
Existing literature highlights the role of genetic variation in shaping gene expression and, ultimately, phenotypic traits in plants. Studies have identified and characterized biallelic variants such as SNPs, Indels, and PAVs that underlie variation in important agronomic traits. However, these studies often overlook the contribution of multiallelic variants, particularly tandem repeats (TRs). Previous research has demonstrated the involvement of TRs in gene expression regulation in various organisms. In rice, examples such as *OsSPL13*, *IPA1*, and *FZP* show how TR variations in regulatory regions modulate gene expression levels and influence grain length, plant architecture, and yield. While these studies provide valuable insights, they lack the population-scale, genome-wide perspective needed to understand the overall impact of TRs on rice genetic diversity and phenotypic variation. The limitations of short-read sequencing in identifying and genotyping TRs, especially those absent from the reference genome, have hampered comprehensive studies. This research aims to address the gap by using long-read sequencing and pan-genome analysis to identify and functionally characterize TR variations in rice.
Methodology
This study used 231 rice genome assemblies: the Nipponbare reference genome and 230 assemblies based on Oxford Nanopore Technologies (ONT) long reads. Tandem repeats (TRs) were annotated using RepeatMasker, Tandem Repeats Finder (TRF), and ULTRA. These annotations were integrated into a previously constructed rice pan-genome graph to identify polymorphic TR loci and create a pan-TR dataset. The quality of the pan-TR dataset was validated using BAC clone sequences and manual evaluation of selected TR loci through multiple sequence alignments and PCR/Sanger sequencing. The characteristics of the pan-TR dataset, including STR and VNTR distribution, repeat motif length, allele numbers, and allele frequencies, were analyzed. The distribution of TR variations relative to annotated genes and biallelic variants (SNPs, Indels, PAVs) was assessed to investigate linkage disequilibrium (LD). Transcriptomic data from 193 panicles and 202 young leaves were used to perform a genome-wide eQTL analysis, identifying associations between TR variations and the expression levels of nearby genes. Conditional analyses were conducted to determine the independent contribution of TR variations to gene expression, controlling for the effects of nearby biallelic variants. Genome-wide association studies (GWAS) were performed to investigate the association of TR variations with agronomic traits such as plant height and grain width. Bayesian fine-mapping using susieR was employed to pinpoint causal variants, and colocalization analysis using the 'coloc' package assessed the shared causality between TR variations and phenotypes. CRISPR-Cas9 and CRISPR-Cas12a systems were used to generate knock-out and TR copy number editing lines for functional validation of selected eTRs. Real-time PCR and dual-luciferase reporter assays provided further experimental validation of gene expression changes.
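The idea behind the conditional analysis can be sketched in a few lines of Python. This is not the study's pipeline, which the summary does not specify; it is a minimal illustration that residualizes both expression and TR copy number on a lead biallelic variant and then regresses one residual on the other (by the Frisch-Waugh-Lovell theorem, the slope equals the TR coefficient in a joint linear model). All data values and function names are hypothetical.

```python
def ols_fit(x, y):
    """Simple least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def residuals(x, y):
    """Part of y that is not explained by a linear effect of x."""
    intercept, slope = ols_fit(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

def conditional_tr_effect(expr, tr_copies, lead_snp):
    """Effect of TR copy number on expression after removing the SNP effect."""
    expr_res = residuals(lead_snp, expr)
    tr_res = residuals(lead_snp, tr_copies)
    _, slope = ols_fit(tr_res, expr_res)
    return slope

# Toy data: eight inbred accessions, SNP coded 0/2, TR copy number
# correlated with the SNP but also varying within each SNP genotype.
lead_snp = [0, 0, 0, 0, 2, 2, 2, 2]
tr_copies = [5, 6, 5, 7, 9, 10, 9, 11]
# Expression simulated with a TR effect of 0.5 and a SNP effect of 0.3.
expr = [1.0 + 0.5 * t + 0.3 * s for t, s in zip(tr_copies, lead_snp)]

beta_tr = conditional_tr_effect(expr, tr_copies, lead_snp)
```

In this toy example the conditional slope recovers the simulated TR effect of 0.5 even though the TR and the SNP are correlated; a real analysis would also need covariates such as population structure and a significance test.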
Key Findings
The study identified 227,391 multiallelic TR loci in 231 rice genome assemblies, of which 54,416 were absent from the Nipponbare reference genome. Only about one-third of TR variations showed strong linkage with nearby biallelic variants. Analysis of 193 panicle and 202 leaf transcriptomic datasets revealed 485 (panicle) and 511 (leaf) TRs acting as eQTLs for nearby genes independently of biallelic variants. A substantial proportion of eGenes (1,392 in panicles, 1,049 in leaves) were exclusively associated with TRs. For genes associated with both TRs and biallelic variants, conditional analyses showed that 485 panicle and 511 leaf TR-gene pairs maintained the same directional effects after controlling for the lead biallelic variants. TR variations in the *OsPRR1* promoter region were associated with plant height, and experimental validation using CRISPR-generated knockout lines confirmed this relationship. GWAS for grain width identified a significant peak associated with TR variants on Chr6, independent of biallelic variants. Colocalization analysis indicated that a TR variant in the promoter of *TRGW6* (LOC_Os06g03850) was causally linked to both gene expression and grain width. Functional validation using CRISPR-generated knockout and TR copy editing lines confirmed the effect of this TR on *TRGW6* expression and grain width.
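One simple way to quantify how well a biallelic variant "tags" a multiallelic TR is the squared Pearson correlation between TR copy number and SNP dosage. The sketch below uses that proxy as an assumption; the summary does not state which LD measure or threshold the study used, and the genotypes shown are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return (sxy * sxy) / (sxx * syy)

# Hypothetical TR that is well tagged by a nearby SNP (dosage coded 0/2).
tagged_tr = [5, 6, 5, 7, 9, 10, 9, 11]
tagging_snp = [0, 0, 0, 0, 2, 2, 2, 2]

# Hypothetical TR whose copy number varies independently of a nearby SNP.
untagged_tr = [5, 5, 6, 6, 7, 7, 8, 8]
unlinked_snp = [0, 2, 0, 2, 0, 2, 0, 2]

ld_tagged = r_squared(tagged_tr, tagging_snp)
ld_untagged = r_squared(untagged_tr, unlinked_snp)
```

A TR like the second one carries information that no nearby SNP captures, which is the situation the study reports for roughly two-thirds of TR variations.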
Discussion
This study provides compelling evidence that multiallelic TR variations contribute widely to rice gene expression and phenotypic diversity. The findings highlight the limitations of relying solely on biallelic variants to understand the genetic basis of complex traits. The identification of numerous eGenes associated exclusively with TRs underscores their distinct role in gene regulation. The contributions of TRs to gene expression persist after accounting for the effects of nearby biallelic variants, which supports their importance as independent regulators. Functional validation using CRISPR-based gene editing further strengthens the causal relationship between specific TR variations and both gene expression and phenotypic traits. These findings demonstrate the potential of targeting eTRs for precise genome editing to fine-tune key agronomic traits and optimize rice yield and quality.
Conclusion
This research provides a comprehensive analysis of tandem repeat variations in rice, highlighting their significant role in regulating gene expression and influencing agronomic traits. The use of long-read sequencing and pan-genome analysis overcame previous technical limitations, enabling the identification of numerous TRs that act as independent eQTLs and contribute to phenotypic variation. Functional validation through CRISPR-based gene editing confirmed the causal link between specific TRs and phenotypic traits. This study establishes a foundation for future research focused on utilizing TR variations for precise genome editing to optimize rice breeding strategies for improved yield and quality. Future research could explore the epigenetic mechanisms underlying TR-mediated gene expression regulation and expand the investigation to other important agronomic traits.
Limitations
While this study represents a significant advance in understanding the role of TRs in rice, certain limitations should be considered. The analysis was limited to TRs with repeat units shorter than 4 kb due to computational constraints. Furthermore, although multiple methods were used to identify TRs, extra-long VNTRs may still have been missed. While the sample size used for transcriptome sequencing is substantial, a larger sample might uncover the effects of rare TR variants. Future work could investigate a potential sigmoidal relationship between TR copy number and gene expression, which may reveal additional TRs affecting gene expression and phenotypes. Finally, while functional validation was performed for selected eTRs, further experimental evidence is needed to confirm the effects of a wider range of TR variations on rice breeding outcomes.
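To illustrate what a sigmoidal relationship between TR copy number and expression would look like, and why a purely linear test could miss it, here is a small Python sketch using a logistic curve. The functional form and every parameter value are assumptions chosen for illustration, not estimates from the study.

```python
import math

def sigmoid_expression(copies, low, high, midpoint, steepness):
    """Logistic dose-response: expression rises from `low` to `high`
    as copy number passes `midpoint`."""
    return low + (high - low) / (1.0 + math.exp(-steepness * (copies - midpoint)))

# Hypothetical parameters: expression saturates at 1.0 and 5.0,
# with the transition centred on 10 copies.
copy_numbers = list(range(4, 17, 2))  # 4, 6, ..., 16
levels = [sigmoid_expression(c, 1.0, 5.0, 10, 0.8) for c in copy_numbers]

# Change in expression between consecutive copy-number classes.
steps = [b - a for a, b in zip(levels, levels[1:])]
```

Under such a curve, adding two copies near the midpoint changes expression far more than adding two copies near either plateau, so the same TR could appear to have a strong or a negligible effect depending on which alleles are present in the sample.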