Subtelomeric assembly of a multi-gene pathway for antimicrobial defense compounds in cereals
Y. Li, A. Leveau, et al.
This summary covers research on the avenacin biosynthetic gene cluster in oat, including its origin and structure. The study relates adaptive evolution and genome plasticity to the emergence of this defense compound pathway and describes how the cluster's organization may contribute to plant resilience.
Introduction
Oat (Avena spp.) belongs to the Aveneae tribe, which diverged from the Triticeae (containing wheat and barley) around 30 million years ago, and from the Panicoideae (maize and sorghum) around 50–60 million years ago. A distinctive feature of oat is the ability to produce antifungal specialised metabolites (avenacins) that are synthesised in the roots and provide protection against soil-borne diseases such as take-all, a major cause of yield loss in wheat. Previous mutagenesis in diploid oat (Avena strigosa S75) yielded ~100 avenacin-deficient mutants, and genetic analysis indicated that the loci defined by mutation were clustered. Ten avenacin pathway genes were cloned and characterized, five of which were contiguous on a ~300 kb BAC contig, with the remaining five genetically linked to this contig, though their precise physical relationship and the full extent of clustering were unknown. Understanding the organisation and evolution of the avenacin cluster would provide insights into the origins of metabolic diversity in grasses and inform opportunities to engineer other cereals for enhanced disease resistance.
In addition to the avenacin cluster, biosynthetic gene clusters for a wide variety of natural products have been reported across diverse plant species. How clusters of non-homologous yet functionally related genes arise—presumably in response to selective pressures—remains an open question. Deciphering the mechanisms of cluster formation and the significance of clustering is key to understanding how genome organisation influences the evolution of complex adaptive traits in eukaryotes.
Here, a genomics-driven approach is used to investigate the nature and origin of the avenacin cluster in diploid oat. It reveals that a 12-gene cluster has formed de novo in a subtelomeric region of chromosome 1 that lacks homology with other grasses. The cluster shows approximate colinearity between gene order and biosynthetic steps, with early pathway genes located nearest the telomere. Because mutations in late pathway steps lead to accumulation of toxic intermediates, such organisation may guard against ‘self-poisoning’ resulting from telomeric deletion events.
Literature Review
Plant biosynthetic gene clusters (BGCs) for specialized metabolites, including compounds of agronomic and pharmaceutical importance, are now recognized in diverse plant species. Prior work identified and genetically mapped multiple avenacin pathway genes in oat, with evidence of clustering on a BAC contig and genetic linkage of additional pathway genes. More broadly, the existence of plant BGCs raises questions about their origin, which appears to be distinct from horizontal gene transfer, and suggests recruitment and neofunctionalisation of genes from elsewhere in the genome under selective pressures. Subtelomeric regions in eukaryotic genomes have been proposed to facilitate gene recombination and transposon insertions and to serve as hotbeds for new gene origination, potentially fostering cluster assembly. Chromatin-level regulation is implicated in plant BGC expression, and subtelomeric positioning may influence gene expression gradients, as demonstrated in other organisms (e.g., Candida glabrata).
Methodology
Plant material and growth: Avena strigosa accession S75 plants were grown under controlled conditions and outdoors prior to tissue collection for DNA and RNA extraction.
Genome sequencing and assembly: High-molecular-weight genomic DNA was extracted from leaves. Oxford Nanopore PromethION long reads (428 Gb passed reads; read length N50 ~33.5 kb) were generated and corrected with Canu (v1.6). De novo assembly used SMARTdenovo (wtdbg2 was also tested), yielding a 3.50 Gb contig assembly (contig N50 4.77 Mb) after three rounds of Illumina-based polishing with Pilon. Bionano Genomics Direct Label and Stain (DLS) optical maps (331.5 Gb single-molecule data; map N50 39.46 Mb) were used for hybrid scaffolding, producing 289 scaffolds totaling 3.53 Gb (scaffold N50 73.36 Mb). Hi-C libraries (DpnII digestion) were prepared and sequenced on Illumina, and chromosome-length scaffolds were produced with the Juicer and 3D-DNA pipelines; the seven largest scaffolds represented the seven haploid chromosomes.
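The N50 values quoted above (read, contig, map, and scaffold N50) all follow the same definition: the length L such that sequences of length L or longer cover at least half of the total bases. A minimal Python sketch of that calculation follows; the function name `n50` and the toy lengths are illustrative assumptions, not data or code from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical toy lengths (in Mb); these are NOT values from the assembly.
toy_contigs = [10, 8, 6, 4, 2]
print(n50(toy_contigs))  # prints 8: 10 + 8 = 18 covers at least half of 30
```

Because N50 depends only on the length distribution, a higher value indicates a more contiguous assembly but says nothing about its correctness, which is why the study also reports BUSCO and LAI.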
Genome annotation and quality assessment: RNA-seq from six tissues (root, root tips, leaf, panicle, shoot, spikelet) supported EvidenceModeler-based annotation integrating ab initio, homology, and transcript evidence, resulting in 39,885 high-confidence and 36,816 low-confidence gene models. BUSCO (embryophyta_odb10) indicated 95.5% completeness. Repeats were annotated with RepeatModeler/RepeatMasker; LTR_retriever produced an LAI of 11.51; total repetitive content was 81.1%. Synteny with wheat progenitor genomes was assessed by MCscan.
Pathway gene identification and functional validation: The avenacin cluster region was located on subtelomeric chromosome 1 using the assembly and validated by BAC sequence mapping and DNA FISH. Candidate cytochrome P450s (CYP94D65, CYP72A476) were identified within the cluster based on co-expression and position. Transient expression in Nicotiana benthamiana via Agrobacterium infiltration used Golden Gate-assembled multigene constructs, including CPMV-HT UTRs and a p19 suppressor. Co-expression assays tested enzyme functions and pathway completion. Metabolites were analyzed by HPLC with fluorescence detection and LC-MS (dual ESI/APCI) and by NMR for purified intermediates. 23-hydroxy-β-amyrin was purified by solvent extraction, ion-exchange cleanup, flash chromatography, and recrystallization; structure was confirmed by 1H/13C NMR and 2D experiments.
Cytogenetics: Karyotyping and DNA fluorescence in situ hybridisation (FISH) were performed on mitotic metaphase and meiotic pachytene chromosomes using probes for Sad1 (bAS1), Sad3 (TG1), telomeres, and rDNA to localize cluster genes relative to telomeres. Chromosome 1 was flow-sorted and sequenced to validate assembly placement.
Comparative and cluster analyses: Collinearity and synteny analyses compared A. strigosa with Brachypodium, rice, barley, and wheat genomes; synteny breakpoints were mapped at the cluster borders. Divergence times were estimated from Ks of single-copy orthologs. plantiSMASH identified biosynthetic gene clusters genome-wide; cluster density was computed in 100 Mb sliding windows and co-expression assessed by Pearson correlation. Avenacin cluster complexity and gene family composition were compared across grasses. Related cluster regions in A. atlantica, A. eriantha, and hexaploid A. sativa were examined for gene content, expression patterns, and evolutionary dynamics.
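Two of the calculations mentioned above can be sketched briefly in Python. The first converts a synonymous substitution rate (Ks) into a divergence time using the standard relation T = Ks / (2r). The second counts predicted clusters in sliding windows along a chromosome. The substitution rate (6.5e-9 per synonymous site per year, a commonly used grass estimate), the window step, the function names, and the toy positions are all assumptions for illustration; the summary does not state the values used in the study.

```python
def divergence_time_mya(ks, rate_per_site_per_year=6.5e-9):
    """Convert Ks to divergence time in million years: T = Ks / (2 * rate).

    The default rate is an assumed grass-wide estimate, not a value
    taken from the study.
    """
    return ks / (2 * rate_per_site_per_year) / 1e6


def sliding_window_counts(positions, chrom_length, window=100, step=10):
    """Count cluster positions (in Mb) in each sliding window.

    Returns (start, end, count) tuples. A final window is added if needed
    so that the chromosome end is always covered.
    """
    last_start = max(chrom_length - window, 0)
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)
    return [
        (s, min(s + window, chrom_length),
         sum(s <= p < s + window for p in positions))
        for s in starts
    ]


# Hypothetical Ks value: about 30 Mya under the assumed rate.
print(round(divergence_time_mya(0.39), 1))

# Hypothetical cluster positions (Mb) on a hypothetical 500 Mb chromosome.
toy_positions = [5, 120, 410, 455, 470, 490]
for start, end, count in sliding_window_counts(toy_positions, 500, step=100):
    print(f"{start}-{end} Mb: {count} clusters")
```

In this toy input the terminal window holds most of the clusters, which is the kind of pattern a window-based density scan is meant to expose. Comparing densities across genomes would additionally require normalizing by gene number or genome size.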
Metabolite profiling across Avena species: Root and leaf extracts from multiple Avena accessions were analyzed by TLC, LC-MS, and high-resolution LC-MS to determine tissue specificity of avenacin production.
Key Findings
- A 12-gene avenacin biosynthetic gene cluster was resolved and mapped to a subtelomeric region at the end of the long arm of chromosome 1 in Avena strigosa.
- Two previously uncharacterized CYP enzymes within the cluster complete the pathway: CYP94D65 catalyzes C-23 hydroxylation of β-amyrin (validated by LC-MS and NMR of 23-hydroxy-β-amyrin), and CYP72A476 introduces the C-30 aldehyde, a step dependent on prior C-3 glycosylation.
- The entire avenacin A-1 pathway was reconstituted in Nicotiana benthamiana via transient expression, yielding a product co-eluting and mass-matching an avenacin A-1 standard with strong UV autofluorescence.
- Gene order in the cluster is approximately colinear with biosynthetic steps: genes for early enzymes (e.g., bAS1/Sad1, CYP51H10/Sad2) lie closer to the telomere, and late glycosylation genes (UGT91G16, TG1/Sad3) lie farther from it.
- Cytogenetic mapping (DNA FISH) confirmed co-localization of Sad1 and Sad3 near the telomere of chromosome 1; pachytene spreads showed close proximity with partial overlap to telomeric signals in most cells.
- Comparative genomics shows a breakdown of synteny with Brachypodium, rice, barley, and wheat precisely at the cluster boundary, indicating de novo cluster formation in oat since divergence (estimated Ks-based divergence: ~28–33 mya with Triticeae/Brachypodium; ~53–65 mya with rice/sorghum/maize).
- plantiSMASH predicted 83 biosynthetic gene clusters genome-wide; the terminal 100 Mb of chromosome 1 is a hotspot containing 19 predicted clusters (17 with ≥3 co-expressed genes), with the highest normalized cluster density observed across analyzed grass genomes.
- The avenacin cluster exhibits high complexity relative to other triterpene clusters in grasses, with greater gene number and gene family diversity.
- Related cluster regions occur in A. atlantica (syntenic, root-expressed) and A. eriantha (expanded region on chromosome 6 with broader expression including aerial tissues); C-genome oats (e.g., A. eriantha) produce avenacins in leaves, unlike A-genome oats.
- Genome assembly and annotation metrics: assembled size 3.53 Gb; scaffold N50 73.36 Mb; contig N50 4.77 Mb; 39,885 high-confidence genes (87.6% functionally annotated); BUSCO completeness 95.5%; LAI 11.51; repeat content 81.1%.
Discussion
The study delineates the complete avenacin A-1 biosynthetic pathway and demonstrates that the responsible genes are organized in a subtelomeric cluster that arose de novo in oat. The approximate colinearity between gene order and biosynthetic steps suggests potential evolutionary and regulatory advantages, including coordinated expression and mitigation of phytotoxic intermediate accumulation: if telomeric deletions occur, early pathway genes would be lost before the late glycosylation genes (UGT91G16, TG1/Sad3), so the pathway would be blocked at an early step rather than stalling at a toxic intermediate, reducing the risk of ‘self-poisoning’. The subtelomeric environment, which is prone to recombination and transposon activity, likely facilitated recruitment and neofunctionalisation of genes into a cluster. Chromatin dynamics may further contribute to pathway regulation, consistent with reported chromatin involvement in other plant metabolic clusters. The discovery that C-genome oats accumulate avenacins in leaves expands the ecological role of these saponins and suggests lineage-specific rewiring of pathway expression. The cluster hotspot at the chromosome 1 terminus reflects a broader genomic context that favors assembly of complex metabolic loci in oat. Collectively, these findings link genome architecture to adaptive metabolic evolution and provide a blueprint for engineering disease resistance traits into other cereals.
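The ‘self-poisoning’ argument can be illustrated with a toy model in Python. It is a sketch of the reasoning only, under loudly stated assumptions: the five step names are placeholders rather than the actual avenacin genes, the choice of which stalled steps are toxic is invented for illustration, and deletions are assumed to remove genes strictly in order from the telomere inward.

```python
# Hypothetical pathway steps in biosynthetic order; the names are
# placeholders, not the actual avenacin genes.
PATHWAY = ["step1", "step2", "step3", "step4", "step5"]

# Assumption for illustration: the pathway is toxic if it stalls after
# completing one of these late steps without reaching the final product.
TOXIC_IF_STALLED_AFTER = {"step3", "step4"}


def outcome(genes_present):
    """Walk the pathway until the first missing gene and report the result."""
    last_done = None
    for step in PATHWAY:
        if step not in genes_present:
            break
        last_done = step
    else:
        return "final product"
    if last_done is None:
        return "no pathway products"
    if last_done in TOXIC_IF_STALLED_AFTER:
        return "toxic intermediate accumulates"
    return "benign intermediate accumulates"


def delete_from_telomere(gene_order, n_lost):
    """Remove the n_lost genes closest to the telomere (front of the list)."""
    return set(gene_order[n_lost:])


colinear = PATHWAY              # early genes nearest the telomere
reversed_order = PATHWAY[::-1]  # late genes nearest the telomere

for n in range(1, 5):
    print(n,
          outcome(delete_from_telomere(colinear, n)),
          "|",
          outcome(delete_from_telomere(reversed_order, n)))
```

With the colinear order, every deletion removes the first step, so nothing enters the pathway. With the reversed order, small deletions remove only late steps and leave the pathway stalled at a toxic intermediate. The study proposes this mitigation as a hypothesis; the model does not test it.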
Conclusion
This work provides a chromosome-scale Avena strigosa genome assembly and uses it to resolve a 12-gene subtelomeric avenacin biosynthetic cluster. The two missing enzymes were identified (CYP94D65 and CYP72A476), and the full pathway was functionally reconstituted in Nicotiana benthamiana to produce avenacin A-1. Comparative analyses show that the cluster formed de novo in a region lacking synteny with other grasses, with gene order broadly mirroring pathway sequence, potentially minimizing deleterious accumulation of toxic intermediates after telomeric deletions. The subtelomeric terminal region is a biosynthetic cluster hotspot in oat. These insights advance understanding of plant BGC evolution and open avenues to engineer avenacin-mediated disease resistance into crops such as wheat.
Future directions include elucidating the molecular mechanisms and chromatin dynamics underlying cluster assembly and regulation, testing the proposed ‘self-poisoning’ mitigation model experimentally, leveraging the assembled pathway for metabolic engineering in cereals, and exploring the ecological and evolutionary drivers of tissue-specific pathway expression across Avena lineages.
Limitations
Despite a high-quality assembly (LAI 11.51, BUSCO 95.5%), gaps between the three scaffolds spanning the subtelomeric cluster region could not be bridged by optical mapping, likely due to repetitive elements at scaffold ends. FISH mapping on pachytene chromosomes could not unambiguously resolve the precise order of Sad1 and Sad3 relative to the telomere due to close proximity and signal overlap. The inference of de novo cluster formation is based on absence of synteny and sequence divergence patterns rather than direct historical reconstruction. Mechanistic details of cluster assembly, regulatory chromatin states, and the proposed mitigation of ‘self-poisoning’ via gene order remain to be experimentally validated. Findings from A. strigosa and related Avena species may not fully generalize across all grasses.