Agriculture
Comprehensive genomic resources related to domestication and crop improvement traits in Lima bean
T. Garcia, J. Duitama, et al.
This research by Tatiana Garcia, Jorge Duitama, and their colleagues presents a groundbreaking chromosome-level genome assembly for Lima bean (*Phaseolus lunatus* L.), unveiling disease resistance genes and revealing critical insights into *Phaseolus* evolution. With comparative genomics highlighting significant synteny and chromosomal rearrangements, this work also identifies wild Lima bean clusters vital for climate-resilient breeding.
~3 min • Beginner • English
Introduction
The study addresses how domestication and adaptation have shaped the genome and agronomic traits of Lima bean (Phaseolus lunatus), a key Phaseolus crop with broad ecological adaptability and tolerance to heat and drought. The purpose is to generate comprehensive genomic resources specific to Lima bean—overcoming limitations of relying solely on the common bean (P. vulgaris) reference—to better understand domestication-related convergent evolution and to support breeding for climate resilience. The context includes three major wild gene pools (MI, MII, AI), at least two domestication events (Mesoamerica and Andes), and convergent domestication traits such as larger pods and seeds, reduced pod dehiscence and seed dormancy, and changes in growth habit and antinutritional compounds. A high-quality P. lunatus genome, coupled with population genomics and functional genomics, is needed to dissect the genetic bases of these traits and to enable precise breeding and evolutionary studies.
Literature Review
Prior work established Lima bean and common bean as closely related, highly syntenic diploids (2n=22) with extensive use of the P. vulgaris reference for Lima bean genetics. Cytogenetic studies reported high macro-collinearity between species, but reliance on P. vulgaris risks reference bias for diversity analyses and trait locus prediction. Previous studies resolved wild P. lunatus into three main gene pools (MI, MII, AI) with at least two domestication events (Andean from AI leading to Big Lima types; Mesoamerican from MI leading to Potato and Sieva types). Convergent domestication traits and broad abiotic stress tolerance (notably heat and drought) highlight Lima bean’s breeding value. Existing common bean genomic resources (reference genomes and QTL studies) informed expectations for synteny and candidate loci (e.g., TFL1 for determinacy, PDH1 for pod dehiscence). However, a dedicated P. lunatus reference and species-wide population genomics had been lacking, motivating this study.
Methodology
- Genome sequencing and assembly: Genomic DNA from domesticated accession G27455 (Mesoamerican MI) was sequenced using Pacific Biosciences long reads and Illumina short reads, plus 10x Genomics linked reads. An initial Canu v1.6 PacBio assembly (496 contigs; 542 Mbp; N50 5.5 Mbp) was polished iteratively with Illumina reads using NGSEP (FindVariants/VCFIndividualGenomeBuilder). Scaffolding leveraged 10x linked reads (bwa v0.7.17; custom graph approach akin to Salsa) to combine 164 contigs into 40 scaffolds. A dense genetic map from a biparental UC 92 × UC Haskell population (GBS-derived 10,497 SNPs across 522 loci; genetic length 1064 cM) anchored and oriented 206 contigs into 11 pseudomolecules totaling 512 Mbp. Quality checks: 99% Illumina read mapping; BUSCO completeness 98.8% of 1614 conserved plant single-copy genes.
- Repeat annotation: RepeatMasker v4.0.5 with a P. vulgaris TE library identified 656,928 elements covering 225 Mbp (41%): LTRs (174 Mbp), other class I (8 Mbp), class II DNA transposons (25 Mbp), unclassified (6 Mbp), enriched in pericentromeres.
- Gene annotation and functional annotation: Combined ab initio, evidence-based (RNA-seq from leaf, flower, pod; two developmental stages; wild G25230 and domesticated G27455), and homology-based approaches via MAKER v2.31.9; StringTie/HISAT2 Tuxedo pipeline and Trinity de novo assemblies supported models. Final set: 28,326 gene models, 35,881 transcripts; 21,642 genes with GO; 19,554 with KEGG; 22,634 (80%) with functional annotations; expression evidence in ≥1 dataset for 26,295 (93%); orthology within P. vulgaris synteny blocks for 22,180 (78%).
- Comparative genomics and duplication analyses: Paralog clustering (NGSEP GenomesAligner) identified 3499 families; 1647 genes from ancient Fabaceae WGD; 7285 intrachromosomal duplications (5849 after filtering highly repeated). Calculated Ks and Ka/Ks for WGD vs local duplicates and orthologs (P. vulgaris, Vigna unguiculata) to infer ages and selection. Synteny mapping revealed structural rearrangements relative to P. vulgaris.
- QTL mapping: UC 92 (Andean, large-seeded, bush-type) × UC Haskell (Mesoamerican, small-seeded, vine-type) RILs advanced to F5 by single-seed descent; GBS with CviAI; genetic map built with ASMap/R/qtl. Phenotyping: determinacy, flowering time (two field locations), hundred-seed weight (HSW), cyanogenesis (Fiegl-Anger method). QTL mapping with R/qtl (scanone, cim, makeqtl/fitqtl) and permutation-based thresholds.
- Population genomics: GBS of 482 accessions (267 wild, 215 domesticated) from CIAT and CICY; reads mapped with bowtie2; variants called with NGSEP; stringent filters yielded 12,398 high-quality biallelic SNVs (5.3% missing). Structure inference: STRUCTURE (K=1–10, Evanno), DAPC (Adegenet), NJ (Darwin), fineSTRUCTURE with Beagle phasing. LD decay (TASSEL v5) estimated by r² vs distance. Diversity statistics (Ho, He, Fis, FST) via Adegenet/Hierfstat. Introgression analysis using NGSEP window-based haplotype similarity with ANGSD D-statistics; validated shared segments by regional NJ trees.
- RNA-seq differential expression: Pods sampled at T1 (initiation of pod elongation) and T2 (pre-seed filling maximum pod size) from one wild (shattering) and one domesticated (non-shattering) accession, three replicates per stage. Reads processed with HISAT2/StringTie; counts analyzed with DESeq2 (p<0.05, |log2FC|>1). De novo transcript quantification with Salmon for bias assessment; functional enrichment with topGO/REVIGO/Cytoscape.
- Resistance gene mining: Manual curation of domain-containing R-gene candidates (CC, NB-ARC, TIR, LRR, WRKY, LZ, kinase) using BLAST, Pfam, and KEGG/GO; physical mapping and NJ tree of LRR-type genes (MUSCLE, MEGA X, iTOL).
Key Findings
- Genome assembly and annotation: Produced 11 chromosome-level pseudomolecules totaling 512 Mbp from 206 contigs; high base accuracy (99% Illumina mapping) and BUSCO completeness (98.8%). Repeats span 41% (225 Mbp), dominated by LTRs, enriched in pericentromeres. Predicted 28,326 genes (35,881 transcripts); 80% functionally annotated; 93% with expression evidence; 78% with P. vulgaris orthologs in synteny.
- Comparative genomics and structural variation: High collinearity with P. vulgaris; identified five major rearrangements: inversion of short arm of P110; translocation of pericentromeric Pv02 into short arm of P102; 5 Mb inversion on long arm P103; 10 Mb inversion on long arm P107; complex translocation on short arm P109. Ks distribution of 22,180 orthologs centers ~0.05, dating P. lunatus–P. vulgaris speciation ~6 MYA; Phaseolus–Vigna split ~15 MYA. Ka/Ks indicates predominant purifying selection, but 656 ortholog pairs show Ka/Ks>1 (enriched in aminoglycan, chitin, lignan metabolism).
- Gene duplication dynamics: 1647 genes from ancient Fabaceae WGD; 7285 intrachromosomal duplications (more recent; lower Ks) with higher Ka/Ks than WGD paralogs; 12% of local duplicates with Ka/Ks>1, enriched in immune-related processes (cell death, signaling), lipid transport, lignan metabolism.
- Resistance gene repertoire: 1917 candidate genes with resistance-associated domains across 11 chromosomes. Abundant receptor-like kinases (828), WRKY (91), LZ (74); NB-ARC (151), LRR (631), both NB-ARC+LRR (91); 98 TNLs; relatively few CNLs (11), consistent with dicot patterns. Clusters particularly on P102, P104, P108, P110, P111; LRR genes show discrete genomic clusters and sequence clustering; many positions collinear with common bean R-loci for anthracnose, angular leaf spot, halo blight, BGGMV, rust, and mildew.
- QTL mapping (UC 92 × UC Haskell RILs):
• Determinacy: Major QTL on long arm of P101 explaining 78% of phenotypic variance; candidate gene PITFL1 (ortholog of Arabidopsis TFL1; PvTFL1 at 45 Mb on Pv01; PITFL1 at 41 Mb on P101).
• Flowering time: Major QTL on P101 (~30% variance), likely PITFL1-related; additional loci implied by transgressive segregation.
• Hundred-seed weight: Four minor additive QTL collectively explain 28%; one on long arm of P110 explains 11%.
• Cyanogenesis (floral buds): Major QTL on long arm of P105 explains 93%; together with two minor QTLs, total 97.5%; epistasis observed—UC 92 allele at P105 (loss of cyanogenic glucosidase) masks effects at P108 and P110; candidate glucosidase homolog to cyanogenic β-glucosidase in white clover.
- Population structure and diversity (482 accessions; 12,398 SNVs): Optimal K=6; confirms MI, MII, AI and identifies domesticated MI as a distinct cluster and supports a differentiated Andean wild AII gene pool (central Colombia). Substructure: wild MI splits into NW vs SW Mexico; wild MII splits into Mexico/southern Guatemala vs Yucatan/Guatemala-north/Costa Rica/Colombia; domesticated MI splits into Mexico–Central America, South America, and Yucatan Peninsula subgroups (the latter further grouping by traditional landrace types). LD decay: wild ~100 kbp to baseline, domesticated ~500 kbp; overall LD at Mb scale remains >0.3 (wild) and >0.4 (domesticated) due to structure. Diversity: Ho << He (selfing species); wild MI and MII most diverse (He=0.128, 0.133), AI and AII least (He=0.040, 0.052). Domestication reduced diversity by ~25% overall; within-gene-pool reduction severe for MI landraces (55%); MI Yucatan He=0.029 vs other MI landraces He=0.065. FST: MI wild closest to MI landraces (FST=0.33); AI wild closest to AI landraces (FST=0.21). AII more related to Mesoamerican pools (FST 0.56–0.65) than to AI (FST=0.86) by pairwise FST; fineSTRUCTURE suggests closer relationship to AI.
- Introgression: Detected 103 chromosomal segments (1–53 Mb) across 35 accessions (9 wild, 26 domesticated), mostly within Mesoamerican (MI↔MII) or within Andean (AI↔AII) pools; rarer Mesoamerican–Andean exchanges. Notable shared MII segments in MI landraces: 2.6 Mb on P107 in six Colombian MI landraces; 2.2 Mb on P108 in seven MI landraces from Central America and Colombia.
- Pod development transcriptomics: 4275 DEGs across accessions/stages. PDH1 expression increases from T1 to T2; at T2 higher in wild (not statistically significant due to one low replicate). Genes upregulated in both accessions from T1→T2 enriched in cell wall biogenesis/organization (xylan, lignin). Wild-only T1→T2 upregulation enriched for lignan, chitin, aminoglycan metabolism and fruit ripening, driven by gene clusters on P109 (chitin) and P111 (lignan). Domesticated-biased T2 expression enriched for reproductive/fruit development. Genes downregulated T1→T2 enriched for cuticle development and cyanogenic glycoside metabolism, consistent with developmental transitions.
Discussion
By generating a high-quality, chromosome-scale P. lunatus reference genome integrated with dense genetic mapping, comparative genomics, population genomics, QTL mapping, and developmental transcriptomics, the study directly addresses gaps caused by reliance on the P. vulgaris reference. The results refine understanding of Phaseolus genome evolution (high synteny with discrete major rearrangements and a speciation timeframe ~6 MYA) and illuminate the roles of ancient WGD and recent local duplications, particularly the rapid evolution of immune-related gene families. The extensive catalog of resistance-related genes and their clustered genomic organization, often collinear with known common bean resistance loci, provides immediate targets for disease-resistance breeding.
QTL analyses connect genotype to phenotype for key domestication and agronomic traits: determinacy and flowering time implicate PITFL1 on P101; seed weight shows polygenic architecture suited to marker-assisted pyramiding; cyanogenesis is largely controlled by a major locus (P105) with epistatic interactions, informing strategies balancing consumer safety and potential pest resistance. Population genomic analyses reveal finer substructure than previously recognized, including a distinct Andean AII gene pool in central Colombia and multiple domesticated MI subgroups, with clear geographic associations. Evidence of a severe domestication bottleneck in Mesoamerica (especially MI landraces and Yucatan subgroup) highlights conservation needs and opportunities for targeted introgression from wild relatives. Limited, yet detectable, introgression across gene pools and recurrent shared segments suggest both historical and recent gene flow that can be harnessed for breeding. Transcriptomic shifts across pod development stages, including PDH1 dynamics and enrichment in lignification pathways, provide functional candidates underlying pod dehiscence and seed traits, complementing QTL results and aiding candidate prioritization for functional validation.
Conclusion
This work delivers the first chromosome-level reference genome and a suite of genomic, genetic, and transcriptomic resources for Lima bean, enabling precise comparative analyses with common bean, dissection of domestication and improvement traits, and informed breeding for resilience. Major contributions include: (1) a high-quality genome assembly with extensive functional annotation; (2) identification of key structural differences with P. vulgaris and evolutionary timelines; (3) a comprehensive resistance gene landscape; (4) robust QTLs for determinacy, flowering time, seed weight, and cyanogenesis; (5) detailed population structure revealing six main clusters and a novel Andean AII pool; and (6) developmental expression programs tied to pod dehiscence and seed development (e.g., PDH1). Future research should: (a) perform functional validation of candidate genes through forward/reverse genetics; (b) expand whole-genome resequencing to denser, unbiased variation maps across landraces and wild gene pools to refine demographic histories and introgression; (c) fine-map and clone QTL underlying agronomic traits, examining parallelism with P. vulgaris; (d) integrate genomic selection using identified markers/QTL (notably for seed weight and determinacy) and monitor cyanogenesis loci to balance safety and pest resistance; and (e) prioritize conservation and utilization of underrepresented diversity (e.g., AII and MI Yucatan) in prebreeding.
Limitations
- Candidate gene predictions and resistance gene annotations are bioinformatics-based and require experimental validation (e.g., functional assays, gene editing, fine-mapping).
- Population genetic inferences rely on GBS data with relatively low SNP density and missingness, which can bias estimates of diversity, LD, structure, and introgression; broader whole-genome resequencing would provide higher resolution and reduce reference bias.
- Genetic map density varied across chromosomes with sparse pericentromeric coverage (e.g., P101, P109), potentially affecting recombination rate estimates and QTL interval precision in these regions.
- Expression analyses involve one wild and one domesticated accession at two developmental stages; broader sampling across gene pools and developmental time courses would generalize findings and reduce accession-specific effects.
Related Publications
Explore these studies to deepen your understanding of the subject.

