Whole-genome resequencing of wild and domestic sheep identifies genes associated with morphological and agronomic traits

X. Li, J. Yang, et al.

This study examines the genetic basis of phenotypic variation in sheep by resequencing 248 individuals at high depth, including the wild ancestor (Asiatic mouflon), landraces, and improved breeds. It identifies genomic regions associated with morphological and agronomic traits and points to *PDGFD* as a strong candidate gene for tail fat deposition. The resulting catalogue of variants is intended as a resource for sheep breeding and genetic studies.

Introduction
Sheep (Ovis aries) have provided meat, wool, skin, and milk to humans since the Neolithic. Characterizing genome-wide sequence variation and identifying functional variants associated with phenotypes are critical to enable genome-assisted breeding. Prior work has examined the impacts of domestication and selection on sheep genomic variation and identified QTLs and genes for some traits, but most studies targeted few phenotypes, used limited markers, and sampled few breeds. Whole-genome resequencing (WGS) now permits identification of variants involved in domestication and improvement across plants and animals. With a completed sheep reference genome, this study compares genomes of phenotypically diverse landraces and improved breeds with their wild ancestor, the Asiatic mouflon. The authors resequenced 248 genomes at ~25.7x coverage to map variation, test for selective sweeps, and perform GWAS, aiming to discover regions and genes linked to key morphological and agronomic traits. They also surveyed non-silent SNPs and gene-containing CNVs contributing to selection signatures. The work seeks to provide a genomic resource for genetic studies and improve understanding of sheep demography and the molecular basis of phenotypic diversity.
Literature Review
The authors note that recent studies have investigated the effects of domestication and selection on the sheep genome and identified QTLs and functional genes, though often with few phenotypes and low marker density. WGS has successfully uncovered domestication and improvement variants in crops (e.g., rice, soybean) and animals (cattle, sheep). Prior sheep resequencing studies used lower-depth data (approximately 8–17x), whereas this work uses higher coverage. The introduction also cites work on convergent domestication signatures in sheep and goats, functional annotation of regulatory elements in sheep, and earlier sheep genome resources. The study builds on findings that domestication can reduce diversity and that selection may act on non-coding regulatory regions, as seen in other domesticated species such as yak, soybean, and rabbit.
Methodology
Sampling and sequencing: Blood samples were collected from 248 individuals: 232 domestic sheep, drawn from 36 landraces (n=172) and 6 improved breeds (n=60) spanning Asia, Europe, Africa, and the Middle East, and 16 Asiatic mouflon held in captivity in Iran. Individuals were unrelated and sampled across locations. Genomic DNA libraries (350 bp inserts) were sequenced on Illumina HiSeq X Ten, generating 137.0 billion 150-bp paired-end reads (~20.55 Tb), for an average depth of 25.7x and 98.27% genome coverage per animal.
Read mapping and variant calling: Reads were mapped to the Oar v4.0 reference genome using BWA v0.7.8, then sorted and deduplicated with SAMtools v1.3.1. SNPs were called jointly with SAMtools (mpileup) and GATK v3.7 (UnifiedGenotyper), and only sites detected by both callers were retained. Filters included missing rate ≤0.1, MAF ≥0.05 across groups, and quality/depth thresholds (-qd 2, -fs 60, -mq 40, depth per individual 6–120, genotype quality ≥20, MQRankSum ≥ -12.5, ReadPosRankSum ≥ -8.0). INDELs (<100 bp) were called with SAMtools (min depth 24, GQ >20). Variants were annotated with ANNOVAR against Oar v4.0.
CNV and SV detection: CNVs were identified with both CNVnator v0.3.2 (bin 100 bp; length >200 bp) and DELLY v0.7.9 (deletions and duplications), keeping calls with >50% reciprocal overlap and excluding gaps and repeats. SVs (deletions, inversions, duplications, and translocations: DEL, INV, DUP, TRA) were detected with Manta v1.6.0 and DELLY and merged with SURVIVOR v1.0.6 (1 kb distance, methods=2, among other settings). CNVs were binned genome-wide for downstream quantitative analyses.
Validation: SNP calls were compared with O. aries dbSNP v151 (96.21% of domestic and 81.26% of mouflon SNPs validated), and genotypes were compared with the Ovine Infinium HD BeadChip (~600K SNPs) for 223 individuals (98.98% average agreement). False positive rates, estimated from 10,007 homozygous reference loci, were 6.38% (GATK) and 5.37% (SAMtools), falling to 0.66% after intersecting the two call sets.
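The two-caller intersection and the false-positive estimate described above can be illustrated with a minimal Python sketch. It assumes each call set has already been reduced to a set of (chromosome, position, alternate allele) tuples; the function names and the toy data are hypothetical and are not taken from the study's pipeline.

```python
from typing import Set, Tuple

Site = Tuple[str, int, str]  # (chromosome, 1-based position, alternate allele)
Locus = Tuple[str, int]      # (chromosome, 1-based position)


def intersect_calls(calls_a: Set[Site], calls_b: Set[Site]) -> Set[Site]:
    """Keep only variant sites reported by both callers."""
    return calls_a & calls_b


def false_positive_rate(calls: Set[Site], hom_ref_loci: Set[Locus]) -> float:
    """Fraction of known homozygous-reference loci at which a variant was called."""
    called_positions = {(chrom, pos) for chrom, pos, _alt in calls}
    return len(called_positions & hom_ref_loci) / len(hom_ref_loci)


# Toy data (hypothetical, for illustration only).
samtools_calls = {("chr1", 100, "A"), ("chr1", 250, "T"), ("chr2", 40, "G")}
gatk_calls = {("chr1", 100, "A"), ("chr2", 40, "G"), ("chr2", 75, "C")}
truth_hom_ref = {("chr1", 250), ("chr2", 75), ("chr3", 10), ("chr3", 20)}

consensus = intersect_calls(samtools_calls, gatk_calls)
fpr_samtools = false_positive_rate(samtools_calls, truth_hom_ref)
fpr_gatk = false_positive_rate(gatk_calls, truth_hom_ref)
fpr_consensus = false_positive_rate(consensus, truth_hom_ref)
```

In this toy case each caller makes one spurious call at a different locus, so the intersection removes both. This is the same reasoning behind the drop from 5–6% to 0.66% reported above, although real callers share some of their errors.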
Sanger sequencing of 68 SNPs in candidate genes across 1,414 individuals validated 95.69% of genotypes, and PCR/qPCR confirmed 78.79% of CNV genotypes.
Population genomics: Genetic structure was assessed with a neighbor-joining (NJ) tree (TreeBeST v1.9.2, p-distance) using mouflon as the outgroup, PCA (GCTA v1.24.2), and ADMIXTURE (K=2–7). A group-level NJ tree for five lineages used 59,943 LD-pruned intergenic SNPs. Nucleotide diversity (π) and global FST (50 kb windows, 25 kb steps) were computed with VCFtools and Arlequin (100,000 Markov chains, 10,000 burn-ins). LD decay (r²) was computed in PLINK v1.07. Demography was inferred per individual with PSMC (-N30 -t15 -r5 -p '4+25*2+4+6'), assuming μ=2.5×10^-8 and a generation time of 3 years. Recent Ne was estimated with SNeP.
Selection scans: For domestication, Asiatic mouflon were contrasted with five old landraces (Drenthe Heathen, Hu, Altay, Djallonké, Karakul) using XP-CLR (0.5 cM windows, 2 kb step, max 200 SNPs, p0=0.95) and the π ratio ln(π_mouflon/π_landrace)/ln(2) (50 kb windows, 25 kb step). Additional scans used iHS (Selscan; normalized |iHS|>2; top 5% of windows) and HKA tests (χ² on 50 kb windows versus genome-wide neutral expectations). For improvement and breed differentiation, global FST among the 42 domestic breeds was computed in 50 kb sliding windows, and the top 1% of windows were considered selected.
CNV selection: VST (an analog of FST for quantitative CNV intensity) was calculated, and CNVs in the top 1% of VST values per comparison were considered selected.
GWAS: Mixed linear model association (GEMMA v0.96) was performed for litter size (109 animals, 11 breeds; 14.57M SNPs), horn number (146 animals, 15 breeds; 14.56M SNPs), and nipple number (123 animals, 13 breeds; 14.42M SNPs), with the first three PCs as covariates. Bonferroni-adjusted thresholds were -log10(P) ≥6 (litter size, horns) and ≥4 (nipples). CNV-based GWAS used GEMMA on DELLY/CNVnator genotypes with thresholds of -log10(0.05/total CNVs) ≈5.28–5.29 or -log10(P)=4.
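The VST statistic used for CNV selection is commonly defined as (V_T − V_S) / V_T, where V_T is the variance of the copy-number signal across both populations pooled and V_S is the size-weighted average of the within-population variances. The sketch below is a minimal Python illustration of that definition. The function name `vst` and the use of population (rather than sample) variance are assumptions, and the summary does not state which variant the authors used.

```python
from statistics import pvariance
from typing import Sequence


def vst(pop_a: Sequence[float], pop_b: Sequence[float]) -> float:
    """V_ST = (V_T - V_S) / V_T for one CNV and two populations.

    pop_a, pop_b: per-individual copy-number intensities (e.g. log2 ratios).
    Assumption: population variance (pvariance) is used throughout.
    """
    total_var = pvariance(list(pop_a) + list(pop_b))
    if total_var == 0:
        return 0.0  # no variation at all, so no differentiation
    n_a, n_b = len(pop_a), len(pop_b)
    within_var = (n_a * pvariance(pop_a) + n_b * pvariance(pop_b)) / (n_a + n_b)
    return (total_var - within_var) / total_var


# Toy intensities (hypothetical): fully separated vs identical populations.
vst_separated = vst([2.0, 2.0, 2.0], [4.0, 4.0, 4.0])
vst_identical = vst([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
```

Values near 1 indicate that nearly all copy-number variance lies between the two populations, and values near 0 indicate none. Ranking CNVs by this score and keeping the top 1% mirrors the selection criterion described above.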
Trait phenotyping and expression: Phenotypes (coat color, wool fineness, ear size, horn and nipple numbers, tail configuration) were recorded during sampling. For tail configuration, pairwise selection scans contrasted fat-rumped, long fat-tailed, and long woolly-tailed breeds with short fat-tailed and thin-tailed breeds. RNA-Seq of tail adipose tissue (n=3 per tail type: thin-tailed, short fat-tailed, long fat-tailed, fat-rumped) was analyzed with HISAT2/StringTie/Ballgown; differential expression was defined as |log2FC|≥2 and padj ≤0.05. PDGFD expression was measured by RT-PCR, qPCR (2^-ΔΔCT, normalized to β-actin), and Western blot in adipose tissue of representative breeds. The significance of QTL overlap was tested with BEDTools shuffle using 10,000 permutations.
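The 2^-ΔΔCT calculation mentioned above for qPCR can be written out as a short Python sketch. It assumes roughly 100% amplification efficiency, which is the standard assumption of the method. The function name, argument names, and Ct values are hypothetical, with the reference gene standing in for β-actin.

```python
def relative_expression(
    ct_target_sample: float,
    ct_ref_sample: float,
    ct_target_calibrator: float,
    ct_ref_calibrator: float,
) -> float:
    """Fold change of the target gene in a sample relative to a calibrator.

    delta Ct       = Ct(target) - Ct(reference gene), computed per condition
    delta delta Ct = delta Ct(sample) - delta Ct(calibrator)
    fold change    = 2 ** -(delta delta Ct)
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_calibrator = ct_target_calibrator - ct_ref_calibrator
    return 2.0 ** -(delta_sample - delta_calibrator)


# Hypothetical Ct values: the target crosses threshold two cycles later
# (relative to the reference gene) in the sample than in the calibrator.
fold = relative_expression(24.0, 18.0, 22.0, 18.0)
```

A ΔΔCT of 2 here corresponds to one quarter of the calibrator's expression, since each additional cycle represents a halving of starting template.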
Key Findings
Sequencing and variant discovery: The study generated 137.0 billion reads (~20.55 Tb), with an average depth of 25.7x and 98.27% genome coverage. After intersecting call sets and filtering, 28.36 million SNPs and 4.80 million short INDELs (≤100 bp) were retained. In addition, 13,551 autosomal CNVs (176 bp–224.6 kb; 311–804 per individual) and 28,973 autosomal SVs (50 bp–984.0 kb; 4,515–6,657 per individual) were identified. Concordance with dbSNP was high (domestic: 96.21%; mouflon: 81.26%), Ovine HD chip genotypes agreed at 98.98%, and the overall SNP false positive rate was estimated at 0.66% after intersection.
Diversity and structure: Asiatic mouflon harbored more SNPs (23.27M in total; ~7.77–9.16M per individual) than domestic sheep. Landraces and improved breeds shared more SNPs with each other than either shared with mouflon. Nucleotide diversity (π) was 0.00127 (mouflon), 0.00113 (landraces), and 0.00109 (improved breeds). Pairwise genome-wide FST showed that mouflon were more differentiated from the domestic groups (FST ≈0.125–0.132) than landraces were from improved breeds (FST=0.032). LD decayed to half its maximum at 2.8 kb (mouflon), 12.1 kb (landraces), and 17.1 kb (improved breeds); European sheep showed more extensive LD (25.1 kb) than Asian sheep (9.8 kb). Phylogeny and PCA resolved European, Middle Eastern, Asian, and two African lineages, with mouflon closest to Middle Eastern sheep. PSMC revealed concordant demographic expansions and contractions; recent Ne estimates were lower in domestic sheep (73.7–199.3) than in mouflon (344.1), consistent with the stronger LD in domestic sheep.
Domestication selection: Combining XP-CLR and the π ratio identified 144 domestication sweeps encompassing or lying near 261 genes in five old landraces; iHS and HKA tests supported subsets of these. The genes were enriched in biosynthetic and metabolic processes and in olfactory transduction.
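The π-ratio statistic from the methodology, ln(π_mouflon/π_landrace)/ln(2), is a base-2 log ratio, and "combining" it with XP-CLR can be read as keeping windows that rank highly on both scores. The Python sketch below illustrates that reading. The window identifiers, the scores, and the 50% cutoff are hypothetical (the summary does not state the cutoffs the authors applied), and `top_fraction` is an illustrative helper rather than part of any tool named above.

```python
import math
from typing import Dict, Set


def log2_pi_ratio(pi_wild: float, pi_domestic: float) -> float:
    """ln(pi_wild / pi_domestic) / ln(2); positive when diversity is lower in domestic sheep."""
    return math.log(pi_wild / pi_domestic) / math.log(2)


def top_fraction(scores: Dict[str, float], fraction: float) -> Set[str]:
    """Return the window ids with the highest scores (at least one window)."""
    k = max(1, int(len(scores) * fraction))
    ranked = sorted(scores, key=lambda window: scores[window], reverse=True)
    return set(ranked[:k])


# Toy per-window values (hypothetical, for illustration only).
pi_wild = {"w1": 0.0040, "w2": 0.0011, "w3": 0.0030, "w4": 0.0009}
pi_domestic = {"w1": 0.0010, "w2": 0.0010, "w3": 0.0010, "w4": 0.0010}
xpclr = {"w1": 30.0, "w2": 25.0, "w3": 5.0, "w4": 1.0}

pi_ratio = {w: log2_pi_ratio(pi_wild[w], pi_domestic[w]) for w in pi_wild}

# A window is a sweep candidate only if it ranks highly on both statistics.
candidates = top_fraction(pi_ratio, 0.5) & top_fraction(xpclr, 0.5)
```

With the genome-wide values reported above (π = 0.00127 for mouflon and 0.00113 for landraces), `log2_pi_ratio` gives roughly 0.17.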
Among the domestication candidates, fourteen genes with known functions in sheep (e.g., SLC11A1, HOXA11, CAMK4, LEF1, TET2, KDR, FLT1, BCO2, HTRA1) and 22 genes previously reported as selected in other species (e.g., IGF2BP2, RFX3, KITLG, HERC5, PDE6B, EDN3, RALY, GTF2I/GTF2IRD1) were implicated; they are linked to reproduction, immunity, pigmentation, fat deposition, photoreception, behavior, and growth. Non-synonymous variants in PDE6B, BCO2, ADAMTSL3, NKX2-1, and LOC101108252 showed significant allele frequency differences between mouflon and old landraces. The CNV scan identified 137 domestication-associated CNVs overlapping genes or QTLs for fertility (SLIT2), milk (JAK2), wool (KIF16B), adipogenesis (TCF7L1, BCO2), and hypoxia tolerance (PDE10A); seven deletions and three translocations showed divergent frequencies between mouflon and domestic sheep.
Breed improvement selection: Global FST scans among domestic breeds yielded 205 selected regions (23.80 Mb) containing 391 genes, which were enriched in immune-related GO terms and in cytokine–cytokine receptor interaction. The top region (FST=0.56) lay near PAPPA2, a gene associated with fat deposition in humans and production traits in cattle. Non-synonymous allele frequencies differed significantly among breeds for SPAG8, FAM184B, PDE6B, and PDGFD. Nine genes overlapped with domestication candidates (e.g., RALY, EDN3, PDE6B), and 131 regions overlapped known QTLs for reproduction, milk, body weight, meat, teat number, tail fat, and horns.
Tail configuration genetics: Across multiple pairwise contrasts of tail phenotypes, 80–122 sweeps per comparison overlapped genes. Consistent candidates included PDGFD, NRIP1, KRT5, and KRT71, alongside novel candidates (XYLB, TSHR, SGCZ, CNOT3, CFLAR, GLIS3, MSRA, MAP2K3, FGF7). PDGFD stood out across analyses: a non-synonymous variant differed in frequency between fat- and thin-tailed breeds, and promoter-region genotype patterns differed markedly between fat-tailed or fat-rumped sheep on the one hand and thin-tailed sheep and mouflon on the other.
RNA-Seq identified PDGFD as differentially expressed between fat-tailed/fat-rumped and thin-tailed sheep (log2FC=3.08; padj=0.045), with one isoform (transcript I) the most divergent. RT-PCR, qPCR, and Western blots showed that PDGFD mRNA and protein levels were negatively correlated with tail fat deposition (highest in thin-tailed Merino and progressively lower in Small-tailed Han, Large-tailed Han, and fat-rumped Altay), consistent with PDGFRβ signaling inhibiting white adipocyte differentiation.
GWAS and trait genetics: The mixed linear model GWAS identified 600 (litter size), 989 (horn number), and 1969 (nipple number) significant SNP signals, of which 20, 56, and 1, respectively, overlapped selection sweeps. Litter size mapped to known genes (BMPR1B, INHBB, ESR1) and novel genes (NOX4, IRF2, PDE11A, ZFAT, ZFP91, TENM1, BICC1, LRRTM3, CTNN3, SMYD3, KCNN3, CD96). Horn number signals highlighted HOXD1/HOXD3 and RXFP2; nipple number was associated with LRP1B, GRM3, MACROD2, SETBP1, GPC3, and the novel genes PHGDH, KDM3A, GLIS3, FSHR, CSN2, CSN1S1, and ROBO2. Variants within ±20 kb of 25 candidate genes explained 1.2–16.8% (litter size), 7.0–16.1% (nipples), and 14.1–17.0% (horns) of phenotypic variance. CNV-GWAS detected significant associations for litter size (SMARCA1, APP), horns (RXFP2), and nipples (GPC5). Many selection and GWAS signals overlapped known QTLs more often than expected by chance (permutation P<0.001).
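The permutation test behind the QTL-overlap P-value can be sketched in plain Python. The study used BEDTools shuffle; the code below is only a simplified stand-in that relocates each region uniformly at random on a single toy chromosome and counts how often the shuffled overlap matches or exceeds the observed one. All names, coordinates, and the single-chromosome setup are hypothetical.

```python
import random
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def count_overlapping(regions: List[Interval], qtls: List[Interval]) -> int:
    """Number of regions that overlap at least one QTL interval."""
    return sum(any(s < qe and qs < e for qs, qe in qtls) for s, e in regions)


def shuffle_regions(regions: List[Interval], chrom_length: int,
                    rng: random.Random) -> List[Interval]:
    """Relocate each region uniformly at random, preserving its length."""
    shuffled = []
    for s, e in regions:
        length = e - s
        new_start = rng.randrange(0, chrom_length - length + 1)
        shuffled.append((new_start, new_start + length))
    return shuffled


def permutation_p_value(regions: List[Interval], qtls: List[Interval],
                        chrom_length: int, n_perm: int = 10_000,
                        seed: int = 0) -> Tuple[int, float]:
    """Return (observed overlap count, empirical one-sided P-value)."""
    rng = random.Random(seed)
    observed = count_overlapping(regions, qtls)
    at_least = sum(
        count_overlapping(shuffle_regions(regions, chrom_length, rng), qtls) >= observed
        for _ in range(n_perm)
    )
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (at_least + 1) / (n_perm + 1)


# Toy example (hypothetical coordinates on a 1 Mb chromosome).
qtls = [(100_000, 120_000), (500_000, 520_000)]
regions = [(105_000, 110_000), (505_000, 515_000), (900_000, 905_000)]
observed, p_value = permutation_p_value(regions, qtls, 1_000_000, n_perm=2_000)
```

With 10,000 permutations the smallest P-value this estimator can return is about 1/10,001. A real analysis would also need to respect chromosome boundaries and assembly gaps, which this sketch ignores.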
Discussion
High-depth resequencing across a hierarchically structured panel of wild, landrace, and improved sheep provided a comprehensive variome and resolved population structure consistent with geography and breed history. Reduced nucleotide diversity and elevated LD in domestic breeds relative to mouflon reflect founder effects, management, and artificial selection, while the similar diversity of landraces and improved breeds suggests that strong positive selection is comparatively recent. Domestication signals were enriched for metabolic and olfactory pathways, in line with earlier reports that domestication involves shifts in diet, physiology, and behavior. The predominance of non-coding variants among fixed or near-fixed domestication signals suggests that regulatory changes underpin polygenic domestication traits (e.g., litter size), with SVs and CNVs also contributing to phenotypic diversification.
The breed-level selection landscape revealed differentiation at loci linked to morphology, production, adaptation, and immunity. Cytokine–receptor interactions were notably enriched, likely reflecting both artificial and long-term natural selection. Integrative scans and GWAS fine-mapped several traits, confirming known loci (e.g., RXFP2 for horns; BMPR1B for litter size) and uncovering novel candidates for reproduction and mammary traits.
A key mechanistic insight is the identification of PDGFD as a strong candidate influencing tail fat deposition. Convergent evidence from selection scans, allele frequency differentiation, promoter genotype patterns, transcript isoform shifts, and protein expression supports a role for PDGFD/PDGFRβ signaling in inhibiting white adipocyte differentiation, thereby modulating tail fat phenotypes. These results link specific coding and regulatory variation to expression changes and trait differences, offering targets for functional validation and breeding.
Collectively, the study advances understanding of sheep domestication, demographic history, and the genetic architecture of economically important traits. The resource of validated variants, sweeps, CNVs/SVs, and GWAS associations will facilitate marker-assisted and genomic selection, and guide genome editing strategies.
Conclusion
This work delivers a high-resolution map of genomic variation in wild and domestic sheep and identifies candidate genes and regions associated with domestication and breed improvement, including strong evidence implicating PDGFD in tail fat deposition. The study corroborates reduced diversity and increased LD in domestic sheep, clarifies population structure, and highlights that regulatory and structural variants contribute substantially to domestication-related phenotypes. Integrating selection scans, CNV/SV analyses, transcriptomics, and GWAS pinpointed known and novel loci for reproduction, horn and nipple number, mammary traits, pigmentation, and tail configuration. These findings provide a valuable genomic resource to support genome-assisted breeding and functional studies. Future work should expand sampling across Asiatic mouflon subspecies and geographies, undertake functional validation (e.g., CRISPR/Cas) of candidate variants, and leverage ancient DNA to refine the timing and pathways of sheep domestication. Cross-species comparisons may elucidate common domestication mechanisms.
Limitations
The wild ancestor sampling was limited to 16 Asiatic mouflon from captivity in Iran; mouflon comprise several subspecies across broader geographic ranges, and comprehensive sampling would better resolve domestication origins and selection signals. Many domestication-associated SNPs reside in non-coding regions, complicating causal inference without functional assays. Some trait analyses involved modest sample sizes across multiple breeds, potentially reducing power and precision. Enrichment of immune pathways among breed-differentiated genes may reflect both artificial and natural selection, making attribution challenging. While PDGFD expression and variant analyses are consistent, experimental validation of specific causal mutations and mechanisms remains necessary.