The conservation of human functional variants and their effects across livestock species

R. Zhao, A. Talenti, et al.

This research examines the prevalence and impact of human functional variants in livestock species, finding that over 1.6 million human variants have orthologues in domesticated mammals. The authors used machine learning to identify the genomic features that predict which variants are shared, and showed that the effects of shared variants are often conserved across species, supporting livestock as models for studying human genetics.
Introduction

The study investigates whether naturally occurring orthologues of human functional variants exist in livestock and whether their effects are conserved across species. Rodent models have limitations for translational research because of their physiological differences from humans, while livestock (especially pigs and cattle) more closely match human anatomy and physiology. Creating genome-edited animal models is expensive, time-consuming, and ethically challenging, and random transgenesis often loses genomic context. The authors therefore propose using variants that already segregate naturally in domesticated mammals to model human functional variants without transgenesis. This would enable scalable studies of variant function, a better understanding of disease mechanisms, and potentially improved livestock traits. The study aims to quantify the extent of shared variants, identify genomic features that predict orthologue presence, catalog orthologues of pathogenic and complex-trait-associated human variants, and test whether molecular and phenotypic effects are conserved between humans and livestock.

Literature Review

Animal models are widely used, but rodents often fail to translate to human outcomes due to physiological differences; primates are costly and ethically constrained. Livestock, particularly pigs, are increasingly used in translational studies and exhibit gene expression profiles closer to humans. Prior large-scale livestock genomics (e.g., 1000 Bull Genomes) revealed extensive polymorphism, suggesting many human variants may have orthologues. Limited examples exist of functional variants shared across species (e.g., coat color missense variant in dogs and water buffalo). Cross-species comparisons have shown that orthologues of human pathogenic variants in zebrafish are more likely to have detectable phenotypic effects. Historically, genome-wide study of natural orthologues was limited by uncertainty in causal variants at GWAS loci, but recent fine-mapping and functional datasets now allow better identification of causative variants.

Methodology

Datasets: Human variants (78 million SNPs) were taken from 1000 Genomes (n=2504). The livestock cohorts were cattle (n=477), pig (n=409), dog (n=722), and water buffalo (n=79/81). All non-human cohorts were filtered to biallelic SNPs. Human SNPs were lifted to orthologous positions in BosTau9 (cattle), SusScr3 (pig), and CanFam3 (dog) using UCSC liftOver; for water buffalo, nf-LO was used because chain files were not available. Sites lifting to multiple locations were excluded. An orthologous variant was called where a polymorphism existed at the orthologous coordinate in the target species; identical allele changes were identified allowing for complements, assuming a shared ancestral base in conserved regions. Relatedness filtering was assessed in cattle using VCFtools kinship.

Variant annotation: 1,589 human genomic features were compiled per SNP, including sequence conservation (phastCons100way/30way, phyloP100way/30way), sequence context (ancestral/derived alleles; 5-mer flanks encoded via a custom binary encoding), genomic position features (distances to CpG islands, TSS by biotype, chromatin marks, regulatory elements, processed pseudogenes, snoRNAs), VEP consequences, allele frequencies, and gene density. Distances were computed with bedtools and ChIPpeakAnno; conservation scores were extracted from UCSC bigWigs; ancestral alleles came from the Ensembl ancestral genome.

Machine learning: Balanced datasets of 200,000 variants were assembled (100k foreground: human SNPs with cattle orthologues with matching alleles; 100k background: liftable human SNPs without a detected cattle polymorphism) and split 70/30 into train and test sets, with 5-fold cross-validation. The models were Random Forest, XGBoost, and CatBoost (with GPU), via scikit-learn and CatBoost. Hyperparameters were tuned by random search and manual tuning. Cross-species models were trained using orthologues in pig, cattle, dog, and water buffalo, individually and combined, and tested both within and across species.
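
The allele-matching rule described under Datasets can be illustrated with a minimal sketch. It assumes each site is a biallelic SNP given as a pair of bases, and it treats allele pairs as unordered; the function names `same_allele_change` and `classify_site` are hypothetical and are not taken from the study's code.

```python
# Complement of each base, used when the two assemblies report opposite strands.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def same_allele_change(human_alleles, target_alleles):
    """Return True if two biallelic SNPs carry the same pair of alleles,
    either directly or after complementing the target alleles.

    Allele pairs are compared as unordered sets, so which allele is
    labelled reference is ignored (an assumption of this sketch).
    """
    human = frozenset(a.upper() for a in human_alleles)
    target = frozenset(a.upper() for a in target_alleles)
    target_complement = frozenset(COMPLEMENT[a] for a in target)
    return human == target or human == target_complement


def classify_site(human_alleles, target_alleles):
    """Classify a lifted human SNP against the orthologous target site.

    target_alleles is None when no polymorphism was found at the
    orthologous coordinate in the target species.
    """
    if target_alleles is None:
        return "no_orthologue"
    if same_allele_change(human_alleles, target_alleles):
        return "orthologue_same_alleles"
    return "orthologue_different_alleles"


print(classify_site(("C", "T"), ("G", "A")))  # orthologue_same_alleles
print(classify_site(("C", "T"), ("C", "A")))  # orthologue_different_alleles
print(classify_site(("C", "T"), None))        # no_orthologue
```

Allowing complements means that strand-ambiguous pairs (A/T and C/G) always match themselves, so a real pipeline would likely need strand information from the liftover to resolve them.
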
Feature importance was compared across models, and SHAP values were computed for the CatBoost cross-species model.

ClinVar and UK Biobank analyses: ClinVar pathogenic/likely pathogenic SNPs (n=89,158) were lifted to the cow and pig genomes, and overlaps with orthologous variants and matching allele changes were counted. Consequence conservation across species was annotated via Ensembl REST, comparing missense, stop-gained, synonymous, and other consequences. Enrichment by ClinVar phenotype was tested with chi-squared tests. For complex traits, 2,240 fine-mapped SNPs across 47 traits from UK Biobank (Weissbrod et al.) were lifted to cattle and pig; the presence of orthologues and allele matching were assessed; enrichment for traits (e.g., height) was tested with Fisher’s exact test; gene-level consequences were noted for missense examples.

Regulatory variant analyses: Human fine-mapped eQTLs from GTEx v8 (CAVIAR, CaVEMAN, DAP-G) were combined; associations with probability >0.2 were lifted to cattle, and their overlaps with cattleGTEx cis-eQTLs across 23 tissues/cell types were assessed. For each tissue, QQ plots compared the observed P-values of orthologous cattle variants with distributions from random cattle variants; the false discovery rate was conservatively estimated using a random-sampling baseline. Conservation of the direction of effect was assessed by correlating slopes (the effect of allele dosage on expression) between human and cattle for orthologous variant-gene pairs. Colocalization examples were illustrated for SIRPB1 and TAFIC across human tissues and cattle muscle.

Deep learning: Enformer (human-trained) was applied to sequences around shared regulatory variants in both human and cattle to predict CAGE and chromatin tracks for reference vs alternate alleles. These predictions were used to explain cases where expression effects are not conserved because of differences in isoform or promoter usage (e.g., the NINJ2 short isoform is absent in cattle).
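
The direction-of-effect comparison described under the regulatory variant analyses amounts to correlating eQTL slopes between species. The sketch below shows one way this could be done, assuming slopes must first be aligned to the same effect allele; the helper names and every value in `pairs` are invented for illustration and are not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def align_slope(slope, effect_allele, reference_effect_allele):
    """Flip the sign of a slope that was estimated for the other allele.

    Assumes both alleles are reported on the same strand.
    """
    return slope if effect_allele == reference_effect_allele else -slope


# Hypothetical variant-gene pairs (invented values, not from the study):
# (human slope, human effect allele, cattle slope, cattle effect allele)
pairs = [
    (0.40, "T", 0.35, "T"),
    (-0.25, "A", 0.20, "G"),  # cattle slope reported for the other allele
    (0.10, "C", 0.05, "C"),
    (-0.50, "G", -0.30, "G"),
]

human_slopes = [p[0] for p in pairs]
cattle_slopes = [align_slope(p[2], p[3], p[1]) for p in pairs]
r = pearson_r(human_slopes, cattle_slopes)
print(f"Pearson r between human and cattle slopes: {r:.2f}")
```

A positive correlation after alignment indicates that the same allele tends to shift expression in the same direction in both species.
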
Statistics: Chi-squared tests, Fisher’s exact tests, and two-sample Kolmogorov–Smirnov tests were used as appropriate; repeated random sampling was used to derive confidence intervals in the QQ plots. All analyses were conducted in R and Python.
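
As an illustration of the enrichment tests mentioned above, the following is a minimal pure-Python sketch of a one-sided Fisher's exact test on a 2×2 table. The function name `fisher_exact_greater` and the example counts are hypothetical; the study's own analyses were presumably run with standard R or Python statistics packages.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a), where X follows the hypergeometric distribution
    implied by the fixed row and column totals. Small values suggest
    that the top-left cell is larger than expected by chance.
    """
    row1 = a + b          # e.g. variants with a matching orthologue
    col1 = a + c          # e.g. variants linked to the trait of interest
    total = a + b + c + d
    denominator = comb(total, col1)
    upper = min(row1, col1)
    numerator = sum(
        comb(row1, k) * comb(total - row1, col1 - k)
        for k in range(a, upper + 1)
    )
    return numerator / denominator


# Invented counts for illustration only (not from the study):
# rows = orthologue with matching alleles / no such orthologue
# cols = linked to the trait / linked to another trait
p_value = fisher_exact_greater(8, 2, 1, 5)
print(f"one-sided P = {p_value:.4f}")
```

The exact test is the natural choice when some cells are small, which is likely the case for per-trait counts of shared variants; chi-squared tests are better suited to larger tables of counts.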

Key Findings
  • Shared variants: Of 78M human SNPs, 35M and 34M map to orthologous positions in pig and cattle, respectively. Among these, 3.7% (pig) and 3.0% (cattle) overlap an orthologous polymorphism; 55.5% and 55.8% of overlaps have identical allele changes. Across four species (cattle, pig, dog, water buffalo), 1,651,728 human variants have an orthologue in at least one species.
  • Sequence context and genomic features: Orthologous sharing is enriched at CpG contexts (consistent with deamination), with 5-mer contexts containing G/C increasing orthologue likelihood, except when a G is 5' of the SNP. Variants with orthologues are enriched near processed pseudogenes, snoRNAs, and certain chromatin marks (e.g., H3K9ac proximity differences; KS tests P<0.0018).
  • Machine learning prediction: CatBoost achieved AUC 0.69, accuracy 0.64, F1=0.70 distinguishing human variants with vs without cattle orthologues; cross-species models achieved AUC up to 0.73, accuracy up to 0.67. Most important features: conservation (phyloP100way), allele change, and 5-mer context; lower conservation scores predict higher orthologue probability; a G immediately downstream (NNNGN) increases predicted orthologue presence.
  • ClinVar pathogenic variants: Of 89,158 pathogenic/likely pathogenic ClinVar SNPs, 1,290 overlap a cattle variant (253 same alleles), and 767 overlap a pig variant (212 same alleles). ClinVar SNPs are ~3× less likely to have an orthologue and ~7× less likely to match alleles compared with background, consistent with purifying selection. Consequence conservation: 80% of 103 human missense ClinVar variants with cattle orthologues produce the same amino acid change; 13% are missense-to-different-AA; 3.9% become synonymous. Only 22% of human stop-gained variants remain stop-gained in cattle (63% become missense), possibly reflecting codon differences and annotation limitations. Some variants yield conserved protein impacts with different allele changes.
  • Trait-specific enrichment: ClinVar variants for biotinidase deficiency, neurofibromatosis, and glycogen storage are enriched for cattle orthologues; factor VII deficiency variants are enriched for pig orthologues. Four of 23 known human biotinidase deficiency variants (17%) have direct cattle orthologues with identical alleles; one has a MAF of 22% in cattle vs 0.002% in humans.
  • UK Biobank fine-mapped variants: 58 of 2,240 fine-mapped SNPs have direct orthologues in pigs or cattle. In cattle, the orthologues with matching alleles are disproportionately variants linked to human height (11/43 with matching alleles; enrichment P=0.040). Three missense variants (FGFR3 rs154001, KIAA1614 rs61735104, FBN2 rs79485039) cause the same amino acid change in both species; FOXM1 has a conserved protein impact with different allele changes. Ten of these eleven variants contribute an estimated 2.7 cm of combined height variation in humans; the FGFR3 variant is associated with a ~1 cm height difference between homozygotes.
  • Regulatory variant conservation: 221 human fine-mapped regulatory variants had matching cattle variants tested against orthologous genes (469 ignoring allele changes). Orthologous cattle variants show enriched association signals vs random in multiple tissues (QQ deviations). Direction of effect is conserved across species, with significant positive correlations of eQTL slopes (e.g., R=0.79 for CaVEMAN-based fine-mapped variants; R=0.68 for DAP-G; both P<2e-15). Colocalized examples include SIRPB1 and TAFIC eQTLs with shared lead variants and directions across human and cattle tissues.
  • Mechanistic insight: Enformer predictions explain non-conserved effects when human variants affect isoform-specific promoters absent in cattle (e.g., rs10849334 affecting a short NINJ2 isoform TSS not present in cattle).
  • Evolutionary patterns: Very few sites are polymorphic across all five species (12 positions), indicating independent mutation at hypermutable, weakly constrained sites and providing a potential metric of selective pressure via orthologue depletion (e.g., ClinVar sites).
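
Several of the proportions quoted in the findings above can be recomputed directly from the reported counts. The short sketch below does only that arithmetic; the labels are paraphrases, and the helper `pct` is introduced here for illustration.

```python
def pct(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


# Counts as quoted in the key findings above.
reported = {
    "biotinidase deficiency variants with identical cattle alleles": (4, 23),
    "UK Biobank fine-mapped SNPs with pig or cattle orthologues": (58, 2240),
    "height variants among cattle matching-allele orthologues": (11, 43),
    "ClinVar cattle overlaps that share the same alleles": (253, 1290),
    "ClinVar pig overlaps that share the same alleles": (212, 767),
}

for label, (part, whole) in reported.items():
    print(f"{label}: {part}/{whole} = {pct(part, whole):.1f}%")
# Prints 17.4%, 2.6%, 25.6%, 19.6% and 27.6% respectively.
```

The same-allele shares among ClinVar overlaps (about 20% in cattle and 28% in pig) are well below the 55.8% and 55.5% reported for all overlapping SNPs. These simple ratios are not the depletion statistics quoted above, whose denominators (the liftable ClinVar SNPs) are not given here.
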
Discussion

The study shows that a large number of human variants, including many with functional roles, have naturally occurring orthologues in livestock, enabling their study without genome editing. Machine learning based on conservation, allele change, and sequence context can predict which human variants are likely to have orthologues across mammals, indicating shared mutational and selective landscapes. Importantly, both coding and regulatory effects of functional variants are often conserved across species: missense consequences and amino acid changes are frequently preserved, and regulatory variants commonly influence orthologous genes with the same direction of effect despite divergent LD structures and tissues. This supports using livestock to dissect the mechanisms of human disease variants and to inform livestock trait improvement. The disproportionate sharing of variants associated with certain phenotypes (e.g., height, biotinidase deficiency) suggests that selection may maintain such variants across lineages, consistent with domestication-related selection on body size. Cross-species comparisons also help validate fine-mapping, as methods whose fine-mapped variants show more conserved effects (e.g., DAP-G and CaVEMAN compared with CAVIAR) may better capture causal signals. Overall, natural orthologues provide a scalable, ethically tractable way to probe variant function, exploit differences in allele frequency and LD for fine-mapping, and guide breeding and genome editing programs.

Conclusion

This work catalogs extensive natural orthologues of human variants in livestock and demonstrates that many functional effects, especially regulatory directions and missense consequences, are conserved across species. Machine learning can prioritize human variants likely to have livestock orthologues using conservation and sequence context. The findings identify hundreds of immediately available large animal models for human pathogenic and complex-trait variants, highlight phenotype-specific enrichment (notably for height), and illustrate how cross-species analyses can elucidate variant mechanisms and support breeding strategies. Future research should expand to additional species and larger cohorts, improve livestock genome and transcript isoform annotations, integrate tissue-matched multi-omics, and systematically test conserved variants’ phenotypic effects, thereby refining fine-mapping and enhancing translational genomics.

Limitations
  • Underestimation of shared functional variants due to limited cohort sizes, rare variant frequencies, and incomplete fine-mapping power.
  • Differences in tissue availability, sample sizes, and eQTL detection power between species hinder direct comparability; absence of association in cattle does not imply lack of function.
  • Livestock gene and isoform annotations are less complete, affecting consequence prediction (e.g., stop-gained vs missense) and regulatory interpretation.
  • Cross-species tissue matching is imperfect; LD patterns differ, potentially confounding effect comparisons.
  • Machine learning performance is moderate (AUC ~0.69), indicating predictive features are informative but not fully determinative.
  • Liftover and orthology assumptions (e.g., ancestral allele identity) may introduce mapping errors at some loci; multi-mapped sites were excluded.