Introduction
Tracing the origin of SARS-CoV-2 is essential to understanding the pandemic. The coronaviruses behind the earlier SARS and MERS outbreaks originated in bats and reached humans through intermediate hosts. Initial genomic data for SARS-CoV-2 also pointed towards bats as the original host, with Bat-CoV-RaTG13 showing 96% genome-wide identity. Pangolins were also considered potential intermediate hosts because pangolin coronaviruses show higher sequence identity to SARS-CoV-2 in the receptor-binding domain (RBD) of the spike protein than Bat-CoV-RaTG13 does. It has been hypothesized that recombination between Bat-CoV-RaTG13-related and Pangolin-CoV-2019-related viruses might have produced the SARS-CoV-2 RBD and thereby contributed to its high infectivity. Recombination is a key evolutionary mechanism in RNA viruses, generating diversity and novel viruses. Analyzing genomic recombination between coronaviruses from different hosts is therefore crucial for understanding SARS-CoV-2's origin and subsequent evolution.
Literature Review
Existing literature highlights the zoonotic origin of several coronaviruses, with bats often implicated as the primary reservoir. Studies have explored the genomic similarity between SARS-CoV-2 and bat coronaviruses like Bat-CoV-RaTG13, indicating a close evolutionary relationship. However, studies disagree on the role of pangolins as an intermediate host. Some suggest that recombination events involving pangolin and bat coronaviruses contributed to SARS-CoV-2's RBD, while others question whether recombination contributed directly to the virus's origin. This study aims to address these conflicting views through a comprehensive analysis of available coronavirus genomes that identifies and characterizes recombination events, potentially clarifying the evolutionary trajectory of SARS-CoV-2.
Methodology
The researchers collected 29,452 publicly available coronavirus genomes, including 26,312 SARS-CoV-2 genomes, from NCBI, GISAID, and CoVdb. Unique genomic sequences were identified using CD-HIT, requiring >95% identity and coverage. Recombination events were detected with RDP4 software using seven statistical tests (RDP, GENECONV, Bootscan, Maxchi, Chimaera, SiSscan, 3Seq) and annotated using CoVdb, NCBI taxonomy, and manual curation. Phylogenetic trees were constructed in MEGA with the Jukes-Cantor model and 5000 bootstrap replicates, and pairwise genetic distances were calculated in MEGA with the Tajima-Nei model. Population genetic analyses, including calculations of Pi, Tajima's D, and composite likelihood ratios (CLR), were conducted on 448 Coronaviridae samples and 26,312 SARS-CoV-2 samples using CoVdb online tools. Sliding window analyses were performed on nucleotide differences between SARS-CoV-2 and closely related coronaviruses, on Fst values, and on CLRs. The statistical significance of peaks in the sliding window analyses was assessed with Wilcoxon rank-sum tests comparing the distribution of values in the target region against the flanking regions.
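As a rough illustration of two of the steps above, the following Python sketch counts nucleotide differences in sliding windows along a pairwise alignment and converts a proportion of differing sites p into a Jukes-Cantor distance, d = -(3/4) ln(1 - 4p/3). It is not the RDP4, MEGA, or CoVdb implementation; the function names, the window and step sizes, and the toy sequences are all assumptions made for the example, and the inputs are assumed to be pre-aligned strings of equal length.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites, skipping gaps and ambiguous bases."""
    valid = set("ACGT")
    compared = 0
    diffs = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x in valid and y in valid:
            compared += 1
            if x != y:
                diffs += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for an observed proportion p."""
    if p >= 0.75:
        return math.inf  # the correction is undefined at saturation
    return -0.75 * math.log(1 - 4 * p / 3)


def sliding_window_differences(seq_a, seq_b, window, step):
    """Return (window start, number of differing sites) for each window."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    for start in range(0, len(seq_a) - window + 1, step):
        a = seq_a[start:start + window]
        b = seq_b[start:start + window]
        results.append((start, sum(x != y for x, y in zip(a, b))))
    return results


# Toy aligned sequences (hypothetical, not real coronavirus data).
windows = sliding_window_differences("AAAAAAAA", "AATTAAAA", window=4, step=2)
print(windows)
print(jukes_cantor(p_distance("ACGT", "ACGA")))
```

A window with an unusually high or low difference count relative to its flanks is the kind of peak that would then be tested for significance, for example with a rank-sum test.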
Key Findings
The analysis revealed three independent recombination events between bat and pangolin coronaviruses, each supported by at least six statistical tests (P<0.05). One key recombination event involved a 228 bp sequence within the SARS-CoV-2 S protein's RBD, likely originating from recombination between Bat-CoV-RaTG13-like and Pangolin-CoV-2019-like strains; this event was significant in six independent statistical tests. Phylogenetic trees for the recombination regions differed from those constructed using whole genomes, supporting the recombination events. Genetic distance analysis confirmed that, for all three events, the genetic distance between the putative recombinant and the minor parent was the lowest. Population genetic analyses showed peaks in fixation index (Fst) values for RI_RNA_S between human and bat coronaviruses and between human and pangolin coronaviruses, suggesting significant differentiation in this region. CLR peaks were observed adjacent to RI_RNA_S in SARS-CoV-2 strains, indicating potential adaptation. RBD diversity was higher in bat coronaviruses than in human, camel, and cow coronaviruses, with RI_RNA_S showing a peak in Pi (nucleotide diversity) for five bat clades. Bats also exhibited a higher number of coronavirus subclades and recombination events than other hosts, with a high frequency of recombination between bat and human coronaviruses as well as between bat and pangolin coronaviruses. Analyses of RI_RNA_ORF1 and RI_RNA_Boundary also showed significant CLR and Fst peaks, indicating evolutionary activity.
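The genetic distance check described above can be sketched as follows. The sketch assumes the usual recombination-analysis convention that the minor parent contributes the smaller, recombinant fragment and the major parent contributes the rest of the genome, so the recombinant should be closest to the minor parent inside the region and closest to the major parent outside it. The function names, region coordinates, and sequences are hypothetical, and a plain proportion of differing sites stands in for the model-based distance used in the study.

```python
def fraction_different(seq_a, seq_b):
    """Proportion of differing sites between two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("empty sequences")
    return sum(x != y for x, y in zip(seq_a, seq_b)) / len(seq_a)


def closest_parents(recombinant, parents, start, end):
    """Return (closest parent inside [start, end), closest parent outside it)."""
    def outside(seq):
        return seq[:start] + seq[end:]

    inside_dist = {
        name: fraction_different(recombinant[start:end], seq[start:end])
        for name, seq in parents.items()
    }
    outside_dist = {
        name: fraction_different(outside(recombinant), outside(seq))
        for name, seq in parents.items()
    }
    return min(inside_dist, key=inside_dist.get), min(outside_dist, key=outside_dist.get)


# Toy alignment (hypothetical): positions 4..7 come from the "minor" parent.
parents = {
    "major": "AAAAAAAAAAAA",
    "minor": "CCCCCCCCCCCC",
}
recombinant = "AAAACCCCAAAA"
print(closest_parents(recombinant, parents, start=4, end=8))
```

When the two answers differ, as in this toy case, the pattern is consistent with recombination; when they agree, the region gives no distance-based support for it.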
Discussion
The findings strongly suggest that SARS-CoV-2 originated from recombination between bat and pangolin coronaviruses, with a key recombination event occurring within the RBD of the S protein. This recombination event likely played a crucial role in the virus's adaptation to human hosts and its enhanced infectivity. The high genetic diversity of bat coronaviruses, coupled with their frequent participation in recombination events, highlights their role as a reservoir for novel coronavirus emergence. Although the study provides strong evidence for a recombination origin, SARS-CoV-2 has not yet been directly isolated from bats or pangolins. By analyzing recombination events across a very large number of genomes, the study contributes significantly to understanding the evolutionary history of SARS-CoV-2.
Conclusion
This research provides compelling evidence for the origin of SARS-CoV-2 through recombination between bat and pangolin coronaviruses, particularly involving a key sequence in the S protein's RBD. The high genetic diversity in bat coronavirus RBD regions and their frequent recombination events underscore the importance of bats as a reservoir for future coronavirus emergence. Further research should focus on experimentally validating the functional differences of RI_RNA_S in SARS-CoV-2 and Bat-CoV-RaTG13, and on exploring the potential role of other intermediate hosts in the zoonotic transmission of the virus.
Limitations
The study relies on publicly available genomic data, which may not represent the full diversity of coronaviruses in their natural reservoirs. The absence of direct isolation of SARS-CoV-2 from bats or pangolins limits definitive conclusions about the precise intermediate hosts. While the study uses robust statistical methods for recombination detection, there is always a possibility of false positives or negatives. Pooling of coronavirus strains from the same host to overcome limited data availability might have inflated genetic diversity levels.
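The pooling concern can be made concrete with a small Python sketch. It computes nucleotide diversity (Pi) as the average proportion of differing sites over all sequence pairs, first within two toy clades and then on the pooled set. The sequences and the helper name are hypothetical and are not taken from the study's data or from CoVdb.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average per-site pairwise difference across all pairs of aligned sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    total = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs)
    return total / (len(pairs) * length)


# Two toy clades that are internally similar but differ from each other.
clade_a = ["AAAAAAAAAA", "AAAAAAAAAT", "AAAAAAAAAA"]
clade_b = ["CCCCCAAAAA", "CCCCCAAAAT", "CCCCCAAAAA"]

pi_a = nucleotide_diversity(clade_a)
pi_b = nucleotide_diversity(clade_b)
pi_pooled = nucleotide_diversity(clade_a + clade_b)
print(pi_a, pi_b, pi_pooled)
```

In this toy case the pooled value exceeds both within-clade values because between-clade differences dominate the pairwise comparisons, which is the kind of inflation the limitation describes.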