Genomic recombination events may reveal the evolution of coronavirus and the origin of SARS-CoV-2

Z. Zhu, K. Meng, et al.

Explore the fascinating evolution of coronaviruses and the potential origins of SARS-CoV-2 through the groundbreaking research conducted by Zhenglin Zhu, Kaiwen Meng, and Geng Meng. Analyzing over 29,000 coronavirus genomes, this study uncovers significant recombination events and highlights the crucial role of bat coronaviruses in genetic diversity. Discover how viral adaptation to hosts is detected in the receptor-binding domain of the spike glycoprotein.

Introduction
SARS-CoV-2, first identified in Wuhan, China, rapidly caused a global pandemic. Tracing its origin is vital for control and prevention. Previous zoonotic coronaviruses (SARS-CoV and MERS-CoV) originated in bats and reached humans through intermediate hosts. Early genomic analyses suggested bats as the original host for SARS-CoV-2, with Bat-CoV-RaTG13 showing 96% whole-genome identity to SARS-CoV-2. Pangolin coronaviruses have lower whole-genome identity (~91%) but higher S gene similarity (97.5%) to SARS-CoV-2 than RaTG13 does, implicating pangolins as potential intermediate hosts. It has been proposed that the RBD of the SARS-CoV-2 S protein arose from recombination between viruses similar to RaTG13 and Pangolin-CoV-2019. Because the SARS-CoV-2 RBD binds human ACE2 with a lower (more favorable) binding free energy than the SARS-CoV RBD does, recombination may be linked to its high infectivity. The study aims to comprehensively analyze coronavirus genomes to detect recombination among hosts, especially bat–pangolin–human lineages, and to assess evolutionary signatures that inform the origin and adaptation of SARS-CoV-2.
Literature Review
Prior reports established bat origins for several human coronaviruses and identified potential intermediate hosts (civets for SARS-CoV, camels for MERS-CoV). Early SARS-CoV-2 studies highlighted bat coronavirus RaTG13 as a close relative, while pangolin coronaviruses showed high similarity within the S gene and RBD, raising the possibility of recombination contributing to SARS-CoV-2 emergence. Some studies did not find direct recombination evidence linked to SARS-CoV-2 origin, whereas others suggested recombination involvement, including pangolin-originated sequences within the RBD. Functional and structural analyses showed stronger binding between SARS-CoV-2 RBD and human ACE2 compared with SARS-CoV, potentially explaining enhanced transmissibility. Recombination is recognized as a key evolutionary mechanism in RNA viruses, generating diversity and novel chimeric genomes. Collectively, the literature supports investigating recombination between bat and pangolin coronaviruses as a plausible route to SARS-CoV-2.
Methodology
- Data collection: Retrieved 29,452 coronavirus genomes (3,140 non–SARS-CoV-2 and 26,312 SARS-CoV-2) from NCBI, GISAID, and CoVdb.
- Multiple sequence alignment: Performed whole-genome alignments using CUDA ClustalW.
- Phylogenetics: Built phylogenetic trees (whole-genome and regional) with MEGA X using the Jukes–Cantor model; branch support was assessed with 5,000 bootstrap replicates.
- Recombination detection: Focused analyses on SARS-CoV-2 and proximal outgroups (bat and pangolin coronaviruses), using RDP4 with seven detection methods (RDP, GENECONV, Bootscan, MaxChi, Chimaera, SiScan, 3Seq). An event was retained only if at least six of the seven tests gave P<0.05.
- Genetic distance estimation: Calculated evolutionary divergence (Tajima–Nei model) in recombination regions and in flanking 2,000 bp windows using MEGA X.
- Genome-wide recombination survey: Retrieved all CoV genomes from CoVdb, clustered them to unique sequences with CD-HIT (identity >95%, coverage >95%), then aligned and scanned them with RDP4, yielding 1,149 putative events; after filtering, 532 independent recombination events remained.
- Population genetics: Grouped sequences by host and, for human isolates in CoVdb, by disease. Calculated nucleotide diversity (Pi) and Tajima’s D with VariScan, and the composite likelihood ratio (CLR) for selective sweeps with SweepFinder2. Performed Fst sliding-window analyses across hosts; significance was assessed by comparing target-region distributions with extended flanks (~1,000 bp left/right) and by genome-wide top 5% thresholds. SARS-CoV-2 temporal subsets from March (16,270 genomes) and April (10,042 genomes) were analyzed for dynamics.
- Sliding window and validation: Implemented Perl pipelines for sliding windows of nucleotide differences and population-genetic metrics; validated recombination by regional phylogenies, pairwise identity plots, and minimal genetic distances between presumed recombinants and minor parents.
- Annotations: Host, taxonomy, and metadata from CoVdb, NCBI, and GISAID; manual curation as needed.
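The population-genetic step can be illustrated with a minimal Python sketch of nucleotide diversity (Pi) and Tajima’s D over sliding windows. This is not the authors' pipeline (they used VariScan and Perl scripts); the function names `column_stats`, `pi_and_tajimas_d`, and `sliding_windows` are hypothetical, the input is assumed to be a list of equal-length aligned sequences, and columns with gaps or ambiguity codes are simply skipped.

```python
from collections import Counter
from math import sqrt, nan


def column_stats(column):
    """Return (pairwise differences, is_segregating) for one alignment column.

    Columns containing gaps or ambiguity codes return None and are skipped.
    """
    if any(base not in "ACGT" for base in column):
        return None
    n = len(column)
    counts = Counter(column)
    # Number of sequence pairs that differ at this site.
    diffs = (n * n - sum(c * c for c in counts.values())) / 2
    return diffs, len(counts) > 1


def pi_and_tajimas_d(seqs):
    """Per-site nucleotide diversity and Tajima's D for aligned sequences."""
    n = len(seqs)
    stats = [s for s in map(column_stats, zip(*seqs)) if s is not None]
    if n < 2 or not stats:
        return nan, nan
    n_pairs = n * (n - 1) / 2
    k = sum(d for d, _ in stats) / n_pairs   # mean pairwise differences
    S = sum(seg for _, seg in stats)         # number of segregating sites
    pi = k / len(stats)
    if S == 0:
        return pi, nan
    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    var = e1 * S + e2 * S * (S - 1)
    if var <= 1e-12:                         # variance vanishes for n < 4
        return pi, nan
    return pi, (k - S / a1) / sqrt(var)


def sliding_windows(seqs, size, step):
    """Return a list of (window start, Pi, Tajima's D) along the alignment."""
    length = len(seqs[0])
    return [
        (start, *pi_and_tajimas_d([s[start:start + size] for s in seqs]))
        for start in range(0, length - size + 1, step)
    ]
```

With four sequences, a site split two against two gives a positive D (an excess of intermediate-frequency variants), whereas a single singleton variant gives a negative D. Window size and step are free parameters here; the summary does not state the values the authors used.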
Key Findings
- Detected three independent recombination events among bat and pangolin coronaviruses relative to SARS-CoV-2 reference MN908947, with coordinates:
  1) 16,623–17,891 bp (RI_RNA_ORF1 region within ORF1ab)
  2) 21,187–22,368 bp (RI_RNA_Boundary spanning the 3' end of ORF1ab and the 5' end of S)
  3) 22,870–23,099 bp (RI_RNA_S within the S protein RBD)
  All three events were supported by at least six statistical tests in RDP4 (P<0.05); for the RI_RNA_S event, six of seven tests were significant (SiScan was not).
- Phylogenetic incongruence observed: Regional trees for recombination segments differed from whole-genome trees, and pairwise identity plots corroborated recombination.
- Minor/major parents:
  - RI_RNA_S (22,870–23,099): Putative recombination between strains similar to Bat-CoV-RaTG13 (major parent) and Pangolin-CoV-2019 (minor parent); the 228 bp insert encodes a 76 aa peptide within the RBD.
  - RI_RNA_ORF1 and RI_RNA_Boundary: Evidence of bat-SL-CoVZC45/ZXC21-like sequences recombined into pangolin coronavirus lineages (including Pangolin-CoV-2017 and Pangolin-CoV-2019-like).
- Genetic distances: In each recombination region, the genetic distance between the presumed recombinant and the presumed minor parent was the lowest among comparisons, consistent with recombination (e.g., SARS-CoV-2 vs Pangolin-CoV-2019 shows reduced distance within RI_RNA_S relative to flanks, Table 2).
- Selection and differentiation signals:
  - RI_RNA_S showed peaks of Fst between human and bat, human and pangolin, and other host pairs; many peaks exceeded local 0.05 or 0.1 thresholds and were supported by Wilcoxon rank-sum tests comparing RI_RNA_S with flanking distributions.
  - CLR analyses in SARS-CoV-2 revealed significant or weakly significant peaks flanking RI_RNA_S in the March and April datasets, suggesting directional selection acting near the integrated segment.
  - Additional CLR and Fst peaks indicated that RI_RNA_ORF1 and RI_RNA_Boundary are also evolutionarily active.
- Diversity patterns:
  - Bats exhibited high nucleotide diversity (Pi) in the RBD, including a Pi peak at RI_RNA_S across multiple bat clades; human clades lacked significant Pi peaks in RI_RNA_S.
  - Bat coronaviruses had higher Tajima’s D than human coronaviruses in the RBD in pooled analyses, though clade-based analyses suggested population structure effects.
- Recombination landscape across hosts:
  - Identified 532 independent recombination events among coronaviruses; bats harbored the highest number of events and subclades among 32 hosts.
  - Among inter-host recombinations, bat–human pairs were most frequent; 43.5% (37/85) of human-related events involved bats, and 100% (10/10) of pangolin-related events involved bats.
- Interpretation: The RI_RNA_S recombination between a RaTG13-like and a Pangolin-CoV-2019-like virus likely contributed to the emergence of SARS-CoV-2, with subsequent adaptive evolution facilitating host specificity and transmission.
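The idea behind the pairwise identity plots mentioned above can be sketched in a few lines of Python: along an alignment, a recombinant segment shows up as a window where the query is more similar to the minor parent than to the major parent. This is an illustrative sketch only, not the RDP4 tests the study relied on; the function names and the toy sequences are hypothetical.

```python
from math import nan


def window_identity(a, b, size, step):
    """Per-window fraction of identical positions between two aligned sequences.

    Positions where either sequence has a gap ('-') are ignored.
    """
    result = []
    for start in range(0, len(a) - size + 1, step):
        pairs = [
            (x, y)
            for x, y in zip(a[start:start + size], b[start:start + size])
            if x != "-" and y != "-"
        ]
        identity = sum(x == y for x, y in pairs) / len(pairs) if pairs else nan
        result.append((start, identity))
    return result


def minor_parent_windows(query, major, minor, size, step):
    """Window starts where the query is closer to the minor than the major parent."""
    to_major = window_identity(query, major, size, step)
    to_minor = window_identity(query, minor, size, step)
    return [
        start
        for (start, id_major), (_, id_minor) in zip(to_major, to_minor)
        if id_minor > id_major
    ]


# Toy alignment (made-up data): the query matches the major parent
# everywhere except positions 10-19, which match the minor parent.
major = "A" * 30
minor = "C" * 30
query = "A" * 10 + "C" * 10 + "A" * 10
print(minor_parent_windows(query, major, minor, size=10, step=10))  # prints [10]
```

On real data the comparison would run on a multiple sequence alignment with the recombinant, major parent, and minor parent rows extracted, and a flagged window would still need the statistical support and regional phylogenies described in the findings.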
Discussion
Comprehensive analyses of tens of thousands of coronavirus genomes revealed three robust recombination events involving bat and pangolin coronaviruses, including a 228 bp segment within the SARS-CoV-2 spike RBD (RI_RNA_S). Phylogenetic incongruence, reduced genetic distances to the minor parent within the recombination window, and multiple recombination tests collectively support a recombination origin for this segment. Population-genetic signals (Fst peaks, CLR elevations) around RI_RNA_S and other recombinant regions indicate differentiation among hosts and signatures consistent with recent selection, supporting a role for the integrated sequence in host adaptation. The data suggest that a RaTG13-like backbone acquiring an RBD segment from a Pangolin-CoV-2019-like lineage could underlie the origin of SARS-CoV-2. More broadly, bats emerged as a major reservoir of coronavirus diversity and recombination, frequently interfacing with human and pangolin lineages, which may facilitate cross-species transmission events. While these findings strengthen a recombination-based origin hypothesis, direct progenitor viruses have not been isolated, and limitations in non-human sampling warrant caution.
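To make the Fst signal concrete, here is a minimal Python sketch of a windowed Fst between two host groups. It assumes the Hudson, Slatkin and Maddison form, Fst = 1 - Hw/Hb (mean within-group over mean between-group pairwise differences); the summary does not say which estimator the authors used, so this choice, the function names, and the toy input are all assumptions.

```python
from itertools import combinations, product
from math import nan


def hamming(a, b):
    """Number of differing positions between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))


def mean_within(pop):
    """Mean pairwise differences within one group (needs >= 2 sequences)."""
    pairs = list(combinations(pop, 2))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


def mean_between(pop1, pop2):
    """Mean pairwise differences between sequences of two groups."""
    pairs = list(product(pop1, pop2))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


def hudson_fst(pop1, pop2):
    """Fst = 1 - Hw / Hb; returns nan when the groups are identical."""
    hw = (mean_within(pop1) + mean_within(pop2)) / 2
    hb = mean_between(pop1, pop2)
    return nan if hb == 0 else 1 - hw / hb


def fst_windows(pop1, pop2, size, step):
    """Return a list of (window start, Fst) along the alignment."""
    length = len(pop1[0])
    return [
        (
            start,
            hudson_fst(
                [s[start:start + size] for s in pop1],
                [s[start:start + size] for s in pop2],
            ),
        )
        for start in range(0, length - size + 1, step)
    ]
```

A window in which the two groups carry fixed differences and little internal variation scores close to 1, which is the kind of peak reported around RI_RNA_S; a window with no between-group differences is returned as `nan` rather than 0 so it is not mistaken for a measured value.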
Conclusion
This study identifies and validates three inter-host recombination events in coronaviruses, including a 228 bp RBD segment in SARS-CoV-2 that was likely transferred from a Pangolin-CoV-2019-like virus into a RaTG13-like backbone. Population-genetic evidence indicates that this region has been under directional selection, potentially contributing to SARS-CoV-2’s host adaptation. Bats harbor extensive coronavirus diversity and recombination activity, positioning them as a key genetic reservoir for the emergence of novel human coronaviruses. Future research should expand genomic surveillance of non-human hosts (especially bats and pangolins), recover closer progenitor viruses, and functionally characterize the RI_RNA_S segment to elucidate its role in receptor binding and host range. Enhanced monitoring and reduced human–wildlife contact are recommended to mitigate future spillovers.
Limitations
- Lack of direct isolation of SARS-CoV-2 from bats or pangolins; progenitor viruses remain unsampled.
- Limited availability and uneven sampling of non-human coronavirus genomes; pooling across subclades and time may inflate diversity estimates and confound statistics.
- Population structure and sample size differences (e.g., fewer bat samples vs. more human samples) may influence metrics like Tajima’s D.
- Inference of recombination relies on statistical and phylogenetic signals; while multiple tests support events, alternative evolutionary processes cannot be fully excluded.
- Functional effects of the RI_RNA_S segment were not experimentally tested; conclusions about adaptation are based on population-genetic signatures.