The proximal origin of SARS-CoV-2

K. G. Andersen, W. I. Lipkin, et al.

This correspondence examines the origin of SARS-CoV-2, using comparative genomic data to characterize the virus's notable features. The authors, including Kristian G. Andersen and W. Ian Lipkin, conclude that the virus most likely arose through natural selection rather than in a laboratory setting.
Introduction

The correspondence addresses the origin of SARS-CoV-2, the seventh known human-infecting coronavirus, amid a rapidly expanding COVID-19 outbreak. The authors analyze genomic features to assess whether the virus arose through natural processes or laboratory manipulation. They focus on notable characteristics of the spike gene that influence host receptor binding and proteolytic activation, aiming to determine plausible evolutionary scenarios for emergence. Establishing the origin is important for preventing future zoonoses and understanding viral transmissibility and pathogenesis.

Literature Review

Background literature highlights that SARS-CoV, MERS-CoV and SARS-CoV-2 can cause severe disease while HKU1, NL63, OC43 and 229E typically cause mild illness. Prior work identified six key receptor-binding domain (RBD) residues governing ACE2 binding and host range in SARS-like coronaviruses and documented variability in the spike protein, particularly in the RBD. Studies have shown the role of polybasic (furin) cleavage sites in viral infectivity and host range across viruses, including coronaviruses and avian influenza, as well as the potential immunoevasive functions of mucin-like domains with O-linked glycans. Comparative genomic resources include bat coronavirus RaTG13 and pangolin coronaviruses, which provide close but distinct relatives to SARS-CoV-2 for contextual analysis. Reverse genetics systems for betacoronaviruses are known, and there is precedent for selection of polybasic cleavage sites in other viruses after passage in cell culture or animals.

Methodology

The authors performed comparative genomic analyses of alpha- and betacoronaviruses, focusing on spike protein sequence alignments and on inspection of the receptor-binding domain (RBD) and the S1–S2 junction. They mapped the six RBD contact residues (SARS-CoV positions Y442, L472, N479, D480, T487 and Y491, which correspond to SARS-CoV-2 positions L455, F486, Q493, S494, N501 and Y505) and compared them across human SARS-CoV-2, bat RaTG13, pangolin CoVs, SARS-CoV, and related bat SARS-like CoVs. At the S1–S2 boundary they identified an insertion (PRRA) that creates a polybasic furin cleavage site (RRAR), and they inferred three adjacent predicted O-linked glycosylation sites (S673, T678, S686). Published structural and biochemical data were used to interpret ACE2-binding affinity and function. They weighed three evolutionary scenarios (natural selection in an animal host before zoonosis, natural selection in humans after zoonosis, and selection during laboratory passage) by integrating sequence similarity (e.g., ~96% overall identity to RaTG13 and RBD similarity to pangolin CoVs), known patterns of cleavage-site acquisition in other viruses, and epidemiological timing (estimates of the time to the most recent common ancestor, tMRCA, of late November to early December 2019).
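The residue mapping described above can be checked with a few lines of Python. This is a minimal sketch, not the authors' pipeline: it assumes the two residue lists are paired by order, as the summary presents them, and the function name `compare_contact_residues` is hypothetical.

```python
# Key RBD contact residues as listed in the summary (amino-acid letter + position).
# Assumption: the i-th SARS-CoV entry corresponds to the i-th SARS-CoV-2 entry.
SARS_COV = ["Y442", "L472", "N479", "D480", "T487", "Y491"]
SARS_COV_2 = ["L455", "F486", "Q493", "S494", "N501", "Y505"]


def compare_contact_residues(ref, query):
    """Pair residues by list order and flag whether the amino acid differs."""
    if len(ref) != len(query):
        raise ValueError("residue lists must have the same length")
    rows = []
    for r, q in zip(ref, query):
        # The first character of each label is the one-letter amino-acid code.
        rows.append((r, q, r[0] != q[0]))
    return rows


rows = compare_contact_residues(SARS_COV, SARS_COV_2)
n_different = sum(1 for _, _, differs in rows if differs)

for ref, query, differs in rows:
    print(f"{ref} -> {query}: {'differs' if differs else 'conserved'}")
print(f"{n_different} of {len(rows)} contact residues differ")
```

Only the last pair (Y491 and Y505) shares an amino acid, so the count comes out at five of six, which matches the figure reported in the key findings.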

Key Findings
  • Two notable genomic features characterize SARS-CoV-2: (1) an RBD that, on the basis of structural and biochemical data, appears optimized for binding to human ACE2, and (2) a functional polybasic (furin) cleavage site at the S1–S2 boundary created by a 12-nucleotide insertion encoding PRRA, accompanied by predicted O-linked glycans at S673, T678, and S686.
  • In the RBD, five of six key ACE2-contact residues differ between SARS-CoV-2 and SARS-CoV (SARS-CoV positions Y442, L472, N479, D480, T487, Y491 correspond to SARS-CoV-2 L455, F486, Q493, S494, N501, Y505), yet SARS-CoV-2 exhibits high-affinity ACE2 binding, implying that natural selection found an alternative optimal solution rather than that the sequence was purposefully designed.
  • Computational analyses indicate that the SARS-CoV-2 RBD sequence differs from those previously predicted to be optimal for ACE2 binding in SARS-CoV, which supports natural evolution rather than engineering.
  • The PRRA insertion creates a polybasic furin site (RRAR) not seen previously in lineage B betacoronaviruses; the local structure likely supports O-linked glycosylation, potentially forming a mucin-like shield.
  • RaTG13 (bat, Rhinolophus affinis) is ~96% identical overall to SARS-CoV-2 but diverges in the RBD, suggesting less efficient human ACE2 binding; some pangolin CoVs possess RBDs highly similar to SARS-CoV-2, including all six key residues, demonstrating that such RBDs arise in nature.
  • No sampled bat or pangolin CoVs currently show the polybasic cleavage site, but insertions/deletions at S1–S2 junctions can occur naturally, and coronavirus diversity is massively undersampled.
  • The genetic data indicate SARS-CoV-2 is not derived from any previously used laboratory backbone, arguing against purposeful manipulation.
  • Plausible origin scenarios: (i) natural selection in an animal host before zoonosis; (ii) natural selection in humans after zoonosis. Selection during laboratory passage is disfavored because it would require an unsampled near-progenitor, repeated passage on human-like ACE2, and acquisition of both a polybasic site and O-linked glycan features, which is not supported by existing reports.
  • Estimates of the time to the most recent common ancestor (tMRCA) place emergence in late November to early December 2019. As of 11 March 2020, there were 121,564 confirmed cases in more than 110 countries, with 4,373 deaths.
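The relationship between the PRRA insertion and the RRAR cleavage site can be illustrated with a short Python sketch. Several things here are assumptions rather than content from the correspondence: the flanking peptide fragment is hand-typed for illustration and should be checked against a reference spike sequence, the insertion index is chosen to fit that fragment, and the scan uses only the minimal furin motif R-X-X-R (stricter consensus forms exist). The helper names are hypothetical.

```python
import re

# Assumption: minimal furin recognition motif R-X-X-R, where X is any residue.
MINIMAL_FURIN_MOTIF = re.compile(r"R..R")

# Assumption: short illustrative fragment of the S1-S2 junction without the
# insertion, typed by hand; verify against a reference sequence before reuse.
WITHOUT_INSERT = "TNSRSVAS"
INSERT = "PRRA"        # the four inserted amino acids named in the text
INSERT_INDEX = 3       # hypothetical position within the toy fragment


def insert_motif(fragment, insert, index):
    """Return the fragment with `insert` spliced in before position `index`."""
    return fragment[:index] + insert + fragment[index:]


def find_polybasic_site(fragment):
    """Return the first R-X-X-R match in the fragment, or None if absent."""
    match = MINIMAL_FURIN_MOTIF.search(fragment)
    return match.group(0) if match else None


with_insert = insert_motif(WITHOUT_INSERT, INSERT, INSERT_INDEX)

# A 12-nucleotide insertion spans 12 / 3 = 4 codons, one per inserted residue.
codons_inserted = 12 // 3

print(with_insert, find_polybasic_site(with_insert))
print(WITHOUT_INSERT, find_polybasic_site(WITHOUT_INSERT))
print(codons_inserted == len(INSERT))
```

In this toy fragment the motif appears only after the insertion, because PRRA places two arginines upstream of the arginine that was already present. A pattern scan like this only flags candidate sites; it says nothing about whether a site is actually cleaved.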
Discussion

The observed spike protein features—an RBD capable of high-affinity binding to human ACE2 achieved via a sequence distinct from previously recognized optimal motifs, and a unique polybasic S1–S2 cleavage site with predicted O-linked glycans—strongly support natural evolutionary processes rather than purposeful genetic manipulation. The presence of closely matching RBDs in pangolin coronaviruses and high overall genome similarity to a bat coronavirus (RaTG13), despite RBD differences, provides a coherent evolutionary context consistent with zoonosis. Considering known mechanisms by which polybasic sites arise in other viruses and the lack of evidence for a laboratory backbone, laboratory manipulation is deemed implausible. The findings narrow plausible origins to natural selection either in an intermediate animal host before human spillover or during early, possibly cryptic, human-to-human transmission after spillover. Understanding these pathways is crucial for assessing re-emergence risks, guiding surveillance, and informing studies of transmissibility and pathogenesis.
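The point that a genome can be highly similar overall while diverging in one region can be made concrete with a small Python sketch. The sequences below are invented toy strings, not viral data, and the function `percent_identity` and the window coordinates are hypothetical; the sketch also assumes the two sequences are already aligned to equal length, with `-` marking gaps.

```python
def percent_identity(a, b):
    """Percent of aligned columns with identical symbols, skipping gap columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # ignore columns where either sequence has a gap
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Toy aligned sequences (assumption: invented for illustration only).
SEQ_A = "ACGTACGTACGTACGTACGT"
SEQ_B = "ACGTACGTACGAATTTACGT"

# Hypothetical window standing in for a divergent region such as the RBD.
WINDOW = slice(11, 16)

overall = percent_identity(SEQ_A, SEQ_B)
regional = percent_identity(SEQ_A[WINDOW], SEQ_B[WINDOW])

print(f"overall identity:  {overall:.1f}%")
print(f"window identity:   {regional:.1f}%")
```

Here the mismatches are concentrated in the chosen window, so the regional identity falls well below the overall figure. That is the pattern the summary describes for RaTG13, which is close to SARS-CoV-2 genome-wide but divergent in the RBD.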

Conclusion

Comparative genomic analyses indicate that SARS-CoV-2 is not a laboratory construct or purposefully manipulated virus. Its key features—the ACE2-binding RBD and the polybasic S1–S2 cleavage site with predicted O-linked glycans—have parallels in naturally occurring coronaviruses, supporting origin via natural selection either in an animal host prior to zoonosis or in humans after zoonosis. To resolve the origin more definitively, future work should prioritize: discovery and sequencing of closely related animal coronaviruses (especially those with partial or full polybasic cleavage sites), identification of any intermediate host, functional and animal-model studies to assess the impact of these genomic features on transmissibility and pathogenesis, retrospective analyses of early human cases, and enhanced surveillance of pneumonia in humans and animals.

Limitations

The correspondence relies on comparative genomics and existing structural and biochemical evidence; some predicted features of SARS-CoV-2 (e.g., O-linked glycosylation and the functional consequences of the polybasic site) have not been demonstrated by direct experiment. The diversity of animal coronaviruses is vastly undersampled, and no direct progenitor virus has been identified. At present it remains impossible to definitively prove or disprove any of the origin scenarios. Estimates of emergence timing are uncertain, and retrospective serological data that could detect cryptic transmission are limited and potentially confounded by cross-reactivity.
