Evolutionary Origin of SARS-CoV-2 (COVID-19 Virus) and SARS Viruses through the Identification of Novel Protein/DNA Sequence Features Specific for Different Clades of Sarbecoviruses
Both SARS-CoV-2 (COVID-19) and SARS coronaviruses (CoVs) are members of the subgenus Sarbecovirus. To understand the origin of SARS-CoV-2 and its relation to other viruses, protein sequences from sarbecoviruses were analyzed to identify conserved inserts or deletions (termed CSIs) demarcating either particular clusters/lineages of sarbecoviruses or those shared by specific lineages shedding light on their interrelationships. We report several clade-specific CSIs in the spike (S) and
... (S) and nucleocapsid (N) proteins that reliably demarcate distinct sarbecoviruses clades providing important insights into the origin and evolution of SARS-CoV-2. Two CSIs in the N-terminal domain (NTD) of S-protein are uniquely shared by SARS-CoV-2, BatCoV-RaTG13 and most pangolin CoVs (SARS-CoV-2r cluster); another CSI supports a closer relationship of SARS-CoV-2 to BatCov-RaTG13. Three additional CSIs in the NTD are specific for two Bat-SARS-like CoVs (viz. CoVZXC21 and CoVZC45; CoVZC cluster) which form an outgroup of the SARS-CoV-2r cluster. Interestingly, one of the pangolin-CoV-MP789 also shares these CSIs but lack the CSIs specific for the SARS-CoV-2r cluster. The N-terminal sequence (aa 1-320) of the S-protein for pangolin-CoV-MP789 shows highest similarity (85.94%) to the CoVZC cluster, while its C-terminal region including the receptor binding domain (RBD) is most similar (97-98% identity) to the SARS-CoV-2 virus. These observations indicate that the spike protein sequence for the strain MP789 is of chimeric origin. Multiple CSIs described here also distinguish two bat SARS-CoVs strains (BM48-31/BGR/2008 and SARS_BtKY72) from all others. Our work also clarifies that two large CSIs (5 aa and 13 aa) found in the RBD of S-protein are mainly specific for the SARS and SARS-CoV-2r clusters of CoVs. The surface loops formed by these CSIs are predicted to be important in the binding of S-protein with the human ACE-2 receptor. Lastly, we have mapped the locations of different CSIs in the structure of the S-protein. These studies reveal that the three CSIs specific for the SARS-CoV-2r cluster form distinct surface-exposed loops/patches on the S-protein. As the surface-exposed loops play important roles in mediating novel interactions, the novel lobes/patches formed by the SARS-CoV-2-specific CSIs in the spike protein are predicted to play important roles in the interaction of this protein with other surface-exposed components in the host cells thereby enhancing the binding/infectivity of this virus to humans.