In silico comparative analysis of SARS-CoV-2 Nucleocapsid (N) protein using bioinformatics tools

Mehmet Emin URAS
2021 Frontiers in Life Sciences and Related Technologies  
The world is facing one of the largest pandemics, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). SARS-CoV-2 belongs to the Beta-CoV genus of the Coronaviridae family. The N protein is one of the crucial structural proteins of SARS-CoV-2; it binds to the viral genome, thereby forming the helical ribonucleoprotein core. After the virus enters the host cell, the N protein is involved in viral transcription/replication, translation, and viral assembly through interactions with host proteins. N protein sequences of SARS-CoV-2 and taxonomically related CoVs were examined using bioinformatics tools and approaches, including sequence alignment, sequence and phylogenetic analyses, and prediction of putative N-glycosylation and phosphorylation positions; predictions and comparative analyses were also performed on the 3D structures of N proteins from SARS-CoV-2 and related CoVs. A megaBLAST search revealed that, among the taxonomically related CoVs, the N protein sequence most similar to that of SARS-CoV-2 is the Bat-CoV RaTG13 N protein sequence. SARS-CoV-2 grouped with SARS, pangolin, civet and bat CoVs (RaTG13, SL ZC45 and SL ZXC21) in the nucleotide- and protein-based ML phylogenetic trees of the N protein. Some SARS-CoV-2 N proteins diverged from the other SARS-CoV-2 N proteins analyzed in the phylogenetic trees because of amino acid substitutions detected in those samples. The highest numbers of amino acid substitutions were detected in the Richmont/USA (QJA42209.1) and Greece (QIZ16579.1) samples, with 2 and 3 substituted positions, respectively. Domain analyses detected three domains: Corona_nucleocora (Pfam), the N-terminal CoV RNA-binding domain (HAMAP) and the C-terminal N protein dimerization domain (HAMAP). Possible N-glycosylation sites of the SARS-CoV-2 N protein were predicted at two positions. Possible serine, threonine and tyrosine phosphorylation sites were predicted at 100 positions, 34 of which had a predicted likelihood above 80%. 3D structure analysis based on TM scores revealed that, although the results were consistent with the taxonomy of the CoVs, the 3D structures of the N proteins of SARS-CoV-2 and the taxonomically related CoVs did not share the same fold.
doi:10.51753/flsrt.843166 fatcat:3v55w5h34fem7gztcurittn3zm
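
Two of the sequence-level steps named in the abstract, listing amino acid substitutions between aligned N protein sequences and locating candidate N-glycosylation sites, can be illustrated with a minimal Python sketch. This is not the author's pipeline: the abstract does not name the prediction tools used, the function names `find_sequons` and `list_substitutions` are hypothetical, and the sequences below are made-up toy strings rather than real N protein data. The scan only looks for the canonical N-X-S/T sequon (X not proline), which marks candidate sites and is far simpler than a trained predictor.

```python
import re

# Canonical N-glycosylation sequon: Asn, any residue except Pro, then Ser or Thr.
# The zero-width lookahead lets overlapping sequons be reported separately.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(seq):
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(seq.upper())]


def list_substitutions(ref, other):
    """List substitutions between two pre-aligned, equal-length sequences.

    Positions are 1-based alignment columns. Columns where either
    sequence has a gap ('-') are skipped, so only substitutions are reported.
    """
    if len(ref) != len(other):
        raise ValueError("sequences must be aligned to the same length")
    subs = []
    for col, (a, b) in enumerate(zip(ref, other), start=1):
        if a != b and a != "-" and b != "-":
            subs.append(f"{a}{col}{b}")
    return subs


# Toy data (assumption: invented for illustration, not SARS-CoV-2 sequences).
toy_seq = "MSDNGTQNPSANAT"
toy_ref = "MSDNG"
toy_variant = "MSENA"

print(find_sequons(toy_seq))                  # N4 and N12 qualify; N8 is followed by Pro
print(list_substitutions(toy_ref, toy_variant))
```

On real data, the inputs to `list_substitutions` would come from a multiple sequence alignment, and any sequon hit would still need a dedicated predictor or experimental evidence before being treated as a glycosylation site.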