The coding capacity of SARS-CoV-2 [article]

Yaara Finkel, Orel Mizrahi, Aharon Nachshon, Shira Weingarten-Gabbay, Yfat Yahalom-Ronen, Hadas Tamir, Hagit Achdout, Sharon Melamed, Shay Weiss, Tomer Israely, Nir Paran, Michal Schwartz (+1 others)
2020 bioRxiv   pre-print
SARS-CoV-2 is a coronavirus responsible for the COVID-19 pandemic. To understand its pathogenicity and antigenic potential, and to develop diagnostic and therapeutic tools, it is essential to portray the full repertoire of its expressed proteins. The SARS-CoV-2 coding capacity map is currently based on computational predictions and relies on homology to other coronaviruses. Since coronaviruses differ in their protein array, especially in the variety of accessory proteins, it is crucial to characterize the specific collection of SARS-CoV-2 translated open reading frames (ORFs) in an unbiased and open-ended manner. Utilizing a suite of ribosome profiling techniques, we present a high-resolution map of the SARS-CoV-2 coding regions, allowing us to accurately quantify the expression of canonical viral ORFs and to identify 23 novel unannotated viral ORFs. These ORFs include several in-frame internal ORFs lying within existing ORFs, resulting in N-terminally truncated products, as well as internal out-of-frame ORFs, which generate novel polypeptides, such as a 97 amino acid (aa) polypeptide that is translated from the ORF-N transcript. Finally, we detected a prominent initiation at a CUG codon located in the 5'UTR. Although this codon is shared by all SARS-CoV-2 transcripts, the initiation was specific to the genomic RNA, indicating that the genomic RNA harbors unique features that may affect ribosome engagement. Overall, our work reveals the full coding capacity of the SARS-CoV-2 genome, providing a rich resource that will form the basis of future functional studies and diagnostic efforts.
doi:10.1101/2020.05.07.082909 fatcat:rdfee67nc5bxvde34bjq3c7yhu