Investigating function roles of hypothetical proteins encoded by the Mycobacterium tuberculosis H37Rv genome

Zhiyuan Yang, Xi Zeng, Stephen Kwok-Wing Tsui
2019 BMC Genomics  
Mycobacterium tuberculosis (MTB) is a common bacterium causing tuberculosis and remains a major pathogen for mortality. Although the MTB genome has been extensively explored for two decades, the functions of 27% (1051/3906) of encoded proteins have yet to be determined and these proteins are annotated as hypothetical proteins. Methods: We assigned functions to these hypothetical proteins using SSEalign, a newly designed algorithm utilizing structural information. A set of rigorous criteria was
more » ... orous criteria was applied to these annotations in order to examine whether they were supported by each parameter. Virulence factors and potential drug targets were also screened among the annotated proteins. Results: For 78% (823/1051) of the hypothetical proteins, we could identify homologs in Escherichia coli and Salmonella typhimurium by using SSEalign. Functional classification analysis indicated that 62.2% (512/823) of these annotated proteins were enzymes with catalytic activities and most of these annotations were supported by at least two other independent parameters. A relatively high proportion of transporter was identified in MTB genome, indicating the potential frequent transportation of frequent absorbing essential metabolites and excreting toxic materials in MTB. Twelve virulence factors and ten vaccine candidates were identified within these MTB hypothetical proteins, including two genes (rpoS and pspA) related to stress response to the host immune system. Furthermore, we have identified six novel drug target candidates among our annotated proteins, including Rv0817 and Rv2927c, which could be used for treating MTB infection. Conclusions: Our annotation of the MTB hypothetical proteins will probably serve as a useful dataset for future MTB studies.
doi:10.1186/s12864-019-5746-6 fatcat:si6ianhadzatpmyjsdtnec56ee