Revealing highly conserved regions in the E6 protein among distinct human papillomavirus types using comparative analysis of multiple sequence alignments

JE Gabriel, DDLG de Figueiredo, RP de Farias
2013 Brazilian Journal of Biology  
Several epidemiological and molecular studies have confirmed that cervical infection by certain human papillomavirus (HPV) types is a precursor of cervical neoplasia (Gravitt, 2011; Guzmán-Olea et al., 2012). It is now well established that high risk genotypes lead to the development of cervical cancer and are also associated with other mucosal anogenital tumors and with head and neck tumors. The HPV genome is approximately 8 kb in length and is divided into three regions: the non-coding long control region (LCR, ~1 kb) and the protein-coding early (E, ~4 kb) and late (L, ~3 kb) regions (Ganguly and Parihar, 2009). The viral genome encodes six early (E1, E2, E4, E5, E6 and E7) and two late (L1 and L2) proteins. The HPV E6 and E7 genes encode oncoproteins that cause transformation of the host cell and whose action is involved in the extrachromosomal maintenance of the HPV genome. Since HPV immortalisation of keratinocytes in vitro has been identified as an important step in tumor progression in vivo, some studies have provided strong experimental support for the hypothesis that the maintenance and expression of E6 and E7 proteins observed in human cervical carcinomas have pathologic significance (Hawley-Nelson et al., 1989; Ganguly and Parihar, 2009). The E6 protein of high risk HPV genotypes has been reported to prevent apoptosis by a p53-independent mechanism that involves inhibition of bax gene expression and degradation of Bax protein in human keratinocytes; as a result, cells accumulate mutations in their DNA (Magal et al., 2005). Liu et al. (2009) suggested that the E6 protein mediates telomerase activation by a posttranscriptional mechanism, potentially modulating cell telomerase/telomere function directly and playing a crucial role in both neoplasia and virus replication. Over the last decades, computational biology tools have provided new insights into the characterisation and conservation of biological sequences among distinct viral genomes. The purpose of the present study was to perform a comparative analysis of E6 protein segments between high and low risk HPV genotypes using multiple sequence alignments. For these analyses, E6 protein segments from HPV16 and 18 (high risk and virulent HPV types) and from HPV11 and 06A (low risk and non-virulent HPV types) were selected (Ganguly and Parihar, 2009).
Initially, the amino acid sequences of the E6 protein from these four genotypes were retrieved from UniProtKB/Swiss-Prot, a high-quality, annotated and non-redundant protein sequence database (Lesk, 2002; Jungo et al., 2012). This search identified previously sequenced E6 protein entries deposited under the accession numbers P03126 for HPV16, P06463 for HPV18, P04019 for HPV11 and Q84291 for HPV06A. Multiple alignments of the amino acid residues of the E6 viral protein among the distinct HPV genotypes were generated using the UniProtKB/Swiss-Prot database tools (Figure 1), which report statistical parameters such as similarity and identity. The comparative analysis of the E6 viral protein showed 56.3% similarity between the HPV16 and 18 types, and the alignment between the HPV11 and 06A genotypes showed 81.3% similarity. Multiple alignment of the segment of interest among the four HPV types resulted in 22.7% similarity, with 48 similar and 37 identical positions in a total of 158 amino acids (Figure 1). Further analysis with the Basic Local Alignment Search Tool (BLAST) provided additional statistical parameters, such as expectation values (E-values) and bit scores. The E-value of a BLAST hit represents the number of alignments with an equal or better score expected to occur by chance, whereas the bit score depends on the alignment length, the number of matches, mismatches and gaps, and the substitution matrix used to compare the sequences, and is normalised by statistical parameters (Lesk, 2002; Healy, 2007).
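The statistics described above can be illustrated with a minimal Python sketch. It is not the procedure used in the study: the chemical groupings, the function names and the toy sequences are all assumptions chosen for illustration, and here similarity is counted as identical plus chemically similar columns over the full alignment length, whereas alignment tools differ in how they define it. The E-value helper uses the standard relation E = m · n · 2^(−S′) between bit score S′ and search-space size; BLAST itself works with effective lengths, so reported values will generally differ from this approximation.

```python
# Illustrative chemical groupings (an assumption, not the scheme used in the study).
SIMILARITY_GROUPS = [
    set("AVLIM"),  # aliphatic / hydrophobic
    set("FWY"),    # aromatic
    set("ST"),     # hydroxyl-containing
    set("NQ"),     # amide
    set("DE"),     # acidic
    set("KRH"),    # basic
]


def classify_column(a, b):
    """Classify one alignment column as gap, identical, conservative or different."""
    if a == "-" or b == "-":
        return "gap"
    if a == b:
        return "identical"
    if any(a in group and b in group for group in SIMILARITY_GROUPS):
        return "conservative"
    return "different"


def alignment_stats(seq_a, seq_b):
    """Return (counts, percent identity, percent similarity) for two aligned sequences.

    Percentages are taken over the full alignment length, gaps included.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    if not seq_a:
        raise ValueError("alignment is empty")
    counts = {"identical": 0, "conservative": 0, "different": 0, "gap": 0}
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        counts[classify_column(a, b)] += 1
    length = len(seq_a)
    identity = 100.0 * counts["identical"] / length
    similarity = 100.0 * (counts["identical"] + counts["conservative"]) / length
    return counts, identity, similarity


def evalue_from_bit_score(bit_score, query_len, db_len):
    """Approximate E-value from a bit score: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)


if __name__ == "__main__":
    # Toy aligned fragments, NOT real E6 sequences.
    counts, identity, similarity = alignment_stats("MVK-LD", "MIKALE")
    print(counts, round(identity, 1), round(similarity, 1))
    # Hypothetical bit score and search-space sizes.
    print(evalue_from_bit_score(50, 100, 1000))
```

Counting gaps in the denominator is one of several conventions; dividing by the number of ungapped columns or by the shorter sequence length would give different percentages for the same alignment.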
BLAST analysis generated significant hits with an E-value of 1e-61, a maximum identity of 57% and a bit score of 178 between the HPV16 and 18 types, whereas the alignment between HPV11 and 06A gave an E-value of 1e-95, a maximum identity of 81% and a bit score of 264 (data not shown). According to Lesk (2002), E-values close to zero indicate that a match is significant, even in short sequences; the findings presented in this study therefore demonstrate marked similarity in the alignments of the amino acid residues in the E6 protein between high and low risk HPV types. Judging from the substitutions that do or do not occur in real protein sequences, the ability of a protein to tolerate a replacement is related to the chemical properties of the two amino acids involved. As indicated by the arrows in Figure 1, replacements between amino acid residues with similar chemical properties (for example, valine (V) for isoleucine (I) between HPV18 and 16, and methionine (M) for leucine (L) between HPV11 and 06A) might indicate conservative changes throughout the E6 viral protein. Such conservative changes could represent replacements of both amino acids with similar chemical properties (for
doi:10.1590/s1519-69842013000200030 pmid:23917577 fatcat:xulylwjxcfckbm4kbf72e5mvfq