Annotation of selection strengths in viral genomes

Stephen McCauley, Saskia de Groot, Thomas Mailund, Jotun Hein
2007 Computer applications in the biosciences : CABIOS  
Motivation: Viral genomes tend to code in overlapping reading frames to maximize informational content. This may result in atypical codon bias and particular evolutionary constraints. Due to the fast mutation rate of viruses, there is additional strong evidence for varying selection between intra- and intergenomic regions. The presence of multiple coding regions complicates the concept of the Ka/Ks ratio, and thus calls for an alternative approach when investigating selection strengths. Building on the paper by McCauley and Hein, we develop a method for annotating a viral genome coding in overlapping reading frames. We introduce an evolutionary model capable of accounting for varying levels of selection along the genome, and incorporate it into our prior single-sequence HMM methodology, extending it to a phylogenetic HMM. Given an alignment of several homologous viruses to a reference sequence, we can thus annotate both the coding regions and the selection strengths, allowing us to investigate different selection patterns and hypotheses.
Results: We illustrate our method by applying it to a multiple alignment of four HIV2 sequences, and to one of three Hepatitis B sequences. We obtain an annotation of the coding regions, as well as, for each site, a posterior probability of the strength of selection acting on it. From this we can deduce the average posterior selection acting on the different genes. While we are encouraged to see that in HIV2 the genes gag and pol, which are known to be conserved, are indeed annotated as such, we also discover several sites of less stringent negative selection within the env gene. To the best of our knowledge, we are the first to provide a full selection annotation of the Hepatitis B genome by explicitly modelling the evolution within overlapping reading frames, rather than relying on simple Ka/Ks ratios.
Availability: The Matlab code can be downloaded from http:// www.stats.ox.ac.uk/ mccauley/
Contact: degroot@stats.ox.ac.uk
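The step from per-site posterior probabilities to an average posterior selection per gene can be illustrated with a toy example. The sketch below is not the authors' Matlab implementation or their phylogenetic HMM: it assumes a plain two-state HMM whose hidden states are hypothetical selection classes, and it takes the per-column likelihoods as given numbers, whereas in a phylogenetic HMM they would be computed from the alignment column and a tree. All names and values (`posterior_state_probs`, `mean_selection_per_gene`, `strengths`, the gene coordinates) are invented for illustration.

```python
def posterior_state_probs(init, trans, emit):
    """Scaled forward-backward algorithm for a discrete-state HMM.

    init[k]     : prior probability of state k at the first site
    trans[j][k] : probability of moving from state j to state k
    emit[t][k]  : likelihood of alignment column t under state k
    Returns post[t][k] = P(state k at site t | all columns).
    """
    n_sites, n_states = len(emit), len(init)

    # Forward pass, normalised at every site to avoid underflow.
    fwd = []
    cur = [init[k] * emit[0][k] for k in range(n_states)]
    total = sum(cur)
    fwd.append([c / total for c in cur])
    for t in range(1, n_sites):
        cur = [
            emit[t][k] * sum(fwd[t - 1][j] * trans[j][k] for j in range(n_states))
            for k in range(n_states)
        ]
        total = sum(cur)
        fwd.append([c / total for c in cur])

    # Backward pass, also normalised at every site.
    bwd = [[1.0] * n_states for _ in range(n_sites)]
    for t in range(n_sites - 2, -1, -1):
        cur = [
            sum(trans[j][k] * emit[t + 1][k] * bwd[t + 1][k] for k in range(n_states))
            for j in range(n_states)
        ]
        total = sum(cur)
        bwd[t] = [c / total for c in cur]

    # Combine and renormalise to obtain per-site posteriors.
    post = []
    for t in range(n_sites):
        cur = [fwd[t][k] * bwd[t][k] for k in range(n_states)]
        total = sum(cur)
        post.append([c / total for c in cur])
    return post


def mean_selection_per_gene(post, strengths, genes):
    """Average the posterior expected selection strength over each gene.

    strengths[k] : selection strength attached to state k (hypothetical values)
    genes        : name -> (start, end), half-open site ranges
    """
    means = {}
    for name, (start, end) in genes.items():
        per_site = [
            sum(p * w for p, w in zip(post[t], strengths)) for t in range(start, end)
        ]
        means[name] = sum(per_site) / len(per_site)
    return means


# Toy input: state 0 = strong negative selection, state 1 = weak selection.
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
emit = [[0.9, 0.1]] * 3 + [[0.2, 0.8]] * 3   # made-up column likelihoods
strengths = [0.1, 1.0]                        # made-up strength per state
genes = {"geneA": (0, 3), "geneB": (3, 6)}    # made-up gene coordinates

post = posterior_state_probs(init, trans, emit)
means = mean_selection_per_gene(post, strengths, genes)
print(means)
```

Averaging the posterior expectation, rather than the strength of the single most probable state at each site, keeps the per-site uncertainty in the gene-level summary. Overlapping reading frames would need a larger state space than this two-state toy.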
doi:10.1093/bioinformatics/btm472 pmid:17921171 fatcat:pzluqel5obbdbf4u6mx3pz3vui