IEEE/ACM Transactions on Computational Biology and Bioinformatics

2005 IEEE Engineering in Medicine and Biology Magazine  
This paper proposes new algorithms for computing pairwise rearrangement scenarios that conserve the combinatorial structure of genomes. More precisely, we investigate the problem of sorting signed permutations by reversals without breaking common intervals. We describe a combinatorial framework for this problem that allows one to characterize classes of signed permutations for which a shortest reversal scenario that conserves all common intervals can be computed in polynomial time. In particular, we define a class of permutations for which this computation can be done in linear time with a very simple algorithm that does not rely on the classical Hannenhalli-Pevzner theory for sorting by reversals. We apply these methods to the computation of rearrangement scenarios between permutations obtained from 16 synteny blocks of the X chromosomes of the human, mouse and rat.

Index Terms: Evolution scenarios, reversals, common intervals.

I. INTRODUCTION

The reconstruction of evolution scenarios based on genome rearrangements, in particular reversals and translocations, has proven to be a powerful tool for understanding the evolution of groups of species. For eukaryotic genomes, several evolution scenarios between vertebrate genomes have recently been proposed [10], [11], [32], using the MGR and GRIMM software [9], [39]. These scenarios have yielded interesting insights into the architecture of ancestral genomes, the patterns of evolution across different lineages, and the presence of genome regions prone to rearrangements (the so-called "breakpoint reuse" hypothesis) [31], [33], [36]. Putative evolution scenarios based on rearrangements have also been computed on large datasets of prokaryotic genomes [3], [16]. In this paper, we describe new combinatorial and algorithmic results for computing such scenarios, based on the combinatorial problem of sorting by reversals.

Current approaches for sorting by reversals: At the heart of the computation of such rearrangement scenarios are the encoding of genomes by signed permutations, where each element of a permutation represents a genomic segment (from large synteny blocks in [10] to genes in the analysis of prokaryotic genomes [16]), and the problem of sorting signed permutations by reversals, introduced by Sankoff [35]: given two signed permutations, find a "good" sequence of reversals that transforms one into the other.
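To make the problem statement concrete, here is a minimal brute-force sketch in Python. It assumes encoding choices of our own: a signed permutation is a tuple of nonzero integers, the target is the identity, and a reversal is a pair of 0-based positions (i, j) whose segment is reversed with every sign flipped. Taking "good" to mean shortest (the parsimonious criterion discussed next), the hypothetical function `parsimonious_scenarios` finds the reversal distance and counts the shortest scenarios by breadth-first search. It explores up to 2^n n! signed permutations, so it is only usable on tiny inputs; it is not the Hannenhalli-Pevzner method or any algorithm of this paper.

```python
from collections import deque


def reverse(perm, i, j):
    """Reverse perm[i..j] (inclusive, 0-based) and flip the signs of its elements."""
    return perm[:i] + tuple(-x for x in reversed(perm[i:j + 1])) + perm[j + 1:]


def parsimonious_scenarios(perm):
    """Return (reversal distance, number of shortest reversal scenarios)
    from perm to the identity, by breadth-first search over all reversals.
    Exponential in len(perm): only for tiny examples."""
    start = tuple(perm)
    n = len(start)
    target = tuple(range(1, n + 1))
    dist = {start: 0}
    count = {start: 1}  # number of shortest reversal sequences reaching each state
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == target:
            # BFS pops states by nondecreasing distance, so every state at
            # distance dist[target] - 1 has already added to count[target].
            break
        for i in range(n):
            for j in range(i, n):
                nxt = reverse(cur, i, j)
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    count[nxt] = count[cur]
                    queue.append(nxt)
                elif dist[nxt] == dist[cur] + 1:
                    count[nxt] += count[cur]
    return dist[target], count[target]
```

For example, `parsimonious_scenarios((2, 1))` returns `(3, 6)`: the signed permutation (+2, +1) needs three reversals, and six distinct reversal sequences of that length sort it. Since a reversal changes exactly the positions i..j, each edge of the search corresponds to a unique reversal, so counting shortest paths counts shortest scenarios.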
The original approach defines a "good" sequence of reversals as a parsimonious one, that is, a sequence with the minimum number of reversals, which is classical in phylogenetics. This approach was pioneered, among others, by Hannenhalli and Pevzner, who described a combinatorial and algorithmic framework, known as the Hannenhalli-Pevzner theory, leading to polynomial-time algorithms that compute parsimonious sequences of reversals sorting signed permutations [21]. Their approach was later refined and simplified by several authors, and the current best algorithm for computing a parsimonious reversal scenario runs in subquadratic time [38]. Note that the length of a parsimonious reversal scenario, known as the reversal distance, can be computed in linear time [2], [8].

However, the approach based on parsimonious pairwise scenarios suffers from at least two limitations. First, it was shown in [6] that the number of such scenarios can be exponential, and it then becomes problematic to pick ...

Plan of the paper: In Section II, we define precisely the notions of reversal, scenario and common interval, and the problem of perfect sorting by reversals. In Section III, we introduce the notion of strong intervals of a signed permutation. These strong intervals form a linear-size basis of the set of common intervals of a permutation. The strong intervals of a signed permutation can be arranged in a tree structure, called the strong interval tree, which is a central combinatorial tool for designing algorithms that compute perfect scenarios. Note that the strong interval tree of a permutation is deeply related to the theory of modular decomposition of permutation graphs [7], [30]; this relationship is described in the Appendix. In Section IV, we show that perfect scenarios can be characterized precisely in terms of the vertices of the strong interval tree, which makes this structure a "guide" for computing perfect scenarios.
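The notions in this plan can be illustrated with a brute-force Python sketch. It rests on assumptions of our own: the second permutation is the identity, intervals are 0-based position ranges (i, j), and all function names are hypothetical. A common interval is taken to be a segment of P whose absolute values form a set of consecutive integers. A strong interval is a common interval that overlaps no other common interval, where two intervals overlap when they intersect and neither contains the other. A scenario is treated as perfect when it sorts P and none of its reversals overlaps (breaks) a common interval of P and the identity. Because strong intervals never overlap, each one except the whole permutation has a smallest strictly containing strong interval, which gives the parent links of the strong interval tree. The quadratic enumeration below is for illustration only; it is not the linear-size construction or the algorithms of Sections III and IV.

```python
def reverse(perm, i, j):
    """Reverse perm[i..j] (inclusive) and flip the signs of its elements."""
    return perm[:i] + tuple(-x for x in reversed(perm[i:j + 1])) + perm[j + 1:]


def common_intervals(perm):
    """Position ranges (i, j) of perm whose absolute values are consecutive
    integers, i.e. common intervals of perm and the identity (O(n^2))."""
    result = []
    for i in range(len(perm)):
        lo = hi = abs(perm[i])
        for j in range(i, len(perm)):
            lo, hi = min(lo, abs(perm[j])), max(hi, abs(perm[j]))
            if hi - lo == j - i:  # distinct values: consecutive iff max - min == length - 1
                result.append((i, j))
    return result


def overlap(a, b):
    """True if ranges a and b intersect but neither contains the other."""
    (i1, j1), (i2, j2) = a, b
    intersect = i1 <= j2 and i2 <= j1
    nested = (i1 <= i2 and j2 <= j1) or (i2 <= i1 and j1 <= j2)
    return intersect and not nested


def strong_intervals(perm):
    """Common intervals that overlap no other common interval."""
    common = common_intervals(perm)
    return [a for a in common if not any(overlap(a, b) for b in common)]


def strong_interval_tree(perm):
    """Map each strong interval to its parent: the smallest strong interval
    strictly containing it (None for the root, the whole permutation)."""
    strong = strong_intervals(perm)
    parent = {}
    for s in strong:
        supers = [t for t in strong if t != s and t[0] <= s[0] and s[1] <= t[1]]
        parent[s] = min(supers, key=lambda t: t[1] - t[0]) if supers else None
    return parent


def is_perfect(perm, reversals):
    """True if the reversals sort perm to the identity and none of them
    overlaps (breaks) a common interval of perm and the identity."""
    perm = tuple(perm)
    # Store each common interval as a set of absolute values; its positions
    # stay contiguous as long as no earlier reversal has broken it.
    blocks = [{abs(x) for x in perm[i:j + 1]} for i, j in common_intervals(perm)]
    cur = perm
    for i, j in reversals:
        pos = {abs(x): k for k, x in enumerate(cur)}
        for block in blocks:
            span = (min(pos[v] for v in block), max(pos[v] for v in block))
            if overlap((i, j), span):
                return False
        cur = reverse(cur, i, j)
    return cur == tuple(range(1, len(perm) + 1))
```

For P = (2, 3, 1), the reversal (1, 2) separates 2 from 3 and breaks the common interval {2, 3}, so `is_perfect((2, 3, 1), [(1, 2)])` is `False`. The reversals (0, 1), (0, 2), (0, 0) instead sort P through (-3, -2, 1) and (-1, 2, 3) without breaking any common interval. For the identity (1, 2, 3), the overlapping common intervals (0, 1) and (1, 2) are not strong, leaving the singletons as children of the root (0, 2).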
Building on this fact, we propose (1) a subquadratic-time algorithm for computing perfect scenarios
doi:10.1109/memb.2005.1411355