An Algorithm to Enumerate Sorting Reversals for Signed Permutations

Adam C. Siepel
2003 Journal of Computational Biology  
The rearrangement distance between single-chromosome genomes can be estimated as the minimum number of inversions required to transform the gene ordering observed in one into that observed in the other. This measure, known as "inversion distance," can be computed as the reversal distance between signed permutations. During the past decade, much progress has been made both on the problem of computing reversal distance and on the related problem of finding a minimum-length sequence of reversals, which is known as "sorting by reversals." For most problem instances, however, many minimum-length sequences of reversals exist, and in the absence of auxiliary information, no one is of greater value than the others. The problem of finding all minimum-length sequences of reversals is thus a natural generalization of sorting by reversals, yet it has received little attention. This problem reduces easily to the problem of finding all "sorting reversals" of one permutation with respect to another, that is, all reversals ρ such that, if ρ is applied to one permutation, then the reversal distance of that permutation from the other is decreased. In this paper, an efficient algorithm is derived to solve the problem of finding all sorting reversals, and experimental results are presented indicating that, while the new algorithm does not represent a significant improvement in asymptotic terms (it takes O(n^3) time, for permutations of size n; the problem can now be solved by brute force in Θ(n^3) time), it performs dramatically better in practice than the best known alternative. An implementation of the algorithm is available at www.cse.ucsc.edu/~acs.

[...] of n genes, then the genomes can be represented by permutations of size n, and their inversion distance is equal to the minimum number of "reversals" required to transform one permutation into the other, known as the reversal distance between the permutations. Here, a reversal is an operation by which contiguous elements of a permutation are changed in order: for example, (1, 2, 3, 4) → (3, 2, 1, 4). There has been considerable interest during the past decade in the reversal distance problem and in the related but distinct problem of finding an actual sequence of reversals that will "sort" one permutation with respect to another.
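The reversal operation on an ordinary (unsigned) permutation can be sketched in a few lines of Python. This is only an illustration: the function name `reverse_segment` and the 0-based, inclusive index convention are assumptions made here, not notation from the paper.

```python
def reverse_segment(perm, i, j):
    """Return a copy of perm with positions i..j (0-based, inclusive) reversed.

    Hypothetical helper for illustration; perm is a tuple of gene labels.
    """
    if not 0 <= i <= j < len(perm):
        raise ValueError("need 0 <= i <= j < len(perm)")
    # Keep the prefix and suffix, reverse only the selected contiguous block.
    return perm[:i] + perm[i:j + 1][::-1] + perm[j + 1:]


# Reproduces the example from the text: reversing the first three elements.
print(reverse_segment((1, 2, 3, 4), 0, 2))  # (3, 2, 1, 4)
```

Applying the same reversal twice restores the original permutation, which is why reversal distance is symmetric in its two arguments.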
Both of these problems have been shown to be NP-hard with ordinary permutations (Caprara, 1997), but in the case of signed permutations, where each permutation element is assigned a "+" or "-" sign, they have polynomial-time solutions (Hannenhalli and Pevzner, 1995). (With signed permutations, a reversal changes the sign of affected elements, as well as their order.) The genome rearrangement problem can be modeled with signed permutations if the direction of transcription of each gene in each genome is known. The first major step in solving the reversal-distance and sorting-by-reversals problems was apparently the recognition, by Bafna and Pevzner (1993), that the reversal distance between signed permutations was closely related to the number of cycles in a particular diagram: the "breakpoint graph," or (more colorfully) "Diagram of Reality and Desire" (Setubal and Meidanis, 1997). The breakthrough came when Hannenhalli and Pevzner (1995) characterized certain peculiar structures in the breakpoint graph, which they called "hurdles" and "fortresses," that caused the relationship between cycles and distance not to be exact. Hannenhalli and Pevzner proved that reversal distance can be exactly expressed as a function of the numbers of cycles, hurdles, and fortresses and derived an O(n^4)-time algorithm to sort by reversals (where n is the permutation size). Berman and Hannenhalli (1996) soon improved the bound for the sorting problem to O(n^2 α(n)) (where α is the inverse of Ackermann's function), and it was then further improved by Kaplan, Shamir, and Tarjan (1999) to O(n^2). Recently, Bader, Moret, and Yan (2001) have shown how to compute reversal distance (without actually sorting) in O(n) time, and Bergeron (2001) and Bergeron and Strasbourg (2001) have described an alternative sorting algorithm that takes O(n^2) time but sidesteps much of the complexity of earlier algorithms.
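For orientation, the Hannenhalli and Pevzner result is commonly written in the following form. The symbols below are not defined in this excerpt and are assumptions taken from the standard presentation: n is the permutation size, c the number of cycles in the breakpoint graph (counting trivial cycles, with the permutation framed by 0 and n + 1), h the number of hurdles, and f an indicator that equals 1 if the permutation is a fortress and 0 otherwise. Conventions for counting cycles differ between presentations, so the constant term may appear differently elsewhere.

```latex
% Exact reversal distance of a signed permutation \pi of size n
% relative to the identity (Hannenhalli-Pevzner form):
d(\pi) \;=\; n + 1 - c(\pi) + h(\pi) + f(\pi),
\qquad f(\pi) \in \{0, 1\}.

% The cycle-only relationship noted earlier survives as a lower bound:
d(\pi) \;\ge\; n + 1 - c(\pi).
```

The hurdle and fortress terms are exactly the corrections that make the cycle-based bound tight.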
All sorting-by-reversals algorithms published so far find a single minimum-length sequence of sorting reversals. While they generally can be adapted to find multiple sequences of sorting reversals, none will find all sequences. For certain search problems in the space of genome rearrangements, it can be very useful to obtain all minimum-length sequences of sorting reversals, as has been shown in the case of the reversal median problem (Siepel, 2001). Knowing all minimum-length sequences of sorting reversals also might improve the usefulness of reversal sorting algorithms in real scientific applications. One might attempt, for example, to assess the biological merits of various parsimonious rearrangement scenarios. Indeed, from a biological perspective, a single minimum-length sequence of sorting reversals is of limited value, even aside from the limitations of an inversions-only model of rearrangement. Many such sequences exist (as will be shown below), and in the absence of additional data or a richer model, no one is more plausible than the others. The problem of finding all minimum-length sequences of sorting reversals between a permutation π and a permutation φ reduces easily to the problem of finding all individual sorting reversals of an intermediate permutation π′ with respect to φ. It is this "inner" or "branching" problem, which I will call the "all sorting reversals problem" (ASR), that this paper addresses. The paper begins with a straightforward classification scheme for all possible reversals. Next, a simplified version of the problem is introduced, called the "Fortress-Free Model" (FFM), and it is shown, under the FFM, what criteria the reversals of each class must meet in order to be sorting reversals. Next, fortresses are reintroduced, the results of the previous section are adapted for the general case, and an algorithm is presented that solves ASR. Finally, experimental results are shown that demonstrate the efficiency of the algorithm and affirm its correctness.
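The reduction from "all minimum-length sequences" to "all sorting reversals" can be illustrated with a brute-force Python sketch for very small permutations. This is not the algorithm derived in the paper: distances here come from an exhaustive breadth-first search over all signed permutations, which is exponential in n and only practical for tiny inputs. All function names are hypothetical, and positions are 0-based and inclusive.

```python
from collections import deque


def signed_reversal(perm, i, j):
    """Reverse positions i..j and flip the sign of every affected element."""
    return perm[:i] + tuple(-x for x in reversed(perm[i:j + 1])) + perm[j + 1:]


def all_reversals(n):
    """Yield every interval (i, j) with 0 <= i <= j < n: n(n+1)/2 in total."""
    for i in range(n):
        for j in range(i, n):
            yield i, j


def distances_to(target):
    """BFS from target. Reversals are self-inverse, so the BFS depth of a
    permutation equals its reversal distance to target (assumed tiny n)."""
    dist = {target: 0}
    queue = deque([target])
    while queue:
        perm = queue.popleft()
        for i, j in all_reversals(len(perm)):
            nxt = signed_reversal(perm, i, j)
            if nxt not in dist:
                dist[nxt] = dist[perm] + 1
                queue.append(nxt)
    return dist


def sorting_reversals(perm, dist):
    """The ASR answer by brute force: reversals that decrease the distance."""
    return [(i, j) for i, j in all_reversals(len(perm))
            if dist[signed_reversal(perm, i, j)] == dist[perm] - 1]


def all_minimum_sequences(perm, dist):
    """Branch on every sorting reversal to list all minimum-length sequences."""
    if dist[perm] == 0:
        return [[]]
    sequences = []
    for i, j in sorting_reversals(perm, dist):
        for tail in all_minimum_sequences(signed_reversal(perm, i, j), dist):
            sequences.append([(i, j)] + tail)
    return sequences


target = (1, 2, 3)
dist = distances_to(target)
start = (-2, -1, 3)
print(dist[start])                     # 1
print(sorting_reversals(start, dist))  # [(0, 1)]
harder = (2, 1, 3)
print(dist[harder], len(all_minimum_sequences(harder, dist)))
```

Replacing the lookup `dist[...]` with a linear-time distance computation gives the brute-force approach mentioned in the abstract: O(n^2) candidate reversals, each checked in O(n) time.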
doi:10.1089/10665270360688200 pmid:12935346 fatcat:sd2dkca7qrcalozhm5wou76lyq