Analogs and Duals of the MAST Problem for Sequences and Trees
Lecture Notes in Computer Science
Two natural kinds of problems about "structured collections of symbols" can be generally referred to as the Largest Common Subobject and the Smallest Common Superobject problems, which we consider here as the dual problems of interest. For the case of rooted binary trees where the symbols occur as leaf-labels and a subobject is defined by label-respecting hereditary topological containment, both of these problems are NP-complete, as are the analogous problems for sequences (the well-known
... Common Subsequence and Shortest Common Supersequence problems). However, when the trees are restricted so that each symbol occurs as a leaf-label at most once (we call such a tree a phylogenetic tree or p-tree), the Largest Common Subobject problem, better known as the Maximum Agreement Subtree (MAST) problem, is solvable in polynomial time. We explore the complexity of the basic subobject and superobject problems for sequences and binary trees when the inputs are restricted to p-trees and p-sequences (p-sequences are sequences where each symbol occurs at most once). We prove that the sequence analog of MAST can be solved in polynomial time. The Shortest Common Supersequence problem restricted to inputs consisting of a collection of p-sequences (pSCS) remains NP-complete, as does the analogous Smallest Common Supertree problem restricted to p-trees (pSCT). We also show that both problems are hard for the parameterized complexity class W[1] when the parameter is the number of input trees or sequences. We prove fixed-parameter tractability for pSCS and pSCT when the k input sequences (trees) are restricted to be complete: every symbol of Σ occurs exactly once in each object, the question is whether there is a common superobject of size bounded by |Σ| + r, and the parameter is the pair (k, r). We show that without this restriction, both problems are harder than Directed Feedback Vertex Set, for which the parameterized complexity is famously unresolved. We describe an application of the tractability result for pSCT in the study of gene duplication events, where k and r are naturally small.
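To make the polynomial-time claim for the sequence analog of MAST concrete, the following is a minimal sketch of one standard approach, not necessarily the construction used in the paper. It assumes the inputs are p-sequences, keeps only the symbols present in every input, and finds a longest chain of symbols whose relative order agrees in all inputs. The function name `longest_common_p_subsequence` is hypothetical.

```python
def longest_common_p_subsequence(seqs):
    """Return a longest common subsequence of p-sequences as a list.

    Assumption: every input is a p-sequence (no symbol repeats).
    Runs in O(k * n^2) time for k sequences over n common symbols.
    """
    if not seqs:
        return []
    for s in seqs:
        if len(set(s)) != len(s):
            raise ValueError("each input must be a p-sequence (no repeated symbols)")

    # Position of every symbol in every sequence.
    pos = [{sym: i for i, sym in enumerate(s)} for s in seqs]
    # Only symbols present in all inputs can appear in a common subsequence.
    common = [sym for sym in seqs[0] if all(sym in p for p in pos)]

    def precedes(a, b):
        # True when a comes before b in every input sequence.
        return all(p[a] < p[b] for p in pos)

    # The order of the first sequence is a topological order of the
    # "precedes in all inputs" relation, so one left-to-right pass suffices.
    best, prev = {}, {}
    for j, b in enumerate(common):
        best[b], prev[b] = 1, None
        for a in common[:j]:
            if precedes(a, b) and best[a] + 1 > best[b]:
                best[b], prev[b] = best[a] + 1, a

    if not best:
        return []
    # Walk back from the end of a longest chain.
    end = max(common, key=lambda s: best[s])
    out = []
    while end is not None:
        out.append(end)
        end = prev[end]
    return out[::-1]
```

Because no symbol repeats, "a precedes b in every input" is a partial order on the common symbols, and a common subsequence is exactly a chain in that order. This is what separates the restricted problem from the general one, where repeated symbols prevent such a reduction.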