Parameterized Intractability of Distinguishing Substring Selection

Jens Gramm, Jiong Guo, Rolf Niedermeier
2004 Theory of Computing Systems  
A central question in computational biology is the design of genetic markers to distinguish between two given sets of (DNA) sequences. This question is formalized as the NP-complete Distinguishing Substring Selection problem (DSSS for short), which asks, given a set of "good" strings and a set of "bad" strings, for a solution string that is, with respect to the Hamming metric, "away" from the good strings and "close" to the bad strings. More precisely, given integers $d_g$, $d_b$, and $L$, we ask for a length-$L$ string $s$ such that $s$ has Hamming distance at least $d_g$ to every length-$L$ substring of the good strings and such that every bad string has a length-$L$ substring with Hamming distance at most $d_b$ to $s$. Studying the parameterized complexity of DSSS, we show that, already for a binary alphabet, DSSS is W[1]-hard with respect to its natural parameters. In particular, this implies that the polynomial-time approximation scheme (PTAS) recently given by Deng et al. [6, 7] cannot be replaced by a so-called efficient polynomial-time approximation scheme (EPTAS) [4] unless an unlikely collapse in parameterized complexity theory occurs. This is seemingly the first computational biology problem for which such a border between PTAS (which exists) and EPTAS (which is unlikely to exist) could be established. By way of contrast, for a special case of DSSS, we present an exact fixed-parameter algorithm that solves the problem efficiently. In this way, we also exhibit a sharp border between fixed-parameter tractability and intractability results.

* An extended abstract of this paper appeared under the title "On exact and approximation algorithms for …
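To make the DSSS definition concrete, here is a minimal sketch (not the authors' method) that checks the two conditions for a candidate string $s$ and, for tiny instances only, searches all $|\Sigma|^L$ candidates exhaustively. The function names `hamming`, `substrings`, `is_dsss_solution`, and `brute_force_dsss` are hypothetical, and the exhaustive search is exponential in $L$. It is neither the fixed-parameter algorithm for the special case nor the PTAS of Deng et al.; it only illustrates what a solution must satisfy.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions where equal-length strings a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def substrings(t: str, length: int) -> list:
    """All length-`length` substrings of t (empty list if t is shorter)."""
    return [t[i:i + length] for i in range(len(t) - length + 1)]


def is_dsss_solution(s: str, good: list, bad: list, d_g: int, d_b: int) -> bool:
    """Check both DSSS conditions for a candidate s, with L = len(s)."""
    L = len(s)
    # "Away" from good: every length-L substring of every good string
    # must be at Hamming distance >= d_g from s.
    far_from_good = all(
        hamming(s, w) >= d_g for g in good for w in substrings(g, L)
    )
    # "Close" to bad: every bad string must contain some length-L
    # substring at Hamming distance <= d_b from s.
    close_to_bad = all(
        any(hamming(s, w) <= d_b for w in substrings(b, L)) for b in bad
    )
    return far_from_good and close_to_bad


def brute_force_dsss(good, bad, L, d_g, d_b, alphabet="01"):
    """Hypothetical exhaustive solver: tries all |alphabet|**L strings.

    Only usable for very small L; returns the first solution found
    in lexicographic order of `alphabet`, or None if none exists.
    """
    for letters in product(alphabet, repeat=L):
        s = "".join(letters)
        if is_dsss_solution(s, good, bad, d_g, d_b):
            return s
    return None


if __name__ == "__main__":
    # Toy binary instance: s must differ from "000" in >= 2 positions
    # and lie within distance 1 of some window of each bad string.
    print(brute_force_dsss(["0000"], ["1101", "0111"], L=3, d_g=2, d_b=1))
```

On the toy instance the search returns `101`: it has two 1s (distance 2 from the good window `000`), occurs verbatim in `1101`, and is at distance 1 from `111` in `0111`. Already for a binary alphabet the candidate space doubles with each unit of $L$, which is consistent with the hardness discussed above.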
doi:10.1007/s00224-004-1185-z