Incremental discovery of the irredundant motif bases for all suffixes of a string in O(n^2 log n) time

Alberto Apostolico, Claudia Tagliacollo
2008 Theoretical Computer Science  
Compact bases formed by motifs called "irredundant" and capable of generating all other motifs in a sequence have been proposed in recent years and successfully tested in tasks of biosequence analysis and classification. Given a sequence s of n characters drawn from an alphabet Σ, the problem of extracting such a base from s had been previously solved in time O(n^2 log n log |Σ|) and O(|Σ| n^2 log^2 n log log n), respectively, using the FFT-based string searching by Fischer and Paterson.
More recently, a solution on binary strings taking time O(n^2) without resorting to the FFT was also proposed. In the present paper, we considered the problem of incrementally extracting the bases of all suffixes of a string. This problem was solved in a previous work in time O(n^3). A much faster incremental algorithm is described here, which takes time O(n^2 log n) for binary strings. Although this algorithm does not make use of the FFT, its performance is comparable to the one exhibited by the previous FFT-based algorithms involving the computation of only one base. The implicit representation of a single base requires O(n) space, whence for finite alphabets the proposed solution is within a log n factor from optimality.
doi:10.1016/j.tcs.2008.08.002 fatcat:wjvrtw4pbrel7epltf3an5xxoi