String Indexing for Patterns with Wildcards [article]

Philip Bille, Inge Li Goertz, Hjalte Wedel Vildhøj, Søren Vind
2012 arXiv pre-print
We consider the problem of indexing a string t of length n to report the occurrences of a query pattern p containing m characters and j wildcards. Let occ be the number of occurrences of p in t, and σ the size of the alphabet. We obtain the following results.
- A linear space index with query time O(m + σ^j log log n + occ). This significantly improves the previously best known linear space index by Lam et al. [ISAAC 2007], which requires query time Θ(jn) in the worst case.
- An index with query time O(m + j + occ) using space O(σ^(k^2) n log^k log n), where k is the maximum number of wildcards allowed in the pattern. This is the first non-trivial bound with this query time.
- A time-space trade-off, generalizing the index by Cole et al. [STOC 2004].
We also show that these indexes can be generalized to allow variable length gaps in the pattern. Our results are obtained using a novel combination of well-known and new techniques, which could be of independent interest.
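To make the query semantics concrete, the following is a minimal sketch of the problem being solved, not of any index from the paper. It scans t directly, with no preprocessing, and reports every position where p matches. The function name `wildcard_occurrences` and the choice of `*` as the wildcard symbol are assumptions made for illustration. A pattern with m characters and j wildcards has length m + j, so this baseline takes O(n(m + j)) time per query, which is the cost an index is meant to avoid.

```python
def wildcard_occurrences(t, p, wildcard="*"):
    """Return every start position i such that p matches t[i:i+len(p)].

    A wildcard in p matches any single character of t.
    Naive baseline for illustration only: it does not build an index.
    The '*' wildcard symbol is an assumption, not taken from the paper.
    """
    n, length = len(t), len(p)
    hits = []
    # Try each alignment of p against t that fits entirely inside t.
    for i in range(n - length + 1):
        window = t[i:i + length]
        # Every pattern position must be a wildcard or equal the text character.
        if all(pc == wildcard or pc == tc for pc, tc in zip(p, window)):
            hits.append(i)
    return hits


if __name__ == "__main__":
    # Pattern with m = 2 characters and j = 1 wildcard.
    print(wildcard_occurrences("abcabdab", "ab*"))
```

Here occ is simply the length of the returned list; the indexes described above aim to report the same positions without rescanning t for each query.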
arXiv:1110.5236v2 fatcat:afrg25lvifcabo5cqmdb2jenbq