Packed Compact Tries: A Fast and Efficient Data Structure for Online String Processing

Takuya TAKAGI, Shunsuke INENAGA, Kunihiko SADAKANE, Hiroki ARIMURA
2017 IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences  
In this paper, we present a new data structure called the packed compact trie (packed c-trie), which stores a set S of k strings of total length n in n log σ + O(k log n) bits of space and supports fast pattern matching queries and updates, where σ is the alphabet size. Assume that α = log_σ n letters are packed in a single machine word on the standard word RAM model, and let f(k,n) denote the query and update times of the dynamic predecessor/successor data structure of our choice, which stores k integers from universe [1,n] in O(k log n) bits of space. Then, given a string of length m, our packed c-tries support pattern matching queries and insert/delete operations in O((m/α) f(k,n)) worst-case time and in O(m/α + f(k,n)) expected time. Our experiments show that our packed c-tries are faster than the standard compact tries (a.k.a. Patricia trees) on real data sets. As an application of our packed c-trie, we show that the sparse suffix tree for a string of length n over prefix codes with k sampled positions, such as evenly-spaced and word-delimited sparse suffix trees, can be constructed online in O((n/α + k) f(k,n)) worst-case time and O(n/α + k f(k,n)) expected time with n log σ + O(k log n) bits of space. When k = O(n/α), by using the state-of-the-art dynamic predecessor/successor data structures, we obtain sub-linear time construction algorithms using only O(n/α) bits of space in both cases. We also discuss an application of our packed c-tries to online LZD factorization.
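The m/α terms in the bounds above reflect handling α packed letters per machine-word operation rather than one letter at a time. The following is a minimal sketch of that packing idea only, not the authors' implementation: it packs letters into an integer and finds the longest common prefix (LCP) of two packed blocks with one XOR and one highest-set-bit query. The names `pack`, `packed_lcp`, and `lcp_blockwise` are hypothetical, Python integers stand in for machine words, and the trie itself and the predecessor/successor structure are not modeled.

```python
def pack(s, bits_per_char, alphabet):
    """Pack string s into one integer, first letter in the most significant position."""
    code = {c: i for i, c in enumerate(alphabet)}
    word = 0
    for ch in s:
        word = (word << bits_per_char) | code[ch]
    return word


def packed_lcp(a, b, length, bits_per_char):
    """LCP, in letters, of two packed strings that both hold `length` letters."""
    diff = a ^ b
    if diff == 0:
        return length
    total_bits = length * bits_per_char
    # Number of leading bits on which a and b agree.
    equal_bits = total_bits - diff.bit_length()
    return equal_bits // bits_per_char


def lcp_blockwise(x, y, alpha, alphabet):
    """LCP of x and y, compared alpha letters at a time.

    Assumption for illustration: blocks are packed on the fly here, which costs
    O(alpha) per block. A real structure would store strings already packed,
    so that each block comparison is a constant number of word operations.
    """
    bits = max(1, (len(alphabet) - 1).bit_length())  # ceil(log2(sigma))
    lcp = 0
    for i in range(0, min(len(x), len(y)), alpha):
        bx, by = x[i:i + alpha], y[i:i + alpha]
        n = min(len(bx), len(by))
        l = packed_lcp(pack(bx[:n], bits, alphabet),
                       pack(by[:n], bits, alphabet), n, bits)
        lcp += l
        if l < n:  # mismatch found inside this block
            break
    return lcp


if __name__ == "__main__":
    print(lcp_blockwise("ACGTACGT", "ACGTACCT", 4, "ACGT"))
```

With the DNA alphabet and α = 4, the two 8-letter strings in the usage line are compared in two block steps instead of up to eight letter comparisons.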
doi:10.1587/transfun.e100.a.1785 fatcat:oqtma2hqg5dylj5g7o73hcrfea