Compressed Dynamic Tries with Applications to LZ-Compression in Sublinear Time and Space [chapter]

Jesper Jansson, Kunihiko Sadakane, Wing-Kin Sung
2007 Lecture Notes in Computer Science  
The dynamic trie is a fundamental data structure which finds applications in many areas. This paper proposes a compressed version of the dynamic trie data structure. Our data structure is not only space efficient; it also supports pattern searching in o(|P|) time and leaf insertion/deletion in o(log n) time, where |P| is the length of the pattern and n is the size of the trie. To demonstrate the usefulness of the new data structure, we apply it to the LZ-compression problem. For a string S of
length s over an alphabet A of size σ, the previously best known algorithms for computing the Ziv-Lempel encoding (lz78) of S run in either: (1) O(s) time and O(s log s) bits of working space; or (2) O(sσ) time and O(sH_k + s log σ / log_σ s) bits of working space, where H_k is the kth-order entropy of the text. No previous algorithm runs in sublinear time. Our new data structure implies an LZ-compression algorithm which runs in sublinear time and uses optimal working space. More precisely, the LZ-compression algorithm uses O(s(log σ + log log_σ s) / log_σ s) bits of working space and runs in O(s(log log s)^2 / (log_σ s · log log log s)) worst-case time, which is sublinear when σ = 2^{o(log s · log log log s / (log log s)^2)}.

In computer networks, dynamic tries are used in IP routing to efficiently maintain the hierarchical organization of routing information to enable fast lookup of IP addresses [14]. In data compression, dynamic tries are used to represent the so-called lz-trie and the Huffman coding trie, which are the key data structures in the Ziv-Lempel encoding (lz78) [20] (or its variant, LZW encoding [17]) and the Huffman encoding, respectively. Furthermore, many data structures such as the suffix trie/suffix tree, the Patricia trie [11], and the associative array (hash table) can be maintained as dynamic tries.

Without loss of generality, assume σ ≤ n. A dynamic trie T of size n can be implemented using a standard tree data structure in O(n log n) bits of space such that: (1) insertion or deletion of a leaf into or from T takes O(1) time; and (2) finding the longest prefix of a query pattern P in T takes O(|P|) time. A number of solutions have been proposed to improve the average time and space complexities of tries [1,2,11]. However, in the worst case, those solutions still use O(n log n) bits of space and pattern searching still requires O(|P|) time.
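
For orientation, the standard pointer-based trie described above, and its use in lz78 parsing, can be sketched as follows. This is a minimal illustration of the uncompressed baseline only, not the compressed structure proposed in the paper. All names (`TrieNode`, `insert_leaf`, `delete_leaf`, `longest_prefix`, `lz78_parse`) are hypothetical, and the sketch assumes children are stored in a Python `dict`, so the O(1) bound for leaf updates holds only in expectation.

```python
class TrieNode:
    """One trie node; children are keyed by the next character."""
    __slots__ = ("children", "parent", "char", "index")

    def __init__(self, parent=None, char=None, index=0):
        self.children = {}    # assumption: hash map, expected O(1) access
        self.parent = parent
        self.char = char
        self.index = index    # phrase number; 0 marks the root (empty phrase)


def insert_leaf(node, char, index):
    """Attach a new leaf below `node`."""
    if char in node.children:
        raise ValueError("child already exists")
    leaf = TrieNode(node, char, index)
    node.children[char] = leaf
    return leaf


def delete_leaf(leaf):
    """Detach a leaf from its parent."""
    if leaf.children or leaf.parent is None:
        raise ValueError("not a removable leaf")
    del leaf.parent.children[leaf.char]


def longest_prefix(root, pattern, start=0):
    """Walk down one character at a time, so O(|P|) steps in total.

    Returns (deepest node reached, number of characters matched).
    """
    node, i = root, start
    while i < len(pattern) and pattern[i] in node.children:
        node = node.children[pattern[i]]
        i += 1
    return node, i - start


def lz78_parse(text):
    """Return the lz78 parse of `text` as (phrase index, next char) pairs."""
    root = TrieNode()
    phrases, pos = [], 0
    while pos < len(text):
        node, matched = longest_prefix(root, text, pos)
        pos += matched
        if pos < len(text):
            # Extend the longest known phrase by one character.
            insert_leaf(node, text[pos], len(phrases) + 1)
            phrases.append((node.index, text[pos]))
            pos += 1
        else:
            # Text ended inside a known phrase: emit it with no new char.
            phrases.append((node.index, ""))
    return phrases
```

For example, `lz78_parse("abababa")` splits the input into the phrases a, b, ab, aba. Each phrase costs one character-by-character walk from the root, which is the O(|P|) search step that the paper's packed-word approach aims to speed up.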
Employing the latest advances in compressed trees, a trie can now be maintained in O(n log σ) bits of space under the unit-cost RAM model such that: (1) insertion or deletion of a leaf takes O(log n) time; and (2) the longest common pattern query takes O(|P|) time. Note that none of the existing data structures can answer the longest common pattern query in o(|P|) time. This paper assumes a unit-cost RAM model with word size logarithmic in n, in which standard arithmetic and bitwise boolean operations on word-sized operands can be performed in constant time [9]. Also, we assume the pattern P is packed in O(|P| log σ / log n) words. Under such a model, we propose a data structure which uses O(n log σ) bits such that: (1) insertion or deletion of a leaf takes O((log log n)^2 / log log log n) time; and (2) the longest common pattern query takes O(|P|
doi:10.1007/978-3-540-77050-3_35 fatcat:qudoqejvrjdhjn6dflooqcemzm