LZW based compressed pattern matching

Tao Tao, A. Mukherjee
Data Compression Conference, 2004. Proceedings. DCC 2004  
Compressed pattern matching is an emerging research area that addresses the following problem: given a file in compressed format and a pattern, report the occurrence(s) of the pattern in the file with minimal (or no) decompression. In this paper, we report our work on compressed pattern matching in LZW compressed files. The reported work is based on Amir's well-known "almost-optimal" algorithm but has been improved to search for not only the first occurrence of the pattern but also all other occurrences. The improvements also include multi-pattern matching and a faster implementation for so-called "simple patterns". Extensive experiments have been conducted to test the search performance and to compare it with BWT-based compressed pattern matching algorithms. The results show that our method is competitive with the best compressed pattern matching algorithms. LZW is one of the most efficient and popular compression algorithms and is used extensively, and our method requires no modification of the compression algorithm. The work reported in this paper therefore has great economic and market potential.
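To make the problem statement concrete, the following is a minimal Python sketch of a baseline, not the algorithm from the paper and not Amir's algorithm. It assumes a textbook LZW coder over a 256-symbol initial alphabet with an unbounded dictionary, rebuilds the dictionary while reading the code stream, and feeds each decoded phrase through a Knuth-Morris-Pratt automaton so that all occurrences are reported without materializing the full text. Every phrase is still decoded character by character, which is the cost that dedicated compressed pattern matching algorithms aim to reduce. All function names here (`lzw_compress`, `lzw_decode_stream`, `failure_table`, `find_all_in_lzw`) are hypothetical and chosen for illustration.

```python
def lzw_compress(text):
    """Textbook LZW: map a string (code points < 256) to a list of integer codes."""
    dictionary = {chr(i): i for i in range(256)}
    codes = []
    current = ""
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate
        else:
            codes.append(dictionary[current])
            dictionary[candidate] = len(dictionary)  # next free code
            current = ch
    if current:
        codes.append(dictionary[current])
    return codes


def lzw_decode_stream(codes):
    """Yield the phrase for each code, rebuilding the dictionary on the fly."""
    phrases = [chr(i) for i in range(256)]
    previous = None
    for code in codes:
        if code < len(phrases):
            phrase = phrases[code]
        elif code == len(phrases) and previous is not None:
            # The code refers to the entry the encoder created one step ago.
            phrase = previous + previous[0]
        else:
            raise ValueError("invalid LZW code: %d" % code)
        if previous is not None:
            phrases.append(previous + phrase[0])
        yield phrase
        previous = phrase


def failure_table(pattern):
    """KMP failure function: longest proper border of each pattern prefix."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def find_all_in_lzw(codes, pattern):
    """Return start offsets (in the uncompressed text) of every occurrence."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    fail = failure_table(pattern)
    matches = []
    state = 0      # number of pattern characters currently matched
    position = 0   # number of text characters consumed so far
    for phrase in lzw_decode_stream(codes):
        for ch in phrase:
            while state and ch != pattern[state]:
                state = fail[state - 1]
            if ch == pattern[state]:
                state += 1
            position += 1
            if state == len(pattern):
                matches.append(position - len(pattern))
                state = fail[state - 1]  # keep going to catch overlaps
    return matches
```

For example, `find_all_in_lzw(lzw_compress("abababab abab"), "abab")` returns `[0, 2, 4, 9]`, including the overlapping occurrences. Because the KMP state carries across phrase boundaries, matches that span several LZW phrases are found as well.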
doi:10.1109/dcc.2004.1281544 dblp:conf/dcc/TaoM04 fatcat:n2asidsn3vh2njtbwzkaqnkngu