(S,C)-Dense Coding: An Optimized Compression Code for Natural Language Text Databases [chapter]

Nieves R. Brisaboa, Antonio Fariña, Gonzalo Navarro, María F. Esteller
2003 Lecture Notes in Computer Science  
This work presents (s, c)-Dense Code, a new method for compressing natural language texts. This technique is a generalization of a previous compression technique called End-Tagged Dense Code, which obtains a better compression ratio as well as simpler and faster encoding than Tagged Huffman. At the same time, (s, c)-Dense Code is a prefix code that maintains the most interesting features of Tagged Huffman Code with respect to direct search on the compressed text. (s, c)-Dense Coding retains all the efficiency and simplicity of Tagged Huffman, and improves its compression ratios. We formally describe the (s, c)-Dense Code and show how to compute the parameters s and c that optimize the compression for a specific corpus. Our empirical results show that (s, c)-Dense Code improves on End-Tagged Dense Code and Tagged Huffman Code, and reaches only 0.5% overhead over plain Huffman Code.
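
The abstract does not spell out the code itself, so the following is only a minimal sketch of how a byte-oriented (s, c)-dense code could work, under stated assumptions: byte values 0..s-1 act as "stoppers" that end a codeword, values s..s+c-1 act as "continuers", s + c = 256, and words are numbered by decreasing frequency starting at rank 0. The function names (`sc_encode`, `sc_decode`, `compressed_size`, `best_s`) are hypothetical, and the exhaustive search over s is a simplification chosen for clarity, not necessarily the procedure used in the chapter.

```python
def sc_encode(rank, s, c):
    """Codeword for the word with 0-based frequency rank `rank`.

    Assumed layout: zero or more continuer bytes in [s, s+c),
    followed by exactly one stopper byte in [0, s).
    """
    if rank < 0 or s < 1 or c < 1:
        raise ValueError("need rank >= 0, s >= 1, c >= 1")
    out = [rank % s]              # last byte of the codeword: a stopper
    x = rank // s
    while x > 0:                  # emit continuers, least significant first
        x -= 1
        out.append(s + x % c)
        x //= c
    return bytes(reversed(out))


def sc_decode(code, s, c):
    """Inverse of sc_encode: recover the rank from one complete codeword."""
    x = 0
    for b in code[:-1]:           # every byte except the last is a continuer
        x = x * c + (b - s) + 1
    return x * s + code[-1]       # the last byte is the stopper


def compressed_size(freqs, s, c):
    """Total bytes needed when the most frequent words get the lowest ranks."""
    ranked = sorted(freqs, reverse=True)
    return sum(f * len(sc_encode(i, s, c)) for i, f in enumerate(ranked))


def best_s(freqs, base=256):
    """Brute-force choice of s (with c = base - s) minimizing compressed size."""
    return min(range(1, base),
               key=lambda s: compressed_size(freqs, s, base - s))
```

With s = c = 128 this sketch assigns the same codeword lengths as a fixed 128/128 split, which is how End-Tagged Dense Code divides the byte values; letting s vary is what allows the split to be tuned to the word-frequency distribution of a given corpus. Because a stopper can only appear as the final byte, no codeword is a prefix of another.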
doi:10.1007/978-3-540-39984-1_10 fatcat:edztgxtibzcjtj7dcc66ewfv7u