Using d-gap patterns for index compression

Jinlin Chen, Terry Cook
2007 Proceedings of the 16th international conference on World Wide Web - WWW '07  
Sequential patterns of d-gaps exist pervasively in inverted lists of Web document collection indices due to the cluster property. In this paper, the information of d-gap sequential patterns is used as a new dimension for improving inverted index compression. We first detect d-gap sequential patterns using a novel data structure, UpDown Tree. Based on the detected patterns, we further substitute each pattern with its pattern Id in the inverted lists that contain it. The resulting inverted lists are then coded with an existing coding scheme. Experiments show that this approach can effectively improve the compression ratio of existing codes.
doi:10.1145/1242572.1242769 dblp:conf/www/ChenC07a fatcat:5ep3t53vb5bbjh2xa4hdazkwpi