A Mixed Coding Scheme for Inverted File Index Compression

Jinlin Chen, Ping Zhong, Terry Cook
2006 1st IEEE Workshop on Hot Topics in Web Systems and Technologies
taking consecutive differences, d_{i+1} - d_i. In this way it is possible to code inverted lists using fewer bits per pointer on average. Many codes have been proposed for compressing inverted lists. These codes assign different codewords to different d-gaps. How well a code performs depends on whether its implicit d-gap distribution model matches that of the document collection. One way to improve inverted file compression is to exploit the cluster property [1] of document collections, which
... es that term occurrences are not uniformly distributed: some terms are used more frequently in some parts of the collection than in others. The corresponding parts of the inverted list consequently consist of clusters of small d-gap values. Interpolative code [9] exploits the cluster property of term occurrences and achieves very good performance. Other codes that favor small d-gaps also perform well on document collections with the cluster property.
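
The effect of clustering on code length can be illustrated with a minimal sketch. It is not the paper's mixed coding scheme: it assumes the Elias gamma code as a stand-in for "a code that favors small d-gaps", and the function names and the two sample inverted lists are hypothetical. The first d-gap is taken to be the first document number itself.

```python
def d_gaps(doc_ids):
    """Convert a strictly increasing list of document numbers into d-gaps.

    The first gap is the first document number itself (an assumption of
    this sketch); each later gap is d_{i+1} - d_i.
    """
    gaps = [doc_ids[0]]
    for prev, cur in zip(doc_ids, doc_ids[1:]):
        gaps.append(cur - prev)
    return gaps


def gamma_encode(x):
    """Elias gamma code of an integer x >= 1, returned as a bit string.

    Convention used here: (len - 1) zeros, then the binary form of x,
    where len is the number of binary digits of x.
    """
    if x < 1:
        raise ValueError("gamma code is defined for x >= 1")
    binary = bin(x)[2:]
    return "0" * (len(binary) - 1) + binary


def gamma_decode(bits):
    """Decode a concatenation of gamma codewords back into integers."""
    values, i = [], 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":  # count the zero prefix
            zeros += 1
            i += 1
        values.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return values


def coded_length(doc_ids):
    """Total number of bits needed to gamma-code the d-gaps of a list."""
    return sum(len(gamma_encode(g)) for g in d_gaps(doc_ids))


# Two hypothetical inverted lists: 8 documents each, last document 123.
clustered = [3, 4, 5, 7, 8, 120, 121, 123]    # occurrences bunched together
spread = [3, 20, 37, 54, 71, 88, 105, 123]    # occurrences evenly spaced

print(d_gaps(clustered), coded_length(clustered))
print(d_gaps(spread), coded_length(spread))
```

Under these assumptions the clustered list needs fewer bits than the evenly spread one, because most of its d-gaps are 1 or 2 and receive the shortest codewords, while the single large gap costs only one long codeword.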
doi:10.1109/hotweb.2006.355272 dblp:conf/hotweb/ChenZC06 fatcat:g4u6z5g2nbhdtdiho4cj4twuhu