Index compression vs. retrieval time of inverted files for XML documents

Norbert Fuhr, Norbert Gövert
2002 Proceedings of the eleventh international conference on Information and knowledge management - CIKM '02  
Query languages for retrieval of XML documents allow for conditions referring both to the content and the structure of documents. In order to process these queries efficiently, inverted files must also contain structural information, which leads to index sizes that exceed the storage space of the original data. In this paper, we investigate two different approaches for reducing index space. First, we consider methods for compressing index entries. Second, we develop the new XS tree data structure, which contains the structural description of a document in a rather compact form, such that these descriptions can be kept in main memory. We evaluate the efficiency of several variants of these two approaches on two large XML document collections. Results show that very high compression rates for indexes can be achieved. However, any compression increases retrieval time. Thus, retrieval time is minimized when uncompressed indexes are used. On the other hand, highly compressed indexes may be feasible for applications where storage is limited, such as in PDAs or E-book devices.
doi:10.1145/584902.584912 fatcat:mfc3pjjyxnfrhk2gc3hyd7fp5m