XML compression techniques: A survey and comparison

Sherif Sakr
2009 Journal of Computer and System Sciences (Print)
XML has been acknowledged as the de facto standard for data representation and exchange over the World Wide Web. Being self-describing grants XML its great flexibility and wide acceptance, but it is also the cause of its main drawback: XML documents are very large. The large document size means that the amount of information that has to be transmitted, processed, stored, and queried is often larger than that of other data formats. Several XML compression techniques have been introduced to deal with these problems. In this paper, we provide a complete survey of the state of the art of XML compression techniques. In addition, we present an extensive experimental study of the available implementations of these techniques. We report the behavior of nine XML compressors using a large corpus of XML documents that covers the different natures and scales of XML documents. In addition to assessing and comparing the performance characteristics of the evaluated XML compression tools, the study also tries to assess the effectiveness and practicality of using these tools in the real world. Finally, we provide guidelines and recommendations to help developers and users make an effective decision when selecting the most suitable XML compression tool for their needs.
doi:10.1016/j.jcss.2009.01.004 fatcat:dvluugsp5vajhaols2fftuc23a