Massive Semantic Web data compression with MapReduce

Jacopo Urbani, Jason Maassen, Henri Bal
2010 Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing - HPDC '10  
The Semantic Web consists of many billions of statements made of terms that are either URIs or literals. Since these terms usually consist of long sequences of characters, an effective compression technique must be used to reduce the data size and increase application performance. One of the best-known techniques for data compression is dictionary encoding. In this paper we propose a MapReduce algorithm that efficiently compresses and decompresses a large amount of Semantic Web data. We implemented a prototype using the Hadoop framework and report an evaluation of its performance. The evaluation shows that our approach efficiently compresses large amounts of data and that it scales linearly with both the input size and the number of nodes.
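The core idea named in the abstract, dictionary encoding, can be illustrated with a minimal single-machine sketch (this is not the paper's distributed MapReduce implementation, and the example triples and function names are illustrative assumptions): each distinct term, URI or literal, is mapped to a compact integer ID, and triples are stored as ID tuples.

```python
def build_dictionary(triples):
    """Assign a unique integer ID to every distinct term (URI or literal)."""
    term_to_id = {}
    for triple in triples:
        for term in triple:
            if term not in term_to_id:
                term_to_id[term] = len(term_to_id)
    return term_to_id

def compress(triples, term_to_id):
    """Replace every term with its integer ID."""
    return [tuple(term_to_id[t] for t in triple) for triple in triples]

def decompress(encoded, term_to_id):
    """Invert the dictionary and restore the original terms."""
    id_to_term = {i: t for t, i in term_to_id.items()}
    return [tuple(id_to_term[i] for i in triple) for triple in encoded]

# Hypothetical example data: two RDF statements sharing one term.
triples = [
    ("<http://example.org/alice>", "<http://xmlns.com/foaf/0.1/knows>",
     "<http://example.org/bob>"),
    ("<http://example.org/bob>", "<http://xmlns.com/foaf/0.1/name>",
     '"Bob"'),
]
dictionary = build_dictionary(triples)
encoded = compress(triples, dictionary)
assert decompress(encoded, dictionary) == triples
```

Long character sequences are stored once in the dictionary, so repeated terms cost only an integer per occurrence; the paper's contribution is performing this encoding and its inverse at scale as MapReduce jobs on Hadoop.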
doi:10.1145/1851476.1851591 dblp:conf/hpdc/UrbaniMB10 fatcat:q4oryyljlbg3fevkpt4ee6kzzu