Efficient Dictionary Compression for Processing RDF Big Data Using Google BigQuery

Omer Dawelbeit, Rachel McCrindle
2016 IEEE Global Communications Conference (GLOBECOM)
The Resource Description Framework (RDF) data model is used on the Web to express billions of structured statements on a wide range of topics, including government, publications, and life sciences. Consequently, processing and storing this data requires high-specification systems, in terms of both storage and computational capabilities. On the other hand, cloud-based big data services such as Google BigQuery can be used to store and query this data without any upfront ... nt. Google BigQuery pricing is based on the size of the data being stored or queried, but given that RDF statements contain long Uniform Resource Identifiers (URIs), the cost of querying and storing RDF big data can increase rapidly. In this paper we present and evaluate a novel and efficient dictionary compression algorithm that is faster, generates small dictionaries that can fit in memory, and achieves a better compression rate than other large-scale RDF dictionary compression approaches. Consequently, our algorithm also reduces the BigQuery storage and query cost.
doi:10.1109/glocom.2016.7841775 dblp:conf/globecom/DawelbeitM16 fatcat:hwpucy72xfbgve4jartsnoihle
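
To illustrate the general idea of dictionary compression mentioned in the abstract, the following is a minimal sketch of plain dictionary encoding, in which each distinct RDF term is replaced by a small integer ID. It is not the authors' algorithm, which the abstract does not describe in detail. The function names (build_dictionary, encode, decode, text_size) and the sample triples are assumptions made up for this example, and the size estimate is only a rough byte count of the text form, not a BigQuery billing calculation.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
EncodedTriple = Tuple[int, int, int]


def build_dictionary(triples: List[Triple]) -> Dict[str, int]:
    """Assign an integer ID to each distinct term, in first-seen order."""
    term_to_id: Dict[str, int] = {}
    for triple in triples:
        for term in triple:
            if term not in term_to_id:
                term_to_id[term] = len(term_to_id)
    return term_to_id


def encode(triples: List[Triple], term_to_id: Dict[str, int]) -> List[EncodedTriple]:
    """Replace every term in every triple with its integer ID."""
    return [(term_to_id[s], term_to_id[p], term_to_id[o]) for s, p, o in triples]


def decode(encoded: List[EncodedTriple], term_to_id: Dict[str, int]) -> List[Triple]:
    """Invert the dictionary and restore the original terms."""
    id_to_term = {term_id: term for term, term_id in term_to_id.items()}
    return [(id_to_term[s], id_to_term[p], id_to_term[o]) for s, p, o in encoded]


def text_size(rows) -> int:
    """Rough size in bytes of the rows written as space-separated UTF-8 text."""
    return sum(len(" ".join(str(x) for x in row).encode("utf-8")) for row in rows)


# Hypothetical sample data, invented for this illustration only.
SAMPLE: List[Triple] = [
    ("<http://example.org/people/alice>",
     "<http://xmlns.com/foaf/0.1/knows>",
     "<http://example.org/people/bob>"),
    ("<http://example.org/people/bob>",
     "<http://xmlns.com/foaf/0.1/knows>",
     "<http://example.org/people/carol>"),
    ("<http://example.org/people/alice>",
     "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>",
     "<http://xmlns.com/foaf/0.1/Person>"),
]

if __name__ == "__main__":
    dictionary = build_dictionary(SAMPLE)
    encoded = encode(SAMPLE, dictionary)
    print("distinct terms:", len(dictionary))
    print("encoded triples:", encoded)
    print("raw bytes:", text_size(SAMPLE), "encoded bytes:", text_size(encoded))
```

The encoded triples are much smaller than the raw ones because repeated long URIs collapse to short integers. The dictionary itself must still be stored somewhere to decode results, which is why a small dictionary that fits in memory matters.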