344 Hits in 4.5 sec

Scalable RDF data compression with MapReduce

Jacopo Urbani, Jason Maassen, Niels Drost, Frank Seinstra, Henri Bal
2012 Concurrency and Computation  
...  make dictionary encoding a feasible technique on a very large input, a distributed implementation is required.  ...  In this paper, we propose a set of distributed MapReduce algorithms to efficiently compress and decompress a large amount of RDF data.  ...  Exper. 2013; 25:24-39 DOI: 10.1002/cpe  ...  Table I. Execution time data compression and decompression on different datasets.  ... 
doi:10.1002/cpe.2840 fatcat:ya6dtyxr3ndqpkkakfhill7vbi

Scalable Indexing and Adaptive Querying of RDF Data in the cloud

Nikolaos Papailiou, Dimitrios Tsoumakos, Ioannis Konstantinou, Panagiotis Karras, Nectarios Koziris
2014 Proceedings of Semantic Web Information Management on Semantic Web Information Management - SWIM'14  
In this work, we emphasize our novel, scalable and efficient MapReduce indexing process that allows H2RDF+ to handle arbitrarily large RDF datasets.  ...  The enormous increase in both user and machine generated content calls for scalable solutions in triple data stores.  ...  of MapReduce [14] jobs to load and index large RDF data.  ... 
doi:10.1145/2630602.2630603 dblp:conf/sigmod/PapailiouTKKK14a fatcat:7asq5xr43fbazklagzowxyynue

HDT-MR: A Scalable Solution for RDF Compression with HDT and MapReduce [chapter]

José M. Giménez-García, Javier D. Fernández, Miguel A. Martínez-Prieto
2015 Lecture Notes in Computer Science  
This paper introduces HDT-MR, a MapReduce-based technique to process huge RDF and build the HDT serialization.  ...  HDT is a binary RDF serialization aiming at minimizing the space overheads of traditional RDF formats, while providing retrieval features in compressed space.  ...  Ramos for his support with the Hadoop cluster, and Jürgen Umbrich for lending us his server.  ... 
doi:10.1007/978-3-319-18818-8_16 fatcat:bwrsvhxtdffnlhpc5rh55mbn64

Scalable RDF Data Compression using X10 [article]

Long Cheng, Avinash Malik, Spyros Kotoulas, Tomas E Ward, Georgios Theodoropoulos
2014 arXiv   pre-print
Compared to the state-of-art MapReduce algorithm, we demonstrate a speedup of 2.6-7.4x and excellent scalability.  ...  A typical method for alleviating the impact of this problem is through the use of compression methods that produce more compact representations of the data.  ...  In this paper, we propose a scalable solution for compressing massive RDF data in parallel.  ... 
arXiv:1403.2404v1 fatcat:isjn3vqu4rfybgv3pqpc36o7r4

H2RDF+

Nikolaos Papailiou, Dimitrios Tsoumakos, Ioannis Konstantinou, Panagiotis Karras, Nectarios Koziris
2014 Proceedings of the 2014 ACM SIGMOD international conference on Management of data - SIGMOD '14  
The proliferation of data in RDF format has resulted in the emergence of a plethora of specialized management systems.  ...  In this paper, we present its key scientific contributions and allow participants to interact with an H2RDF+ deployment over a Cloud infrastructure.  ...  It consists of 4 highly scalable MapReduce jobs that: • Translate RDF literals to integer IDs with respect to the literal's occurrence frequency in the dataset.  ... 
doi:10.1145/2588555.2594535 dblp:conf/sigmod/PapailiouTKKK14 fatcat:hlui2tfwlrgxjcfmowkmipnvv4

Sempala: Interactive SPARQL Query Processing on Hadoop [chapter]

Alexander Schätzle, Martin Przyjaciel-Zablocki, Antony Neu, Georg Lausen
2014 Lecture Notes in Computer Science  
Indeed, existing SPARQL-on-Hadoop (MapReduce) approaches have already demonstrated very good scalability, however, query runtimes are rather slow due to the underlying batch processing framework.  ...  At the same time, Hadoop has become dominant in the area of Big Data processing with large infrastructures being already deployed and used in manifold application fields.  ...  However, they do not provide a distributed query engine, thus scalability and query performance for large RDF data is still an issue.  ... 
doi:10.1007/978-3-319-11964-9_11 fatcat:nbt5xlop2jahlapxvfikcqkn5y

Fast Compression of Large Semantic Web Data Using X10

Long Cheng, Avinash Malik, Spyros Kotoulas, Tomas E Ward, Georgios Theodoropoulos
2016 IEEE Transactions on Parallel and Distributed Systems  
One can see that the scalability of our algorithm is not linear with input data when reading 1 chunk per loop.  ...  (i.e. compress) huge RDF datasets, so as to meet the big data challenges from large data warehouses and the Semantic Web  ... 
doi:10.1109/tpds.2015.2496579 fatcat:vk6rafssszfhlgj75ggjagfkfi

An Effective and Efficient MapReduce Algorithm for Computing BFS-Based Traversals of Large-Scale RDF Graphs

Alfredo Cuzzocrea, Mirel Cosulschi, Roberto de Virgilio
2016 Algorithms  
In line with this trend, in this paper, we present an approach for efficiently implementing traversals of large-scale RDF graphs over MapReduce that is based on the Breadth First Search (BFS) strategy  ...  for visiting (RDF) graphs to be decomposed and processed according to the MapReduce framework.  ...  Contribution [66] studies how to apply compression paradigms to obtain scalable RDF processing with MapReduce.  ... 
doi:10.3390/a9010007 fatcat:wxorzsnnovbjvpnoc4prrucnty

A survey and experimental comparison of distributed SPARQL engines for very large RDF data

Ibrahim Abdelaziz, Razen Harbi, Zuhair Khayyat, Panos Kalnis
2017 Proceedings of the VLDB Endowment  
Some are based on distributed frameworks such as MapReduce; others implement proprietary distributed processing; and some rely on expensive preprocessing for data partitioning.  ...  Then, we select 12 representative systems and perform extensive experimental evaluation with respect to preprocessing cost, query performance, scalability and workload adaptability, using a variety of  ...  , we extensively evaluated existing systems, through a wide range of SPARQL queries, considering different performance factors including startup overhead, incurred replication, query performance, and scalability  ... 
doi:10.14778/3151106.3151109 fatcat:6m7iotec65cufebmm5jbali74q

Scalable peer-to-peer-based RDF management

Christoph Böhm, Daniel Hefenbrock, Felix Naumann
2012 Proceedings of the 8th International Conference on Semantic Systems - I-SEMANTICS '12  
We present Hdrs, a scalable storage infrastructure that enables online-analysis of very large RDF data sets.  ...  The store is open source and integrates well with Hadoop MapReduce or any other client application.  ...  Though RDF data is highly structured, those solutions use low-level storage systems, such as the Hadoop Distributed File System Hdfs to feed data into MapReduce, which requires handcrafted optimizations  ... 
doi:10.1145/2362499.2362523 dblp:conf/i-semantics/BohmHN12 fatcat:qysdhtkd6bbvpgif5ynsmspv5i

Scalable Semantics – The Silver Lining of Cloud Computing

Andrew Newman, Yuan-Fang Li, Jane Hunter
2008 2008 IEEE Fourth International Conference on eScience  
We then present some implementation details for our MapReduce-based RDF molecule store.  ...  Our objective is to expedite this process by employing Google's MapReduce framework to implement scale-out distributed querying and reasoning.  ...  As shown in Figure 6, with the increase of data size, the 3-node cluster shows greater scalability.  ... 
doi:10.1109/escience.2008.23 dblp:conf/eScience/NewmanLH08 fatcat:xmeh3hi5mfdl3l2ekz7qdh54u4

Efficient processing of RDF graph pattern matching on MapReduce platforms

Padmashree Ravindra, Seokyong Hong, HyeongSik Kim, Kemafor Anyanwu
2011 Proceedings of the second international workshop on Data intensive computing in the clouds - DataCloud-SC '11  
This has positioned the issue of scalable data processing techniques for RDF as a central issue in the Semantic Web research community.  ...  In addition, most of the existing techniques for optimizing RDF data processing do not transfer well to the MapReduce model and often require significant lead time for pre-processing.  ...  Task C - Scalability of Information-Passing technique: We evaluated the benefit of information passing, with increasing size of RDF graphs.  ... 
doi:10.1145/2087522.2087527 fatcat:fjacm3udwrgrvckodltcszzfsi

Map-Side Merge Joins for Scalable SPARQL BGP Processing

Martin Przyjaciel-Zablocki, Alexander Schaetzle, Eduard Skaley, Thomas Hornung, Georg Lausen
2013 2013 IEEE 5th International Conference on Cloud Computing Technology and Science  
In recent times, it has been widely recognized that, due to their inherent scalability, frameworks based on MapReduce are indispensable for so-called "Big Data" applications.  ...  Our experiments with the LUBM benchmark show an average performance benefit between 15% and 48% compared to other MapReduce based approaches while at the same time scaling linearly with the RDF dataset  ...  Due to its inherent high degree of parallelism and good scalability properties, MapReduce [2] is one of the predominant frameworks used in many large companies for dealing with "Big Data".  ... 
doi:10.1109/cloudcom.2013.9 dblp:conf/cloudcom/Przyjaciel-ZablockiSSHL13 fatcat:72mfthvmnzh73mjxdsksqyera4

Rainbow: A distributed and hierarchical RDF triple store with dynamic scalability

Rong Gu, Wei Hu, Yihua Huang
2014 2014 IEEE International Conference on Big Data (Big Data)  
Experiments show that Rainbow outperforms typical existing distributed RDF triple stores, with excellent scalability and fault tolerance.  ...  The RDF data in memory storage is partitioned by the consistent hashing algorithm to achieve the dynamic scalability.  ...  By encoding, the storage space of RDF triples is compressed as compared with long literals or URIs.  ... 
doi:10.1109/bigdata.2014.7004274 dblp:conf/bigdataconf/GuHH14 fatcat:mxi5hq657vfrpn6gbyr6hsunyi

Cascading map-side joins over HBase for scalable join processing [article]

Martin Przyjaciel-Zablocki, Alexander Schätzle, Thomas Hornung, Christopher Dorner, Georg Lausen
2012 arXiv   pre-print
One of the major challenges in large-scale data processing with MapReduce is the smart computation of joins.  ...  processing layer, with MapReduce, which in turn does not provide appropriate storage structures for efficient large-scale join processing.  ...  For loading data into HBase, we set the region size to 512 MB using snappy compression.  ... 
arXiv:1206.6293v1 fatcat:towdefhca5dufanmehey3pyavy
Showing results 1-15 out of 344 results