SWRL Parallel Reasoning Implementation with Spark SQL

Wan Li, Huaai Kang, Dongbo Ma, Weiwei Wei
2020 IOP Conference Series: Materials Science and Engineering  
On one hand, the conventional ontology reasoners do not scale well for large amounts of ontologies because they are designed to run on a single machine.  ...  On the other hand, the existing scalable reasoners are not perfect enough, for example, to completely support the widely used Semantic Web Rule Language (SWRL) rules.  ...  RDF defines a simple graph model to denote relationships between resources using the format of RDF triple.  ... 
doi:10.1088/1757-899x/719/1/012020 fatcat:5i2bvjqvzvek5nmsf5jz36rery

Chapter 7 Scalable Knowledge Graph Processing Using SANSA [chapter]

Hajira Jabeen, Damien Graux, Gezim Sejdiu
2020 Lecture Notes in Computer Science  
In the meantime, the distributed data processing technologies have also advanced to deal with big data and large scale knowledge graphs.  ...  This chapter introduces Scalable Semantic Analytics Stack (SANSA), that addresses the challenge of dealing with large scale RDF data and provides a unified framework for applications like link prediction  ...  This is the first distributed, in-memory approach for computing different quality metrics for large RDF datasets using Apache Spark.  ... 
doi:10.1007/978-3-030-53199-7_7 fatcat:zx4suhofwngsxbigtars4y4s3u

A Scalable Framework for Quality Assessment of RDF Datasets [article]

Gezim Sejdiu, Anisa Rula, Jens Lehmann, Hajira Jabeen
2020 arXiv   pre-print
This is the first distributed, in-memory approach for computing different quality metrics for large RDF datasets using Apache Spark.  ...  In this paper, we present DistQualityAssessment -- an open source implementation of quality assessment of large RDF datasets that can scale out to a cluster of machines.  ...  p) ∩ α = count(r) Retrieving the RDF data (step 2) RDF data first needs to be loaded into a large-scale storage that Spark can efficiently read from. We use Hadoop Distributed File-System (HDFS).  ... 
arXiv:2001.11100v1 fatcat:azwjqvmwu5bzvlgcqrjaoaik54

A Scalable Framework for Quality Assessment of RDF Datasets

Gezim Sejdiu, Anisa Rula, Jens Lehmann, Hajira Jabeen
2019 Zenodo  
This is the first distributed, in-memory approach for computing different quality metrics for large RDF datasets using Apache Spark. We also provide a quality assessment pattern that can be used to generate new metrics that  ...  In this paper, we present DistQualityAssessment – an open source implementation of quality assessment of large RDF datasets that can scale out to a cluster of machines.  ...  p) ∩ α = count(r) Retrieving the RDF data (step 2) RDF data first needs to be loaded into a large-scale storage that Spark can efficiently read from. We use Hadoop Distributed File-System (HDFS).  ... 
doi:10.5281/zenodo.3567905 fatcat:yoqhftwtkjabvc4rubiyrrcf4u

DistLODStats: Distributed Computation of RDF Dataset Statistics

Gezim Sejdiu, Ivan Ermilov, Jens Lehmann, Mohamed Nadjib Mami
2018 Zenodo  
More specifically, we describe the first distributed in-memory approach for computing 32 different statistical criteria for RDF datasets using Apache Spark.  ...  In this paper, we introduce a software component for statistical calculations of large RDF datasets, which scales out to clusters of machines.  ...  Fetching the RDF data (Step 1): RDF data needs first to be loaded into a large-scale storage that Spark can efficiently read from. For this purpose, we use HDFS (Hadoop Distributed File-System).  ... 
doi:10.5281/zenodo.3567965 fatcat:24tntp6einggrjawhjwo5c5aj4

Towards a Scalable Semantic-Based Distributed Approach for SPARQL Query Evaluation [chapter]

Gezim Sejdiu, Damien Graux, Imran Khan, Ioanna Lytra, Hajira Jabeen, Jens Lehmann
2019 Lecture Notes in Computer Science  
An evaluation of the performance of our approach in processing large-scale RDF datasets is also presented.  ...  In this study, we propose a scalable approach to evaluate SPARQL queries over distributed RDF datasets using a semantic-based partition and is implemented inside the state-of-the-art RDF processing framework  ...  Cloud-based approaches for managing large-scale RDF mainly use NoSQL distributed data stores or employ various partitioning approaches on top of Hadoop infrastructure, i.e., the Hadoop Distributed File  ... 
doi:10.1007/978-3-030-33220-4_22 fatcat:wmu6dc4pqjbeherfkwmvfynbk4

PRoST: Distributed Execution of SPARQL Queries Using Mixed Partitioning Strategies [article]

Matteo Cossu, Michael Färber, Georg Lausen
2018 arXiv   pre-print
The rapidly growing size of RDF graphs in recent years necessitates distributed storage and parallel processing strategies.  ...  Related to the approach presented in the current paper are systems built on top of Hadoop HDFS, for example using Apache Accumulo or using Apache Spark.  ...  This characteristic makes Spark very fast in practice and able to compute complex queries on large RDF graphs.  ... 
arXiv:1802.05898v1 fatcat:6feeage4j5a4dhfacjgltfyjmi

RDF Query Answering Using Apache Spark: Review and Assessment

Giannis Agathangelos, Georgia Troullinou, Haridimos Kondylakis, Kostas Stefanidis, Dimitris Plexousakis
2018 2018 IEEE 34th International Conference on Data Engineering Workshops (ICDEW)  
The purpose of this paper is to provide an overview of the existing works dealing with efficient query answering, in the area of RDF data, using Apache Spark.  ...  More specifically, the ever-increasing size and number of RDF data collections raises the need for efficient query answering, and dictates the usage of distributed data management systems for effectively  ...  EVALUATION DIMENSIONS Apache Spark [29] is an in-memory distributed computing platform designed for large-scale data processing.  ... 
doi:10.1109/icdew.2018.00016 dblp:conf/icde/AgathangelosTKS18 fatcat:ji7puutbmfbaln667f5nvz3uzq

CLAMS

Mina Farid, Alexandra Roatis, Ihab F. Ilyas, Hella-Franziska Hoffmann, Xu Chu
2016 Proceedings of the 2016 International Conference on Management of Data - SIGMOD '16  
The coveted use of unstructured and semi-structured data in large volumes makes current data cleaning tools (primarily designed for relational data) not directly adoptable.  ...  We present CLAMS, a system to discover and enforce expressive integrity constraints from large amounts of lake data with very limited schema information (e.g., represented as RDF triples).  ...  In particular, CLAMS operates on large-scale data processing frameworks (specifically, Apache Spark) to manipulate massive datasets.  ... 
doi:10.1145/2882903.2899391 dblp:conf/sigmod/FaridRIHC16 fatcat:re5aay3od5d3vmeutzva6chrse

Strider: A Hybrid Adaptive Distributed RDF Stream Processing Engine [article]

Xiangnan Ren, Olivier Curé
2017 arXiv   pre-print
We propose Strider, a hybrid adaptive distributed RDF Stream Processing engine that optimizes logical query plan according to the state of data streams.  ...  These guarantees are obtained by designing the engine's architecture with state-of-the-art Apache components such as Spark and Kafka.  ...  A general approach for large scale data stream processing is performed over a distributed settings.  ... 
arXiv:1705.05688v1 fatcat:xgptlj7e25efzfjm7jy5edszqu

BigDataGrapes D4.2 - Methods and Tools for Distributed Inference

Milena Yankova, Boyan Simeonov, Atanas Kiryakov, Vladimir Alexiev
2018 Zenodo  
There are many challenges in data reasoning and inference based on distributed data. The first one is addressing data security and access rights to both original data and inferred information.  ...  The second challenge is how the actual inference over distributed sources can be performed and implemented.  ...  Parallel reasoning and distributed reasoning are considered to be essential for Web-scale reasoning to improve scalability.  ... 
doi:10.5281/zenodo.1481809 fatcat:7jkignzjnfdmxomknr5vjrwhhm

RORS: Enhanced Rule-based OWL Reasoning on Spark [article]

Zhihui Liu and Zhiyong Feng and Xiaowang Zhang and Xin Wang and Guozheng Rao
2016 arXiv   pre-print
The rule-based OWL reasoning is to compute the deductive closure of an ontology by applying RDF/RDFS and OWL entailment rules.  ...  In this paper, we present an approach to enhancing the performance of the rule-based OWL reasoning on Spark based on a locally optimal executable strategy.  ...  It can deal with large scale ontology on a distributed computing cluster. However, WebPIE exhibits poor reasoning time. [16] are rule-based OWL reasoner.  ... 
arXiv:1605.02824v1 fatcat:j6sslb3e4fcf3hlnuiof3hya7q

S2RDF

Alexander Schätzle, Martin Przyjaciel-Zablocki, Simon Skilevic, Georg Lausen
2016 Proceedings of the VLDB Endowment  
RDF has become very popular for semantic data publishing due to its flexible and universal graph-like data model.  ...  Thus, the ever-increasing size of RDF data collections raises the need for scalable distributed approaches.  ...  CONCLUSION In this paper, we present S2RDF, a distributed Hadoop-based SPARQL query processor for large-scale RDF data implemented on top of Spark.  ... 
doi:10.14778/2977797.2977806 fatcat:kehcu2c43rhczorh4nl7vkxlwu

Utilizing semantic big data for realizing a national-scale infrastructure vulnerability analysis system

Sangkeun Lee, Supriya Chinthavali, Sisi Duan, Mallikarjun Shankar
2016 Proceedings of the International Workshop on Semantic Big Data - SBD '16  
Next, we present a generic system architecture and discuss challenges including: (1) Constructing and managing a CI network-of-networks graph, (2) Performing analytic operations at scale, and (3) Interactive  ...  We argue that this architecture acts as a baseline to realize a national-scale network based vulnerability analysis system.  ...  SBD for large-scale data analysis. Representing national-scale, heterogeneous CI datasets using the RDF model can result in a massive number of RDF triples.  ... 
doi:10.1145/2928294.2928295 dblp:conf/sigmod/LeeCDS16 fatcat:biolcgbtojcfja72ryoiqcj5ku

LiteMat: a scalable, cost-efficient inference encoding scheme for large RDF graphs [article]

Olivier Curé, Hubert Naacke, Tendry Randriamalala, Bernd Amann
2015 arXiv   pre-print
In this article, we present a structured resource identification scheme using a clever encoding of concepts and property hierarchies for efficiently evaluating the main common RDFS entailment rules while  ...  To reduce the memory footprint and ease the exchange of large datasets, these systems generally apply a dictionary approach for compressing triple data sizes by replacing resource identifiers (IRIs), blank  ...  Finally, to tackle very large graphs, we evaluate our implementation over the Apache Spark framework using synthetic and real world use cases. II. BACKGROUND KNOWLEDGE A.  ... 
arXiv:1510.03409v1 fatcat:yaz64mu6y5fezi56uhxqoromay