S2X: Graph-Parallel Querying of RDF with GraphX [chapter]

Alexander Schätzle, Martin Przyjaciel-Zablocki, Thorsten Berberich, Georg Lausen
2016 Lecture Notes in Computer Science  
However, existing approaches using general-purpose cluster frameworks employ a record-oriented perception of RDF ignoring its inherent graph-like structure.  ...  In this paper we introduce S2X, a SPARQL query processor for Hadoop where we leverage this unified abstraction by implementing basic graph pattern matching of SPARQL as a graph-parallel task while other  ...  This is a data-parallel operation using the API of Spark which result is a collection (RDD) of mappings for bgp. Concluding the example, Algorithm 4 outputs a single solution mapping for bgp Q : {(?  ... 
doi:10.1007/978-3-319-41576-5_12 fatcat:5i4a3tmulfagbcd6kwx6juhvf4

Accelerating Large-scale Image Retrieval on Heterogeneous Architectures with Spark

Hanli Wang, Bo Xiao, Lei Wang, Jun Wu
2015 Proceedings of the 23rd ACM international conference on Multimedia - MM '15  
With the computing power of Spark, a utility library, referred to as IRlib, is proposed in this work to accelerate large-scale image retrieval applications by jointly harnessing the power of GPU.  ...  First, IRlib provides a uniform set of APIs for the programming of image retrieval applications.  ...  for large-scale image retrieval with existed parallel frameworks.  ... 
doi:10.1145/2733373.2806392 dblp:conf/mm/WangXWW15 fatcat:r73v75iglffqpbeiej34aei7xi

Parallel Markov-based Clustering Strategy for Large-scale Ontology Partitioning

Imadeddine Mountasser, Brahim Ouhbi, Bouchra Frikh
2017 Proceedings of the 9th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management  
To this end, this paper proposes a novel approach for large-scale ontology partitioning through parallel Markov-based clustering strategy using Spark framework.  ...  For that, this paper presents a novel approach for large-scale ontology partitioning through parallel Markov-based clustering strategy over distributed architecture using Spark framework.  ... 
doi:10.5220/0006504001950202 dblp:conf/ic3k/MountasserOF17 fatcat:7kco2r6qvfdedeeeujlxhstmgi

Analyzing large scale genomic data on the cloud with Sparkhit

Liren Huang, Jan Krüger, Alexander Sczyrba, Inanc Birol
2017 Bioinformatics  
To address these limitations, we have developed Sparkhit: a distributed bioinformatics framework built on top of the Apache Spark platform.  ...  Existing tools use different distributed computational platforms to scale-out bioinformatics workloads. However, the scalability of these tools is not efficient.  ...  Acknowledgements We thank Georges Hattab for proof reading the preliminary manuscript. Gratitude to Raunak Shrestha and Dr. Faraz Hach for bringing insights to the project.  ... 
doi:10.1093/bioinformatics/btx808 pmid:29253074 pmcid:PMC5925781 fatcat:avbzcngwmzdjxib67thpdirc2u

A Survey on Trajectory Big Data Processing

Amina Belhassena
2018 International Journal of Performability Engineering  
As the massive trajectory data processing exceeds the power of centralized approaches used previously, in this paper, we survey various existing tools used to process large-scale trajectory data in a distributed  ...  Therefore, large-scale trajectory data has received increasing attention in research fields as well as in industry.  ...  Acknowledgements This paper was partially supported by NSFC grant U1509216,61472099, National Sci-Tech Support Plan 2015BAH10F01, the Scientific Research Foundation for the Returned Overseas Chinese Scholars  ... 
doi:10.23940/ijpe.18.02.p13.320333 fatcat:m74w3cfajrbzpamzpghfyrm6am

GeoMatch: Efficient Large-Scale Map Matching on Apache Spark

Ayman Zeidan, Eemil Lagerspetz, Kai Zhao, Petteri Nurmi, Sasu Tarkoma, Huy T. Vo
2018 2018 IEEE International Conference on Big Data (Big Data)  
We develop GeoMatch as a novel, scalable, and efficient big-data pipeline for large-scale map matching on Apache Spark.  ...  We separately assess execution performance and accuracy of map matching and develop a benchmark framework for evaluating large-scale map matching.  ...  SUMMARY We presented GeoMatch, an accurate, scalable, and fast map matching framework for very large spatial datasets, based on Apache Spark.  ... 
doi:10.1109/bigdata.2018.8622488 dblp:conf/bigdataconf/ZeidanLZNTV18 fatcat:fzjza45hoffi7ir5qq2jw6otze

Design and evaluation of small–large outer joins in cloud computing environments

Long Cheng, Ilias Tachmazidis, Spyros Kotoulas, Grigoris Antoniou
2017 Journal of Parallel and Distributed Computing  
Large-scale analytics is a key application area for data processing and parallel computing research. One of the most common (and challenging) operations in this domain is the join.  ...  using existing predicates in data processing frameworks.  ...  Acknowledgments Part of the work was done when Long Cheng worked at TU Dresden, and supported by the German Research Foundation (DFG) within the Cluster of Excellence "Center for Advancing Electronics  ... 
doi:10.1016/j.jpdc.2017.02.007 fatcat:5sxtdgw76beqte2nebm6ukqu24

High Performance Data Engineering Everywhere [article]

Chathura Widanage, Niranda Perera, Vibhatha Abeykoon, Supun Kamburugamuve, Thejaka Amila Kanewala, Hasara Maithree, Pulasthi Wickramasinghe, Ahmet Uyar, Gurhan Gunduz, Geoffrey Fox
2020 arXiv   pre-print
Initial experiments show that Cylon enhances popular tools such as Apache Spark and Dask with major performance improvements for key operations and better component linkages.  ...  All this demands an efficient and highly distributed integrated approach for data processing, yet many of today's popular data analytics tools are unable to satisfy all these requirements at the same time  ...  We thank Intel for their use of the Juliet and Victor systems, and extend our gratitude to the FutureSystems team for their support with the infrastructure.  ... 
arXiv:2007.09589v1 fatcat:5qm4d5e4ajhltkpxbk2z57nxii

Large-Scale Electron Microscopy Image Segmentation in Spark [article]

Stephen M. Plaza, Stuart E. Berg
2016 arXiv   pre-print
To map this connectivity, we acquire thousands of electron microscopy (EM) images with nanometer-scale resolution.  ...  We implement our algorithms in a Spark application which minimizes disk I/O, and apply them to a few large EM datasets, revealing both their effectiveness and scalability.  ...  Bill Katz implemented DVID API that used in our segmentation system.  ... 
arXiv:1604.00385v1 fatcat:g4os7wcqdrenpoxvzg624dhmpe

Minutiae-based fingerprint matching decomposition: Methodology for big data frameworks

Daniel Peralta, Salvador García, Jose M. Benitez, Francisco Herrera
2017 Information Sciences  
parallel in a flexible manner.  ...  The proposal is evaluated over two matching algorithms, two Big Data frameworks (Apache Hadoop and Apache Spark) and two large-scale fingerprint databases, with promising results concerning the identification  ...  The computation of the final matching score for two fingerprints is split at a finer level, defining partial scores between subsets of local structures.  ... 
doi:10.1016/j.ins.2017.05.001 fatcat:djn5v67khndtlpve5nsfvzsgla

Big enterprise registration data imputation: Supporting spatiotemporal analysis of industries in China

Fa Li, Zhipeng Gui, Huayi Wu, Jianya Gong, Yuan Wang, Siyu Tian, Jiawen Zhang
2018 Computers, Environment and Urban Systems  
Big, fine-grained enterprise registration data that includes time and location information enables us to quantitatively analyze, visualize, and understand the patterns of industries at multiple scales  ...  In this paper, we propose a big data imputation workflow based on Apache Spark as well as a bare-metal computing cluster, to impute enterprise registration data.  ...  Thanks to Zelong Yang, Xu Gao, Xi Long, and Maoding Zhang for providing helps in system development, data preprocessing. Thanks to Stephen C. McClure for language assistance.  ... 
doi:10.1016/j.compenvurbsys.2018.01.010 fatcat:vascgayhcjgtnl57vtb4r5nvum

MPIgnite: An MPI-Like Language and Prototype Implementation for Apache Spark [article]

Brandon L. Morris, Anthony Skjellum
2017 arXiv   pre-print
Scale-out parallel processing based on MPI is a 25-year-old standard with at least another decade of preceding history of enabling technologies in the High Performance Computing community.  ...  Newer frameworks such as MapReduce, Hadoop, and Spark represent industrial scalable computing solutions that have received broad adoption because of their comparative simplicity of use, applicability to  ...  Jared Ramsey in his MS thesis at Auburn that motivated this work. Dr. Jonathan Dursi's blog [13] was a strong motivator for this work. Dr. Purushotham Bangalore provided helpful input to this paper.  ... 
arXiv:1707.04788v1 fatcat:aepllhae2zbqbowhwtqomvkirq

Map Reduce: A Survey Paper on Recent Expansion

Shafali Agarwal, Zeba Khanam
2015 International Journal of Advanced Computer Science and Applications  
Our survey paper emphasizes the state of the art in improving the performance of various applications using recent MapReduce models and how it is useful to process large scale dataset.  ...  At the end, a high-level discussion will be done about the enhancement of the MapReduce computation in specific problem area such as Iterative computation, continuous query processing, hybrid database  ...  Spark is another implementation of Map Reduce, useful for performing iterative computation.  ... 
doi:10.14569/ijacsa.2015.060828 fatcat:j25bu2yosne3dnq5umxbrnhjki

Flare: Native Compilation for Heterogeneous Workloads in Apache Spark [article]

Grégory M. Essertel, Ruby Y. Tahboub, James M. Decker, Kevin J. Brown, Kunle Olukotun, Tiark Rompf
2017 arXiv   pre-print
We present Flare: a new back-end for Spark that brings performance closer to the best SQL engines, without giving up the added expressiveness of Spark.  ...  The need for modern data analytics to combine relational, procedural, and map-reduce-style functional processing is widely recognized.  ...  We use the standard TPC-H [47] benchmark with scale factor SF10 for sequential execution, and SF20 and SF100 for parallel execution. Single-Core Running Time.  ... 
arXiv:1703.08219v1 fatcat:3euaew55sbe5le72a6rxatxt7a

SPARQL over GraphX [article]

Besat Kassaie
2017 arXiv   pre-print
We propose a subgraph matching algorithm, compatible with the GraphX programming model to evaluate SPARQL queries.  ...  In this work we take advantage of the graph representation of RDF data and exploit GraphX, a new graph processing system based on Spark.  ...  Finally, we switch into a collection view of the partial results. Then, we use Spark as a data-parallel framework to join the partial results for generating final answers.  ... 
arXiv:1701.03091v1 fatcat:kgxnclgd75glbnnzslynxvwnce