
GraphX: A Resilient Distributed Graph System on Spark

Reynold S. Xin, Joseph E. Gonzalez, Michael J. Franklin, Ion Stoica
2013 First International Workshop on Graph Data Management Experiences and Systems - GRADES '13  
We introduce GraphX, which combines the advantages of both data-parallel and graph-parallel systems by efficiently expressing graph computation within the Spark data-parallel framework.  ...  Finally, by exploiting the Scala foundation of Spark, we enable users to interactively load, transform, and compute on massive graphs.  ...  ACKNOWLEDGMENTS We thank Haijie Gu and other members of the GraphLab team for discussions on the GraphX prototype.  ... 
doi:10.1145/2484425.2484427 dblp:conf/sigmod/XinGFS13 fatcat:qwlrklz6wbgzhlogl5bqkvrskm

SPARQL over GraphX [article]

Besat Kassaie
2017 arXiv   pre-print
In this work we take advantage of the graph representation of RDF data and exploit GraphX, a new graph processing system based on Spark.  ...  On the other hand, the enormity of datasets that are graph in nature such as social network data, has led the database community to develop graph-parallel processing systems to support iterative graph  ...  GraphX GraphX is a lightweight graph processing library on top of Spark, a general-purpose data flow framework.  ... 
arXiv:1701.03091v1 fatcat:kgxnclgd75glbnnzslynxvwnce

Cost Model for Pregel on GraphX [chapter]

Rohit Kumar, Alberto Abelló, Toon Calders
2017 Lecture Notes in Computer Science  
The graph partitioning strategy plays a vital role in the overall execution of an algorithm in a distributed graph processing system.  ...  In this paper, we help users choosing a suitable partitioning strategy for algorithms based on the Pregel model by providing a cost model for the Pregel implementation in Spark-GraphX.  ...  Pregel Model in GraphX GraphX is built on top of Apache Spark which uses a distributed data structure called Resilient Distributed Datasets (RDD) [16].  ... 
doi:10.1007/978-3-319-66917-5_11 fatcat:oxiqualne5g73ibabvpuqrzhpe

GraphX: Unifying Data-Parallel and Graph-Parallel Analytics [article]

Reynold S. Xin, Daniel Crankshaw, Ankur Dave, Joseph E. Gonzalez, Michael J. Franklin, Ion Stoica
2014 arXiv   pre-print
We evaluate GraphX on real-world graphs and workloads and demonstrate that GraphX achieves comparable performance as specialized graph computation systems, while outperforming them in end-to-end graph  ...  To address these challenges we introduce GraphX, a distributed graph computation framework that unifies graph-parallel and data-parallel computation.  ...  Distributed Graph Representation GraphX represents graphs internally using two Spark distributed collections (RDDs): an edge collection and a vertex collection.  ... 
arXiv:1402.2394v1 fatcat:xxx2uvx6arbgdnjqiw7igqztnm

S2X: Graph-Parallel Querying of RDF with GraphX [chapter]

Alexander Schätzle, Martin Przyjaciel-Zablocki, Thorsten Berberich, Georg Lausen
2016 Lecture Notes in Computer Science  
Recently, GraphX was published as a graph abstraction on top of Spark, an in-memory cluster computing system.  ...  It allows to seamlessly combine graph-parallel and data-parallel computation in a single system, an unique feature not available in other systems.  ...  Graph-Parallel Computation with GraphX Spark [13] is a general-purpose in-memory cluster computing system that can run on Hadoop.  ... 
doi:10.1007/978-3-319-41576-5_12 fatcat:5i4a3tmulfagbcd6kwx6juhvf4

GraphX: Graph Processing in a Distributed Dataflow Framework

Joseph E. Gonzalez, Reynold S. Xin, Ankur Dave, Daniel Crankshaw, Michael J. Franklin, Ion Stoica
2014 USENIX Symposium on Operating Systems Design and Implementation  
to the Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation is sponsored by USENIX.  ...  Acknowledgments We would like to thank Matei Zaharia, Peter Bailis, and our colleagues in the AMPLab, Databricks, and GraphLab for their help in building and presenting the GraphX system.  ...  Distributed Graph Representation GraphX represents graphs internally as a pair of vertex and edge collections built on the Spark RDD abstraction.  ... 
dblp:conf/osdi/GonzalezXDCFS14 fatcat:figkr5l2h5gurga5pkvku4ywna

Handling Big Data in medical imaging: Iterative reconstruction with large-scale automated parallel computation

Jae H. Lee, Yushu Yao, Uttam Shrestha, Grant T. Gullberg, Youngho Seo
2014 2014 IEEE Nuclear Science Symposium and Medical Imaging Conference (NSS/MIC)  
GraphX is a graph analytic system running on top of Spark to handle graph and sparse linear algebra operations in parallel.  ...  Spark/GraphX.  ...  GraphX is a large-scale graph-parallel system built on top of Spark.  ... 
doi:10.1109/nssmic.2014.7430758 pmid:27081299 pmcid:PMC4829376 fatcat:xvjoigxcr5ecpmg2ob5mxdor6a

Analysis of SSD Utilization by Graph Processing Systems

Haider Qutbuddin, Dr. Syed Saif-ur-Rahman
2015 Journal of Independent Studies and Research - Computing  
To address all these challenges, distributed graph processing frameworks were introduced which inherited both the properties of graph parallel systems and data parallel system.  ...  Graph Processing Systems are highly productive when it comes to graph data. While using data parallel approach, it could not exploit common characteristics of a graph computation workload.  ...  APACHE SPARK GraphX is actualized on top of Spark, a broadly utilized data parallel system, Like Hadoop Map reduce. A Spark group comprises of a solitary driver hub and numerous laborer hubs.  ... 
doi:10.31645/jisrc/(2015).13.1.0005 fatcat:i2gau5dsffa7hcjezhk2jclj74

Efficient Distributed SPARQL Queries on Apache Spark

Saleh Albahli
2019 International Journal of Advanced Computer Science and Applications  
We further experimented with the performance of queries using distributed SPARQL query on Apache Spark GraphX and studied different stages involved in this pipeline.  ...  The execution of distributed SPARQL query on Apache Spark GraphX helped us study its performance and gave insights into which stages of the pipeline can be improved.  ...  On the other side, Apache Spark as a MapReduce framework proposes parallel computation using distributed main-memory data abstraction i.e. 1) Resilient Distributed Data Sets (RDD), a distributed lineage  ... 
doi:10.14569/ijacsa.2019.0100874 fatcat:lolh5mvrhrgttjdfm3uqbrqdam

RDF Query Answering Using Apache Spark: Review and Assessment

Giannis Agathangelos, Georgia Troullinou, Haridimos Kondylakis, Kostas Stefanidis, Dimitris Plexousakis
2018 2018 IEEE 34th International Conference on Data Engineering Workshops (ICDEW)  
To this direction, Apache Spark is one of the most active big-data approaches, with more and more systems adopting it, for efficient, distributed data management.  ...  We discuss on the characteristics and the key dimension of such systems, we describe novel ideas in the area, and the corresponding drawbacks, and provide directions for future work.  ...  Spark GraphX [28] is a library enabling graph processing by extending the RDD abstraction and hence introduces a new feature called Resilient Distributed Graph or RDG.  ... 
doi:10.1109/icdew.2018.00016 dblp:conf/icde/AgathangelosTKS18 fatcat:ji7puutbmfbaln667f5nvz3uzq

Big data analytics on Apache Spark

Salman Salloum, Ruslan Dautov, Xiaojun Chen, Patrick Xiaogang Peng, Joshua Zhexue Huang
2016 International Journal of Data Science and Analytics  
In this paper, we present a technical review on big data analytics using Apache Spark. This review focuses on the key components, abstractions and features of Apache Spark.  ...  In addition, we highlight some research and development directions on Apache Spark for big data analytics.  ...  Graph processing Resilient Distributed Graphs (RDG) [83] or property graphs, an RDD extension for graph processing in Spark GraphX.  ... 
doi:10.1007/s41060-016-0027-9 dblp:journals/ijdsa/SalloumD0PH16 fatcat:gtzw3aqupnhxvcjbefovrnfhne

Querying massive RDF data using Spark

Mouad Banane, Hassan II University, Morocco
2019 International Journal of Advanced Trends in Computer Science and Engineering  
In this paper, we propose a new solution based on Apache Spark for massive querying and RDF data.  ...  This new system allows the processing of complex SPARQL queries on large volumes of RDF data stored in the Hadoop file system or a NoSQL database management system.  ...  GraphX extends Spark's RDDs by introducing the Resilient Distributed Dataset Graph, an oriented multi-graph with properties attached to nodes and arcs.  ... 
doi:10.30534/ijatcse/2019/68842019 fatcat:tu3btrw63ba2rfdu3iz6btquuu

In Search of Actionable Patterns of Lowest Cost - A Scalable Graph Method

Angelina A. Tzacheva, Arunkumar Bagavathi, Aabir K. Datta
2018 International Journal of Database Management Systems  
Action Rules are rule based systems for discovering actionable patterns which are hidden in a large dataset. All recommended patterns from Action Rules incur some form of cost to the users.  ...  In this work, we introduce the notion of Action Graph and propose an algorithm to search the Action Graph for actionable patterns of lowest cost.  ...  Spark GraphX Spark, with its efficiency in Resilient Distributed Datasets (RDDs) help wide variety of applications such as Machine Learning with MLlib library [17], Graph Analysis with GraphX library  ... 
doi:10.5121/ijdms.2018.10301 fatcat:p4culbx6ebgq3nihqiklyyzdvy

Time-evolving graph processing at scale

Anand Padmanabha Iyer, Li Erran Li, Tathagata Das, Ion Stoica
2016 Proceedings of the Fourth International Workshop on Graph Data Management Experiences and Systems - GRADES '16  
We then introduce G T , a time-evolving graph processing framework built on top of Apache Spark, a widely used distributed dataflow system.  ...  In this paper, we represent most computations on time evolving graphs into (1) a stream of consistent and resilient graph snapshots, and (2) a small set of operators that manipulate such streams of snapshots  ...  Resilient Distributed Datasets (RDDs) Resilient Distributed Datasets (RDDs) is the main abstraction provided by Apache Spark [16] , a data-parallel computation engine that supports general DAG computations  ... 
doi:10.1145/2960414.2960419 dblp:conf/grades/IyerLDS16 fatcat:tks4gkhimzhtzocriqu3vxwrle

MESH: A Flexible Distributed Hypergraph Processing System [article]

Benjamin Heintz, Rankyung Hong, Shivangi Singh, Gaurav Khandelwal, Corey Tesdahl, Abhishek Chandra
2019 arXiv   pre-print
We implement MESH on top of the popular GraphX graph processing framework in Apache Spark.  ...  We further show that it is competitive in performance to HyperX, another hypergraph processing system based on Spark, while providing a much simpler implementation (requiring about 5X fewer lines of code  ...  GraphX [1], built upon Apache Spark [10], adopted a similar model while inheriting the scalability and fault tolerance of Spark's Resilient Distributed Datasets (RDD).  ... 
arXiv:1904.00549v2 fatcat:lgfgdpjnvjecxnwcrsm35pexpa