2,225 Hits in 9.8 sec

Making pull-based graph processing performant

Samuel Grossman, Heiner Litz, Christos Kozyrakis
2018 Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming - PPoPP '18  
Graph processing engines following either the push-based or pull-based pattern conceptually consist of a two-level nested loop structure.  ...  Parallelizing and vectorizing these loops is critical for high overall performance and memory bandwidth utilization.  ...  Acknowledgements We thank our anonymous reviewers and our shepherd, Michelle Goodstein, for their feedback and assistance in improving our paper.  ... 
doi:10.1145/3178487.3178506 dblp:conf/ppopp/GrossmanLK18 fatcat:up3zsecpynhzjmum53l4kt3ft4

Optimal algebraic Breadth-First Search for sparse graphs [article]

Paul Burkhardt
2021 arXiv   pre-print
An exemplar for these approaches is Breadth-First Search (BFS).  ...  Compared to a leading GraphBLAS library our method achieves up to 24x faster sequential time and for parallel computation it can be 17x faster on large graphs and 12x faster on large-diameter graphs.  ...  Harris and Christopher H. Long for their helpful comments. The author also thanks the anonymous reviewers for their critical suggestions that improved the paper.  ... 
arXiv:1906.03113v4 fatcat:lr7odruq6jfupk32ryerzt7ina

Presto

Shivaram Venkataraman, Erik Bodzsar, Indrajit Roy, Alvin AuYoung, Robert S. Schreiber
2013 Proceedings of the 8th ACM European Conference on Computer Systems - EuroSys '13  
In this paper we show that array-based languages such as R [3] are suitable for implementing complex algorithms and can outperform current data parallel solutions.  ...  It is cumbersome to write machine learning and graph algorithms in data-parallel models such as MapReduce and Dryad.  ...  Aurojit Panda and Evan Sparks suggested improvements to earlier drafts of this paper.  ... 
doi:10.1145/2465351.2465371 dblp:conf/eurosys/VenkataramanBRAS13 fatcat:k7pzvhvpbfcvdjxyswgomtfnwy

Distributed Sparse Matrices for Very High Level Languages [chapter]

John R. Gilbert, Steve Reinhardt, Viral B. Shah
2008 Advances in Computers  
We demonstrate the versatility of our infrastructure by using it to implement a benchmark that creates and manipulates large graphs.  ...  Parallel computing is becoming ubiquitous, specifically due to the advent of multi-core architectures.  ...  We demonstrated the effectiveness of our tools by implementing a graph analysis benchmark in Star-P, which scales to large problem sizes on large processor counts.  ... 
doi:10.1016/s0065-2458(08)00005-3 fatcat:hienrbxdu5hdjbnadvnulvl7ku

Graph Reachability on Parallel Many-Core Architectures

Stefano Quer, Andrea Calabrese
2020 Computation  
Unfortunately, the original algorithm executes a sequence of depth-first visits which are intrinsically recursive and cannot be efficiently implemented on parallel systems.  ...  For that reason, we design an alternative approach in which a sequence of breadth-first visits substitute the original depth-first traversal to generate the labeling, and in which a high number of concurrent  ...  Acknowledgments: The authors wish to thank Antonio Caiazza for implementing the first version of the tool and performing the initial experimental evaluation.  ... 
doi:10.3390/computation8040103 fatcat:dnqybvtlsvh5pc2pe5v7fjvmna

Lifting sequential graph algorithms for distributed-memory parallel computation

Douglas Gregor, Andrew Lumsdaine
2005 SIGPLAN notices  
We illustrate our approach by describing the process as applied to one of the core algorithms in the BGL, breadth-first search.  ...  The BGL consists of a rich set of generic graph algorithms and supporting data structures, but it was not originally designed with parallelism in mind.  ...  The authors also thank Brian Barrett for his help with performance analysis and Chris Mueller and Ronald Garcia for valuable suggestions.  ... 
doi:10.1145/1103845.1094844 fatcat:qstidbshurdcbejvuuy3jwveuy

Lifting sequential graph algorithms for distributed-memory parallel computation

Douglas Gregor, Andrew Lumsdaine
2005 Proceedings of the 20th annual ACM SIGPLAN conference on Object oriented programming systems languages and applications - OOPSLA '05  
We illustrate our approach by describing the process as applied to one of the core algorithms in the BGL, breadth-first search.  ...  The BGL consists of a rich set of generic graph algorithms and supporting data structures, but it was not originally designed with parallelism in mind.  ...  The authors also thank Brian Barrett for his help with performance analysis and Chris Mueller and Ronald Garcia for valuable suggestions.  ... 
doi:10.1145/1094811.1094844 dblp:conf/oopsla/GregorL05 fatcat:d5ym5io6vbbj3oxxth3mjeszm4

Parallel breadth-first search on distributed memory systems

Aydin Buluç, Kamesh Madduri
2011 Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '11  
In this work, we explore the design space of parallel algorithms for Breadth-First Search (BFS), a key subroutine in several graph algorithms.  ...  We present two highly-tuned parallel approaches for BFS on large parallel systems: a level-synchronous strategy that relies on a simple vertex-based partitioning of the graph, and a two-dimensional sparse  ...  Breadth-First Search Overview. Preliminaries: Given a distinguished "source vertex" s, Breadth-First Search (BFS) systematically explores the graph G to discover every vertex that is reachable from s.  ... 
doi:10.1145/2063384.2063471 dblp:conf/sc/BulucM11 fatcat:cn4tlzqd4ndqlhekngx76hjvhy

New Approach of Bellman Ford Algorithm on GPU using Compute Unified Design Architecture (CUDA)

Pankhari Agarwal, Maitreyee Dutta
2015 International Journal of Computer Applications  
To process them we present a fundamental single source shortest path (SSSP) algorithm i.e. Bellman Ford algorithm.  ...  Large graphs involving millions of vertices are common in many practical applications and are challenging to process.  ...  A large number of graph operations are present, such as minimum spanning tree, breadth-first search, shortest path etc., having applications in different problem domains like VLSI chip layout [1], phylogeny  ... 
doi:10.5120/19375-1027 fatcat:wnd3f4qyz5etposivhyujt546u

Accelerating all-pairs shortest path using a message-passing reconfigurable architecture

Osama G. Attia, Alex Grieve, Kevin R. Townsend, Phillip Jones, Joseph Zambreno
2015 2015 International Conference on ReConFigurable Computing and FPGAs (ReConFig)  
In this paper, we study the design and implementation of a reconfigurable architecture for graph processing algorithms.  ...  We take advantage of our architecture to showcase a parallel implementation of the all-pairs shortest path algorithm (APSP) for unweighted directed graphs.  ...  ACKNOWLEDGMENT This work was supported in part by the National Science Foundation (NSF), under awards CNS-1116810 and CCF-1149539.  ... 
doi:10.1109/reconfig.2015.7393284 dblp:conf/reconfig/AttiaGTJZ15 fatcat:aqg3676dvrcinaxmpgjj5nhq4y

A Systematic Survey of General Sparse Matrix-Matrix Multiplication [article]

Jianhua Gao, Weixing Ji, Zhaonian Tan, Yueyan Zhao
2020 arXiv   pre-print
Existing optimization techniques have been grouped into different categories based on their target problems and architectures.  ...  Based on our findings, we highlight future research directions and how future studies can leverage our findings to encourage better design and implementation.  ...  Multi-Source Breadth-first Search Breadth-first search (BFS) is a key and fundamental subroutine in many graph analysis algorithms, such as finding connectivity, finding the shortest path, and finding  ... 
arXiv:2002.11273v1 fatcat:5ppccisodvaevdhvfhawvjam5q

Parallel Breadth-First Search on Distributed Memory Systems [article]

Aydin Buluç, Kamesh Madduri
2011 arXiv   pre-print
In this work, we explore the design space of parallel algorithms for Breadth-First Search (BFS), a key subroutine in several graph algorithms.  ...  We present two highly-tuned parallel approaches for BFS on large parallel systems: a level-synchronous strategy that relies on a simple vertex-based partitioning of the graph, and a two-dimensional sparse  ...  John Shalf and Nick Wright provided generous technical and moral support during the project.  ... 
arXiv:1104.4518v2 fatcat:a7nvtwil35dbtohgpnsfldeeki

Regularizing graph centrality computations

Ahmet Erdem Sarıyüce, Erik Saule, Kamer Kaya, Ümit V. Çatalyürek
2015 Journal of Parallel and Distributed Computing  
Highlights: • We propose parallel algorithms to compute centrality on accelerators. • We apply multiple breadth-first search operations simultaneously. • Vectorization is applied to make the closeness  ...  5.9 on CPU architectures, 70.4 on GPU architectures and 21.0 on Intel Xeon Phi.  ...  We are also grateful to Intel for providing us the Intel Xeon Phi card used in the experiments and to NVIDIA for providing us the Tesla K20 card.  ... 
doi:10.1016/j.jpdc.2014.07.006 fatcat:ezcfobrnszbw3eqcm7gzp4reve

Scalable GPU graph traversal

Duane Merrill, Michael Garland, Andrew Grimshaw
2012 Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming - PPoPP '12  
Breadth-first search (BFS) is a core primitive for graph traversal and a basis for many higher-level graph analysis algorithms.  ...  Our implementation delivers excellent performance on diverse graphs, achieving traversal rates in excess of 3.3 billion and 8.3 billion traversed edges per second using single and quad-GPU configurations  ...  In this paper, we explore the parallelization of one fundamental graph algorithm on GPUs: breadth-first search (BFS).  ... 
doi:10.1145/2145816.2145832 dblp:conf/ppopp/MerrillGG12 fatcat:dn7judc27nawnpf7iwjpmh3vqa

Scalable GPU graph traversal

Duane Merrill, Michael Garland, Andrew Grimshaw
2012 SIGPLAN notices  
Breadth-first search (BFS) is a core primitive for graph traversal and a basis for many higher-level graph analysis algorithms.  ...  Our implementation delivers excellent performance on diverse graphs, achieving traversal rates in excess of 3.3 billion and 8.3 billion traversed edges per second using single and quad-GPU configurations  ...  In this paper, we explore the parallelization of one fundamental graph algorithm on GPUs: breadth-first search (BFS).  ... 
doi:10.1145/2370036.2145832 fatcat:u36wmlchhnfljiivpf3u56xjxa