A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2013; you can also visit <a rel="external noopener" href="http://gauss.cs.ucsb.edu/~aydin/sc11_bfs.pdf">the original URL</a>. The file type is <code>application/pdf</code>.
<i title="ACM Press">
<a target="_blank" rel="noopener" href="https://fatcat.wiki/container/zigbcra6rjdivda6lkzknwuo5q" style="color: black;">Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '11</a>
Data-intensive, graph-based computations are pervasive in several scientific applications, and are known to be quite challenging to implement on distributed memory systems. In this work, we explore the design space of parallel algorithms for Breadth-First Search (BFS), a key subroutine in several graph algorithms. We present two highly-tuned parallel approaches for BFS on large parallel systems: a level-synchronous strategy that relies on a simple vertex-based partitioning of the graph, and a two-dimensional sparse matrix partitioning-based approach that mitigates parallel communication overhead. For both approaches, we also present hybrid versions with intra-node multithreading. Our novel hybrid two-dimensional algorithm reduces communication times by up to a factor of 3.5, relative to a common vertex-based approach. Our experimental study identifies execution regimes in which these approaches will be competitive, and we demonstrate extremely high performance on leading distributed-memory parallel systems. For instance, for a 40,000-core parallel execution on Hopper, an AMD Magny-Cours based system, we achieve a BFS performance rate of 17.8 billion edge visits per second on an undirected graph of 4.3 billion vertices and 68.7 billion edges with skewed degree distribution.
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1145/2063384.2063471">doi:10.1145/2063384.2063471</a> <a target="_blank" rel="external noopener" href="https://dblp.org/rec/conf/sc/BulucM11.html">dblp:conf/sc/BulucM11</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/cn4tlzqd4ndqlhekngx76hjvhy">fatcat:cn4tlzqd4ndqlhekngx76hjvhy</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20130912212131/http://gauss.cs.ucsb.edu/~aydin/sc11_bfs.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/94/6b/946b4b14e1a73927c510eb6ef9fa0cfc3771c352.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1145/2063384.2063471"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="external alternate icon"></i> acm.org </button> </a>