A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2019; you can also visit <a rel="external noopener" href="https://static.aminer.org/pdf/20170130/pdfs/ppopp/kthfx5cdtlr4adbcgpumanxjper32eiv.pdf">the original URL</a>. The file type is <code>application/pdf</code>.
<i title="ACM Press">
<a target="_blank" rel="noopener" href="https://fatcat.wiki/container/a3hx753rrfdorizx3a3ovuee4y" style="color: black;">Proceedings of the 9th Annual Workshop on General Purpose Processing using Graphics Processing Unit - GPGPU '16</a>
</i>
We investigate applying general-purpose join algorithms to the triangle listing problem on heterogeneous systems that feature a multi-core CPU and multiple GPUs. In particular, we consider an out-of-core context where graph data are available on secondary storage such as a solid-state disk (SSD) and may not fit in the CPU main memory or GPU device memory. We focus on Leapfrog Triejoin (LFTJ), a recently proposed, worst-case optimal algorithm, and present "boxing": a novel yet conceptually simple approach for partitioning and feeding out-of-core input data to LFTJ. The "boxing" algorithm integrates well with a GPU-optimized LFTJ algorithm for triangle listing. We achieve significant performance gains on a heterogeneous system composed of GPUs and a CPU by utilizing the massively parallel computation capability of GPUs. Our experimental evaluations on real-world and synthetic data sets (some of which contain more than 1.2 billion edges) show that out-of-core LFTJ is competitive with specialized graph algorithms for triangle listing. By using one or two GPUs, we achieve additional speedups on the same graphs. CCS Concepts: • Information systems → Relational parallel and distributed DBMSs; • Theory of computation → Data structures and algorithms for data management; Massively parallel algorithms.
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1145/2884045.2884054">doi:10.1145/2884045.2884054</a> <a target="_blank" rel="external noopener" href="https://dblp.org/rec/conf/ppopp/ZinnWWAY16.html">dblp:conf/ppopp/ZinnWWAY16</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/yo7ln5ce6ze6pmrmj3x7e33yhe">fatcat:yo7ln5ce6ze6pmrmj3x7e33yhe</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20190218135044/https://static.aminer.org/pdf/20170130/pdfs/ppopp/kthfx5cdtlr4adbcgpumanxjper32eiv.pdf" title="fulltext PDF download">Web Archive [PDF]</a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1145/2884045.2884054">acm.org</a>