Computing Graph Neural Networks: A Survey from Algorithms to Accelerators
[article]
2021
arXiv
pre-print
On the other hand, an in-depth analysis of current software and hardware acceleration schemes is provided, from which a hardware-software, graph-aware, and communication-centric vision for GNN accelerators ...
Besides their novelty, GNNs are hard to compute due to their dependence on the input graph, their combination of dense and very sparse operations, or the need to scale to huge graphs in some applications ...
GReTA also discusses partitioning briefly and exemplifies it in a hardware accelerator called GRIP [85], which is described in the next section.
Paddle Graph Learning (PGL). ...
arXiv:2010.00130v3
fatcat:u5bcmjodcfdh7pew4nssjemdba
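The survey snippet stresses that GNNs combine dense and very sparse operations. A single GCN-style layer makes that mix concrete: neighbor aggregation is a sparse-times-dense product over the adjacency, while the feature transform is a plain dense product. The sketch below is a minimal NumPy/SciPy illustration of that split; it is not code from the survey, and the toy graph, feature sizes, and the simple ReLU are assumptions made for the example.

```python
import numpy as np
from scipy.sparse import csr_matrix

def gcn_layer(adj, features, weights):
    """One GCN-style layer: sparse neighbor aggregation followed by a dense transform.

    adj      : sparse (normalized) adjacency, shape (N, N) -- the very sparse operand
    features : dense node feature matrix, shape (N, F)
    weights  : dense layer weights, shape (F, F_out)
    """
    aggregated = adj @ features          # sparse x dense (SpMM): irregular, memory-bound
    transformed = aggregated @ weights   # dense x dense (GEMM): regular, compute-bound
    return np.maximum(transformed, 0.0)  # ReLU

# Toy usage: 4-node undirected path graph, 3-dim features, 2-dim output
rows = [0, 1, 1, 2, 2, 3]
cols = [1, 0, 2, 1, 3, 2]
adj = csr_matrix((np.ones(6), (rows, cols)), shape=(4, 4))
out = gcn_layer(adj, np.random.rand(4, 3), np.random.rand(3, 2))  # shape (4, 2)
```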
CoSPARSE: A Software and Hardware Reconfigurable SpMV Framework for Graph Analytics
2021
2021 58th ACM/IEEE Design Automation Conference (DAC)
In this work, we propose a novel framework, CoSPARSE, that employs hardware and software reconfiguration as a synergistic solution to accelerate SpMV-based graph analytics algorithms. ...
This variability has been used to improve performance by either dynamically switching algorithms between iterations (software) or designing custom accelerators (hardware) for graph analytics algorithms. ...
Graph Analytics Algorithms on CoSPARSE: Hardware-accelerated graph processing solutions often require programmers with in-depth architectural knowledge of the hardware to fully exploit the available performance ...
doi:10.1109/dac18074.2021.9586114
fatcat:pukdkrjnyndtpcnwyusebclpjm
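CoSPARSE targets SpMV-based graph analytics. As a reminder of why SpMV is the core kernel there, one PageRank iteration reduces to a single sparse matrix-vector product; the sketch below shows that reduction in plain SciPy and is not CoSPARSE's interface. The damping factor and the toy cycle graph are illustrative assumptions.

```python
import numpy as np
from scipy.sparse import csr_matrix

def pagerank_iteration(adj_t, rank, damping=0.85):
    """One PageRank step as SpMV: rank_new = d * (A^T @ rank) + (1 - d) / N.

    adj_t : column-stochastic transposed adjacency (N x N), sparse
    rank  : current rank vector, shape (N,)
    """
    n = rank.shape[0]
    return damping * (adj_t @ rank) + (1.0 - damping) / n   # the SpMV is adj_t @ rank

# Toy usage: 3-node directed cycle (each vertex has out-degree 1)
adj_t = csr_matrix(np.array([[0., 0., 1.],
                             [1., 0., 0.],
                             [0., 1., 0.]]))
rank = np.full(3, 1.0 / 3.0)
for _ in range(20):
    rank = pagerank_iteration(adj_t, rank)   # stays at the uniform vector here
```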
Computing Graph Neural Networks: A Survey from Algorithms to Accelerators
2022
ACM Computing Surveys
On the other hand, an in-depth analysis of current software and hardware acceleration schemes is provided, from which a hardware-software, graph-aware, and communication-centric vision for GNN accelerators ...
Besides their novelty, GNNs are hard to compute due to their dependence on the input graph, their combination of dense and very sparse operations, or the need to scale to huge graphs in some applications ...
GReTA also discusses partitioning briefly and exemplifies it in a hardware accelerator called GRIP [85], which is described in the next section. Paddle Graph Learning. ...
doi:10.1145/3477141
fatcat:6ef4jh3hrvefnoytckqyyous3m
Sextans: A Streaming Accelerator for General-Purpose Sparse-Matrix Dense-Matrix Multiplication
[article]
2022
arXiv
pre-print
and (4) hardware flexibility to enable prototyping the hardware once to support SpMMs of different sizes as a general-purpose accelerator. ...
Sparse-Matrix Dense-Matrix multiplication (SpMM) is the key operator for a wide range of applications, including scientific computing, graph processing, and deep learning. ...
With the Sextans HFlex SpMM processing method, the parameters passed to the hardware accelerator are fixed. ...
arXiv:2109.11081v2
fatcat:wlf7lraenzd7tkb3g7h6mumqei
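Sextans accelerates SpMM. For readers who want the access pattern in front of them, the sketch below spells out a row-wise CSR SpMM in plain Python and checks it against SciPy; it illustrates the irregular, data-dependent gathers of dense rows that such accelerators must stream, and is not Sextans' HFlex method. The matrix sizes and density are arbitrary.

```python
import numpy as np
from scipy.sparse import random as sparse_random

def spmm_csr(indptr, indices, data, dense):
    """Row-wise CSR SpMM: C = A_sparse @ B_dense, written out to expose the
    irregular gathers of rows of B."""
    n_rows = len(indptr) - 1
    out = np.zeros((n_rows, dense.shape[1]))
    for i in range(n_rows):                        # one output row per sparse row
        for k in range(indptr[i], indptr[i + 1]):  # only the nonzeros of row i
            out[i, :] += data[k] * dense[indices[k], :]   # gather one row of B
    return out

# Toy usage: random 8x6 sparse matrix (10% density) times a 6x4 dense matrix
A = sparse_random(8, 6, density=0.1, format="csr", random_state=0)
B = np.random.rand(6, 4)
C = spmm_csr(A.indptr, A.indices, A.data, B)
assert np.allclose(C, A @ B)   # matches SciPy's SpMM
```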
ReGraph: Scaling Graph Processing on HBM-enabled FPGAs with Heterogeneous Pipelines
[article]
2022
arXiv
pre-print
Our heterogeneous architecture comprises two types of pipelines: Little pipelines to process dense partitions with good locality and Big pipelines to process sparse partitions with the extremely poor ...
We also found that the diverse workloads can be easily classified into two types, namely dense and sparse partitions. ...
• We classify graph partitions into dense and sparse partitions by grouping vertices based on their degrees; the dense partitions have high-degree vertices, with good locality, and the sparse partitions ...
arXiv:2203.02676v1
fatcat:yvkfsuxstnhczbjevtnnqpuzge
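ReGraph's snippet describes classifying partitions into dense and sparse ones by grouping vertices on their degrees. The sketch below shows that grouping step in SciPy; the degree threshold is an illustrative knob rather than a value from the paper, and ReGraph's actual partitioning and pipeline assignment are more involved.

```python
import numpy as np
from scipy.sparse import csr_matrix

def split_by_degree(adj, threshold):
    """Group vertices into a high-degree ('dense', good locality) set and a
    low-degree ('sparse') set, mirroring the dense/sparse partition idea."""
    degrees = np.diff(adj.indptr)                 # out-degree of each vertex in CSR
    dense_vertices = np.where(degrees >= threshold)[0]
    sparse_vertices = np.where(degrees < threshold)[0]
    return dense_vertices, sparse_vertices

# Toy usage: star graph where vertex 0 connects to every other vertex
n = 6
rows = [0] * (n - 1) + list(range(1, n))
cols = list(range(1, n)) + [0] * (n - 1)
adj = csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(n, n))
dense_v, sparse_v = split_by_degree(adj, threshold=3)   # vertex 0 ends up in dense_v
```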
First-Generation Inference Accelerator Deployment at Facebook
[article]
2021
arXiv
pre-print
This platform, with six low-power accelerator cards alongside a single-socket host CPU, allows us to serve models of high complexity that cannot be easily or efficiently run on CPUs. ...
We describe the inference accelerator platform ecosystem we developed and deployed at Facebook: both hardware, through Open Compute Platform (OCP), and software framework and tooling, through Pytorch/Caffe2 ...
must be co-located on the device performing the dense compute partition. ...
arXiv:2107.04140v3
fatcat:fpmlpb5kgzf7tnti3wpfqq4u4y
HPIPE: Heterogeneous Layer-Pipelined and Sparse-Aware CNN Inference for FPGAs
[article]
2020
arXiv
pre-print
Instead of having generic processing elements that together process one layer at a time, our network compiler statically partitions available device resources and builds custom-tailored hardware for each ...
We evaluate the performance of our architecture on both sparse Resnet-50 and dense MobileNet Imagenet classifiers on a Stratix 10 2800 FPGA. ...
We integrate this with a PCIe core and validate network accuracy and accelerator throughput in physical hardware. ...
arXiv:2007.10451v1
fatcat:subhmlobnrdfxk6dq5fhz7o2dm
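HPIPE's snippet says the network compiler statically partitions device resources and builds per-layer hardware. A crude way to picture that static partitioning is to hand each layer a DSP budget proportional to its MAC count so the layer pipelines run in balance; the sketch below is only that back-of-the-envelope idea with made-up numbers, not HPIPE's actual allocator.

```python
def allocate_dsps(layer_macs, total_dsps):
    """Give each layer a DSP share proportional to its MAC count so that all
    statically built layer pipelines finish an input at roughly the same rate."""
    total_macs = sum(layer_macs)
    return [max(1, round(total_dsps * m / total_macs)) for m in layer_macs]

# Toy usage: three layers with very different compute on a device with 512 DSPs
print(allocate_dsps([120_000, 30_000, 6_000], total_dsps=512))
# -> [394, 98, 20]
```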
H-GCN: A Graph Convolutional Network Accelerator on Versal ACAP Architecture
[article]
2022
arXiv
pre-print
Compared with other Machine Learning (ML) modalities, the acceleration of Graph Neural Networks (GNNs) is more challenging due to the irregularity and heterogeneity derived from graph topologies. ...
Compared with state-of-the-art GCN accelerators, H-GCN achieves, on average, speedups of 1.1~2.3X. ...
BoostGCN [13] uses a hardware-aware, partition-centric feature aggregation scheme to increase on-chip data reuse. ...
arXiv:2206.13734v1
fatcat:bpfurud6srawli6jmu2jx63vca
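The H-GCN snippet mentions BoostGCN's partition-centric feature aggregation for on-chip data reuse. The general idea of partitioning the aggregation so a feature tile stays resident while its slice of the adjacency is streamed can be sketched as a blocked SpMM; the code below is that generic blocked form, not BoostGCN's or H-GCN's scheme, and the tile size is arbitrary.

```python
import numpy as np
from scipy.sparse import csr_matrix

def tiled_aggregation(adj, features, tile):
    """Blocked feature aggregation: process source vertices tile by tile so each
    feature tile can be held in fast memory while its adjacency slice streams by."""
    out = np.zeros((adj.shape[0], features.shape[1]))
    for start in range(0, adj.shape[1], tile):
        stop = min(start + tile, adj.shape[1])
        out += adj[:, start:stop] @ features[start:stop, :]   # reuse one feature tile
    return out

# Toy usage: 6-node ring graph, 4-dim features, tiles of 2 source vertices
rows, cols = [0, 1, 2, 3, 4, 5], [1, 2, 3, 4, 5, 0]
adj = csr_matrix((np.ones(6), (rows, cols)), shape=(6, 6))
X = np.random.rand(6, 4)
assert np.allclose(tiled_aggregation(adj, X, tile=2), adj @ X)
```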
GCoD: Graph Convolutional Network Acceleration via Dedicated Algorithm and Accelerator Co-Design
[article]
2022
arXiv
pre-print
On the hardware level, we further develop a dedicated two-pronged accelerator with a separate engine to process each of the aforementioned denser and sparser workloads, further boosting the overall utilization and acceleration efficiency. ...
Cheng Wan at Rice University for his help and discussion in the graph reordering algorithm. ...
arXiv:2112.11594v2
fatcat:ivnelobzlbgrzgtb4yf5okew3a
Copernicus: Characterizing the Performance Implications of Compression Formats Used in Sparse Workloads
[article]
2021
arXiv
pre-print
The performance implications of using various formats along with DSAs, however, have not been extensively studied by prior work. ...
The primary challenge with sparse matrices has been efficiently storing and transferring data, for which many sparse formats have been proposed to significantly eliminate zero entries. ...
In such cases, preprocessing the sparse data to a format compatible with a hardware accelerator is highly suggested. Figure 7 illustrates the impact of partition size on ...
arXiv:2011.10932v2
fatcat:dkc77jbaujailkbtepsmedvebq
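The Copernicus snippet is about compression formats that drop zero entries and about preprocessing data into an accelerator-friendly format. The sketch below builds the classic CSR triple (values, column indices, row pointers) from a dense matrix to show what such a format stores; CSR is just one of the many formats the paper characterizes, and the toy matrix is made up.

```python
import numpy as np

def dense_to_csr(mat):
    """Convert a dense matrix to CSR (values, column indices, row pointers),
    keeping only the nonzero entries."""
    values, col_idx, row_ptr = [], [], [0]
    for row in mat:
        nz = np.nonzero(row)[0]
        values.extend(row[nz].tolist())
        col_idx.extend(nz.tolist())
        row_ptr.append(len(values))          # running count of nonzeros seen so far
    return np.array(values), np.array(col_idx), np.array(row_ptr)

# Toy usage: a 4x5 matrix with 4 nonzeros is stored in 4 + 4 + 5 numbers instead of 20
A = np.zeros((4, 5))
A[0, 1], A[1, 4], A[2, 0], A[2, 3] = 3.0, 7.0, 1.0, 5.0
vals, cols, ptrs = dense_to_csr(A)   # vals=[3,7,1,5], cols=[1,4,0,3], ptrs=[0,1,2,4,4]
```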
revised submission
[article]
2021
Zenodo
dense graphs, we however observe that these accelerators lose efficiency on sparse graphs. ...
This paper proposes a shuffling-then-grouping method and a hardware accelerator, Shugra, to put the concept into practice. It evaluates the effectiveness of the solution with five real-world graphs. ...
doi:10.5281/zenodo.5758517
fatcat:ie7ga7f4krdprdizkdl4ekaqya
Bring Your Own Codegen to Deep Learning Compiler
[article]
2021
arXiv
pre-print
In addition, the vendors have to continuously update their hardware and/or software to cope with the rapid evolution of the DNN model architectures and operators. ...
However, to achieve high model coverage with high performance, each accelerator vendor has to develop a full compiler stack to ingest, optimize, and execute the DNNs. ...
However, interpreting a deep learning model at runtime loses the opportunity to optimize the model graph with hardware accelerators in mind. ...
arXiv:2105.03215v1
fatcat:4bjuy7tsdzfdpnkmki55q5jlim
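The "bring your own codegen" snippets describe partitioning a model graph so that operators the accelerator supports are offloaded and the rest stays on the host. The sketch below is a toy version of that partitioning over a linear operator sequence; the operator names and the supported set are invented for illustration, and this is not TVM's BYOC API.

```python
# Operators this hypothetical accelerator backend claims to support.
SUPPORTED_BY_ACCEL = {"conv2d", "dense", "relu"}

def partition(op_sequence):
    """Greedily group consecutive supported ops into accelerator regions and
    everything else into host regions."""
    regions, current, on_accel = [], [], None
    for op in op_sequence:
        target = op in SUPPORTED_BY_ACCEL
        if on_accel is None or target == on_accel:
            current.append(op)
        else:
            regions.append(("accel" if on_accel else "host", current))
            current = [op]
        on_accel = target
    if current:
        regions.append(("accel" if on_accel else "host", current))
    return regions

# Toy usage: conv2d -> relu offloaded, softmax falls back to the host, dense offloaded
print(partition(["conv2d", "relu", "softmax", "dense"]))
# [('accel', ['conv2d', 'relu']), ('host', ['softmax']), ('accel', ['dense'])]
```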
revised submission
[article]
2021
Zenodo
Although existing accelerators demonstrate promising performance in processing dense graphs, we however observe that these accelerators lose efficiency on sparse graphs. ...
Programmers need to insert only a handful of lines into existing constructs of graph neural networks to enjoy the data shuffling and hardware acceleration. ...
doi:10.5281/zenodo.5758581
fatcat:6wjhp7qelbhpbmfjcjoi4yjgzy
A Survey on Graph Processing Accelerators: Challenges and Opportunities
[article]
2019
arXiv
pre-print
Interestingly, we find that there is not an absolute winner for all three aspects in graph acceleration due to the diverse characteristics of graph processing and complexity of hardware configurations. ...
We also examine the benchmarks and results in existing studies for evaluating a graph processing accelerator. ...
With the source-oriented partition, it is convenient to determine the partitions that need the updated vertex property in the graph processing.
• Dedicated Hardware Acceleration. ...
arXiv:1902.10130v1
fatcat:p5lzlf3gubckfpu4eowgo4myi4
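The survey snippet notes that with a source-oriented partition it is easy to tell which partitions need an updated vertex property. The sketch below illustrates why: if edges live in the partition that owns their source vertex, the partition to notify after updating a vertex follows directly from the vertex id. It is a simplified model with a uniform range partitioner, not the survey's reference design.

```python
def partition_edges_by_source(edges, num_vertices, num_partitions):
    """Source-oriented partitioning: each edge goes to the partition owning its source."""
    part_size = (num_vertices + num_partitions - 1) // num_partitions
    partitions = [[] for _ in range(num_partitions)]
    for src, dst in edges:
        partitions[src // part_size].append((src, dst))
    return partitions

def partition_needing_update(vertex, num_vertices, num_partitions):
    """Only the partition owning the vertex reads it as a source, so it alone
    needs the updated vertex property."""
    part_size = (num_vertices + num_partitions - 1) // num_partitions
    return vertex // part_size

# Toy usage
edges = [(0, 1), (0, 2), (3, 1), (5, 4)]
parts = partition_edges_by_source(edges, num_vertices=6, num_partitions=3)
# parts == [[(0, 1), (0, 2)], [(3, 1)], [(5, 4)]]
assert partition_needing_update(3, 6, 3) == 1   # vertex 3's edges live in partition 1
```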
Solving Large Top-K Graph Eigenproblems with a Memory and Compute-optimized FPGA Design
[article]
2021
arXiv
pre-print
the eigenvectors associated with the Top-K largest eigenvalues. ...
In this work, we propose a hardware-optimized algorithm to approximate a solution to the Top-K eigenproblem on sparse matrices representing large graph topologies. ...
We prototyped our hardware design on an Alveo U280 accelerator card with HBM2 and DDR4 memory. ...
arXiv:2103.10040v1
fatcat:x6yo6juk4zaffabnsf5cl3wnaa
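For context on the Top-K eigenproblem the FPGA design targets, the snippet below runs a standard software baseline (SciPy's Lanczos-based eigsh) on a random sparse symmetric matrix to extract the K largest eigenpairs. This is only a reference point for what the problem asks; it is not the paper's hardware-optimized algorithm, and the matrix size, density, and K are arbitrary.

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import eigsh

# Build a random sparse symmetric matrix standing in for a graph operator.
A = sparse_random(1000, 1000, density=0.01, format="csr", random_state=0)
A = (A + A.T) * 0.5                      # symmetrize so the spectrum is real

# Top-K eigenproblem: the K largest eigenvalues and their eigenvectors.
k = 8
eigenvalues, eigenvectors = eigsh(A, k=k, which="LA")   # Lanczos software baseline
```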
Showing results 1 — 15 out of 6,815 results