6,815 Hits in 3.7 sec

Computing Graph Neural Networks: A Survey from Algorithms to Accelerators [article]

Sergi Abadal, Akshay Jain, Robert Guirado, Jorge López-Alonso, Eduard Alarcón
2021 arXiv   pre-print
On the other hand, an in-depth analysis of current software and hardware acceleration schemes is provided, from which a hardware-software, graph-aware, and communication-centric vision for GNN accelerators  ...  Beyond their novelty, GNNs are hard to compute due to their dependence on the input graph, their combination of dense and very sparse operations, and the need to scale to huge graphs in some applications  ...  GReTA also discusses partitioning briefly and exemplifies it in a hardware accelerator called GRIP [85], which is described in the next section. Paddle Graph Learning (PGL).  ...
arXiv:2010.00130v3 fatcat:u5bcmjodcfdh7pew4nssjemdba
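The survey's point that GNNs combine dense and very sparse operations can be made concrete with a minimal message-passing layer: neighborhood aggregation is a sparse-dense product with the adjacency matrix, while the feature transform is an ordinary dense matmul. A minimal sketch with NumPy/SciPy; the `gcn_layer` name and the toy graph are illustrative, not taken from the survey:

```python
import numpy as np
import scipy.sparse as sp

def gcn_layer(adj: sp.csr_matrix, X: np.ndarray, W: np.ndarray) -> np.ndarray:
    """One graph-convolution layer: sparse aggregation + dense transform.

    adj : normalized adjacency (N x N, sparse)  -- the very sparse operand
    X   : node features (N x F)                 -- dense
    W   : layer weights (F x F')                -- dense
    """
    H = adj @ X                    # sparse-dense product, irregular access pattern
    return np.maximum(H @ W, 0.0)  # dense matmul + ReLU, regular compute

# Toy 4-node cycle graph with self-loops, row-normalized.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
rows, cols = zip(*(edges + [(j, i) for i, j in edges] + [(i, i) for i in range(4)]))
A = sp.csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(4, 4))
A = (sp.diags(1.0 / np.asarray(A.sum(axis=1)).ravel()) @ A).tocsr()

out = gcn_layer(A, np.random.rand(4, 8), np.random.rand(8, 4))
print(out.shape)  # (4, 4)
```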

CoSPARSE: A Software and Hardware Reconfigurable SpMV Framework for Graph Analytics

Siying Feng, Jiawen Sun, Subhankar Pal, Xin He, Kuba Kaszyk, Dong-hyeon Park, Magnus Morton, Trevor Mudge, Murray Cole, Michael O'Boyle, Chaitali Chakrabarti, Ronald Dreslinski
2021 2021 58th ACM/IEEE Design Automation Conference (DAC)  
In this work, we propose a novel framework, CoSPARSE, that employs hardware and software reconfiguration as a synergistic solution to accelerate SpMV-based graph analytics algorithms.  ...  This variability has been used to improve performance either by dynamically switching algorithms between iterations (software) or by designing custom accelerators (hardware) for graph analytics algorithms.  ...  Graph Analytics Algorithms on CoSPARSE. Hardware-accelerated graph processing solutions often require programmers with in-depth architectural knowledge of the hardware to fully exploit the available performance  ...
doi:10.1109/dac18074.2021.9586114 fatcat:pukdkrjnyndtpcnwyusebclpjm
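CoSPARSE's software-side reconfiguration, switching algorithms between iterations, echoes the familiar direction-optimizing pattern in SpMV-based traversal: expand only the frontier rows while the frontier is sparse, and fall back to a full SpMV once it becomes dense. A hedged sketch of that general pattern, not CoSPARSE's actual implementation; `switch_frac` is a hypothetical tuning knob:

```python
import numpy as np
import scipy.sparse as sp

def bfs_levels(adj: sp.csr_matrix, src: int, switch_frac: float = 0.05) -> np.ndarray:
    """SpMV-style BFS that reconfigures between iterations: a sparse,
    frontier-driven step while few vertices are active, a dense SpMV
    over all rows once the frontier density crosses switch_frac."""
    n = adj.shape[0]
    level = np.full(n, -1)
    frontier = np.zeros(n, dtype=bool)
    level[src] = 0
    frontier[src] = True
    it = 0
    while frontier.any():
        it += 1
        if frontier.mean() > switch_frac:          # dense step: one full SpMV
            nxt = (adj.T @ frontier.astype(float)) > 0
        else:                                      # sparse step: expand frontier rows only
            nxt = np.zeros(n, dtype=bool)
            nxt[np.unique(adj[np.flatnonzero(frontier)].indices)] = True
        nxt &= level < 0                           # keep unvisited vertices only
        level[nxt] = it
        frontier = nxt
    return level
```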

Computing Graph Neural Networks: A Survey from Algorithms to Accelerators

Sergi Abadal, Akshay Jain, Robert Guirado, Jorge López-Alonso, Eduard Alarcón
2022 ACM Computing Surveys  
On the other hand, an in-depth analysis of current software and hardware acceleration schemes is provided, from which a hardware-software, graph-aware, and communication-centric vision for GNN accelerators  ...  Beyond their novelty, GNNs are hard to compute due to their dependence on the input graph, their combination of dense and very sparse operations, and the need to scale to huge graphs in some applications  ...  GReTA also discusses partitioning briefly and exemplifies it in a hardware accelerator called GRIP [85], which is described in the next section. Paddle Graph Learning.  ...
doi:10.1145/3477141 fatcat:6ef4jh3hrvefnoytckqyyous3m

Sextans: A Streaming Accelerator for General-Purpose Sparse-Matrix Dense-Matrix Multiplication [article]

Linghao Song, Yuze Chi, Atefeh Sohrabizadeh, Young-kyu Choi, Jason Lau, Jason Cong
2022 arXiv   pre-print
and (4) hardware flexibility to enable prototyping the hardware once to support SpMMs of different sizes as a general-purpose accelerator.  ...  Sparse-Matrix Dense-Matrix multiplication (SpMM) is the key operator for a wide range of applications, including scientific computing, graph processing, and deep learning.  ...  With the Sextans HFlex SpMM processing method, the parameters passed to the hardware accelerator are fixed.  ...
arXiv:2109.11081v2 fatcat:wlf7lraenzd7tkb3g7h6mumqei
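For reference, the SpMM operator Sextans accelerates is C = A × B with sparse A and dense B; a plain CSR formulation shows the row-wise scale-and-accumulate structure that streaming accelerators parallelize. A software sketch only, with sizes passed as runtime parameters in the spirit of the paper's hardware-flexibility point:

```python
import numpy as np
import scipy.sparse as sp

def spmm(A: sp.csr_matrix, B: np.ndarray) -> np.ndarray:
    """Reference SpMM: C = A @ B, sparse A in CSR, dense B.
    M, K, N arrive as runtime values, so one kernel covers all shapes."""
    M, N = A.shape[0], B.shape[1]
    C = np.zeros((M, N))
    for i in range(M):                              # one output row per sparse row
        for p in range(A.indptr[i], A.indptr[i + 1]):
            C[i] += A.data[p] * B[A.indices[p]]     # scale-and-accumulate a row of B
    return C

A = sp.random(64, 32, density=0.05, format="csr", random_state=0)
B = np.random.rand(32, 16)
assert np.allclose(spmm(A, B), A @ B)               # matches SciPy's SpMM
```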

ReGraph: Scaling Graph Processing on HBM-enabled FPGAs with Heterogeneous Pipelines [article]

Xinyu Chen, Yao Chen, Feng Cheng, Hongshi Tan, Bingsheng He, Weng-Fai Wong
2022 arXiv   pre-print
Our heterogeneous architecture comprises two types of pipelines: Little pipelines to process dense partitions with good locality and Big pipelines to process sparse partitions with extremely poor  ...  We also found that the diverse workloads can be easily classified into two types, namely dense and sparse partitions.  ...  • We classify graph partitions into dense and sparse partitions by grouping vertices based on their degrees; the dense partitions have high-degree vertices, with good locality, and the sparse partitions  ...
arXiv:2203.02676v1 fatcat:yvkfsuxstnhczbjevtnnqpuzge
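The degree-based classification described in the last snippet can be sketched in a few lines: sort vertices by degree, cut into partitions, and label each partition by its average degree. An illustration of the idea only; ReGraph's actual grouping rule and threshold are design choices in the paper, and `classify_partitions` / `degree_threshold` are made-up names:

```python
import numpy as np
import scipy.sparse as sp

def classify_partitions(adj: sp.csr_matrix, parts: int, degree_threshold: float):
    """Sort vertices by degree, cut them into `parts` groups, and label
    each group dense or sparse by its mean degree."""
    deg = np.diff(adj.indptr)              # per-vertex degree from CSR row pointers
    order = np.argsort(-deg)               # high-degree vertices first
    groups = np.array_split(order, parts)
    labels = ["dense" if deg[g].mean() >= degree_threshold else "sparse"
              for g in groups]
    return groups, labels
```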

First-Generation Inference Accelerator Deployment at Facebook [article]

Michael Anderson, Benny Chen, Stephen Chen, Summer Deng, Jordan Fix, Michael Gschwind, Aravind Kalaiah, Changkyu Kim, Jaewon Lee, Jason Liang, Haixin Liu, Yinghai Lu (+102 others)
2021 arXiv   pre-print
This platform, with six low-power accelerator cards alongside a single-socket host CPU, allows us to serve models of high complexity that cannot be easily or efficiently run on CPUs.  ...  We describe the inference accelerator platform ecosystem we developed and deployed at Facebook: both hardware, through the Open Compute Platform (OCP), and software framework and tooling, through PyTorch/Caffe2  ...  must be co-located on the device performing the dense compute partition.  ...
arXiv:2107.04140v3 fatcat:fpmlpb5kgzf7tnti3wpfqq4u4y

HPIPE: Heterogeneous Layer-Pipelined and Sparse-Aware CNN Inference for FPGAs [article]

Mathew Hall, Vaughn Betz
2020 arXiv   pre-print
Instead of having generic processing elements that together process one layer at a time, our network compiler statically partitions available device resources and builds custom-tailored hardware for each  ...  We evaluate the performance of our architecture on both sparse ResNet-50 and dense MobileNet ImageNet classifiers on a Stratix 10 2800 FPGA.  ...  We integrate this with a PCIe core and validate network accuracy and accelerator throughput in physical hardware.  ...
arXiv:2007.10451v1 fatcat:subhmlobnrdfxk6dq5fhz7o2dm
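HPIPE's static per-layer partitioning of device resources amounts to balancing pipeline stages. One simple policy, assumed here for illustration rather than taken from HPIPE, is to hand out DSPs in proportion to each layer's MAC count:

```python
def allocate_dsps(layer_macs, total_dsps):
    """Split DSPs across layers in proportion to per-layer MAC counts so
    pipeline stages finish in roughly equal time."""
    total = sum(layer_macs)
    alloc = [max(1, round(total_dsps * m / total)) for m in layer_macs]
    while sum(alloc) > total_dsps:          # repair rounding drift
        alloc[alloc.index(max(alloc))] -= 1
    return alloc

print(allocate_dsps([90_000, 30_000, 10_000], total_dsps=512))  # [354, 118, 39]
```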

H-GCN: A Graph Convolutional Network Accelerator on Versal ACAP Architecture [article]

Chengming Zhang, Tong Geng, Anqi Guo, Jiannan Tian, Martin Herbordt, Ang Li, Dingwen Tao
2022 arXiv   pre-print
Compared with other Machine Learning (ML) modalities, the acceleration of Graph Neural Networks (GNNs) is more challenging due to the irregularity and heterogeneity derived from graph topologies.  ...  Compared with state-of-the-art GCN accelerators, H-GCN achieves, on average, speedups of 1.1–2.3X.  ...  BoostGCN [13] uses a hardware-aware partition-centric feature aggregation scheme to increase on-chip data reuse.  ...
arXiv:2206.13734v1 fatcat:bpfurud6srawli6jmu2jx63vca
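Partition-centric aggregation, as cited in the last snippet, is about reusing on-chip data: a software analogue processes the adjacency in column tiles so each slice of source features is loaded once. A sketch of the reuse idea only, not the BoostGCN or H-GCN design:

```python
import numpy as np
import scipy.sparse as sp

def tiled_aggregate(adj: sp.csr_matrix, X: np.ndarray, tile: int) -> np.ndarray:
    """Aggregate neighbor features in column tiles: each slice X[lo:hi]
    is loaded once (it would sit in an on-chip buffer) and reused for
    every edge whose source falls in the tile."""
    n = adj.shape[0]
    out = np.zeros_like(X, dtype=float)
    for lo in range(0, n, tile):
        hi = min(lo + tile, n)
        out += adj[:, lo:hi] @ X[lo:hi]
    return out
```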

GCoD: Graph Convolutional Network Acceleration via Dedicated Algorithm and Accelerator Co-Design [article]

Haoran You, Tong Geng, Yongan Zhang, Ang Li, Yingyan Lin
2022 arXiv   pre-print
On the hardware level, we further develop a dedicated two-pronged accelerator with a separate engine to process each of the aforementioned denser and sparser workloads, further boosting the overall utilization and acceleration efficiency.  ...  Cheng Wan at Rice University for his help and discussion of the graph reordering algorithm.  ...
arXiv:2112.11594v2 fatcat:ivnelobzlbgrzgtb4yf5okew3a
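The denser/sparser split that GCoD's two engines consume can be approximated in software by a degree-ordered permutation of the adjacency matrix. GCoD actually derives its reordering via algorithm-accelerator co-design, so the sketch below (`reorder_by_degree`, `split_frac`) is a stand-in illustration only:

```python
import numpy as np
import scipy.sparse as sp

def reorder_by_degree(adj: sp.csr_matrix, split_frac: float = 0.1):
    """Permute vertices so high-degree rows come first, then split the
    adjacency into a denser top block (dense engine) and a sparser
    remainder (sparse engine)."""
    deg = np.diff(adj.indptr)
    perm = np.argsort(-deg)
    P = adj[perm][:, perm]                 # symmetric permutation of the adjacency
    k = int(split_frac * adj.shape[0])
    return P[:k], P[k:]                    # denser workload, sparser workload
```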

Copernicus: Characterizing the Performance Implications of Compression Formats Used in Sparse Workloads [article]

Bahar Asgari, Ramyad Hadidi, Joshua Dierberger, Charlotte Steinichen, Amaan Marfatia, Hyesoon Kim
2021 arXiv   pre-print
The performance implications of using various formats along with DSAs, however, have not been extensively studied in prior work.  ...  The primary challenge with sparse matrices has been efficiently storing and transferring data, for which many sparse formats have been proposed to eliminate most zero entries.  ...  In such cases, preprocessing the sparse data into a format compatible with a hardware accelerator is highly recommended. Figure 7 illustrates the impact of partition size on  ...
arXiv:2011.10932v2 fatcat:dkc77jbaujailkbtepsmedvebq
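As a concrete example of the storage trade-off the paper characterizes, CSR keeps only the nonzero values plus column indices and row pointers, a small fraction of the dense footprint at low density. A quick SciPy measurement; the sizes and density are arbitrary, not from the paper:

```python
import scipy.sparse as sp

def csr_bytes(A: sp.csr_matrix) -> int:
    """CSR storage = nonzero values + column indices + row pointers."""
    return A.data.nbytes + A.indices.nbytes + A.indptr.nbytes

A = sp.random(4096, 4096, density=0.01, format="csr", random_state=0)
dense_bytes = A.shape[0] * A.shape[1] * 8          # float64 dense baseline
print(f"CSR: {csr_bytes(A):,} B vs dense: {dense_bytes:,} B")
```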

revised submission [article]

XXX
2021 Zenodo  
dense graphs, we observe that these accelerators lose efficiency on sparse graphs.  ...  This paper proposes a shuffling-then-grouping method and a hardware accelerator, Shugra, to put the concept into practice. It evaluates the effectiveness of the solution on five real-world graphs.  ...
doi:10.5281/zenodo.5758517 fatcat:ie7ga7f4krdprdizkdl4ekaqya

Bring Your Own Codegen to Deep Learning Compiler [article]

Zhi Chen, Cody Hao Yu, Trevor Morris, Jorn Tuyls, Yi-Hsiang Lai, Jared Roesch, Elliott Delaye, Vin Sharma, Yida Wang
2021 arXiv   pre-print
In addition, the vendors have to continuously update their hardware and/or software to cope with the rapid evolution of DNN model architectures and operators.  ...  However, to achieve high model coverage with high performance, each accelerator vendor has to develop a full compiler stack to ingest, optimize, and execute the DNNs.  ...  However, interpreting a deep learning model at runtime forgoes the opportunity to optimize the model graph with hardware accelerators in mind.  ...
arXiv:2105.03215v1 fatcat:4bjuy7tsdzfdpnkmki55q5jlim
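The partitioning step in a bring-your-own-codegen flow can be pictured as grouping maximal runs of accelerator-supported operators into offload regions, leaving the rest on the host. The toy sketch below is a generic illustration, not the paper's or any compiler's API; the op names and the `ACCEL_SUPPORTED` set are invented:

```python
ACCEL_SUPPORTED = {"conv2d", "relu", "add"}   # hypothetical capability list

def partition(ops):
    """Group maximal runs of supported ops into offload regions; the rest
    stays on the host. Returns [(offloaded?, [ops...]), ...]."""
    regions, current, on_accel = [], [], None
    for op in ops:
        target = op in ACCEL_SUPPORTED
        if current and target != on_accel:
            regions.append((on_accel, current))
            current = []
        current.append(op)
        on_accel = target
    if current:
        regions.append((on_accel, current))
    return regions

print(partition(["conv2d", "relu", "softmax", "conv2d", "add"]))
# [(True, ['conv2d', 'relu']), (False, ['softmax']), (True, ['conv2d', 'add'])]
```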

revised submission [article]

XX
2021 Zenodo  
Although existing accelerators demonstrate promising performance in processing dense graphs, we observe that these accelerators lose efficiency on sparse graphs.  ...  Programmers need to insert only a handful of lines into existing constructs of graph neural networks to enjoy the data shuffling and hardware acceleration.  ...
doi:10.5281/zenodo.5758581 fatcat:6wjhp7qelbhpbmfjcjoi4yjgzy

A Survey on Graph Processing Accelerators: Challenges and Opportunities [article]

Chuangyi Gui, Long Zheng, Bingsheng He, Cheng Liu, Xinyu Chen, Xiaofei Liao, Hai Jin
2019 arXiv   pre-print
Interestingly, we find that there is no absolute winner across all three aspects of graph acceleration, due to the diverse characteristics of graph processing and the complexity of hardware configurations.  ...  We also examine the benchmarks and results of existing studies for evaluating graph processing accelerators.  ...  With the source-oriented partition, it is convenient to determine which partitions need the updated vertex property during graph processing. • Dedicated Hardware Acceleration.  ...
arXiv:1902.10130v1 fatcat:p5lzlf3gubckfpu4eowgo4myi4
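The source-oriented partition mentioned in the last snippet groups edges by the partition of their source vertex, so an updated vertex property needs to be pushed out from exactly one partition. A minimal sketch, with a hash-by-id placement standing in for a real partitioner:

```python
from collections import defaultdict

def source_oriented_partition(edges, num_parts):
    """Group edges by the partition of their source vertex, so an updated
    vertex property is pushed out from exactly one partition."""
    parts = defaultdict(list)
    for src, dst in edges:
        parts[src % num_parts].append((src, dst))   # hash placement for simplicity
    return parts

print(dict(source_oriented_partition([(0, 1), (1, 2), (4, 0), (2, 3)], 2)))
# {0: [(0, 1), (4, 0), (2, 3)], 1: [(1, 2)]}
```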

Solving Large Top-K Graph Eigenproblems with a Memory and Compute-optimized FPGA Design [article]

Francesco Sgherzi, Alberto Parravicini, Marco Siracusa, Marco Domenico Santambrogio
2021 arXiv   pre-print
the eigenvectors associated with the Top-K largest eigenvalues.  ...  In this work, we propose a hardware-optimized algorithm to approximate a solution to the Top-K eigenproblem on sparse matrices representing large graph topologies.  ...  We prototyped our hardware design on an Alveo U280 accelerator card with HBM2 and DDR4 memory.  ... 
arXiv:2103.10040v1 fatcat:x6yo6juk4zaffabnsf5cl3wnaa
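A convenient software baseline for the Top-K eigenproblem this design approximates is the Lanczos-based `eigsh` in SciPy, which returns the K largest-magnitude eigenpairs of a sparse symmetric matrix; the matrix size and density below are arbitrary:

```python
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

A = sp.random(2000, 2000, density=0.002, format="csr", random_state=0)
A = (A + A.T) * 0.5                       # symmetrize, as for an undirected graph
vals, vecs = eigsh(A, k=8, which="LM")    # Top-8 eigenpairs by largest magnitude
print(vals.shape, vecs.shape)             # (8,) (2000, 8)
```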
Showing results 1 — 15 out of 6,815 results