
Optimizing Memory Efficiency of Graph Neural Networks on Edge Computing Platforms [article]

Ao Zhou, Jianlei Yang, Yeqi Gao, Tong Qiao, Yingjie Qi, Xiaoyi Wang, Yunli Chen, Pengcheng Dai, Weisheng Zhao, Chunming Hu
2021 arXiv   pre-print
However, the poor efficiency of GNN inference and frequent Out-Of-Memory (OOM) problems limit the successful application of GNNs on edge computing platforms.  ...  Graph neural networks (GNNs) have achieved state-of-the-art performance on various industrial tasks.  ...  INTRODUCTION In recent years, as a generalization of conventional deep learning methods to the non-Euclidean domain, Graph Neural Networks (GNNs) have been widely applied to various fields of research, such as  ... 
arXiv:2104.03058v2 fatcat:ieljlftycnah5ipci6h537rtqe

EMC2-NIPS 2019 Abstracts of Invited Talks

2019 2019 Fifth Workshop on Energy Efficient Machine Learning and Cognitive Computing - NeurIPS Edition (EMC2-NIPS)  
This talk will uncover the need for building accurate, platform-specific power and latency models for convolutional neural networks (CNNs) and efficient hardware-aware CNN design methodologies, thus allowing  ...  MIT Computing near the sensor is preferred over the cloud due to privacy and/or latency concerns for a wide range of applications including robotics/drones, self-driving cars, smart Internet of Things,  ...  First, for potentially complex topologies on edge devices with limited total memory, we solve the minimum memory usage problem, thus characterizing and enabling deployment of all feasible networks on a  ... 
doi:10.1109/emc2-nips53020.2019.00007 fatcat:bvtcsgwxsrh3bmwh6tba3ly3ra
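
The "minimum memory usage problem" mentioned above reduces, for a simple sequential topology, to bounding peak activation memory against the device budget. Below is a minimal sketch of that bound, assuming fp32 activations and that only one layer's input and output are live at a time; the layer sizes and the 512 KiB budget are illustrative assumptions, not figures from the talk.

    # Hedged sketch (not from the talk): peak activation memory of a
    # sequential network, the quantity a minimum-memory analysis bounds.
    def peak_activation_bytes(act_sizes, dtype_bytes=4):
        """Peak memory if each layer needs its input and output live at once."""
        return max((a + b) * dtype_bytes
                   for a, b in zip(act_sizes, act_sizes[1:]))

    # Illustrative activation element counts per layer of a toy CNN.
    acts = [3 * 224 * 224, 32 * 112 * 112, 64 * 56 * 56, 128 * 28 * 28, 1000]
    budget = 512 * 1024  # e.g. 512 KiB of SRAM on a microcontroller (assumed)
    print(peak_activation_bytes(acts) <= budget)  # feasible under this model?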

Zedwulf: Power-Performance Tradeoffs of a 32-Node Zynq SoC Cluster

Pradeep Moorthy, Nachiket Kapre
2015 2015 IEEE 23rd Annual International Symposium on Field-Programmable Custom Computing Machines  
For graphs with 32M nodes and 32M edges, Zedwulf delivers the highest throughput in our study, 94 MTEPS (Million Traversed Edges Per Second), outperforming the other x86 multi-threaded platforms by 1.2-1.4×.  ...  For this experiment, Zedwulf operates at an efficiency of 0.49 MTEPS/W when using ARM+FPGA, which is 1.2× better than using ARMv7 CPUs alone and within 8% of the Intel Core i7-4770k platform.  ...  This operates on a neural network, or sparse graph, where we can represent the neurons as nodes while synaptic connections between neurons are the edges of the network.  ... 
doi:10.1109/fccm.2015.37 dblp:conf/fccm/MoorthyK15 fatcat:7iapt5jzp5fmddhu3fn7oxs5ci
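
As a quick sanity check on how the quoted figures relate: MTEPS is edges traversed per second divided by 10^6, and MTEPS/W divides that by average power. The runtime below is an assumed value chosen to reproduce 94 MTEPS on a 32M-edge sweep; the power draw is then implied by the reported 0.49 MTEPS/W.

    # Hedged arithmetic sketch; runtime is an assumption, not a paper figure.
    edges_traversed = 32e6          # one full sweep of a 32M-edge graph
    runtime_s = 0.34                # assumed measured runtime
    mteps = edges_traversed / runtime_s / 1e6
    power_w = mteps / 0.49          # implied by the reported 0.49 MTEPS/W
    print(f"{mteps:.0f} MTEPS at roughly {power_w:.0f} W")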

2020-2021 Index IEEE Transactions on Computers Vol. 70

2021 IEEE transactions on computers  
The primary entry includes the coauthors' names, the title of the paper or other item, and its location, specified by the publication abbreviation, year, month, and inclusive pagination.  ...  ., +, TC April 2021 614-625 Efficient Pipelined Execution of CNNs Based on In-Memory Computing and Graph Homomorphism Verification.  ...  ., +, TC Aug. 2021 1253-1268 Efficient Pipelined Execution of CNNs Based on In-Memory Computing and Graph Homomorphism Verification.  ... 
doi:10.1109/tc.2021.3134810 fatcat:p5otlsapynbwvjmqogj47kv5qa

Rubik: A Hierarchical Architecture for Efficient Graph Learning [article]

Xiaobing Chen, Yuke Wang, Xinfeng Xie, Xing Hu, Abanti Basak, Ling Liang, Mingyu Yan, Lei Deng, Yufei Ding, Zidong Du, Yunji Chen, Yuan Xie
2020 arXiv   pre-print
However, learning from graphs is non-trivial because of its mixed computation model involving both graph analytics and neural network computing.  ...  Results show that the Rubik accelerator design improves energy efficiency by 26.3x to 1375.2x over GPU platforms across different datasets and GCN models.  ...  For example, neural network computing on node-level Euclidean data introduces heavy vector and matrix computation but regular memory accesses, so dataflow optimizations can easily enlarge data reuse and  ... 
arXiv:2009.12495v1 fatcat:c7alktpjfjdzhbfmnsbwivv74a
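
The "mixed computation model" the abstract refers to is visible in a single GCN layer: an irregular graph-analytics phase (neighbor aggregation) followed by a regular neural-network phase (a dense matrix multiply). The following is a generic sketch of that split, not Rubik's accelerator mapping.

    # Hedged sketch: the two execution patterns inside one GCN layer.
    import numpy as np

    def gcn_layer(features, neighbors, weight):
        # Phase 1: graph analytics -- irregular, neighbor-indexed accesses.
        agg = np.stack([features[nbrs].sum(axis=0) + features[v]
                        for v, nbrs in enumerate(neighbors)])
        # Phase 2: neural network -- regular, dense matrix computation.
        return np.maximum(agg @ weight, 0.0)  # ReLU

    # Toy graph: 3 nodes, neighbor lists, 4-dim features (all illustrative).
    feats = np.random.rand(3, 4)
    nbrs = [[1], [0, 2], [1]]
    out = gcn_layer(feats, nbrs, np.random.rand(4, 8))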

POET: Training Neural Networks on Tiny Devices with Integrated Rematerialization and Paging

Shishir G. Patil, Paras Jain, Prabal Dutta, Ion Stoica, Joseph Gonzalez
2022 International Conference on Machine Learning  
We present POET, an algorithm to enable training large neural networks on memory-scarce battery-operated edge devices.  ...  We demonstrate that it is possible to fine-tune both ResNet-18 and BERT within the memory constraints of a Cortex-M class embedded device while outperforming current edge training methods in energy efficiency  ...  This work was supported in part by the CONIX Research Center, one of six centers in JUMP, a Semiconductor Research Corporation (SRC) program sponsored by DARPA.  ... 
dblp:conf/icml/Patil0DS022 fatcat:jtu3vmscnbgfvefht7fmzdjt4q
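
Rematerialization here means trading compute for memory: drop most activations during the forward pass and recompute them from checkpoints when the backward pass needs them. A toy sketch of that idea follows; paging (POET's second ingredient) and its integer-program schedule are omitted, and the checkpoint interval is an arbitrary assumption.

    # Hedged sketch of rematerialization, not POET's actual schedule.
    def forward(x, layers, keep_every=2):
        ckpts = {0: x}                    # ckpts[k] = activation entering layer k
        for i, f in enumerate(layers):
            x = f(x)
            if (i + 1) % keep_every == 0:
                ckpts[i + 1] = x          # stored activation (memory cost)
        return x, ckpts

    def activation_at(i, layers, ckpts):
        j = max(k for k in ckpts if k <= i)
        x = ckpts[j]
        for f in layers[j:i]:             # rematerialize (extra compute)
            x = f(x)
        return x

    layers = [lambda v: 2 * v, lambda v: v + 1, lambda v: v * v, lambda v: v - 3]
    y, ckpts = forward(1.0, layers)       # only ~half the activations are stored
    print(activation_at(3, layers, ckpts))  # recomputed, not stored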

POET: Training Neural Networks on Tiny Devices with Integrated Rematerialization and Paging [article]

Shishir G. Patil, Paras Jain, Prabal Dutta, Ion Stoica, Joseph E. Gonzalez
2022 arXiv   pre-print
We present POET, an algorithm to enable training large neural networks on memory-scarce battery-operated edge devices.  ...  We demonstrate that it is possible to fine-tune both ResNet-18 and BERT within the memory constraints of a Cortex-M class embedded device while outperforming current edge training methods in energy efficiency  ...  This work was supported in part by the CONIX Research Center, one of six centers in JUMP, a Semiconductor Research Corporation (SRC) program sponsored by DARPA.  ... 
arXiv:2207.07697v1 fatcat:cxc6mawbfbaqlf2fulqo45keiy

A Survey of Convolutional Neural Networks on Edge with Reconfigurable Computing

Mário P. Véstias
2019 Algorithms  
CNNs achieve better results at the cost of higher computing and memory requirements. Inference of convolutional neural networks is therefore usually done on centralized high-performance platforms.  ...  Reconfigurable computing is being considered for inference on the edge due to its high performance and energy efficiency, while keeping a high hardware flexibility that allows for easy adaptation of the target  ...  In general, these designs require a minimum of on-chip memory that may not be available on edge platforms.  ... 
doi:10.3390/a12080154 fatcat:jbdak7eisbcjtj6ba5hlpvnq5y

EnGN: A High-Throughput and Energy-Efficient Accelerator for Large Graph Neural Networks [article]

Shengwen Liang, Ying Wang, Cheng Liu, Lei He, Huawei Li, Xiaowei Li (+1 others)
2020 arXiv   pre-print
Graph neural networks (GNNs) have emerged as a powerful approach to processing non-Euclidean data structures and have proved powerful in various application domains such as social networks and e-commerce.  ...  In addition, we utilize a graph tiling strategy to fit large graphs into EnGN and make good use of the hierarchical on-chip buffers through adaptive computation reordering and tile scheduling.  ...  However, previous graph processors and neural network accelerators are optimized to support either graph processing or neural networks, rather than both simultaneously.  ... 
arXiv:1909.00155v3 fatcat:o735rmhtgfd4dgh6lns42ulyle
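
Graph tiling of the kind the abstract mentions partitions the edge list into (source-interval, destination-interval) blocks so that each block touches only a bounded slice of vertex data, which can then be held in an on-chip buffer. A minimal sketch follows, with an assumed interval size and none of EnGN's computation reordering or tile scheduling.

    # Hedged sketch of edge tiling by vertex-ID intervals.
    from collections import defaultdict

    def tile_edges(edges, interval):
        tiles = defaultdict(list)
        for src, dst in edges:
            tiles[(src // interval, dst // interval)].append((src, dst))
        return tiles  # process one tile at a time; its vertex slices stay on-chip

    edges = [(0, 5), (1, 4), (6, 2), (7, 7), (3, 6)]   # toy 8-vertex graph
    for (si, di), tile in sorted(tile_edges(edges, 4).items()):
        print(f"src[{si*4}:{si*4+4}) x dst[{di*4}:{di*4+4}): {tile}")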

2021 Index IEEE Transactions on Parallel and Distributed Systems Vol. 32

2022 IEEE Transactions on Parallel and Distributed Systems  
., +, TPDS June 2021 1307-1321 EDGES: An Efficient Distributed Graph Embedding System on GPU Clusters.  ...  Graph coloring Feluca: A Two-Stage Graph Coloring Algorithm With Color-Centric Paradigm on GPU. Zheng, Z., +,  ... 
doi:10.1109/tpds.2021.3107121 fatcat:e7bh2xssazdrjcpgn64mqh4hb4

NeuGraph: Parallel Deep Neural Network Computation on Large Graphs

Lingxiao Ma, Zhi Yang, Youshan Miao, Jilong Xue, Ming Wu, Lidong Zhou, Yafei Dai
2019 USENIX Annual Technical Conference  
We present NeuGraph, a new framework that bridges the graph and dataflow models to support efficient and scalable parallel neural network computation on graphs.  ...  This evolution has led to large graph-based neural network models that go beyond what existing deep learning frameworks or graph computing systems are designed for.  ...  These methods, known as graph neural networks (GNNs), combine standard neural networks with iterative graph propagation: the property of a vertex is computed recursively (with neural networks) from the  ... 
dblp:conf/usenix/MaYMXWZD19 fatcat:zr2sgdhlefa3rj77j3hi3bsvnq
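
The recursive computation the snippet describes is iterative propagation: in each round, every vertex's state is recomputed from its neighbors' states by a small neural function. The update rule below is a stand-in for illustration; NeuGraph's contribution is translating such vertex programs into dataflow operators, which this sketch does not model.

    # Hedged sketch of neural graph propagation; the update rule is assumed.
    import numpy as np

    def propagate(state, neighbors, weight, rounds):
        for _ in range(rounds):
            msgs = np.stack([state[nbrs].mean(axis=0) if nbrs else state[v]
                             for v, nbrs in enumerate(neighbors)])
            state = np.tanh(msgs @ weight)   # per-vertex neural update
        return state

    state = np.random.rand(4, 8)             # toy 4-vertex, 8-dim states
    nbrs = [[1, 2], [0], [0, 3], [2]]
    final = propagate(state, nbrs, np.random.rand(8, 8), rounds=2)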

Survey on Graph Neural Network Acceleration: An Algorithmic Perspective [article]

Xin Liu, Mingyu Yan, Lei Deng, Guoqi Li, Xiaochun Ye, Dongrui Fan, Shirui Pan, Yuan Xie
2022 arXiv   pre-print
Graph neural networks (GNNs) have been a hot topic of recent research and are widely utilized in diverse applications.  ...  Next, we provide comparisons in terms of the efficiency and characteristics of these methods. Finally, we suggest some promising prospects for future research.  ...  Introduction Graph Neural Networks (GNNs) [Scarselli et al., 2008] are deep learning based models that apply neural networks to graph learning and representation.  ... 
arXiv:2202.04822v2 fatcat:ydnbs75uancljonaqjmaz6c4qa

Towards Efficient Large-Scale Graph Neural Network Computing [article]

Lingxiao Ma, Zhi Yang, Youshan Miao, Jilong Xue, Ming Wu, Lidong Zhou, Yafei Dai
2018 arXiv   pre-print
NGra further achieves efficiency through highly optimized Scatter/Gather operators on GPUs, despite the sparsity of graph data.  ...  We introduce NGra, the first parallel processing framework for graph-based deep neural networks (GNNs).  ...  In GCN, there is computation (without neural networks) on the edge for weighted neighbor activation.  ... 
arXiv:1810.08403v1 fatcat:qvybtgioife7zarswcnppu5vrm
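
The "weighted neighbor activation" noted for GCN is plain per-edge scalar arithmetic: each edge scales its source vertex's activation by an edge weight, and a gather sums the scaled messages per destination. The sketch below writes this as explicit scatter/gather loops to mirror the operator names; NGra's fused GPU kernels are of course far more optimized, and the weights here stand in for normalized adjacency entries.

    # Hedged sketch of edge-weighted scatter/gather aggregation.
    import numpy as np

    def scatter_gather(h, edges, edge_weight, num_vertices):
        out = np.zeros((num_vertices, h.shape[1]))
        for (src, dst), w in zip(edges, edge_weight):
            out[dst] += w * h[src]   # scatter weighted activation, gather by +=
        return out

    h = np.random.rand(3, 4)                     # toy 3-vertex features
    edges = [(0, 1), (1, 2), (2, 0), (1, 0)]
    weights = [0.5, 1.0, 0.25, 0.5]              # e.g. normalized adjacency
    agg = scatter_gather(h, edges, weights, 3)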

HyGCN: A GCN Accelerator with Hybrid Architecture [article]

Mingyu Yan, Lei Deng, Xing Hu, Ling Liang, Yujing Feng, Xiaochun Ye, Zhimin Zhang, Dongrui Fan, Yuan Xie
2020 arXiv   pre-print
Third, we optimize the overall system via an inter-engine pipeline for inter-phase fusion and priority-based off-chip memory access coordination to improve off-chip bandwidth utilization.  ...  In this work, we first characterize the hybrid execution patterns of GCNs on an Intel Xeon CPU.  ...  Acknowledgments We thank the anonymous reviewers of HPCA 2020 and the SEALers in the Scalable Energy-efficient Architecture Lab (SEAL) for their constructive and insightful comments.  ... 
arXiv:2001.02514v1 fatcat:uts223fpivefhh4lmrcyg7asuy

Reconfigurable Hardware Accelerators: Opportunities, Trends, and Challenges [article]

Chao Wang, Wenqi Lou, Lei Gong, Lihui Jin, Luchao Tan, Yahui Hu, Xi Li, Xuehai Zhou
2017 arXiv   pre-print
To achieve high efficiency in data-intensive computing, studies of heterogeneous accelerators targeting the latest applications have become a hot topic in the computer architecture domain.  ...  Nowadays, top-tier computer architecture conferences feature a growing batch of acceleration works based on FPGAs or other reconfigurable architectures.  ...  and optimization of the storage structure of graphs to make full use of memory bandwidth.  ... 
arXiv:1712.04771v1 fatcat:3lxv45qb4zaqpagtn3eghrmroe
Showing results 1 — 15 out of 13,933 results