16 Hits in 8.9 sec

Extending Sparse Tensor Accelerators to Support Multiple Compression Formats [article]

Eric Qin, Geonhwa Jeong, William Won, Sheng-Chun Kao, Hyoukjun Kwon, Sudarshan Srinivasan, Dipankar Das, Gordon E. Moon, Sivasankaran Rajamanickam, Tushar Krishna
2021 arXiv   pre-print
Since DL and scientific workloads span all sparsity regions, there can be numerous format combinations for optimizing memory and compute efficiency (two of the most common formats are illustrated after this entry).  ...  Sparsity, which occurs in both scientific applications and Deep Learning (DL) models, has been a key target of optimization within recent ASIC accelerators due to the potential memory and compute savings.  ... 
arXiv:2103.10452v1 fatcat:jsn7psgnhra4zngrwt7hhgdzs4
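The entry above concerns accelerators that support multiple sparse compression formats. As a generic, hypothetical illustration only (the specific formats evaluated in the paper may differ), the following SciPy snippet shows the same small matrix stored in two widely used formats, COO and CSR:

```python
# Toy illustration of two common sparse compression formats (COO and CSR);
# names and sizes here are arbitrary, not taken from the paper.
import numpy as np
from scipy.sparse import coo_matrix, csr_matrix

dense = np.array([[0, 0, 3],
                  [4, 0, 0],
                  [0, 5, 6]])

coo = coo_matrix(dense)   # coordinate format: explicit (row, col, value) triples
csr = csr_matrix(dense)   # compressed sparse row: row pointers + column indices

print(coo.row, coo.col, coo.data)         # [0 1 2 2] [2 0 1 2] [3 4 5 6]
print(csr.indptr, csr.indices, csr.data)  # [0 1 2 4] [2 0 1 2] [3 4 5 6]
```

Which format is cheaper depends on the sparsity level and access pattern, which is why fixing a single format in hardware can leave efficiency on the table.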

Hardware Acceleration of Sparse and Irregular Tensor Computations of ML Models: A Survey and Insights [article]

Shail Dave, Riyadh Baghdadi, Tony Nowatzki, Sasikanth Avancha, Aviral Shrivastava, Baoxin Li
2021 arXiv   pre-print
This paper provides a comprehensive survey on the efficient execution of sparse and irregular tensor computations of ML models on hardware accelerators.  ...  structured sparsity can improve storage efficiency and balance computations; understanding how to compile and map models with sparse tensors on the accelerators; understanding recent design trends for  ...  Graph neural networks (GNNs) and other graph learning models [47] are used for applications such as text classification and translation, node classification and link predictions in large social graphs  ... 
arXiv:2007.00864v2 fatcat:k4o2xboh4vbudadfiriiwjp7uu

FlexSA: Flexible Systolic Array Architecture for Efficient Pruned DNN Model Training [article]

Sangkug Lym, Mattan Erez
2020 arXiv   pre-print
To make a systolic array efficient for pruning and training, we propose FlexSA, a flexible systolic array architecture.  ...  Modern deep learning models have high memory and computation costs. To make them fast and memory-efficient, structured model pruning is commonly used.  ...  by 60% in training and pruning modern convolutional neural network (CNN) models.  ... 
arXiv:2004.13027v1 fatcat:6q5aiindzbebzbwixeer4nn7ie

Hardware and Software Optimizations for Accelerating Deep Neural Networks: Survey of Current Trends, Challenges, and the Road Ahead

Maurizio Capra, Beatrice Bussolino, Alberto Marchisio, Guido Masera, Maurizio Martina, Muhammad Shafique
2020 IEEE Access  
This paper first introduces the key properties of two brain-inspired models, the Deep Neural Network (DNN) and the Spiking Neural Network (SNN), and then analyzes techniques to produce efficient and high-performance  ...  In a scenario where several sophisticated algorithms need to be executed with limited energy and low latency, the need for cost-effective hardware platforms capable of implementing energy-efficient DL  ...  In the row stationary dataflow, all the MACs necessary to perform a row of the convolution (a 1D convolution) are mapped to a single PE.  ... 
doi:10.1109/access.2020.3039858 fatcat:nticzqgrznftrcji4krhyjxudu

ZIPPER: Exploiting Tile- and Operator-level Parallelism for General and Scalable Graph Neural Network Acceleration [article]

Zhihui Zhang, Jingwen Leng, Shuwen Lu, Youshan Miao, Yijia Diao, Minyi Guo, Chao Li, Yuhao Zhu
2021 arXiv   pre-print
Graph neural networks (GNNs) have started to gain momentum after showing significant performance improvements in a variety of domains, including molecular science, recommendation, and transportation.  ...  To address the challenge, we propose Zipper, an efficient yet general acceleration system for GNNs.  ...  We first leverage graph tiling to address the problem of the excessive memory footprint and then pipeline the resulting tiles through a tiling-based multi-stream execution model.  ... 
arXiv:2107.08709v1 fatcat:2vejfeqgnzanbit3bg6rk5vife

An Updated Survey of Efficient Hardware Architectures for Accelerating Deep Convolutional Neural Networks

Maurizio Capra, Beatrice Bussolino, Alberto Marchisio, Muhammad Shafique, Guido Masera, Maurizio Martina
2020 Future Internet  
Deep Neural Networks (DNNs) are nowadays common practice in most Artificial Intelligence (AI) applications.  ...  Their ability to go beyond human precision has made these networks a milestone in the history of AI.  ...  In this dataflow, the operations of a row of the convolution are mapped to the same PE. The weights are kept stationary in the PEs. For instance, Eyeriss [63] is an RS accelerator (see the sketch after this entry).  ... 
doi:10.3390/fi12070113 fatcat:heyq4l3rkrdc5p55xdbhsh4jxu
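Both this entry and the Capra et al. survey earlier in the list mention the row stationary (RS) dataflow used by Eyeriss. The sketch below is a purely functional NumPy model (my own toy approximation, not the accelerator's actual implementation): each call to pe_1d_conv stands in for one PE holding a filter row stationary, and each output row is formed by accumulating the 1D convolutions of its contributing (input row, filter row) pairs.

```python
import numpy as np

def pe_1d_conv(ifmap_row, filter_row):
    """Models one PE: 1D convolution of one input row with one filter row.
    The filter row stays resident ("stationary") in the PE while the
    input row streams through, one MAC window at a time."""
    out_len = len(ifmap_row) - len(filter_row) + 1
    return np.array([np.dot(ifmap_row[x:x + len(filter_row)], filter_row)
                     for x in range(out_len)])

# Row-stationary-style 2D convolution: one (input row, filter row) pair per PE,
# with the partial 1D results summed to form each output row.
ifmap = np.arange(40, dtype=float).reshape(5, 8)   # toy 5x8 input feature map
filt = np.ones((3, 3))                             # toy 3x3 filter
out = np.stack([
    sum(pe_1d_conv(ifmap[r + k], filt[k]) for k in range(filt.shape[0]))
    for r in range(ifmap.shape[0] - filt.shape[0] + 1)
])                                                  # 3x6 output feature map
```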

Transmuter

Subhankar Pal, Siying Feng, Dong-hyeon Park, Sung Kim, Aporva Amarnath, Chi-Sheng Yang, Xin He, Jonathan Beaumont, Kyle May, Yan Xiong, Kuba Kaszyk, John Magnus Morton (+8 others)
2020 Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques  
Transmuter addresses a rapidly growing set of algorithms exhibiting dynamic data movement patterns, irregularity, and sparsity, while delivering GPU-like efficiencies for traditional dense applications  ...  This is particularly true for domains that have frequently changing algorithms and applications involving mixed sparse/dense data structures, such as those in machine learning and graph analytics.  ... 
doi:10.1145/3410463.3414627 dblp:conf/IEEEpact/PalFPKAYHBMXKMS20 fatcat:kwsaun2g65b6jl6mdqrhgiv7yq

Distributed-Memory Sparse Kernels for Machine Learning [article]

Vivek Bharadwaj, Aydin Buluç, James Demmel
2022 arXiv   pre-print
We embed and test the scaling of our algorithms in real-world applications, including collaborative filtering via alternating least squares and inference for attention-based graph neural networks.  ...  Sampled Dense Times Dense Matrix Multiplication (SDDMM) and Sparse Times Dense Matrix Multiplication (SpMM) appear in diverse settings, such as collaborative filtering, document clustering, and graph embedding (a toy illustration of both kernels follows this entry).  ... 
arXiv:2203.07673v2 fatcat:yribalejkza6nbonmoa3odc2ka
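For readers unfamiliar with the two kernels named in this entry, here is a minimal single-node SciPy sketch (a toy illustration under my own assumptions, not the paper's distributed-memory algorithms): SpMM multiplies a sparse matrix by a dense one, while SDDMM computes a dense-dense product only at the sparse matrix's nonzero positions.

```python
# Toy single-node versions of SpMM and SDDMM; matrix names and sizes are
# illustrative only, not taken from the paper.
import numpy as np
import scipy.sparse as sp

m, n, k = 6, 5, 4
S = sp.random(m, n, density=0.3, format="csr", random_state=0)  # sparse pattern
A = np.random.rand(m, k)   # dense factor / node-feature matrix
B = np.random.rand(n, k)   # dense factor / node-feature matrix

# SpMM: sparse x dense -> dense (e.g., neighborhood aggregation in a GNN layer)
spmm_out = S @ B                                       # shape (m, k)

# SDDMM: dense x dense, evaluated only at the nonzeros of S
# (e.g., per-edge attention scores in attention-based GNNs)
rows, cols = S.nonzero()
sddmm_vals = np.einsum("ij,ij->i", A[rows], B[cols])   # one value per nonzero
sddmm = sp.csr_matrix((sddmm_vals, (rows, cols)), shape=(m, n))
```

In collaborative filtering via alternating least squares, S plays the role of the sparse rating matrix and A, B the user/item factor matrices, which is exactly the setting the snippet above mentions.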

Computing Graph Neural Networks: A Survey from Algorithms to Accelerators [article]

Sergi Abadal, Akshay Jain, Robert Guirado, Jorge López-Alonso, Eduard Alarcón
2021 arXiv   pre-print
Such an ability has strong implications in a wide variety of fields whose data is inherently relational, for which conventional neural networks do not perform well.  ...  On the other hand, an in-depth analysis of current software and hardware acceleration schemes is provided, from which a hardware-software, graph-aware, and communication-centric vision for GNN accelerators  ... 
arXiv:2010.00130v3 fatcat:u5bcmjodcfdh7pew4nssjemdba

Computing Graph Neural Networks: A Survey from Algorithms to Accelerators

Sergi Abadal, Akshay Jain, Robert Guirado, Jorge López-Alonso, Eduard Alarcón
2022 ACM Computing Surveys  
Such an ability has strong implications in a wide variety of fields whose data are inherently relational, for which conventional neural networks do not perform well.  ...  On the other hand, an in-depth analysis of current software and hardware acceleration schemes is provided, from which a hardware-software, graph-aware, and communication-centric vision for GNN accelerators  ... 
doi:10.1145/3477141 fatcat:6ef4jh3hrvefnoytckqyyous3m

An Overview of Efficient Interconnection Networks for Deep Neural Network Accelerators

Seyed Morteza Nabavinejad, Mohammad Baharloo, Kun-Chih Chen, Maurizio Palesi, Tim Kogel, Masoumeh Ebrahimi
2020 IEEE Journal on Emerging and Selected Topics in Circuits and Systems  
As a result, efficient interconnection and data movement mechanisms for future on-chip artificial intelligence (AI) accelerators are worthy of study.  ...  Finally, we investigate the emerging interconnection technologies (e.g., in/near-memory processing) for DNN accelerator design.  ...  Furthermore, to improve performance and energy efficiency, a flow mapping approach based on the row-node stationary (RNS) mapping has been devised that can reduce the number of memory accesses and hop counts  ... 
doi:10.1109/jetcas.2020.3022920 fatcat:idqitgwnrnegbd4dhrly3xsxbi

FourCastNet: Accelerating Global High-Resolution Weather Forecasting using Adaptive Fourier Neural Operators [article]

Thorsten Kurth, Shashank Subramanian, Peter Harrington, Jaideep Pathak, Morteza Mardani, David Hall, Andrea Miele, Karthik Kashinath, Animashree Anandkumar
2022 arXiv   pre-print
FourCastNet produces accurate instantaneous weather predictions for a week in advance, enables enormous ensembles that better capture weather extremes, and supports higher global forecast resolutions.  ...  FourCastNet is optimized and scales efficiently on three supercomputing systems: Selene, Perlmutter, and JUWELS Booster, up to 3,808 NVIDIA A100 GPUs, attaining 140.8 petaFLOPS in mixed precision (11.9%  ... 
arXiv:2208.05419v1 fatcat:lti5hf52jjgexdojcx2bfobcdy

Efficient deep neural network model training by reducing memory and compute demands

Sangkug Lym, Mattan Erez (The University of Texas at Austin)
2020
The proposed training accelerator has 45 mixed-precision FLOPS and, with memory bandwidth-efficient network training scheduling, beats a state-of-the-art GPU that has ∼3X higher peak FLOPS.  ...  In particular, CNN (convolutional neural network) models have become the de facto choice for most vision applications such as image classification, object segmentation, and object detection.  ...  This dissertation focuses on accelerating convolutional neural network training, an important branch of DNN models for computer vision tasks, by identifying and reducing its unnecessary memory accesses and  ... 
doi:10.26153/tsw/8156 fatcat:wtyl25xcwbbonevayufcol3vcy

In-Memory Acceleration for General Data Parallel Applications

Daichi Fujiki (University of Michigan)
2022
This is empowered by a compiler that transforms the Data Flow Graphs of tensor programs into a set of data-parallel code modules expressed in a memory ISA.  ...  General-purpose processors and accelerators, including systems-on-a-chip and graphics processing units, are composed of three principal components: processor, memory, and the interconnection of the two.  ...  ISAAC [3] and PRIME [2] utilize ReRAMs to accelerate several Convolutional Neural Networks (CNNs).  ... 
doi:10.7302/4538 fatcat:alevj7ibbfe6tivzjyai4uqjt4

Event-Based Vision Processing in Deep Neural Networks

Bodo Rückauer
2020
In applications ranging from data compression to optical flow and Spiking Neural Networks, we demonstrate computational savings when operating on sparse, informative events rather than dense, redundant  ...  The graph then splits into a branch with 7 convolution layers and a skip-connection branch with a single convolution layer.  ... 
doi:10.5167/uzh-200987 fatcat:y2clxptcabh2bklipriywhtx4i
Showing results 1 — 15 out of 16 results