
Sparse GPU Kernels for Deep Learning [article]

Trevor Gale, Matei Zaharia, Cliff Young, Erich Elsen
2020 arXiv   pre-print
Based on these insights, we develop high-performance GPU kernels for two sparse matrix operations widely applicable in neural networks: sparse matrix-dense matrix multiplication and sampled dense-dense  ...  While deep neural networks can be made sparse, achieving practical speedups on GPUs is difficult because these applications have relatively moderate levels of sparsity that are not sufficient for existing  ...  We'd also like to thank Penporn Koanantakool for her help debugging our kernel benchmarks, Artem Belevich for his help with Bazel and Docker and the TensorFlow team for answering many questions.  ... 
arXiv:2006.10901v2 fatcat:76wdsepdlffslgz3kkuxykwv5i
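For reference, the two operations named in this snippet can be pinned down with a small pure-Python sketch. It assumes the sparse operand is stored in CSR form (`row_ptr`, `col_idx`, `values`), a format chosen here for illustration. The function names are hypothetical, and the loops show only the semantics, not the paper's GPU kernels. SpMM computes C = A @ B with A sparse. SDDMM evaluates the dense product X @ Y^T only at the nonzero positions of a given sparsity pattern.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def spmm_csr(row_ptr: Sequence[int], col_idx: Sequence[int],
             values: Sequence[float], B: Matrix) -> Matrix:
    """SpMM: C = A @ B, where A (m x k) is in CSR form and B (k x n) is dense."""
    m = len(row_ptr) - 1
    n = len(B[0]) if B else 0
    C = [[0.0] * n for _ in range(m)]
    for i in range(m):
        # Only the stored nonzeros of row i contribute to row i of C.
        for p in range(row_ptr[i], row_ptr[i + 1]):
            a, b_row = values[p], B[col_idx[p]]
            for j in range(n):
                C[i][j] += a * b_row[j]
    return C


def sddmm_csr(row_ptr: Sequence[int], col_idx: Sequence[int],
              X: Matrix, Y: Matrix) -> List[float]:
    """SDDMM: (X @ Y^T)[i][j] evaluated only where (i, j) is in the CSR pattern.

    X is m x k and Y is n x k; the result holds one value per stored
    nonzero, in the same order as col_idx.
    """
    out = []
    for i in range(len(row_ptr) - 1):
        for p in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[p]
            out.append(sum(x * y for x, y in zip(X[i], Y[j])))
    return out
```

For A = [[1, 0, 2], [0, 3, 0]] the CSR arrays are `row_ptr = [0, 2, 3]`, `col_idx = [0, 2, 1]`, `values = [1, 2, 3]`. Some SDDMM definitions also scale each sampled entry by the corresponding value of the sparse matrix; that step is omitted here.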

At-Scale Sparse Deep Neural Network Inference with Efficient GPU Implementation [article]

Mert Hidayetoglu, Carl Pearson, Vikram Sharma Mailthody, Eiman Ebrahimi, Jinjun Xiong, Rakesh Nagi, Wen-Mei Hwu
2020 arXiv   pre-print
This paper presents GPU performance optimization and scaling results for inference models of the Sparse Deep Neural Network Challenge 2020.  ...  Sparse deep neural networks (SpDNN) have shown promise for reining in the memory footprint of large neural networks. However, there is room for improvement in implementing SpDNN operations on GPUs.  ...  on up to 768 V100 GPUs as well as the new A100 GPUs for the Sparse Deep Neural Network Challenge dataset.  ... 
arXiv:2007.14152v2 fatcat:r6ukhavrmjedvm6wx5jph5zqva

FeatGraph: A Flexible and Efficient Backend for Graph Neural Network Systems [article]

Yuwei Hu, Zihao Ye, Minjie Wang, Jiali Yu, Da Zheng, Mu Li, Zheng Zhang, Zhiru Zhang, Yida Wang
2020 arXiv   pre-print
FeatGraph incorporates optimizations for graph traversal into the sparse templates and allows users to specify optimizations for UDFs with a feature dimension schedule (FDS).  ...  FeatGraph speeds up end-to-end GNN training and inference by up to 32x on CPU and 7x on GPU.  ...  ACKNOWLEDGEMENT We thank the anonymous reviewers for valuable comments.  ... 
arXiv:2008.11359v2 fatcat:pm5cdwjlj5bfdk547w3r2lwv7q
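The snippet's pairing of sparse templates with user-defined functions (UDFs) can be illustrated by a generalized SpMM over a graph stored in CSR form. This is a hypothetical pure-Python sketch, not FeatGraph's API: `msg_fn` plays the role of an edge-wise UDF, and `reduce_fn` aggregates messages for each destination row. The innermost loop runs over the feature dimension, the dimension that, per the snippet, a feature dimension schedule lets users optimize; here it is left unscheduled.

```python
from typing import Callable, List, Sequence

Vec = List[float]


def generalized_spmm(
    row_ptr: Sequence[int],
    col_idx: Sequence[int],
    edge_val: Sequence[float],
    feats: Sequence[Vec],
    msg_fn: Callable[[float, float], float],
    reduce_fn: Callable[[float, float], float],
    identity: float,
) -> List[Vec]:
    """out[i][f] = reduce_fn over neighbors j of msg_fn(edge_val[i->j], feats[j][f])."""
    d = len(feats[0]) if feats else 0
    out: List[Vec] = []
    for i in range(len(row_ptr) - 1):  # graph traversal: one row per node
        acc = [identity] * d
        for p in range(row_ptr[i], row_ptr[i + 1]):  # neighbors of node i
            j, w = col_idx[p], edge_val[p]
            for f in range(d):  # feature dimension: the UDF is applied element-wise
                acc[f] = reduce_fn(acc[f], msg_fn(w, feats[j][f]))
        out.append(acc)
    return out
```

With multiply as the message, addition as the reduction, and identity `0.0`, this reduces to ordinary SpMM; swapping in `max` with identity `float("-inf")` and a message that ignores the edge weight gives max aggregation over neighbors.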

Performance Portability for ECP Data Analytics (SIAM CSE21) [article]

William Hart
DOE applications. CANDLE: CANcer Distributed Learning Environment. Goal: Develop an exascale deep learning environment for cancer. CANDLE Challenge Problem: Training for Deep Learning • Runs  ...  Challenge Problem: Training for Deep Learning - CUDA and HIP implementations of GPU kernels • Performance bottlenecks - Graph traversals require communication that is latency sensitive - Extensive use of  ... 
doi:10.6084/m9.figshare.14153816.v1 fatcat:273nr7j26zazld4xz5furwnpiq

A Machine Learning-Based Approach for Selecting SpMV Kernels and Matrix Storage Formats

Hang CUI, Shoichi HIRASAWA, Hiroaki KOBAYASHI, Hiroyuki TAKIZAWA
2018 IEICE transactions on information and systems  
This paper hence presents an effective deep learning mechanism for SpMV code selection best suited for a given sparse matrix.  ...  Instead of using manually-predefined features of a sparse matrix, a feature image and a deep learning network are used to map each sparse matrix to the implementation, which is expected to have the best  ...  Okatani of Tohoku University for their meaningful discussions.  ... 
doi:10.1587/transinf.2017edp7176 fatcat:rxjouyoqo5cerj6pefjahsikwi
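Why the storage format matters for SpMV can be seen by writing the same product for two common formats. This is a minimal pure-Python sketch with hypothetical function names; CSR and ELLPACK are example formats chosen here, not necessarily the ones studied in the paper. ELLPACK pads every row to the length of the longest row, so its regular layout suits GPUs when row lengths are uniform but wastes work on padding when they are skewed. The padding fraction computed below is one simple hand-made matrix feature; the paper instead feeds a feature image of the matrix to a deep learning network.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def to_csr(A: Matrix) -> Tuple[List[int], List[int], List[float]]:
    """Compress a dense matrix into CSR (row_ptr, col_idx, values)."""
    row_ptr, col_idx, values = [0], [], []
    for row in A:
        for j, a in enumerate(row):
            if a != 0:
                col_idx.append(j)
                values.append(a)
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx, values


def spmv_csr(row_ptr, col_idx, values, x):
    """y = A @ x with A in CSR: one variable-length loop per row."""
    return [sum(values[p] * x[col_idx[p]] for p in range(row_ptr[i], row_ptr[i + 1]))
            for i in range(len(row_ptr) - 1)]


def to_ell(A: Matrix) -> Tuple[List[List[int]], List[List[float]]]:
    """Compress into ELLPACK: every row padded to the max nonzeros per row.

    Padding slots use column 0 with value 0.0, so they add nothing to y.
    """
    rows = [[(j, a) for j, a in enumerate(r) if a != 0] for r in A]
    width = max((len(r) for r in rows), default=0)
    cols = [[j for j, _ in r] + [0] * (width - len(r)) for r in rows]
    vals = [[a for _, a in r] + [0.0] * (width - len(r)) for r in rows]
    return cols, vals


def spmv_ell(cols, vals, x):
    """y = A @ x with A in ELLPACK: fixed-length loop per row (regular access)."""
    return [sum(v * x[c] for c, v in zip(cr, vr)) for cr, vr in zip(cols, vals)]


def ell_padding_fraction(vals) -> float:
    """Share of ELLPACK slots that are padding: a crude format-selection feature."""
    slots = sum(len(r) for r in vals)
    return 0.0 if slots == 0 else sum(v == 0 for r in vals for v in r) / slots
```

For A = [[1, 0, 2], [0, 0, 3], [4, 5, 6]] both formats give the same y, but a third of the ELLPACK slots are padding because row lengths range from 1 to 3.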

SparseRT: Accelerating Unstructured Sparsity on GPUs for Deep Learning Inference [article]

Ziheng Wang
2020 arXiv   pre-print
In this paper, we present SparseRT, a code generator that leverage unstructured sparsity to accelerate sparse linear algebra operations in deep learning inference on GPUs.  ...  of test cases in deep learning.  ...  Table 1: Results for SpMM problems in deep learning.  ... 
arXiv:2008.11849v1 fatcat:x4usrp5ocrhifkuicim3nujtlm

Accelerating convolutional neural network by exploiting sparsity on GPUs [article]

Weizhi Xu, Shengyu Fan, Hui Yu, Xin Fu
2022 arXiv   pre-print
Convolutional neural network (CNN) is an important deep learning method. The convolution operation takes a large proportion of the total execution time for CNN.  ...  Feature maps for convolution operation are usually sparse. Multiplications and additions for zero values in the feature map are useless for convolution results.  ...  The above acceleration methods have been integrated into cuDNN library, which is a state-of-the-art library for deep learning on GPU [16] .  ... 
arXiv:1909.09927v5 fatcat:xdydyeme3baxfebyiztqtnsmui
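The observation that multiplications by zero-valued feature-map entries are wasted can be made concrete. The pure-Python sketch below (single channel, stride 1, no padding; all names hypothetical) contrasts a dense valid convolution with a variant that visits only nonzero inputs and scatters their contributions to the outputs they affect. It is a generic zero-skipping scheme shown for illustration, not the GPU algorithm proposed in the paper. Both functions compute cross-correlation, which is what CNN frameworks call convolution.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_dense(x: Matrix, w: Matrix) -> Matrix:
    """Valid convolution: every output multiplies every input in its window."""
    H, W, R, S = len(x), len(x[0]), len(w), len(w[0])
    OH, OW = H - R + 1, W - S + 1
    return [[sum(x[i + r][j + s] * w[r][s] for r in range(R) for s in range(S))
             for j in range(OW)] for i in range(OH)]


def conv2d_skip_zeros(x: Matrix, w: Matrix) -> Matrix:
    """Same result, but multiply-adds are issued only for nonzero inputs."""
    H, W, R, S = len(x), len(x[0]), len(w), len(w[0])
    OH, OW = H - R + 1, W - S + 1
    out = [[0.0] * OW for _ in range(OH)]
    for h in range(H):
        for c in range(W):
            v = x[h][c]
            if v == 0:
                continue  # zero input: contributes nothing, so skip its work
            # x[h][c] reaches out[i][j] through weight w[r][s] when i = h - r, j = c - s.
            for r in range(R):
                i = h - r
                if 0 <= i < OH:
                    for s in range(S):
                        j = c - s
                        if 0 <= j < OW:
                            out[i][j] += v * w[r][s]
    return out
```

With a 3 × 3 input holding three nonzeros and a 2 × 2 filter, the dense version performs 16 multiplications while the zero-skipping version performs 5; the saving grows with the sparsity of the feature map.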

Accelerating Sparse Matrix Operations in Neural Networks on Graphics Processing Units

Arturo Argueta, David Chiang
2019 Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics  
While previous work in deep learning has focused on accelerating operations on dense matrices/tensors on GPUs, efforts have concentrated on operations involving sparse data structures.  ...  We present two new GPU algorithms: one at the input layer, for multiplying a matrix by a few-hot vector (generalizing the more common operation of multiplication by a one-hot vector) and one at the output  ...  Graphics Processing Units (GPUs) are now a standard platform for deep learning.  ... 
doi:10.18653/v1/p19-1626 dblp:conf/acl/ArguetaC19 fatcat:clokai2fmzhx3ldn2fcvxx3cc4
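Multiplying by a few-hot vector is cheap because only the hot positions contribute. In a sketch that assumes the weight matrix `W` is stored as a |V| × d table (an assumption for illustration; the snippet does not give the layout), the product x^T W is just the sum of the rows of `W` at the hot indices, so the cost scales with the number of hot entries rather than with |V|. Function names are hypothetical; this shows the arithmetic, not the paper's GPU algorithm.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def dense_vecmat(x: Sequence[float], W: Matrix) -> List[float]:
    """Reference x^T W with a dense x: reads every row of W."""
    d = len(W[0]) if W else 0
    return [sum(x[k] * W[k][j] for k in range(len(W))) for j in range(d)]


def fewhot_vecmat(hot: Sequence[int], W: Matrix) -> List[float]:
    """x^T W where x has a 1 at each index in `hot` and 0 elsewhere.

    Only len(hot) rows of W are read; one-hot is the len(hot) == 1 case.
    A repeated index is added again, i.e. it acts as a count.
    """
    d = len(W[0]) if W else 0
    out = [0.0] * d
    for k in hot:
        row = W[k]
        for j in range(d):
            out[j] += row[j]
    return out
```

For W = [[1, 2], [3, 4], [5, 6], [7, 8]] and hot indices [0, 2], the result is the sum of rows 0 and 2, i.e. [6, 8], matching the dense product with x = [1, 0, 1, 0].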

Deep Learning with Apache SystemML [article]

Niketan Pansare, Michael Dusenberry, Nakul Jindal, Matthias Boehm, Berthold Reinwald, Prithviraj Sen
2018 arXiv   pre-print
both machine learning and deep learning.  ...  Apache SystemML provides a unified framework for implementing machine learning and deep learning algorithms in a variety of shared deployment scenarios.  ...  GPU backend invokes highly tuned kernels from CUDA libraries like CuBLAS, CuSPARSE, or CuDNN when available.  ... 
arXiv:1802.04647v1 fatcat:jkt6zv6tfjghnbzvvjhlrtioay

Batched Sparse Matrix Multiplication for Accelerating Graph Convolutional Networks [article]

Yusuke Nagasaka, Akira Nukada, Ryosuke Kojima, Satoshi Matsuoka
2019 arXiv   pre-print
In order to improve the performance of GCNs applications, we propose new SpMM algorithm especially for small sparse matrix and Batched SpMM, which exploits high parallelism of GPU by processing multiple  ...  SpMM operations with single CUDA kernel.  ...  Akira Naruse of NVIDIA for providing much of advices on our research.  ... 
arXiv:1903.11409v1 fatcat:56n2ddwkefftnie6mc6lbxppg4
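The batching idea, covering many small SpMM problems with a single kernel, can be sketched by flattening every row of every matrix in the batch into one global work index, so that one launch has enough independent work to fill the GPU. This pure-Python version is an illustrative assumption, not the authors' kernel: the loop over `g` stands in for parallel work items, and a prefix sum plus binary search maps each `g` back to its matrix and local row. The CSR layout and all names are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
CsrProblem = Tuple[Sequence[int], Sequence[int], Sequence[float], Matrix]


def batched_spmm(batch: Sequence[CsrProblem]) -> List[Matrix]:
    """Compute C_b = A_b @ B_b for every (row_ptr, col_idx, values, B) in batch."""
    rows = [len(row_ptr) - 1 for row_ptr, _, _, _ in batch]
    offsets = [0] + list(accumulate(rows))  # offsets[b] = first global row of matrix b
    out: List[Matrix] = [[[] for _ in range(m)] for m in rows]
    for g in range(offsets[-1]):  # each g would be an independent parallel work item
        b = bisect_right(offsets, g) - 1  # matrix that owns global row g
        i = g - offsets[b]  # local row inside that matrix
        row_ptr, col_idx, values, B = batch[b]
        n = len(B[0]) if B else 0
        acc = [0.0] * n
        for p in range(row_ptr[i], row_ptr[i + 1]):
            b_row = B[col_idx[p]]
            for j in range(n):
                acc[j] += values[p] * b_row[j]
        out[b][i] = acc
    return out
```

Because `bisect_right` skips repeated offsets, matrices with zero rows in the batch are handled without special cases.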

AsyncTaichi: On-the-fly Inter-kernel Optimizations for Imperative and Spatially Sparse Programming [article]

Yuanming Hu, Mingkuan Xu, Ye Kuang, Frédo Durand
2021 arXiv   pre-print
We show that a system that looks beyond a single kernel, plus additional domain-specific sparse data structure analysis, opens up exciting new space for optimizing sparse computations.  ...  Without any computational code modification, our new system leads to 4.02× fewer kernel launches and 1.87× speed up on our GPU benchmarks, including computations on Eulerian grids, Lagrangian particles  ...  Sparse tensors in deep learning frameworks.  ... 
arXiv:2012.08141v2 fatcat:5h5id7hvybaalbvblt2pbezcqe

A New Approach for Sparse Matrix Classification Based on Deep Learning Techniques

Juan C. Pichel, Beatriz Pateiro-Lopez
2018 2018 IEEE International Conference on Cluster Computing (CLUSTER)  
In this paper, a new methodology to select the best storage format for sparse matrices based on deep learning techniques is introduced.  ...  We focus on the selection of the proper format for the sparse matrix-vector multiplication (SpMV), which is one of the most important computational kernels in many scientific and engineering applications  ...  Methodology: In this section a new methodology to select the best storage format for sparse matrices based on deep learning techniques is introduced.  ... 
doi:10.1109/cluster.2018.00017 dblp:conf/cluster/PichelP18 fatcat:fanbx4zzgjhzvjfsxto7oywg7u

Hardware Accelerator Design for Machine Learning [chapter]

Li Du, Yuan Du
2018 Machine Learning - Advanced Techniques and Emerging Applications  
kinds of machine learning algorithms such as a deep convolutional neural network.  ...  Field programmable gate arrays (FPGA) show better energy efficiency compared with GPU when computing machine learning algorithm at the cost of low speed.  ...  GPU/FPGA-based accelerator in datacenter: Over the past decades, graphics processing units (GPUs) have become popular and standard in training deep-learning algorithms or convolutional neural networks for  ... 
doi:10.5772/intechopen.72845 fatcat:z6ias3vzibbtdpn2tbx5sli7ie

SparseDNN: Fast Sparse Deep Learning Inference on CPUs [article]

Ziheng Wang
2021 arXiv   pre-print
To tackle this challenge, we present SparseDNN, a sparse deep learning inference engine targeting CPUs.  ...  The last few years have seen gigantic leaps in algorithms and systems to support efficient deep learning inference.  ...  Unfortunately, it was soon recognized that unstructured sparsity patterns are hard to support efficiently on modern CPUs and GPUs typically used for deep learning inference.  ... 
arXiv:2101.07948v4 fatcat:bs6rdifdlvat3hr4n435w65h2y

Deep Learning for Consumer Devices and Services: Pushing the limits for machine learning, artificial intelligence, and computer vision

Joe Lemley, Shabab Bazrafkan, Peter Corcoran
2017 IEEE Consumer Electronics Magazine  
Convolutional layers, a key component of deep learning, make use of this sparse mapping approach.  ...  It is a good choice for "off the shelf" deep learning hardware when one is ready to deploy a model. CUDNN/CUDA: NVIDIA is the most popular consumer GPU manufacturer with Deep Learning researchers.  ... 
doi:10.1109/mce.2016.2640698 fatcat:k4bdd7zvbrckjle2ni7ck4pxq4