
Semi-External Memory Sparse Matrix Multiplication for Billion-Node Graphs

Da Zheng, Disa Mhembere, Vince Lyzinski, Joshua T. Vogelstein, Carey E. Priebe, Randal Burns
2017 IEEE Transactions on Parallel and Distributed Systems  
In contrast, we scale sparse matrix multiplication beyond memory capacity by implementing sparse-matrix dense-matrix multiplication (SpMM) in a semi-external memory (SEM) fashion; i.e., we keep the sparse  ...  Sparse matrix multiplication is traditionally performed in memory and scales to large matrices using the distributed memory of multiple nodes.  ...  The row interval size is a multiple of the tile size of a sparse matrix so that multiplication on a tile only needs to access rows from a single row interval.  ... 
doi:10.1109/tpds.2016.2618791 fatcat:n7fc34xn4rbmfgoqhuz5tiedjy
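As a rough illustration of the SEM idea in the snippet above (sparse matrix kept on SSDs, dense matrix in memory), the sketch below streams CSR row blocks through an in-memory dense matrix. The pure-Python loop, block size, and names are illustrative assumptions, not the paper's implementation, which would memory-map the CSR arrays from SSD:

```python
import numpy as np

def sem_spmm(indptr, indices, data, dense, block_rows=2):
    """Multiply a CSR sparse matrix by an in-memory dense matrix, one row block at a time.

    In a real SEM setting, indptr/indices/data would be np.memmap views of
    on-SSD arrays; only the current row block's entries are touched per step.
    """
    n_rows = len(indptr) - 1
    out = np.zeros((n_rows, dense.shape[1]))
    for start in range(0, n_rows, block_rows):          # stream one row block
        for i in range(start, min(start + block_rows, n_rows)):
            for k in range(indptr[i], indptr[i + 1]):   # CSR row-wise multiply
                out[i] += data[k] * dense[indices[k]]
    return out

# 3x3 sparse matrix [[1,0,2],[0,3,0],[4,0,0]] in CSR form
indptr  = np.array([0, 2, 3, 4])
indices = np.array([0, 2, 1, 0])
data    = np.array([1.0, 2.0, 3.0, 4.0])
result  = sem_spmm(indptr, indices, data, np.eye(3))
```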

An SSD-based eigensolver for spectral analysis on billion-node graphs [article]

Da Zheng, Randal Burns, Joshua Vogelstein, Carey E. Priebe, Alexander S. Szalay
2016 arXiv   pre-print
FlashEigen performs sparse matrix multiplication in a semi-external memory fashion, i.e., we keep the sparse matrix on SSDs and the dense matrix in memory.  ...  They run in the memory of a single machine for smaller eigenvalue problems and require distributed memory for larger problems.  ...  : • partition dense matrices for NUMA (NUMA), • partition the sparse matrix in both dimensions into tiles of 16K × 16K (cache blocking), • organize multiple physical tiles into a super tile to fill CPU  ... 
arXiv:1602.01421v3 fatcat:htnfmomwt5c2zbvo7ywjy2dkwa

Sparse GPU Kernels for Deep Learning [article]

Trevor Gale, Matei Zaharia, Cliff Young, Erich Elsen
2020 arXiv   pre-print
Based on these insights, we develop high-performance GPU kernels for two sparse matrix operations widely applicable in neural networks: sparse matrix-dense matrix multiplication and sampled dense-dense  ...  matrix multiplication.  ...  We'd also like to thank Penporn Koanantakool for her help debugging our kernel benchmarks, Artem Belevich for his help with Bazel and Docker and the TensorFlow team for answering many questions.  ... 
arXiv:2006.10901v2 fatcat:76wdsepdlffslgz3kkuxykwv5i

Tensor-matrix products with a compressed sparse tensor

Shaden Smith, George Karypis
2015 Proceedings of the 5th Workshop on Irregular Applications Architectures and Algorithms - IA3 '15  
In this work, we bridge the gap between the two approaches and introduce the compressed sparse fiber (CSF), a data structure for sparse tensors, along with a novel parallel algorithm for tensor-matrix multiplication  ...  The bottleneck of computing the CPD is multiplying a sparse tensor by several dense matrices. Algorithms for tensor-matrix products fall into two classes.  ...  In this work, we presented the compressed sparse fiber (CSF) format for sparse tensors and three associated shared-memory parallel algorithms for performing tensor-matrix multiplication.  ... 
doi:10.1145/2833179.2833183 dblp:conf/sc/SmithK15 fatcat:vqe7zqdnxvdxdnzxapildpcxoq
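The CSF structure itself is hierarchical; as a simpler stand-in, the sketch below performs the underlying operation CSF accelerates, a sparse tensor-times-matrix product, over a COO-format tensor. The function name and COO layout are assumptions for illustration, not the paper's algorithm:

```python
import numpy as np

def ttm_mode3(coords, vals, matrix, shape):
    """Mode-3 tensor-times-matrix: Y[i,j,:] = sum_k X[i,j,k] * M[k,:].

    X is a sparse third-order tensor given as COO coordinates and values.
    """
    I, J, _ = shape
    out = np.zeros((I, J, matrix.shape[1]))
    for (i, j, k), v in zip(coords, vals):  # one fused multiply-add per nonzero
        out[i, j] += v * matrix[k]
    return out

# A 2x3x2 tensor with two nonzeros, multiplied by the 2x2 identity
coords = [(0, 0, 1), (1, 2, 0)]
vals   = [2.0, 3.0]
Y = ttm_mode3(coords, vals, np.eye(2), (2, 3, 2))
```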

Accelerating Sparse DNN Models without Hardware-Support via Tile-Wise Sparsity [article]

Cong Guo and Bo Yang Hsueh and Jingwen Leng and Yuxian Qiu and Yue Guan and Zehuan Wang and Xiaoying Jia and Xipeng Li and Minyi Guo and Yuhao Zhu
2020 arXiv   pre-print
Our work builds upon the insight that the matrix multiplication generally breaks the large matrix into multiple smaller tiles for parallel execution.  ...  Consequently, sparse models cannot achieve meaningful speedup on commodity hardware (e.g., GPU) built for dense matrix computations.  ...  ACKNOWLEDGEMENT We thank the anonymous reviewers for their constructive feedback for improving the work.  ... 
arXiv:2008.13006v1 fatcat:5r3luayqivaafbt5daho4rulmi
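The insight above, that a large matrix multiplication decomposes into smaller tile products which can execute independently, can be sketched as follows (the tile size and loop order are illustrative, not the paper's tile-wise sparsity scheme):

```python
import numpy as np

def tiled_matmul(A, B, tile=2):
    """Dense GEMM computed as a sum of independent tile-by-tile products."""
    n, m = A.shape[0], B.shape[1]
    C = np.zeros((n, m))
    for i in range(0, n, tile):
        for j in range(0, m, tile):
            # each (i, j) output tile accumulates over the shared dimension
            for k in range(0, A.shape[1], tile):
                C[i:i + tile, j:j + tile] += (
                    A[i:i + tile, k:k + tile] @ B[k:k + tile, j:j + tile]
                )
    return C
```

Tile-wise sparsity prunes at the granularity of these tiles, so zero tiles can simply be skipped without changing the dense inner kernel.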

Generalizing Run-Time Tiling with the Loop Chain Abstraction

Michelle Mills Strout, Fabio Luporini, Christopher D. Krieger, Carlo Bertolli, Gheorghe-Teodor Bercea, Catherine Olschanowsky, J. Ramanujam, Paul H.J. Kelly
2014 2014 IEEE 28th International Parallel and Distributed Processing Symposium  
These approaches were shown to benefit applications such as moldyn, Gauss-Seidel, and the sparse matrix powers kernel; however, the run-time routines for performing sparse tiling were hand-coded per application  ...  Previously, sparse tiling approaches were developed for individual benchmarks to group iterations across such loops to improve data locality.  ...  Mike Giles and Istvan Reguly (Oxford University) are gratefully acknowledged for their contribution to the OP2 project.  ... 
doi:10.1109/ipdps.2014.118 dblp:conf/ipps/StroutLKBBORK14 fatcat:ywmea7ay4zhi5pk7vx3lxnodsa

H-GCN: A Graph Convolutional Network Accelerator on Versal ACAP Architecture [article]

Chengming Zhang, Tong Geng, Anqi Guo, Jiannan Tian, Martin Herbordt, Ang Li, Dingwen Tao
2022 arXiv   pre-print
To further improve performance, we explore the sparsity support of AIE and develop an efficient density-aware method to automatically map tiles of sparse matrix-matrix multiplication (SpMM) onto the systolic  ...  To this end we propose H-GCN, a PL (Programmable Logic) and AIE (AI Engine) based hybrid accelerator that leverages the emerging heterogeneity of Xilinx Versal Adaptive Compute Acceleration Platforms (  ...  It includes both sparse systolic tensor array and dense systolic tensor array; the sparse systolic tensor array is designed for sparse-dense matrix-matrix multiplications in GCNs, while the dense systolic  ... 
arXiv:2206.13734v1 fatcat:bpfurud6srawli6jmu2jx63vca

Exploiting recent SIMD architectural advances for irregular applications

Linchuan Chen, Peng Jiang, Gagan Agrawal
2016 Proceedings of the 2016 International Symposium on Code Generation and Optimization - CGO 2016  
This method has been applied to unstructured grids, molecular dynamics, and graph applications, in addition to sparse matrix computations.  ...  Based on the observation that all applications with indirect memory accesses can be viewed as sparse matrix computations, we design an optimization methodology, which includes three sub-steps: 1) locality  ...  Sparse Matrix-Matrix Multiplication Unlike irregular reductions and graph algorithms, Sparse Matrix-Matrix Multiplication (SpMM) is an irregular application that accesses multiple sparse matrices.  ... 
doi:10.1145/2854038.2854046 dblp:conf/cgo/ChenJA16 fatcat:6g6s3f6oerchbayaklj77fvi44

Blending Extensibility and Performance in Dense and Sparse Parallel Data Management

Javier Fresno, Arturo Gonzalez-Escribano, Diego R. Llanos
2014 IEEE Transactions on Parallel and Distributed Systems  
matrix parallel library.  ...  Dealing with both dense and sparse data in parallel environments usually leads to two different approaches: to rely on a monolithic, hard-to-modify parallel library, or to code all data management details  ...  Sparse matrix-vector multiplication benchmark: The first benchmark is a simple matrix-vector multiplication y = Ax, where the A matrix is sparse and the x and y vectors are dense.  ... 
doi:10.1109/tpds.2013.248 fatcat:ecciaa4e6razpmy24z7lefzutq
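The benchmark operation y = Ax with sparse A and dense x is commonly implemented over the CSR format; a minimal sketch (not the library's actual code) looks like:

```python
import numpy as np

def csr_spmv(indptr, indices, data, x):
    """y = A @ x with A stored in CSR form (indptr, indices, data)."""
    y = np.zeros(len(indptr) - 1)
    for i in range(len(y)):
        # nonzeros of row i occupy data[indptr[i]:indptr[i+1]]
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]
    return y

# CSR form of [[1,0,2],[0,3,0],[4,0,0]]; multiply by the all-ones vector
indptr  = np.array([0, 2, 3, 4])
indices = np.array([0, 2, 1, 0])
data    = np.array([1.0, 2.0, 3.0, 4.0])
y = csr_spmv(indptr, indices, data, np.ones(3))
```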

Accelerating sparse matrix–matrix multiplication with GPU Tensor Cores

Orestis Zachariadis, Nitin Satpute, Juan Gómez-Luna, Joaquín Olivares
2020 Computers & electrical engineering  
Modern GPUs include Tensor Core Units (TCUs), which specialize in dense matrix multiplication. Our aim is to re-purpose TCUs for sparse matrices.  ...  Sparse general matrix-matrix multiplication (spGEMM) is an essential component in many scientific and data analytics applications.  ...  Introduction Sparse general matrix-matrix multiplication (spGEMM), similar to its dense counterpart, performs the Matrix Multiplication (MM) of two sparse matrices.  ... 
doi:10.1016/j.compeleceng.2020.106848 fatcat:enqt7ck42banlilc2isavisi74
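A common way to implement spGEMM is Gustavson's row-by-row algorithm; the dict-of-rows sketch below is a hypothetical minimal version for illustration, not the paper's Tensor-Core-based method:

```python
def spgemm(A, B):
    """C = A @ B where each sparse matrix maps row index -> {col index: value}."""
    C = {}
    for i, row in A.items():
        acc = {}                      # sparse accumulator for output row i
        for k, a in row.items():      # for each nonzero A[i,k] ...
            for j, b in B.get(k, {}).items():  # ... scale row k of B
                acc[j] = acc.get(j, 0.0) + a * b
        if acc:
            C[i] = acc
    return C

# A = [[1,0,2],[0,3,0]], B has nonzeros B[0,1]=1 and B[2,0]=5
A = {0: {0: 1.0, 2: 2.0}, 1: {1: 3.0}}
B = {0: {1: 1.0}, 2: {0: 5.0}}
C = spgemm(A, B)
```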

Adaptive Optimization of Sparse Matrix-Vector Multiplication on Emerging Many-Core Architectures

Shizhao Chen, Jianbin Fang, Donglin Chen, Chuanfu Xu, Zheng Wang
2018 2018 IEEE 20th International Conference on High Performance Computing and Communications; IEEE 16th International Conference on Smart City; IEEE 4th International Conference on Data Science and Systems (HPCC/SmartCity/DSS)  
Sparse matrix vector multiplication (SpMV) is one of the most common operations in scientific and high-performance applications, and is often responsible for the application performance bottleneck.  ...  The learned model can be used to predict the best matrix representation for any unseen input for a given architecture.  ...  INTRODUCTION Sparse matrix-vector multiplication (SpMV) is commonly seen in scientific and high-performance applications [18] .  ... 
doi:10.1109/hpcc/smartcity/dss.2018.00116 dblp:conf/hpcc/ChenFCXW18 fatcat:nvtcmxgdt5afzmbnfl4zq46fga

Rescheduling for Locality in Sparse Matrix Computations [chapter]

Michelle Mills Strout, Larry Carter, Jeanne Ferrante
2001 Lecture Notes in Computer Science  
For dense matrix computations, loop transformations can be used to improve data locality.  ...  This paper describes an algorithm to tile at runtime called serial sparse tiling.  ...  For input, we use the sparse matrices generated for a nonlinear elasticity problem on 2D and 3D bar meshes. We generate different problem sizes by using FEtk's adaptive refinement.  ... 
doi:10.1007/3-540-45545-0_23 fatcat:jrkyz42nzbaf7hxuqifaojnt2u


Tom St. John, Benoît Meister, Andres Marquez, Joseph B. Manzano, Guang R. Gao, Xiaoming Li
2014 Proceedings of International Workshop on Adaptive Self-tuning Computing Systems - ADAPT '14  
We present a generic framework for collecting a useful set of runtime parameters and creating adaptive optimizations at all levels of the software stack.  ...  The software stack element used for validation is an adaptive data compression engine, which in practice could be part of the application or the runtime.  ...  We observed speedups ranging between 57% and 68% compared to standard matrix-vector multiplication and a speedup of 15% compared to sparse matrix-vector multiplication.  ... 
doi:10.1145/2553062.2553063 fatcat:ztzkcn4ncbhn3lt73hgmn5nnxa

Dual-side Sparse Tensor Core [article]

Yang Wang, Chen Zhang, Zhiqiang Xie, Cong Guo, Yunxin Liu, Jingwen Leng
2021 arXiv   pre-print
We propose a set of novel ISA extensions and co-design the matrix-matrix multiplication and convolution algorithms, which are the two dominant computation patterns in today's DNN models, to exploit our  ...  new dual-side sparse Tensor Core.  ...  Acknowledgements We thank the anonymous reviewers for their thoughtful comments and suggestions.  ... 
arXiv:2105.09564v1 fatcat:nhyrsyhotzaspnemv6ihylqebu

VersaGNN: a Versatile accelerator for Graph neural networks [article]

Feng Shi, Ahren Yiqiao Jin, Song-Chun Zhu
2021 arXiv   pre-print
The Transformation (or Node Embedding) phase can be either dense or sparse-dense matrix multiplication.  ...  We then divide the computing engine into blocked systolic arrays to support Strassen's algorithm for dense matrix multiplication, dramatically scaling down the number of multiplications and enabling  ...  Assume the sparse adjacency matrix is split into tiles, and each tile is stored in CSR or COO format.  ... 
arXiv:2105.01280v1 fatcat:vpvhdzrfkzhpjiysjshahr23ay
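The tile splitting mentioned in the snippet can be sketched as follows; here each non-empty tile of an adjacency matrix (given densely for simplicity) is stored in COO form. Tile size, function name, and return layout are illustrative assumptions only:

```python
import numpy as np

def tile_coo(dense, tile=2):
    """Split a matrix into tile x tile blocks, keeping each non-empty block in COO form."""
    tiles = {}
    n, m = dense.shape
    for bi in range(0, n, tile):
        for bj in range(0, m, tile):
            block = dense[bi:bi + tile, bj:bj + tile]
            rows, cols = np.nonzero(block)
            if len(rows):  # skip all-zero tiles entirely
                tiles[(bi // tile, bj // tile)] = (rows, cols, block[rows, cols])
    return tiles

# 4x4 adjacency matrix with two edges, one in each diagonal tile
A = np.zeros((4, 4))
A[0, 0] = 1.0
A[3, 3] = 2.0
tiles = tile_coo(A)
```

Storing only non-empty tiles is what lets an accelerator schedule work per tile and ignore empty regions of the adjacency matrix.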