Matrix Multiplication: A Case Study of Enhanced Data Cache Utilization

N. Eiron, M. Rodeh, I. Steinwarts
1999 ACM Journal of Experimental Algorithmics  
In this paper we introduce a cache-aware algorithm for matrix multiplication.  ...  We also suggest generic guidelines that may be applied to compute-intensive algorithms to efficiently utilize the data cache. We believe that some of our concepts may be embodied in compilers.  ...
doi:10.1145/347792.347806 fatcat:7dbzvbop5nd2zicmnayypu2nt4
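
The "generic guidelines" here amount to loop tiling so that the data touched by the inner loops stays resident in the data cache. A minimal sketch of cache blocking (not the paper's algorithm; the tile size BS is an assumed tunable):

```c
#include <stddef.h>

/* Cache-blocked C += A*B for n x n row-major matrices. BS is a tunable
 * tile size, chosen so that three BS x BS tiles fit in the data cache. */
#define BS 64

void matmul_blocked(size_t n, const double *A, const double *B, double *C) {
    for (size_t ii = 0; ii < n; ii += BS)
        for (size_t kk = 0; kk < n; kk += BS)
            for (size_t jj = 0; jj < n; jj += BS)
                /* multiply the (ii,kk) tile of A into the (ii,jj) tile of C */
                for (size_t i = ii; i < ii + BS && i < n; i++)
                    for (size_t k = kk; k < kk + BS && k < n; k++) {
                        double aik = A[i * n + k];
                        for (size_t j = jj; j < jj + BS && j < n; j++)
                            C[i * n + j] += aik * B[k * n + j];
                    }
}
```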

Exploiting Multiple Levels of Parallelism in Sparse Matrix-Matrix Multiplication

Ariful Azad, Grey Ballard, Aydin Buluç, James Demmel, Laura Grigori, Oded Schwartz, Sivan Toledo, Samuel Williams
2016 SIAM Journal on Scientific Computing  
Sparse matrix-matrix multiplication (or SpGEMM) is a key primitive for many high-performance graph algorithms as well as for some linear solvers, such as algebraic multigrid.  ...  In this work, we present the first ever implementation of the 3D SpGEMM formulation that also exploits multiple (intra-node and inter-node) levels of parallelism, achieving significant speedups over the  ...  Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Applied Mathematics program under contract number DE-AC02-05CH11231.  ... 
doi:10.1137/15m104253x fatcat:cv7oa32jdnedxapnga63ljogf4
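
The serial building block that such multi-level parallel formulations distribute is Gustavson's row-by-row SpGEMM. A sketch of one output row with a dense accumulator (identifiers are illustrative; spa and flag must be zero-initialized and sized to the column count of B):

```c
/* One row of Gustavson's SpGEMM, C = A*B, with A and B in CSR.
 * spa is a dense accumulator; flag marks which columns were touched. */
void spgemm_row(int i,
                const int *Ap, const int *Aj, const double *Ax,
                const int *Bp, const int *Bj, const double *Bx,
                double *spa, int *flag,
                int *Cj, double *Cx, int *nnz_out) {
    int nnz = 0;
    for (int t = Ap[i]; t < Ap[i + 1]; t++) {       /* for each a_ik != 0 */
        int k = Aj[t];
        double a = Ax[t];
        for (int u = Bp[k]; u < Bp[k + 1]; u++) {   /* scale row k of B   */
            int j = Bj[u];
            if (!flag[j]) { flag[j] = 1; Cj[nnz++] = j; }
            spa[j] += a * Bx[u];
        }
    }
    for (int t = 0; t < nnz; t++) {                 /* gather, reset SPA  */
        Cx[t] = spa[Cj[t]];
        spa[Cj[t]] = 0.0;
        flag[Cj[t]] = 0;
    }
    *nnz_out = nnz;
}
```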

Cache-Oblivious Sparse Matrix–Vector Multiplication by Using Sparse Matrix Partitioning Methods

A. N. Yzelman, Rob H. Bisseling
2009 SIAM Journal on Scientific Computing  
The savings in computation time achieved by our matrix reorderings reach up to 50 percent in the case of a large link matrix.  ...  In this article, we introduce a cache-oblivious method for sparse matrix-vector multiplication.  ...  We are grateful to the anonymous referees who helped us a lot by their careful reading and constructive remarks.  ...
doi:10.1137/080733243 fatcat:vwxmialmgjfpffjsqt2ulmwqb4
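
The kernel being reordered is plain CSR SpMV; a minimal reference version (not the paper's partitioned variant), with the access patterns the reordering targets noted in comments:

```c
/* Baseline CSR SpMV, y = A*x. val and col_idx are streamed with good
 * spatial locality; the indirect reads x[col_idx[t]] are irregular, and
 * improving their reuse is what cache-oblivious reordering aims at. */
void spmv_csr(int nrows, const int *row_ptr, const int *col_idx,
              const double *val, const double *x, double *y) {
    for (int i = 0; i < nrows; i++) {
        double sum = 0.0;
        for (int t = row_ptr[i]; t < row_ptr[i + 1]; t++)
            sum += val[t] * x[col_idx[t]];
        y[i] = sum;
    }
}
```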

Scaling sparse matrix-matrix multiplication in the accumulo database

Gunduz Vehbi Demirci, Cevdet Aykanat
2019 Distributed and parallel databases  
We propose and implement a sparse matrix-matrix multiplication (SpGEMM) algorithm running on top of Accumulo's iterator framework which enables high-performance distributed parallelism.  ...  We also propose a matrix partitioning scheme which reduces the total communication volume and provides a balance of workload among servers.  ...  So, the local matrix multiplication algorithm performed by T_k is a variant of column-by-column parallel SpGEMM, although the proposed iterator algorithm utilizes the row-by-row parallelization to gather  ...
doi:10.1007/s10619-019-07257-y fatcat:ougixtisvjho3pyr2it3ihfiiy

Efficient Mixed-Precision Tall-and-Skinny Matrix-Matrix Multiplication for GPUs

Hao Tang, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi
2021 International Journal of Networking and Computing  
General matrix-matrix multiplication (GEMM) is a commonly used BLAS level-3 routine in big data analysis and scientific computations.  ...  To examine the effectiveness of the proposed optimization methods, the experiments are conducted in two cases of GEMM that take tall-and-skinny matrices as input.  ...  Acknowledgments This research was partially supported by MEXT Next Generation High-Performance Computing Infrastructures and Applications R&D Program, entitled "R&D of A Quantum-Annealing-Assisted Next  ... 
doi:10.15803/ijnc.11.2_267 fatcat:vr3qutjwu5erbntgaj6xbrdmdy
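
A scalar illustration of the mixed-precision idea (types and names are assumptions of this sketch; the paper targets GPU kernels, not this CPU loop): keep the inputs in low precision but carry the dot-product accumulation in higher precision, rounding once on output.

```c
/* Mixed-precision GEMM for a tall-and-skinny case: A is m x k, B is k x n,
 * with m much larger than n and k. Inputs are float; accumulation is done
 * in double to limit rounding error before the single final rounding. */
void gemm_ts_mixed(int m, int n, int k,
                   const float *A, const float *B, float *C) {
    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++) {
            double acc = 0.0;                  /* high-precision accumulator */
            for (int p = 0; p < k; p++)
                acc += (double)A[i * k + p] * (double)B[p * n + j];
            C[i * n + j] = (float)acc;         /* round once on output */
        }
}
```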

Hypergraph Partitioning Based Models and Methods for Exploiting Cache Locality in Sparse Matrix-Vector Multiplication

Kadir Akbudak, Enver Kayaaslan, Cevdet Aykanat
2013 SIAM Journal on Scientific Computing  
The multiple-SpMxV framework depends on splitting a given matrix into a sum of multiple nonzero-disjoint matrices so that the SpMxV operation is performed as a sequence of multiple input- and output-dependent  ...  For the single-SpMxV framework, we propose two cache-size-aware top-down row/column-reordering methods based on 1D and 2D sparse matrix partitioning by utilizing the column-net and enhancing the row-column-net  ...  Here, temporal locality refers to the reuse of data words (e.g., x-vector entries) before eviction of the words from cache, whereas spatial locality refers to the use of data words (e.g., matrix nonzeros  ... 
doi:10.1137/100813956 fatcat:mhx4okcv6jbwlk63w3klbl6zyu

Optimization of sparse matrix-vector multiplication on emerging multicore platforms

Samuel Williams, Leonid Oliker, Richard Vuduc, John Shalf, Katherine Yelick, James Demmel
2007 Proceedings of the 2007 ACM/IEEE conference on Supercomputing - SC '07  
In this work, we examine sparse matrix-vector multiply (SpMV) -one of the most heavily used kernels in scientific computing -across a broad spectrum of multicore designs.  ...  Our experimental platform includes the homogeneous AMD dual-core and Intel quad-core designs, the heterogeneous STI Cell, as well as the first scientific study of the highly multithreaded Sun Niagara2.  ...  These techniques include register-and cache-level blocking, exploiting symmetry, multiple vectors, variable block and diagonal structures, and locality-enhancing reordering.  ... 
doi:10.1145/1362622.1362674 dblp:conf/sc/WilliamsOVSYD07 fatcat:e4one5xz6bftnkr3zwmtoieyxa
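
Register-level blocking, one of the transformations named above, stores the matrix as small dense tiles so per-nonzero index overhead is amortized and the tile's outputs stay in registers. A 2x2 BCSR sketch (layout assumed: blk_val holds 2x2 blocks, four doubles each, row-major within the block):

```c
/* 2x2 register-blocked (BCSR) SpMV, y = A*x. brow_ptr/bcol_idx index block
 * rows and block columns; each block row keeps its two outputs in
 * registers across the whole sweep of blocks. */
void spmv_bcsr_2x2(int nbrows, const int *brow_ptr, const int *bcol_idx,
                   const double *blk_val, const double *x, double *y) {
    for (int ib = 0; ib < nbrows; ib++) {
        double y0 = 0.0, y1 = 0.0;             /* register-resident outputs */
        for (int t = brow_ptr[ib]; t < brow_ptr[ib + 1]; t++) {
            const double *b = &blk_val[4 * t];
            double x0 = x[2 * bcol_idx[t]];
            double x1 = x[2 * bcol_idx[t] + 1];
            y0 += b[0] * x0 + b[1] * x1;
            y1 += b[2] * x0 + b[3] * x1;
        }
        y[2 * ib]     = y0;
        y[2 * ib + 1] = y1;
    }
}
```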

Fast sparse matrix-vector multiplication on GPUs

Xintian Yang, Srinivasan Parthasarathy, P. Sadayappan
2011 Proceedings of the VLDB Endowment  
Scaling up the sparse matrix-vector multiplication kernel on modern Graphics Processing Units (GPU) has been at the heart of numerous studies in both academia and industry.  ...  Using real web graph data, we show how our representation scheme, coupled with a novel tiling algorithm, can yield significant benefits over the current state of the art GPU efforts on a number of core  ...  The resultant cache misses reduce memory bandwidth utilization due to the long latency of non-coalesced global memory accesses. Solution 1: Tiling matrix A and vector x with texture cache.  ... 
doi:10.14778/1938545.1938548 fatcat:ki6uzhelsvazpdt6i3udym3hbe
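
A CPU analogue of "tiling matrix A and vector x": split the columns into stripes narrow enough that the corresponding slice of x stays cache-resident, and sweep one stripe at a time (the GPU version instead binds the slice to the texture cache). The stripe width TW and all identifiers are assumptions of this sketch:

```c
/* Column-striped SpMV: tile t holds the CSR of columns [t*TW, (t+1)*TW),
 * with column indices stored local to the stripe, so each sweep reuses
 * only the TW-long slice xt of the input vector. */
#define TW 4096

void spmv_striped(int nrows, int ntiles,
                  const int *const *row_ptr,  /* row_ptr[t]: CSR of tile t */
                  const int *const *col_idx,  /* indices local to the tile */
                  const double *const *val,
                  const double *x, double *y) {
    for (int i = 0; i < nrows; i++) y[i] = 0.0;
    for (int t = 0; t < ntiles; t++) {
        const double *xt = &x[t * TW];        /* the cache-hot slice of x */
        for (int i = 0; i < nrows; i++)
            for (int u = row_ptr[t][i]; u < row_ptr[t][i + 1]; u++)
                y[i] += val[t][u] * xt[col_idx[t][u]];
    }
}
```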

Red-blue pebbling revisited: near optimal parallel matrix-matrix multiplication [article]

Grzegorz Kwasniewski, Joost VandeVondele, et al. (Department of Computer Science, ETH Zurich; Swiss National Supercomputing Centre)
2019 arXiv   pre-print
We propose COSMA: a parallel matrix-matrix multiplication algorithm that is near communication-optimal for all combinations of matrix dimensions, processor counts, and memory sizes.  ...  The key idea behind COSMA is to derive an optimal (up to a factor of 0.03% for 10MB of fast memory) sequential schedule and then parallelize it, preserving I/O optimality.  ...  Acknowledgements We thank Yishai Oltchik and Niels Gleinig for invaluable help with the theoretical part of the paper, and Simon Pintarelli for advice and support with the implementation.  ... 
arXiv:1908.09606v3 fatcat:7wkxq2qd7rfw7jjjukhh23gbfm
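
For context on what "communication-optimal" means here: the classical Hong-Kung lower bound, which COSMA's analysis tightens (the expression below is the classical asymptotic form, not the paper's exact constant), states that multiplying an m x k matrix by a k x n matrix with a fast memory of S words requires data movement

```latex
Q \;=\; \Omega\!\left(\frac{m\,n\,k}{\sqrt{S}}\right)
```

between fast and slow memory, and spreading the mnk elementary products over p processors divides this memory-dependent term per processor by p.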

SPOTS: An Accelerator for Sparse Convolutional Networks Leveraging Systolic General Matrix-Matrix Multiplication [article]

Mohammadreza Soltaniyeh, Richard P. Martin, Santosh Nagarakatte
2021 arXiv   pre-print
coupled with a systolic array-based general matrix-matrix multiplication (GEMM) unit.  ...  Further, our design improves performance by effectively mapping the sparse data to the hardware units by utilizing sparsity in both input feature maps and weights.  ...  Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.  ... 
arXiv:2107.13386v2 fatcat:k7oampka5rdztojmmwrr2yvnfm
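
The standard way to map convolution onto a GEMM unit is the im2col lowering: each receptive field becomes one column, so the convolution is a matrix product of flattened filters against the column matrix. A minimal dense, single-channel sketch (SPOTS performs this transformation in hardware and additionally skips zeros; stride 1 and no padding are assumed here):

```c
/* im2col: for an h x w image and K x K filters, each output position's
 * K*K receptive field becomes one column of 'cols' (K*K rows,
 * out_h*out_w columns), reducing convolution to a GEMM. */
void im2col(int h, int w, int K, const float *img, float *cols) {
    int out_h = h - K + 1, out_w = w - K + 1;
    for (int oy = 0; oy < out_h; oy++)
        for (int ox = 0; ox < out_w; ox++) {
            int col = oy * out_w + ox;
            for (int ky = 0; ky < K; ky++)
                for (int kx = 0; kx < K; kx++)
                    cols[(ky * K + kx) * (out_h * out_w) + col] =
                        img[(oy + ky) * w + (ox + kx)];
        }
}
```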

GE-SpMM: General-purpose Sparse Matrix-Matrix Multiplication on GPUs for Graph Neural Networks [article]

Guyue Huang, Guohao Dai, Yu Wang, Huazhong Yang
2020 arXiv   pre-print
Sparse Matrix-Matrix multiplication (SpMM) is a fundamental operator in GNNs, which performs a multiplication between a sparse matrix and a dense matrix.  ...  We introduce the Coalesced Row Caching method to process columns in parallel and ensure coalesced access to sparse matrix data.  ...  memory to cache sparse matrix rows.  ... 
arXiv:2007.03179v1 fatcat:epshcpa7fbbdbcjchdkrocej3m
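
The operation itself, stripped of GPU-specific staging, is a sparse-times-dense product. A reference C sketch (GE-SpMM's contribution is how sparse rows are staged through shared memory and how threads are mapped so dense accesses coalesce, none of which appears in this scalar loop):

```c
/* CSR SpMM, C = A*B with sparse A (CSR) and dense row-major B and C.
 * On a GPU, consecutive threads would take consecutive j so the loads
 * B[col_idx[t]*n + j] coalesce; the CSR row itself is what the paper's
 * Coalesced Row Caching stages through shared memory. */
void spmm_csr(int m, int n, const int *row_ptr, const int *col_idx,
              const float *val, const float *B, float *C) {
    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++) {
            float sum = 0.0f;
            for (int t = row_ptr[i]; t < row_ptr[i + 1]; t++)
                sum += val[t] * B[col_idx[t] * n + j];
            C[i * n + j] = sum;
        }
}
```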

Locality-Aware Parallel Sparse Matrix-Vector and Matrix-Transpose-Vector Multiplication on Many-Core Processors

M. Ozan Karsavuran, Kadir Akbudak, Cevdet Aykanat
2016 IEEE Transactions on Parallel and Distributed Systems  
Sparse matrix-vector and matrix-transpose-vector multiplication (SpMMᵀV), repeatedly performed as z ← Aᵀx and y ← Az (or y ← Aw) for the same sparse matrix A, is a kernel operation widely used in various  ...  These two methods utilize rowwise and columnwise singly bordered block-diagonal (SB) forms of A. We evaluate the validity of our methods on a wide range of sparse matrices.  ...  Here, temporal locality refers to the reuse of data words (e.g., vector entries and matrix nonzeros) within a relatively small time duration, actually before eviction of the words from cache.  ...
doi:10.1109/tpds.2015.2453970 fatcat:bnwlk426mrbbldag3cxrq2bkyy
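
Both phases can run off a single stored copy of A in CSR, which is what makes reuse between the Aᵀ and A sweeps possible. A serial sketch of the kernel (the paper's actual contribution, the SB-form reordering that turns this reuse into cache hits on many-core processors, is not shown):

```c
/* SpMM^T V from one CSR copy of an m x n matrix A:
 * z = A^T x scatters along rows, then y = A z gathers along rows. */
void spmmtv_csr(int m, int n, const int *row_ptr, const int *col_idx,
                const double *val, const double *x, double *z, double *y) {
    for (int j = 0; j < n; j++) z[j] = 0.0;
    for (int i = 0; i < m; i++)                    /* z = A^T x */
        for (int t = row_ptr[i]; t < row_ptr[i + 1]; t++)
            z[col_idx[t]] += val[t] * x[i];
    for (int i = 0; i < m; i++) {                  /* y = A z   */
        double sum = 0.0;
        for (int t = row_ptr[i]; t < row_ptr[i + 1]; t++)
            sum += val[t] * z[col_idx[t]];
        y[i] = sum;
    }
}
```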

Performance modeling and optimization of sparse matrix-vector multiplication on NVIDIA CUDA platform

Shiming Xu, Wei Xue, Hai Xiang Lin
2011 Journal of Supercomputing  
For GPUs with better cache support, we propose a differentiated memory access scheme to avoid contamination of caches by matrix data.  ...  We propose optimization of SpMV based on ELLPACK from two aspects: (1) enhanced performance for the dense vector by reducing cache misses, and (2) reduced matrix data accesses by index reduction.  ...  For the GF-100 architecture with better cache support, we propose differentiated cache accesses to further enhance cache utilization with inline PTX codes.  ...
doi:10.1007/s11227-011-0626-0 fatcat:twrincqtv5bvfeb2qhqvld7xqi
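
ELLPACK, the format these optimizations start from, pads every row to the same length and stores the arrays column-major, so on a GPU thread i reads addresses adjacent to its neighbors'. A serial reference sketch (padded slots are assumed to carry value 0 and any valid column index, e.g. 0, so they contribute nothing):

```c
/* ELLPACK SpMV, y = A*x: each row is padded to 'width' entries; val and
 * col_idx are stored column-major, so entry k of row i sits at
 * k*nrows + i and neighboring rows' entries are adjacent in memory. */
void spmv_ell(int nrows, int width, const int *col_idx, const double *val,
              const double *x, double *y) {
    for (int i = 0; i < nrows; i++) {
        double sum = 0.0;
        for (int k = 0; k < width; k++)
            sum += val[k * nrows + i] * x[col_idx[k * nrows + i]];
        y[i] = sum;
    }
}
```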

Optimization of sparse matrix–vector multiplication on emerging multicore platforms

Samuel Williams, Leonid Oliker, Richard Vuduc, John Shalf, Katherine Yelick, James Demmel
2009 Parallel Computing  
We also find that using multiple cores provides considerably higher speedups than single-core code and data structure transformations alone.  ...  In this work, we examine sparse matrix-vector multiply (SpMV) -one of the most heavily used kernels in scientific computing -across a broad spectrum of multicore designs.  ...  All authors from Lawrence Berkeley National Laboratory were supported by the Office of Advanced Scientific Computing Research in the Department of Energy Office of Science under contract number DE-AC02  ... 
doi:10.1016/j.parco.2008.12.006 fatcat:uywjc6jlvfavdng43t2v62ykye

Scalability of Hybrid Sparse Matrix Dense Vector (SpMV) Multiplication

Brian A. Page, Peter M. Kogge
2018 2018 International Conference on High Performance Computing & Simulation (HPCS)  
Issues with both data placement and remote reductions are modeled over a range of matrix characteristics. Those factors that limit strong scalability are quantified.  ...  SpMV, the product of a sparse matrix and a dense vector, is emblematic of a new class of applications that are memory bandwidth and communication, not flop, driven.  ...  DE-NA0002377 as part of the Predictive Science Academic Alliance Program II, in part by NSF grant CCF-1642280, and in part by the University of Notre Dame.  ... 
doi:10.1109/hpcs.2018.00072 dblp:conf/ieeehpcs/PageK18 fatcat:6rrfyhxsg5f4zgsglih2rz4aku