4,266 Hits in 3.9 sec

Parallel sparse matrix-vector and matrix-transpose-vector multiplication using compressed sparse blocks

Aydin Buluç, Jeremy T. Fineman, Matteo Frigo, John R. Gilbert, Charles E. Leiserson
2009 Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures - SPAA '09  
This paper introduces a storage format for sparse matrices, called compressed sparse blocks (CSB), which allows both Ax and Aᵀx to be computed efficiently in parallel, where A is an n × n sparse matrix  ...  Our algorithms use Θ(nnz) work (serial running time) and Θ(√n lg n) span (critical-path length), yielding a parallelism of Θ(nnz / (√n lg n)), which is amply high for virtually any large matrix.  ...  CSB_SpMV and CSB_SpMV_T use compressed sparse blocks to perform Ax and Aᵀx, respectively.  ... 
doi:10.1145/1583991.1584053 dblp:conf/spaa/BulucFFGL09 fatcat:xxbrscitjfakth36jayigda7om
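The key property of CSB is that the blocked layout is symmetric under transposition: swapping block coordinates and local coordinates turns Ax into Aᵀx with no extra data structure. The sketch below is not the authors' implementation (which stores blocks in compact index arrays and runs block rows in parallel); it is a minimal serial illustration of that symmetry, with blocks held in a Python dict keyed by block coordinates.

```python
import numpy as np

def csb_from_dense(A, beta):
    """Toy compressed-sparse-blocks layout: a dict mapping
    (block_row, block_col) -> (local_rows, local_cols, values)."""
    n = A.shape[0]
    blocks = {}
    for bi in range(0, n, beta):
        for bj in range(0, n, beta):
            sub = A[bi:bi + beta, bj:bj + beta]
            r, c = np.nonzero(sub)
            if len(r):
                blocks[(bi // beta, bj // beta)] = (r, c, sub[r, c])
    return blocks

def csb_spmv(blocks, x, n, beta, transpose=False):
    """y = A x (or y = A^T x): because the layout is blocked in both
    dimensions, the transposed product only swaps block and local
    coordinates -- no separate transposed data structure is needed."""
    y = np.zeros(n)
    for (br, bc), (r, c, v) in blocks.items():
        if transpose:
            br, bc, r, c = bc, br, c, r
        np.add.at(y, br * beta + r, v * x[bc * beta + c])
    return y
```

The serial loop over blocks is where the paper's parallel algorithm does its real work, recursing over block rows to obtain the stated Θ(nnz / (√n lg n)) parallelism.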

Breaking the performance bottleneck of sparse matrix-vector multiplication on SIMD processors

Kai Zhang, Shuming Chen, Yaohua Wang, Jianghua Wan
2013 IEICE Electronics Express  
The method includes a new sparse matrix compressed format, a block SpMV algorithm, and a vector write buffer.  ...  The low utilization of SIMD units and memory bandwidth is the main performance bottleneck on SIMD processors for sparse matrix-vector multiplication (SpMV), which is one of the most important kernels in  ...  A new sparse matrix compressed format, which is called stride-combination CSR with transpose (SCT), is proposed to increase the utilization of SIMD units.  ... 
doi:10.1587/elex.10.20130147 fatcat:ikfbytnsyzh2zdrqyctw4wii3i
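For context on the bottleneck this entry targets: in the baseline CSR kernel, the access to the input vector is an index-driven gather, which is exactly what underutilizes wide SIMD units. A minimal CSR SpMV sketch (not the paper's SCT format) makes the irregular access visible:

```python
import numpy as np

def csr_spmv(indptr, indices, data, x):
    """Baseline CSR y = A x. The gather x[indices[lo:hi]] is the
    irregular, data-dependent access pattern that limits SIMD-unit
    and bandwidth utilization on vector processors."""
    y = np.empty(len(indptr) - 1)
    for i in range(len(y)):
        lo, hi = indptr[i], indptr[i + 1]
        y[i] = np.dot(data[lo:hi], x[indices[lo:hi]])
    return y
```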

Optimizing Sparse Matrix-Multiple Vectors Multiplication for Nuclear Configuration Interaction Calculations

Hasan Metin Aktulga, Aydin Buluc, Samuel Williams, Chao Yang
2014 2014 IEEE 28th International Parallel and Distributed Processing Symposium  
of that matrix) with multiple vectors (SpMM and SpMMᵀ).  ...  We base our implementation on the compressed sparse blocks (CSB) matrix format and target systems with multi-core architectures.  ...  Conceptually SpMM performs a sparse matrix-vector multiplication on a block of m vectors.  ... 
doi:10.1109/ipdps.2014.125 dblp:conf/ipps/AktulgaBWY14 fatcat:st5jejogzfdatinzbmd2jy3vem
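The performance argument behind SpMM is that each matrix nonzero, once loaded, is applied to an entire row of the dense vector block, amortizing matrix traffic over all m vectors. A minimal CSR-based sketch (the paper itself uses CSB) illustrates the reuse:

```python
import numpy as np

def csr_spmm(indptr, indices, data, X):
    """Y = A X for a dense block X of m vectors. Each nonzero A[i, j]
    is read once and multiplied against a full row X[j, :], which is
    the bandwidth advantage of SpMM over m independent SpMV calls."""
    n_rows, m = len(indptr) - 1, X.shape[1]
    Y = np.zeros((n_rows, m))
    for i in range(n_rows):
        for k in range(indptr[i], indptr[i + 1]):
            Y[i] += data[k] * X[indices[k]]
    return Y
```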

An Improved Sparse Matrix-Vector Multiply Based on Recursive Sparse Blocks Layout [chapter]

Michele Martone, Marcin Paprzycki, Salvatore Filippone
2012 Lecture Notes in Computer Science  
By laying out the matrix in sparse, non-overlapping blocks, we allow for the shared-memory parallel execution of transposed sparse matrix-vector multiply (SpMV), with higher efficiency than the traditional  ...  Second, we look at the performance of standard and transposed shared-memory parallel SpMV for unsymmetric matrices, using the proposed approach.  ...  We define the sparse matrix-vector multiply (SpMV) operation as "y ← A x" and its transposed version (SpMVᵀ) as "y ← Aᵀx" (where A is a sparse matrix, while x, y are vectors).  ... 
doi:10.1007/978-3-642-29843-1_69 fatcat:ie7qvvhvqzfihfova57qejzxdq

Distributed Sparse Matrices for Very High Level Languages [chapter]

John R. Gilbert, Steve Reinhardt, Viral B. Shah
2008 Advances in Computers  
We describe the design and implementation of a sparse matrix infrastructure for Star-P, a parallel implementation of the Matlab® programming language.  ...  Sparse matrices are first-class objects in many VHLLs (very high level languages) used for scientific computing. They are a basic building block for various numerical and combinatorial algorithms.  ...  We hope that our experiences will shape the design of future parallel sparse matrix infrastructures in other languages.  ... 
doi:10.1016/s0065-2458(08)00005-3 fatcat:hienrbxdu5hdjbnadvnulvl7ku

Parallel Transposition of Sparse Data Structures

Hao Wang, Weifeng Liu, Kaixi Hou, Wu-chun Feng
2016 Proceedings of the 2016 International Conference on Supercomputing - ICS '16  
Even though many parallel sparse primitives such as sparse matrix-vector (SpMV) multiplication have been extensively studied, some other important building blocks, e.g., parallel transposition for sparse  ...  In this paper, we first identify that the transposition operation can be a bottleneck of some fundamental sparse matrix and graph algorithms.  ...  and mapping (SLAM) problem.  ... 
doi:10.1145/2925426.2926291 dblp:conf/ics/WangLHF16 fatcat:zhq2wcjy4vbdhf5bmtrqpsrxmq
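The sequential baseline that this paper's parallel schemes improve on is the classic counting-sort transposition: histogram the nonzeros per column, prefix-sum into column pointers, then scatter. The serial sketch below shows that scan-and-scatter pattern (the paper's contribution is doing it atomics-free in parallel, which is not reproduced here):

```python
import numpy as np

def csr_to_csc(n_cols, indptr, indices, data):
    """Serial counting-sort transposition of CSR into CSC (equivalently,
    the CSR representation of the transpose)."""
    nnz = len(indices)
    # Histogram nonzeros per column, then prefix-sum into column pointers.
    col_ptr = np.zeros(n_cols + 1, dtype=int)
    for j in indices:
        col_ptr[j + 1] += 1
    col_ptr = np.cumsum(col_ptr)
    # Scatter each nonzero into its column segment, tracking a cursor
    # per column (the step that needs atomics when parallelized naively).
    rows = np.empty(nnz, dtype=int)
    vals = np.empty(nnz)
    cursor = col_ptr[:-1].copy()
    for i in range(len(indptr) - 1):
        for k in range(indptr[i], indptr[i + 1]):
            dst = cursor[indices[k]]
            rows[dst], vals[dst] = i, data[k]
            cursor[indices[k]] += 1
    return col_ptr, rows, vals
```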

High-Performance Fortran and possible extensions to support conjugate gradient algorithms

K. Dincer, G.C. Fox, K. Hawick
1996 Proceedings of 5th IEEE International Symposium on High Performance Distributed Computing HPDC-96  
We discuss the use of intrinsic functions, data distribution directives and explicitly parallel constructs to optimize performance by minimizing communications requirements in a portable manner. We focus  ...  We evaluate the High Performance Fortran (HPF) language for the compact expression and efficient implementation of conjugate gradient iterative matrix-solvers on High-Performance Computing and Communications  ...  We further thank Alok Choudhary for his suggestions and Elaine Weinman for proofreading this manuscript.  ... 
doi:10.1109/hpdc.1996.546175 dblp:conf/hpdc/DincerFH96 fatcat:ekwdyfrjrzatdjolfyolhczfju
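As background on the solver this HPF evaluation targets: each conjugate gradient iteration is dominated by a single matrix-vector product, which is why the paper focuses on expressing that kernel with minimal communication. A textbook CG sketch (dense numpy here for brevity, not the paper's HPF code):

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=200):
    """Textbook CG for a symmetric positive-definite A. The A @ p
    product inside the loop is the (Sp)MV kernel whose distribution
    dominates parallel cost."""
    x = np.zeros_like(b)
    r = b - A @ x          # initial residual
    p = r.copy()           # initial search direction
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x
```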

yaSpMV

Shengen Yan, Chao Li, Yunquan Zhang, Huiyang Zhou
2014 Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '14  
First, we devise a new SpMV format, called blocked compressed common coordinate (BCCOO), which uses bit flags to store the row indices in a blocked common coordinate (COO) format so as to alleviate the  ...  We further improve this format by partitioning the matrix into vertical slices to enhance the cache hit rates when accessing the vector to be multiplied.  ...  Related Work Sparse matrix-vector multiplication (SpMV) is so important that there have been numerous works optimizing its performance. We only discuss the most relevant ones here.  ... 
doi:10.1145/2555243.2555255 dblp:conf/ppopp/YanLZZ14 fatcat:zdtn4yxa7rhzlmsq574tnj6jxm
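The space saving in BCCOO comes from replacing per-nonzero row indices with a single bit per entry that marks row boundaries. The sketch below shows a deliberately simplified, non-blocked version of that idea (it assumes every row has at least one nonzero and omits the blocking and vertical slicing that the actual BCCOO format adds):

```python
import numpy as np

def bitflag_spmv(stop, cols, vals, n_rows, x):
    """SpMV over a COO-like stream where explicit row indices are
    replaced by one 'stop' bit marking the last nonzero of each row.
    Rows are recovered by counting stop bits while accumulating."""
    y = np.zeros(n_rows)
    row, acc = 0, 0.0
    for k in range(len(vals)):
        acc += vals[k] * x[cols[k]]
        if stop[k]:          # row boundary: commit and advance
            y[row] = acc
            row, acc = row + 1, 0.0
    return y
```

On a GPU the same encoding is processed with a segmented scan rather than this serial accumulation, which is what makes the bit-flag representation attractive there.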

A High Performance Block Eigensolver for Nuclear Configuration Interaction Calculations

Hasan Metin Aktulga, Md. Afibuzzaman, Samuel Williams, Aydin Buluc, Meiyue Shao, Chao Yang, Esmond G. Ng, Pieter Maris, James P. Vary
2017 IEEE Transactions on Parallel and Distributed Systems  
We consider a block iterative eigensolver whose main computational kernels are the multiplication of a sparse matrix with multiple vectors (SpMM), and tall-skinny matrix operations.  ...  We present techniques to significantly improve the SpMM and the transpose operation SpMMᵀ by using the compressed sparse blocks (CSB) format.  ...  An existing implementation of CSB for sparse matrix-vector (SpMV) and transpose sparse matrix-vector (SpMVᵀ) multiplication stores nonzeros within each block using a space-filling curve to exploit data  ... 
doi:10.1109/tpds.2016.2630699 fatcat:6w4u7qyec5ehtjld4deuq45cwe

Sparse GPU Kernels for Deep Learning [article]

Trevor Gale, Matei Zaharia, Cliff Young, Erich Elsen
2020 arXiv   pre-print
Based on these insights, we develop high-performance GPU kernels for two sparse matrix operations widely applicable in neural networks: sparse matrix-dense matrix multiplication and sampled dense-dense  ...  matrix multiplication.  ...  Cisco, SAP, and the NSF under CAREER grant CNS-1651570.  ... 
arXiv:2006.10901v2 fatcat:76wdsepdlffslgz3kkuxykwv5i

An MPI-CUDA Implementation and Optimization for Parallel Sparse Equations and Least Squares (LSQR)

He Huang, Liqiang Wang, En-Jui Lee, Po Chen
2012 Procedia Computer Science  
On the MPI level, our contributions include: (1) decomposing both the matrix and the vector to increase parallelism; (2) designing a static load-balancing strategy.  ...  LSQR (Sparse Equations and Least Squares) is a widely used Krylov subspace method to solve large-scale linear systems in seismic tomography.  ...  Our parallel scheme utilizes this feature, so we partition the matrix by rows. We also use Compressed Sparse Column (CSC) to store the matrix transpose.  ... 
doi:10.1016/j.procs.2012.04.009 fatcat:cg7salnelnajtor7snostd6moq
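The trick behind storing the transpose in CSC is that the CSC arrays of A are, byte for byte, the CSR arrays of Aᵀ, so Aᵀx (needed every LSQR iteration) can be computed with a contiguous row-style traversal and no explicit transposition. A minimal sketch of that observation (not the paper's MPI-CUDA code):

```python
import numpy as np

def csc_transpose_matvec(col_ptr, row_idx, vals, x):
    """y = A^T x with A stored in CSC. Column j of A is row j of A^T,
    so each output entry is a contiguous dot product -- the same access
    pattern as CSR SpMV, with no transposed copy of the matrix."""
    n_cols = len(col_ptr) - 1
    y = np.empty(n_cols)
    for j in range(n_cols):
        lo, hi = col_ptr[j], col_ptr[j + 1]
        y[j] = np.dot(vals[lo:hi], x[row_idx[lo:hi]])
    return y
```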

A Compressed Diagonals Remapping Technique for Dynamic Data Redistribution on Banded Sparse Matrix

Ching-Hsien Hsu, Kun-Ming Yu
2004 Journal of Supercomputing  
Keywords: compressed diagonals remapping, data redistribution, banded matrix, sparse matrix, parallel algorithm, runtime support. statements of arrays that were distributed in arbitrary BLOCK-CYCLIC(c)  ...  The CDR technique uses an efficient one-dimensional indexing scheme to perform data redistribution on a banded sparse matrix.  ...  The parallel sparse redistribution code uses a multiple-scan approach for unpacking each message to construct local CRS vectors in the receiving phase.  ... 
doi:10.1023/b:supe.0000026846.74050.18 fatcat:omizx2ectbgkzhpikiwsioredi
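For intuition on the "one-dimensional indexing" the abstract mentions: a banded matrix stored by diagonals reduces every element to a single index along its diagonal. The sketch below is a generic diagonal (DIA-style) layout illustrating that idea, not the CDR remapping scheme itself:

```python
import numpy as np

def dense_to_diag(A, offsets):
    """Store a banded matrix by diagonals: element (i, i + off) lands at
    1-D position i of the diagonal with offset off."""
    n = A.shape[0]
    diags = {}
    for off in offsets:
        d = np.zeros(n)
        for i in range(max(0, -off), min(n, n - off)):
            d[i] = A[i, i + off]
        diags[off] = d
    return diags

def diag_spmv(diags, x):
    """y = A x over the diagonal layout: each diagonal contributes a
    shifted elementwise product, with purely 1-D indexing."""
    n = len(x)
    y = np.zeros(n)
    for off, d in diags.items():
        for i in range(max(0, -off), min(n, n - off)):
            y[i] += d[i] * x[i + off]
    return y
```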

A Compressed Diagonals Remapping Technique for Dynamic Data Redistribution on Banded Sparse Matrix [chapter]

Ching-Hsien Hsu, Kun-Ming Yu
2003 Lecture Notes in Computer Science  
Keywords: compressed diagonals remapping, data redistribution, banded matrix, sparse matrix, parallel algorithm, runtime support. statements of arrays that were distributed in arbitrary BLOCK-CYCLIC(c)  ...  The CDR technique uses an efficient one-dimensional indexing scheme to perform data redistribution on a banded sparse matrix.  ...  The parallel sparse redistribution code uses a multiple-scan approach for unpacking each message to construct local CRS vectors in the receiving phase.  ... 
doi:10.1007/3-540-37619-4_8 fatcat:chcwbycbnzd3pdijtb42mibvta

Joint Schedule and Layout Autotuning for Sparse Matrices with Compound Entries on GPUs

Johannes Sebastian Mueller-Roemer, André Stork, Dieter W. Fellner
2019 International Symposium on Vision, Modeling, and Visualization  
We generalize several matrix layouts and apply joint schedule and layout autotuning to improve the performance of the sparse matrix-vector product on massively parallel graphics processing units.  ...  Large sparse matrices with compound entries, i.e., complex and quaternionic matrices as well as matrices with dense blocks, are a core component of many algorithms in geometry processing, physically based  ...  All matrix-vector multiplications were also performed using cuSPARSE, NVIDIA's own highly tuned sparse linear algebra library.  ... 
doi:10.2312/vmv.20191324 dblp:conf/vmv/Mueller-RoemerS19 fatcat:rfioejlbdve5dlpnrlshld3y34

Specifying and verifying sparse matrix codes

Gilad Arnold, Johannes Hölzl, Ali Sinan Köksal, Rastislav Bodík, Mooly Sagiv
2010 Proceedings of the 15th ACM SIGPLAN international conference on Functional programming - ICFP '10  
We show that it is reusable and extensible to hierarchical sparse formats. We design a variable-free functional language for sparse matrix codes.  ...  Sparse matrix formats are typically implemented with low-level imperative programs.  ...  A prominent feature of JAD's proof goal is the double use of transpose, once during compression (jad) and once during multiplication (jadmv).  ... 
doi:10.1145/1863543.1863581 dblp:conf/icfp/ArnoldHKBS10 fatcat:nlousm3mkrh6hhq6cmz3zqb4ai
Showing results 1 — 15 out of 4,266 results