A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2017; you can also visit the original URL.
The file type is `application/pdf`.

### Parallel sparse matrix-vector and matrix-transpose-vector multiplication using compressed sparse blocks

2009 · *Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures - SPAA '09*

doi:10.1145/1583991.1584053 · dblp:conf/spaa/BulucFFGL09 · fatcat:xxbrscitjfakth36jayigda7om

This paper introduces a storage format for sparse matrices, called compressed sparse blocks (CSB), which allows both Ax and Aᵀx to be computed efficiently in parallel, where A is an n × n sparse matrix ... Our algorithms use Θ(nnz) work (serial running time) and Θ(√n lg n) span (critical-path length), yielding a parallelism of Θ(nnz / √n lg n), which is amply high for virtually any large matrix. ... CSB_SpMV and CSB_SpMV_T use compressed sparse blocks to perform Ax and Aᵀx, respectively. ...
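The snippet above describes computing both Ax and Aᵀx from one set of sparse blocks. A minimal Python sketch of that idea, assuming a dict-of-blocks layout over plain lists (the helper names `to_blocks` and `spmv` are hypothetical; the paper's actual CSB implementation uses compact index encodings and parallel recursion, which this sketch omits):

```python
# Sketch of a CSB-like blocked layout: nonzeros are grouped into b-by-b
# blocks with block-local coordinates, and the same structure serves both
# y = A x and y = A^T x by swapping the row/column roles.
from collections import defaultdict

def to_blocks(coo, b):
    """coo: list of (row, col, val). Returns {(bi, bj): [(r, c, v), ...]}
    where r, c are local to block (bi, bj)."""
    blocks = defaultdict(list)
    for i, j, v in coo:
        blocks[(i // b, j // b)].append((i % b, j % b, v))
    return blocks

def spmv(blocks, b, x, n, transpose=False):
    y = [0.0] * n
    for (bi, bj), nnz in blocks.items():
        for r, c, v in nnz:
            if transpose:
                y[bj * b + c] += v * x[bi * b + r]   # accumulate A^T x
            else:
                y[bi * b + r] += v * x[bj * b + c]   # accumulate A x
    return y
```

Because each nonzero carries only block-local coordinates, one traversal of the blocks serves both products, which is the symmetry the CSB format exploits.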
### Breaking the performance bottleneck of sparse matrix-vector multiplication on SIMD processors

2013 · *IEICE Electronics Express*

doi:10.1587/elex.10.20130147 · fatcat:ikfbytnsyzh2zdrqyctw4wii3i

The method includes a new sparse matrix compressed format, a block SpMV algorithm, and a vector write buffer. ... The low utilization of SIMD units and memory bandwidth is the main performance bottleneck on SIMD processors for sparse matrix-vector multiplication (SpMV), which is one of the most important kernels in ... A new sparse matrix compressed format, which is called stride-combination CSR with transpose (SCT), is proposed to increase the utilization of SIMD units. ...
### Optimizing Sparse Matrix-Multiple Vectors Multiplication for Nuclear Configuration Interaction Calculations

2014 · *2014 IEEE 28th International Parallel and Distributed Processing Symposium*

doi:10.1109/ipdps.2014.125 · dblp:conf/ipps/AktulgaBWY14 · fatcat:st5jejogzfdatinzbmd2jy3vem

... of that matrix) with multiple vectors (SpMM and SpMMᵀ). ... We base our implementation on the compressed sparse blocks (CSB) matrix format and target systems with multi-core architectures. ... Conceptually SpMM performs a sparse matrix-vector multiplication on a block of m vectors. ...
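The SpMM operation this snippet describes is conceptually SpMV applied to a block of m vectors at once. A hedged sketch of that idea (plain Python over a COO nonzero list, not the paper's CSB-based implementation; `spmm` is a hypothetical helper name) showing how each nonzero is reused across all m right-hand sides:

```python
# Sketch of SpMM: multiply a sparse matrix by an n-by-m dense block of
# vectors. Each nonzero (i, j, v) is loaded once and applied to all m
# columns, amortizing the irregular access over the whole block.
def spmm(coo, X):
    """coo: list of (row, col, val); X: dense block as a list of rows."""
    m = len(X[0])
    n = 1 + max(i for i, _, _ in coo)
    Y = [[0.0] * m for _ in range(n)]
    for i, j, v in coo:
        for k in range(m):
            Y[i][k] += v * X[j][k]
    return Y
```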
### An Improved Sparse Matrix-Vector Multiply Based on Recursive Sparse Blocks Layout
[chapter]

2012 · *Lecture Notes in Computer Science*

doi:10.1007/978-3-642-29843-1_69 · fatcat:ie7qvvhvqzfihfova57qejzxdq

By laying out the matrix in sparse, non-overlapping blocks, we allow for the shared-memory parallel execution of transposed sparse matrix-vector multiply (SpMVᵀ), with higher efficiency than the traditional ... Second, we look at the performance of standard and transposed shared-memory parallel SpMV for unsymmetric matrices, using the proposed approach. ... We define the sparse matrix-vector multiply (SpMV) operation as "y ← A x" and its transposed version (SpMVᵀ) as "y ← Aᵀx" (where A is a sparse matrix, while x, y are vectors). ...
### Distributed Sparse Matrices for Very High Level Languages
[chapter]

2008 · *Advances in Computers*

doi:10.1016/s0065-2458(08)00005-3 · fatcat:hienrbxdu5hdjbnadvnulvl7ku

We describe the design and implementation of a sparse matrix infrastructure for Star-P, a parallel implementation of the Matlab® programming language. ... Sparse matrices are first-class objects in many VHLLs (very high level languages) used for scientific computing. They are a basic building block for various numerical and combinatorial algorithms. ... We hope that our experiences will shape the design of future parallel sparse matrix infrastructures in other languages. ...
### Parallel Transposition of Sparse Data Structures

2016 · *Proceedings of the 2016 International Conference on Supercomputing - ICS '16*

doi:10.1145/2925426.2926291 · dblp:conf/ics/WangLHF16 · fatcat:zhq2wcjy4vbdhf5bmtrqpsrxmq

Even though many parallel sparse primitives such as sparse matrix-vector (SpMV) multiplication have been extensively studied, some other important building blocks, e.g., parallel transposition for sparse ... In this paper, we first identify that the transposition operation can be a bottleneck of some fundamental sparse matrix and graph algorithms. ... and mapping (SLAM) problem. ...
### High-Performance Fortran and possible extensions to support conjugate gradient algorithms

1996 · *Proceedings of 5th IEEE International Symposium on High Performance Distributed Computing HPDC-96*

doi:10.1109/hpdc.1996.546175 · dblp:conf/hpdc/DincerFH96 · fatcat:ekwdyfrjrzatdjolfyolhczfju

We discuss the use of intrinsic functions, data distribution directives and explicitly parallel constructs to optimize performance by minimizing communications requirements in a portable manner. We focus ... We evaluate the High Performance Fortran (HPF) language for the compact expression and efficient implementation of conjugate gradient iterative matrix-solvers on High-Performance Computing and Communications ... We further thank Alok Choudhary for his suggestions and Elaine Weinman for proofreading this manuscript. ...
### yaSpMV

2014 · *Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '14*

doi:10.1145/2555243.2555255 · dblp:conf/ppopp/YanLZZ14 · fatcat:zdtn4yxa7rhzlmsq574tnj6jxm

First, we devise a new SpMV format, called blocked compressed common coordinate (BCCOO), which uses bit flags to store the row indices in a blocked common coordinate (COO) format so as to alleviate the ... We further improve this format by partitioning the matrix into vertical slices to enhance the cache hit rates when accessing the vector to be multiplied. ... Related Work: Sparse matrix-vector multiplication (SpMV) is so important that there have been numerous works optimizing its performance. We only discuss the most relevant ones here. ...
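As a rough illustration of the bit-flag idea the BCCOO snippet mentions (illustrative Python only: it assumes nonzeros sorted by row with no empty rows, and omits the paper's blocking and vertical slicing; `encode_coo` and `spmv_flagged` are hypothetical names, not the paper's API):

```python
# Instead of storing a full row index per nonzero, keep one bit per nonzero
# that says "same row as the previous nonzero". Row indices are then
# recovered on the fly during the multiply.
def encode_coo(coo):
    """coo: list of (row, col, val) sorted by row. Returns (flags, cols, vals)."""
    flags, cols, vals = [], [], []
    prev_row = None
    for r, c, v in coo:
        flags.append(r == prev_row)   # True: continues the previous row
        cols.append(c)
        vals.append(v)
        prev_row = r
    return flags, cols, vals

def spmv_flagged(flags, cols, vals, x):
    y, row = [], -1
    for same, c, v in zip(flags, cols, vals):
        if not same:          # a cleared flag starts the next (nonempty) row
            row += 1
            y.append(0.0)
        y[row] += v * x[c]
    return y
```

The row array shrinks from one integer per nonzero to one bit per nonzero, which is the compression the format is named for.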
### A High Performance Block Eigensolver for Nuclear Configuration Interaction Calculations

2017 · *IEEE Transactions on Parallel and Distributed Systems*

doi:10.1109/tpds.2016.2630699 · fatcat:6w4u7qyec5ehtjld4deuq45cwe

We consider a block iterative eigensolver whose main computational kernels are the multiplication of a sparse matrix with multiple vectors (SpMM), and tall-skinny matrix operations. ... We present techniques to significantly improve the SpMM and the transpose operation SpMMᵀ by using the compressed sparse blocks (CSB) format. ... An existing implementation of CSB for sparse matrix-vector (SpMV) and transpose sparse matrix-vector (SpMVᵀ) multiplication stores nonzeros within each block using a space-filling curve to exploit data ...
### Sparse GPU Kernels for Deep Learning
[article]

2020 · *arXiv* pre-print

arXiv:2006.10901v2 · fatcat:76wdsepdlffslgz3kkuxykwv5i

Based on these insights, we develop high-performance GPU kernels for two sparse matrix operations widely applicable in neural networks: sparse matrix-dense matrix multiplication and sampled dense-dense matrix multiplication. ... Cisco, SAP, and the NSF under CAREER grant CNS-1651570. ...
### An MPI-CUDA Implementation and Optimization for Parallel Sparse Equations and Least Squares (LSQR)

2012 · *Procedia Computer Science*

doi:10.1016/j.procs.2012.04.009 · fatcat:cg7salnelnajtor7snostd6moq

On the MPI level, our contributions include: (1) decompose both matrix and vector to increase parallelism; (2) design a static load balancing strategy. ... LSQR (Sparse Equations and Least Squares) is a widely used Krylov subspace method to solve large-scale linear systems in seismic tomography. ... Our parallel scheme utilizes this feature, so we partition the matrix based on rows. We also use Compressed Sparse Column (CSC) to store the matrix transpose. ...
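The last sentence of this snippet rests on the fact that the CSC arrays of A coincide with the CSR arrays of Aᵀ, so a CSC copy gives row-wise access to the transpose. A minimal sketch of that observation (illustrative Python, not the paper's MPI-CUDA code; `csc_matvec_transpose` is a hypothetical helper name):

```python
# Compute y = A^T x using the CSC representation of A. Column j of A is
# row j of A^T, so each column reduces to one independent dot product --
# which is also why this traversal parallelizes cleanly over columns.
def csc_matvec_transpose(colptr, rowidx, vals, x):
    """colptr/rowidx/vals: standard CSC arrays of A; x: dense vector."""
    n_cols = len(colptr) - 1
    y = [0.0] * n_cols
    for j in range(n_cols):
        for k in range(colptr[j], colptr[j + 1]):
            y[j] += vals[k] * x[rowidx[k]]
    return y
```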
### A Compressed Diagonals Remapping Technique for Dynamic Data Redistribution on Banded Sparse Matrix

2004 · *Journal of Supercomputing*

doi:10.1023/b:supe.0000026846.74050.18 · fatcat:omizx2ectbgkzhpikiwsioredi

Keywords: compressed diagonals remapping, data redistribution, banded matrix, sparse matrix, parallel algorithm, runtime support. ... statements of arrays that were distributed in arbitrary BLOCK-CYCLIC(c) ... The CDR technique uses an efficient one-dimensional indexing scheme to perform data redistribution on banded sparse matrix. ... The parallel sparse redistribution code uses a multiple-scan approach for unpacking each message to construct local CRS vectors in the receiving phase. ...
### A Compressed Diagonals Remapping Technique for Dynamic Data Redistribution on Banded Sparse Matrix
[chapter]

2003 · *Lecture Notes in Computer Science*

doi:10.1007/3-540-37619-4_8 · fatcat:chcwbycbnzd3pdijtb42mibvta

Keywords: compressed diagonals remapping, data redistribution, banded matrix, sparse matrix, parallel algorithm, runtime support. ... statements of arrays that were distributed in arbitrary BLOCK-CYCLIC(c) ... The CDR technique uses an efficient one-dimensional indexing scheme to perform data redistribution on banded sparse matrix. ... The parallel sparse redistribution code uses a multiple-scan approach for unpacking each message to construct local CRS vectors in the receiving phase. ...
### Joint Schedule and Layout Autotuning for Sparse Matrices with Compound Entries on GPUs

2019 · *International Symposium on Vision, Modeling, and Visualization*

doi:10.2312/vmv.20191324 · dblp:conf/vmv/Mueller-RoemerS19 · fatcat:rfioejlbdve5dlpnrlshld3y34

We generalize several matrix layouts and apply joint schedule and layout autotuning to improve the performance of the sparse matrix-vector product on massively parallel graphics processing units. ... Large sparse matrices with compound entries, i.e., complex and quaternionic matrices as well as matrices with dense blocks, are a core component of many algorithms in geometry processing, physically based ... All matrix-vector multiplications were also performed using cuSPARSE, NVIDIA's own highly tuned sparse linear algebra library. ...
### Specifying and verifying sparse matrix codes

2010 · *Proceedings of the 15th ACM SIGPLAN international conference on Functional programming - ICFP '10*

doi:10.1145/1863543.1863581 · dblp:conf/icfp/ArnoldHKBS10 · fatcat:nlousm3mkrh6hhq6cmz3zqb4ai

We show that it is reusable and extensible to hierarchical sparse formats. • We design a variable-free functional language for sparse matrix codes. ... Sparse matrix formats are typically implemented with low-level imperative programs. ... A prominent feature of JAD's proof goal is the double use of transpose, once during compression (jad) and once during multiplication (jadmv). ...
*Showing results 1 — 15 out of 4,266 results*