666 Hits in 5.2 sec

Parallel Candecomp/Parafac Decomposition of Sparse Tensors Using Dimension Trees

Oguz Kaya, Bora Uçar
2018 SIAM Journal on Scientific Computing  
We propose a fine-grain distributed memory parallel algorithm, and compare it against a medium-grain variant [40].  ...  For this purpose, we investigate an efficient computation of the CP decomposition of sparse tensors and its parallelization.  ...  medium-grain decomposition [40]. However, using the fine-grain algorithm on a medium-grain partition effectively provides a medium-grain algorithm.  ... 
doi:10.1137/16m1102744 fatcat:khyywxv4ebdrvdoud5wn7m54dq

An approach to locality-conscious load balancing and transparent memory hierarchy management with a global-address-space parallel programming model

S. Krishnamoorthy, U. Catalyurek, J. Nieplocha, P. Sadayappan
2006 Proceedings 20th IEEE International Parallel & Distributed Processing Symposium  
The programming model provides a global view of block-sparse matrices and a mechanism for the expression of parallel tasks that operate on block-sparse data.  ...  This paper describes a global-address-space framework for the convenient specification and efficient execution of parallel out-of-core applications operating on block-sparse data.  ...  Acknowledgments We thank the National Science Foundation for the support of this research through grants 0121676, 0403342, and 0509467, and the U.S.  ... 
doi:10.1109/ipdps.2006.1639719 dblp:conf/ipps/KrishnamoorthyCNS06 fatcat:6vtxgp4spjhfbkkcmlh4gut2hq
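The abstract above revolves around a global view of block-sparse matrices. A hypothetical minimal sketch of such a data structure (the class name and methods are illustrative, not the paper's framework API), where only nonzero dense blocks are stored:

```python
import numpy as np

class BlockSparseMatrix:
    """Minimal block-sparse matrix: only nonzero dense blocks are stored,
    keyed by (block_row, block_col). Illustrative sketch only."""
    def __init__(self, block_shape, grid_shape):
        self.bs = block_shape      # (rows, cols) of each dense block
        self.grid = grid_shape     # (block rows, block cols) in the grid
        self.blocks = {}           # (bi, bj) -> dense ndarray

    def set_block(self, bi, bj, block):
        assert block.shape == self.bs
        self.blocks[(bi, bj)] = block

    def matvec(self, x):
        br, bc = self.bs
        y = np.zeros(self.grid[0] * br)
        # Only stored (nonzero) blocks contribute; zero blocks are skipped,
        # which is the storage/compute saving block-sparse formats exploit.
        for (bi, bj), blk in self.blocks.items():
            y[bi*br:(bi+1)*br] += blk @ x[bj*bc:(bj+1)*bc]
        return y
```

In a distributed setting, the `(bi, bj) -> block` map would be partitioned across processes, which is where locality-conscious load balancing enters.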

An Exploration of Optimization Algorithms for High Performance Tensor Completion

Shaden Smith, Jongsoo Park, George Karypis
2016 SC16: International Conference for High Performance Computing, Networking, Storage and Analysis  
Tensor completion is most often accomplished via low-rank sparse tensor factorization, a computationally expensive non-convex optimization problem which has only recently been studied in the context of  ...  We explore opportunities for parallelism on shared- and distributed-memory systems and address challenges such as memory- and operation-efficiency, load balance, cache locality, and communication.  ...  ACKNOWLEDGMENTS The authors would like to thank anonymous reviewers for insightful feedback, Mikhail Smelyanskiy for valuable discussions, and Karlsson et al. for sharing source code used for evaluation  ... 
doi:10.1109/sc.2016.30 dblp:conf/sc/SmithPK16 fatcat:ecmrvmsngvcerjtfwlr23z3dom

Performance evaluation of OpenMP-based algorithms for handling Kronecker descriptors

Antonio M. Lima, Marco A.S. Netto, Thais Webber, Ricardo M. Czekster, Cesar A.F. De Rose, Paulo Fernandes
2012 Journal of Parallel and Distributed Computing  
This paper introduces a set of parallel implementations of a hybrid algorithm for handling descriptors and a detailed performance analysis on four real Markovian models.  ...  Numerical analysis of Markovian models is relevant for performance evaluation and probabilistic analysis of systems' behavior in several fields of science and engineering.  ...  Homogeneous tasks: This section presents the performance results for the RS model. Each input size generates a different number of coarse-grained and fine-grained tasks (Table 5).  ... 
doi:10.1016/j.jpdc.2012.02.001 fatcat:jfz5mc24ifapdew2oqc3w7fute
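Kronecker descriptors represent the huge generator matrix of a Markovian model as (sums of) Kronecker products of small matrices; the core kernel is multiplying a vector by A ⊗ B without ever materializing the product. A minimal sketch using the vec identity (assuming row-major vectorization; the paper's hybrid shuffle algorithm is more elaborate):

```python
import numpy as np

def kron_matvec(A, B, x):
    """Compute (A kron B) @ x without forming the Kronecker product.
    A: m x n, B: p x q, x: length n*q. Uses the row-major identity
    (A kron B) vec(X) = vec(A X B^T), where X = x reshaped to n x q."""
    n, q = A.shape[1], B.shape[1]
    X = x.reshape(n, q)
    # Two small matrix products replace one huge (m*p) x (n*q) matvec.
    return (A @ X @ B.T).reshape(-1)
```

For a descriptor that is a sum of Kronecker terms, one applies this kernel per term and accumulates, keeping memory proportional to the small factor matrices.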

Accelerated Stochastic Gradient for Nonnegative Tensor Completion and Parallel Implementation [article]

Ioanna Siaminou, Ioannis Marios Papagiannakos, Christos Kolomvakis, Athanasios P. Liavas
2021 arXiv   pre-print
We believe that our approach is a very competitive candidate for the solution of very large nonnegative tensor completion problems.  ...  We consider the problem of nonnegative tensor completion.  ...  In [18], a hypergraph model for general medium-grain partitioning has been presented.  ... 
arXiv:2109.09534v1 fatcat:noeeuc4vwzes7g23oraq3xcqhi
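As a baseline for the nonnegative tensor completion problem above, a plain projected stochastic gradient sketch (the paper's accelerated scheme and its parallel implementation are not reproduced here; function name and hyperparameters are illustrative):

```python
import numpy as np

def ntc_sgd(shape, rank, observed, n_epochs=200, lr=0.05, seed=0):
    """Nonnegative tensor completion by projected stochastic gradient descent.
    `observed` is a list of ((i, j, k), value) pairs for the known entries."""
    rng = np.random.default_rng(seed)
    A = rng.random((shape[0], rank))
    B = rng.random((shape[1], rank))
    C = rng.random((shape[2], rank))
    for _ in range(n_epochs):
        for t in rng.permutation(len(observed)):
            (i, j, k), v = observed[t]
            pred = (A[i] * B[j] * C[k]).sum()
            e = pred - v
            gA, gB, gC = e * B[j] * C[k], e * A[i] * C[k], e * A[i] * B[j]
            # Gradient step followed by projection onto the nonnegative orthant.
            A[i] = np.maximum(0.0, A[i] - lr * gA)
            B[j] = np.maximum(0.0, B[j] - lr * gB)
            C[k] = np.maximum(0.0, C[k] - lr * gC)
    return A, B, C
```

Each update touches only three factor rows, which is what makes stochastic methods attractive for very large, very sparse completion problems.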

ParCYCLIC: finite element modelling of earthquake liquefaction response on parallel computers

Jun Peng, Jinchi Lu, Kincho H. Law, Ahmed Elgamal
2004 International journal for numerical and analytical methods in geomechanics (Print)  
The elements of the computational strategy, designed for distributed-memory message-passing parallel computer systems, include: (a) an automatic domain decomposer to partition the finite element mesh; (b) nodal ordering strategies to minimize storage space for matrix coefficients; (c) an efficient scheme for the allocation of sparse matrix coefficients among the processors; and (d) a parallel sparse direct solver.  ...  Mesh Partitioning Using Domain Decomposition: In a parallel sparse solver, a domain decomposer is needed to partition the finite element mesh into subdomains.  ... 
doi:10.1002/nag.384 fatcat:s2r4jt6fazh7lmz6ptei3o7ppm

Parallel Nonnegative CP Decomposition of Dense Tensors [article]

Grey Ballard and Koby Hayashi and Ramakrishnan Kannan
2018 arXiv   pre-print
We present a distributed-memory parallel algorithm and implementation of an alternating optimization method for computing a CP decomposition of dense tensor data that can enforce nonnegativity of the computed  ...  The CP tensor decomposition is a low-rank approximation of a tensor.  ...  [6] extend a parallel algorithm designed for sparse tensors [25] to the 3D dense case.  ... 
arXiv:1806.07985v1 fatcat:qx6g7qsrxvfsll2hu464cdvqsi

2021 Index IEEE Transactions on Parallel and Distributed Systems Vol. 32

2022 IEEE Transactions on Parallel and Distributed Systems  
., +, TPDS May 2021 1191-1209 Partitioning Models for General Medium-Grain Parallel Sparse Tensor Decomposition.  ...  ., +, TPDS July 2021 1765-1776 Partitioning Models for General Medium-Grain Parallel Sparse Tensor Decomposition.  ... 
doi:10.1109/tpds.2021.3107121 fatcat:e7bh2xssazdrjcpgn64mqh4hb4

Communication Lower Bounds for Matricized Tensor Times Khatri-Rao Product [article]

Grey Ballard and Nicholas Knight and Kathryn Rouse
2017 arXiv   pre-print
The matricized-tensor times Khatri-Rao product computation is the typical bottleneck in algorithms for computing a CP decomposition of a tensor.  ...  In particular, Smith and Karypis [16] describe a "medium-grained" parallelization scheme that is designed for sparse tensors but can be applied to dense tensors. Indeed, Liavas et al.  ... 
arXiv:1708.07401v2 fatcat:fczjrg5lyngv7gjpwgkair3fee
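For a sparse tensor stored in coordinate (COO) form, the MTTKRP named above can be computed nonzero by nonzero, never forming the dense Khatri-Rao product. A minimal sketch (this is the basic sequential kernel, not the paper's communication-optimal algorithm):

```python
import numpy as np

def mttkrp_coo(indices, values, factors, mode):
    """Mode-`mode` MTTKRP for a sparse 3-way tensor in COO form:
    Y = X_(mode) * (Khatri-Rao product of the other two factor matrices),
    accumulated one nonzero at a time."""
    rank = factors[0].shape[1]
    Y = np.zeros((factors[mode].shape[0], rank))
    others = [m for m in range(3) if m != mode]
    for idx, v in zip(indices, values):
        # Each nonzero contributes v * (elementwise product of the other
        # modes' factor rows) to one row of the output.
        Y[idx[mode]] += v * factors[others[0]][idx[others[0]]] \
                          * factors[others[1]][idx[others[1]]]
    return Y
```

In distributed implementations, the partition of `indices` across processes determines which factor rows must be communicated, which is exactly what the lower bounds in this paper constrain.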

PLANC: Parallel Low Rank Approximation with Non-negativity Constraints [article]

Srinivas Eswar, Koby Hayashi, Grey Ballard, Ramakrishnan Kannan, Michael A. Matheson, Haesun Park
2019 arXiv   pre-print
We present a software package called PLANC (Parallel Low Rank Approximation with Non-negativity Constraints), which implements our solution and allows for extension in terms of data (dense or sparse, matrices  ...  We consider the problem of low-rank approximation of massive dense non-negative tensor data, for example to discover latent patterns in video and imaging applications.  ...  [35] extend a parallel algorithm designed for sparse tensors [50] to the 3D dense case.  ... 
arXiv:1909.01149v1 fatcat:6lqh5upkujf3fpbd5wucc6wd24

Hardware Acceleration of Sparse and Irregular Tensor Computations of ML Models: A Survey and Insights [article]

Shail Dave, Riyadh Baghdadi, Tony Nowatzki, Sasikanth Avancha, Aviral Shrivastava, Baoxin Li
2021 arXiv   pre-print
structured sparsity can improve storage efficiency and balance computations; understanding how to compile and map models with sparse tensors on the accelerators; understanding recent design trends for  ...  This paper provides a comprehensive survey on the efficient execution of sparse and irregular tensor computations of ML models on hardware accelerators.  ...  Support for Sparse Tensors 1) Challenges in supporting sparse tensors: While compiler support is needed in general for targeting ML hardware accelerators with diverse features, sparse tensor computations  ... 
arXiv:2007.00864v2 fatcat:k4o2xboh4vbudadfiriiwjp7uu

High-performance finite-element simulations of seismic wave propagation in three-dimensional nonlinear inelastic geological media

Fabrice Dupros, Florent De Martin, Evelyne Foerster, Dimitri Komatitsch, Jean Roman
2010 Parallel Computing  
A specific methodology is introduced for the parallel assembly in the context of soil nonlinearity.  ...  We demonstrate the feasibility of large scale modeling based on an implicit numerical scheme and a nonlinear constitutive model.  ...  We thank Xavier Lacoste and Mathieu Faverge from INRIA Bordeaux Sud-Ouest, Bacchus project for the support on the solver usage and Faiza Boulahya and Luc Frauciel from BRGM, France for discussions on graph  ... 
doi:10.1016/j.parco.2009.12.011 fatcat:igti3ri2hjhyfpcsple4lbtejm

A generic interface for parallel cell-based finite element operator application

Martin Kronbichler, Katharina Kormann
2012 Computers & Fluids  
We present a memory-efficient and parallel framework for finite element operator application implemented in the generic open-source library deal.II.  ...  Instead of assembling a sparse matrix and using it for matrix-vector products, the operation is applied by cell-wise quadrature.  ...  Bangerth, Texas A&M University, for valuable discussions and comments on the manuscript. Also, discussions with M. Gustafsson and E. Rudberg, Uppsala University, are acknowledged.  ... 
doi:10.1016/j.compfluid.2012.04.012 fatcat:lgfjlhyjvjegpky7ckb3rmxwhe
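The matrix-free idea from the entry above, in its simplest possible form for 1D linear elements (deal.II generalizes this with on-the-fly quadrature, sum factorization, and vectorization; this is only a toy illustration):

```python
import numpy as np

def apply_stiffness_matrix_free(u, h):
    """Apply the 1D Poisson stiffness operator cell by cell, without ever
    assembling the global sparse matrix: gather local dofs, apply the
    element matrix, scatter-add the result."""
    v = np.zeros_like(u)
    Ke = (1.0 / h) * np.array([[1.0, -1.0], [-1.0, 1.0]])  # element stiffness
    for e in range(len(u) - 1):
        v[e:e+2] += Ke @ u[e:e+2]
    return v

def assemble_stiffness(n, h):
    """Reference: the assembled global matrix for the same operator."""
    A = np.zeros((n, n))
    Ke = (1.0 / h) * np.array([[1.0, -1.0], [-1.0, 1.0]])
    for e in range(n - 1):
        A[e:e+2, e:e+2] += Ke
    return A
```

The two paths produce identical matrix-vector products, but the matrix-free path stores only the element-level data, which is the memory saving the paper exploits for high-order elements.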

Parallel finite element modeling of earthquake ground response and liquefaction

Jinchi Lu, Jun Peng, Ahmed Elgamal, Zhaohui Yang, Kincho H. Law
2004 Earthquake Engineering and Engineering Vibration  
Mesh Partitioning Using Domain Decomposition: In a parallel FE program, a domain decomposer is needed to automatically partition the FE mesh into subdomains.  ...  The multilevel partitioning method is quite different from traditional methods. This routine is very useful for generating the ordering for the sparse solver so that the storage of the sparse matrix can  ...  Appendix B: Figures of Wharf Simulation Results. This appendix lists figures of the results for the wharf simulations discussed in Chapter 7.  ... 
doi:10.1007/bf02668848 fatcat:unurcxrwqvehhdfahnqsnmxbbu

Big Data Reduction Methods: A Survey

Muhammad Habib ur Rehman, Chee Sun Liew, Assad Abbas, Prem Prakash Jayaraman, Teh Ying Wah, Samee U. Khan
2016 Data Science and Engineering  
This article presents a review of methods that are used for big data reduction.  ...  Another perspective on big data reduction is that million-variable big datasets suffer from the curse of dimensionality, which requires unbounded computational resources to uncover actionable knowledge.  ...  The DistBelief model first achieves model parallelism by partitioning large-scale networks into small blocks that are mapped to a single node and then achieves data parallelism using two separate distribution  ... 
doi:10.1007/s41019-016-0022-0 fatcat:3ivz52kpz5dhratokm4uenuoc4
Showing results 1 — 15 out of 666 results