4,320 Hits in 2.5 sec

Solving Tall Dense Linear Programs in Nearly Linear Time [article]

Jan van den Brand, Yin Tat Lee, Aaron Sidford, Zhao Song
2021 arXiv   pre-print
Interestingly, we obtain this running time without using fast matrix multiplication and consequently, barring a major advance in linear system solving, our running time is near optimal for solving dense  ...  In this paper we provide an Õ(nd+d^3) time randomized algorithm for solving linear programs with d variables and n constraints with high probability.  ...  Acknowledgements We thank Sébastien Bubeck, Ofer Dekel, Jerry Li, Ilya Razenshteyn, and Microsoft Research for facilitating conversations between and hosting researchers involved in this collaboration.  ... 
arXiv:2002.02304v2 fatcat:palnpmhb2rdsrp6yptsshobbpu

Mapping Dense LU Factorization on Multicore Supercomputer Nodes

Jonathan Lifflander, Phil Miller, Ramprasad Venkataraman, Anshu Arya, Laxmikant Kale, Terry Jones
2012 2012 IEEE 26th International Parallel and Distributed Processing Symposium  
Dense LU factorization is a prominent benchmark used to rank the performance of supercomputers.  ...  A block-cyclic mapping in the row-major order does not encounter this problem, but consequently sacrifices node and network locality in the critical pivoting steps.  ...  Running time on Jaguar, a resource of the National Center for Computational Sciences at Oak Ridge National Laboratory, was supported by DOE contract DE-AC05-00OR22725.  ... 
doi:10.1109/ipdps.2012.61 dblp:conf/ipps/LifflanderMVAKJ12 fatcat:5zahzsd2jjdwncf5rjbp65ggxq

Tall and skinny QR factorizations in MapReduce architectures

Paul G. Constantine, David F. Gleich
2011 Proceedings of the second international workshop on MapReduce and its applications - MapReduce '11  
We present an implementation of the tall and skinny QR (TSQR) factorization in the Map-Reduce framework, and we provide computational results for nearly terabyte-sized datasets.  ...  These tasks run in just a few minutes under a variety of parameter choices.  ...  We would also like to thank James Demmel for suggesting examining the reference streaming time.  ... 
doi:10.1145/1996092.1996103 fatcat:4ykdtyl6ezeppgz6z3mhkmtwzy
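The TSQR reduction behind this factorization can be sketched in a few lines (a NumPy toy under assumed block counts and sizes, not the authors' MapReduce implementation): QR-factor each row block independently, stack the small R factors, and factor the stack once more.

```python
import numpy as np

def tsqr_r(A, num_blocks=4):
    """TSQR sketch: QR each row block (map), stack the R factors, QR again (reduce)."""
    blocks = np.array_split(A, num_blocks, axis=0)
    Rs = [np.linalg.qr(B, mode='r') for B in blocks]  # one small R per block
    return np.linalg.qr(np.vstack(Rs), mode='r')      # combine into the final R

rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 5))  # tall and skinny
R = tsqr_r(A)
# R is the R factor of A (unique up to row signs), so R'R must equal A'A
assert np.allclose(R.T @ R, A.T @ A)
```

In the MapReduce setting the per-block factorizations run in the map phase and the stacking/refactoring in the reduce phase; here both happen in one process.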

Kokkos Kernels: Performance Portable Sparse/Dense Linear Algebra and Graph Kernels [article]

Sivasankaran Rajamanickam, Seher Acer, Luc Berger-Vergiat, Vinh Dang, Nathan Ellingwood, Evan Harvey, Brian Kelley, Christian R. Trott, Jeremiah Wilke, Ichitaro Yamazaki
2021 arXiv   pre-print
We describe Kokkos Kernels, a library of kernels for sparse linear algebra, dense linear algebra and graph kernels.  ...  As hardware architectures are evolving in the push towards exascale, developing Computational Science and Engineering (CSE) applications depends on performance portable approaches for sustainable software  ...  However, no portable solution existed for sparse/dense linear algebra kernels before the Kokkos Kernels library was created as part of the Advanced Technology Development and Mitigation program and the  ... 
arXiv:2103.11991v1 fatcat:m7iskgt5kjdjnex7lenqsjj6z4

Analysis of a Splitting Approach for the Parallel Solution of Linear Systems on GPU Cards

Ang Li, Radu Serban, Dan Negrut
2017 SIAM Journal on Scientific Computing  
We discuss an approach for solving sparse or dense banded linear systems Ax = b on a graphics processing unit (GPU) card.  ...  In a comparison against Intel's MKL, SaP::GPU also fared well when used to solve dense banded systems that are close to being diagonally dominant.  ...  Note that K selection is relevant only in the context of solving sparse linear systems; for dense banded, K is a given.  ... 
doi:10.1137/15m1039523 fatcat:oblxmqbzpbcvjc4mbslummq6vi

Row Modifications of a Sparse Cholesky Factorization

Timothy A. Davis, William W. Hager
2005 SIAM Journal on Matrix Analysis and Applications  
Additional illustrations can be found in [12]. A variety of techniques for modifying a dense Cholesky factorization are given in the classic reference [11].  ...  We also determine how the solution of a linear system Lx = b changes after changing a row and column of C or after a rank-r change in C.  ...  where e_i is the ith column of the identity matrix.  ...  In each iteration of the linear programming dual active set algorithm (LPDASA) (see [5, 13, 14, 15, 16, 17]), we solve a symmetric linear system of the form Cλ = f, C = A_F A_F^T + σI, where σ > 0 is  ... 
doi:10.1137/s089547980343641x fatcat:64pphvso4vctvkptvofyoajgba
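The LPDASA system quoted in the snippet, Cλ = f with C = A_F A_F^T + σI and σ > 0, is symmetric positive definite, so it admits a Cholesky solve. A minimal dense sketch (the random A_F, σ, and f below are placeholders, not data from the paper):

```python
import numpy as np

# C = A_F A_F^T + sigma*I is symmetric positive definite for sigma > 0,
# so C @ lam = f can be solved via the Cholesky factorization C = L L^T.
rng = np.random.default_rng(3)
A_F = rng.standard_normal((6, 10))
sigma = 0.1
C = A_F @ A_F.T + sigma * np.eye(6)
f = rng.standard_normal(6)

L = np.linalg.cholesky(C)       # lower-triangular factor: C = L L^T
y = np.linalg.solve(L, f)       # forward substitution:  L y = f
lam = np.linalg.solve(L.T, y)   # back substitution:     L^T lam = y
assert np.allclose(C @ lam, f)
```

The paper's contribution is updating L cheaply after a row/column or rank-r change to C, rather than refactoring from scratch as this sketch would.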

Accelerating an Iterative Eigensolver for Nuclear Structure Configuration Interaction Calculations on GPUs using OpenACC [article]

Pieter Maris, Chao Yang, Dossay Oryspayev, Brandon Cook
2021 arXiv   pre-print
step and dense linear algebra operations.  ...  (GPUs), we modified a previously developed hybrid MPI/OpenMP implementation of an eigensolver written in FORTRAN 90 by using an OpenACC directive-based programming model.  ...  This work was supported in part by the U.  ... 
arXiv:2109.00485v1 fatcat:4f243aknszg43ljupfwxrqfxnu

Communication-Avoiding QR Decomposition for GPUs

Michael Anderson, Grey Ballard, James Demmel, Kurt Keutzer
2011 2011 IEEE International Parallel & Distributed Processing Symposium  
As a result, we outperform CULA, a parallel linear algebra library for GPUs, by up to 13x for tall-skinny matrices.  ...  We show that the reduction in memory traffic provided by CAQR allows us to outperform existing parallel GPU implementations of QR for a large class of tall-skinny matrices.  ...  The most common example is linear least squares, which is ubiquitous in nearly all branches of science and engineering and can be solved using QR.  ... 
doi:10.1109/ipdps.2011.15 dblp:conf/ipps/AndersonBDK11 fatcat:if32u2vn7natnmmnskpftflnua
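As the snippet notes, linear least squares can be solved via QR. A minimal dense sketch (random data, NumPy's reduced QR standing in for CAQR): factor A = QR, then back-substitute Rx = Qᵀb.

```python
import numpy as np

# Solve min ||Ax - b|| for tall-skinny A via QR: A = QR, then R x = Q^T b.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 4))
x_true = np.array([1.0, -2.0, 0.5, 3.0])
b = A @ x_true                   # consistent right-hand side, so x_true is exact

Q, R = np.linalg.qr(A)           # reduced QR: Q is 200x4, R is 4x4 upper triangular
x = np.linalg.solve(R, Q.T @ b)  # triangular back-substitution
assert np.allclose(x, x_true)
```

The communication-avoiding variant changes how Q and R are computed across GPU thread blocks, not this final triangular solve.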

Gravitational Instabilities in the Disks of Massive Protostars as an Explanation for Linear Distributions of Methanol Masers

Richard H. Durisen, Annie C. Mejia, Brian K. Pickett, Thomas W. Hartquist
2001 Astrophysical Journal  
This is particularly true for methanol (CH₃OH), for which linear distributions of masers are found with disklike kinematics.  ...  instabilities leads to a complex of intersecting spiral shocks, clumps, and arclets within the disk and to significant time-dependent, nonaxisymmetric distortions of the disk surface.  ...  Note the tall ridges of material in the arms.  ... 
doi:10.1086/338738 fatcat:u2ixysyckjaopitjmibmothwju

Large Scale Distributed Linear Algebra With Tensor Processing Units [article]

Adam G.M. Lewis, Jackson Beall, Martin Ganahl, Markus Hauru, Shrestha Basu Mallick, Guifre Vidal
2021 arXiv   pre-print
Via curated algorithms emphasizing large, single-core matrix multiplications, other tasks in dense linear algebra can similarly scale.  ...  can multiply two matrices with linear size N = 2^20 = 1,048,576 in about 2 minutes.  ...  We consider the case of A given as a dense, full-rank matrix, in which case (7) is typically solved in O(N^3) operations via an initial LU decomposition.  ... 
arXiv:2112.09017v1 fatcat:ahdbdepkq5ajjc7bcjay5dfaj4
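The O(N³) LU-based solve the snippet mentions, in a single-node sketch (SciPy's `lu_factor`/`lu_solve` stand in for the distributed TPU implementation; N and the data are illustrative): pay the O(N³) factorization cost once, then each right-hand side costs only O(N²).

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

# Solve A x = b for dense, full-rank A via an initial LU decomposition.
rng = np.random.default_rng(2)
N = 100
A = rng.standard_normal((N, N))
b = rng.standard_normal(N)

lu, piv = lu_factor(A)        # O(N^3): PA = LU with partial pivoting
x = lu_solve((lu, piv), b)    # O(N^2): forward + back substitution
assert np.allclose(A @ x, b)
```

Amortizing the factorization across many right-hand sides is what makes the decompositional approach pay off at scale.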

Block Iterative Methods and Recycling for Improved Scalability of Linear Solvers

Pierre Jolivet, Pierre-Henri Tournier
2016 SC16: International Conference for High Performance Computing, Networking, Storage and Analysis  
This work has been supported in part by ANR through project MEDIMAX, ANR-13-MONU-0012. The first author was partially funded by the French Association of Mechanics (AFM) for this work.  ...  In the original paper, each linear system is solved with a single right-hand side, i.e. p = 1.  ...  Non-variable linear systems For some time-dependent PDEs, it is necessary to solve sequences of linear systems where the operator is the same throughout the sequence, and only the right-hand sides are  ... 
doi:10.1109/sc.2016.16 dblp:conf/sc/JolivetT16 fatcat:dhydaneaarcyfgjhvjwcmdofzu

Efficient Methods for Out-of-Core Sparse Cholesky Factorization

Edward Rothberg, Robert Schreiber
1999 SIAM Journal on Scientific Computing  
We find that straightforward implementations of all of them suffer from excessive disk I/O for large problems that arise in interior-point algorithms for linear programming.  ...  The effect is that extremely large sparse linear systems can be solved in reasonable time on very inexpensive systems.  ... 
doi:10.1137/s1064827597322975 fatcat:t5kpc3qcezhn3pf2e2m6jkgd5u

Parallel distributed-memory simplex for large-scale stochastic LP problems

Miles Lubin, J. A. Julian Hall, Cosmin G. Petra, Mihai Anitescu
2013 Computational optimization and applications  
We present a parallelization of the revised simplex method for large extensive forms of two-stage stochastic linear programming (LP) problems.  ...  It is built on novel analysis of the linear algebra for dual block-angular LP problems when solved by using the revised simplex method and a novel parallel scheme for applying product-form updates.  ...  Total time in PRICE per iteration is given on the right. the path for efficiently solving stochastic programming problems in these two contexts.  ... 
doi:10.1007/s10589-013-9542-y fatcat:obkds5dx6fdbddx55ncxpr3lym

Recursive Blocked Algorithms and Hybrid Data Structures for Dense Matrix Library Software

Erik Elmroth, Fred Gustavson, Isak Jonsson, Bo Kågström
2004 SIAM Review  
In fact, the whole gamut of existing dense linear algebra factorization is beginning to be reexamined in view of the recursive paradigm.  ...  Novel recursive blocked algorithms offer new ways to compute factorizations such as Cholesky and QR and to solve matrix equations.  ...  Some of the main points are the following: • Recursion creates new algorithms for linear algebra software. • Recursion can be used to express dense linear algebra algorithms entirely in terms of level  ... 
doi:10.1137/s0036144503428693 fatcat:7zmqj5eee5adxk56lbccrlyq3m
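A toy recursive Cholesky illustrates the recursive paradigm the snippet describes, with the off-diagonal triangular solve and the Schur-complement update as the level-3 building blocks (the base-case threshold `nmin` is arbitrary, and real implementations work on packed/hybrid data layouts rather than plain arrays):

```python
import numpy as np

def rchol(A, nmin=2):
    """Recursive blocked Cholesky sketch: factor the leading block,
    solve for the off-diagonal block, update the trailing Schur
    complement, and recurse on it."""
    n = A.shape[0]
    if n <= nmin:
        return np.linalg.cholesky(A)          # small dense base case
    k = n // 2
    L11 = rchol(A[:k, :k], nmin)              # recurse on leading block
    L21 = np.linalg.solve(L11, A[:k, k:]).T   # triangular solve: L21 L11^T = A21
    S = A[k:, k:] - L21 @ L21.T               # Schur complement update (GEMM)
    L22 = rchol(S, nmin)                      # recurse on trailing block
    L = np.zeros_like(A)
    L[:k, :k], L[k:, :k], L[k:, k:] = L11, L21, L22
    return L

rng = np.random.default_rng(4)
M = rng.standard_normal((8, 8))
A = M @ M.T + 8 * np.eye(8)                   # symmetric positive definite
L = rchol(A)
assert np.allclose(L @ L.T, A)
```

Nearly all the arithmetic lands in the triangular solve and the GEMM, which is exactly the "entirely in terms of level-3" property the review highlights.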

A high-performance parallel algorithm for nonnegative matrix factorization

Ramakrishnan Kannan, Grey Ballard, Haesun Park
2016 Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming - PPoPP '16  
It maintains the data and factor matrices in memory (distributed across processors), uses MPI for interprocessor communication, and, in the dense case, provably minimizes communication costs (under mild  ...  Despite its popularity in the data mining community, there is a lack of efficient distributed algorithms to solve the problem for big data sets.  ...  We also thank NSF for the travel grant to present this work in the conference through the grant CCF-1552229.  ... 
doi:10.1145/2851141.2851152 dblp:conf/ppopp/KannanBP16 fatcat:udekzdd7ffgqhajv3apnbbxfmi