Solving Tall Dense Linear Programs in Nearly Linear Time
2021
In this paper we provide an Õ(nd+d^3) time randomized algorithm for solving linear programs with d variables and n constraints with high probability. Interestingly, we obtain this running time without using fast matrix multiplication and consequently, barring a major advance in linear system solving, our running time is near optimal for solving dense linear programs.
###
Mapping Dense LU Factorization on Multicore Supercomputer Nodes

2012
2012 IEEE 26th International Parallel and Distributed Processing Symposium
Dense LU factorization is a prominent benchmark used to rank the performance of supercomputers.

... in the row-major order does not encounter this problem, but consequently sacrifices node and network locality in the critical pivoting steps. ... Running time on Jaguar, a resource of the National Center for Computational Sciences at Oak Ridge National Laboratory, was supported by DOE contract DE-AC05-00OR22725.

###
Tall and skinny QR factorizations in MapReduce architectures

2011
Proceedings of the second international workshop on MapReduce and its applications - MapReduce '11
We present an implementation of the tall and skinny QR (TSQR) factorization in the MapReduce framework, and we provide computational results for nearly terabyte-sized datasets. These tasks run in just a few minutes under a variety of parameter choices.
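
The idea behind TSQR can be illustrated with a small serial sketch: factor each block of rows independently (the "map" step), stack the small R factors, and factor the stack once more (the "reduce" step). The helper names `mgs_qr` and `tsqr_r` are hypothetical, and modified Gram-Schmidt is used only to keep the example free of dependencies; this is not the paper's MapReduce implementation, and it assumes every row block has full column rank.

```python
# Illustrative sketch only: a serial, pure-Python imitation of the TSQR reduction.
# mgs_qr and tsqr_r are hypothetical names; they do not come from the paper.

def mgs_qr(a):
    """Return (Q, R) for a full-column-rank matrix (list of rows)
    using modified Gram-Schmidt."""
    m, n = len(a), len(a[0])
    cols = [[float(a[i][j]) for i in range(m)] for j in range(n)]
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        r[j][j] = sum(x * x for x in cols[j]) ** 0.5
        cols[j] = [x / r[j][j] for x in cols[j]]
        for k in range(j + 1, n):
            r[j][k] = sum(x * y for x, y in zip(cols[j], cols[k]))
            cols[k] = [y - r[j][k] * x for x, y in zip(cols[j], cols[k])]
    q = [[cols[j][i] for j in range(n)] for i in range(m)]
    return q, r


def tsqr_r(a, block_rows):
    """R factor of a tall matrix: factor each row block, stack the
    small R factors, then factor the stacked matrix."""
    stacked = []
    for start in range(0, len(a), block_rows):
        _, r_block = mgs_qr(a[start:start + block_rows])  # "map" step
        stacked.extend(r_block)
    _, r = mgs_qr(stacked)                                 # "reduce" step
    return r


A = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 13]]
R_tsqr = tsqr_r(A, block_rows=3)
_, R_direct = mgs_qr(A)
```

Because the R factor with a positive diagonal is unique for a full-rank matrix, `R_tsqr` and `R_direct` agree up to rounding error, even though the blocked version never processes the whole matrix at once.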
###
Kokkos Kernels: Performance Portable Sparse/Dense Linear Algebra and Graph Kernels
2021
arXiv
We describe Kokkos Kernels, a library of kernels for sparse linear algebra, dense linear algebra and graph kernels. As hardware architectures are evolving in the push towards exascale, developing Computational Science and Engineering (CSE) applications depends on performance portable approaches for sustainable software. However, no portable solution existed for sparse/dense linear algebra kernels before the Kokkos Kernels library was created as part of the Advanced Technology Development and Mitigation program.
###
Analysis of a Splitting Approach for the Parallel Solution of Linear Systems on GPU Cards

2017
SIAM Journal on Scientific Computing
We discuss an approach for solving sparse or dense banded linear systems Ax = b on a graphics processing unit (GPU) card. In a comparison against Intel's MKL, SaP::GPU also fared well when used to solve dense banded systems that are close to being diagonally dominant. Note that K selection is relevant only in the context of solving sparse linear systems; for dense banded, K is a given.
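
As a point of reference for what a banded solve looks like in the simplest case, here is a minimal sketch of the Thomas algorithm for a tridiagonal (bandwidth-one) system. It is a plain serial method that skips pivoting, which is safe when the matrix is diagonally dominant; it is not the split-and-parallelize approach of SaP::GPU, and the function name `solve_tridiagonal` is a hypothetical one chosen for illustration.

```python
# Illustrative sketch only: serial Thomas algorithm, not the SaP::GPU method.
# solve_tridiagonal is a hypothetical name.

def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal system without pivoting.

    lower[i] multiplies x[i-1] in row i (lower[0] is unused);
    upper[i] multiplies x[i+1] in row i (upper[-1] is unused).
    Assumes the matrix is diagonally dominant so no pivot is zero.
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        pivot = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / pivot if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / pivot
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x


# 4x4 system with 4 on the diagonal and 1 on both off-diagonals;
# the right-hand side was built from the solution [1, 2, 3, 4].
x = solve_tridiagonal([0, 1, 1, 1], [4, 4, 4, 4], [1, 1, 1, 0], [6, 12, 18, 19])
```

The cost is linear in the number of unknowns, which is why exploiting the band structure matters compared with a general dense solve.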
###
Row Modifications of a Sparse Cholesky Factorization

2005
SIAM Journal on Matrix Analysis and Applications
Additional illustrations can be found in [12]. A variety of techniques for modifying a dense Cholesky factorization are given in the classic reference [11]. We also determine how the solution of a linear system Lx = b changes after changing a row and column of C or after a rank-r change in C. ... where e_i is the ith column of the identity matrix. In each iteration of the linear programming dual active set algorithm (LPDASA) (see [5, 13, 14, 15, 16, 17]), we solve a symmetric linear system of the form Cλ = f, C = A_F A_F^T + σI, where σ > 0 is ...
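
To make the system Cλ = f concrete, the following sketch forms C = A_F A_F^T + σI for a small dense matrix, computes its Cholesky factor from scratch, and solves by forward and backward substitution. It is a dense illustration under assumed inputs; it does not implement the sparse row-modification techniques the paper is about, and the names `cholesky` and `solve_regularized` are hypothetical.

```python
# Illustrative sketch only: dense Cholesky solve of (A A^T + sigma I) lam = f.
# cholesky and solve_regularized are hypothetical names.

def cholesky(c):
    """Lower-triangular L with L L^T = c, for symmetric positive definite c."""
    n = len(c)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = c[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = s ** 0.5 if i == j else s / l[j][j]
    return l


def solve_regularized(a, sigma, f):
    """Solve (a a^T + sigma I) lam = f, where a is a list of rows."""
    m = len(a)
    c = [[sum(x * y for x, y in zip(a[i], a[j])) + (sigma if i == j else 0.0)
          for j in range(m)] for i in range(m)]
    l = cholesky(c)
    y = []
    for i in range(m):  # forward substitution: L y = f
        y.append((f[i] - sum(l[i][k] * y[k] for k in range(i))) / l[i][i])
    lam = [0.0] * m
    for i in range(m - 1, -1, -1):  # backward substitution: L^T lam = y
        lam[i] = (y[i] - sum(l[k][i] * lam[k] for k in range(i + 1, m))) / l[i][i]
    return lam


# Here C = [[5.5, 2.0], [2.0, 2.5]] and f was built from lam = [1, -1].
lam = solve_regularized([[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]], 0.5, [3.5, -0.5])
```

Because σ > 0, C is positive definite regardless of the rank of A_F, so the square roots in the factorization are always well defined.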
###
Accelerating an Iterative Eigensolver for Nuclear Structure Configuration Interaction Calculations on GPUs using OpenACC
2021
arXiv
... step and dense linear algebra operations. ... (GPUs), we modified a previously developed hybrid MPI/OpenMP implementation of an eigensolver written in FORTRAN 90 by using an OpenACC directive-based programming model.
###
Communication-Avoiding QR Decomposition for GPUs

2011
2011 IEEE International Parallel & Distributed Processing Symposium
As a result, we outperform CULA, a parallel linear algebra library for GPUs, by up to 13x for tall-skinny matrices. We show that the reduction in memory traffic provided by CAQR allows us to outperform existing parallel GPU implementations of QR for a large class of tall-skinny matrices. The most common example is linear least squares, which is ubiquitous in nearly all branches of science and engineering and can be solved using QR.
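
The link between QR and least squares can be sketched as follows: with A = QR, the minimizer of ||Ax - b|| solves the small triangular system Rx = Q^T b. The example below uses modified Gram-Schmidt in plain Python for brevity; it is not CAQR, and `least_squares_qr` is a hypothetical name. It assumes A has full column rank.

```python
# Illustrative sketch only: least squares via a Gram-Schmidt QR, not CAQR.
# least_squares_qr is a hypothetical name.

def least_squares_qr(a, b):
    """Minimize ||a x - b||_2 for a tall, full-column-rank matrix a."""
    m, n = len(a), len(a[0])
    q = [[float(a[i][j]) for i in range(m)] for j in range(n)]  # columns of a
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):  # modified Gram-Schmidt
        r[j][j] = sum(v * v for v in q[j]) ** 0.5
        q[j] = [v / r[j][j] for v in q[j]]
        for k in range(j + 1, n):
            r[j][k] = sum(u * v for u, v in zip(q[j], q[k]))
            q[k] = [v - r[j][k] * u for u, v in zip(q[j], q[k])]
    qtb = [sum(u * v for u, v in zip(q[j], b)) for j in range(n)]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution: R x = Q^T b
        x[i] = (qtb[i] - sum(r[i][j] * x[j] for j in range(i + 1, n))) / r[i][i]
    return x


# Fit y = x0 + x1 * t to the points (0, 1), (1, 3), (2, 5), (3, 8).
coeffs = least_squares_qr([[1, 0], [1, 1], [1, 2], [1, 3]], [1, 3, 5, 8])
```

For this data the normal equations give an intercept of 0.8 and a slope of 2.3, which the QR route reproduces without ever forming A^T A.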
###
Gravitational Instabilities in the Disks of Massive Protostars as an Explanation for Linear Distributions of Methanol Masers

2001
Astrophysical Journal
This is particularly true for methanol (CH3OH), for which linear distributions of masers are found with disklike kinematics. ... instabilities leads to a complex of intersecting spiral shocks, clumps, and arclets within the disk and to significant time-dependent, nonaxisymmetric distortions of the disk surface. Note the tall ridges of material in the arms.
###
Large Scale Distributed Linear Algebra With Tensor Processing Units
2021
arXiv
Via curated algorithms emphasizing large, single-core matrix multiplications, other tasks in dense linear algebra can similarly scale. ... can multiply two matrices with linear size N = 2^20 = 1 048 576 in about 2 minutes. We consider the case of A given as a dense, full-rank matrix, in which case (7) is typically solved in O(N^3) operations via an initial LU decomposition.
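
The O(N^3) cost mentioned here comes from the triple loop of the factorization, as a minimal sketch of LU with partial pivoting shows. This is a textbook serial version in plain Python, not the distributed TPU algorithm, and the names `lu_factor` and `lu_solve` are hypothetical choices for this example.

```python
# Illustrative sketch only: textbook LU with partial pivoting.
# lu_factor and lu_solve are hypothetical names for this example.

def lu_factor(a):
    """Return (lu, perm): L (unit diagonal, below the diagonal) and U
    (on and above the diagonal) packed in one matrix, with row k of the
    factored matrix corresponding to row perm[k] of a."""
    n = len(a)
    lu = [[float(v) for v in row] for row in a]
    perm = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))  # pivot row
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]  # multiplier, stored in place of L
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm


def lu_solve(lu, perm, b):
    """Solve a x = b given the output of lu_factor(a)."""
    n = len(lu)
    y = [0.0] * n
    for i in range(n):  # forward substitution with the permuted b
        y[i] = b[perm[i]] - sum(lu[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (y[i] - sum(lu[i][j] * x[j] for j in range(i + 1, n))) / lu[i][i]
    return x


# The zero in the top-left corner forces a row exchange.
factors, perm = lu_factor([[0, 2, 1], [1, 1, 1], [2, 1, 3]])
solution = lu_solve(factors, perm, [7, 6, 13])
```

Once the factorization is available, each additional right-hand side costs only the two O(N^2) substitution passes, so the factors can be reused across many solves with the same matrix.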
###
Block Iterative Methods and Recycling for Improved Scalability of Linear Solvers

2016
SC16: International Conference for High Performance Computing, Networking, Storage and Analysis
This work has been supported in part by ANR through project MEDIMAX, ANR-13-MONU-0012. The first author was partially funded by the French Association of Mechanics (AFM) for this work. In the original paper, each linear system is solved with a single right-hand side, i.e. p = 1. Non-variable linear systems. For some time-dependent PDEs, it is necessary to solve sequences of linear systems where the operator is the same throughout the sequence, and only the right-hand sides are ...
###
Efficient Methods for Out-of-Core Sparse Cholesky Factorization

1999
SIAM Journal on Scientific Computing
We find that straightforward implementations of all of them suffer from excessive disk I/O for large problems that arise in interior-point algorithms for linear programming. The effect is that extremely large sparse linear systems can be solved in reasonable time on very inexpensive systems.
###
Parallel distributed-memory simplex for large-scale stochastic LP problems

2013
Computational optimization and applications
We present a parallelization of the revised simplex method for large extensive forms of two-stage stochastic linear programming (LP) problems. It is built on novel analysis of the linear algebra for dual block-angular LP problems when solved by using the revised simplex method and a novel parallel scheme for applying product-form updates. Total time in PRICE per iteration is given on the right. ... the path for efficiently solving stochastic programming problems in these two contexts.
###
Recursive Blocked Algorithms and Hybrid Data Structures for Dense Matrix Library Software

2004
SIAM Review
In fact, the whole gamut of existing dense linear algebra factorization is beginning to be reexamined in view of the recursive paradigm. Novel recursive blocked algorithms offer new ways to compute factorizations such as Cholesky and QR and to solve matrix equations. Some of the main points are the following:
• Recursion creates new algorithms for linear algebra software.
• Recursion can be used to express dense linear algebra algorithms entirely in terms of level ...

###
A high-performance parallel algorithm for nonnegative matrix factorization

2016
Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming - PPoPP '16
It maintains the data and factor matrices in memory (distributed across processors), uses MPI for interprocessor communication, and, in the dense case, provably minimizes communication costs (under mild assumptions). Despite its popularity in the data mining community, there is a lack of efficient distributed algorithms to solve the problem for big data sets.
