96 Hits in 7.1 sec

Persistent Kernels for Iterative Memory-bound GPU Applications [article]

Lingqi Zhang, Mohamed Wahib, Peng Chen, Jintao Meng, Xiao Wang, Satoshi Matsuoka
2022 arXiv   pre-print
large domains), and a Krylov subspace solver (geometric mean speedup of 4.67x in smaller SpMV datasets from SuiteSparse and 1.39x in larger SpMV datasets, for conjugate gradient).  ...  Typical GPU implementations have a loop on the host side that invokes the GPU kernel as many times as there are time/algorithm steps.  ...  We show notable performance improvement for iterative 2D/3D stencils and a conjugate gradient solver on both V100 and A100 over highly optimized baselines.  ... 
arXiv:2204.02064v2 fatcat:campsz22iff5jfdmo7nrth7xje
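The host-side launch loop described in this abstract is exactly the pattern persistent kernels eliminate. A minimal pure-Python sketch of that pattern for a 2D Jacobi stencil, where each `step` call stands in for one kernel launch (the function names and grid setup are illustrative, not from the paper):

```python
# Host-loop pattern for an iterative 2D stencil (5-point Jacobi average).
# Each call to `step` stands in for one GPU kernel launch; the persistent-
# kernel approach fuses this loop into a single launch. Pure-Python
# illustration, not the authors' CUDA implementation.

def step(grid):
    """One Jacobi sweep over the interior of a 2D grid (list of lists)."""
    n, m = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            new[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return new

def run(grid, steps):
    """The host-side loop the persistent-kernel technique removes."""
    for _ in range(steps):
        grid = step(grid)
    return grid
```

With fixed boundary values of 1.0, repeated sweeps relax the interior toward the boundary value, the usual behavior of this memory-bound iteration.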

XAMG: A library for solving linear systems with multiple right-hand side vectors

Boris Krasnopolsky, Alexey Medvedev
2021 SoftwareX  
Keywords: systems of linear algebraic equations; Krylov subspace iterative methods; algebraic multigrid method; multiple right-hand sides; hybrid programming model; MPI+POSIX shared memory. This paper  ...  A corresponding set of numerical methods includes Krylov subspace, algebraic multigrid, Jacobi, Gauss-Seidel, and Chebyshev iterative methods.  ...  The main advantage of iterative methods that perform simultaneous independent solutions for multiple RHSs is the increased arithmetic intensity of the calculations.  ... 
doi:10.1016/j.softx.2021.100695 fatcat:fmfkywojnbdtrltkcmct3f4s3q
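The arithmetic-intensity argument in this abstract can be sketched as a sparse matrix-vector product over a block of right-hand sides: each stored nonzero is loaded once but used once per vector, so flops per byte grow with the block size. A pure-Python CSR illustration (not XAMG's implementation; the function name and layout are assumptions):

```python
# CSR sparse matrix-vector product generalized to a block of m right-hand
# sides. Each stored nonzero (a, j) is loaded once and reused for all m
# vectors, which is the increased-arithmetic-intensity argument made in
# the abstract. Illustrative sketch only.

def spmv_block(indptr, indices, data, X):
    """Y = A @ X for a CSR matrix A and a list of column vectors X."""
    n_rows = len(indptr) - 1
    m = len(X)                          # number of right-hand sides
    Y = [[0.0] * n_rows for _ in range(m)]
    for i in range(n_rows):
        for k in range(indptr[i], indptr[i + 1]):
            a, j = data[k], indices[k]  # one matrix load ...
            for r in range(m):          # ... amortized over all m vectors
                Y[r][i] += a * X[r][j]
    return Y
```

For m = 1 this is an ordinary SpMV; the matrix traffic is identical for any m, so the flop-to-byte ratio scales roughly linearly in m until vector traffic dominates.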

Iterative Krylov solution methods for geophysical electromagnetic simulations on throughput-oriented processing units

Michael Commer, Filipe RNC Maia, Gregory A Newman
2011 The international journal of high performance computing applications  
For such systems, we have implemented three common iterative Krylov solution methods on graphics processing units and compare their performance with parallel host-based versions.  ...  We also thank the NERSC staff for support and computing time on a GPU cluster. This work was also supported by the Director,  ...  We greatly acknowledge the Chevron Energy Technology Corporation and the Petascale Initiative in Computational Science at the National Energy Research Scientific Computing Center (NERSC) for providing base  ... 
doi:10.1177/1094342011428145 fatcat:hug7oehhzvfk7e4g4jfa3lsjie

XAMG: A library for solving linear systems with multiple right-hand side vectors [article]

Boris Krasnopolsky, Alexey Medvedev
2021 arXiv   pre-print
A corresponding set of numerical methods includes Krylov subspace, algebraic multigrid, Jacobi, Gauss-Seidel, and Chebyshev iterative methods.  ...  XAMG's own implementation of the solve phase of the iterative methods provides up to a twofold speedup compared to hypre for the tests performed.  ...  Among the few exceptions is the Trilinos library [8], containing implementations of several Krylov subspace and aggregation-based algebraic multigrid methods.  ... 
arXiv:2103.07329v1 fatcat:wzcogxmg7vdcbhmbknueee2t5u

High-performance parallel implicit CFD

William D Gropp, Dinesh K Kaushik, David E Keyes, Barry F Smith
2001 Parallel Computing  
Fluid dynamical simulations based on finite discretizations on quasi-static grids scale well in parallel, but execute at a disappointing percentage of per-processor peak floating point operation rates without  ...  This snapshot of ongoing work updates our 1999 Bell Prize-winning simulation on ASCI computers.  ...  Computer time was supplied by Argonne National Laboratory, Lawrence Livermore National Laboratory, NERSC, Sandia National Laboratories, and SGI-Cray.  ... 
doi:10.1016/s0167-8191(00)00075-2 fatcat:boiejxon5vhe5mfh2bumonwhti

Block Iterative Methods and Recycling for Improved Scalability of Linear Solvers

Pierre Jolivet, Pierre-Henri Tournier
2016 SC16: International Conference for High Performance Computing, Networking, Storage and Analysis  
Nataf for the discussions about domain decomposition methods for Maxwell's equation and X. Vasseur for the discussions about recycling strategies and block methods.  ...  The contribution of this paper is threefold, we present: • a uniform implementation of a pseudo-block and block Krylov solver based on an existing theoretical work [22]  ...  of recycled Krylov subspaces.  ... 
doi:10.1109/sc.2016.16 dblp:conf/sc/JolivetT16 fatcat:dhydaneaarcyfgjhvjwcmdofzu

Auto-tuning the Matrix Powers Kernel with SEJITS [chapter]

Jeffrey Morlan, Shoaib Kamil, Armando Fox
2013 Lecture Notes in Computer Science  
...  called a Krylov subspace.  ...  Introduction Krylov subspace methods (KSMs) are iterative algorithms in linear algebra used to solve linear systems (given matrix A and vector b, solve Ax = b for x) or to find eigenvalues and eigenvectors  ... 
doi:10.1007/978-3-642-38718-0_36 fatcat:dm5445ny35gy3ctnmcux43jxxi
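The definition of Krylov subspace methods quoted in this entry (iteratively solve Ax = b) is most simply illustrated by conjugate gradient, the canonical KSM for symmetric positive-definite systems. A minimal dense pure-Python sketch, for illustration only; the paper's matrix powers kernel targets the sparse, communication-avoiding setting:

```python
# Minimal conjugate gradient for a symmetric positive-definite system
# Ax = b, given as a dense list-of-lists matrix. Each iteration extends
# the Krylov subspace span{b, Ab, A^2 b, ...} by one vector.
# Illustrative sketch, not the paper's auto-tuned kernel.

def cg(A, b, tol=1e-10, max_iter=1000):
    n = len(b)
    x = [0.0] * n
    r = b[:]                            # r = b - A x0 with x0 = 0
    p = r[:]
    rs = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rs / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new ** 0.5 < tol:         # residual small enough: done
            break
        p = [r[i] + (rs_new / rs) * p[i] for i in range(n)]
        rs = rs_new
    return x
```

In exact arithmetic CG terminates in at most n iterations; in practice it is stopped once the residual norm falls below a tolerance, as above.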

Matrix-free GPU implementation of a preconditioned conjugate gradient solver for anisotropic elliptic PDEs [article]

Eike Mueller, Xu Guo, Robert Scheichl, Sinan Shi
2013 arXiv   pre-print
We achieve this by using a matrix-free implementation which does not require explicit storage of the matrix and instead recalculates the local stencil.  ...  Graphics Processing Units have been shown to be highly efficient for a wide range of applications in scientific computing, and recently iterative solvers have been parallelised on these architectures.  ...  The numerical experiments in this work were carried out on a node of the aquila supercomputer at the University of Bath and we are grateful to Steven Chapman for his continuous and tireless technical support  ... 
arXiv:1302.7193v1 fatcat:migpyc2kbrdy7bdfx7tebfhdpe
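The matrix-free idea described in this abstract, recomputing the local stencil on the fly rather than storing an assembled sparse matrix, can be sketched for the simplest case: a 2D 5-point Laplacian with zero Dirichlet boundaries. The paper's operator is anisotropic and GPU-resident; this pure-Python version is only a sketch of the principle:

```python
# Matrix-free application of a 2D 5-point Laplacian with zero Dirichlet
# boundary conditions: the stencil coefficients are recomputed per point
# instead of being read from a stored matrix, trading memory traffic for
# (cheap) recomputation, as the abstract describes. Illustrative only.

def apply_laplacian(u, n, h):
    """y = A u for the n*n interior unknowns u (flattened row-major)."""
    y = [0.0] * (n * n)
    inv_h2 = 1.0 / (h * h)
    for i in range(n):
        for j in range(n):
            idx = i * n + j
            c = 4.0 * u[idx]            # center of the 5-point stencil
            if i > 0:      c -= u[idx - n]   # north neighbor
            if i < n - 1:  c -= u[idx + n]   # south neighbor
            if j > 0:      c -= u[idx - 1]   # west neighbor
            if j < n - 1:  c -= u[idx + 1]   # east neighbor
            y[idx] = c * inv_h2
    return y
```

Because no matrix is stored, the memory footprint is just the two vectors, which is what makes the approach attractive on memory-limited GPUs.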

PROSPECTS FOR CFD ON PETAFLOPS SYSTEMS [chapter]

David E. KEYES, Dinesh K. KAUSHIK, Barry F. SMITH
1998 Computational Fluid Dynamics Review 1998  
Fortunately, data use in most real programs has sufficient temporal and spatial locality to allow a distributed and hierarchical memory system, and this locality must be exploited at some level (by a combination  ...  A back-of-the-envelope parallel complexity analysis focuses on the latency of global synchronization steps in the implicit algorithm.  ...  Acknowledgements The authors owe a large debt of gratitude to W. Kyle Anderson  ... 
doi:10.1142/9789812812957_0060 fatcat:qwy6thxuzvgrto4jge2zsht66u

Prospects for CFD on Petaflops Systems [chapter]

David E. Keyes, Dinesh K. Kaushik, Barry F. Smith
2000 IMA Volumes in Mathematics and its Applications  
Fortunately, data use in most real programs has sufficient temporal and spatial locality to allow a distributed and hierarchical memory system, and this locality must be exploited at some level (by a combination  ...  A back-of-the-envelope parallel complexity analysis focuses on the latency of global synchronization steps in the implicit algorithm.  ...  Acknowledgements The authors owe a large debt of gratitude to W. Kyle Anderson  ... 
doi:10.1007/978-1-4612-1176-1_11 fatcat:ibsclde5x5g6pefdmkoumgr7ou

GPU implementation of a Helmholtz Krylov solver preconditioned by a shifted Laplace multigrid method

H. Knibbe, C.W. Oosterlee, C. Vuik
2011 Journal of Computational and Applied Mathematics  
A Helmholtz equation in two dimensions discretized by a second order finite difference scheme is considered. Krylov methods such as Bi-CGSTAB and IDR(s) have been chosen as solvers.  ...  Since the convergence of the Krylov solvers deteriorates with increasing wave number, a shifted Laplace multigrid preconditioner is used to improve the convergence.  ...  After discretization of Eq. (1) on Ω_h using central finite differences we get the following linear system of equations: Aφ = g, A ∈ C^{N×N}, φ, g ∈ C^N. (5) The matrix A is based on the following stencil  ... 
doi:10.1016/j.cam.2011.07.021 fatcat:5ewml7w6xzdi7g57zkircv5j3i
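The 5-point stencil behind the linear system Aφ = g in this abstract, together with the complex shift used by the shifted-Laplace preconditioner, can be written down directly. The coefficients below follow the standard textbook convention for the operator -Δu - k²u on a uniform grid of spacing h; the exact signs and shift parameters in the paper may differ:

```python
# Stencil coefficients for the 2D Helmholtz operator -Laplace(u) - k^2 u
# under second-order central differences on a uniform grid of spacing h.
# A nonzero complex `shift` (e.g. 0.5j) gives a shifted-Laplace stencil of
# the kind used as a multigrid preconditioner. Standard-convention sketch;
# not guaranteed to match the paper's exact sign/shift choices.

def helmholtz_stencil(h, k, shift=0.0):
    """Return (center, neighbor) coefficients of the 5-point stencil."""
    center = 4.0 / h**2 - (1.0 - shift) * k**2
    neighbor = -1.0 / h**2
    return center, neighbor
```

With k = 0 this reduces to the familiar Laplacian stencil {4/h² center, -1/h² neighbors}; increasing k makes the center coefficient indefinite, which is why plain Krylov convergence deteriorates with the wave number.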

Matrix-free GPU implementation of a preconditioned conjugate gradient solver for anisotropic elliptic PDEs

Eike Müller, Xu Guo, Robert Scheichl, Sinan Shi
2013 Computing and Visualization in Science  
iterative solvers have been parallelised on these architectures.  ...  The elliptic solve is often the bottleneck of the forecast, and to meet operational requirements an algorithmically optimal method has to be used and implemented efficiently.  ...  The numerical experiments in this work were carried out on a node of the aquila supercomputer at the University of Bath and we are grateful to Steven Chapman for his continuous and tireless technical support  ... 
doi:10.1007/s00791-014-0223-x fatcat:66fe4fa4kfbezo462oe5yen3rm

Achieving high sustained performance in an unstructured mesh CFD application

W. K. Anderson, W. D. Gropp, D. K. Kaushik, D. E. Keyes, B. F. Smith
1999 Proceedings of the 1999 ACM/IEEE conference on Supercomputing (CDROM) - Supercomputing '99  
This paper highlights a three-year project by an interdisciplinary team on a legacy F77 computational fluid dynamics code, with the aim of demonstrating that implicit unstructured grid simulations can  ...  This snapshot of ongoing work shows a performance of 15 microseconds per degree of freedom to steady-state convergence of Euler flow on a mesh with 2.8 million vertices using 3072 dual-processor nodes of  ...  A good preconditioner saves time and space by permitting fewer iterations in the Krylov loop and smaller storage for the Krylov subspace.  ... 
doi:10.1145/331532.331600 dblp:conf/sc/AndersonGKKS99 fatcat:qfmfuhhemjb3nd3ybulnj6nkyu
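The point quoted in this entry, that a good preconditioner permits fewer iterations in the Krylov loop, can be demonstrated with the simplest preconditioner of all: a Jacobi (diagonal) scaling inside conjugate gradient on an ill-scaled SPD system. This is an illustrative sketch, not the paper's preconditioning setup; the test matrix below is made up:

```python
# Preconditioned conjugate gradient: Minv maps r -> M^{-1} r. With the
# identity (Minv=None) this is plain CG. On a matrix whose diagonal spans
# several orders of magnitude, Jacobi scaling typically cuts the iteration
# count sharply. Illustrative sketch; the matrix is synthetic.

def pcg(A, b, Minv=None, tol=1e-10, max_iter=500):
    """Return (x, iterations) for the SPD system A x = b."""
    n = len(b)
    if Minv is None:
        Minv = lambda v: v[:]
    x = [0.0] * n
    r = b[:]
    z = Minv(r)
    p = z[:]
    rz = sum(r[i] * z[i] for i in range(n))
    for it in range(1, max_iter + 1):
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rz / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        if sum(ri * ri for ri in r) ** 0.5 < tol:
            return x, it
        z = Minv(r)
        rz_new = sum(r[i] * z[i] for i in range(n))
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x, max_iter

# Synthetic ill-scaled SPD matrix: diagonal 1..n^2 with weak coupling.
n = 30
A = [[0.0] * n for _ in range(n)]
for i in range(n):
    A[i][i] = float((i + 1) ** 2)
    if i + 1 < n:
        A[i][i + 1] = A[i + 1][i] = 1.0
b = [1.0] * n

def jacobi(r):
    return [r[i] / A[i][i] for i in range(n)]

x_plain, it_plain = pcg(A, b)
x_jac, it_jac = pcg(A, b, Minv=jacobi)
# Jacobi scaling equilibrates the diagonal, so it_jac is typically much
# smaller than it_plain for this kind of matrix.
```

The "smaller storage" half of the quoted claim applies to restarted or recycled methods such as GMRES, where fewer iterations directly shrink the stored Krylov basis.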

Applying the Roofline Performance Model to the Intel Xeon Phi Knights Landing Processor [chapter]

Douglas Doerfler, Jack Deslippe, Samuel Williams, Leonid Oliker, Brandon Cook, Thorsten Kurth, Mathieu Lobet, Tareq Malas, Jean-Luc Vay, Henri Vincenti
2016 Lecture Notes in Computer Science  
This library contains a Fortran kernel based on WARP with optimized subroutines. These high-performance subroutines are interfaced with a Python class that can be imported and used in WARP scripts.  ...  For this purpose, EMGeo uses an Induced Dimension Reduction (IDR) Krylov subspace solver. The Sparse Matrix Vector (SpMV) product is responsible for two thirds of the total runtime.  ... 
doi:10.1007/978-3-319-46079-6_24 fatcat:4lfmybdu5bdlfotf7ej3n56iq4
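The roofline model applied in this entry bounds attainable performance by min(peak compute, arithmetic intensity × memory bandwidth). A sketch with placeholder machine numbers; the peak and bandwidth values are hypothetical, not measured Knights Landing figures from the paper:

```python
# Roofline bound: attainable GFLOP/s = min(peak_gflops, AI * bandwidth),
# where AI is arithmetic intensity in flops per byte. Kernels left of the
# ridge point (AI < peak / bandwidth) are memory-bound, like the SpMV
# dominating EMGeo's runtime. Machine numbers below are placeholders.

def roofline(ai, peak_gflops, bw_gbytes):
    """Attainable GFLOP/s for a kernel of arithmetic intensity `ai`."""
    return min(peak_gflops, ai * bw_gbytes)

# Example: a memory-bound SpMV-like kernel (AI ~ 0.2 flop/byte) versus a
# compute-bound dense kernel (AI ~ 10 flop/byte), assuming a machine with
# 2000 GFLOP/s peak and 400 GB/s bandwidth (hypothetical values).
low  = roofline(0.2, 2000.0, 400.0)   # bandwidth-limited: 0.2 * 400 = 80
high = roofline(10.0, 2000.0, 400.0)  # compute-limited: capped at 2000
```

The ridge point, peak / bandwidth = 5 flop/byte in this example, separates the two regimes and tells you whether optimizing flops or memory traffic will pay off.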

Trends in Algorithms for Nonuniform Applications on Hierarchical Distributed Architectures [chapter]

David E. Keyes
2000 Computational Aerosciences in the 21st Century  
For this purpose, pseudo-transient Newton-Krylov-Schwarz methods are briefly introduced and their parallel scalability in bulk synchronous SPMD applications is explored. We also indicate some funda-  ...  program execution.  ...  Argonne National Laboratory, Lawrence Livermore National Laboratory (B341996), and the Engineering and Physical Sciences Research Council (EPSRC) of the U.K. on a travel grant through the University  ... 
doi:10.1007/978-94-010-0948-5_6 fatcat:7y2rsfl5frdwhemru5agrzj4ge
Showing results 1 — 15 out of 96 results