
Linear algebra software for large-scale accelerated multicore computing

A. Abdelfattah, H. Anzt, J. Dongarra, M. Gates, A. Haidar, J. Kurzak, P. Luszczek, S. Tomov, I. Yamazaki, A. YarKhan
2016 Acta Numerica  
Here we present the state-of-the-art design and implementation practices for the acceleration of the predominant linear algebra algorithms on large-scale accelerated multicore systems.  ...  of high interest for accelerated multicore systems.  ...
doi:10.1017/s0962492916000015 fatcat:cwsstweghjaj7ff6fu62lmn6ce

Dense linear algebra solvers for multicore with GPU accelerators

Stanimire Tomov, Rajib Nath, Hatem Ltaief, Jack Dongarra
2010 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW)  
We describe current efforts toward the development of these critical solvers in the area of dense linear algebra (DLA) for multicore with GPU accelerators.  ...  Solving dense linear systems of equations is a fundamental problem in scientific computing.  ...  ACKNOWLEDGMENT. The authors would like to thank the National Science Foundation, Microsoft Research, and NVIDIA for supporting this research effort.  ...
doi:10.1109/ipdpsw.2010.5470941 dblp:conf/ipps/TomovNLD10 fatcat:gidjy5qpwrd4jgrmmerja4qg2i

A Scalable High Performant Cholesky Factorization for Multicore with GPU Accelerators [chapter]

Hatem Ltaief, Stanimire Tomov, Rajib Nath, Peng Du, Jack Dongarra
2011 Lecture Notes in Computer Science  
We show an approach that is largely based on software infrastructures that have already been developed for homogeneous multicores and hybrid GPU-based computing.  ...  We present a Cholesky factorization for multicore systems with GPU accelerators.  ...  We show an approach that is largely based on software infrastructures that have already been developed, namely, the Parallel Linear Algebra for Scalable Multicore Architectures (PLASMA) [6] and MAGMA  ...
doi:10.1007/978-3-642-19328-6_11 fatcat:qd6j6a2hifdqzjdtyxpx7lz6wy

Flexible Linear Algebra Development and Scheduling with Cholesky Factorization

Azzam Haidar, Asim YarKhan, Chongxiao Cao, Piotr Luszczek, Stanimire Tomov, Jack Dongarra
2015 2015 IEEE 17th International Conference on High Performance Computing and Communications, 2015 IEEE 7th International Symposium on Cyberspace Safety and Security, and 2015 IEEE 12th International Conference on Embedded Software and Systems  
We demonstrate the effectiveness of our approach with performance results for dense linear algebra applications, specifically the Cholesky factorization.  ...  Modern high performance computing environments are composed of networks of compute nodes that often contain a variety of heterogeneous compute resources, such as multicore CPUs, GPUs, and coprocessors.  ...  We believe that the compute nodes of large-scale machines will take a mixed-core approach to hardware, combining multicores and GPUs or coprocessors, each of which is appropriate for different work granularities  ...
doi:10.1109/hpcc-css-icess.2015.285 dblp:conf/hpcc/HaidarYCLTD15 fatcat:x66ibo3q75gbbfgjf6k43yrynq

Toward a Multi-level Parallel Framework on GPU Cluster with PetSC-CUDA for PDE-based Optical Flow Computation

S. Cuomo, A. Galletti, G. Giunta, L. Marcellino
2015 Procedia Computer Science  
In this work we present a multi-level parallel framework for the Optical Flow computation on a GPU cluster, equipped with a scientific computing middleware (the PETSc library).  ...  that is suitable for heterogeneous computing environments (multiprocessor, single GPU, and cluster of GPUs).  ...  The basic idea of our parallel software for implementing Algorithm 1 is to follow a domain decomposition approach for the outer iteration task of the algorithm, while the linear algebra of the internal  ...
doi:10.1016/j.procs.2015.05.220 fatcat:onyqy2t4c5bcvfdwn356g7vge4

Accelerating the Conjugate Gradient Algorithm with GPUs in CFD Simulations [chapter]

Hartwig Anzt, Marc Baboulin, Jack Dongarra, Yvan Fournier, Frank Hülsemann, Amal Khabou, Yushan Wang
2017 Lecture Notes in Computer Science  
For sparse linear systems arising from finite volume discretization, we evaluate and optimize the performance of Conjugate Gradient (CG) routines designed for manycore accelerators and compare against  ...  This paper illustrates how GPU computing can be used to accelerate computational fluid dynamics (CFD) simulations.  ...  We are grateful to Karl Rupp (TU Wien) for his support in using the ViennaCL library.  ...
doi:10.1007/978-3-319-61982-8_5 fatcat:ssxa5w6oz5efrnpuf3n745os3y

A survey of power and energy efficient techniques for high performance numerical linear algebra operations

Li Tan, Shashank Kothapalli, Longxiang Chen, Omar Hussaini, Ryan Bissiri, Zizhong Chen
2014 Parallel Computing  
This paper surveys the research on saving power and energy for numerical linear algebra algorithms in high performance scientific computing on supercomputers around the world.  ...  We first stress the significance of numerical linear algebra algorithms in high performance scientific computing nowadays, followed by a background introduction on widely used numerical linear algebra  ...  For the purpose of scientific computing, ScaLAPACK [4] and DPLASMA [5] are two extensively used high performance and scalable numerical linear algebra software libraries for distributed-memory multicore  ... 
doi:10.1016/j.parco.2014.09.001 fatcat:twdkr2hrizebvglto6dwd7jqem

Introducing Scalable Quantum Approaches in Language Representation [chapter]

Peter Wittek, Sándor Darányi
2011 Lecture Notes in Computer Science  
High-performance computational resources and distributed systems are crucial for the success of real-world language technology applications.  ...  SQUALAR aims to match quantum algorithms with heterogeneous computing to develop new formalisms of information representation for natural language processing in quantum environments.  ...  This is actually a requirement for good performance: the software must use a large number of threads.  ... 
doi:10.1007/978-3-642-24971-6_2 fatcat:emliiuolnzdtpnhfflc7wsmkde

CPU and GPU Performance of Large Scale Numerical Simulations in Geophysics [chapter]

Ali Dorostkar, Dimitar Lukarski, Björn Lund, Maya Neytcheva, Yvan Notay, Peter Schmidt
2014 Lecture Notes in Computer Science  
These packages provide toolboxes with state-of-the-art implementations of iterative solution methods and preconditioners for multicore computer platforms and GPUs.  ...  In this work we benchmark the performance of a preconditioned iterative method, used in large-scale computer simulations of a geophysical application, namely, the elastic Glacial Isostatic Adjustment model  ...  Acknowledgments. This work has been supported by the Linnaeus center of excellence UPMARC, Uppsala Programming for Multicore Architectures Research Center.  ...
doi:10.1007/978-3-319-14325-5_2 fatcat:gm6ukwbp6bbv5oedbjx7fv5eqe

Accelerating Relevance-Vector-Machine-Based Classification of Hyperspectral Image with Parallel Computing

Chao Dong, Lianfang Tian
2012 Mathematical Problems in Engineering  
The sparse property means that prediction takes much less time, which makes RVM a promising candidate for classifying large-scale hyperspectral images.  ...  The parallel RVMs are implemented in C using the parallel functions of linear algebra packages and the Message Passing Interface (MPI) library.  ...  The parallel functions have been realized in many parallel linear algebra packages, such as Intel's Math Kernel Library (MKL) and Automatically Tuned Linear Algebra Software (ATLAS).  ...
doi:10.1155/2012/252979 fatcat:qfroy2gbqne4rblgwh7s3aprmq

Guest Editorial: Application Specific Processors and Architectures

Melissa C. Smith, Kubilay Atasu
2014 Journal of Signal Processing Systems  
The methods also support zero-overhead looping not only for the innermost loops but also for arbitrarily nested loops.  ...  The first paper in the compilers and operating systems category "Compact Code Generation for Tightly-Coupled Processor Arrays", by Boppu et al., presents methods for code compaction and generation for  ...  Using a highly efficient hybrid linear algebra/FFT core, the authors co-design the on-chip memory hierarchy, on-chip interconnect, and FFT algorithms for the multicore FFT processor.  ... 
doi:10.1007/s11265-014-0934-8 fatcat:eezcddmvffaybprjc45hms2a5e

Parallel Programming Models for Dense Linear Algebra on Heterogeneous Systems

2016 Supercomputing Frontiers and Innovations  
We present a review of the current best practices in parallel programming models for dense linear algebra (DLA) on heterogeneous architectures.  ...  that current applications need, in order to motivate work and future directions for the next generation of parallel programming models for high-performance linear algebra libraries on heterogeneous systems  ...  PLASMA and the task approach: Parallel Linear Algebra Software for Multicore Architectures (PLASMA) [2] was developed to address the performance deficiency of the LAPACK library on multicore processors  ...
doi:10.14529/jsfi150405 fatcat:avnmwu4dozdmjksknrlznhpv7u

High-performance dynamic quantum clustering on graphics processors

Peter Wittek
2013 Journal of Computational Physics  
In this paper, we develop an implementation on graphics hardware and investigate how this approach can accelerate the computations.  ...  We achieve a speedup of up to two orders of magnitude over a multicore CPU implementation, which proves that quantum-like methods and acceleration by graphics processing units have a great relevance to machine  ...  This makes DQC a good candidate for acceleration and large-scale deployment.  ...
doi:10.1016/ fatcat:4lzs2shgwncofnlmrykqxkhnfi

Heterogenous Acceleration for Linear Algebra in Multi-coprocessor Environments [chapter]

Azzam Haidar, Piotr Luszczek, Stanimire Tomov, Jack Dongarra
2015 Lecture Notes in Computer Science  
The model incorporates some of the current best design and implementation practices for the heterogeneous acceleration of dense linear algebra (DLA).  ...  We present an efficient and scalable programming model for the development of linear algebra in heterogeneous multi-coprocessor environments.  ...  Our runtime model is built on the QUARK [10] superscalar execution environment that has been originally used with great success for linear algebra software on just multicore platforms [5].  ...
doi:10.1007/978-3-319-17353-5_3 fatcat:2k2x5hof5naednpkprtj72sf3q

A Framework for Batched and GPU-Resident Factorization Algorithms Applied to Block Householder Transformations [chapter]

Azzam Haidar, Tingxing Tim Dong, Stanimire Tomov, Piotr Luszczek, Jack Dongarra
2015 Lecture Notes in Computer Science  
The hybrid CPU-GPU algorithms rely heavily on using the multicore CPU for specific parts of the workload.  ...  Compared to a batched QR factorization featured in the CUBLAS library for GPUs, we achieved up to 5× speedup on the K GPU. Historically, similar issues were associated with strong scaling [ ] and were  ...  The emergence of large-scale, heterogeneous systems with GPU accelerators and coprocessors has made the near total absence of linear algebra software for such small matrix operations especially noticeable  ...
doi:10.1007/978-3-319-20119-1_3 fatcat:hgupli7firaqhebgiuojf35xwq
Showing results 1–15 of 1,740