959 Hits in 3.6 sec

Achieving Performance Portability for a Heat Conduction Solver Mini-Application on Modern Multi-core Systems

Richard O. Kirk, Gihan R. Mudalige, Istvan Z. Reguly, Steven A. Wright, Matt J. Martineau, Stephen A. Jarvis
2017 2017 IEEE International Conference on Cluster Computing (CLUSTER)  
C. Kokkos and RAJA: Kokkos and RAJA are both C++ template libraries, designed with a similar goal to OPS. Through template metaprogramming, they aim to add portability to applications.  ...  Both OPS and RAJA achieved very similar performance portability scores across both CPU architectures, with only a 2.19% difference.  ... 
doi:10.1109/cluster.2017.122 dblp:conf/cluster/KirkMRWMJ17 fatcat:dgmpx6st4be3jfbb5mpb7k6vle

An Evaluation of Emerging Many-Core Parallel Programming Models

Matt Martineau, Simon McIntosh-Smith, Mike Boulton, Wayne Gaudin
2016 Proceedings of the 7th International Workshop on Programming Models and Applications for Multicores and Manycores - PMAM'16  
We find that the best performance is achieved with device-tuned implementations but that, in many cases, the performance portable models are able to solve the same problems to within a 5-20% performance  ...  In this work we directly evaluate several emerging parallel programming models: Kokkos, RAJA, OpenACC, and OpenMP 4.0, against the mature CUDA and OpenCL APIs.  ...  The authors would like to extend our gratitude to David Beckingsale at Lawrence Livermore National Laboratory for his support with the RAJA port, and Christian Trott from Sandia National Laboratory for  ... 
doi:10.1145/2883404.2883420 dblp:conf/ppopp/MartineauMBG16 fatcat:6p2cbco3vbgx5bgxbuoywx4tse

Programming Models for Performance Portability (SIAM CSE'21) [article]

Rajeev Thakur
2021 figshare.com  
The Exascale Computing Project (ECP) supports the development of several programming models and runtime libraries that applications can use to achieve performance portability across diverse exascale platforms  ...  via C++ abstractions • Support complex node architectures with multiple types of execution resources and multilevel memories • Many ECP applications use Kokkos and RAJA to write portable code for a variety  ...  Argo (Beckman, ANL): low-level resource management for the operating system and runtime • OpenMP • Kokkos and RAJA: C++ performance portability abstractions developed at Sandia and Livermore labs  ... 
doi:10.6084/m9.figshare.14132336.v1 fatcat:cmquasc75baplnr5xlydomoneq

The Design and Implementation of OpenMP 4.5 and OpenACC Backends for the RAJA C++ Performance Portability Layer [chapter]

William Killian, Tom Scogland, Adam Kunen, John Cavazos
2018 Lecture Notes in Computer Science  
Portability abstraction layers such as RAJA enable users to quickly change how a loop nest is executed with minimal modifications to high-level source code.  ...  In this work, we introduce an updated implementation of two directive-based backends which helps mitigate the aforementioned version explosion problem by leveraging the C++ type system and template meta-programming  ...  Performance Portability Layers make a limited set of assumptions about programs and allow a user to represent a program as an embedded domain specific language.  ... 
doi:10.1007/978-3-319-74896-2_4 fatcat:ucwjqprudjh4jfbwm7el7wp6hi

Evaluating attainable memory bandwidth of parallel programming models via BabelStream

Matt Martineau, Simon McIntosh-Smith, James Price, Tom Deakin
2017 International Journal of Computational Science and Engineering (IJCSE)  
The choice of one programming model over another should ideally not limit the performance that can be achieved on a device.  ...  We augment the standard set of STREAM kernels with a dot product kernel to investigate the performance of simple reduction operations on large arrays.  ...  The authors extend their thanks to Si Hammond and Sandia National Labs; the results used are run on the Sandia ASC Architecture Test Beds program.  ... 
doi:10.1504/ijcse.2017.10011352 fatcat:rn76twry4fd7jcfwlqlixv7ulu

Martin-CSE21.pdf [article]

Daniel Martin
2021 figshare.com  
In this talk, an overview of the approaches to performance portability taken by these projects will be presented.  ...  portability needs.  ...  C++ • Legacy components in Fortran • YAKL ("Yet Another Kernel Launcher"): a "Fortran-friendly" C++ performance portability layer, compatible with Kokkos • Illustration of the "superparameterization" cloud-resolving  ... 
doi:10.6084/m9.figshare.14153717.v1 fatcat:kioz7cyzmrdntbriwp7akp6une

Evaluation of performance portability frameworks for the implementation of a particle-in-cell code [article]

Victor Artigues, Katharina Kormann, Markus Rampp, Klaus Reuter
2019 arXiv   pre-print
This paper reports on an in-depth evaluation of the performance portability frameworks Kokkos and RAJA with respect to their suitability for the implementation of complex particle-in-cell (PIC) simulation  ...  Both Kokkos and RAJA appear mature, are usable for complex codes, and keep their promise to provide performance portability across different architectures.  ...  With OpenMP [4], OpenACC [5], and OpenCL [6], to name the most relevant and widespread ones, there is a set of language extensions to C and Fortran available that (at least partly) offer portable programming  ... 
arXiv:1911.08394v1 fatcat:ae5m6w4thfb3xpx5seh2m7yjmi

GPU-STREAM v2.0: Benchmarking the Achievable Memory Bandwidth of Many-Core Processors Across Diverse Parallel Programming Models [chapter]

Tom Deakin, James Price, Matt Martineau, Simon McIntosh-Smith
2016 Lecture Notes in Computer Science  
performance portability.  ...  Whatever definition of 'performance portability' one might wish, a performance portable code must also at least be functionally portable across different devices.  ... 
doi:10.1007/978-3-319-46079-6_34 fatcat:6l3mt6xn65d67fwmemxiaqf5um

Pointers Inside Lambda Closure Objects in OpenMP Target Offload Regions

David Truby, Carlo Bertolli, Steven A. Wright, Gheorghe-Teodor Bercea, Kevin O'Brien, Stephen A. Jarvis
2018 2018 IEEE/ACM 5th Workshop on the LLVM Compiler Infrastructure in HPC (LLVM-HPC)  
However, lower level programming APIs that these implementations must use are often designed with C in mind and do not specify how they interact with C++ features such as lambda expressions.  ...  Abstract: With the diversification of HPC architectures beyond traditional CPU-based clusters, a number of new frameworks for performance portability across architectures have arisen.  ...  ACKNOWLEDGEMENTS This work was completed as part of IBM's 2017 Summer Research Internship Program.  ... 
doi:10.1109/llvm-hpc.2018.8639410 fatcat:vd5mpxhicjfefmt6hvf3neavwm

FLASH 1.0: A Software Framework for Rapid Parallel Deployment and Enhancing Host Code Portability in Heterogeneous Computing [article]

Michael Riera, Masudul Hassan Quraishi, Erfan Bank Tavakoli, Fengbo Ren
2021 arXiv   pre-print
portability with a normalized framework overhead between 1% and 13% of the total kernel runtime.  ...  In this paper, we present FLASH 1.0, a C++-based software framework for rapid parallel deployment and enhancing host code portability in heterogeneous computing.  ...  With partial reflection, virtual dispatching and meta-programming, FLASH provides 100% framework portability across all categories.  ... 
arXiv:2106.13645v1 fatcat:fuu2rc6abffhnibbv7wuc5hn4q

Views on Software Sustainability from a Computing Facility Perspective [article]

Judith Hill
2020 Figshare  
Recommended strategies for portability include: • Avoid proprietary programming models when possible • Directives-based or abstraction approaches may offer portability with little-to-no performance loss  ...  RAJA: allows incremental enhancements to codes; many back-ends; only supports C++; lacks data "views" for more advanced portability requirements.  ...  This work was performed under the auspices of the U.S. DOE by the Oak Ridge Leadership Computing Facility at ORNL under contracts DE-AC05-00OR22725  ... 
doi:10.6084/m9.figshare.11871969 fatcat:3dw7kb4blfbnbk3xyflf34zd54

Matrix-free approaches for GPU acceleration of a high-order finite element hydrodynamics application using MFEM, Umpire, and RAJA [article]

Arturo Vargas, Thomas M. Stitt, Kenneth Weiss, Vladimir Z. Tomov, Jean-Sylvain Camier, Tzanio Kolev, Robert N. Rieben
2021 arXiv   pre-print
In this work we discuss our co-design strategy to address these challenges and achieve performance and portability with MARBL, a next-generation multi-physics code in development at Lawrence Livermore  ...  With the introduction of advanced heterogeneous computing architectures based on GPU accelerators, large-scale production codes have had to rethink their numerical algorithms and incorporate new programming  ...  Acknowledgments This work performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. LLNL-JRNL-829593.  ... 
arXiv:2112.07075v1 fatcat:itdrddooobfqjnuw23vdftpwz4

How ECP Software Technologies and Math Libraries are Working Toward Performance Portability at Exascale [article]

Lois Curfman McInnes
2021 figshare.com  
This presentation will provide an overview of approaches being used by ECP Software Technology teams, with emphasis on ECP mathematical libraries, to address performance portability while working toward  ...  algorithms and data structures for efficient and scalable performance.  ...  • Use of portable programming models that provide abstractions: Kokkos (Trilinos, …), RAJA (SUNDIALS, …), OpenMP 5.x, OpenCL, … • Portability strategies of ECP math libraries • xSDK: Primary delivery  ... 
doi:10.6084/m9.figshare.14156903.v1 fatcat:t34u2qaltng65dr6eb5trdbavm

Enabling GPU Accelerated Computing in the SUNDIALS Time Integration Library [article]

Cody J. Balos, David J. Gardner, Carol S. Woodward, Daniel R. Reynolds
2020 arXiv   pre-print
We also present performance results for several of the features on the Summit supercomputer and early access hardware for the Frontier supercomputer, which demonstrate negligible performance overhead resulting  ...  This effort has resulted in several new GPU-enabled implementations of core SUNDIALS data structures, support for programming paradigms which are aware of the heterogeneous architectures, and the introduction  ...  rent performance portable GPU programming models (RAJA and OpenMP offloading) can still provide significant speedup with vector data structures  ...  To facilitate these use cases, SUNDIALS needed to be equipped  ... 
arXiv:2011.12984v1 fatcat:jdabv4thkrb53iasg3k7tmnq7q

How ECP Software Technologies and Math Libraries are Working Toward Performance Portability at Exascale [article]

Lois Curfman McInnes
2021 figshare.com  
This presentation will provide an overview of approaches being used by ECP Software Technology teams, with emphasis on ECP mathematical libraries, to address performance portability while working toward  ...  algorithms and data structures for efficient and scalable performance.  ...  • Use of portable programming models that provide abstractions: Kokkos (Trilinos, …), RAJA (SUNDIALS, …), OpenMP 5.x, OpenCL, … • Portability strategies of ECP math libraries • xSDK: Primary delivery  ... 
doi:10.6084/m9.figshare.14156903.v2 fatcat:6l4gtrlvnjdoxesj5wedcephlu
Showing results 1 to 15 out of 959 results