
Alpaka -- An Abstraction Library for Parallel Kernel Acceleration

Erik Zenker, Benjamin Worpitz, René Widera, Axel Huebl, Guido Juckeland, Andreas Knüpfer, Wolfgang E. Nagel, Michael Bussmann
2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)  
The Alpaka library defines and implements an abstract hierarchical redundant parallelism model.  ...  The Alpaka C++ template interface allows for straightforward extension of the library to support other accelerators and specialization of its internals for optimization.  ...  Listing 2: an Alpaka kernel. The kernel needs to implement operator() with the prefix ALPAKA_FN_ACC, which takes at least the accelerator as a parameter.  ... 
doi:10.1109/ipdpsw.2016.50 dblp:conf/ipps/ZenkerWWHJKNB16 fatcat:zgvpbvbeeneuznpica7l6h2owm

Tuning and Optimization for a Variety of Many-Core Architectures Without Changing a Single Line of Implementation Code Using the Alpaka Library [chapter]

Alexander Matthes, René Widera, Erik Zenker, Benjamin Worpitz, Axel Huebl, Michael Bussmann
2017 Lecture Notes in Computer Science  
We present an analysis on optimizing performance of a single C++11 source code using the Alpaka hardware abstraction library.  ...  In addition we analyze the optimization potential available with vendor-specific compilers when confronted with the heavily templated abstractions of Alpaka.  ...  Our open-source projects PIConGPU [3, 2] and HaseOnGPU [5] both use Alpaka for the kernel abstraction for various many-core hardware [27, 28] , but different libraries for the mentioned topics not  ... 
doi:10.1007/978-3-319-67630-2_36 fatcat:k76veiy34zdtzbbxpvx6kxw23q

Investigating Performance Portability Of A Highly Scalable Particle-In-Cell Simulation Code On Various Multi-Core Architectures

Benjamin Worpitz, Prof. Dr. Wolfgang E. Nagel, Dr. Michael Bussmann, Dr. Guido Juckeland, Dr. Andreas Knüpfer, Dr. Bernd Trenkler
2015 Zenodo  
The alpaka library defines and implements an abstract hierarchical redundant parallelism model.  ...  The C++ template interface provided allows for straightforward extension of the library to support other accelerators and specialization of its internals for optimization.  ...  This C++ interface library is called alpaka -- Abstraction Library for Parallel Kernel Acceleration.  ... 
doi:10.5281/zenodo.49768 fatcat:gw53fnzwxfa53n2xqg6dpohqle

Performance-Portable Many-Core Plasma Simulations: Porting PIConGPU to OpenPower and Beyond [chapter]

Erik Zenker, René Widera, Axel Huebl, Guido Juckeland, Andreas Knüpfer, Wolfgang E. Nagel, Michael Bussmann
2016 Lecture Notes in Computer Science  
wraps the abstract parallel C++11 kernel acceleration library Alpaka.  ...  We demonstrate how PIConGPU can benefit from the tunable kernel execution strategies of the Alpaka library, achieving portability and performance with single-source kernels on conventional CPUs, Power8  ...  With Alpaka [20] , there exists an interface for parallel kernel acceleration which enables the programmer to compile single-source C++ kernels to various architectures, while providing all the requirements  ... 
doi:10.1007/978-3-319-46079-6_21 fatcat:z5v3vlmg4jcz7i2hniirbqg2l4

Portable Node-Level Parallelism for the PGAS Model

Pascal Jungblut, Karl Fürlinger
2021 International journal of parallel programming  
In this paper we present an approach to integrate node-level programming abstractions with the PGAS programming model.  ...  Even with an abstract and unifying virtual global address space it is, however, challenging to use the full potential of different systems.  ...  The Abstraction Library for Parallel Kernel Acceleration (Alpaka) [20] is a C++14 header-only meta-programming library for node-level parallelism. It supports several back ends like C++  ... 
doi:10.1007/s10766-021-00718-x fatcat:xfabzbhiajdopamolwi4rpth5u

Challenges Porting a C++ Template-Metaprogramming Abstraction Layer to Directive-based Offloading [article]

Jeffrey Kelling, Sergei Bastrakov, Alexander Debus, Thomas Kluge, Matt Leinhauser, Richard Pausch, Klaus Steiniger, Jan Stephan, René Widera, Jeff Young, Michael Bussmann, Sunita Chandrasekaran (+1 others)
2022 arXiv   pre-print
abstraction layer alpaka and avoiding other modifications to the application code.  ...  HPC systems employ a growing variety of compute accelerators with different architectures and from different vendors.  ...  We acknowledge the IT Center of RWTH Aachen for access to their infrastructure and Jonas Hahnfeld for support. This material is based upon work supported by the U.S.  ... 
arXiv:2110.08650v2 fatcat:65k2l6te6baldcjhsqjcvbgupq

GPU-Powered Particle-in-Cell Community Frameworks for Laser-Plasma Interaction

Axel Huebl, Et Al.
2020 Zenodo  
Both codes' software libraries and abstractions are built on top of a generalized, single-source programming model (Alpaka) or parallel-for/-reduce based kernels.  ...  A common, open data format for particle and mesh data (openPMD) avoids duplicating I/O efforts and allows scalable data workflows to be reused with common libraries.  ... 
doi:10.5281/zenodo.3900296 fatcat:guq6x5kftrd35mpdcidayrtfda

Portability: A Necessary Approach for Future Scientific Software [article]

Meghna Bhattacharya, Paolo Calafiura, Taylor Childers, Mark Dewing, Zhihua Dong, Oliver Gutsche, Salman Habib, Xiangyang Ju, Michael Kirby, Kyle Knoepfel, Matti Kortelainen, Martin Kwok (+7 others)
2022 arXiv   pre-print
coding of an algorithm once, and the ability to execute it on a variety of hardware products from many vendors, especially including accelerators.  ...  The portable parallelization strategies (PPS) project of the High Energy Physics Center for Computational Excellence (HEP/CCE) is investigating solutions for portability techniques that will allow the  ...  The SYCL specification is designed to be a higher level abstraction above low-level native acceleration APIs with interoperability between existing libraries and other parallel programming models and can  ... 
arXiv:2203.09945v1 fatcat:vfkoylh32nfg3mqx4lzl3ngeiy

Metrics and Design of an Instruction Roofline Model for AMD GPUs [article]

Matthew Leinhauser, René Widera, Sergei Bastrakov, Alexander Debus, Michael Bussmann, Sunita Chandrasekaran
2021 arXiv   pre-print
In this paper, we design an instruction roofline model for AMD GPUs using AMD's ROCProfiler and a benchmarking tool, BabelStream (the HIP implementation), as a way to measure an application's performance  ...  Specifically, we create instruction roofline models for a case study scientific application, PIConGPU, an open source particle-in-cell (PIC) simulation application used for plasma and laser-plasma physics  ...  CONCLUSIONS AND FUTURE WORK In this paper, we showed how to construct IRMs for AMD GPUs using metrics from rocProf and micro-kernel benchmarking suites.  ... 
arXiv:2110.08221v2 fatcat:uusc263lufd7fiu6dpm6qupi5e

Porting CMS Heterogeneous Pixel Reconstruction to Kokkos

Matti J. Kortelainen, Martin Kwok, Taylor Childers, Alexei Strelchenko, Yunsong Wang, (on behalf of the CMS Collaboration), C. Biscarat, S. Campana, B. Hegner, S. Roiser, C.I. Rovelli, G.A. Stewart
2021 EPJ Web of Conferences  
Programming for a diverse set of compute accelerators in addition to the CPU is a challenge.  ...  Fortunately there are several portability technologies on the market such as Alpaka, Kokkos, and SYCL.  ...  An algorithm can also be offloaded to compute accelerators with device parallel execution spaces.  ... 
doi:10.1051/epjconf/202125103034 fatcat:wspdrmb5qzgvplcssj75xtvwrq

DASH: Distributed Data Structures and Parallel Algorithms in a Global Address Space [chapter]

Karl Fürlinger, José Gracia, Andreas Knüpfer, Tobias Fuchs, Denis Hünich, Pascal Jungblut, Roger Kowalewski, Joseph Schuchart
2020 Lecture Notes in Computational Science and Engineering  
DASH is a new programming approach offering distributed data structures and parallel algorithms in the form of a C++ template library.  ...  We also present a performance and productivity study where we compare DASH with a set of established parallel programming models.  ...  We would also like to thank the German research foundation (DFG) for the funding received through the SPPEXA priority programme and initiators and managers of SPPEXA for their foresight and level-headed  ... 
doi:10.1007/978-3-030-47956-5_6 fatcat:44avzbgnkvh73iriqceboti4wu

Parallel Programming Models for Heterogeneous Many-Cores : A Survey [article]

Jianbin Fang, Chun Huang, Tao Tang, Zheng Wang
2020 arXiv   pre-print
In this article, we provide a comprehensive survey for parallel programming models for heterogeneous many-core architectures and review the compiling techniques of improving programmability and portability  ...  While heterogeneous many-core design offers the potential for energy-efficient high-performance, such potential can only be unlocked if the application programs are suitably parallel and can be made to  ...  IBM developed IBM SDK for Multicore Acceleration with a suite of software tools and libraries [50] .  ... 
arXiv:2005.04094v1 fatcat:e2psrdnyajh3hih3znnjjbezae

LLAMA: The Low-Level Abstraction For Memory Access [article]

Bernhard Manfred Gruber, Guilherme Amadio, Jakob Blomer, Alexander Matthes, René Widera, Michael Bussmann
2021 arXiv   pre-print
We present the Low-Level Abstraction of Memory Access (LLAMA), a C++ library that provides such a data structure abstraction layer with example implementations for multidimensional arrays of nested, structured  ...  LLAMA provides fully C++ compliant methods for defining and switching custom memory layouts for user-defined data types. The library is extensible with third-party allocators.  ...  Verena Gruber for proof-reading and consulting on color, layout and design of the figures, Jiří Vyskočil for discussing and improving the nbody benchmark in its early state, and Simeon Ehrig for proof-reading  ... 
arXiv:2106.04284v2 fatcat:pg2m3hbcrfhc5ip5g6glgnlf4i

Parallel programming models for heterogeneous many-cores: a comprehensive survey

Jianbin Fang, Chun Huang, Tao Tang, Zheng Wang
2020 CCF Transactions on High Performance Computing  
In this article, we provide a comprehensive survey for parallel programming models for heterogeneous many-core architectures and review the compiling techniques of improving programmability and portability  ...  While heterogeneous many-core design offers the potential for energy-efficient high-performance, such potential can only be unlocked if the application programs are suitably parallel and can be made to  ...  An OpenCL application is composed of two parts: one or more kernels and an OpenCL host program. The kernel specifies functions to be executed in a parallel fashion on the processing cores.  ... 
doi:10.1007/s42514-020-00039-4 fatcat:nn56xhjm6rcu7kya6gfnyjg66q

CLUE: A Fast Parallel Clustering Algorithm for High Granularity Calorimeters in High-Energy Physics

Marco Rovere, Ziheng Chen, Antonio Di Pilato, Felice Pantaleo, Chris Seez
2020 Frontiers in Big Data  
We also show a comparison of the performance on CPU and GPU implementations, demonstrating the power of algorithmic parallelization in the coming era of heterogeneous computing in high-energy physics.  ...  The algorithm uses a grid spatial index for fast querying of neighbors and its timing scales linearly with the number of hits within the range considered.  ...  The authors would like to thank Vincenzo Innocente for the suggestions and guidance while developing the clustering algorithm.  ... 
doi:10.3389/fdata.2020.591315 pmid:33937749 pmcid:PMC8080903 fatcat:vwxfimb6snbfvlsm4zslvit7na
Showing results 1 — 15 out of 24 results