Integration of CUDA Processing within the C++ Library for Parallelism and Concurrency (HPX)

Patrick Diehl, Madhavan Seshadri, Thomas Heller, Hartmut Kaiser
2018 2018 IEEE/ACM 4th International Workshop on Extreme Scale Programming Models and Middleware (ESPM2)  
For the integration of CUDA code we extended HPX, a general purpose C++ run time system for parallel and distributed applications of any scale, and enabled asynchronous data transfers from and to the GPU  ...  We present asynchronous implementations for the data transfers and kernel launches for CUDA code as part of a HPX asynchronous execution graph.  ...  ACKNOWLEDGEMENTS This material is based upon work supported by the NSF Award 1737785 and a Google Summer of Code stipend.  ... 
doi:10.1109/espm2.2018.00006 dblp:conf/sc/DiehlSHK18 fatcat:of2nfsuvpze2rp6qa3xzp4vixa

HPX

Hartmut Kaiser, Thomas Heller, Bryce Adelstein-Lelbach, Adrian Serio, Dietmar Fey
2014 Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models - PGAS '14  
We present HPX, a parallel runtime system which extends the C++11/14 standard to facilitate distributed operations, enable fine-grained constraint based parallelism, and support runtime adaptive resource  ...  The significant increase in complexity of Exascale platforms due to energy-constrained, billion-way parallelism, with major changes to processor and memory architecture, requires new energy-efficient and  ...  We would like to acknowledge the NSF, DoE, the Center for Computation and Technology at Louisiana State University, and the Department of Computer Science 3 at the University of Erlangen Nuremberg who  ... 
doi:10.1145/2676870.2676883 dblp:conf/pgas/KaiserHASF14 fatcat:qnv2cktjufdzvo2jlsmdpw77he

Performance Analysis of a Quantum Monte Carlo Application on Multiple Hardware Architectures Using the HPX Runtime [article]

Weile Wei, Arghya Chatterjee, Kevin Huck, Oscar Hernandez, Hartmut Kaiser
2020 arXiv   pre-print
We also describe how we used HPX-APEX to raise the level of abstraction to understand performance issues and to identify tasking optimization opportunities in the code, and how these relate to CPU/GPU  ...  We describe the lessons we can learn from this experience as well as the benefits of enabling the HPX in the application to improve the CPU threading part of the code, which led to an overall 21% improvement  ...  The authors thank John Biddiscombe (ETHZ / CSCS) for initiating the port of DCA++ to HPX, for providing the initial implementation, and insightful discussions.  ... 
arXiv:2010.07098v3 fatcat:cola27ah4ndudfzgv734y7euee

Octo-Tiger's New Hydro Module and Performance Using HPX+CUDA on ORNL's Summit [article]

Patrick Diehl and Gregor Daiß and Dominic Marcello and Kevin Huck and Sagiv Shiber and Hartmut Kaiser and Juhan Frank and Dirk Pflüger
2021 arXiv   pre-print
Octo-Tiger is parallelized for distributed systems using the asynchronous many-task runtime system, the C++ standard library for parallelism and concurrency (HPX) and utilizes CUDA for its gravity solver  ...  Octo-Tiger is a code for modeling three-dimensional self-gravitating astrophysical fluids. It was particularly designed for the study of dynamical mass transfer between interacting binary stars.  ...  Acknowledgment This research used resources of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facil-  ... 
arXiv:2107.10987v2 fatcat:cjy6rsg645hsncykikkbxqhun4

Octo-Tiger: A New, 3D Hydrodynamic Code for Stellar Mergers that uses HPX Parallelisation [article]

Dominic C. Marcello, Sagiv Shiber, Orsola De Marco, Juhan Frank, Geoffrey C. Clayton, Patrick M. Motl, Patrick Diehl, Hartmut Kaiser
2021 arXiv   pre-print
This code uses HPX parallelization, allowing the overlap of work and communication and leading to excellent scaling properties, allowing for the computation of large problems in reasonable wall-clock times  ...  OCTO-TIGER is an astrophysics code to simulate the evolution of self-gravitating and rotating systems of arbitrary geometry based on the fast multipole method, using adaptive mesh refinement.  ...  The C++ Standard Library for Concurrency and Parallelism (HPX) OCTO-TIGER is parallelized for distributed systems using the C++ Standard Library for Concurrency and Parallelism (HPX, Kaiser et al. 2020  ... 
arXiv:2101.08226v2 fatcat:cncng2eoardl3o3cbhbtotldde

An Introduction to hpxMP -- A Modern OpenMP Implementation Leveraging Asynchronous Many-Tasking System [article]

Tianyi Zhang, Shahrzad Shirzad, Patrick Diehl, R. Tohid, Weile Wei, Hartmut Kaiser
2019 arXiv   pre-print
In this work, we compare hpxMP with Clang's OpenMP library with four linear algebra benchmarks of the Blaze C++ library.  ...  This approach leverages the C++ interfaces exposed by HPX and allows users to execute their applications on an AMT system without changing their code.  ...  Acknowledgment We thank Jeremy Kemp for providing the initial implementation of hpxMP (https://github.com/kempj/hpxMP), which was extended by the authors. The work on hpxMP is funded by the National  ... 
arXiv:1903.03023v2 fatcat:ww26clnqhfdxje2genxe6qm4vq

Closing the Performance Gap with Modern C++ [chapter]

Thomas Heller, Hartmut Kaiser, Patrick Diehl, Dietmar Fey, Marc Alexander Schweitzer
2016 Lecture Notes in Computer Science  
The recent revival of interest in the industry and the wider community for the C++ language has spurred a remarkable amount of standardization proposals and technical specifications in the arena of concurrency  ...  and for various types of parallelism.  ...  Generic implementation of the STREAM benchmark using HPX and C++ standards conforming parallel algorithms.  ... 
doi:10.1007/978-3-319-46079-6_2 fatcat:qmzryv6j4fhlpmiqddjbxh776q

From Piz Daint to the Stars

Gregor Daiß, Dirk Pflüger, Parsa Amini, John Biddiscombe, Patrick Diehl, Juhan Frank, Kevin Huck, Hartmut Kaiser, Dominic Marcello, David Pfander
2019 Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '19  
For the scenario's maximum resolution, the compute-critical parts (hydrodynamics and gravity) achieve 68.1% parallel efficiency at 2048 nodes.  ...  We use HPX with its futurization capabilities to ensure scalability both between nodes and within, and present first results replacing MPI with libfabric achieving up to a 2.8x speedup.  ...  ACKNOWLEDGMENTS We thank the Swiss National Supercomputing Centre and the National Energy Research Scientific Computing Center for providing us with the node hours to run the simulations and the Center  ... 
doi:10.1145/3295500.3356221 dblp:conf/sc/DaissABDFHKMPP19 fatcat:wk2folknurevnf7vklqkg4mjyy

Porting CMS Heterogeneous Pixel Reconstruction to Kokkos

Matti J. Kortelainen, Martin Kwok, Taylor Childers, Alexei Strelchenko, Yunsong Wang, (on behalf of the CMS Collaboration), C. Biscarat, S. Campana, B. Hegner, S. Roiser, C.I. Rovelli, G.A. Stewart
2021 EPJ Web of Conferences  
We also compare the achieved event processing throughput to the original CUDA code and a CPU version of it.  ...  The development was done in a standalone program that attempts to model many of the complexities of a HEP data processing framework such as CMSSW.  ...  is only one event in flight, and all parallelization is within the data of that event.  ... 
doi:10.1051/epjconf/202125103034 fatcat:wspdrmb5qzgvplcssj75xtvwrq

D7.7: Hardware developments IV

Alan Ó Cais, Jony Castagna, Godehard Sutmann
2019 Zenodo  
and detailed feedback to the project software developers; - discussion of project software needs with hardware and software vendors, completion of survey of what is already available for particular hardware  ...  platforms; and, - detailed output from direct face-to-face session between the project endusers, developers and hardware vendors.  ...  Leveraging HPX within E-CAM HPX is a C++ runtime system for parallelism and concurrency.  ... 
doi:10.5281/zenodo.3256136 fatcat:hfpwvelb3zdxlk6fmkgddqgqoq

Parallel Programming Models for Heterogeneous Many-Cores : A Survey [article]

Jianbin Fang, Chun Huang, Tao Tang, Zheng Wang
2020 arXiv   pre-print
In this article, we provide a comprehensive survey for parallel programming models for heterogeneous many-core architectures and review the compiling techniques of improving programmability and portability  ...  While heterogeneous many-core design offers the potential for energy-efficient high-performance, such potential can only be unlocked if the application programs are suitably parallel and can be made to  ...  The backend of the computing platform includes CUDA, HCC and SYCL [73, 116]. Thrust is a C++ standard parallel template library for NVIDIA GPUs.  ... 
arXiv:2005.04094v1 fatcat:e2psrdnyajh3hih3znnjjbezae

Parallel programming models for heterogeneous many-cores: a comprehensive survey

Jianbin Fang, Chun Huang, Tao Tang, Zheng Wang
2020 CCF Transactions on High Performance Computing  
In this article, we provide a comprehensive survey for parallel programming models for heterogeneous many-core architectures and review the compiling techniques of improving programmability and portability  ...  While heterogeneous many-core design offers the potential for energy-efficient high-performance, such potential can only be unlocked if the application programs are suitably parallel and can be made to  ...  HPX enables the support of heterogeneous computing, by introducing the concepts of target, allocator and executor within the hpx.compute subproject.  ... 
doi:10.1007/s42514-020-00039-4 fatcat:nn56xhjm6rcu7kya6gfnyjg66q

Hardware Developments III

Alan Ó Cais, Liang Liang, Jony Castagna
2018 Zenodo  
and detailed feedback to the project software developers; - discussion of project software needs with hardware and software vendors, completion of survey of what is already available for particular hardware  ...  platforms; and, - detailed output from direct face-to-face session between the project endusers, developers and hardware vendors.  ...  Leveraging HPX within E-CAM HPX is a C++ runtime system for parallelism and concurrency.  ... 
doi:10.5281/zenodo.1304087 fatcat:itkihkoikvas5ajgxzqyswsez4

Tasking in Accelerators: Performance Evaluation

Leonel Toledo, Antonio J. Pena, Sandra Catalan, Pedro Valero-Lara
2019 2019 20th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT)  
In this work, we analyze the implications and results of implementing dynamic parallelism, concurrent kernels and CUDA Graphs to solve task-oriented problems.  ...  Using dynamic parallelism and CUDA Streams we were able to achieve up to 30% speedups and for CUDA Graph API up to 25x acceleration outperforming state of the art results.  ...  The contribution of this work is the analysis and evaluation of the current CUDA features for tasking in latest GPU architecture, this is a preliminary study and analysis for a future integration of NVIDIA  ... 
doi:10.1109/pdcat46702.2019.00034 dblp:conf/pdcat/ToledoPCV19 fatcat:av6okzlpcfdtbklpm5khluswm4

OpenCL-HPX integration [article]

Michael Schupikov, Universität Stuttgart
2021
HPX is a library for concurrent, parallel applications. It strives not only to address challenges regarding distributed systems, but also to conform to current and upcoming C++ standards.  ...  In this work, we combine HPX and OpenCL in form of an executor. The OpenCL executor enables HPX users to benefit from more resources on heterogeneous nodes.  ...  HPX is a library for concurrency and parallelism [KDL+20]. It is written in C++ and is developed by the STE||AR Group. The goal of HPX consists of two major aspects.  ... 
doi:10.18419/opus-11849 fatcat:tkz6ln37dnhkhcq7qt5fp24vju