Accelerating a C++ CFD Code with OpenACC
2014
2014 First Workshop on Accelerator Programming using Directives
Taking the C++ flow solver ZFS as an example, we show that the directive-based programming model allows one to achieve good performance with reasonable effort, even for mature codes with many lines of ...
For the kernel most affected by the memory access pattern, we compare the initial array of structures memory layout with a structure of arrays layout. ...
ACKNOWLEDGEMENTS This work has been carried out in the scope of the NVIDIA Application Lab at Jülich in collaboration with the JARA-HPC SimLab Fluids & Solids Engineering and the Institute of Aerodynamics ...
doi:10.1109/waccpd.2014.11
dblp:conf/sc/KrausSAP14
fatcat:jn42zcof7bczhl4uajlpljblue
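The snippet above contrasts an array-of-structures layout with a structure-of-arrays layout. The following is a minimal sketch of that distinction in C, not code from ZFS: the type names, the field names (`rho`, `u`, `v`, `w`, `e`), the constant `N_CELLS`, and both helper functions are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define N_CELLS 1024 /* hypothetical number of cells */

/* Array of structures (AoS): all variables of one cell sit next to each
 * other, so reading only rho strides over the other four fields. */
typedef struct {
    double rho, u, v, w, e;
} CellAoS;

/* Structure of arrays (SoA): each variable is stored contiguously across
 * cells, so consecutive loop iterations read consecutive addresses. */
typedef struct {
    double rho[N_CELLS];
    double u[N_CELLS];
    double v[N_CELLS];
    double w[N_CELLS];
    double e[N_CELLS];
} CellsSoA;

/* Sum the density over the first n cells, AoS layout (strided access). */
double sum_rho_aos(const CellAoS *cells, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += cells[i].rho;
    return sum;
}

/* Sum the density over the first n cells, SoA layout (unit-stride access). */
double sum_rho_soa(const CellsSoA *cells, size_t n)
{
    assert(n <= N_CELLS);
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += cells->rho[i];
    return sum;
}
```

Both functions return the same value for the same data; only the memory access pattern differs. On a GPU, neighbouring threads usually benefit from the unit-stride SoA pattern, though the actual gain depends on the kernel and the hardware.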
Benchmarking OpenCL, OpenACC, OpenMP, and CUDA: programming productivity, performance, and energy consumption
[article]
2017
arXiv
pre-print
a GPU accelerator or an Intel Xeon Phi co-processor. ...
To evaluate the programming productivity we use our homegrown tool CodeStat, which enables us to determine the percentage of code lines that was required to parallelize the code using a specific framework ...
[Figure 4, panel (b) Energy Consumption. Axis: Energy [J], split into CPU and Accelerator. Benchmarks: CFD, Hotspot, LUD. Configurations: OCL-Ida, CUDA-Ida, OMP-Emil.]
Figure 4: A comparison of OpenMP, OpenCL, and CUDA with respect to (a) ...
arXiv:1704.05316v1
fatcat:lax3kghaxnanxixklx3haavlxa
Directive-based GPU programming for computational fluid dynamics
2015
Computers & Fluids
We examine the process of applying the OpenACC Fortran API to a test CFD code that serves as a proxy for a full-scale research code developed at Virginia Tech; this test code is used to assess the performance ...
Keywords: Directive-based programming; OpenACC; Fortran; Finite-difference method. Abstract: Directive-based programming of graphics processing units (GPUs) has recently appeared as a viable alternative to using ...
Acknowledgments This work was supported by an Air Force Office of Scientific Research (AFOSR) Basic Research Initiative in the Computational Mathematics program with Dr. ...
doi:10.1016/j.compfluid.2015.03.008
fatcat:guzjdb7llnbsbosoczysbe3r7y
An Early Performance Comparison of CUDA and OpenACC
2018
MATEC Web of Conferences
Overall we found that OpenACC is a reliable programming model and a good alternative to CUDA for accelerator devices. ...
The results show that in terms of kernel running time, the OpenACC performance is lower than the CUDA performance because PGI compiler needs to translate OpenACC kernels into object code while CUDA codes ...
In OpenACC, porting of legacy CPU-based code only requires adding several lines of annotations before the sections that need to be accelerated, without changing code structures [2]. ...
doi:10.1051/matecconf/201820805002
fatcat:ul6gcqz3jrgqrallnkoqjjlqie
Recent progress and challenges in exploiting graphics processors in computational fluid dynamics
2013
Journal of Supercomputing
Finally, recommendations for implementing CFD codes on GPUs are given and remaining challenges are discussed, such as the need to develop new strategies and redesign algorithms to enable GPU acceleration ...
simple codes. ...
Another avenue for accelerating applications using GPUs is OpenACC [78, 91], which uses compiler directives (e.g., #pragma) placed in Fortran, C, and C++ codes to identify sections of code to be run ...
doi:10.1007/s11227-013-1015-7
fatcat:jyjmfa7wqzgnjf3qakgk6cyadi
Energy Applications Challenges (SIAM CSE21)
[article]
2021
figshare.com
These areas include wind power, combustion, nuclear energy, carbon capture, fusion energy, and plasma accelerators. ...
particle transport that, due to its stochastic nature, does not directly map to SIMT architectures. Multiple programming models and approaches are being used to achieve performance portability across a ...
• Resolving electromagnetic turbulence
• Coupling numerics between core and edge codes
WarpX
Challenge Problem and Codes
Modeling of a chain of tens of plasma acceleration stages resulting ...
doi:10.6084/m9.figshare.14125667.v2
fatcat:gvh4wndrrbh7vcgk7ily24fmdi
Energy Applications Challenges (SIAM CSE21)
[article]
2021
figshare.com
These areas include wind power, combustion, nuclear energy, carbon capture, fusion energy, and plasma accelerators. ...
particle transport that, due to its stochastic nature, does not directly map to SIMT architectures. Multiple programming models and approaches are being used to achieve performance portability across a ...
• Resolving electromagnetic turbulence
• Coupling numerics between core and edge codes
WarpX
Challenge Problem and Codes
Modeling of a chain of tens of plasma acceleration stages resulting ...
doi:10.6084/m9.figshare.14125667.v3
fatcat:jiazayqlsjgbpaxe7hiolhukrq
Multi-GPU Performance Optimization of a CFD Code using OpenACC on Different Platforms
[article]
2020
arXiv
pre-print
This paper investigates the multi-GPU performance of a 3D buoyancy driven cavity solver using MPI and OpenACC directives on different platforms. ...
Since the buoyancy driven cavity code is latency-bounded on the clusters examined, a series of optimizations both agnostic and tailored to the platforms are designed to reduce the latency cost and improve ...
McCall and Behzad Baghapour for creating the original BDC code as well as giving advice, and thank Charles W. Jackson for reviewing the paper and participating in various helpful discussions. ...
arXiv:2006.02602v1
fatcat:vkldeh3tqfhyfovny27go7zx5y
Accelerating Hydrocodes with OpenACC, OpenCL and CUDA
2012
2012 SC Companion: High Performance Computing, Networking Storage and Analysis
We find that OpenACC is an extremely viable programming model for accelerator devices, improving programmer productivity and achieving better performance than OpenCL and CUDA. ...
, and portability using a recently developed Lagrangian-Eulerian explicit hydrodynamics mini-application. ...
The authors would like to express their thanks to Cray, in particular Alistair Hart of the Cray European Exascale Research Initiative, for their help with OpenACC and also to John Pennycook of the University ...
doi:10.1109/sc.companion.2012.66
dblp:conf/sc/HerdmanGMBBMJ12
fatcat:hu77tqzljrgjfbdcwsoonjz5yu
JACC: An OpenACC Runtime Framework with Kernel-Level and Multi-GPU Parallelization
[article]
2021
arXiv
pre-print
Efforts on such models involve a least engineering cost for enabling computational acceleration on multiple architectures while programmers are only required to add meta information upon sequential code ...
This paper introduces JACC, an OpenACC runtime framework which enables the dynamic extension of OpenACC programs by serving as a transparent layer between the program and the compiler. ...
On the other hand, JACC wraps OpenACC compilers and holds C/Fortran code for optimization. A few projects aim to assist code generation with directive-based programming. ...
arXiv:2110.14340v1
fatcat:acfa6g7xm5dyfajen7fqkn4yri
NAS Parallel Benchmarks for GPGPUs Using a Directive-Based Programming Model
[chapter]
2015
Lecture Notes in Computer Science
The right choice or combination of techniques/hints are crucial for compilers to generate highly efficient codes tuned to a particular type of accelerator. ...
such as OpenACC. ...
For evaluation purposes, we compare the performances of our OpenACC programs with serial and third-party well tuned OpenCL ...
[Table: Benchmark (EP, CG, FT, IS) by Data Size (A, B, C for each benchmark); first row: NPB-SER ...]
doi:10.1007/978-3-319-17473-0_5
fatcat:25gloejzqzeetcxmaa4cxctkwu
Parallel Reservoir Simulation with OpenACC and Domain Decomposition
2018
Algorithms
In order to address the problems, we propose a parallel method with OpenACC to accelerate serial code and reduce the time and effort during porting an application to GPU. ...
The experimental results indicate that (1) the proposed GPU-aided approach can outperform the CPU-based one up to about two times, meanwhile with the help of OpenACC, the workload of the transplant code ...
Like OpenMP, its benchmark is for C/C++ and Fortran source code to identify the areas that should be accelerated using compiler directives and additional functions. ...
doi:10.3390/a11120213
fatcat:swrmqhkzujc3fjvozi3jckd7rm
An Improved Framework of GPU Computing for CFD Applications on Structured Grids using OpenACC
[article]
2020
arXiv
pre-print
This paper is focused on improving multi-GPU performance of a research CFD code on structured grids. MPI and OpenACC directives are used to scale the code up to 16 GPUs. ...
A series of performance issues related to the scaling for the multi-block CFD code are addressed by applying various optimizations. ...
[24] further applied the heterogeneous computing to accelerate a complicated CFD code on a CPU/GPU platform using MPI and OpenACC. ...
arXiv:2012.02925v1
fatcat:evqp7zr7afebhksc2iqexs4ewy
D7.2.2 Exploitation of HPC Tools and Techniques
2014
Zenodo
For a more detailed description of each of the exploitation projects summarised here, we refer the reader to the PRACE-3IP whitepaper associated with each of the 17 projects. ...
The objective of PRACE-3IP Work Package 7 (WP7) 'Application Enabling and Support' is to provide applications enabling support for HPC applications codes which are important for European researchers to ...
With OpenACC, a developer can annotate C, C++ and Fortran source code to identify the areas to be accelerated using #pragma compiler directives and additional functions. ...
doi:10.5281/zenodo.6575525
fatcat:5y3cjsculrdejllndosbjpcgiq
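As an illustration of the annotation style described in the snippet above, here is a minimal sketch in C and not code from any of the listed works. The function names `saxpy` and `accelerator_count` are hypothetical. The directive `#pragma acc parallel loop` with `copyin`/`copy` data clauses and the runtime call `acc_get_num_devices` are part of the OpenACC specification; a compiler without OpenACC support ignores the pragma and runs the loop serially.

```c
#include <assert.h>

#ifdef _OPENACC
#include <openacc.h> /* OpenACC runtime API, only present with OpenACC compilers */
#endif

/* y = a * x + y. The directive marks the loop for offloading; the data
 * clauses copy x to the device and copy y to the device and back. */
void saxpy(int n, float a, const float *x, float *y)
{
    assert(n >= 0);
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Example of an "additional function": query the number of non-host
 * devices. Returns 0 when the code is built without OpenACC. */
int accelerator_count(void)
{
#ifdef _OPENACC
    return acc_get_num_devices(acc_device_not_host);
#else
    return 0;
#endif
}
```

The serial loop body is unchanged by the annotation, which is the property the snippets in this listing emphasise: the same source builds with or without accelerator support.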
Strong Scaling of OpenACC enabled Nek5000 on several GPU based HPC systems
[article]
2021
arXiv
pre-print
We present new results on the strong parallel scaling for the OpenACC-accelerated implementation of the high-order spectral element fluid dynamics solver Nek5000. ...
The performance results show that speed-up between 3-5 can be achieved using the GPU accelerated version compared with the CPU version on these different systems. ...
The acceleration of Nek5000 with OpenACC directives was first explored by Markidis et al. by accelerating the mini-app Nekbone in [21] and then improved with CUDA Fortran implementations for the core ...
arXiv:2109.03592v3
fatcat:6e75xxahnfhpxn3plml6lradf4