
Porting Applications with OpenMP Using Similarity Analysis [chapter]

Wei Ding, Oscar Hernandez, Tony Curtis, Barbara Chapman
2014 Lecture Notes in Computer Science  
We evaluate Klonos by applying it to the porting of a real scientific application to a shared memory environment using OpenMP.  ...  Our experimental results show that Klonos accurately detects similar codes that can be ported similarly.  ...  They may need to use a hybrid programming model, such as adding OpenMP [1, 3] and accelerator directives [12, 13] to MPI applications [27], or they may introduce Pthreads [4] or an API designed  ... 
doi:10.1007/978-3-319-09967-5_2 fatcat:omm43ax5w5bmpdhqazu5fivlk4

Application of OpenMP to Weather, Wave and Ocean Codes

Paolo Malfetti
2001 Scientific Programming  
to the availability of OpenMP as standard shared memory paradigm.  ...  These three models were written for vector machines, so the paper will describe the technique used to port a vector code to a SMP-ccNUMA architecture.  ...  Unless specified the experiments were run using SGI's miser (or equivalent tool), so that the cpus were dedicated to the application.  ... 
doi:10.1155/2001/717549 fatcat:lwnmaw3nqzdmfbstmuhmgtemry

Enabling Execution of a Legacy CFD Mini Application on Accelerators Using OpenMP [chapter]

Ioannis Nompelis, Gabriele Jost, Alice Koniges, Christopher Daley, David Eder, Christopher Stone
2020 Lecture Notes in Computer Science  
Our objective was to examine how efficiently legacy Fortran codes can be ported to accelerators by leveraging OpenMP directives.  ...  We describe the process and outcome of our efforts to port a legacy Fortran benchmark code to heterogeneous GPU-accelerated computing architectures using OpenMP.  ...  The authors would like to thank all those who helped with valuable advice during the NERSC/OLCF Oakland Hackathon, especially, Kevin Gott and Jack Deslippe from NERSC and Tom Papatheodore from OLCF for  ... 
doi:10.1007/978-3-030-50743-5_14 fatcat:g2323rh6lvcsdaekgzosozrtrq

Extending OpenMP for the optimization of parallel component applications

Yunfeng Peng, Hai Liu
2020 IEEE Access  
To better utilize resources, we provide a mapping of virtual computing resources to real resources. We extend the OpenMP programming model and execution model to accommodate the extended pragmas.  ...  We extend the OpenMP pragma to support virtual computing resources and the deployment of complex structure codes on them.  ...  Our future work will include more performance experiments on different kinds of parallel applications.  ... 
doi:10.1109/access.2020.2996669 fatcat:n7b4um3sjvgonpp3t7k6eeitia

A hybrid MPI-OpenMP parallel implementation for pseudospectral simulations with application to Taylor–Couette flow

Liang Shi, Markus Rampp, Björn Hof, Marc Avila
2015 Computers & Fluids  
The code is parallelized using a hybrid MPI-OpenMP strategy, which is simpler to implement, reduces inter-node communications and is more efficient compared to a flat MPI parallelization.  ...  A hybrid-parallel direct-numerical-simulation method with application to turbulent Taylor-Couette flow is presented.  ...  Experiments conducted by Ji et al.  ... 
doi:10.1016/j.compfluid.2014.09.021 fatcat:smrr2ej62jayra2znz6oeykaqu

The Productivity, Portability and Performance of OpenMP 4.5 for Scientific Applications Targeting Intel CPUs, IBM CPUs, and NVIDIA GPUs [chapter]

Matt Martineau, Simon McIntosh-Smith
2017 Lecture Notes in Computer Science  
The results show that while current OpenMP implementations are able to achieve good performance on the breadth of modern hardware for memory bandwidth bound applications, our memory latency bound application  ...  We discuss important considerations for scientific application developers tackling large software projects with OpenMP, including straightforward code mechanisms to improve productivity and portability  ...  We would also like to thank the Intel Parallel Computing Center (IPCC) at the University of Bristol for access to Intel hardware, and the EPSRC GW4 Tier 2 Isambard service for access to phase 1 of the  ... 
doi:10.1007/978-3-319-65578-9_13 fatcat:n5c4xuldpbeijhzsremgofabw4

Accelerated application development: The ORNL Titan experience

Wayne Joubert, Rick Archibald, Mark Berrill, W. Michael Brown, Markus Eisenbach, Ray Grout, Jeff Larkin, John Levesque, Bronson Messer, Matt Norman, Bobby Philip, Ramanan Sankaran (+2 others)
2015 Computers & electrical engineering  
In this paper we discuss experiences porting applications to the Titan system.  ...  To ready applications for accelerated computing, a preparedness effort was undertaken prior to delivery of Titan.  ...  The authors would like to thank Gregory Ruetsch, Adrian Tate, Massimiliano Fatica, Peng Wang, Yang Wang, Aurelian  ... 
doi:10.1016/j.compeleceng.2015.04.008 fatcat:ir4nljoznvfvrbg3vdcjtggxc4

GROMACS on Hybrid CPU-GPU and CPU-MIC Clusters: Preliminary Porting Experiences, Results and Next Steps

Sadaf Alam
2014 Zenodo  
As a result, the execution model is multi-faceted where end users can tune the application execution according to the underlying platforms.  ...  GROMACS currently employs message-passing MPI parallelism, multi-threading using OpenMP and contains kernels for non-bonded interactions that are accelerated using the CUDA programming language.  ...  GROMACS performance and scaling is highly sensitive to the implementation and selection of the non-bonded acceleration code (GMX ACCELERATION), number of OpenMP threads per node, the number of MPI tasks  ... 
doi:10.5281/zenodo.822571 fatcat:g2vl3pizpnhrnmn6agtkz64lci

Targeting GPUs with OpenMP Directives on Summit: A Simple and Effective Fortran Experience [article]

Reuben D. Budiardja, Christian Y. Cardall
2018 arXiv   pre-print
We use OpenMP directives to target hardware accelerators (GPUs) on Summit, a newly deployed supercomputer at the Oak Ridge Leadership Computing Facility (OLCF), demonstrating simplified access to GPU devices  ...  of nodes' CPU cores, and reasonable weak scaling up to 8000 GPUs vs. 56,000 CPU cores (1333 1/3 Summit nodes).  ...  Porting a Fluid Dynamics Application: RiemannProblem As a concrete working example of targeting GPUs with OpenMP directives, we use our implementation to solve a RiemannProblem using GenASiS Basics.  ... 
arXiv:1812.07977v1 fatcat:2iftgsehfvc7pnwjjff6vyziee

Experiences with task-based programming using cluster nodes as OpenMP devices [article]

Ilias Keftakis, Vassilios V. Dimakopoulos
2022 arXiv   pre-print
We make use of the OpenMP device model to provide an easy and intuitive way to program available cluster nodes.  ...  In this work, we consider each node of a cluster as a separate OpenMP device, able to run code with OpenMP directives in parallel.  ...  Moreover, we utilize the new device to (re)write task-based OpenMP applications and report on our experiences.  ... 
arXiv:2205.10656v1 fatcat:anybtpejqjedpjkttgstp5to7q

Preliminary Experiments with XKaapi on Intel Xeon Phi Coprocessor

Joao V.F. Lima, Francois Broquedis, Thierry Gautier, Bruno Raffin
2013 2013 25th International Symposium on Computer Architecture and High Performance Computing  
This paper presents preliminary performance comparisons of parallel applications developed natively for the Intel Xeon Phi accelerator using three different parallel programming environments and their  ...  Our benchmark suite is composed of three computing kernels: a Fibonacci computation that allows us to study the overhead and the scalability of the runtime system, a NQueens application generating irregular  ...  With the introduction of the Intel Xeon Phi coprocessor, Intel proposed a strong evolution in the way to develop applications for accelerators.  ... 
doi:10.1109/sbac-pad.2013.28 dblp:conf/sbac-pad/LimaBGR13 fatcat:i2ie22b7ubcahcj3p2g6dhuweq

Honing and proofing Astrophysical codes on the road to Exascale. Experiences from code modernization on many-core systems [article]

Salvatore Cielo, Luigi Iapichino, Fabio Baruffa, Matteo Bugli, Christoph Federrath
2020 arXiv   pre-print
The complexity of modern and upcoming computing architectures poses severe challenges for code developers and application specialists, and forces them to expose the highest possible degree of parallelism  ...  , in order to make the best use of the available hardware.  ...  isolated kernel was OpenMP-only, whereas in the back-porting the OpenMP improvements were interfaced with the MPI section of the full code.  ... 
arXiv:2002.08161v1 fatcat:3r4prracnrc45b3pl7yta4l3m4

Usage Experience with DL_POLY and OpenCL on PRACE Prototypes

Agnieszka Kwiecień
2004 Zenodo  
Due to the heterogeneity of the prototypes we decided to use the DL_POLY molecular simulation package and its OpenCL port for the tests.  ...  We show the performance results and discuss the usage experience with prototypes in a context of ease of use, porting effort required, and energy consumption.  ...  Conclusions We were able to compile and run the application on both prototypes without many difficulties.  ... 
doi:10.5281/zenodo.825515 fatcat:p7ylc4bxmzharjt2asty672qmy

Experience Report: Writing A Portable GPU Runtime with OpenMP 5.1 [article]

Shilei Tian, Jon Chesterfield, Johannes Doerfert, Barbara Chapman
2021 arXiv   pre-print
The library we ported to OpenMP is the OpenMP device runtime that provides OpenMP functionality on the GPU.  ...  While we tried to be OpenMP compliant, we identified the need for compiler extensions to achieve the CUDA performance with our OpenMP runtime.  ...  Department of Energy organizations (Office of Science and the National Nuclear Security Administration) responsible for the planning and preparation of a capable exascale ecosystem, including software, applications  ... 
arXiv:2106.03219v1 fatcat:gfgnjqsesjhhbicr2xdmvsndpq

Two Parallel Implementations of Ehrlich-Aberth Algorithm for Root-Finding of Polynomials on Multiple GPUs with OpenMP and MPI

Kahina Ghidouche, Abderrahmane Sider, Lilia Ziane Khodja, Raphael Couturier
2016 2016 IEEE Intl Conference on Computational Science and Engineering (CSE) and IEEE Intl Conference on Embedded and Ubiquitous Computing (EUC) and 15th Intl Symposium on Distributed Computing and Applications for Business Engineering (DCABES)  
Moreover, other experiments show it is possible to find the roots of polynomials of degree up to 5 million.  ...  The experiments show a quasi-linear speedup by using up to 4 GPU devices compared to 1 GPU to find the roots of polynomials of degree up to 1.4 million.  ...  Experiments show that, using a parallel programming model like OpenMP or MPI, we can efficiently manage multiple graphics cards to solve the same problem and accelerate the parallel execution with 4 GPUs  ... 
doi:10.1109/cse-euc-dcabes.2016.196 dblp:conf/cse/GhidoucheSKC16 fatcat:2eqxbhmytfaetdeesfrzul2dv4