
Enabling efficient stencil code generation in OpenACC

Alyson D. Pereira, Rodrigo C.O. Rocha, Márcio Castro, Luís F.W. Góes, Mario A.R. Dantas
2017 Procedia Computer Science  
In this paper, we propose stencil extensions to enable efficient code generation in OpenACC.  ...  Therefore, this general-purpose approach delivers good performance on average, but it misses optimization opportunities for code generation and execution of specific classes of applications.  ...  In this paper, we propose OpenACC extensions to enable efficient code generation and execution of stencil applications by parallel skeleton frameworks.  ... 
doi:10.1016/j.procs.2017.05.155 fatcat:lrp2ydzi7ne37oh2nmlqaigr64

HPC Based Algorithmic Species Extraction Tool for Automatic Parallelization of Program Code

2019 International journal of recent technology and engineering  
The unique approach is developed to generate code automatically for parallel target machines.  ...  The evaluation of algorithmic species and the validation of extended A-Darwin are done by testing the tool against the benchmark suite HPCC.  ...  By classifying such irregular algorithms, the insights into structures of data locality and parallelism could help in producing efficient code for programmers and compilers.  ... 
doi:10.35940/ijrte.b1188.0782s319 fatcat:7ogb3uukezaqhi6hcsawuu3sxe

A parallel pattern for iterative stencil + reduce

M. Aldinucci, M. Danelutto, M. Drocco, P. Kilpatrick, C. Misale, G. Peretti Pezzi, M. Torquati
2016 Journal of Supercomputing  
Loop-of-stencil-reduce is general enough to subsume map, reduce, map-reduce, stencil, stencil-reduce, and, crucially, their usage in a loop in both data-parallel and streaming applications, or a combination  ...  We discuss the implementation of Loop-of-stencil-reduce in FastFlow, a framework for the implementation of applications based on the parallel patterns.  ...  Acknowledgment This work has been supported by the EU FP7 REPARA project (no. 609666) and by the NVidia GPU Research Center at University of Torino.  ... 
doi:10.1007/s11227-016-1871-z fatcat:5zoh5a64lndidkmqeg6gnrkw6a

Protocols by Default [chapter]

Nicholas Ng, Jose Gabriel de Figueiredo Coutinho, Nobuko Yoshida
2015 Lecture Notes in Computer Science  
This paper presents a code generation framework for typesafe and deadlock-free Message Passing Interface (MPI) programs.  ...  For instance, our benchmarks involving representative parallel and application-specific patterns speed up sequential execution by up to 31 times and reduce programming effort by an average of 39%.  ...  We thank Raymond Hu, Dominic Orchard and the anonymous reviewers for comments and suggestions.  ... 
doi:10.1007/978-3-662-46663-6_11 fatcat:6alwdcibi5ajba2uizaywdcdoi

Improving Utility of GPU in Accelerating Industrial Applications With User-Centered Automatic Code Translation

Po Yang, Feng Dong, Valeriu Codreanu, David Williams, Jos B. T. M. Roerdink, Baoquan Liu, Amjad Anvari-Moghaddam, Geyong Min
2018 IEEE Transactions on Industrial Informatics  
Also, existing automatic CPU-to-GPU code translators are mainly designed for research purposes, with poor user interface design, and are hard to use.  ...  Our experiments with non-expert GPU users in 4 SMEs reflect that the GPSME system can efficiently accelerate real-world applications by at least 4x and has better applicability, usability and learnability  ...  Many algorithm skeletons in general applications are beyond stencil computing, which the MINT kernel can hardly cope with.  ... 
doi:10.1109/tii.2017.2731362 fatcat:hsia7532pfaerbmj7i4ucuikhm

A Survey of Loop Parallelization: Models, Approaches, and Recent Developments

Hong Yao, Huifang Deng, Caifeng Zou
2016 International Journal of Grid and Distributed Computing  
Section 6 introduces the application of intelligent algorithms for this area. The discussion of future trend and conclusions are included in section 8.  ...  Section 4 focuses on the models based on semantics directives, or stencils, or some other high level abstracts; Section 5 discusses the dynamic approaches of run time.  ...  Acknowledgments We are grateful and thankful to anonymous reviewers for the helpful comments.  ... 
doi:10.14257/ijgdc.2016.9.11.12 fatcat:7v4mgkgqp5cmpakkn3mqmts7wi

Algorithmic Skeletons and Parallel Design Patterns in Mainstream Parallel Programming

Marco Danelutto, Gabriele Mencagli, Massimo Torquati, Horacio González–Vélez, Peter Kilpatrick
2020 International journal of parallel programming  
state-of-the-art industrial and research parallel programming frameworks, and the perspectives they open in relation to the exploitation of forthcoming massively-parallel (both general and special-purpose  ...  Finally, we give our personal overview—as researchers active for more than two decades in the parallel programming models and frameworks area—of the process that led to the adoption of these concepts in  ...  IC1406 High Performance Modelling and Simulation for Big Data Applications (cHiPSet).  ... 
doi:10.1007/s10766-020-00684-w fatcat:vtqcyf4he5gu3eefbjsb7nrxne

Automated GPU Kernel Transformations in Large-Scale Production Stencil Applications

Mohamed Wahib, Naoya Maruyama
2015 Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing - HPDC '15  
, searching for optimal kernel fissions/fusions, and generation of optimized code.  ...  This paper proposes an end-to-end framework for automatically transforming stencil-based CUDA programs to exploit inter-kernel data locality.  ...  Acknowledgments This project was partially supported by JST, CREST through its research program: "Highly Productive, High Performance Application Frameworks for Post Petascale Computing."  ... 
doi:10.1145/2749246.2749255 dblp:conf/hpdc/WahibM15 fatcat:ek2v4yc2dnhkzj2bnqgftwjepm

Heterogeneous CPU-GPU Execution of Stencil Applications

Balint Siklosi, Istvan Z Reguly, Gihan R Mudalige
2018 2018 IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC)  
In this paper we present research on the hybrid CPU-GPU execution of an important class of applications: structured mesh stencil codes.  ...  We explore the traditional per-loop load balancing approach used by others, and highlighting its shortcomings, we develop an algorithm that relies on polyhedral analysis and transformations in OPS to allow  ...  Code generation relies on parsing the user code for calls to the OPS API, and generating boilerplate code around the computational kernel provided by the user.  ... 
doi:10.1109/p3hpc.2018.00010 fatcat:gfwnxurlmnhezjfv5g2qdliw4u

Runtime Code Generation and Data Management for Heterogeneous Computing in Java

Juan José Fumero, Toomas Remmelg, Michel Steuwer, Christophe Dubach
2015 Proceedings of the Principles and Practices of Programming on The Java Platform - PPPJ '15  
Applications written with our API can then be transparently accelerated on a device such as a GPU using our runtime OpenCL code generator.  ...  This paper shows how marshalling affects runtime and presents a novel technique in Java to avoid this cost by implementing our own customised array data structure.  ...  The authors would also like to thank the anonymous reviewers, as well as Thibaut Lutz, Alberto Magni and Andrew McLeod for fruitful discussions and their help regarding our implementation and benchmarks.  ... 
doi:10.1145/2807426.2807428 dblp:conf/pppj/FumeroRSD15 fatcat:y2p4d62oyrgn7imzonrmjoynei

Optimising purely functional GPU programs

Trevor L. McDonell, Manuel M.T. Chakravarty, Gabriele Keller, Ben Lippmeier
2013 Proceedings of the 18th ACM SIGPLAN international conference on Functional programming - ICFP '13  
Both techniques are well known from other contexts, but they present unique challenges for an embedded language compiled for execution on a GPU.  ...  In this paper, we discuss two optimisation techniques, sharing recovery and array fusion, that tackle code explosion and eliminate superfluous intermediate structures.  ...  This work was supported in part by the Australian Research Council under grant number LP0989507.  ... 
doi:10.1145/2500365.2500595 dblp:conf/icfp/McDonellCKL13 fatcat:jf5n7oqnejeulejrtedygjjuxa

Optimising purely functional GPU programs

Trevor L. McDonell, Manuel M.T. Chakravarty, Gabriele Keller, Ben Lippmeier
2013 SIGPLAN notices  
Both techniques are well known from other contexts, but they present unique challenges for an embedded language compiled for execution on a GPU.  ...  In this paper, we discuss two optimisation techniques, sharing recovery and array fusion, that tackle code explosion and eliminate superfluous intermediate structures.  ...  This work was supported in part by the Australian Research Council under grant number LP0989507.  ... 
doi:10.1145/2544174.2500595 fatcat:duwcm3bo3faydmjth5kvpnpv5y

Supporting multiple accelerators in high-level programming models

Yonghong Yan, Pei-Hung Lin, Chunhua Liao, Bronis R. de Supinski, Daniel J. Quinlan
2015 Proceedings of the Sixth International Workshop on Programming Models and Applications for Multicores and Manycores - PMAM '15  
We implement our solutions for NVIDIA GPUs and demonstrate through example OpenMP codes the effectiveness of our solutions for the improvement of both performance and scalability.  ...  Efficiently exploiting the massive parallelism these accelerators provide requires the design and implementation of productive programming models.  ...  Acknowledgment This work was supported by the National Science Foundation under Award No. CNS-1205708 and SHF-1422961. This work was also performed under the auspices of the U.S.  ... 
doi:10.1145/2712386.2712405 dblp:conf/ppopp/0001LLSQ15 fatcat:3bkv56c7ojb2te3q7da32zjgde

A Theoretical Model for Global Optimization of Parallel Algorithms

Julian Miller, Lukas Trümper, Christian Terboven, Matthias S. Müller
2021 Mathematics  
With the quickly evolving hardware landscape of high-performance computing (HPC) and its increasing specialization, the implementation of efficient software applications becomes more challenging.  ...  The presented model strictly separates the structure of an algorithm from its executed functions.  ...  [14] have developed algorithmic skeleton frameworks as the first approach to this end.  ... 
doi:10.3390/math9141685 fatcat:disxmrpqtfa7fbwt6n7cuyqz7q

A Theoretical Model for Global Optimization of Parallel Algorithms

Julian Miller, Lukas Trümper, Christian Terboven, Matthias S. Müller
2021 Mathematics  
A Theoretical Model for Global Optimization of Parallel Algorithms. Mathematics 2021, 9, 1685. https://  ...  [14] have developed algorithmic skeleton frameworks as the first approach to this end.  ...  Architectural Mapping and Code Generation There are multiple approaches for mapping parallel algorithms to specific hardware architecture.  ... 
doi:10.18154/rwth-2021-07803 fatcat:mseuigfcufakvpvtdtkhmzf77y