Performance Evaluation of Matrix Multiplication Using Mix Mode Optimization Techniques And Open MP For Multi-Core Processors
2014
IOSR Journal of Engineering
The evaluation compares a simple execution of the algorithm, which uses a single thread for computation, with one that uses optimization techniques and OpenMP with multiple threads. ...
Optimization techniques reduce space requirements and ensure fast execution. OpenMP is a well-known standard that exploits parallelism in shared-memory architectures. ...
To some extent, optimization techniques are being used individually to speed up execution and reduce memory requirements, but only for tasks of very small size on simple machines.
VI. ...
doi:10.9790/3021-04311922
fatcat:kzepdrng7zazxlwffmj4ar2c2m
Towards OpenMP Execution on Software Distributed Shared Memory Systems
[chapter]
2002
Lecture Notes in Computer Science
We point out pitfalls of a naive translation approach from OpenMP into the API provided by a Software DSM system, and we discuss a set of possible program optimization techniques. ...
In this paper, we examine some of the challenges present in providing support for OpenMP applications on a Software Distributed Shared Memory (DSM) based cluster system. ...
through a simple experiment. ...
doi:10.1007/3-540-47847-7_42
fatcat:z4snvupe3zal3czxr3wouq2b3i
OpenMP to GPGPU
2009
SIGPLAN notices
Experimental results from two important kernels (JACOBI and SPMUL) and two NAS OpenMP Parallel Benchmarks (EP and CG) show that the described translator and compile-time optimizations work well on both ...
In this paper, we have identified several key transformation techniques, which enable efficient GPU global memory access, to achieve high performance. ...
Second, efficient global memory access is one of the most important targets of GPU optimizations, but simple transformation techniques, such as the ones proposed in this paper, are effective in optimizing ...
doi:10.1145/1594835.1504194
fatcat:wbpl7ohbzffedndc6s6tafkfny
OpenMP to GPGPU
2008
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '09
Experimental results from two important kernels (JACOBI and SPMUL) and two NAS OpenMP Parallel Benchmarks (EP and CG) show that the described translator and compile-time optimizations work well on both ...
In this paper, we have identified several key transformation techniques, which enable efficient GPU global memory access, to achieve high performance. ...
Second, efficient global memory access is one of the most important targets of GPU optimizations, but simple transformation techniques, such as the ones proposed in this paper, are effective in optimizing ...
doi:10.1145/1504176.1504194
dblp:conf/ppopp/LeeME09
fatcat:7ru27sozu5h5hhlni4w4cdx6hi
Is OpenMP for grids?
2002
Proceedings 16th International Parallel and Distributed Processing Symposium
A second part of the paper presents ideas for OpenMP extensions that enable the programmer to override the compiler whenever automatic methods fail to generate high-quality code. ...
This paper presents an overview of an ongoing NSF-sponsored project for the study of runtime systems and compilers to support the development of efficient OpenMP parallel programs for distributed memory ...
Some of the private data will be that identified in OpenMP private clauses and other data will be identified by our OpenMP compiler as an optimization. ...
doi:10.1109/ipdps.2002.1016571
dblp:conf/ipps/EigenmannHKPBMZ02
fatcat:sklzgnldmbfktosjv4c7rr6vwi
OpenMP Optimization and its Translation to OpenGL
2010
International Journal of Computer Applications
Programming GPGPUs is complex when compared to programming general purpose CPUs and parallel programming models such as OpenMP. ...
The goal of our translation is to improve programmability and enable existing OpenMP applications to execute on GPGPUs. ...
The OpenMP stream optimizer transforms traditional CPU-oriented OpenMP programs into OpenMP programs optimized for GPGPUs, using our high-level optimization techniques: parallel loop-swap and loop-collapsing ...
doi:10.5120/1209-1732
fatcat:phym2pbt7vevhgqzxwoh7ekymq
An OpenMP Compiler Benchmark
2003
Scientific Programming
The purpose of this benchmark is to propose several optimization techniques and to test their existence in current OpenMP compilers. ...
Six out of seven proposed optimization techniques are already implemented in different compilers. However, most compilers implement only one or two of them. ...
Conclusion: This small benchmark contains a collection of various optimization techniques that might be implemented in OpenMP compilers. ...
doi:10.1155/2003/287461
fatcat:lwygisbdszcmjbef2r35hmgnne
Unrolling Loops Containing Task Parallelism
[chapter]
2010
Lecture Notes in Computer Science
Our aggregation technique covers the special cases where task parallelism appears inside branches or where the loop is uncountable. ...
We present an implementation of such extended loop unrolling for OpenMP tasks with two phases: a classical unroll followed by a task aggregation phase. ...
As a very simple optimization, if no tasks are created in the whole body of the unrolled loop, no aggregated task is created either. ...
doi:10.1007/978-3-642-13374-9_30
fatcat:zip3nicw6nabtpjetdthrbvmom
Scaling irregular parallel codes with minimal programming effort
2001
Proceedings of the 2001 ACM/IEEE conference on Supercomputing (CDROM) - Supercomputing '01
We present a simple runtime methodology for scaling irregular applications parallelized with the standard OpenMP interface. ...
This is probably the first time such a result has been obtained with OpenMP, and moreover while keeping the OpenMP API intact. ...
As an alternative to the automatic runtime optimizations, we present a simple scheme for implementing arbitrary irregular data distributions through proper distribution of the iterations of OpenMP parallel ...
doi:10.1145/582034.582050
dblp:conf/sc/NikolopoulosPA01
fatcat:iq75fa4my5bsjbfe5kmx4fq2te
OpenMPC: extended OpenMP for efficient programming and tuning on GPUs
2013
International Journal of Computational Science and Engineering (IJCSE)
In addition to a range of compiler transformations and optimizations, the system includes tuning capabilities for generating, pruning, and navigating the search space of compilation variants. ...
Compiler Optimizations: Compiler optimizations related to GPU memory accesses can be classified as follows: (1) techniques to optimize data movement between CPU and GPU, (2) techniques to optimize GPU global ...
Transformation Techniques Supporting OpenMP-to-CUDA Translation: This section explains transformation techniques that are used to address various issues arising during the OpenMP-to-CUDA translation. ...
doi:10.1504/ijcse.2013.052110
fatcat:eipvcpeaejghnl73jler5jafky
Reducing data access latency in SDSM systems using runtime optimizations
2010
Proceedings of the 2010 Conference of the Center for Advanced Studies on Collaborative Research - CASCON '10
Currently we are targeting OpenMP applications due to the ease of use this programming model provides. In this paper we show the performance of ...
Our main research interest is to develop a set of compiler and runtime system techniques that widen the range of applications that can efficiently run on SDSM systems. ...
Both features might seem like performance killers for many applications, and of course the system has to pay some performance penalty for them, but there are optimization techniques that have been applied ...
doi:10.1145/1923947.1923965
dblp:conf/cascon/BuenoMCCAZBS10
fatcat:3glcalkqmfh25ggu5trhn3q5me
OpenMP-Based Approach for High Level C Loops Synthesis
2017
International Journal of Software Innovation
In addition, techniques to accelerate the code production process have appeared. In this context, automatic code generation is an interesting technique for embedded systems projects. ...
This work presents an automatic VHDL code generation method based on the OpenMP parallel programming specification. ...
., 2001), DWARV (OpenMP Application Program Interface, 2016) and ROCCC (Gupta, Gupta, Dutt et al., 2004) projects emphasize parallelizing transformations and some also address memory access optimizations ...
doi:10.4018/ijsi.2017010101
fatcat:w7hcdygvzzg7dl56usbaw45if4
The OpenTM Transactional Application Programming Interface
2007
Parallel Architecture and Compilation Techniques (PACT), Proceedings of the International Conference on
Overall, OpenTM provides a practical and efficient TM programming environment within the familiar scope of OpenMP. ...
The implementation builds upon the OpenMP support in the GCC compiler and includes a runtime for the C programming language. We evaluate the performance and programmability features of OpenTM. ...
As is the case with OpenMP, the OpenTM code requires simple, high-level annotations for parallelism and memory transactions. ...
doi:10.1109/pact.2007.4336227
fatcat:nn7gbfngvrff5egt4jzfgurptm
Producing scalable performance with OpenMP: Experiments with two CFD applications
2001
Parallel Computing
We conclude with a list of key issues which need to be addressed to make OpenMP a more easily scalable paradigm. Some of these are OpenMP implementation issues; some are language issues. ...
The list of incremental transformations includes well-known techniques such as loop interchange and loop fusion, plus new ones which make use of the unique features of OpenMP, such as barrier removal and ...
Some problems are likely to be fixed in future releases of the compiler we used. Some will be aided by changes that are expected in OpenMP V2.0. ...
doi:10.1016/s0167-8191(00)00071-5
fatcat:otjwibnz3vb6vkpgsi5aat47ae
OpenMPC: Extended OpenMP Programming and Tuning for GPUs
2010
2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
and optimizations. ...
This paper proposes a new programming interface, called OpenMPC, which builds on OpenMP to provide an abstraction of the complex CUDA programming model and offers high-level controls of the involved parameters ...
Compiler Optimizations: Our translation system includes several optimizations of GPU memory accesses: (1) techniques to optimize data movement between CPU and GPU, (2) techniques to optimize GPU global memory ...
doi:10.1109/sc.2010.36
dblp:conf/sc/LeeE10
fatcat:gsjpvpy4bbaz5ou4ahygtcgwq4