
Performance Evaluation of Matrix Multiplication Using Mix Mode Optimization Techniques And Open MP For Multi-Core Processors

Yogesh Singh Rathore
2014 IOSR Journal of Engineering  
The evaluation compares a simple execution of the algorithm that uses a single thread for computation with one that uses optimization techniques and OpenMP with multiple threads.  ...  Optimization techniques reduce space requirements and ensure fast execution. OpenMP is a well-known standard that exploits parallelism in shared-memory architectures.  ...  To some extent, optimization techniques are being used individually to speed up execution and reduce memory requirements, but only for tasks of very small size on simple machines.  ... 
doi:10.9790/3021-04311922 fatcat:kzepdrng7zazxlwffmj4ar2c2m

Towards OpenMP Execution on Software Distributed Shared Memory Systems [chapter]

Ayon Basumallik, Seung-Jai Min, Rudolf Eigenmann
2002 Lecture Notes in Computer Science  
We point out pitfalls of a naive translation approach from OpenMP into the API provided by a Software DSM system, and we discuss a set of possible program optimization techniques.  ...  In this paper, we examine some of the challenges present in providing support for OpenMP applications on a Software Distributed Shared Memory(DSM) based cluster system.  ...  through a simple experiment.  ... 
doi:10.1007/3-540-47847-7_42 fatcat:z4snvupe3zal3czxr3wouq2b3i


Seyong Lee, Seung-Jai Min, Rudolf Eigenmann
2009 SIGPLAN notices  
Experimental results from two important kernels (JACOBI and SPMUL) and two NAS OpenMP Parallel Benchmarks (EP and CG) show that the described translator and compile-time optimizations work well on both  ...  In this paper, we have identified several key transformation techniques, which enable efficient GPU global memory access, to achieve high performance.  ...  Second, efficient global memory access is one of the most important targets of GPU optimizations, but simple transformation techniques, such as the ones proposed in this paper, are effective in optimizing  ... 
doi:10.1145/1594835.1504194 fatcat:wbpl7ohbzffedndc6s6tafkfny


Seyong Lee, Seung-Jai Min, Rudolf Eigenmann
2008 Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '09  
Experimental results from two important kernels (JACOBI and SPMUL) and two NAS OpenMP Parallel Benchmarks (EP and CG) show that the described translator and compile-time optimizations work well on both  ...  In this paper, we have identified several key transformation techniques, which enable efficient GPU global memory access, to achieve high performance.  ...  Second, efficient global memory access is one of the most important targets of GPU optimizations, but simple transformation techniques, such as the ones proposed in this paper, are effective in optimizing  ... 
doi:10.1145/1504176.1504194 dblp:conf/ppopp/LeeME09 fatcat:7ru27sozu5h5hhlni4w4cdx6hi

Is OpenMP for grids?

R. Eigenmann, J. Hoeflinger, R.H. Kuhn, D. Padua, A. Basumallik, Seung-Jai Min, Jiajing Zhu
2002 Proceedings 16th International Parallel and Distributed Processing Symposium  
A second part of the paper presents ideas for OpenMP extensions that enable the programmer to override the compiler whenever automatic methods fail to generate high-quality code.  ...  This paper presents an overview of an ongoing NSF-sponsored project for the study of runtime systems and compilers to support the development of efficient OpenMP parallel programs for distributed memory  ...  Some of the private data will be that identified in OpenMP private clauses and other will be identified by our OpenMP compiler as an optimization.  ... 
doi:10.1109/ipdps.2002.1016571 dblp:conf/ipps/EigenmannHKPBMZ02 fatcat:sklzgnldmbfktosjv4c7rr6vwi

OpenMP Optimization and its Translation to OpenGL

Santosh Kumar, V.M. Wadhai, Prasad S. Halgaonkar, Kiran P. Gaikwad
2010 International Journal of Computer Applications  
Programming GPGPUs is complex when compared to programming general purpose CPUs and parallel programming models such as OpenMP.  ...  Goal of our translation is to improve programmability and make existing OpenMP applications to be able to execute on GPGPUs.  ...  The OpenMP stream optimizer transforms traditional CPU oriented OpenMP programs into OpenMP programs optimized for GPGPUs, using our high-level optimization techniques: parallel loop-swap and loop-collapsing  ... 
doi:10.5120/1209-1732 fatcat:phym2pbt7vevhgqzxwoh7ekymq

An OpenMP Compiler Benchmark

Matthias S. Müller
2003 Scientific Programming  
The purpose of this benchmark is to propose several optimization techniques and to test their existence in current OpenMP compilers.  ...  Six out of seven proposed optimization techniques are already implemented in different compilers. However, most compilers implement only one or two of them.  ...  Conclusion This small benchmark contains a collection of various optimization techniques that might be implemented in OpenMP compilers.  ... 
doi:10.1155/2003/287461 fatcat:lwygisbdszcmjbef2r35hmgnne

Unrolling Loops Containing Task Parallelism [chapter]

Roger Ferrer, Alejandro Duran, Xavier Martorell, Eduard Ayguadé
2010 Lecture Notes in Computer Science  
Our aggregation technique covers the special cases where task parallelism appears inside branches or where the loop is uncountable.  ...  We present an implementation of such extended loop unrolling for OpenMP tasks with two phases: a classical unroll followed by a task aggregation phase.  ...  As a very simple optimization, if no tasks are created in the whole body of the unrolled loop no aggregated task is created either.  ... 
doi:10.1007/978-3-642-13374-9_30 fatcat:zip3nicw6nabtpjetdthrbvmom

Scaling irregular parallel codes with minimal programming effort

Dimitrios S. Nikolopoulos, Constantine D. Polychronopoulos, Eduard Ayguadé
2001 Proceedings of the 2001 ACM/IEEE conference on Supercomputing (CDROM) - Supercomputing '01  
We present a simple runtime methodology for scaling irregular applications parallelized with the standard OpenMP interface.  ...  This is probably the first time such a result is obtained from OpenMP, more so, by keeping the OpenMP API intact.  ...  As an alternative to the automatic runtime optimizations, we present a simple scheme for implementing arbitrary irregular data distributions through proper distribution of the iterations of OpenMP parallel  ... 
doi:10.1145/582034.582050 dblp:conf/sc/NikolopoulosPA01 fatcat:iq75fa4my5bsjbfe5kmx4fq2te

OpenMPC: extended OpenMP for efficient programming and tuning on GPUs

Seyong Lee, Rudolf Eigenmann
2013 International Journal of Computational Science and Engineering (IJCSE)  
In addition to a range of compiler transformations and optimizations, the system includes tuning capabilities for generating, pruning, and navigating the search space of compilation variants.  ...  Compiler Optimizations Compiler optimizations related to GPU memory accesses can be classified as follows: (1) techniques to optimize data movement between CPU and GPU, (2) techniques to optimize GPU global  ...  Transformation Techniques Supporting OpenMP-to-CUDA Translation This section explains transformation techniques that are used to address various issues arising during the OpenMP-to-CUDA translation.  ... 
doi:10.1504/ijcse.2013.052110 fatcat:eipvcpeaejghnl73jler5jafky

Reducing data access latency in SDSM systems using runtime optimizations

Javier Bueno, Xavier Martorell, Juan José Costa, Toni Cortés, Eduard Ayguadé, Guansong Zhang, Christopher Barton, Raul Silvera
2010 Proceedings of the 2010 Conference of the Center for Advanced Studies on Collaborative Research - CASCON '10  
Currently we are targeting OpenMP applications due to the ease of use this programming model provides. In this paper we show the performance of  ...  Our main research interest is to develop a set of compiler and runtime system techniques that widen the range of applications that can efficiently run on SDSM systems.  ...  Both features might seem as a performance killer for many applications, and of course the system has to pay some performance penalty for them, but there are optimization techniques that have been applied  ... 
doi:10.1145/1923947.1923965 dblp:conf/cascon/BuenoMCCAZBS10 fatcat:3glcalkqmfh25ggu5trhn3q5me

OpenMP-Based Approach for High Level C Loops Synthesis

Emna Kallel, Yassine Aoudni, Mohamed Abid
2017 International Journal of Software Innovation  
In addition, techniques to accelerate the code production process have appeared. In this context, the automatic code generation is an interesting technique for the embedded systems project.  ...  This work presents an automatic VHDL code generation method based on the OpenMP parallel programming specification.  ...  ., 2001), DWARV (OpenMP Application Program Interface, 2016) and ROCCC (Gupta, Gupta, Dutt et al., 2004) projects emphasize parallelizing transformations and some also address memory access optimizations  ... 
doi:10.4018/ijsi.2017010101 fatcat:w7hcdygvzzg7dl56usbaw45if4

The OpenTM Transactional Application Programming Interface

Woongki Baek, Chi Cao Minh, Martin Trautmann, Christos Kozyrakis, Kunle Olukotun
2007 Parallel Architecture and Compilation Techniques (PACT), Proceedings of the International Conference on  
Overall, OpenTM provides a practical and efficient TM programming environment within the familiar scope of OpenMP.  ...  The implementation builds upon the OpenMP support in the GCC compiler and includes a runtime for the C programming language. We evaluate the performance and programmability features of OpenTM.  ...  As is the case with OpenMP, the OpenTM code requires simple, high-level annotations for parallelism and memory transactions.  ... 
doi:10.1109/pact.2007.4336227 fatcat:nn7gbfngvrff5egt4jzfgurptm

Producing scalable performance with OpenMP: Experiments with two CFD applications

Jay Hoeflinger, Prasad Alavilli, Thomas Jackson, Bob Kuhn
2001 Parallel Computing  
We conclude with a list of key issues which need to be addressed to make OpenMP a more easily scalable paradigm. Some of these are OpenMP implementation issues; some are language issues.  ...  The list of incremental transformations includes well-known techniques such as loop interchange and loop fusion, plus new ones which make use of the unique features of OpenMP, such as barrier removal and  ...  Some problems are likely to be fixed in future releases of the compiler we used. Some will be aided by changes that are expected in OpenMP V2.0.  ... 
doi:10.1016/s0167-8191(00)00071-5 fatcat:otjwibnz3vb6vkpgsi5aat47ae

OpenMPC: Extended OpenMP Programming and Tuning for GPUs

Seyong Lee, Rudolf Eigenmann
2010 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis  
and optimizations.  ...  This paper proposes a new programming interface, called OpenMPC, which builds on OpenMP to provide an abstraction of the complex CUDA programming model and offers high-level controls of the involved parameters  ...  Compiler Optimizations Our translation system includes several optimizations of GPU memory accesses: • Techniques to optimize data movement between CPU and GPU • Techniques to optimize GPU global memory  ... 
doi:10.1109/sc.2010.36 dblp:conf/sc/LeeE10 fatcat:gsjpvpy4bbaz5ou4ahygtcgwq4