
A Study of Implicit Data Distribution Methods for OpenMP Using the SPEC Benchmarks [chapter]

Dimitrios S. Nikolopoulos, Eduard Ayguadé
2001 Lecture Notes in Computer Science  
This paper evaluates the effectiveness of using this runtime data distribution method in non embarrassingly parallel codes, such as the SPEC benchmarks.  ...  The speedups are close to the theoretical maximum speedups for the problem sizes used and they are obtained with a minimal programming effort of about a couple of hours per benchmark.  ...  Background UPMlib uses dynamic page migration as a tool for implicit data distribution.  ... 
doi:10.1007/3-540-44587-0_11 fatcat:u243gnubyfcqlce2c35fik42oe

Large System Performance of SPEC OMP2001 Benchmarks [chapter]

Hideki Saito, Greg Gaertner, Wesley Jones, Rudolf Eigenmann, Hidetoshi Iwashita, Ron Lieberman, Matthijs van Waveren, Brian Whitney
2002 Lecture Notes in Computer Science  
SPEC OMP2001 is a benchmark suite intended for measuring performance of modern shared memory parallel systems.  ...  The ongoing development of the SPEC OMP2001 benchmark suites is also discussed. Its main feature is the increased data set for large-scale systems.  ...  Acknowledgement The authors would like to thank all of those who developed the application programs and data sets used in the benchmark.  ... 
doi:10.1007/3-540-47847-7_34 fatcat:cxiv3sm55fahvoibul4dz5jbhm

Programming Distributed Memory Sytems Using OpenMP

Ayon Basumallik, Seung-Jai Min, Rudolf Eigenmann
2007 2007 IEEE International Parallel and Distributed Processing Symposium  
Second, we introduce a direct translation of standard OpenMP into MPI message-passing programs for execution on distributed memory systems.  ...  We present a compiler algorithm to detect such repetitive data references and an API to an underlying software distributed shared memory system to orchestrate the learning and proactive reuse of communication  ...  We used five Fortran programs: WUPWISE, SWIM, and APPLU from the SPEC OMP benchmarks and CG from the NAS OpenMP benchmarks and SpMul.  ... 
doi:10.1109/ipdps.2007.370397 dblp:conf/ipps/BasumallikME07 fatcat:cdpbjy7ghndcxa6kh5zlryc6q4

Towards automatic translation of OpenMP to MPI

Ayon Basumallik, Rudolf Eigenmann
2005 Proceedings of the 19th annual international conference on Supercomputing - ICS '05  
A comparison with High Performance Fortran (HPF) versions of two NAS benchmarks indicates that our translated OpenMP versions achieve 12% to 89% better performance than the HPF versions.  ...  This translation aims to extend the ease of creating parallel applications with OpenMP to a wider variety of platforms, such as commodity cluster systems.  ...  Our experiments used the full ref data sets (up to 2 GB) of the SPEC OMPM2001 benchmarks.  ... 
doi:10.1145/1088149.1088174 dblp:conf/ics/BasumallikE05 fatcat:nh3haoritfayjhkpbjvsd5eaay

OpenMP compiler for distributed memory architectures

Jue Wang, ChangJun Hu, JiLin Zhang, JianJiang Li
2010 Science China Information Sciences  
Skeleton method [10] is used in LLCoMP to translate extended OpenMP to MPI. Because the skeleton is difficult for compiling optimization, it has no effect on discontinuous data accesses.  ...  While OpenMP has advantages on its ease of use and incremental programming, message passing is today still the most widely-used programming model for distributed memory architectures.  ...  Ayon Basumallik from Purdue University for the discussion of Cluster OpenMP. We thank the members from the HPC lab of University of Science and Technology Beijing.  ... 
doi:10.1007/s11432-010-0074-0 fatcat:xvnhb6emcrcmfdudqcppexsua4

An Optimized Reduction Design to Minimize Atomic Operations in Shared Memory Multiprocessors

Ettore Speziale, Andrea di Biagio, Giovanni Agosta
2011 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum  
We report a speedup of 59.64% on the 312.swim_m SPEC OMP2001 benchmark and a speedup of 24.89% on the streamcluster benchmark from the PARSEC suite over the GCC libgomp baseline. 1 libgomp is the OpenMP  ...  Reduction operations play a key role in modern massively data parallel computation.  ...  [11] show the bottlenecks for the SPEC OMP2001 benchmarks.  ... 
doi:10.1109/ipdps.2011.271 dblp:conf/ipps/SpezialeBA11 fatcat:5h3qqgkyfbfzlbzndjr7ny263a

Accommodating Diversity in CMPs with Heterogeneous Frequencies [chapter]

Major Bhadauria, Vince Weaver, Sally A. McKee
2009 Lecture Notes in Computer Science  
For the NAS and SPEC OpenMP benchmarks, only the partitioning of loop iterations was changed to be set at run time.  ...  All the SPEC-OMP benchmarks show excellent scaling, with about a third of the total instructions for each core, and only a 1% variation. The NAS benchmarks show more variation for two benchmarks.  ... 
doi:10.1007/978-3-540-92990-1_19 fatcat:ptrr36gzczfd5dmsbkcwddsgtu

Quantitative Performance Analysis of the SPEC OMPM2001 Benchmarks

Vishal Aslot, Rudolf Eigenmann
2003 Scientific Programming  
The Standard Performance Evaluation Corporation (SPEC) has created a suite of parallel programs called SPEC OMP to compare and evaluate modern shared-memory multiprocessor systems using the OpenMP standard  ...  We have studied these benchmarks in detail to understand their performance on a modern architecture. In this paper, we present detailed measurements of the benchmarks.  ...  The suite contains SPEC OMPM2001 (a medium, 2GB data set) and SPEC OMPL2001 (a large, 7GB dataset). The data set sizes define the maximum memory requirements for a single-processor run.  ... 
doi:10.1155/2003/401032 fatcat:e3pkx7ni2jdndfetlkz7m2zp3i

SPEC ACCEL: A Standard Application Suite for Measuring Hardware Accelerator Performance [chapter]

Guido Juckeland, William Brantley, Sunita Chandrasekaran, Barbara Chapman, Shuai Che, Mathew Colgrove, Huiyu Feng, Alexander Grund, Robert Henschel, Wen-Mei W. Hwu, Huian Li, Matthias S. Müller (+12 others)
2015 Lecture Notes in Computer Science  
The SPEC High Performance Group (HPG) has developed a set of performance metrics to evaluate the performance and power consumption of accelerators for various science applications.  ...  The new benchmark comprises two suites of applications written in OpenCL and OpenACC and measures the performance of accelerators with respect to a reference platform.  ...  AMD is a trademark of Advanced Micro Devices, Inc. OpenCL is a trademark of Apple, Inc. used by permission by Khronos.  ... 
doi:10.1007/978-3-319-17248-4_3 fatcat:wcdquz4gqffsrihtu3olf5nuty

Source-Code-Correlated Cache Coherence Characterization of OpenMP Benchmarks

Jaydeep Marathe, Frank Mueller
2007 IEEE Transactions on Parallel and Distributed Systems  
We provide tool support to extract these reference traces and synchronization information from OpenMP threads at run-time using dynamic binary rewriting of the application executable.  ...  Our quantitative results show that: (a) Cache coherence traffic can be simulated with a considerable degree of accuracy for SPMD programs, as the invalidation traffic closely matches corresponding hardware  ...  IRS can use MPI, OpenMP or a mixture of both for parallelization. We use the pure OpenMP version of IRS for our study.  ... 
doi:10.1109/tpds.2007.1058 fatcat:5fhv5hflfjamlfn4rzrtvobrmq

PBench: A Parallel, Real-Time Benchmark Suite

Sevil Serttaş, Veysel Harun Şahin
2018 Academic Perspective Procedia  
In this paper, we present the first version of PBench, a parallel, real-time benchmark suite.  ...  For this purpose, they use benchmark applications. Today many of our computing systems are multicore and/or multiprocessor systems.  ...  Acknowledgements The authors would like to acknowledge that this work is supported by the Real-Time Systems Research Laboratory [4] at Sakarya University, Faculty of Computer and Information Sciences  ... 
doi:10.33793/acperpro.01.01.37 fatcat:ioieemawana2fkaeqjdroxobbu

Recent Developments in the Scalasca Toolset [chapter]

Markus Geimer, Felix Wolf, Brian J. N. Wylie, Daniel Becker, David Böhme, Wolfgang Frings, Marc-André Hermanns, Bernd Mohr, Zoltán Szebenyi
2010 Tools for High Performance Computing 2009  
At the center of our activities lies the development of Scalasca, a performance-analysis tool that has been specifically designed for large-scale systems and that allows the automatic identification of  ...  The situation is exacerbated by the rising number of cores imposing scalability demands not only on applications but also on the software tools needed for their development.  ...  Using prototype implementations of these new tools, we evaluated the performance behavior of the SPEC MPI2007 benchmark suite on the IBM SP p690 cluster JUMP, observing a large variety of complex temporal  ... 
doi:10.1007/978-3-642-11261-4_4 dblp:conf/ptw/GeimerWWBBFHMS09 fatcat:wt2msn3zbravpoinn3inzlbbvm

Survey of Methodologies, Approaches, and Challenges in Parallel Programming Using High-Performance Computing Systems

Paweł Czarnul, Jerzy Proficz, Krzysztof Drypczewski
2020 Scientific Programming  
This paper provides a review of contemporary methodologies and APIs for parallel programming, with representative technologies selected in terms of target system type (shared memory, distributed, and hybrid  ...  Such detailed analysis has led us to the identification of trends in high-performance computing and of the challenges to be addressed in the near future.  ...  Conflicts of Interest The authors declare that there are no conflicts of interest regarding the publication of this paper.  ... 
doi:10.1155/2020/4176794 fatcat:j52aegknyrdxzg2nopk73g3uly

An Architectural Characterization Study of Data Mining and Bioinformatics Workloads

Berkin Ozisikyilmaz, Ramanathan Narayanan, Joseph Zambreno, Gokhan Memik, Alok Choudhary
2006 2006 IEEE International Symposium on Workload Characterization  
Data mining is the process of automatically finding implicit, previously unknown, and potentially useful information from large volumes of data.  ...  In this paper, we present MineBench, a publicly available benchmark suite containing fifteen representative data mining applications belonging to various categories: classification, clustering, association  ...  A similar approach has been used to identify a representative workload of SPEC benchmarks [5].  ... 
doi:10.1109/iiswc.2006.302730 dblp:conf/iiswc/OzisikyilmazNZMC06 fatcat:6j7r6fcsdnao7i67bh4p7qplma

A novel compiler support for automatic parallelization on multicore systems

José M. Andión, Manuel Arenaz, Gabriel Rodríguez, Juan Touriño
2013 Parallel Computing  
The widespread use of multicore processors is not a consequence of significant advances in parallel programming.  ...  This paper proposes a new method for converting a sequential application into a parallel counterpart that can be executed on current multicore processors.  ...  Section 4 details the behavior of our approach for the case studies of the benchmark suite. Section 5 presents the experimental results. Section 6 discusses related work.  ... 
doi:10.1016/j.parco.2013.04.003 fatcat:cbptzynydfdl3pm4rcnktfq5ji