A Study of Implicit Data Distribution Methods for OpenMP Using the SPEC Benchmarks
[chapter]
2001
Lecture Notes in Computer Science
This paper evaluates the effectiveness of using this runtime data distribution method in non embarrassingly parallel codes, such as the SPEC benchmarks. ...
The speedups are close to the theoretical maximum speedups for the problem sizes used and they are obtained with a minimal programming effort of about a couple of hours per benchmark. ...
Background UPMlib uses dynamic page migration as a tool for implicit data distribution. ...
doi:10.1007/3-540-44587-0_11
fatcat:u243gnubyfcqlce2c35fik42oe
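As context for the entry above: one common way to obtain an implicit data distribution on NUMA systems without any directives is first-touch placement, sketched below in C with OpenMP. This is a generic idiom, not UPMlib's dynamic page-migration mechanism; the array and loops are illustrative only (compile with, e.g., gcc -fopenmp).

/* First-touch placement sketch (assumption: the OS maps each page to the
 * NUMA node of the thread that first writes it). This is a static
 * alternative to the dynamic page migration described in the entry above. */
#include <stdio.h>
#include <stdlib.h>

#define N (1 << 24)

int main(void) {
    double *a = malloc(N * sizeof(double));
    if (!a) return 1;

    /* Initialization and computation use the same static schedule, so each
     * thread mostly touches pages that were first touched on its own node. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < N; i++)
        a[i] = 0.0;

    #pragma omp parallel for schedule(static)
    for (long i = 0; i < N; i++)
        a[i] += (double)i;

    printf("a[N-1] = %f\n", a[N - 1]);
    free(a);
    return 0;
}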
Large System Performance of SPEC OMP2001 Benchmarks
[chapter]
2002
Lecture Notes in Computer Science
SPEC OMP2001 is a benchmark suite intended for measuring performance of modern shared memory parallel systems. ...
The ongoing development of the SPEC OMP2001 benchmark suites is also discussed. Its main feature is the increased data set for large-scale systems. ...
Acknowledgement The authors would like to thank all of those who developed the application programs and data sets used in the benchmark. ...
doi:10.1007/3-540-47847-7_34
fatcat:cxiv3sm55fahvoibul4dz5jbhm
Programming Distributed Memory Sytems Using OpenMP
2007
2007 IEEE International Parallel and Distributed Processing Symposium
Second, we introduce a direct translation of standard OpenMP into MPI message-passing programs for execution on distributed memory systems. ...
We present a compiler algorithm to detect such repetitive data references and an API to an underlying software distributed shared memory system to orchestrate the learning and proactive reuse of communication ...
We used five Fortran programs: WUPWISE, SWIM, and APPLU from the SPEC OMP benchmarks, CG from the NAS OpenMP benchmarks, and SpMul. ...
doi:10.1109/ipdps.2007.370397
dblp:conf/ipps/BasumallikME07
fatcat:cdpbjy7ghndcxa6kh5zlryc6q4
Towards automatic translation of OpenMP to MPI
2005
Proceedings of the 19th annual international conference on Supercomputing - ICS '05
A comparison with High Performance Fortran (HPF) versions of two NAS benchmarks indicates that our translated OpenMP versions achieve 12% to 89% better performance than the HPF versions. ...
This translation aims to extend the ease of creating parallel applications with OpenMP to a wider variety of platforms, such as commodity cluster systems. ...
Our experiments used the full ref data sets (up to 2 GB) of the SPEC OMPM2001 benchmarks. ...
doi:10.1145/1088149.1088174
dblp:conf/ics/BasumallikE05
fatcat:nh3haoritfayjhkpbjvsd5eaay
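The two preceding entries describe compiler translation of OpenMP into MPI. The hand-written sketch below (plain C, a hypothetical example, not output of the authors' translator) shows the kind of mapping such a translator automates: a block-partitioned parallel loop whose implicit reduction becomes an explicit MPI_Reduce.

/* Hand-written illustration of an OpenMP loop mapped to MPI.
 * The OpenMP source would be roughly:
 *   #pragma omp parallel for reduction(+:sum)
 *   for (int i = 0; i < N; i++) sum += (double)i;
 */
#include <mpi.h>
#include <stdio.h>

#define N 1000000

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each rank owns a contiguous block of iterations. */
    int chunk = (N + size - 1) / size;
    int lo = rank * chunk;
    int hi = (lo + chunk < N) ? lo + chunk : N;

    double local = 0.0, sum = 0.0;
    for (int i = lo; i < hi; i++)
        local += (double)i;

    /* The implicit end-of-loop combination becomes an explicit reduction. */
    MPI_Reduce(&local, &sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum = %.0f\n", sum);

    MPI_Finalize();
    return 0;
}

Built with mpicc and run with mpirun; shared arrays with general access patterns require the communication analysis the papers describe, which this sketch omits.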
OpenMP compiler for distributed memory architectures
2010
Science China Information Sciences
The skeleton method [10] is used in LLCoMP to translate extended OpenMP to MPI. Because the skeleton is difficult to optimize at compile time, it does not help with discontinuous data accesses. ...
While OpenMP has the advantages of ease of use and incremental programming, message passing is still the most widely used programming model for distributed memory architectures today. ...
Ayon Basumallik from Purdue University for the discussion of Cluster OpenMP. We thank the members of the HPC lab at the University of Science and Technology Beijing. ...
doi:10.1007/s11432-010-0074-0
fatcat:xvnhb6emcrcmfdudqcppexsua4
An Optimized Reduction Design to Minimize Atomic Operations in Shared Memory Multiprocessors
2011
2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum
We report a speedup of 59.64% on the 312.swim_m SPEC OMP2001 benchmark and a speedup of 24.89% on the streamcluster benchmark from the PARSEC suite over the GCC libgomp baseline (libgomp is the OpenMP runtime library of GCC). ...
Reduction operations play a key role in modern massively data parallel computation. ...
[11] show the bottlenecks for the SPEC OMP2001 benchmarks. ...
doi:10.1109/ipdps.2011.271
dblp:conf/ipps/SpezialeBA11
fatcat:5h3qqgkyfbfzlbzndjr7ny263a
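For readers unfamiliar with the construct being optimized: below is a standard OpenMP reduction in C. Naive runtimes may combine the per-thread partial results with atomic operations, which is the overhead the entry above targets; this is a generic sketch, not the paper's optimized design.

/* Standard OpenMP reduction: each thread accumulates a private partial sum
 * and the runtime combines the partials at the end of the parallel loop. */
#include <stdio.h>

int main(void) {
    const int n = 1000000;
    double sum = 0.0;

    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += 1.0 / (double)(i + 1);   /* arbitrary per-iteration work */

    printf("sum = %f\n", sum);
    return 0;
}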
Accomodating Diversity in CMPs with Heterogeneous Frequencies
[chapter]
2009
Lecture Notes in Computer Science
For the NAS and SPEC OpenMP benchmarks, only the partitioning of loop iterations was changed to be set at run time. ...
All the SPEC-OMP benchmarks show excellent scaling, with about a third of the total instructions for each core, and only a 1% variation. The NAS benchmarks show more variation for two benchmarks. ...
doi:10.1007/978-3-540-92990-1_19
fatcat:ptrr36gzczfd5dmsbkcwddsgtu
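The entry above notes that only the loop-iteration partitioning was changed to a run-time decision. One standard OpenMP way to do that is schedule(runtime), sketched below; this is a generic illustration and may not match the paper's exact mechanism for frequency-heterogeneous cores.

/* Deferring loop partitioning to run time via schedule(runtime); the actual
 * policy is chosen at launch, e.g. OMP_SCHEDULE="dynamic,1024" ./a.out,
 * so faster cores naturally claim more chunks than slower ones. */
#include <stdio.h>

#define N (1 << 20)

static double a[N];

int main(void) {
    #pragma omp parallel for schedule(runtime)
    for (int i = 0; i < N; i++)
        a[i] = 0.5 * (double)i;

    printf("a[N-1] = %f\n", a[N - 1]);
    return 0;
}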
Quantitative Performance Analysis of the SPEC OMPM2001 Benchmarks
2003
Scientific Programming
The Standard Performance Evaluation Corporation (SPEC) has created a suite of parallel programs called SPEC OMP to compare and evaluate modern shared-memory multiprocessor systems using the OpenMP standard ...
We have studied these benchmarks in detail to understand their performance on a modern architecture. In this paper, we present detailed measurements of the benchmarks. ...
The suite contains SPEC OMPM2001 (a medium, 2 GB data set) and SPEC OMPL2001 (a large, 7 GB data set). The data set sizes define the maximum memory requirements for a single-processor run. ...
doi:10.1155/2003/401032
fatcat:e3pkx7ni2jdndfetlkz7m2zp3i
SPEC ACCEL: A Standard Application Suite for Measuring Hardware Accelerator Performance
[chapter]
2015
Lecture Notes in Computer Science
The SPEC High Performance Group (HPG) has developed a set of performance metrics to evaluate the performance and power consumption of accelerators for various science applications. ...
The new benchmark comprises two suites of applications written in OpenCL and OpenACC and measures the performance of accelerators with respect to a reference platform. ...
AMD is a trademark of Advanced Micro Devices, Inc. OpenCL is a trademark of Apple, Inc. used by permission by Khronos. ...
doi:10.1007/978-3-319-17248-4_3
fatcat:wcdquz4gqffsrihtu3olf5nuty
Source-Code-Correlated Cache Coherence Characterization of OpenMP Benchmarks
2007
IEEE Transactions on Parallel and Distributed Systems
We provide tool support to extract these reference traces and synchronization information from OpenMP threads at run-time using dynamic binary rewriting of the application executable. ...
Our quantitative results show that: (a) Cache coherence traffic can be simulated with a considerable degree of accuracy for SPMD programs, as the invalidation traffic closely matches corresponding hardware ...
IRS can use MPI, OpenMP or a mixture of both for parallelization. We use the pure OpenMP version of IRS for our study. ...
doi:10.1109/tpds.2007.1058
fatcat:5fhv5hflfjamlfn4rzrtvobrmq
PBench: A Parallel, Real-Time Benchmark Suite
2018
Academic Perspective Procedia
In this paper, we present the first version of PBench, a parallel, real-time benchmark suite. ...
For this purpose, they use benchmark applications. Today many of our computing systems are multicore and/or multiprocessor systems. ...
Acknowledgements The authors would like to acknowledge that this work is supported by the Real-Time Systems Research Laboratory [4] at Sakarya University, Faculty of Computer and Information Sciences ...
doi:10.33793/acperpro.01.01.37
fatcat:ioieemawana2fkaeqjdroxobbu
Recent Developments in the Scalasca Toolset
[chapter]
2010
Tools for High Performance Computing 2009
At the center of our activities lies the development of Scalasca, a performance-analysis tool that has been specifically designed for large-scale systems and that allows the automatic identification of ...
The situation is exacerbated by the rising number of cores imposing scalability demands not only on applications but also on the software tools needed for their development. ...
Using prototype implementations of these new tools, we evaluated the performance behavior of the SPEC MPI2007 benchmark suite on the IBM SP p690 cluster JUMP, observing a large variety of complex temporal ...
doi:10.1007/978-3-642-11261-4_4
dblp:conf/ptw/GeimerWWBBFHMS09
fatcat:wt2msn3zbravpoinn3inzlbbvm
Survey of Methodologies, Approaches, and Challenges in Parallel Programming Using High-Performance Computing Systems
2020
Scientific Programming
This paper provides a review of contemporary methodologies and APIs for parallel programming, with representative technologies selected in terms of target system type (shared memory, distributed, and hybrid ...
Such detailed analysis has led us to the identification of trends in high-performance computing and of the challenges to be addressed in the near future. ...
Conflicts of Interest: The authors declare that there are no conflicts of interest regarding the publication of this paper. ...
doi:10.1155/2020/4176794
fatcat:j52aegknyrdxzg2nopk73g3uly
An Architectural Characterization Study of Data Mining and Bioinformatics Workloads
2006
2006 IEEE International Symposium on Workload Characterization
Data mining is the process of automatically finding implicit, previously unknown, and potentially useful information from large volumes of data. ...
In this paper, we present MineBench, a publicly available benchmark suite containing fifteen representative data mining applications belonging to various categories: classification, clustering, association ...
A similar approach has been used to identify a representative workload of SPEC benchmarks [5] . ...
doi:10.1109/iiswc.2006.302730
dblp:conf/iiswc/OzisikyilmazNZMC06
fatcat:6j7r6fcsdnao7i67bh4p7qplma
A novel compiler support for automatic parallelization on multicore systems
2013
Parallel Computing
The widespread use of multicore processors is not a consequence of significant advances in parallel programming. ...
This paper proposes a new method for converting a sequential application into a parallel counterpart that can be executed on current multicore processors. ...
Section 4 details the behavior of our approach for the case studies of the benchmark suite. Section 5 presents the experimental results. Section 6 discusses related work. ...
doi:10.1016/j.parco.2013.04.003
fatcat:cbptzynydfdl3pm4rcnktfq5ji
Showing results 1 — 15 out of 162 results