UPMLIB: A Runtime System for Tuning the Memory Performance of OpenMP Programs on Scalable Shared-Memory Multiprocessors [chapter]

Dimitrios S. Nikolopoulos, Theodore S. Papatheodorou, Constantine D. Polychronopoulos, Jesús Labarta, Eduard Ayguadé
Languages, Compilers, and Run-Time Systems for Scalable Computers  
We present the design and implementation of UPMLIB, a runtime system that provides transparent facilities for dynamically tuning the memory performance of OpenMP programs on scalable shared-memory multiprocessors  ...  Our experimental evidence shows that UPMLIB makes OpenMP programs immune to the page placement strategy of the operating system, thus obviating the need for introducing data placement directives in OpenMP  ...  Conclusion This paper outlined the design and implementation of UPMLIB, a runtime system for tuning the page placement of OpenMP programs on scalable shared-memory multiprocessors, in which shared-memory  ... 
doi:10.1007/3-540-40889-4_7 dblp:conf/lcr/NikolopoulosPPLA00 fatcat:2vujezj5ircqlphs4nkbimldai

Leveraging Transparent Data Distribution in OpenMP via User-Level Dynamic Page Migration [chapter]

Dimitrios S. Nikolopoulos, Theodore S. Papatheodorou, Constantine D. Polychronopoulos, Jesús Labarta, Eduard Ayguadé
2000 Lecture Notes in Computer Science  
We have implemented a runtime system called UPMlib, which allows the compiler to inject into the application a smart user-level page migration engine.  ...  This paper describes transparent mechanisms for emulating some of the data distribution facilities offered by traditional data-parallel programming models, such as High Performance Fortran, in OpenMP.  ...  Dynamic page migration [14] is an operating system mechanism for tuning page placement on distributed shared memory multiprocessors, based on the observed memory reference traces of each program at runtime  ... 
doi:10.1007/3-540-39999-2_40 fatcat:e7tdiomq3nc6bgemkkuufaacfi

A Transparent Runtime Data Distribution Engine for OpenMP

Dimitrios S. Nikolopoulos, Theodore S. Papatheodorou, Constantine D. Polychronopoulos, Jesús Labarta, Eduard Ayguadé
2000 Scientific Programming  
Data locality and minimization of the rate of remote memory accesses are critical for sustaining high performance on these systems.  ...  First, the paper investigates the performance implications of data placement in OpenMP programs running on modern NUMA multiprocessors.  ...  An overview of OpenMP The OpenMP application programming interface (API) [30] provides a directive-based paradigm for programming parallel applications on shared-memory multiprocessors.  ... 
doi:10.1155/2000/417570 fatcat:54ksufgvujh5tpxrxlfoigiclu

Is Data Distribution Necessary in OpenMP?

D.S. Nikolopoulos, T.S. Papatheodorou, C.D. Polychronopoulos, J. Labarta, E. Ayguade
2000 ACM/IEEE SC 2000 Conference (SC'00)  
Data locality and minimization of the rate of remote memory accesses are critical for sustaining high performance on these systems.  ...  This paper investigates the performance implications of data placement in OpenMP programs running on modern ccNUMA multiprocessors.  ...  Introduction The OpenMP application programming interface [1] provides a simple and flexible means for programming parallel applications on shared memory multiprocessors.  ... 
doi:10.1109/sc.2000.10025 dblp:conf/sc/NikolopoulosPPLA00 fatcat:rtv6tp4gofgitclqgk3dennrtm

Scaling irregular parallel codes with minimal programming effort

Dimitrios S. Nikolopoulos, Constantine D. Polychronopoulos, Eduard Ayguadé
2001 Proceedings of the 2001 ACM/IEEE conference on Supercomputing (CDROM) - Supercomputing '01  
We present a simple runtime methodology for scaling irregular applications parallelized with the standard OpenMP interface.  ...  From a performance perspective, shared-memory models still fall short of scaling as well as message-passing models in irregular applications, although they require less coding effort.  ...  Acknowledgments We are grateful to the ECMWF and Siegfried Benkner for providing us with the irregular kernels.  ... 
doi:10.1145/582034.582050 dblp:conf/sc/NikolopoulosPA01 fatcat:iq75fa4my5bsjbfe5kmx4fq2te

On the Energy-Efficiency of Byte-Addressable Non-Volatile Memory

Hans Vandierendonck, Ahmad Hassan, Dimitrios S. Nikolopoulos
2015 IEEE computer architecture letters  
UPMlib: A Runtime System for Tuning the Memory Performance of OpenMP Programs on Distributed Shared Memory Multiprocessors.  ...  A Scalable Runtime for FPGA-Based Heterogeneous Exascale Hardware. In: Proceedings of the Sixth International Workshop on Runtime and Operating Systems for Supercomputers (ROSS).  ... 
doi:10.1109/lca.2014.2355195 fatcat:35mkkiczcnd5thic5aqpwodiry

Quantifying and resolving remote memory access contention on hardware DSM multiprocessors

D. S. Nikolopoulos
2002 Proceedings 16th International Parallel and Distributed Processing Symposium  
This paper makes the following contributions: It proposes a new methodology for quantifying remote memory access contention on hardware DSM multiprocessors.  ...  A trace of the memory accesses of the program obtained at runtime from hardware counters is used to compute an accurate estimate of the fraction of execution time wasted due to contention.  ...  We applied our contention resolution algorithm to the benchmarks by linking them with UPMlib [13], a runtime system developed to optimize transparently memory access locality in OpenMP programs running  ... 
doi:10.1109/ipdps.2002.1015503 dblp:conf/ipps/Nikolopoulos02 fatcat:vrcgidsqwbbuhla4yppzc6e4ty

Runtime Adaptation for Autonomic Heterogeneous Computing

Thomas R.W. Scogland, Wu-Chun Feng
2014 2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing  
The focus of this dissertation is to lay the foundation for an autonomic system for heterogeneous computing, employing runtime adaptation to improve performance portability and performance consistency  ...  accelerator based systems as well as a synthesis of the two to address multiple levels of heterogeneity as a coherent whole.  ...  Each multiprocessor has a set of caches and shared memory, a local memory space that only threads run on that multiprocessor can access.  ... 
doi:10.1109/ccgrid.2014.23 dblp:conf/ccgrid/ScoglandF14 fatcat:eat26fkykvebnfeqpuuwhhbfbu

User-level dynamic page migration for multiprogrammed shared-memory multiprocessors

D.S. Nikolopoulos, T.S. Papatheodorou, C.D. Polychronopoulos, J. Labarta, E. Ayguade
Proceedings 2000 International Conference on Parallel Processing  
This paper presents algorithms for improving the performance of parallel programs on multiprogrammed shared-memory NUMA multiprocessors, via the use of user-level dynamic page migration.  ...  The necessary page migrations can be performed as a response to scheduling events that break the implicit association between threads and their memory affinity sets.  ...  Acknowledgements This work was supported by the European Commission, through the TMR Contract ERBFMGECT-950062 and in  ... 
doi:10.1109/icpp.2000.876083 dblp:conf/icpp/NikolopoulosPPLA00 fatcat:ek2s5iyc4rfodcvxolf5jclwyu