7,734 Hits in 4.8 sec

Exploiting wavefront parallelism on large-scale shared-memory multiprocessors

N. Manjikian, T.S. Abdelrahman
2001 IEEE Transactions on Parallel and Distributed Systems  
In this paper, we show that on large-scale shared-memory multiprocessors, locality is a crucial factor.  ...  Static scheduling outperforms dynamic self-scheduling by a factor of up to 2.3 on 30 processors.  ...  Access to the HP/Convex SPP1000 multiprocessor was provided by the Center for Parallel Computing at the University of Michigan. The authors also thank the anonymous referees for their comments.  ... 
doi:10.1109/71.914756 fatcat:2ucmp4mx2vcuxmbyt4mx7ftsey
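The static-versus-dynamic comparison reported above can be illustrated with a minimal Python sketch (not the paper's code; the function names and thread-per-worker structure are illustrative):

```python
import threading
from queue import Queue, Empty

def run_static(num_iters, num_workers, body):
    """Static scheduling: each worker gets one contiguous block of
    iterations, fixed before the loop starts (good locality)."""
    chunk = (num_iters + num_workers - 1) // num_workers
    def worker(wid):
        for i in range(wid * chunk, min((wid + 1) * chunk, num_iters)):
            body(i)
    threads = [threading.Thread(target=worker, args=(w,)) for w in range(num_workers)]
    for t in threads: t.start()
    for t in threads: t.join()

def run_self_scheduled(num_iters, num_workers, body):
    """Dynamic self-scheduling: workers pull one iteration at a time
    from a shared queue (good load balance, poor locality)."""
    q = Queue()
    for i in range(num_iters):
        q.put(i)
    def worker():
        while True:
            try:
                body(q.get_nowait())
            except Empty:
                return
    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads: t.start()
    for t in threads: t.join()
```

Static blocks keep each worker on contiguous iterations (better cache and page reuse); the shared queue balances load at the cost of locality and queue contention, which is the trade-off behind the reported 2.3x gap.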

Using processor affinity in loop scheduling on shared-memory multiprocessors

E.P. Markatos, T.J. LeBlanc
1994 IEEE Transactions on Parallel and Distributed Systems  
We conclude that loop scheduling algorithms for shared-memory multiprocessors cannot afford to ignore the location of data, particularly in light of the increasing disparity between processor and memory  ...  We show that traditional algorithms for loop scheduling, which ignore the location of data when assigning iterations to processors, incur a significant performance penalty on modern shared-memory multiprocessors  ...  In many shared-memory multiprocessor systems, a single ready queue is the primary mechanism for process scheduling [30, 29, 10, 3].  ... 
doi:10.1109/71.273046 fatcat:c6qwvhnjezh7fihxnhckix245y
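Processor-affinity scheduling as described above can be sketched roughly as follows (a simplification that steals one iteration at a time rather than the chunked stealing of the published algorithm; all names are illustrative):

```python
import threading
from collections import deque

def run_affinity(num_iters, num_workers, body):
    """Affinity scheduling sketch: iterations are partitioned into
    per-worker queues, so repeated executions of the loop tend to
    find their data in the same processor's cache; an idle worker
    steals work from the longest remaining queue."""
    queues = [deque() for _ in range(num_workers)]
    for i in range(num_iters):
        queues[i * num_workers // num_iters].append(i)
    lock = threading.Lock()

    def worker(wid):
        while True:
            with lock:
                if queues[wid]:
                    i = queues[wid].popleft()
                else:
                    # steal from the worker with the most iterations left
                    victim = max(range(num_workers), key=lambda w: len(queues[w]))
                    if not queues[victim]:
                        return          # all queues drained
                    i = queues[victim].popleft()
            body(i)

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(num_workers)]
    for t in threads: t.start()
    for t in threads: t.join()
```

The per-worker queues preserve iteration-to-processor affinity across loop invocations, while stealing caps the load imbalance that pure static partitioning would suffer.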

Performance Implications of Synchronization Support for Parallel Fortran Programs

S. Anik, W.M.W. Hwu
1994 Journal of Parallel and Distributed Computing  
The needs are due to task scheduling, iteration scheduling, barriers, and data dependence handling. We present synchronization algorithms for efficient execution of programs with nested parallel loops.  ...  Lastly, we ran experiments to quantify the impact of various architectural support on the performance of a bus-based shared memory multiprocessor running automatically parallelized numerical programs.  ...  The model is then used to explain the differences in the iteration scheduling overhead of different synchronization primitives for a simulated shared-memory multiprocessor.  ... 
doi:10.1006/jpdc.1994.1081 fatcat:mtoqnucxfbb6xebywgokkuwr2u
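The barrier synchronization this entry analyzes, applied to successive phases of a nested parallel loop, might look like the following minimal sketch using Python's standard `threading.Barrier` (illustrative only; these are not the paper's primitives):

```python
import threading

def parallel_phases(num_workers, phases):
    """Barrier-based phase scheduling: every worker must finish
    phase k (e.g. one parallel loop) before any worker may start
    phase k+1, enforcing the cross-iteration dependences."""
    barrier = threading.Barrier(num_workers)
    def worker(wid):
        for phase in phases:
            phase(wid)          # this worker's share of the loop body
            barrier.wait()      # rendezvous with the other workers
    threads = [threading.Thread(target=worker, args=(w,)) for w in range(num_workers)]
    for t in threads: t.start()
    for t in threads: t.join()
```

The cost of each `barrier.wait()` is exactly the kind of synchronization overhead whose hardware support the paper quantifies.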

The impact of parallel loop scheduling strategies on prefetching in a shared memory multiprocessor

D.J. Lilja
1994 IEEE Transactions on Parallel and Distributed Systems  
Trace-driven simulations of numerical Fortran programs are used to study the impact of the parallel loop scheduling strategy on data prefetching in a shared memory multiprocessor with private data caches  ...  The simulations indicate that to maximize memory performance it is important to schedule blocks of consecutive iterations to execute on each processor, and then to adaptively prefetch single-word cache  ...  Support for this work was provided in part by the National Science Foundation under grants CCR-9209458 and MIP-9221900.  ... 
doi:10.1109/71.285604 fatcat:xgdco7egunbwfltpeu7xyd4kqu

A hybrid scheme for efficiently executing nested loops on multiprocessors

Chien-Min Wang, Sheng-De Wang
1992 Parallel Computing  
Wang, A hybrid scheme for efficiently executing nested loops on multiprocessors, Parallel Computing 18 (1992) 625-637.  ...  In this paper, we address the problem of scheduling parallel processors for efficiently executing nested loops.  ...  Background The multiprocessor system considered in this paper is a shared memory multiprocessor system that contains p identical processors.  ... 
doi:10.1016/0167-8191(92)90003-p fatcat:oyy52sngfbckdf43su6mp37rui

FORTRAN for clusters of IBM ES/3090 multiprocessors

R. J. Sahulka, E. C. Plachy, L. J. Scarborough, R. G. Scarborough, S. W. White
1991 IBM Systems Journal  
IBM ES/3090 multiprocessors are tightly-coupled, shared-memory multiprocessor systems that support up to six processors and share a global memory; each of these processors may be  ...  that allow FORTRAN jobs to use all of the processors of the cluster. These two IBM 3090 multiprocessors are distributed, since there is no shared memory between the two multiprocessors.  ... 
doi:10.1147/sj.303.0296 fatcat:mudoyuljenc3boqmzdwealzry4

Control-theoretic adaptive cache-fair scheduling of chip multiprocessor systems

Huseyin G Arslan, Yu-Chu Tian, Fenglian Li, Chen Peng, Min-Rui Fei
2017 Transactions of the Institute of Measurement and Control  
Control-theoretic adaptive cache-fair scheduling of chip multiprocessor systems. Transactions of the Institute of Measurement and Control, 40(10), pp. 3095-3104.  ...  As multiprocessors in a CMP system share the limited memory, memory awareness becomes critical in achieving satisfactory scheduling performance of the system.  ...  Recently, with the proliferation of CMP architecture, multiprocessor scheduling faces new challenges in scheduling resources with share memory access constraints (Schliecker et al. 2009) .  ... 
doi:10.1177/0142331217715064 fatcat:deydkzyqivhklfddswjd6dwh3u

Evaluation of design alternatives for a multiprocessor microprocessor

Basem A. Nayfeh, Lance Hammond, Kunle Olukotun
1996 SIGARCH Computer Architecture News  
In the future, advanced integrated circuit processing and packaging technology will allow for several design options for multiprocessor microprocessors.  ...  In this paper we consider three architectures: shared-primary cache, shared-secondary cache, and shared-memory.  ...  The second set of results include the effects of dynamic scheduling, speculative execution and nonblocking memory references.  ... 
doi:10.1145/232974.232982 fatcat:xbb7j5le5jcifhoa74rw44ive4

Evaluation of design alternatives for a multiprocessor microprocessor

Basem A. Nayfeh, Lance Hammond, Kunle Olukotun
1996 Proceedings of the 23rd annual international symposium on Computer architecture - ISCA '96  
In the future, advanced integrated circuit processing and packaging technology will allow for several design options for multiprocessor microprocessors.  ...  In this paper we consider three architectures: shared-primary cache, shared-secondary cache, and shared-memory.  ...  The second set of results include the effects of dynamic scheduling, speculative execution and nonblocking memory references.  ... 
doi:10.1145/232973.232982 dblp:conf/isca/NayfehHO96 fatcat:qbw6ybk4ljcapa66vs32osa73e

A compiler framework for optimization of affine loop nests for gpgpus

Muthu Manikandan Baskaran, Uday Bondhugula, Sriram Krishnamoorthy, J. Ramanujam, Atanas Rountev, P. Sadayappan
2008 Proceedings of the 22nd annual international conference on Supercomputing - ICS '08  
factors for conflict-minimal data access from GPU shared memory; and 3) model-driven empirical search to determine optimal parameters for unrolling and tiling.  ...  In this paper, a number of issues are addressed towards the goal of developing a compiler framework for automatic parallelization and performance optimization of affine loop nests on GPGPUs: 1) approach  ...  The shared memory and the register bank in a multiprocessor are dynamically partitioned among the active thread blocks on that multiprocessor.  ... 
doi:10.1145/1375527.1375562 dblp:conf/ics/BaskaranBKRRS08 fatcat:x6rdnmlkvzaw7jfcet3pxzsewi

Exploiting the parallelism available in loops

D.J. Lilja
1994 Computer  
In addition, several techniques for scheduling independent loop iterations on a shared memory multiprocessor are described.  ...  Loops can provide a rich opportunity for exploiting parallelism since the body of a loop may be executed many times.  ...  In coarse-grained shared memory multiprocessors, loops with dependences between  ...  (p = number of processors in a shared memory multiprocessor; T_L = sequential execution time of a single iteration)  ... 
doi:10.1109/2.261915 fatcat:a5eirnpuobhqpexqua5243fzqi
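One of the iteration-scheduling strategies such surveys cover, guided self-scheduling, is easy to show as a short worked example (an illustrative sketch, not code from the article):

```python
import math

def gss_chunks(num_iters, num_procs):
    """Guided self-scheduling: each scheduling step hands out
    ceil(R/p) iterations, where R is the count still unassigned,
    so chunks shrink from large (low scheduling overhead) toward
    single iterations (good load balance at the loop's end)."""
    chunks, remaining = [], num_iters
    while remaining > 0:
        c = math.ceil(remaining / num_procs)
        chunks.append(c)
        remaining -= c
    return chunks

# e.g. 100 iterations on 4 processors:
# gss_chunks(100, 4) -> [25, 19, 14, 11, 8, 6, 5, 3, 3, 2, 1, 1, 1, 1]
```

Only 14 scheduling operations cover 100 iterations, versus 100 for one-at-a-time self-scheduling, while the trailing size-1 chunks still smooth out load imbalance.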

"No-Compile-Time Knowledge" distribution of finite element computations on multiprocessors

J. Erhel, M. Hahad, T. Priol
1996 Proceedings of HICSS-29: 29th Hawaii International Conference on System Sciences  
This paper addresses partitioning and scheduling of irregular loops arising in finite element computations on unstructured meshes.  ...  We introduce the concept of "Conditioned Iterations Loop" which distributes the iterations dynamically according to a runtime condition. This technique is improved by a learning approach.  ...  Irregular loop patterns compilation on distributed shared memory multiprocessors. In International Conference on Parallel Processing, Oconomowoc, Wisconsin, August 1995.  ... 
doi:10.1109/hicss.1996.495512 dblp:conf/hicss/ErhelHP96 fatcat:r4szgys2unbz5er2t6mkgyz3ia

Enhancing the performance of autoscheduling in Distributed Shared Memory multiprocessors [chapter]

Dimitrios S. Nikolopoulos, Eleftherios D. Polychronopoulos, Theodore S. Papatheodorou
1998 Lecture Notes in Computer Science  
scheduling of parallel tasks, and dynamic program adaptability on multiprogrammed shared memory multiprocessors.  ...  This paper presents a technique that enhances the performance of autoscheduling in Distributed Shared Memory (DSM) multiprocessors, targeting mainly medium and large scale systems, where poor data  ...  Acknowledgements We would like to thank Constantine Polychronopoulos for his support and valuable comments, the European Center for Parallelism in Barcelona (CEPBA) for providing us access to their Origin2000  ... 
doi:10.1007/bfb0057892 fatcat:sdudtgqxeje3jhwt56fxhzipwq

OpenMP

Mitsuhisa Sato
2002 Proceedings of the 15th international symposium on System Synthesis - ISSS '02  
latency communication and to support flexible thread scheduling for nested dynamic parallelism.  ...  The programmer may add parallelization directives to loops or statements in the program. OpenMP is currently used for high performance computing applications running on shared memory multiprocessors.  ... 
doi:10.1145/581199.581224 fatcat:eq3h2gerc5dbthk3wvmxr5u2la

A Scratch-Pad Memory Aware Dynamic Loop Scheduling Algorithm

Ozcan Ozturk, Mahmut Kandemir, Sri Hari Krishna Narayanan
2008 9th International Symposium on Quality Electronic Design (isqed 2008)  
In comparison, this paper proposes the first dynamic loop scheduler, to our knowledge, that targets scratch-pad memory (SPM) based chip multiprocessors, and presents an experimental evaluation of it.  ...  Therefore, the proposed dynamic scheduler takes advantage of the SPM in performing the loop iteration-to-processor mapping.  ...  In this paper, we present and experimentally evaluate an SPM aware dynamic loop scheduling scheme in the context of chip multiprocessors.  ... 
doi:10.1109/isqed.2008.4479830 dblp:conf/isqed/OzturkKN08 fatcat:txddxmr47nffzmjks64jnwawba
Showing results 1 — 15 out of 7,734 results