Exploiting wavefront parallelism on large-scale shared-memory multiprocessors
2001
IEEE Transactions on Parallel and Distributed Systems
In this paper, we show that on large-scale shared-memory multiprocessors, locality is a crucial factor. ...
Static scheduling outperforms dynamic self-scheduling by a factor of up to 2.3 on 30 processors. ...
Access to the HP/Convex SPP1000 multiprocessor was provided by the Center for Parallel Computing at the University of Michigan. The authors also thank the anonymous referees for their comments. ...
doi:10.1109/71.914756
fatcat:2ucmp4mx2vcuxmbyt4mx7ftsey
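The locality argument is easiest to see in code. Below is a minimal C sketch of wavefront parallelism (an illustration under assumed names and sizes, not the authors' implementation): cells on one anti-diagonal of a dependence grid are mutually independent, and schedule(static) keeps each thread on the same index range across diagonals, so it repeatedly touches the same rows.

```c
/* Wavefront sketch: a[i][j] depends on a[i-1][j] and a[i][j-1], so all
 * cells with the same d = i + j can run in parallel.  Compile with
 * -fopenmp; N and the stencil are illustrative. */
#define N 1024
static double a[N][N];

void wavefront(void)
{
    for (int d = 2; d <= 2 * (N - 1); d++) {          /* one anti-diagonal per step */
        int ilo = (d - (N - 1) > 1) ? d - (N - 1) : 1;
        int ihi = (d - 1 < N - 1) ? d - 1 : N - 1;
        /* schedule(static) pins thread t to the same slice of i on every
         * diagonal, preserving the cache and memory locality that the
         * abstract credits for the win over dynamic self-scheduling. */
        #pragma omp parallel for schedule(static)
        for (int i = ilo; i <= ihi; i++) {
            int j = d - i;
            a[i][j] = a[i - 1][j] + a[i][j - 1];
        }
    }
}
```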
Using processor affinity in loop scheduling on shared-memory multiprocessors
1994
IEEE Transactions on Parallel and Distributed Systems
We conclude that loop scheduling algorithms for shared-memory multiprocessors cannot afford to ignore the location of data, particularly in light of the increasing disparity between processor and memory ...
We show that traditional algorithms for loop scheduling, which ignore the location of data when assigning iterations to processors, incur a significant performance penalty on modern shared-memory multiprocessors ...
In many shared-memory multiprocessor systems, a single ready queue is the primary mechanism for process scheduling [30, 29, 10, 3]. ...
doi:10.1109/71.273046
fatcat:c6qwvhnjezh7fihxnhckix245y
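A hedged sketch of the affinity-scheduling idea the abstract describes (the queue layout and one-iteration steal granularity are simplifications; the published algorithm takes a fraction of the remainder at a time): each processor first drains the partition whose data is likely in its own cache, then steals from the most loaded peer.

```c
/* Affinity scheduling sketch: iterations are first partitioned per
 * processor, so repeated executions of the loop find their data in the
 * local cache; an idle processor steals from the most loaded peer.
 * Compile with -fopenmp. */
#include <omp.h>

#define NITER (1 << 20)
#define NPROC 8

typedef struct { int next, last; } queue_t;
static queue_t q[NPROC];

static void iterate(int i) { (void)i; /* stands in for the loop body */ }

void affinity_schedule(void)
{
    int chunk = NITER / NPROC;
    for (int p = 0; p < NPROC; p++) {       /* static partition = affinity */
        q[p].next = p * chunk;
        q[p].last = (p == NPROC - 1) ? NITER : (p + 1) * chunk;
    }
    #pragma omp parallel num_threads(NPROC)
    {
        int me = omp_get_thread_num();
        for (;;) {
            int i = -1;
            #pragma omp critical
            {
                queue_t *src = &q[me];
                if (src->next >= src->last) {         /* local queue empty:  */
                    int victim = -1, most = 0;        /* steal from the most */
                    for (int p = 0; p < NPROC; p++) { /* loaded peer         */
                        int rem = q[p].last - q[p].next;
                        if (rem > most) { most = rem; victim = p; }
                    }
                    if (victim >= 0) src = &q[victim];
                }
                if (src->next < src->last) i = src->next++;
            }
            if (i < 0) break;                         /* all queues drained */
            iterate(i);
        }
    }
}
```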
Performance Implications of Synchronization Support for Parallel Fortran Programs
1994
Journal of Parallel and Distributed Computing
The needs are due to task scheduling, iteration scheduling, barriers, and data dependence handling. We present synchronization algorithms for efficient execution of programs with nested parallel loops. ...
Lastly, we ran experiments to quantify the impact of various architectural support on the performance of a bus-based shared memory multiprocessor running automatically parallelized numerical programs. ...
The model is then used to explain the differences in the iteration scheduling overhead of different synchronization primitives for a simulated shared-memory multiprocessor. ...
doi:10.1006/jpdc.1994.1081
fatcat:mtoqnucxfbb6xebywgokkuwr2u
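As an illustration of the costs the paper models, here is the classic shared-counter form of iteration self-scheduling, reduced to a C11 sketch (not the paper's code): every iteration grab is one atomic fetch-and-add, and that per-iteration bus traffic is exactly the overhead that differs across synchronization primitives.

```c
#include <stdatomic.h>

static atomic_int next_iter;   /* shared iteration counter, starts at 0 */

/* Self-scheduling: each processor repeatedly grabs the next iteration.
 * One atomic fetch-and-add per iteration; coarser chunks amortize it. */
void self_schedule(int n, void (*body)(int))
{
    for (;;) {
        int i = atomic_fetch_add(&next_iter, 1);
        if (i >= n) break;
        body(i);
    }
}
```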
The impact of parallel loop scheduling strategies on prefetching in a shared memory multiprocessor
1994
IEEE Transactions on Parallel and Distributed Systems
Trace-driven simulations of numerical Fortran programs are used to study the impact of the parallel loop scheduling strategy on data prefetching in a shared memory multiprocessor with private data caches ...
The simulations indicate that to maximize memory performance it is important to schedule blocks of consecutive iterations to execute on each processor, and then to adaptively prefetch single-word cache ...
Support for this work was provided in part by the National Science Foundation under grants CCR-9209458 and MIP-9221900. ...
doi:10.1109/71.285604
fatcat:xgdco7egunbwfltpeu7xyd4kqu
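The scheduling conclusion translates to a small sketch (names are assumptions; __builtin_prefetch is a GCC/Clang builtin standing in for the adaptive prefetch hardware the paper simulates): blocks of consecutive iterations make the upcoming data addresses predictable, which is what lets prefetching help rather than pollute the cache.

```c
/* Block scheduling: processor `me` of `nproc` runs a contiguous block
 * of iterations and prefetches a few elements ahead within its block. */
void block_scheduled_loop(double *x, double *y, int n, int nproc, int me)
{
    int block = (n + nproc - 1) / nproc;
    int lo = me * block;
    int hi = (lo + block < n) ? lo + block : n;
    for (int i = lo; i < hi; i++) {
        if (i + 8 < hi)
            __builtin_prefetch(&x[i + 8]);  /* sequential access: prefetch pays off */
        y[i] += x[i];
    }
}
```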
A hybrid scheme for efficiently executing nested loops on multiprocessors
1992
Parallel Computing
Wang, A hybrid scheme for efficiently executing nested loops on multiprocessors, Parallel Computing 18 (1992) 625-637. ...
In this paper, we address the problem of scheduling parallel processors for efficiently executing nested loops. ...
Background The multiprocessor system considered in this paper is a shared memory multiprocessor system that contains p identical processors. ...
doi:10.1016/0167-8191(92)90003-p
fatcat:oyy52sngfbckdf43su6mp37rui
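One common reading of a hybrid scheme, sketched under that assumption (this is not necessarily the paper's exact algorithm): hand out most iterations statically to avoid scheduling overhead, and leave a small remainder in a shared pool for dynamic load balancing.

```c
#include <stdatomic.h>

#define STATIC_FRACTION 0.8   /* assumed split between static and dynamic */

static atomic_int pool;       /* index into the dynamic remainder */

void hybrid_schedule(int n, int nproc, int me, void (*body)(int))
{
    int nstatic = (int)(n * STATIC_FRACTION);
    int block = (nstatic + nproc - 1) / nproc;
    int lo = me * block;
    int hi = (lo + block < nstatic) ? lo + block : nstatic;
    for (int i = lo; i < hi; i++)        /* static share: no synchronization */
        body(i);
    for (;;) {                           /* dynamic remainder: shared pool */
        int i = atomic_fetch_add(&pool, 1);
        if (i >= n - nstatic) break;
        body(nstatic + i);
    }
}
```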
FORTRAN for clusters of IBM ES/3090 multiprocessors
1991
IBM Systems Journal
IBM ES/3090 multiprocessors are tightly-coupled, shared-memory multiprocessor systems that support up to six processors and share a global memory; each of these processors may be ...
that allow FORTRAN jobs to use all of the processors of the cluster. These two IBM 3090 multiprocessors are distributed, since there is no shared memory between the two multiprocessors. ...
doi:10.1147/sj.303.0296
fatcat:mudoyuljenc3boqmzdwealzry4
Control-theoretic adaptive cache-fair scheduling of chip multiprocessor systems
2017
Transactions of the Institute of Measurement and Control
Control-theoretic adaptive cache-fair scheduling of chip multiprocessor systems. Transactions of the Institute of Measurement and Control, 40(10), pp. 3095-3104. ...
As multiprocessors in a CMP system share the limited memory, memory awareness becomes critical in achieving satisfactory scheduling performance of the system. ...
Recently, with the proliferation of CMP architecture, multiprocessor scheduling faces new challenges in scheduling resources with shared memory access constraints (Schliecker et al., 2009). ...
doi:10.1177/0142331217715064
fatcat:deydkzyqivhklfddswjd6dwh3u
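A hedged sketch of the control-theoretic idea (the gains, units, and fair-IPC input are placeholders, not the paper's design): measure how far a thread's instructions-per-cycle falls below its fair-cache-share value, and feed the error into a PI controller that adjusts the thread's CPU time slice to compensate.

```c
typedef struct {
    double integral;     /* accumulated IPC error */
    double quantum_ms;   /* current CPU time slice, in milliseconds */
} cf_state;

/* One controller step: returns the adjusted quantum for the thread. */
double cache_fair_quantum(cf_state *s, double ipc_measured, double ipc_fair)
{
    const double Kp = 5.0, Ki = 1.0;          /* assumed controller gains */
    double error = ipc_fair - ipc_measured;   /* shortfall vs. fair share */
    s->integral += error;
    s->quantum_ms += Kp * error + Ki * s->integral;
    if (s->quantum_ms < 1.0)   s->quantum_ms = 1.0;    /* clamp to a sane range */
    if (s->quantum_ms > 100.0) s->quantum_ms = 100.0;
    return s->quantum_ms;
}
```

The caller would zero-initialize one cf_state per thread and invoke the step once per scheduling epoch.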
Evaluation of design alternatives for a multiprocessor microprocessor
1996
SIGARCH Computer Architecture News
In the future, advanced integrated circuit processing and packaging technology will allow for several design options for multiprocessor microprocessors. ...
In this paper we consider three architectures: shared-primary cache, shared-secondary cache, and shared-memory. ...
The second set of results includes the effects of dynamic scheduling, speculative execution, and nonblocking memory references. ...
doi:10.1145/232974.232982
fatcat:xbb7j5le5jcifhoa74rw44ive4
Evaluation of design alternatives for a multiprocessor microprocessor
1996
Proceedings of the 23rd annual international symposium on Computer architecture - ISCA '96
In the future, advanced integrated circuit processing and packaging technology will allow for several design options for multiprocessor microprocessors. ...
In this paper we consider three architectures: shared-primary cache, shared-secondary cache, and shared-memory. ...
The second set of results includes the effects of dynamic scheduling, speculative execution, and nonblocking memory references. ...
doi:10.1145/232973.232982
dblp:conf/isca/NayfehHO96
fatcat:qbw6ybk4ljcapa66vs32osa73e
A compiler framework for optimization of affine loop nests for gpgpus
2008
Proceedings of the 22nd annual international conference on Supercomputing - ICS '08
factors for conflict-minimal data access from GPU shared memory; and 3) model-driven empirical search to determine optimal parameters for unrolling and tiling. ...
In this paper, a number of issues are addressed towards the goal of developing a compiler framework for automatic parallelization and performance optimization of affine loop nests on GPGPUs: 1) approach ...
The shared memory and the register bank in a multiprocessor are dynamically partitioned among the active thread blocks on that multiprocessor. ...
doi:10.1145/1375527.1375562
dblp:conf/ics/BaskaranBKRRS08
fatcat:x6rdnmlkvzaw7jfcet3pxzsewi
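The tiling and padding transformations such a framework derives can be shown in plain C (standing in for the generated GPU code; the sizes are assumptions): each TILE x TILE block is staged through a small on-chip buffer, and the extra padding column makes threads in a block fall into distinct memory banks, the "conflict-minimal" access the abstract mentions.

```c
#define N    2048
#define TILE 16

/* Tiled matrix transpose: the tile buffer models GPU shared memory,
 * and the +1 pad avoids bank conflicts on the column-wise accesses. */
void tiled_transpose(const float *in, float *out)
{
    float tile[TILE][TILE + 1];
    for (int ii = 0; ii < N; ii += TILE)
        for (int jj = 0; jj < N; jj += TILE) {
            for (int i = 0; i < TILE; i++)          /* stage one tile */
                for (int j = 0; j < TILE; j++)
                    tile[i][j] = in[(ii + i) * N + (jj + j)];
            for (int i = 0; i < TILE; i++)          /* write it transposed */
                for (int j = 0; j < TILE; j++)
                    out[(jj + j) * N + (ii + i)] = tile[i][j];
        }
}
```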
Exploiting the parallelism available in loops
1994
Computer
In addition, several techniques for scheduling independent loop iterations on a shared memory multiprocessor are described. ...
Loops can provide a rich opportunity for exploiting parallelism since the body of a loop may be executed many times. ...
In the coarse-grained shared memory multiprocessors, loops with dependences between ...
p = number of processors in a shared memory multiprocessor; T_L = sequential execution time of a single iteration of ...
doi:10.1109/2.261915
fatcat:a5eirnpuobhqpexqua5243fzqi
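Using the survey's own symbols (p processors, per-iteration cost T_L), guided self-scheduling is one of the strategies such surveys cover: each processor grabs ceil(R/p) of the R remaining iterations, so early chunks are large (low scheduling overhead) and late chunks shrink toward one iteration (good load balance). A small sketch of the resulting chunk series:

```c
#include <stdio.h>

int main(void)
{
    int p = 8, remaining = 1000;       /* p processors, R = 1000 iterations */
    while (remaining > 0) {
        int chunk = (remaining + p - 1) / p;   /* ceil(R / p) */
        printf("chunk of %d iterations\n", chunk);
        remaining -= chunk;
    }
    return 0;
}
```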
"No-Compile-Time Knowledge" distribution of finite element computations on multiprocessors
1996
Proceedings of HICSS-29: 29th Hawaii International Conference on System Sciences
This paper addresses partitioning and scheduling of irregular loops arising in finite element computations on unstructured meshes. ...
We introduce the concept of "Conditioned Iterations Loop", which distributes the iterations dynamically according to a runtime condition. This technique is improved by a learning approach. ...
Irregular loop patterns compilation on distributed shared memory multiprocessors. In International Conference on Parallel Processing, Occonomowoc, Wisconsin, August 1995. ...
doi:10.1109/hicss.1996.495512
dblp:conf/hicss/ErhelHP96
fatcat:r4szgys2unbz5er2t6mkgyz3ia
Enhancing the performance of autoscheduling in Distributed Shared Memory multiprocessors
[chapter]
1998
Lecture Notes in Computer Science
scheduling of parallel tasks, and dynamic program adaptability on multiprogrammed shared memory multiprocessors. ...
This paper presents a technique that enhances the performance of autoscheduling in Distributed Shared Memory (DSM) multiprocessors, targeting mainly medium and large scale systems, where poor data ...
Acknowledgements We would like to thank Constantine Polychronopoulos for his support and valuable comments, the European Center for Parallelism in Barcelona (CEPBA) for providing us access to their Origin2000 ...
doi:10.1007/bfb0057892
fatcat:sdudtgqxeje3jhwt56fxhzipwq
OpenMP
2002
Proceedings of the 15th international symposium on System Synthesis - ISSS '02
latency communication and to support flexible thread scheduling for nested dynamic parallelism. ...
The programmer may add parallelization directives to loops or statements in the program. OpenMP is currently used for high performance computing applications running on shared memory multiprocessors. ...
doi:10.1145/581199.581224
fatcat:eq3h2gerc5dbthk3wvmxr5u2la
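The directive style the entry describes, in a minimal C example: the programmer annotates the loop, and the OpenMP runtime distributes its iterations across the threads of a shared-memory multiprocessor (the dynamic chunk size of 64 is an arbitrary choice for illustration).

```c
void saxpy(int n, float a, const float *x, float *y)
{
    /* Each thread grabs 64 consecutive iterations at a time; compile
     * with -fopenmp, or the pragma is ignored and the loop runs serially. */
    #pragma omp parallel for schedule(dynamic, 64)
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```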
A Scratch-Pad Memory Aware Dynamic Loop Scheduling Algorithm
2008
9th International Symposium on Quality Electronic Design (isqed 2008)
In comparison, this paper proposes the first dynamic loop scheduler, to our knowledge, that targets scratch-pad memory (SPM) based chip multiprocessors, and presents an experimental evaluation of it. ...
Therefore, the proposed dynamic scheduler takes advantage of the SPM in performing the loop iteration-to-processor mapping. ...
In this paper, we present and experimentally evaluate an SPM aware dynamic loop scheduling scheme in the context of chip multiprocessors. ...
doi:10.1109/isqed.2008.4479830
dblp:conf/isqed/OzturkKN08
fatcat:txddxmr47nffzmjks64jnwawba
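A hedged sketch of the mapping idea only (the paper's actual scheduler is more elaborate, and spm_owner is a hypothetical bookkeeping table): when handing out the next chunk of iterations, prefer the processor whose scratch-pad memory already holds that chunk's data, and fall back to any pending chunk otherwise.

```c
#define NCHUNKS 64

static int spm_owner[NCHUNKS];   /* core whose SPM holds chunk c's data, or -1 */
static int chunk_done[NCHUNKS];  /* 1 once chunk c has been handed out */

/* Pick the next chunk for core `me`, preferring SPM-resident data.
 * Synchronization around the tables is omitted for brevity. */
int next_chunk(int me)
{
    int fallback = -1;
    for (int c = 0; c < NCHUNKS; c++) {
        if (chunk_done[c]) continue;
        if (spm_owner[c] == me) { chunk_done[c] = 1; return c; }
        if (fallback < 0) fallback = c;
    }
    if (fallback >= 0) chunk_done[fallback] = 1;   /* pay the off-chip reload */
    return fallback;   /* -1 when every chunk is done */
}
```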
Showing results 1 — 15 out of 7,734 results