Execution of compute-intensive applications into parallel machines
1997
Information Sciences
Scheduling and load balancing of applications on distributed or shared-memory machine architectures can be performed by optimizing algorithms at various levels of the architecture. ...
Approaches to scheduling and load balancing range from highly specialized and directly application-dependent, in the application layer, to the more general approach taken by the operating system ...
Early Work: The first scheduling and load balancing policies in shared-memory multiprocessors were based on their uniprocessor predecessors. ...
doi:10.1016/s0020-0255(96)00174-0
fatcat:pszidonmirajdpcon4lsmucxxu
Using processor affinity in loop scheduling on shared-memory multiprocessors
1994
IEEE Transactions on Parallel and Distributed Systems
In this paper we consider a third dimension to the problem of loop scheduling on shared-memory multiprocessors: communication overhead caused by accesses to non-local data. ...
We show that traditional algorithms for loop scheduling, which ignore the location of data when assigning iterations to processors, incur a significant performance penalty on modern shared-memory multiprocessors ...
In many shared-memory multiprocessor systems, a single ready queue is the primary mechanism for process scheduling [30, 29, 10, 3]. ...
doi:10.1109/71.273046
fatcat:c6qwvhnjezh7fihxnhckix245y
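To make the affinity idea in the snippet above concrete, here is a minimal C sketch, not the paper's algorithm: each thread first claims the iteration blocks it "owns" (whose data is likely local to it), then falls back to scanning for unclaimed blocks, a stand-in for the single ready queue mentioned in the abstract. NUM_THREADS, NBLOCKS, and work() are illustrative assumptions.

    /* Affinity-first loop scheduling sketch (assumed names, not from the paper). */
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    #define NUM_THREADS 4
    #define NBLOCKS 16
    #define BLOCK 64
    #define N (NBLOCKS * BLOCK)

    static double data[N];
    static atomic_int claimed[NBLOCKS];   /* 1 once a block has been taken */

    static void work(int i) { data[i] = data[i] * 2.0 + 1.0; }

    static void run_block(int b)
    {
        for (int i = b * BLOCK; i < (b + 1) * BLOCK; i++)
            work(i);
    }

    static void *worker(void *arg)
    {
        int id = (int)(long)arg;

        /* Affinity phase: claim the blocks this thread owns, so iterations
         * touch data that is likely already in its cache / local memory. */
        for (int b = id; b < NBLOCKS; b += NUM_THREADS) {
            int expected = 0;
            if (atomic_compare_exchange_strong(&claimed[b], &expected, 1))
                run_block(b);
        }

        /* Balancing phase: steal any blocks still unclaimed (a stand-in
         * for the single shared ready queue). */
        for (int b = 0; b < NBLOCKS; b++) {
            int expected = 0;
            if (atomic_compare_exchange_strong(&claimed[b], &expected, 1))
                run_block(b);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t[NUM_THREADS];
        for (long i = 0; i < NUM_THREADS; i++)
            pthread_create(&t[i], NULL, worker, (void *)i);
        for (int i = 0; i < NUM_THREADS; i++)
            pthread_join(t[i], NULL);
        printf("data[0] = %f\n", data[0]);
        return 0;
    }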
Synchronization, coherence, and event ordering in multiprocessors
1988
Computer
The instruction set of a multiprocessor usually contains basic instructions that are used to implement synchronization and communication between cooperating processes. ...
The notions of synchronization and communication are difficult to separate because communication ...
All multiprocessors include hardware mechanisms to enforce atomic operations. The most primitive memory operations in a machine are Loads and Stores. ...
doi:10.1109/2.15
fatcat:yflu46ikqjbbdh4tdgalpc5wmm
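As a hedged illustration of the snippet above, the sketch below builds mutual exclusion from an atomic read-modify-write primitive (C11's atomic_flag, typically compiled to a test-and-set or LL/SC instruction), so that plain Loads and Stores inside the critical section remain consistent across threads. This is a generic textbook construction, not code from the paper.

    /* Spinlock from an atomic test-and-set primitive (generic sketch). */
    #include <stdatomic.h>
    #include <pthread.h>
    #include <stdio.h>

    static atomic_flag lock = ATOMIC_FLAG_INIT;
    static int counter = 0;                 /* shared data protected by lock */

    static void *increment(void *arg)
    {
        (void)arg;
        for (int i = 0; i < 100000; i++) {
            while (atomic_flag_test_and_set_explicit(&lock, memory_order_acquire))
                ;                           /* spin until the flag was clear */
            counter++;                      /* critical section: plain Load/Store */
            atomic_flag_clear_explicit(&lock, memory_order_release);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t a, b;
        pthread_create(&a, NULL, increment, NULL);
        pthread_create(&b, NULL, increment, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        printf("counter = %d (expected 200000)\n", counter);
        return 0;
    }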
Multithreading with distributed functional units
1997
IEEE transactions on computers
Detailed simulations of Concurro processors indicate that instruction throughputs for programs accessing main memory directly can be scaled, without recompilation, from one to over eight instructions per ...
With suitable prefetching, multiple instruction caches can be avoided, and multithreading is shown to obviate the need for sophisticated instruction dispatch mechanisms on parallel workloads. ...
Multiprocessor data was obtained with a simulator derived from the Concurro simulator, and thus every multiprocessor shares with Concurro a common instruction set and almost identical subsystem architectures ...
doi:10.1109/12.588034
fatcat:bb67gixdrvgmjdeaxnnyjyhb6a
Dynamic node reconfiguration in a parallel-distributed environment
1991
SIGPLAN notices
But the set of machines free to participate in load sharing changes over time as users come and go from their workstations. ...
This paper describes a node reconfiguration facility for Amber, an object-based parallel programming system for networks of multiprocessors. ...
Acknowledgments We would like to thank Rik Littlefield, Cathy McCann, Simon Koeman, Tom Anderson, Kathy Faust, and Ed Lazowska for discussing with us the issues raised in this paper. ...
doi:10.1145/109626.109638
fatcat:qcbzdnbebvcapf5ghql5ki4wze
An incremental benchmark suite for performance tuning of parallel discrete event simulation
1996
Proceedings of HICSS-29: 29th Hawaii International Conference on System Sciences
Such benchmarks should allow the designer of simulation kernels to: (i) evaluate how efficiently the simulation kernel runs on specific architectures; and (ii) evaluate how simulation problems scale on ...
To evaluate parallel simulation environments, there is a need for a common benchmark suite. ...
This facilitates the relocation of LPs for load balancing purposes. The additional cost for using shared memory compared to private memory was found to be negligible. ...
doi:10.1109/hicss.1996.495484
dblp:conf/hicss/RonngrenBA96
fatcat:ogf3sbpcgzc6tmho7xbjpfkiay
High-Performance Buffer Mapping to Exploit DRAM Concurrency in Multiprocessor DSP Systems
2009
2009 IEEE/IFIP International Symposium on Rapid System Prototyping
In this paper, to help alleviate the memory wall problem, we propose a novel, high-performance buffer mapping policy for SDF-represented DSP applications on multiprocessor systems that support the shared-memory ...
Design methodologies and tools based on the synchronous dataflow (SDF) model of computation have proven useful for rapid prototyping and implementation of digital signal processing (DSP) applications on ...
Bambha of the US Army Research Laboratory for providing his scheduling simulator. This research was supported in part by grant number 0325119 from the U.S. National Science Foundation. ...
doi:10.1109/rsp.2009.34
dblp:conf/rsp/LeeBW09
fatcat:ssa7ubxw2vaihlgw6pfhd2b46m
Load Balancing and Data Locality in Adaptive Hierarchical N-Body Methods: Barnes-Hut, Fast Multipole, and Radiosity
1995
Journal of Parallel and Distributed Computing
We find that straightforward decomposition techniques which an automatic scheduler might implement do not scale well, because they are unable to simultaneously provide load balancing and data locality. ...
speedups on a 128-processor simulated architecture. ...
Acknowledgements We would like to thank Joshua Barnes for providing us with the sequential Barnes-Hut program. ...
doi:10.1006/jpdc.1995.1077
fatcat:xc6rf5l73rchndyvjx6sct7k3i
Adaptive placement of parallel Java agents in a scalable computing cluster
1998
Concurrency Practice and Experience
This is accomplished by agent migrations among the nodes using on-line algorithms for load leveling and reduction of the inter-agent communication overhead. ...
This paper describes a framework for parallel computing in a locally confined, scalable computing cluster (SCC) using Java agents. ...
parallel programming in distributed-memory multicomputers, is to use Distributed Shared Memory (DSM) in order to provide the shared-memory illusion for threads [9]. ...
doi:10.1002/(sici)1096-9128(199809/11)10:11/13<971::aid-cpe395>3.0.co;2-e
fatcat:btoyl3i6s5dfne42sdvobnkbie
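The snippet mentions on-line load leveling via agent migration but does not show the policy. Below is a minimal threshold-based sketch in C of one such rule (Node, THRESHOLD, and plan_migration are illustrative assumptions, not the framework's API): migrate one agent from the most loaded node to the least loaded node whenever the imbalance exceeds a tolerance.

    /* Threshold-based load-leveling decision (assumed policy, for illustration). */
    #include <stdio.h>

    #define NNODES 4
    #define THRESHOLD 2        /* tolerated load difference, in agents */

    typedef struct { int id; int agents; } Node;

    /* Decide whether to migrate one agent, and between which nodes. */
    static int plan_migration(Node nodes[], int n, int *from, int *to)
    {
        int hi = 0, lo = 0;
        for (int i = 1; i < n; i++) {
            if (nodes[i].agents > nodes[hi].agents) hi = i;
            if (nodes[i].agents < nodes[lo].agents) lo = i;
        }
        if (nodes[hi].agents - nodes[lo].agents <= THRESHOLD)
            return 0;          /* balanced enough: no migration */
        *from = hi;
        *to = lo;
        return 1;
    }

    int main(void)
    {
        Node nodes[NNODES] = {{0, 7}, {1, 2}, {2, 4}, {3, 3}};
        int from, to;
        while (plan_migration(nodes, NNODES, &from, &to)) {
            nodes[from].agents--;   /* stand-in for an actual agent migration */
            nodes[to].agents++;
            printf("migrate one agent: node %d -> node %d\n", from, to);
        }
        return 0;
    }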
Timepatch
1995
Performance Evaluation Review
We present a new technique for the parallel simulation of cache coherent shared memory multiprocessors. ...
Our technique is based on the fact that the functional correctness of the simulation can be decoupled from its timing correctness. ...
We would also like to thank the members of our weekly "arch beer" meetings for constructive feedback. ...
doi:10.1145/223586.223628
fatcat:offuyo4jbjgnhd2fr2rzz3r2mu
Dynamic node reconfiguration in a parallel-distributed environment
1991
Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming - PPOPP '91
But the set of machines free to participate in load sharing changes over time as users come and go from their workstations. ...
This paper describes a node reconfiguration facility for Amber, an object-based parallel programming system for networks of multiprocessors. ...
Acknowledgments We would like to thank Rik Littlefield, Cathy McCann, Simon Koeman, Tom Anderson, Kathy Faust, and Ed Lazowska for discussing with us the issues raised in this paper. ...
doi:10.1145/109625.109638
dblp:conf/ppopp/FeeleyBCL91
fatcat:fpkjr7zudjd7lljtn3cbkrh374
The power of SIMDs vs. MIMDs in real-time scheduling
2002
Proceedings 16th International Parallel and Distributed Processing Symposium
In this paper, we compare SIMDs and MIMDs in real-time scheduling, e.g., scheduling for air traffic control. ...
SIMDs and MIMDs are the most important categories of computer systems for parallel computing in Flynn's classification scheme. ...
Since all data are stored in the local memory of individual PEs, data is loaded into the PEs in parallel for computations. A data item can be broadcast to one or more PEs in one step. ...
doi:10.1109/ipdps.2002.1016671
dblp:conf/ipps/JinBM02
fatcat:zbx222oicveuza4r3gftnfcinq
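To illustrate the broadcast step described in the snippet above, here is a small software model in C, an assumption-laden sketch rather than anything from the paper: the control unit writes one value into the local memory of every enabled PE. Real SIMD hardware performs this in lockstep as a single step; the sequential loop below only models that behavior.

    /* Software model of a SIMD broadcast to enabled PEs (illustrative only). */
    #include <stdio.h>
    #include <stdbool.h>

    #define NUM_PES 8
    #define LOCAL_MEM 16

    typedef struct {
        bool enabled;               /* PE mask: broadcast reaches enabled PEs only */
        int  local[LOCAL_MEM];      /* per-PE local memory */
    } PE;

    static PE pe[NUM_PES];

    static void broadcast(int value, int addr)
    {
        for (int i = 0; i < NUM_PES; i++)   /* conceptually one SIMD step */
            if (pe[i].enabled)
                pe[i].local[addr] = value;
    }

    int main(void)
    {
        for (int i = 0; i < NUM_PES; i++)
            pe[i].enabled = (i % 2 == 0);   /* enable even-numbered PEs */
        broadcast(42, 0);
        for (int i = 0; i < NUM_PES; i++)
            printf("PE %d local[0] = %d\n", i, pe[i].local[0]);
        return 0;
    }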
Concurrent Data Structures
[chapter]
2004
Handbook of Data Structures and Applications
Shared-memory multiprocessors are systems that concurrently execute multiple threads of computation which communicate and synchronize through data structures in shared memory. ...
Transactional support for multiprocessor synchronization was originally suggested by Herlihy and Moss, who also proposed a hardware-based transactional memory mechanism for supporting it [56] . ...
Index terms from the chapter include: multiprogramming, mutex, mutual exclusion, non-uniform memory access (NUMA), nonblocking memory reclamation, nonblocking progress conditions, nonblocking synchronization. ...
doi:10.1201/9781420035179.ch47
fatcat:b6onln3r3fb2ff3sitg64nilhi
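As a hedged example of the nonblocking synchronization this chapter surveys, below is a minimal Treiber-style lock-free stack in C11: push and pop retry a compare-and-swap instead of taking a lock. The sketch deliberately omits safe memory reclamation and ABA protection, concerns the chapter's index terms also cover.

    /* Treiber-style lock-free stack (generic sketch, not from the chapter). */
    #include <stdatomic.h>
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct node { int value; struct node *next; } node_t;

    static _Atomic(node_t *) top = NULL;

    static void push(int value)
    {
        node_t *n = malloc(sizeof *n);
        n->value = value;
        node_t *old = atomic_load(&top);
        do {
            n->next = old;   /* link to the snapshot we observed */
        } while (!atomic_compare_exchange_weak(&top, &old, n));
    }

    static int pop(int *out)
    {
        node_t *old = atomic_load(&top);
        do {
            if (old == NULL) return 0;      /* stack empty */
        } while (!atomic_compare_exchange_weak(&top, &old, old->next));
        *out = old->value;
        /* NOTE: freeing old here is unsafe under concurrency without a
         * reclamation scheme; leaked intentionally in this sketch. */
        return 1;
    }

    int main(void)
    {
        push(1); push(2); push(3);
        int v;
        while (pop(&v)) printf("%d\n", v);
        return 0;
    }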
FastLanes: An FPGA accelerated GPU microarchitecture simulator
2013
2013 IEEE 31st International Conference on Computer Design (ICCD)
A corresponding context shifting mechanism is proposed to move execution states of threads from the FPGA to external on-board memory, and vice versa. ...
Such a mechanism makes it possible to simulate hundreds of GPU cores on a single FPGA evaluation board. ...
In addition, the scratchpad memory (i.e., shared memory in NVIDIA's terminology) also needs to be preserved. ...
doi:10.1109/iccd.2013.6657049
dblp:conf/iccd/FangNHLMD13
fatcat:li4p6f2ebrfmlcx7m7u3sb7dc4
Showing results 1 — 15 out of 1,603 results