
Execution of compute-intensive applications into parallel machines

Catherine Houstis, Sarantos Kapidakis, Evangelos P. Markatos, Erol Gelenbe
1997 Information Sciences  
Scheduling and load balancing of applications on distributed or shared memory machine architectures can be executed by optimizing algorithms in various levels of the architecture.  ...  The approach to scheduling and load balancing ranges from very specialized and directly dependent on the application, in the application layer, to a more general approach taken by the operating system  ...  Early Work The first scheduling and load balancing policies in shared-memory multiprocessors were based on their uniprocessor predecessors.  ... 
doi:10.1016/s0020-0255(96)00174-0 fatcat:pszidonmirajdpcon4lsmucxxu

Using processor affinity in loop scheduling on shared-memory multiprocessors

E.P. Markatos, T.J. LeBlanc
1994 IEEE Transactions on Parallel and Distributed Systems  
In this paper we consider a third dimension to the problem of loop scheduling on shared-memory multiprocessors: communication overhead caused by accesses to non-local data.  ...  We show that traditional algorithms for loop scheduling, which ignore the location of data when assigning iterations to processors, incur a significant performance penalty on modern shared-memory multiprocessors  ...  In many shared-memory multiprocessor systems, a single ready queue is the primary mechanism for process scheduling [30, 29, 10, 3].  ... 
doi:10.1109/71.273046 fatcat:c6qwvhnjezh7fihxnhckix245y
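The snippet argues that loop schedulers should take data location into account. Below is a minimal Python simulation of one affinity-aware policy, offered as an illustrative sketch under stated assumptions rather than as the paper's algorithm or code. Each processor starts with a contiguous block of roughly N/P iterations and repeatedly takes 1/k of its remaining local iterations. An idle processor steals 1/P of the most loaded queue, taking those iterations from the tail. The function name `affinity_schedule`, the `speeds` parameter (used to mimic processors of unequal speed), the round-robin simulation, and stealing from the tail are all hypothetical choices.

```python
from collections import deque
from math import ceil


def affinity_schedule(n_iters, n_procs, k=None, speeds=None):
    """Simulate a hypothetical affinity-aware loop scheduler.

    Returns the grabs in order as (proc, start, end, stolen) tuples,
    meaning iterations start..end-1 were executed by processor `proc`.
    """
    k = k or n_procs                   # local chunk divisor (assumed default: P)
    speeds = speeds or [1] * n_procs   # grabs per round, to mimic uneven processors

    # Give each processor a contiguous block of iterations. Reusing the same
    # partition on every execution of the loop sends each block back to the
    # same processor, so data it cached last time can be reused.
    base, extra = divmod(n_iters, n_procs)
    queues, start = [], 0
    for p in range(n_procs):
        size = base + (1 if p < extra else 0)
        queues.append(deque(range(start, start + size)))
        start += size

    grabs = []
    while any(queues):
        for p in range(n_procs):
            for _ in range(speeds[p]):
                if queues[p]:
                    # Take 1/k of the remaining local iterations from the front.
                    chunk = ceil(len(queues[p]) / k)
                    its = [queues[p].popleft() for _ in range(chunk)]
                    grabs.append((p, its[0], its[-1] + 1, False))
                else:
                    # Idle: steal 1/P of the most loaded queue, from its tail.
                    victim = max(range(n_procs), key=lambda q: len(queues[q]))
                    if not queues[victim]:
                        break
                    chunk = ceil(len(queues[victim]) / n_procs)
                    its = [queues[victim].pop() for _ in range(chunk)][::-1]
                    grabs.append((p, its[0], its[-1] + 1, True))
    return grabs
```

For example, `affinity_schedule(40, 2, speeds=[3, 1])` has processor 0 drain its own block (iterations 0 to 19) first and only then steal iterations 35 to 39 from processor 1. Load is therefore moved only when a processor runs out of local work.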

Synchronization, coherence, and event ordering in multiprocessors

M. Dubois, C. Scheurich, F.A. Briggs
1988 Computer  
The instruction set of a multiprocessor usually contains basic instructions that are used to implement synchronization and communication between cooperating processes.  ...  The notions of synchronization and communication are difficult to separate because communication  ...  All multiprocessors include hardware mechanisms to enforce atomic operations. The most primitive memory operations in a machine are Loads and Stores.  ... 
doi:10.1109/2.15 fatcat:yflu46ikqjbbdh4tdgalpc5wmm

Multithreading with distributed functional units

B.K. Gunther
1997 IEEE Transactions on Computers  
Detailed simulations of Concurro processors indicate that instruction throughputs for programs accessing main memory directly can be scaled, without recompilation, from one to over eight instructions per  ...  With suitable prefetching, multiple instruction caches can be avoided, and multithreading is shown to obviate the need for sophisticated instruction dispatch mechanisms on parallel workloads.  ...  Multiprocessor data was obtained with a simulator derived from the Concurro simulator, and thus every multiprocessor shares with Concurro a common instruction set and almost identical subsystem architectures  ... 
doi:10.1109/12.588034 fatcat:bb67gixdrvgmjdeaxnnyjyhb6a

Dynamic node reconfiguration in a parallel-distributed environment

Michael J. Feeley, Brian N. Bershad, Jeffrey S. Chase, Henry M. Levy
1991 SIGPLAN Notices  
But the set of machines free to participate in load sharing changes over time as users come and go from their workstations.  ...  This paper describes a node reconfiguration facility for Amber, an object-based parallel programming system for networks of multiprocessors.  ...  Acknowledgments We would like to thank Rik Littlefield, Cathy McCann, Simon Koeman, Tom Anderson, Kathy Faust, and Ed Lazowska for discussing with us the issues raised in this paper.  ... 
doi:10.1145/109626.109638 fatcat:qcbzdnbebvcapf5ghql5ki4wze

An incremental benchmark suite for performance tuning of parallel discrete event simulation

R. Ronngren, L. Barriga, R. Ayani
1996 Proceedings of HICSS-29: 29th Hawaii International Conference on System Sciences  
Such benchmarks should allow the designer of simulation kernels to: (i) evaluate how efficiently the simulation kernel runs on specific architectures; and (ii) evaluate how simulation problems scale on  ...  To evaluate parallel simulation environments there is a need for a common benchmark suite.  ...  This facilitates the relocation of LPs for load balancing purposes. The additional cost for using shared memory compared to private memory was found to be negligible.  ... 
doi:10.1109/hicss.1996.495484 dblp:conf/hicss/RonngrenBA96 fatcat:ogf3sbpcgzc6tmho7xbjpfkiay

High-Performance Buffer Mapping to Exploit DRAM Concurrency in Multiprocessor DSP Systems

Dongwon Lee, Shuvra S. Bhattacharyya, Wayne Wolf
2009 2009 IEEE/IFIP International Symposium on Rapid System Prototyping  
In this paper, to help alleviate the memory wall problem, we propose a novel, high-performance buffer mapping policy for SDF-represented DSP applications on multiprocessor systems that support the shared-memory  ...  Design methodologies and tools based on the synchronous dataflow (SDF) model of computation have proven useful for rapid prototyping and implementation of digital signal processing (DSP) applications on  ...  Bambha of the US Army Research Laboratory for providing his scheduling simulator. This research was supported in part by grant number 0325119 from the U.S. National Science Foundation.  ... 
doi:10.1109/rsp.2009.34 dblp:conf/rsp/LeeBW09 fatcat:ssa7ubxw2vaihlgw6pfhd2b46m

Load Balancing and Data Locality in Adaptive Hierarchical N-Body Methods: Barnes-Hut, Fast Multipole, and Radiosity

J.P. Singh, C. Holt, T. Totsuka, A. Gupta, J. Hennessy
1995 Journal of Parallel and Distributed Computing  
We find that straightforward decomposition techniques which an automatic scheduler might implement do not scale well, because they are unable to simultaneously provide load balancing and data locality.  ...  speedups on a 128-processor simulated architecture.  ...  Acknowledgements We would like to thank Joshua Barnes for providing us with the sequential Barnes-Hut program.  ... 
doi:10.1006/jpdc.1995.1077 fatcat:xc6rf5l73rchndyvjx6sct7k3i
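The snippet says naive decompositions cannot deliver load balance and data locality at the same time. One way to get both is cost-based partitioning, often called costzones in this line of work. The following Python sketch is an assumption made for illustration and is not the paper's code. It assumes the bodies are already listed in a locality-preserving order (for example, a tree traversal) and that each body has an estimated work cost (for example, its interaction count from the previous time step). The list is then cut into P contiguous zones of roughly equal total cost. The helper name `cost_zones` and its inputs are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def cost_zones(costs, n_procs):
    """Split a locality-ordered list of per-body costs into contiguous zones.

    Returns one (start, end) index range per processor, with roughly equal
    total cost per range. Hypothetical helper for illustration only.
    """
    prefix = list(accumulate(costs))      # prefix[i] = total cost of bodies 0..i
    total = prefix[-1] if prefix else 0
    bounds = [0]
    for p in range(1, n_procs):
        target = total * p / n_procs      # cumulative work that zones 0..p-1 should cover
        # Every body whose cumulative cost is <= target goes to an earlier zone.
        bounds.append(bisect_right(prefix, target))
    bounds.append(len(costs))
    return [(bounds[p], bounds[p + 1]) for p in range(n_procs)]
```

Each zone is a contiguous run in the locality order, so a processor's bodies tend to be close together in space. Because the cuts are placed by cumulative cost, each zone ends up with about the same amount of work even when the cost per body varies.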

Adaptive placement of parallel Java agents in a scalable computing cluster

Arie Keren, Amnon Barak
1998 Concurrency: Practice and Experience  
This is accomplished by agent migrations among the nodes using on-line algorithms for load leveling and reduction of the inter-agent communication overhead.  ...  This paper describes a framework for parallel computing in a locally confined, scalable computing cluster (SCC) using Java agents.  ...  parallel programming in distributed-memory multicomputers, is to use Distributed Shared Memory (DSM) in order to provide the shared-memory illusion for threads [9].  ... 
doi:10.1002/(sici)1096-9128(199809/11)10:11/13<971::aid-cpe395>3.0.co;2-e fatcat:btoyl3i6s5dfne42sdvobnkbie

Timepatch

Gautam Shah, Umakishore Ramachandran, Richard Fujimoto
1995 Performance Evaluation Review  
We present a new technique for the parallel simulation of cache coherent shared memory multiprocessors.  ...  Our technique is based on the fact that the functional correctness of the simulation can be decoupled from its timing correctness.  ...  We would also like to thank the members of our weekly "arch beer" meetings for constructive feedback.  ... 
doi:10.1145/223586.223628 fatcat:offuyo4jbjgnhd2fr2rzz3r2mu

Timepatch

Gautam Shah, Umakishore Ramachandran, Richard Fujimoto
1995 Proceedings of the 1995 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems - SIGMETRICS '95/PERFORMANCE '95  
We present a new technique for the parallel simulation of cache coherent shared memory multiprocessors.  ...  Our technique is based on the fact that the functional correctness of the simulation can be decoupled from its timing correctness.  ...  We would also like to thank the members of our weekly "arch beer" meetings for constructive feedback.  ... 
doi:10.1145/223587.223628 dblp:conf/sigmetrics/ShahRF95 fatcat:mlkeiezjwzgh7hakumlketbcz4

Dynamic node reconfiguration in a parallel-distributed environment

Michael J. Feeley, Brian N. Bershad, Jeffrey S. Chase, Henry M. Levy
1991 Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming - PPOPP '91  
But the set of machines free to participate in load sharing changes over time as users come and go from their workstations.  ...  This paper describes a node reconfiguration facility for Amber, an object-based parallel programming system for networks of multiprocessors.  ...  Acknowledgments We would like to thank Rik Littlefield, Cathy McCann, Simon Koeman, Tom Anderson, Kathy Faust, and Ed Lazowska for discussing with us the issues raised in this paper.  ... 
doi:10.1145/109625.109638 dblp:conf/ppopp/FeeleyBCL91 fatcat:fpkjr7zudjd7lljtn3cbkrh374

The power of SIMDs vs. MIMDs in real-time scheduling

Mingxian Jin, J.W. Baker, W.C. Meilander
2002 Proceedings 16th International Parallel and Distributed Processing Symposium  
In this paper, we compare SIMDs and MIMDs in real-time scheduling, e.g., scheduling for air traffic control.  ...  SIMDs and MIMDs are the most important categories of computer systems for parallel computing in Flynn's classification scheme.  ...  Since all data are stored in the local memory of individual PEs, data is loaded into the PEs in parallel for computations. A data item can be broadcast to one or more PEs in one step.  ... 
doi:10.1109/ipdps.2002.1016671 dblp:conf/ipps/JinBM02 fatcat:zbx222oicveuza4r3gftnfcinq

Concurrent Data Structures [chapter]

Mark Moir, Nir Shavit
2004 Handbook of Data Structures and Applications  
Shared-memory multiprocessors are systems that concurrently execute multiple threads of computation which communicate and synchronize through data structures in shared memory.  ...  Transactional support for multiprocessor synchronization was originally suggested by Herlihy and Moss, who also proposed a hardware-based transactional memory mechanism for supporting it [56].  ... 
doi:10.1201/9781420035179.ch47 fatcat:b6onln3r3fb2ff3sitg64nilhi

FastLanes: An FPGA accelerated GPU microarchitecture simulator

Kuan Fang, Yufei Ni, Jiayuan He, Zonghui Li, Shuai Mu, Yangdong Deng
2013 2013 IEEE 31st International Conference on Computer Design (ICCD)  
A corresponding context shifting mechanism is proposed to store execution states of threads from FPGA to external on-board memory, and vice versa.  ...  Such a mechanism makes it possible to simulate hundreds of GPU cores on a single FPGA evaluation board.  ...  In addition, the scratchpad memory (i.e., shared memory in NVIDIA's terminology) also needs to be preserved.  ... 
doi:10.1109/iccd.2013.6657049 dblp:conf/iccd/FangNHLMD13 fatcat:li4p6f2ebrfmlcx7m7u3sb7dc4