
Implementation of atomic primitives on distributed shared memory multiprocessors

M.M. Michael, M.L. Scott
Proceedings of 1995 1st IEEE Symposium on High Performance Computer Architecture  
In this paper we consider several hardware implementations of the general-purpose atomic primitives fetch and Φ, compare and swap, load linked, and store conditional on large-scale shared-memory multiprocessors  ...  These primitives have proven popular on small-scale bus-based machines, but have yet to become widely available on large-scale, distributed shared memory machines.  ...  We also thank Robert Wisniewski, Wei Li, Michal Cierniak, and Raj Rao for their comments on the paper.  ... 
doi:10.1109/hpca.1995.386540 dblp:conf/hpca/MichaelS95 fatcat:5j64adodhbdkjhoyidqsyn5llu

A simple, fast and scalable non-blocking concurrent FIFO queue for shared memory multiprocessor systems

Philippas Tsigas, Yi Zhang
2001 Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures - SPAA '01  
implemented efficiently on them.  ...  A non-blocking FIFO queue algorithm for multiprocessor shared memory systems is presented in this paper.  ...  Acknowledgements We would like to thank David Rutter for his great help during the writing phase of this paper.  ... 
doi:10.1145/378580.378611 dblp:conf/spaa/TsigasZ01 fatcat:bokwbnerurcgrdlevfl34xmj4q

Synchronization, coherence, and event ordering in multiprocessors

M. Dubois, C. Scheurich, F.A. Briggs
1988 Computer  
The instruction set of a multiprocessor usually contains basic instructions that are used to implement synchronization and communication between cooperating processes.  ...  The notions of synchronization and communication are difficult to separate because communication  ...  Acknowledgment Through many technical discussions, William Collier of IBM Poughkeepsie helped shape the content of this article.  ... 
doi:10.1109/2.15 fatcat:yflu46ikqjbbdh4tdgalpc5wmm

The Architectural and Operating System Implications on the Performance of Synchronization on ccNUMA Multiprocessors

Dimitrios S. Nikolopoulos, Theodore S. Papatheodorou
2001 International journal of parallel programming  
This paper investigates the performance of synchronization algorithms on ccNUMA multiprocessors, from the perspectives of the architecture and the operating system.  ...  Along with visiting the aforementioned issues, the paper contributes a new methodology for implementing fast synchronization algorithms on ccNUMA multiprocessors.  ...  Motivation The overhead of synchronization on shared-memory multiprocessors stems from three sources. The first is the latency of synchronization primitives.  ... 
doi:10.1023/a:1011168003859 dblp:journals/ijpp/NikolopoulosP01 fatcat:kggvvrj4c5cphh4ft2b4pkazpu

Performance Implications of Synchronization Support for Parallel Fortran Programs

S. Anik, W.M.W. Hwu
1994 Journal of Parallel and Distributed Computing  
Lastly, we ran experiments to quantify the impact of various architectural support on the performance of a bus-based shared memory multiprocessor running automatically parallelized numerical programs.  ...  We found that supporting an atomic fetch&add primitive in shared memory is as effective as supporting lock/unlock operations with a synchronization bus.  ...  In the experiments, because of the low lock hit rate, the atomic memory operations are implemented in shared memory.  ... 
doi:10.1006/jpdc.1994.1081 fatcat:mtoqnucxfbb6xebywgokkuwr2u

Efficient synchronization primitives for large-scale cache-coherent multiprocessors

James R. Goodman, Mary K. Vernon, Philip J. Woest
1989 SIGARCH Computer Architecture News  
The efficient implementation of the primitives is simpler if the multiprocessor has a hardware cache-consistency protocol.  ...  The only assumptions made in developing the set of primitives are that hardware combining is not implemented in the interconnect, and (in one case) that the interconnect supports broadcast.  ...  As in the case of the NYU Fetch-and-Add primitive, the RP3 primitives require logic in the shared memory to implement the seven atomic read-modify-write operations.  ... 
doi:10.1145/68182.68188 fatcat:wrtqnodqmjhjdbffagf4bnkl3a

Evaluating the performance of non-blocking synchronization on shared-memory multiprocessors

Philippas Tsigas, Yi Zhang
2001 Proceedings of the 2001 ACM SIGMETRICS international conference on Measurement and modeling of computer systems - SIGMETRICS '01  
Parallel programs running on shared memory multiprocessors coordinate via shared data objects/structures.  ...  In this paper we study the impact of the non-blocking synchronisation on parallel applications running on top of a modern, 64 processor, cache-coherent, shared memory multiprocessor system: the SGI Origin  ...  Based on "commodity" processing modules and a distributed, but unified, coherent memory, ccNUMA extends the power and performance of shared memory multiprocessor systems while preserving the shared memory  ... 
doi:10.1145/378420.378810 dblp:conf/sigmetrics/TsigasZ01 fatcat:jy2ncus2srfnfjtuplsowmpuia

Efficient Synchronization Techniques in a Decentralized Memory Management System Enabling Shared Memory

Oliver Mattes, Martin Schindewolf, Roland Sedler, Rainer Buchty, Wolfgang Karl
2011 PARS Parallel-Algorithmen -Rechnerstrukturen und -Systemsoftware  
These platforms offer access to shared memory over a limited number of controllers which may lead to congestion.  ...  The rising integration level enables combining more logic on a single chip. This is exploited in multiprocessor systems-on-chip (MPSoCs) or manycore research prototypes such as the Intel SCC.  ...  Integration of Synchronization Primitives Software synchronization approaches rely on hardware primitives, which atomically read and modify a memory location.  ... 
doi:10.1007/bf03341993 fatcat:irzgx7y75zgqxdwsn5yfroijrq

Experiments with Parallelizing Tribology Simulations

V. Chaudhary, W. L. Hase, H. Jiang, L. Sun, D. Thaker
2004 Journal of Supercomputing  
The problem size and computing infrastructure is changed to assess the impact of this on various parallelization methods.  ...  All of them exhibit good performance improvements and it exhibits the necessity and importance of applying parallelization in this field.  ...  Some of them can execute only on shared memory multiprocessors whereas others can achieve speedups on networks of workstations.  ... 
doi:10.1023/b:supe.0000022103.01620.f3 fatcat:7ns7sjh32zbo5hd4z7aa7zil3m

Nonblocking Algorithms and Preemption-Safe Locking on Multiprogrammed Shared Memory Multiprocessors

Maged M. Michael, Michael L. Scott
1998 Journal of Parallel and Distributed Computing  
of our own, in microbenchmarks and real applications on a 12-processor SGI Challenge multiprocessor.  ...  To address this problem, researchers have developed two principal strategies for a concurrent, atomic update of shared data structures: (1) preemption-safe locking and (2) nonblocking (lock-free) algorithms  ...  It appears to be the algorithm of choice for any queue-based application on a multiprocessor with a universal atomic primitive. Also, we have presented a two-lock queue algorithm.  ... 
doi:10.1006/jpdc.1998.1446 fatcat:erkbq73nejghdoskxy63pcz7zi

Emulating Transactional Memory on FPGA Multiprocessors [chapter]

Matteo Pusceddu, Simone Ceccolini, Antonino Tumeo, Gianluca Palermo, Donatella Sciuto
2011 Lecture Notes in Computer Science  
In this paper we discuss the development of two emulation platforms for transactional memory systems on a single Field Programmable Gate Array (FPGA).  ...  We analyze and compare these two architectures to a lock based multiprocessor prototype, discussing the trade-offs in terms of design complexity, performance and scalability.  ...  Introduction Transactional memory [7] has emerged as a promising programming paradigm for shared memory multiprocessor architectures.  ... 
doi:10.1007/978-3-642-19137-4_7 fatcat:6flipy4ij5hqjf2dwbjfxpdmda

MP-LOCKs: replacing H/W synchronization primitives with message passing

Chen-Chi Kuo, J. Carter, R. Kuramkote
1999 Proceedings Fifth International Symposium on High-Performance Computer Architecture  
The most common synchronization operators are locks, which are traditionally implemented via a mix of shared memory accesses and hardware synchronization primitives like test-and-set.  ...  Shared memory programs guarantee the correctness of concurrent accesses to shared data using interprocessor synchronization operations.  ...  We present the results of the first study that compares the performance of message passing locks and shared memory locks on macrobenchmarks.  ... 
doi:10.1109/hpca.1999.744381 dblp:conf/hpca/KuoCK99 fatcat:egdu26242jatrlgkxg3t7urwpa

HW/SW methodologies for synchronization in FPGA multiprocessors

Antonino Tumeo, Christian Pilato, Gianluca Palermo, Fabrizio Ferrandi, Donatella Sciuto
2009 Proceeding of the ACM/SIGDA international symposium on Field programmable gate arrays - FPGA '09  
These solutions can be used for MultiProcessor Systems-on-Chip (MPSoCs) prototyping or even for final implementation.  ...  Nevertheless, efficient synchronization is required to guarantee performance in multiprocessing environments with the simple cores that do not support atomic instructions and are normally used in the standard  ...  A trylock primitive, that checks one time the value of the lock bit and sets it to 1 is also available. Barriers are implemented using one of the mutexes to protect a counter in shared memory.  ... 
doi:10.1145/1508128.1508174 dblp:conf/fpga/TumeoPPFS09 fatcat:xd6er3ajg5bbbaxn6x4puygynm

The Amber system: parallel programming on a network of multiprocessors

J. Chase, F. Amador, E. Lazowska, H. Levy, R. Littlefield
1989 Proceedings of the twelfth ACM symposium on Operating systems principles - SOSP '89  
Amber is specifically designed for high performance in the case where each node in the network is a shared-memory multiprocessor.  ...  Amber programmers use object migration primitives to control the location of data and processing.  ...  On multiprocessor hardware the atomicity of descriptor checks can no longer be guaranteed.  ... 
doi:10.1145/74850.74865 dblp:conf/sosp/ChaseALLL89 fatcat:jyjm27f4xngqxhgpaorglktfsy

Distributed and low-power synchronization architecture for embedded multiprocessors

Chenjie Yu, Peter Petrov
2008 Proceedings of the 6th IEEE/ACM/IFIP international conference on Hardware/Software codesign and system synthesis - CODES/ISSS '08  
In this paper we present a framework for a distributed and very low-cost implementation of synchronization controllers and protocols for embedded multiprocessors.  ...  The proposed architecture effectively implements the queued-lock semantics in a completely distributed way.  ...  FUNCTIONAL OVERVIEW Conventional synchronization implementations rely on atomic operations to access and modify memory.  ... 
doi:10.1145/1450135.1450153 dblp:conf/codes/YuP08 fatcat:aosdm7es6jao3e6pe2a6kfac44