The effect of network total order, broadcast, and remote-write capability on network-based shared memory computing

R. Stets, S. Dwarkadas, L. Kontothanassis, U. Rencuzogullari, M.L. Scott
Proceedings Sixth International Symposium on High-Performance Computer Architecture. HPCA-6 (Cat. No.PR00550)  
Cashmere has been implemented on the Compaq Memory Channel network, which supports remote memory writes, inexpensive broadcast, and total ordering of network packets.  ...  In our investigation, we examine remote-write, along with features for inexpensive broadcast and network total order.  ...  Conclusions In this paper, we have studied the effect of advanced network features, in particular, remote writes, inexpensive broadcast, and total packet ordering, on SDSM.  ... 
doi:10.1109/hpca.2000.824356 dblp:conf/hpca/StetsDKRS00 fatcat:r6f3dmylr5ctbgsfhx6grakwhu

Shared memory computing on clusters with symmetric multiprocessors and system area networks

Leonidas Kontothanassis, Robert Stets, Galen Hunt, Umit Rencuzogullari, Gautam Altekar, Sandhya Dwarkadas, Michael L. Scott
2005 ACM Transactions on Computer Systems  
Moreover, contrary to our original expectations, noncoherent hardware support for remote memory writes, total message ordering, and broadcast, provide comparatively little in the way of additional benefits  ...  It is distinguished from most other S-DSM projects by (1) the effective use of fast user-level messaging, as provided by modern system-area networks, and (2) a "two-level" protocol structure that exploits  ...  The Shasta results reported in Section 4.2 were obtained with the generous assistance of Dan Scales and Kourosh Gharachorloo. The authors would like to thank Ricardo Bianchini and Alan L.  ... 
doi:10.1145/1082469.1082472 fatcat:itz3q5b2fbczhcgzkoszmabr5a

A Cache coherence protocol for MIN-based multiprocessors

Mazin S. Yousif, Chita R. Das, Matthew J. Thazhuthaveetil
1994 Journal of Supercomputing  
The impact of the coherence protocol on system performance is evaluated through a performance study of three phases.  ...  The performance of our system is compared to that of a system with an equivalent-sized unified cache and with a multiprocessor implementing a directory-based coherence protocol.  ...  A remote or local memory access could take several clock cycles, depending on network traffic, location of the block, and system size.  ... 
doi:10.1007/bf01204660 fatcat:ari6h2fbhrcoriz6znxzlsjv2a

Shared virtual memory with automatic update support

Liviu Iftode, Matthias Blumrich, Cezary Dubnicki, David L. Oppenheimer, Jaswinder Pal Singh, Kai Li
1999 Proceedings of the 13th international conference on Supercomputing - ICS '99  
Shared virtual memory systems provide the abstraction of a shared address space on top of a message-passing communication architecture.  ...  Automatic update propagates local memory writes to remote memory locations automatically.  ...  We thank Stefanos Damianakis for his help in improving the quality of the presentation.  ... 
doi:10.1145/305138.305191 dblp:conf/ics/IftodeBDOSL99 fatcat:t5map2ohjzgmtaeoesf3vbgp6q

Timestamp snooping

Milo M. K. Martin, David A. Wood, Daniel J. Sorin, Anastassia Ailamaki, Alaa R. Alameldeen, Ross M. Dickson, Carl J. Mauer, Kevin E. Moore, Manoj Plakal, Mark D. Hill
2000 SIGARCH Computer Architecture News  
Processors and memories then reorder transactions based on their timestamps to establish a total order.  ...  Conversely, directory-based shared-memory systems must indirectly locate the owner and sharers through a directory, resulting in larger average miss latencies.  ...  , and Bernard Beaton for their support of IBM DB2; Paul Barford for the SURGE client; and Ernest Artiaga for the PARMACS macros.  ... 
doi:10.1145/378995.378998 fatcat:56rdl6n7zrd5da7hxqvpowq4zi

Timestamp snooping

Milo M. K. Martin, David A. Wood, Daniel J. Sorin, Anastassia Ailamaki, Alaa R. Alameldeen, Ross M. Dickson, Carl J. Mauer, Kevin E. Moore, Manoj Plakal, Mark D. Hill
2000 ACM SIGOPS Operating Systems Review  
Processors and memories then reorder transactions based on their timestamps to establish a total order.  ...  Conversely, directory-based shared-memory systems must indirectly locate the owner and sharers through a directory, resulting in larger average miss latencies.  ...  , and Bernard Beaton for their support of IBM DB2; Paul Barford for the SURGE client; and Ernest Artiaga for the PARMACS macros.  ... 
doi:10.1145/384264.378998 fatcat:nzuouyvxubcxnfddwwpk54lxgu

Timestamp snooping

Milo M. K. Martin, David A. Wood, Daniel J. Sorin, Anastassia Ailamaki, Alaa R. Alameldeen, Ross M. Dickson, Carl J. Mauer, Kevin E. Moore, Manoj Plakal, Mark D. Hill
2000 Proceedings of the ninth international conference on Architectural support for programming languages and operating systems - ASPLOS-IX  
Processors and memories then reorder transactions based on their timestamps to establish a total order.  ...  Conversely, directory-based shared-memory systems must indirectly locate the owner and sharers through a directory, resulting in larger average miss latencies.  ...  , and Bernard Beaton for their support of IBM DB2; Paul Barford for the SURGE client; and Ernest Artiaga for the PARMACS macros.  ... 
doi:10.1145/378993.378998 fatcat:n4vqo545enggji5bce7qvjk54y

Coherence Protocols for Bus-Based and Scalable Multiprocessors, Internet, and Wireless Distributed Computing Environments: A Survey [chapter]

John Sustersic, Ali Hurson
2003 Advances in Computers  
Acknowledgement: This work in part has been supported by the Office of the Naval Support under the contract N00014-02-1-0282.  ...  Formally, a distributed memory (storage) system is said to be coherent if, for each shared memory location in the system, there exists some total serial order of the operations on those storage locations  ...  Scale of shared memory space -Clearly, the total memory available over the World Wide Web is by orders of magnitudes greater than the largest distributed shared memory multiprocessor organization.  ... 
doi:10.1016/s0065-2458(03)59005-2 fatcat:hrflfqanffa7bo4dmj2l32pzba

Timestamp snooping

Milo M. K. Martin, David A. Wood, Daniel J. Sorin, Anastassia Ailamaki, Alaa R. Alameldeen, Ross M. Dickson, Carl J. Mauer, Kevin E. Moore, Manoj Plakal, Mark D. Hill
2000 SIGPLAN notices  
Processors and memories then reorder transactions based on their timestamps to establish a total order.  ...  Conversely, directory-based shared-memory systems must indirectly locate the owner and sharers through a directory, resulting in larger average miss latencies.  ...  , and Bernard Beaton for their support of IBM DB2; Paul Barford for the SURGE client; and Ernest Artiaga for the PARMACS macros.  ... 
doi:10.1145/356989.356992 fatcat:5eqevkd2nnegloacnbyg342vvy

RAPID: Reconfigurable and Scalable All-Photonic Interconnect for Distributed Shared Memory Multiprocessors

A.K. Kodi, A. Louri
2004 Journal of Lightwave Technology  
As the network size increases, network contention results in increasing the critical remote memory access latency, which significantly penalizes the performance of DSM systems.  ...  In this paper, we describe the design and analysis of a scalable architecture suitable for large-scale distributed shared memory (DSM) systems.  ...  Effects of Varying MSHRs: Effects of Varying the Coherence and Data Packet Sizes: Fig. 5(b) shows the effects of varying the coherence and data packet sizes on the average remote memory latency for 64  ... 
doi:10.1109/jlt.2004.833249 fatcat:xisqf3pcsbd6verftgm7yzwaae

MODELS OF DISTRIBUTED-SHARED-MEMORY ON AN INTERCONNECTION NETWORK FOR BROADCAST COMMUNICATION

CONSTANTINE KATSINIS
2003 Journal of Interconnection Networks (JOIN)  
Such systems are scalable and capable of high computing power. Processes on different nodes communicate by passing messages.  ...  This paper examines the performance of distributed-shared-memory (DSM) systems based on the Simultaneous Optical Multiprocessor Exchange Bus (SOME-Bus) using queuing network models and develops theoretical  ...  A shared memory multiprocessor based on a 4x4 mesh network with wormhole routing is studied in [6]. The performance of two hardware-based prefetching schemes is evaluated with simulation.  ... 
doi:10.1142/s021926590300074x fatcat:hggkb6sjarcxtkd3eybvmet7xi

Genetic Programming in Wireless Sensor Networks [chapter]

Derek M. Johnson, Ankur M. Teredesai, Robert T. Saltarelli
2005 Lecture Notes in Computer Science  
We demonstrate the utility of our formulations and validate the proposed ideas using a variety of problem sets and describe the results.  ...  Several adaptations including a novel representation scheme, an approximate fitness computation method and a sufficient statistics based data reduction technique lead to the development of a GP implementation  ...  After each generation of the BEA on a mote M b a random member of the population {M pi b | i < |M p b |} is selected and broadcast to remote motes 2 . The entire individual is sent.  ... 
doi:10.1007/978-3-540-31989-4_9 fatcat:tdse2733efeh7h36lp5qsdqrcu

Scale-out ccNUMA

Vasilis Gavrielatos, Antonios Katsarakis, Arpit Joshi, Nicolai Oswald, Boris Grot, Vijay Nagarajan
2018 Proceedings of the Thirteenth EuroSys Conference on - EuroSys '18  
In a 9-node RDMA-based rack and with modest write ratios, our prototype design, dubbed ccKVS, achieves 2.2× the throughput of the state-of-the-art KVS while guaranteeing strong consistency.  ...  Such KVS typically use a scale-out architecture, whereby the dataset is partitioned across a pool of servers, each holding a chunk of the dataset in memory and being responsible for serving queries against  ...  This work was supported in part by EPSRC (grants EP/M027317/1 and EP/L01503X/1 to The University of Edinburgh), ARM and Microsoft Research through their PhD Scholarship Programmes.  ... 
doi:10.1145/3190508.3190550 dblp:conf/eurosys/GavrielatosKJOG18 fatcat:k5u2nhpfhbeopj5lmbdgbznlyu

An OpenSHMEM Implementation for the Adapteva Epiphany Coprocessor [chapter]

James Ross, David Richie
2016 Lecture Notes in Computer Science  
While fully capable of MPMD execution, the physical topology and memory-mapped capabilities of the core and network translate well to Partitioned Global Address Space (PGAS) programming models and SPMD  ...  The Epiphany architecture exhibits massive many-core scalability with a physically compact 2D array of RISC CPU cores and a fast network-on-chip (NoC).  ...  Broadcasts are important in the context of the Epiphany application development in order to limit the replication of off-chip memory accesses to common memory.  ... 
doi:10.1007/978-3-319-50995-2_10 fatcat:5ptmxnc2vja2xhrywahou3du5q

Cashmere-2L

Robert Stets, Sandhya Dwarkadas, Nikolaos Hardavellas, Galen Hunt, Leonidas Kontothanassis, Srinivasan Parthasarathy, Michael Scott
1997 ACM SIGOPS Operating Systems Review  
Low-latency remote-write networks, such as DEC's Memory Channel, provide the possibility of transparent, inexpensive, large-scale shared-memory parallel computing on clusters of shared memory multiprocessors  ...  Remote interrupts are minimized by exploiting the remote-write capabilities of the Memory Channel network. Cashmere-2L currently runs on an 8-node, 32-processor DEC AlphaServer system.  ...  A broadcast of directory modifications is performed due to the lack of remote-read capability on the Memory Channel.  ... 
doi:10.1145/269005.266675 fatcat:6fimgoffsff4ndbf7i2myo7mbe