
On timing constraints of snooping in a bus-based COMA multiprocessor

Sangyeun Cho, Jinseok Kong, Gyungho Lee
1998 Microprocessors and Microsystems  
In this paper, we propose a scheme to relax the timing constraints of snooping in a bus-based COMA multiprocessor, which allows an efficient design of a global bus protocol, and a cost-effective implementation  ...  Cache-only memory architecture has the potential to decrease global bus traffic in shared-bus multiprocessors, thereby reducing the speed gap between modern microprocessors and global backplane bus systems  ...  A version of this paper appeared in the Proc. of IASTED International Conference on Parallel and Distributed Computing and Systems, Chicago, IL, October 1996.  ... 
doi:10.1016/s0141-9331(97)00055-0 fatcat:s3o74fusyfa57o7vx6mlolxm44

Design of a bus-based shared-memory multiprocessor DICE

Gyungho Lee, Bland W Quattlebaum, Sangyeun Cho, Larry L Kinney
1999 Microprocessors and Microsystems  
DICE tries to optimize COMA for a shared-bus medium, in particular to reduce the detrimental effects of cache coherence and the 'last memory block' problem on replacement.  ...  Considering the benefits of COMA and the moderate design complexity it adds to the conventional shared-bus multiprocessor design, a bus-based COMA multiprocessor, such as DICE, can become a viable candidate  ...  An earlier version of the paper was presented in Ref. [26].  ... 
doi:10.1016/s0141-9331(98)00097-0 fatcat:gmluftwjg5g3ximidir4rq3b7y

Coherence and Replacement Protocol of DICE—A Bus-Based COMA Multiprocessor

Sangyeun Cho, Jinseok Kong, Gyungho Lee
1999 Journal of Parallel and Distributed Computing  
DICE tries to optimize COMA for a shared-bus medium, in particular to reduce the detrimental effects of cache coherence and the "last memory block" problem on replacement.  ...  Replacement, which poses a unique overhead problem of COMA, requires that a victim block with ownership be relocated to a remote node in order not to discard the last cached memory block.  ...  Sangyeun Cho was supported in part by a fellowship from the Korea Foundation for Advanced Studies.  ... 
doi:10.1006/jpdc.1998.1524 fatcat:fdu7qc55k5f2jazhvs5sbazdwi
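The replacement rule in the snippet above (an owned victim block must be relocated rather than discarded) can be sketched as a small Python model. This is not DICE's actual protocol: each node's attraction memory is modelled as a dictionary from block address to a hypothetical state label (`OWNER` or `SHARED`), and the policy of pushing the victim to the first remote node that already holds the block or has a free frame is an assumption; the snippet does not say how DICE picks the receiving node.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """A node's attraction memory: block address -> state.

    "OWNER" and "SHARED" are hypothetical labels for this sketch only.
    """
    node_id: int
    capacity: int
    blocks: Dict[int, str] = field(default_factory=dict)

    def can_accept(self, addr: int) -> bool:
        # A node can take the block if it already caches it (no new frame
        # needed) or if it still has a free frame.
        return addr in self.blocks or len(self.blocks) < self.capacity


def replace_block(victim: Node, addr: int, remotes: List[Node]) -> Optional[int]:
    """Evict `addr` from `victim`.

    Returns None if the copy was simply dropped, or the id of the remote
    node that received the block together with its ownership.
    """
    state = victim.blocks.pop(addr)
    if state != "OWNER":
        # Non-owned copy: some other node owns the block, so drop it.
        return None
    # Owned copy: it may be the last one in the system, so relocate it
    # instead of discarding it (assumed policy: first node that can accept).
    for node in remotes:
        if node.can_accept(addr):
            node.blocks[addr] = "OWNER"
            return node.node_id
    raise RuntimeError(f"no remote node can accept block {addr:#x}")
```

For example, evicting a `SHARED` block returns `None`, while evicting an `OWNER` block when node 1 is full and node 2 has a free frame returns `2`.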

A quantitative analysis of the performance and scalability of distributed shared memory cache coherence protocols

M. Heinrich, V. Soundararajan, J. Hennessy, A. Gupta
1999 IEEE Transactions on Computers  
Existing commercial implementations use a variety of different protocols including bit-vector/coarse-vector protocols, SCI-based protocols, and COMA protocols.  ...  In addition to measurements of the characteristics of protocol execution (e.g. memory overhead, protocol execution time, and message count) and of overall performance, we examine the effects of scaling  ...  The authors wish to thank the FLASH team members as well as Robert Bosch for his tireless support of the simulation environment.  ... 
doi:10.1109/12.752662 fatcat:kforuwbdtbfmnarn7uqmivan2a

Cache-coherent distributed shared memory: perspectives on its development and future challenges

J. Hennessy, M. Heinrich, A. Gupta
1999 Proceedings of the IEEE  
Cache coherence allows such architectures to use caching to take advantage of locality in applications without changing the programmer's model of memory.  ...  We review the key developments that led to the creation of cache-coherent distributed shared memory and describe the Stanford DASH Multiprocessor, the first working implementation of hardware-supported  ...  Bus-based, shared-memory multiprocessors remain the dominant multiprocessor architecture for small processor counts.  ... 
doi:10.1109/5.747863 fatcat:koqfmkqdibaylcxfiheb33bwly

Flexible use of memory for replication/migration in cache-coherent DSM multiprocessors

Vijayaraghavan Soundararajan, Mark Heinrich, Ben Verghese, Kourosh Gharachorloo, Anoop Gupta, John Hennessy
1998 SIGARCH Computer Architecture News  
Given the limitations of bus-based multiprocessors, CC-NUMA is the scalable architecture of choice for shared-memory machines.  ...  This work is done in the context of the Stanford FLASH multiprocessor.  ...  Acknowledgments We would like to thank Shigehiro Asano for his contributions early on in this work.  ... 
doi:10.1145/279361.279403 fatcat:doy7vvwsrvekjfrrgroip2c6gm

Reactive NUMA

Babak Falsafi, David A. Wood
1997 Proceedings of the 24th annual international symposium on Computer architecture - ISCA '97  
CC-NUMA and S-COMA, for our benchmarks and base system assumptions.  ...  This reactive behavior allows each node in an R-NUMA system to independently choose the best protocol for a particular page, thus providing much greater performance stability than either CC-NUMA or S-COMA  ...  Acknowledgements We would like to thank Steve Reinhardt for helping with the development of our simulator, Beng-Hong Lim and Sandra Irani for their comments on our performance models, and Scott Breach,  ... 
doi:10.1145/264107.264205 dblp:conf/isca/FalsafiW97 fatcat:xdrodc2f6rhgbpr7shsx5ookk4
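The per-page, per-node protocol choice described in the R-NUMA snippet can be illustrated with a small Python model. It assumes, as a simplification, that each node counts refetches (remote misses to blocks it had cached before) separately for every page and switches that page from CC-NUMA to S-COMA once the count reaches a threshold. The class name, the counter, and the default threshold of 64 are hypothetical and not taken from the paper.

```python
from collections import defaultdict


class ReactiveNode:
    """Per-node, per-page protocol choice in the spirit of R-NUMA (sketch)."""

    def __init__(self, threshold: int = 64) -> None:
        self.threshold = threshold                    # hypothetical value
        self.refetches = defaultdict(int)             # page -> refetch count
        self.mode = defaultdict(lambda: "CC-NUMA")    # page -> protocol in use

    def on_refetch(self, page: int) -> str:
        """Record one refetch of a block on `page`; return the page's protocol."""
        if self.mode[page] == "CC-NUMA":
            self.refetches[page] += 1
            if self.refetches[page] >= self.threshold:
                # Relocate the page into this node's local page cache and
                # serve later misses to it S-COMA style.
                self.mode[page] = "S-COMA"
        return self.mode[page]
```

Because each node keeps its own counters, two nodes sharing a page can end up using different protocols for it, which is the independence the snippet describes.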

Comparing the effectiveness of fine-grain memory caching against page migration/replication in reducing traffic in DSM clusters

An-Chow Lai, Babak Falsafi
2000 Proceedings of the twelfth annual ACM symposium on Parallel algorithms and architectures - SPAA '00  
In this paper, we compare and contrast page migration/replication and R-NUMA on simulated clusters of symmetric multiprocessors executing shared-memory applications.  ...  Page migration/replication optimizes read-write accesses to a page used by a single processor by migrating the page to that processor and replicates all read-shared pages in the sharers' local memories  ...  Each node is a 4-way multiprocessor with 600 MHz dual-issue processors interconnected by a 100 MHz split-transaction bus.  ... 
doi:10.1145/341800.341811 dblp:conf/spaa/LaiF00 fatcat:xpdbdm5wmfcctazvqt2h476h5u
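The page migration/replication policy summarized in this snippet can be written as a small decision function. This Python sketch assumes hypothetical per-page bookkeeping (the set of nodes that read the page and the set that wrote it); a real system would need some way to gather that information, such as per-page miss counters, which is omitted here.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple


@dataclass
class PageStats:
    """Hypothetical per-page record of which nodes read and wrote the page."""
    readers: Set[int] = field(default_factory=set)
    writers: Set[int] = field(default_factory=set)


def placement_action(stats: PageStats, home: int) -> Tuple[str, Set[int]]:
    """Return (action, target nodes) for one page currently homed at `home`."""
    users = stats.readers | stats.writers
    if not users:
        return ("stay", {home})
    if len(users) == 1:
        (only,) = users
        # Used (read and/or written) by a single node: migrate it there.
        return ("stay", {home}) if only == home else ("migrate", {only})
    if not stats.writers:
        # Read-shared page: replicate into every sharer's local memory.
        return ("replicate", users - {home})
    # Read-write shared page: neither migration nor replication applies.
    return ("stay", {home})
```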

Optimizing Traffic in DSM Clusters: Fine-Grain Memory Caching versus Page Migration/Replication

An-Chow Lai, Babak Falsafi
2002 Theory of Computing Systems  
In this paper, we compare and contrast page migration/replication and R-NUMA on simulated clusters of symmetric multiprocessors executing shared-memory applications.  ...  Page migration/replication optimizes read-write accesses to a page used by a single processor by migrating the page to that processor and replicates all read-shared pages in the sharers' local memories  ...  Each node is a 4-way multiprocessor with 600 MHz dual-issue processors interconnected by a 100 MHz split-transaction bus.  ... 
doi:10.1007/s00224-002-1054-6 fatcat:dc7x2u6svzh55of3u2s3s6reim

Comparative performance evaluation of cache-coherent NUMA and COMA architectures

Per Stenström, Truman Joe, Anoop Gupta
1992 SIGARCH Computer Architecture News  
One could reduce the node miss penalty for COMA by using faster (and more expensive) SRAM-based directories.  ...  The reason is that we have significantly reduced the number of page migrations (e.g., down from 2.7K to 135 migrations for 4 Kbyte pages), thus reducing the software overhead of migration.  ... 
doi:10.1145/146628.139705 fatcat:sqp5agrzlrhh5nm5wmu2r2izfq

Comparative performance evaluation of cache-coherent NUMA and COMA architectures

Per Stenström, Truman Joe, Anoop Gupta
1992 Proceedings of the 19th annual international symposium on Computer architecture - ISCA '92  
One could reduce the node miss penalty for COMA by using faster (and more expensive) SRAM-based directories.  ...  The reason is that we have significantly reduced the number of page migrations (e.g., down from 2.7K to 135 migrations for 4 Kbyte pages), thus reducing the software overhead of migration.  ... 
doi:10.1145/139669.139705 dblp:conf/isca/StenstromJG92 fatcat:54cuhamf4vc53i7pcfg7ht4zvi

Decoupled hardware support for distributed shared memory

Steven K. Reinhardt, Robert W. Pfile, David A. Wood
1996 SIGARCH Computer Architecture News  
To demonstrate the feasibility and simplicity of this access control device, we designed and built an FPGA-based version in under one year.  ...  Two benchmarks are hampered by high communication overheads, but selectively replacing shared-memory operations with message passing provides speedups of at least 16 on both decoupled systems.  ...  Babak Falsafi and Shubu Mukherjee contributed to the development of the simulator used in this paper. Mark Hill and Jim Larus provided valuable comments on drafts of this paper.  ... 
doi:10.1145/232974.232979 fatcat:xlvt3pco3vakrhow3fr5taaefy

Decoupled hardware support for distributed shared memory

Steven K. Reinhardt, Robert W. Pfile, David A. Wood
1996 Proceedings of the 23rd annual international symposium on Computer architecture - ISCA '96  
To demonstrate the feasibility and simplicity of this access control device, we designed and built an FPGA-based version in under one year.  ...  Two benchmarks are hampered by high communication overheads, but selectively replacing shared-memory operations with message passing provides speedups of at least 16 on both decoupled systems.  ...  Babak Falsafi and Shubu Mukherjee contributed to the development of the simulator used in this paper. Mark Hill and Jim Larus provided valuable comments on drafts of this paper.  ... 
doi:10.1145/232973.232979 dblp:conf/isca/ReinhardtPW96 fatcat:3brtvx2ihjcczkrrqbw7qj6wra

PSCR: a coherence protocol for eliminating passive sharing in shared-bus shared-memory multiprocessors

R. Giorgi, C.A. Prete
1999 IEEE Transactions on Parallel and Distributed Systems  
Shared-bus shared-memory multiprocessors can be used to speed up the execution of such workloads.  ...  We evaluate the complexity in terms of the number of protocol states, additional bus lines, and required software support.  ...  Two new WU protocols have been defined for two special bus-based machines: on-chip multiprocessor [64] and bus-based COMA [41].  ... 
doi:10.1109/71.780868 fatcat:c44pju2v5fgm3ozmltso7pz2nu

The Scalable Coherent Interface (SCI)

D.B. Gustavson, Qiang Li
1996 IEEE Communications Magazine  
A new approach to communication is required, one that can eliminate the delay due to software overheads, if we are to reap the full benefit of the far higher bandwidths that modern hardware can provide  ...  This article first reviews the general properties that an appropriate system architecture should have, and introduces an architectural model, the Local Area MultiProcessor, distinguished by its shared-memory  ...  Only a few of the traditional few-processor supercomputer companies have survived, and they are now adding multiprocessor-based product lines.  ... 
doi:10.1109/35.533919 fatcat:icmnbvnsfffv7hzxejen5fd77m