Design and evaluation of a switch cache architecture for CC-NUMA multiprocessors

R.R. Iyer, L.N. Bhuyan
2000 IEEE Transactions on Computers
In this paper, we propose a novel hardware caching technique, called switch cache, to improve the remote memory access performance of CC-NUMA multiprocessors.  ...  Our results show that the CAESAR switch cache is capable of improving the performance of CC-NUMA multiprocessors by up to 45 percent reduction in remote memory accesses for some applications.  ...  CONCLUSIONS In this paper, we presented a novel hardware caching technique, called switch cache, to improve the remote memory access performance of CC-NUMA multiprocessors.  ... 
doi:10.1109/12.868025 fatcat:t77mkwpnofhphhsoxd4ozuw6uy

Dual-layered file cache on cc-NUMA system

Zhou Yingchao, Meng Dan, Ma Jie
2006 Proceedings 20th IEEE International Parallel & Distributed Processing Symposium  
CC-NUMA is a widely adopted and deployed architecture of high performance computers. These machines are attractive for their transparent access to local and remote memory.  ...  To address this problem, we suggest and implement a mechanism that uses local memory to cache remote file cache, of which the main purpose is to improve data locality.  ...  Introduction Cache Coherent Non-Uniform Memory Access (cc-NUMA) multiprocessors provide transparent access to local and remote memory. However, the access latency gap between them is very high.  ... 
doi:10.1109/ipdps.2006.1639307 dblp:conf/ipps/YingchaoDJ06 fatcat:x7dxbuvkgzgq5l5r2znx6nqbc4

ASR: Adaptive Selective Replication for CMP Caches

Bradford Beckmann, Michael Marty, David Wood
2006 Microarchitecture (MICRO), Proceedings of the Annual International Symposium on  
The large working sets of commercial and scientific workloads stress the L2 caches of Chip Multiprocessors (CMPs).  ...  ASR replicates cache blocks only when it estimates the benefit of replication (lower L2 hit latency) exceeds the cost (more L2 misses).  ...  for their comments on this work.  ... 
doi:10.1109/micro.2006.10 dblp:conf/micro/BeckmannMW06 fatcat:ybh52lm5ajenvezppy2f2rbjpy

A performance study of cache coherence protocols and write caches for parallel‐multithreaded shared‐memory multiprocessors

Chao‐Chin Wu, Cheng Chen
1998 Zhongguó gongchéng xuékan  
Moreover, incorporating write caches improves the system performance of clean and competitive-update protocols.  ...  According to our simulation results, the clean protocol provided the best performance for five out of six SPLASH programs.  ...  ACKNOWLEDGMENTS The authors would like to thank the reviewers for their helpful comments.  ... 
doi:10.1080/02533839.1998.9670368 fatcat:g2huh4ji7bbg7atkhgmpqsin5q

Two proposals for the inclusion of directory information in the last-level private caches of glueless shared-memory multiprocessors

Alberto Ros, Ricardo Fernández-Pascual, Manuel E. Acacio, José M. García
2008 Journal of Parallel and Distributed Computing  
In glueless shared-memory multiprocessors where cache coherence is usually maintained using a directory-based protocol, the fast access to the on-chip components (caches and network router, among others  ...  In this work, we propose two alternative designs for the last-level private cache of glueless shared-memory multiprocessors: the lightweight directory and the SGluM cache.  ...  Acknowledgments The authors would like to thank the anonymous referees for their detailed comments and valuable suggestions, which have helped to improve the quality of the paper.  ... 
doi:10.1016/j.jpdc.2008.07.001 fatcat:2ridrw3lsrajjn2hy2vuprlcge

Verification techniques for cache coherence protocols

Fong Pong, Michel Dubois
1997 ACM Computing Surveys  
In this article we present a comprehensive survey of various approaches for the verification of cache coherence protocols based on state enumeration, (symbolic) model checking, and symbolic state models  ...  To be successful for systems of arbitrary complexity, a verification technique must solve this so-called state space explosion problem.  ...  Finding such a framework would certainly be a breakthrough in the field of formal architecture verification. Figure 1. Shared-memory models: (a) UMA and CC-UMA; (b) NUMA and CC-NUMA; (c) COMA.  ... 
doi:10.1145/248621.248624 fatcat:neylws7pgfe2jlwxuezr5unigm

Evaluation of a Competitive-Update Cache Coherence Protocol with Migratory Data Detection

Håkan Grahn, Per Stenström
1996 Journal of Parallel and Distributed Computing  
Although directory-based write-invalidate cache coherence protocols have a potential to improve the performance of large-scale multiprocessors, coherence misses limit the processor utilization.  ...  coherence miss rate and have been shown to be a better coherence policy for a wide range of applications.  ...  Acknowledgments We would like to thank Lars Jönsson for implementing the basic adaptive protocol in our simulator as a part of his Master's thesis.  ... 
doi:10.1006/jpdc.1996.0164 fatcat:a6a6nrmkrfcmbjvyz7jwxgjniq

Shared memory multiprocessor architectures for software ip routers

Yan Luo, L.N. Bhuyan, Xi Chen
2003 IEEE Transactions on Parallel and Distributed Systems  
Non-Uniform Memory Access (CC-NUMA) paradigms.  ...  Results also show that the CC-NUMA architecture can sustain good lookup performance even at a high frequency of route updates.  ...  Accesses to other nodes in the remote memory are accomplished RTR lookup time detail on CC-NUMA architecture.  ... 
doi:10.1109/tpds.2003.1255636 fatcat:qxirvu2m6nb2lgcfsaaufp57im

Operating system support for improving data locality on CC-NUMA compute servers

Ben Verghese, Scott Devine, Anoop Gupta, Mendel Rosenblum
1996 ACM SIGOPS Operating Systems Review  
The dominant architecture for the next generation of shared-memory multiprocessors is CC-NUMA (cache-coherent non-uniform memory architecture).  ...  CC-NOW machines provide the benefits of cache coherence to networks of workstations, at the cost of even higher remote access latency.  ...  We would also like to thank the FLASH hardware team for their help with Flashlite, and for providing the pieces of the engineering workload.  ... 
doi:10.1145/248208.237205 fatcat:gjxtc5wmjrafberbnylc5qeioe

Operating system support for improving data locality on CC-NUMA compute servers

Ben Verghese, Scott Devine, Anoop Gupta, Mendel Rosenblum
1996 SIGPLAN Notices
The dominant architecture for the next generation of shared-memory multiprocessors is CC-NUMA (cache-coherent non-uniform memory architecture).  ...  CC-NOW machines provide the benefits of cache coherence to networks of workstations, at the cost of even higher remote access latency.  ...  We would also like to thank the FLASH hardware team for their help with Flashlite, and for providing the pieces of the engineering workload.  ... 
doi:10.1145/248209.237205 fatcat:ll62vd7k3zdcjijav5ysr2jy6q

Operating system support for improving data locality on CC-NUMA compute servers

Ben Verghese, Scott Devine, Anoop Gupta, Mendel Rosenblum
1996 Proceedings of the seventh international conference on Architectural support for programming languages and operating systems - ASPLOS-VII  
The dominant architecture for the next generation of shared-memory multiprocessors is CC-NUMA (cache-coherent non-uniform memory architecture).  ...  CC-NOW machines provide the benefits of cache coherence to networks of workstations, at the cost of even higher remote access latency.  ...  We would also like to thank the FLASH hardware team for their help with Flashlite, and for providing the pieces of the engineering workload.  ... 
doi:10.1145/237090.237205 dblp:conf/asplos/VergheseDGR96 fatcat:2hhnllpmpneypbex3h4y6zvly4

Evaluating kilo-instruction multiprocessors

Marco Galluzzi, Ramón Beivide, Valentin Puente, José-Ángel Gregorio, Adrian Cristal, Mateo Valero
2004 Proceedings of the 3rd workshop on Memory performance issues in conjunction with the 31st international symposium on computer architecture - WMPI '04  
What we propose, in this paper, is the use of Kilo-instruction processors as computing nodes for small-scale CC-NUMA multiprocessors.  ...  First, the great amount of in-flight instructions makes the system not just to hide the latencies coming from the memory accesses but also the inherent communication latencies involved in remote memory  ...  to emulate a complete CC-NUMA multiprocessor using local caches, a directory-based coherency protocol and a switched interconnection network.  ... 
doi:10.1145/1054943.1054953 dblp:conf/wmpi/GalluzziBPGCV04 fatcat:2r3xxn2qy5b4ppazmw5pc54wnm

Cacheminer: A runtime approach to exploit cache locality on SMP

Yong Yan, Xiaodong Zhang
2000 IEEE Transactions on Parallel and Distributed Systems  
We propose a memory-layout oriented technique to exploit cache locality of parallel loops at runtime on Symmetric Multiprocessor (SMP) systems.  ...  However, our experimental results show that our approach is able to significantly improve the memory performance for the applications with irregular computation and dynamic memory access patterns.  ...  Finally, we appreciate the insightful comments and critiques from the anonymous referees, which are helpful to improve the quality and readability of the paper.  ... 
doi:10.1109/71.850833 fatcat:wv4tg2o76nc4xjexamt6jaathi

Reducing coherence overhead and boosting performance of high-end SMP multiprocessors running a DSS workload

Pierfrancesco Foglia, Roberto Giorgi, Cosimo Antonio Prete
2005 Journal of Parallel and Distributed Computing  
In this work, we characterized the memory performance-and in particular the impact of coherence overhead and process migration-of a shared-bus shared-memory multiprocessor running a DSS workload.  ...  In these conditions, the use of a write-update protocol with a selective invalidation strategy for private data improves performance (and scalability) of about 20% with respect to a classical MESI-based  ...  The simulated system is a CC-NUMA shared-memory multiprocessor with advanced ILP support.  ... 
doi:10.1016/j.jpdc.2004.10.003 fatcat:6bc2psq4ubcrpm2b637bnm64ni

Proximity-aware directory-based coherence for multi-core processor architectures

Jeffery A. Brown, Rakesh Kumar, Dean Tullsen
2007 Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures - SPAA '07  
As the number of cores increases on chip multiprocessors, coherence is fast becoming a central issue for multi-core performance.  ...  This has the dual benefit of eliminating unnecessary accesses to off-chip memory, and minimizing the distance over which communicated data moves across the network.  ...  Acknowledgments The authors would like to thank the anonymous reviewers for their helpful insights.  ... 
doi:10.1145/1248377.1248398 dblp:conf/spaa/BrownKT07 fatcat:y7c3zgv3dncirjggikdfmyfuwi