440 Hits in 6.6 sec

Improving Multiple-CMP Systems Using Token Coherence

M.R. Marty, J.D. Bingham, M.D. Hill, Alan J. Hu, M.M.K. Martin, D.A. Wood
2005 11th International Symposium on High-Performance Computer Architecture
Coherence is a particular challenge for Multiple-CMP (M-CMP) systems.  ...  Acknowledgments We thank Virtutech AB, the Wisconsin Condor group, and the Wisconsin Computer Systems Lab for their help and support.  ... 
doi:10.1109/hpca.2005.17 dblp:conf/hpca/MartyBHHMW05 fatcat:kntm2bor2nfvrggjyffswwguem

DiCo-CMP: Efficient cache coherency in tiled CMP architectures

Alberto Ros, Manuel E. Acacio, Jose M. Garcia
2008 Proceedings, International Parallel and Distributed Processing Symposium (IPDPS)  
Using an extended version of GEMS simulator we show that DiCo-CMP achieves improvements in execution time of up to 8% on average over a directory protocol, and reductions in terms of network traffic of  ...  up to 42% on average compared to Token-CMP.  ...  Token-CMP targets CMP systems, and uses a distributed arbitration scheme for persistent requests, which are issued after a single retry to optimize the access to contended blocks.  ... 
doi:10.1109/ipdps.2008.4536287 dblp:conf/ipps/RosAG08 fatcat:27irw74qhnaavpvvbaserakqvq

Exploit Temporal Locality of Shared Data in SRC Enabled CMP [chapter]

Haixia Wang, Dongsheng Wang, Peng Li, Jinglei Wang, XianPing Fu
2007 Lecture Notes in Computer Science  
Token-SRC protocol integrates SRC into the token protocol, reducing the network traffic of the token protocol. Simulations using SPLASH-2 benchmarks show that a 16-core CMP system with token-SRC achieved average 15%  ...  Based on this characteristic, we present a sharing relation cache (SRC for short) based CMP architecture, saving recently used sharing relations to provide destination set information for following cache-to-cache  ...  CMP systems share many critical design issues with traditional shared-memory multiprocessor systems, especially the cache-coherence protocols.  ... 
doi:10.1007/978-3-540-74784-0_39 fatcat:hmd6bmtozne6znxdfuhgaxepjq
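The Token-SRC snippet describes caching recently used sharing relations so that cache-to-cache requests can be sent to a destination set instead of every core. A minimal Python sketch of such a table follows. It assumes LRU replacement, a fixed core count, and a broadcast fallback when the table has no entry; these policies and the class and method names are hypothetical, not taken from the paper.

```python
from collections import OrderedDict


class SharingRelationCache:
    """LRU table: block address -> cores last seen sharing it (hypothetical model)."""

    def __init__(self, capacity: int, num_cores: int):
        self.capacity = capacity
        self.all_cores = frozenset(range(num_cores))
        self.entries = OrderedDict()  # block address -> frozenset of core ids

    def record(self, block: int, sharers) -> None:
        # Remember the latest sharing relation and mark it most recently used.
        self.entries[block] = frozenset(sharers)
        self.entries.move_to_end(block)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry

    def destinations(self, block: int, requester: int) -> frozenset:
        if block in self.entries:
            # Hit: reuse the cached sharing relation as the destination set.
            self.entries.move_to_end(block)
            dests = self.entries[block]
        else:
            # Miss: assumed fallback is a broadcast to every core.
            dests = self.all_cores
        return dests - {requester}
```

In a token protocol, a wrong destination set should cost latency rather than correctness, because a requester that does not collect enough tokens can reissue its request more widely; that retry path is left out of this sketch.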

Rainbow: A Composable Coherence Protocol for Multi-Chip Servers [article]

Lucia G. Menezo, Valentin Puente, Jose A. Gregorio
2020 arXiv pre-print
Our proposal is able to improve on the performance of a HyperTransport-like coherence protocol by 25% to 60%.  ...  This paper introduces a new coherence protocol suitable, in terms of complexity and scalability, for this class of systems.  ...  The implementation supports multiple 3-level cache CMP systems and it is capable of being used in a full-system simulation.  ... 
arXiv:2002.03944v1 fatcat:wfk2htl7efcutmmgg2kacqpp5e

Dealing with Traffic-Area Trade-Off in Direct Coherence Protocols for Many-Core CMPs [chapter]

Alberto Ros, Manuel E. Acacio, José M. García
2009 Lecture Notes in Computer Science  
Token-CMP and DiCo-CMP are cache coherence protocols that have been recently proposed to avoid the indirection problem of traditional directory-based protocols.  ...  Area constraints limit the use of precise sharing codes to small- or medium-scale CMPs. Power constraints make it impractical to use broadcast-based protocols for large-scale CMPs.  ...  Hammer is the cache coherence protocol used by AMD in their Opteron systems.  ... 
doi:10.1007/978-3-642-03644-6_2 fatcat:q4unqfdh7bfxlmofp2zrtdzf2q

Importance of Coherence Protocols with Network Applications on Multicore Processors

Kyueun Yi, Won W. Ro, Jean-Luc Gaudiot
2013 IEEE Transactions on Computers
With an 8-core configuration, token protocols improve the performance compared to directory protocols by a factor of nearly 4 on average.  ...  Our simulation results show that token protocols have a significantly higher performance than directory protocols.  ...  To test the performance of the coherence protocols, two cache coherence protocols, MOESI-directory and MOESI-token, are used with network workloads.  ... 
doi:10.1109/tc.2011.199 fatcat:6cficu3jqnbr3eqxc6bgnis7ay

Coherence Ordering for Ring-based Chip Multiprocessors

Michael R. Marty, Mark D. Hill
2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06)
Second, it improves performance stability relative to GREEDY-ORDER by not using retries.  ...  Existing cache coherence protocols for rings either establish a (total) ordering point (ORDERING-POINT) or use a greedy order (GREEDY-ORDER) with unbounded retries.  ...  Finally we thank Virtutech, the Wisconsin Condor group, and the Wisconsin Computer Systems Lab for their support.  ... 
doi:10.1109/micro.2006.14 dblp:conf/micro/MartyH06 fatcat:hqykjtz7ozhxjdyfnht5rm4jxu

A Direct Coherence Protocol for Many-Core Chip Multiprocessors

A. Ros, M. E. Acacio, J. M. Garcia
2010 IEEE Transactions on Parallel and Distributed Systems  
Power constraints make it impractical to rely on broadcasts (as, for example, Token-CMP does) or any other brute-force method for keeping cache coherence, and directory-based cache coherence protocols are  ...  In this work, we present DiCo-CMP, a novel cache coherence protocol especially suited to future many-core tiled CMP architectures.  ...  Token-CMP targets CMP systems, and uses a distributed arbitration scheme for persistent requests, which are issued after a single retry to optimize the access to contended blocks.  ... 
doi:10.1109/tpds.2010.43 fatcat:qctrazzuorhttie2jgk7sh2eym

Cache Coherence Protocols for Many-Core CMPs [chapter]

Alberto Ros, Manuel E. Acacio, Jose M. Garcia
2010 Parallel and Distributed Computing  
Particularly, the topics that are addressed are programmable and reconfigurable devices and systems, dependability of GPUs (Graphics Processing Units), network topologies, cache coherence protocols, resource  ...  Particularly, Token-CMP obtains average improvements of 11% compared to Hammer-CMP and 1% compared to Directory-CMP.  ...  DiCo-FM improves the execution time by 14%, 5% and 4% compared to Hammer-CMP, Directory-CMP and Token-CMP, respectively.  ... 
doi:10.5772/9454 fatcat:dgesjev5ebeddkop63hle3hd7a

A Low Overhead Fault Tolerant Coherence Protocol for CMP Architectures

Ricardo Fernandez-Pascual, Jose M. Garcia, Manuel E. Acacio, Jose Duato
2007 IEEE 13th International Symposium on High Performance Computer Architecture
Using GEMS full system simulator, we compare our proposal against a similar protocol without fault tolerance (TOKENCMP).  ...  In particular, our proposal extends a token-based cache coherence protocol so that no data can be lost and no deadlock can occur due to any dropped message.  ...  TOKENCMP [10] is a performance policy which targets hierarchical multiple CMP systems.  ... 
doi:10.1109/hpca.2007.346194 dblp:conf/hpca/PascualGAD07 fatcat:wovatyfofrajdmp7fv464kcgoe

Extending the TokenCMP Cache Coherence Protocol for Low Overhead Fault Tolerance in CMP Architectures

R. Fernandez-Pascual, J.M. Garcia, M.E. Acacio, J. Duato
2008 IEEE Transactions on Parallel and Distributed Systems  
Using the GEMS full-system simulator, we compare our proposal against a similar protocol without fault tolerance (TOKENCMP).  ...  In particular, our proposal extends a token-based cache coherence protocol so that no data can be lost and no deadlock can occur due to any dropped message.  ...  ACKNOWLEDGMENTS The authors thank the anonymous reviewers for their insightful comments and suggestions that have significantly helped improve the final version of this paper.  ... 
doi:10.1109/tpds.2007.70803 fatcat:tk4ya5zynjabrotregysydpsue

TokenTLB+CUP: A Token-Based Page Classification with Cooperative Usage Prediction

Albert Esteve, Alberto Ros, Antonio Robles, Maria E. Gomez
2018 IEEE Transactions on Parallel and Distributed Systems  
Token counting on TLBs is a natural and efficient way for classifying memory pages, and it does not require the use of complex and undesirable persistent requests or arbitration.  ...  In addition, classification is extended with Cooperative Usage Predictor (CUP), a token-based system-wide page usage predictor retrieved through TLB cooperation, in order to perform a classification unaffected  ...  These requests are a source of complexity for the coherence protocol, being one of the main causes why Token coherence has not been implemented in commodity systems.  ... 
doi:10.1109/tpds.2017.2782808 fatcat:erwbuztfwbct3jxgr3lbbs5u5u
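The TokenTLB snippet describes classifying memory pages by counting tokens held in TLBs, without persistent requests. A minimal Python sketch of that general idea follows. It assumes one token per core for each page and treats a page as private to a core while that core's TLB holds every token. The miss and eviction policies, and all names, are hypothetical illustrations, not the mechanism of Esteve et al.

```python
class PageTokens:
    """Token state of one page across per-core TLBs (hypothetical model)."""

    def __init__(self, num_cores: int):
        self.total = num_cores  # assumption: one token per core
        self.free = num_cores   # tokens still held by the page table entry
        self.held = {}          # core id -> tokens in that core's TLB

    def tlb_miss(self, core: int) -> None:
        if core in self.held:
            return  # the TLB already caches this page
        if self.free > 0:
            # Assumed policy: take every token the page table still holds.
            self.held[core] = self.free
            self.free = 0
        else:
            # N tokens among at most N-1 other holders: some holder has >= 2.
            donor = max(self.held, key=self.held.get)
            if self.held[donor] < 2:
                raise RuntimeError("core id outside range(num_cores)?")
            self.held[donor] -= 1
            self.held[core] = 1

    def tlb_evict(self, core: int) -> None:
        # Evicted tokens return to the page table entry.
        self.free += self.held.pop(core, 0)

    def is_private(self, core: int) -> bool:
        # Private while this core's TLB holds every token of the page.
        return self.held.get(core, 0) == self.total
```

Under these assumptions the classification is conservative: once a page has been shared, the remaining holder is not reclassified as private until its TLB again collects every token.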

ATLAS: A Chip-Multiprocessor with Transactional Memory Support

Njuguna Njoroge, Jared Casper, Sewook Wee, Yuriy Teslyar, Daxia Ge, Christos Kozyrakis, Kunle Olukotun
2007 Design, Automation & Test in Europe Conference & Exhibition
However, the practical success of CMPs strongly depends on addressing the difficulty of multithreaded application development for such systems.  ...  We have mapped ATLAS to the BEE2 multi-FPGA board to create a full-system prototype that operates at 100MHz, boots Linux, and provides significant performance and ease-of-use benefits for a range of parallel  ...  The insights from application studies will be very useful in terms of improving TM implementations and programming models.  ... 
doi:10.1109/date.2007.364558 dblp:conf/date/NjorogeCWTGKO07 fatcat:v3l7ee7vy5c23nrvsfgejeueyq

The Power of Priority: NoC Based Distributed Cache Coherency

Evgeny Bolotin, Zvika Guz, Israel Cidon, Ran Ginosar, Avinoam Kolodny
2007 First International Symposium on Networks-on-Chip (NOCS'07)  
For further system improvements, we introduce additional low cost NoC mechanisms that include: virtual invalidation rings, efficient store-and-forward multicast for short messages which is embedded within  ...  Then we show how several low cost mechanisms incorporated into such a Vanilla NoC can facilitate CMP and boost performance of a cache coherent NUCA CMP.  ...  The token coherence method [19] suggests exchanging and counting tokens to control coherence permissions. Several recent papers focused on NoC-based CMP cache coherency [21] [22] [23].  ... 
doi:10.1109/nocs.2007.42 dblp:conf/nocs/BolotinGCGK07 fatcat:vofomf4eavcgtlcl3po6i2jcga
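The token-counting idea mentioned in this entry can be illustrated with a minimal Python sketch. It assumes a fixed number of tokens per block and models only the counting rules of token coherence (at least one token to read, all tokens to write, tokens never created or destroyed). Data transfer, the owner token, and persistent requests are left out, and the class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBlock:
    """Token state of one memory block (hypothetical model)."""

    total_tokens: int
    holdings: dict = field(default_factory=dict)  # holder name -> token count

    def __post_init__(self) -> None:
        # Assumption: memory starts out holding every token for the block.
        if not self.holdings:
            self.holdings = {"memory": self.total_tokens}

    def tokens(self, holder: str) -> int:
        return self.holdings.get(holder, 0)

    def can_read(self, holder: str) -> bool:
        # Reading requires at least one token.
        return self.tokens(holder) >= 1

    def can_write(self, holder: str) -> bool:
        # Writing requires all tokens, so no other holder can still be reading.
        return self.tokens(holder) == self.total_tokens

    def transfer(self, src: str, dst: str, n: int) -> None:
        # Tokens only move between holders; the total stays constant.
        if n < 1 or self.tokens(src) < n:
            raise ValueError(f"{src} does not hold {n} token(s)")
        self.holdings[src] -= n
        self.holdings[dst] = self.tokens(dst) + n
        assert sum(self.holdings.values()) == self.total_tokens
```

For example, with four tokens a cache that receives one token may read but not write; it gains write permission only after collecting the other three, at which point no other cache holds a token and none can read.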

A practical FPGA-based framework for novel CMP research

Sewook Wee, Jared Casper, Njuguna Njoroge, Yuriy Teslyar, Daxia Ge, Christos Kozyrakis, Kunle Olukotun
2007 Proceedings of the 2007 ACM/SIGDA 15th international symposium on Field programmable gate arrays - FPGA '07  
ATLAS uses the BEE2 multi-FPGA board to provide a system that runs Linux on 8 PowerPC cores clocked at 100MHz.  ...  ATLAS provides significant benefits for CMP research such as 100x performance improvement over a software simulator and good visibility that helps with software tuning and architectural improvements.  ...  of FPGA-based systems will improve as the number of cores in CMPs increases.  ... 
doi:10.1145/1216919.1216936 dblp:conf/fpga/WeeCNTGKO07 fatcat:qamfrw43zngcvmcjcr3xsxfvwq
Showing results 1 to 15 out of 440 results