An Adaptive Cache Coherence Protocol Optimized for Producer-Consumer Sharing

Liqun Cheng, John B. Carter, Donglai Dai
2007 2007 IEEE 13th International Symposium on High Performance Computer Architecture  
We focus on improving the performance of applications that exhibit producer-consumer sharing. We first present a simple hardware mechanism for detecting producer-consumer sharing.  ...  We then describe a directory delegation mechanism whereby the "home node" of a cache line can be delegated to a producer node, thereby converting 3-hop coherence operations into 2-hop operations.  ...  In this paper, we present a novel adaptive mechanism that identifies and optimizes for producer-consumer sharing.  ... 
doi:10.1109/hpca.2007.346210 dblp:conf/hpca/ChengCD07 fatcat:agljguiq5bedzfbmkcxbjaqsia

Improving CC-NUMA performance using Instruction-based Prediction

S. Kaxiras, J.R. Goodman
1999 Proceedings Fifth International Symposium on High-Performance Computer Architecture  
Typically, in this environment, prediction is based on datablock access history (address-based prediction) in the form of adaptive cache coherence protocols.  ...  We propose Instruction-based Prediction as a means to optimize directory-based cache coherent NUMA shared-memory.  ...  Acknowledgments We would like to thank Alain Kägi, Doug Burger, Ravi Rajwar, David Wood, Mark Hill, and Guri Sohi for their helpful comments on drafts of this paper.  ... 
doi:10.1109/hpca.1999.744359 dblp:conf/hpca/KaxirasG99 fatcat:i7z2urwwlnap5dxtbzoju3zgyy

Using prediction to accelerate coherence protocols

Shubhendu S. Mukherjee, Mark D. Hill
1998 SIGARCH Computer Architecture News  
To ameliorate this latency, researchers have augmented standard coherence protocols with optimizations for specific sharing patterns, such as read-modify-write, producer-consumer, and migratory sharing  ...  Most large shared-memory multiprocessors use directory protocols to keep per-processor caches coherent.  ...  Acknowledgments We would like to thank members of the Wisconsin Wind Tunnel group for providing an environment conducive to this work and David Wood for providing key feedback  ... 
doi:10.1145/279361.279386 fatcat:hja46jd2unfadowt5acakx5geu

Improving support for locality and fine-grain sharing in chip multiprocessors

Hemayet Hossain, Sandhya Dwarkadas, Michael C. Huang
2008 Proceedings of the 17th international conference on Parallel architectures and compilation techniques - PACT '08  
In this paper, we present Adaptive Replication, Migration, and producer-Consumer Optimization (ARMCO), a coherence protocol that, to the best of our knowledge, is the first to exploit direct access to  ...  Our goal is to provide support for tightly coupled sharing by recognizing and adapting to common sharing patterns such as migratory, producer-consumer, multiple-reader, and multiple read-write.  ...  We present Adaptive Replication, Migration, and producer-Consumer Optimization (ARMCO), a protocol that adaptively optimizes data communication for migratory, producer-consumer, multiple-readers, multiple-writers  ... 
doi:10.1145/1454115.1454138 dblp:conf/IEEEpact/HossainDH08 fatcat:vcrt3364lfcyvjfwkpctu2cn4u

A Case for Fine-grain Coherence Specialization in Heterogeneous Systems [article]

Johnathan Alsop, Weon Taek Na, Matthew D. Sinclair, Samuel Grayson, Sarita V. Adve
2021 arXiv   pre-print
We then describe how to optimize individual memory requests to improve cache reuse and performance-critical memory latency in emerging heterogeneous workloads.  ...  This paper demonstrates the benefits of fine-grained coherence specialization for heterogeneous systems.  ...  By using ReqO+data for the consumer access (load or RMW) in a producer-consumer pair and ReqWTfwd[+data] for the producer access (store or RMW), the producer update will be forwarded to the owner and enable  ... 
arXiv:2104.11678v1 fatcat:urn2a4zn75d3jevj2wmpf4dzpi

Achieving high performance in bus-based shared-memory multiprocessors

A. Milenkovic
2000 IEEE Concurrency  
Snooping cache coherence protocols can effectively keep shared data coherent because they use the simple and effective broadcast capability of a single bus [2]. Cache coherence protocols fall into two broad  ...  classes, write-invalidate and write-update, depending on whether a write to a shared cache block invalidates or updates all other copies of the block [3]. Although a write-update protocol produces fewer cache  ...  Injection on first read is always applicable when there is more than one consumer, for read-only shared data or for a 1P-MC (1 producer-multiple consumers) sharing pattern.  ... 
doi:10.1109/4434.865891 fatcat:e42udkwrhjdrfbi2rsjfhna5tm

On the design of global object space for efficient multi-threading Java computing on clusters

Weijian Fang, Cho-Li Wang, Francis C.M. Lau
2003 Parallel Computing  
With this framework, we are able to effectively calibrate the runtime memory access patterns and dynamically apply optimized cache coherence protocols to minimize consistency maintenance overhead.  ...  place remotely at the home node of its locked object, and connectivity-based object pushing that uses object connectivity information to optimize the producer-consumer access pattern.  ...  In our design, we use an object-based adaptive cache coherence protocol to implement the Java memory model.  ... 
doi:10.1016/j.parco.2003.05.007 fatcat:cvo2lmmq5vdfleyhtzysrbwdlm

DDCache: Decoupled and Delegable Cache Data and Metadata

Hemayet Hossain, Sandhya Dwarkadas, Michael C. Huang
2009 2009 18th International Conference on Parallel Architectures and Compilation Techniques  
We present a new cache coherence protocol that decouples the logical binding between data and metadata in a cache set.  ...  nonuniform cache access protocol, while generating only 65% and 74% of the on-chip and off-chip traffic respectively, and consuming 74% of the corresponding energy (95% of the power) in the on-chip memory  ...  Several coherence protocols that detect and optimize coherence actions for specific sharing patterns have been proposed in the past [9], [12], [35].  ... 
doi:10.1109/pact.2009.24 dblp:conf/IEEEpact/HossainDH09 fatcat:xiumut6errhgnnihu5xcnvcojq

Adaptive software cache management for distributed shared memory architectures

John K. Bennett, John B. Carter, Willy Zwaenepoel
1990 SIGARCH Computer Architecture News  
We contend that, in distributed shared memory systems, adaptive cache coherence mechanisms will outperform static cache coherence mechanisms.  ...  An adaptive cache coherence mechanism exploits semantic information about the expected or observed access behavior of particular data objects.  ...  Acknowledgements The authors would like to thank the referees and the members of the Rice computer systems group (Elmootazbellah Elnozahy, Jerry Fowler, David Johnson, Pete Keleher, and Mark Mazina) for  ... 
doi:10.1145/325096.325124 fatcat:3zhu34cnazhepfkh6omjpk4afu

MPSoC Design Using Application-Specific Architecturally Visible Communication [chapter]

Theo Kluter, Philip Brisk, Edoardo Charbon, Paolo Ienne
2009 Lecture Notes in Computer Science  
consumes less energy and takes fewer cycles than a cache access; and (4) cache coherence traffic for producer/consumer data is eliminated, reducing pressure on the memory subsystem.  ...  Producer/consumer relationships map poorly onto cache-based MPSoCs.  ...  Furthermore, cache-to-cache communication is a bottleneck, as the concurrent transfer of data from producer to consumer leads to an excess of coherence traffic within the memory system.  ... 
doi:10.1007/978-3-540-92990-1_15 fatcat:eeff7nkbpncjrkcb2qnt35if74

Automatic sharing classification and timely push for cache-coherent systems

Malek Musleh, Vijay S. Pai
2015 Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '15  
We integrate STAP into a MOESI cache-coherence protocol using heuristics to detect different data sharing patterns, including broadcasts, producer/consumer, and migratory-data sharing.  ...  This paper proposes and evaluates Sharing/Timing Adaptive Push (STAP), a dynamic scheme for preemptively sending data from producers to consumers to minimize critical-path communication latency.  ...  Previous work falls into four categories: timely data prefetching, hybrid cache coherence protocols, producer-initiated primitives, and data-sharing set predictions. Timely prefetching.  ... 
doi:10.1145/2807591.2807649 dblp:conf/sc/MuslehP15 fatcat:dcmwjqtnzrg2hp36fk2wjridna

Shared-memory performance profiling

Zhichen Xu, James R. Larus, Barton P. Miller
1997 SIGPLAN notices  
This approach exploits the underlying system's cache coherence protocol to detect data sharing patterns that indicate potential performance bottlenecks and presents performance measurements in a data-centric  ...  distributed shared memory system.  ...  ACKNOWLEDGMENTS Thanks to Sang Tae Kim and Atipat Rojnuckarin for providing the application code and the insight and effort for tuning its performance.  ... 
doi:10.1145/263767.263796 fatcat:6tgmda63wvf4jpwfhyuwuzepli

Shared-memory performance profiling

Zhichen Xu, James R. Larus, Barton P. Miller
1997 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming - PPOPP '97  
This approach exploits the underlying system's cache coherence protocol to detect data sharing patterns that indicate potential performance bottlenecks and presents performance measurements in a data-centric  ...  distributed shared memory system.  ...  ACKNOWLEDGMENTS Thanks to Sang Tae Kim and Atipat Rojnuckarin for providing the application code and the insight and effort for tuning its performance.  ... 
doi:10.1145/263764.263796 dblp:conf/ppopp/XuLM97 fatcat:ymcsip25sbhojhcnovfrbopg7q

A tuneable software cache coherence protocol for heterogeneous MPSoCs

Frank Ophelders, Marco J.G. Bekooij, Henk Corporaal
2009 Proceedings of the 7th IEEE/ACM international conference on Hardware/software codesign and system synthesis - CODES+ISSS '09  
Our software cache coherence protocol is implemented on an ARM926EJ-S MPSoC which is mapped on an FPGA.  ...  Existing hardware cache coherence protocols are less suitable for MPSoCs because many off-the-shelf processors used in MPSoCs do not support these protocols.  ...  In our software cache coherence protocol we propose FIFO communication as an optimization, and we discuss an efficient software cache coherence protocol for FIFO buffers.  ... 
doi:10.1145/1629435.1629488 dblp:conf/codes/OpheldersBC09 fatcat:iszf54kppbeq5he5m7gbwlffiy

Parallel and Distributed Processing with Applications: Preface

Jesus Carretero, Laurence T. Yang
2013 International journal of parallel programming  
Nicolau, for approving this special issue and for his help along the process of its preparation.  ...  The paper Bandwidth Adaptive Cache Coherence Optimizations for Chip Multiprocessors presents two mechanisms to design efficient and scalable cache coherence protocols for CMPs by using an adaptive hybrid  ...  protocol to reduce coherence misses observed in write-invalidate based protocols pushing updates to potential consumers based on observed producer-consumer sharing patterns.  ... 
doi:10.1007/s10766-013-0254-9 fatcat:nt32g52t5zgehkpxkmxqsulthq