Proximity-aware directory-based coherence for multi-core processor architectures

Jeffery A. Brown, Rakesh Kumar, Dean Tullsen
2007 Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures - SPAA '07  
This paper describes mechanisms to accelerate coherence for a multi-core architecture that has multiple private L2 caches and a scalable point-to-point interconnect between cores.  ...  In this paper, we discuss implementations of coherence for CMPs and propose and evaluate a novel directory-based coherence scheme to improve the performance of parallel programs on such processors.  ...  Acknowledgments The authors would like to thank the anonymous reviewers for their helpful insights.  ... 
doi:10.1145/1248377.1248398 dblp:conf/spaa/BrownKT07 fatcat:y7c3zgv3dncirjggikdfmyfuwi

Proximity coherence for chip multiprocessors

Nick Barrow-Williams, Christian Fensch, Simon Moore
2010 Proceedings of the 19th international conference on Parallel architectures and compilation techniques - PACT '10  
We compare our Proximity Coherence protocol to an existing directory-based MESI protocol using full-system simulations of a 32-core system.  ...  As such, they seldom take advantage of the new possibilities that many-core architectures offer.  ...  Many-core processors are unconstrained by the packaging and interconnect latencies of larger multi-node machines, suggesting many possible architectural advances.  ... 
doi:10.1145/1854273.1854293 dblp:conf/IEEEpact/Barrow-WilliamsFM10 fatcat:fqtzq4ffrrclfdkvu7cj2ushda

Enhancing Cache Coherent Architectures with access patterns for embedded manycore systems

Jussara Marandola, Stephane Louise, Loic Cudennec, Jean-Thomas Acquaviva, David A. Bader
2012 2012 International Symposium on System on Chip (SoC)  
In this paper, we present a Cache Coherent Architecture that optimizes memory accesses to patterns using both a hardware component and specialized instructions.  ...  The high performance hardware component in our context is aimed at CMP (Chip Multi-Processing) and MPSoC (Multiprocessor System-on-Chip).  ...  Context: Cache Coherence for CMP Architectures. Shared Memory Chip Multi-Processor Architectures are expected to host up to hundreds of cores.  ... 
doi:10.1109/issoc.2012.6376369 dblp:conf/issoc/MarandolaLCAB12 fatcat:3ovjidxtgfendpydlbes4uzqgu

Reactive NUCA

Nikos Hardavellas, Michael Ferdman, Babak Falsafi, Anastasia Ailamaki
2009 Proceedings of the 36th annual international symposium on Computer architecture - ISCA '09  
Increases in on-chip communication delay and the large working sets of server and scientific workloads complicate the design of the on-chip last-level cache for multicore processors.  ...  R-NUCA cooperates with the operating system to support intelligent placement, migration, and replication without the overhead of an explicit coherence mechanism for the on-chip last-level cache.  ...  Somogyi for their technical assistance, and T. Brecht, T. Strigkos, and the anonymous reviewers for their feedback on earlier drafts of this paper.  ... 
doi:10.1145/1555754.1555779 dblp:conf/isca/HardavellasFFA09 fatcat:326qapu44fd47o5dt3qm7ghbgy

Optimizing Coherence Traffic in Manycore Processors using Closed-Form Caching/Home Agent Mappings

Steve Kommrusch, Marcos Horro, Louis-Noel Pouchet, Gabriel Rodriguez, Juan Tourino
2021 IEEE Access  
Manycore processors feature a high number of general-purpose cores designed to work in a multithreaded fashion. Recent manycore processors are kept coherent using scalable distributed directories.  ...  INDEX TERMS Network-on-chip, manycores, coherence traffic, distributed directories, architectural discovery, reverse engineering.  ...  ACKNOWLEDGMENT The authors wish to thank John McCalpin for his invaluable insights into the KNL architecture.  ... 
doi:10.1109/access.2021.3058280 fatcat:zd26nmknuvfpbi5mtvhj6zz2ry

Coherence Traffic in Manycore Processors with Opaque Distributed Directories [article]

Steve Kommrusch, Marcos Horro, Louis-Noël Pouchet, Gabriel Rodríguez, Juan Touriño
2020 arXiv   pre-print
Manycore processors feature a high number of general-purpose cores designed to work in a multithreaded fashion. Recent manycore processors are kept coherent using scalable distributed directories.  ...  The distributed coherence subsystem must be queried for every out-of-tile access, imposing an overhead on memory latency.  ...  [13] developed a simulator for the traffic on the NoC of distributed directory architectures based on the Tejas architectural simulator [25], predicting that codes with coherence traffic control would  ... 
arXiv:2011.05422v1 fatcat:277m4su4bnholla6fiv4vwgx54

Practically private

Yong Li, Rami Melhem, Alex K. Jones
2012 Proceedings of the 21st international conference on Parallel architectures and compilation techniques - PACT '12  
While this proposed data classification scheme can be applied to many micro-architectural constructs including the TLB, coherence directory and interconnect, we demonstrate its potential through an efficient  ...  cache coherence design.  ...  Figure 2: Different scenarios for practically private data. Figure 5: Architecture organization for data classification aware caching. Figure 6: Examples of data flow and coherence protocol  ... 
doi:10.1145/2370816.2370852 dblp:conf/IEEEpact/LiMJ12 fatcat:nuix3bvp2veojdwikmbagigsem

A Dynamic Pressure-Aware Associative Placement Strategy for Large Scale Chip Multiprocessors

Mohammad Hammoud, Sangyeun Cho, Rami Melhem
2010 IEEE computer architecture letters  
CE decouples the physical locations of cache blocks from their addresses for the sake of reducing misses caused by destructive interferences.  ...  This paper describes Cache Equalizer (CE), a novel distributed cache management scheme for large scale chip multiprocessors (CMPs). Our work is motivated by large asymmetry in cache sets usages.  ...  This work is based on the shared scheme and employs a distributed directory protocol for coherence maintenance.  ... 
doi:10.1109/l-ca.2010.7 fatcat:5obf374lfnbnzhyuy2qr2r4r2i

Predicting Coherence Communication by Tracking Synchronization Points at Run Time

Socrates Demetriades, Sangyeun Cho
2012 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture  
In directory coherence protocols, directly communicating with the predicted processors avoids costly indirection to the directory.  ...  Based on this observation, we build a predictor that can improve the miss latency of a directory protocol by 13%.  ...  Milos Prvulovic, members of Pitt's XCG (formerly CAST) group, and the anonymous reviewers for their constructive comments and suggestions.  ... 
doi:10.1109/micro.2012.40 dblp:conf/micro/DemetriadesC12 fatcat:uytjuogdargh7hlttq7hstnyki

A flexible data to L2 cache mapping approach for future multicore processors

Lei Jin, Hyunjin Lee, Sangyeun Cho
2006 Proceedings of the 2006 workshop on Memory system performance and correctness - MSPC '06  
This paper proposes and studies a distributed L2 cache management approach through page-level data to cache slice mapping in a future processor chip comprising many cores.  ...  L2 cache management is a crucial multicore processor design aspect to overcome non-uniform cache access latency for high program performance and to reduce on-chip network traffic and related power consumption  ...  For coherence enforcement, we model a distributed directory scheme. When data is traversing through the mesh-based network, a two-cycle latency is incurred per each hop.  ... 
doi:10.1145/1178597.1178613 dblp:conf/ACMmsp/JinLC06 fatcat:3puyalgmtvao7fzvbm2yzigqzm

Accurate, scalable and informative design space exploration for large and sophisticated multi-core oriented architectures

Chang-Burm Cho, J. Poe, Tao Li, Jingling Yuan
2009 2009 IEEE International Symposium on Modeling, Analysis & Simulation of Computer and Telecommunication Systems  
The trend toward multi-/many-core processors will result in sophisticated large-scale architecture substrates (e.g. non-uniformly accessed caches interconnected by network-on-chip) that exhibit increasingly  ...  Through case studies, we demonstrate that the proposed techniques can be used to informatively explore and accurately evaluate global, cooperative multi-core resource allocation and thermal-aware designs  ...  ACKNOWLEDGMENT This work is supported in part by NSF CAREER Award CCF-0845721, and by Microsoft Research Safe and Scalable Multi-core Computing Award.  ... 
doi:10.1109/mascot.2009.5366283 dblp:conf/mascots/ChoPLY09 fatcat:npsknlayezc65h6w4plxnjtcqu

Cooperative Caching for Chip Multiprocessors

Jichuan Chang, Gurindar S. Sohi
2006 SIGARCH Computer Architecture News  
These policies can be implemented by modifying an existing cache replacement policy and cache coherence protocol, or by the new implementation of a directory-based protocol presented in this paper.  ...  For an 8-core CMP with 1MB L2 cache per core, the best cooperative caching scheme improves the performance of multithreaded commercial workloads by 5-11% compared with a shared cache and 4-38% compared  ...  This implementation is based on a MOSI directory protocol to maintain cache coherence, but differs from a traditional directory-based system in several ways: (1) the directory memory for private caches  ... 
doi:10.1145/1150019.1136509 fatcat:2czrs2vmsvexhifohrweo3rxpy

Cooperative Caching for Chip Multiprocessors [chapter]

J. Chang, E. Herrero, R. Canal, G. Sohi
2011 Cooperative Networking  
These policies can be implemented by modifying an existing cache replacement policy and cache coherence protocol, or by the new implementation of a directory-based protocol presented in this paper.  ...  For an 8-core CMP with 1MB L2 cache per core, the best cooperative caching scheme improves the performance of multithreaded commercial workloads by 5-11% compared with a shared cache and 4-38% compared  ...  This implementation is based on a MOSI directory protocol to maintain cache coherence, but differs from a traditional directory-based system in several ways: (1) the directory memory for private caches  ... 
doi:10.1002/9781119973584.ch13 fatcat:2r526mewvvg2dmxfwkp2sfc5eq

Cluster Cache Monitor: Leveraging the Proximity Data in CMP

Guohong Li, Olivier Temam, Zhenyu Liu, Sanchuan Guo, Dongsheng Wang
2014 International journal of parallel programming  
We organize the multi-core into clusters of 2 × 2 nodes, and in order to leverage the aforementioned property, we introduce the Cluster Cache Monitor (CCM).  ...  We evaluate this approach on a 64-node multi-core using SPLASH-2 and PARSEC benchmarks, and we find that the CCM can reduce the execution time by 15% and reduce the energy by 14%, while saving  ...  Leveraging Data Proximity: Proximity-Aware directory coherence (PADC) [10] shares many intuitions with DCC: both are based on private L2s and take advantage of neighbor data again.  ... 
doi:10.1007/s10766-014-0339-0 fatcat:aqpvo3kcrve5pjnjfcxzbnpcvu

Pattern Based Cache Coherency Architecture for Embedded Manycores

Jussara Marandola, Stephane Louise, Loic Cudennec
2016 Procedia Computer Science  
cores on a manycore system.  ...  But for embedded devices, memory coherence protocols tend to account for a sizable portion of chip's power consumption. This is why any means to lower this impact is important.  ...  The base architecture is a multi-core system, each core fitted with its memory hierarchy (L1, L2), Directory and Pattern Table, and all cores have access to a Network on Chip (NoC) that permits each  ... 
doi:10.1016/j.procs.2016.05.481 fatcat:7znxuor2lffw7djjufvrganosu