59 Hits in 4.0 sec

Adaptive Spill-Receive for robust high-performance caching in CMPs

Moinuddin K. Qureshi
2009 IEEE 15th International Symposium on High Performance Computer Architecture  
This paper proposes Dynamic Spill-Receive (DSR) for efficient capacity sharing.  ...  In a DSR architecture, each cache uses Set Dueling to learn whether it should act as a "spiller cache" or "receiver cache" for best overall performance.  ... 
doi:10.1109/hpca.2009.4798236 dblp:conf/hpca/Qureshi09 fatcat:y7zirfvbgra4da77r6nlbqukwq

Dynamic Program Behavior Identification for High Performance CMPs with Private LLCs

Xiaomin JIA, Pingjing LU, Caixia SUN, Minxuan ZHANG
2010 IEICE Transactions on Information and Systems  
Chip Multi-Processors (CMPs) emerge as a mainstream architectural design alternative for high performance parallel and distributed computing.  ...  Keywords: chip multi-processors (CMPs), performance, program behavior, last-level cache (LLC), spilling  ... 
doi:10.1587/transinf.e93.d.3211 fatcat:o4excs5ekvdhzgznjnjahpkafm

Cooperative Caching for Chip Multiprocessors

Jichuan Chang, Gurindar S. Sohi
2006 SIGARCH Computer Architecture News  
For an 8-core CMP with 1MB L2 cache per core, the best cooperative caching scheme improves the performance of multithreaded commercial workloads by 5-11% compared with a shared cache and 4-38% compared with private caches.  ...  For a 4-core CMP running multiprogrammed SPEC2000 workloads, cooperative caching is on average 11% and 6% faster than shared and private cache organizations, respectively.  ... 
doi:10.1145/1150019.1136509 fatcat:2czrs2vmsvexhifohrweo3rxpy

Cooperative Caching for Chip Multiprocessors [chapter]

J. Chang, E. Herrero, R. Canal, G. Sohi
2011 Cooperative Networking  
For an 8-core CMP with 1MB L2 cache per core, the best cooperative caching scheme improves the performance of multithreaded commercial workloads by 5-11% compared with a shared cache and 4-38% compared with private caches.  ...  For a 4-core CMP running multiprogrammed SPEC2000 workloads, cooperative caching is on average 11% and 6% faster than shared and private cache organizations, respectively.  ... 
doi:10.1002/9781119973584.ch13 fatcat:2r526mewvvg2dmxfwkp2sfc5eq

Cache equalizer

Mohammad Hammoud, Sangyeun Cho, Rami G. Melhem
2011 Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers - HiPEAC '11  
This paper describes Cache Equalizer (CE), a novel distributed cache management scheme for large-scale chip multiprocessors (CMPs). Our work is motivated by large asymmetry in cache sets' usages.  ...  CE decouples the physical locations of cache blocks from their addresses for the sake of reducing misses caused by destructive interferences.  ...  A recent work by Qureshi [22] proposed dynamic spill-receive (DSR) to improve upon CC by allowing private caches to either spill or receive cache blocks, but not both at the same time.  ... 
doi:10.1145/1944862.1944889 dblp:conf/hipeac/HammoudCM11 fatcat:gzndgemzqzabtn4jdec2dmq5hi

StimulusCache: Boosting performance of chip multiprocessors with excess cache

Hyunjin Lee, Sangyeun Cho, Bruce R. Childers
2010 HPCA-16: The Sixteenth International Symposium on High-Performance Computer Architecture  
Consequently, the number of cores in a single chip multiprocessor (CMP) is expected to grow in coming years.  ...  The two major components in a multicore chip are compute cores and on-chip memory such as L2 cache.  ...  Figure 13(a): Performance improvement with three StimulusCache policies and Dynamic Spill-Receive (DSR).  ... 
doi:10.1109/hpca.2010.5416644 dblp:conf/hpca/LeeCC10 fatcat:ylwktnqoprgmpfn5ozeevhtl5u

ASR: Adaptive Selective Replication for CMP Caches

Bradford Beckmann, Michael Marty, David Wood
2006 Proceedings of the Annual International Symposium on Microarchitecture (MICRO)  
Full-system simulations of 8-processor CMPs show that ASR provides robust performance: improving performance by as much as 29% versus shared caches, 19% versus private caches, and 12% versus CMP-NuRapid.  ...  Recent hybrid proposals use selective replication to balance latency and capacity, but their static replication rules result in performance degradation for some combinations of workloads and system configurations.  ... 
doi:10.1109/micro.2006.10 dblp:conf/micro/BeckmannMW06 fatcat:ybh52lm5ajenvezppy2f2rbjpy

A Dynamic Pressure-Aware Associative Placement Strategy for Large Scale Chip Multiprocessors

Mohammad Hammoud, Sangyeun Cho, Rami Melhem
2010 IEEE Computer Architecture Letters  
This paper describes Cache Equalizer (CE), a novel distributed cache management scheme for large scale chip multiprocessors (CMPs). Our work is motivated by large asymmetry in cache set usages.  ...  CE provides Quality of Service (QoS) by robustly offering better performance than the baseline shared NUCA cache.  ...  Qureshi [20] proposed dynamic spill-receive (DSR) to improve upon CC by allowing private caches to either spill or receive cache blocks, but not both at the same time.  ... 
doi:10.1109/l-ca.2010.7 fatcat:5obf374lfnbnzhyuy2qr2r4r2i

An Adaptive Shared/Private NUCA Cache Partitioning Scheme for Chip Multiprocessors

Haakon Dybdahl, Per Stenstrom
2007 IEEE 13th International Symposium on High Performance Computer Architecture  
We show that our scheme outperforms a private and shared cache organization as well as a hybrid NUCA organization in which blocks in a local partition can spill over to neighbor core partitions.  ...  The significant speed-gap between processor and memory and the limited chip memory bandwidth make last-level cache performance crucial for future chip multiprocessors.  ...  We contribute in this paper a novel NUCA design for CMPs based on private partitioning in which the size of the core-local partitions that are shared is chosen adaptively to maximize the overall performance  ... 
doi:10.1109/hpca.2007.346180 dblp:conf/hpca/DybdahlS07 fatcat:oooqssulxndyxlpawkymz2qk5a

In-network Monitoring and Control Policy for DVFS of CMP Networks-on-Chip and Last Level Caches

Xi Chen, Zheng Xu, Hyungjun Kim, Paul Gratz, Jiang Hu, Michael Kishinevsky, Umit Ogras
2012 2012 IEEE/ACM Sixth International Symposium on Networks-on-Chip  
In chip design today and for the foreseeable future, on-chip communication is not only a performance bottleneck but also a substantial power consumer.  ...  This work focuses on employing dynamic voltage and frequency scaling (DVFS) policies for networks-on-chip (NoC) and shared, distributed last-level caches (LLC).  ... 
doi:10.1109/nocs.2012.12 dblp:conf/nocs/ChenXKGHKO12 fatcat:b5isnrtd4rcyda3crsk2cp6nxy

In-network monitoring and control policy for DVFS of CMP networks-on-chip and last level caches

Xi Chen, Zheng Xu, Hyungjun Kim, Paul Gratz, Jiang Hu, Michael Kishinevsky, Umit Ogras
2013 ACM Transactions on Design Automation of Electronic Systems  
In chip design today and for the foreseeable future, on-chip communication is not only a performance bottleneck but also a substantial power consumer.  ...  This work focuses on employing dynamic voltage and frequency scaling (DVFS) policies for networks-on-chip (NoC) and shared, distributed last-level caches (LLC).  ... 
doi:10.1145/2504905 fatcat:4yfhec5yabckxlnitmu6ulf2ae

Dynamic hardware-assisted software-controlled page placement to manage capacity allocation and sharing within large caches

Manu Awasthi, Kshitij Sudan, Rajeev Balasubramonian, John Carter
2009 IEEE 15th International Symposium on High Performance Computer Architecture  
In future multi-cores, large amounts of delay and power will be spent accessing data in large L2/L3 caches.  ...  In this work, we extend that concept with mechanisms that dynamically move data within caches.  ...  The cache and core layouts for the 4 and 8 core CMP systems are shown in Figure 2.  ... 
doi:10.1109/hpca.2009.4798260 dblp:conf/hpca/AwasthiSBC09 fatcat:bxzvguck3jgijf5zvnvzh5qdia

Hardware support for protective and collaborative cache sharing

Raj Parihar, Jacob Brock, Chen Ding, Michael C. Huang
2016 SIGPLAN Notices  
We show that rationing provides good resource protection and full cache utilization of the shared cache for a variety of co-runs.  ...  This paper explores cache management policies that allow conservative sharing to protect the cache occupancy for individual programs, yet enable full cache utilization whenever there is an opportunity.  ... 
doi:10.1145/3241624.2926705 fatcat:6fp3bq6cc5bhjk36b76wsfsmsm

Hardware support for protective and collaborative cache sharing

Raj Parihar, Jacob Brock, Chen Ding, Michael C. Huang
2016 Proceedings of the 2016 ACM SIGPLAN International Symposium on Memory Management - ISMM 2016  
We show that rationing provides good resource protection and full cache utilization of the shared cache for a variety of co-runs.  ...  This paper explores cache management policies that allow conservative sharing to protect the cache occupancy for individual programs, yet enable full cache utilization whenever there is an opportunity.  ... 
doi:10.1145/2926697.2926705 dblp:conf/iwmm/PariharBDH16 fatcat:3j35kybiqfbxld6zqzbzhvsdqq

T-CREST: Time-predictable multi-core architecture for embedded systems

Martin Schoeberl, Sahar Abbaspour, Benny Akesson, Neil Audsley, Raffaele Capasso, Jamie Garside, Kees Goossens, Sven Goossens, Scott Hansen, Reinhold Heckmann, Stefan Hepp, Benedikt Huber (+11 others)
2015 Journal of Systems Architecture  
Compared to other processors the WCET performance is outstanding. The T-CREST platform is evaluated with two industrial use cases.  ...  Within the T-CREST project we propose novel solutions for time-predictable multi-core architectures that are optimized for the WCET instead of the average-case execution time.  ...  Even with high performance processors in our desktop PCs we notice once in a while that the PC is "frozen" for a few seconds.  ... 
doi:10.1016/j.sysarc.2015.04.002 fatcat:yts4coszkbg7vbes3b4hzdyzui