A Survey of Cache Bypassing Techniques

Sparsh Mittal
2016 Journal of Low Power Electronics and Applications  
This paper presents a survey of cache bypassing techniques for CPUs, GPUs and CPU-GPU heterogeneous systems, and for caches designed with SRAM, non-volatile memory (NVM) and die-stacked DRAM.  ...  However, due to strict area/power budgets and presence of poor data-locality workloads, blindly scaling cache capacity is both infeasible and ineffective.  ...  On a TLB miss, the requested block is allocated in cache (if not there already) and both page table and cTLB are updated with virtual-to-cache mapping.  ... 
doi:10.3390/jlpea6020005 fatcat:rkiqtcjbcvggde5utaqogg5xxa

The Effects of Granularity and Adaptivity on Private/Shared Classification for Coherence

Mahdad Davari, Alberto Ros, Erik Hagersten, Stefanos Kaxiras
2015 ACM Transactions on Architecture and Code Optimization (TACO)  
In this paper we ask the question: how granularity (page-level vs. cache-line level) and adaptivity (going from shared to private) affect the outcome of classification and what is its final impact on coherence  ...  To answer this, we create a classification technique, called Generational Classification, and a coherence protocol called Generational Coherence, which treats data as private or shared based on cache-line  ...  dead-block predictors [Lai et al. 2001].  ... 
doi:10.1145/2790301 fatcat:wk2bsaxxaje57b7sq2q4yqcxyq

An Efficient, Self-Contained, On-chip Directory: DIR1-SISD

Mahdad Davari, Alberto Ros, Erik Hagersten, Stefanos Kaxiras
2015 2015 International Conference on Parallel Architecture and Compilation (PACT)  
However, such schemes introduce new challenges by transferring some of the directory complexity and functionality to the OS and using the page table and the TLBs to store data classification information  ...  ) and, further, outperforms a SISD protocol that relies on the OS to provide a persistent page-based directory (4% in execution time and 20% in traffic).  ...  621-2012-5332), Vinnova Vinn-Verifiering (award: VIPS 2013-01113), "Fundación Seneca-Agencia de Ciencia y Tecnología de la Región de Murcia" under grant "Jóvenes Líderes en Investigación" 18956/JLI/13, and  ... 
doi:10.1109/pact.2015.23 dblp:conf/IEEEpact/DavariRHK15 fatcat:yhyrbyue6jbznkfqam6ibep3eq

Combating NBTI-induced aging in data caches

Shuai Wang, Guangshan Duan, Chuanlei Zheng, Tao Jin
2013 Proceedings of the 23rd ACM international conference on Great lakes symposium on VLSI - GLSVLSI '13  
Due to the unbalanced duty cycle ratio of the SRAM cells, the data cache suffers a heavy NBTI stress and this will further exacerbate the aging effect in the data cache.  ...  By applying our proposed idle-time-based cacheline invalidation, early write-back, and bit-flipping schemes, the duty cycle ratio of the data cache can be well balanced.  ...  Branch Predictor Alpha 21264 tournament predictor 32-entry RAS BTB 2048-entry 2-way Memory Hierarchy L1 I/DCache 64KB, 2 ways, 64B blocks, 2 cycles L2 UCache 4MB, 8 ways, 128B blocks, 12 cycles  ... 
doi:10.1145/2483028.2483096 dblp:conf/glvlsi/WangDZJ13 fatcat:7jxljeplvvejxcy7dr7vju3f6a

Cache decay

Stefanos Kaxiras, Zhigang Hu, Margaret Martonosi
2001 SIGARCH Computer Architecture News  
That is, cache lines typically have a flurry of frequent use when first brought into the cache, and then have a period of "dead time" before they are evicted.  ...  We discuss policies and implementations for reducing cache leakage by invalidating and "turning off" cache lines when they hold data not likely to be reused.  ...  Our thanks to Jim Goodman who turned our attention to adaptive decay techniques and to Alan J. Smith for pointing out LRU decay and multiprogramming.  ... 
doi:10.1145/384285.379268 fatcat:reh2x2cj2nfd3kdytq7ptin6mq

Reliability-aware Garbage Collection for Hybrid HBM-DRAM Memories

Wenjie Liu, Shoaib Akram, Jennifer B. Sartor, Lieven Eeckhout
2021 ACM Transactions on Architecture and Code Optimization (TACO)  
are hot and low-risk, and (3) allocation site is a good predictor for hotness and risk.  ...  Unfortunately, these approaches operate at a coarse-grained page granularity, and frequent page migrations hurt performance.  ...  The mark-region mature space consists of a hierarchy of blocks and lines. Blocks are multiples of page sizes and constitute multiple lines. Lines are multiples of cache line sizes.  ... 
doi:10.1145/3431803 fatcat:let3q6xgmrfv3kuldijuekh3a4

ReStore: Symptom-Based Soft Error Detection in Microprocessors

N.J. Wang, S.J. Patel
2006 IEEE Transactions on Dependable and Secure Computing  
Example symptoms include exceptions, control flow misspeculations, and cache or translation look-aside buffer misses.  ...  To date, in all but the most demanding applications, implementing parity and ECC for caches and other large, regular SRAM structures have been sufficient to stem the growing soft error tide.  ...  In addition, more subtle events like cache and TLB misses can also be caused by soft errors.  ... 
doi:10.1109/tdsc.2006.40 fatcat:lsq55w3zl5guxohv4mvlvwb5pi

HW/SW co-designed processors: Challenges, design choices and a simulation infrastructure for evaluation

Rakesh Kumar, Jose Cano, Aleksandar Brankovic, Demos Pavlou, Kyriakos Stavrou, Enric Gibert, Alejandro Martinez, Antonio Gonzalez
2017 2017 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)  
A fundamental requirement for evaluating different design choices and trade-offs to meet these challenges is to have a simulation infrastructure.  ...  This paper identifies the key challenges that HW/SW codesigned processors face and the basic requirements for a simulation infrastructure targeting these architectures.  ...  ACKNOWLEDGMENTS This work was supported by the Spanish State Research Agency under grants TIN2013-44375-R and TIN2016-75344-R (AEI/FEDER, EU).  ... 
doi:10.1109/ispass.2017.7975290 dblp:conf/ispass/KumarCBPSGMG17 fatcat:h7zmsyabebbhjitopdldmsfadm

Rethinking the Memory Hierarchy for Modern Languages

Po-An Tsai, Yee Ling Gan, Daniel Sanchez
2018 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)  
This avoids the need for associative caches. Instead, Hotpads moves objects across a hierarchy of directly addressed memories.  ...  As a result, Hotpads improves memory performance and efficiency substantially, and unlocks many new optimizations.  ...  This work was supported in part by NSF grant CAREER-1452994 and by a grant from the Qatar Computing Research Institute.  ... 
doi:10.1109/micro.2018.00025 dblp:conf/micro/TsaiG018 fatcat:zq5ywopzbfgxjme6ccdw4u46cq

Profile-guided proactive garbage collection for locality optimization

Wen-ke Chen, Sanjay Bhansali, Trishul Chilimbi, Xiaofeng Gao, Weihaw Chuang
2006 Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation - PLDI '06  
In this thesis, we introduce Page-based Transactional Memory to support unbounded transactions.  ...  Bounds checking protects pointer and array accesses. It is the process of keeping track of the address boundaries for objects, and checking that the loads and stores do not stray outside bounds.  ...  Each 128 byte block of memory was organized as 16 blocks per page, and had a bit in a lock bit-vector that is associated with the page via the page table and TLB.  ... 
doi:10.1145/1133981.1134021 dblp:conf/pldi/ChenBCGC06 fatcat:bnv6i26dnfaj3fel6vudd2ut3u

Profile-guided proactive garbage collection for locality optimization

Wen-ke Chen, Sanjay Bhansali, Trishul Chilimbi, Xiaofeng Gao, Weihaw Chuang
2006 SIGPLAN notices  
In this thesis, we introduce Page-based Transactional Memory to support unbounded transactions.  ...  Bounds checking protects pointer and array accesses. It is the process of keeping track of the address boundaries for objects, and checking that the loads and stores do not stray outside bounds.  ...  Each 128 byte block of memory was organized as 16 blocks per page, and had a bit in a lock bit-vector that is associated with the page via the page table and TLB.  ... 
doi:10.1145/1133255.1134021 fatcat:rdddng3bj5fdfprxfg6ut3dg3y

Dynamic cache reconfiguration based techniques for improving cache energy efficiency [article]

Sparsh Mittal
2013 arXiv pre-print
In this research, we propose novel cache leakage energy saving schemes for single-core and multicore systems; desktop, QoS, real-time and server systems.  ...  The conventional schemes of cache energy saving either aim at saving dynamic energy or are based on properties specific to first-level caches, and thus these schemes have limited utility for last-level  ...  Out of the blocks not fitting the available cache space, the clean blocks are discarded and the dirty blocks are written back.  ... 
arXiv:1310.4231v1 fatcat:im5qck2ojngl5no6fm636rncxu

Revisiting LP-NUCA Energy Consumption

Darío Suárez Gracia, Alexandra Ferrerón, Luis Montesano Del Campo, Teresa Monreal Arnal, Víctor Viñals Yúfera
2014 ACM Transactions on Architecture and Code Optimization (TACO)  
This work identifies the sources of energy waste in LP-NUCAs: parallel access to the tag and data arrays of the tiles and low locality phases with useless block migration.  ...  Cache working-set adaptation is key as embedded systems move to multiprocessor and Simultaneous Multithreaded Architectures (SMT) because interthread pollution harms system performance and battery life  ...  ACKNOWLEDGMENTS The authors would like to thank Laura Neville, Jimi Xenidis, Jorge Albericio Latorre, and the anonymous referees for their helpful comments and suggestions.  ... 
doi:10.1145/2632217 fatcat:dfuzo6cvybcjtihd6bnz7lphfm

Profile-based pretenuring

Stephen M. Blackburn, Matthew Hertz, Kathryn S. McKinley, J. Eliot B. Moss, Ting Yang
2007 ACM Transactions on Programming Languages and Systems  
to both applications and Jikes RVM, a compiler and run-time system for Java written in Java.  ...  an immortal allocation space in addition to a nursery and older generation, and show that pretenuring to immortal space has substantial benefit.  ...  Cavazos, Asjad Khan, and Narendran Sachindran for their contributions to various incarnations of this work.  ... 
doi:10.1145/1180475.1180477 fatcat:l52csauynnh5fehf7dvloy4byy

Hints and Principles for Computer System Design [article]

Butler Lampson
2021 arXiv pre-print
It also gives some principles for system design that are more than just hints, and many examples of how to apply the ideas.  ...  This new long version of my 1983 paper suggests the goals you might have for your system -- Simple, Timely, Efficient, Adaptable, Dependable, Yummy (STEADY) -- and techniques for achieving them -- Approximate  ...  Examples: TLBs, shadow page tables, dynamic linking, JIT compiling of interpreted code, content distribution networks.  ... 
arXiv:2011.02455v3 fatcat:jolyz5lknjdbpjpxjcrx5rh6fa