A Survey of Cache Bypassing Techniques
2016
Journal of Low Power Electronics and Applications
This paper presents a survey of cache bypassing techniques for CPUs, GPUs and CPU-GPU heterogeneous systems, and for caches designed with SRAM, non-volatile memory (NVM) and die-stacked DRAM. ...
However, due to strict area/power budgets and presence of poor data-locality workloads, blindly scaling cache capacity is both infeasible and ineffective. ...
On a TLB miss, the requested block is allocated in cache (if not there already) and both page table and cTLB are updated with virtual-to-cache mapping. ...
doi:10.3390/jlpea6020005
fatcat:rkiqtcjbcvggde5utaqogg5xxa
The Effects of Granularity and Adaptivity on Private/Shared Classification for Coherence
2015
ACM Transactions on Architecture and Code Optimization (TACO)
In this paper we ask the question: how do granularity (page-level vs. cache-line level) and adaptivity (going from shared to private) affect the outcome of classification and what is its final impact on coherence ...
To answer this, we create a classification technique, called Generational Classification, and a coherence protocol called Generational Coherence, which treats data as private or shared based on cache-line ...
dead-block predictors [Lai et al. 2001]. ...
doi:10.1145/2790301
fatcat:wk2bsaxxaje57b7sq2q4yqcxyq
An Efficient, Self-Contained, On-chip Directory: DIR1-SISD
2015
2015 International Conference on Parallel Architecture and Compilation (PACT)
However, such schemes introduce new challenges by transferring some of the directory complexity and functionality to the OS and using the page table and the TLBs to store data classification information ...
) and, further, outperforms a SISD protocol that relies on the OS to provide a persistent page-based directory (4% in execution time and 20% in traffic). ...
621-2012-5332), Vinnova Vinn-Verifiering (award: VIPS 2013-01113), "Fundación Seneca-Agencia de Ciencia y Tecnología de la Región de Murcia" under grant "Jóvenes Líderes en Investigación" 18956/JLI/13, and ...
doi:10.1109/pact.2015.23
dblp:conf/IEEEpact/DavariRHK15
fatcat:yhyrbyue6jbznkfqam6ibep3eq
Combating NBTI-induced aging in data caches
2013
Proceedings of the 23rd ACM international conference on Great lakes symposium on VLSI - GLSVLSI '13
Due to the unbalanced duty cycle ratio of the SRAM cells, the data cache suffers a heavy NBTI stress and this will further exacerbate the aging effect in the data cache. ...
By applying our proposed idle-time-based cacheline invalidation, early write-back, and bit-flipping schemes, the duty cycle ratio of the data cache can be well balanced. ...
Branch Predictor: Alpha 21264 tournament predictor, 32-entry RAS
BTB: 2048-entry 2-way
Memory Hierarchy:
L1 I/DCache: 64KB, 2 ways, 64B blocks, 2 cycles
L2 UCache: 4MB, 8 ways, 128B blocks, 12 cycles ...
doi:10.1145/2483028.2483096
dblp:conf/glvlsi/WangDZJ13
fatcat:7jxljeplvvejxcy7dr7vju3f6a
Cache decay
2001
SIGARCH Computer Architecture News
That is, cache lines typically have a flurry of frequent use when first brought into the cache, and then have a period of "dead time" before they are evicted. ...
We discuss policies and implementations for reducing cache leakage by invalidating and "turning off" cache lines when they hold data not likely to be reused. ...
Our thanks to Jim Goodman who turned our attention to adaptive decay techniques and to Alan J. Smith for pointing out LRU decay and multiprogramming. ...
doi:10.1145/384285.379268
fatcat:reh2x2cj2nfd3kdytq7ptin6mq
Reliability-aware Garbage Collection for Hybrid HBM-DRAM Memories
2021
ACM Transactions on Architecture and Code Optimization (TACO)
are hot and low-risk, and (3) allocation site is a good predictor for hotness and risk. ...
Unfortunately, these approaches operate at a coarse-grained page granularity, and frequent page migrations hurt performance. ...
The mark-region mature space consists of a hierarchy of blocks and lines. Blocks are multiples of page sizes and constitute multiple lines. Lines are multiples of cache line sizes. ...
doi:10.1145/3431803
fatcat:let3q6xgmrfv3kuldijuekh3a4
ReStore: Symptom-Based Soft Error Detection in Microprocessors
2006
IEEE Transactions on Dependable and Secure Computing
Example symptoms include exceptions, control flow misspeculations, and cache or translation look-aside buffer misses. ...
To date, in all but the most demanding applications, implementing parity and ECC for caches and other large, regular SRAM structures has been sufficient to stem the growing soft error tide. ...
In addition, more subtle events like cache and TLB misses can also be caused by soft errors. ...
doi:10.1109/tdsc.2006.40
fatcat:lsq55w3zl5guxohv4mvlvwb5pi
HW/SW co-designed processors: Challenges, design choices and a simulation infrastructure for evaluation
2017
2017 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)
A fundamental requirement for evaluating different design choices and trade-offs to meet these challenges is to have a simulation infrastructure. ...
This paper identifies the key challenges that HW/SW codesigned processors face and the basic requirements for a simulation infrastructure targeting these architectures. ...
ACKNOWLEDGMENTS This work was supported by the Spanish State Research Agency under grants TIN2013-44375-R and TIN2016-75344-R (AEI/FEDER, EU). ...
doi:10.1109/ispass.2017.7975290
dblp:conf/ispass/KumarCBPSGMG17
fatcat:h7zmsyabebbhjitopdldmsfadm
Rethinking the Memory Hierarchy for Modern Languages
2018
2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)
This avoids the need for associative caches. Instead, Hotpads moves objects across a hierarchy of directly addressed memories. ...
As a result, Hotpads improves memory performance and efficiency substantially, and unlocks many new optimizations. ...
This work was supported in part by NSF grant CAREER-1452994 and by a grant from the Qatar Computing Research Institute. ...
doi:10.1109/micro.2018.00025
dblp:conf/micro/TsaiG018
fatcat:zq5ywopzbfgxjme6ccdw4u46cq
Profile-guided proactive garbage collection for locality optimization
2006
Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation - PLDI '06
In this thesis, we introduce Page-based Transactional Memory to support unbounded transactions. ...
Bounds checking protects pointer and array accesses. It is the process of keeping track of the address boundaries for objects, and checking that the loads and stores do not stray outside bounds. ...
Each 128 byte block of memory was organized as 16 blocks per page, and had a bit in a lock bit-vector that is associated with the page via the page table and TLB. ...
doi:10.1145/1133981.1134021
dblp:conf/pldi/ChenBCGC06
fatcat:bnv6i26dnfaj3fel6vudd2ut3u
Profile-guided proactive garbage collection for locality optimization
2006
SIGPLAN notices
In this thesis, we introduce Page-based Transactional Memory to support unbounded transactions. ...
Bounds checking protects pointer and array accesses. It is the process of keeping track of the address boundaries for objects, and checking that the loads and stores do not stray outside bounds. ...
Each 128 byte block of memory was organized as 16 blocks per page, and had a bit in a lock bit-vector that is associated with the page via the page table and TLB. ...
doi:10.1145/1133255.1134021
fatcat:rdddng3bj5fdfprxfg6ut3dg3y
Dynamic cache reconfiguration based techniques for improving cache energy efficiency
[article]
2013
arXiv
pre-print
In this research, we propose novel cache leakage energy saving schemes for single-core and multicore systems; desktop, QoS, real-time and server systems. ...
The conventional schemes of cache energy saving either aim at saving dynamic energy or are based on properties specific to first-level caches, and thus these schemes have limited utility for last-level ...
Out of the blocks not fitting the available cache space, the clean blocks are discarded and the dirty blocks are written back. ...
arXiv:1310.4231v1
fatcat:im5qck2ojngl5no6fm636rncxu
Revisiting LP-NUCA Energy Consumption
2014
ACM Transactions on Architecture and Code Optimization (TACO)
This work identifies the sources of energy waste in LP-NUCAs: parallel access to the tag and data arrays of the tiles and low locality phases with useless block migration. ...
Cache working-set adaptation is key as embedded systems move to multiprocessor and Simultaneous Multithreaded Architectures (SMT) because interthread pollution harms system performance and battery life ...
ACKNOWLEDGMENTS The authors would like to thank Laura Neville, Jimi Xenidis, Jorge Albericio Latorre, and the anonymous referees for their helpful comments and suggestions. ...
doi:10.1145/2632217
fatcat:dfuzo6cvybcjtihd6bnz7lphfm
Profile-based pretenuring
2007
ACM Transactions on Programming Languages and Systems
to both applications and Jikes RVM, a compiler and run-time system for Java written in Java. ...
an immortal allocation space in addition to a nursery and older generation, and show that pretenuring to immortal space has substantial benefit. ...
Cavazos, Asjad Khan, and Narendran Sachindran for their contributions to various incarnations of this work. ...
doi:10.1145/1180475.1180477
fatcat:l52csauynnh5fehf7dvloy4byy
Hints and Principles for Computer System Design
[article]
2021
arXiv
pre-print
It also gives some principles for system design that are more than just hints, and many examples of how to apply the ideas. ...
This new long version of my 1983 paper suggests the goals you might have for your system -- Simple, Timely, Efficient, Adaptable, Dependable, Yummy (STEADY) -- and techniques for achieving them -- Approximate ...
Examples: TLBs, shadow page tables, dynamic linking, JIT compiling of interpreted code, content distribution networks. ...
arXiv:2011.02455v3
fatcat:jolyz5lknjdbpjpxjcrx5rh6fa