A low power front-end for embedded processors using a block-aware instruction set
2007
Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems - CASES '07
Energy, power, and area efficiency are critical design concerns for embedded processors. ...
This paper evaluates and compares optimizations that improve the performance of embedded processors with small front-end caches. ...
Finally, compiler generated hints can improve the instruction cache performance by guiding the hardware to wisely use the limited resources. ...
doi:10.1145/1289881.1289926
dblp:conf/cases/ZmilyK07
fatcat:gqhikqs4ibddbkebmgt3xeid6e
Energy-efficient and high-performance instruction fetch using a block-aware ISA
2005
Proceedings of the 2005 international symposium on Low power electronics and design - ISLPED '05
It also allows for accurate instruction prefetching and energy-efficient instruction access. ...
A BLISS-based front-end leads to 14% IPC, 16% total energy, and 83% energy-delay-squared product improvements for wide-issue processors. ...
ACKNOWLEDGEMENTS We would like to acknowledge Earl Kilian for his valuable input. This work was supported by a Stanford OTL grant. ...
doi:10.1145/1077603.1077614
dblp:conf/islped/ZmilyK05
fatcat:hgje6kudqvgkxiwlvic2qrpf7u
Energy-efficient and high-performance instruction fetch using a block-aware ISA
2005
ISLPED '05. Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005.
It also allows for accurate instruction prefetching and energy-efficient instruction access. ...
A BLISS-based front-end leads to 14% IPC, 16% total energy, and 83% energy-delay-squared product improvements for wide-issue processors. ...
ACKNOWLEDGEMENTS We would like to acknowledge Earl Kilian for his valuable input. This work was supported by a Stanford OTL grant. ...
doi:10.1109/lpe.2005.195482
fatcat:t6ltfuvaqrhzxfg77qwqsh5gqu
Reuse Distance-Based Cache Hint Selection
[chapter]
2002
Lecture Notes in Computer Science
In order to improve a program's cache behavior, the cache hint is selected based on the data locality of the instruction. ...
The distribution allows one to efficiently estimate the cache level where the data will be found, and to determine the level where the data should be stored to improve the hit rate. ...
Small and fast caches are efficient when there is a high data locality, while for larger and slower caches lower data locality suffices. ...
doi:10.1007/3-540-45706-2_35
fatcat:wrtp4yvnhvfo5oyqxcgtrqcapm
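The entry above selects a cache hint from the reuse distance of an instruction's accesses. A minimal sketch of that idea, with an invented address trace and invented per-level capacities (not the paper's measured distributions), might look like:

```python
from collections import OrderedDict

def reuse_distances(trace):
    """For each access, count the distinct addresses touched since the
    previous access to the same address (inf on first use)."""
    last_seen = OrderedDict()  # insertion order tracks recency
    dists = []
    for addr in trace:
        if addr in last_seen:
            keys = list(last_seen)
            # distinct addresses accessed after addr's last use
            dists.append(len(keys) - keys.index(addr) - 1)
            last_seen.move_to_end(addr)
        else:
            dists.append(float("inf"))
            last_seen[addr] = None
    return dists

def pick_level(distance, capacities=(4, 64, 1024)):
    """Smallest cache level (1-based) whose capacity in blocks covers
    the reuse distance; None marks an effectively streaming access
    that a hint could make bypass the cache."""
    for level, cap in enumerate(capacities, start=1):
        if distance < cap:
            return level
    return None
```

An access whose reuse distance exceeds every level's capacity gains nothing from being cached, which is exactly the case a bypass-style hint targets.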
Protean Code: Achieving Near-Free Online Code Transformations for Warehouse Scale Computers
2014
2014 47th Annual IEEE/ACM International Symposium on Microarchitecture
Using a fully functional protean code compiler and runtime built on LLVM, we design PC3D, Protean Code for Cache Contention in Datacenters. ...
In this work we introduce protean code, a novel approach for enacting arbitrary compiler transformations at runtime for native programs running on commodity hardware with negligible (<1%) overhead. ...
ACKNOWLEDGMENT We thank our anonymous reviewers for their feedback and suggestions. We also thank Balaji Soundararajan for his help setting up experimental infrastructure. ...
doi:10.1109/micro.2014.21
dblp:conf/micro/LaurenzanoZTM14
fatcat:l2xqedekynh4fexjexkvzemjxm
SOS: A Software-Oriented Distributed Shared Cache Management Approach for Chip Multiprocessors
2009
2009 18th International Conference on Parallel Architectures and Compilation Techniques
The OS utilizes the hints to guide proper data placement in the L2 cache with page coloring. The derived hints are independent of the program input and can be used for multiple runs. ...
By using the hints for guiding page coloring alone, SOS achieves an average speedup of 10% and up to 23% over the shared cache scheme. ...
The recognized patterns are independent across program inputs and can be used for multiple runs. ...
doi:10.1109/pact.2009.14
dblp:conf/IEEEpact/JinC09
fatcat:w6rcuo73vve3tmpj6jvyasckvm
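SOS's hints drive OS page coloring: pages of the same color contend for the same L2 sets, so placement is controlled by choosing colors. A minimal sketch of how a color is derived from a physical address, with assumed (not the paper's) cache geometry:

```python
PAGE_SIZE = 4096        # bytes; typical value, assumed here
CACHE_SIZE = 2 * 2**20  # 2 MiB shared L2, assumed
ASSOC = 8               # ways, assumed
LINE = 64               # bytes per line, assumed

# A page's "color" identifies the group of L2 sets it maps to.  The OS
# can partition the shared cache by giving threads pages of disjoint
# colors, which is the placement mechanism the SOS hints guide.
SETS = CACHE_SIZE // (ASSOC * LINE)   # 4096 sets
SETS_PER_PAGE = PAGE_SIZE // LINE     # 64 sets touched per page
NUM_COLORS = SETS // SETS_PER_PAGE    # 64 colors

def page_color(phys_addr):
    """Color of the physical page containing phys_addr."""
    return (phys_addr // PAGE_SIZE) % NUM_COLORS
```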
Design and Evaluation of an Agent-Based Communication Model for a Parallel File System
[chapter]
2004
Lecture Notes in Computer Science
The MAPFS implementation is built on technologies close to system programming, although its design uses the abstraction of a multiagent system. ...
and prefetching agents, associated with one or more extractor agents, caching or prefetching their data; and (iv) hint agents, which study application access patterns to build hints for improving ...
the hint request from a cache agent to a hint agent. ...
doi:10.1007/978-3-540-24709-8_10
fatcat:ixgye3cf2nflfdzafgylymlxse
A Study of the Performance Potential for Dynamic Instruction Hints Selection
[chapter]
2006
Lecture Notes in Computer Science
They can be generated by the compiler and the post-link optimizer to reduce cache misses, improve branch prediction and minimize other performance bottlenecks. ...
This paper discusses different instruction hints available on modern processor architectures and shows the potential performance impact on many benchmark programs. ...
The authors want to thank Abhinav Das and Jinpyo Kim for their suggestions and help. We also thank all of the anonymous reviewers for their valuable comments. ...
doi:10.1007/11859802_7
fatcat:tkw4ji4j5zca3j2otayn4ueugm
Compiler-Assisted Cache Replacement: Problem Formulation and Performance Evaluation
[chapter]
2004
Lecture Notes in Computer Science
In this paper we formulate this problem (giving cache hints to memory instructions such that the cache miss rate is minimized) as a 0/1 knapsack problem, which can be efficiently solved using a dynamic programming ...
Initial results show that our approach is effective on reducing the cache miss rate and improving program performance. ...
Impact of our approach on locality of regular and nt-hint objects.
Table 2. Effectiveness of our approach in improving program performance. ...
doi:10.1007/978-3-540-24644-2_6
fatcat:vt6ordkugnedhi3f4hffx4sfjy
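The entry above casts hint assignment as a 0/1 knapsack problem. The sketch below is a generic knapsack dynamic program of the kind such a formulation could plug into; the item weights and values are invented stand-ins, not the paper's actual cost model:

```python
def knapsack(items, capacity):
    """0/1 knapsack via dynamic programming.
    items: list of (weight, value); returns (best_value, chosen_indices).
    In a hint-selection setting an item could be 'give instruction i a
    cache hint', its weight the cache space the hinted data occupies,
    and its value the estimated miss reduction (all hypothetical here).
    """
    n = len(items)
    dp = [0] * (capacity + 1)                # dp[c] = best value at capacity c
    keep = [[False] * (capacity + 1) for _ in range(n)]
    for i, (w, v) in enumerate(items):
        for c in range(capacity, w - 1, -1):  # descending: each item used once
            if dp[c - w] + v > dp[c]:
                dp[c] = dp[c - w] + v
                keep[i][c] = True
    # Backtrack to recover which items were chosen.
    chosen, c = [], capacity
    for i in range(n - 1, -1, -1):
        if keep[i][c]:
            chosen.append(i)
            c -= items[i][0]
    return dp[capacity], sorted(chosen)
```

With three candidate hints of weights 2, 3, 4 and values 3, 4, 5 under capacity 5, the solver picks the first two for a total value of 7.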
Location-aware cache management for many-core processors with deep cache hierarchy
2013
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '13
As cache hierarchies become deeper and the number of cores on a chip increases, managing caches becomes more important for performance and energy. ...
We propose load and store instructions that carry hints regarding into which cache(s) the accessed data should be placed. ...
Van der Wijngaart for discussion during the initial stage of our project. ...
doi:10.1145/2503210.2503224
dblp:conf/sc/ParkYKHK13
fatcat:yvtqvwtg3rbnbcfgdbamqq5dy4
Data-centric execution of speculative parallel programs
2016
2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)
Hints also make speculation far more efficient, reducing wasted work by 6.4× and traffic by 3.5× on average. ...
We show it is easy to modify programs to convey locality through hints. ...
William Hasenplaugh and Chia-Hsin Chen graciously shared the serial code for the color [30] and nocsim benchmarks. ...
doi:10.1109/micro.2016.7783708
dblp:conf/micro/JeffreySAES16
fatcat:b6nbzdafhzcazp74ify77niwa4
Energy-Aware Data Prefetching for General-Purpose Programs
[chapter]
2005
Lecture Notes in Computer Science
We also propose a hardware-based filtering technique to further reduce the energy overhead due to prefetching in the L1 cache. ...
There has been intensive research on data prefetching focusing on performance improvement, however, the energy aspect of prefetching is relatively unknown. ...
Power-aware prefetching architecture for general-purpose programs. Software prefetching techniques are more energy-efficient for most of the benchmarks. ...
doi:10.1007/11574859_6
fatcat:kpuecakaxfcrfmn7cae7qszp24
A generalized theory of collaborative caching
2013
SIGPLAN notices
We show two theoretical results for the general hint. The first is a new cache replacement policy, priority LRU, which permits the complete range of choices between MRU and LRU. ...
We show the generality in a hierarchical relation where collaborative caching subsumes noncollaborative caching, and within collaborative caching, the priority hint subsumes the previous binary hint. ...
We also wish to thank Michael Scott, Engin Ipek, Tongxin Bai, and anonymous reviewers for their helpful comments. ...
doi:10.1145/2426642.2259012
fatcat:fv2aq7rtczdwdnjacqhcz2vjkq
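The priority LRU policy described above spans the full range between MRU and LRU insertion. One plausible reading, sketched below with a flat (non-set-associative) recency stack rather than the paper's exact formulation, treats the hint as an insertion depth: priority 0 inserts at the MRU position as in plain LRU, while the maximum priority inserts next to the eviction end as in MRU-style bypass.

```python
class PriorityLRU:
    """Sketch of a collaborative cache with priority hints.

    Each access carries a priority giving the insertion depth in the
    recency stack: 0 = MRU position (classic LRU behavior),
    size - 1 = next-to-evict.  This is an illustrative reading of the
    policy, not the paper's formal definition.
    """

    def __init__(self, size):
        self.size = size
        self.stack = []  # index 0 = MRU end, last index = LRU end

    def access(self, block, priority=0):
        """Touch block with the given priority hint; return True on hit."""
        hit = block in self.stack
        if hit:
            self.stack.remove(block)
        elif len(self.stack) >= self.size:
            self.stack.pop()  # evict from the LRU end
        depth = min(priority, len(self.stack))
        self.stack.insert(depth, block)
        return hit
```

A streaming access can be tagged with a high priority so it is evicted first, while reused data keeps priority 0 and behaves exactly as under LRU.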
A generalized theory of collaborative caching
2012
Proceedings of the 2012 international symposium on Memory Management - ISMM '12
We show two theoretical results for the general hint. The first is a new cache replacement policy, priority LRU, which permits the complete range of choices between MRU and LRU. ...
We show the generality in a hierarchical relation where collaborative caching subsumes noncollaborative caching, and within collaborative caching, the priority hint subsumes the previous binary hint. ...
We also wish to thank Michael Scott, Engin Ipek, Tongxin Bai, and anonymous reviewers for their helpful comments. ...
doi:10.1145/2258996.2259012
dblp:conf/iwmm/GuD12
fatcat:nywf2vbsjjf6vm74vv5tbeye4q
Developing correct and efficient multithreaded programs with thread-specific data and a partial evaluator
2000
ACM SIGOPS Operating Systems Review
Figure 1 shows how to get a specialized random number generator by Tempo, a partial evaluator for the C language. Tempo takes a source program and hints in C and ML. ...
In this paper, we describe a development method of correct and efficient multithreaded programs using thread-specific data (TSD) and a partial evaluator. ...
doi:10.1145/346152.346228
fatcat:apsgiffmwrcvtdzsx4qvykp3r4