A low power front-end for embedded processors using a block-aware instruction set

Ahmad Zmily, Christos Kozyrakis
2007 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems - CASES '07  
Energy, power, and area efficiency are critical design concerns for embedded processors.  ...  This paper evaluates and compares optimizations that improve the performance of embedded processors with small front-end caches.  ...  Finally, compiler generated hints can improve the instruction cache performance by guiding the hardware to wisely use the limited resources.  ... 
doi:10.1145/1289881.1289926 dblp:conf/cases/ZmilyK07 fatcat:gqhikqs4ibddbkebmgt3xeid6e

Energy-efficient and high-performance instruction fetch using a block-aware ISA

Ahmad Zmily, Christos Kozyrakis
2005 Proceedings of the 2005 international symposium on Low power electronics and design - ISLPED '05  
It also allows for accurate instruction prefetching and energy efficient instruction access.  ...  A BLISS-based front-end leads to 14% IPC, 16% total energy, and 83% energy-delay-squared product improvements for wide-issue processors.  ...  ACKNOWLEDGEMENTS We would like to acknowledge Earl Kilian for his valuable input. This work was supported by a Stanford OTL grant.  ... 
doi:10.1145/1077603.1077614 dblp:conf/islped/ZmilyK05 fatcat:hgje6kudqvgkxiwlvic2qrpf7u

Energy-efficient and high-performance instruction fetch using a block-aware ISA

A. Zmily, C. Kozyrakis
2005 ISLPED '05. Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005.  
It also allows for accurate instruction prefetching and energy efficient instruction access.  ...  A BLISS-based front-end leads to 14% IPC, 16% total energy, and 83% energy-delay-squared product improvements for wide-issue processors.  ...  ACKNOWLEDGEMENTS We would like to acknowledge Earl Kilian for his valuable input. This work was supported by a Stanford OTL grant.  ... 
doi:10.1109/lpe.2005.195482 fatcat:t6ltfuvaqrhzxfg77qwqsh5gqu

Reuse Distance-Based Cache Hint Selection [chapter]

Kristof Beyls, Erik H. D'Hollander
2002 Lecture Notes in Computer Science  
In order to improve a program's cache behavior, the cache hint is selected based on the data locality of the instruction.  ...  The distribution allows to efficiently estimate the cache level where the data will be found, and to determine the level where the data should be stored to improve the hit rate.  ...  Small and fast caches are efficient when there is a high data locality, while for larger and slower caches lower data locality suffices.  ... 
doi:10.1007/3-540-45706-2_35 fatcat:wrtp4yvnhvfo5oyqxcgtrqcapm

Protean Code: Achieving Near-Free Online Code Transformations for Warehouse Scale Computers

Michael A. Laurenzano, Yunqi Zhang, Lingjia Tang, Jason Mars
2014 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture  
Using a fully functional protean code compiler and runtime built on LLVM, we design PC3D, Protean Code for Cache Contention in Datacenters.  ...  In this work we introduce protean code, a novel approach for enacting arbitrary compiler transformations at runtime for native programs running on commodity hardware with negligible (<1%) overhead.  ...  ACKNOWLEDGMENT We thank our anonymous reviewers for their feedback and suggestions. We also thank Balaji Soundararajan for his help setting up experimental infrastructure.  ... 
doi:10.1109/micro.2014.21 dblp:conf/micro/LaurenzanoZTM14 fatcat:l2xqedekynh4fexjexkvzemjxm

SOS: A Software-Oriented Distributed Shared Cache Management Approach for Chip Multiprocessors

Lei Jin, Sangyeun Cho
2009 2009 18th International Conference on Parallel Architectures and Compilation Techniques  
The OS utilizes the hints to guide proper data placement in the L2 cache with page coloring. The derived hints are independent of the program input and can be used for multiple runs.  ...  By using the hints for guiding page coloring alone, SOS achieves an average speedup of 10% and up to 23% over the shared cache scheme.  ...  The recognized patterns are independent across program inputs and can be used for multiple runs.  ... 
doi:10.1109/pact.2009.14 dblp:conf/IEEEpact/JinC09 fatcat:w6rcuo73vve3tmpj6jvyasckvm

Design and Evaluation of an Agent-Based Communication Model for a Parallel File System [chapter]

María S. Pérez, Alberto Sánchez, Jemal Abawajy, Víctor Robles, José M. Peña
2004 Lecture Notes in Computer Science  
MAPFS implementation is based on nearer technologies to system programming, although its design makes usage of the abstraction of a multiagent system.  ...  and prefetching agents, associated with one or more extractor agents, caching or prefetching their data; and (iv) hints agents, which must study applications access patterns to build hints for improving  ...  the hint request from a cache agent to a hint agent.  ... 
doi:10.1007/978-3-540-24709-8_10 fatcat:ixgye3cf2nflfdzafgylymlxse

A Study of the Performance Potential for Dynamic Instruction Hints Selection [chapter]

Rao Fu, Jiwei Lu, Antonia Zhai, Wei-Chung Hsu
2006 Lecture Notes in Computer Science  
They can be generated by the compiler and the post-link optimizer to reduce cache misses, improve branch prediction and minimize other performance bottlenecks.  ...  This paper discusses different instruction hints available on modern processor architectures and shows the potential performance impact on many benchmark programs.  ...  The authors want to thank Abhinav Das and Jinpyo Kim for their suggestions and help. We also thank all of the anonymous reviewers for their valuable comments.  ... 
doi:10.1007/11859802_7 fatcat:tkw4ji4j5zca3j2otayn4ueugm

Compiler-Assisted Cache Replacement: Problem Formulation and Performance Evaluation [chapter]

Hongbo Yang, R. Govindarajan, Guang R. Gao, Ziang Hu
2004 Lecture Notes in Computer Science  
In this paper we formulate this problem (giving cache hints to memory instructions such that cache miss rate is minimized) as a 0/1 knapsack problem, which can be efficiently solved using a dynamic programming  ...  Initial results show that our approach is effective on reducing the cache miss rate and improving program performance.  ...  Impact of our approach on locality of regular and nt-hint objects. Table 2. Effectiveness of our approach in improving program performance.  ... 
doi:10.1007/978-3-540-24644-2_6 fatcat:vt6ordkugnedhi3f4hffx4sfjy

Location-aware cache management for many-core processors with deep cache hierarchy

Jongsoo Park, Richard M. Yoo, Daya S. Khudia, Christopher J. Hughes, Daehyun Kim
2013 Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '13  
As cache hierarchies become deeper and the number of cores on a chip increases, managing caches becomes more important for performance and energy.  ...  We propose load and store instructions that carry hints regarding into which cache(s) the accessed data should be placed.  ...  Van der Wijngaart for discussion during the initial stage of our project.  ... 
doi:10.1145/2503210.2503224 dblp:conf/sc/ParkYKHK13 fatcat:yvtqvwtg3rbnbcfgdbamqq5dy4

Data-centric execution of speculative parallel programs

Mark C. Jeffrey, Suvinay Subramanian, Maleen Abeydeera, Joel Emer, Daniel Sanchez
2016 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)  
Hints also make speculation far more efficient, reducing wasted work by 6.4× and traffic by 3.5× on average.  ...  We show it is easy to modify programs to convey locality through hints.  ...  William Hasenplaugh and Chia-Hsin Chen graciously shared the serial code for the color [30] and nocsim benchmarks.  ... 
doi:10.1109/micro.2016.7783708 dblp:conf/micro/JeffreySAES16 fatcat:b6nbzdafhzcazp74ify77niwa4

Energy-Aware Data Prefetching for General-Purpose Programs [chapter]

Yao Guo, Saurabh Chheda, Israel Koren, C. Mani Krishna, Csaba Andras Moritz
2005 Lecture Notes in Computer Science  
We also propose a hardware-based filtering technique to further reduce the energy overhead due to prefetching in the L1 cache.  ...  There has been intensive research on data prefetching focusing on performance improvement, however, the energy aspect of prefetching is relatively unknown.  ...  Power-aware prefetching architecture for general-purpose programs software prefetching techniques are more energy-efficient for most of the benchmarks.  ... 
doi:10.1007/11574859_6 fatcat:kpuecakaxfcrfmn7cae7qszp24

A generalized theory of collaborative caching

Xiaoming Gu, Chen Ding
2013 SIGPLAN notices  
We show two theoretical results for the general hint. The first is a new cache replacement policy, priority LRU, which permits the complete range of choices between MRU and LRU.  ...  We show the generality in a hierarchical relation where collaborative caching subsumes noncollaborative caching, and within collaborative caching, the priority hint subsumes the previous binary hint.  ...  We also wish to thank Michael Scott, Engin Ipek, Tongxin Bai, and anonymous reviewers for their helpful comments.  ... 
doi:10.1145/2426642.2259012 fatcat:fv2aq7rtczdwdnjacqhcz2vjkq

A generalized theory of collaborative caching

Xiaoming Gu, Chen Ding
2012 Proceedings of the 2012 international symposium on Memory Management - ISMM '12  
We show two theoretical results for the general hint. The first is a new cache replacement policy, priority LRU, which permits the complete range of choices between MRU and LRU.  ...  We show the generality in a hierarchical relation where collaborative caching subsumes noncollaborative caching, and within collaborative caching, the priority hint subsumes the previous binary hint.  ...  We also wish to thank Michael Scott, Engin Ipek, Tongxin Bai, and anonymous reviewers for their helpful comments.  ... 
doi:10.1145/2258996.2259012 dblp:conf/iwmm/GuD12 fatcat:nywf2vbsjjf6vm74vv5tbeye4q

Developing correct and efficient multithreaded programs with thread-specific data and a partial evaluator

Yasushi Shinjo, Calton Pu
2000 ACM SIGOPS Operating Systems Review  
Figure 1 shows how to get a specialized random number generator by Tempo 1 , a partial evaluator for the C language. Tempo takes a source program and hints in C and ML.  ...  In this paper, we describe a development method of correct and efficient multithreaded programs using thread-specific data (TSD) and a partial evaluator.  ... 
doi:10.1145/346152.346228 fatcat:apsgiffmwrcvtdzsx4qvykp3r4