Code Layout Optimization for Defensiveness and Politeness in Shared Cache

Pengcheng Li, Hao Luo, Chen Ding, Ziang Hu, Handong Ye
2014 2014 43rd International Conference on Parallel Processing  
On multicore, parallel executions improve the throughput but may significantly increase the cache contention, because the co-run programs share the cache and in the case of hyper-threading, the instruction  ...  Code layout optimization seeks to reorganize the instructions of a program to better utilize the cache.  ...  We also thank the reviewers of ICPP 2014 for the insightful review comments and feedback.  ... 
doi:10.1109/icpp.2014.24 dblp:conf/icpp/LiLDHY14 fatcat:qiadtpth5na4xhkryqthasgdi4

Analysis of temporal-based program behavior for improved instruction cache performance

J. Kalamatianos, A. Khalafi, D.R. Kaeli, W. Meleis
1999 IEEE transactions on computers  
Using several C and C++ benchmarks, we show the benefits of letting both types of graphs guide procedure reordering to improve instruction cache hit rates.  ...  Abstract: In this paper, we examine temporal-based program interaction in order to improve layout by reducing the probability that program units will conflict in an instruction cache.  ...  Related Work Pettis and Hansen [3] employ procedure and basic block reordering, as well as procedure splitting based on frequency counts to minimize instruction cache conflicts.  ... 
doi:10.1109/12.752658 fatcat:udbeyaecsnhlbdzetn72jq5cua

A first look at the interplay of code reordering and configurable caches

Ann Gordon-Ross, Frank Vahid, Nikil Dutt
2005 Proceedings of the 15th ACM Great Lakes symposium on VLSI - GLSVLSI '05  
The instruction cache is a popular target for optimizations of microprocessor-based systems because of the cache's high impact on system performance and power, and because of the cache's predictable temporal  ...  We explore for the first time the interplay of two popular instruction cache optimization techniques: the long-known technique of code reordering and the relatively-new technique of cache configuration  ...  ACKNOWLEDGEMENTS We would like to thank Professor Saumya Debray and Patrick Moseley from the University of Arizona for providing PLTO and the trap profiler.  ... 
doi:10.1145/1057661.1057760 dblp:conf/glvlsi/Gordon-RossVD05 fatcat:wasdnmzakfga5cnanyjxyq3aam

Efficient procedure mapping using cache line coloring

Amir H. Hashemi, David R. Kaeli, Brad Calder
1997 Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation - PLDI '97  
In this paper we present a link-time procedure mapping algorithm which can significantly improve the effectiveness of the instruction cache.  ...  Our algorithm produces an improved program layout by performing a color mapping of procedures to cache lines, taking into consideration the procedure size, cache size, cache line size, and call graph  ...  Acknowledgments We would like to thank Amitabh Srivastava and Alan Eustace for providing ATOM, which greatly simplified our work, and Jeffrey Dean, Alan Eustace, Nick Gloy, Waleed providing useful suggestions  ... 
doi:10.1145/258915.258931 dblp:conf/pldi/HashemiKC97 fatcat:qryi6setwbb5td3k7ryycsc4hu

Efficient procedure mapping using cache line coloring

Amir H. Hashemi, David R. Kaeli, Brad Calder
1997 SIGPLAN notices  
In this paper we present a link-time procedure mapping algorithm which can significantly improve the effectiveness of the instruction cache.  ...  Our algorithm produces an improved program layout by performing a color mapping of procedures to cache lines, taking into consideration the procedure size, cache size, cache line size, and call graph  ...  Acknowledgments We would like to thank Amitabh Srivastava and Alan Eustace for providing ATOM, which greatly simplified our work, and Jeffrey Dean, Alan Eustace, Nick Gloy, Waleed providing useful suggestions  ... 
doi:10.1145/258916.258931 fatcat:rhethqpwkvdh5m45askw5slnb4

Instruction cache locking using temporal reuse profile

Yun Liang, Tulika Mitra
2010 Proceedings of the 47th Design Automation Conference on - DAC '10  
In this paper, we explore static instruction cache locking to improve average-case program performance.  ...  Improving the cache hit rate can have significant positive impact on the performance of an application.  ...  The procedure placement techniques improve instruction cache performance through procedure reordering such that the conflict misses in the cache can be reduced.  ... 
doi:10.1145/1837274.1837362 dblp:conf/dac/LiangM10 fatcat:olna245mbfdjjbo76jdofokx5a

Dynamic Round-Robin Task Scheduling to Reduce Cache Misses for Embedded Systems

Ken W. Batcher, Robert A. Walker
2008 2008 Design, Automation and Test in Europe  
Modern embedded CPU systems rely on a growing number of software features, but this growth increases the memory footprint and increases the need for efficient instruction and data caches.  ...  Our technique reduces cache misses by continuously monitoring CPU cache misses to grade the performance of running tasks.  ...  Kalamatianos and Kaeli introduced Temporal Based Procedure Reordering [13] which involves constructing a conflict miss graph and coloring algorithm to produce an improved code reordering for instruction  ... 
doi:10.1109/date.2008.4484893 dblp:conf/date/BatcherW08 fatcat:x4x3yqahtnasjgvwtltxb3qi5m

Dynamic round-robin task scheduling to reduce cache misses for embedded systems

Ken W. Batcher, Robert A. Walker
2008 Proceedings of the conference on Design, automation and test in Europe - DATE '08  
Modern embedded CPU systems rely on a growing number of software features, but this growth increases the memory footprint and increases the need for efficient instruction and data caches.  ...  Our technique reduces cache misses by continuously monitoring CPU cache misses to grade the performance of running tasks.  ...  Kalamatianos and Kaeli introduced Temporal Based Procedure Reordering [13] which involves constructing a conflict miss graph and coloring algorithm to produce an improved code reordering for instruction  ... 
doi:10.1145/1403375.1403438 fatcat:zke6enwb5jchhi4sh4ye7mavm4

A comparison of software code reordering and victim buffers

Iris Bahar, Brad Calder, Dirk Grunwald
1999 SIGARCH Computer Architecture News  
Instruction cache performance is critical to instruction fetch efficiency and overall processor performance.  ...  This means that the performance of an executable can be improved significantly by applying a code-placement algorithm that minimizes instruction cache conflicts.  ...  Procedure Placement Algorithms Many software techniques have been developed for improving instruction cache performance.  ... 
doi:10.1145/309758.309781 fatcat:uqtb7lxmkbdrvjw6h7siw7nsmi

Profile-directed restructuring of operating system code

W. J. Schmidt, R. R. Roediger, C. S. Mestad, B. Mendelson, I. Shavit-Lottem, V. Bortnikov-Sitnitsky
1998 IBM Systems Journal  
In this paper we describe how a profiling system can be successfully used to restructure the components of an operating system for improved overall performance.  ...  Aware of the performance benefits achieved using FDPR on other platforms within IBM, we began to consider how this technology could be used to improve performance on the Application System/400* (~s i 4  ...  By reordering instructions within a procedure so that they are likely to be executed in sequence, we can improve performance of the instruction cache by reducing cache pollution, improve the efficiency  ... 
doi:10.1147/sj.372.0270 fatcat:4ri2wdy7ujfm3apbmeav6rddfe

Fast and efficient partial code reordering

Xianglong Huang, Stephen M. Blackburn, David Grove, Kathryn S. McKinley
2006 Proceedings of the 2006 international symposium on Memory management - ISMM '06  
For example, our simulation results show that eliminating all instruction cache misses improves performance by as much as 16% for a modestly sized instruction cache.  ...  These programs however have very small instruction cache footprints that limit opportunities for DCR to improve performance.  ...  [25] developed a code reordering system, called the Software Trace Cache (STC), that not only tries to improve the instruction cache hit rate, but also increase the processor's effective instruction  ... 
doi:10.1145/1133956.1133980 dblp:conf/iwmm/HuangBGM06 fatcat:gp3azl4lqrgmvplq7w373vfx5i

Codestitcher: Inter-Procedural Basic Block Layout Optimization [article]

Rahman Lavaee, John Criswell, Chen Ding
2018 arXiv   pre-print
This paper presents Codestitcher, an inter-procedural basic block code layout optimizer which reorders basic blocks in an executable to benefit from better cache and TLB performance.  ...  It gives an additional improvement of 4% over LLVM's PGO and 3% over PGO combined with the best function reordering technique.  ...  It has a modest performance gain, but a clear improvement in cache and TLB performances. For the instruction cache, the relative MPKI for CS is 8% lower than PH.BB and 4.4% lower than CS+PO.  ... 
arXiv:1810.00905v1 fatcat:gpe6l3snsnhe3nfozh6chflohi

WCET-driven Cache-based Procedure Positioning Optimizations

Paul Lokuciejewski, Heiko Falk, Peter Marwedel
2008 2008 Euromicro Conference on Real-Time Systems  
Procedure Positioning is a well known compiler optimization aiming at the improvement of the instruction cache behavior.  ...  In standard literature, these positioning techniques are guided by execution profile data and focus on an improved average-case performance.  ...  Acknowledgments The authors would like to thank AbsInt Angewandte Informatik GmbH for their support concerning WCET analysis using the aiT framework.  ... 
doi:10.1109/ecrts.2008.20 dblp:conf/ecrts/LokuciejewskiFM08 fatcat:cu43krpy6ngczdoelvcqtpdsnu

Dynamic code management

Xianglong Huang, Brian T Lewis, Kathryn S McKinley
2006 Proceedings of the 2nd international conference on Virtual execution environments - VEE '06  
Poor code locality degrades application performance by increasing memory stalls due to instruction cache and TLB misses.  ...  This paper describes a Dynamic Code Management system (DCM) in a managed runtime that performs whole program code layout optimizations to improve instruction locality.  ...  We would especially like to thank James Stichnoth for his ideas and continuous support of the code management work. We are grateful for Brian Murphy's help with the StarJIT compiler.  ... 
doi:10.1145/1134760.1134779 dblp:conf/vee/HuangLM06 fatcat:aapetbxtefdc5fhanp6ghp6efa

Code layout optimizations for transaction processing workloads

Alex Ramirez, Luiz André Barroso, Kourosh Gharachorloo, Robert Cohn, Josep Larriba-Pey, P. Geoffrey Lowney, Mateo Valero
2001 SIGARCH Computer Architecture News  
Our results show that code layout optimizations can provide a major improvement in the instruction cache behavior, providing a 55% to 65% reduction in the application misses for 64-128K caches.  ...  Our analysis shows that this improvement primarily arises from longer sequences of consecutively executed instructions and more reuse of cache lines before they are replaced.  ...  Acknowledgments We would like to thank Jennifer Anderson for her early involvement in this work. We also thank the anonymous reviewers for their comments.  ... 
doi:10.1145/384285.379260 fatcat:p3k6jgq7wzhgrc7mpo5ynsdkcu