278 hits

Effective padding of multidimensional arrays to avoid cache conflict misses

Changwan Hong, Wenlei Bao, Albert Cohen, Sriram Krishnamoorthy, Louis-Noël Pouchet, Fabrice Rastello, J. Ramanujam, P. Sadayappan
2016 SIGPLAN Notices  
Array padding (increasing the size of array dimensions) is a well-known optimization technique that can reduce conflict misses.  ...  This can cause conflict misses and lower performance, even if the working set is much smaller than cache capacity.  ...  Acknowledgements We are grateful to the PLDI'16 reviewers for their very detailed feedback and suggestions, which helped improve the paper.  ... 
doi:10.1145/2980983.2908123 fatcat:yjjkzwru3nek3bgcpz3se77ltq
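
A minimal sketch of why padding helps. It assumes a 32 KiB, 8-way set-associative cache with 64-byte lines (64 sets) and row-major arrays of 8-byte doubles; these parameters and the helper names are hypothetical, and the code is not the paper's padding algorithm. Walking down a column whose row length is a multiple of 4096 bytes sends every element to the same set, while padding each row by one cache line spreads the column across sets.

```python
def set_index(addr, line_size=64, num_sets=64):
    """Cache set that byte address `addr` maps to (modulo set indexing)."""
    return (addr // line_size) % num_sets


def column_sets(rows, row_len, elem_size=8, line_size=64, num_sets=64):
    """Sets touched when walking down column 0 of a row-major rows x row_len array."""
    return [set_index(r * row_len * elem_size, line_size, num_sets) for r in range(rows)]


# A row of 512 doubles is 4096 bytes = 64 lines, so every row starts in set 0.
unpadded = column_sets(rows=16, row_len=512)
# Padding each row by one cache line (8 doubles) shifts each row start by one set.
padded = column_sets(rows=16, row_len=512 + 8)
print(len(set(unpadded)), len(set(padded)))  # prints: 1 16
```

With 16 rows mapped to a single 8-way set, the unpadded column cannot stay resident even though it occupies only 16 lines of a 512-line cache.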

Fusion of loops for parallelism and locality

N. Manjikian, T.S. Abdelrahman
1997 IEEE Transactions on Parallel and Distributed Systems  
In addition, performance losses result from cache conflicts in fused loops.  ...  minimal synchronization, and (3) eliminate cache conflicts in fused loops.  ...  The authors would also like to thank the anonymous referees for their useful comments and suggestions.  ... 
doi:10.1109/71.577265 fatcat:rticunkxsbgpvjoa4hbczr6tei

Minimizing Associativity Conflicts in Morton Layout [chapter]

Jeyarajan Thiyagalingam, Olav Beckmann, Paul H. J. Kelly
2006 Lecture Notes in Computer Science  
It is our hypothesis that associativity conflicts between Morton blocks cause this behavior and we show that carefully arranging the Morton blocks can minimize this effect.  ...  Hierarchically-blocked non-linear storage layouts, such as the Morton ordering, have been shown to be a potentially attractive compromise between row-major and column-major for two-dimensional arrays.  ...  A simple method to avoid these systematically recurring conflicts is to pad each row of the array by a Morton block, whenever the number of Morton blocks in a row is even.  ... 
doi:10.1007/11752578_131 fatcat:bgowmwicpjeofj2gd3b74ibknq
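
A minimal sketch of Morton (Z-order) indexing by bit interleaving, together with the padding rule quoted above written as a helper. The function names are hypothetical, and how the padded row length feeds into the paper's actual address calculation is not shown here.

```python
def morton_index(i, j, bits=16):
    """Z-order index of (row i, column j): bits of j go to even positions, bits of i to odd."""
    z = 0
    for b in range(bits):
        z |= ((j >> b) & 1) << (2 * b)
        z |= ((i >> b) & 1) << (2 * b + 1)
    return z


def padded_blocks_per_row(blocks_per_row):
    """Quoted rule: pad a row by one Morton block when the block count is even."""
    return blocks_per_row + 1 if blocks_per_row % 2 == 0 else blocks_per_row


# Z-order of a 4 x 4 array, one list per row.
for i in range(4):
    print([morton_index(i, j) for j in range(4)])
```

For a 4 x 4 array this prints the rows [0, 1, 4, 5], [2, 3, 6, 7], [8, 9, 12, 13] and [10, 11, 14, 15], i.e. each 2 x 2 quadrant is stored contiguously.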

Improving Cache Effectiveness through Array Data Layout Manipulation in SAC [chapter]

Clemens Grelck
2001 Lecture Notes in Computer Science  
Cache conflicts due to limited set associativity are one relevant source of inefficiency.  ...  This paper describes the realization of an optimization technique which aims at eliminating cache conflicts by adjusting the data layout of arrays to specific access patterns and cache configurations.  ...  In contrast to the choice of a padding dimension for the elimination of spatial reuse conflicts, an eligible padding dimension to avoid temporal reuse conflicts not necessarily exists.  ... 
doi:10.1007/3-540-45361-x_14 fatcat:eko3o3dorbd23pd2p7aphysg44

On-chip vs. off-chip memory: the data partitioning problem in embedded processor-based systems

Preeti Ranjan Panda, Nikil D. Dutt, Alexandru Nicolau
2000 ACM Transactions on Design Automation of Electronic Systems  
In addition to a data cache that interfaces with slower off-chip memory, a fast on-chip SRAM, called Scratch-Pad memory, is often used in several applications, so that critical data can be stored there  ...  We present a technique for efficiently exploiting on-chip Scratch-Pad memory by partitioning the application's scalar and arrayed variables into off-chip DRAM and on-chip Scratch-Pad SRAM, with the goal  ...  Since arrays a and d have intersecting lifetimes, cache conflicts among them can be avoided by mapping one of them to the SRAM.  ... 
doi:10.1145/348019.348570 fatcat:te5thzscn5bkbfz4fbum52izoq

Tuning Blocked Array Layouts to Exploit Memory Hierarchy in SMT Architectures [chapter]

Evangelia Athanasaki, Kornilios Kourtis, Nikos Anastopoulos, Nectarios Koziris
2005 Lecture Notes in Computer Science  
Cache misses form a major bottleneck for memory-intensive applications, due to the significant latency of main memory accesses.  ...  According to this analysis, the optimal tile size that maximizes L1 cache utilization, should completely fit in the L1 cache, even for loop bodies that access more than just one array.  ...  Combined loop and data transformations were proposed to avoid any negative effect to the number of cache hits for some referenced arrays, while increasing the locality of references for a group of arrays  ... 
doi:10.1007/11573036_57 fatcat:5cjvz72qlzghjdn2zjf54wjcg4
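
A minimal sketch of the capacity constraint mentioned in the snippet, assuming that the tiles of all arrays touched by the loop body must fit in L1 together and that tile sides are powers of two. Both assumptions and the function name are hypothetical; the paper's full analysis is not reproduced.

```python
def max_square_tile(l1_bytes, elem_size, num_arrays):
    """Largest power-of-two side T such that num_arrays tiles of T x T elements fit in L1."""
    t = 1
    # Double T while the next size still fits; assumes a 1 x 1 tile per array always fits.
    while num_arrays * (2 * t) ** 2 * elem_size <= l1_bytes:
        t *= 2
    return t


# 32 KiB L1, 8-byte doubles, three arrays (e.g. C += A * B): 3 * 32 * 32 * 8 = 24576 bytes.
print(max_square_tile(32 * 1024, 8, 3))  # prints 32
```

Capacity alone does not rule out conflict misses inside a tile; the associativity check sketched for the tile-size-selection entry addresses that separately.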

A compiler framework for restructuring data declarations to enhance cache and TLB effectiveness

David F. Bacon, Jyh-Herng Chow, Dz-ching R. Ju, Kalyan Muthukumar, Vivek Sarkar
2010 CASCON First Decade High Impact Papers on - CASCON '10  
In addition to reducing the number of misses, we identify the importance of reducing the impact of cache miss jamming by spreading cache misses more uniformly across loop iterations.  ...  We translate undesirable cache and TLB behaviors into a set of constraints on padding amounts and propose a heuristic algorithm of polynomial time complexity to find the padding amounts to satisfy these  ...  Acknowledgements We would like to thank Paula Newman, John Ng, and Lelia Vazquez for their past work on the array padding transformation at IBM, which provided valuable inputs for our study.  ... 
doi:10.1145/1925805.1925813 fatcat:2enklcgybjf4thfjzum6qhbi34
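
A minimal sketch of the general idea of turning a cache constraint into padding amounts. The constraint used here (array base addresses must start at least `min_gap` sets apart, measured circularly) and the greedy search are hypothetical simplifications chosen for illustration; this is not the constraint system or the polynomial-time heuristic from the paper. Sizes are in bytes, and the cache is assumed to have 64-byte lines and 64 sets.

```python
def circular_gap(a, b, num_sets):
    """Distance between set indices a and b on the circular set index space."""
    d = abs(a - b) % num_sets
    return min(d, num_sets - d)


def greedy_inter_array_padding(sizes, line_size=64, num_sets=64, min_gap=8):
    """Return (pads_in_lines, base_sets) for arrays laid out back to back."""
    pads, base_sets = [], []
    addr = 0
    for size in sizes:
        addr = -(-addr // line_size) * line_size  # align the next base to a cache line
        # Each extra line of padding moves the base by one set; try one full cycle.
        for pad in range(num_sets):
            s = (addr // line_size + pad) % num_sets
            if all(circular_gap(s, t, num_sets) >= min_gap for t in base_sets):
                break
        else:
            raise ValueError("no padding satisfies min_gap; lower it")
        pads.append(pad)
        base_sets.append(s)
        addr += pad * line_size + size
    return pads, base_sets


# Three 32 KiB arrays: unpadded, all three would start in set 0.
print(greedy_inter_array_padding([32 * 1024] * 3))  # prints ([0, 8, 8], [0, 8, 16])
```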

An Overview of Cache Optimization Techniques and Cache-Aware Numerical Algorithms [chapter]

Markus Kowarschik, Christian Weiß
2003 Lecture Notes in Computer Science  
They are intended to contain copies of main memory blocks to speed up accesses to frequently needed data [378, 392].  ...  Unfortunately, today's compilers cannot introduce highly sophisticated cache-based transformations and, consequently, much of this optimization effort is left to the programmer [335, 517].  ...  These transformations aim at avoiding effects like cache conflict misses and false sharing [392], see Chapter 16. They are further intended to improve the spatial locality of a code.  ... 
doi:10.1007/3-540-36574-5_10 fatcat:dfbfztb4ajgeppcqik2k75hcie

Combined partitioning and data padding for scheduling multiple loop nests

Zhong Wang, Edwin H.-M. Sha, Xiaobo (Sharon) Hu
2001 Proceedings of the international conference on Compilers, architecture, and synthesis for embedded systems - CASES '01  
Data padding is applied in our technique to eliminate the cache interference, which overcomes the problem of cache conflict misses arisen from loop partition.  ...  Loop partition is an effective way to exploit the data locality. Traditional loop partition techniques, however, consider only a singleton nested loop.  ...  With direct mapped cache, conflict miss may occur due to the fact that different elements may map to the same cache location. Our objective is to eliminate all the cache conflict misses.  ... 
doi:10.1145/502217.502228 dblp:conf/cases/WangSH01 fatcat:33bvcvzgpnbytmwz24p5svcuum
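
A minimal sketch of the direct-mapped conflict described in the snippet, assuming a hypothetical 8 KiB direct-mapped cache with 32-byte lines and a loop that reads a[i] and then b[i] for 1024 doubles per array. When b starts exactly one cache size after a, a[i] and b[i] always land in the same line and evict each other; padding b's base by one line leaves only compulsory misses in this trace. This illustrates the problem only, not the paper's combined partitioning and padding technique.

```python
def direct_mapped_misses(addresses, cache_bytes=8192, line_size=32):
    """Count misses of a direct-mapped cache over a trace of byte addresses."""
    num_lines = cache_bytes // line_size
    tags = [None] * num_lines  # full block number held by each line
    misses = 0
    for addr in addresses:
        block = addr // line_size
        idx = block % num_lines
        if tags[idx] != block:  # the line is empty or holds another block
            tags[idx] = block
            misses += 1
    return misses


def interleaved(base_a, base_b, n, elem_size=8):
    """Address trace of a loop that reads a[i] and then b[i]."""
    for i in range(n):
        yield base_a + i * elem_size
        yield base_b + i * elem_size


print(direct_mapped_misses(interleaved(0, 8192, 1024)))       # prints 2048
print(direct_mapped_misses(interleaved(0, 8192 + 32, 1024)))  # prints 512
```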

Combined partitioning and data padding for scheduling multiple loop nests

Zhong Wang, Edwin H.-M. Sha, Xiaobo (Sharon) Hu
2001 Proceedings of the international conference on Compilers, architecture, and synthesis for embedded systems - CASES '01  
Data padding is applied in our technique to eliminate the cache interference, which overcomes the problem of cache conflict misses arisen from loop partition.  ...  Loop partition is an effective way to exploit the data locality. Traditional loop partition techniques, however, consider only a singleton nested loop.  ...  With direct mapped cache, conflict miss may occur due to the fact that different elements may map to the same cache location. Our objective is to eliminate all the cache conflict misses.  ... 
doi:10.1145/502225.502228 fatcat:nge25mjqibecfiganrifyi46hq

Recurrence analysis for effective array prefetching in Java

Brendon Cahoon, Kathryn S. McKinley
2005 Concurrency and Computation  
ACKNOWLEDGEMENTS This work was performed at the University of Texas at Austin while the first author was a graduate student at the University of Massachusetts.  ...  Any opinions, findings and conclusions or recommendations expressed in this material are the authors and do not necessarily reflect those of the sponsors.  ...  Conflict misses Performance degrades by 13% in r r 7 due to a large number of cache conflict misses.  ... 
doi:10.1002/cpe.851 fatcat:oj3liilzdfae5a42rwsou4t364

The Efficacy of Software Prefetching and Locality Optimizations on Future Memory Systems

Abdel-Hameed A. Badawy, Aneesh Aggarwal, Donald Yeung, Chau-Wen Tseng
2004 Journal of Instruction-Level Parallelism  
In this paper, we provide a comprehensive summary of current software prefetching and locality optimization techniques, and evaluate the impact of memory trends on the effectiveness of these techniques  ...  We propose and evaluate several algorithms to better integrate software prefetching and locality optimizations, including a modified tiling algorithm, padding for prefetching, and index prefetching.  ...  Acknowledgments The authors would like to thank Gabriel Rivera for providing insightful discussions about the tiling and padding techniques, and for providing the affine array codes used in this paper.  ... 
dblp:journals/jilp/BadawyAYT04 fatcat:2b3pgvkq7rah7d7ps7kmuh3wua

Fast indexing for blocked array layouts to reduce cache misses

Evangelia Athanasaki, Nectarios Koziris
2005 International Journal of High Performance Computing and Networking  
Finally, simulations verify that our enhanced performance is due to the considerable reduction of cache misses in all levels of memory hierarchy, and especially due to their concurrent minimization, for  ...  In this paper, we further reduce cache misses, restructuring the memory layout of multi-dimensional arrays, so that array elements are stored in a blocked way, exactly as they are swept by the tiled instruction  ...  ACKNOWLEDGMENTS We wish to express our profound gratitude to the anonymous reviewers for their suggestions, which considerably increased the clarity and quality of the original manuscript.  ... 
doi:10.1504/ijhpcn.2005.009429 fatcat:ctt2erwjnfco5mwuey356zfqjy
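
A minimal sketch of index computation for a blocked array layout, assuming square tiles whose side is a power of two (2**log_t), tiles stored in row-major order, elements stored row-major inside each tile, and an n x n array with n a multiple of the tile side. The function name is hypothetical, and the paper's specific fast-indexing scheme may differ; the point is that shifts and masks replace division and modulo.

```python
def blocked_offset(i, j, n, log_t):
    """Offset of A[i][j] in an n x n array stored as row-major tiles of side 2**log_t."""
    mask = (1 << log_t) - 1          # selects the position inside a tile
    tiles_per_row = n >> log_t       # number of tiles along one row of tiles
    tile_id = (i >> log_t) * tiles_per_row + (j >> log_t)
    # Each tile holds 2**(2*log_t) elements, stored contiguously.
    return (tile_id << (2 * log_t)) + ((i & mask) << log_t) + (j & mask)


# 8 x 8 array with 4 x 4 tiles: A[0][4] starts the second tile, A[4][0] the third.
print(blocked_offset(0, 4, 8, 2), blocked_offset(4, 0, 8, 2))  # prints 16 32
```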

Tiling, block data layout, and memory hierarchy performance

Neungsoo Park, Bo Hong, V.K. Prasanna
2003 IEEE Transactions on Parallel and Distributed Systems  
of TLB misses compared with other techniques (copying, padding, etc.).  ...  To improve cache performance, block data layout is used in concert with tiling.  ...  ACKNOWLEDGMENTS The authors would like to thank Shriram Bhargava Gundala for careful reading of drafts of this work. They also would like to thank Sriram Vajapeyam and Cauligi S.  ... 
doi:10.1109/tpds.2003.1214317 fatcat:vfphmgyzezeshisoq54l7vpeuq

Tile size selection revisited

Sanyam Mehta, Gautham Beeraka, Pen-Chung Yew
2013 ACM Transactions on Architecture and Code Optimization (TACO)  
In this article, we propose a new analytical model for tile size selection that leverages the high set associativity in modern caches to minimize conflict misses.  ...  Past work using static models assumed a direct-mapped cache for the purpose of analysis and thus proved to be less robust.  ...  We would also like to acknowledge NSF grants CNS-0834599 and CCF-0708822 for supporting this work.  ... 
doi:10.1145/2541228.2555292 fatcat:o7w54sg7ufg57livvsfeqratba
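
A minimal sketch of the associativity constraint that motivates this line of work: a tile avoids self-conflict misses if no cache set receives more of the tile's lines than the cache has ways. It assumes a hypothetical 32 KiB, 8-way cache with 64-byte lines (64 sets), a single row-major array of doubles, and it ignores other arrays and the replacement policy; it is not the analytical model proposed in the article.

```python
from collections import Counter


def tile_fits_without_conflicts(tile_rows, tile_cols, row_len, elem_size=8,
                                line_size=64, num_sets=64, ways=8):
    """True if the tile's cache lines never exceed `ways` lines in any one set."""
    per_set = Counter()
    for r in range(tile_rows):
        row_start = r * row_len * elem_size
        first = row_start // line_size
        last = (row_start + tile_cols * elem_size - 1) // line_size
        for line in range(first, last + 1):
            per_set[line % num_sets] += 1
    return max(per_set.values()) <= ways


# Rows of 512 doubles all start in set 0, so 16 tile rows overflow an 8-way set.
print(tile_fits_without_conflicts(16, 8, 512))  # prints False
print(tile_fits_without_conflicts(8, 8, 512))   # prints True
print(tile_fits_without_conflicts(16, 8, 520))  # prints True (rows padded by one line)
```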
Showing results 1 to 15 out of 278 results