
Cache locality optimization for recursive programs

Jonathan Lifflander, Sriram Krishnamoorthy
2017 Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation - PLDI 2017  
We present an approach to optimize the cache locality for recursive programs by dynamically splicing (recursively interleaving) the execution of distinct function invocations.  ...  loop programs and a domain-specific optimizer for stencil programs.  ...  Acknowledgments: We thank the reviewers for their extensive feedback and suggestions. This work was supported in part by U. S  ... 
doi:10.1145/3062341.3062385 dblp:conf/pldi/LifflanderK17 fatcat:r24zpnzwbnexffnu6gga72nxoq

Cache oblivious algorithms for nonserial polyadic programming

Guangming Tan, Shengzhong Feng, Ninghui Sun
2007 Journal of Supercomputing  
The nonserial polyadic dynamic programming algorithm is one of the most fundamental algorithms for solving discrete optimization problems.  ...  In this paper, we develop algorithmic optimizations to improve the cache performance of the nonserial polyadic dynamic programming algorithm.  ...  Conclusions: We have demonstrated decreased running times for nonserial polyadic dynamic programming algorithms by improving locality using a combination of algorithmic and architectural optimizations.  ... 
doi:10.1007/s11227-007-0106-8 fatcat:3dvaov2xhzhcplqc6cfa5uhdvi
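Matrix-chain parenthesization is a standard textbook example of nonserial polyadic dynamic programming, so a minimal sketch may help show why locality is the issue. This is an illustrative assumption, not the papers' optimized algorithm; the function name `matrix_chain_cost` is hypothetical. Each cell `c[i][j]` combines pairs of cells from its own row and its own column, and the column reads stride across rows.

```python
def matrix_chain_cost(dims):
    """Minimum scalar multiplications to multiply a chain of matrices.

    Matrix i (0-based) has shape dims[i] x dims[i + 1].
    """
    n = len(dims) - 1
    c = [[0] * n for _ in range(n)]
    # Fill by increasing chain length so every sub-chain is ready when needed.
    for length in range(2, n + 1):
        for i in range(0, n - length + 1):
            j = i + length - 1
            # Polyadic: two earlier cells per split point k.
            # Nonserial: they come from row i (c[i][k]) and column j (c[k + 1][j]),
            # and the column reads jump between rows, which hurts cache locality.
            c[i][j] = min(
                c[i][k] + c[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                for k in range(i, j)
            )
    return c[0][n - 1] if n > 0 else 0
```

For example, `matrix_chain_cost([10, 30, 5, 60])` returns 4500, which corresponds to computing (AB)C.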

An experimental comparison of cache-oblivious and cache-conscious programs

Kamen Yotov, Tom Roeder, Keshav Pingali, John Gunnels, Fred Gustavson
2007 Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures - SPAA '07  
An important question is the following: how well do carefully tuned cache-oblivious programs perform compared to carefully tuned cache-conscious programs for the same problem?  ...  Our main finding is that in this domain, even highly optimized cache-oblivious programs perform significantly worse than corresponding cache-conscious programs.  ...  Acknowledgements: We would like to thank Matteo Frigo and Gianfranco Bilardi for useful discussions.  ... 
doi:10.1145/1248377.1248394 dblp:conf/spaa/YotovRPGG07 fatcat:jysuedsrvnhx3awp2w6vya3agq

Fast and Cache-Oblivious Dynamic Programming with Local Dependencies [chapter]

Philip Bille, Morten Stöckel
2012 Lecture Notes in Computer Science  
We present a simple, fast, and cache-oblivious algorithm for this type of local dynamic programming suitable for comparing large-scale strings.  ...  Surprisingly, our new simple algorithm is competitive with a complicated, optimized, and tuned implementation of the best cache-aware algorithm.  ...  [4, 6] for providing us with the source code of their algorithms.  ... 
doi:10.1007/978-3-642-28332-1_12 fatcat:24n3zfrnbzhhnkn7qfb72y5e4a
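Edit distance is a familiar string-comparison DP with local dependencies: each cell reads only its left, upper, and upper-left neighbours. The sketch below assumes that setting and uses plain fixed-size tiling. It is not Bille and Stöckel's cache-oblivious algorithm. It only shows why local dependencies allow the table to be evaluated block by block. The function name and the `tile` parameter are hypothetical.

```python
def edit_distance_tiled(a, b, tile=4):
    """Levenshtein distance, with the DP table filled in tile-by-tile order."""
    n, m = len(a), len(b)
    # D[i][j] = edit distance between a[:i] and b[:j].
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i
    for j in range(m + 1):
        D[0][j] = j
    # Visit interior cells tile by tile, with tiles taken in row-major order.
    # Each cell's left, upper, and upper-left neighbours lie either in the same
    # tile (already computed earlier in row-major order) or in a finished tile.
    for ti in range(1, n + 1, tile):
        for tj in range(1, m + 1, tile):
            for i in range(ti, min(ti + tile, n + 1)):
                for j in range(tj, min(tj + tile, m + 1)):
                    cost = 0 if a[i - 1] == b[j - 1] else 1
                    D[i][j] = min(D[i - 1][j] + 1,         # deletion
                                  D[i][j - 1] + 1,         # insertion
                                  D[i - 1][j - 1] + cost)  # match / substitution
    return D[n][m]
```

For example, `edit_distance_tiled("kitten", "sitting")` returns 3. The result does not depend on `tile`; only the order in which cells are visited changes.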

Recursive function data allocation to scratch-pad memory

Angel Dominguez, Nghi Nguyen, Rajeev K. Barua
2007 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems - CASES '07  
It has almost no software-caching overhead, and is able to move recursive function data back and forth between scratchpad and DRAM to better track the program's locality characteristics.  ...  This paper presents the first automatic scheme to allocate local (stack) data in recursive functions to scratch-pad memory (SPM) in embedded systems.  ...  The importance of allocating recursive stack data rises dramatically for programs making significant use of recursive functions, even more so when all other data has been optimized for SPM placement.  ... 
doi:10.1145/1289881.1289897 dblp:conf/cases/DominguezNB07 fatcat:3u6msndskvgt5fbh3ti2lnag6u

An Experimental Study of Self-Optimizing Dense Linear Algebra Software

M. Kulkarni, K. Pingali
2008 Proceedings of the IEEE  
The search for performance portability has led to the development of self-optimizing software systems.  ...  The cache-oblivious approach uses divide-and-conquer to perform approximate blocking; how well does approximate blocking perform compared to precise blocking?  ...  Intuitively, this version uses recursion to tile approximately for the L3 cache and explicit tiling to tile for the L2 cache.  ... 
doi:10.1109/jproc.2008.917732 fatcat:4kyi7ju3vzcgxjgscujjzmoo6y
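The snippet contrasts divide-and-conquer "approximate blocking" with explicit tiling. A minimal sketch of the recursive side, assuming a matrix multiply that repeatedly halves its largest dimension, might look like the code below. Sub-problems eventually fit in each cache level without knowing its size. `BASE` is a hypothetical leaf cutoff, and pure Python will not show real cache effects; the sketch only illustrates the structure of the recursion.

```python
BASE = 16  # hypothetical leaf size; below it, use plain loops


def _mm_rec(A, B, C, i0, i1, j0, j1, k0, k1):
    """Accumulate A[i0:i1, k0:k1] @ B[k0:k1, j0:j1] into C[i0:i1, j0:j1]."""
    di, dj, dk = i1 - i0, j1 - j0, k1 - k0
    if max(di, dj, dk) <= BASE:
        # Leaf: ordinary i-k-j triple loop on a small sub-block.
        for i in range(i0, i1):
            Ci, Ai = C[i], A[i]
            for k in range(k0, k1):
                a, Bk = Ai[k], B[k]
                for j in range(j0, j1):
                    Ci[j] += a * Bk[j]
        return
    # Halve the largest dimension (approximate blocking, no cache parameters).
    if di >= dj and di >= dk:
        mid = i0 + di // 2
        _mm_rec(A, B, C, i0, mid, j0, j1, k0, k1)
        _mm_rec(A, B, C, mid, i1, j0, j1, k0, k1)
    elif dj >= dk:
        mid = j0 + dj // 2
        _mm_rec(A, B, C, i0, i1, j0, mid, k0, k1)
        _mm_rec(A, B, C, i0, i1, mid, j1, k0, k1)
    else:
        # Splitting k: both halves accumulate into the same block of C.
        mid = k0 + dk // 2
        _mm_rec(A, B, C, i0, i1, j0, j1, k0, mid)
        _mm_rec(A, B, C, i0, i1, j0, j1, mid, k1)


def matmul_cache_oblivious(A, B):
    n, p, q = len(A), len(B), len(B[0])
    C = [[0] * q for _ in range(n)]
    _mm_rec(A, B, C, 0, n, 0, q, 0, p)
    return C
```

The hybrid described in the snippet would use recursion like this for the outer levels and switch to an explicitly tiled kernel at the leaves.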

Representation-transparent matrix algorithms with scalable performance

Peter Gottschling, David S. Wise, Michael D. Adams
2007 Proceedings of the 21st annual international conference on Supercomputing - ICS '07  
Positive results from new object-oriented tools for scientific programming are reported.  ...  Data types modeling both concepts enable the programmer to implement both iterative and recursive algorithms (or even both) on all of the aforementioned matrix representations at once for a wide family  ...  The recursive techniques are intended to improve the cache locality in a transparent manner.  ... 
doi:10.1145/1274971.1274989 dblp:conf/ics/GottschlingWA07 fatcat:yvrisgsebzbvlijic4lqqh7km4

Improving locality of nonserial polyadic dynamic programming

Guangming Tan, Ninghui Sun, Dongbo Bu
2006 Proceedings 20th IEEE International Parallel & Distributed Processing Symposium  
Dynamic programming (DP) is a commonly used technique for solving a wide variety of discrete optimization problems, which have different variants of dynamic programming formulation.  ...  We exploit the property of the algorithm to develop a high performance implementation using the combination of cache-oblivious and cache-conscious strategy.  ...  Conclusions: We have demonstrated decreased running times for the nonserial polyadic dynamic programming algorithm by improving locality using a combination of algorithmic ideas and architectural capabilities  ... 
doi:10.1109/ipdps.2006.1639718 dblp:conf/ipps/TanSB06 fatcat:zpaudytz7jfdhjrbb2esdtmpsi

Dynamic programming in faulty memory hierarchies (cache-obliviously)

Saverio Caminiti, Irene Finocchi, Emanuele G. Fusco, Francesco Silvestri, Marc Herbstritt
2011 Foundations of Software Technology and Theoretical Computer Science  
(almost) optimal number of cache misses.  ...  Preliminaries: Recursive dynamic programming [10, 11]. Our approach hinges upon a recursive framework for dynamic programming, introduced in [10, 11], that we briefly describe here  ...  Optimal cache efficiency is achieved for δ = O(log_M n).  ... 
doi:10.4230/lipics.fsttcs.2011.433 dblp:conf/fsttcs/CaminitiFFS11 fatcat:4jcx3auq7fhbvcrwq6dsapytge

Memory Hierarchy Behavior Study during the Execution of Recursive Linear Algebra Library

I. Šimeček
2008 Acta Polytechnica  
For good performance of every computer program, good cache and TLB utilization is crucial.  ...  In this paper, we present the recursive implementation ("divide and conquer" approach) of some routines from numerical algebra libraries.  ...  Acknowledgement: This research has been supported by MŠMT under research program MSM6840770014.  ... 
doaj:c0c35568b18d4d508c178c0943aae00a fatcat:j2pkam6bb5fuhc6opemticrui4

Automatically Tuned Dynamic Programming with an Algorithm-by-Blocks

Jiajia Li, Guangming Tan, Mingyu Chen
2010 2010 IEEE 16th International Conference on Parallel and Distributed Systems  
First, an algorithm-by-blocks for dynamic programming is designed to facilitate optimizing with well-known techniques including cache and register tiling.  ...  In this paper, we propose an Automatically Tuned Dynamic Programming (ATDP) to optimize performance of dynamic programming algorithm across various architectures.  ...  tuning system exploits locality for two levels of caches, we generate an L1 cache block algorithm for evaluating the benefits of the secondary level of cache block.  ... 
doi:10.1109/icpads.2010.117 dblp:conf/icpads/LiTC10 fatcat:v3h7txafvfb57faqhnaxwv6jhm

PCOT: Cache Oblivious Tiling of Polyhedral Programs [article]

Waruna Ranasinghe, Nirmal Prajapati, Tomofumi Yuki, Sanjay Rajopadhye
2018 arXiv   pre-print
This paper studies two variants of tiling: iteration space tiling (or loop blocking) and cache-oblivious methods that recursively split the iteration space with divide-and-conquer.  ...  The answer to this question is complicated for modern architectures for a number of reasons.  ...  Serial and parallel implementations of recursive divide and conquer algorithms with optimal cache complexity have been developed and evaluated for a specific dynamic programming algorithm such as Longest  ... 
arXiv:1802.00166v1 fatcat:tsnkzovxmzdg5pbo3lotif3hoe
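The two tiling variants named in the snippet can be contrasted on a simple loop nest. The sketch below uses matrix transpose as an assumed example. `transpose_tiled` applies iteration-space tiling with a fixed `tile` size. `transpose_recursive` splits the iteration space in half along its longer side until a block is small. Both function names and the `tile`/`leaf` parameters are hypothetical, and the code is not PCOT itself.

```python
def transpose_tiled(A, tile=8):
    """Iteration-space tiling: visit (i, j) in fixed tile x tile blocks."""
    n, m = len(A), len(A[0])
    T = [[None] * n for _ in range(m)]
    for ii in range(0, n, tile):
        for jj in range(0, m, tile):
            for i in range(ii, min(ii + tile, n)):
                for j in range(jj, min(jj + tile, m)):
                    T[j][i] = A[i][j]
    return T


def transpose_recursive(A, leaf=8):
    """Cache-oblivious style: halve the longer side until the block is small."""
    n, m = len(A), len(A[0])
    T = [[None] * n for _ in range(m)]

    def rec(i0, i1, j0, j1):
        if (i1 - i0) * (j1 - j0) <= leaf * leaf:
            # Small block: copy it directly.
            for i in range(i0, i1):
                for j in range(j0, j1):
                    T[j][i] = A[i][j]
        elif i1 - i0 >= j1 - j0:
            mid = (i0 + i1) // 2
            rec(i0, mid, j0, j1)
            rec(mid, i1, j0, j1)
        else:
            mid = (j0 + j1) // 2
            rec(i0, i1, j0, mid)
            rec(i0, i1, mid, j1)

    rec(0, n, 0, m)
    return T
```

The tiled version needs `tile` chosen for a particular cache. The recursive version's `leaf` only bounds recursion overhead, because the halving produces blocks of every size along the way.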

A Comparison of Locality Transformations for Irregular Codes [chapter]

Hwansoo Han, Chau-Wen Tseng
2000 Lecture Notes in Computer Science  
It is thus useful for optimizing programs whose running times are not known.  ...  Researchers have proposed several data and computation transformations to improve locality in irregular scientific codes.  ...  The contributions of this paper are: Experimentally evaluate the effectiveness of several locality optimizations for a range of input data and programs.  ... 
doi:10.1007/3-540-40889-4_6 fatcat:axq2iq4st5hnbaoinxxhaxccba

Enhancing locality for recursive traversals of recursive structures

Youngjoon Jo, Milind Kulkarni
2011 Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications - OOPSLA '11  
While there have been decades of work on developing automatic, locality-enhancing transformations for regular programs that operate over dense matrices and arrays, there has been little investigation of  ...  such transformations for irregular programs, which operate over pointer-based data structures such as graphs, trees and lists.  ...  Acknowledgments: The authors would like to thank Bruce Walter for providing the Raytracing benchmark code, and Sebastian Thees for providing the Lightcuts benchmark code.  ... 
doi:10.1145/2048066.2048104 dblp:conf/oopsla/JoK11 fatcat:6cudnrjtffbdfkcfc7w3s5iaee

Optimizing Large-Scale Semi-Naïve Datalog Evaluation in Hadoop [chapter]

Marianne Shaw, Paraschos Koutris, Bill Howe, Dan Suciu
2012 Lecture Notes in Computer Science  
This work lays the foundation for a more comprehensive cost-based algebraic optimization framework for parallel recursive Datalog queries.  ...  Observing that several successful projects provide a relational algebra-based programming interface to Hadoop, we argue that a natural extension is to add recursion to support scalable social network analysis  ...  Specifically, HaLoop will cache the reducer inputs across all reduce nodes, create an index for the cached data, and store it on local disk.  ... 
doi:10.1007/978-3-642-32925-8_17 fatcat:dplnz7d4jbfhdpvnrhtan3rixy