An experimental comparison of cache-oblivious and cache-conscious programs
2007
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures - SPAA '07
An important question is the following: how well do carefully tuned cache-oblivious programs perform compared to carefully tuned cache-conscious programs for the same problem? ...
Our main finding is that in this domain, even highly optimized cache-oblivious programs perform significantly worse than corresponding cache-conscious programs. ...
Acknowledgements: We would like to thank Matteo Frigo and Gianfranco Bilardi for useful discussions. ...
doi:10.1145/1248377.1248394
dblp:conf/spaa/YotovRPGG07
fatcat:jysuedsrvnhx3awp2w6vya3agq
Ideal and Predictable Hit Ratio for Matrix Transposition in Data Caches
2020
Mathematics
We also analyze the energy consumption and execution time of matrix transposition on real hardware with pseudo-LRU (PLRU) caches. ...
Our analytical hit/miss assessment enables the usage of a data cache for matrix transposition in real-time systems, since the number of misses in the worst case is bounded. ...
An Experimental Comparison of Cache-oblivious and Cache-conscious Programs: Reference [10] experimentally compares cache-oblivious and cache-conscious (tiling transformation applied) programs for matrix ...
doi:10.3390/math8020184
fatcat:7ig76wmj65hvbnzcbyra33wpwy
Improving locality of nonserial polyadic dynamic programming
2006
Proceedings 20th IEEE International Parallel & Distributed Processing Symposium
We exploit the property of the algorithm to develop a high performance implementation using the combination of cache-oblivious and cache-conscious strategy. ...
The efficiency in our improved algorithm comes from two sources: reducing the number of cache misses and TLB misses. ...
In this paper, we develop a high-performance implementation of the nonserial polyadic dynamic programming algorithm through the combination of cache-conscious and cache-oblivious approaches. ...
doi:10.1109/ipdps.2006.1639718
dblp:conf/ipps/TanSB06
fatcat:zpaudytz7jfdhjrbb2esdtmpsi
Automatically Tuned Dynamic Programming with an Algorithm-by-Blocks
2010
2010 IEEE 16th International Conference on Parallel and Distributed Systems
First, an algorithm-by-blocks for dynamic programming is designed to facilitate optimizing with well-known techniques including cache and register tiling. ...
In this paper, we propose an Automatically Tuned Dynamic Programming (ATDP) to optimize performance of dynamic programming algorithm across various architectures. ...
[20] found that cache-oblivious programs are defeated by cache conscious ones for dense linear algebra, even though the cache-oblivious programs are highly optimized. ...
doi:10.1109/icpads.2010.117
dblp:conf/icpads/LiTC10
fatcat:v3h7txafvfb57faqhnaxwv6jhm
Efficiency vs. portability in cluster-based network servers
2001
Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming - PPoPP '01
To fill this gap, in this paper we use modeling and experimentation to study this tradeoff in the context of an interesting class of content-based network servers, the locality-conscious servers, under ...
Efficiency and portability are conflicting objectives for cluster-based network servers that distribute the clients' requests across the cluster based on the actual content requested. ...
Acknowledgements We would like to thank Liviu Iftode, Rich Martin, Thu Nguyen, and Tao Yang for discussions on the topic of this paper. ...
doi:10.1145/379539.379589
dblp:conf/ppopp/CarreraB01
fatcat:tckl2xig5bayrcl5zvif3bhzim
Improving main memory hash joins on Intel Xeon Phi processors
2015
Proceedings of the VLDB Endowment
Second, hardware oblivious algorithms can outperform hardware conscious algorithms on a wide parameter window. ...
For each camp, we study the impact of architectural features and software optimizations on Xeon Phi in comparison with results on multi-core CPUs. ...
Acknowledgment We would like to thank the authors of [7] and [8] for providing the source code. This work is supported by a MoE AcRF Tier 2 grant (MOE2012-T2-2-067) in Singapore. ...
doi:10.14778/2735703.2735704
fatcat:ion4mquxq5difphvo3fe6pqfma
Cache oblivious algorithms for nonserial polyadic programming
2007
Journal of Supercomputing
Experimental results on several platforms show that the optimized algorithms improve cache performance and achieve speedups of 2-10 times. ...
Based on the ideal cache model of the cache oblivious algorithm, the approximate bound of cache misses is given by $O(n^3 / (L\sqrt{Z}))$. ...
Both our work and previous research [14, 19] show that the algorithmic transformation of dynamic programming is an efficient and important approach to optimize irregular programs. ...
doi:10.1007/s11227-007-0106-8
fatcat:3dvaov2xhzhcplqc6cfa5uhdvi
Efficient sorting using registers and caches
2002
ACM Journal of Experimental Algorithmics
Inadequate models lead to poor algorithmic choices and an incomplete understanding of algorithm behavior on real machines. ...
Common machine models for algorithm analysis do not reflect many of the features of these systems, e.g., large register sets, lockup-free caches, cache hierarchies, associativity, cache line fetching, ...
Fig. 3, Fig. 4. Comparison with cache-conscious sort programs of LaMarca and Ladner [1997]. Time per key (microseconds/key) vs. number of keys (×2^20 keys). ...
doi:10.1145/944618.944627
fatcat:yewe6tclkrhn7lbvt3ktip343q
An Experimental Study of Self-Optimizing Dense Linear Algebra Software
2008
Proceedings of the IEEE
The cache-oblivious approach uses divide-and-conquer to perform approximate blocking; how well does approximate blocking perform compared to precise blocking? ...
Each step of divide-and-conquer generates problems of smaller size. ...
Acknowledgment The authors would like to acknowledge the contributions of a number of people who participated in the work discussed here and reported in earlier papers: M. Garzaran, J. Gunnels, F. ...
doi:10.1109/jproc.2008.917732
fatcat:4kyi7ju3vzcgxjgscujjzmoo6y
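The contrast drawn in the snippet above, between the approximate blocking produced by divide-and-conquer and precise blocking, can be illustrated with a minimal sketch. This is not the code evaluated in the paper. The function names, the `tile` and `cutoff` parameters, and the choice of matrix multiplication on nested Python lists are assumptions made for illustration only, and Python itself will not exhibit the cache effects under study.

```python
def matmul_tiled(A, B, n, tile):
    """Cache-conscious sketch: loops are blocked with an explicit tile size,
    which would normally be tuned to a particular cache (assumed parameter)."""
    C = [[0] * n for _ in range(n)]
    for i0 in range(0, n, tile):
        for k0 in range(0, n, tile):
            for j0 in range(0, n, tile):
                for i in range(i0, min(i0 + tile, n)):
                    for k in range(k0, min(k0 + tile, n)):
                        a = A[i][k]
                        for j in range(j0, min(j0 + tile, n)):
                            C[i][j] += a * B[k][j]
    return C


def matmul_recursive(A, B, C, i0, j0, k0, m, n, p, cutoff):
    """Cache-oblivious sketch: accumulate the product of the m x p block of A
    at (i0, k0) and the p x n block of B at (k0, j0) into the m x n block of C
    at (i0, j0). The largest dimension is halved at each step, so the blocking
    is only approximate and needs no cache parameter. `cutoff` (>= 1) merely
    stops the recursion early to limit call overhead."""
    if max(m, n, p) <= cutoff:
        for i in range(i0, i0 + m):
            for k in range(k0, k0 + p):
                a = A[i][k]
                for j in range(j0, j0 + n):
                    C[i][j] += a * B[k][j]
        return
    if m >= n and m >= p:      # split the rows of A and C
        h = m // 2
        matmul_recursive(A, B, C, i0, j0, k0, h, n, p, cutoff)
        matmul_recursive(A, B, C, i0 + h, j0, k0, m - h, n, p, cutoff)
    elif n >= p:               # split the columns of B and C
        h = n // 2
        matmul_recursive(A, B, C, i0, j0, k0, m, h, p, cutoff)
        matmul_recursive(A, B, C, i0, j0 + h, k0, m, n - h, p, cutoff)
    else:                      # split the shared inner dimension
        h = p // 2
        matmul_recursive(A, B, C, i0, j0, k0, m, n, h, cutoff)
        matmul_recursive(A, B, C, i0, j0, k0 + h, m, n, p - h, cutoff)
```

The tiled version needs `tile` chosen per machine, while the recursive version reaches small subproblems at every scale without knowing the cache size; each divide-and-conquer step generates problems of smaller size, as the snippet notes.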
Redesigning the string hash table, burst trie, and BST to exploit cache
2010
ACM Journal of Experimental Algorithmics
We then replace the chains of the hash table, burst trie, and BST using dynamic arrays, creating new cache-conscious array representations called the array hash, array burst trie, and array BST, respectively ...
Our results show that, in an architecture with cache, our array data structures can yield startling improvements over their standard, compact, and clustered chained variants. ...
ACKNOWLEDGMENTS We thank the anonymous reviewers of this article and software. ...
doi:10.1145/1671970.1921704
fatcat:eimxbg3zvjcpfefwa7jr7wx76i
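The idea in the entry above of replacing linked chains with dynamic arrays can be sketched roughly as follows. This is a simplified illustration, not the authors' array hash: the class name `ArrayHashSketch`, the one-byte length prefix, the slot count, and the toy hash function are all assumptions. Python will not show the cache behaviour, only the contiguous layout.

```python
class ArrayHashSketch:
    """Hash table whose buckets are contiguous byte buffers, not linked chains.

    Each slot holds a bytearray of length-prefixed keys laid out back to back,
    so a bucket scan walks sequential memory and does not chase pointers.
    """

    def __init__(self, num_slots=64):
        self.slots = [bytearray() for _ in range(num_slots)]

    def _slot(self, key: bytes) -> bytearray:
        # Toy multiplicative hash (hypothetical choice, for illustration only).
        h = 0
        for b in key:
            h = (h * 31 + b) & 0xFFFFFFFF
        return self.slots[h % len(self.slots)]

    def contains(self, key: bytes) -> bool:
        buf = self._slot(key)
        pos = 0
        while pos < len(buf):
            length = buf[pos]                      # one-byte length prefix
            if buf[pos + 1:pos + 1 + length] == key:
                return True
            pos += 1 + length                      # jump to the next key
        return False

    def insert(self, key: bytes) -> bool:
        """Append the key to its bucket; return False if already present."""
        if len(key) > 255:
            raise ValueError("this sketch stores keys of at most 255 bytes")
        if self.contains(key):
            return False
        buf = self._slot(key)
        buf.append(len(key))
        buf.extend(key)
        return True
```

A usage example under the same assumptions: after `t = ArrayHashSketch()` and `t.insert(b"cache")`, the call `t.contains(b"cache")` returns `True`, and a second `t.insert(b"cache")` returns `False`.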
Efficient Sorting Using Registers and Caches
[chapter]
2001
Lecture Notes in Computer Science
Inadequate models lead to poor algorithmic choices and an incomplete understanding of algorithm behavior on real machines. ...
Common machine models for algorithm analysis do not reflect many of the features of these systems, e.g., large register sets, lockup-free caches, cache hierarchies, associativity, cache line fetching, ...
Fig. 3, Fig. 4. Comparison with cache-conscious sort programs of LaMarca and Ladner [1997]. Time per key (microseconds/key) vs. number of keys (×2^20 keys). ...
doi:10.1007/3-540-44691-5_5
fatcat:ilvnnmf6xvbmrkpaxnlb7hnhoa
Lists Revisited: Cache Conscious STL Lists
[chapter]
2006
Lecture Notes in Computer Science
We present three cache conscious implementations of STL standard compliant lists. ...
In this paper, we show the competitiveness of our implementations with an extensive experimental analysis. This shows, for instance, 5-10 times faster traversals and 3-5 times faster internal sort. ...
We have implemented and experimentally evaluated three different variants of cache conscious lists supporting fully standard iterator functionality. ...
doi:10.1007/11764298_11
fatcat:766m6ty2ejgulgtmlt52327y3u
Lists revisited
2009
ACM Journal of Experimental Algorithmics
We present three cache conscious implementations of STL standard compliant lists. ...
In this paper, we show the competitiveness of our implementations with an extensive experimental analysis. This shows, for instance, 5-10 times faster traversals and 3-5 times faster internal sort. ...
We have implemented and experimentally evaluated three different variants of cache conscious lists supporting fully standard iterator functionality. ...
doi:10.1145/1498698.1564505
fatcat:fa7cdtpqxfaarho47hat5regju
Towards pB+Trees in the Field: Implementation Choices and Performance
2006
Evaluation of Data Management Systems
This paper is part of this trend towards the deployment of cache-conscious structures "in the field". ...
In particular, B+-trees have been shown to utilize cache memory poorly, triggering the development of many cache-conscious indices. ...
parts of the program and where cache-misses occur. ...
dblp:conf/expdb/JonssonJ06
fatcat:fueegi56trhafo7ecnqh3q4du4
Dynamic Data Layouts for Cache-Conscious Implementation of a Class of Signal Transforms
2004
IEEE Transactions on Signal Processing
Experimental results show that our FFT and WHT achieve performance improvement of up to 3.52 times over other state-of-the-art FFT and WHT packages. ...
In this paper, we develop a cache-conscious technique, called a dynamic data layout, to improve the performance of large signal transforms. ...
Moura, and M. Veloso. They also thank G. Govindu, B. Hong, and B. Gundala for their editorial assistance. ...
doi:10.1109/tsp.2004.828946
fatcat:3kw3rbjumrg33gbllyisj6pdri