Data Oblivious Algorithms for Multicores [article]

Vijaya Ramachandran, Elaine Shi
2021 arXiv   pre-print
For other applications, we show data oblivious algorithms whose performance bounds match the best known insecure algorithms.  ...  For a subset of these applications, our data-oblivious algorithms asymptotically outperform the best known insecure algorithms.  ...  We initiate the study of data-oblivious algorithms for a multicore architecture where parallelism and synchronization are expressed with nested binary fork-join operations.  ... 
arXiv:2008.00332v2 fatcat:fpj5e7tqjfbe7audobaqj7xv3y

Oblivious algorithms for multicores and networks of processors

Rezaul Alam Chowdhury, Vijaya Ramachandran, Francesco Silvestri, Brandon Blakeley
2013 Journal of Parallel and Distributed Computing  
Highlights: • Introduce the notion of multicore-oblivious algorithms. • Propose a hierarchical multi-level caching model for multicores. • Present efficient multicore-oblivious algorithms for matrix  ...  algorithms. Abstract: We address the design of algorithms for multicores that are oblivious to machine parameters.  ...  Acknowledgments: The authors thank the anonymous reviewers for their comments. F. Silvestri would also like to thank A. Pietracaprina and G. Pucci for useful discussions. V.  ... 
doi:10.1016/j.jpdc.2013.04.008 fatcat:ezthxkpdszfydgwhdgcp2m7i2e

Oblivious algorithms for multicores and network of processors

Rezaul Alam Chowdhury, Francesco Silvestri, Brandon Blakeley, Vijaya Ramachandran
2010 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)  
First, and of independent interest, we propose HM, a hierarchical multi-level caching model for multicores, and we propose a multicore-oblivious approach to algorithms and schedulers for HM.  ...  We address the design of parallel algorithms that are oblivious to machine parameters for two dominant machine configurations: the chip multiprocessor (or multicore) and the network of processors.  ...  The authors would like to thank Andrea Pietracaprina and Keshav Pingali for useful discussions.  ... 
doi:10.1109/ipdps.2010.5470354 dblp:conf/ipps/ChowdhurySBR10 fatcat:wiynwlarl5cw7c5c7mgso5yzly

Resource Oblivious Sorting on Multicores

Richard Cole, Vijaya Ramachandran
2017 ACM Transactions on Parallel Computing  
The parallel complexity (or critical path length) of the algorithm is O(log n · log log n), which improves on previous bounds for optimal cache oblivious sorting. The algorithm also has low false sharing costs.  ...  Finally, SPMS is resource oblivious in that the dependence on machine parameters appears only in the analysis of its performance, and not within the algorithm itself.  ...  cache-oblivious sorting, for which provably optimal algorithms are known [15], optimal sorting algorithms addressing pure parallelism [3, 11], and recent work on multicore sorting [5, 4, 6, 16].  ... 
doi:10.1145/3040221 fatcat:sqh4ozlzq5fq5eommeo6onrscy

Resource Oblivious Sorting on Multicores [chapter]

Richard Cole, Vijaya Ramachandran
2010 Lecture Notes in Computer Science  
We also establish good bounds for our algorithm with the randomized work stealing scheduler.  ...  The parallel complexity (or critical path length) of the algorithm is O(log n log log n), which improves on previous bounds for deterministic sample sort.  ...  cache-oblivious sorting, for which provably optimal algorithms are known [15], optimal sorting algorithms addressing pure parallelism [3, 11], and recent work on multicore sorting [5, 4, 6, 16].  ... 
doi:10.1007/978-3-642-14165-2_20 fatcat:bkb3otgap5hmlh2kgymmiffcve

A Synergetic Approach to Throughput Computing on x86-Based Multicore Desktops

Chi-Keung Luk, Ryan Newton, William Hasenplaugh, Mark Hampton, Geoff Lowney
2011 IEEE Software  
In the era of multicores, many applications that tend to require substantial compute power and data crunching (aka Throughput Computing Applications) can now be run on desktop PCs.  ...  In this paper, we propose one such approach for x86-based architectures.  ...  Cache-Oblivious Techniques: A cache-oblivious algorithm is one that is designed to maximize data reuse in caches.  ... 
doi:10.1109/ms.2011.2 fatcat:3ysms4aeebarpfhdgbzprloyxi

Heracles: Fully Synthesizable Parameterized MIPS-Based Multicore System

Michel A. Kinsy, Michael Pellauer, Srinivas Devadas
2011 2011 21st International Conference on Field Programmable Logic and Applications  
In the baseline design, the microprocessor is attached to two caches, one instruction cache and one data cache, which are oblivious to the global memory organization.  ...  We also provide a small MIPS cross-compiler toolchain to assist in developing software for Heracles.  ...  ACKNOWLEDGMENT We thank Joel Emer, Li-Shiuan Peh, Omer Kan, Myong Hyon Cho, and Noah Keegan for interesting discussions throughout the course of this work.  ... 
doi:10.1109/fpl.2011.70 dblp:conf/fpl/KinsyPD11 fatcat:s6x3z553kfcpzbdfq36otatnti

Cache-Efficient Parallel Isosurface Extraction for Shared Cache Multicores [article]

Marc Tchiboukdjian, Vincent Danjean, Bruno Raffin
2010 Eurographics Symposium on Parallel Graphics and Visualization  
The algorithms are based on the FastCOL cache-oblivious data layout for irregular meshes.  ...  We theoretically prove that in both cases the number of cache misses is the same as for the sequential algorithm for the same cache size.  ...  The algorithm is based on the cache oblivious (CO) data layout for irregular meshes proposed in [TDR10].  ... 
doi:10.2312/egpgv/egpgv10/081-090 fatcat:j7fjz2zkbvgzblbra4nw45umsi

Efficient Resource Oblivious Algorithms for Multicores with False Sharing

Richard Cole, Vijaya Ramachandran
2012 2012 IEEE 26th International Parallel and Distributed Processing Symposium  
We consider algorithms for a multicore environment in which each core has its own private cache and false sharing can occur.  ...  Most of these algorithms are derived from known multicore algorithms, but are further refined to achieve a low false sharing overhead.  ...  For instance, the multicore oblivious algorithms in [11] were shown to achieve efficiency under a specific scheduler.  ... 
doi:10.1109/ipdps.2012.28 dblp:conf/ipps/ColeR12 fatcat:ny6hz4nmgzcbbdybwefqrgqvgq

Multicore architecture and cache optimization techniques for solving graph problems [article]

Alvaro Tzul
2018 arXiv   pre-print
These data sets require in-depth analysis that provides intelligence for improvements in methods for academia and industry.  ...  Since the data sets are large, the time it takes to analyze the data is significant. Hence, in this paper, we explore techniques that can exploit existing multicore architecture to address the issue.  ...  Cache-aware and Cache-oblivious Algorithms: There are two types of cache optimization algorithms: cache-aware and cache-oblivious.  ... 
arXiv:1807.03383v1 fatcat:5qd475qfr5hfho2y77bant2pgm

How to achieve scalable fork/join on many-core architectures?

Mattias De Wael, Tom Van Cutsem
2012 Proceedings of the 3rd annual conference on Systems, programming, and applications: software for humanity - SPLASH '12  
This research investigates implementations for Fork/Join to allow the transition to many-core.  ...  At the algorithmic level, the use of cache oblivious algorithms and data structures [6] can aid programmers in creating scalable Fork/Join programs.  ...  We want to evaluate solutions proposed for multicore work stealing with improved data locality [1], as well as solutions proposed for distributed work stealing and PGAS-like languages [5], and new  ... 
doi:10.1145/2384716.2384751 dblp:conf/oopsla/WaelC12 fatcat:mdem3y7wwjbgdj336h5zqldzjm

Big data: Scale down, scale up, scale out

Phillip B. Gibbons
2015 2015 IEEE International Parallel and Distributed Processing Symposium  
for M-fitting subtasks + Σ Cache miss for every access in glue. M, B parameters either used in algorithm (cache-aware) or not (cache-oblivious) [Simhadri, 2013]  ...  • Much of Big Data focus has been on Scale Out - Hadoop, etc. • But if data fits in memory of multicore then often order of magnitude better performance - GraphLab1 (multicore) is 1000x faster than Hadoop  ...  A number of these slides were adapted from slides created by my co-authors, and I thank them for those slides.  ... 
doi:10.1109/ipdps.2015.123 dblp:conf/ipps/Gibbons15 fatcat:breojbm3infuhj2jdlngavbf2u

Dynamic programming in faulty memory hierarchies (cache-obliviously)

Saverio Caminiti, Irene Finocchi, Emanuele G. Fusco, Francesco Silvestri, Marc Herbstritt
2011 Foundations of Software Technology and Theoretical Computer Science  
The effect of memory errors is an important consideration in system design, especially for long-running and large-scale applications that work on massive data sets.  ...  Cache-oblivious algorithms [20] overcome this issue: they are designed in a two-level ideal-cache model with no explicit dependencies on hardware parameters, and can therefore adapt simultaneously to all  ...  In [12, 13, 14], cache-oblivious algorithms for I-GEP are provided for single processors, parallel, and multicore machines.  ... 
doi:10.4230/lipics.fsttcs.2011.433 dblp:conf/fsttcs/CaminitiFFS11 fatcat:4jcx3auq7fhbvcrwq6dsapytge

Evaluating Multicore Algorithms on the Unified Memory Model

John E. Savage, Mohammad Zubair
2009 Scientific Programming  
While this is an issue for single-core architectures, it is a critical problem for multicore chips.  ...  In particular, we use it to analyze an option pricing problem using the trinomial model and develop an algorithm for it that has near-optimal memory traffic between cache levels.  ...  They point to two major reasons for this performance gap, ineffective utilization of the pipeline by cache oblivious algorithms and the inability to effectively hide memory latency by cache oblivious algorithms  ... 
doi:10.1155/2009/681708 fatcat:hi3up2ttufdlfc4bt6pfj6piw4

Cache-efficient dynamic programming algorithms for multicores

Rezaul Alam Chowdhury, Vijaya Ramachandran
2008 Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures - SPAA '08  
We consider three types of caching systems for CMPs: D-CMP with a private cache for each core, S-CMP with a single cache shared by all cores, and Multicore, which has private L1 caches and a shared L2  ...  We present cache-efficient chip multiprocessor (CMP) algorithms with good speed-up for some widely used dynamic programming algorithms.  ...  We acknowledge the suggestions of the SPAA referees to give informal descriptions of our algorithms and remove all pseudocode.  ... 
doi:10.1145/1378533.1378574 dblp:conf/spaa/ChowdhuryR08 fatcat:m6zzqihngfepjkcvfeob7xpceu