
Resource Oblivious Sorting on Multicores [chapter]

Richard Cole, Vijaya Ramachandran
2010 Lecture Notes in Computer Science  
The PWS scheduler is both processor- and cache-oblivious (i.e., resource oblivious), and it tolerates asynchrony among the cores.  ...  Using PWS, we obtain a resource oblivious scheduling of our sorting algorithm that matches the performance of the processor-aware version. Our analysis includes the delay incurred by false-sharing.  ...  cache-oblivious sorting, for which provably optimal algorithms are known [15], optimal sorting algorithms addressing pure parallelism [3, 11], and recent work on multicore sorting [5, 4, 6, 16].  ... 
doi:10.1007/978-3-642-14165-2_20 fatcat:bkb3otgap5hmlh2kgymmiffcve

Resource Oblivious Sorting on Multicores

Richard Cole, Vijaya Ramachandran
2017 ACM Transactions on Parallel Computing  
Finally, SPMS is resource oblivious in that the dependence on machine parameters appears only in the analysis of its performance, and not within the algorithm itself.  ...  The parallel complexity (or critical path length) of the algorithm is O(log n · log log n), which improves on previous bounds for optimal cache oblivious sorting. The algorithm also has low false sharing costs.  ...  cache-oblivious sorting, for which provably optimal algorithms are known [15], optimal sorting algorithms addressing pure parallelism [3, 11], and recent work on multicore sorting [5, 4, 6, 16].  ... 
doi:10.1145/3040221 fatcat:sqh4ozlzq5fq5eommeo6onrscy

Oblivious algorithms for multicores and networks of processors

Rezaul Alam Chowdhury, Vijaya Ramachandran, Francesco Silvestri, Brandon Blakeley
2013 Journal of Parallel and Distributed Computing  
transposition, FFT, sorting, the Gaussian Elimination Paradigm, list ranking, and connected components. • Show that several of our multicore-oblivious algorithms translate into efficient network-oblivious  ...  Highlights • Introduce the notion of multicore-oblivious algorithms. • Propose a hierarchical multi-level caching model for multicores. • Present efficient multicore-oblivious algorithms for matrix  ...  Sorting Sample Partition Merge Sort (SPMS) is a resource-oblivious algorithm for sorting on a multicore with just private caches [22].  ... 
doi:10.1016/j.jpdc.2013.04.008 fatcat:ezthxkpdszfydgwhdgcp2m7i2e

Oblivious algorithms for multicores and network of processors

Rezaul Alam Chowdhury, Francesco Silvestri, Brandon Blakeley, Vijaya Ramachandran
2010 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)  
We then use the network oblivious framework proposed earlier as an oblivious framework for a network of processors, and we present provably efficient network-oblivious algorithms for sorting, the Gaussian  ...  First, and of independent interest, we propose HM, a hierarchical multi-level caching model for multicores, and we propose a multicore-oblivious approach to algorithms and schedulers for HM.  ...  Sorting The network-oblivious algorithm for sorting n elements described in [7] is based on Column-Sort [27] and defined for an M(n) machine.  ... 
doi:10.1109/ipdps.2010.5470354 dblp:conf/ipps/ChowdhurySBR10 fatcat:wiynwlarl5cw7c5c7mgso5yzly

A Synergetic Approach to Throughput Computing on x86-Based Multicore Desktops

Chi-Keung Luk, Ryan Newton, William Hasenplaugh, Mark Hampton, Geoff Lowney
2011 IEEE Software  
In the era of multicores, many applications that tend to require substantial compute power and data crunching (aka Throughput Computing Applications) can now be run on desktop PCs.  ...  Our approach uses cache-oblivious techniques to divide a large problem into smaller subproblems which are mapped to different cores or threads.  ...  Our work shows that cache-oblivious techniques can also work well in practice on multicore processors.  ... 
doi:10.1109/ms.2011.2 fatcat:3ysms4aeebarpfhdgbzprloyxi

Cache-Adaptive Algorithms [chapter]

Michael A. Bender, Roozbeh Ebrahimi, Jeremy T. Fineman, Golnaz Ghasemiesfeh, Rob Johnson, Samuel McCauley
2013 Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms  
We also establish that if a cache-oblivious algorithm is optimal on "square" (well-behaved) memory profiles then, given resource augmentation it is optimal on all memory profiles.  ...  While the cache-oblivious sorting algorithm Lazy Funnel Sort does not have this recursive structure, we prove that it is nonetheless optimally cache-adaptive.  ...  Acknowledgments We gratefully acknowledge Goetz Graefe, Harumi Kuno, Bradley Kuszmaul, and Sivaramakrishnan Narayanan for discussions on memory adaptivity in databases.  ... 
doi:10.1137/1.9781611973402.71 dblp:conf/soda/BenderEFGJM14 fatcat:4qu2tcwed5fu3mrkxpvhu7f6hq

Efficient cache oblivious algorithms for randomized divide-and-conquer on the multicore model [article]

Neeraj Sharma, Sandeep Sen
2012 arXiv   pre-print
In this paper we present randomized algorithms for sorting and convex hull that achieve optimal performance (for speed-up and cache misses) on the multicore model with private cache model.  ...  We also present a simple randomized processor allocation technique without the explicit knowledge of the number of processors that is likely to find additional applications in resource oblivious environments  ...  Recently Cole and Ramachandran [11] presented a new optimal merge sort algorithm (SPMS) for resource oblivious multicore model.  ... 
arXiv:1204.6508v2 fatcat:2sjhtopfhjdkdhcyj3riwfmq4i

Mixed-criticality scheduling with memory bandwidth regulation

Muhammad Ali Awan, Pedro F. Souto, Konstantinos Bletsas, Benny Akesson, Eduardo Tovar
2018 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE)  
Our experiments show that stall-oblivious schedulability analysis may be optimistic due to contention on shared memory resources.  ...  Nevertheless, AMC-rtb-FF is not entirely oblivious to memory demand as task-set sorting is still based on C m|κ i i Ti , hence, there is a smaller dip in Figure 5 at 0.  ... 
doi:10.23919/date.2018.8342211 dblp:conf/date/AwanSBAT18 fatcat:apnzimrx6vhl7lsv7pf4x35awa

Efficient Resource Oblivious Algorithms for Multicores with False Sharing

Richard Cole, Vijaya Ramachandran
2012 2012 IEEE 26th International Parallel and Distributed Processing Symposium  
., cache-line) in parallel, and at least one processor writes into a location in the block.  ...  We consider algorithms for a multicore environment in which each core has its own private cache and false sharing can occur.  ...  Resource Obliviousness.  ... 
doi:10.1109/ipdps.2012.28 dblp:conf/ipps/ColeR12 fatcat:ny6hz4nmgzcbbdybwefqrgqvgq

Hardware-oblivious parallelism for in-memory column-stores

Max Heimel, Michael Saecker, Holger Pirk, Stefan Manegold, Volker Markl
2013 Proceedings of the VLDB Endowment  
We propose an alternative design for a parallel database engine, based on a single set of hardware-oblivious operators, which are compiled down to the actual hardware at runtime.  ...  Writing such applications in a generic, hardware-oblivious manner is a challenging problem: Current database systems thus rely on labor-intensive and error-prone manual tuning to exploit the full potential  ...  With this sort strategy, both on CPU and GPU Ocelot outperforms MonetDB's sort algorithm, which is based on quick- and mergesort.  ... 
doi:10.14778/2536360.2536370 fatcat:lhhr6q57c5csrg37s5563wq2ku

Efficient Resource Oblivious Algorithms for Multicores [article]

Richard Cole, Vijaya Ramachandran
2011 arXiv   pre-print
PWS schedules without using cache or block size information, and uses knowledge of processors only to the extent of determining the available locations from which tasks may be stolen; thus it schedules resource-obliviously  ...  We characterize the class of 'Hierarchical Balanced Parallel (HBP)' multithreaded computations for multicores.  ...  Schedulers and Resource Obliviousness.  ... 
arXiv:1103.4071v1 fatcat:g3rr2qmvfna6xbevugy5hx6jyu

Chapter 5. Realistic Computer Models [chapter]

Deepak Ajwani, Henning Meyerhenke
2010 Lecture Notes in Computer Science  
Cache-Oblivious Sorting Brodal et al.  ...  In the cache-oblivious setting, funnelsort [308] and lazy funnelsort [131], also based on the merging framework, lead to sorting algorithms with a similar I/O complexity.  ... 
doi:10.1007/978-3-642-14866-8_5 fatcat:j326q2ymeffzfmo36nqst7msmq


Michel A. Kinsy, Michael Pellauer, Srinivas Devadas
2013 Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays - FPGA '13  
This paper presents Heracles, an open-source, functional, parameterized, synthesizable multicore system toolkit.  ...  It is a component-based framework with parameterized interfaces and strong emphasis on module reusability. The compiler toolchain is used to map C or C++ based applications onto the processing units.  ...  HAsim [18] , for example, has shown using its time multiplexing technique how one can model a shared-memory multicore system including detailed core pipelines, cache hierarchy, and on-chip network, on  ... 
doi:10.1145/2435264.2435287 dblp:conf/fpga/KinsyPD13 fatcat:63fcylghyjcrrmhgyao554rk2u

AREP: Adaptive Resource Efficient Prefetching for Maximizing Multicore Performance

Muneeb Khan, Michael A. Laurenzano, Jason Mars, Erik Hagersten, David Black-Schaffer
2015 2015 International Conference on Parallel Architecture and Compilation (PACT)  
A multitude of experiments with workload mixes and parallel applications on a modern high performance multicore show that AREP can increase throughput by up to 49% (8.1% on average).  ...  multicore processors.  ...  Khan et al. developed a resource-efficient software prefetching method to scale performance in multicores when shared resources are constrained [8].  ... 
doi:10.1109/pact.2015.35 dblp:conf/IEEEpact/KhanLMHB15 fatcat:kg3qftrwynenhkcqdtsumnbs6u

Maximizing Performance Under a Power Cap

Huazhe Zhang, Henry Hoffmann
2016 SIGPLAN notices  
Power and thermal dissipation constrain multicore performance scaling.  ...  On average, PUPiL outperforms hardware by 1.18-2.4×, depending on workload and power target.  ...  The effort on this project is funded by the U.S.  ... 
doi:10.1145/2954679.2872375 fatcat:atrmpd43hzd4td5bcgqhufzrwy