A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2015; you can also visit the original URL.
The file type is application/pdf.
Resource Oblivious Sorting on Multicores
[chapter]
2010
Lecture Notes in Computer Science
The PWS scheduler is both processor- and cache-oblivious (i.e., resource oblivious), and it tolerates asynchrony among the cores. ...
Using PWS, we obtain a resource oblivious scheduling of our sorting algorithm that matches the performance of the processor-aware version. Our analysis includes the delay incurred by false sharing. ...
cache-oblivious sorting, for which provably optimal algorithms are known [15], optimal sorting algorithms addressing pure parallelism [3, 11], and recent work on multicore sorting [5, 4, 6, 16]. ...
doi:10.1007/978-3-642-14165-2_20
fatcat:bkb3otgap5hmlh2kgymmiffcve
Resource Oblivious Sorting on Multicores
2017
ACM Transactions on Parallel Computing
Finally, SPMS is resource oblivious in that the dependence on machine parameters appears only in the analysis of its performance, and not within the algorithm itself. ...
The parallel complexity (or critical path length) of the algorithm is $O(\log n \cdot \log\log n)$, which improves on previous bounds for optimal cache-oblivious sorting. The algorithm also has low false sharing costs. ...
cache-oblivious sorting, for which provably optimal algorithms are known [15], optimal sorting algorithms addressing pure parallelism [3, 11], and recent work on multicore sorting [5, 4, 6, 16]. ...
doi:10.1145/3040221
fatcat:sqh4ozlzq5fq5eommeo6onrscy
Oblivious algorithms for multicores and networks of processors
2013
Journal of Parallel and Distributed Computing
Highlights:
• Introduce the notion of multicore-oblivious algorithms.
• Propose a hierarchical multi-level caching model for multicores.
• Present efficient multicore-oblivious algorithms for matrix transposition, FFT, sorting, the Gaussian Elimination Paradigm, list ranking, and connected components.
• Show that several of our multicore-oblivious algorithms translate into efficient network-oblivious ...
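The highlights list matrix transposition among the multicore-oblivious algorithms. As a hedged illustration of the underlying cache-oblivious idea only (not the paper's multicore algorithm), the sketch below recursively halves the larger dimension until a block is small, so no cache or line size appears in the code. The names `co_transpose` and `transpose` and the `base` cutoff are hypothetical.

```python
def co_transpose(a, b, r0, r1, c0, c1, base=16):
    """Write the transpose of block a[r0:r1][c0:c1] into b, so b[j][i] = a[i][j].

    The larger dimension is halved until a block has at most `base` entries;
    no cache size or line size is used anywhere.
    """
    rows, cols = r1 - r0, c1 - c0
    if rows * cols <= base:
        # Small block: transpose it directly.
        for i in range(r0, r1):
            for j in range(c0, c1):
                b[j][i] = a[i][j]
    elif rows >= cols:
        # Split the rows in half.
        mid = (r0 + r1) // 2
        co_transpose(a, b, r0, mid, c0, c1, base)
        co_transpose(a, b, mid, r1, c0, c1, base)
    else:
        # Split the columns in half.
        mid = (c0 + c1) // 2
        co_transpose(a, b, r0, r1, c0, mid, base)
        co_transpose(a, b, r0, r1, mid, c1, base)


def transpose(a):
    """Return the transpose of a rectangular list-of-lists matrix."""
    m = len(a)
    n = len(a[0]) if m else 0
    b = [[None] * m for _ in range(n)]
    co_transpose(a, b, 0, m, 0, n)
    return b
```

For example, `transpose([[1, 2, 3], [4, 5, 6]])` returns `[[1, 4], [2, 5], [3, 6]]`. Because the recursion eventually produces blocks that fit in any cache level, the same code adapts to every cache size without being told it.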
Sorting. Sample Partition Merge Sort (SPMS) is a resource-oblivious algorithm for sorting on a multicore with just private caches [22]. ...
doi:10.1016/j.jpdc.2013.04.008
fatcat:ezthxkpdszfydgwhdgcp2m7i2e
Oblivious algorithms for multicores and network of processors
2010
2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)
We then use the network oblivious framework proposed earlier as an oblivious framework for a network of processors, and we present provably efficient network-oblivious algorithms for sorting, the Gaussian ...
First, and of independent interest, we propose HM, a hierarchical multi-level caching model for multicores, and we propose a multicore-oblivious approach to algorithms and schedulers for HM. ...
Sorting. The network-oblivious algorithm for sorting $n$ elements described in [7] is based on Column-Sort [27] and defined for an $M(n)$ machine. ...
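The snippet says the network-oblivious sorter builds on Columnsort. As background only, here is a hedged sequential sketch of Leighton's eight-step Columnsort on an $r \times s$ matrix stored column-major, assuming the usual conditions $s \mid r$, $r$ even, and $r \ge 2(s-1)^2$. It is not the network-oblivious algorithm of [7]; the helper names are hypothetical, and Python's `sorted` stands in for the per-column sort.

```python
import math


def _sort_columns(seq, r):
    """Sort each consecutive run of r entries (one column of the matrix)."""
    out = []
    for start in range(0, len(seq), r):
        out.extend(sorted(seq[start:start + r]))
    return out


def _transpose(seq, r, s):
    """Pick up entries in column-major order and lay them down row-major."""
    out = [None] * (r * s)
    for p, x in enumerate(seq):
        row, col = divmod(p, s)
        out[col * r + row] = x
    return out


def _untranspose(seq, r, s):
    """Inverse of _transpose: read row-major, write column-major."""
    out = [None] * (r * s)
    for p in range(r * s):
        row, col = divmod(p, s)
        out[p] = seq[col * r + row]
    return out


def columnsort(items, r, s):
    """Sort r*s items with eight column-sort/permutation steps."""
    if len(items) != r * s or r % s != 0 or r % 2 != 0 or r < 2 * (s - 1) ** 2:
        raise ValueError("need n = r*s, s divides r, r even, r >= 2(s-1)^2")
    seq = list(items)                 # column j is seq[j*r:(j+1)*r]
    seq = _sort_columns(seq, r)       # step 1: sort columns
    seq = _transpose(seq, r, s)       # step 2: transpose
    seq = _sort_columns(seq, r)       # step 3: sort columns
    seq = _untranspose(seq, r, s)     # step 4: untranspose
    seq = _sort_columns(seq, r)       # step 5: sort columns
    half = r // 2
    # step 6: shift down by r/2, padding to s+1 columns with -inf / +inf
    seq = [-math.inf] * half + seq + [math.inf] * half
    seq = _sort_columns(seq, r)       # step 7: sort columns
    return seq[half:half + r * s]     # step 8: unshift (drop the padding)
```

Every step is either a fixed permutation or an independent sort of each column, which is what makes the scheme attractive for distributed settings: the data movement pattern does not depend on the keys.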
doi:10.1109/ipdps.2010.5470354
dblp:conf/ipps/ChowdhurySBR10
fatcat:wiynwlarl5cw7c5c7mgso5yzly
A Synergetic Approach to Throughput Computing on x86-Based Multicore Desktops
2011
IEEE Software
In the era of multicores, many applications that tend to require substantial compute power and data crunching (aka Throughput Computing Applications) can now be run on desktop PCs. ...
Our approach uses cache-oblivious techniques to divide a large problem into smaller subproblems which are mapped to different cores or threads. ...
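The divide-then-map approach described above can be sketched as follows, assuming a sorting workload: the input is halved recursively into pieces (independent of any cache parameter), each piece is handed to a worker thread, and the sorted runs are merged. The function names are hypothetical, and Python's built-in `sorted` stands in for the per-subproblem work rather than any routine from the article.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def split_recursively(items, depth):
    """Halve `items` recursively `depth` times, giving up to 2**depth pieces."""
    if depth == 0 or len(items) <= 1:
        return [items]
    mid = len(items) // 2
    return (split_recursively(items[:mid], depth - 1)
            + split_recursively(items[mid:], depth - 1))


def parallel_sort(items, num_workers=4):
    """Sort by mapping recursively produced subproblems onto a thread pool."""
    depth = (num_workers - 1).bit_length()  # smallest d with 2**d >= num_workers
    pieces = split_recursively(list(items), depth)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        runs = list(pool.map(sorted, pieces))  # one subproblem per task
    return list(heapq.merge(*runs))  # combine the sorted runs
```

In CPython the global interpreter lock keeps pure-Python sorting from running truly in parallel, so this shows the structure of the decomposition rather than a speedup; a native implementation would map the same pieces onto cores.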
Our work shows that cache-oblivious techniques can also work well in practice on multicore processors. ...
doi:10.1109/ms.2011.2
fatcat:3ysms4aeebarpfhdgbzprloyxi
Cache-Adaptive Algorithms
[chapter]
2013
Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms
We also establish that if a cache-oblivious algorithm is optimal on "square" (well-behaved) memory profiles then, given resource augmentation, it is optimal on all memory profiles. ...
While the cache-oblivious sorting algorithm Lazy Funnel Sort does not have this recursive structure, we prove that it is nonetheless optimally cache-adaptive. ...
Acknowledgments We gratefully acknowledge Goetz Graefe, Harumi Kuno, Bradley Kuszmaul, and Sivaramakrishnan Narayanan for discussions on memory adaptivity in databases. ...
doi:10.1137/1.9781611973402.71
dblp:conf/soda/BenderEFGJM14
fatcat:4qu2tcwed5fu3mrkxpvhu7f6hq
Efficient cache oblivious algorithms for randomized divide-and-conquer on the multicore model
[article]
2012
arXiv
pre-print
In this paper we present randomized algorithms for sorting and convex hull that achieve optimal performance (for speed-up and cache misses) on the multicore model with private caches. ...
We also present a simple randomized processor allocation technique without the explicit knowledge of the number of processors that is likely to find additional applications in resource oblivious environments ...
Recently Cole and Ramachandran [11] presented a new optimal merge sort algorithm (SPMS) for resource oblivious multicore model. ...
arXiv:1204.6508v2
fatcat:2sjhtopfhjdkdhcyj3riwfmq4i
Mixed-criticality scheduling with memory bandwidth regulation
2018
2018 Design, Automation & Test in Europe Conference & Exhibition (DATE)
Our experiments show that stall-oblivious schedulability analysis may be optimistic due to contention on shared memory resources. ...
Nevertheless, AMC-rtb-FF is not entirely oblivious to memory demand as task-set sorting is still based on $C_i^{m|\kappa_i}/T_i$; hence, there is a smaller dip in Figure 5 at 0. ...
doi:10.23919/date.2018.8342211
dblp:conf/date/AwanSBAT18
fatcat:apnzimrx6vhl7lsv7pf4x35awa
Efficient Resource Oblivious Algorithms for Multicores with False Sharing
2012
2012 IEEE 26th International Parallel and Distributed Processing Symposium
., cache-line) in parallel, and at least one processor writes into a location in the block. ...
We consider algorithms for a multicore environment in which each core has its own private cache and false sharing can occur. ...
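The false-sharing condition in the snippets above (distinct locations in one block accessed in parallel by different processors, with at least one write into the block) can be made concrete with a small checker. This is a hedged sketch: the `(processor, address, is_write)` trace format and the function name are hypothetical, and the 64-byte block size in the usage note is only an example.

```python
from collections import defaultdict


def falsely_shared_blocks(accesses, block_size):
    """Return ids of blocks that are falsely shared in a parallel access trace.

    accesses: iterable of (processor, address, is_write) made in parallel.
    A block counts as falsely shared when two different processors touch
    distinct addresses in it and at least one access to the block is a write.
    """
    by_block = defaultdict(list)
    for proc, addr, is_write in accesses:
        by_block[addr // block_size].append((proc, addr, is_write))

    shared = set()
    for block, acc in by_block.items():
        if not any(w for _, _, w in acc):
            continue  # read-only blocks trigger no invalidations
        for i, (p, a, _) in enumerate(acc):
            # Look for another processor using a different word of the block.
            if any(q != p and b != a for q, b, _ in acc[i + 1:]):
                shared.add(block)
                break
    return shared
```

With `block_size=64`, a write by processor 0 to address 0 and a read by processor 1 of address 8 flag block 0, while two processors writing the same address count as true sharing and are not reported.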
Resource Obliviousness. ...
doi:10.1109/ipdps.2012.28
dblp:conf/ipps/ColeR12
fatcat:ny6hz4nmgzcbbdybwefqrgqvgq
Hardware-oblivious parallelism for in-memory column-stores
2013
Proceedings of the VLDB Endowment
We propose an alternative design for a parallel database engine, based on a single set of hardware-oblivious operators, which are compiled down to the actual hardware at runtime. ...
Writing such applications in a generic, hardware-oblivious manner is a challenging problem: Current database systems thus rely on labor-intensive and error-prone manual tuning to exploit the full potential ...
With this sort strategy, both on CPU and GPU Ocelot outperforms MonetDB's sort algorithm, which is based on quick- and mergesort. ...
doi:10.14778/2536360.2536370
fatcat:lhhr6q57c5csrg37s5563wq2ku
Efficient Resource Oblivious Algorithms for Multicores
[article]
2011
arXiv
pre-print
PWS schedules without using cache or block size information, and uses knowledge of processors only to the extent of determining the available locations from which tasks may be stolen; thus it schedules resource-obliviously ...
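To illustrate what "locations from which tasks may be stolen" means, here is a hedged simulation of generic randomized work stealing on a complete binary fork tree. It is not PWS: any priority rules PWS applies when choosing what to steal are not modeled. Each worker owns a deque, works LIFO at the bottom, and when idle steals the oldest task from the top of a random victim's deque. The function name and round-based timing are hypothetical.

```python
import random
from collections import deque


def simulate_work_stealing(depth, num_workers, seed=0):
    """Run a complete binary fork tree of height `depth` with work stealing.

    Returns a list with the number of leaf tasks each worker executed.
    """
    rng = random.Random(seed)
    deques = [deque() for _ in range(num_workers)]
    deques[0].append(depth)  # root task: a subtree of height `depth`
    leaves_run = [0] * num_workers

    while any(deques):
        for w in range(num_workers):
            if deques[w]:
                task = deques[w].pop()  # owner takes its newest task
            else:
                victim = rng.randrange(num_workers)
                if victim == w or not deques[victim]:
                    continue  # failed steal attempt this round
                task = deques[victim].popleft()  # thief takes the oldest task
            if task == 0:
                leaves_run[w] += 1  # leaf: do the actual work
            else:
                # Fork: push both subtrees onto this worker's own deque.
                deques[w].append(task - 1)
                deques[w].append(task - 1)
    return leaves_run
```

Stealing from the top hands the thief the oldest, and therefore largest, pending subtree, which keeps the number of steals low; the scheduler needs only the set of deques, not cache or block sizes.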
We characterize the class of 'Hierarchical Balanced Parallel (HBP)' multithreaded computations for multicores. ...
Schedulers and Resource Obliviousness. ...
arXiv:1103.4071v1
fatcat:g3rr2qmvfna6xbevugy5hx6jyu
Chapter 5. Realistic Computer Models
[chapter]
2010
Lecture Notes in Computer Science
Cache-Oblivious Sorting. Brodal et al. ...
In the cache-oblivious setting, funnelsort [308] and lazy funnelsort [131], also based on the merging framework, lead to sorting algorithms with a similar I/O complexity. ...
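Funnelsort's recursion splits the input into about $n^{1/3}$ contiguous pieces of size about $n^{2/3}$, sorts each recursively, and merges them with a $k$-merger. The hedged sketch below shows only that recursive shape: `heapq.merge` is swapped in for the funnel-based $k$-merger, so the merge step itself is not cache-oblivious. The function name and the `base` cutoff are hypothetical.

```python
import heapq


def funnelsort_sketch(items, base=8):
    """Recursive structure of funnelsort with heapq.merge as the k-way merger."""
    n = len(items)
    if n <= base:
        return sorted(items)
    k = max(2, round(n ** (1 / 3)))  # about n^(1/3) pieces
    size = -(-n // k)                # ceil(n / k), about n^(2/3) items each
    runs = [funnelsort_sketch(items[i:i + size], base)
            for i in range(0, n, size)]
    return list(heapq.merge(*runs))  # stand-in for the k-funnel merge
```

In real funnelsort the $k$-merger is laid out recursively in memory so that merging incurs an optimal number of cache misses without knowing the cache parameters; that layout is the part this sketch omits.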
doi:10.1007/978-3-642-14866-8_5
fatcat:j326q2ymeffzfmo36nqst7msmq
Heracles
2013
Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays - FPGA '13
This paper presents Heracles, an open-source, functional, parameterized, synthesizable multicore system toolkit. ...
It is a component-based framework with parameterized interfaces and strong emphasis on module reusability. The compiler toolchain is used to map C- or C++-based applications onto the processing units. ...
HAsim [18], for example, has shown using its time multiplexing technique how one can model a shared-memory multicore system including detailed core pipelines, cache hierarchy, and on-chip network, on ...
doi:10.1145/2435264.2435287
dblp:conf/fpga/KinsyPD13
fatcat:63fcylghyjcrrmhgyao554rk2u
AREP: Adaptive Resource Efficient Prefetching for Maximizing Multicore Performance
2015
2015 International Conference on Parallel Architecture and Compilation (PACT)
A multitude of experiments with workload mixes and parallel applications on a modern high performance multicore show that AREP can increase throughput by up to 49% (8.1% on average). ...
multicore processors. ...
Khan et al. developed a resource-efficient software prefetching method to scale performance in multicores when shared resources are constrained [8] . ...
doi:10.1109/pact.2015.35
dblp:conf/IEEEpact/KhanLMHB15
fatcat:kg3qftrwynenhkcqdtsumnbs6u
Maximizing Performance Under a Power Cap
2016
SIGPLAN notices
Power and thermal dissipation constrain multicore performance scaling. ...
On average, PUPiL outperforms hardware by 1.18–2.4×, depending on workload and power target. ...
The effort on this project is funded by the U.S. ...
doi:10.1145/2954679.2872375
fatcat:atrmpd43hzd4td5bcgqhufzrwy
Showing results 1–15 of 391.