Reducing the burden of parallel loop schedulers for many‐core processors
2021
Concurrency and Computation
This article enhances the scalability of parallel loop schedulers by specializing schedulers for fine-grain loops. ...
Compiler support enables efficient reductions for Cilk, without changing the programming interface of Cilk reducers. ...
The speedup grows with increasing thread count, indicating that future many-core processors will be even more susceptible to scheduler burden. ...
doi:10.1002/cpe.6241
fatcat:4rluruunxjb4dehant4kjl354e
Reducing the burden of parallel loop schedulers for many‐core processors
2021
This article enhances the scalability of parallel loop schedulers by specializing schedulers for fine‐grain loops. ...
Compiler support enables efficient reductions for Cilk, without changing the programming interface of Cilk reducers. ...
The speedup grows with increasing thread count, indicating that future many-core processors will be even more susceptible to scheduler burden. ...
doi:10.17863/cam.71347
fatcat:m4y6hdgbfff2nfdvmkophzusui
The Cilkview scalability analyzer
2010
Proceedings of the 22nd ACM symposium on Parallelism in algorithms and architectures - SPAA '10
In addition, Cilkview analyzes scheduling overhead using the concept of a "burdened dag," which allows it to diagnose performance problems in the application due to an insufficient grain size of parallel ...
These metrics allow Cilkview to estimate parallelism and predict how the application will scale with the number of processing cores. ...
The Cilk++ parallel memcpy replaces the for loop of the serial implementation with a cilk_for loop to enable parallelism. ...
doi:10.1145/1810479.1810509
dblp:conf/spaa/HeLL10
fatcat:mspvwpghfnahba3hfar5vidxqq
Research on the construction and simulation of PO-Dijkstra algorithm model in parallel network of multicore platform
2020
EURASIP Journal on Wireless Communications and Networking
The development of multicore hardware has provided many new development opportunities for many application software algorithms. ...
Using "divide by data" will reduce the cost and management difficulty of real-time temperature. Using "divide by function" is a good choice for streaming media data. ...
Acknowledgements No Authors' contributions Bo Zhang is responsible for the experimental part of the article, and DeJi Hu is responsible for the theoretical part of the article. ...
doi:10.1186/s13638-020-01680-x
fatcat:6ntzysupyjcptdgudj2poaewfm
Implementing communications systems on an SDR SoC
2008
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing
In this paper we present techniques for implementing communications systems in software. We describe briefly the SB3011 platform and programming environment. ...
Software Defined Radios (SDRs) offer a programmable and dynamically reconfigurable method of reusing hardware to implement the physical layer processing of multiple communications systems. ...
To enable physical layer processing in software, processors should support many levels of parallelism. ...
doi:10.1109/icassp.2008.4518876
dblp:conf/icassp/GlossnerIMJNSIYSSPFT08
fatcat:lsvfngv4dfaudcskavuffu25zu
Lazy binary-splitting
2010
Proceedings of the 15th ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '10
Besides being tedious, this tuning also over-fits the code to some particular dataset, platform and calling context of the do-all loop, resulting in poor performance portability for the code. ...
This threshold limits the parallelism and prevents excessive overheads for fine-grain parallelism. ...
Motivation for Dynamic Scheduling Static scheduling of doall loops is easy: the number of iterations can be divided by the number of processors at run-time to yield how many iterations each processor should ...
doi:10.1145/1693453.1693479
dblp:conf/ppopp/TzannesCBV10
fatcat:j3x6vvurtrhvzj53253riur3ee
Lazy binary-splitting
2010
SIGPLAN notices
Besides being tedious, this tuning also over-fits the code to some particular dataset, platform and calling context of the do-all loop, resulting in poor performance portability for the code. ...
This threshold limits the parallelism and prevents excessive overheads for fine-grain parallelism. ...
Motivation for Dynamic Scheduling Static scheduling of doall loops is easy: the number of iterations can be divided by the number of processors at run-time to yield how many iterations each processor should ...
doi:10.1145/1837853.1693479
fatcat:sm26nqo3ifhonndv6veqtoijhi
Machine learning based online performance prediction for runtime parallelization and task scheduling
2009
2009 IEEE International Symposium on Performance Analysis of Systems and Software
With the emerging many-core paradigm, parallel programming must extend beyond its traditional realm of scientific applications. ...
However, many systems lack a priori knowledge about the execution time of all tasks to perform effective load balancing with low scheduling overhead. ...
We are thankful to Hao (Helen) Zhang and Mihye Ahn from Department of Statistics at NCSU for providing the LMM application for our evaluation. ...
doi:10.1109/ispass.2009.4919641
dblp:conf/ispass/LiMSSSM09
fatcat:egyj4zpw2bas3guozkm3jtufqa
Exploiting inter-thread temporal locality for chip multithreading
2010
2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)
While this has been studied for concurrent threads with disjoint working sets, the problem has not been addressed for multi-threaded data-parallel workloads in which threads can be scheduled or constructed ...
This paper proposes the symbiotic affinity scheduling (SAS) algorithm in which work is first partitioned according to the number of cores (i.e., the number of caches), and these partitions are then subdivided ...
We would like to thank Shuai Che and Jiawei Huang who helped us on the coding of HotSpot and LU for benchmarking, Jie Li who modeled the hardware schedulers in FPGA, and Michael Boyer and Mario Donato ...
doi:10.1109/ipdps.2010.5470465
dblp:conf/ipps/MengSS10
fatcat:6b33ba2lmzcnzjo24mlogswgza
Predicting Potential Speedup of Serial Code via Lightweight Profiling and Emulations with Memory Performance Model
2012
2012 IEEE 26th International Parallel and Distributed Processing Symposium
Parallel Prophet models many realistic features of parallel programs: unbalanced workload, multiple critical sections, nested and recursive parallelism, and specific thread schedulings and paradigms, which ...
With Parallel Prophet, programmers simply insert annotations that describe the parallel behavior of the serial program. ...
Each benchmark is estimated by (1) the synthesizer without the memory model ('Pred'), (2) the synthesizer with the memory model ('PredM'), and (3) Suitability ('Suit'). ...
doi:10.1109/ipdps.2012.128
dblp:conf/ipps/KimKKB12
fatcat:hcolajzgqfayzduw4nnb74fkt4
Runtime Aware Architectures
2018
Proceedings of the 2018 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation - SIGSIM-PADS '18
The runtime of the parallel application has to drive the design of future multi-cores to overcome the restrictions in terms of power, memory, programmability and resilience that multi-cores have. ...
) in superscalar processors. ...
Acknowledgments This work has been partially supported by the Spanish Ministry of Science and Innovation under grant TIN2012-34557, the HiPEAC Network of Excellence, and by the European Research Council ...
doi:10.1145/3200921.3204479
dblp:conf/pads/Cortes18
fatcat:ctgvsceil5cgxpba7hhoy5f3ae
Exploiting Both Pipelining and Data Parallelism with SIMD Reconfigurable Architecture
[chapter]
2012
Lecture Notes in Computer Science
number of cores. ...
We further present data tiling and evaluate a conflict-free scheduling algorithm as a way to eliminate bank conflicts for a certain class of iteration and data mapping. ...
Also for large loops with many operations in the loop body, our small core might not be a good match. ...
doi:10.1007/978-3-642-28365-9_4
fatcat:n2mauiwx65a2vic6wripfqjkm4
Scheduling task parallelism on multi-socket multicore systems
2011
Proceedings of the 1st International Workshop on Runtime and Operating Systems for Supercomputers - ROSS '11
The recent addition of task parallelism to the OpenMP shared memory API allows programmers to express concurrency at a high level of abstraction and places the burden of scheduling parallel execution on ...
For cores on the same chip, a shared LIFO queue allows exploitation of cache locality between sibling tasks as well as between a parent task and its newly created child tasks. ...
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy's National Nuclear Security Administration under contract DE-AC04 ...
doi:10.1145/1988796.1988804
fatcat:r7fcxjxulbe7pm66zacsdn2gam
The Cilk++ concurrency platform
2009
Proceedings of the 46th Annual Design Automation Conference - DAC '09
The availability of multicore processors across a wide range of computing platforms has created a strong demand for software frameworks that can harness these resources. ...
The Cilk++ runtime system guarantees to load-balance computations effectively. ...
Thanks to Patrick Madden of SUNY Binghamton for proposing extensive revisions to the original manuscript. ...
doi:10.1145/1629911.1630048
dblp:conf/dac/Leiserson09
fatcat:5oenlyp7gvfidgh2snrrik7vdi
Multicore compilation strategies and challenges
2009
IEEE Signal Processing Magazine
This article provides an overview of parallelism and compiler technology to help the community understand the software development challenges and opportunities for multicore signal processors. ...
The burden is placed on software developers and tools to find and exploit coarse-grain parallelism to effectively make use of the abundance of computing resources provided by these systems. ...
Many new languages have been proposed to ease the burden of writing parallel programs, including Atomos, Cilk, and StreamIt. ...
doi:10.1109/msp.2009.934117
fatcat:xacwdf6mljdvnafkb3m5kfjlfu