Hints to improve automatic load balancing with LeWI for hybrid applications
2014
Journal of Parallel and Distributed Computing
The DLB (Dynamic Load Balancing) library and LeWI (LEnd When Idle) algorithm provide a runtime solution to deal with the load imbalance of parallel applications independently of the source of imbalance ...
This work is a deep analysis of the sources of efficiency loss correlated with application characteristics, parallelization schemes and programming models. ...
LeWI (Lend When Idle) [1] is a load balancing algorithm that provides a runtime solution for load balancing problems of hybrid applications independently of the source of imbalance. ...
doi:10.1016/j.jpdc.2014.05.004
fatcat:noqletmngbfrliwjlscysxg5um
Compiler and runtime support for efficient software transactional memory
2006
SIGPLAN notices
We present a high-performance software transactional memory system (STM) integrated into a managed runtime environment. ...
This paper presents compiler and runtime optimizations for transactional memory language constructs. ...
Acknowledgments We'd like to thank Dan Grossman and the anonymous reviewers for their feedback on this paper. ...
doi:10.1145/1133255.1133985
fatcat:74y2op54xrfvjgvnvuxn4ozk24
Compiler and runtime support for efficient software transactional memory
2006
Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation - PLDI '06
We present a high-performance software transactional memory system (STM) integrated into a managed runtime environment. ...
This paper presents compiler and runtime optimizations for transactional memory language constructs. ...
Acknowledgments We'd like to thank Dan Grossman and the anonymous reviewers for their feedback on this paper. ...
doi:10.1145/1133981.1133985
dblp:conf/pldi/Adl-TabatabaiLMMSS06
fatcat:p3khq7enrneatok732mqktw3ti
Design and evaluation of a compiler for embedded stream programs
2008
SIGPLAN notices
We have used our language to build and deploy a sensor network for the acoustic localization of wild animals, in particular, the Yellow-Bellied marmot. ...
Applications that combine live data streams with embedded, parallel, and distributed processing are becoming more commonplace. ...
Stream Graph Optimizations There is a breadth of well-understood transformations to static and dynamic dataflow graphs that adjust the parallelism within a graph: balancing load, exposing additional parallelism ...
doi:10.1145/1379023.1375675
fatcat:buh4kwpnkvccblfgrjy23z2jjm
GPU-accelerated simulations of isolated black holes
2018
Classical and quantum gravity
Since this code must be maintained in parallel with SpEC itself, a primary design consideration is to perform as few explicit code changes as possible. ...
We therefore rely on a hierarchy of automated porting strategies. ...
Acknowledgments We thank Nils Deppe and Mark Scheel for helpful discussions. Calculations were performed with the SpEC-code [32]. ...
doi:10.1088/1361-6382/aab256
fatcat:a47tr4i7hjf2tfszdwxpyippqy
Design and evaluation of a compiler for embedded stream programs
2008
Proceedings of the 2008 ACM SIGPLAN-SIGBED conference on Languages, compilers, and tools for embedded systems - LCTES '08
We have used our language to build and deploy a sensor network for the acoustic localization of wild animals, in particular, the Yellow-Bellied marmot. ...
Applications that combine live data streams with embedded, parallel, and distributed processing are becoming more commonplace. ...
Stream Graph Optimizations There is a breadth of well-understood transformations to static and dynamic dataflow graphs that adjust the parallelism within a graph: balancing load, exposing additional parallelism ...
doi:10.1145/1375657.1375675
dblp:conf/lctrts/NewtonGCMM08
fatcat:vzuxt43dzbe37cgouwqimiyppe
ZPL: a machine independent programming language for parallel computers
2000
IEEE Transactions on Software Engineering
The goal of producing architecture-independent parallel programs is complicated by the competing need for high performance. ...
This paper describes ZPL and provides a comprehensive evaluation of the language with respect to its goals of performance, portability, and programming convenience. ...
We thank the anonymous referees for their helpful comments. ...
doi:10.1109/32.842947
fatcat:phdanl2m6rgbzijh2nb2kq24sm
Robust sampling for weak lensing and clustering analyses with the Dark Energy Survey
[article]
2022
arXiv
pre-print
We determine that provides a good balance of speed and robustness, and recommend different settings for testing purposes and final chains for analyses with DES Y3 data. ...
We find that the ellipsoidal nested sampling algorithm reports inconsistent estimates of the Bayesian evidence and somewhat narrower parameter credible intervals than the sliced nested sampling implemented ...
Frieman), which is managed by the Association of Universities for Research in Astronomy (AURA) under a cooperative agreement with the National Science Foundation. ...
arXiv:2202.08233v1
fatcat:fbg3ixw27vbihmx3dlydfiuro4
A compiler optimization algorithm for shared-memory multiprocessors
1998
IEEE Transactions on Parallel and Distributed Systems
This paper presents a new compiler optimization algorithm that parallelizes applications for symmetric, shared-memory multiprocessors. ...
The algorithm considers data locality, parallelism, and the granularity of parallelism. It uses dependence analysis and a simple cache model to drive its optimizations. ...
Conclusions This paper presents a new parallelization algorithm that balances parallelism and data locality. ...
doi:10.1109/71.706049
fatcat:3m5odkybzvgm3putgvlki3aznu
Cobaya: Code for Bayesian Analysis of hierarchical physical models
[article]
2021
arXiv
pre-print
novel algorithm. ...
It can exploit hybrid OpenMP/MPI parallelization, and has sub-millisecond overhead per posterior evaluation. ...
Applicability to PolyChord PolyChord [3, 4] is a nested sampling [38] algorithm that utilizes slice sampling [39] for sampling within isolikelihood contours. ...
arXiv:2005.05290v2
fatcat:v6bscunjb5gvrhvdkl4zsi7v4y
Where is software headed? A virtual roundtable
1995
Computer
Languages such as HPF and runtime libraries such as the University of Maryland's CHAOS library for irregular applications implement a shared address space using compilers or runtime code. ...
-Ted Lewis, Naval Postgraduate School GETTING SERIOUS. If parallel processing is to grow, it has to adapt to popular applications. ...
doi:10.1109/2.402054
fatcat:tfbfxmsew5ajppnpwof52gjyaa
Evaluating automatic parallelization for efficient execution on shared-memory multiprocessors
1994
Proceedings of the 8th international conference on Supercomputing - ICS '94
We present a parallel code generation algorithm for complete applications and a new experimental methodology that tests the efficacy of our approach. ...
., the compiler was required to use its analysis and algorithms to parallelize the program and could not rely on user assertions that, for example, a loop is parallel. ...
Acknowledgements I especially want to thank Ken Kennedy, who provided impetus and guidance for much of this research. ...
doi:10.1145/181181.181265
dblp:conf/ics/McKinley94
fatcat:4eulalgo3rc5naym2lhb2wjl6i
Multi-dimensional intra-tile parallelization for memory-starved stencil computations
[article]
2015
arXiv
pre-print
We propose a flexible multi-dimensional intra-tile parallelization method for stencil algorithms on multicore CPUs with a shared outer-level cache. ...
Optimizing the performance of stencil algorithms has been the subject of intense research over the last two decades. ...
ACKNOWLEDGMENTS For computer time, this research used the resources of the Extreme Computing Research Center (ECRC) at KAUST. The authors thank the ECRC for supporting T. Malas. ...
arXiv:1510.04995v1
fatcat:twbfi3zicbe7bdu3hgn7d37h7q
Reconstructing Hardware Transactional Memory for Workload Optimized Systems
[chapter]
2011
Lecture Notes in Computer Science
This biennial event provides a forum for representing this community's research efforts and exchanging viewpoints. ...
As an event that has taken place for 16 years, APPT aims at providing a high-quality program for all attendees. We accepted 13 papers out of 40 submissions, presenting an acceptance rate of 32.5%. ...
The block parallel scheme with the balance allocation algorithm achieves a speedup by a factor of 3.94x. ...
doi:10.1007/978-3-642-24151-2_1
fatcat:32cx745cn5cfdm5sbeah6eyiey
The OpenMOC method of characteristics neutral particle transport code
2014
Annals of Nuclear Energy
The OpenMOC code is being developed at the Massachusetts Institute of Technology to investigate algorithmic acceleration techniques and parallel algorithms for MOC. ...
The method of characteristics (MOC) is a numerical integration technique for partial differential equations, and has seen widespread use for reactor physics lattice calculations. ...
Acknowledgments The software design principles employed for OpenMOC are in large part inspired by the legacy left behind by Paul Romano on the MIT Computational Reactor Physics Group. The ...
doi:10.1016/j.anucene.2013.12.012
fatcat:fr6tuyl2hbf7nfv4bd37w72gpm