Barrier elision for production parallel programs
2015
SIGPLAN notices
In this paper, we present context-sensitive dynamic optimizations that elide barriers redundant during the program execution. ...
In our technique, we perform data race detection alongside the program to identify redundant barriers in their calling contexts; after an initial learning, we start eliding all future instances of barriers ...
Acknowledgments Support for this work was provided in part through the X-Stack program funded by the U.S. ...
doi:10.1145/2858788.2688502
fatcat:ijdxemwqcvfefihilzvif3hyn4
Barrier elision for production parallel programs
2015
Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming - PPoPP 2015
In this paper, we present context-sensitive dynamic optimizations that elide barriers redundant during the program execution. ...
In our technique, we perform data race detection alongside the program to identify redundant barriers in their calling contexts; after an initial learning, we start eliding all future instances of barriers ...
Acknowledgments Support for this work was provided in part through the X-Stack program funded by the U.S. ...
doi:10.1145/2688500.2688502
dblp:conf/ppopp/ChabbiLJSMI15
fatcat:u64m4ekyhzhdlimzujxenrj47q
Using Barrier Elision to Improve Transactional Code Generation
2022
ACM Transactions on Architecture and Code Optimization (TACO)
Furthermore, it shows that, by correctly using the annotations on just a few lines of code, it is possible to reduce the total number of instrumented barriers by 95% and to achieve speed-ups of up to 7x ...
compiler framework, which is decoupled from any TM runtime, and presents the following novel contributions: (a) it shows that STM's performance overhead, due to an excessive amount of read and write barriers ...
ACKNOWLEDGMENTS This work was supported by FAPESP, and the Center for Computational Engineering and Sciences (CCES). ...
doi:10.1145/3533318
fatcat:s4jfcnwz3zcfnjfkqppviuel7a
Performance evaluation of Intel® transactional synchronization extensions for high-performance computing
2013
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '13
When applied to a parallel user-level TCP/IP stack, Intel TSX provides 1.31x average bandwidth improvement on network intensive applications. ...
ACKNOWLEDGMENTS We thank the anonymous reviewers for their constructive feedback. ...
We also thank Pradeep Dubey, Ronak Singhal, Joseph Curley, Justin Gottschlich, and Tatiana Shpeisman for their feedback on the paper. ...
doi:10.1145/2503210.2503232
dblp:conf/sc/YooHLR13
fatcat:aae73ggaxvcs3jastqcmcw2eym
A Comparative Study of Asynchronous Many-Tasking Runtimes: Cilk, Charm++, ParalleX and AM++
[article]
2019
arXiv
pre-print
With the emergence of the next generation of supercomputers, it is imperative for parallel programming models to evolve and address the integral challenges introduced by the increasing scale. ...
The comparison study includes a survey of each runtime system's programming models, their corresponding execution models, their stated features, and performance and productivity goals. ...
Programming Model A parallel programming model provides the constructs for exposing and expressing latent parallelism in a program. ...
arXiv:1904.00518v1
fatcat:euvfhakryzcbdmhpbrrxrfu6he
BCL: A Cross-Platform Distributed Container Library
[article]
2019
arXiv
pre-print
One-sided communication is a useful paradigm for irregular parallel applications, but most one-sided programming environments, including MPI's one-sided interface and PGAS programming languages, lack application ...
The BCL Core has backends for MPI, OpenSHMEM, GASNet-EX, and UPC++, allowing BCL data structures to be used natively in programs written using any of these programming environments. ...
HPX is a task-based runtime system for parallel C++ programs [21]. ...
arXiv:1810.13029v2
fatcat:ip5tuv3e35hz3ecp4nzrbliydi
FlexBulk
2011
Proceeding of the 38th annual international symposium on Computer architecture - ISCA '11
Such architectures can boost both performance and software productivity, and enable unique compiler optimization opportunities. ...
parallel program development and debugging such as deterministic execution [9], parallel program replay [15], and atomicity violation debugging [12]. ...
For example, the two flavors of barrier are effective. High-contention critical sections are optimized with Head&Tail Commit & Stall and with Lock Elision. ...
doi:10.1145/2000064.2000070
dblp:conf/isca/AgarwalT11
fatcat:wdsiaghfprg55hzwbxf5lv26sq
FlexBulk
2011
SIGARCH Computer Architecture News
Such architectures can boost both performance and software productivity, and enable unique compiler optimization opportunities. ...
parallel program development and debugging such as deterministic execution [9], parallel program replay [15], and atomicity violation debugging [12]. ...
For example, the two flavors of barrier are effective. High-contention critical sections are optimized with Head&Tail Commit & Stall and with Lock Elision. ...
doi:10.1145/2024723.2000070
fatcat:vevqq6flpjdazeaos2ddpsyhg4
Programming with exceptions in JCilk
2006
Science of Computer Programming
Speculation is essential in order to parallelize programs such as branch-and-bound or heuristic search. ...
We show how JCilk's linguistic mechanisms can be used to program the "queens" puzzle and a parallel alpha-beta search. ...
If the JCilk keywords for parallel control are elided from a JCilk program, however, a syntactically correct Java program results, which we call the serial elision [11] of the JCilk program. ...
doi:10.1016/j.scico.2006.05.008
fatcat:5i2wrjdu7rextkfmq323gk5oxq
D atom loss in the photodissociation of the DNCN radical: Implications for prompt NO formation
2007
Journal of Chemical Physics
The results suggest a relatively facile pathway for the reaction CH + N2 → H + NCN that proceeds through the HNCN intermediate and support a recently proposed mechanism for prompt NO production in flames ...
translational energy distributions describing the D + NCN channel are peaked at low energy, consistent with internal conversion to the ground state followed by statistical decay and the absence of an exit barrier ...
Data obtained with both parallel and perpendicular laser polarization are shown. ...
doi:10.1063/1.2710271
pmid:17381210
fatcat:poahizgqcjfuljywe6unvcgq24
Making lock-free data structures verifiable with artificial transactions
2015
Proceedings of the 8th Workshop on Programming Languages and Operating Systems - PLOS '15
Among all classes of parallel programming abstractions, lock-free data structures are considered one of the most scalable and efficient because of their fine-grained style of synchronization. ...
However, they are also challenging for developers and tools to verify because of the huge number of possible interleavings that result from fine-grained synchronizations. ...
Speculative Lock Elision. A technique related to TXIT is speculative lock elision. ...
doi:10.1145/2818302.2818309
dblp:conf/sosp/YuanWYS15
fatcat:62beyourzva45jmvja35jzd5ue
Making Lock-free Data Structures Verifiable with Artificial Transactions
2016
ACM SIGOPS Operating Systems Review
Among all classes of parallel programming abstractions, lock-free data structures are considered one of the most scalable and efficient because of their fine-grained style of synchronization. ...
However, they are also challenging for developers and tools to verify because of the huge number of possible interleavings that result from fine-grained synchronizations. ...
Speculative Lock Elision. A technique related to TXIT is speculative lock elision. ...
doi:10.1145/2883591.2883603
fatcat:er4v433zqjcyfazm62zx52vlpa
Bottleneck identification and scheduling in multithreaded applications
2012
Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS '12
Performance of multithreaded applications is limited by a variety of bottlenecks, e.g. critical sections, barriers and slow pipeline stages. ...
BIS identifies which bottlenecks are likely to reduce performance by measuring the number of cycles threads have to wait for each bottleneck, and accelerates those bottlenecks using one or more fast cores ...
Acknowledgments We thank Eiman Ebrahimi, Veynu Narasiman, Santhosh Srinath, other members of the HPS research group, our shepherd Ras Bodik and the anonymous reviewers for their comments and suggestions ...
doi:10.1145/2150976.2151001
dblp:conf/asplos/JoaoSMP12
fatcat:iwf4i7vfy5gx7gddspyjjtun3u
Compiler and runtime techniques for software transactional memory optimization
2009
Concurrency and Computation
We present initial work on supporting automatic instrumentation of STM primitives for C/C++ and Java programs in the IBM XL compiler and J9 JVM. ...
We evaluate and discuss the performance of several transactional programs running on our system. ...
form of speculative lock elision [31]. ...
doi:10.1002/cpe.1336
fatcat:fnbq5vnwenejxez5kr2xiv375a
Bottleneck identification and scheduling in multithreaded applications
2012
SIGARCH Computer Architecture News
Performance of multithreaded applications is limited by a variety of bottlenecks, e.g. critical sections, barriers and slow pipeline stages. ...
BIS identifies which bottlenecks are likely to reduce performance by measuring the number of cycles threads have to wait for each bottleneck, and accelerates those bottlenecks using one or more fast cores ...
Acknowledgments We thank Eiman Ebrahimi, Veynu Narasiman, Santhosh Srinath, other members of the HPS research group, our shepherd Ras Bodik and the anonymous reviewers for their comments and suggestions ...
doi:10.1145/2189750.2151001
fatcat:gdo2bg5wpvbxho53a5cg6mctte