Barrier elision for production parallel programs

Milind Chabbi, Wim Lavrijsen, Wibe de Jong, Koushik Sen, John Mellor-Crummey, Costin Iancu
2015 SIGPLAN Notices
In this paper, we present context-sensitive dynamic optimizations that elide barriers redundant during the program execution.  ...  In our technique, we perform data race detection alongside the program to identify redundant barriers in their calling contexts; after an initial learning, we start eliding all future instances of barriers  ...  Acknowledgments Support for this work was provided in part through the X-Stack program funded by the U.S.  ... 
doi:10.1145/2858788.2688502 fatcat:ijdxemwqcvfefihilzvif3hyn4

Barrier elision for production parallel programs

Milind Chabbi, Wim Lavrijsen, Wibe de Jong, Koushik Sen, John Mellor-Crummey, Costin Iancu
2015 Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming - PPoPP 2015  
In this paper, we present context-sensitive dynamic optimizations that elide barriers redundant during the program execution.  ...  In our technique, we perform data race detection alongside the program to identify redundant barriers in their calling contexts; after an initial learning, we start eliding all future instances of barriers  ...  Acknowledgments Support for this work was provided in part through the X-Stack program funded by the U.S.  ... 
doi:10.1145/2688500.2688502 dblp:conf/ppopp/ChabbiLJSMI15 fatcat:u64m4ekyhzhdlimzujxenrj47q

Using Barrier Elision to Improve Transactional Code Generation

Bruno Chinelato Honorio, João Paulo Labegalini de Carvalho, Catalina Munoz Morales, Alexandro Baldassin, Guido Araujo
2022 ACM Transactions on Architecture and Code Optimization (TACO)  
Furthermore, it shows that, by correctly using the annotations on just a few lines of code, it is possible to reduce the total number of instrumented barriers by 95% and to achieve speed-ups of up to 7x  ...  compiler framework, which is decoupled from any TM runtime, and presents the following novel contributions: (a) it shows that STM's performance overhead, due to an excessive amount of read and write barriers  ...  ACKNOWLEDGMENTS This work was supported by FAPESP, and the Center for Computational Engineering and Sciences (CCES).  ... 
doi:10.1145/3533318 fatcat:s4jfcnwz3zcfnjfkqppviuel7a

Performance evaluation of Intel® transactional synchronization extensions for high-performance computing

Richard M. Yoo, Christopher J. Hughes, Konrad Lai, Ravi Rajwar
2013 Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '13  
When applied to a parallel user-level TCP/IP stack, Intel TSX provides 1.31x average bandwidth improvement on network intensive applications.  ...  ACKNOWLEDGMENTS We thank the anonymous reviewers for their constructive feedback.  ...  We also thank Pradeep Dubey, Ronak Singhal, Joseph Curley, Justin Gottschlich, and Tatiana Shpeisman for their feedback on the paper.  ... 
doi:10.1145/2503210.2503232 dblp:conf/sc/YooHLR13 fatcat:aae73ggaxvcs3jastqcmcw2eym

A Comparative Study of Asynchronous Many-Tasking Runtimes: Cilk, Charm++, ParalleX and AM++ [article]

Abhishek Kulkarni, Andrew Lumsdaine
2019 arXiv   pre-print
With the emergence of the next generation of supercomputers, it is imperative for parallel programming models to evolve and address the integral challenges introduced by the increasing scale.  ...  The comparison study includes a survey of each runtime system's programming models, their corresponding execution models, their stated features, and performance and productivity goals.  ...  Programming Model A parallel programming model provides the constructs for exposing and expressing latent parallelism in a program.  ... 
arXiv:1904.00518v1 fatcat:euvfhakryzcbdmhpbrrxrfu6he

BCL: A Cross-Platform Distributed Container Library [article]

Benjamin Brock, Aydın Buluç, Katherine Yelick
2019 arXiv   pre-print
One-sided communication is a useful paradigm for irregular parallel applications, but most one-sided programming environments, including MPI's one-sided interface and PGAS programming languages, lack application  ...  The BCL Core has backends for MPI, OpenSHMEM, GASNet-EX, and UPC++, allowing BCL data structures to be used natively in programs written using any of these programming environments.  ...  HPX is a task-based runtime system for parallel C++ programs [21].  ...
arXiv:1810.13029v2 fatcat:ip5tuv3e35hz3ecp4nzrbliydi

FlexBulk

Rishi Agarwal, Josep Torrellas
2011 Proceeding of the 38th annual international symposium on Computer architecture - ISCA '11  
Such architectures can boost both performance and software productivity, and enable unique compiler optimization opportunities.  ...  parallel program development and debugging such as deterministic execution [9], parallel program replay [15], and atomicity violation debugging [12].  ...  For example, the two flavors of barrier are effective. High-contention critical sections are optimized with Head&Tail Commit & Stall and with Lock Elision.  ...
doi:10.1145/2000064.2000070 dblp:conf/isca/AgarwalT11 fatcat:wdsiaghfprg55hzwbxf5lv26sq

FlexBulk

Rishi Agarwal, Josep Torrellas
2011 SIGARCH Computer Architecture News  
Such architectures can boost both performance and software productivity, and enable unique compiler optimization opportunities.  ...  parallel program development and debugging such as deterministic execution [9], parallel program replay [15], and atomicity violation debugging [12].  ...  For example, the two flavors of barrier are effective. High-contention critical sections are optimized with Head&Tail Commit & Stall and with Lock Elision.  ...
doi:10.1145/2024723.2000070 fatcat:vevqq6flpjdazeaos2ddpsyhg4

Programming with exceptions in JCilk

John S. Danaher, I.-Ting Angelina Lee, Charles E. Leiserson
2006 Science of Computer Programming  
Speculation is essential in order to parallelize programs such as branch-and-bound or heuristic search.  ...  We show how JCilk's linguistic mechanisms can be used to program the "queens" puzzle and a parallel alpha-beta search.  ...  If the JCilk keywords for parallel control are elided from a JCilk program, however, a syntactically correct Java program results, which we call the serial elision [11] of the JCilk program.  ... 
doi:10.1016/j.scico.2006.05.008 fatcat:5i2wrjdu7rextkfmq323gk5oxq

D atom loss in the photodissociation of the DNCN radical: Implications for prompt NO formation

David E. Szpunar, Ann Elise Faulhaber, Kathryn E. Kautzman, Paul E. Crider, Daniel M. Neumark
2007 Journal of Chemical Physics  
The results suggest a relatively facile pathway for the reaction CH + N2 → H + NCN that proceeds through the HNCN intermediate and support a recently proposed mechanism for prompt NO production in flames  ...  translational energy distributions describing the D + NCN channel are peaked at low energy, consistent with internal conversion to the ground state followed by statistical decay and the absence of an exit barrier  ...  Data obtained with both parallel and perpendicular laser polarization are shown.  ...
doi:10.1063/1.2710271 pmid:17381210 fatcat:poahizgqcjfuljywe6unvcgq24

Making lock-free data structures verifiable with artificial transactions

Xinhao Yuan, David Williams-King, Junfeng Yang, Simha Sethumadhavan
2015 Proceedings of the 8th Workshop on Programming Languages and Operating Systems - PLOS '15  
Among all classes of parallel programming abstractions, lock-free data structures are considered one of the most scalable and efficient because of their fine-grained style of synchronization.  ...  However, they are also challenging for developers and tools to verify because of the huge number of possible interleavings that result from fine-grained synchronizations.  ...  Speculative Lock Elision. A technique related to TXIT is speculative lock elision.  ... 
doi:10.1145/2818302.2818309 dblp:conf/sosp/YuanWYS15 fatcat:62beyourzva45jmvja35jzd5ue

Making Lock-free Data Structures Verifiable with Artificial Transactions

Xinhao Yuan, David Williams-King, Junfeng Yang, Simha Sethumadhavan
2016 ACM SIGOPS Operating Systems Review  
Among all classes of parallel programming abstractions, lock-free data structures are considered one of the most scalable and efficient because of their fine-grained style of synchronization.  ...  However, they are also challenging for developers and tools to verify because of the huge number of possible interleavings that result from fine-grained synchronizations.  ...  Speculative Lock Elision. A technique related to TXIT is speculative lock elision.  ... 
doi:10.1145/2883591.2883603 fatcat:er4v433zqjcyfazm62zx52vlpa

Bottleneck identification and scheduling in multithreaded applications

José A. Joao, M. Aater Suleman, Onur Mutlu, Yale N. Patt
2012 Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS '12  
Performance of multithreaded applications is limited by a variety of bottlenecks, e.g. critical sections, barriers and slow pipeline stages.  ...  BIS identifies which bottlenecks are likely to reduce performance by measuring the number of cycles threads have to wait for each bottleneck, and accelerates those bottlenecks using one or more fast cores  ...  Acknowledgments We thank Eiman Ebrahimi, Veynu Narasiman, Santhosh Srinath, other members of the HPS research group, our shepherd Ras Bodik and the anonymous reviewers for their comments and suggestions  ... 
doi:10.1145/2150976.2151001 dblp:conf/asplos/JoaoSMP12 fatcat:iwf4i7vfy5gx7gddspyjjtun3u

Compiler and runtime techniques for software transactional memory optimization

Peng Wu, Maged M. Michael, Christoph von Praun, Takuya Nakaike, Rajesh Bordawekar, Harold W. Cain, Calin Cascaval, Siddhartha Chatterjee, Stefanie Chiras, Rui Hou, Mark Mergen, Xiaowei Shen (+3 others)
2009 Concurrency and Computation  
We present initial work on supporting automatic instrumentation of STM primitives for C/C++ and Java programs in the IBM XL compiler and J9 JVM.  ...  We evaluate and discuss the performance of several transactional programs running on our system.  ...  form of speculative lock elision [31] .  ... 
doi:10.1002/cpe.1336 fatcat:fnbq5vnwenejxez5kr2xiv375a

Bottleneck identification and scheduling in multithreaded applications

José A. Joao, M. Aater Suleman, Onur Mutlu, Yale N. Patt
2012 SIGARCH Computer Architecture News  
Performance of multithreaded applications is limited by a variety of bottlenecks, e.g. critical sections, barriers and slow pipeline stages.  ...  BIS identifies which bottlenecks are likely to reduce performance by measuring the number of cycles threads have to wait for each bottleneck, and accelerates those bottlenecks using one or more fast cores  ...  Acknowledgments We thank Eiman Ebrahimi, Veynu Narasiman, Santhosh Srinath, other members of the HPS research group, our shepherd Ras Bodik and the anonymous reviewers for their comments and suggestions  ... 
doi:10.1145/2189750.2151001 fatcat:gdo2bg5wpvbxho53a5cg6mctte