A hierarchical approach to reducing communication in parallel graph algorithms

Harshvardhan, Nancy M. Amato, Lawrence Rauchwerger
2015 Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming - PPoPP 2015  
Furthermore, many graph algorithms and computations send the same data to each of the neighbors of a vertex.  ...  The hierarchical model takes advantage of locale information of the neighboring vertices to reduce communication, both in message volume and total number of bytes sent.  ...  PPoPP'15, February 7-11, 2015, San Francisco, CA, USA.  ... 
doi:10.1145/2688500.2700994 dblp:conf/ppopp/HarshvardhanAR15 fatcat:xfa5a2hp2rexflc2yqkhtziaum

Combining phase identification and statistic modeling for automated parallel benchmark generation

Ye Jin, Mingliang Liu, Xiaosong Ma, Qing Liu, Jeremy Logan, Norbert Podhorszki, Jong Youl Choi, Scott Klasky
2015 Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming - PPoPP 2015  
However, it is very challenging and costly to obtain high-fidelity benchmarks reflecting the scale and complexity of state-of-the-art parallel applications.  ...  Experiments with four NAS Parallel Benchmarks (NPB) and three real scientific simulation codes confirm the fidelity of APPRIME benchmarks.  ... 
doi:10.1145/2688500.2688541 dblp:conf/ppopp/JinLMLLPCK15 fatcat:l4pull2eorebvf6wzlgxdpsgki

An OpenACC-based unified programming model for multi-accelerator systems

Jungwon Kim, Seyong Lee, Jeffrey S. Vetter
2015 Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming - PPoPP 2015  
Our model integrates the different granularities of parallelism from vector-level parallelism to node-level parallelism into a single, unified model based on OpenACC.  ...  This paper proposes a novel SPMD programming model of OpenACC.  ... 
doi:10.1145/2688500.2688531 dblp:conf/ppopp/KimLV15 fatcat:kuhsqh4kkrhohkej7qezsluxwu

A collection-oriented programming model for performance portability

Saurav Muralidharan, Michael Garland, Bryan Catanzaro, Albert Sidelnik, Mary Hall
2015 Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming - PPoPP 2015  
Surge exposes a code generation interface, decoupled from the core computation, that enables programmers and autotuners to easily generate multiple implementations of the same computation on various parallel  ...  This paper describes Surge, a collection-oriented programming model that enables programmers to compose parallel computations using nested high-level data collections and operators.  ... 
doi:10.1145/2688500.2688537 dblp:conf/ppopp/MuralidharanGCSH15 fatcat:itm5vile45f7zaxmdmckqk6rtq

SemCache++: semantics-aware caching for efficient multi-GPU offloading

Nabeel Al-Saber, Milind Kulkarni
2015 Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming - PPoPP 2015  
SemCache++ is used to build the first multi-GPU drop-in replacement library that (a) uses the virtual memory to automatically manage and optimize multi-GPU communication and (b) requires no program rewriting  ...  Such encapsulation prevents the reuse of the data between successive kernel invocations resulting in redundant communication.  ... 
doi:10.1145/2688500.2688527 dblp:conf/ppopp/AlSaberK15 fatcat:yyb4jnuavnbuviotyl64llr7q4

The lock-free k-LSM relaxed priority queue

Martin Wimmer, Jakob Gruber, Jesper Larsson Träff, Philippas Tsigas
2015 Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming - PPoPP 2015  
We present a new, concurrent, lock-free priority queue that relaxes the delete-min operation to allow deletion of any of the ρ + 1 smallest keys instead of only a minimal one, where ρ is a parameter that  ...  For keys added and removed by the same thread the behavior is identical to a non-relaxed priority queue.  ... 
doi:10.1145/2688500.2688547 dblp:conf/ppopp/0003GTT15 fatcat:t5g2mgajqfflpdmkqhe5afppxq

Decoupled load balancing

Olga Pearce, Todd Gamblin, Bronis R. de Supinski, Martin Schulz, Nancy M. Amato
2015 Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming - PPoPP 2015  
A balanced assignment of the computational load is critical for parallel performance.  ...  We propose to decouple the load balance algorithm from the application, and to offload the load balance computation so that it runs concurrently with the application on a smaller number of processors.  ...  We have presented an approach to load balancing based on decoupling the load balance algorithm from the application and offloading the load balance computation to overlap it with application execution.  ... 
doi:10.1145/2688500.2688539 dblp:conf/ppopp/PearceGSSA15 fatcat:4qvqndcxjzbc5mf24svxcbkkwi

JAWS: a JavaScript framework for adaptive CPU-GPU work sharing

Xianglan Piao, Channoh Kim, Younghwan Oh, Huiying Li, Jincheon Kim, Hanjun Kim, Jae W. Lee
2015 Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming - PPoPP 2015  
Unlike conventional heterogeneous parallel programming environments for JavaScript, which use only one compute device when executing a single kernel, JAWS accelerates kernel execution by exploiting both  ...  The JAWS runtime provides shared arrays for multiple parallel contexts, hence eliminating extra copy overhead for input and output data.  ... 
doi:10.1145/2688500.2688525 dblp:conf/ppopp/PiaoKOLKKL15 fatcat:f3shorhnbngtrdjkohaxz5jnwi

Static/Dynamic validation of MPI collective communications in multi-threaded context

Emmanuelle Saillard, Patrick Carribault, Denis Barthou
2015 Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming - PPoPP 2015  
Scientific applications mainly rely on the MPI parallel programming model to reach high performance on supercomputers.  ...  Thus, the correctness of hybrid programs requires a special care regarding MPI calls location.  ... 
doi:10.1145/2688500.2688548 dblp:conf/ppopp/SaillardCB15 fatcat:oc75l4pkajehpk62v6gc34u5tu

Fence placement for legacy data-race-free programs via synchronization read detection

Andrew J. McPherson, Vijay Nagarajan, Susmit Sarkar, Marcelo Cintra
2015 Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming - PPoPP 2015  
Fence placement is required to ensure legacy parallel programs operate correctly on relaxed architectures. The challenge is to place as few fences as possible without compromising correctness.  ...  By identifying necessary conditions for a read to be an acquire we improve upon the state of the art for legacy DRF programs by up to 2.64x.  ... 
doi:10.1145/2688500.2688524 dblp:conf/ppopp/McPhersonNSC15 fatcat:biulta5b7fhunegkm3az52kaam

Optimization of asynchronous graph processing on GPU with hybrid coloring model

Xuanhua Shi, Junling Liang, Sheng Di, Bingsheng He, Hai Jin, Lu Lu, Zhixiang Wang, Xuan Luo, Jianlong Zhong
2015 Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming - PPoPP 2015  
Accordingly, our solution will separate the processing of the vertices based on the distribution of colors.  ...  We find that majority of vertices (about 80%) are colored with only a few colors, such that they can be read and updated in a very high degree of parallelism without violating the sequential consistency  ... 
doi:10.1145/2688500.2688542 dblp:conf/ppopp/ShiLDHJLWLZ15 fatcat:33u3utczobgp3fbcvolj6seqja

CASTLE: fast concurrent internal binary search tree using edge-based locking

Arunmoezhi Ramachandran, Neeraj Mittal
2015 Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming - PPoPP 2015  
Our algorithm is based on an internal representation of a search tree and it operates at edge-level (locks edges) rather than at node-level (locks nodes); this minimizes the contention window of a write  ...  Some of the desirable characteristics of our algorithm are: (i) a search operation uses only read and write instructions, (ii) an insert operation does not acquire any locks, and (iii) a delete operation  ... 
doi:10.1145/2688500.2688551 dblp:conf/ppopp/RamachandranM15 fatcat:s6je4xtluzgv3jp2dwecdzfbli

A programming model and runtime system for significance-aware energy-efficient computing

Vassilis Vassiliadis, Konstantinos Parasyris, Charalambos Chalios, Christos D. Antonopoulos, Spyros Lalis, Nikolaos Bellas, Hans Vandierendonck, Dimitrios S. Nikolopoulos
2015 Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming - PPoPP 2015  
We introduce a task-based programming model and runtime system that exploit the observation that not all parts of a program are equally significant for the accuracy of the end-result, in order to trade  ...  off the quality of program outputs for increased energy efficiency.  ...  This work has been partially supported by the "Aristeia II" action (grant agreement 5211, project "Centaurus") of the operational program Education and Lifelong Learning and is co-funded by the European  ... 
doi:10.1145/2688500.2688546 dblp:conf/ppopp/VassiliadisPCALBVN15 fatcat:3iryidbhjvbqhacnzb3reyy3qq

Towards batched linear solvers on accelerated hardware platforms

Azzam Haidar, Tingxing Dong, Piotr Luszczek, Stanimire Tomov, Jack Dongarra
2015 Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming - PPoPP 2015  
In this paper, we describe the development of the main one-sided factorizations: LU, QR, and Cholesky; that are needed for a set of small dense matrices to work in parallel.  ...  We illustrate how our performance analysis together with the profiling and tracing tools guided the development of batched factorizations to achieve up to 2-fold speedup and 3-fold better energy efficiency  ... 
doi:10.1145/2688500.2688534 dblp:conf/ppopp/HaidarDLTD15 fatcat:ie4m7pqjzjg7naiy5an3mtjl3u

PLUTO+: near-complete modeling of affine transformations for parallelism and locality

Aravind Acharya, Uday Bondhugula
2015 Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming - PPoPP 2015  
We perform an experimental evaluation of both, the effect on compilation time, and performance of generated codes.  ...  The ensuing practical tradeoffs lead to the exclusion of certain useful transformations, in particular, transformation compositions involving loop reversals and loop skewing by negative factors.  ...  We thank the Parallel Computing group at Intel Labs Bangalore for donating us Intel compiler software, loaning equipment that was used in part for the experimental evaluation, and for discussions related  ... 
doi:10.1145/2688500.2688512 dblp:conf/ppopp/AcharyaB15 fatcat:co5tgybozjgu3fvweoszp4czdy