CuNesl: Compiling Nested Data-Parallel Languages for SIMT Architectures

Yongpeng Zhang, Frank Mueller
2012 2012 41st International Conference on Parallel Processing  
We propose cuNesl, a compiler framework to translate and optimize NESL into parallel CUDA programs for SIMT architectures.  ...  By converting recursive calls into while loops, we ensure that the hierarchical execution model in GPUs can be exploited on the "flattened" code.  ...  In this work, we design a source-to-source compiler to directly convert NESL to CUDA code that can be efficiently executed on contemporary NVIDIA GPUs. We focus on recursive NESL functions.  ... 
doi:10.1109/icpp.2012.21 dblp:conf/icpp/ZhangM12 fatcat:wy54kojepjfxflthzdjeimcbty
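
As a rough illustration of the transformation the abstract describes, turning recursive calls into a while loop so that flattened code fits the GPU's hierarchical execution model, the following plain-Python sketch replaces a recursive reduction's call stack with an explicit work list (the representation and names are assumptions for illustration, not cuNesl's actual output):

```python
# Minimal sketch: a divide-and-conquer reduction written first with recursion,
# then with the recursion replaced by a while loop over an explicit work list
# of (lo, hi) segments; this is the shape of transformation the abstract
# describes for flattened NESL code.

def tree_sum_recursive(xs, lo=0, hi=None):
    if hi is None:
        hi = len(xs)
    if hi - lo <= 1:
        return xs[lo] if hi > lo else 0
    mid = (lo + hi) // 2
    return tree_sum_recursive(xs, lo, mid) + tree_sum_recursive(xs, mid, hi)

def tree_sum_iterative(xs):
    total = 0
    work = [(0, len(xs))]      # explicit work list replaces the call stack
    while work:                # the while loop a kernel driver could iterate
        lo, hi = work.pop()
        if hi - lo <= 1:
            total += xs[lo] if hi > lo else 0
        else:
            mid = (lo + hi) // 2
            work.append((lo, mid))
            work.append((mid, hi))
    return total

data = list(range(10))
assert tree_sum_recursive(data) == tree_sum_iterative(data) == sum(data)
```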

Nested data-parallelism on the GPU

Lars Bergstrom, John Reppy
2012 Proceedings of the 17th ACM SIGPLAN international conference on Functional programming - ICFP '12  
NESL is a first-order functional language that was designed to allow programmers to write irregular-parallel programs, such as parallel divide-and-conquer algorithms, for wide-vector parallel computers  ...  While our performance does not match that of hand-tuned CUDA programs, we argue that the notational conciseness of NESL is worth the loss in performance.  ...  We thank the NVIDIA Corporation for their generous donation of both hardware and financial support.  ... 
doi:10.1145/2364527.2364563 dblp:conf/icfp/BergstromR12 fatcat:caeanju54jcrhlv432d56pwvs4
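
Nested parallelism over irregularly sized subarrays, the kind of program NESL targets, is conventionally flattened into a flat data vector plus a segment descriptor. The NumPy sketch below illustrates that representation with a segmented sum; it shows flattening in general, not Bergstrom and Reppy's compilation scheme:

```python
import numpy as np

# Nested value [[1, 2, 3], [4], [], [5, 6]] in flattened form:
# one flat data vector plus a vector of segment lengths.
data = np.array([1, 2, 3, 4, 5, 6])
seg_lengths = np.array([3, 1, 0, 2])

# Per-segment sums computed in one vectorised pass over the flat vector.
starts = np.concatenate(([0], np.cumsum(seg_lengths)[:-1]))
nonempty = seg_lengths > 0                             # reduceat mishandles
sums = np.zeros(len(seg_lengths), dtype=data.dtype)    # empty segments, so
sums[nonempty] = np.add.reduceat(data, starts[nonempty])  # mask them out
print(sums)   # per-subarray sums: 6, 4, 0, 11
```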

Towards a streaming model for nested data parallelism

Frederik M. Madsen, Andrzej Filinski
2013 Proceedings of the 2nd ACM SIGPLAN workshop on Functional high-performance computing - FHPC '13  
to be expressed in a space-efficient way, in the sense that memory usage on a single (or a few) processors is of the same order as for a sequential formulation of the algorithm, and in general scales  ...  The language-integrated cost semantics for nested data parallelism pioneered by NESL provides an intuitive, high-level model for predicting performance and scalability of parallel algorithms with reasonable  ...  This research has been partially supported by the Danish Strategic Research Council, Program Committee for Strategic Growth Technologies, for the research center HIPERFIT: Functional High Performance Computing  ... 
doi:10.1145/2502323.2502330 dblp:conf/icfp/MadsenF13 fatcat:rqkmgka6yvep3evubr7vsfjqou
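
The space-efficiency concern raised in the abstract, that a data-parallel pipeline should not materialise full-length intermediate vectors, can be illustrated by evaluating the pipeline chunk by chunk. The Python sketch below is an invented example of that idea, not the paper's streaming semantics or cost model:

```python
# Illustrative sketch of chunked (streamed) evaluation of a map/filter/reduce
# pipeline: only one fixed-size chunk is live at a time, so memory use is
# bounded by the chunk size rather than by the length of the input or of any
# intermediate vector.

def chunks(n, chunk_size):
    """Yield index ranges [lo, hi) covering 0..n in chunk_size pieces."""
    lo = 0
    while lo < n:
        hi = min(lo + chunk_size, n)
        yield lo, hi
        lo = hi

def streamed_pipeline(xs, chunk_size=4):
    total = 0
    for lo, hi in chunks(len(xs), chunk_size):
        block = xs[lo:hi]                          # one chunk in memory
        squared = [x * x for x in block]           # map
        kept = [y for y in squared if y % 2 == 0]  # filter
        total += sum(kept)                         # reduce into an accumulator
    return total

xs = list(range(100))
assert streamed_pipeline(xs) == sum(x * x for x in xs if (x * x) % 2 == 0)
```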

Copperhead

Bryan Catanzaro, Michael Garland, Kurt Keutzer
2011 Proceedings of the 16th ACM symposium on Principles and practice of parallel programming - PPoPP '11  
We define the restricted subset of Python which Copperhead supports and introduce the program analysis techniques necessary for compiling Copperhead code into efficient low-level implementations.  ...  We demonstrate the effectiveness of our techniques with several examples targeting the CUDA platform for parallel programming on GPUs.  ...  Firstly, existing C++ compilers provide excellent support for translating well structured C++ to efficient machine code.  ... 
doi:10.1145/1941553.1941562 dblp:conf/ppopp/CatanzaroGK11 fatcat:yn7iox4k35atnefmlh4ahu7wom
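
The kind of restricted, data-parallel Python that such a compiler can map onto GPU threads looks roughly like the sketch below; it is written in plain Python for illustration and is not the actual Copperhead API:

```python
# A side-effect-free, elementwise function in the restricted style such a
# compiler accepts: every iteration is independent, so the comprehension can
# be mapped directly onto GPU threads. (Illustration only; not Copperhead's
# decorator, runtime, or type inference.)

def saxpy(a, xs, ys):
    return [a * x + y for x, y in zip(xs, ys)]

xs = [float(i) for i in range(8)]
ys = [1.0] * 8
print(saxpy(2.0, xs, ys))   # [1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0]
```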

Scout: a data-parallel programming language for graphics processors

Patrick McCormick, Jeff Inman, James Ahrens, Jamaludin Mohd-Yusof, Greg Roth, Sharen Cummins
2007 Parallel Computing  
While the performance of the GPU is well suited for computational science, the programming interface and several hardware limitations have prevented its wide adoption.  ...  In this paper we present Scout, a data-parallel programming language for graphics processors that hides the nuances of both the underlying hardware and supporting graphics software layers.  ...  Special thanks to John Owens and Mike Houston for help with the CPU and GPU performance results presented in Section 1.  ... 
doi:10.1016/j.parco.2007.09.001 fatcat:ydbwxzc4qvhwpmgx5k3uqdtz2a

A Heterogeneous Parallel Framework for Domain-Specific Languages

Kevin J. Brown, Arvind K. Sujeeth, Hyouk Joong Lee, Tiark Rompf, Hassan Chafi, Martin Odersky, Kunle Olukotun
2011 2011 International Conference on Parallel Architectures and Compilation Techniques  
However, targeting these emerging devices often requires using multiple disparate programming models and making decisions that can limit forward scalability.  ...  Computing systems are becoming increasingly parallel and heterogeneous, and therefore new applications must be capable of exploiting parallelism in order to continue achieving high performance.  ...  ACKNOWLEDGMENT The authors thank the reviewers for their feedback, Michael Garland for shepherding this paper, and Peter B. Kessler for reviewing drafts.  ... 
doi:10.1109/pact.2011.15 dblp:conf/IEEEpact/BrownSLRCOO11 fatcat:gabjxlvpvbgili7y2fvxfulhzu

Locality-Aware Mapping of Nested Parallel Patterns on GPUs

Hyoukjoong Lee, Kevin J. Brown, Arvind K. Sujeeth, Tiark Rompf, Kunle Olukotun
2014 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture  
To address this issue, we present a general analysis framework for automatically and efficiently mapping nested patterns onto GPUs.  ...  However, the problem of efficiently mapping patterns to GPU hardware becomes significantly more difficult when the patterns are nested, which is common in nontrivial applications.  ...  ACKNOWLEDGEMENTS We thank the anonymous reviewers, Nithin George (EPFL), and Wonchan Lee (Stanford) for their comments and valuable suggestions.  ... 
doi:10.1109/micro.2014.23 dblp:conf/micro/LeeBSRO14 fatcat:scdxpm46ebhbdl2puj7jtal2pm
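
A caricature of the mapping decision the abstract describes, choosing which level of a nested pattern runs on thread blocks and which on threads, is sketched below with an invented cost model; the weights and heuristics are stand-ins, not the paper's locality analysis:

```python
from itertools import permutations

# Toy sketch of the mapping decision: assign the two levels of a nested map
# (outer size M, inner size N) to GPU hardware levels ("block" and "thread")
# and pick the assignment with the lowest cost. The cost model is a crude,
# invented stand-in: it penalises uncoalesced access (thread dimension not
# the contiguous one) and under-filled thread blocks.

def mapping_cost(levels, sizes, contiguous_dim, block_size=256):
    thread_dim = levels.index("thread")
    cost = 0.0
    if thread_dim != contiguous_dim:
        cost += 10.0                                   # uncoalesced accesses
    cost += max(0, block_size - sizes[thread_dim]) / block_size  # idle lanes
    return cost

sizes = (1000, 32)        # outer M = 1000, inner N = 32
contiguous_dim = 1        # the inner dimension is contiguous in memory
candidates = list(permutations(["block", "thread"]))
best = min(candidates, key=lambda lv: mapping_cost(lv, sizes, contiguous_dim))
print(best)               # ('block', 'thread'): outer on blocks, inner on threads
```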

Futhark: purely functional GPU-programming with nested parallelism and in-place array updates

Troels Henriksen, Niels G. W. Serup, Martin Elsman, Fritz Henglein, Cosmin E. Oancea
2017 Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation - PLDI 2017  
Futhark is a purely functional data-parallel array language that offers a machine-neutral programming model and an optimising compiler that generates OpenCL code for GPUs.  ...  First, in order to express efficient code inside the parallel constructs, we introduce a simple type system for in-place updates that ensures referential transparency and supports equational reasoning.  ...  code for the various benchmark programs.  ... 
doi:10.1145/3062341.3062354 dblp:conf/pldi/HenriksenSEHO17 fatcat:p4e5vynbhzhb5aoxsa6tecq33q
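
The observation behind Futhark's in-place updates is that an in-place write is observationally equivalent to copy-then-update whenever the source array is provably never used again; the type system's job is to guarantee that side condition. The NumPy sketch below illustrates only the equivalence, not Futhark's syntax or type rules:

```python
import numpy as np

# If the source array is never consulted after the update, writing in place
# is observationally the same as copying first. (Pure NumPy analogy; not
# Futhark syntax or its uniqueness-style type rules, which enforce the
# "never consulted" side condition.)

def update_copy(a, i, x):
    b = a.copy()      # functional semantics: build a fresh array
    b[i] = x
    return b

def update_in_place(a, i, x):
    a[i] = x          # what a compiler may emit once `a` is known to be consumed
    return a

r1 = update_copy(np.arange(5), 2, 99)
r2 = update_in_place(np.arange(5), 2, 99)   # the argument is not reused afterwards
assert (r1 == r2).all()
```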

Have abstraction and eat performance, too: optimized heterogeneous computing with parallel patterns

Kevin J. Brown, HyoukJoong Lee, Tiark Rompf, Arvind K. Sujeeth, Christopher De Sa, Christopher Aberger, Kunle Olukotun
2016 Proceedings of the 2016 International Symposium on Code Generation and Optimization - CGO 2016  
We present experimental results for a range of applications spanning multiple domains and demonstrate highly efficient execution compared to manually-optimized counterparts in multiple distributed programming  ...  To optimize distributed applications both for modern hardware and for modern programmers we need a programming model that is sufficiently expressive to support a variety of parallel applications, sufficiently  ...  Acknowledgments We are grateful to the anonymous reviewers for their comments and suggestions.  ... 
doi:10.1145/2854038.2854042 dblp:conf/cgo/BrownLRSSAO16 fatcat:cye5j5gi3vfgzh7xyku5cfttq4

Region-based memory management for GPU programming languages

Eric Holk, Ryan Newton, Jeremy Siek, Andrew Lumsdaine
2014 Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications - OOPSLA '14  
Indeed, most GPU programming languages limit the user to simple data structures, typically only multidimensional rectangular arrays of scalar values.  ...  Regions enable rich data structures by providing a uniform representation for pointers on both the CPU and GPU and by providing a means of transferring entire data structures between CPU and GPU memory  ...  New parallel constructs are likely necessary in order to most efficiently work with tree structures. Nested data parallel languages like NESL [2] could provide inspiration.  ... 
doi:10.1145/2660193.2660244 dblp:conf/oopsla/HolkNSL14 fatcat:ypxgqggcnzbwtoekkwig7z2qgi
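
The region idea can be sketched as a flat arena in which "pointers" are integer offsets, so a whole linked structure can be shipped between CPU and GPU memory in one transfer without pointer translation. The Python sketch below is an illustrative assumption of such a layout, not the paper's runtime:

```python
import numpy as np

# Sketch of the region idea: allocate every node of a tree inside one flat
# buffer and use integer offsets as "pointers", so the whole structure can be
# moved between memories in a single transfer while the offsets stay valid on
# both sides. (Node layout and names are invented for illustration.)

NIL = -1                                   # null offset inside the region

class Region:
    def __init__(self, capacity):
        # one row per node: [value, left_offset, right_offset]
        self.buf = np.full((capacity, 3), NIL, dtype=np.int64)
        self.top = 0

    def alloc_node(self, value, left=NIL, right=NIL):
        idx = self.top
        self.buf[idx] = (value, left, right)
        self.top += 1
        return idx                         # meaningful in any copy of the buffer

def tree_sum(region, idx):
    if idx == NIL:
        return 0
    value, left, right = region.buf[idx]
    return int(value) + tree_sum(region, left) + tree_sum(region, right)

r = Region(capacity=8)
leaf_a = r.alloc_node(1)
leaf_b = r.alloc_node(2)
root = r.alloc_node(10, left=leaf_a, right=leaf_b)
assert tree_sum(r, root) == 13
```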

Generating performance portable code using rewrite rules: from high-level functional expressions to high-performance OpenCL code

Michel Steuwer, Christian Fensch, Sam Lindley, Christophe Dubach
2015 Proceedings of the 20th ACM SIGPLAN International Conference on Functional Programming - ICFP 2015  
The performance of the generated OpenCL code is on a par with highly tuned implementations for multicore CPUs and GPUs written by experts.  ...  Starting from a high-level functional expression we apply a simple set of rewrite rules to transform it into a low-level functional representation close to the OpenCL programming model from which OpenCL  ...  Bergstrom and Reppy [3] compile NESL, which is a first-order dialect of ML supporting nested data-parallelism, to GPU code.  ... 
doi:10.1145/2784731.2784754 dblp:conf/icfp/SteuwerFLD15 fatcat:uwphrburd5a27pebgqrqgu3tmu
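
A minimal example of the rewrite-rule approach is the classic map-fusion rule applied to a small expression tree; the representation and rule below are invented for illustration and are not the paper's actual rewrite system:

```python
from dataclasses import dataclass
from typing import Any, Callable

# Toy illustration of rewriting a high-level functional expression: the rule
#   map(f, map(g, xs))  =>  map(f . g, xs)
# fuses two traversals into one.

@dataclass
class Map:
    f: Callable[[Any], Any]
    arg: Any                        # another Map, or a concrete Python list

def fuse_maps(expr):
    """Apply map fusion bottom-up wherever the pattern matches."""
    if isinstance(expr, Map):
        inner = fuse_maps(expr.arg)
        if isinstance(inner, Map):
            f, g = expr.f, inner.f
            return Map(lambda x: f(g(x)), inner.arg)
        return Map(expr.f, inner)
    return expr

def evaluate(expr):
    if isinstance(expr, Map):
        return [expr.f(x) for x in evaluate(expr.arg)]
    return expr

program = Map(lambda x: x + 1, Map(lambda x: x * 2, [1, 2, 3]))
fused = fuse_maps(program)
assert evaluate(program) == evaluate(fused) == [3, 5, 7]
assert not isinstance(fused.arg, Map)    # the two maps became a single pass
```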

HiDP: A hierarchical data parallel language

Yongpeng Zhang, F. Mueller
2013 Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)  
Internally, it performs necessary analysis to compose user code with efficient and architecture-aware code snippets.  ...  HiDP is a source-to-source compiler that converts a very concise data parallel language into CUDA C++ source code.  ...  All further analysis is performed hierarchically at each statement level. The HiDP front end does not expand data-parallel expressions into for loops throughout the code transformations.  ... 
doi:10.1109/cgo.2013.6494994 dblp:conf/cgo/ZhangM13 fatcat:fparo6cmjzdypdmotsdn7swop4
Showing results 1–15 of 51.