Forma: a DSL for image processing applications to target GPUs and multi-core CPUs

Mahesh Ravishankar, Justin Holewinski, Vinod Grover
2015 Proceedings of the 8th Workshop on General Purpose Processing using GPUs - GPGPU 2015  
Domain-Specific Languages (DSL) tackle both these issues by allowing developers to specify the computation at a high level, allowing the compiler to handle many tedious and error-prone tasks, while generating  ...  The high-level description allows the compiler to generate efficient code through use of compile-time analysis and by taking advantage of hardware resources, like texture memory on GPUs.  ...  Sadayappan from Ohio State University for his comments. Finally, we thank the reviewers of this paper for their helpful comments regarding related work and possible enhancements.  ... 
doi:10.1145/2716282.2716290 dblp:conf/ppopp/RavishankarHG15 fatcat:cwggdu43ufczdjtiz4bluw7bba

Domain-Specific Multi-Level IR Rewriting for GPU [article]

Tobias Gysi, Christoph Müller, Oleksandr Zinenko, Stephan Herhut, Eddie Davis, Tobias Wicky, Oliver Fuhrer, Torsten Hoefler, Tobias Grosser
2020 arXiv   pre-print
In particular, we develop a prototype compiler and design stencil- and GPU-specific dialects based on a set of newly introduced design principles.  ...  We demonstrate the effectiveness of this approach for the weather and climate domain.  ...  ACKNOWLEDGEMENTS We thank Jean-Michel Gorius for his foundational stencil compiler work and the continuous support of our project.  ... 
arXiv:2005.13014v2 fatcat:3kjj5bdukbemte6yf4zgeq7spq

Physis

Naoya Maruyama, Tatsuo Nomura, Kento Sato, Satoshi Matsuoka
2011 Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '11  
Our framework translates the user-written code into actual implementation code in CUDA for GPU acceleration and MPI for node-level parallelization with automatic optimizations such as computation and communication  ...  Experimental results on the TSUBAME2.0 GPU-based supercomputer show that the performance is comparable as hand-written code and good strong and weak scalability up to 256 GPUs.  ...  Acknowledgments We deeply thank Mark Silberstein for valuable discussions on the initial design of the domain-specific approach.  ... 
doi:10.1145/2063384.2063398 dblp:conf/sc/MaruyamaNSM11 fatcat:xzeyaedfojci5hatdiwvbbpb4q

Snowflake: A Lightweight Portable Stencil DSL

Nathan Zhang, Michael Driscoll, Charles Markley, Samuel Williams, Protonu Basu, Armando Fox
2017 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)  
In this paper we present Snowflake, a Domain Specific Language (DSL) for stencils that uses a "micro-compiler" approach, i.e., small, focused, domain-specific code generators.  ...  Stencil computations are not well optimized by general-purpose production compilers and the increased use of multicore, manycore, and accelerator-based systems makes the optimization problem even more  ...  Similarly, a number of general-purpose compilers and domain specific language projects perform optimizations on stencils from scientific code, but without the full generality needed by real applications  ... 
doi:10.1109/ipdpsw.2017.89 dblp:conf/ipps/ZhangDMWBF17 fatcat:ufctve6gqfhk3bkibdr5ik3m6u

Enabling efficient stencil code generation in OpenACC

Alyson D. Pereira, Rodrigo C.O. Rocha, Márcio Castro, Luís F.W. Góes, Mario A.R. Dantas
2017 Procedia Computer Science  
Therefore, this general-purpose approach delivers good performance on average, but it misses optimization opportunities for code generation and execution of specific classes of applications.  ...  Our results show that our stencil extensions may improve the performance of OpenACC in up to 28% and 45% on GPU and CPU, respectively.  ...  Code 2 exemplifies the use of the PSkel framework to perform on the GPU the same stencil computation (Code 1).  ... 
doi:10.1016/j.procs.2017.05.155 fatcat:lrp2ydzi7ne37oh2nmlqaigr64

Panda: A Compiler Framework for Concurrent CPU+GPU Execution of 3D Stencil Computations on GPU-accelerated Supercomputers

Mohammed Sourouri, Scott B. Baden, Xing Cai
2016 International journal of parallel programming  
Test results on the Titan supercomputer and the Wilkes cluster show that auto-translated codes from our compiler can achieve about 90% of the performance of highly optimized handwritten codes, for both  ...  Previous domain-specific solutions [42, 5] have outcompeted the generic approach. We have therefore decided to restrict Panda's applicability to 3D stencil computations on structured grids. While  ...  Furthermore, information about the stencil is essential for performing domain-specific code optimizations.  ... 
doi:10.1007/s10766-016-0454-1 fatcat:7qh6satsl5at7h5guohchgjqxa

High performance stencil code generation with Lift

Bastian Hagedorn, Larisa Stoltzfus, Michel Steuwer, Sergei Gorlatch, Christophe Dubach
2018 Proceedings of the 2018 International Symposium on Code Generation and Optimization - CGO 2018  
Domain Specific Languages (DSLs) have raised the programming abstraction and offer good performance.  ...  Although stencil computations have been extensively studied, optimizing them for increasingly diverse hardware remains challenging.  ...  Acknowledgments We would like to thank the Lift team; Prashant Singh Rawat for help with the PPCG comparison; Ari Rasch and students of the University of Münster for help with the ATF framework and its  ... 
doi:10.1145/3179541.3168824 fatcat:4yzhoig3xnfs3bx2ekz3c7wida

High performance stencil code generation with Lift

Bastian Hagedorn, Larisa Stoltzfus, Michel Steuwer, Sergei Gorlatch, Christophe Dubach
2018 Proceedings of the 2018 International Symposium on Code Generation and Optimization - CGO 2018  
Domain Specific Languages (DSLs) have raised the programming abstraction and offer good performance.  ...  Although stencil computations have been extensively studied, optimizing them for increasingly diverse hardware remains challenging.  ...  Acknowledgments We would like to thank the Lift team; Prashant Singh Rawat for help with the PPCG comparison; Ari Rasch and students of the University of Münster for help with the ATF framework and its  ... 
doi:10.1145/3168824 dblp:conf/cgo/HagedornSSGD18 fatcat:7zu6coqa5nd2vk7fiipkdygmnq

Analytical Cost Metrics : Days of Future Past [article]

Nirmal Prajapati, Sanjay Rajopadhye, Hristo Djidjev
2018 arXiv   pre-print
Scientists and researchers are continuously investing in tuning the performance of extreme-scale computational problems.  ...  The architectures are constantly evolving making the current performance optimizing strategies less applicable and new strategies to be invented.  ...  Thus, we formulate the domain-specific optimization problem: simultaneously optimize compilation and hardware/architectural parameters to compile stencil computations to GPUs.  ... 
arXiv:1802.01957v1 fatcat:r6lajnt75zb4xahkznt5gb4wx4

Tiling Optimizations for Stencil Computations Using Rewrite Rules in Lift

Larisa Stoltzfus, Bastian Hagedorn, Michel Steuwer, Sergei Gorlatch, Christophe Dubach
2019 ACM Transactions on Architecture and Code Optimization (TACO)  
Extension of Conference Paper High performance stencil code generation with Lift published at CGO 2018 [22].  ...  We introduce two optimizations that provide high performance for stencils in particular: classical overlapped tiling for multi-dimensional stencils and 2.5D tiling specifically for 3D stencils.  ...  By reusing Lift's existing exploration mechanism, we automatically generate high-performance stencil code for AMD, NVIDIA and ARM GPUs.  ... 
doi:10.1145/3368858 fatcat:hhjefzi2s5fynkknr6n5ydt5m4

Productive Performance Engineering for Weather and Climate Modeling with Python [article]

Tal Ben-Nun, Linus Groner, Florian Deconinck, Tobias Wicky, Eddie Davis, Johann Dahm, Oliver D. Elbert, Rhea George, Jeremy McGibbon, Lukas Trümper, Elynn Wu, Oliver Fuhrer (+2 others)
2022 arXiv   pre-print
By using a declarative Python-embedded stencil domain-specific language and data-centric optimization, we abstract hardware-specific details and define a semi-automated workflow for analyzing and optimizing  ...  We present a detailed account of optimizing the Finite Volume Cubed-Sphere Dynamical Core (FV3), improving productivity and performance.  ...  We acknowledge contributions from the whole GT4Py team, specifically Hannes Vogt (CSCS) and Enrique Gonzalez (CSCS), for their help in implementing a validating version of FV3 using GT4Py.  ... 
arXiv:2205.04148v2 fatcat:rhvxrwd4frabtboumy53ek7zaq

GPU Support for Automatic Generation of Finite-Differences Stencil Kernels [article]

Vitor Hugo Mickus Rodrigues, Lucas Cavalcante, Maelso Bruno Pereira, Fabio Luporini, István Reguly, Gerard Gorman, Samuel Xavier de Souza
2019 arXiv   pre-print
We embed it with the Oxford Parallel Domain Specific Language (OP-DSL) in order to enable automatic code generation for GPU architectures from a high-level representation.  ...  Graphical processing units (GPUs) are an attractive architectural target for stencil computations because of its high degree of data parallelism.  ...  The optimization of regular grid and stencil computations has also produced a vast range of libraries and DSLs that aim to ease the efficient automated creation of high-performance codes [4, 5, 9, 16]  ... 
arXiv:1912.00695v1 fatcat:w2z25hm3ujchrd3ke5wcjkik4y

Auto-generation and auto-tuning of 3D stencil codes on GPU clusters

Yongpeng Zhang, Frank Mueller
2012 Proceedings of the Tenth International Symposium on Code Generation and Optimization - CGO '12  
and generates the code with optimal parameter configurations for different GPUs.  ...  This paper develops and evaluates search and optimization techniques for auto-tuning 3D stencil (nearest-neighbor) computations on GPUs.  ...  NVIDIA GPUs ranging from consumer-end graphics card to high performance computing GPUs.  ... 
doi:10.1145/2259016.2259037 dblp:conf/cgo/ZhangM12 fatcat:fly3dlcqnrdbnbneeab4ylcazy

Compiler-based code generation and autotuning for geometric multigrid on GPU-accelerated supercomputers

Protonu Basu, Samuel Williams, Brian Van Straalen, Leonid Oliker, Phillip Colella, Mary Hall
2017 Parallel Computing  
GPUs, with their high bandwidths and computational capabilities are an increasingly popular target for scientific computing.  ...  GPU-based architectures as well as for a multiple stencil discretizations and smoothers.  ...  These efforts can be classified as programming language extensions to target GPUs, domain-specific languages that optimize stencil computations for both GPUs and multicores, and finally, code generators  ... 
doi:10.1016/j.parco.2017.04.002 fatcat:cmhgr2cobzdzhkbb4h5ylk22hm

Autogeneration and Autotuning of 3D Stencil Codes on Homogeneous and Heterogeneous GPU Clusters

Yongpeng Zhang, Frank Mueller
2013 IEEE Transactions on Parallel and Distributed Systems  
and generates the code with optimal parameter configurations for different GPUs.  ...  This paper develops and evaluates search and optimization techniques for auto-tuning 3D stencil (nearest-neighbor) computations on GPUs.  ...  The contributions of this paper are: • We abstract a wide variety of stencil computations into a set of domain-specific specifications.  ... 
doi:10.1109/tpds.2012.160 fatcat:ps3snatb6vdehct2hlfrviav4e