
Efficient and Correct Stencil Computation via Pattern Matching and Static Typing

Dominic Orchard, Alan Mycroft
2011 Electronic Proceedings in Theoretical Computer Science  
As a programming pattern, stencil computations are highly regular and amenable to optimisation and parallelisation.  ...  Stencil computations, involving operations over the elements of an array, are a common programming pattern in scientific computing, games, and image processing.  ...  , and Tomas Petricek for insightful comments and feedback.  ... 
doi:10.4204/eptcs.66.4 fatcat:l2xdwxmoiravtkywuz4rxwmsyq

Ypnos

Dominic A. Orchard, Max Bolingbroke, Alan Mycroft
2010 Proceedings of the 5th ACM SIGPLAN workshop on Declarative aspects of multicore programming - DAMP '10  
, static analysis to generate optimised, parallel implementations.  ...  We introduce the language and provide some discussion on the theoretical aspects of the language semantics, particularly the structuring of computations around the category theoretic notion of a comonad  ...  Many thanks to Tom Schrijvers for various insights and help with paper, and to Marcelo Fiore for many interesting discussions.  ... 
doi:10.1145/1708046.1708053 dblp:conf/popl/OrchardBM10 fatcat:f7hqph5s45aw3i3f6zeyjjoxsa

Poster reception---Scalable compression and replay of communication traces in massively parallel environments

Michael Noeth, Jaydeep Marathe, Frank Mueller, Martin Schulz, Bronis de Supinski
2006 Proceedings of the 2006 ACM/IEEE conference on Supercomputing - SC '06  
We introduce intra- and inter-node compression techniques of MPI events and present results of our implementation for BlueGene/L.  ...  Characterizing the communication behavior of large-scale applications is a difficult and costly task due to code and system complexity as well as their long execution times.  ...  A characterization of MPI communication patterns for the NAS parallel benchmarks has determined that communication end-points are, if not static, almost exclusively persistent and hardly even dynamic  ... 
doi:10.1145/1188455.1188605 dblp:conf/sc/NoethMMSS06 fatcat:zp2zc6dcv5f3xkw5a46cdxukya

Efficient parallel stencil convolution in Haskell

Ben Lippmeier, Gabriele Keller
2011 Proceedings of the 4th ACM symposium on Haskell - Haskell '11  
Stencil convolution is a fundamental building block of many scientific and image processing algorithms.  ...  We present a declarative approach to writing such convolutions in Haskell that is both efficient at runtime and implicitly parallel.  ...  Acknowledgements Thanks to Rami Mukhtar and Ben Lever for writing the original Canny Edge Detection code, Roman Leshchinskiy for suggesting the use of cursored arrays, and Simon Peyton Jones for describing  ... 
doi:10.1145/2034675.2034684 dblp:conf/haskell/LippmeierK11 fatcat:isvr7myqpjfdlbib3mcah7anf4

Efficient parallel stencil convolution in Haskell

Ben Lippmeier, Gabriele Keller
2012 SIGPLAN notices  
Stencil convolution is a fundamental building block of many scientific and image processing algorithms.  ...  We present a declarative approach to writing such convolutions in Haskell that is both efficient at runtime and implicitly parallel.  ...  Acknowledgements Thanks to Rami Mukhtar and Ben Lever for writing the original Canny Edge Detection code, Roman Leshchinskiy for suggesting the use of cursored arrays, and Simon Peyton Jones for describing  ... 
doi:10.1145/2096148.2034684 fatcat:zegqgfinhbhsjbehu3kkfh3b34

Protocols by Default [chapter]

Nicholas Ng, Jose Gabriel de Figueiredo Coutinho, Nobuko Yoshida
2015 Lecture Notes in Computer Science  
The code generation framework also integrates an optimisation method that overlaps communication and computation, and can derive not only representative parallel programs with common parallel patterns  ...  (such as ring and stencil), but also distributed applications from any MPST protocols.  ...  We thank Raymond Hu, Dominic Orchard and the anonymous reviewers for comments and suggestions.  ... 
doi:10.1007/978-3-662-46663-6_11 fatcat:6alwdcibi5ajba2uizaywdcdoi

The four Rs of programming language design

Dominic Orchard
2011 Proceedings of the 10th SIGPLAN symposium on New ideas, new paradigms, and reflections on programming and software - ONWARD '11  
Thanks are also due to Ellie Beagley, Robin Message, Alan Mycroft, Tomas Petricek, and James Willmoth for helpful discussions and comments on a draft of this essay, and to the Cambridge Programming Research  ...  Any remaining infelicities and errors are entirely my own.  ...  Further, as the types of operations are known, and are strict, the domains and ranges of computations can be matched, allowing equational rewriting to be safely applied.  ... 
doi:10.1145/2089131.2089138 dblp:conf/oopsla/Orchard11 fatcat:2simdfvrefhc7k6cdc2r3j2guy

Notified Access: Extending Remote Memory Access Programming Models for Producer-Consumer Synchronization

Roberto Belli, Torsten Hoefler
2015 2015 IEEE International Parallel and Distributed Processing Symposium  
We also evaluate our implementation on three real-world benchmarks: a stencil computation, a tree computation, and a Cholesky factorization implemented with tasks.  ...  This scheme enables direct and efficient synchronization with a minimum number of messages.  ...  We thank James Dinan (Intel), Jeff Hammond (Intel), Kathy Yelick (LBNL), Edgar Solomonik, Timo Schneider, and Salvatore Di Girolamo for helpful discussions, Larry Kaplan (Cray) for help with uGNI, and  ... 
doi:10.1109/ipdps.2015.30 dblp:conf/ipps/BelliH15 fatcat:vihdr3456zd5popdbgdd5h7hdi

ATM: Approximate Task Memoization in the Runtime System

Iulian Brumar, Marc Casas, Miquel Moreto, Mateo Valero, Gurindar S. Sohi
2017 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)  
Multiple factors contribute to these unnecessary computations, such as repetitive inputs and patterns, calling functions with the same parameters or bad programming habits.  ...  When evaluated on a real 8-core processor with applications from different domains (financial analysis, stencil-computation, machine-learning and linear-algebra), ATM achieves a 1.4x average speedup when  ...  Figure 4. Correctness with static and dynamic Approximate Task Memoization (ATM).  ... 
doi:10.1109/ipdps.2017.49 dblp:conf/ipps/BrumarCMVS17 fatcat:l225x4vjb5gerdyf3ot4x4j47i

Domain-Specific Multi-Level IR Rewriting for GPU [article]

Tobias Gysi, Christoph Müller, Oleksandr Zinenko, Stephan Herhut, Eddie Davis, Tobias Wicky, Oliver Fuhrer, Torsten Hoefler, Tobias Grosser
2020 arXiv pre-print
In particular, we develop a prototype compiler and design stencil- and GPU-specific dialects based on a set of newly introduced design principles.  ...  These IRs are usually low-level and close to machine instructions.  ...  ACKNOWLEDGEMENTS We thank Jean-Michel Gorius for his foundational stencil compiler work and the continuous support of our project.  ... 
arXiv:2005.13014v2 fatcat:3kjj5bdukbemte6yf4zgeq7spq

Productive Performance Engineering for Weather and Climate Modeling with Python [article]

Tal Ben-Nun, Linus Groner, Florian Deconinck, Tobias Wicky, Eddie Davis, Johann Dahm, Oliver D. Elbert, Rhea George, Jeremy McGibbon, Lukas Trümper, Elynn Wu, Oliver Fuhrer (+2 others)
2022 arXiv pre-print
This coupling stems from using imperative languages that hard-code computation schedules and layout.  ...  By using a declarative Python-embedded stencil domain-specific language and data-centric optimization, we abstract hardware-specific details and define a semi-automated workflow for analyzing and optimizing  ...  The authors also wish to acknowledge the support from the PASC program (Platform for Advanced Scientific Computing) for the DaceMI project.  ... 
arXiv:2205.04148v2 fatcat:rhvxrwd4frabtboumy53ek7zaq

Position-dependent arrays and their application for high performance code generation

Federico Pizzuti, Michel Steuwer, Christophe Dubach
2019 Proceedings of the 8th ACM SIGPLAN International Workshop on Functional High-Performance and Numerical Computing - FHPNC 2019  
This paper shows how these challenges are addressed by extending the existing Lift type system and compiler.  ...  This paper presents an extended array type that lifts this restriction.  ...  Acknowledgments We thank Larisa Stoltzfus for her help with plotting results and Bastian Hagedorn for advice with the Lift stencil codes.  ... 
doi:10.1145/3331553.3342614 dblp:conf/icfp/PizzutiSD19 fatcat:smkgmq3zmncobcmu5s54aeq2iq

Modeling Stencil Computations on Modern HPC Architectures [chapter]

Raúl de la Cruz, Mauricio Araya-Polo
2015 Lecture Notes in Computer Science  
Stencil computations are widely used for solving Partial Differential Equations (PDEs) explicitly by Finite Difference schemes.  ...  Performance models help expose bottlenecks and predict suitable tuning parameters in order to boost stencil performance on any given platform.  ...  They developed a set of formulas via regression analysis to model the overall performance on 7 and 27-point Jacobi and Gauss-Seidel computations.  ... 
doi:10.1007/978-3-319-17248-4_8 fatcat:giiippdb2jcivb7jatpm2zmxvm

Automatic Matching of Legacy Code to Heterogeneous APIs

Philip Ginsbach, Toomas Remmelg, Michel Steuwer, Bruno Bodin, Christophe Dubach, Michael F. P. O'Boyle
2018 Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS '18  
We focus on calculations that are well supported by established APIs: sparse and dense linear algebra, stencil codes and generalized reductions and histograms.  ...  CCS Concepts • Computer systems organization → Heterogeneous (hybrid) systems; • Software and its engineering → Domain specific languages; ACM Reference Format:  ...  /1) and the University of Edinburgh.  ... 
doi:10.1145/3173162.3173182 dblp:conf/asplos/GinsbachRSBDO18 fatcat:d23trzn4ujdgvbynkzmruqzj54

Static macro data flow: Compiling global control into local control

Pritish Jetley, Laxmikant V. Kale
2010 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW)  
Inter-object interactions are realized through the production and consumption of data. The compiler infers communication patterns between objects and generates appropriate messaging code.  ...  We present our work in the context of Charisma, a language that describes global data and control flow through a simple script-like language.  ...  Stencil Computation Stencil computations and other successive overrelaxation techniques find use in many scientific computing domains, chief among them the solution of linear systems.  ... 
doi:10.1109/ipdpsw.2010.5470944 dblp:conf/ipps/JetleyK10 fatcat:q5gmcet7fzbnlauqx3q2js2n7a