233 Hits in 2.8 sec

Automatic SIMD vectorization for Haskell

Leaf Petersen, Dominic Orchard, Neal Glew
2013 SIGPLAN Notices
We describe an implementation of automatic SIMD vectorization in a Haskell compiler which gives significant vector speedups for a range of programs written in a natural programming style.  ...  Expressing algorithms using immutable arrays greatly simplifies the challenges of automatic SIMD vectorization, since several important classes of dependency violations cannot occur.  ...  Related and Future Work, Conclusions There is a vast and rich literature on automatic SIMD vectorization in compilers.  ... 
doi:10.1145/2544174.2500605 fatcat:dwl3ggrrzjhppkjjjsif5tixaq
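
The key observation in the snippet, that immutable arrays rule out whole classes of dependency violations, is easy to picture with a small sketch. The code below is illustrative only (it is not taken from the paper and uses the vector package rather than HRC's internal array representation): because the inputs can never be mutated, the elementwise loop has no loop-carried dependencies, so a vectorizer is free to evaluate several elements per instruction.

    -- Natural immutable-array style that an automatic vectorizer can target.
    import qualified Data.Vector.Unboxed as U

    -- a*x + y without mutation or explicit indexing; the inputs cannot alias
    -- the output, so the lanes of the loop are independent by construction.
    saxpy :: Float -> U.Vector Float -> U.Vector Float -> U.Vector Float
    saxpy a = U.zipWith (\x y -> a * x + y)

    main :: IO ()
    main = print (U.sum (saxpy 2.0 (U.replicate 1000 1.0) (U.enumFromN 0 1000)))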

Automatic SIMD vectorization for Haskell

Leaf Petersen, Dominic Orchard, Neal Glew
2013 Proceedings of the 18th ACM SIGPLAN International Conference on Functional Programming - ICFP '13
We describe an implementation of automatic SIMD vectorization in a Haskell compiler which gives substantial vector speedups for a range of programs written in a natural programming style.  ...  Expressing algorithms using immutable arrays greatly simplifies the challenges of automatic SIMD vectorization, since several important classes of dependency violations cannot occur.  ...  In this paper, we describe the implementation of an automatic SIMD vectorization optimization pass in the Intel Labs Haskell Research Compiler (HRC).  ... 
doi:10.1145/2500365.2500605 dblp:conf/icfp/PetersenOG13 fatcat:nut5dukzijcifdci7vvqpgn7aq

Exploiting vector instructions with generalized stream fusion

Geoffrey Mainland, Roman Leshchinskiy, Simon Peyton Jones
2013 Proceedings of the 18th ACM SIGPLAN International Conference on Functional Programming - ICFP '13
It has been used to great effect in Haskell libraries for manipulating byte arrays, Unicode text, and unboxed vectors.  ...  Moreover, using DPH, programs can easily exploit SIMD instructions and automatically parallelize to take advantage of multiple cores.  ...  Acknowledgments The authors would like to thank Simon Marlow for his help debugging GHC's runtime system and code generator. We are also grateful for Andrew Fitzgibbon's many insightful comments.  ... 
doi:10.1145/2500365.2500601 dblp:conf/icfp/MainlandLJ13 fatcat:q5gbrauyr5d2vjjsbwi7qcwq2a
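
The snippet's point about byte arrays, Unicode text, and unboxed vectors is that stream fusion collapses a pipeline of array traversals into a single loop, which is exactly the shape a SIMD code generator wants. A minimal sketch of such a pipeline, written against the vector package (the function and data are invented for illustration and do not come from the paper):

    import qualified Data.Vector.Unboxed as U

    -- zipWith, map and sum fuse into one traversal with no intermediate
    -- vectors, leaving a single residual loop amenable to SIMD code generation.
    squaredError :: U.Vector Double -> U.Vector Double -> Double
    squaredError xs ys = U.sum (U.map (^ (2 :: Int)) (U.zipWith (-) xs ys))

    main :: IO ()
    main = print (squaredError (U.enumFromN 0 8) (U.replicate 8 1.0))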

Exploiting vector instructions with generalized stream fusion

Geoffrey Mainland, Roman Leshchinskiy, Simon Peyton Jones
2013 SIGPLAN Notices
It has been used to great effect in Haskell libraries for manipulating byte arrays, Unicode text, and unboxed vectors.  ...  Moreover, using DPH, programs can easily exploit SIMD instructions and automatically parallelize to take advantage of multiple cores.  ...  Acknowledgments The authors would like to thank Simon Marlow for his help debugging GHC's runtime system and code generator. We are also grateful for Andrew Fitzgibbon's many insightful comments.  ... 
doi:10.1145/2544174.2500601 fatcat:saucwgkklbctze4kv6q2bjwzy4

Language abstractions for low level optimization techniques

Gergely Dévai, Zoltán Gera, Zoltán Kelemen
2014 Computer Science and Information Systems  
This paper presents such language abstractions for two well-known optimizations: bitvectors and SIMD (Single Instruction Multiple Data).  ...  Even if compilers are smart nowadays and provide the user with many automatically applied optimizations, practice shows that in some cases it is hopeless to optimize the program automatically without the  ...  We also thank the anonymous reviewers for their comments which helped us improving this paper.  ... 
doi:10.2298/csis130224080d fatcat:n2lm7tfbcrblnie3yfyoaxgj6y
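
Of the two optimizations the paper abstracts over, the bitvector technique is the easier one to sketch: packing many Boolean flags into a single machine word so that one bitwise instruction processes all of them at once, the same data-parallel idea SIMD applies to wider lanes. The example below is a generic illustration of that trick using Data.Bits, not the paper's proposed language abstraction:

    import Data.Bits (popCount, testBit, (.&.))
    import Data.Word (Word64)

    -- A set of small Ints (0..63) stored as one Word64: a single AND
    -- instruction intersects 64 potential members at a time.
    member :: Int -> Word64 -> Bool
    member i set = testBit set i

    intersectSize :: Word64 -> Word64 -> Int
    intersectSize a b = popCount (a .&. b)

    main :: IO ()
    main = print (intersectSize 0x0B 0x03, member 1 0x0B)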

Type-safe runtime code generation: accelerate to LLVM

Trevor L. McDonell, Manuel M. T. Chakravarty, Vinod Grover, Ryan R. Newton
2015 SIGPLAN Notices
In fact, the situation for CPUs is similar once SIMD vector instructions are considered.  ...  Accelerate code embedded into Haskell is not compiled to parallel SIMD code by the Haskell compiler; instead, the Accelerate library includes a runtime compiler that generates parallel SIMD code at application  ... 
doi:10.1145/2887747.2804313 fatcat:amxpznfzajadhhhsjkw6i3fjfy
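
The division of labour described in the snippet, where the Haskell compiler sees only an embedded AST and Accelerate's runtime compiler produces the vectorised code, can be made concrete with a short usage sketch. It assumes the accelerate and accelerate-llvm-native packages and follows their documented interfaces; it is not code from the paper:

    import Data.Array.Accelerate             as A
    import Data.Array.Accelerate.LLVM.Native as CPU  -- runtime compiler targeting SIMD CPU code

    -- GHC compiles this only to an AST of type Acc; CPU.run hands that AST to
    -- the LLVM-based runtime compiler, which emits vectorised machine code.
    axpy :: Float -> A.Vector Float -> A.Vector Float -> A.Vector Float
    axpy a xs ys = CPU.run $ A.zipWith (\x y -> A.constant a * x + y) (A.use xs) (A.use ys)

    main :: IO ()
    main = print (axpy 2 (A.fromList (Z :. 4) [1, 2, 3, 4]) (A.fromList (Z :. 4) [10, 20, 30, 40]))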

Towards Hume SIMD Vectorisation

Abdallah Al Zain
2009 Zenodo  
We would like to thank our colleagues Andy Wallace and Jing Ye for their collaboration.  ...  characterised SIMD processors; f) inform automatic parallelisation using the analyses derived from the cost models.  ...  General purpose CPUs have provided additional SIMD vectorisation for at least 10 years.  ... 
doi:10.5281/zenodo.41820 fatcat:a7zkbrpqkzcqtbei3joqrgbaxe

Type-safe runtime code generation: accelerate to LLVM

Trevor L. McDonell, Manuel M. T. Chakravarty, Vinod Grover, Ryan R. Newton
2015 Proceedings of the 8th ACM SIGPLAN Symposium on Haskell - Haskell 2015  
In fact, the situation for CPUs is similar once SIMD vector instructions are considered.  ...  Accelerate code embedded into Haskell is not compiled to parallel SIMD code by the Haskell compiler; instead, the Accelerate library includes a runtime compiler that generates parallel SIMD code at application  ... 
doi:10.1145/2804302.2804313 dblp:conf/haskell/McDonellCGN15 fatcat:zps4zwa7jrg3va2n2stjyipfr4

Array languages and the N-body problem

P. Cockshott, Y. Gdura, P. Keir
2013 Concurrency and Computation  
and for exploiting the parallelism that computer vision applications require.  ...  Our group is part of the Computer Vision and Graphics research group and we have for some years been developing array compilers because we think these are a good tool both for expressing graphics algorithms  ...  The fastest implementation reported was one in C++ using TBB for multi-core parallelism and SIMD intrinsics for vector parallelism.  ... 
doi:10.1002/cpe.3088 fatcat:um2yz2l32bb5pnc6sl6csufnsu

Optimising purely functional GPU programs

Trevor L. McDonell, Manuel M.T. Chakravarty, Gabriele Keller, Ben Lippmeier
2013 SIGPLAN Notices
Purely functional, embedded array programs are a good match for SIMD hardware, such as GPUs.  ...  Both techniques are well known from other contexts, but they present unique challenges for an embedded language compiled for execution on a GPU.  ...  The arguments to dotp are of plain Haskell type Vector Float.  ... 
doi:10.1145/2544174.2500595 fatcat:duwcm3bo3faydmjth5kvpnpv5y
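
The dotp mentioned in the snippet is Accelerate's standard dot-product example; the version below is reconstructed from the Accelerate documentation rather than copied from the paper. The inputs are ordinary Haskell arrays, use embeds them into the EDSL, and the resulting Acc program is what a backend compiles for the GPU (for instance the CUDA backend used in this line of work, or the LLVM backend shown earlier):

    import Data.Array.Accelerate as A

    dotp :: A.Vector Float -> A.Vector Float -> A.Acc (A.Scalar Float)
    dotp xs ys = A.fold (+) 0 (A.zipWith (*) (A.use xs) (A.use ys))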

Optimising purely functional GPU programs

Trevor L. McDonell, Manuel M.T. Chakravarty, Gabriele Keller, Ben Lippmeier
2013 Proceedings of the 18th ACM SIGPLAN International Conference on Functional Programming - ICFP '13
Purely functional, embedded array programs are a good match for SIMD hardware, such as GPUs.  ...  Both techniques are well known from other contexts, but they present unique challenges for an embedded language compiled for execution on a GPU.  ...  The arguments to dotp are of plain Haskell type Vector Float.  ... 
doi:10.1145/2500365.2500595 dblp:conf/icfp/McDonellCKL13 fatcat:jf5n7oqnejeulejrtedygjjuxa

Financial software on GPUs

Cosmin E. Oancea, Christian Andreetta, Jost Berthold, Alain Frisch, Fritz Henglein
2012 Proceedings of the 1st ACM SIGPLAN Workshop on Functional High-Performance Computing - FHPC '12
Given the observed difficulty of automatically parallelizing imperative sequential code and the inherent labor of porting hardwareoriented and -optimized programs, our case study suggests that functional  ...  programming technology can facilitate high-level expression of leading-edge performant portable high-performance systems for massively parallel hardware architectures.  ...  Haskell version for multicore platforms.  ... 
doi:10.1145/2364474.2364484 dblp:conf/icfp/OanceaABFH12 fatcat:emwjal43irftxlhsstzlbyqga4

Accelerating Haskell array codes with multicore GPUs

Manuel M.T. Chakravarty, Gabriele Keller, Sean Lee, Trevor L. McDonell, Vinod Grover
2011 Proceedings of the Sixth Workshop on Declarative Aspects of Multicore Programming - DAMP '11
Current GPUs are massively parallel multicore processors optimised for workloads with a large degree of SIMD parallelism.  ...  We embed this purely functional array language in Haskell with an online code generator for NVIDIA's CUDA GPGPU programming environment.  ...  We are grateful to Ben Lever and Rami Mukhtar for their very helpful and constructive feedback on Accelerate. We thank the anonymous reviewers for their suggestions on improving the paper.  ... 
doi:10.1145/1926354.1926358 dblp:conf/popl/ChakravartyKLMG11 fatcat:nc73etjhxjgohaieyh4y2yd6zu
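
As a usage sketch of the online CUDA code generator the snippet refers to: with the accelerate-cuda package (its module and run function as documented, not code from the paper), the dot product from the previous sketch is executed on the GPU simply by handing the Acc program to that backend:

    import Data.Array.Accelerate      as A
    import Data.Array.Accelerate.CUDA as GPU  -- online CUDA code generator and executor

    main :: IO ()
    main = do
      let xs = A.fromList (Z :. 4) [1, 2, 3, 4]     :: A.Vector Float
          ys = A.fromList (Z :. 4) [10, 20, 30, 40] :: A.Vector Float
      -- GPU.run generates CUDA code at runtime, runs it, and copies the result back.
      print (GPU.run (A.fold (+) 0 (A.zipWith (*) (A.use xs) (A.use ys))))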

Deterministic Parallel Programming with Haskell

D. Coutts, A. Loh
2012 Computing in Science & Engineering (Print)
solver for a 1D Poisson equation.  ...  We argue that Haskell and deterministic parallelism are a good match for many computing problems in science and engineering and demonstrate the effectiveness of this approach using the example of a naïve  ...  Finally, there is work under way to make the compiler GHC able to take advantage of CPU SIMD vector instructions, such as Intel's SSE and AVX instruction sets.  ...
doi:10.1109/mcse.2012.68 fatcat:bb7izs5m6jgbzgqp6wx2pejpve
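
Determinism here means the parallel program computes exactly the same result as the sequential one. Evaluation strategies from the parallel package are one standard way to get it; the sketch below (an invented Jacobi-style relaxation step, not the paper's Poisson solver) shows the pattern:

    import Control.Parallel.Strategies (parListChunk, rdeepseq, using)

    -- One Jacobi relaxation sweep for u'' = f on a 1D grid with spacing h.
    -- 'using' only says where and when evaluation happens, never what is
    -- computed, so adding parallelism cannot change the answer.
    sweep :: Double -> [Double] -> [Double] -> [Double]
    sweep h f u =
      [ 0.5 * (l + r - h * h * fi)
      | (l, r, fi) <- zip3 (0 : u) (drop 1 u ++ [0]) f
      ] `using` parListChunk 256 rdeepseq

    main :: IO ()
    main = print (sum (sweep 0.01 (replicate 1000 1.0) (replicate 1000 0.0)))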

2D Image Convolution using Three Parallel Programming Models on the Xeon Phi [article]

Ashkan Tousimojarad, Wim Vanderbauwhede, W Paul Cockshott
2017 arXiv   pre-print
After optimising the naive codes using loop unrolling and SIMD vectorisation, we choose the algorithm with better performance as the baseline for parallelisation.  ...  Image convolution is widely used for sharpening, blurring and edge detection. In this paper, we review two common algorithms for convolving a 2D image by a separable kernel (filter).  ...  The reported Ninja gap for the Intel Labs Haskell Research Compiler (HRC) for 8192×8192 images on the Xeon Phi using the single-pass algorithm is 3.7× (for 57 threads) [13] .  ... 
arXiv:1711.09791v1 fatcat:rt7ufla5evf3jfyjvu7wnnnloi
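
The separable-kernel algorithm the paper reviews factors a 2D convolution into a horizontal 1D pass followed by a vertical 1D pass, reducing the per-pixel cost from k*k to 2k multiplies for a k x k kernel. The sketch below shows only that factorisation; it uses plain lists for brevity, whereas the paper's implementations work on flat arrays with loop unrolling and SIMD vectorisation:

    import Data.List (transpose)

    -- 1D convolution with zero padding at the borders.
    convolve1D :: [Double] -> [Double] -> [Double]
    convolve1D kernel xs = go padded
      where
        r      = length kernel `div` 2
        k      = length kernel
        padded = replicate r 0 ++ xs ++ replicate r 0
        go ys
          | length ys < k = []
          | otherwise     = sum (zipWith (*) kernel ys) : go (drop 1 ys)

    -- Horizontal pass over each row, then vertical pass via transpose.
    convolveSeparable :: [Double] -> [[Double]] -> [[Double]]
    convolveSeparable k img =
      transpose (map (convolve1D k) (transpose (map (convolve1D k) img)))

    main :: IO ()
    main = mapM_ print (convolveSeparable [0.25, 0.5, 0.25] (replicate 4 [1, 2, 3, 4]))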
Showing results 1 — 15 out of 233 results