Exploiting vector instructions with generalized stream fusio
Proceedings of the 18th ACM SIGPLAN international conference on Functional programming - ICFP '13
Stream fusion  is a powerful technique for automatically transforming high-level sequence-processing functions into efficient implementations. It has been used to great effect in Haskell libraries for manipulating byte arrays, Unicode text, and unboxed vectors. However, some operations, like vector append, still do not perform well within the standard stream fusion framework. Others, like SIMD computation using the SSE and AVX instructions available on modern x86 chips, do not seem to fit in
... the framework at all. In this paper we introduce generalized stream fusion, which solves these issues. The key insight is to bundle together multiple stream representations, each tuned for a particular class of stream consumer. We also describe a stream representation suited for efficient computation with SSE instructions. Our ideas are implemented in modified versions of the GHC compiler and vector library. Benchmarks show that high-level Haskell code written using our compiler and libraries can produce code that is faster than both compiler-and hand-vectorized C. Everything we describe is fully implemented in GHC and its libraries (Section 4). Our benchmarks compare to the very best C and C++ compilers and libraries that we could find. Remarkably, our benchmarks show that choosing the proper stream representations can result in machine code that beats compiler-vectorized C, and that is competitive with hand-tuned assembly. Moreover, using DPH, programs can easily exploit SIMD instructions and automatically parallelize to take advantage of multiple cores.