A T2 graph-reduction approach to fusion
Proceedings of the 2nd ACM SIGPLAN workshop on Functional high-performance computing - FHPC '13
Fusion is one of the most important code transformations, as it has the potential to substantially reduce both the time overhead of the memory hierarchy and, sometimes asymptotically, the space requirement. In functional languages, fusion is naturally and relatively easily derived as a producer-consumer relation between program constructs that expose a richer, higher-order algebra of program invariants, such as the map-reduce list homomorphisms. In imperative languages, fusing producer-consumer loops
requires dependency analysis on arrays applied at loop-nest level. Such analysis, however, has often been labeled a "heroic effort" and is supported, if at all, only in its simplest and most conservative form in industrial compilers. Related implementations in the functional context typically apply fusion only when the to-be-fused producer is used exactly once, i.e., in the consumer. This guarantees that the transformation is conservative: the resulting program does not duplicate computation. We show that the above restriction is more conservative than needed, and present a structural-analysis technique, inspired by the T1-T2 transformation for reducible data flow, that enables fusion even in some cases where the producer is used in different consumers, without duplicating computation. We report an implementation of the fusion algorithm for a functional-core language, named L0, which is intended to support nested parallelism across regular multi-dimensional arrays. We succinctly describe L0's semantics and the compiler infrastructure on which the fusion transformation relies, and present compiler-generated statistics related to fusion on a set of six benchmarks.
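
The multi-consumer case can be illustrated with a minimal Python sketch. It is not the T2 graph-reduction algorithm and not L0 code; it only shows the kind of result such a fusion aims for. The function `f`, the two reductions (sum and maximum), and all names below are assumptions chosen for illustration.

```python
from functools import reduce


def f(x):
    # Hypothetical producer function, chosen only for illustration.
    return x * x + 1


def unfused(xs):
    # Producer: a map that materializes an intermediate list.
    ys = [f(x) for x in xs]
    # Two different consumers read the same producer.
    total = reduce(lambda acc, y: acc + y, ys, 0)
    largest = reduce(lambda acc, y: max(acc, y), ys, float("-inf"))
    return total, largest


def fused(xs):
    # One pass over xs: f is applied once per element, no intermediate
    # list is built, and both reductions advance together.
    def step(acc, x):
        total, largest = acc
        y = f(x)
        return total + y, max(largest, y)

    return reduce(step, xs, (0, float("-inf")))
```

Inlining the producer into each consumer separately would also remove the intermediate list, but it would call `f` twice per element. Combining the two consumers first and then fusing the producer into the combined consumer avoids that duplication.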